Support for bulk insert, update and delete operations at the DAOFactory level #569

Merged
merged 14 commits on May 9, 2013


@optilude
Contributor

See Sequelize issue #559

This change adds three methods to each DAO factory:

  • bulkCreate() can create multiple records as a single INSERT statement with multiple VALUES clauses
  • bulkDelete() can delete multiple records matching a WHERE clause (the standard delete operation is tied to a DAO's destroy() method and has a LIMIT 1 appended for MySQL and Postgres)
  • bulkUpdate() can update multiple records matching a WHERE clause with values provided in a hash (the standard update operation is tied to a DAO's save() method)

There are tests (Buster tests for all the new functionality, plus some additional clauses in the Jasmine tests for the query generators), and all tests pass for me. Note that there seems to be a flapping test related to a Postgres uniqueness constraint, but I don't think it has anything to do with this change: I sometimes get 'error: duplicate key value violates unique constraint "pg_type_typname_nsp_index"' when running npm test, but can't replicate it when running test-buster-postgres-native or test-buster-postgres.

I'm happy to write some documentation for these features if the pull request is accepted. I may need a bit of help understanding how the SequelizeJS.com documentation is actually maintained, though. There are internal comments in the code to explain the new methods, at the DAOFactory level at least.

@janmeier
Member

That looks like some solid work! Hopefully we'll have time to fully review and merge it within the coming week.

@sdepold
Member
sdepold commented Apr 28, 2013

I only looked at the specs which look pretty good :) However, I would prefer Model.create to check if an array has been passed. bulkUpdate and bulkDelete should be update and destroy. opinions?

@ricardograca
Contributor

I also prefer not having a million different functions for doing slightly different things, but that share 98% similarity otherwise. Detecting an array argument would be great and surely lead to more maintainable code, not to mention it would be easier to memorize the function names.

@optilude
Contributor

Hi,

I'm happy to rework it, but I have some reservations:

@sdepold: I'm not a very big fan of functions that behave differently/inconsistently depending on how arguments are passed. This gets quite confusing for users of the API. In this case, create() will create a model and pass it to the success callback, but bulkCreate can't do that (or pass an array to the callback), because there isn't a way (as far as I can tell) to correctly match serial/auto_increment ids created by the database back to the records created (I did a fair bit of research on this before giving up, and I think it needs to be up to the caller to do a findAll() again as required if the caller cares).

There are also some differences in the handling of omitNull: in the single record create it's reasonable to omit null values, but what do you do with an array where some records have null in a field and some don't? Rather than guess, I think it's better for the caller to pass in the list of fields.
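A toy illustration of that omitNull ambiguity (`presentFields` is a hypothetical helper, not Sequelize code): with per-record null omission, different records imply different column lists, which a single multi-VALUES INSERT cannot express.

```javascript
// Hypothetical helper, not part of Sequelize: the set of non-null
// fields differs per record, so a single column list cannot be
// derived by omitting nulls record-by-record.
function presentFields(record) {
  return Object.keys(record).filter(function (key) {
    return record[key] !== null;
  });
}

console.log(presentFields({ a: 1, b: null })); // [ 'a' ]
console.log(presentFields({ a: 1, b: 2 }));    // [ 'a', 'b' ]
```

Hence the PR's choice to have the caller pass in the list of fields explicitly rather than guessing.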

Using the names update() and destroy() on DAOFactory makes sense and is more consistent with the individual DAO equivalents. I can rename them.

They do need to be different at the QueryInterface level, though, because the queryAndEmit call in the single-DAO case is passed the DAO, which isn't applicable in the bulk cases. This then affects how success handlers are called (with a DAO argument or not).

This also applies to the dialect-specific query generators: the algorithms and resultant SQL are subtly different (e.g. the single-DAO cases for delete want to pass a LIMIT modifier). It was easier to write code + tests for separate code paths than for variants in the same code path.

@ricardograca my personal preference would be to factor as much of the "common" functionality out into shared helper functions (already largely done, though I'm sure there are some areas that could be improved) that do one thing and can be individually tested. It gets tricky to test and maintain functions that have several "modes".

All that said, you know the library best, so happy to take a steer and refactor as required. The relevant touch points are:

  • bulkCreate (which is like create() in DAOFactory and bits of save() in DAO), bulkDelete and bulkUpdate in DAOFactory
  • bulkInsert, bulkUpdate and bulkDelete in QueryInterface
  • bulkInsertQuery and bulkDeleteQuery in each dialect's QueryGenerator

Looking at the code:

I think that if DAOFactory.create() were to support both single and bulk insertion, it would essentially have an if array / else around the two bodies of today's create() and bulkCreate(). I'm not convinced that's a win for maintainability, and as mentioned the two "modes" would behave differently with respect to what's passed to success, but if you feel it's more in the spirit of the API, I'll refactor.

Renaming DAOFactory.bulkDelete() to destroy() and DAOFactory.bulkUpdate() to update() makes sense - I'll do that.

To unify the methods in QueryInterface, there'd be a check for if(dao === null) around the bodies of each. This would then need to be passed as options to the relevant QueryGenerator for the destroy case, to opt out of applying a LIMIT.

Unifying the insert methods in QueryGenerator is possible, though there'd be a fair amount of if/else logic, and we'd need to explain the different omitNull behaviour for inserts when an array is passed.

Unifying the delete methods in QueryGenerator would be relatively straightforward, but we'd need to respect an options parameter that says "don't limit". Perhaps we can treat options.limit = null as "don't limit" and options.limit = undefined as "default to LIMIT 1"?
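That convention could be sketched as follows (`limitClause` is a hypothetical helper name, not the actual Sequelize deleteQuery code):

```javascript
// Hypothetical sketch of the proposed convention:
// undefined -> default to LIMIT 1 (single-DAO destroy),
// null      -> no LIMIT at all (bulk delete).
function limitClause(options) {
  options = options || {};
  if (options.limit === null) {
    return '';                       // bulk delete: no limit
  }
  if (options.limit === undefined) {
    return ' LIMIT 1';               // single-DAO default
  }
  return ' LIMIT ' + options.limit;  // caller-specified limit
}

console.log(JSON.stringify(limitClause({ limit: null }))); // ""
console.log(JSON.stringify(limitClause({})));              // " LIMIT 1"
```

The three-way distinction keeps the single-DAO default intact while letting bulk callers opt out explicitly.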

@sdepold
Member
sdepold commented Apr 28, 2013

prequel: I still did not look at the code.

IMHO it would be perfectly OK to get an array as the parameter of the success callback for create calls with arrays.
A first guess at how to achieve this: check if the parameter is an array, and if so, use async to spawn create calls for every element of the array. Once all are done, call the success event and pass along the results of the async functions.

just my two cents

@optilude
Contributor

@sdepold The whole point of this is to avoid dozens of trips to the database when you're trying to insert dozens of records. The SQL I want to generate is INSERT INTO foo (bar, baz) VALUES ('one', 1), ('two', 2), .... This will be significantly faster (and atomic) than having one INSERT per record.

I do think it's something the caller should make a decision about, though. There is a trade-off: if you don't care so much about performance and you want to continue to operate on a DAO object without re-querying, then I think you'd probably just use a QueryChainer and add calls to create() in a loop.
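The multi-VALUES shape above can be sketched with a toy generator (naive quoting, hypothetical function name; the PR's real dialect-specific generators handle escaping and null handling properly):

```javascript
// Toy illustration of the single-INSERT, multi-VALUES shape that
// bulkCreate() aims for. Real query generators must do proper
// escaping per dialect; this uses naive quote-doubling only.
function toyBulkInsertQuery(table, fields, records) {
  var tuples = records.map(function (record) {
    var values = fields.map(function (field) {
      var value = record[field];
      return typeof value === 'number'
        ? String(value)
        : "'" + String(value).replace(/'/g, "''") + "'";
    });
    return '(' + values.join(', ') + ')';
  });
  return 'INSERT INTO ' + table + ' (' + fields.join(', ') +
         ') VALUES ' + tuples.join(', ') + ';';
}

console.log(toyBulkInsertQuery('foo', ['bar', 'baz'],
  [{ bar: 'one', baz: 1 }, { bar: 'two', baz: 2 }]));
// INSERT INTO foo (bar, baz) VALUES ('one', 1), ('two', 2);
```

One round trip to the database regardless of how many records are inserted, which is where the performance win comes from.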

@sdepold
Member
sdepold commented Apr 28, 2013

aha! got the point. so ignore my point about how to implement it :)

@optilude
Contributor

Thinking about this some more, how about we:

  • Leave DAOFactory.create() and DAOFactory.bulkCreate() separate, since it should be a conscious choice by the caller which behaviour they want
  • Rename DAOFactory.bulkUpdate() to update() and DAOFactory.bulkDelete() to destroy() to be consistent with DAO.
  • Leave the differences in QueryInterface intact: these are all two-liners with different behaviour with respect to whether a DAO is included, and making them conditional on whether null is passed for the first dao parameter would likely make them more confusing and harder to test
  • Leave QueryGenerator.bulkInsert() separate from QueryGenerator.insert() in each dialect, because these are again quite different in behaviour and what they generate, but look for any more opportunities to factor out shared functionality
  • Unify QueryGenerator.bulkDelete and QueryGenerator.delete in each dialect by passing {limit: null} in options to say "I don't want to limit how many records are deleted".
@optilude
Contributor

I've made the suggested changes in the commits above.

I also realised there were no tests for deleteQuery() in sqlite, which I've now added. That uncovered an error (use of this.getWhereConditions() instead of MySqlQueryGenerator.getWhereConditions()) that I believe also exists in master, independent of this pull request.

@optilude
Contributor
optilude commented May 4, 2013

Any news on merging this one? I'd like to be able to close the branch (and I need this feature :-))

@optilude
Contributor
optilude commented May 6, 2013

I've just rebased and re-pushed this onto the latest Sequelize master. The tests are passing here. TravisCI is complaining about the postgres-native build, but seems to die before it even runs any tests.

@janmeier janmeier and 1 other commented on an outdated diff May 7, 2013
spec/dao-factory.spec.js
+ { username: 'Bob', secretValue: '43' }]
+
+ this.User.bulkCreate(data).success(function() {
+
+ self.User.update({username: 'Bill'}, {secretValue: '42'})
+ .success(function() {
+ self.User.findAll({order: 'id'}).success(function(users) {
+ expect(users.length).toEqual(3)
+
+ expect(users[0].username).toEqual("Bill")
+ expect(users[1].username).toEqual("Bill")
+ expect(users[2].username).toEqual("Bob")
+
+ expect(parseInt(+users[0].createdAt/5000)).toEqual(parseInt(+new Date()/5000))
+ expect(parseInt(+users[1].createdAt/5000)).toEqual(parseInt(+new Date()/5000))
+ expect(parseInt(+users[2].createdAt)).not.toEqual(parseInt(+users[0].createdAt/5000))
@janmeier
janmeier May 7, 2013 Member

Why no / 5000 for users[2]?

And shouldn't this be testing on updatedAt?

@optilude
optilude May 7, 2013 Contributor

Actually, this test is bunk. It relies on time precision that doesn't seem to work reliably. I'm going to simplify it.

@janmeier janmeier commented on an outdated diff May 7, 2013
spec/dao-factory.spec.js
+ User.sync({ force: true }).success(function() {
+
+ User.bulkCreate(data).success(function() {
+
+ User.destroy({secretValue: '42'})
+ .success(function() {
+ User.findAll({order: 'id'}).success(function(users) {
+ expect(users.length).toEqual(3)
+
+ expect(users[0].username).toEqual("Peter")
+ expect(users[1].username).toEqual("Paul")
+ expect(users[2].username).toEqual("Bob")
+
+ expect(parseInt(+users[0].deletedAt/5000)).toEqual(parseInt(+new Date()/5000))
+ expect(parseInt(+users[1].deletedAt/5000)).toEqual(parseInt(+new Date()/5000))
+ expect(parseInt(+users[2].deletedAt)).not.toEqual(parseInt(+new Date()/5000))
@janmeier
janmeier May 7, 2013 Member

Again, shouldn't this be divided by 5000 as well?

@janmeier janmeier commented on the diff May 7, 2013
spec/dao-factory.spec.js
+ it('updates only values that match filter', function(done) {
+ var self = this
+ , data = [{ username: 'Peter', secretValue: '42' },
+ { username: 'Paul', secretValue: '42' },
+ { username: 'Bob', secretValue: '43' }]
+
+ this.User.bulkCreate(data).success(function() {
+
+ self.User.update({username: 'Bill'}, {secretValue: '42'})
+ .success(function() {
+ self.User.findAll({order: 'id'}).success(function(users) {
+ expect(users.length).toEqual(3)
+
+ expect(users[0].username).toEqual("Bill")
+ expect(users[1].username).toEqual("Bill")
+ expect(users[2].username).toEqual("Bob")
@janmeier
janmeier May 7, 2013 Member

I would rather test this with something like

users.forEach(function (user) {
    if (user.secretValue == '42') {
        expect(user.username).toEqual("Bill");
    } else {
        expect(user.username).toEqual("Bob");
    }
});

I know those assertions do exactly the same as yours, but here it is totally clear that if secretValue was 42 it should have been updated, and if not it should have remained Bob

@optilude
optilude May 7, 2013 Contributor

Nice idea. Done.

@janmeier
Member
janmeier commented May 7, 2013

Loving it!

@janmeier
Member
janmeier commented May 8, 2013

Awesome, if you rebase I'll merge it :)

@optilude
Contributor
optilude commented May 9, 2013

Ok, I've just rebased (and force-pushed, because at least then I know it runs locally). Please merge.

@janmeier janmeier merged commit fedb9cd into sequelize:master May 9, 2013

1 check failed

default The Travis CI build failed
@janmeier
Member
janmeier commented May 9, 2013

Really solid contributions @optilude, your PRs are some of the best documented I have ever seen.
If you feel like it, feel free to join us in the IRC channel, #sequelizejs at freenode

@optilude
Contributor

Thanks @janmeier :) I know first-hand that it's much easier to evaluate a patch when you have some context for the thinking behind it.

I do hang out in the IRC channel from time to time, though often at odd times of the day.

@joaomaia

This bulkCreate() thing is a great addition. I'm using it already.
Congratulations @optilude!

@calekennedy

Is there a way to use Object.bulkCreate(data) to also set foreign key associations? For instance, a user may have a car. Is there any way to bulk create all of those users with pointers to their cars?

@mickhansen
Member

@calekennedy if user belongsTo car you can just pass in car_id (or whatever the foreignKey is) with the rest of the values.
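For illustration (hypothetical model and attribute names, assuming a car_id foreign key), building the bulkCreate payload is ordinary data manipulation; the foreign key is just another attribute on each record:

```javascript
// Hypothetical shapes: each user record carries its car pointer as a
// plain car_id attribute; bulkCreate() treats it like any other field.
var cars = [{ id: 1 }, { id: 2 }];

var users = cars.map(function (car, index) {
  return { username: 'user' + index, car_id: car.id };
});

console.log(users);
// [ { username: 'user0', car_id: 1 }, { username: 'user1', car_id: 2 } ]
// User.bulkCreate(users) would then insert both rows with their
// car_id pointers already set (assuming the association is defined).
```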

@jmejiaz
jmejiaz commented Sep 22, 2016

Hi, I have an array of instances, and I change all the instances in the array. When I use:
ModelName.bulkUpdate(myArray);
it throws "bulkUpdate is not a function".
I'm on version 3.23.6. I'm doing it this way because I need all the updates in a single transaction. Thanks in advance.

@felixfbecker
Member

@jmejiaz you can wrap them in a transaction with sequelize.transaction().
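A sketch of that pattern (Sequelize ~3.x managed transactions; saveAllInTransaction is a hypothetical helper name, not a library function):

```javascript
// Hypothetical helper: run every instance's save() inside one managed
// transaction, so all updates commit or roll back together.
function saveAllInTransaction(sequelize, instances) {
  return sequelize.transaction(function (t) {
    // The returned promise resolves when every save has finished;
    // the managed transaction commits on resolve, rolls back on reject.
    return Promise.all(instances.map(function (instance) {
      return instance.save({ transaction: t });
    }));
  });
}
```

Note that each save() must be passed the transaction object explicitly, or that update runs outside the transaction.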

@rahman2k9

models.Task.beforeBulkCreate(function(tasks) {
  // Return a promise so Sequelize waits for the lookups to finish;
  // throwing inside a plain forEach callback would never reject the hook.
  return Promise.all(tasks.map(function(task) {
    return models.User.findOne({
      where: {
        id: task.UserId
      }
    }).then(function(user) {
      if (!user) {
        throw new models.Sequelize.ValidationErrorItem(task.UserId + " does not exist", "UserId", "UserId", task.UserId)
      }
    })
  }))
})

models.Task.bulkCreate([{
  title: "abdul rahman",
  UserId: 1121
}, {
  title: "abdul rahman",
  UserId: 2222222
}]).then(function() {
  res.redirect('/');
}).catch(function(e) {
  console.log(e);
});

In the code above I want to insert records in bulk, but before inserting I have to check whether each user exists; if one doesn't, I want to throw a validation error saying the user does not exist, so the caller can easily see what went wrong.

Kindly suggest a way forward. I've lost two days searching and tried the async methods (series, forEach, waterfall) but could not get it to work.

@felixfbecker
Member

@rahman2k9 please ask general questions like this on StackOverflow or Slack.

@felixfbecker felixfbecker locked and limited conversation to collaborators Nov 2, 2016