Support for bulk insert, update and delete operations at the DAOFactory level #569

Merged
merged 14 commits into sequelize:master from optilude:master on May 9, 2013

Conversation

10 participants
@optilude
Contributor

optilude commented Apr 27, 2013

See Sequelize issue #559

This change adds three methods to each DAO factory:

  • bulkCreate() can create multiple records as a single INSERT statement with multiple VALUES clauses
  • bulkDelete() can delete multiple records matching a WHERE clause (the standard delete operation is tied to a DAO's destroy() method and has a LIMIT 1 appended for MySQL and Postgres)
  • bulkUpdate() can update multiple records matching a WHERE clause with values provided in a hash (the standard update operation is tied to a DAO's save() method)
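
As a rough usage sketch (method names as described above; the attribute names and exact argument order are my assumptions, not settled API):

User.bulkCreate([
  { username: 'Bill', secretValue: '42' },
  { username: 'Bob',  secretValue: '43' }
]).success(function() {
  // one INSERT ... VALUES (...), (...) statement, not one query per record
})

// update all rows matching the WHERE hash with the given values
User.bulkUpdate({ username: 'Bill' }, { secretValue: '42' })

// delete all rows matching the WHERE hash (no LIMIT 1 appended)
User.bulkDelete({ secretValue: '42' })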

There are tests (Buster tests for all the new functionality, and some additional clauses in the Jasmine tests for the query generators), and all tests pass for me. Note that there seems to be a flaky test related to a Postgres uniqueness constraint, but I don't think it has anything to do with this change: I sometimes get 'error: duplicate key value violates unique constraint "pg_type_typname_nsp_index"' when running npm test, but can't replicate it when running test-buster-postgres-native or test-buster-postgres.

I'm happy to write some documentation for these features if the pull request is accepted. I may need a bit of help understanding how the SequelizeJS.com documentation is actually maintained, though. There are internal comments in the code explaining the new methods, at least at the DAOFactory level.

@janmeier

Member

janmeier commented Apr 28, 2013

That looks like some solid work! Hopefully we'll have time to fully review and merge it within the coming week.

@sdepold

Member

sdepold commented Apr 28, 2013

I only looked at the specs, which look pretty good :) However, I would prefer Model.create to check whether an array has been passed, and bulkUpdate and bulkDelete should be renamed to update and destroy. Opinions?

@ricardograca

Contributor

ricardograca commented Apr 28, 2013

I also prefer not having a million different functions for doing slightly different things that are otherwise 98% similar. Detecting an array argument would be great and would surely lead to more maintainable code, not to mention function names that are easier to memorize.

@optilude

Contributor

optilude commented Apr 28, 2013

Hi,

I'm happy to rework it, but I have some reservations:

@sdepold: I'm not a big fan of functions that behave differently or inconsistently depending on how arguments are passed; it gets quite confusing for users of the API. In this case, create() will create a model and pass it to the success callback, but bulkCreate() can't do that (or pass an array to the callback), because there isn't a way, as far as I can tell, to correctly match the serial/auto_increment ids created by the database back to the records created. (I did a fair bit of research on this before giving up; I think it needs to be up to the caller to do a findAll() again if the caller cares.)
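
For example, a caller that needs the created DAOs back could re-query along these lines (a minimal sketch; the model and where values are illustrative):

User.bulkCreate(records).success(function() {
  // bulkCreate cannot reliably report the generated ids, so re-read:
  User.findAll({ where: { secretValue: '42' } }).success(function(users) {
    // freshly-read DAO instances, ids included
  })
})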

There are also some differences in the handling of omitNull: in the single-record create it's reasonable to omit null values, but what do you do with an array where some records have null in a field and some don't? Rather than guess, I think it's better for the caller to pass in the list of fields.

Using the names update() and destroy() on DAOFactory makes sense and is more consistent with the individual DAO equivalents. I can rename them.

They do need to be different at the QueryInterface level, though, because the queryAndEmit call in the single-DAO case is passed the DAO, which isn't applicable in the bulk cases. This then affects how success handlers are called (with a DAO argument or not).

This also applies to the dialect-specific query generators: the algorithms and resultant SQL are subtly different (e.g. the single-DAO cases for delete want to pass a LIMIT modifier). It was easier to write code + tests for separate code paths than for variants in the same code path.

@ricardograca my personal preference would be to factor as much of the "common" functionality as possible out into shared helper functions that do one thing and can be individually tested (this is already largely done, though I'm sure some areas could be improved). It gets tricky to test and maintain functions that have several "modes".

All that said, you know the library best, so I'm happy to take a steer and refactor as required. The relevant touch points are:

  • bulkCreate (which is like create() in DAOFactory and bits of save() in DAO), bulkDelete and bulkUpdate in DAOFactory
  • bulkInsert, bulkUpdate and bulkDelete in QueryInterface
  • bulkInsertQuery and bulkDeleteQuery in each dialect's QueryGenerator

Looking at the code:

I think that if DAOFactory.create() were to support both single and bulk insertion, it would essentially have an if array / else around the two bodies of today's create() and bulkCreate(). I'm not convinced that's a win for maintainability, and as mentioned the two "modes" would behave differently with respect to what's passed to success, but if you feel it's more in the spirit of the API, I'll refactor.

Renaming DAOFactory.bulkDelete() to destroy() and DAOFactory.bulkUpdate() to update() makes sense - I'll do that.

To unify the methods in QueryInterface, there'd be a check for if (dao === null) around the bodies of each. The result of that check would then need to be passed as an option to the relevant QueryGenerator in the destroy case, to opt out of applying a LIMIT.

To unify the insert methods in QueryGenerator is possible, though there'd be a fair amount of if/else logic, and we'd need to explain the different omitNull behaviour for inserts in cases where an array is passed.

To unify the delete methods in QueryGenerator would be relatively straightforward, but we'd need to respect an option parameter that says "don't limit". Perhaps we can treat options.limit = null as "don't limit" and options.limit = undefined as "default to LIMIT 1"?

@sdepold

Member

sdepold commented Apr 28, 2013

Preamble: I still haven't looked at the code.

IMHO it would be perfectly OK to get an array as the parameter of the success callback for create calls with arrays.
A first guess at how to achieve the creation: check whether the parameter is an array, and if so, use async to spawn create calls for every element of the array. Once all are done, emit the success event and pass along the results of the async functions.

just my two cents

@optilude

Contributor

optilude commented Apr 28, 2013

@sdepold The whole point of this is to avoid dozens of trips to the database when you're trying to insert dozens of records. The SQL I want to generate is INSERT INTO foo (bar, baz) VALUES ('one', 1), ('two', 2), .... This will be significantly faster (and atomic) compared to one INSERT per record.

I do think it's something the caller should make a decision about, though. There is a trade-off: if you don't care so much about performance and you want to continue to operate on a DAO object without re-querying, then I think you'd probably just use a QueryChainer and add calls to create() in a loop.
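
Something like this, from memory of the QueryChainer API (a sketch; User and records are assumed to be in scope):

var chainer = new Sequelize.Utils.QueryChainer()

records.forEach(function(record) {
  // one INSERT per record, but each create() yields a DAO
  chainer.add(User.create(record))
})

chainer.run().success(function(results) {
  // results holds the created DAO instances
})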

@sdepold

Member

sdepold commented Apr 28, 2013

Aha! Got the point. So ignore my suggestion about how to implement it :)

@optilude

Contributor

optilude commented Apr 28, 2013

Thinking about this some more, how about we:

  • Leave DAOFactory.create() and DAOFactory.bulkCreate() separate, since it should be a conscious choice by the caller which behaviour they want
  • Rename DAOFactory.bulkUpdate() to update() and DAOFactory.bulkDelete() to destroy() to be consistent with DAO.
  • Leave the differences in QueryInterface intact: these are all two-liners that exhibit different behaviour with respect to the inclusion of a DAO or not, and making them conditional on whether null is passed for the first dao parameter would likely make them more confusing and harder to test
  • Leave QueryGenerator.bulkInsert() separate from QueryGenerator.insert() in each dialect, because these are again quite different in behaviour and what they generate, but look for any more opportunities to factor out shared functionality
  • Unify QueryGenerator.bulkDelete and QueryGenerator.delete in each dialect by passing {limit: null} in options to say "I don't want to limit how many records are deleted" (sketched below).
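
For the last point, a minimal sketch of the proposed convention inside each dialect's delete query generator (illustrative, not final code):

// inside the unified delete query generator
if (options.limit === null) {
  // bulk case: the caller explicitly opted out of limiting
} else if (options.limit === undefined) {
  options.limit = 1 // single-DAO default for MySQL and Postgres
}
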
@optilude

Contributor

optilude commented Apr 28, 2013

I've made the changes suggested above in the commits referenced above.

I also realised there were no tests for deleteQuery() in sqlite, which I've now added. That uncovered an error (use of this.getWhereConditions() instead of MySqlQueryGenerator.getWhereConditions()) that I believe also exists in master, independent of this pull request.

@optilude

Contributor

optilude commented May 4, 2013

Any news on merging this one? I'd like to be able to close the branch (and I need this feature :-))

@optilude

Contributor

optilude commented May 6, 2013

I've just rebased and re-pushed this onto the latest Sequelize master. The tests are passing here. TravisCI is complaining about the postgres-native build, but seems to die before it even runs any tests.

@janmeier

spec/dao-factory.spec.js Outdated
expect(parseInt(+users[0].createdAt/5000)).toEqual(parseInt(+new Date()/5000))
expect(parseInt(+users[1].createdAt/5000)).toEqual(parseInt(+new Date()/5000))
expect(parseInt(+users[2].createdAt)).not.toEqual(parseInt(+users[0].createdAt/5000))

@janmeier

janmeier May 7, 2013

Member

Why no / 5000 for users[2]?

And shouldn't this be testing on updatedAt?

@optilude

optilude May 7, 2013

Contributor

Actually, this test is bunk. It relies on time precision that doesn't seem to work reliably. I'm going to simplify it.

@janmeier

spec/dao-factory.spec.js Outdated
expect(parseInt(+users[0].deletedAt/5000)).toEqual(parseInt(+new Date()/5000))
expect(parseInt(+users[1].deletedAt/5000)).toEqual(parseInt(+new Date()/5000))
expect(parseInt(+users[2].deletedAt)).not.toEqual(parseInt(+new Date()/5000))

@janmeier

janmeier May 7, 2013

Member

Again, shouldn't this be divided by 5000 as well?

expect(users[0].username).toEqual("Bill")
expect(users[1].username).toEqual("Bill")
expect(users[2].username).toEqual("Bob")

@janmeier

janmeier May 7, 2013

Member

I would rather test this with something like

users.forEach(function (user) {
    if (user.secretValue == '42') {
        expect(user.username).toEqual("Bill");
    } else {
        expect(user.username).toEqual("Bob");
    }
});

I know those assertions do exactly the same as yours, but here it is totally clear that if secretValue was 42 the username should have been updated, and if not it should remain Bob

@optilude

optilude May 7, 2013

Contributor

Nice idea. Done.

@janmeier

Member

janmeier commented May 7, 2013

Loving it!

@janmeier

Member

janmeier commented May 8, 2013

Awesome, if you rebase I'll merge it :)

@optilude

Contributor

optilude commented May 9, 2013

Ok, I've just rebased (and force-pushed, because at least then I know it runs locally). Please merge.

janmeier added a commit that referenced this pull request May 9, 2013

Merge pull request #569 from optilude/master
Support for bulk insert, update and delete operations at the DAOFactory level

@janmeier janmeier merged commit fedb9cd into sequelize:master May 9, 2013

1 check failed

default: The Travis CI build failed
@janmeier

Member

janmeier commented May 9, 2013

Really solid contributions @optilude, your PRs are some of the best-documented I have ever seen.
If you feel like it, feel free to join us in the IRC channel, #sequelizejs on Freenode

@optilude

Contributor

optilude commented May 10, 2013

Thanks @janmeier :) I know first-hand that it's much easier to evaluate a patch when you have some context for the thinking behind it.

I do hang out in the IRC channel from time to time, though often at odd times of the day.

@joaomaia

joaomaia commented May 10, 2013

This bulkCreate() thing is a great addition. I'm using it already.
Congratulations @optilude!

@calekennedy

calekennedy commented May 14, 2014

Is there a way to use Object.bulkCreate(data) to also set foreign-key associations? For instance, a user may have a car. Is there any way to bulk create all of those users with pointers to their cars?

@mickhansen

Contributor

mickhansen commented May 14, 2014

@calekennedy if User belongsTo Car you can just pass in car_id (or whatever the foreignKey is) with the rest of the values.
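
For example (a sketch; the attribute names are illustrative):

User.bulkCreate([
  { name: 'Ann', car_id: 1 },
  { name: 'Ben', car_id: 2 }
])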

@jmejiaz

jmejiaz commented Sep 22, 2016

Hi, I have an array of instances, and I change all the instances in the array. When I use:
ModelName.bulkUpdate(myArray);
it throws "bulkUpdate is not a function".
I have version 3.23.6. I'm doing it this way because I need all the updates in a transaction. Thanks in advance.

@felixfbecker

Contributor

felixfbecker commented Sep 22, 2016

@jmejiaz you can run them in a transaction with sequelize.transaction().
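
For example (a sketch against the 3.x promise API; myArray holds the already-changed instances described above):

sequelize.transaction(function(t) {
  // save each modified instance within the same managed transaction
  return Promise.all(myArray.map(function(instance) {
    return instance.save({ transaction: t })
  }))
}).then(function() {
  // all saves committed together, or rolled back together on error
})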

@rahman2k9

rahman2k9 commented Nov 2, 2016

models.Task.beforeBulkCreate(function(user) {
  user.forEach(function(u) {
    return models.User.findOne({
      where: {
        id: u.UserId
      }
    }).then(function(data) {
      if (!data)
        throw new models.Sequelize.ValidationErrorItem(u.UserId + " does not exist", "UserId", "UserId", u.UserId)
    })
  })
})

models.Task.bulkCreate([{
  title: "abdul rahman",
  UserId: 1121
}, {
  title: "abdul rahman",
  UserId: 2222222
}]).then(function() {
  res.redirect('/');
}).catch(function(e) {
  console.log(e);
});

In the code above, I want to insert bulk records, but before inserting I have to check whether each user exists. If a user does not exist, I want to throw a validation error saying so, so the caller can easily tell what went wrong.

Kindly suggest a way forward; I have lost two days searching and have tried async methods (series, forEach, waterfall and so on) without success.

@felixfbecker

Contributor

felixfbecker commented Nov 2, 2016

@rahman2k9 please ask general questions like this on StackOverflow or Slack.

@sequelize sequelize locked and limited conversation to collaborators Nov 2, 2016
