Add insert_many to ActiveRecord models #35077
I've always wanted bulk inserts - not sure if there's a reason we haven't written this feature in the past.
I left some comments for changes I think this needs. Can you also run a benchmark for inserting multiple records with many insert calls vs insert_many?
A few notes on the API:
activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb
David, thanks so much for the code review! Great feedback! @dhh and @eileencodes, I reworked this and pushed up a new commit. Here’s a summary of my changes:
:on_duplicate and :conflict_target
New usage example
Book.insert_all books, on_duplicate: :raise # default behavior: plain ol' INSERT
Book.insert_all books # same as above
Book.insert_all books, on_duplicate: :skip # skips duplicates
Book.insert_all books, on_duplicate: :update # upsert
# conflict_target specifies an index to handle conflicts for. This is like saying:
# - raise if we're inserting a duplicate primary key
# - skip if we're inserting a duplicate ISBN
Book.insert_all books,
  on_duplicate: :skip,
  conflict_target: { columns: %w{ isbn }, where: "published_on IS NOT NULL" }
Notes
Command Object
I'm totally open to extracting
Separate methods
There are three strategies supported here: vanilla-insert, insert-skip-duplicates, and upsert. I found, when trying to tease them apart, that they share 90% of the same concerns. Most of the heavy lifting is just constructing a bulk INSERT statement; but even
# See <tt>ActiveRecord::Persistence#insert_many</tt> for documentation.
def upsert(attributes, options = {})
  insert(attributes, options.merge(on_duplicate: :update))
end

# See <tt>ActiveRecord::Persistence#insert_many</tt> for documentation.
def upsert_all(attributes, options = {})
  insert_all(attributes, options.merge(on_duplicate: :update))
end

# See <tt>ActiveRecord::Persistence#insert_many</tt> for documentation.
def insert(attributes, options = {})
  insert_all([ attributes ], options)
end
👆 In particular, is that an OK way of documenting them?
Performance
I ran this test code:
def test_insert_performance
  books = 1_000.times.map { { name: "Rework" } }
  Benchmark.bmbm do |x|
    x.report("create") { books.each { |book| Book.create!(book) } }
    x.report("insert") { books.each { |book| Book.insert_all([book]) } }
    x.report("insert_all") { Book.insert_all(books) }
  end
end
and got these results:
I also ran it with 100 books and the ratios were about the same. |
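To make the difference between the second and third benchmark cases concrete: the `insert_all(books)` case sends one multi-row statement, while the loop sends one statement per row. The helper below is a minimal sketch of that statement shape in plain Ruby. `build_bulk_insert` is a hypothetical name and is not the code in this PR; it only mirrors the `VALUES ($1, ...), ($2, ...)` form that shows up in the generated SQL quoted later in the thread.

```ruby
# Hypothetical helper (not part of Rails): builds a single multi-row INSERT
# with numbered bind placeholders, plus the flat list of bind values.
def build_bulk_insert(table, rows)
  raise ArgumentError, "rows must not be empty" if rows.empty?

  columns = rows.first.keys
  unless rows.all? { |row| row.keys == columns }
    raise ArgumentError, "all rows must have the same keys in the same order"
  end

  placeholder = 0
  # One "(...)" tuple per row, numbering placeholders across the whole statement.
  tuples = rows.map do
    "(" + columns.map { "$#{placeholder += 1}" }.join(", ") + ")"
  end

  quoted_columns = columns.map { |column| %("#{column}") }.join(",")
  sql = %(INSERT INTO "#{table}"(#{quoted_columns}) VALUES #{tuples.join(", ")})
  [sql, rows.flat_map(&:values)]
end

sql, binds = build_bulk_insert("books", [{ name: "Rework" }, { name: "Eloquent Ruby" }])
puts sql
puts binds.inspect
```

With 1,000 rows this still produces one statement and one round trip, which is the property the benchmark is probing.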
Bob, I think if you split out a command object, then you won’t have to route all the insert/upsert methods through the public insert_all method, and therefore don’t need to have the insert_all method carry all these options that we’re delegating to specific methods.
I’d also like to skip having the on_duplicates option entirely, actually. We can get there with upsert and bang methods. So insert!/insert_all! will raise on dupes, the non-bang methods won’t.
The conflict_target API isn’t quite my cup’o’tea either. Have you actually used this in anger in a real app? I’m even less enthused about it since it’s a pgsql specific set of operations. I’d prefer to move forward with the rest of this without a pgsql specific route. Then treat that as a concern to deal with afterwards.
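One way to read the command-object suggestion, as a rough sketch only: each public method picks its duplicate-handling strategy and hands the work to a small object, so no public method has to accept an `on_duplicate` option. `InsertAllCommand` and `BulkWrites` are hypothetical names chosen for illustration, and `execute` only describes what it would do instead of building SQL.

```ruby
# Hypothetical command object: owns the rows and the duplicate strategy.
class InsertAllCommand
  STRATEGIES = %i[raise skip update].freeze

  attr_reader :rows, :on_duplicate

  def initialize(rows, on_duplicate:)
    unless STRATEGIES.include?(on_duplicate)
      raise ArgumentError, "unknown strategy #{on_duplicate.inspect}"
    end
    @rows = rows
    @on_duplicate = on_duplicate
  end

  # A real command would build and run SQL; this sketch just reports its plan.
  def execute
    "#{on_duplicate} #{rows.size} row(s)"
  end
end

# Hypothetical public API: each method fixes the strategy itself.
module BulkWrites
  def insert_all(rows)
    InsertAllCommand.new(rows, on_duplicate: :skip).execute
  end

  def insert_all!(rows)
    InsertAllCommand.new(rows, on_duplicate: :raise).execute
  end

  def upsert_all(rows)
    InsertAllCommand.new(rows, on_duplicate: :update).execute
  end

  def insert(row)
    insert_all([row])
  end
end

class Book
  extend BulkWrites
end

puts Book.insert({ title: "Rework" })
puts Book.upsert_all([{ title: "Rework" }, { title: "Eloquent Ruby" }])
```

The strategy stays an internal detail of the command, which is what lets the bang and non-bang methods replace the option.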
David, I really appreciate your feedback! 👍 I'll refactor to extract a command object.
Conflict Target
I think I haven't quite gotten the API to express when In Postgres and SQLite (SQLite supports the same set of ops as Postgres), conflict target is optional in the expression In my PR, I default
Bang Methods
🤔 I like the simplicity of the bang v. non-bang methods, but I have a couple of questions about what it would express:
Thinking about it just a little more ... would And if we split If you're still good with it, I think your original suggestion of 👆 I'll push something tomorrow. |
Bang vs non-bang is just about setting expectations of what's going to happen. With AR, we've set the expectation that bang-methods will raise an exception. There's no guarantee on what kind will be raised. So I'm fine with raising a different kind of exception, and I'd like to stick with that. That means insert/insert_all will just skip dupes, no errors. insert!/insert_all! will raise an appropriate exception on dupes.
So conflict target is only necessary for upsert, yeah? Trying to understand the feature fully. You specify a conflict_target when you want upsert to raise an exception given a unique key violation?
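The bang versus non-bang contract described here can be shown with a toy in-memory table. This is a sketch under stated assumptions, not Active Record: `ToyTable` and `DuplicateRow` are made-up names, there is exactly one unique index, and the bang variant is modeled as all-or-nothing the way a single failing SQL statement would be.

```ruby
# Toy in-memory stand-in for a table with one unique index. Not Active Record.
class DuplicateRow < StandardError; end

class ToyTable
  attr_reader :rows

  def initialize(unique_on:)
    @unique_on = unique_on
    @rows = []
  end

  # Non-bang: rows that collide with an existing row are skipped quietly.
  def insert_all(new_rows)
    new_rows.each { |row| @rows << row unless duplicate?(@rows, row) }
    self
  end

  # Bang: any collision raises, and nothing from the batch is kept.
  def insert_all!(new_rows)
    staged = @rows.dup
    new_rows.each do |row|
      raise DuplicateRow, "duplicate #{key_for(row).inspect}" if duplicate?(staged, row)
      staged << row
    end
    @rows = staged
    self
  end

  private

  def key_for(row)
    row.values_at(*@unique_on)
  end

  def duplicate?(rows, row)
    rows.any? { |existing| key_for(existing) == key_for(row) }
  end
end

books = ToyTable.new(unique_on: [:isbn])
books.insert_all([{ isbn: "1", title: "Rework" }, { isbn: "1", title: "Rework again" }])
puts books.rows.size

begin
  books.insert_all!([{ isbn: "2", title: "Eloquent Ruby" }, { isbn: "1", title: "Dupe" }])
rescue DuplicateRow => e
  puts "raised: #{e.message}"
end
```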
👍 It's necessary for upsert, optional for skip-duplicates. You specify conflict target to say when to do an upsert, i.e. CT answers "do an upsert when a new record is in conflict with which unique index?"
Example: Given a table with two unique indexes (one on
create_table :books, id: :integer, force: true do |t|
  t.column :title, :string
  t.column :author, :string
  t.index [:author, :title], unique: true
end
Without specifying a conflict target,
# Given
Book.create! id: 1, title: "Rework", author: "David"
# violation of index on id, skipped
Book.insert_all [{ id: 1, title: "Refactoring", author: "Martin" }],
  on_duplicate: :skip
# violation of index on author+title, skipped
Book.insert_all [{ id: 2, title: "Rework", author: "David" }],
  on_duplicate: :skip
If you specify a conflict target, INSERT will skip records that violate only the specified unique index (and raise if your record violates a different index):
# Given
Book.create! id: 1, title: "Rework", author: "David"
# violation of index on author+title, skipped
Book.insert_all [{ id: 2, title: "Rework", author: "David" }],
  on_duplicate: :skip,
  conflict_target: %w{ author title }
# violation of index on id, raises ActiveRecord::RecordNotUnique
Book.insert_all [{ id: 1, title: "Refactoring", author: "Martin" }],
  on_duplicate: :skip,
  conflict_target: %w{ author title }
For upsert you must specify a conflict target, so you can only UPSERT on violations of one unique index (and it'll raise if your record violates a different index):
# Given
Book.create! id: 1, title: "Rework", author: "David"
# violation of index on id, Refactoring overwrites Rework
Book.insert_all [{ id: 1, title: "Refactoring", author: "Martin" }],
  on_duplicate: :update,
  conflict_target: { columns: %w{ id } }
# violation of index on author+title, raises ActiveRecord::RecordNotUnique
Book.insert_all [{ id: 2, title: "Refactoring", author: "Martin" }],
  on_duplicate: :update,
  conflict_target: { columns: %w{ id } }
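The skip-duplicates rules walked through above can be condensed into a small in-memory model. This is a sketch assuming the semantics as described in this comment (no target: skip on any unique violation; with a target: skip only when that index is violated, otherwise raise). `insert_skipping` and `NotUnique` are hypothetical names, and no database is involved.

```ruby
class NotUnique < StandardError; end

# Toy model of skip-duplicates with an optional conflict target.
# `table` is an array of row hashes; `indexes` lists the unique indexes,
# each as an array of column names. All names here are hypothetical.
def insert_skipping(table, indexes, row, conflict_target: nil)
  violated = indexes.select do |columns|
    table.any? { |existing| existing.values_at(*columns) == row.values_at(*columns) }
  end

  if violated.empty?
    table << row
    return :inserted
  end

  # No target: any unique violation is skipped.
  # With a target: only a violation of that index is skipped.
  return :skipped if conflict_target.nil? || violated.include?(conflict_target)

  raise NotUnique, "violates unique index on #{violated.first.join('+')}"
end

indexes = [[:id], [:author, :title]]
table = [{ id: 1, title: "Rework", author: "David" }]

# Violates author+title, which is the target, so it is skipped.
p insert_skipping(table, indexes, { id: 2, title: "Rework", author: "David" },
                  conflict_target: [:author, :title])

# Violates id, which is not the target, so it raises.
begin
  insert_skipping(table, indexes, { id: 1, title: "Refactoring", author: "Martin" },
                  conflict_target: [:author, :title])
rescue NotUnique => e
  puts "raised: #{e.message}"
end
```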
I just pushed a commit that:
activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb
@dhh, I pushed these changes (good call on all of them :+1:) as separate commits (if that's easier to review):
"conflict target"
I'm on board with departing from the Postgres/SQLite docs on that awkward name 😄. So this option is doing something a bit like the block you pass to
books.index_by(&:isbn)
books.uniq_by { |book| [ book.author_id, book.title ] }
How about taking inspiration from those method names? (unique_by or distinct_by?)
Book.insert_all([
  { id: 1, title: 'Rework', author: 'David' },
  { id: 1, title: 'Eloquent Ruby', author: 'Russ' }
], unique_by: %i[ author_id title ])
Alternately, with this key, we're telling the database how to identify existing records. Maybe the word identity is the key:
Book.insert_all([
  { id: 1, title: 'Rework', author: 'David' },
  { id: 1, title: 'Eloquent Ruby', author: 'Russ' }
], identity: %i[ author_id title ])
Or, in the docs (and in our discussion) we say this is useful if you have more than one unique index on a table, so maybe we double down on unique index:
Book.insert_all([
  { id: 1, title: 'Rework', author: 'David' },
  { id: 1, title: 'Eloquent Ruby', author: 'Russ' }
], unique_index: %i[ author_id title ])
What do you think of these options?
Query Builders
Do you mind sharing just a little more about what you're picturing here?
Excellent, Bob. I'm thinking something like I like |
✅ Made both changes, @dhh! I like how the SqlBuilders turned out — good call on that. Pulling out objects revealed opportunities I wasn't expecting to extract little methods which, in turn, gave names to more of what was going on. Thanks for all the feedback! |
activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb
@brandoncc feel free to review and test #35631 then.
On Sun, Mar 17, 2019 at 5:12, Brandon Conway <notifications@github.com> wrote:
… I'm really excited to see this be built into Rails, nice job @boblail! I currently use https://github.com/zdennis/activerecord-import in a couple of projects and have needed to provide raw sql for the update logic. I second @palkan's idea for that ability.
…#upsert etc. methods
In rails#35077, `#insert_all` / `#upsert_all` / `#insert` / `#upsert` etc. methods were added. But Active Record logs only “Bulk Insert” log messages when they are invoked. This commit improves the log messages to use the correct words for how they were invoked.
Just start porting the existing code to Found one interesting case: the order of the columns in
Should we also be order-independent then? |
# { title: 'Eloquent Ruby', author: 'Russ' }
# ])
#
# # raises ActiveRecord::RecordNotUnique beacuse 'Eloquent Ruby'
beacuse
@vitalyp : It is fixed on master branch -> https://github.com/rails/rails/blob/master/activerecord/lib/active_record/persistence.rb#L171
#
# See <tt>ActiveRecord::Persistence#insert_all</tt> for documentation.
def insert(attributes, returning: nil, unique_by: nil)
  insert_all([ attributes ], returning: returning, unique_by: unique_by)
It would be super-cool if create/save could also take a returning argument so that I could tell Rails the db is going to calculate some additional values it should set on the instance. Does that seem like a reasonable feature request? I could work on it. I wouldn't normally bring it up this way, but it seems like this PR paves the way for something like this.
I'm curious how you all get around autogenerated fields that are populated from Ruby and not inside the database. If I try to run this:
And I have nothing in the database then I get 2 books with null |
@schneems if I remember well we have considered this more advanced API, similar to You can add |
Thanks. I was already sending over Thanks for the reply, I'll move forward with the duplicate timestamps |
I had the thought that I could add a default value to change_column_default :issues, :created_at, from: nil, to: 'NOW()' and change_column :issues, :created_at, :datetime, :default => "NOW()" |
Here's the query one of my upsert_all calls generates: INSERT INTO "issues"("repo_id","title","url","state","html_url","number","pr_attached","last_touched_at","updated_at","created_at") VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10), ($11, $12, $13, $14, $15, $16, $17, $18, $19, $20), ($21, $22, $23, $24, $25, $26, $27, $28, $29, $30), ($31, $32, $33, $34, $35, $36, $37, $38, $39, $40), ($41, $42, $43, $44, $45, $46, $47, $48, $49, $50), ($51, $52, $53, $54, $55, $56, $57, $58, $59, $60), ($61, $62, $63, $64, $65, $66, $67, $68, $69, $70), ($71, $72, $73, $74, $75, $76, $77, $78, $79, $80), ($81, $82, $83, $84, $85, $86, $87, $88, $89, $90), ($91, $92, $93, $94, $95, $96, $97, $98, $99, $100), ($101, $102, $103, $104, $105, $106, $107, $108, $109, $110), ($111, $112, $113, $114, $115, $116, $117, $118, $119, $120), ($121, $122, $123, $124, $125, $126, $127, $128, $129, $130), ($131, $132, $133, $134, $135, $136, $137, $138, $139, $140), ($141, $142, $143, $144, $145, $146, $147, $148, $149, $150), ($151, $152, $153, $154, $155, $156, $157, $158, $159, $160), ($161, $162, $163, $164, $165, $166, $167, $168, $169, $170), ($171, $172, $173, $174, $175, $176, $177, $178, $179, $180), ($181, $182, $183, $184, $185, $186, $187, $188, $189, $190), ($191, $192, $193, $194, $195, $196, $197, $198, $199, $200), ($201, $202, $203, $204, $205, $206, $207, $208, $209, $210), ($211, $212, $213, $214, $215, $216, $217, $218, $219, $220), ($221, $222, $223, $224, $225, $226, $227, $228, $229, $230), ($231, $232, $233, $234, $235, $236, $237, $238, $239, $240), ($241, $242, $243, $244, $245, $246, $247, $248, $249, $250), ($251, $252, $253, $254, $255, $256, $257, $258, $259, $260), ($261, $262, $263, $264, $265, $266, $267, $268, $269, $270), ($271, $272, $273, $274, $275, $276, $277, $278, $279, $280), ($281, $282, $283, $284, $285, $286, $287, $288, $289, $290), ($291, $292, $293, $294, $295, $296, $297, $298, $299, $300) ON CONFLICT ("number","repo_id") DO UPDATE SET 
"title"=excluded."title","url"=excluded."url","state"=excluded."state","html_url"=excluded."html_url","pr_attached"=excluded."pr_attached","last_touched_at"=excluded."last_touched_at","updated_at"=excluded."updated_at","created_at"=excluded."created_at" RETURNING "id" I think that what I would ideally like is to be able to say something like: Issue.upsert_all(upsert_mega_array, unique_by: [:number, :repo_id], on_conflict_skip: [:created_at]) So that way I could specify that my new records should get the created_at value, but existing records should keep their values. |
@schneems I was trying to build something to cover this kind of common problem, but I wasn't successful. I will be more than happy to work more on this feature, but I think we were missing a decision on where to take this. I can try to revive my work on relations and also |
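For readers following the `created_at` problem: the `on_conflict_skip` option floated above is a proposal in this thread, not an existing Rails option. As a minimal sketch of what it would have to change, the hypothetical helper below builds only the `ON CONFLICT ... DO UPDATE SET` tail of the statement quoted earlier, leaving out the conflict columns and any columns the caller wants existing rows to keep. The name `on_conflict_update_clause` and its `skip:` keyword are assumptions made for illustration.

```ruby
# Hypothetical helper: builds the ON CONFLICT tail of an upsert statement.
# Columns in `unique_by` identify the row, so they are never reassigned;
# columns in `skip` keep whatever value the existing row already has.
def on_conflict_update_clause(columns, unique_by:, skip: [])
  left_alone = (unique_by + skip).map(&:to_s)
  updatable = columns.map(&:to_s) - left_alone
  target = unique_by.map { |column| %("#{column}") }.join(",")

  # Nothing left to update means the conflict can simply be ignored.
  return %(ON CONFLICT (#{target}) DO NOTHING) if updatable.empty?

  assignments = updatable.map { |column| %("#{column}"=excluded."#{column}") }.join(",")
  %(ON CONFLICT (#{target}) DO UPDATE SET #{assignments})
end

columns = %w[repo_id title number updated_at created_at]
puts on_conflict_update_clause(columns, unique_by: [:number, :repo_id], skip: [:created_at])
```

New rows would still receive `created_at` from the `VALUES` list; only the update branch omits it.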
@schneems, what if you try wrapping the default value in a lambda? change_column_default :issues, :created_at, from: nil, to: ->() { 'NOW()' } (that's always worked for me) |
It's great that |
@schneems and the future readers who want to prevent updating the
class MyRecord < ActiveRecord::Base
  attr_readonly :created_at
end
Options
[:returning]
(Postgres-only) An array of attributes that should be returned for all successfully inserted records. For databases that support INSERT ... RETURNING, this will default to returning the primary keys of the successfully inserted records. Pass returning: %w[ id name ] to return the id and name of every successfully inserted record or pass returning: false to omit the clause.
[:unique_by]
(Postgres and SQLite only) In a table with more than one unique constraint or index, new records may be considered duplicates according to different criteria. For MySQL, an upsert will take place if a new record violates any unique constraint. For Postgres and SQLite, new rows will replace existing rows when the new row has the same primary key as the existing row. By defining :unique_by, you can supply a different key for matching new records to existing ones than the primary key.
(For example, if you have a unique index on the ISBN column and use that as the :unique_by, a new record with the same ISBN as an existing record will replace the existing record but a new record with the same primary key as an existing record will raise ActiveRecord::RecordNotUnique.)
Indexes can be identified by an array of columns:
Partial indexes can be identified by an array of columns and a :where condition:
Examples