Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add insert_many to ActiveRecord models #35077

Merged
merged 1 commit into from Mar 5, 2019

Conversation

Projects
None yet
@boblail
Copy link
Contributor

boblail commented Jan 29, 2019

Options

  • [:returning]
    (Postgres-only) An array of attributes that should be returned for all successfully inserted records. For databases that support INSERT ... RETURNING, this will default to returning the primary keys of the successfully inserted records. Pass returning: %w[ id name ] to return the id and name of every successfully inserted record or pass returning: false to omit the clause.

  • [:unique_by]
    (Postgres and SQLite only) In a table with more than one unique constaint or index, new records may considered duplicates according to different criteria. For MySQL, an upsert will take place if a new record violates any unique constraint. For Postgres and SQLite, new rows will replace existing rows when the new row has the same primary key as the existing row. By defining :unique_by, you can supply a different key for matching new records to existing ones than the primary key.

    (For example, if you have a unique index on the ISBN column and use that as the :unique_by, a new record with the same ISBN as an existing record will replace the existing record but a new record with the same primary key as an existing record will raise ActiveRecord::RecordNotUnique.)

    Indexes can be identified by an array of columns:

    unique_by: { columns: %w[ isbn ] }

    Partial indexes can be identified by an array of columns and a :where condition:

    unique_by: { columns: %w[ isbn ], where: "published_on IS NOT NULL" }

Examples

# Insert multiple records, performing an upsert when records have duplicate ISBNs
# ('Eloquent Ruby' will overwrite 'Rework' because its ISBN is duplicate)
Book.upsert_all([
  { title: 'Rework', author: 'David', isbn: '1' },
  { title: 'Eloquent Ruby', author: 'Russ', isbn: '1' }
],
   unique_by: { columns: %w[ isbn ] })

@rails-bot rails-bot bot added the activerecord label Jan 29, 2019

@boblail boblail force-pushed the boblail:insert_many branch 2 times, most recently from acb4250 to 261dde9 Jan 29, 2019

@eileencodes
Copy link
Member

eileencodes left a comment

I've always wanted bulk inserts - not sure if there's a reason we haven't written this feature in the past.

I left some comments for changes I think this needs. Can you also run a benchmark for inserting multiple records with many insert calls vs insert_many?

Show resolved Hide resolved activerecord/lib/active_record/persistence.rb Outdated
Show resolved Hide resolved activerecord/lib/active_record/persistence.rb Outdated
Show resolved Hide resolved activerecord/lib/active_record/persistence.rb Outdated
Show resolved Hide resolved activerecord/test/cases/persistence_test.rb Outdated

@boblail boblail force-pushed the boblail:insert_many branch from 261dde9 to 5b9a2ea Feb 1, 2019

@dhh

This comment has been minimized.

Copy link
Member

dhh commented Feb 1, 2019

A few notes on the API:

  1. I don't think #insert_all should take anything but an array. If we want a raw insert method for a single record, it should have it's on method. Could just be #insert.
  2. I'd rather break out the on_conflict options into a few different things. First, we can do skip_duplicates: true for that part. Second, we can have an explicit method called #upsert/upsert_all to handle upserts. Still need an api for how to declare the index logic, but it'll be less nested.

@boblail boblail force-pushed the boblail:insert_many branch from 5b9a2ea to 12ca9e9 Feb 2, 2019

@boblail

This comment has been minimized.

Copy link
Contributor Author

boblail commented Feb 2, 2019

David, thanks so much for the code review! Great feedback!


@dhh and @eileencodes, I reworked this and pushed up a new commit. Here’s a summary of my changes:

  • renamed the method insert_all
  • clarified docs and behavior of the returning option
  • added more tests
  • split :on_conflict into :on_duplicate and :conflict_target (more below)

:on_duplicate and :conflict_target

New usage example
Book.insert_all books, on_duplicate: :raise  # default behavior — plain ol' INSERT
Book.insert_all books                        # same as above
Book.insert_all books, on_duplicate: :skip   # skips duplicates
Book.insert_all books, on_duplicate: :update # upsert

# conflict_target specifies an index to handle conflicts for. This is like saying:
#   - raise if we're inserting a duplicate primary key
#   - skip if we're inserting a duplicate ISBN
Book.insert_all books,
  on_duplicate: :skip,
  conflict_target: { columns: %w{ isbn }, where: "published_on IS NOT NULL" }
Notes
  • I really like splitting :on_conflict into :on_duplicate and :conflict_target for two reasons:
    1. It's easier to express the default strategy with a value, :raise
    2. It's easier to express what a database adapter supports (MySQL supports the :skip and :update strategies using different syntax; but it doesn't support specifying a :conflict_target)
  • I picked the name :conflict_target because it's the name Postgres's docs use.
  • :conflict_target is still a nested hash, but it feels more similar to the options you can pass to :index or :foreign_key in TableDefinition#column

Command Object

I'm totally open to extracting DatabaseStatements#insert_all to a Command Object. I tried two other refactors first (consolidating input validation and extracting prepare_keys_and_values_for_insert). Now all but the last 3 lines of the method have one responsibility, validating/normalizing input. I'm curious what your thoughts are about that!

Separate methods

There are three strategies supported here: vanilla-insert, insert-skip-duplicates, and upsert. I found, when trying to tease them apart, that they share 90% of the same concerns. Most of the heavy lifting is just constructing a bulk INSERT statement; but even :conflict_target is a concern shared by both the insert-skip-duplicates and upsert strategies. I think it makes sense to have a common insert_all method (at least on the connection). Would you like me to add three shorthand methods to Persistence? I'm thinking:

# See <tt>ActiveRecord::Persistence#insert_many</tt> for documentation.
def upsert(attributes, options = {})
  insert(attributes, options.merge(on_duplicate: :update))
end

# See <tt>ActiveRecord::Persistence#insert_many</tt> for documentation.
def upsert_all(attributes, options = {})
  insert_all(attributes, options.merge(on_duplicate: :update))
end

# See <tt>ActiveRecord::Persistence#insert_many</tt> for documentation.
def insert(attributes, options = {})
  insert_all([ attributes ], options)
end

👆 In particular, is that an OK way of documenting them?

@boblail

This comment has been minimized.

Copy link
Contributor Author

boblail commented Feb 2, 2019

Performance

Can you also run a benchmark for inserting multiple records with many insert calls vs insert_many?

I ran this test code:

def test_insert_performance
  books = 1_000.times.map { { name: "Rework" } }
  Benchmark.bmbm do |x|
    x.report("create") { books.each { |book| Book.create!(book) } }
    x.report("insert") { books.each { |book| Book.insert_all([book]) } }
    x.report("insert_all") { Book.insert_all(books) }
  end
end

and got these results:

mysql2
--------------------------------------------------------
                 user     system      total        real
create       0.591878   0.074504   0.666382 (  0.775443)
insert       0.157255   0.025772   0.183027 (  0.251659)
insert_all   0.014531   0.000114   0.014645 (  0.030236)

sqlite3
--------------------------------------------------------
                 user     system      total        real
create       0.619300   0.036571   0.655871 (  0.656279)
insert       0.174633   0.012864   0.187497 (  0.187530)
insert_all   0.018640   0.000192   0.018832 (  0.018902)

postgresql
--------------------------------------------------------
                 user     system      total        real
create       0.642238   0.076162   0.718400 (  0.941469)
insert       0.189128   0.026410   0.215538 (  0.376370)
insert_all   0.016598   0.000107   0.016705 (  0.033562)

I also ran it with 100 books and the ratios were about the same.

@dhh

This comment has been minimized.

Copy link
Member

dhh commented Feb 3, 2019

Bob, I think if you split out a command object, then you won’t have to route all the insert/upsert methods through the public insert_all method, and therefore don’t need to have the insert_all method carry all these options that we’re delegating to specific methods.

I’d also like to skip having the on_duplicates option entirely, actually. We can get there with upsert and bang methods. So insert!/insert_all! will raise on dupes, the non-bang methods won’t.

The conflict_target API isn’t quite my cup’o’tea either. Have you actually used this in anger in a real app? I’m even less enthused about it since it’s a pgsql specific set of operations. I’d prefer to move forward with the rest of this without a pgsql specific route. Then treat that as a concern to deal with afterwards.

@boblail

This comment has been minimized.

Copy link
Contributor Author

boblail commented Feb 4, 2019

David, I really appreciate your feedback!

👍 I'll refactor to extract a command object.

Conflict Target

I think I haven't quite gotten the API to express when :conflict_target is required...

In Postgres and SQLite (SQLite supports the same set of ops as Postgres), conflict target is optional in the expression ON CONFLICT DO NOTHING but required in the expression ON CONFLICT (*something*) DO UPDATE ....

In my PR, I default :conflict_target to the primary key (which maybe isn't the right default for ON CONFLICT DO NOTHING); but you'd need to set :conflict_target any time you've added a unique index other than the primary key. —and that was an important use case for me (I extracted this PR from a product). I used :conflict_target heavily in import logic

My scenario
Customers import frequently, not just once; and imports have to be fast and idempotent.

Bang Methods

🤔 I like the simplicity of the bang v. non-bang methods — but I have a couple of questions about what it would express:

  1. Would it be misleading if the bang in insert_all/insert_all! controls a different exception than the bang in save/save!, create/create!, and update_attributes/update_attributes!?
    • In the existing methods, the bang methods raise ActiveRecord::RecordInvalid, but neither insert_all nor insert_all! would raise ActiveRecord::RecordInvalid.
    • In the existing methods, even the non-bang methods raise ActiveRecord::RecordNotUnique, but insert_all wouldn't.
  2. Should the difference been vanilla-insert and insert-skip-duplicates be more obvious? save does the same thing as save! (and fails under the same circumstances), it just returns false instead of raising; but the proposed insert_all and insert_all! would actually do different things to the database. Should that be made more explicit?
@boblail

This comment has been minimized.

Copy link
Contributor Author

boblail commented Feb 4, 2019

Thinking about it just a little more ... would :index or :unique_index be a better name than :conflict_target?

And if we split insert_all from upsert_all, they would have simpler options apiece since insert_all wouldn't care about "conflict target" and upsert_all wouldn't need a strategy argument.

If you're still good with it, I think your original suggestion of skip_duplicates: true plays a little better than bang/no-bang versions of insert_all.

👆 I'll push something tomorrow.

@dhh

This comment has been minimized.

Copy link
Member

dhh commented Feb 4, 2019

Bang vs non-bang is just about setting expectations of what's going to happen. With AR, we've set the expectation that bang-methods will raise an exception. There's no guarantee on what kind will be raised. So fine raising a different kind of exception. So I'd like to stick with that.

That means insert/insert_all will just skip dupes, no errors. insert!/insert_all! will raise an appropriate exception on dupes.

So conflict target is only necessary for upsert, yeah? Trying to understand the feature in fully. You specify a conflict_target when you want upsert to raise an exception given a unique key violation?

@boblail

This comment has been minimized.

Copy link
Contributor Author

boblail commented Feb 4, 2019

So conflict target is only necessary for upsert, yeah? Trying to understand the feature in fully. You specify a conflict_target when you want upsert to raise an exception given a unique key violation?

👍

It's necessary for upsert, optional for skip-duplicates.

You specify conflict target to say when to do an upsert — i.e. CT answers "do an upsert when a new record is in conflict with which unique index?"


Example:

Given a table with two unique indexes (one on id, one on [author, title])

create_table :books, id: :integer, force: true do |t|
  t.column :title, :string
  t.column :author, :string
  t.index [:author, :title], unique: true
end

Without specifying a conflict target, INSERT...DO NOTHING will skip a record that violates any unique index.

# Given
Book.create! id: 1, title: "Rework", author: "David"

# violation of index on id, skipped
Book.insert_all [{ id: 1, title: "Refactoring", author: "Martin" }],
  on_duplicate: :skip

# violation of index on author+title, skipped
Book.insert_all [{ id: 2, title: "Rework", author: "David" }],
  on_duplicate: :skip

Comment: Skipping inserts when they violate any unique index seems like a sensible default for INSERT ... DO NOTHING

If you specify a conflict target, INSERT will skip records that violate only the specified unique index (and raise if your record violates a different index):

# Given
Book.create! id: 1, title: "Rework", author: "David"

# violation of index on author+title, skipped
Book.insert_all [{ id: 2, title: "Rework", author: "David" }],
  on_duplicate: :skip,
  conflict_target: %w{ author title }

# violation of index on id, raises ActiveRecord::RecordNotUnique
Book.insert_all [{ id: 1, title: "Refactoring", author: "Martin" }],
  on_duplicate: :skip,
  conflict_target: %w{ author title }

Comment: I have wanted to specify "conflict target" in a INSERT ... DO NOTHING when I was working on an app with UUID ids, which could be generated on the client, and I'd want want to raise-not-skip in the unlikely event the client passed a given UUID twice.

For upsert you must specify a conflict target, so you can only UPSERT on violations of one unique index (and it'll raise if your record violates a different index)

# Given
Book.create! id: 1, title: "Rework", author: "David"

# violation of index on id, Refactoring overwrites Rework
Book.insert_all [{ id: 1, title: "Refactoring", author: "Martin" }],
  on_duplicate: :update,
  conflict_target: { columns: %w{ id } }

# violation of index on author+title, raises ActiveRecord::RecordNotUnique
Book.insert_all [{ id: 2, title: "Refactoring", author: "Martin" }],
  on_duplicate: :update,
  conflict_target: { columns: %w{ id } }

Comment: I find that I almost always want to specify a conflict target when I do INSERT ... DO UPDATE (because I seldom include id in the list of attributes I'm inserting; and upsert only makes sense if I we've got some other column(s) acting as a unique identifier for a record).

@boblail boblail force-pushed the boblail:insert_many branch 2 times, most recently from c688fef to 58359e4 Feb 6, 2019

@boblail

This comment has been minimized.

Copy link
Contributor Author

boblail commented Feb 6, 2019

I just pushed a commit that:

  • Extracts a command object
  • Splits the API into six methods
    • insert! / insert_all!
    • insert / insert_all (skip duplicates)
    • upsert / upsert_all

@boblail boblail force-pushed the boblail:insert_many branch 3 times, most recently from 80eeca8 to 99d56f4 Feb 6, 2019

Show resolved Hide resolved ...rd/lib/active_record/connection_adapters/abstract/database_statements.rb Outdated
Show resolved Hide resolved activerecord/lib/active_record/insert_all.rb Outdated
Show resolved Hide resolved activerecord/lib/active_record/insert_all.rb Outdated
Show resolved Hide resolved activerecord/lib/active_record/insert_all.rb Outdated
Show resolved Hide resolved activerecord/lib/active_record/insert_all.rb Outdated
Show resolved Hide resolved activerecord/lib/active_record/persistence.rb Outdated
Show resolved Hide resolved activerecord/lib/active_record/persistence.rb Outdated
Show resolved Hide resolved activerecord/lib/active_record/persistence.rb Outdated
Show resolved Hide resolved activerecord/lib/active_record/persistence.rb Outdated
Show resolved Hide resolved activerecord/lib/active_record/persistence.rb Outdated
@boblail

This comment has been minimized.

Copy link
Contributor Author

boblail commented Feb 11, 2019

@dhh,

I pushed these changes (good call on all of them 👍) as separate commits (if that's easier to review):

  • stack assignments on one line
  • prefer %w[ over %w{
  • replace guard clause
  • inline delegates (except :connection — it's used 12 times)
  • move prepare_keys_and_values_for_insert vs. throwing away _keys
  • extract conflict_columns
  • remove unnecessary memoization
  • gather argument-checking to InsertAll
  • improve documentation
  • use keyword args for options
  • sort methods in Table-of-Contents order

"conflict target"

I'm on board with departing from the Postgres/SQLite docs on that awkward name 😄.

So this option is doing something a bit like the block you pass to #uniq_by or #index_by:

books.index_by(&:isbn)
books.uniq_by { |book| [ book.author_id, book.title ] }

How about taking inspiration from those method names? (unique_by or distinct_by?)

Book.insert_all([
  { id: 1, title: 'Rework', author: 'David' },
  { id: 1, title: 'Eloquent Ruby', author: 'Russ' }
], unique_by: %i[ author_id title ])

Alternately, with this key, we're telling the database how to identify existing records. Maybe the word identity is the key:

Book.insert_all([
  { id: 1, title: 'Rework', author: 'David' },
  { id: 1, title: 'Eloquent Ruby', author: 'Russ' }
], identity: %i[ author_id title ])

Or, in the docs — and in our discussion — we say this is useful if you have more than one unique index on a table, so maybe we double down on unique index:

Book.insert_all([
  { id: 1, title: 'Rework', author: 'David' },
  { id: 1, title: 'Eloquent Ruby', author: 'Russ' }
], unique_index: %i[ author_id title ])

What do you think of these options?

Query Builders

Do you mind sharing just a little more about what you're picturing here?

  • Two new objects? (A generic builder and a MySQL builder?)
  • Would #connection still respond to build_insert_all_sql (its implementation extracted) or would it have a factory method for returning the right builder (insert_all_builder?)

@boblail boblail force-pushed the boblail:insert_many branch from 99d56f4 to 31dfe0c Feb 11, 2019

@dhh

This comment has been minimized.

Copy link
Member

dhh commented Feb 11, 2019

Excellent, Bob.

I'm thinking something like ActiveRecord::InsertAll::SqlBuilder and ActiveRecord::InsertAll::SqlBuilder::MySQL < ActiveRecord::InsertAll::SqlBuilder, and a way to lookup which builder you need based on the connection.

I like unique_by for the parameter 👍

@boblail boblail force-pushed the boblail:insert_many branch from 31dfe0c to ec34815 Feb 12, 2019

@boblail

This comment has been minimized.

Copy link
Contributor Author

boblail commented Feb 12, 2019

Made both changes, @dhh!

I like how the SqlBuilders turned out — good call on that. Pulling out objects revealed opportunities I wasn't expecting to extract little methods which, in turn, gave names to more of what was going on.

Thanks for all the feedback!

@kaspth
Copy link
Member

kaspth left a comment

Some more thoughts from yet another reviewer 😄

Show resolved Hide resolved activerecord/lib/active_record/insert_all.rb Outdated
values_list = insert_all.inserts.map do |attributes|
attributes = attributes.stringify_keys

unless attributes.keys.to_set == keys.to_set

This comment has been minimized.

@kaspth

kaspth Feb 26, 2019

Member

Calling keys on self twice in this loop seems needlessly wasteful. Especially since it creates an array and then maps it to strings.

We could do:

columns = …
keys    = insert_all.keys

values_list =

Or have insert_all memoize its keys.

This comment has been minimized.

@kaspth

kaspth Feb 26, 2019

Member

We should also add a validation in InsertAll that the "master" keys are uniq, so we don't have to use keys.to_set here.

We might even want to push this entire validation into InsertAll ala:

@keys = validate_keys(inserts)

def validate_keys(inserts)
  inserts.first.keys.map(&:to_s).uniq.tap do |keys|
    if inserts.any? { |insert| (insert.keys - keys).any? } # Aims to abort early. Catches non-unique keys in the first insert too.
      raise ArgumentError, "All inserts must have the same keys" # Simplified exception message too.
    end
  end
end
Show resolved Hide resolved activerecord/lib/active_record/insert_all.rb Outdated
Show resolved Hide resolved activerecord/lib/active_record/insert_all.rb Outdated

@boblail boblail force-pushed the boblail:insert_many branch from 9cf14ce to 987b6ba Feb 27, 2019

@boblail

This comment has been minimized.

Copy link
Contributor Author

boblail commented Feb 27, 2019

👋 Thanks for the feedback! I've applied changes to most of the comments from @tenderlove, @kamipo, and @kaspth in fixup! commits so its easy to review and to compare the before and after on the style changes. Also, I asked a couple of followup questions.

@boblail boblail force-pushed the boblail:insert_many branch 2 times, most recently from 828fe98 to 4cefb3b Feb 28, 2019

@boblail

This comment has been minimized.

Copy link
Contributor Author

boblail commented Feb 28, 2019

Pushed three more fixup! commits.

Let me know when you want me to squash them all 😄

Add insert_all to ActiveRecord models
Adds a method to ActiveRecord allowing records to be inserted in bulk without instantiating ActiveRecord models. This method supports options for handling uniqueness violations by skipping duplicate records or overwriting them in an UPSERT operation.

ActiveRecord already supports bulk-update and bulk-destroy actions that execute SQL UPDATE and DELETE commands directly. It also supports bulk-read actions through `pluck`. It makes sense for it also to support bulk-creation.

@boblail boblail force-pushed the boblail:insert_many branch from 4cefb3b to ae94122 Mar 5, 2019

@boblail

This comment has been minimized.

Copy link
Contributor Author

boblail commented Mar 5, 2019

Rebased, squashed, build passing — ready to merge? 😁 /cc @dhh, @eileencodes

(You can still see the fixup! commits here if you want to: https://github.com/rails/rails/commits/828fe98bf0b784d131eb8a151059ef3225db810b)

@eileencodes
Copy link
Member

eileencodes left a comment

Nice job!

@dhh dhh merged commit 91ed21b into rails:master Mar 5, 2019

3 checks passed

buildkite/rails Build #59234 passed (16 minutes, 40 seconds)
Details
codeclimate All good!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@dhh

This comment has been minimized.

Copy link
Member

dhh commented Mar 5, 2019

Great job, @boblail! You made it through quite the gauntlet 🌟

@boblail

This comment has been minimized.

Copy link
Contributor Author

boblail commented Mar 5, 2019

Thanks for your feedback, everyone — David especially — code review is time-consuming, and I appreciate your time. I'm very super-happy to contribute something I've been using back to Rails. 😄

@vishaltelangre

This comment has been minimized.

Copy link
Contributor

vishaltelangre commented Mar 7, 2019

@boblail I am trying to use #upsert_all as follows.

create_table :articles do |t|
      t.string :title, null: false
      t.string :slug, null: false
      t.string :author, null: false

      t.index :slug, unique: true
    end
Article.upsert_all(
  [
    {title: "Article 1", slug: "article", author: "Jane"},
    {title: "Article 2", slug: "article", author: "John"}
  ], 
  unique_by: { columns: %w[ slug ] }
)

As per this PR, it should replace Article 1 with Article 2 since its slug attribute is duplicate.

But I am getting the following error.

  Bulk Insert (4.9ms)  INSERT INTO "articles"("title","slug","author") VALUES ('Article 1', 'article', 'Jane'), ('Article 2', 'article', 'John') ON CONFLICT ("slug") DO UPDATE SET "title"=excluded."title","author"=excluded."author" RETURNING "id"
Traceback (most recent call last):
        1: from (irb):1
ActiveRecord::StatementInvalid (PG::CardinalityViolation: ERROR:  ON CONFLICT DO UPDATE command cannot affect row a second time)
HINT:  Ensure that no rows proposed for insertion within the same command have duplicate constrained values.

I am using PostgreSQL 10.5. I am not sure why this is not working as per the docs described in this PR. Please let me know if you need some more information from my end about this issue.

P.S. Both #insert_all and #insert_all! are working as intended.

Thanks for the great work!

@eileencodes

This comment has been minimized.

Copy link
Member

eileencodes commented Mar 7, 2019

@vishaltelangre thanks for the report. Can you open a new issue for easier tracking with a script or repo that reproduces the issue?

We can't add closed PRs to our open milestones and this could easily get missed before Rails 6 is released.

@vishaltelangre

This comment has been minimized.

Copy link
Contributor

vishaltelangre commented Mar 7, 2019

Sure, @eileencodes. I will report a new issue.

@palkan

This comment has been minimized.

Copy link
Contributor

palkan commented Mar 11, 2019

Hi there!
Great feature @boblail 👍

I have a couple of suggestions (and I'd like to discuss them before proposing a PR; looks like this PR is a good place for that).

  1. What about adding an ability to provide raw SQL for update part of the upsert query?
    We might need this if we want to take into account the previous value of a column.

For example:

Article.insert_all(
 [
    {title: "Article 1", slug: "article-1", published: false},
    {title: "Article 2", slug: "article-2", published: false}
  ],
  # we don't want to change the published status of the already published article
  update_duplicates_with: Arel.sql("published = articles.published OR EXCLUDED.published")
)
  1. It would be great to have an ability to specify returning as a raw SQL string as well.
    For example:
Article.insert_all(
 [
    {title: "Article 1", slug: "article-1", published: false},
    {title: "Article 2", slug: "article-2", published: false}
  ],
  # Some PostgreSQL magic here to detect which rows have been actually inserted
  returning: Arel.sql("id, (xmax = '0') as inserted")
)

I'm currently running Rails 6 (beta 2), and these two things will block me from using a built-in upsert_all 😕

@boblail

This comment has been minimized.

Copy link
Contributor Author

boblail commented Mar 12, 2019

Thanks, @palkan!

I think both of those ideas have value! Maybe make them as separate PRs?

Passing a Arel::Nodes::SqlLiteral to returning: seems pretty straightforward since it wouldn't change the API of the methods.

update_duplicates_with: — that parameter name seems a little wordy 🤔 Trying out a few alternatives:

Article.upsert_all(articles, with: Arel.sql("published = articles.published OR EXCLUDED.published")
Article.upsert_all(articles, update: Arel.sql("published = articles.published OR EXCLUDED.published")
Article.upsert_all(articles, on_update: Arel.sql("published = articles.published OR EXCLUDED.published")
@palkan

This comment has been minimized.

Copy link
Contributor

palkan commented Mar 12, 2019

Maybe make them as separate PRs?

Yep, will definitely do)

update_duplicates_with: — that parameter name seems a little wordy 🤔 Trying out a few alternatives:

Yeah, I've also tried update_as and update_with–both doesn't seem clear to me.

Hm, maybe just update_sql?

@kaspth

This comment has been minimized.

Copy link
Member

kaspth commented Mar 12, 2019

Article.insert_all(
 [
    {title: "Article 1", slug: "article-1", published: false},
    {title: "Article 2", slug: "article-2", published: false}
  ],
  # we don't want to change the published status of the already published article
  update_duplicates_with: Arel.sql("published = articles.published OR EXCLUDED.published")
)

Isn't there a gotcha here? Wouldn't you need to manually write out the remaining UPDATE SET for the other attributes for the upsert to work?

@palkan

This comment has been minimized.

Copy link
Contributor

palkan commented Mar 12, 2019

Wouldn't you need to manually write out the remaining UPDATE SET for the other attributes for the upsert to work?

Yep, you need to specify all the attributes you want to be upserted.
In my case I don't need other attributes to be updated on conflict.

That's useful for simple, intermediate, tables.

My use case (from the app I'm working on right now) is the following: we have a "Mass Invite" functionality with Invitation(user_id,event_id,rsvp:bool,disposalbe:bool) model, and when performing this action we don't want to check that a user has already been invited but want to update invitation properties.

With the current solution we have the following query:

Invitation.pg_batch_insert(
  columns,
  values,
  on_conflict:
    "(user_id, event_id) DO UPDATE " \
    "SET disposable = (events.disposable AND EXCLUDED.disposable), " \
    "rsvp = (events.rsvp OR EXCLUDED.rsvp)",
  returning: "user_id, (xmax = '0') as inserted"
)
@boblail

This comment has been minimized.

Copy link
Contributor Author

boblail commented Mar 14, 2019

Yeah, UPSERT could also be used to tally records like:

create_table :exception_reports, force: true do |t|
  t.references :project
  t.string  :fingerprint
  t.integer :count, default: 1
  # ...

  t.index [:project_id, :fingerprint], unique: true
end
ExceptionReport.upsert(new_report, update: "count = count + 1")

By giving count a default value of 1, a new report is "seen" once. From there, we increment count on upsert every time an exception is seen again.

@brandoncc

This comment has been minimized.

Copy link
Contributor

brandoncc commented Mar 17, 2019

I'm really excited to see this be built into Rails, nice job @boblail! I currently use https://github.com/zdennis/activerecord-import in a couple of projects and have needed to provide raw sql for the update logic. I second @palkan's idea for that ability.

@simi

This comment has been minimized.

Copy link
Contributor

simi commented Mar 17, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.