Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance AR#insert_all/upsert_all options #35636

Open
wants to merge 2 commits into
base: master
from

Conversation

Projects
None yet
4 participants
@palkan
Copy link
Contributor

commented Mar 15, 2019

Summary

Follow-up for #35077 (comment)

  • Added ability to use custom SQL in returning: option:
Article.insert_all(
 [
    {title: "Article 1", slug: "article-1", published: false},
    {title: "Article 2", slug: "article-2", published: false}
  ],
  # Some PostgreSQL magic here to detect which rows have been actually inserted
  returning: "id, (xmax = '0') as inserted, name as new_name"
)
  • Added new update_sql: option to specify SQL fragment to use when updating rows on conflict:
Book.upsert_all(
  [{ id: 1, status: 1 }, { id: 2, status: 1 }],
  update_sql: "status = GREATEST(books.status, EXCLUDED.status)"
)

Another example by
@boblail:

ExceptionReport.upsert(new_report, update_sql: "count = count + 1")

/cc @boblail @kaspth

@rails-bot rails-bot bot added the activerecord label Mar 15, 2019

@simi

This comment has been minimized.

Copy link
Contributor

commented Mar 16, 2019

Maybe we can unify API for update (also related to #35631).

What about:

# in my pull request
Post.upsert(row1, unique_by: { columns: [:slug]}, update: { columns: [:post, :updated_at]})
# in this pull request
Post.upsert(row1, unique_by: { columns: [:slug]}, update: { sql: "status = GREATEST(books.status, EXCLUDED.status)"})
@palkan

This comment has been minimized.

Copy link
Contributor Author

commented Mar 16, 2019

@simi Haven’t noticed your PR, looks good)

Having both update_columns and update_sql looks better to me than having nested options (from both API and implementation point of view).

Another option is to use a general update option and choose a strategy depending on the data type (array - columns, string - sql).

That’s seems to be more like an ActiveRecord way (‘cause we already have such kind of automatic inference in the API).

# inserted records. For databases that support <tt>INSERT ... RETURNING</tt>, this will default
# to returning the primary keys of the successfully inserted records. Pass
# <tt>returning: %w[ id name ]</tt> to return the id and name of every successfully inserted
# record or pass <tt>returning: false</tt> to omit the clause.
#
# You can also pass a string if you need more control on the return values
# (for example, <tt>returning: "id, name as new_name"</tt>).

This comment has been minimized.

Copy link
@boblail

boblail Mar 16, 2019

Contributor

Should we say <tt>returning: Arel.sql("id, name as new_name")</tt>?

This comment has been minimized.

Copy link
@boblail

boblail Mar 16, 2019

Contributor

🤔 Maybe it doesn't matter — or maybe we should raise if we're given a String that isn't explicitly declared "safe" like this.

This comment has been minimized.

Copy link
@simi

simi Mar 16, 2019

Contributor

I think it should behave the same as another methods allowing sql string. Post.joins('INNER JOIN posts p on 1 = 1') works on latest master.

Forcing Arel.sql to whitelist sql fragements in various methods should be done in consistent way all across whole ActiveRecord API IMHO.

This comment has been minimized.

Copy link
@palkan

palkan Mar 16, 2019

Author Contributor

I was thinking about whitelisting (Arel.sql) initially but while writing the implementation decided that it's unnecessary: as @simi mentioned, we have other methods accepting SQL strings, so, we do not force users to explicit wrapping everywhere; as far as I understand, the reason why we force Arel.sql in .order is that there is a high chance that it's used with user input (e.g. .order("name #{params[:sort_order]}")) and we want developers to be explicit in such dangerous cases.
On the other hand, both returning: and update_sql: are unlikely to be used with dynamic and user-generated string, so, there is no need in whitelisting.

@palkan palkan force-pushed the palkan:feat/ar-insert-all-enhance-options branch 3 times, most recently from e8d3e1d to 92bf3ab Mar 16, 2019

kaspth added a commit that referenced this pull request Mar 20, 2019

Remove the need for upsert/upsert_all
In light of #35636 we're trying to
extend upsert/upsert_all to handle both an array of columns or some
update SQL.

This tries to tackle this problem in a different by replacing upsert_all
with an upsert option on insert_all:

```ruby
Book.insert_all rows, upsert: true
Book.insert_all rows, upsert: %i( id name author_id )
Book.insert_all rows, upsert: "status = GREATEST(books.status, excluded.status)"
```

The latter two are not added in this PR, but that's how they'd look.

upsert_all still looks cleaner for the plain true case, so perhaps this
would be better:

```ruby
Book.upsert_all rows, only: %( id name author_id )
Book.upsert_all rows, with: "status = GREATEST(books.status, excluded.status)"
```

Though the minute differences between insert_all/insert_all! and
upsert_all are quite hard to distinguish. And perhaps a method fewer is
worth it for what is a rare call.
@azbshiri

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2019

Is this going to replace all gems related to bulk insert/update and hopefully less dependencies? @kaspth

@palkan

This comment has been minimized.

Copy link
Contributor Author

commented Mar 29, 2019

@azbshiri Not really (or not all of them). At least for now.
Some gems (e.g activerecord-import) also support validations and callbacks for mass inserts, which is not supported by current implementation.

@azbshiri

This comment has been minimized.

Copy link
Contributor

commented Mar 30, 2019

@palkan It's a good start though, thank you for your clarification.

palkan added some commits Mar 15, 2019

@palkan palkan force-pushed the palkan:feat/ar-insert-all-enhance-options branch from 92bf3ab to 2d1f7a4 Apr 7, 2019

@palkan

This comment has been minimized.

Copy link
Contributor Author

commented Apr 8, 2019

@kaspth Conflicts resolved. Anything else we need to do here?

@simi

This comment has been minimized.

Copy link
Contributor

commented Apr 8, 2019

Is initial comment with examples up to date as well @palkan?

@palkan

This comment has been minimized.

Copy link
Contributor Author

commented Apr 8, 2019

Is initial comment with examples up to date as well

Yep

@simi

This comment has been minimized.

Copy link
Contributor

commented Apr 8, 2019

I think update_sql should be renamed to update and in next insert/upsert iteration I can make it compatible with #35631 again. It would accept array of symbols as column names to update on conflict. We can do the same for returning as well.

@palkan

This comment has been minimized.

Copy link
Contributor Author

commented Apr 8, 2019

@simi I see.
I like the idea of separate only and with options more than the unified update. That sounds clearer to me (though increases the number of arguments–but that's an advanced feature, it couldn't be dead simple).

Let's wait for @kaspth.

@simi

This comment has been minimized.

Copy link
Contributor

commented Apr 8, 2019

Maybe we (all interested in upsert) can plan some quick chat to sync and finalize plan somewhere. Maybe IRC?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.