Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

find_or_create_by: handle race condition by finding again #45720

Merged
merged 1 commit into from Aug 2, 2022

Conversation

casperisfine
Copy link
Contributor

@casperisfine casperisfine commented Aug 1, 2022

Using create_or_find_by in codepaths where most of the time the record already exist is wasteful on several accounts.

create_or_find_by should be the method to use when most of the time the record doesn't already exist, not a race condition safe version of find_or_create_by.

To make find_or_create_by race-condition free, we can search the record again if the creation failed because of an unicity constraint.

This PR was sparked by a recent comment from @jhawthorn.

Apparently this change was considered in #35543 by @matthewd, and #35633 was opened, but it went stale.

Ref: #35543
Ref: #35573
Close: #44885

NB: I have no idea how we could unit test the race-condition. Ideas welcome.

@simi
Copy link
Contributor

simi commented Aug 1, 2022

NB: I have no idea how we could unit test the race-condition. Ideas welcome.

I have seen tests in AR using Concurrent::CountDownLatch.new for similar purpose. I can try to craft some initial test for this if welcomed.

@casperisfine
Copy link
Contributor Author

@simi thanks, but my problem is not so much the construct to use to synchronize threads (a simple Monitor would do), but how to "inject" ourself in the middle of that find_by || create_or_find_by combo.

I guess we could write a test that inline the method to make it easier, but then it's not longer really testing the method, and it more or less a copy of the create_or_find_by unit test.

@simi
Copy link
Contributor

simi commented Aug 1, 2022

Usually I do subclass in test and hijack original method definition, but indeed that is not testing the original method. Other idea would be to somehow intercept the find_by query, but the is no known way to me of doing that with public API.

@casperisfine
Copy link
Contributor Author

Yeah, I guess mocking find_by to return nil on first call and something on the second call is probably the best way to test this.

Far from perfect, but good enough.

@casperisfine casperisfine force-pushed the find-or-create-or-find-by branch 3 times, most recently from 80d1566 to 3fdc867 Compare August 1, 2022 13:45
@casperisfine
Copy link
Contributor Author

Ok, I added a mock based test that hopefully shouldn't be too fragile.

@@ -161,31 +161,27 @@ def first_or_initialize(attributes = nil, &block) # :nodoc:
# failed due to validation errors it won't be persisted, you get what
# #create returns in such situation.
#
# Please note <b>this method is not atomic</b>, it runs first a SELECT, and if
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this warning is probably worth keeping... if you don't have a uniqueness constraint defined, it will mostly seem to work but have the described race.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, I don't think it should be kept as is, but yes, rather than to remove it I should rewod it to say that if you don't have a unique constaint, you'll end up with two records.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I reworded the documentation, let me know what you think.

@skipkayhil
Copy link
Member

Also related: #44885

@casperisfine
Copy link
Contributor Author

Credited @alexcameron89 as co-author because ultimately our PRs are pretty much identical and there was no reason #35633 went stale.

Using `create_or_find_by` in codepaths where most of the time
the record already exist is wasteful on several accounts.

`create_or_find_by` should be the method to use when most of the
time the record doesn't already exist, not a race condition safe
version of `find_or_create_by`.

To make `find_or_create_by` race-condition free, we can search
the record again if the creation failed because of an unicity
constraint.

Co-Authored-By: Alex Kitchens <alexcameron98@gmail.com>
@byroot byroot merged commit 408d061 into rails:main Aug 2, 2022
@jhawthorn
Copy link
Member

I like this.

My one reservation is that it could change user's expectations around callbacks or especially the block passed to find_or_create_by if they have side-effects. Previously that code being run meant that either you would commit the changes or would see an error. Now it's possible for the block to run, the changes in it not be committed, but this call returning successfully.

That reservation aside (which is really and edge case, maybe we should document this possibility), strong 👍

@stanhu
Copy link
Contributor

stanhu commented Feb 13, 2024

This pull request can introduce subtransactions in PostgreSQL, which can cause significant performance issues. Please see #51052.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants