
Using create_or_find_by led to primary keys running out #35543

Closed
alexcameron89 opened this issue Mar 8, 2019 · 21 comments

alexcameron89 (Member) commented Mar 8, 2019

Steps to reproduce

While this script doesn't fully assert the issue, it shows that the primary key sequence can grow tremendously without any new rows actually being written.

# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  git_source(:github) { |repo| "https://github.com/#{repo}.git" }

  gem "rails", github: "rails/rails"
  gem "pg"
end

require "active_record"
require "minitest/autorun"
require "logger"

# This connection will do for database-independent bug reports.
ActiveRecord::Base.establish_connection(adapter: "postgresql", database: "rails")
ActiveRecord::Base.logger = Logger.new(STDOUT)

ActiveRecord::Schema.define do
  create_table :posts, force: true do |t|
  end

  create_table :comments, force: true do |t|
    t.integer :post_id
  end

  add_index :comments, :post_id, unique: true
end

class Post < ActiveRecord::Base
  has_many :comments
end

class Comment < ActiveRecord::Base
  belongs_to :post
end

class BugTest < Minitest::Test

  # PROBLEM
  # When create fails, the autoincremented id does not roll back
  def test_primary_key_growth
    post_one = Post.create!

    comment_for_post_one = Comment.create_or_find_by(post: post_one)

    # Create fails for Post 1's second comment and it finds the same record for
    # first_comment_for_post_one
    1000.times do
      second_comment_for_post_one = Comment.create_or_find_by(post: post_one)
      assert_equal second_comment_for_post_one, comment_for_post_one
    end

    post_two = Post.create!
    # This fails
    # Failure:
    #   BugTest#test_primary_key_growth [create_or_find_by.rb:56]:
    #   Expected: 2
    #     Actual: #<Comment id: 1002, post_id: 2>
    comment_for_post_two = Comment.create_or_find_by(post: post_two)
    assert_equal comment_for_post_one.id + 1, comment_for_post_two.id
  end

  # This is nothing new, as the same thing can happen with `create`
  def test_create
    post_one = Post.create!

    comment_for_post_one = Comment.create(post: post_one)

    1000.times do
      assert_raises ActiveRecord::RecordNotUnique do
        second_comment_for_post_one = Comment.create(post: post_one)
      end
    end

    post_two = Post.create!
    comment_for_post_two = Comment.create(post: post_two)
    assert_equal comment_for_post_one.id + 1, comment_for_post_two.id
  end
end

Problem

Each time create_or_find_by hits ActiveRecord::RecordNotUnique, the transaction rolls back, but the auto-increment sequence does not. That the sequence does not roll back appears to be expected PostgreSQL behavior.

Still, the fact that this happens with create_or_find_by led to one of our database tables running out of primary keys in production, preventing the creation of any new rows.
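
The mechanics can be modelled without a database. This is a sketch, not PostgreSQL internals: `ToySequence` and `ToyTable` are illustrative stand-ins showing why ids are consumed even when the insert fails.

```ruby
# Toy model of a serial column: the sequence hands out ids outside
# transaction control, so a failed INSERT still consumes one.
class ToySequence
  def initialize
    @value = 0
  end

  def nextval
    @value += 1
  end
end

class ToyTable
  def initialize
    @rows = {} # unique key => id
    @sequence = ToySequence.new
  end

  # Mimics INSERT under a unique constraint: the id is drawn from the
  # sequence before the uniqueness check fails, and is never returned.
  def insert(unique_key)
    id = @sequence.nextval
    raise "unique violation" if @rows.key?(unique_key)
    @rows[unique_key] = id
  end
end

table = ToyTable.new
first_id = table.insert("a")              # => 1
3.times { table.insert("a") rescue nil }  # three failed inserts burn ids 2-4
second_id = table.insert("b")             # => 5, not 2
```

The gap between `first_id` and `second_id` is exactly the number of failed insert attempts, which is the behavior the bug report above demonstrates against a real database.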

System configuration

Rails version:
6.0

Ruby version:
2.5.3

kaspth (Contributor) commented Mar 9, 2019

Hey @alexcameron89 👋, glad to have you back again 😄

I'm unsure what we can really do about this. Just for completeness, is this also a problem with the old find_or_create_by with a rescue ActiveRecord::RecordNotUnique?

alexcameron89 (Member, Author) commented Mar 11, 2019

Thank you @kaspth, I'm glad to be back! 😄

Just for completeness, is this also a problem with the old find_or_create_by with a rescue ActiveRecord::RecordNotUnique

Yes, this will happen in the data-race case of find_or_create_by, and in general any time a create fails at the database level. But given that it's a data-race issue in find_or_create_by, the majority of use cases are likely to be a successful find or create, leaving the failed-create, incrementing-primary-key issue with a much smaller footprint.

My concern with create_or_find_by is that the growing primary key issue is brought front and center because failing at the database level is promoted. In the best case, create succeeds more often than not, and a growing primary key is not much of an issue. In the worst case, create fails often because the item already exists, leading to the issue we had.

[graph]

I can already see that create_or_find_by may be viewed by some as a performance win over find_or_create_by, and it may be adopted without awareness of the consequences of this issue.

All in all, I feel that create_or_find_by has a dangerous side effect that can lead to large production issues. I think it might be worth considering removing the method or replacing it with a safer method (I don't know what that might be). For instance, is there much more gain from create_or_find_by than doing the retry logic that was documented before? Could that become its own method that users don't have to re-create in their applications?

kaspth (Contributor) commented Mar 11, 2019

All in all, I feel that create_or_find_by has a dangerous side effect that can lead to large production issues.

But only on Postgres, correct? Or did SQLite and MySQL auto-increment the primary key without rolling back too? At the very least we should mention this in the documentation.

I'm not entirely sure an outright removal is the right course of action. This seems somewhat par for the course when we're talking uniqueness and Postgres. E.g. this could happen with a failed uniqueness validation that's retried as you said, create_or_find_by just exacerbates the issue. How did you work around this in your app? Was it enough to go back to find_or_create_by + rescue?

@dhh what's your take on this? As I understand it, create_or_find_by has been a fine extraction for Basecamp 3.

dhh (Member) commented Mar 11, 2019

It's certainly good to point out this gotcha in the documentation, at the very least. We use MySQL at Basecamp and are happy users of this method in production.

@alexcameron89 Can you share your concerns with ID auto-increment growth in general? We're defaulting id columns to bigint, so we have room for 9,223,372,036,854,775,807 values in pgsql. It doesn't seem likely that this method is going to make anyone run out of room. Even if you called this method 10 million times per day, it would take 2.5 billion years to run out of IDs 😄
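
The back-of-the-envelope arithmetic behind that 2.5-billion-year figure checks out (the 10-million-calls-per-day rate is the hypothetical from the comment above):

```ruby
# Signed 64-bit primary key space, consumed at 10 million
# sequence-burning calls per day.
BIGINT_MAX    = 2**63 - 1   # 9_223_372_036_854_775_807
calls_per_day = 10_000_000

days  = BIGINT_MAX / calls_per_day  # ~922 billion days
years = days / 365                  # ~2.5 billion years
```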

kaspth (Contributor) commented Mar 11, 2019

it would take 2.5 billion years to run out of IDs

The only logical explanation is that @alexcameron89 is speaking to us from 2.5 billion years in the future! 😄

@alexcameron89 I take it your column isn't a big int?

alexcameron89 (Member, Author) commented

Column Type: Int

I take it your column isn't a big int?

That's a good callout: the table that was affected had int as its primary key. In reality, this is likely to only affect int tables. But given that bigint didn't become the default until Rails 5.1, there's a high likelihood that this method will be used on many tables with int primary keys, and that's a little concerning.
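
For contrast, the same hypothetical rate from the earlier comment (10 million sequence-burning calls per day) exhausts a 32-bit int key in months, not eons:

```ruby
# Signed 32-bit primary key space at the same hypothetical burn rate.
INT_MAX       = 2**31 - 1   # 2_147_483_647
calls_per_day = 10_000_000

days = INT_MAX / calls_per_day  # ~214 days before the sequence runs out
```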

Other affected DB's

Or did SQLite and MySQL auto-increment the primary key without rolling back too? At the very least we should mention this in the documentation.

It looks like MySQL is affected by it (script), but SQLite is not (script).

Documentation

It's certainly good to point out this gotcha in the documentation, at the very least.

I'll push up a PR with a callout in the documentation.

The Future

The only logical explanation is that @alexcameron89 is speaking to us from 2.5 billion years in the future! 😄

It's true! bigbigbigint is the new hip thing these days. 😄 🤖

dhh (Member) commented Mar 11, 2019 via email

kaspth (Contributor) commented Mar 11, 2019

Perhaps we should deprecate having int primary keys in Rails 6? Not sure about all the ramifications, but it could be good to do something here. Or we could add a task to help generate migrations for it.

alexcameron89 (Member, Author) commented

Perhaps we should deprecate having int primary keys in Rails 6? Not sure about all the ramifications, but it could be good to do something here. Or we could add a task to help generate migrations for it.

I think that's a really interesting idea. 👍

soulcutter commented

We experienced an outage from create_or_find_by with a table on the order of a few million rows - big, but not growing at a rate that we'd have expected to exhaust the sequence. We had assumed the sequence roughly corresponded to the table size. Even then migrating from an int column to a bigint took a fair amount of downtime. Is anybody monitoring their database sequences? (I doubt it)

It's true that int PKs are a timebomb, but this method accelerates the clock significantly. Running out of ints with a table of a few million wasn't expected. The sequence basically becomes a query count.

I've come around to the idea that find_or_create_or_find_by is a better behavior in our case (though I'd never suggest rails having that method).

I'm dubious of the general-purpose utility of create_or_find_by after having experienced that outage. There are already four bullet-point drawbacks listed in the documentation for this method; this would be a fifth. Better to warn people than have them discover it "naturally" as we did.

dhh (Member) commented Mar 11, 2019 via email

soulcutter commented

The race condition it was designed for IS a problem for me, but find_or_create_or_find is a less-surprising solution. The race condition was a low-grade annoyance in our application's BugSnag, and so we applied create_or_find_by thinking it's no big deal. Months later we had an outage over an hour long because of this decision.

It's true it doesn't matter as much with bigint columns. Migrating to bigint columns can take a fair amount of work depending on your system: we took our service offline to isolate the database and ran the migration, and it took about 45 minutes against a few million rows, since the naive approach rewrites the table and its indexes, which is expensive. Zero-downtime migrations take a bit of prep that you may not have done if you assumed you'd never run out of ids because your table is a few orders of magnitude smaller than the limit.

I'm happy it works for you in Basecamp. The people that will hit this are likely to be those upgrading pre-existing apps that have not converted everything to bigints.

dhh (Member) commented Mar 12, 2019 via email

matthewd (Member) commented

I've come around to the idea that find_or_create_or_find_by is a better behavior in our case (though I'd never suggest rails having that method).

I'd like to consider making that the behaviour of find_or_create_by, if you're interested in PRing it.

To me the ideal would be that they both have similar[ly rare] failure scenarios, and the choice of find_or_create_by vs create_or_find_by would boil down to whether you, on average, expect the row to exist, and thus which one is more likely to save you a query.
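
The find -> create -> find-again shape under discussion can be sketched with an in-memory stand-in. `FakeModel` and `RecordNotUnique` here are illustrative, not Active Record itself:

```ruby
# Simulated unique-constraint error, standing in for
# ActiveRecord::RecordNotUnique.
class RecordNotUnique < StandardError; end

# In-memory stand-in for a model with a unique index on its attributes.
class FakeModel
  def initialize
    @rows = {} # attributes => row id
  end

  def find_by(attrs)
    @rows[attrs]
  end

  def create!(attrs)
    raise RecordNotUnique if @rows.key?(attrs)
    @rows[attrs] = @rows.size + 1
  end

  # Find first; create on a miss; if a concurrent writer inserted the
  # row between our two steps, the second find is guaranteed to succeed.
  def find_or_create_or_find_by(attrs)
    find_by(attrs) || create!(attrs)
  rescue RecordNotUnique
    find_by(attrs)
  end
end

model  = FakeModel.new
first  = model.find_or_create_or_find_by(post_id: 1) # creates the row
second = model.find_or_create_or_find_by(post_id: 1) # finds the same row
```

With this shape, the INSERT (and its sequence consumption) only happens when the initial find misses, which is the property that makes it attractive as the default behaviour of find_or_create_by.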

JoshCheek (Contributor) commented

I was looking at this; in PostgreSQL there isn't an obvious way to change this behavior. I figured I should post the experiment in case others find value in it.

require 'pg'

db = PG.connect dbname: 'josh_testing'
%I[exec exec_params].each do |name|
  db.define_singleton_method name do |*args, &block|
    super(*args, &block).to_a
  rescue
    $!.set_backtrace caller.drop(1)
    raise
  end
end

db.exec <<~SQL
  drop table if exists omghi;
  create table if not exists omghi (
    id serial primary key,
    val text unique
  )
SQL

insert = -> val {
  db.exec_params <<~SQL, [val]
    insert into omghi (val) values ($1)
    returning *
  SQL
}

# insert value
insert['a']                 # => [{"id"=>"1", "val"=>"a"}]

# failing to insert increments the sequence
insert['a'] rescue $!.class # => PG::UniqueViolation
insert['b']                 # => [{"id"=>"3", "val"=>"b"}]

# an explicit transaction changes nothing
db.exec 'begin transaction'
insert['a'] rescue $!.class # => PG::UniqueViolation
db.exec 'rollback transaction'
insert['c']                 # => [{"id"=>"5", "val"=>"c"}]

# "on conflict do nothing" changes nothing
db.exec <<~SQL
  insert into omghi (val) values ('a')
  on conflict do nothing
  returning *
SQL
insert['d']                 # => [{"id"=>"7", "val"=>"d"}]

# the db's values
db.exec('select * from omghi').to_a
                            # => [{"id"=>"1", "val"=>"a"},
                            #     {"id"=>"3", "val"=>"b"},
                            #     {"id"=>"5", "val"=>"c"},
                            #     {"id"=>"7", "val"=>"d"}]

apetrov commented Mar 12, 2019

Actually, there is a quick workaround for the int-to-bigint migration in Postgres:

DROP TABLE IF EXISTS users_temp;
START TRANSACTION;
LOCK TABLE users IN ACCESS EXCLUSIVE mode;
SELECT * INTO users_temp FROM users;
TRUNCATE users;
ALTER TABLE users ALTER COLUMN id type bigint;
INSERT INTO users SELECT * FROM users_temp;
DROP TABLE users_temp;
COMMIT; -- to release lock

Takes seconds on a table with 1M records.

tylerj commented Mar 12, 2019

I'm surprised by the unwillingness to recognize the problem that this issue is attempting to address. While bigint columns certainly push the ultimate breaking problem down the road (billions of years, even), they don't make the actual problem go away (unnecessary skipping of unused primary key values).

In my opinion, it'd be similar to saying:

You think it's a problem that too many straws are being opened and discarded unnecessarily before being used, and you're upset that the lake behind your house is getting filled with straws? I'm sorry you're experiencing this, but the lake behind my house got filled a long time ago, so I understand how you feel! Just start discarding your unused straws into the ocean, it's much bigger than your lake and you won't notice they're there until you're LONG gone!

While I'm probably more OCD than most about missing chunks of primary keys in my tables, I think a quick education in the docs around the scenarios in which create_or_find_by should be used instead of find_or_create_by would be appropriate, given the potential drawbacks. Rails is awesome for its accessibility to newer developers, and I would be concerned about create_or_find_by being accidentally misused when find_or_create_by would be more appropriate (especially if it were updated to behave like find_or_create_or_find_by, as has been suggested).

dhh (Member) commented Mar 12, 2019 via email

tylerj commented Mar 12, 2019

But if you're on bigint, then it won't matter, unless you for some bizarre reason considers incrementing a counter the same as littering.

I do consider it unnecessary litter in my db. Maybe I'm the anomaly, and if so, I apologize for the inaccurate comparison.

And if that's the case, you are, as in all other cases, utterly free not to use this feature.

In the case I'm not the anomaly, my concern isn't that I would use the feature inappropriately, but that countless others might; and that they'd eventually be upset at their unintentional littering of their own db.

alexcameron89 added a commit to alexcameron89/rails that referenced this issue Mar 13, 2019
This commit addresses the issue in
rails#35543 by making note of the
growing primary key issue with `create_or_find_by`.
alexcameron89 (Member, Author) commented

I've come around to the idea that find_or_create_or_find_by is a better behavior in our case (though I'd never suggest rails having that method).

I'd like to consider making that the behaviour of find_or_create_by, if you're interested in PRing it.

@matthewd I'm working on this and trying to find a good way to test it.

dhh pushed a commit that referenced this issue Mar 13, 2019
This commit addresses the issue in
#35543 by making note of the
growing primary key issue with `create_or_find_by`.
alexcameron89 (Member, Author) commented

Since the mitigation to this issue was decided to be documenting the issue, and since that was merged in #35573, I'll close this issue.

ChanChar added a commit to ChanChar/rails that referenced this issue Apr 13, 2022
Updates `#create_or_find_by` to accept an optional `options` key to
house a new `find_first` flag. This flag specifies that the method runs
a `#find_by` before attempting the create.

This can help mitigate a few potential issues:
* (Medium impact) Reduces write load. In most cases, this method is useful to
  create a record once, with all subsequent calls using `#find_by` as a
  fallback. The proactive `#find_by` can skip the create/rescue loop and
  mostly rely on the initial fetch.
* (Low impact) Reduces the risk of unnecessary PK increments when an existing
  record exists (i.e. some versions of MySQL increment the PK on INSERT,
  regardless of whether or not it succeeds). References rails#35543

[Previous implementation](rails#35633)
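
A hypothetical sketch of the `find_first` idea described in that commit message (the flag name comes from the commit text; `FlaggedModel` is an in-memory stand-in, not the real Rails API):

```ruby
# Simulated unique-constraint error, standing in for
# ActiveRecord::RecordNotUnique.
class RecordNotUnique < StandardError; end

# In-memory stand-in for a model; counts insert attempts, since each
# attempt is what would burn a sequence value on a real database.
class FlaggedModel
  attr_reader :insert_attempts

  def initialize
    @rows = {}
    @insert_attempts = 0
  end

  def find_by(attrs)
    @rows[attrs]
  end

  def create!(attrs)
    @insert_attempts += 1
    raise RecordNotUnique if @rows.key?(attrs)
    @rows[attrs] = @rows.size + 1
  end

  def create_or_find_by(attrs, find_first: false)
    if find_first && (row = find_by(attrs))
      return row # cheap read hit: no INSERT, no burned id
    end
    create!(attrs)
  rescue RecordNotUnique
    find_by(attrs)
  end
end

model = FlaggedModel.new
model.create_or_find_by({ post_id: 1 }) # first call inserts
10.times { model.create_or_find_by({ post_id: 1 }, find_first: true) }
model.insert_attempts # => 1: the proactive find skips the INSERT path
```
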