
Using create_or_find_by led to primary keys running out #35543

Closed
alexcameron89 opened this issue Mar 8, 2019 · 21 comments

alexcameron89 (Member) commented Mar 8, 2019

Steps to reproduce

While this script doesn't fully assert the issue, it shows that the primary key sequence can grow tremendously without any new rows actually being written.

# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  git_source(:github) { |repo| "https://github.com/#{repo}.git" }

  gem "rails", github: "rails/rails"
  gem "pg"
end

require "active_record"
require "minitest/autorun"
require "logger"

# This connection will do for database-independent bug reports.
ActiveRecord::Base.establish_connection(adapter: "postgresql", database: "rails")
ActiveRecord::Base.logger = Logger.new(STDOUT)

ActiveRecord::Schema.define do
  create_table :posts, force: true do |t|
  end

  create_table :comments, force: true do |t|
    t.integer :post_id
  end

  add_index :comments, :post_id, unique: true
end

class Post < ActiveRecord::Base
  has_many :comments
end

class Comment < ActiveRecord::Base
  belongs_to :post
end

class BugTest < Minitest::Test

  # PROBLEM
  # When create fails, the autoincremented id does not roll back
  def test_primary_key_growth
    post_one = Post.create!

    comment_for_post_one = Comment.create_or_find_by(post: post_one)

    # Create fails for Post 1's second comment and it finds the same record for
    # first_comment_for_post_one
    1000.times do
      second_comment_for_post_one = Comment.create_or_find_by(post: post_one)
      assert_equal second_comment_for_post_one, comment_for_post_one
    end

    post_two = Post.create!
    # This fails
    # Failure:
    #   BugTest#test_primary_key_growth [create_or_find_by.rb:56]:
    #   Expected: 2
    #     Actual: #<Comment id: 1002, post_id: 2>
    comment_for_post_two = Comment.create_or_find_by(post: post_two)
    assert_equal comment_for_post_one.id + 1, comment_for_post_two.id
  end

  # This is nothing new, as the same thing can happen with `create`
  def test_create
    post_one = Post.create!

    comment_for_post_one = Comment.create(post: post_one)

    1000.times do
      assert_raises ActiveRecord::RecordNotUnique do
        second_comment_for_post_one = Comment.create(post: post_one)
      end
    end

    post_two = Post.create!
    comment_for_post_two = Comment.create(post: post_two)
    assert_equal comment_for_post_one.id + 1, comment_for_post_two.id
  end
end

Problem

Each time create_or_find_by hits ActiveRecord::RecordNotUnique, the transaction rolls back, but the auto-increment sequence does not. That the sequence does not roll back appears to be expected PostgreSQL behavior.

Still, the fact that this happens with create_or_find_by led to one of our database tables running out of primary keys in production, preventing the creation of any new rows.
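
The mechanics can be modelled without a database. This is a sketch, not PostgreSQL internals: `ToySequence` and `ToyTable` are illustrative stand-ins showing why ids are consumed even when the insert fails.

```ruby
# Toy model of a serial column: the sequence hands out ids outside
# transaction control, so a failed INSERT still consumes one.
class ToySequence
  def initialize
    @value = 0
  end

  def nextval
    @value += 1
  end
end

class ToyTable
  def initialize
    @rows = {} # unique key => id
    @sequence = ToySequence.new
  end

  # Mimics INSERT under a unique constraint: the id is drawn from the
  # sequence before the uniqueness check fails, and is never returned.
  def insert(unique_key)
    id = @sequence.nextval
    raise "unique violation" if @rows.key?(unique_key)
    @rows[unique_key] = id
  end
end

table = ToyTable.new
first_id = table.insert("a")              # => 1
3.times { table.insert("a") rescue nil }  # three failed inserts burn ids 2-4
second_id = table.insert("b")             # => 5, not 2
```

The gap between `first_id` and `second_id` is exactly the number of failed insert attempts, which is the behavior the bug report above demonstrates against a real database.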

System configuration

Rails version:
6.0

Ruby version:
2.5.3

kaspth (Contributor) commented Mar 9, 2019

Hey @alexcameron89 👋, glad to have you back again 😄

I'm unsure what we can really do about this. Just for completeness, is this also a problem with the old find_or_create_by with a rescue ActiveRecord::RecordNotUnique?

alexcameron89 (Member, Author) commented Mar 11, 2019

Thank you @kaspth, I'm glad to be back! 😄

Just for completeness, is this also a problem with the old find_or_create_by with a rescue ActiveRecord::RecordNotUnique

Yes, this will happen in the data-race case of find_or_create_by, and in general any time a create fails at the database level. But given that it's a data-race issue in find_or_create_by, the majority of use cases are likely to be a successful find or create, leaving the failed-create, incrementing-primary-key issue with a much smaller footprint.

My concern with create_or_find_by is that the growing primary key issue is brought front and center because failing at the database level is promoted. In the best case, create succeeds more often than not, and a growing primary key is not much of an issue. In the worst case, create fails often because the item already exists, leading to the issue we had.

[graph]

I can already see that create_or_find_by may be viewed by some as a performance win over find_or_create_by, and it may be adopted without awareness of the consequences of this issue.

All in all, I feel that create_or_find_by has a dangerous side effect that can lead to large production issues. I think it might be worth considering removing the method or replacing it with a safer method (I don't know what that might be). For instance, is there much more gain from create_or_find_by than doing the retry logic that was documented before? Could that become its own method that users don't have to re-create in their applications?

kaspth (Contributor) commented Mar 11, 2019

All in all, I feel that create_or_find_by has a dangerous side effect that can lead to large production issues.

But only on Postgres, correct? Or did SQLite and MySQL auto-increment the primary key without rolling back too? At the very least we should mention this in the documentation.

I'm not entirely sure an outright removal is the right course of action. This seems somewhat par for the course when we're talking uniqueness and Postgres. E.g. this could happen with a failed uniqueness validation that's retried as you said, create_or_find_by just exacerbates the issue. How did you work around this in your app? Was it enough to go back to find_or_create_by + rescue?

@dhh what's your take on this? As I understand it, create_or_find_by has been a fine extraction for Basecamp 3.

dhh (Member) commented Mar 11, 2019

It's certainly good to point out this gotcha in the documentation, at the very least. We use MySQL at Basecamp and are happy users of this method in production.

@alexcameron89 Can you share your concerns with ID auto-increment growth in general? We're defaulting id columns to bigint, so we have room for 9,223,372,036,854,775,807 values in pgsql. It doesn't seem likely that this method is going to make anyone run out of room. Even if you called this method 10 million times per day, it would take 2.5 billion years to run out of IDs 😄
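
The back-of-the-envelope arithmetic behind that 2.5-billion-year figure checks out (the 10-million-calls-per-day rate is the hypothetical from the comment above):

```ruby
# Signed 64-bit primary key space, consumed at 10 million
# sequence-burning calls per day.
BIGINT_MAX    = 2**63 - 1   # 9_223_372_036_854_775_807
calls_per_day = 10_000_000

days  = BIGINT_MAX / calls_per_day  # ~922 billion days
years = days / 365                  # ~2.5 billion years
```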

kaspth (Contributor) commented Mar 11, 2019

it would take 2.5 billion years to run out of IDs

The only logical explanation is that @alexcameron89 is speaking to us from 2.5 billion years in the future! 😄

@alexcameron89 I take it your column isn't a big int?

alexcameron89 (Member, Author) commented

Column Type: Int

I take it your column isn't a big int?

That's a good callout: the table that was affected had int as its primary key. In reality, this is likely to only affect int tables. But given that bigint didn't become the default until Rails 5.1, there's a high likelihood that this method will be used on many tables with int primary keys, and that's a little concerning.
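
For contrast, the same hypothetical rate from the earlier comment (10 million sequence-burning calls per day) exhausts a 32-bit int key in months, not eons:

```ruby
# Signed 32-bit primary key space at the same hypothetical burn rate.
INT_MAX       = 2**31 - 1   # 2_147_483_647
calls_per_day = 10_000_000

days = INT_MAX / calls_per_day  # ~214 days before the sequence runs out
```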

Other affected DB's

Or did SQLite and MySQL auto-increment the primary key without rolling back too? At the very least we should mention this in the documentation.

It looks like MySQL is affected by it (script), but SQLite is not (script).

Documentation

It's certainly good to point out this gotcha in the documentation, at the very least.

I'll push up a PR with a callout in the documentation.

The Future

The only logical explanation is that @alexcameron89 is speaking to us from 2.5 billion years in the future! 😄

It's true! bigbigbigint is the new hip thing these days. 😄 🤖

dhh (Member) commented Mar 11, 2019 via email

kaspth (Contributor) commented Mar 11, 2019

Perhaps we should deprecate having int primary keys in Rails 6? Not sure about all the ramifications, but it could be good to do something here. Or we could add a task to help generate migrations for it.

alexcameron89 (Member, Author) commented

Perhaps we should deprecate having int primary keys in Rails 6? Not sure about all the ramifications, but it could be good to do something here. Or we could add a task to help generate migrations for it.

I think that's a really interesting idea. 👍

soulcutter commented

We experienced an outage from create_or_find_by with a table on the order of a few million rows - big, but not growing at a rate that we'd have expected to exhaust the sequence. We had assumed the sequence roughly corresponded to the table size. Even then migrating from an int column to a bigint took a fair amount of downtime. Is anybody monitoring their database sequences? (I doubt it)

It's true that int PKs are a timebomb, but this method accelerates the clock significantly. Running out of ints with a table of a few million wasn't expected. The sequence basically becomes a query count.

I've come around to the idea that find_or_create_or_find_by is a better behavior in our case (though I'd never suggest rails having that method).

I'm dubious of the general-purpose utility of create_or_find_by after having experienced that outage. There are already four bullet-point drawbacks listed in the documentation for this method; this would be a fifth. Better to warn people than have them discover it "naturally" as we did.

dhh (Member) commented Mar 11, 2019 via email

soulcutter commented

The race condition it was designed for IS a problem for me, but find_or_create_or_find is a less-surprising solution. The race condition was a low-grade annoyance in our application's BugSnag, and so we applied create_or_find_by thinking it's no big deal. Months later we had an outage over an hour long because of this decision.

It's true it doesn't matter as much with bigint columns. Migrating to bigint columns can take a fair amount of work depending on your system: we took our service offline to isolate the database and ran the migration, and it took about 45 minutes against a few million rows, since the naive approach rewrites the table and its indexes, which is expensive. Zero-downtime migrations take a bit of prep that you may not have done if you assumed you'd never run out of ids because your table is a few orders of magnitude smaller than the limit.

I'm happy it works for you in Basecamp. The people that will hit this are likely to be those upgrading pre-existing apps that have not converted everything to bigints.

dhh (Member) commented Mar 12, 2019 via email

matthewd (Member) commented

I've come around to the idea that find_or_create_or_find_by is a better behavior in our case (though I'd never suggest rails having that method).

I'd like to consider making that the behaviour of find_or_create_by, if you're interested in PRing it.

To me the ideal would be that they both have similar[ly rare] failure scenarios, and the choice of find_or_create_by vs create_or_find_by would boil down to whether you, on average, expect the row to exist, and thus which one is more likely to save you a query.
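
The find -> create -> find-again shape under discussion can be sketched with an in-memory stand-in. `FakeModel` and `RecordNotUnique` here are illustrative, not Active Record itself:

```ruby
# Simulated unique-constraint error, standing in for
# ActiveRecord::RecordNotUnique.
class RecordNotUnique < StandardError; end

# In-memory stand-in for a model with a unique index on its attributes.
class FakeModel
  def initialize
    @rows = {} # attributes => row id
  end

  def find_by(attrs)
    @rows[attrs]
  end

  def create!(attrs)
    raise RecordNotUnique if @rows.key?(attrs)
    @rows[attrs] = @rows.size + 1
  end

  # Find first; create on a miss; if a concurrent writer inserted the
  # row between our two steps, the second find is guaranteed to succeed.
  def find_or_create_or_find_by(attrs)
    find_by(attrs) || create!(attrs)
  rescue RecordNotUnique
    find_by(attrs)
  end
end

model  = FakeModel.new
first  = model.find_or_create_or_find_by(post_id: 1) # creates the row
second = model.find_or_create_or_find_by(post_id: 1) # finds the same row
```

With this shape, the INSERT (and its sequence consumption) only happens when the initial find misses, which is the property that makes it attractive as the default behaviour of find_or_create_by.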

JoshCheek (Contributor) commented

I was looking at this; in PostgreSQL there isn't an obvious way to change this behavior. I figured I should post the experiment in case others find value in it.

require 'pg'

db = PG.connect dbname: 'josh_testing'
%I[exec exec_params].each do |name|
  db.define_singleton_method name do |*args, &block|
    super(*args, &block).to_a
  rescue
    $!.set_backtrace caller.drop(1)
    raise
  end
end

db.exec <<~SQL
  drop table if exists omghi;
  create table if not exists omghi (
    id serial primary key,
    val text unique
  )
SQL

insert = -> val {
  db.exec_params <<~SQL, [val]
    insert into omghi (val) values ($1)
    returning *
  SQL
}

# insert value
insert['a']                 # => [{"id"=>"1", "val"=>"a"}]

# failing to insert increments the sequence
insert['a'] rescue $!.class # => PG::UniqueViolation
insert['b']                 # => [{"id"=>"3", "val"=>"b"}]

# an explicit transaction changes nothing
db.exec 'begin transaction'
insert['a'] rescue $!.class # => PG::UniqueViolation
db.exec 'rollback transaction'
insert['c']                 # => [{"id"=>"5", "val"=>"c"}]

# "on conflict do nothing" changes nothing
db.exec <<~SQL
  insert into omghi (val) values ('a')
  on conflict do nothing
  returning *
SQL
insert['d']                 # => [{"id"=>"7", "val"=>"d"}]

# the db's values
db.exec('select * from omghi').to_a
                            # => [{"id"=>"1", "val"=>"a"},
                            #     {"id"=>"3", "val"=>"b"},
                            #     {"id"=>"5", "val"=>"c"},
                            #     {"id"=>"7", "val"=>"d"}]

apetrov commented Mar 12, 2019

Actually, there is a quick workaround for the int-to-bigint migration in Postgres:

DROP TABLE IF EXISTS users_temp;
START TRANSACTION;
LOCK TABLE users IN ACCESS EXCLUSIVE mode;
SELECT * INTO users_temp FROM users;
TRUNCATE users;
ALTER TABLE users ALTER COLUMN id type bigint;
INSERT INTO users SELECT * FROM users_temp;
DROP TABLE users_temp;
COMMIT; -- to release lock

Takes seconds on a table with 1M records.

tylerj commented Mar 12, 2019

I'm surprised by the unwillingness to recognize the problem that this issue is attempting to address. While bigint columns certainly push the ultimate breaking problem down the road (billions of years, even), they don't make the actual problem go away (unnecessary skipping of unused primary key values).

In my opinion, it'd be similar to saying:

You think it's a problem that too many straws are being opened and discarded unnecessarily before being used, and you're upset that the lake behind your house is getting filled with straws? I'm sorry you're experiencing this, but the lake behind my house got filled a long time ago, so I understand how you feel! Just start discarding your unused straws into the ocean, it's much bigger than your lake and you won't notice they're there until you're LONG gone!

While I'm probably more OCD than most about missing chunks of primary keys in my tables, I think a quick education in the docs around the scenarios in which create_or_find_by should be used instead of find_or_create_by would be appropriate, given the potential drawbacks. Rails is awesome for its accessibility to newer developers, and I would be concerned about create_or_find_by being accidentally misused when find_or_create_by would be more appropriate (especially if it were updated to behave like find_or_create_or_find_by, as has been suggested).

dhh (Member) commented Mar 12, 2019 via email

tylerj commented Mar 12, 2019

But if you're on bigint, then it won't matter, unless you for some bizarre reason considers incrementing a counter the same as littering.

I do consider it unnecessary litter in my db. Maybe I'm the anomaly, and if so, I apologize for the inaccurate comparison.

And if that's the case, you are, as in all other cases, utterly free not to use this feature.

In the case I'm not the anomaly, my concern isn't that I would use the feature inappropriately, but that countless others might; and that they'd eventually be upset at their unintentional littering of their own db.

alexcameron89 added a commit to alexcameron89/rails that referenced this issue Mar 13, 2019
This commit addresses the issue in
rails#35543 by making note of the
growing primary key issue with `create_or_find_by`.
alexcameron89 (Member, Author) commented

I've come around to the idea that find_or_create_or_find_by is a better behavior in our case (though I'd never suggest rails having that method).

I'd like to consider making that the behaviour of find_or_create_by, if you're interested in PRing it.

@matthewd I'm working on this and trying to find a good way to test it.

dhh pushed a commit that referenced this issue Mar 13, 2019
This commit addresses the issue in
#35543 by making note of the
growing primary key issue with `create_or_find_by`.
alexcameron89 (Member, Author) commented

Since the mitigation to this issue was decided to be documenting the issue, and since that was merged in #35573, I'll close this issue.

ChanChar added a commit to ChanChar/rails that referenced this issue Apr 13, 2022
Updates `#create_or_find_by` to accept an optional `options` key to
house a new `find_first` flag. This flag specifies that the method runs
a `#find_by` before attempting the create.

This can help mitigate a few potential issues:
* (Medium impact) Reduces write load. In most cases, this method is useful to
  create a record once, with all subsequent calls using `#find_by` as a
  fallback. The proactive `#find_by` can skip the create/rescue loop and
  mostly rely on the initial fetch.
* (Low impact) Reduces the risk of unnecessary PK increments when an existing
  record exists (i.e. some versions of MySQL increment the PK on INSERT,
  regardless of whether or not it succeeds). References rails#35543

[Previous implementation](rails#35633)
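
A hypothetical sketch of the `find_first` idea described in that commit message (the flag name comes from the commit text; `FlaggedModel` is an in-memory stand-in, not the real Rails API):

```ruby
# Simulated unique-constraint error, standing in for
# ActiveRecord::RecordNotUnique.
class RecordNotUnique < StandardError; end

# In-memory stand-in for a model; counts insert attempts, since each
# attempt is what would burn a sequence value on a real database.
class FlaggedModel
  attr_reader :insert_attempts

  def initialize
    @rows = {}
    @insert_attempts = 0
  end

  def find_by(attrs)
    @rows[attrs]
  end

  def create!(attrs)
    @insert_attempts += 1
    raise RecordNotUnique if @rows.key?(attrs)
    @rows[attrs] = @rows.size + 1
  end

  def create_or_find_by(attrs, find_first: false)
    if find_first && (row = find_by(attrs))
      return row # cheap read hit: no INSERT, no burned id
    end
    create!(attrs)
  rescue RecordNotUnique
    find_by(attrs)
  end
end

model = FlaggedModel.new
model.create_or_find_by({ post_id: 1 }) # first call inserts
10.times { model.create_or_find_by({ post_id: 1 }, find_first: true) }
model.insert_attempts # => 1: the proactive find skips the INSERT path
```
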