Duplicate Key Value problems by using history and parallel processing #180

Closed
gedankenstuecke opened this Issue Nov 8, 2011 · 6 comments


@gedankenstuecke

Hi,
we're using friendly_id (4.0.0.beta14) with the history feature enabled on our project, together with resque for managing queues, and found that some jobs are failing with:

PGError: ERROR: duplicate key value violates unique constraint "index_friendly_id_slugs_on_slug_and_sluggable_type"
INSERT INTO "friendly_id_slugs" ("slug", "sluggable_id", "sluggable_type", "created_at") VALUES ('rs696974', 1676331, 'Snp', '2011-11-08 17:04:02.299340') RETURNING "id"
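
For reference, the constraint being violated is the unique index on the slug history table. Reconstructed from the index name, it corresponds roughly to a migration like the following (a sketch, not necessarily the exact friendly_id migration; the class name is made up):

class AddUniqueIndexToFriendlyIdSlugs < ActiveRecord::Migration
  def change
    # one row per (slug, sluggable_type) pair -- the second concurrent INSERT
    # for the same pair is what raises the PGError above
    add_index :friendly_id_slugs, [:slug, :sluggable_type],
      :unique => true,
      :name => "index_friendly_id_slugs_on_slug_and_sluggable_type"
  end
end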

This happened as soon as an object was supposed to be saved inside such a resque job. After a bit of bug hunting I found the problem:

Some actions in our application trigger multiple jobs on the same object that – if the queue is empty enough – are worked on at the same time, since we have multiple resque workers (with different queues) running in parallel. By chance, two jobs can trigger the save action at more or less the same time. This is fine with Rails/Postgres in general, since concurrent updates to the same row simply block until the other transaction finishes.

But it is not fine with friendly_id, which does not update rows in the friendly_id_slugs table when the combination of object ID and slug is already there. Instead, the old entry gets deleted and a new one is created:

def build_slug
  return unless should_generate_new_friendly_id?
  # Allow reversion back to a previously used slug
  slugs.where(:slug => friendly_id).delete_all
  slugs.build :slug => friendly_id
end

See https://github.com/norman/friendly_id/blob/master/lib/friendly_id/history.rb#L71

If those two save actions run in fast succession, it can happen that worker A has already deleted the old entry and created the new slug entry while worker B found nothing to delete and went on to create the new entry as well. That insert then fails because worker A was a few milliseconds faster.

Real world example:

The timestamp of an entry in the friendly_id_slugs-table: 2011-11-08 17:04:02.151681
The timestamp of the same entry where the save-transaction failed: 2011-11-08 17:04:02.299340

Possible solution?

None that is really obvious, but what about updating existing entries instead of deleting them and creating new ones?
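
As a rough sketch of that idea (this is not friendly_id's actual code, and bumping created_at is just one possible interpretation of "updating"), build_slug could reuse an existing row instead of deleting and re-inserting it:

def build_slug
  return unless should_generate_new_friendly_id?
  existing = slugs.where(:slug => friendly_id).first
  if existing
    # keep the existing row and just bump created_at so it still counts
    # as the most recent slug for reversion purposes
    existing.update_attribute(:created_at, Time.now)
  else
    slugs.build :slug => friendly_id
  end
end

This only narrows the window, though: two workers that both see no existing row can still race each other on the insert, so some form of locking (or rescuing the unique-constraint error and retrying) would still be needed.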

@norman
Owner

This should be fairly straightforward to resolve with a lock. Can you take a look at the changes in this branch and let me know if it works for you? I realize this can be difficult to test and will come up with a more proper test for it soon.

https://github.com/norman/friendly_id/tree/concurrency-fix
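
For anyone reading along, the idea is pessimistic locking via ActiveRecord. A minimal sketch (not necessarily the code on that branch; the model, column and method names are made up) would be to take a row lock before applying and saving changes:

class Snp < ActiveRecord::Base
  extend FriendlyId
  friendly_id :rs_id, :use => :history

  def update_with_row_lock(attrs)
    transaction do
      lock!                     # reloads this row with SELECT ... FOR UPDATE; a concurrent worker blocks here
      update_attributes(attrs)  # change and save while still holding the row lock
    end
  end
end

If both workers are forced through the same row lock, the delete/insert sequence in build_slug can no longer interleave.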

@norman norman added a commit that referenced this issue Nov 11, 2011
@norman Use AR-provided locking API
For GH issue #180
4a42b1a
@gedankenstuecke

I just tested the latest version of the branch in my development environment, and the same PGError (duplicate key value violates unique constraint "index_friendly_id_slugs_on_slug_and_sluggable_type") still occurs on this setup with resque jobs running in parallel.

@norman
Owner

Ok, it's probably going to be necessary to lock the rows in the model table as well when selecting the conflicting candidates. I can't work on this at the moment, but if you want to hack on it a bit yourself, you can try editing slug_generator.rb at line 37 to add:

@conflict = conflicts.lock(true).first
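
For context, lock(true) on an ActiveRecord relation appends FOR UPDATE to the SELECT it generates, so the returned row stays locked until the enclosing transaction commits. In isolation (the class and values here are illustrative, not friendly_id internals):

FriendlyId::Slug.transaction do
  # generates a SELECT ... FOR UPDATE; a second transaction running the same
  # query blocks on this row until the first one commits or rolls back
  conflict = FriendlyId::Slug.where(:slug => "rs696974").lock(true).first
  # ... resolve the conflict while still holding the lock ...
end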

Will take a longer look and solve this soon, but I have to run right now.

@norman norman added a commit that referenced this issue Nov 14, 2011
@norman Added (failing) threading test
This tests the concurrency issue described in GH #180.
6f0fff1
@norman norman added a commit that referenced this issue Nov 14, 2011
@norman Lock table to make test work
This shows a solution to the issue raised in #180.
273486b
@norman
Owner

Looking into this some more, I think the best way to solve this is with a table-level lock, though I'm undecided as to whether that should be the default behavior. It may be best to just document the need for a table lock so that developers in your situation can find an easy solution. Let me think about this for a bit longer.

In the meantime, please see the last commit referenced above for a solution to your immediate problem.

@gedankenstuecke

I haven't had much time in the last few days to play around with this, but as soon as I find the time I'll try to solve it using the table-level lock. Thanks for your help!

@norman norman closed this Nov 15, 2011
@norman norman added a commit that referenced this issue Dec 15, 2011
@norman Document some common issues with slugs.
This pertains to issues #180 and #185
c717fb9
@norman norman added a commit that referenced this issue Dec 21, 2011
@norman Use AR-provided locking API
For GH issue #180
a465ef0
@norman norman added a commit that referenced this issue Dec 21, 2011
@norman Added (failing) threading test
This tests the concurrency issue described in GH #180.
471d2b7
@norman norman added a commit that referenced this issue Dec 21, 2011
@norman Lock table to make test work
This shows a solution to the issue raised in #180.
2f65150
@leehambley

I'm just commenting here that I'm seeing the same problem. I resolved it with something like what is recommended in 2f65150, but the problem now is that the lock is held for so long that the site goes offline while background tasks are running.

def with_episodes_table_lock
  Episode.transaction do
    # LOCK TABLE takes an ACCESS EXCLUSIVE lock by default and holds it
    # until the surrounding transaction commits
    Episode.connection.execute "LOCK TABLE #{Episode.table_name}"
    yield
  end
end
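
A hypothetical call site (the job class and wiring are assumptions, not from the thread), with the helper exposed at class level so every slug-writing save goes through the table lock:

class ReprocessEpisode
  @queue = :episodes

  def self.perform(episode_id)
    with_episodes_table_lock do
      Episode.find(episode_id).save!
    end
  end

  def self.with_episodes_table_lock
    Episode.transaction do
      Episode.connection.execute "LOCK TABLE #{Episode.table_name}"
      yield
    end
  end
end

Since the ACCESS EXCLUSIVE lock blocks even plain reads of the episodes table, any web request that touches Episode queues up behind a long-running job, which is exactly the downtime described above.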
@saizai saizai pushed a commit to saizai/friendly_id that referenced this issue Jan 23, 2014
@norman Attempted fix for concurrency issue. 34ca0b6