
Duplicate Key Value problems by using history and parallel processing #180

Closed
gedankenstuecke opened this Issue November 08, 2011 · 6 comments

Bastian Greshake

Hi,
we're using friendly_id (4.0.0.beta14) with the history feature enabled on our project, together with Resque for managing queues, and found that some jobs were failing with:

PGError: ERROR: duplicate key value violates unique constraint "index_friendly_id_slugs_on_slug_and_sluggable_type"
INSERT INTO "friendly_id_slugs" ("slug", "sluggable_id", "sluggable_type", "created_at") VALUES ('rs696974', 1676331, 'Snp', '2011-11-08 17:04:02.299340') RETURNING "id"

This happened whenever an object was saved inside such a Resque job. After a bit of bug hunting I found the problem:

Some actions in our application trigger multiple jobs on the same object. If the queue is empty enough, those jobs are worked on at the same time, as we have multiple Resque workers (with different queues) running in parallel. By chance, two jobs can trigger the save action at more or less the same time. In general this is fine with Rails/Postgres, since the database serializes concurrent writes to the same row.

But it is not fine with friendly_id, because it does not update the friendly_id_slugs table when the combination of object ID and slug is already there. Instead, the old entry is deleted and a new one is created:

def build_slug
  return unless should_generate_new_friendly_id?
  # Allow reversion back to a previously used slug
  slugs.where(:slug => friendly_id).delete_all
  slugs.build :slug => friendly_id
end

See https://github.com/norman/friendly_id/blob/master/lib/friendly_id/history.rb#L71

If those two save actions are performed in fast succession, it can happen that worker A has already deleted and created the new slug entry, while worker B found nothing to delete and went on to create the new entry, which then fails because worker A was a few milliseconds faster.

Real world example:

The timestamp of an entry in the friendly_id_slugs-table: 2011-11-08 17:04:02.151681
The timestamp of the same entry where the save-transaction failed: 2011-11-08 17:04:02.299340

Possible solution?

None that is really obvious, but what about updating existing entries instead of deleting and re-creating them?
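
For illustration, a rough sketch of that idea (hypothetical, not the gem's code, and note it still leaves a small race window between the lookup and the insert):

def build_slug
  return unless should_generate_new_friendly_id?
  # Reuse an existing row for this slug instead of deleting and
  # re-creating it, so a concurrent save doesn't trip the unique index
  slugs.where(:slug => friendly_id).first || slugs.build(:slug => friendly_id)
end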

Norman Clarke
Owner

This should be fairly straightforward to resolve with a lock. Can you take a look at the changes in this branch and let me know if it works for you? I realize this can be difficult to test, and I will come up with a proper test for it soon.

https://github.com/norman/friendly_id/tree/concurrency-fix
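
The idea, roughly, is to use ActiveRecord's pessimistic locking (SELECT ... FOR UPDATE) around the save; a minimal sketch, using the Snp model from your report (the branch's actual code may differ):

Snp.transaction do
  # lock(true) appends FOR UPDATE to the SELECT, so a concurrent
  # worker blocks here until this transaction commits or rolls back
  snp = Snp.lock(true).find(1676331)
  snp.save!
end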

Norman Clarke norman referenced this issue from a commit November 11, 2011
Norman Clarke Use AR-provided locking API
For GH issue #180
4a42b1a
Bastian Greshake

I just tested the latest version of the branch in my development environment, and the same PGError: ERROR: duplicate key value violates unique constraint "index_friendly_id_slugs_on_slug_and_sluggable_type" still occurs on this setup with Resque jobs running in parallel.

Norman Clarke
Owner

Ok, it's probably going to be necessary to lock the rows in the model table as well when selecting the conflicting candidates. I can't work on this at the moment, but if you want to hack on it a bit yourself, you can try editing slug_generator.rb at line 37 to add:

@conflict = conflicts.lock(true).first
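
(For context: lock(true) makes ActiveRecord append FOR UPDATE to the generated SELECT, so a concurrent transaction selecting the same conflicting rows blocks until the first one commits.)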

Will take a longer look and solve this soon, but I have to run right now.

Norman Clarke norman referenced this issue from a commit November 13, 2011
Norman Clarke Added (failing) threading test
This tests the concurrency issue described in GH #180.
6f0fff1
Norman Clarke norman referenced this issue from a commit November 13, 2011
Norman Clarke Lock table to make test work
This shows a solution to the issue raised in #180.
273486b
Norman Clarke
Owner

Looking into this some more, I think the best way to solve it is with a table-level lock, though I'm undecided as to whether this should be the default behavior. It may be best to just document the need for a table lock so that developers in your situation can find an easy solution. Let me think about this a bit more.

In the meantime, please see the last commit referenced above for a solution to your immediate problem.
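
In outline, the fix wraps slug creation in a transaction that takes a table-level lock first; a rough sketch, assuming the FriendlyId::Slug model (the commit's exact code may differ):

FriendlyId::Slug.transaction do
  # Serialize all writers to friendly_id_slugs for the duration
  # of this transaction
  FriendlyId::Slug.connection.execute "LOCK TABLE #{FriendlyId::Slug.table_name}"
  record.save!  # record: the sluggable model instance being saved
end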

Bastian Greshake

I haven't had much time in the last few days to play around with this, but as soon as I find the time I'll try to solve it using the table-level lock. Thanks for your help!

Norman Clarke norman closed this November 15, 2011
Norman Clarke norman referenced this issue from a commit December 15, 2011
Norman Clarke Document some common issues with slugs.
This pertains to issues #180 and #185
c717fb9
Norman Clarke norman referenced this issue from a commit November 11, 2011
Norman Clarke Use AR-provided locking API
For GH issue #180
a465ef0
Norman Clarke norman referenced this issue from a commit November 13, 2011
Norman Clarke Added (failing) threading test
This tests the concurrency issue described in GH #180.
471d2b7
Norman Clarke norman referenced this issue from a commit November 13, 2011
Norman Clarke Lock table to make test work
This shows a solution to the issue raised in #180.
2f65150
Lee Hambley

I'm just commenting here to say that I'm seeing the same problem. I resolved it with something like what is recommended in commit 2f65150, but the problem now is that the lock is held for so long that the site goes offline when background tasks are running.

def with_episodes_table_lock
  Episode.transaction do
    Episode.connection.execute "LOCK TABLE #{Episode.table_name}"
    yield
  end
end
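
One note on that: in PostgreSQL, LOCK TABLE without an explicit mode takes ACCESS EXCLUSIVE, which blocks readers as well as writers, and that would explain the site going offline. A weaker, self-exclusive mode still serializes the slug writers while letting plain SELECTs through, e.g.:

Episode.connection.execute "LOCK TABLE #{Episode.table_name} IN SHARE ROW EXCLUSIVE MODE"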
Sai saizai referenced this issue from a commit in saizai/friendly_id November 11, 2011
Norman Clarke Attempted fix for concurrency issue. 34ca0b6