Duplicate key value problems when using history and parallel processing #180

Closed
gedankenstuecke opened this Issue Nov 8, 2011 · 6 comments


Hi,
we're using friendly_id (4.0.0.beta14) with the history feature enabled on our project, together with resque for managing queues, and found that some jobs are failing with:

PGError: ERROR: duplicate key value violates unique constraint "index_friendly_id_slugs_on_slug_and_sluggable_type" : INSERT INTO "friendly_id_slugs" ("slug", "sluggable_id", "sluggable_type", "created_at") VALUES ('rs696974', 1676331, 'Snp', '2011-11-08 17:04:02.299340') RETURNING "id"

This happened whenever an object was saved inside such a resque job. After a bit of bug hunting I found the problem:

Some actions in our application trigger multiple jobs on the same object, and – if the queues are empty enough – those jobs are worked on at the same time, as we have multiple resque workers (with different queues) running in parallel. By chance, two jobs can trigger the save action at more or less the same time. In general this is fine with Rails/Postgres, which handles such concurrent writes.

But it is not fine with friendly_id, because it does not update rows in the friendly_id_slugs table when the combination of object ID and slug is already present. Instead, the old entry is deleted and a new one is created:

def build_slug
  return unless should_generate_new_friendly_id?
  # Allow reversion back to a previously used slug
  slugs.where(:slug => friendly_id).delete_all
  slugs.build :slug => friendly_id
end

See https://github.com/norman/friendly_id/blob/master/lib/friendly_id/history.rb#L71

If those two save actions run in fast succession, worker A may have already deleted and re-created the slug entry while worker B found nothing to delete and then tried to create the new entry, which fails because worker A was a few milliseconds faster.

Real-world example:

The timestamp of the entry in the friendly_id_slugs table: 2011-11-08 17:04:02.151681
The timestamp of the same entry from the save transaction that failed: 2011-11-08 17:04:02.299340

Possible solution?

None that is really obvious, but what about updating entries in place instead of deleting and re-creating them?
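
A rough sketch of that idea (just an illustration against the build_slug shown above, not tested code): reuse the existing row when it is already there, so there is no window in which the row has been deleted but not yet re-created.

def build_slug
  return unless should_generate_new_friendly_id?
  # Reuse an existing row instead of delete_all + build, so a
  # concurrent worker never observes the row as missing.
  existing = slugs.where(:slug => friendly_id).first
  existing || slugs.build(:slug => friendly_id)
end

This narrows the window but does not remove it entirely: two workers can still both find no existing row and both try to insert.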

Owner

norman commented Nov 11, 2011

This should be fairly straightforward to resolve with a lock. Can you take a look at the changes in this branch and let me know whether it works for you? I realize this can be difficult to test; I'll come up with a proper test for it soon.

https://github.com/norman/friendly_id/tree/concurrency-fix
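
For readers who can't check the branch, the general idea of resolving this with a lock looks something like the sketch below: take a row-level lock (SELECT ... FOR UPDATE) on the record before saving, so concurrent workers serialize. This is only an illustration of the approach, not necessarily what the branch does; Snp and snp_id come from the error message above.

Snp.transaction do
  # lock(true) appends FOR UPDATE, so a second worker blocks here
  # until the first worker's transaction commits.
  snp = Snp.lock(true).find(snp_id)
  snp.save!  # build_slug's delete_all + insert now run serialized
end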

norman added a commit that referenced this issue Nov 11, 2011

gedankenstuecke commented Nov 11, 2011
I just tested the latest version of the branch in my development environment, and the same PGError (duplicate key value violates unique constraint "index_friendly_id_slugs_on_slug_and_sluggable_type") remains on this setup with parallel resque jobs.



Owner

norman commented Nov 11, 2011

OK, it's probably going to be necessary to lock the rows in the model table as well when selecting the conflicting candidates. I can't work on this at the moment, but if you want to hack on it a bit yourself, you can try editing slug_generator.rb at line 37 to read:

@conflict = conflicts.lock(true).first

Will take a longer look and solve this soon, but I have to run right now.
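
For anyone following along: lock(true) appends FOR UPDATE to the generated SELECT, so the matching rows stay locked until the surrounding transaction commits. A minimal standalone illustration (assuming friendly_id's FriendlyId::Slug model and the slug from the error above):

FriendlyId::Slug.transaction do
  # SELECT "friendly_id_slugs".* FROM "friendly_id_slugs"
  #   WHERE "friendly_id_slugs"."slug" = 'rs696974' LIMIT 1 FOR UPDATE
  conflict = FriendlyId::Slug.where(:slug => 'rs696974').lock(true).first
  # ...resolve the conflict; the row lock is released on commit
end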

norman added a commit that referenced this issue Nov 14, 2011

Added (failing) threading test
This tests the concurrency issue described in GH #180.

norman added a commit that referenced this issue Nov 14, 2011

Lock table to make test work
This shows a solution to the issue raised in #180.

Owner

norman commented Nov 14, 2011

Looking into this some more, I think the best way to solve it is with a table-level lock, though I'm undecided as to whether that should be the default behavior. It may be best to just document the need for a table lock so that developers in your situation can find an easy solution. Let me think about this a bit more.

In the meantime, please see the last commit referenced above for a solution to your immediate problem.
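
For readers who can't view that commit, the general shape of a table-level lock in Postgres is sketched below. The friendly_id_slugs table name is real, but wrapping every save this way is an assumption for illustration, not necessarily what the commit does.

def save_with_slug_table_lock(record)
  ActiveRecord::Base.transaction do
    # LOCK TABLE defaults to ACCESS EXCLUSIVE mode in Postgres: every
    # other transaction touching friendly_id_slugs blocks until commit.
    ActiveRecord::Base.connection.execute("LOCK TABLE friendly_id_slugs")
    record.save!
  end
end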

gedankenstuecke commented Nov 14, 2011
I haven't had much time in the last few days to play around with this, but as soon as I find the time I'll try to solve it using the table-level lock. Thanks for your help!


norman closed this Nov 15, 2011

norman added a commit that referenced this issue Dec 15, 2011

norman added a commit that referenced this issue Dec 21, 2011

norman added a commit that referenced this issue Dec 21, 2011

Added (failing) threading test
This tests the concurrency issue described in GH #180.

norman added a commit that referenced this issue Dec 21, 2011

Lock table to make test work
This shows a solution to the issue raised in #180.
Contributor

leehambley commented Feb 26, 2012

I'm just commenting here to note that I'm seeing the same problem. I resolved it with something like what's recommended in SHA 2F65150, but the problem now is that the table stays locked for so long that the site goes offline while background tasks run.

def with_episodes_table_lock
  Episode.transaction do
    # ACCESS EXCLUSIVE by default: blocks all other access to the
    # episodes table until the transaction commits.
    Episode.connection.execute "LOCK TABLE #{Episode.table_name}"
    yield
  end
end
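
If the full ACCESS EXCLUSIVE lock is what takes the site down, one option (an assumption on my part, not something from the referenced commits) is a weaker Postgres lock mode that still conflicts with itself, so two workers serialize, but does not block plain SELECTs:

# SHARE ROW EXCLUSIVE blocks other writers (including other workers
# taking the same lock) while readers keep working.
Episode.connection.execute(
  "LOCK TABLE #{Episode.table_name} IN SHARE ROW EXCLUSIVE MODE"
)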

saizai pushed a commit to saizai/friendly_id that referenced this issue Jan 23, 2014
