Redesign save() methods in order to avoid deadlocks #817

Open
czeslav87 opened this Issue Jan 22, 2014 · 14 comments


@czeslav87

Hello,

I've created a post on Stack Overflow: http://stackoverflow.com/questions/21281300/propel-orm-saving-related-objects-may-cause-deadlocks

Here's a copy of the issue:

We're using Propel 1.6.7 in our project with a SQL Server DB. Recently, during heavy traffic, we've been getting more and more deadlocks. During SQL Profiler analysis, we found that the cause is the way Propel saves related objects using a depth-first traversal (sketched below): concurrent processes may access the same tables in different orders, causing deadlocks.

Here's a simple scenario showing how it can happen:

Let's say we have 3 tables related 1-n (A->B, B->C), so one A record is related to multiple B records and one B record to multiple C records. Let's assume we have 2 parallel processes that both call save() on object A. Furthermore, the following conditions are met:

Process 1: object A didn't change, the 5th related B object changed, and some C objects related to the 1st B object changed.
Process 2: object A didn't change, the 1st B object changed, and some C objects related to the 10th B object changed.

Here's what happens in SQL Server DB:

Step 1: Process 1 acquires a page lock on table C (update/insert/delete); Process 2 acquires a page lock on table B.

Step 2: Process 1 tries to acquire a page lock on table B and waits for Process 2 to release its lock. Process 2 tries to acquire a page lock on table C and waits for Process 1 to release its lock.

Step 3: Deadlock.
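For reference, the depth-first order comes from the generated doSave() methods. A simplified sketch of what Propel generates for B (illustrative only; names like BPeer and collCs stand in for the actual generated ones):

```php
// Simplified sketch of the save logic Propel generates for B (illustration
// only - the real generated code differs). The recursion is depth-first: a
// B row's C children are saved before the sibling B rows are visited, so two
// processes saving the same A can lock tables B and C in opposite orders.
class BaseB extends BaseObject
{
    protected function doSave(PropelPDO $con)
    {
        $affectedRows = 0;

        // Save the foreign-key parent (A) first, if it changed.
        if ($this->aA !== null && ($this->aA->isNew() || $this->aA->isModified())) {
            $affectedRows += $this->aA->save($con);
        }

        // Save this B row itself - this is where the lock on table B is taken.
        if ($this->isNew() || $this->isModified()) {
            BPeer::doUpdate($this, $con); // or BPeer::doInsert() for new rows
            $affectedRows += 1;
        }

        // Recurse into the children immediately (depth-first), locking pages
        // of table C before the remaining B rows are even looked at.
        if ($this->collCs !== null) {
            foreach ($this->collCs as $c) {
                $affectedRows += $c->save($con);
            }
        }

        return $affectedRows;
    }
}
```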

Has anyone run across a similar issue?

We've come up with a solution using a breadth-first traversal:

Create a class with queue-handling methods, dedicated to DB objects.
Overload the save methods of the Propel objects with the following algorithm (see the sketch after this list):
    Check the parent object for changes. If it changed, return parent->save() instead.
    Save the current object to the DB.
    Find the descendant objects and append them to the end of the queue (instead of calling descendant->save() immediately).
    Process the queue: run save() on every object in the queue, removing each one as it is processed.
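Here's a rough sketch of what we have in mind (nothing below exists in Propel; SaveQueue, saveBreadthFirst(), doSingleSave(), getModifiedParent() and getModifiedDescendants() are hypothetical names):

```php
// Hypothetical breadth-first save: descendants are queued instead of being
// saved recursively, so every process takes locks in the same table order.
class SaveQueue
{
    /** @var array FIFO queue of objects waiting to be saved */
    private $objects = array();

    public function enqueue($object)
    {
        $this->objects[] = $object;
    }

    /** Drain the queue in FIFO order: every B is saved before any C. */
    public function process(PropelPDO $con)
    {
        while (null !== ($object = array_shift($this->objects))) {
            $object->saveBreadthFirst($con, $this);
        }
    }
}

abstract class QueueSavableObject extends BaseObject
{
    public function saveBreadthFirst(PropelPDO $con, SaveQueue $queue = null)
    {
        // 1. If the parent changed, let the parent drive the save instead.
        $parent = $this->getModifiedParent();
        if ($parent !== null) {
            return $parent->saveBreadthFirst($con, $queue);
        }

        $isTopLevel = ($queue === null);
        if ($isTopLevel) {
            $queue = new SaveQueue();
        }

        // 2. Save only this row - no recursion into related objects here.
        $this->doSingleSave($con);

        // 3. Append modified descendants to the end of the queue instead of
        //    calling save() on them immediately.
        foreach ($this->getModifiedDescendants() as $descendant) {
            $queue->enqueue($descendant);
        }

        // 4. Only the outermost call drains the queue, so locks are acquired
        //    level by level: A, then all Bs, then all Cs.
        if ($isTopLevel) {
            $queue->process($con);
        }
    }

    /** @return QueueSavableObject|null the parent, if it was modified */
    abstract protected function getModifiedParent();

    /** @return array modified direct children of this object */
    abstract protected function getModifiedDescendants();

    /** INSERT/UPDATE this row only, without touching related objects. */
    abstract protected function doSingleSave(PropelPDO $con);
}
```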

I'd like to ask for your opinion on the presented solution (or a different solution, if you have one).

Thank you in advance for any suggestions/opinions.

Best regards.

EDIT (I also noticed some typos, which I've fixed):

To visualise the issue, here's a diagram showing the DB objects and their relations.

[diagram: db_objects]

  1. Process 1 starts by acquiring a page lock on table C while updating C1, C2, C3.
  2. Process 2 starts by acquiring a page lock on table B while updating B1.
  3. Process 1 tries to acquire a page lock on table B, but it is already locked by Process 2. It waits for P2 to finish.
  4. Process 2 tries to acquire a page lock on table C, but it is already locked by Process 1. It waits for P1 to finish.

Deadlock (or "mexican standoff" if someone prefers :))

[diagram: db_locks]

If we used the queue, it would look like this:

  1. Process 1 starts by acquiring a page lock on table B while updating B5.
  2. Process 2 tries to acquire a page lock on table B, but it is already locked by Process 1. It waits for P1 to finish.
  3. Process 1 acquires a lock on table C and finishes after updating C1, C2, C3.
  4. The locks on B and C are released. P2 continues.
@staabm
Member
staabm commented Jan 22, 2014

Sounds like "the usual problems" you get when having optimistic locks and high load/concurrency.

Not sure which SQL Server you're on, but it should have something like a timeout for the locks, so one of the two concurrent transactions is rolled back and the other one gets the remaining locks it needs.
If it's not an option for you that one of the processes returns with a "cannot acquire lock" error, you need to move to pessimistic locks, but this could have a performance impact.

If you only get the problem because the processes acquire the same pages but not the same rows, you could also consider row-level locking for certain tables.

@czeslav87

Thanks for the reply.

  1. We use SQL Server 2008 R2 SP1. And yes, there are timeouts for locks, which result in a deadlock exception (with a note to rerun the transaction).
  2. Is there row-level lock support in Propel? From the note at the bottom of http://propelorm.org/Propel/documentation/06-transactions.html I gathered there isn't.

Anyway, I think the proposed solution would help avoid a great number of deadlocks, instead of relying on rerunning the transaction whenever a deadlock occurs (or on pessimistic locks, which, as you stated, have a performance impact).

@staabm
Member
staabm commented Jan 22, 2014

Propel itself doesn't do locking but relies on the db locking capabilities.

I have no experience with MS SQL Server, but I think it should also have row-level locking capabilities.

Using a queue like you proposed would shorten the window in which this situation can occur, but it won't solve the underlying problem... it will just become less likely because your transactions get smaller (depending on the runtime of the PHP code, but I think it will only save a few milliseconds). In case I read your algo correctly... ;)

Are you sure Propel re-runs your transaction? I can't remember having such a feature in the Propel core... it sounds more like you implemented this retry thing at the app level? Or is your user retrying by clicking the save button a second time?

@czeslav87
  1. Propel does not rerun the transaction. It has to be done manually (as the exception message suggests) in a try-catch statement; see the sketch after this list.
  2. Yes, it won't prevent deadlocks entirely. From what I've read during the last week while analysing the problem, avoiding deadlocks completely may be impossible in large databases, but they can be minimised.
  3. According to the article you linked, it seems that a row lock has to be forced at the query level. So I guess I'd have to overload methods in Propel to add WITH (ROWLOCK) to the generated statements, am I right?
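For point 1, the manual rerun looks roughly like this (a sketch; $a stands for the modified top-level object, and detecting deadlocks via SQL Server error 1205 in the message is an assumption about how the driver reports it):

```php
// App-level deadlock retry around a Propel save (sketch only; $a is the
// modified top-level object and $maxAttempts is an arbitrary choice).
$maxAttempts = 3;
$con = Propel::getConnection();

for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
    try {
        $con->beginTransaction();
        $a->save($con); // may cascade into related B and C rows
        $con->commit();
        break; // success - stop retrying
    } catch (Exception $e) {
        $con->rollBack();
        // SQL Server reports deadlock victims as error 1205; re-throw
        // anything else, and give up after the last attempt.
        if ($attempt === $maxAttempts || strpos($e->getMessage(), '1205') === false) {
            throw $e;
        }
    }
}
```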
@czeslav87

Moreover, the solution I've proposed isn't about shortening the queries; its main purpose is to acquire locks in a fixed order. In my example it would always be A->B->C.

So if Process 1 acquires locks on B and C, Process 2 will wait for Process 1 to finish, because B has an exclusive lock. I think such an approach would help avoid deadlocks significantly.

@staabm
Member
staabm commented Jan 22, 2014

I have to re-read your initial algo ... but have to leave now, will come back later this week...

@czeslav87

Thanks for taking a thorough look at my ticket. I'll update it with a graphic presentation of the example.

@NobleUplift

The way you explain the deadlocking does not make sense to me:

> Step 1: Process 1 acquires page lock for table C (update/insert/delete), Process 2 acquires page lock for table B.
> Step 2: Process 1 ties to acquire page lock for table B. It waits for Process 2 to release lock. Process 2 tries to acquire page lock for table B. It waits for Process 2 to release lock.

  • Why does process 2 try to acquire the page lock for table B? It already has a page lock on table B. You never said Process 2 released the lock.
  • And why would Process 2 wait for Process 2 to release the lock? Unless your antecedent is Process 1, which was not clear at all.

> So if process 1 acquires locks on B and C, process 2 would wait for process 2 to finish, because B would have an exclusive lock. I think that such approach would help to avoid deadlocks significantly.

  • How does process 2 wait for process 2 to finish?

I'm sorry, but I'm very confused.

@czeslav87

My bad, there were some typos; I've corrected both the ticket and the comment.

@NobleUplift

Thanks lol. It was a long night so I couldn't tell if I was just having a derp moment or not.

@czeslav87

Any further feedback?

@staabm
Member
staabm commented Jan 27, 2014

just returned from vacation ... cleaning up my backlog right now... will get back to this hopefully in a few days.

@czeslav87

Same here... :)

Will be getting back to this soon. If we make any progress, I'll keep you up to date. If anyone would like to share more feedback about the idea, I'd appreciate it.
