Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock produced in Postgres when running migrations in parallel with a migration task containing CREATE INDEX CONCURRENTLY #1654

Closed
jhinch opened this issue May 31, 2017 · 8 comments

Comments

@jhinch
Copy link

@jhinch jhinch commented May 31, 2017

What version of Flyway are you using?

Reproduced in 4.1.0 and on master. Affects 4.1.0 and later.

Which client are you using? (Command-line, Java API, Maven plugin, Gradle plugin, SBT plugin, ANT tasks)

Java API

What database are you using (type & version)?

Reproduced on Postgres 9.5.x

What operating system are you using?

Linux / Mac

What did you do?

(Please include the content causing the issue, any relevant configuration settings, and the command you ran)

I attempted to create an index concurrently within a migration task. The migration task ran at startup in a multi-node service. The pending acquisition of the advisory lock Flyway creates on one thread caused a deadlock in the in-progress migration which was creating the index in another thread. I have reproduced the issue in your test suite with some modifications and can be found on a branch in my fork, link below.

https://github.com/flyway/flyway/compare/master...jhinch:deadlock-concurrent-create-index-concurrently?expand=1

What did you expect to see?

As per the FAQ in the documentation, Flyway should be able to handle multiple nodes attempting to run migration tasks concurrently:

https://flywaydb.org/documentation/faq#parallel

What did you see instead?

A deadlock and failure of the migration task. As the migration task is not transaction for CREATE INDEX CONCURRENTLY it leaves the database in an inconsistent state (with an index marked as invalid).

I am more than happy to provide a PR with a fix, but I'm unsure as to what the best way to fix this is. I figured I would create a bug with tests to reproduce the issue and we could discuss from there.

@axelfontaine
Copy link
Contributor

@axelfontaine axelfontaine commented May 31, 2017

Thanks for the report. Happy to accept a pull request with a fix. I haven't debugged it yet, but how exactly is it deadlocking? Is this even fixable? Does it also occur on PostgreSQL 9.6?

@jhinch
Copy link
Author

@jhinch jhinch commented May 31, 2017

This is the error actual error which occurs in flyway:

org.flywaydb.core.internal.command.DbMigrate$FlywayMigrateSqlException: 
Migration V2__CreateIndexConcurrently.sql failed
------------------------------------------------
SQL State  : 40P01
Error Code : 0
Message    : ERROR: deadlock detected
  Detail: Process 587 waits for ShareLock on virtual transaction 10/12; blocked by process 588.
Process 588 waits for ExclusiveLock on advisory lock [47999,18028,1462734915,1]; blocked by process 587.
  Hint: See server log for query details.
Location   : migration/dbsupport/postgresql/sql/concurrent/V2__CreateIndexConcurrently.sql (REDACTED)
Line       : 5
Statement  : CREATE INDEX CONCURRENTLY idx_cities_name ON cities(name)

The error in the Postgres logs (effectively the same information):

ERROR:  deadlock detected
DETAIL:  Process 587 waits for ShareLock on virtual transaction 10/12; blocked by process 588.
        Process 588 waits for ExclusiveLock on advisory lock [47999,18028,1462734915,1]; blocked by process 587.
        Process 587: CREATE INDEX CONCURRENTLY idx_cities_name ON cities(name)
        Process 588: SELECT pg_advisory_lock(77431133147203)
HINT:  See server log for query details.

I haven't tested Postgres 9.6.x yet. In terms of whether this is even fixable, I think that the strategy for locking a schema for migration would need to be changed. You may be able to switch to using pg_try_advisory_lock and poll for it to go true instead of relying on pg_advisory_lock's blocking nature.

@jhinch
Copy link
Author

@jhinch jhinch commented May 31, 2017

I have pushed another commit to my branch which contains the test reproducing the issue. It switches out pg_advisory_lock for pg_try_advisory_lock with a naive polling implementation. The tests now pass.

https://github.com/flyway/flyway/compare/master...jhinch:deadlock-concurrent-create-index-concurrently?expand=1

If you are happy with me pursuing this approach, I'm happy to clean up the branch and create a PR

@axelfontaine
Copy link
Contributor

@axelfontaine axelfontaine commented Jun 1, 2017

@jhinch Thanks for investigating. Sounds great! Wouldn't it be better to spin indefinitely though? With your current implementation if a migration on a node that has acquired the lock takes more than 5 seconds other nodes will start failing due to being unable to acquire the lock. That doesn't sound right.

@jhinch
Copy link
Author

@jhinch jhinch commented Jun 1, 2017

Yes, that part of my branch is very naive and was something I wanted to clean up. I think there are two options. Either spin indefinitely or have a configurable overall timeout. There would also be the question of whether it should have some sort of backoff while it waits (linear or exponential).

I'm happy to code up the solution with an indefinite spin. The interrupt exception on the sleep can be used to terminate if the process if the consumer would prefer it to timeout.

@axelfontaine
Copy link
Contributor

@axelfontaine axelfontaine commented Jun 1, 2017

None of the other locks in Flyway currently have timeouts, so let's stick with the simple indefinite spin. No need for backoff. This is most likely causing at most a very low overhead on the DB.

@vhmth
Copy link

@vhmth vhmth commented Sep 26, 2017

We were seeing something similar when trying to lock our DB migrations that used CONCURRENTLY when creating indexes. We used advisory-lock with Sequelize and switched towards indefinitely spinning and it has been resolved.

@axelfontaine
Copy link
Contributor

@axelfontaine axelfontaine commented Nov 15, 2017

@jhinch Thanks again for investigating this and providing a solution your work has now been merged and I have also added you to the hall of fame page (Flyway 5.0 branch). Truly appreciated!

dohrayme pushed a commit to dohrayme/flyway that referenced this issue Feb 3, 2020
…ons in parallel with a migration task containing CREATE INDEX CONCURRENTLY
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants