Postgres deadlocking when multiple processes run CREATE INDEX CONCURRENTLY
#960
Comments
As mentioned in the description, this issue has been observed before in flyway. The fix here would be to change the locking strategy (for postgres) from blocking on `pg_advisory_lock` to polling `pg_try_advisory_lock`, so that a waiting process doesn't hold a snapshot open while it waits. I'm happy to work on this fix as I already have a fork and a failing test to work with, but wanted to open for discussion here first :)
Thanks for the investigation, @AkuSilvenius 🏆. I've just been bitten by this as well. Any chance of @AkuSilvenius's fix being merged in?
thanks @jackh-ncl, I have a draft PR in #962 with a rather naive approach, and also haven't been able to run the pipeline successfully (looks like it's running out of memory), so I've kept the PR as draft for now. If someone knows how to fix the problems with the pipeline, I'm happy to continue working on it
Hello, what does the returned error look like? And do you know why Postgres reports a deadlock error here instead of one process simply waiting for the other?
I've been evaluating different migration tools and I looked into this issue, because it seems like every tool was affected by it. As mentioned earlier, the flyway issue is a good explanation and that issue is referenced by many other tools. Sourcegraph has also written a blog post about this exact issue: https://sourcegraph.com/blog/introducing-migrator-service#a-deeper-contributing-factor. I'd love for this to be resolved because other than this little issue, this tool is my top pick.
Here is a summary of everything I've read. The error is coming from a deadlock detection mechanism in Postgres. You can see an example of the error in flyway/flyway#1654.
The implementation of `CREATE INDEX CONCURRENTLY` waits for every transaction that holds a snapshot which could potentially see the index (https://www.postgresql.org/docs/16/sql-createindex.html#id-1.9.3.69.6.4.2). Process A holds the migration lock and runs `CREATE INDEX CONCURRENTLY`, while process B is blocked waiting for the lock, and its waiting session holds open a snapshot that the index build must wait for. So now process A is waiting for process B, and process B is waiting for process A, and we have a deadlock; postgres kills them both, returning an error to the caller.
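The wait cycle above can be pictured abstractly: build the "who waits on whom" edges and look for a cycle, which is essentially what Postgres's deadlock detector does before killing a waiter. A toy sketch, not Postgres code:

```go
package main

import "fmt"

// waitsFor maps each process to the process it is blocked on:
//   A -> B: A's CREATE INDEX CONCURRENTLY waits for B's open snapshot.
//   B -> A: B's lock request waits for A to release the migration lock.
func hasDeadlock(waitsFor map[string]string, start string) bool {
	seen := map[string]bool{start: true}
	for cur, ok := waitsFor[start]; ok; cur, ok = waitsFor[cur] {
		if seen[cur] {
			return true // we came back around: a wait cycle, i.e. a deadlock
		}
		seen[cur] = true
	}
	return false
}

func main() {
	waitsFor := map[string]string{"A": "B", "B": "A"}
	fmt.Println(hasDeadlock(waitsFor, "A")) // true: A and B wait on each other
}
```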
@dhui I had attached the error to the description of the issue; that is the error returned from the migration run. And good question, why the error instead of waiting indefinitely? The docs for `pg_try_advisory_lock` say it will either obtain the lock immediately or return false without waiting at all, so it is the blocking lock path that ends up in the deadlock described above.
thanks for the interest in this topic @vmercierfr @Gibstick :) #962 has been updated based on earlier feedback
It's mostly a drive-by as I occasionally work on teams that use golang-migrate, and this issue is a bit of a nasty case when you hit it. In terms of the implementation, I'd echo moving towards a non-blocking try-lock with retries. We ended up doing something similar in our own migration tooling.
Describe the Bug
When multiple processes run migrations in parallel (e.g. multiple application replicas running `m.Up()`), having `CREATE INDEX CONCURRENTLY` as part of the migration script will result in a deadlock, leaving behind a dirty database version as well as an `INVALID` index (expected, as mentioned in the docs).
Steps to Reproduce
I have added a failing test in my fork to mimic the behavior of multiple processes running the migrations in parallel here - it's basically the same test as in the master branch here (which includes a migration file with `CREATE INDEX CONCURRENTLY`), but just run multiple times in parallel. Setting `concurrency = 1` will pass the test, but any larger concurrency will result in a deadlock and a failing test.
Expected Behavior
Run migrations successfully
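The parallel-run shape of that failing test can be sketched as follows. `runMigration` is a hypothetical stand-in for calling `m.Up()` on a per-process `migrate` instance; the real test runs against a live Postgres, which this sketch deliberately avoids.

```go
package main

import (
	"fmt"
	"sync"
)

// runMigration stands in for a full m.Up() call against a real
// database; here it only has to be safe to call concurrently.
func runMigration(id int) error {
	return nil // the real call would apply CREATE INDEX CONCURRENTLY
}

// runConcurrently launches `concurrency` migration runs in parallel,
// mimicking multiple application replicas starting at once, and
// returns the first error observed (nil if every run succeeded).
func runConcurrently(concurrency int) error {
	var wg sync.WaitGroup
	errs := make(chan error, concurrency)
	for i := 0; i < concurrency; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			if err := runMigration(id); err != nil {
				errs <- err
			}
		}(i)
	}
	wg.Wait()
	close(errs)
	return <-errs // receiving from a closed empty channel yields nil
}

func main() {
	fmt.Println(runConcurrently(4) == nil) // true: all runs succeeded
}
```

With the buggy locking, any `concurrency > 1` would surface the deadlock error from one of the goroutines.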
Migrate Version
v4.15.2
Loaded Source Drivers
s3, github, gitlab, go-bindata, file, bitbucket, github-ee, godoc-vfs, gcs
Loaded Database Drivers
cockroachdb, neo4j, postgresql, redshift, clickhouse, mysql, pgx, postgres, sqlserver, crdb-postgres, mongodb, spanner, cassandra, cockroach, firebird, firebirdsql, mongodb+srv, stub
Go Version
go version go1.20.3 darwin/arm64
Stacktrace
Failing test output
Additional context
This looks to be exactly the same problem as reported and fixed in the flyway library here.