Failure to obtain Postgres 11 advisory lock #2759

Closed
ben-manes opened this issue Apr 7, 2020 · 4 comments

ben-manes commented Apr 7, 2020

Which version and edition of Flyway are you using?

6.3.2

Which client are you using? (Command-line, Java API, Maven plugin, Gradle plugin)

Java API

Which database are you using (type & version)?

Postgres 11.1 and jdbc driver 42.2.12

Which operating system are you using?

Ubuntu 18.04.1

What did you do?

Upgraded from 5.2.4 and tried to perform a new migration.

What did you expect to see?

The migration to succeed.

What did you see instead?

Number of retries exceeded while attempting to acquire PostgreSQL advisory lock

This appears to be due to the fix in #2634, where the previous code performed a string check. That was changed to a boolean read, which fails for this version of the database / driver.

return "t".equals(rs.getString("pg_try_advisory_lock")); 

became

return rs.getBoolean("pg_try_advisory_lock");
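
For context, the lock acquisition boils down to a single SELECT pg_try_advisory_lock(...) query. A rough JDBC sketch of the shape of that call (a sketch only, not Flyway's actual code; the lock key below is a made-up placeholder rather than Flyway's real lock number):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class AdvisoryLockSketch {
    // Placeholder key for illustration only.
    private static final long LOCK_KEY = 1234567890L;

    static boolean tryLock(Connection connection) throws SQLException {
        try (Statement statement = connection.createStatement();
             ResultSet rs = statement.executeQuery(
                     "SELECT pg_try_advisory_lock(" + LOCK_KEY + ")")) {
            rs.next();
            // pg_try_advisory_lock() returns a boolean column, so getBoolean() is the
            // natural read; the older code compared the string form ("t") instead.
            return rs.getBoolean("pg_try_advisory_lock");
        }
    }
}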
@juliahayward juliahayward self-assigned this Apr 7, 2020
@juliahayward juliahayward added this to the Flyway 6.4 milestone Apr 7, 2020

juliahayward commented Apr 8, 2020

I can't reproduce this with the above setup (Postgres 11.1, either 42.2.12 or 42.2.8.jdbc6 driver, Ubuntu 18.04), and our test suite doesn't reproduce it either. Postgres documentation confirms that pg_try_advisory_lock() returns a boolean in all versions, so the code above is correct, at least in theory.

If you're using Community edition, would you be able to fork the public source code, reinstate the above line, compile and try again? That would confirm whether this is the actual problem or there's something else going on in your environment.


ben-manes commented Apr 8, 2020

I am also having trouble reproducing. This was in a test environment that I cannot muck with, and in a different environment it wouldn't fail. The only aspect of note is that this is a long-running migration (60 minutes) and the Hikari connection pool has a leak detector that logs a warning (but shouldn't cause a failure). When a migration fails, it forces a restart of the JVM.

My best guess was that, despite the JVM restart, Postgres didn't consider the connection closed and the subsequent calls failed to acquire the lock. However, I did a clean shutdown, verified in psql that the lock was released, restarted and saw the problem again, and this time psql showed the lock was held. So that guess isn't accurate; sorry for misdiagnosing it originally.
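
For reference, this is the kind of check I ran, shown here as a JDBC sketch with placeholder connection details (in psql the same SELECT against pg_locks can be run directly):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public final class AdvisoryLockCheck {
    public static void main(String[] args) throws SQLException {
        // Placeholder connection details; adjust for your environment.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/mydb", "user", "password");
             Statement stmt = conn.createStatement();
             // Any granted row here means some session still holds an advisory lock.
             ResultSet rs = stmt.executeQuery(
                     "SELECT pid, objid, granted FROM pg_locks WHERE locktype = 'advisory'")) {
            while (rs.next()) {
                System.out.printf("pid=%d objid=%d granted=%b%n",
                        rs.getInt("pid"), rs.getLong("objid"), rs.getBoolean("granted"));
            }
        }
    }
}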

I'll add some hardening of our logic and see if it reproduces in the future.


ben-manes commented Apr 8, 2020

I do notice that (in both versions) there is no error-handling logic to close the connection on an exception. In DatabaseFactory.createDatabase(...), a connection is opened and allowed to leak if an error occurs. I wonder if some internal retry logic, combined with that leaked connection, caused this issue.

In complex code flows, I often pass around a Guava Closer to aggregate any resources that should be closed at the end of the outer flow. I think something similar would be beneficial here to ensure that the resources are cleaned up.
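
A minimal sketch of that pattern, assuming Guava's Closer (illustrative only, not Flyway code; since java.sql.Connection is AutoCloseable but not Closeable, a small adapter is needed):

import com.google.common.io.Closer;

import java.io.Closeable;
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public final class CloserSketch {
    static void runFlow(String jdbcUrl) throws IOException {
        Closer closer = Closer.create();
        try {
            // Register every resource as it is opened; all of them are closed at the end,
            // even if an exception is thrown part-way through the flow.
            Connection connection = DriverManager.getConnection(jdbcUrl);
            closer.register(asCloseable(connection));
            // ... open further resources, register each one, run the migration flow ...
        } catch (Throwable t) {
            throw closer.rethrow(t);
        } finally {
            closer.close();
        }
    }

    // Connection.close() throws SQLException, so wrap it to satisfy Closeable.
    private static Closeable asCloseable(Connection connection) {
        return () -> {
            try {
                connection.close();
            } catch (SQLException e) {
                throw new IOException(e);
            }
        };
    }
}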


j-martin commented Apr 23, 2020

@juliahayward @MikielAgutu this issue is listed in the release notes for version 6.4.0, but I am not seeing any commits that would have fixed it. Can you confirm the issue has been fixed? We worked around it by wrapping flyway migrate (the command line) with retry logic, but it's far from optimal. After enough retries, Flyway is able to get the lock.
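
Roughly the shape of the workaround (ours wraps the command line; this sketch shows the equivalent with the Java API, and the retry count and sleep are arbitrary):

import org.flywaydb.core.Flyway;
import org.flywaydb.core.api.FlywayException;

public final class MigrateWithRetry {
    public static void main(String[] args) throws InterruptedException {
        // Placeholder connection settings.
        Flyway flyway = Flyway.configure()
                .dataSource("jdbc:postgresql://localhost:5432/mydb", "user", "password")
                .load();

        int maxAttempts = 5; // arbitrary
        for (int attempt = 1; ; attempt++) {
            try {
                flyway.migrate();
                return;
            } catch (FlywayException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                // Back off briefly before retrying; eventually the lock is acquired.
                Thread.sleep(5_000L);
            }
        }
    }
}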

We've observed the issue with Flyway 6.2.1 and 6.3.3. We upgraded to 6.4.0 today, but don't have any migrations to apply yet. We also observed the issue both when applying migrations inside our app via flyway-core (with the default Flyway connection logic or with HikariCP, our connection pool) and with the Flyway command line.

From what we observed, the more pending migrations there are, the more likely the issue is to happen. We use only SQL migrations, if that matters. We've seen Flyway fail to get the lock between applying migrations, e.g. we got the error after applying 5 migrations while still having 5 others to apply.

Note that we haven't observed the issue on our production database or locally, only in our staging environments running PostgreSQL inside a container (on k8s). We've nuked the containers' volumes multiple times, so it's unlikely we're dealing with some kind of data corruption on our side.

Here's the postgres instance's config:

datestyle = 'iso, mdy'
default_text_search_config = 'pg_catalog.english'
dynamic_shared_memory_type = posix
fsync = off
lc_messages = 'en_US.utf8'
lc_monetary = 'en_US.utf8'
lc_numeric = 'en_US.utf8'
lc_time = 'en_US.utf8'
listen_addresses = '*'
log_timezone = 'UTC'
max_connections = 1000
max_wal_senders = 0
max_wal_size = 10GB
min_wal_size = 80MB
shared_buffers = 128MB
synchronous_commit = off
timezone = 'UTC'
wal_level = minimal
work_mem = 32MB

stefanobaghino-da added a commit to digital-asset/daml that referenced this issue Jun 29, 2020
Prevents incurring into flyway/flyway#2759 (which was apparently solved in 6.4.0)

changelog_begin
changelog_end
mergify bot pushed a commit to digital-asset/daml that referenced this issue Jun 29, 2020
* Bump Flyway version to 6.5

Prevents incurring into flyway/flyway#2759 (which was apparently solved in 6.4.0)

changelog_begin
changelog_end

* Comply with changed method signature