postgres VACUUM exposes deadlock #369
Will investigate soon, flagging as a possible release blocker.
I don't think this is an IRRd bug. As the PostgreSQL docs note, VACUUM FULL requires an exclusive lock on each table while it is being processed.
So it makes sense that a full vacuum locks everything and therefore halts all importers. Alternatively, the vacuum may wait for the current importers to complete; judging by the number of importers that were running during it, your vacuum took quite long. If there was indeed a deadlock, PostgreSQL should generally detect that and kill something to resolve it. If that something is an IRRd importer, that import run will fail, but IRRd will recover the next time that importer is run. I don't know why PostgreSQL didn't detect the deadlock. I have tried the full vacuum on a few of my instances, and it ran without any issues at all. Have you seen this issue reoccur? On a side note, #326 will reduce the number of locks needed by the imports, if successful.
At this point I disagree. :-)
VACUUM FULL requires an exclusive lock, yes, but I believe it can't get it because of the idle in transaction sessions. Why would IRRd open a transaction and then leave it idle? This is reproducible every time. This is the ps output right after it happens, prior to a bunch of INSERTs and DELETEs being queued up:
Upon closer look, that may indeed be a cause. The reasons are a bit complicated, but it looks fixable. Will update. I am curious, though, why you're running full vacuums, especially under load? As far as I know, they only add something if the database shrunk significantly, which should be very rare in IRRd. Full vacuums are also quite slow and rather disruptive, as they block all reads and writes.
Thanks.
I guess it is my habit from running postgres instances over the years. The vacuum we do includes "--analyze", which can be useful for query planning, and I also normally run a monthly "clusterdb --all".
Whois worker database connections in particular are long-lived, and would always start a session. This results in many long-lived sessions. This commit adds a readonly flag that sets the database connection to use true PostgreSQL autocommit (which, strangely, is indeed set through the isolation level). See https://www.oddbird.net/2014/06/14/sqlalchemy-postgres-autocommit/ for an explanation of implicit autocommit vs true autocommit.
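As a rough illustration of what true autocommit via the isolation level looks like in SQLAlchemy (the engine URL and variable names below are made up for the example, not taken from IRRd's code):

```python
from sqlalchemy import create_engine, text

# Read-only connections (e.g. whois workers) can use true PostgreSQL
# autocommit, so no transaction stays open between queries and a
# VACUUM FULL is not blocked by "idle in transaction" sessions.
# Somewhat unintuitively, SQLAlchemy exposes this via the isolation level.
readonly_engine = create_engine(
    "postgresql://localhost/irrd",  # illustrative URL only
    isolation_level="AUTOCOMMIT",
)

with readonly_engine.connect() as conn:
    # Each statement commits immediately; no implicit BEGIN is emitted.
    conn.execute(text("SELECT 1"))
```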
b84e66b is a fix for the long-running transactions. However, I do still recommend against full vacuums, especially under load, unless there is a specific need to recover disk space.
Testing confirms the fix. No deadlock now. While concurrently testing frequent serialized bgpq4 queries, I did encounter some bgpq4
Describe the bug
Running f4e5797 (just shy of 4.1.0), I have noticed that a postgres
vacuumdb --all --analyze --full
results in some kind of deadlock. Note the pid 23230 VACUUM waiting, prompted by the vacuumdb command above.

kill 23230 terminates the waiting VACUUM, after which the rest of the commands in progress are able to continue, and things appear to have caught up without problems. After the VACUUM is killed, I note several processes still idle in transaction.
I imagine the idle in transaction processes are what is blocking the VACUUM from being able to run. I am pretty sure that prior to 4.1.0, VACUUMs were able to run without problems.
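For reference, a quick way to confirm this is to inspect pg_stat_activity while the VACUUM is stuck. This is generic PostgreSQL introspection, not part of IRRd; the DSN below is an assumption and psycopg2 is assumed to be available:

```python
import psycopg2

# Show sessions that are idle in transaction, plus any backend that is
# currently blocked (e.g. the waiting VACUUM) and the pids blocking it.
conn = psycopg2.connect("dbname=irrd")  # hypothetical DSN, adjust as needed
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT pid, state, wait_event_type, wait_event,
               pg_blocking_pids(pid) AS blocked_by,
               left(query, 60) AS query
        FROM pg_stat_activity
        WHERE state = 'idle in transaction'
           OR cardinality(pg_blocking_pids(pid)) > 0
        ORDER BY pid;
    """)
    for row in cur.fetchall():
        print(row)
conn.close()
```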
To Reproduce
Run
PGHOST="/run/postgresql" vacuumdb --all --analyze --full
or equivalent.

Expected behaviour
My concern and reason for filing this bug is that a VACUUM should not be blocked, nor should it result in a deadlock situation.
IRRd version you are running
f4e5797 (just shy of 4.1.0)