Fix race condition in XLogWaitForReplayOf() #7804
base: main
I spotted this while debugging #7791, but I don't think this has been causing any noticeable problems in practice.
3078 tests run: 2952 passed, 0 failed, 126 skipped.
Flaky tests (2): Postgres 15, Postgres 14.
Test results for 8dc1c48 at 2024-05-19T18:55:45.098Z.
0d47a19
to
1bb66fe
I am not sure why removing the replayRecPtr = GetXLogReplayRecPtr(NULL); assignment in XLogWaitForReplayOf is correct. Can it cause an unintentional wait of 10 seconds?
Also, it is not quite clear to me why the bufmgr fixes are present only in the pg16 version.
ConditionVariablePrepareToSleep has this comment on it:

> * Caution: "before entering the loop" means you *must* test the exit
> * condition between calling ConditionVariablePrepareToSleep and calling
> * ConditionVariableSleep. If that is inconvenient, omit calling
> * ConditionVariablePrepareToSleep.

We were not obeying that: we did not test the exit condition correctly between the ConditionVariablePrepareToSleep and ConditionVariableSleep calls, because the test that we had in between them only checked the local 'replayRecPtr' variable, without updating it from shared memory.

That wasn't too serious, because the loop includes a 10 second timeout, and would retry and succeed if the original update was missed. Still, better fix it.

To fix, just remove the ConditionVariablePrepareToSleep() call. As the comment says, that's also correct, and even more efficient if we assume that sleeping is rare.
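The rule in the quoted comment can be illustrated outside PostgreSQL. The following is a minimal sketch that uses POSIX threads rather than the PostgreSQL ConditionVariable API, so the locking differs from the real code, and the names `ReplayState`, `advance_replay`, and `wait_for_replay_of` are hypothetical. It only shows the property at issue here: the exit condition is re-read from shared state immediately before every sleep, never taken from a stale local copy, and a timeout bounds each wait.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the shared replay position (not Neon's actual state). */
typedef struct
{
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    uint64_t        replayed_lsn;   /* only read or written while holding lock */
} ReplayState;

static void
replay_state_init(ReplayState *s)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cv, NULL);
    s->replayed_lsn = 0;
}

/* Replay side: publish a new position (never moving backwards) and wake all waiters. */
static void
advance_replay(ReplayState *s, uint64_t lsn)
{
    pthread_mutex_lock(&s->lock);
    if (lsn > s->replayed_lsn)
        s->replayed_lsn = lsn;
    pthread_cond_broadcast(&s->cv);
    pthread_mutex_unlock(&s->lock);
}

/*
 * Waiter side: returns true once replayed_lsn >= target, or false after
 * max_rounds timed-out sleeps of timeout_ms each.
 */
static bool
wait_for_replay_of(ReplayState *s, uint64_t target, int timeout_ms, int max_rounds)
{
    bool reached = false;
    int  rounds = 0;

    assert(timeout_ms >= 0);

    pthread_mutex_lock(&s->lock);
    for (;;)
    {
        struct timespec deadline;

        /* Test the exit condition against the shared value before every sleep. */
        if (s->replayed_lsn >= target)
        {
            reached = true;
            break;
        }
        if (rounds >= max_rounds)
            break;

        /* Compute an absolute deadline timeout_ms from now. */
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += timeout_ms / 1000;
        deadline.tv_nsec += (long) (timeout_ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L)
        {
            deadline.tv_sec += 1;
            deadline.tv_nsec -= 1000000000L;
        }

        /* Only timeouts count as rounds; any other wakeup just re-tests the condition. */
        if (pthread_cond_timedwait(&s->cv, &s->lock, &deadline) == ETIMEDOUT)
            rounds++;
    }
    pthread_mutex_unlock(&s->lock);

    return reached;
}
```

In this sketch the mutex makes the test and the sleep atomic with respect to `advance_replay`, so a wakeup cannot be missed between them. The bug described above is the analogous gap in the PostgreSQL code, where the value tested before sleeping was not refreshed from shared memory.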
1bb66fe
to
8dc1c48
It seems correct to me. Because I'm also removing the call to
That was a mistake, there were not supposed to be any bufmgr changes for pg16 either. Thanks!