-
Notifications
You must be signed in to change notification settings - Fork 23
Online repl checkpoint #546
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
So @MMeent added this comment: It sounds like the PR as written wouldn't account for this panic? Should we keep everything the same, but just move the call to |
|
Actually, after thinking about this more, it looks good. Let's get a review from Matthias too. |
|
A more descriptive commit message would also be useful for future git archaeologists. |
I just restore behaviour which we have for all other versions of Postgres: pg14-16. In any case, I think it is good idea to have the same code in all PG versions. |
hlinnaka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty confusing. Even before this PR, but this isn't helping..
- The function is called "CheckPointReplicationState", but CheckPointRelationMap() has nothing to do with replication.
- CheckPointRelationMap() and CheckPointReplicationOrigin() don't write any WAL records, so they don't need this special treatment. They can be called from CheckPointGuts() just like in upstream
- In a shutdown checkpoint, all these functions are now being called twice, once in the PreCheckPointGuts() stage and again in CheckPointGuts(). Seems unnecessary, but also wrong; won't those functions PANIC again, when the try to write the WAL records?
|
Yes,
It is real bug that |
|
I committed fix of double call of Do you think that we should include in this PR all other changes: move I agree with you that they are not writing WAL and so there is no need to call them here. I prefer to leave it is as it is now or create separate PR for it, because it should affect all other Postgres versions. |
9845a53 to
4671da9
Compare
4671da9 to
7e3f397
Compare
Related: neondatabase/neon#8620 |
hlinnaka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I committed fix of double call of
CheckPointReplicationState. So now pg17 is ding exactly the same as all there versions.Do you think that we should include in this PR all other changes: move
CheckPointReplicationOrigin()andCheckPointRelationMap()out ofCheckPointReplicationState?I agree with you that they are not writing WAL and so there is no need to call them here. But as I wrote above, I prefer to reserve order of calling this functions in
CheckpointGutsandCheckPointReplicationOrigin()definitely is related to replication state, although in not walloging written file.I prefer to leave it is as it is now or create separate PR for it, because it should affect all other Postgres versions.
Ok, let's commit this, to bring all the Postgres versions to the same state, and open a separate PR for the other cleanup.
## Problem See https://neondb.slack.com/archives/C04DGM6SMTM/p1733180965970089 Replication state is checkpointed only by shutdown checkpoint. It means that replication snapshots are not removed till compute shutdown. ## Summary of changes Checkpoint replication state during online checkpoint Related Postgres PR: neondatabase/postgres#546 Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Checkpoint replication state not only in shutdown, but also in online checkpoint