Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep walproposer alive until shutdown checkpoint is safe on safekepeers #6712

Merged
merged 2 commits into from Mar 11, 2024

Conversation

hlinnaka
Copy link
Contributor

@hlinnaka hlinnaka commented Feb 9, 2024

The walproposer pretends to be a walsender in many ways. It has a WalSnd slot, it claims to be a walsender by calling MarkPostmasterChildWalSender() etc. But one different to real walsenders was that the postmaster still treated it as a bgworker rather than a walsender. The difference is that at shutdown, walsenders are not killed until the very end, after the checkpointer process has written the shutdown checkpoint and exited.

As a result, the walproposer always got killed before the shutdown checkpoint was written, so the shutdown checkpoint never made it to safekeepers. That's fine in principle, we don't require a clean shutdown after all. But it also feels a bit silly not to stream the shutdown checkpoint. It could be useful for initializing hot standby mode in a read replica, for example.

Change postmaster to treat background workers that have called MarkPostmasterChildWalSender() as walsenders. That unfortunately requires another small change in postgres core.

After doing that, walproposers stay alive longer. However, it also means that the checkpointer will wait for the walproposer to switch to WALSNDSTATE_STOPPING state, when the checkpointer sends the PROCSIG_WALSND_INIT_STOPPING signal. We don't have the machinery in walproposer to receive and handle that signal reliably. Instead, we mark walproposer as being in WALSNDSTATE_STOPPING always.

In commit 568f914, I assumed that shutdown will wait for all the remaining WAL to be streamed to safekeepers, but before this commit that was not true, and the test became flaky. This should make it stable again.

Closes: #559

@hlinnaka hlinnaka requested a review from a team as a code owner February 9, 2024 20:02
@hlinnaka hlinnaka requested review from conradludgate, MMeent and arssher and removed request for a team and conradludgate February 9, 2024 20:02
Copy link

github-actions bot commented Feb 9, 2024

2502 tests run: 2380 passed, 0 failed, 122 skipped (full report)


Flaky tests (1)

Postgres 15

  • test_vm_bit_clear_on_heap_lock: debug

Code coverage* (full report)

  • functions: 28.8% (7031 of 24443 functions)
  • lines: 47.6% (43441 of 91295 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
ba204b0 at 2024-03-11T19:08:26.723Z :recycle:

Copy link
Contributor

@MMeent MMeent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't check the changes made in the various PG versions due to what seems to be GitHub UI error, but looks OK.

@arssher
Copy link
Contributor

arssher commented Feb 10, 2024

I can't check the changes made in the various PG versions due to what seems to be GitHub UI error

Looks like PG change push is missing in the repo.

@arssher
Copy link
Contributor

arssher commented Feb 10, 2024

So I can't yet check out the core changes, but I wonder why postmaster treated walproposer as bgw prevously. Looking at SignalSomeChildren/CountChildren it checks only bkend_type, which is set to BACKEND_TYPE_WALSND soon after MarkPostmasterChildWalSender...

@hlinnaka
Copy link
Contributor Author

Looks like PG change push is missing in the repo.

Created branches for those and pushed, should be visible now.

So I can't yet check out the core changes, but I wonder why postmaster treated walproposer as bgw prevously. Looking at SignalSomeChildren/CountChildren it checks only bkend_type, which is set to BACKEND_TYPE_WALSND soon after MarkPostmasterChildWalSender...

The postgres changes show that, but in a nutshell, because walproposer is marked as BACKEND_TYPE_BGWORKER, postmaster never calls IsPostmasterChildWalSender() on it. Postmaster expects that a BACKEND_TYPE_NORMAL process can morph into a walsender, but not a bgworker.

@hlinnaka hlinnaka force-pushed the send-shutdown-checkpoint-to-safekeepers branch from 11b473a to ddfd4ef Compare February 10, 2024 10:01
@arssher
Copy link
Contributor

arssher commented Feb 12, 2024

There several more things to do:

  • Check currently is in wrong place, it won't fire until latch is set.
  • We need to wait not only for flush_lsn on the quorum, but commit_lsn -- safekeepers don't advance commit position in the absense of compute.
  • Also commit_lsn should be not only recorded, but persisted on safekeepers (otherwise their restart would pull it back); they do that periodically, but we want here that ASAP.
  • Add test.

I'll try to finish it.

@arssher
Copy link
Contributor

arssher commented Feb 12, 2024

Interesting, locally it already works without any fixes. Checked, this is because after checkpoint is flushed postmaster kills walproposer with SIGUSR2, which sets the latch, and walproposer manages to commit (and send commit_lsn) to safekeeper before this.

@jcsp
Copy link
Contributor

jcsp commented Feb 14, 2024

In #6751 I'm reverting the previous change that appears to have made test_pg_regress less stable -- would probably make sense to revert the revert in this PR, along with the walproposer changes. Hopefully together they make the test stable.

jcsp added a commit that referenced this pull request Feb 14, 2024
…t flaky" (#6751)

The #6666  change appears to have made the test fail more often.

PR #6712 should re-instate this
change, along with its change to make the overall flow more reliable.

This reverts commit 568f914.
@jcsp
Copy link
Contributor

jcsp commented Feb 14, 2024

In case this is useful: a failure on a branch where I had an explicit wait_for_flush_lsn before compute stop and logged the before + after LSNs, where it looks like the safekeeper has seen the post-stop LSN, but the pageserver hasn't #6752 (comment)

arssher added a commit that referenced this pull request Feb 14, 2024
PR #6712 might add a delay in graceful shutdown without change to compute -> sk
protocol to force sync of control file on safekeeper to persist commit_lsn, so
use 'immediate' mode instead; 'fast' one isn't useful for neon anyway.
@arssher arssher force-pushed the send-shutdown-checkpoint-to-safekeepers branch from ddfd4ef to 3b09bd1 Compare February 15, 2024 15:23
@arssher
Copy link
Contributor

arssher commented Feb 15, 2024

There several more things to do:

Mostly did them, now need to merge PG PRs and then this one. @hlinnaka please have a look, I can't request your review :)

@arssher arssher force-pushed the send-shutdown-checkpoint-to-safekeepers branch 3 times, most recently from a554700 to 98315c8 Compare February 15, 2024 15:52
@arssher arssher force-pushed the send-shutdown-checkpoint-to-safekeepers branch from 98315c8 to 6f2ae2b Compare March 11, 2024 08:10
@hlinnaka hlinnaka requested a review from a team as a code owner March 11, 2024 08:10
@arssher arssher force-pushed the send-shutdown-checkpoint-to-safekeepers branch from 6f2ae2b to 09cc433 Compare March 11, 2024 08:15
@arssher arssher force-pushed the send-shutdown-checkpoint-to-safekeepers branch from 09cc433 to d22a092 Compare March 11, 2024 08:25
@arssher arssher force-pushed the send-shutdown-checkpoint-to-safekeepers branch from d22a092 to 72bf53b Compare March 11, 2024 09:04
@arssher arssher requested a review from knizhnik March 11, 2024 09:05
@arssher arssher force-pushed the send-shutdown-checkpoint-to-safekeepers branch 2 times, most recently from af08861 to 47022a6 Compare March 11, 2024 18:01
The walproposer pretends to be a walsender in many ways. It has a
WalSnd slot, it claims to be a walsender by calling
MarkPostmasterChildWalSender() etc. But one different to real
walsenders was that the postmaster still treated it as a bgworker
rather than a walsender. The difference is that at shutdown,
walsenders are not killed until the very end, after the checkpointer
process has written the shutdown checkpoint and exited.

As a result, the walproposer always got killed before the shutdown
checkpoint was written, so the shutdown checkpoint never made it to
safekeepers. That's fine in principle, we don't require a clean
shutdown after all. But it also feels a bit silly not to stream the
shutdown checkpoint. It could be useful for initializing hot standby
mode in a read replica, for example.

Change postmaster to treat background workers that have called
MarkPostmasterChildWalSender() as walsenders. That unfortunately
requires another small change in postgres core.

After doing that, walproposers stay alive longer. However, it also
means that the checkpointer will wait for the walproposer to switch to
WALSNDSTATE_STOPPING state, when the checkpointer sends the
PROCSIG_WALSND_INIT_STOPPING signal. We don't have the machinery in
walproposer to receive and handle that signal reliably. Instead, we
mark walproposer as being in WALSNDSTATE_STOPPING always.

In commit 568f914, I assumed that shutdown will wait for all the
remaining WAL to be streamed to safekeepers, but before this commit
that was not true, and the test became flaky. This should make it
stable again.

Some tests wrongly assumed that no WAL could have been written between
pg_current_wal_flush_lsn and quick pg stop after it. Fix them by introducing
flush_ep_to_pageserver which first stops the endpoint and then waits till all
committed WAL reaches the pageserver.

In passing extract safekeeper http client to its own module.
This test occasionally fails with a difference in "pg_xact/0000" file
between the local and restored datadirs. My hypothesis is that
something changed in the database between the last explicit checkpoint
and the shutdown. I suspect autovacuum, it could certainly create
transactions.

To fix, be more precise about the point in time that we compare. Shut
down the endpoint first, then read the last LSN (i.e. the shutdown
checkpoint's LSN), from the local disk with pg_controldata. And use
exactly that LSN in the basebackup.

Closes #559
@arssher arssher force-pushed the send-shutdown-checkpoint-to-safekeepers branch from 47022a6 to ba204b0 Compare March 11, 2024 18:23
@arssher arssher enabled auto-merge (rebase) March 11, 2024 19:00
@arssher arssher merged commit 621ea2e into main Mar 11, 2024
53 checks passed
@arssher arssher deleted the send-shutdown-checkpoint-to-safekeepers branch March 11, 2024 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

check_restored_datadir_content failure in pg_regress
5 participants