spec.CheckpointBeforePgrewind #644

lawrencejones · 2019-05-17T10:23:25Z

Provide a configuration parameter that enables issuing of a CHECKPOINT
prior to starting a pg_rewind, ensuring the control file is up-to-date
with the latest timeline information before the rewind process starts.

When this setting is disabled, it's possible for a newly promoted master
to have a control file with a timeline from before the promotion
occured. If a follower tries to resync using rewind against this
database it will quit without doing anything as it will see no timeline
deviation when it reads the control file, stating "no rewind required".

Some users run large databases where the cost of a basebackup is many
times more expensive than rewind. These users would probably trade the
performance impact of immediate checkpointing for faster recovery of the
standby, especially for those using synchronous replication who may be
unable to accept writes until the standby is recovered.

For now, this change will at least enable a "rewind at all costs" option
as opposed to the "maybe rewind if possible" that existed before. Future
work might support a retry timeout for rewind operations, for those
users who don't want the performance hit but do value rewinding over
basebackups. We could also use the backup API to trigger a checkpoint
with the target Postgres spread configuration, then retry rewinding
until the checkpoint_timeout has elapsed.

I think this is in a good place now. I've been running these tests like so:

$ STCTL_BIN="$(pwd)/bin/stolonctl" STKEEPER_BIN="$(pwd)/bin/stolon-keeper" STSENTINEL_BIN="$(pwd)/bin/stolon-sentinel" STDCTL_BIN="$(pwd)/bin/stolonctl" STPROXY_BIN="$(pwd)/bin/stolon-proxy" STOLON_TEST_STORE_BACKEND=etcdv3 ETCD_BIN=$(which etcd) go test tests/integration/*.go -v -test.run=TestTimelineForkPgRewindCheckpoint -timeout=3m

If you build the stolon binaries with master (which will silently ignore the CheckpointBeforePgrewind configuration parameter) then you'll see the following failure:

    utils.go:260: [keeper fdc71fc5]: 2019-06-12T17:26:45.669+0100       INFO    cmd/keeper.go:824       syncing using pg_rewind {"followedDB": "d6a67b88", "keeper": "e3414a18"}                                                                                                                                                                                                                                 
    utils.go:260: [keeper fdc71fc5]: 2019-06-12T17:26:45.669+0100       INFO    postgresql/postgresql.go:789    running pg_rewind                                                                                                                                                                                                                                                                                
    ha_test.go:1275: keeper tried rewinding but rewind thought it was not required: fdc71fc5                                                                                                                                                                                                                                                                                                                     
    utils.go:260: [keeper fdc71fc5]: fetched file "global/pg_control", length 8192                                                                                                                                                                                                                                                                                                                               
    utils.go:260: [keeper fdc71fc5]: source and target cluster are on the same timeline                                                                                                                                                                                                                                                                                                                          
    utils.go:260: [keeper fdc71fc5]: no rewind required

Running with binaries built against this branch, the tests pass and the log output looks like this:

    ha_test.go:1213: Waiting 2 * sleepInterval for checkpoint configuration to be applied
    ha_test.go:1216: Creating lots of un-CHECKPOINT'd data
    ha_test.go:1221: Write token 1 to master: f0333453
    ha_test.go:1234: SIGSTOPping current standby keeper: 17d6df63
    ha_test.go:1238: SIGSTOPping current standby wal receivers: 17d6df63
    ha_test.go:1243: Writing token 2 to master, advancing current timeline: f0333453
    ha_test.go:1248: Stopping master: f0333453
    ha_test.go:1255: SIGKILLing current standby wal receivers: 17d6df63
    ha_test.go:1261: SIGCONTing current standby keeper: 17d6df63
    ha_test.go:1267: Waiting for new master to assume role: 17d6df63
    ha_test.go:1272: Write token 3 to new master, advancing current timeline: 17d6df63
    ha_test.go:1279: Starting old master and verifying it successfully rewinds: f0333453
    utils.go:260: [keeper f0333453]: 2019-06-12T17:59:57.228+0100       INFO    cmd/keeper.go:844       syncing using pg_rewind {"followedDB": "51bca7ec", "keeper": "17d6df63", "forcingCheckpoint": true}
    utils.go:260: [keeper f0333453]: 2019-06-12T17:59:57.229+0100       INFO    postgresql/postgresql.go:804    issuing checkpoint on primary
    utils.go:260: [keeper 17d6df63]: 2019-06-12 17:59:57.293 BST [23830] LOG:  checkpoint complete: wrote 1671 buffers (10.2%); 0 WAL file(s) added, 0 removed, 0 recycled; write=4.847 s, sync=0.014 s, total=4.863 s; sync files=335, longest=0.000 s, average=0.000 s; distance=34075 kB, estimate=34075 kB
    utils.go:260: [keeper 17d6df63]: 2019-06-12 17:59:57.295 BST [23830] LOG:  checkpoint starting: immediate force wait
    utils.go:260: [keeper 17d6df63]: 2019-06-12 17:59:57.296 BST [23830] LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s, total=0.001 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=30668 kB
    utils.go:260: [keeper f0333453]: CHECKPOINT
    utils.go:260: [keeper f0333453]: 2019-06-12T17:59:57.351+0100       INFO    postgresql/postgresql.go:815    running pg_rewind
    utils.go:260: [keeper f0333453]: fetched file "global/pg_control", length 8192
    utils.go:260: [keeper f0333453]: fetched file "pg_wal/00000002.history", length 41
    utils.go:260: [keeper f0333453]: Source timeline history:
    utils.go:260: [keeper f0333453]: Target timeline history:
    utils.go:260: [keeper f0333453]: 1: 0/0 - 0/0
    utils.go:260: [keeper f0333453]: servers diverged at WAL location 0/4146FE0 on timeline 1
    utils.go:260: [keeper f0333453]: rewinding from last common checkpoint at 0/2000060 on timeline 1
    ha_test.go:1304: Waiting for the old master to receive token 3 from forked timeline: f0333453

lawrencejones · 2019-05-17T12:59:43Z

This actually addresses #601, though not via any of the remediations recommended in that issue.

rearden-steel · 2019-05-29T17:03:11Z

I've also experienced this issue.
@sgotti why wal_keep_segments is hard-coded to such small value as 8 and can I increase it manually in code as a temporary workaround of this problem?

Widerstehen · 2019-06-05T06:42:36Z

wal_keep_segments

how to change wal_keep_segments？

lawrencejones · 2019-06-05T08:35:25Z

Hey @rearden-steel @frank111, I'm not sure how this is related to the wal keep segments GUC? This change is about recovering from forked timelines rather than having insufficient wal to perform a recovery.

can I increase it manually in code

We've done so in our stolon fork as we also needed a higher value than 8. It's safe to do so, but I'd recommend pushing #591 so we can get this solved for everyone.

sgotti · 2019-06-07T12:08:55Z

tests/integration/ha_test.go

+	// take hours to recover when pg_rewind is <1m. Here we verify that the keeper
+	// successfully recovered the forked timeline via a rewind rather than basebackup.
+	if usePgrewind {
+		output := standbys[1].ReadOutput()


This should already be covered by using Expect (see config_test.go)

sgotti · 2019-06-07T12:09:03Z

tests/integration/utils.go

@@ -253,7 +255,9 @@ func (p *Process) start() error {
 	go func() {
 		scanner := bufio.NewScanner(pr)
 		for scanner.Scan() {
-			p.t.Logf("[%s %s]: %s", p.name, p.uid, scanner.Text())


This should already be covered by using Expect (see config_test.go)

Hey @sgotti I'm struggling to figure out how to use the expect interface to try matching against what I need here.

I need to verify that this string occurs after a particular place in the code (so flush the buffer, then run the assertion) and I need to perform multiple assertions for several strings that may appear out of order.

Can you help me figure out how to do this with Expect?

Figured this out actually, thanks!

rearden-steel · 2019-06-11T11:50:57Z

Hey @lawrencejones!
Is there any chance to make the requested changes?
We experience issues which should be fixed with this PR.

lawrencejones · 2019-06-11T13:13:11Z

Hey @rearden-steel, I'm.midway through making this work. It needs integration tests which are what I've been working on, and to provide a configurable variable that enables/disables checkpointing.

I'm out at a conference today/tomorrow but this is right on my to-do list come Thursday. I'm going to push the WIP changes now so you can see them, and caveat this update by saying I'm explicitly ignoring Simone's advice until I've got the tests working (at which point I'll action them).

lawrencejones · 2019-06-11T13:17:53Z

From where I got to last time, it's difficult to simulate a Postgres that takes a while to checkpoint after a promotion. While pg_ctl promote won't force a synchronous checkpoint it will schedule one, and in the test databases that scheduled checkpoint is almost instantaneous leading to the controlfile being updated immediately.

Hopefully you can wait a couple of days for me to get back to this?

rearden-steel · 2019-06-11T14:02:47Z

it's difficult to simulate a Postgres that takes a while to checkpoint after a promotion

I was able to reach such situation when doing manual failover while running pgbench (-c 100 -T 120 -j 20) on a cluster (with pre-generated database around 80 GB, pgbench -i -s 5000).

Hopefully you can wait a couple of days for me to get back to this?

Sure! Thank you for your work!

lawrencejones · 2019-06-12T17:11:14Z

@sgotti @rearden-steel this is ready for review now.

Instead of adding this logic into an existing test, I've created another test with just two keepers that verifies we're safe from this specific race. The details of that test can be seen here:

stolon/tests/integration/ha_test.go

Lines 1174 to 1187 in bd61324

    
           // This test verifies that pg_rewind isn't subject to a race against a newly promoted 
        
           // Postgres promoting into a new timeline and only checkpointing after the fact. We 
        
           // simulate this with a two node cluster and: 
        
           // 
        
           // 1. Write junk data that we assume to be un-CHECKPOINT'ed 
        
           // 2. Pause the standby at 1a and advance the master to 1b 
        
           // 3. Shutdown master on timeline 1b 
        
           // 4. Resume the standby on timeline 1a and wait for it to promote into 2b 
        
           // 5. Start old master at 1b which can never rejoin the current 2b timeline, so it will 
        
           //    resync. This resync should use pg_rewind and we verify the pg_rewind works as 
        
           //    expected, despite our expectation that the new master would not have independently 
        
           //    checkpointed after promotion without stolon having requested it 
        
           // 
        
           // This will only pass when CheckpointBeforePgrewind is set to true in the cluster spec.

I've added some test helpers and I'm going to go through the diff and comment where things may not be totally clear.

Any questions, please do ask!

lawrencejones · 2019-06-12T17:12:01Z

cmd/keeper/cmd/keeper.go

-			// wals and we'll force a full resync.
-			// We have to find a better way to detect if a standby is waiting
-			// for unavailable wals.
+			// TODO(sgotti) pg_rewind considers databases on the same timeline as in sync and


I edited this comment as part of an unrelated change and line-wrapped it to match the rest of the file. Can leave it or remove it as makes most sense!

lawrencejones · 2019-06-12T17:13:20Z

tests/integration/ha_test.go

@@ -124,18 +124,21 @@ func TestInitWithMultipleKeepers(t *testing.T) {
 }

 func setupServers(t *testing.T, clusterName, dir string, numKeepers, numSentinels uint8, syncRepl bool, usePgrewind bool, primaryKeeper *TestKeeper) (testKeepers, testSentinels, *TestProxy, *TestStore) {
-	var initialClusterSpec *cluster.ClusterSpec
+	var initialClusterSpec = &cluster.ClusterSpec{


Small refactor here to put the common ClusterSpec up top, then assign the overrides. It might make sense to use something like https://github.com/imdario/mergo in future, so we can merge optional overrides provided in this setupServers helper.

ok (I assume nothing has been changed)

lawrencejones · 2019-06-12T17:14:18Z

tests/integration/ha_test.go

+		// If we want to pg_rewind then also checkpoint beforehand, otherwise we may fail due
+		// to uncheckpoint'ed timeline changes that have us silently fallback to basebackup.
+		UsePgrewind:              cluster.BoolP(usePgrewind),
+		CheckpointBeforePgrewind: cluster.BoolP(usePgrewind),


This is open for questioning: should we assign this by default? It doesn't harm us now but we may want to test when the parameter is not set (though I think this is unlikely).

Perhaps ok for now, until we have an issue?

ok (I assume nothing has been changed)

lawrencejones · 2019-06-12T17:17:29Z

tests/integration/ha_test.go

@@ -256,6 +270,32 @@ func waitLines(t *testing.T, q Querier, num int, timeout time.Duration) error {
 	return fmt.Errorf("timeout waiting for %d lines, got: %d", num, c)
 }

+// waitForLine waits until a row with the expected ID appears in table01
+func waitForLine(t *testing.T, q Querier, expected int, timeout time.Duration) error {


This was what I expected waitForLines to do! Tripped myself up by assuming its behaviour. Two small things here:

I've tried to comment my new test helpers to make it clear why you might use them, and how they behave. I would have found it really useful to have comments like these on the ones that already exist when getting familiar with the test suite, though I totally understand when this is mostly a one man operation why they wouldn't be there

I'm a bit worried waitLines might be too loose an assertion in some tests. It could be that we're passing tests when a specific token value never replicated, while a subsequent one did. Not sure on this but I'd normally avoid this type of assertion in favour of something more specific, like this waitForLine

Neither are actionable, just food for thought.

I think the result is the same just with different logic:

What the tests really want to check is that all the transactions are applied. Waiting for the number of rows in an append only table is a way to do this.

If you only append rows to a table you know what the expected total number should be. So I think waiting for the total number of rows is enough. Waiting for the last row you inserted is another way for doing the same thing. Or am I missing something?

I'll stick to just one of them to not add additional confusion the the integration tests (if you really prefer waiting for the last row feel free to create a PR that replaces it)

P.S. The real issue is the function name that will require a good s/line/row/ :smile:

In this test, we're expecting to lose data. So we do three writes of 1, 2 and 3, but the final database should only ever get 1 and 3.

If I use the count of lines then I'd have to expect there to be 2 lines, but that will pass even if I never received the correct write and only have 1 and 2 in the final standby (while the primary node has 1 and 3).

Does that make any sense?

lawrencejones · 2019-06-12T17:19:36Z

tests/integration/utils.go

+	defaultPGParameters = cluster.PGParameters{
+		"log_destination":   "stderr",
+		"logging_collector": "false",
+		"log_checkpoints":   "on",


Can't see why we wouldn't have this on. It doesn't hurt us and can be extremely useful.

lawrencejones · 2019-06-12T17:19:54Z

tests/integration/utils.go

+// GetPGControldata calls pg_controldata for the Postgres database directory. This can be
+// useful in debugging database state, especially when comparing what was successfully
+// checkpointed.
+func (tk *TestKeeper) GetPGControldata() string {


Unused but was useful while debugging. Can remove if we want, or leave for future use.

OK the keep it.

sgotti

@lawrencejones Thanks for your PR.

Beside the inline comments, one thing that I'd like to avoid at all costs is issuing "write" commands from one keeper to another instance since it looks quite dirty (against the fact that the postgres should be managed only by its keeper and, in this case, every resync will issue an unneeded checkpoint).

So, shouldn't be possible to just issue a checkpoint after a promote by the primary keeper?

The main issue I see here is that there'll be a window between issuing a promote and finishing the checkpoint where the keeper already reported its state to the sentinel and it has updated the old primary follow config.
One possible solution to fix this and also other realted issues will be to report to the sentinel the current checkpoint status after promote or the checkpoint timeline (new field in cluster.go DBStatus) so the sentinel will set the old primary followConfig only when it's sure it's in good state for a pg_rewind.

P.S. the same logic could be expanded to also report the current replication slots so the sentinel can avoid similar issues (and remove the use of the --slot command in pg_basebackup that is another hacky thing).

sgotti · 2019-06-13T14:01:47Z

tests/integration/ha_test.go

+		// If we want to pg_rewind then also checkpoint beforehand, otherwise we may fail due
+		// to uncheckpoint'ed timeline changes that have us silently fallback to basebackup.
+		UsePgrewind:              cluster.BoolP(usePgrewind),
+		CheckpointBeforePgrewind: cluster.BoolP(usePgrewind),


ok (I assume nothing has been changed)

sgotti · 2019-06-13T14:02:06Z

tests/integration/ha_test.go

@@ -124,18 +124,21 @@ func TestInitWithMultipleKeepers(t *testing.T) {
 }

 func setupServers(t *testing.T, clusterName, dir string, numKeepers, numSentinels uint8, syncRepl bool, usePgrewind bool, primaryKeeper *TestKeeper) (testKeepers, testSentinels, *TestProxy, *TestStore) {
-	var initialClusterSpec *cluster.ClusterSpec
+	var initialClusterSpec = &cluster.ClusterSpec{


ok (I assume nothing has been changed)

sgotti · 2019-06-13T14:03:21Z

doc/cluster_spec.md

@@ -27,6 +27,7 @@ Some options in a running cluster specification can be changed to update the des
 | additionalWalSenders      | number of additional wal_senders in addition to the ones internally defined by stolon, useful to provide enough wal senders for external standbys (changing this value requires an instance restart)                                                                                                                                                                                                                                                                              | no                        | uint16            | 5                                                                                                                                   |
 | additionalMasterReplicationSlots | a list of additional physical replication slots to be created on the master postgres instance. They will be prefixed with `stolon_` (like internal replication slots used for standby replication) to make them "namespaced" from other replication slots. Replication slots starting with `stolon_` and not defined here (and not used for standby replication) will be dropped from the master instance.                                                                                                                                                                | no                        | []string          | null                                                                                                                                |
 | usePgrewind               | try to use pg_rewind for faster instance resyncronization.                                                                                                                                                                                                                                                                                                                                                                                                                        | no                        | bool              | false                                                                                                                               |
+| checkpointBeforePgrewind  | Force a checkpoint before pg_rewind to prevent the rewind racing the checkpointer process after a standby is newly promoted. This will cause increased IO on whatever Postgres node the currently resync'ing Postgres is following as the checkpoint will not immediate, and not respect spread configuration.


Perhaps we should make clearer than the checkpoint is issued on the current primary. At a first sight it's not clear.

sgotti · 2019-06-13T14:03:38Z

cmd/keeper/cmd/keeper.go

-			// wals and we'll force a full resync.
-			// We have to find a better way to detect if a standby is waiting
-			// for unavailable wals.
+			// TODO(sgotti) pg_rewind considers databases on the same timeline as in sync and


sgotti · 2019-06-13T14:06:25Z

tests/integration/utils.go

+// GetPGControldata calls pg_controldata for the Postgres database directory. This can be
+// useful in debugging database state, especially when comparing what was successfully
+// checkpointed.
+func (tk *TestKeeper) GetPGControldata() string {


OK the keep it.

sgotti · 2019-06-13T14:06:47Z

tests/integration/utils.go

+	defaultPGParameters = cluster.PGParameters{
+		"log_destination":   "stderr",
+		"logging_collector": "false",
+		"log_checkpoints":   "on",


sgotti · 2019-06-13T14:41:36Z

internal/postgresql/postgresql.go

+	// of downtime.
+	if forceCheckpoint {
+		log.Infow("issuing checkpoint on primary")
+		psqlName := filepath.Join(p.pgBinPath, "psql")


why are you using psql instead of directly calling it from go sql?

I was doing this to ensure the way we connect to the Postgres for pg_rewind, pg_basebackup and for issuing a checkpoint was consistent. psql should behave exactly the same as rewind/basebackup, whereas connecting from within Go could be subtly different in many ways.

Does that make sense, or do you think we should try constructing a Go connection?

https://gist.github.com/viggy28/954ff01cd3c29d317834a5c25951a1cd

@lawrencejones and @sgotti does this look okay? (since sslmode prefer is not applicable for lib/pq I have to replace that based on the SSL settings on the cluster).

lawrencejones · 2019-07-05T14:32:53Z

Hey @sgotti! I've been rubbish here: PRs were left hanging while I went on holiday 🏖

I'm going to reply to each bit of your review separately. Hopefully we can reason our way to a conclusion!

Beside the inline comments, one thing that I'd like to avoid at all costs is issuing "write" commands from one keeper to another instance since it looks quite dirty (against the fact that the postgres should be managed only by its keeper and, in this case, every resync will issue an unneeded checkpoint).

So, shouldn't be possible to just issue a checkpoint after a promote by the primary keeper?

Postgres will automatically begin a checkpoint whenever a promote happens, but it will do so in a spread fashion (ie, using all the checkpointing configuration parameters to ensure it throttles the checkpointing). If we were to get the keeper to issue a checkpoint itself after promotion (optionally, via a similar configuration parameter as this PR) then we'd force the master to checkpoint faster, which is good but...

At the same time as the master checkpoints, our slaves will attempt to resync against the master. Until the checkpoint completes the pg_rewind could fail, and we'll fallback to pg_basebackup, which is what we really want to avoid.

If we really want to avoid this, then a possible solution would be to provide this checkpoint-after-promote functionality we've been discussing, but additionally have keepers retry pg_rewind for a specific time threshold. A user could then set the retry timeout to be 5m, for example, and enable checkpoint-after-promote to ensure the rewind will be retried after the checkpoint is expected to be completed.

The main issue I see here is that there'll be a window between issuing a promote and finishing the checkpoint where the keeper already reported its state to the sentinel and it has updated the old primary follow config.

One possible solution to fix this and also other realted issues will be to report to the sentinel the current checkpoint status after promote or the checkpoint timeline (new field in cluster.go DBStatus) so the sentinel will set the old primary followConfig only when it's sure it's in good state for a pg_rewind.

It's quite sad for me that we have to do this, if you know what I mean? pg_basebackup has an option to wait for a checkpoint to complete before starting, so it's quite sad to me that pg_rewind doesn't also have this option. Perhaps we can simulate it by creating a backup label, as that appears to be what pg_basebackup does internally? Do you have any thoughts on this?

lawrencejones · 2019-08-19T11:59:35Z

@sgotti I know you're probably busy on a lot of other things right now, but would really appreciate your feedback for this change. It matters quite a lot to us that rewinds are working, and this change makes that the case.

If we want to take another path that's great, but we'll need to know what it is before we can get implementing it!

sgotti · 2019-08-23T13:22:31Z

@lawrencejones Also if theoretically I'd prefer a solution that doesn't require doing operations from a keeper to another, in the end this is already done when doing a pg_basebackup or a pg_rewind.

So I don't want to stop this PR trying to find a cleanliness that's not really possible. Lets go ahead with your solution! (in the end we are just doing what pg_rewind should do: a checkpoint inside pg_rewind like pg_basebackup already does)

P.S. We should report this upstream if not already done.

lawrencejones · 2019-08-27T11:11:59Z

Hey @sgotti!

So I don't want to stop this PR trying to find a cleanliness that's not really possible. Lets go ahead with your solution! (in the end we are just doing what pg_rewind should do: a checkpoint inside pg_rewind like pg_basebackup already does)

Makes sense to me :)

I've adjusted the doc comment to make it clearer we'll be hitting the primary. All your other review comments were questions/have been addressed, I think.

Can I get another look? Thanks.

Provide a configuration parameter that enables issuing of a CHECKPOINT prior to starting a pg_rewind, ensuring the control file is up-to-date with the latest timeline information before the rewind process starts. When this setting is disabled, it's possible for a newly promoted master to have a control file with a timeline from before the promotion occured. If a follower tries to resync using rewind against this database it will quit without doing anything as it will see no timeline deviation when it reads the control file, stating "no rewind required". Some users run large databases where the cost of a basebackup is many times more expensive than rewind. These users would probably trade the performance impact of immediate checkpointing for faster recovery of the standby, especially for those using synchronous replication who may be unable to accept writes until the standby is recovered. For now, this change will at least enable a "rewind at all costs" option as opposed to the "maybe rewind if possible" that existed before. Future work might support a retry timeout for rewind operations, for those users who don't want the performance hit but do value rewinding over basebackups. We could also use the backup API to trigger a checkpoint with the target Postgres spread configuration, then retry rewinding until the checkpoint_timeout has elapsed.

viggy28 · 2020-06-23T00:00:05Z

Hi Team, wondering any updates on this? Anything I can help with to get this merged?

sgotti · 2020-06-27T11:52:15Z

@viggy28 If no one is going to work in this PR (@lawrencejones) feel free to adopt and update it.

lawrencejones · 2020-06-30T19:15:13Z

Hey @viggy28, if you want to take this over then please let me know. I’m not working on stolon things right now so if you need this, I’ll be happy to hand it over!

viggy28 · 2020-07-01T04:26:28Z

Sure @lawrencejones. Looks like you have done most of the work. Thanks much.

Trying to understand what is needed.

https://github.com/sorintlab/stolon/pull/644/files#r293416129
spec.CheckpointBeforePgrewind #644 (review) is unresolved but @sgotti also mentioned here spec.CheckpointBeforePgrewind #644 (comment) that shouldn't be a blocker. which means can we resolve that?
Looks like the build is failing.

Anything else I am missing?

pirateunclejack · 2024-02-27T10:13:17Z

Dear contributors ,
We have a stolon cluster which is such big to cost hours to recovery with basebackup.
I apply this change to latest master and run on our test environment . the standby keepers keep failed after pg_rewind done :


pg_rewind: Done!
2024-02-27T09:24:12.615Z	INFO	cmd/keeper.go:928	syncing using pg_rewind	{"masterDB": "1f5bbc57", "keeper": "keeper0"}
2024-02-27T09:24:12.636Z	INFO	postgresql/postgresql.go:921	issuing checkpoint on primary
CHECKPOINT
2024-02-27T09:24:12.993Z	INFO	postgresql/postgresql.go:932	running pg_rewind
2024-02-27T09:24:12.994Z	DEBUG	postgresql/postgresql.go:936	execing cmd	{"cmd": "/usr/lib/postgresql/14/bin/pg_rewind --debug -D /stolon-data/postgres --source-server=application_name=stolon_7805b990 dbname=zabbix host=172.16.250.153 options=-c\\ synchronous_commit=off password=******************** port=5432 sslmode=prefer user=stolon"}
pg_rewind: executing "/usr/lib/postgresql/14/bin/postgres" for target server to complete crash recovery
2024-02-27 09:24:13.068 GMT [1590] LOG:  skipping missing configuration file "/stolon-data/postgres/postgresql.auto.conf"
2024-02-27T09:24:13.173Z	ERROR	cmd/keeper.go:736	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
2024-02-27T09:24:15.675Z	ERROR	cmd/keeper.go:736	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
2024-02-27 09:24:16.349 UTC [1590] LOG:  database system was interrupted while in recovery at log time 2024-02-27 09:23:11 UTC
2024-02-27 09:24:16.349 UTC [1590] HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2024-02-27 09:24:16.455 UTC [1590] LOG:  redo starts at B/280161F8
2024-02-27 09:24:16.556 UTC [1590] LOG:  invalid record length at B/28182E08: wanted 24, got 0
2024-02-27 09:24:16.556 UTC [1590] LOG:  redo done at B/28182DE0 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.10 s

PostgreSQL stand-alone backend 14.0 (Debian 14.0-1.pgdg110+1)
2024-02-27T09:24:17.548Z	DEBUG	postgresql/postgresql.go:625	execing cmd	{"cmd": "/usr/lib/postgresql/14/bin/postgres -V"}
2024-02-27T09:24:17.561Z	DEBUG	cmd/keeper.go:815	pgstate dump: (*cluster.PostgresState)(0xc0008f84d0)({
 UID: (string) (len=8) "7805b990",
 Generation: (int64) 0,
 ListenAddress: (string) (len=14) "172.16.115.237",
 Port: (string) (len=4) "5432",
 Healthy: (bool) false,
 SystemID: (string) "",
 TimelineID: (uint64) 0,
 XLogPos: (uint64) 0,
 TimelinesHistory: (cluster.PostgresTimelinesHistory) <nil>,
 PGParameters: (common.Parameters) <nil>,
 SynchronousStandbys: ([]string) <nil>,
 OlderWalFile: (string) ""
})

2024-02-27T09:24:18.177Z	ERROR	cmd/keeper.go:736	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
pg_rewind: fetched file "global/pg_control", length 8192
pg_rewind: source and target cluster are on the same timeline
pg_rewind: no rewind required
2024-02-27T09:24:18.222Z	DEBUG	postgresql/postgresql.go:625	execing cmd	{"cmd": "/usr/lib/postgresql/14/bin/postgres -V"}
2024-02-27T09:24:18.324Z	INFO	postgresql/postgresql.go:844	writing standby signal file
2024-02-27T09:24:18.368Z	INFO	postgresql/postgresql.go:319	starting database
2024-02-27T09:24:18.368Z	DEBUG	postgresql/postgresql.go:323	execing cmd	{"cmd": "/usr/lib/postgresql/14/bin/postgres -D /stolon-data/postgres -c unix_socket_directories=/tmp"}
2024-02-27T09:24:20.682Z	ERROR	cmd/keeper.go:736	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
2024-02-27 09:24:21.540 UTC [1593] LOG:  starting PostgreSQL 14.0 (Debian 14.0-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2024-02-27 09:24:21.540 UTC [1593] LOG:  listening on IPv4 address "172.16.115.237", port 5432
2024-02-27 09:24:21.589 UTC [1593] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-02-27 09:24:22.064 UTC [1594] LOG:  database system was shut down at 2024-02-27 09:24:17 UTC
2024-02-27 09:24:22.064 UTC [1595] FATAL:  the database system is starting up
2024-02-27 09:24:22.085 UTC [1594] LOG:  entering standby mode
2024-02-27 09:24:22.221 UTC [1594] LOG:  invalid record length at B/28182EF8: wanted 24, got 0
2024-02-27 09:24:22.245 UTC [1598] LOG:  started streaming WAL from primary at B/28000000 on timeline 31
2024-02-27 09:24:22.267 UTC [1599] FATAL:  the database system is not yet accepting connections
2024-02-27 09:24:22.267 UTC [1599] DETAIL:  Consistent recovery state has not been yet reached.
2024-02-27 09:24:22.271 UTC [1594] LOG:  invalid resource manager ID 48 at B/28182EF8
2024-02-27 09:24:22.279 UTC [1598] FATAL:  terminating walreceiver process due to administrator command
2024-02-27 09:24:22.279 UTC [1594] LOG:  invalid resource manager ID 48 at B/28182EF8
2024-02-27 09:24:22.280 UTC [1594] LOG:  invalid resource manager ID 48 at B/28182EF8
2024-02-27 09:24:22.469 UTC [1601] FATAL:  the database system is not yet accepting connections
2024-02-27 09:24:22.469 UTC [1601] DETAIL:  Consistent recovery state has not been yet reached.
2024-02-27T09:24:22.583Z	DEBUG	postgresql/postgresql.go:625	execing cmd	{"cmd": "/usr/lib/postgresql/14/bin/postgres -V"}
2024-02-27T09:24:22.593Z	DEBUG	cmd/keeper.go:815	pgstate dump: (*cluster.PostgresState)(0xc0006ac630)({
 UID: (string) (len=8) "7805b990",
 Generation: (int64) 0,
 ListenAddress: (string) (len=14) "172.16.115.237",
 Port: (string) (len=4) "5432",
 Healthy: (bool) false,
 SystemID: (string) "",
 TimelineID: (uint64) 0,
 XLogPos: (uint64) 0,
 TimelinesHistory: (cluster.PostgresTimelinesHistory) <nil>,
 PGParameters: (common.Parameters) <nil>,
 SynchronousStandbys: ([]string) <nil>,
 OlderWalFile: (string) ""
})

2024-02-27 09:24:22.671 UTC [1603] FATAL:  the database system is not yet accepting connections
2024-02-27 09:24:22.671 UTC [1603] DETAIL:  Consistent recovery state has not been yet reached.
2024-02-27 09:24:22.873 UTC [1604] FATAL:  the database system is not yet accepting connections
2024-02-27 09:24:22.873 UTC [1604] DETAIL:  Consistent recovery state has not been yet reached.
2024-02-27 09:24:23.094 UTC [1605] FATAL:  the database system is not yet accepting connections
2024-02-27 09:24:23.094 UTC [1605] DETAIL:  Consistent recovery state has not been yet reached.
2024-02-27 09:24:23.185 UTC [1606] FATAL:  the database system is not yet accepting connections
2024-02-27 09:24:23.185 UTC [1606] DETAIL:  Consistent recovery state has not been yet reached.
2024-02-27T09:24:23.185Z	ERROR	cmd/keeper.go:736	cannot get configured pg parameters	{"error": "pq: the database system is not yet accepting connections"}
2024-02-27 09:24:23.297 UTC [1607] FATAL:  the database system is not yet accepting connections
2024-02-27 09:24:23.297 UTC [1607] DETAIL:  Consistent recovery state has not been yet reached.
2024-02-27 09:24:23.499 UTC [1608] FATAL:  the database system is not yet accepting connections

Is there any body still work on this issue ?

lawrencejones mentioned this pull request Jun 6, 2019

cluster: support cascading replication #17

Open

lawrencejones force-pushed the lawrence-checkpoint-before-rewind branch from 487d25b to 54f0eb3 Compare June 6, 2019 14:08

sgotti requested changes Jun 7, 2019

View reviewed changes

lawrencejones force-pushed the lawrence-checkpoint-before-rewind branch from 3963d9b to 5577b6a Compare June 11, 2019 13:14

lawrencejones force-pushed the lawrence-checkpoint-before-rewind branch 2 times, most recently from bd0582a to bd61324 Compare June 12, 2019 16:58

lawrencejones changed the title ~~[WIP] Force checkpoint before rewind~~ spec.CheckpointBeforePgrewind Jun 12, 2019

lawrencejones commented Jun 12, 2019

View reviewed changes

lawrencejones force-pushed the lawrence-checkpoint-before-rewind branch from bd61324 to 904b7cb Compare June 12, 2019 17:18

lawrencejones commented Jun 12, 2019

View reviewed changes

sgotti requested changes Jun 13, 2019

View reviewed changes

lawrencejones force-pushed the lawrence-checkpoint-before-rewind branch from 904b7cb to 9a49a91 Compare August 27, 2019 11:11

lawrencejones mentioned this pull request Aug 27, 2019

Retry pgrewind gocardless/stolon#14

Closed

lawrencejones force-pushed the lawrence-checkpoint-before-rewind branch from 9a49a91 to afb5211 Compare September 11, 2019 17:09

spec.CheckpointBeforePgrewind #644

Are you sure you want to change the base?

spec.CheckpointBeforePgrewind #644

Conversation

lawrencejones commented May 17, 2019 • edited Loading

lawrencejones commented May 17, 2019

rearden-steel commented May 29, 2019

Widerstehen commented Jun 5, 2019

lawrencejones commented Jun 5, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rearden-steel commented Jun 11, 2019

lawrencejones commented Jun 11, 2019 • edited Loading

lawrencejones commented Jun 11, 2019

rearden-steel commented Jun 11, 2019 • edited Loading

lawrencejones commented Jun 12, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lawrencejones Jun 12, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sgotti left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lawrencejones commented Jul 5, 2019

lawrencejones commented Aug 19, 2019

sgotti commented Aug 23, 2019

lawrencejones commented Aug 27, 2019

viggy28 commented Jun 23, 2020

sgotti commented Jun 27, 2020

lawrencejones commented Jun 30, 2020

viggy28 commented Jul 1, 2020

pirateunclejack commented Feb 27, 2024

lawrencejones commented May 17, 2019 •

edited

Loading

lawrencejones commented Jun 11, 2019 •

edited

Loading

rearden-steel commented Jun 11, 2019 •

edited

Loading

lawrencejones Jun 12, 2019 •

edited

Loading