Conversation

@DimCitus DimCitus (Collaborator) commented May 4, 2021

Check that we are on the same timeline as our upstream node.

When a node reports state SECONDARY, it can be the target of a failover.
Because we want a single, straight-line history in streaming replication, we
need to ensure that standby nodes are on the same timeline as their upstream
node before transitioning to SECONDARY.

See #683
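At its core, the check being added boils down to comparing the standby's local timeline with the timeline of its upstream node before reporting SECONDARY. A minimal sketch of that idea (the function and parameter names here are assumptions for illustration, not the actual pg_auto_failover code):

#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch only: a standby should only report SECONDARY once its local
 * timeline matches the timeline of its upstream node.
 */
static bool
standby_is_on_upstream_timeline(uint32_t localTimeline, uint32_t upstreamTimeline)
{
	/* timeline 0 is invalid, so never conclude anything from it */
	return localTimeline != 0 && localTimeline == upstreamTimeline;
}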

@DimCitus DimCitus added the bug (Something isn't working) and Size: S (Effort Estimate: Small) labels May 4, 2021
@DimCitus DimCitus added this to the Sprint 2021 W16 W17 milestone May 4, 2021
@DimCitus DimCitus requested a review from JelteF May 4, 2021 09:33
@DimCitus DimCitus self-assigned this May 4, 2021
}

/* finally, check that we're on the same timeline as the new primary */
return standby_check_timeline_with_upstream(postgres);
Contributor

Do we really need this here? This function is used for the transition from secondary to catchingup. I think it's enough to only do it in the transition from catchingup back to secondary.

Collaborator Author

Good point. Because we use the same transition function in both places, and there's a lot of common code, I just added the check this way:

	/*
	 * Finally, check that we're on the same timeline as the new primary when
	 * assigned secondary as a goal state. This transition function is also
	 * used when going from secondary to catchingup, as the primary might have
	 * changed also in that situation.
	 */
	if (keeper->state.assigned_role == SECONDARY_STATE)
	{
		return standby_check_timeline_with_upstream(postgres);
	}

#define InvalidXLogRecPtr 0
#define XLogRecPtrIsInvalid(r) ((r) == InvalidXLogRecPtr)

#define PG_AUTOCTL_MAX_TIMELINES 1024
Contributor

It seems dangerous to put a limit on this, since the number of timelines could reasonably be very high. It might be better to use malloc instead.

Collaborator Author

I would like to avoid malloc, and then having to track who's responsible for calling free, as much as possible; here we don't have to think about the lifetime of the “cache”. It's trivial and quite cheap to enlarge that array even tenfold, if you think we need to do that?

At the moment the array (given 64-bit alignment) is going to be 128 kB of memory...
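For illustration, a statically sized timeline cache along these lines sidesteps the malloc/free ownership question entirely; the struct layout and field names below are assumptions for the sketch, not the actual pg_auto_failover definitions:

#include <stdint.h>

#define PG_AUTOCTL_MAX_TIMELINES 1024

/* one entry per timeline we know about; sketched fields only */
typedef struct TimelineHistoryEntry
{
	uint32_t tli;      /* timeline id */
	uint64_t begin;    /* first LSN on this timeline */
	uint64_t end;      /* LSN where this timeline was forked, if any */
} TimelineHistoryEntry;

/* fixed-size "cache": no malloc, no free, same lifetime as its owner */
typedef struct TimelineHistory
{
	int count;
	TimelineHistoryEntry history[PG_AUTOCTL_MAX_TIMELINES];
} TimelineHistory;

If the limit ever proved too small, bumping PG_AUTOCTL_MAX_TIMELINES is the one-line change alluded to above.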

NodeAddress *primaryNode = &(replicationSource->primaryNode);

/* pg_control_checkpoint() returns the local timeline, update it now */
if (!pgsql_checkpoint(&(postgres->sqlClient)))
Contributor

Creating a checkpoint can be quite heavy. If we only do this to get the current timeline id, maybe instead we should use some other function, such as pg_walfile_name or pg_control_data: https://www.postgresql.org/message-id/3FB63E54-403A-440E-88CB-CFA6FEF3E0F7%40icloud.com

Collaborator Author

We can grab pg_stat_wal_receiver.received_tli; I believe that having received the timeline locally is what matters in our case. If we have to restart (crash recovery or normal startup), we will be able to catch up with the new TLI, and more importantly, in case of a promotion we can replay the WAL files and switch to the new TLI then.

Note that using pg_walfile_name requires knowing which WAL file was most recently received, and should give us the same information as pg_stat_wal_receiver, I believe; and pg_control_data needs a checkpoint too, and shows the same value as the pg_control_checkpoint() function I was using before.
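For reference, the query behind that idea is small; a sketch (illustrative formatting, not necessarily the exact query the patch ships) of reading the received timeline from the walreceiver stats:

/*
 * Sketch: fetch the timeline the WAL receiver got from the upstream node.
 * pg_stat_wal_receiver only has a row while a walreceiver process is
 * active, so callers need to handle an empty result.
 */
const char *sql =
	"select received_tli from pg_stat_wal_receiver";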


return false;
}
else if (upstreamTimeline == localTimeline)
Contributor

Can we also check that the point where the timeline split happened is the same?

Collaborator Author (@DimCitus, May 6, 2021)

Do we need to? I don't think so. If yes, then we need to also parse the timeline history file that's local to the standby to grab that piece of information, and this file is named 00000004.history for timeline 4, and the standby grabs it from the primary as part of the streaming replication protocol anyway...
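For context, a timeline history file records one line per fork, with the parent timeline, the switch-point LSN, and a reason; the contents of a 00000004.history file look roughly like this (the LSN values below are purely illustrative):

1	0/1700028	no recovery target specified
2	0/23000A8	no recovery target specified
3	0/3000148	no recovery target specified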

@DimCitus DimCitus force-pushed the fix/consider-timeline-when-reaching-secondary branch from e3a5dca to 174e98a on May 6, 2021 16:49
"select pg_is_in_recovery(),"
" coalesce(rep.sync_state, '') as sync_state,"
" case when pg_is_in_recovery()"
" then coalesce(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn())"
Contributor

Why do we use pg_last_wal_receive_lsn here? It seems pg_last_wal_replay_lsn is the one that best reflects what we want.

Collaborator Author

If memory serves, that's because it's the LSN that we can reach when restarting Postgres before a promotion, or for REPORT_LSN. It's safely on disk even if it's not applied yet; applying might lag because of read-only traffic.
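To make the distinction concrete, here is a sketch of a query (illustrative only, not part of the patch) showing how far replay can lag behind what has already been received and written to the standby's local WAL:

/*
 * Illustrative only: on a standby, the received LSN is what is already
 * written to local WAL, while the replay LSN is how far recovery has
 * applied it; replay can lag behind, e.g. under read-only query conflicts.
 */
const char *sql =
	"select pg_last_wal_receive_lsn() as received_lsn,"
	"       pg_last_wal_replay_lsn() as replayed_lsn,"
	"       pg_wal_lsn_diff(pg_last_wal_receive_lsn(),"
	"                       pg_last_wal_replay_lsn()) as replay_lag_bytes";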

Contributor

@JelteF JelteF left a comment

I tried this out and I think it's good to merge, since it's an improvement over what we have now. I'm not entirely sure if received_tli is the right timeline to use though. I guess it could be that it did receive the timeline, but hasn't actually applied it because of some error.

However, it seems that there's currently no way to get the applied timeline: https://www.postgresql.org/message-id/flat/20190723180518.635ac554%40firost

DimCitus added 7 commits May 26, 2021 15:57
When a node reports state SECONDARY, it can be the target of a failover.
Because we want a single, straight-line history in streaming replication, we
need to ensure that standby nodes are on the same timeline as their upstream
node before transitioning to SECONDARY.
What matters is that we have received the most recent timeline from the
primary node both in case of promotion and in case of restart and crash
recovery; so use that value from the streaming replication stats rather than
force a checkpoint to obtain it.
@DimCitus DimCitus force-pushed the fix/consider-timeline-when-reaching-secondary branch from ab53dda to 5c1201c on May 26, 2021 13:57
@DimCitus DimCitus merged commit 40c60b4 into master May 26, 2021
@DimCitus DimCitus deleted the fix/consider-timeline-when-reaching-secondary branch May 26, 2021 15:41