spec.CheckpointBeforePgrewind #644
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -760,7 +760,7 @@ func (p *Manager) createPostgresqlAutoConf() error { | |
return nil | ||
} | ||
|
||
-func (p *Manager) SyncFromFollowedPGRewind(followedConnParams ConnParams, password string) error { | ||
+func (p *Manager) SyncFromFollowedPGRewind(followedConnParams ConnParams, password string, forceCheckpoint bool) error { | ||
// Remove postgresql.auto.conf since pg_rewind will error if it's a symlink to /dev/null | ||
pgAutoConfPath := filepath.Join(p.dataDir, postgresAutoConf) | ||
if err := os.Remove(pgAutoConfPath); err != nil && !os.IsNotExist(err) { | ||
|
@@ -786,6 +786,32 @@ func (p *Manager) SyncFromFollowedPGRewind(followedConnParams ConnParams, passwo | |
followedConnParams.Set("options", "-c synchronous_commit=off") | ||
followedConnString := followedConnParams.ConnString() | ||
|
||
// We need to issue a checkpoint on the source before running pg_rewind: until the primary | ||
// checkpoints, the global/pg_control file won't contain up-to-date information about | ||
// which timeline the primary is on. | ||
// | ||
// Imagine everyone is on timeline 1, then we promote a node to timeline 2. Standbys | ||
// attempt to replicate from the newly promoted node but fail due to diverged timelines. | ||
// pg_rewind is then used to resync the standbys, but if the new primary hasn't yet | ||
// checkpointed, the pg_control file will tell us we're both on the same timeline (1) | ||
// and pg_rewind will exit without performing any action. | ||
// | ||
// If we checkpoint before invoking pg_rewind we avoid this problem, at the cost of | ||
// forcing a checkpoint on a newly promoted node, which might hurt performance. | ||
// We (GoCardless) can't afford pg_rewind silently doing nothing, so we take the | ||
// performance penalty to avoid hours of downtime. | ||
if forceCheckpoint { | ||
log.Infow("issuing checkpoint on primary") | ||
psqlName := filepath.Join(p.pgBinPath, "psql") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why are you using psql instead of directly calling it from go sql? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was doing this to ensure the way we connect to the Postgres for pg_rewind, pg_basebackup and for issuing a checkpoint was consistent. psql should behave exactly the same as rewind/basebackup, whereas connecting from within Go could be subtly different in many ways. Does that make sense, or do you think we should try constructing a Go connection? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. https://gist.github.com/viggy28/954ff01cd3c29d317834a5c25951a1cd @lawrencejones and @sgotti does this look okay? (since sslmode prefer is not applicable for lib/pq I have to replace that based on the SSL settings on the cluster). |
||
cmd := exec.Command(psqlName, followedConnString, "-c", "CHECKPOINT;") | ||
cmd.Env = append(os.Environ(), fmt.Sprintf("PGPASSFILE=%s", pgpass.Name())) | ||
cmd.Stdout = os.Stdout | ||
cmd.Stderr = os.Stderr | ||
if err := cmd.Run(); err != nil { | ||
return fmt.Errorf("error issuing checkpoint on followed instance: %v", err) | ||
} | ||
} | ||
|
||
log.Infow("running pg_rewind") | ||
name := filepath.Join(p.pgBinPath, "pg_rewind") | ||
cmd := exec.Command(name, "--debug", "-D", p.dataDir, "--source-server="+followedConnString) | ||
|
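As a reading aid for the hunk above, here is a minimal standalone sketch of how the checkpoint command is assembled before pg_rewind runs. The helper name `checkpointCommand` and the paths and connection string in `main` are hypothetical and not part of the PR; the sketch only builds the command and does not execute psql.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// checkpointCommand is a hypothetical helper (not in the PR) that mirrors the
// diff: it builds a psql invocation that issues CHECKPOINT against the
// followed instance, passing the password file through PGPASSFILE.
// It only constructs the command; the caller decides whether to run it.
func checkpointCommand(pgBinPath, followedConnString, pgpassFile string) *exec.Cmd {
	psqlName := filepath.Join(pgBinPath, "psql")
	cmd := exec.Command(psqlName, followedConnString, "-c", "CHECKPOINT;")
	cmd.Env = append(os.Environ(), fmt.Sprintf("PGPASSFILE=%s", pgpassFile))
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	// Illustrative values only; a real caller would use the manager's
	// configured binary path, connection parameters and pgpass file.
	cmd := checkpointCommand(
		"/usr/lib/postgresql/bin",
		"host=10.0.0.2 port=5432 user=repl dbname=postgres",
		"/tmp/pgpass",
	)
	fmt.Println(cmd.Args)
}
```

In the PR itself the command is run only when `forceCheckpoint` is true, and a failure aborts the sync before pg_rewind is invoked.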
I edited this comment as part of an unrelated change and line-wrapped it to match the rest of the file. Can leave it or remove it as makes most sense!
ok
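The review thread on the psql invocation mentions that `sslmode=prefer` cannot be used with lib/pq and has to be replaced according to the cluster's SSL settings. The contents of the linked gist are not reproduced here, so the following is only a hedged sketch of what such a substitution might look like. The helper name `adaptSSLMode`, the `clusterUsesSSL` flag, and the choice of `require`/`disable` as replacements are all assumptions.

```go
package main

import "fmt"

// adaptSSLMode is a hypothetical helper (not taken from the PR or the gist).
// It returns a copy of the connection parameters in which sslmode=prefer is
// replaced by a fixed mode. The mapping to "require" or "disable" is an
// assumption made for illustration.
func adaptSSLMode(params map[string]string, clusterUsesSSL bool) map[string]string {
	out := make(map[string]string, len(params))
	for k, v := range params {
		out[k] = v
	}
	if out["sslmode"] == "prefer" {
		if clusterUsesSSL {
			out["sslmode"] = "require"
		} else {
			out["sslmode"] = "disable"
		}
	}
	return out
}

func main() {
	params := map[string]string{"host": "10.0.0.2", "port": "5432", "sslmode": "prefer"}
	fmt.Println(adaptSSLMode(params, true)["sslmode"])
	fmt.Println(adaptSSLMode(params, false)["sslmode"])
}
```

Copying the map keeps the parameters used for pg_rewind and pg_basebackup untouched, which fits the concern raised in the thread about keeping those connection paths consistent.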