spec.CheckpointBeforePgrewind #644

Open · wants to merge 1 commit into master
31 changes: 16 additions & 15 deletions cmd/keeper/cmd/keeper.go
@@ -840,8 +840,10 @@ func (p *PostgresKeeper) resync(db, followedDB *cluster.DB, tryPgrewind bool) er
// fallback to pg_basebackup
if tryPgrewind && p.usePgrewind(db) {
connParams := p.getSUConnParams(db, followedDB)
log.Infow("syncing using pg_rewind", "followedDB", followedDB.UID, "keeper", followedDB.Spec.KeeperUID)
if err := pgm.SyncFromFollowedPGRewind(connParams, p.pgSUPassword); err != nil {
checkpointBeforePgrewind := db.Spec.CheckpointBeforePgrewind
log.Infow("syncing using pg_rewind", "followedDB", followedDB.UID,
"keeper", followedDB.Spec.KeeperUID, "forcingCheckpoint", checkpointBeforePgrewind)
if err := pgm.SyncFromFollowedPGRewind(connParams, p.pgSUPassword, checkpointBeforePgrewind); err != nil {
// log pg_rewind error and fallback to pg_basebackup
log.Errorw("error syncing with pg_rewind", zap.Error(err))
} else {
@@ -1284,19 +1286,18 @@ func (p *PostgresKeeper) postgresKeeperSM(pctx context.Context) {
tryPgrewind = false
}

// TODO(sgotti) pg_rewind considers databases on the same timeline
// as in sync and doesn't check if they diverged at different
// position in previous timelines.
// So check that the db as been synced or resync again with
// pg_rewind disabled. Will need to report this upstream.

// TODO(sgotti) The rewinded standby needs wal from the master
// starting from the common ancestor, if they aren't available the
// instance will keep waiting for them, now we assume that if the
// instance isn't ready after the start timeout, it's waiting for
// wals and we'll force a full resync.
// We have to find a better way to detect if a standby is waiting
// for unavailable wals.
// TODO(sgotti) pg_rewind considers databases on the same timeline as in sync and
// doesn't check if they diverged at different position in previous timelines. So
// check that the db has been synced or resync again with pg_rewind disabled. Will
// need to report this upstream.

// TODO(sgotti) The rewinded standby needs wal from the master starting from the
// common ancestor, if they aren't available the instance will keep waiting for
// them, now we assume that if the instance isn't ready after the start timeout,
// it's waiting for wals and we'll force a full resync.
//
// We have to find a better way to detect if a standby is waiting for unavailable
// wals.

Contributor Author: I edited this comment as part of an unrelated change and line-wrapped it to match the rest of the file. Can leave it or remove it as makes most sense!

Member: ok

if err = p.resync(db, followedDB, tryPgrewind); err != nil {
log.Errorw("failed to resync from followed instance", zap.Error(err))
return
1 change: 1 addition & 0 deletions cmd/sentinel/cmd/sentinel.go
@@ -383,6 +383,7 @@ func (s *Sentinel) setDBSpecFromClusterSpec(cd *cluster.ClusterData) {
db.Spec.RequestTimeout = *clusterSpec.RequestTimeout
db.Spec.MaxStandbys = *clusterSpec.MaxStandbys
db.Spec.UsePgrewind = *clusterSpec.UsePgrewind
db.Spec.CheckpointBeforePgrewind = *clusterSpec.CheckpointBeforePgrewind
db.Spec.PGParameters = clusterSpec.PGParameters
db.Spec.PGHBA = clusterSpec.PGHBA
if db.Spec.FollowConfig != nil && db.Spec.FollowConfig.Type == cluster.FollowTypeExternal {
1 change: 1 addition & 0 deletions doc/cluster_spec.md
@@ -27,6 +27,7 @@ Some options in a running cluster specification can be changed to update the des
| additionalWalSenders | number of additional wal_senders in addition to the ones internally defined by stolon, useful to provide enough wal senders for external standbys (changing this value requires an instance restart) | no | uint16 | 5 |
| additionalMasterReplicationSlots | a list of additional physical replication slots to be created on the master postgres instance. They will be prefixed with `stolon_` (like internal replication slots used for standby replication) to make them "namespaced" from other replication slots. Replication slots starting with `stolon_` and not defined here (and not used for standby replication) will be dropped from the master instance. | no | []string | null |
| usePgrewind | try to use pg_rewind for faster instance resynchronization. | no | bool | false |
| checkpointBeforePgrewind | force a checkpoint against the current master before executing pg_rewind, preventing the rewind racing the checkpointer process after a standby is newly promoted. This will cause increased IO on whatever Postgres node the currently resyncing Postgres is following, as the checkpoint will not be immediate and will not respect checkpoint spread configuration. | no | bool | false |
| initMode | The cluster initialization mode. Can be *new* or *existing*. *new* means that a new db cluster will be created on a random keeper and the other keepers will sync with it. *existing* means that a keeper (that needs to have an already created db cluster) will be choosed as the initial master and the other keepers will sync with it. In this case the `existingConfig` object needs to be populated. | yes | string | |
| existingConfig | configuration for initMode of type "existing" | if initMode is "existing" | ExistingConfig | |
| mergePgParameters | merge pgParameters of the initialized db cluster, useful to retain initdb generated parameters when InitMode is new, retain current parameters when initMode is existing or pitr. | no | bool | true |
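As a quick illustration (not taken from the upstream docs), the new option would typically be enabled together with `usePgrewind` in the cluster specification. The field names below follow the JSON tags added in internal/cluster/cluster.go; such a patch could be applied, for example, via `stolonctl update --patch`:

```json
{
  "usePgrewind": true,
  "checkpointBeforePgrewind": true
}
```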
8 changes: 8 additions & 0 deletions internal/cluster/cluster.go
@@ -66,6 +66,7 @@ const (
DefaultMaxSynchronousStandbys uint16 = 1
DefaultAdditionalWalSenders = 5
DefaultUsePgrewind = false
DefaultCheckpointBeforePgrewind = false
DefaultMergePGParameter = true
DefaultRole ClusterRole = ClusterRoleMaster
DefaultSUReplAccess SUReplAccessMode = SUReplAccessAll
@@ -261,6 +262,8 @@ type ClusterSpec struct {
AdditionalMasterReplicationSlots []string `json:"additionalMasterReplicationSlots"`
// Whether to use pg_rewind
UsePgrewind *bool `json:"usePgrewind,omitempty"`
// Whether to issue a CHECKPOINT on the primary before attempting a pg_rewind
CheckpointBeforePgrewind *bool `json:"checkpointBeforePgrewind,omitempty"`
// InitMode defines the cluster initialization mode. Current modes are: new, existing, pitr
InitMode *ClusterInitMode `json:"initMode,omitempty"`
// Whether to merge pgParameters of the initialized db cluster, useful
@@ -379,6 +382,9 @@ func (os *ClusterSpec) WithDefaults() *ClusterSpec {
if s.UsePgrewind == nil {
s.UsePgrewind = BoolP(DefaultUsePgrewind)
}
if s.CheckpointBeforePgrewind == nil {
s.CheckpointBeforePgrewind = BoolP(DefaultCheckpointBeforePgrewind)
}
if s.MinSynchronousStandbys == nil {
s.MinSynchronousStandbys = Uint16P(DefaultMinSynchronousStandbys)
}
@@ -607,6 +613,8 @@ type DBSpec struct {
SynchronousReplication bool `json:"synchronousReplication,omitempty"`
// Whether to use pg_rewind
UsePgrewind bool `json:"usePgrewind,omitempty"`
// Whether to issue a CHECKPOINT on the primary before attempting a pg_rewind
CheckpointBeforePgrewind bool `json:"checkpointBeforePgrewind,omitempty"`
// AdditionalWalSenders defines the number of additional wal_senders in
// addition to the ones internally defined by stolon
AdditionalWalSenders uint16 `json:"additionalWalSenders"`
28 changes: 27 additions & 1 deletion internal/postgresql/postgresql.go
@@ -760,7 +760,7 @@ func (p *Manager) createPostgresqlAutoConf() error {
return nil
}

func (p *Manager) SyncFromFollowedPGRewind(followedConnParams ConnParams, password string) error {
func (p *Manager) SyncFromFollowedPGRewind(followedConnParams ConnParams, password string, forceCheckpoint bool) error {
// Remove postgresql.auto.conf since pg_rewind will error if it's a symlink to /dev/null
pgAutoConfPath := filepath.Join(p.dataDir, postgresAutoConf)
if err := os.Remove(pgAutoConfPath); err != nil && !os.IsNotExist(err) {
@@ -786,6 +786,32 @@ func (p *Manager) SyncFromFollowedPGRewind(followedConnParams ConnParams, passwo
followedConnParams.Set("options", "-c synchronous_commit=off")
followedConnString := followedConnParams.ConnString()

// We need to issue a checkpoint on the source before pg_rewind'ing because, until
// the primary checkpoints, the global/pg_control file won't contain up-to-date
// information about which timeline the primary is on.
//
// Imagine everyone is on timeline 1, then we promote a node to timeline 2. Standbys
// attempt to replicate from the newly promoted node but fail due to diverged timelines.
// pg_rewind is then used to resync the standbys, but if the new primary hasn't yet
// checkpointed, the pg_control file will tell us we're both on the same timeline (1)
// and pg_rewind will exit without performing any action.
//
// Checkpointing before invoking pg_rewind avoids this problem, at the cost of forcing
// a checkpoint on a newly promoted node, which might hurt performance. We (GoCardless)
// can't afford the hours of downtime a failed rewind and full resync would cause, so
// we accept the performance penalty.
if forceCheckpoint {
log.Infow("issuing checkpoint on primary")
psqlName := filepath.Join(p.pgBinPath, "psql")
Member: why are you using psql instead of directly calling it from go sql?

Contributor Author: I was doing this to ensure the way we connect to the Postgres for pg_rewind, pg_basebackup and for issuing a checkpoint was consistent. psql should behave exactly the same as rewind/basebackup, whereas connecting from within Go could be subtly different in many ways. Does that make sense, or do you think we should try constructing a Go connection?

https://gist.github.com/viggy28/954ff01cd3c29d317834a5c25951a1cd

@lawrencejones and @sgotti does this look okay? (since sslmode prefer is not applicable for lib/pq I have to replace that based on the SSL settings on the cluster)

cmd := exec.Command(psqlName, followedConnString, "-c", "CHECKPOINT;")
cmd.Env = append(os.Environ(), fmt.Sprintf("PGPASSFILE=%s", pgpass.Name()))
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
if err := cmd.Run(); err != nil {
return fmt.Errorf("error: %v", err)
}
}

log.Infow("running pg_rewind")
name := filepath.Join(p.pgBinPath, "pg_rewind")
cmd := exec.Command(name, "--debug", "-D", p.dataDir, "--source-server="+followedConnString)