@DimCitus (Collaborator) commented Feb 15, 2021

This allows for adjusting to a lost monitor while keeping our services
online. At run-time, we can now

$ pg_autoctl disable monitor
$ pg_autoctl enable monitor --monitor ...

Provided the order of operations is respected, the cluster continues to
behave correctly with a minimum of disturbance.

@DimCitus DimCitus added the Size:M Effort Estimate: Medium label Feb 15, 2021
@DimCitus DimCitus added this to the Sprint 2021 W6 W7 milestone Feb 15, 2021
@DimCitus DimCitus requested a review from JelteF February 15, 2021 17:10
@DimCitus DimCitus self-assigned this Feb 15, 2021
}

NodeState initialState =
pgSetup->is_in_recovery ? WAIT_STANDBY_STATE : SINGLE_STATE;


There might be a primary and a demoted primary when the monitor is enabled. How does the initialState affect things?

@DimCitus (Collaborator, Author) replied:

You are pointing exactly at why we don't have a fully automated way to register nodes to a new monitor. To work around the problem, we ask the user to be careful about the node registration ordering. Once a primary has been registered, handling the other (demoted) primary is the more interesting case to support.

I suppose in this case we are back to the previous situation, where the current state needs to be dropped and a new node created from scratch:

$ pg_autoctl stop
$ pg_autoctl drop node --pgdata ...
$ pg_autoctl create postgres --pgdata ... ...

But that's only when you're in the unfortunate position of losing the monitor node just after having lost a primary node, and you now have all the nodes back online and are picking up the pieces.

A Contributor replied:

I think that to avoid this issue we should not use is_in_recovery to determine in which state the node should be registered. It makes more sense to base this on the current state of the node:

  1. if the state is primary, wait_primary, or single (and maybe draining), we should try to join as single;
  2. in any other state, we should try to join as wait_standby.

@DimCitus (Collaborator, Author) replied:

I think there is a scenario where it's better to use is_in_recovery: when deploying to two regions for disaster recovery purposes, and the first (primary) region has just been lost, we want to promote the only node left in the DR region. It's possible to do so with:

  1. pg_autoctl disable monitor
  2. pg_ctl promote (or call the SQL function)
  3. start a new monitor
  4. pg_autoctl enable monitor

At this point the on-disk FSM state is not relevant anymore. One could argue for using pg_autoctl do fsm assign single in step 2 instead, though that command is not documented at the moment and requires PG_AUTOCTL_DEBUG=1 in the environment.
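The four steps above might look like this on the surviving DR node. This is a sketch, not a tested runbook: the --pgdata path and the monitor connection string are placeholders, and the new monitor is assumed to have been created on another host with pg_autoctl create monitor.

```shell
$ pg_autoctl disable monitor --pgdata /var/lib/postgres/node1
$ pg_ctl promote -D /var/lib/postgres/node1
$ # ... create and start a new monitor on another host, then:
$ pg_autoctl enable monitor --pgdata /var/lib/postgres/node1 \
      --monitor 'postgres://autoctl_node@new-monitor/pg_auto_failover'
```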


DimCitus added 11 commits March 18, 2021 16:18
Reduce each matrix job to a smaller work unit; it seems Travis is having
trouble again with too many things in a single work unit.
@DimCitus DimCitus force-pushed the feature/reset-state branch from 8eca60c to a869445 Compare March 18, 2021 15:18
@DimCitus DimCitus merged commit 78132d1 into master Mar 22, 2021
@DimCitus DimCitus deleted the feature/reset-state branch March 22, 2021 17:09