Preventing Infinite Rounds of Leader Establishment #9498

Closed
mkeeler opened this issue Jan 5, 2021 · 0 comments · Fixed by #9570

Problem

Currently, the process of establishing a leader for a set of Consul servers involves a number of steps (a rough sketch of the sequence follows the list):

  1. Election of a leader via Raft.
  2. Initializing ACLs.
  3. Setting up KV deletion management timers. We keep deleted KV entries around a while longer to prevent the blocking query index from going backwards.
  4. Initializing session timers.
  5. Initializing namespaces (Enterprise only).
  6. Bootstrapping config entries. Config entries placed in the Consul configuration are inserted if they don't already exist.
  7. Initializing and starting autopilot.
  8. Initializing the Connect CA.
  9. Starting config entry replication.
  10. Starting the Connect leader routines. There are several of these, performing activities such as monitoring the primary DC's roots and refreshing the secondary DC's intermediate certificate.
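
For illustration only, here is a minimal sketch of how these steps run in sequence. The names below are hypothetical stand-ins, not Consul's actual internals; the point is that any step returning an error aborts the whole process and forces another round of leader establishment.

```go
package leader

import (
	"context"
	"fmt"
)

// step is one stage of leader establishment.
type step struct {
	name string
	run  func(context.Context) error
}

// establishLeadership runs each step in order; any error aborts the whole
// process, the server revokes leadership, and another round begins.
func establishLeadership(ctx context.Context, steps []step) error {
	for _, st := range steps {
		if err := st.run(ctx); err != nil {
			return fmt.Errorf("leader establishment: %s: %w", st.name, err)
		}
	}
	return nil
}
```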

A failure in any of the following steps will halt leader establishment:

  • ACL initialization
  • Session timer initialization
  • Namespace initialization
  • Config entry bootstrapping
  • Connect CA initialization

For some of these, we can reasonably expect that a failure indicates something intrinsically wrong with a single node:

  • Session timer initialization - This can only fail if we fail to look up sessions from memdb or fail to reset the timers associated with a session. Both cases should generally be impossible in the absence of some sort of memory corruption.
  • Namespace initialization - Here we simply insert the default namespace definition if it does not exist. This would require memdb or Raft to be inoperable for it to fail.
  • ACL initialization - This “step” actually performs a number of operations such as inserting the master token, anonymous token, and global management policy. Additionally, there is a step performed in secondary datacenters to clean up tokens affected by a security issue fixed back in the 1.7.x timeframe. All of these operations basically read from memdb to see if the data is there and, if not, insert it via Raft (a sketch of this check-then-insert pattern follows this list).
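
A hedged sketch of the check-then-insert pattern these initialization steps share. The types and names below are hypothetical stand-ins, not Consul's actual API; the point is that either failure path implies a broken local state store or an unhealthy Raft layer.

```go
package leader

import "fmt"

// Hypothetical stand-ins for Consul's state store and Raft apply path.
type Policy struct{ ID, Name string }

type stateStore interface {
	GetPolicy(id string) (*Policy, error)
}

type raftApplier interface {
	ApplyPolicyUpsert(p Policy) error
}

// ensureDefaultPolicy sketches the check-then-insert pattern: read from the
// local memdb and, only if the data is missing, write it through Raft.
// Either failure halts leader establishment today.
func ensureDefaultPolicy(store stateStore, raft raftApplier, p Policy) error {
	existing, err := store.GetPolicy(p.ID)
	if err != nil {
		// A read failure implies the local memdb is broken.
		return fmt.Errorf("reading policy %q: %w", p.ID, err)
	}
	if existing != nil {
		return nil // already bootstrapped
	}
	if err := raft.ApplyPolicyUpsert(p); err != nil {
		// A write failure implies Raft itself is unhealthy.
		return fmt.Errorf("inserting policy %q: %w", p.ID, err)
	}
	return nil
}
```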

The two steps that can halt leader establishment but do not necessarily imply an issue with the node itself are:

  • Config entry bootstrapping
  • Connect CA initialization

Config entry bootstrapping is similar to namespace initialization in that all it is doing is reading from memdb and conditionally inserting new data through Raft. Where it differs is that the data being inserted is wholly controlled by the user writing the configuration. Config entries also have dependencies between each other and must appear in the proper order within the configuration. Therefore, if the configuration is syntactically valid but the dependencies and ordering between the config entries are not correct, it will prevent establishing a leader. Assuming all the servers have the same configuration, it would be impossible to elect a leader, with the only remediation being to fix the configurations and restart the servers. This is bad, but at least it is fixable by an operator such as the HashiCorp team operating Consul for HCP/HCS.
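A minimal sketch of that behaviour, assuming entries are applied in the order they appear in the server configuration (the names and the apply hook are hypothetical):

```go
package leader

import "fmt"

// ConfigEntry is a hypothetical stand-in for a user-supplied config entry.
type ConfigEntry struct {
	Kind, Name string
}

// bootstrapConfigEntries applies the entries from the server configuration
// in order. If one fails validation (for example because an entry it depends
// on has not been written yet), the error propagates and, since every server
// shares the same configuration, no server can establish leadership.
func bootstrapConfigEntries(apply func(ConfigEntry) error, entries []ConfigEntry) error {
	for i, e := range entries {
		if err := apply(e); err != nil {
			return fmt.Errorf("bootstrapping config entry %d (%s/%s): %w", i, e.Kind, e.Name, err)
		}
	}
	return nil
}
```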

Connect CA initialization involves many steps, but they generally fall into the categories of CA provider configuration, certificate generation, or memdb state management. The memdb state management routines can fail, but that would indicate a Raft/memdb failure similar to namespace or ACL initialization. The CA provider configuration involves interfaces for the various CA provider implementations. As it is an interface extension point, this could fail for reasons related to an issue on the local node, or it could involve issues that affect all nodes and are outside our control. Similarly, the certificate generation failure modes involve things outside our control. If you are using the Vault or AWS PCA CA providers and the external system is not available, then the process of generating certificates will fail and halt leader establishment. One big problem with this is that fixing how the Consul servers contact the CA provider requires a leader to make the changes.
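A minimal sketch of why this step can block on external systems. The interface below is a hypothetical stand-in for the CA provider extension point, not the actual one.

```go
package leader

import "fmt"

// caProvider is a hypothetical stand-in for the CA provider extension point;
// Vault and AWS PCA implementations talk to external systems over the network.
type caProvider interface {
	Configure(cfg map[string]interface{}) error
	GenerateRoot() (pemCert string, err error)
}

// initializeCA configures the provider and generates the root certificate.
// With an external provider, a network or remote-system failure surfaces
// here and, today, halts leader establishment even though the node itself
// is healthy.
func initializeCA(p caProvider, cfg map[string]interface{}) (string, error) {
	if err := p.Configure(cfg); err != nil {
		return "", fmt.Errorf("configuring CA provider: %w", err)
	}
	root, err := p.GenerateRoot()
	if err != nil {
		return "", fmt.Errorf("generating root certificate: %w", err)
	}
	return root, nil
}
```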

Suggested Changes

Connect CA initialization should not be able to halt leader establishment. As this process can have dependencies on external systems (or even just other Consul datacenters), we need to prevent network connectivity issues or failures in those remote systems from preventing Consul leader establishment. For added context, see issue #7529, a bug report about a set of servers being unable to establish a leader due to region configuration issues.

Whether config entry bootstrapping should be made to not halt the process is debatable. At least in this case, operators who grant others access to the cluster cannot have those users cause the issue, since the data comes from the server configuration. Additionally, and maybe more importantly, if there is an issue the operator can simply fix the config.

Implementation Notes

It will not be sufficient to just suppress the errors that could happen during Connect CA initialization. Instead, we will probably need to start goroutines to periodically retry that initialization.
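
A minimal sketch of what that might look like, assuming leadership loss is signalled by cancelling a context (all names below are hypothetical):

```go
package leader

import (
	"context"
	"log"
	"time"
)

// retryCAInit sketches the suggested approach: rather than failing leader
// establishment, kick off a goroutine that retries CA initialization with
// capped exponential backoff until it succeeds or leadership is lost.
func retryCAInit(ctx context.Context, logger *log.Logger, init func(context.Context) error) {
	go func() {
		backoff := time.Second
		const maxBackoff = time.Minute
		for {
			err := init(ctx)
			if err == nil {
				return
			}
			logger.Printf("connect CA initialization failed, retrying in %s: %v", backoff, err)

			select {
			case <-ctx.Done():
				// Leadership was lost; stop retrying.
				return
			case <-time.After(backoff):
			}

			backoff *= 2
			if backoff > maxBackoff {
				backoff = maxBackoff
			}
		}
	}()
}
```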
