Preventing Infinite Rounds of Leader Establishment #9498

Closed
mkeeler opened this issue Jan 5, 2021 · 0 comments · Fixed by #9570

Problem

Currently, the process of establishing a leader for a set of Consul servers involves a number of steps (a rough sketch of the sequence follows the list):

  1. Election of a leader via Raft.
  2. Initializing ACLs.
  3. Setting up KV deletion management timers. We keep deleted KV entries around a while longer to prevent the blocking query index from going backwards.
  4. Initializing session timers.
  5. Initializing namespaces (Enterprise only).
  6. Bootstrapping config entries. Config entries placed in the Consul configuration are inserted if they don't already exist.
  7. Initializing and starting autopilot.
  8. Initializing the Connect CA.
  9. Starting config entry replication.
  10. Starting the Connect leader routines. There are several of these, performing activities such as monitoring the primary DC's roots and refreshing the secondary DC's intermediate certificate.
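
For illustration only, here is a minimal sketch of how these steps run in sequence. The names below are hypothetical stand-ins, not Consul's actual internals; the point is that any step returning an error aborts the whole process and forces another round of leader establishment.

```go
package leader

import (
	"context"
	"fmt"
)

// step is one stage of leader establishment.
type step struct {
	name string
	run  func(context.Context) error
}

// establishLeadership runs each step in order; any error aborts the whole
// process, the server revokes leadership, and another round begins.
func establishLeadership(ctx context.Context, steps []step) error {
	for _, st := range steps {
		if err := st.run(ctx); err != nil {
			return fmt.Errorf("leader establishment: %s: %w", st.name, err)
		}
	}
	return nil
}
```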

A failure in any of the following steps will halt leader establishment:

  • ACL initialization
  • Session timer initialization
  • Namespace initialization
  • Config entry bootstrapping
  • Connect CA initialization

For some of these, we can reasonably expect that a failure indicates something intrinsically wrong with a single node:

  • Session timer initialization - This can only fail if we fail to look up sessions from memdb or fail to reset the timers associated with a session. Both cases should generally be impossible in the absence of some sort of memory corruption.
  • Namespace initialization - Here we simply insert the default namespace definition if it does not exist. This would require memdb or Raft to be inoperable for it to fail.
  • ACL initialization - This “step” actually performs a number of operations such as inserting the master token, anonymous token, and global management policy. Additionally, there is a step performed in secondary datacenters to clean up tokens affected by a security issue fixed back in the 1.7.x timeframe. All of these operations basically read from memdb to see if the data is there and, if not, insert it via Raft (a sketch of this check-then-insert pattern follows this list).
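
A hedged sketch of the check-then-insert pattern these initialization steps share. The types and names below are hypothetical stand-ins, not Consul's actual API; the point is that either failure path implies a broken local state store or an unhealthy Raft layer.

```go
package leader

import "fmt"

// Hypothetical stand-ins for Consul's state store and Raft apply path.
type Policy struct{ ID, Name string }

type stateStore interface {
	GetPolicy(id string) (*Policy, error)
}

type raftApplier interface {
	ApplyPolicyUpsert(p Policy) error
}

// ensureDefaultPolicy sketches the check-then-insert pattern: read from the
// local memdb and, only if the data is missing, write it through Raft.
// Either failure halts leader establishment today.
func ensureDefaultPolicy(store stateStore, raft raftApplier, p Policy) error {
	existing, err := store.GetPolicy(p.ID)
	if err != nil {
		// A read failure implies the local memdb is broken.
		return fmt.Errorf("reading policy %q: %w", p.ID, err)
	}
	if existing != nil {
		return nil // already bootstrapped
	}
	if err := raft.ApplyPolicyUpsert(p); err != nil {
		// A write failure implies Raft itself is unhealthy.
		return fmt.Errorf("inserting policy %q: %w", p.ID, err)
	}
	return nil
}
```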

The two steps that can halt leader establishment but do not necessarily imply an issue with the node itself are:

  • Config entry bootstrapping
  • Connect CA initialization

Config entry bootstrapping is similar to namespace initialization in that all it is doing is reading from memdb and conditionally inserting new data through Raft. Where it differs is that the data being inserted is wholly controlled by the user writing the configuration. Config entries also have dependencies between each other and must appear in the proper order within the configuration. Therefore, if the configuration is syntactically valid but the dependencies and ordering between the config entries are not correct, it will prevent establishing a leader. Assuming all the servers have the same configuration, it would be impossible to elect a leader, with the only remediation being to fix the configurations and restart the servers. This is bad, but at least it is fixable by an operator such as the HashiCorp team operating Consul for HCP/HCS.
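A minimal sketch of that behaviour, assuming entries are applied in the order they appear in the server configuration (the names and the apply hook are hypothetical):

```go
package leader

import "fmt"

// ConfigEntry is a hypothetical stand-in for a user-supplied config entry.
type ConfigEntry struct {
	Kind, Name string
}

// bootstrapConfigEntries applies the entries from the server configuration
// in order. If one fails validation (for example because an entry it depends
// on has not been written yet), the error propagates and, since every server
// shares the same configuration, no server can establish leadership.
func bootstrapConfigEntries(apply func(ConfigEntry) error, entries []ConfigEntry) error {
	for i, e := range entries {
		if err := apply(e); err != nil {
			return fmt.Errorf("bootstrapping config entry %d (%s/%s): %w", i, e.Kind, e.Name, err)
		}
	}
	return nil
}
```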

Connect CA initialization involves many steps, but they generally fall into the categories of CA provider configuration, certificate generation, or memdb state management. The memdb state management routines can fail, but that would indicate a Raft/memdb failure similar to namespace or ACL initialization. The CA provider configuration involves interfaces for the various CA provider implementations. As it is an interface extension point, this could fail for reasons related to an issue on the local node, or it could involve issues that affect all nodes and are outside our control. Similarly, the certificate generation failure modes involve things outside our control. If you are using the Vault or AWS PCA CA providers and the external system is not available, then the process of generating certificates will fail and halt leader establishment. One big problem with this is that fixing how the Consul servers contact the CA provider requires a leader to make the changes.
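A minimal sketch of why this step can block on external systems. The interface below is a hypothetical stand-in for the CA provider extension point, not the actual one.

```go
package leader

import "fmt"

// caProvider is a hypothetical stand-in for the CA provider extension point;
// Vault and AWS PCA implementations talk to external systems over the network.
type caProvider interface {
	Configure(cfg map[string]interface{}) error
	GenerateRoot() (pemCert string, err error)
}

// initializeCA configures the provider and generates the root certificate.
// With an external provider, a network or remote-system failure surfaces
// here and, today, halts leader establishment even though the node itself
// is healthy.
func initializeCA(p caProvider, cfg map[string]interface{}) (string, error) {
	if err := p.Configure(cfg); err != nil {
		return "", fmt.Errorf("configuring CA provider: %w", err)
	}
	root, err := p.GenerateRoot()
	if err != nil {
		return "", fmt.Errorf("generating root certificate: %w", err)
	}
	return root, nil
}
```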

Suggested Changes

Connect CA initialization should not be able to halt leader establishment. As this process can have dependencies on external systems (or even just other Consul datacenters), we need to prevent network connectivity issues or failures in those remote systems from preventing Consul leader establishment. For added context, see issue #7529, a bug report about a set of servers being unable to establish a leader due to region configuration issues.

Whether config entry bootstrapping should be made to not halt the process is debatable. At least in this case, operators who grant others access to the cluster cannot have those users cause the issue, since the data comes from the server configuration. Additionally, and maybe more importantly, if there is an issue the operator can simply fix the config.

Implementation Notes

It will not be sufficient to just suppress the errors that could happen during Connect CA initialization. Instead, we will probably need to start goroutines to periodically retry that initialization.
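
A minimal sketch of what that might look like, assuming leadership loss is signalled by cancelling a context (all names below are hypothetical):

```go
package leader

import (
	"context"
	"log"
	"time"
)

// retryCAInit sketches the suggested approach: rather than failing leader
// establishment, kick off a goroutine that retries CA initialization with
// capped exponential backoff until it succeeds or leadership is lost.
func retryCAInit(ctx context.Context, logger *log.Logger, init func(context.Context) error) {
	go func() {
		backoff := time.Second
		const maxBackoff = time.Minute
		for {
			err := init(ctx)
			if err == nil {
				return
			}
			logger.Printf("connect CA initialization failed, retrying in %s: %v", backoff, err)

			select {
			case <-ctx.Done():
				// Leadership was lost; stop retrying.
				return
			case <-time.After(backoff):
			}

			backoff *= 2
			if backoff > maxBackoff {
				backoff = maxBackoff
			}
		}
	}()
}
```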
