Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid raft change when no config is provided on persistNewRootAndConfig #12298

Merged
merged 5 commits into from
Mar 3, 2022

Conversation

jorgemarey
Copy link
Contributor

I saw in the consul logs that every 10 minutes:

Feb  8 20:07:08 ctrl-5c33edfc-2b {"@level":"info","@message":"updated root certificates from primary datacenter","@module":"agent.server.connect","@timestamp":"2022-02-08T20:07:08.217605Z"}
Feb  8 20:17:14 ctrl-5c33edfc-2b {"@level":"info","@message":"updated root certificates from primary datacenter","@module":"agent.server.connect","@timestamp":"2022-02-08T20:17:14.111028Z"}
Feb  8 20:27:34 ctrl-5c33edfc-2b {"@level":"info","@message":"updated root certificates from primary datacenter","@module":"agent.server.connect","@timestamp":"2022-02-08T20:27:34.153630Z"}
Feb  8 20:37:37 ctrl-5c33edfc-2b {"@level":"info","@message":"updated root certificates from primary datacenter","@module":"agent.server.connect","@timestamp":"2022-02-08T20:37:37.123134Z"}
Feb  8 20:46:14 ctrl-5c33edfc-2b {"@level":"info","@message":"updated root certificates from primary datacenter","@module":"agent.server.connect","@timestamp":"2022-02-08T20:46:14.057424Z"}
Feb  8 20:56:18 ctrl-5c33edfc-2b {"@level":"info","@message":"updated root certificates from primary datacenter","@module":"agent.server.connect","@timestamp":"2022-02-08T20:56:18.133358Z"}

I checked that actually no change was made to the roots or configuration so I started to look at the code and I think I found the problem.

// secondaryCARootWatch maintains a blocking query to the primary datacenter's
// ConnectCA.Roots endpoint to monitor when it needs to request a new signed
// intermediate certificate.
func (c *CAManager) secondaryCARootWatch(ctx context.Context) error {
....
		// Attempt to update the roots using the returned data.
		if err := c.secondaryUpdateRoots(roots); err != nil {
			return err
		}
....

	return nil
}

// secondaryUpdateRoots updates the cached roots from the primary and regenerates the intermediate
// certificate if necessary.
func (c *CAManager) secondaryUpdateRoots(roots structs.IndexedCARoots) error {
	// Update the state first to claim the 'lock'.
	if _, err := c.setState(caStateReconfig, true); err != nil {
		return err
	}
	defer c.setState(caStateInitialized, false)

	// Update the cached primary roots now that the lock is held.
	c.secondarySetPrimaryRoots(roots)

	provider, _ := c.getCAProvider()
	if provider == nil {
		// this happens when leadership is being revoked and this go routine will be stopped
		return nil
	}

	// Run the secondary CA init routine to see if we need to request a new
	// intermediate.
	if c.secondaryHasProviderRoots() {
		if err := c.secondaryInitializeIntermediateCA(provider, nil); err != nil {
			return fmt.Errorf("Failed to initialize the secondary CA: %v", err)
		}
		return nil
	}
...
}


func (c *CAManager) secondaryInitializeIntermediateCA(provider ca.Provider, config *structs.CAConfiguration) error {
....
	// Determine whether a root update is needed, and persist the roots/config accordingly.
	var newRoot *structs.CARoot
	if activeRoot == nil || needsNewIntermediate {
		newRoot = newActiveRoot
	}
	if err := c.persistNewRootAndConfig(provider, newRoot, config); err != nil {
		return err
	}

	c.setCAProvider(provider, newActiveRoot)
	return nil
}

persistNewRootAndConfig is called throught the replication process with nil config and roots when no change needs to be performed.

With this PR, a change to the raft store is avoided when no roots or config are provided to persistNewRootAndConfig

@dnephin dnephin added theme/certificates Related to creating, distributing, and rotating certificates in Consul type/bug Feature does not function as expected labels Feb 9, 2022
@dnephin dnephin self-assigned this Feb 9, 2022
Copy link
Contributor

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! I think you are correct, we can avoid this unnecessary raft apply, nice find!

This comparison is already pretty challenging to read. I think we'd benefit from extracting this to a function. That allows us to break up the logic a little bit like this:

(writing this quickly, so the naming and logic may not be right)

func shouldPersistRootAndConfig(root *structs.CARoot, prevConfig, nextConfig *structs.CAConfiguration) bool {
    if root != nil {
        return true
    }

    if nextConfig == nil {
        return false
    }

    return nextConfig.Provider != prevConfig.Provider || !reflect.DeepEqual(nextConfig.Config, prevConfig.Config)
}

It would also be great to have a test case that shows this change fixes the problem. I guess that updating the config in the primary in such a way that changes the CARoot struct, but does not create a new root cert would accomplish that. This #11811 (comment) mentions one case I noticed where this can happen.

@jorgemarey jorgemarey force-pushed the b-persistnewrootandconfig branch 2 times, most recently from cca9bfa to f447167 Compare February 10, 2022 08:25
@jorgemarey
Copy link
Contributor Author

Hi @dnephin I made the suggested changes and added a test case for this. I also added a chagelog file.

Copy link
Contributor

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the test! Sorry it has taken me a few weeks to respond.

This is looking great. I think the test case is close, but doesn't quite exercise the behaviour we want yet. I left a few suggestions for how to remove some things we no longer need, and to better test the behaviour that was changed.

Comment on lines +770 to +768
if newConfig == nil {
return false
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just calling this out for anyone else reviewing or reading over this PR.

This condition is the new one that fixes the problem. Previously we were checking if config != nil to prevent reading a field from a nil. Here we instead say that if the config is nil then we know there is nothing to update, otherwise use the existing logic to see if the config has changed.

agent/consul/leader_connect_ca_test.go Outdated Show resolved Hide resolved
agent/consul/leader_connect_ca_test.go Outdated Show resolved Hide resolved
agent/consul/leader_connect_ca_test.go Outdated Show resolved Hide resolved
agent/consul/leader_connect_ca_test.go Outdated Show resolved Hide resolved
agent/consul/leader_connect_ca_test.go Outdated Show resolved Hide resolved
@jorgemarey
Copy link
Contributor Author

Hi @dnephin I pushed a new commit with the suggested changes. I hope I have understood everything correctly.

Copy link
Contributor

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This is exactly what I was thinking. I think with these two small changes this should be ready to merge. I'll commit these suggestions and see if CI is happy.

agent/consul/leader_connect_ca_test.go Show resolved Hide resolved
agent/consul/leader_connect_ca_test.go Outdated Show resolved Hide resolved
Also change the path used for the secondary so that both primary and secondary do not overwrite each other.
Copy link
Contributor

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like CI is happy with those changes. Thanks for making this improvement!

@dnephin dnephin merged commit 5ba994a into hashicorp:main Mar 3, 2022
@hc-github-team-consul-core
Copy link
Collaborator

🍒 If backport labels were added before merging, cherry-picking will start automatically.

To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/601036.

dclaisse added a commit to dclaisse/consul that referenced this pull request Oct 4, 2022
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
dclaisse added a commit to dclaisse/consul that referenced this pull request Oct 4, 2022
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
dclaisse added a commit to dclaisse/consul that referenced this pull request Oct 4, 2022
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
dclaisse added a commit to dclaisse/consul that referenced this pull request Oct 4, 2022
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
dclaisse added a commit to dclaisse/consul that referenced this pull request Oct 4, 2022
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
dclaisse added a commit to dclaisse/consul that referenced this pull request Oct 4, 2022
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
dclaisse added a commit to dclaisse/consul that referenced this pull request Oct 4, 2022
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
dclaisse added a commit to dclaisse/consul that referenced this pull request Oct 5, 2022
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
dclaisse added a commit to dclaisse/consul that referenced this pull request Oct 5, 2022
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
dclaisse added a commit to dclaisse/consul that referenced this pull request Oct 5, 2022
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
dclaisse added a commit to dclaisse/consul that referenced this pull request Oct 5, 2022
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
dclaisse added a commit to criteo-forks/consul that referenced this pull request Oct 7, 2022
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
dclaisse added a commit to criteo-forks/consul that referenced this pull request Oct 10, 2022
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
dclaisse added a commit to criteo-forks/consul that referenced this pull request Oct 18, 2022
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
dclaisse added a commit to criteo-forks/consul that referenced this pull request Oct 24, 2022
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
dclaisse added a commit to criteo-forks/consul that referenced this pull request Nov 15, 2022
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
mbrulatout pushed a commit to criteo-forks/consul that referenced this pull request Mar 31, 2023
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
mbrulatout pushed a commit to criteo-forks/consul that referenced this pull request Apr 3, 2023
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
mbrulatout pushed a commit to criteo-forks/consul that referenced this pull request Apr 18, 2023
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
mbrulatout pushed a commit to criteo-forks/consul that referenced this pull request May 23, 2023
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
mbrulatout pushed a commit to criteo-forks/consul that referenced this pull request May 26, 2023
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
mougams pushed a commit to mougams/consul that referenced this pull request Jun 7, 2023
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
mougams pushed a commit to criteo-forks/consul that referenced this pull request Jun 9, 2023
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
mbrulatout pushed a commit to criteo-forks/consul that referenced this pull request Jun 9, 2023
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
mbrulatout pushed a commit to criteo-forks/consul that referenced this pull request Jun 9, 2023
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
mougams pushed a commit to mougams/consul that referenced this pull request Jul 11, 2023
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
mougams pushed a commit to mougams/consul that referenced this pull request Jul 11, 2023
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
mougams pushed a commit to criteo-forks/consul that referenced this pull request Jul 11, 2023
hashicorp#12298 introduced an improvement
to prevent raft commit for no-op operations, but while moving conditions
into a function, author inversed the logic while keep same boolean
operations. As a consequence, this prevents one to update provider config
of a secondary CA.
Correct code was suggested in a review comment anyway, so apply it:
hashicorp#12298 (review)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme/certificates Related to creating, distributing, and rotating certificates in Consul type/bug Feature does not function as expected
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants