Fix race in "spire" UpstreamAuthority plugin #1917
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The "spire" UpstreamAuthority plugin maintains a local copy of the upstream bundle. When the upstream bundle changes, updates are sent back to SPIRE core. The local copy is adjusted synchronously with the response of NewDownstreamX509CA and PublishJWTAuthority RPCs. However, there is also an asynchronous goroutine polling for updates to the bundle.
To prevent the pollling goroutine from overwriting the results of the aforementioned RPCs, the plugin maintains a simple "version" number that is bumped whenever the bundle is updated. The polling goroutine captures
the version number before polling for the bundle and is supposed to only update the local copy if the local copy hasn't otherwise been updated (by NewDownstreamX509CA or PublishJWTAuthority responses).
However, the version number check and update operation, i.e. "replace if version matches" operation does not take place under the same lock, opening up a race condition where the version check succeeds, the local copy is updated by other goroutines, and then those updates are overwritten by the now-stale bundle retrieved by the polling goroutine.
This PR fixes that race condition by performing the version check and replacement while under the lock.
It also fixes races in the unit-tests due to the wrong mock clock implemtnation being used and not triggering timers enough to ensure updates have been picked up before asserting the response.