Prevent mutations by entry cache callers #3215
Merged
Recently, we introduced a change wherein the agents supply an output mask to the server (#3123) to reduce bandwidth usage.
This exposed a bug in the interaction between the SVID API handler and the entry cache. The cache currently returns its owned copy of the entry to callers. This was done for performance reasons: making a copy of each entry increases memory pressure in one of the hottest codepaths in the server.
Because of this behavior, the SVID handler, when applying the mask to remove fields from entries before including them in the response, was inadvertently stripping fields from the entries held in the cache. This not only caused temporary data loss on those entries (e.g. DNS names, which the next cache refresh would restore) but could also easily become a data race, since multiple callers could mutate the same entry's fields concurrently without any lock protection.
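For illustration, here is a minimal sketch of the aliasing problem described above. It does not use the real SPIRE types: `Entry`, `sharedCache`, and `applyMask` are hypothetical stand-ins, and the mask is reduced to a single boolean.

```go
package main

import "fmt"

// Entry is a hypothetical, simplified stand-in for a registration entry.
type Entry struct {
	ID       string
	DNSNames []string
}

// sharedCache is a hypothetical cache that hands out its own pointers,
// so callers and the cache share the same Entry value.
type sharedCache struct {
	entries map[string]*Entry
}

// Get returns the cache-owned entry without copying it.
func (c *sharedCache) Get(id string) *Entry {
	return c.entries[id]
}

// applyMask is a hypothetical mask step: it clears fields the caller
// did not ask for, mutating the entry in place.
func applyMask(e *Entry, includeDNSNames bool) {
	if !includeDNSNames {
		e.DNSNames = nil
	}
}

func main() {
	cache := &sharedCache{entries: map[string]*Entry{
		"a": {ID: "a", DNSNames: []string{"example.org"}},
	}}

	// The handler masks the entry it got back from the cache...
	applyMask(cache.Get("a"), false)

	// ...and the cached entry has lost its DNS names as well.
	fmt.Println(len(cache.Get("a").DNSNames)) // prints 0
}
```

With several handlers doing this concurrently, the unsynchronized writes to the shared entry would also be a data race.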
This change updates the cache to clone the entries before returning them to the caller. Although this results in some increase in memory pressure, it is the cleanest and most robust approach. If the increase in memory pressure turns out to be too much, we can explore other options, though those may come with a large cost in code complexity (e.g. on-demand cloning of shared data structures). Even if we did something cute, GetAuthorizedEntries is by far the RPC that agents call most often, and it would need to clone anyway to apply the mask.
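The clone-on-return approach can be sketched roughly as follows. This is an assumption-laden illustration, not the code in this change: `Entry`, `cloningCache`, and the hand-written `clone` method are hypothetical, and the real cache may copy entries differently.

```go
package main

import (
	"fmt"
	"sync"
)

// Entry is a hypothetical, simplified stand-in for a registration entry.
type Entry struct {
	ID       string
	DNSNames []string
}

// clone returns a deep copy so the caller shares no memory with the cache.
func (e *Entry) clone() *Entry {
	c := *e
	c.DNSNames = append([]string(nil), e.DNSNames...)
	return &c
}

// cloningCache is a hypothetical cache that never exposes its own entries.
type cloningCache struct {
	mu      sync.RWMutex
	entries map[string]*Entry
}

// Get returns a copy of the cached entry, or nil if it is not present.
func (c *cloningCache) Get(id string) *Entry {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.entries[id]
	if !ok {
		return nil
	}
	return e.clone()
}

func main() {
	cache := &cloningCache{entries: map[string]*Entry{
		"a": {ID: "a", DNSNames: []string{"example.org"}},
	}}

	// The caller is free to strip fields from its own copy.
	e := cache.Get("a")
	e.DNSNames = nil

	// The cached entry is unaffected.
	fmt.Println(cache.Get("a").DNSNames) // prints [example.org]
}
```

The cost is one allocation per returned entry (plus one per copied slice), which is the memory-pressure trade-off discussed above.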
Fixes: #3184