fix(google): don't leave orphaned applications in the cache #4123
If an AbstractGoogleServerGroupCachingAgent has some applications stored,
but in the next run it finds no applications, it doesn't put anything
under the applications key. DefaultProviderCache then has some weird
behavior where it removes all relationships associated with that
application, but leaves the existing application in the cache. That
means we'll just repeat that behavior again the next time the caching
agent runs. The application data is just orphaned but triggers
relationship evictions every single caching cycle.
Amazon doesn't have this problem because they always stick a
List<CacheData> under the applications key, even if it's empty. This is
enough to tell putCacheResult() to store that empty list, overwriting
whatever was there before.
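To make the difference concrete, here is a minimal, self-contained sketch. It is not the clouddriver code: ToyCache, buildResult, and the alwaysEmitKey flag are hypothetical names invented for illustration, and the toy putCacheResult() models only the "a missing key leaves old entries alone, a present key overwrites them" behavior described above. It does not model relationship eviction.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class OrphanedCacheSketch {

    /** Hypothetical stand-in for a provider cache: type -> ids. Not DefaultProviderCache. */
    public static final class ToyCache {
        private final Map<String, Set<String>> store = new HashMap<>();

        /**
         * Types that are absent from the result are left untouched.
         * Types that are present, even with an empty list, replace
         * whatever was stored under that type before.
         */
        public void putCacheResult(Map<String, List<String>> result) {
            for (Map.Entry<String, List<String>> entry : result.entrySet()) {
                store.put(entry.getKey(), new TreeSet<>(entry.getValue()));
            }
        }

        public Set<String> get(String type) {
            return store.getOrDefault(type, Collections.emptySet());
        }
    }

    /**
     * Builds one caching cycle's result. With alwaysEmitKey == false the
     * "applications" key is omitted when the list is empty (the buggy shape);
     * with alwaysEmitKey == true an empty list is emitted (the fixed shape).
     */
    public static Map<String, List<String>> buildResult(
            List<String> applications, boolean alwaysEmitKey) {
        Map<String, List<String>> result = new HashMap<>();
        if (alwaysEmitKey || !applications.isEmpty()) {
            result.put("applications", applications);
        }
        return result;
    }

    public static void main(String[] args) {
        ToyCache buggy = new ToyCache();
        buggy.putCacheResult(buildResult(List.of("myapp"), false));
        buggy.putCacheResult(buildResult(List.of(), false)); // key omitted
        System.out.println("key omitted:      " + buggy.get("applications"));

        ToyCache fixed = new ToyCache();
        fixed.putCacheResult(buildResult(List.of("myapp"), true));
        fixed.putCacheResult(buildResult(List.of(), true)); // empty list stored
        System.out.println("empty list stored: " + fixed.get("applications"));
    }
}
```

In the first run the stale application survives the second cycle because nothing was written under the key; in the second run the empty list replaces it.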
The orphaned data isn't the cause of spinnaker/spinnaker#4511, but it
dramatically exacerbates that issue. (The real issue is that every
single regional/zonal caching agent asserts ownership over the
application and will happily delete it, which triggers deletions of
relationships that were put there by other caching agents.)
I was working on a larger refactor of CATS to make this bug less likely
to happen, but since we're about to cut a release branch, I'd rather
just check in this smaller fix.