Avoid panic on concurrent writes to cached service config map #10647

Merged: 3 commits merged into main from panic/avoid-map-delete on Jul 20, 2021

Conversation

@freddygv (Contributor) commented on Jul 19, 2021

If multiple instances of a service are co-located on the same node then
their proxies will all share a cache entry for their resolved service
configuration. This is because the cache key contains the name of the
watched service but does not take into account the ID of the watching
proxies.

This means that there will be multiple agent service manager watches
that can wake up on the same cache update. These watchers then
concurrently modify the value in the cache when merging the resolved
config into the local proxy definitions.

To avoid this concurrent map write we will only delete the key from
opaque config in the local proxy definition after the merge, rather
than from the cached value before the merge.

Fixes #10636
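
A minimal sketch of the before/after shape of the change, with illustrative types and key names rather than Consul's actual structs:

package example

// resolvedServiceConfig stands in for the value shared through the agent cache.
type resolvedServiceConfig struct {
	ProxyConfig map[string]interface{}
}

// mergeBefore shows the racy shape: the key is deleted from the shared cached
// map, so two watchers waking on the same cache update write to the same map.
func mergeBefore(cached *resolvedServiceConfig, local map[string]interface{}) map[string]interface{} {
	delete(cached.ProxyConfig, "protocol") // concurrent map write across watchers

	merged := make(map[string]interface{}, len(cached.ProxyConfig)+len(local))
	for k, v := range cached.ProxyConfig {
		merged[k] = v
	}
	for k, v := range local {
		merged[k] = v
	}
	return merged
}

// mergeAfter shows the fix: merge into a map owned by this watcher first, then
// delete the key from that local copy; the cached value is only ever read.
func mergeAfter(cached *resolvedServiceConfig, local map[string]interface{}) map[string]interface{} {
	merged := make(map[string]interface{}, len(cached.ProxyConfig)+len(local))
	for k, v := range cached.ProxyConfig {
		merged[k] = v
	}
	for k, v := range local {
		merged[k] = v
	}
	delete(merged, "protocol") // mutates only the per-watcher copy
	return merged
}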

@freddygv requested a review from a team on July 19, 2021 22:06
@freddygv (Contributor, Author) commented on Jul 19, 2021

Alternatively we could use different cache keys for each requesting proxy:

w.cacheKey = req.CacheInfo().Key

Rather than relying on the default:

v, err := hashstructure.Hash(struct {

This would lead to having a separate entry in the cache for each proxy.
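
A hedged sketch of what that alternative could look like, assuming a hashstructure-based key that folds in the watching proxy's ID (the request type and field names below are illustrative, not Consul's actual cache request type; hashstructure is github.com/mitchellh/hashstructure):

package example

import (
	"fmt"

	"github.com/mitchellh/hashstructure"
)

// configRequest is a hypothetical stand-in for the resolved-service-config
// cache request.
type configRequest struct {
	Datacenter string
	Name       string // name of the watched service
	ProxyID    string // ID of the watching proxy
}

// cacheKey hashes the proxy ID along with the service name, so co-located
// proxies watching the same service no longer share one cache entry.
func cacheKey(req configRequest) (string, error) {
	v, err := hashstructure.Hash(struct {
		Datacenter string
		Name       string
		ProxyID    string
	}{req.Datacenter, req.Name, req.ProxyID}, nil)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("service-config:%d", v), nil
}

The trade-off is one cache entry, and one set of server watches, per proxy instance rather than one shared entry per watched service on the node.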

@freddygv changed the title from "Avoid deleting from resolved upstream config map" to "Avoid panic on concurrent writes to cached service config map" on Jul 19, 2021
@dnephin (Contributor) left a comment


Nice! Change LGTM. Thinking about how to test this, I guess trying to reproduce the panic would be difficult. Maybe we could add a check to the two existing tests that call mergeServiceConfig to verify that the defaults are not modified after mergeServiceConfig returns?

The test copies the entire struct since there can be multiple opaque
config maps in it.
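
A hedged sketch of that check, using stand-in types and a stand-in merge helper rather than the real agent/service_manager code (copystructure is github.com/mitchellh/copystructure): deep-copy the whole defaults struct before the merge, then assert the merge left it untouched.

package example

import (
	"testing"

	"github.com/mitchellh/copystructure"
	"github.com/stretchr/testify/require"
)

// serviceDefaults stands in for the cached resolved config; the real struct
// holds several opaque config maps, which is why the whole thing is copied.
type serviceDefaults struct {
	ProxyConfig     map[string]interface{}
	UpstreamConfigs map[string]map[string]interface{}
}

// mergeServiceConfig is a stand-in for the real merge helper under test.
func mergeServiceConfig(defaults *serviceDefaults, local map[string]interface{}) map[string]interface{} {
	merged := map[string]interface{}{}
	for k, v := range defaults.ProxyConfig {
		merged[k] = v
	}
	for k, v := range local {
		merged[k] = v
	}
	delete(merged, "protocol") // must only touch the per-watcher copy
	return merged
}

func TestMergeServiceConfig_DoesNotModifyDefaults(t *testing.T) {
	defaults := &serviceDefaults{
		ProxyConfig: map[string]interface{}{"protocol": "grpc"},
		UpstreamConfigs: map[string]map[string]interface{}{
			"db": {"protocol": "tcp"},
		},
	}

	raw, err := copystructure.Copy(defaults)
	require.NoError(t, err)
	expected := raw.(*serviceDefaults)

	mergeServiceConfig(defaults, map[string]interface{}{"connect_timeout_ms": 500})

	// The cached defaults must be unchanged after the merge.
	require.Equal(t, expected, defaults)
}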
@freddygv requested a review from dnephin on July 20, 2021 15:32
@dnephin (Contributor) left a comment


LGTM!

@freddygv merged commit cf48218 into main on Jul 20, 2021
@freddygv deleted the panic/avoid-map-delete branch on July 20, 2021 16:09
@hc-github-team-consul-core (Collaborator) commented

🍒 If backport labels were added before merging, cherry-picking will start automatically.

To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/413107.

@hc-github-team-consul-core (Collaborator) commented

🍒✅ Cherry pick of commit cf48218 onto release/1.10.x succeeded!

hc-github-team-consul-core pushed a commit that referenced this pull request Jul 20, 2021
chrisboulton pushed a commit to bigcommerce/consul that referenced this pull request Sep 21, 2021
Linked issue: panic: fatal error: concurrent map writes (#10636)