Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent long delays in ExpirationManager.Stop due to blanking a large pending map #23282

Merged
merged 6 commits into from
Sep 27, 2023

Conversation

ncabatoff
Copy link
Collaborator

No description provided.

@github-actions github-actions bot added the hashicorp-contributed-pr If the PR is HashiCorp (i.e. not-community) contributed label Sep 26, 2023
@github-actions
Copy link

github-actions bot commented Sep 26, 2023

CI Results:
All Go tests succeeded! ✅

@ncabatoff ncabatoff added this to the 1.13.9 milestone Sep 26, 2023
@ncabatoff ncabatoff marked this pull request as ready for review September 26, 2023 16:29
@github-actions
Copy link

github-actions bot commented Sep 26, 2023

Build Results:
All builds succeeded! ✅

vault/expiration.go Show resolved Hide resolved
vault/expiration.go Show resolved Hide resolved
Copy link
Member

@banks banks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good and seems safe based on the diff here. That said I don't have lots of context of this subsystem so there is a whole world of subtleties I may be missing. I'd be happy to approve though it probably makes more sense to test this out with the reproduction of bad behaviour we have in a test environment to confirm it has the performance properties we expect before a final pass/approval.

newStrategy := ExpireLeaseStrategy(expireNoop)
m.expireFunc.Store(&newStrategy)
oldPending := m.pending
m.pending, m.nonexpiring, m.irrevocable = sync.Map{}, sync.Map{}, sync.Map{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes way more sense than iterating and deleting elements one by one!

// We need to stop the timers to prevent memory leaks.
info.timer.Stop()
return true
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My rationale for why this doesn't leak all these pendingInfos:

  1. m.pending is no longer pointing at this map
  2. Assuming nothing else ever holds a pointer to m.pending without also holding m.pendingLock (but need to confirm that).
  3. When this goroutine completes, nothing else in here will still be holding a reference to oldPending.
  4. So the entire oldPending map and all the pendingInfos should be GCed.

Did I miss anything? Did you already verify the assumption I called out Nick to your satisfaction? Happy to take a closer look if you'd like.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading to the end, it seems the two places that do access this do (now) take the lock for the access but not for the iteration.

So it's possible that even after this goroutine goes away that one of the walk* methods will still be iterating this pending map which is why you have the no-op expire func. Once they are done iterating though we should be OK to GC all the old maps and items. 👍

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you missed anything, thanks for verifying!

vault/expiration.go Outdated Show resolved Hide resolved
vault/expiration.go Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hashicorp-contributed-pr If the PR is HashiCorp (i.e. not-community) contributed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants