Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent long delays in ExpirationManager.Stop due to blanking a large pending map #23282

Merged
merged 6 commits into from
Sep 27, 2023
42 changes: 22 additions & 20 deletions vault/expiration.go
Original file line number Diff line number Diff line change
Expand Up @@ -860,27 +860,19 @@ func (m *ExpirationManager) Stop() error {
close(m.quitCh)

m.pendingLock.Lock()
// Replacing the entire map would cause a race with
// a simultaneous WalkTokens, which doesn't hold pendingLock.
m.pending.Range(func(key, value interface{}) bool {
info := value.(pendingInfo)
info.timer.Stop()
m.pending.Delete(key)
return true
})
oldPending := m.pending
m.pending, m.nonexpiring, m.irrevocable = sync.Map{}, sync.Map{}, sync.Map{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes way more sense than iterating and deleting elements one by one!

m.leaseCount = 0
m.nonexpiring.Range(func(key, value interface{}) bool {
m.nonexpiring.Delete(key)
return true
})
m.uniquePolicies = make(map[string][]string)
m.irrevocable.Range(func(key, _ interface{}) bool {
m.irrevocable.Delete(key)
return true
})
m.irrevocableLeaseCount = 0
m.pendingLock.Unlock()

go oldPending.Range(func(key, value interface{}) bool {
info := value.(pendingInfo)
info.timer.Stop()
ncabatoff marked this conversation as resolved.
Show resolved Hide resolved
return true
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My rationale for why this doesn't leak all these pendingInfos:

  1. m.pending is no longer pointing at this map
  2. Assuming nothing else ever holds a pointer to m.pending without also holding m.pendingLock (but need to confirm that).
  3. When this goroutine completes, nothing else in here will still be holding a reference to oldPending.
  4. So the entire oldPending map and all the pendingInfos should be GCed.

Did I miss anything? Did you already verify the assumption I called out Nick to your satisfaction? Happy to take a closer look if you'd like.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading to the end, it seems the two places that do access this do (now) take the lock for the access but not for the iteration.

So it's possible that even after this goroutine goes away that one of the walk* methods will still be iterating this pending map which is why you have the no-op expire func. Once they are done iterating though we should be OK to GC all the old maps and items. 👍

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you missed anything, thanks for verifying!


if m.inRestoreMode() {
for {
if !m.inRestoreMode() {
Expand Down Expand Up @@ -2471,8 +2463,13 @@ func (m *ExpirationManager) WalkTokens(walkFn ExpirationWalkFunction) error {
return true
}

m.pending.Range(callback)
m.nonexpiring.Range(callback)
m.pendingLock.RLock()
toWalk := []sync.Map{m.pending, m.nonexpiring}
m.pendingLock.RUnlock()

for _, m := range toWalk {
m.Range(callback)
}
raskchanky marked this conversation as resolved.
Show resolved Hide resolved

return nil
}
Expand All @@ -2495,8 +2492,13 @@ func (m *ExpirationManager) walkLeases(walkFn leaseWalkFunction) error {
return walkFn(key.(string), expireTime)
}

m.pending.Range(callback)
m.nonexpiring.Range(callback)
m.pendingLock.RLock()
toWalk := []sync.Map{m.pending, m.nonexpiring}
m.pendingLock.RUnlock()

for _, m := range toWalk {
m.Range(callback)
}

return nil
}
Expand Down
Loading