-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vault: always renew tokens using the renewal loop #18998
Conversation
Previously, a Vault token could renewed either periodically via the renewal loop or immediately by calling `RenewToken()`. But a race condition in the renewal loop could cause an attempt to renew an expired token. If both `updateCh` and `renewalCh` are active (such as when a task stops at the same time its token is waiting for renewal), the following `select` picks a `case` at random. https://github.com/hashicorp/nomad/blob/78f0c6b2a910c25522ac88ff63343bb371d72bff/client/vaultclient/vaultclient.go#L557-L564 If `case <-renewalCh` is picked, the token is incorrectly re-added to the heap, causing unnecessary renewals of a token that is already expired. https://github.com/hashicorp/nomad/blob/1604dba508b2ae8187b6b9405ce6d9b6ca8e8b4b/client/vaultclient/vaultclient.go#L505-L510 To prevent this situation, the `renew()` function should only renew tokens that are currently in the heap, so `RenewToken()` must first push the token to the heap and wait for the renewal to happen instead of calling `renew()` directly since this could cause another race condition where the token is renewed twice: once by `RenewToken()` calling `renew()` directly and a second time if the renewal happens to pick the token as soon as `RenewToken()` adds it to the heap.
84f7c5f
to
f447c0b
Compare
client/vaultclient/vaultclient.go
Outdated
// Any error returned during the periodical renewal will be written to a | ||
// buffered channel and the channel is returned instead of an actual error. | ||
// This helps the caller be notified of a renewal failure asynchronously for | ||
// appropriate actions to be taken. The caller of this function need not have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Closing a closed channel causes a panic. If the renewal loop has ownership over closing the channel, should we be more firm with this comment and say that the caller of the function must not close the channel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part of the comment already existed, so I just kept it as is, but reading again it's indeed weirdly phrased 😅
It's also not necessary because RenewToken
returns a read channel and trying to close it is a compilation error: https://go.dev/play/p/MQxK_XyGa-7
I will remove this last sentence.
client/vaultclient/vaultclient.go
Outdated
errCh := make(chan error, 1) | ||
|
||
// Create a renewal request and indicate that the identifier in the | ||
// request is a token and not a lease | ||
renewalReq := &vaultClientRenewalRequest{ | ||
renewalCh: renewalCh, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're using a variable renewalCh
in the run
method to pick up timer events to start a renew
call (something is about to happen in the near future). Whereas this channel with the same name is signaling that this particular request has been successfully renewed (something has just happened in the recent past). So maybe for clarity we should name this something like completeCh
or renewedCh
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum...good point. I renamed it to renewalLoopCh
to be more explicit, but I found very tricky to find a good name 😅
I feel like completeCh
and renewedCh
are a bit misleading because the operation is not complete (the channel will still receive values) and the token may not have been renewed due to an error.
renewalLoopCh
is kind of too specific because it leaks the notion that the renewal loop exists, but this is an internal struct so probably OK?
client/vaultclient/vaultclient.go
Outdated
// Always notify listeners that the request has been processed before | ||
// exiting. | ||
defer func() { | ||
for { | ||
select { | ||
case req.renewalCh <- struct{}{}: | ||
default: | ||
return | ||
} | ||
} | ||
}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand this block. If we send on the channel and it's blocked, we'll fall through the default and then return. Otherwise, we'll send on the channel and then loop again and try to send again. Only until the channel is blocked do we end up returning? If we're going to return if we're blocked anyways, why have the loop instead of something like this?
defer func() {
select {
case req.renewalCh <- struct{}{}:
default:
}
}()
Also, it looks like RenewToken
is the only method that populates renewalCh
and it doesn't return the channel for another caller. So we only ever read from this channel once (in RenewToken)
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, I used a loop here in case there were multiple listeners but never actually happens so I removed the outer for
loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
…se/1.7.x #19967 Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Previously, a Vault token could renewed either periodically via the renewal loop or immediately by calling `RenewToken()`. But a race condition in the renewal loop could cause an attempt to renew an expired token. If both `updateCh` and `renewalCh` are active (such as when a task stops at the same time its token is waiting for renewal), the following `select` picks a `case` at random. https://github.com/hashicorp/nomad/blob/78f0c6b2a910c25522ac88ff63343bb371d72bff/client/vaultclient/vaultclient.go#L557-L564 If `case <-renewalCh` is picked, the token is incorrectly re-added to the heap, causing unnecessary renewals of a token that is already expired. https://github.com/hashicorp/nomad/blob/1604dba508b2ae8187b6b9405ce6d9b6ca8e8b4b/client/vaultclient/vaultclient.go#L505-L510 To prevent this situation, the `renew()` function should only renew tokens that are currently in the heap, so `RenewToken()` must first push the token to the heap and wait for the renewal to happen instead of calling `renew()` directly since this could cause another race condition where the token is renewed twice: once by `RenewToken()` calling `renew()` directly and a second time if the renewal happens to pick the token as soon as `RenewToken()` adds it to the heap.
) * Revert "vault: always renew tokens using the renewal loop (hashicorp#18998)" This reverts commit 7054fe1. * test: add case for concurrent Vault token renewal
Previously, a Vault token could renewed either periodically via the renewal loop or immediately by calling `RenewToken()`. But a race condition in the renewal loop could cause an attempt to renew an expired token. If both `updateCh` and `renewalCh` are active (such as when a task stops at the same time its token is waiting for renewal), the following `select` picks a `case` at random. https://github.com/hashicorp/nomad/blob/78f0c6b2a910c25522ac88ff63343bb371d72bff/client/vaultclient/vaultclient.go#L557-L564 If `case <-renewalCh` is picked, the token is incorrectly re-added to the heap, causing unnecessary renewals of a token that is already expired. https://github.com/hashicorp/nomad/blob/1604dba508b2ae8187b6b9405ce6d9b6ca8e8b4b/client/vaultclient/vaultclient.go#L505-L510 To prevent this situation, the `renew()` function should only renew tokens that are currently in the heap, so `RenewToken()` must first push the token to the heap and wait for the renewal to happen instead of calling `renew()` directly since this could cause another race condition where the token is renewed twice: once by `RenewToken()` calling `renew()` directly and a second time if the renewal happens to pick the token as soon as `RenewToken()` adds it to the heap.
) * Revert "vault: always renew tokens using the renewal loop (hashicorp#18998)" This reverts commit 7054fe1. * test: add case for concurrent Vault token renewal
Previously, a Vault token could renewed either periodically via the renewal loop or immediately by calling
RenewToken()
.But a race condition in the renewal loop could cause an attempt to renew an expired token. If both
updateCh
andrenewalCh
are active (such as when a task stops at the same time its token is waiting for renewal), the followingselect
picks acase
at random.nomad/client/vaultclient/vaultclient.go
Lines 557 to 564 in 78f0c6b
If
case <-renewalCh
is picked, the token is incorrectly re-added to the heap, causing unnecessary renewals of a token that is already expired.nomad/client/vaultclient/vaultclient.go
Lines 505 to 510 in 1604dba
To prevent this situation, the
renew()
function should only renew tokens that are currently in the heap, soRenewToken()
must first push the token to the heap and wait for the renewal to happen instead of callingrenew()
directly since this could cause another race condition where the token is renewed twice: once byRenewToken()
callingrenew()
directly and a second time if the renewal happens to pick the token as soon asRenewToken()
adds it to the heap.Extracted from #18985