Connect Vault provider accesses privileged endpoints during rotation #13090
Comments
Hey @bazaah, thanks for bringing this to our attention. I'm not too familiar with the Vault-managed PKI paths, but from what you just showed I can see how this behavior can be confusing. I'll ask around and get back to you with a better explanation of what might be happening here; I just wanted to let you know that we're looking into this 😄
Hi @bazaah, For context, when you received the initial root CA rotation error (403 on the call to
On item 1, I agree that if the documentation is missing a Vault permission needed for root rotation to work, we need to correct the documentation. The context questions above will help me better understand what actually triggered the attempted root rotation.
On item 2, Consul would just need
It definitely wasn't an intentional rotation, though I can see how it might have been accidental. I no longer have access to the cluster (previous job), but IIRC there was a Vault agent responsible for rotating the credential Consul uses, via calls to the HTTP version of
On point 2: I'm not sure. On the one hand, signing intermediates is already a pretty dangerous privilege. On the other hand, signing arbitrary CA certs does feel like an escalation, though I'm not enough of a PKI whiz to know whether that feeling pans out. From a usability perspective, it would be a lot harder to get sign-off on giving out a
I'm not an expert in this area of the code, but I'd be surprised if rotation was intended to be triggered by something like changing the Vault token. The only part of the code that seems to call the
Org problems are just as valid as technology problems! If root rotation is needed, I'm not sure whether there's an alternate (set of) endpoint(s) that could be used in Vault to achieve the same outcome with lesser privileges. |
Leaving possible breadcrumbs on the if-guard mentioned in the previous comment: consul/agent/consul/leader_connect_ca.go Line 908 in d8983fc
The value of the ID field seems to uniquely map to root's cert: consul/agent/consul/leader_connect_ca.go Line 267 in d8983fc
The original of the consul/agent/consul/leader_connect_ca.go Line 880 in d8983fc
The Vault provider's implementation of consul/agent/connect/ca/provider_vault.go Line 278 in d8983fc
So if there were something amiss here, it might be within the Vault provider's |
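To make the breadcrumbs above a little more concrete, here is a minimal sketch of the kind of check being discussed: derive an ID from the root certificate's bytes and treat a mismatch with the active root's ID as a reason to rotate. This is an illustration only, not Consul's code. The names `rootID` and `needsRotation` are hypothetical, the SHA-256 digest is an assumed stand-in for however Consul actually computes the ID, and that the if-guard compares IDs at all is inferred from this thread rather than confirmed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// rootID derives an identifier from a certificate's raw bytes.
// Assumption: a hex-encoded SHA-256 digest stands in for the real ID scheme.
func rootID(certBytes []byte) string {
	sum := sha256.Sum256(certBytes)
	return hex.EncodeToString(sum[:])
}

// needsRotation reports whether the root returned by the provider differs
// from the root that is currently active, judged only by the derived ID.
func needsRotation(activeID string, providerRoot []byte) bool {
	return rootID(providerRoot) != activeID
}

func main() {
	// Placeholder bytes; a real check would use the DER-encoded certificates.
	active := []byte("active root certificate bytes")
	other := []byte("different root certificate bytes")

	activeID := rootID(active)
	fmt.Println(needsRotation(activeID, active)) // false: same cert, same ID
	fmt.Println(needsRotation(activeID, other))  // true: any byte change alters the ID
}
```

Under this model, anything that makes the provider hand back different root bytes would look like a new root, even if nobody meant to rotate.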
Do you happen to know what Vault version you may have been using at the time you first saw this? Had Vault's version been changed recently? (I realize this was months back and you may not know/remember.) I'm wondering if the rotation of the Vault token used by Consul's Connect CA was previously working, then stopped working, after a Consul or Vault version change. That might help narrow down what may have happened. |
It was |
Leaving more possible breadcrumbs that could be related, assuming the problem started after an upgrade in ~April 2022 to Vault 1.10.x and Consul ~1.11.5. Around that time, some changes were made on the Consul and Vault side to support having an external CA as the trusted CA when using the Vault CA provider. In other words, the "RootPKIPath" could then actually contain an intermediate CA rather than a true root CA. The Consul change was made in Consul 1.11.4. The related Vault change was made in Vault 1.10.0, affecting endpoints used by the GenerateRoot function. Those changes could be entirely unrelated to the observed behavior, but the interaction with the code path that leads to the |
Following up on:
We recently tested what happens when What could trigger a root CA rotation without changing the provider (Vault) is changing the |
Updated documentation
Based on the original concerns: We've updated the documentation on the Vault CA provider page to explicitly state:
The documentation cross-references that guidance from both the
Requesting Feedback
@bazaah: Do you feel like these documentation changes address your original concerns? (I realize there's also the open question of why this root CA rotation process was triggered in your environment in the first place, per my previous comment.)
No, the pki path never changed, either root or intermediate.
Yes, they do. I'm happy to close this now: if I had known the linked information at the time, I wouldn't have encountered this issue. The mystery of the original rotation remains, but I can't provide a real bug report for it, or an MVP, so it's not fair to keep this open for that.
I'll close the issue for now, but please re-open if you re-experience an unexpected root rotation (triggered by something other than a provider or |
Overview of the Issue
I'm using Consul Connect with a Vault-backed CA provider. Specifically, the Vault-managed PKI paths flow, wherein Consul is provided read access to the root PKI engine (and full access to its intermediate PKI engine).
While reviewing an incident report involving Connect proxies suddenly ceasing communication, I noticed x509 errors suggesting that the local Envoy proxy did not trust its upstream's certificate. This led me to review the logs from Consul, where I noticed a permission error for an endpoint I didn't recognize during a Connect CA rotation event.
Sure enough, this endpoint does appear to be required during rotation as per: https://github.com/hashicorp/consul/blob/v1.11.5/agent/connect/ca/provider_vault.go#L650-L653
However, this endpoint requires sudo privileges in Vault to access (link), which seems to void the use case of providing read-only access to the root PKI engine.
I'm not sure how to categorize this report: on the one hand it could be a doc issue, but on the other, requiring access to a sudo-protected endpoint seems to violate the use case of this particular flow.
I'm also pretty sure that Consul doesn't correctly handle rolling back to an old working CA, as I had to force another 2 rotations (Consul Provider -> Vault Provider) to fix the Envoy x509 errors. That said, I have no hard evidence of this, so I'll leave it out of this report's scope.
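For anyone wanting to reason about whether a token could reach a sudo-protected endpoint, here is a rough sketch. It assumes you already have the list of capabilities Vault reports for the token on the path in question (capability names such as read, update, sudo, deny, and root are Vault's; the function name `allowsSudoWrite` is hypothetical). Treating a write as needing create or update in addition to sudo is a simplification, not a statement of how Vault evaluates policy internally.

```go
package main

import "fmt"

// allowsSudoWrite reports whether a capability list would plausibly permit a
// write to a sudo-protected path. Simplified model, not Vault's own logic.
func allowsSudoWrite(caps []string) bool {
	set := make(map[string]bool, len(caps))
	for _, c := range caps {
		set[c] = true
	}
	if set["deny"] {
		return false // an explicit deny overrides everything else
	}
	if set["root"] {
		return true // root tokens are not restricted by policy
	}
	// A write needs create or update, and a sudo-protected path also needs sudo.
	return set["sudo"] && (set["update"] || set["create"])
}

func main() {
	fmt.Println(allowsSudoWrite([]string{"read"}))                   // false
	fmt.Println(allowsSudoWrite([]string{"read", "update"}))         // false
	fmt.Println(allowsSudoWrite([]string{"read", "update", "sudo"})) // true
}
```

The first case is the read-only root PKI access described above: under this model it can never satisfy a sudo-protected write, which matches the permission error seen during rotation.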