New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core: add jwks rpc and http api #18035
Changes from 1 commit
ab1ea61
59dd317
1f76474
b28436b
027c6b2
3f252d1
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,12 +4,79 @@ | |
package agent | ||
|
||
import ( | ||
"crypto/ed25519" | ||
"fmt" | ||
"net/http" | ||
"strings" | ||
"time" | ||
|
||
"github.com/go-jose/go-jose/v3" | ||
"github.com/hashicorp/nomad/helper" | ||
"github.com/hashicorp/nomad/nomad/structs" | ||
) | ||
|
||
// jwksMinMaxAge is the minimum amount of time the JWKS endpoint will instruct | ||
// consumers to cache a response for. | ||
const jwksMinMaxAge = 15 * time.Minute | ||
|
||
// JWKSRequest is used to handle JWKS requests. JWKS stands for JSON Web Key | ||
// Sets and returns the public keys used for signing workload identities. Third | ||
// parties may use this endpoint to validate workload identities. Consumers | ||
// should cache this endpoint, preferably until an unknown kid is encountered. | ||
func (s *HTTPServer) JWKSRequest(resp http.ResponseWriter, req *http.Request) (any, error) { | ||
if req.Method != http.MethodGet { | ||
return nil, CodedError(405, ErrInvalidMethod) | ||
} | ||
|
||
args := structs.GenericRequest{} | ||
if s.parse(resp, req, &args.Region, &args.QueryOptions) { | ||
return nil, nil | ||
} | ||
|
||
var rpcReply structs.KeyringListPublicResponse | ||
if err := s.agent.RPC("Keyring.ListPublic", &args, &rpcReply); err != nil { | ||
return nil, err | ||
} | ||
setMeta(resp, &rpcReply.QueryMeta) | ||
|
||
// Key set will change after max(CreateTime) + RotationThreshold. | ||
var newestKey int64 | ||
jwks := make([]jose.JSONWebKey, 0, len(rpcReply.PublicKeys)) | ||
for _, pubKey := range rpcReply.PublicKeys { | ||
if pubKey.CreateTime > newestKey { | ||
newestKey = pubKey.CreateTime | ||
} | ||
|
||
jwk := jose.JSONWebKey{ | ||
KeyID: pubKey.KeyID, | ||
Algorithm: pubKey.Algorithm, | ||
Use: pubKey.Use, | ||
} | ||
switch alg := pubKey.Algorithm; alg { | ||
case structs.PubKeyAlgEdDSA: | ||
// Convert public key bytes to an ed25519 public key | ||
jwk.Key = ed25519.PublicKey(pubKey.PublicKey) | ||
default: | ||
s.logger.Warn("unknown public key algorithm. server is likely newer than client", "alg", alg) | ||
} | ||
Comment on lines
+55
to
+61
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this error message is suggesting that the request came to a Nomad client and was forwarded to a server that was newer, but the text isn't super clear and it's not necessarily the case (ex. the request was sent to a Nomad server that hadn't been upgraded and then that was forwarded to the leader which was). What if the RPC returned both the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah I struggled with what responsibilities to put in the RPC vs HTTP handlers. If I understand your proposal it would mean the type KeyringPublicKey struct {
KeyID string `json:"kid"`
// Always "OKP" (octet key pair) currently
KeyType string `json:"kty"`
PublicKey []byte `json:"x"`
// Always "EdDSA" currently
Algorithm string `json:"crv"`
// Always "Ed25519" currently, must match alg
Curve string `json:"alg"`
// Always "sig" currently
Use string `json:"use"`
// Purely for Nomad's cache control use. Not part of JWK spec.
CreateTime int64 `json:"-"`
} Then we would just JSON marshal Reason 1 Against: Too DIYI avoided this initially because it felt too DIY. The nice thing about That being said go-jose is "saving" us 2 fields currently ( So my "too DIY to skip go-jose" argument seems weak at best... Reason 2 Against: Validating JWTs in the AgentBut there's another reason to do the conversion here: if we ever want to validate JWTs in the Agent, the Agent needs to do the conversion we're doing here. There's no way to avoid converting The auth middleware used for the Task API and the WhoAmI RPC would both benefit from being able to validate JWTs using these PublicKeys. In my other branch I have a I think this is reason enough to keep Alternative: JSONWebKeySet internallyPlan: just have the RPC return Pro: All of the key handling is the Con: I don't think it's safe to send go-jose's My main concern is I'm also a little worried that if we started using X509 certificates in the future that the I don't know... these may be unreasonable concerns, but There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yeah that's pretty much what I was thinking. To be clear, I'm less concerned about where in the code the conversion is happening than avoiding weird cross-version problems in doing that conversion with info we got from the server. But...
I think I'm probably sold on the basis of this alone. We know we're likely to want to do this in the near-ish future. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What do you think about my comment about the text of the error message though? |
||
|
||
jwks = append(jwks, jwk) | ||
} | ||
|
||
// Have nonzero create times and threshold so set a reasonable cache time. | ||
if newestKey > 0 && rpcReply.RotationThreshold > 0 { | ||
exp := time.Unix(0, newestKey).Add(rpcReply.RotationThreshold) | ||
maxAge := helper.ExpiryToRenewTime(exp, time.Now, jwksMinMaxAge) | ||
Check failure on line 69 in command/agent/keyring_endpoint.go GitHub Actions / test-e2e
Check failure on line 69 in command/agent/keyring_endpoint.go GitHub Actions / test-windows
Check failure on line 69 in command/agent/keyring_endpoint.go GitHub Actions / tests-api
Check failure on line 69 in command/agent/keyring_endpoint.go GitHub Actions / tests-groups (nomad)
Check failure on line 69 in command/agent/keyring_endpoint.go GitHub Actions / tests-groups (client)
Check failure on line 69 in command/agent/keyring_endpoint.go GitHub Actions / tests-groups (command)
Check failure on line 69 in command/agent/keyring_endpoint.go GitHub Actions / tests-groups (drivers)
Check failure on line 69 in command/agent/keyring_endpoint.go GitHub Actions / tests-groups (quick)
Check failure on line 69 in command/agent/keyring_endpoint.go GitHub Actions / checks / checks
|
||
resp.Header().Set("Cache-Control", fmt.Sprintf("max-age=%d", int(maxAge.Seconds()))) | ||
} | ||
|
||
out := &jose.JSONWebKeySet{ | ||
Keys: jwks, | ||
} | ||
|
||
return out, nil | ||
} | ||
|
||
// KeyringRequest is used route operator/raft API requests to the implementing | ||
// functions. | ||
func (s *HTTPServer) KeyringRequest(resp http.ResponseWriter, req *http.Request) (interface{}, error) { | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -360,3 +360,65 @@ func (k *Keyring) Delete(args *structs.KeyringDeleteRootKeyRequest, reply *struc | |
reply.Index = index | ||
return nil | ||
} | ||
|
||
// ListPublic signing keys used for workload identities. This RPC is used to | ||
// back a JWKS endpoint. | ||
// | ||
// Unauthenticated. | ||
func (k *Keyring) ListPublic(args *structs.GenericRequest, reply *structs.KeyringListPublicResponse) error { | ||
|
||
// JWKS is a public endpoint: intentionally ignore auth errors and only | ||
// authenticate to measure rate metrics. | ||
k.srv.Authenticate(k.ctx, args) | ||
if done, err := k.srv.forward("Keyring.ListPublic", args, args, reply); done { | ||
return err | ||
} | ||
schmichael marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
defer metrics.MeasureSince([]string{"nomad", "keyring", "list_public"}, time.Now()) | ||
|
||
// Expose root_key_rotation_threshold so consumers can determine reasonable | ||
// cache settings. | ||
reply.RotationThreshold = k.srv.config.RootKeyRotationThreshold | ||
|
||
// Setup the blocking query | ||
opts := blockingOptions{ | ||
queryOpts: &args.QueryOptions, | ||
queryMeta: &reply.QueryMeta, | ||
run: func(ws memdb.WatchSet, s *state.StateStore) error { | ||
|
||
// retrieve all the key metadata | ||
snap, err := k.srv.fsm.State().Snapshot() | ||
if err != nil { | ||
return err | ||
} | ||
iter, err := snap.RootKeyMetas(ws) | ||
if err != nil { | ||
return err | ||
} | ||
|
||
pubKeys := []*structs.KeyringPublicKey{} | ||
for { | ||
raw := iter.Next() | ||
if raw == nil { | ||
break | ||
} | ||
|
||
keyMeta := raw.(*structs.RootKeyMeta) | ||
if keyMeta.Inactive() { | ||
// Skip inactive keys | ||
continue | ||
} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I went back to the RFC on this and it's not totally clear to me how this will work with root key rotation and identities being pulled from the client asynchronously. Suppose the following scenario:
I think we still need to show the public key for any key that can possibly have an unexpired JWT? If we cap the maximum JWT expiry to the root key rotation window (30d), that would suggest we need to display the active key + at least one previous key. (I might suggest there's no harm in showing all the inactive keys but because the root key is used to encrypt Variables too and we can't GC them without a full rekey, his could be a fairly large set over time.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yeah when writing the RFC I assumed When fixing this up I was going to switch this logic to only skip JWKS must include the key for any credential that should be considered valid. JWKS should not include keys for any credentials that should already be expired. I can think of 2 paths forward, but both will require a new
|
||
|
||
pubKey, err := k.encrypter.GetPublicKey(keyMeta.KeyID) | ||
if err != nil { | ||
return err | ||
} | ||
|
||
pubKeys = append(pubKeys, pubKey) | ||
} | ||
reply.PublicKeys = pubKeys | ||
return k.srv.replySetIndex(state.TableRootKeyMeta, &reply.QueryMeta) | ||
}, | ||
} | ||
return k.srv.blockingRPC(&opts) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically the HTTP server will be protected with mTLS, which won't be reachable by third parties. So this probably needs to get served on a separate HTTP listener. (On its own port, I suppose?) This PR is already a nice size so we don't have to solve that in this PR, but just wanted to make sure we didn't forget about that problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah in the RFC (internal only, sorry folks) I call this out as a problem.
The Task API (+Workload Identity) gets around the mTLS problem for things running in Nomad... which our friends Consul and Vault usually are not.
...so then you're left with the option of running an mTLS terminating proxy just for this endpoint which is a hassle to say the least. How the proxy and the JWT's
issuer
field interacted could be problematic as well, although I don't think for Consul and Vault because their JWT auth methods allow you to specify the JWKSURL explicitly and only seem to use theissuer
claim for mapping.That being said this is only a problem for folks running with
tls.verify_https_client = true
explicitly as it defaults tofalse
even with mTLS enabled. I think this is still an ok default and recommendation as the HTTP endpoints will still be protected by (asymmetric) TLS, and the Consul and Vault JWT auth methods support setting a custom CA cert for that........even after all of that we should probably add a "Only TLS, Not mTLS" HTTP listener for this, the eventual OIDC discovery endpoint, and maybe other "low risk" endpoints like metrics.
While I feel like the situation has fairly reasonable options, I think it's clearly too complicated. Somebody hardening their cluster by setting
tls.verify_https_client = true
one day breaking their Consul integration due to the JWT auth method lacking a client certificate would just be hellish to debug. Definitely a : (╯°□°)╯︵ ┻━┻ situation I want to avoid.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I keep forgetting about
tls.verify_https_client = false
being the default. Even Vault's equivalent is set to false.We definitely don't need to solve this in this PR for sure 😀