K8s server and agent nodeattestor plugin #557
Conversation
Force-pushed 3c385ed to fabc04a
doc/plugin_agent_nodeattestor_k8s.md (outdated)
this should reset p.kubeClient so that it can be recreated with a potentially changed config
this is racy in the face of more than one attestation happening at a time. I would suggest moving the client initialization into the Configure call. That would do two things: 1) fail the configure call if bad paths are provided, and 2) allow you to provide the client as part of the configuration, which is retrieved under a lock (i.e. getConfig)
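For illustration, a minimal sketch of that pattern. The type and function names here (k8sPlugin, pluginConfig, getConfig, the simplified Configure signature) are hypothetical, not the plugin's actual identifiers:

```go
package nodeattestor

import (
	"errors"
	"sync"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// pluginConfig is a stand-in for the plugin's parsed configuration.
type pluginConfig struct {
	KubeconfigPath string
}

type k8sPlugin struct {
	mu         sync.RWMutex
	config     *pluginConfig
	kubeClient kubernetes.Interface
}

// Configure builds the client eagerly, so bad paths fail here rather than
// during attestation. The real Configure takes a ConfigureRequest; it is
// simplified here.
func (p *k8sPlugin) Configure(c *pluginConfig) error {
	restConfig, err := clientcmd.BuildConfigFromFlags("", c.KubeconfigPath)
	if err != nil {
		return err
	}
	client, err := kubernetes.NewForConfig(restConfig)
	if err != nil {
		return err
	}

	p.mu.Lock()
	defer p.mu.Unlock()
	p.config = c
	p.kubeClient = client
	return nil
}

// getConfig hands out the config and client under the lock, so concurrent
// attestations never observe a half-updated pair.
func (p *k8sPlugin) getConfig() (*pluginConfig, kubernetes.Interface, error) {
	p.mu.RLock()
	defer p.mu.RUnlock()
	if p.config == nil {
		return nil, nil, errors.New("k8s node attestor: not configured")
	}
	return p.config, p.kubeClient, nil
}
```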
evan2645 left a comment:
Thanks for this work @enricoschiattarella!! Left a few comments, curious to hear your thoughts on them
doc/plugin_agent_nodeattestor_k8s.md (outdated)
The : character is defined as reserved in RFC 3986. SPIFFE IDs must conform with this RFC. It states that if reserved characters are used outside of their special meaning(s), then they must be percent-encoded.
I recommend simply avoiding the use of : altogether
Right. I liked the idea of carrying the Kubernetes RBAC encoding over opaquely, but I 100% agree that this is asking for trouble. If it looks OK, I will change it to "/spire/agent/k8s/system/node/hostname".
The reason to maintain the "system/node" part is that if we ever use Kubernetes to attest anything else using the certs api, there will be no conflicts.
doc/plugin_agent_nodeattestor_k8s.md (outdated)
Why is this optional, and what do I get if I specify it?
There is a "standard" sequence for determining the location of the kubeconfig when it is not provided explicitly: first try the KUBECONFIG environment variable, then the home directory, then return an error. It is the same sequence kubectl follows.
I will add an explanation.
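For reference, that lookup order would look roughly like this (the resolveKubeconfig helper is made up for illustration; kubectl's real loader lives in client-go's clientcmd package):

```go
package nodeattestor

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveKubeconfig mirrors kubectl's default lookup: explicit path first,
// then the KUBECONFIG environment variable, then $HOME/.kube/config.
func resolveKubeconfig(explicitPath string) (string, error) {
	if explicitPath != "" {
		return explicitPath, nil
	}
	if env := os.Getenv("KUBECONFIG"); env != "" {
		return env, nil
	}
	if home, err := os.UserHomeDir(); err == nil {
		candidate := filepath.Join(home, ".kube", "config")
		if _, statErr := os.Stat(candidate); statErr == nil {
			return candidate, nil
		}
	}
	return "", fmt.Errorf("kubeconfig not found: set kubeconfig_path or KUBECONFIG")
}
```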
There is an assumption in SPIRE that all agents can be uniquely addressed... as such, I think it would be best to raise an error here instead of setting it to unknown, since it seems that this name is ultimately used in the agent's SPIFFE ID? It also brings up questions about hosts that might share the same hostname. Curious to hear your thoughts on this in the context of a Kubernetes cluster
More info here #558
In a Kubernetes environment we can rely on each node (and pod) having a unique IP address, because NAT is essentially prohibited, so if the trust domain coincides with the Kubernetes cluster, we could use the IP address.
https://kubernetes.io/docs/concepts/cluster-administration/networking/
If the trust domain includes multiple Kubernetes clusters, then I think the best we can do is to include a fingerprint of the CA certificate associated to the kubernetes cluster.
Something like "spire/agent/k8s/cluster/XXXX/system:node:".
As a last resort, we can forget about the hostname entirely and use a fingerprint of the cert, the same thing the x509pop attestor does.
I'm not sure what the current thinking is on the question you referred to.
Agent naming uniqueness is a nice property...
Human-readable SVIDs are nice too...
I will be happy to implement whatever you guys believe is the best overall solution.
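For reference, a CA thumbprint along the lines discussed could be computed something like this (a sketch; the SHA-256 hash and hex encoding are assumptions, not necessarily what the plugin will use):

```go
package nodeattestor

import (
	"crypto/sha256"
	"crypto/x509"
	"encoding/hex"
)

// caThumbprint returns a stable fingerprint of the cluster CA certificate,
// suitable for embedding in an agent path or selector.
func caThumbprint(caCert *x509.Certificate) string {
	sum := sha256.Sum256(caCert.Raw)
	return hex.EncodeToString(sum[:])
}
```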
We discussed this topic on the SIG-Kubernetes call today. We concluded that it is probably OK for the resulting ID to be less human friendly than it could be. It is likely that a SPIRE server will manage multiple kubernetes clusters, so including the thumbprint of the CA seems like a good idea. The proposed path including thumbprint and node name is 👍.
Server side node attestors can optionally kick back selectors which can be used to group agents together. In this case, it would be awesome if we provide a selector for CA thumbprint k8s:ca_thumbprint:XXXXX. Doing so would allow users to create SPIFFE IDs which address entire clusters, allowing them to register workloads to run in one cluster or another.
It would be good if we can document this search behavior
Will do
I noticed the beta1 suffix here - how stable is this API? Do we expect it to be the same a year or more from now?
The feature that depends on this API is going GA in the next release, so in the long term the mechanics of node attestation should be stable.
The API itself will not graduate yet, though, so minor changes are possible.
This is the issue for tracking and discussion:
Does the CSR signing process in kubernetes do anything to assert the correct value in NodeName? In other words, could I get kubernetes to sign a CSR with the name of a different node here?
Right now it does not. It just checks that CN conforms to the expected format and key usage is what is expected. There are plans to strengthen this and support proof-of-ownership, TPMs, etc.
Options for improving over the baseline are:
- Disable automatic CSR approval, rely on Kubernetes admin to verify the content of the CSR manually before approving it
- Create a custom CSR signer that will perform extra checks/challenges
- K8s attestor in SPIRE server checks the IP address of the agent connection and compares it against what is in the Kubernetes certificate
I think all options are doable, but IMO, unless this is a showstopper, the best course of action is to live with the limitation for the time being and leverage the improvements to the Kubernetes node attestation mechanism as they become available.
We discussed this topic on the SIG-Kubernetes call today. The biggest drawback here is that we are essentially asserting the ability of the caller to get a CSR approved, and not asserting anything about the actual identity of the caller. In certain situations, this can result in a non-agent workload being able to obtain the identities of all other workloads in the cluster.
To mitigate this, we should try to assert that the caller is in fact an agent. We discussed using a service account token to do this, configuring the server node attestor with the namespace and service account that the agent runs as. This token can either 1) be used as credentials to submit the CSR, and verified by the server, or 2) passed directly to the server.
Taking this approach gives stronger assurance that the caller we are attesting is an agent, however it still allows for a dishonest agent (or anything that can run as the named service account) to claim that it is a different node/agent. To mitigate this, we could use the kubelet identity to submit the CSR rather than the service account. The server attestor can then verify that the kubelet identity which submitted the CSR matches the SPIFFE ID we are about to issue. IMO, this is the ideal approach, and provides the strongest level of attestation of all the options. @enricoschiattarella has mentioned that obtaining the kubelet identity may be non-trivial, but perhaps it warrants some further research.
To avoid the non-agents-could-obtain-arbitrary-identities problem, we should either add the service account token logic or use the kubelet identity directly for CSR submission, with a preference for the latter. If we choose the service account path, it probably warrants a direct call-out in the plugin documentation (perhaps in the form of a Security Considerations section?). SPIRE is known for its ability to strongly attest and issue identity, but some sacrifices are made in the SA token approach (namely, that we are not actually asserting the identity of a node; rather, we are asserting that the caller lives in a particular service account).
Some related notes:
- Using SA token for CSR submission instead of sending it directly to the server avoids the need to pass a SA token to a system "outside" of the kubernetes control plane
- @mattmoyer informed us that the identity used to submit the CSR is included in the signed certificate - this could be directly extracted by the server attestor for validation, as opposed to having to hit the api server to interrogate the identity of the submitter
- Node identity may be embedded in SA tokens in the future, it is unclear whether the node identity will 1) be included in resulting signed certificates, 2) be discoverable through interrogation of the API server, looking at the CSR approval stuff. If neither 1 or 2, we would end up having to pass the token directly to SPIRE anyways.
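A rough sketch of how a server-side attestor could validate such a token with the TokenReview API (client-go call shapes vary by version; this one assumes a newer client-go that takes a context and CreateOptions):

```go
package nodeattestor

import (
	"context"
	"fmt"

	authv1 "k8s.io/api/authentication/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// validateServiceAccountToken asks the API server whether the token is valid
// and returns the authenticated username (e.g. system:serviceaccount:ns:name).
func validateServiceAccountToken(ctx context.Context, client kubernetes.Interface, token string) (string, error) {
	review, err := client.AuthenticationV1().TokenReviews().Create(ctx, &authv1.TokenReview{
		Spec: authv1.TokenReviewSpec{Token: token},
	}, metav1.CreateOptions{})
	if err != nil {
		return "", fmt.Errorf("token review request failed: %v", err)
	}
	if !review.Status.Authenticated {
		return "", fmt.Errorf("token not authenticated: %s", review.Status.Error)
	}
	return review.Status.User.Username, nil
}
```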
nit: might be a good idea to be more specific with these errors, like k8s node attestor: not configured or something
will do
I think we need to ensure that req.AttestationData is not nil?
will add the check
Same comment here - what if req.AttestationData is nil?
will fix
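For what it's worth, the guard could be as simple as the following (a sketch only; the error prefix follows the earlier nit, and the field names assume the common AttestationData message):

```go
// Sketch: reject requests with missing or empty attestation data.
if req.GetAttestationData() == nil || len(req.GetAttestationData().Data) == 0 {
	return errors.New("k8s node attestor: missing attestation data")
}
```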
Hmmm... I am curious if this can be abused by a client that never sends the challenge response... or triggered by clients which crash before being able to do so? Perhaps this Recv should be guarded with a select block on context cancellation?
@azdagron does the streaming shim have a default timeout on the context here? Is this a problem elsewhere?
The Recv() call should return with an error in the case of context cancellation. You bring up a good point though. We should probably define timeouts for all plugin related calls. I don't think there is anything specific to do in the plugin code.
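For what it's worth, a caller-side timeout around the whole exchange might look roughly like this (the 30-second deadline and the attest callback are placeholders for whatever the plugin shim exposes):

```go
package nodeattestor

import (
	"context"
	"time"
)

// attestWithTimeout bounds the whole challenge/response exchange so a client
// that never answers cannot hold the stream open forever.
func attestWithTimeout(parent context.Context, attest func(ctx context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, 30*time.Second) // assumed deadline
	defer cancel()
	return attest(ctx)
}
```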
Force-pushed ecc354e to 8b9b152
After discussion, we came to consensus that for now using the service account token was good enough and likely to get better if more identity components end up included in the service account token (like node name, for example). I've rebased this on top of current master and pushed a commit that transitions the code to use the service account token instead.
should cluster name be included here?
The notion that selectors are typed is something that is not necessarily exposed to users. For instance, the registration CLI takes -selector k8s_sat:cluster-name:MyCluster. In the interest of simplicity, I'd lean towards s/Value/Selector/ and include the k8s_sat type prefix
nit: perhaps "unable to read token from %v: %v", config.tokenPath, err or something similar will provide more information?
if we make users configure the cluster name on the agent attestor, maybe we could avoid this bit
Do we need to check that alg is correctly set somewhere in here?
verifiers in square's go-jose library make sure that the algorithm in the token lines up with the verification key that is passed. we're good here.
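For context, verification with square's go-jose looks roughly like this; Claims() rejects tokens whose header algorithm does not line up with the supplied key. The claim struct below assumes the legacy service account token claim names:

```go
package nodeattestor

import (
	"crypto"
	"fmt"

	"gopkg.in/square/go-jose.v2/jwt"
)

type saTokenClaims struct {
	Namespace          string `json:"kubernetes.io/serviceaccount/namespace"`
	ServiceAccountName string `json:"kubernetes.io/serviceaccount/service-account.name"`
}

// verifyToken parses the service account token and checks its signature
// against the cluster's service account signing key.
func verifyToken(raw string, key crypto.PublicKey) (*saTokenClaims, error) {
	tok, err := jwt.ParseSigned(raw)
	if err != nil {
		return nil, fmt.Errorf("unable to parse token: %v", err)
	}
	claims := new(saTokenClaims)
	if err := tok.Claims(key, claims); err != nil {
		return nil, fmt.Errorf("unable to verify token: %v", err)
	}
	return claims, nil
}
```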
any chance we could simplify this by dropping the suffixes?
k8s_sat:cluster:foo
k8s_sat:service-account:bar
k8s_sat:namespace:baz
the plugin name (and hence beginning of all selectors) is k8s_sat which makes me lean towards snake case for service account too
k8s_sat:service_account
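Assuming the Type/Value Selector message from SPIRE's plugin protos (the import path may differ by version), the resulting selectors could then be built something like this; the values are illustrative:

```go
package nodeattestor

import "github.com/spiffe/spire/proto/common"

// buildSelectors emits one selector per attested property, using snake_case
// keys under the k8s_sat plugin type as discussed above.
func buildSelectors(cluster, namespace, serviceAccount string) []*common.Selector {
	return []*common.Selector{
		{Type: "k8s_sat", Value: "cluster:" + cluster},
		{Type: "k8s_sat", Value: "namespace:" + namespace},
		{Type: "k8s_sat", Value: "service_account:" + serviceAccount},
	}
}
```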
Signed-off-by: Enrico Schiattarella <enrico.schiattarella@gmail.com>
Signed-off-by: Andrew Harding <azdagron@gmail.com>
Signed-off-by: Andrew Harding <azdagron@gmail.com>
Force-pushed dcc81fd to f72dd3a
Signed-off-by: Andrew Harding <azdagron@gmail.com>
Pull Request checklist
Affected functionality
Description of change
Provides a new node attestation plugin for server and agent aimed at Kubernetes environments.
The agent plugin uses the Kubernetes Certificates API to submit a CSR and have it signed by the Kubernetes API server, which signs it with its CA and returns the certificate to the agent.
The agent presents the certificate to the server as an identity document.
The server verifies that the agent actually possesses the corresponding private key and returns an SVID if validation succeeds.
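For illustration, a CSR submission against the v1beta1 Certificates API might look roughly like this (a sketch; the Create signature shown matches older client-go releases without context arguments, and approval/retrieval of the signed certificate is not shown):

```go
package nodeattestor

import (
	"fmt"

	certsv1beta1 "k8s.io/api/certificates/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// submitCSR asks the Kubernetes Certificates API to sign the PEM-encoded CSR.
func submitCSR(client kubernetes.Interface, name string, csrPEM []byte) error {
	csr := &certsv1beta1.CertificateSigningRequest{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: certsv1beta1.CertificateSigningRequestSpec{
			Request: csrPEM,
			Usages: []certsv1beta1.KeyUsage{
				certsv1beta1.UsageDigitalSignature,
				certsv1beta1.UsageKeyEncipherment,
				certsv1beta1.UsageClientAuth,
			},
		},
	}
	if _, err := client.CertificatesV1beta1().CertificateSigningRequests().Create(csr); err != nil {
		return fmt.Errorf("unable to submit CSR: %v", err)
	}
	return nil
}
```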
NOTE: The change in glide.yaml (moving the package github.com/stretchr/testify from the testImport to the import section) was done to work around a glide issue. Without that change, glide would report the error "Failed to generate lock file: Generating lock produced conflicting versions of github.com/stretchr/testify. import (~1.1.3), testImport (~1.1.0)".
There are a couple of open glide issues related to this:
Masterminds/glide#597
Masterminds/glide#564
Another way to work around it would be to not use glide for the k8s.io/client-go dependency, but instead do go get ... from the Makefile/build script