New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use built-in SA token refresh functionality within client-go #7023
use built-in SA token refresh functionality within client-go #7023
Conversation
5a94515
to
63a89d0
Compare
@nrfox I think I need to write at least one unit test here (if not more) but I'm still figuring what they will look like. However, I want you to take a look at what I have now. All unit tests pass (indeed, all CI tests are green) but I would like a second pair of eyes on this to make sure I touched all the places (correctly) and that I didn't miss others. |
I'm not sure how to test this. The SA token that gets auto-mounted at Reading some info on the internets, I found this:
But I can't find anything that seems like it will make this easy in, say, a unit test (or even a molecule test). To see the expiration date default, I have Kiali installed and looked at the SA token: $ kubectl exec -it -n istio-system deployments/kiali -- cat /var/run/secrets/kubernetes.io/serviceaccount/token | hack/jwt-decode.sh
Header
{
"alg": "RS256",
"kid": "WaO1w3fabzz0RKyNh5Lp9-8ycXIzFDtZOAQ6KXS1gtk"
}
Payload
{
"aud": [
"https://kubernetes.default.svc.cluster.local"
],
"exp": 1736946190,
"iat": 1705410190,
"iss": "https://kubernetes.default.svc.cluster.local",
"kubernetes.io": {
"namespace": "istio-system",
"pod": {
"name": "kiali-68b89947f-jfrpn",
"uid": "c4777af2-cb0b-4fb9-beb1-5ab2ca7dd298"
},
"serviceaccount": {
"name": "kiali-service-account",
"uid": "66977dbb-1689-4291-8b14-a2e0c45ea2c2"
},
"warnafter": 1705413797
},
"nbf": 1705410190,
"sub": "system:serviceaccount:istio-system:kiali-service-account"
}
Signature is NOT valid That "exp" expiry date ( |
@jmazzitelli iirc the default is 1 hour. If the token gets updated automatically after the expiry time then it should be working.
I don't think we need to test the rotation specifically. We rely on the client-go library to do the rotation as long as we set the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just glancing over everything and it looks good to me. I think you touched all the areas that needed to get cleaned up.
kubernetes/client.go
Outdated
cc.BearerToken = "" // force use of BearerTokenFile so client can refresh the token | ||
cc.BearerTokenFile = remoteSecretPath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need to do this for the remote secrets. They will use long lived tokens: https://github.com/kiali/kiali/blob/master/hack/istio/multicluster/kiali-prepare-remote-cluster.sh#L127-L141 and I don't think those can be managed or refreshed by client-go since it is for a service account in another cluster.
kubernetes/client.go
Outdated
@@ -139,6 +145,7 @@ func getConfigForLocalCluster() (*rest.Config, error) { | |||
log.Errorf("Error '%v' getting config for local cluster", err.Error()) | |||
return nil, err | |||
} | |||
incluster.BearerToken = "" // force use of BearerTokenFile so client can refresh the token |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need to set this explicitly here either since it gets set already in rest.InClusterConfig
: https://github.com/kubernetes/client-go/blob/fb8b7346aacefea5ee2ab2e234afc4451c90c435/rest/config.go#L511-L541.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code now sets BearerTokenFile everywhere that it is known, and it also sets BearerToken to "" everywhere that the BearerTokenFile is known. I did this for two reasons:
- This makes everything consistent - where a file is known, then set it; and blank the token in those instances. Otherwise we'll have code that will look strange to future devs ("why is the file set here but not here?"). Just always set the bearer file everywhere that it is known. The file is used under the covers by the go-client and the token is loaded at that time (at least, that's how I am seeing it work). If, for some reason, setting the BearerTokenFile is a no-op and that file is not read to obtain the token, that we can look to adjust this (but if that really happens and the file is ignored, that sounds like a bug - which gets me to my second reason why I did this...)
- I want to ensure that BearerTokenFile is actually being used - if we set both, we actually cannot be assured all underlying code is using the File if it also has the Token. And we'll never know we have a bug unless some one tests a token past its expiration time (and that would only happen when deployed out in the wild since we don't have any tests that literally test an expiring token).
My latest commit has a unit test that now looks at the bearer token and bearer token file fields to make sure they are set for the home cluster and a remote cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to ensure that BearerTokenFile is actually being used - if we set both, we actually cannot be assured all underlying code is using the File if it also has the Token.
I think you can legitimately set one but not the other i.e. you can set BearerToken
but not BearerTokenFile
. Right now for the remote config, we only override qps
and burst
but all other settings come straight from user configuration and I don't want that to change by overriding more fields:
kiali/kubernetes/client_factory.go
Lines 491 to 497 in 7eb7107
remoteConfig, err := clusterInfo.Config.ClientConfig() | |
if err != nil { | |
return nil, err | |
} | |
// Use the remote config entirely for remote clusters. | |
clientConfig = *remoteConfig |
The only thing that needs to change is for the kiali SA client for the home cluster to also set the BearerTokenFile
. If we were to just use rest.InClusterConfig
then this would be happening already. For instance, I'm not sure why we copy over the token here instead of using rest.InClusterConfig
:
kiali/kubernetes/client_factory.go
Lines 499 to 507 in 7eb7107
// Just read the token and then use the base config. | |
// We're an in cluster client. Read the kiali service account token. | |
kialiToken, err := GetKialiTokenForHomeCluster() | |
if err != nil { | |
return nil, fmt.Errorf("unable to get Kiali service account token: %s", err) | |
} | |
// Copy over the base rest config and the token | |
clientConfig.BearerToken = kialiToken |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can legitimately set one but not the other i.e. you can set
BearerToken
but notBearerTokenFile
.
Docs imply you can set both, but a non-empty File means the Token will be refreshed (except for when using OAuth2, which is why I wasn't sure if I should leave Token as-is or empty it out.. I'm thinking I'm going to revert that change and just leave them both set.). Here's the docs:
https://pkg.go.dev/k8s.io/client-go@v0.28.3/rest#Config.BearerToken
// Server requires Bearer authentication. This client will not attempt to use
// refresh tokens for an OAuth2 flow.
// TODO: demonstrate an OAuth2 compatible client.
BearerToken [string](https://pkg.go.dev/builtin#string) `datapolicy:"token"`
// Path to a file containing a BearerToken.
// If set, the contents are periodically read.
// The last successfully read value takes precedence over BearerToken.
BearerTokenFile [string](https://pkg.go.dev/builtin#string)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For instance, I'm not sure why we copy over the token here instead of using
rest.InClusterConfig
Yeah, there was a couple things in the code that made me scratch my head wondering why we were doing things. I think changes happened rapidly in here and we just missed a few places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted some of the code I originally pushed. I no longer blank the token. The docs say that the token will be overridden if File is provided and the File is refreshed. My paranoia just caused me to do more work, so I reverted that. I'll trust the go-client does the right thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For instance, I'm not sure why we copy over the token here instead of using
rest.InClusterConfig
:
Well, I found out why we do this. It uses the base config that was passed to the factory, not the InClusterConfig. And it does this because it is mock-friendly (we must pass in mock configs when creating factories in tests - indeed, I see errors in the tests when I changed this code to use InClusterConfig).
See: https://github.com/kiali/kiali/blob/v1.78.0/kubernetes/client_factory.go#L119-L121
// newClientFactory allows for specifying the config and expiry duration
// Mock friendly for testing purposes
func newClientFactory(restConfig *rest.Config) (*clientFactory, error) {
ok, I think I found a problem in this entire thing. Unit test failures are bringing this up. func (cf *clientFactory) GetClients(authInfo *api.AuthInfo) (map[string]ClientInterface, error) { expects a token in the authInfo. And I think we use this sometimes with SA auth. |
Ah.. no, I think it is because these unit tests are using some functions as a convenience to create some user tokens, but the functions aren't behaving like they did before. Gonna need to rethink some of the test code here. Instead, I just reverted the code I added that blanks the token. So what I will end up having is what we already have, except now the token file will be set. |
a41b28b
to
ea00ab5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments but otherwise looks good. I'll test this out to make sure that when the token gets rotated, access still works.
kubernetes/client_factory.go
Outdated
// Set the token file so the underlying client can refresh it when needed. | ||
remoteConfig.BearerTokenFile = clusterInfo.SecretFile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still leery of manually overriding these settings and I think this already gets set by client-go
if you have a SecretFile
in your kubeconfig?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not set. I change this line 444 to this:
//MAZZ DELETE remoteConfig.BearerTokenFile = clusterInfo.SecretFile
log.Errorf("MAZZ2 btf=[%v];sf=[%v]", remoteConfig.BearerTokenFile, clusterInfo.SecretFile)
and in the logs I see:
2024-01-17T19:09:02Z ERR MAZZ2 btf=[];sf=[/kiali-remote-cluster-secrets/kiali-remote-cluster-secret-west/west]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But yet when I leave line 444 the way it is here, I get this error:
2024-01-17T19:12:17Z ERR MAZZ2 btf=[/kiali-remote-cluster-secrets/kiali-remote-cluster-secret-west/west];sf=[/kiali-remote-cluster-secrets/kiali-remote-cluster-secret-west/west]
...
W0117 19:12:27.423526 1 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: failed to list *v1.ReplicaSet: Get "https://172.18.0.5:6443/apis/apps/v1/replicasets?limit=500&resourceVersion=0": net/http: invalid header field value for "Authorization"
I suspect I don't understand what value the BearerTokenFile should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was assuming "BearerTokenFile" is the same as the kubeconfig file, but apparently it is not. The BearerTokenFile can be specified in the kubeconfig, and I believe that is what should get set here. We do not have BearerTokenFiles defined in our generated remote cluster secrets.
So I have a feeling this client-go feature is for something slightly different that what we are expecting to be able to use it. I don't think this is something we can use (i.e. I think we should keep our code that refreshes).
@nrfox See if you understand this differently than me, and let me know what I'm missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was assuming "BearerTokenFile" is the same as the kubeconfig file, but apparently it is not.
BearerTokenFile
is not the same as the kubeconfig. It is a file that contains the BearerToken
. In the case of the InClusterConfig
it is just /var/run/secrets/kubernetes.io/serviceaccount/token
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think I get it. I was thinking this was going to refresh everything, not just the in-cluster client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this.
kubernetes/client.go
Outdated
if err != nil { | ||
return cc, err | ||
} | ||
cc.BearerTokenFile = remoteSecretPath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here about not overriding what the user provides in the remote kube config. We should be able to verify that if you set SecretFile
in your kubeconfig that client-go
will in turn set BearerTokenFile
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me check that. I was assuming it wasn't going to set that under the covers. But I'll check. This could be why I am getting errors in CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this
int tests showing failures like this:
Looks like I nicked something and causing that header to have an invalid value. The logs don't tell me the value, though. |
ff4934c
to
f3c0ab1
Compare
OK, now I only have the bearer token file explicitly being set for when using the in-cluster SA token. |
c076628
to
bd4fd3d
Compare
bd4fd3d
to
9bfe966
Compare
fixes: #6924