Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use built-in SA token refresh functionality within client-go #7023

Merged
merged 1 commit into from Jan 23, 2024

Conversation

jmazzitelli
Copy link
Collaborator

fixes: #6924

@jmazzitelli jmazzitelli self-assigned this Jan 12, 2024
@jmazzitelli jmazzitelli force-pushed the 6924-refresh-cache-via-client branch 5 times, most recently from 5a94515 to 63a89d0 Compare January 12, 2024 19:07
@jmazzitelli
Copy link
Collaborator Author

@nrfox I think I need to write at least one unit test here (if not more) but I'm still figuring what they will look like. However, I want you to take a look at what I have now. All unit tests pass (indeed, all CI tests are green) but I would like a second pair of eyes on this to make sure I touched all the places (correctly) and that I didn't miss others.

@jmazzitelli
Copy link
Collaborator Author

I'm not sure how to test this. The SA token that gets auto-mounted at /var/run/secrets/kubernetes.io/serviceaccount/token has a long expiration time. I think we'd need to customize the cluster (at startup) to change that.

Reading some info on the internets, I found this:

But I can't find anything that seems like it will make this easy in, say, a unit test (or even a molecule test).

To see the expiration date default, I have Kiali installed and looked at the SA token:

$ kubectl exec -it -n istio-system deployments/kiali -- cat /var/run/secrets/kubernetes.io/serviceaccount/token | hack/jwt-decode.sh 
Header
{
  "alg": "RS256",
  "kid": "WaO1w3fabzz0RKyNh5Lp9-8ycXIzFDtZOAQ6KXS1gtk"
}
Payload
{
  "aud": [
    "https://kubernetes.default.svc.cluster.local"
  ],
  "exp": 1736946190,
  "iat": 1705410190,
  "iss": "https://kubernetes.default.svc.cluster.local",
  "kubernetes.io": {
    "namespace": "istio-system",
    "pod": {
      "name": "kiali-68b89947f-jfrpn",
      "uid": "c4777af2-cb0b-4fb9-beb1-5ab2ca7dd298"
    },
    "serviceaccount": {
      "name": "kiali-service-account",
      "uid": "66977dbb-1689-4291-8b14-a2e0c45ea2c2"
    },
    "warnafter": 1705413797
  },
  "nbf": 1705410190,
  "sub": "system:serviceaccount:istio-system:kiali-service-account"
}
Signature is NOT valid

That "exp" expiry date (1736946190) is a 1-year expiration date (Wednesday, 15 January 2025).

@nrfox
Copy link
Contributor

nrfox commented Jan 16, 2024

I'm not sure how to test this. The SA token that gets auto-mounted at /var/run/secrets/kubernetes.io/serviceaccount/token has a long expiration time. I think we'd need to customize the cluster (at startup) to change that.

@jmazzitelli iirc the default is 1 hour. If the token gets updated automatically after the expiry time then it should be working.

But I can't find anything that seems like it will make this easy in, say, a unit test (or even a molecule test).

I don't think we need to test the rotation specifically. We rely on the client-go library to do the rotation as long as we set the BearerTokenFile field so we can test that we set the field appropriately. Also iirc you can't set the expiry time lower than some threshold (15 mins mabye) so writing an automated real world test for this would require a long running test.

Copy link
Contributor

@nrfox nrfox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just glancing over everything and it looks good to me. I think you touched all the areas that needed to get cleaned up.

cc.BearerToken = "" // force use of BearerTokenFile so client can refresh the token
cc.BearerTokenFile = remoteSecretPath
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to do this for the remote secrets. They will use long lived tokens: https://github.com/kiali/kiali/blob/master/hack/istio/multicluster/kiali-prepare-remote-cluster.sh#L127-L141 and I don't think those can be managed or refreshed by client-go since it is for a service account in another cluster.

@@ -139,6 +145,7 @@ func getConfigForLocalCluster() (*rest.Config, error) {
log.Errorf("Error '%v' getting config for local cluster", err.Error())
return nil, err
}
incluster.BearerToken = "" // force use of BearerTokenFile so client can refresh the token
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to set this explicitly here either since it gets set already in rest.InClusterConfig: https://github.com/kubernetes/client-go/blob/fb8b7346aacefea5ee2ab2e234afc4451c90c435/rest/config.go#L511-L541.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code now sets BearerTokenFile everywhere that it is known, and it also sets BearerToken to "" everywhere that the BearerTokenFile is known. I did this for two reasons:

  1. This makes everything consistent - where a file is known, then set it; and blank the token in those instances. Otherwise we'll have code that will look strange to future devs ("why is the file set here but not here?"). Just always set the bearer file everywhere that it is known. The file is used under the covers by the go-client and the token is loaded at that time (at least, that's how I am seeing it work). If, for some reason, setting the BearerTokenFile is a no-op and that file is not read to obtain the token, that we can look to adjust this (but if that really happens and the file is ignored, that sounds like a bug - which gets me to my second reason why I did this...)
  2. I want to ensure that BearerTokenFile is actually being used - if we set both, we actually cannot be assured all underlying code is using the File if it also has the Token. And we'll never know we have a bug unless some one tests a token past its expiration time (and that would only happen when deployed out in the wild since we don't have any tests that literally test an expiring token).

My latest commit has a unit test that now looks at the bearer token and bearer token file fields to make sure they are set for the home cluster and a remote cluster.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to ensure that BearerTokenFile is actually being used - if we set both, we actually cannot be assured all underlying code is using the File if it also has the Token.

I think you can legitimately set one but not the other i.e. you can set BearerToken but not BearerTokenFile. Right now for the remote config, we only override qps and burst but all other settings come straight from user configuration and I don't want that to change by overriding more fields:

remoteConfig, err := clusterInfo.Config.ClientConfig()
if err != nil {
return nil, err
}
// Use the remote config entirely for remote clusters.
clientConfig = *remoteConfig

The only thing that needs to change is for the kiali SA client for the home cluster to also set the BearerTokenFile. If we were to just use rest.InClusterConfig then this would be happening already. For instance, I'm not sure why we copy over the token here instead of using rest.InClusterConfig:

// Just read the token and then use the base config.
// We're an in cluster client. Read the kiali service account token.
kialiToken, err := GetKialiTokenForHomeCluster()
if err != nil {
return nil, fmt.Errorf("unable to get Kiali service account token: %s", err)
}
// Copy over the base rest config and the token
clientConfig.BearerToken = kialiToken

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can legitimately set one but not the other i.e. you can set BearerToken but not BearerTokenFile.

Docs imply you can set both, but a non-empty File means the Token will be refreshed (except for when using OAuth2, which is why I wasn't sure if I should leave Token as-is or empty it out.. I'm thinking I'm going to revert that change and just leave them both set.). Here's the docs:

https://pkg.go.dev/k8s.io/client-go@v0.28.3/rest#Config.BearerToken

// Server requires Bearer authentication. This client will not attempt to use
// refresh tokens for an OAuth2 flow.
// TODO: demonstrate an OAuth2 compatible client.
BearerToken [string](https://pkg.go.dev/builtin#string) `datapolicy:"token"`

// Path to a file containing a BearerToken.
// If set, the contents are periodically read.
// The last successfully read value takes precedence over BearerToken.
BearerTokenFile [string](https://pkg.go.dev/builtin#string)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For instance, I'm not sure why we copy over the token here instead of using rest.InClusterConfig

Yeah, there was a couple things in the code that made me scratch my head wondering why we were doing things. I think changes happened rapidly in here and we just missed a few places.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted some of the code I originally pushed. I no longer blank the token. The docs say that the token will be overridden if File is provided and the File is refreshed. My paranoia just caused me to do more work, so I reverted that. I'll trust the go-client does the right thing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For instance, I'm not sure why we copy over the token here instead of using rest.InClusterConfig:

Well, I found out why we do this. It uses the base config that was passed to the factory, not the InClusterConfig. And it does this because it is mock-friendly (we must pass in mock configs when creating factories in tests - indeed, I see errors in the tests when I changed this code to use InClusterConfig).

See: https://github.com/kiali/kiali/blob/v1.78.0/kubernetes/client_factory.go#L119-L121

// newClientFactory allows for specifying the config and expiry duration
// Mock friendly for testing purposes
func newClientFactory(restConfig *rest.Config) (*clientFactory, error) {

@jmazzitelli
Copy link
Collaborator Author

ok, I think I found a problem in this entire thing. Unit test failures are bringing this up.

func (cf *clientFactory) GetClients(authInfo *api.AuthInfo) (map[string]ClientInterface, error) {

expects a token in the authInfo. And I think we use this sometimes with SA auth.

@jmazzitelli
Copy link
Collaborator Author

jmazzitelli commented Jan 16, 2024

expects a token in the authInfo. And I think we use this sometimes with SA auth.

Ah.. no, I think it is because these unit tests are using some functions as a convenience to create some user tokens, but the functions aren't behaving like they did before. Gonna need to rethink some of the test code here.

Instead, I just reverted the code I added that blanks the token. So what I will end up having is what we already have, except now the token file will be set.

Copy link
Contributor

@nrfox nrfox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments but otherwise looks good. I'll test this out to make sure that when the token gets rotated, access still works.

Comment on lines 443 to 444
// Set the token file so the underlying client can refresh it when needed.
remoteConfig.BearerTokenFile = clusterInfo.SecretFile
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still leery of manually overriding these settings and I think this already gets set by client-go if you have a SecretFile in your kubeconfig?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not set. I change this line 444 to this:

		//MAZZ DELETE remoteConfig.BearerTokenFile = clusterInfo.SecretFile
		log.Errorf("MAZZ2 btf=[%v];sf=[%v]", remoteConfig.BearerTokenFile, clusterInfo.SecretFile)

and in the logs I see:

2024-01-17T19:09:02Z ERR MAZZ2 btf=[];sf=[/kiali-remote-cluster-secrets/kiali-remote-cluster-secret-west/west]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But yet when I leave line 444 the way it is here, I get this error:

2024-01-17T19:12:17Z ERR MAZZ2 btf=[/kiali-remote-cluster-secrets/kiali-remote-cluster-secret-west/west];sf=[/kiali-remote-cluster-secrets/kiali-remote-cluster-secret-west/west]
...
W0117 19:12:27.423526       1 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: failed to list *v1.ReplicaSet: Get "https://172.18.0.5:6443/apis/apps/v1/replicasets?limit=500&resourceVersion=0": net/http: invalid header field value for "Authorization"

I suspect I don't understand what value the BearerTokenFile should be.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was assuming "BearerTokenFile" is the same as the kubeconfig file, but apparently it is not. The BearerTokenFile can be specified in the kubeconfig, and I believe that is what should get set here. We do not have BearerTokenFiles defined in our generated remote cluster secrets.

So I have a feeling this client-go feature is for something slightly different that what we are expecting to be able to use it. I don't think this is something we can use (i.e. I think we should keep our code that refreshes).

@nrfox See if you understand this differently than me, and let me know what I'm missing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was assuming "BearerTokenFile" is the same as the kubeconfig file, but apparently it is not.

BearerTokenFile is not the same as the kubeconfig. It is a file that contains the BearerToken. In the case of the InClusterConfig it is just /var/run/secrets/kubernetes.io/serviceaccount/token.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think I get it. I was thinking this was going to refresh everything, not just the in-cluster client.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this.

if err != nil {
return cc, err
}
cc.BearerTokenFile = remoteSecretPath
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here about not overriding what the user provides in the remote kube config. We should be able to verify that if you set SecretFile in your kubeconfig that client-go will in turn set BearerTokenFile.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me check that. I was assuming it wasn't going to set that under the covers. But I'll check. This could be why I am getting errors in CI.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this

@jmazzitelli
Copy link
Collaborator Author

int tests showing failures like this:

E0116 20:51:37.072155       1 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Failed to watch *v1.Deployment: failed to list *v1.Deployment: Get "https://172.18.0.5:6443/apis/apps/v1/deployments?limit=500&resourceVersion=0": net/http: invalid header field value for "Authorization"

Looks like I nicked something and causing that header to have an invalid value. The logs don't tell me the value, though.

@jmazzitelli
Copy link
Collaborator Author

OK, now I only have the bearer token file explicitly being set for when using the in-cluster SA token.

@jmazzitelli jmazzitelli force-pushed the 6924-refresh-cache-via-client branch 2 times, most recently from c076628 to bd4fd3d Compare January 17, 2024 21:32
@jmazzitelli jmazzitelli marked this pull request as ready for review January 17, 2024 22:21
@jmazzitelli jmazzitelli merged commit ce3d19f into kiali:master Jan 23, 2024
8 checks passed
@jmazzitelli jmazzitelli deleted the 6924-refresh-cache-via-client branch January 23, 2024 14:19
@jshaughn jshaughn added test: n/a PR does not need test additions or updates test: back-end/integration PR adds/updates back-end tests (unit and/or integration) and removed test: n/a PR does not need test additions or updates labels Feb 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test: back-end/integration PR adds/updates back-end tests (unit and/or integration)
Projects
Development

Successfully merging this pull request may close these issues.

Use client-go's service account token client refresh
3 participants