Kafka keycloak authentication issues possible token expiry #232
A full stack trace would help here, along with all the log files you can provide. Do you see any exceptions on your clients indicating that they were unable to obtain a new access token during re-login? Do you see any other errors or warnings in your broker logs? Your `connections.max.reauth.ms` looks like it's set to 2 minutes, which is quite short. If Keycloak issues tokens with a 5-minute expiry but the broker requires clients to re-authenticate every 2 minutes, then a dynamic might occur where the client (which re-logins based on the 5-minute expiry, with some jitter window) authenticates to the broker with a token that has very little validity time left. Just guessing. A full DEBUG log from the broker would be required to get to the root of the problem.
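The dynamic guessed at here can be put into numbers. A minimal sketch, with assumed values (5-minute token lifetime, `connections.max.reauth.ms=120000`); this is just arithmetic illustrating the suspicion, not the library's actual scheduling logic:

```python
# Illustrative only: how much validity a token has left each time the broker
# forces re-authentication. The constants below are assumptions taken from
# the discussion (5-minute Keycloak tokens, 2-minute broker re-auth).

TOKEN_LIFETIME_S = 300    # assumed Keycloak access token lifespan
REAUTH_INTERVAL_S = 120   # connections.max.reauth.ms = 120000

def validity_left_at_reauth(token_issued_at: float, reauth_at: float) -> float:
    """Seconds of token validity remaining when the broker forces re-auth."""
    return (token_issued_at + TOKEN_LIFETIME_S) - reauth_at

# Token obtained at t=0; broker forces re-auth at t=120 and again at t=240.
print(validity_left_at_reauth(0, REAUTH_INTERVAL_S))      # 180
print(validity_left_at_reauth(0, 2 * REAUTH_INTERVAL_S))  # 60
```

At the second forced re-auth the client may still present the same token with only 60 seconds of validity left, which would match the "very short validity" guess above.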
One thing I just noticed in your configuration: you have … I wonder if it has something to do with your issue. What it means is that there is no client re-login happening, since clients don't use SASL_OAUTHBEARER to connect to Kafka, but use SASL_PLAIN and send client id and secret as username and password. Are you by any chance configuring your clients with access tokens rather than secrets?
The clients are using the secrets from the service account; there are no access tokens in our configuration.
I was hoping that making the max reauth significantly shorter would avoid any race conditions, compared to setting it to the same value. Are there any recommendations on what to do here? Should I use 4 minutes?
Sadly, debug wasn't running at the beginning of the issue, and by the time I turned it on the logs were flooded with these failure messages until it resolved itself. It looks like I have around 30 minutes of these failure messages. The messages in the log start and end with the same message; there is no token initiation message. I've left debug on for KeycloakRBACAuthorizer. Would you need anything else in terms of debugging? I'm not against supplying the logs, but there are some 11 million log messages just for KeycloakRBACAuthorizer in this time period, and from my initial analysis I'm not sure I've captured what we need to diagnose this, this time. The client stack trace was:
Another approach would be to replicate the issue in a smaller setup with only one client, setting …
It happened again today. I can find an example of a token: the first log entry is at 09:09:48 when it was working, and then it expired at 09:11:56.

First entry for this token:

"2024-03-19 09:09:48,936 DEBUG Got grants for 'OAuthKafkaPrincipal(User:service-account-kafka-all-access, groups: null, session: 1113386382, token: eyJh**NIZA)': [{"scopes":["Describe","Alter","IdempotentWrite","ClusterAction","Create","DescribeConfigs","AlterConfigs"],"rsid":"c817583a-9a9d-4acf-9c3c-111f51e06cf4","rsname":"Cluster:"},{"scopes":["Write","Describe","Read","Alter","Delete","Create","DescribeConfigs","AlterConfigs"],"rsid":"cc0009ab-6f78-4737-993e-adf09e67dd1e","rsname":"Topic:"},{"scopes":["Describe","Read","Delete"],"rsid":"781a7395-fe5b-4a4a-a372-f6a4bf6d0797","rsname":"Group:"},{"scopes":["Write","Describe"],"rsid":"6282daff-d792-457b-8eb8-b2760b88d82a","rsname":"TransactionalId:"}] (io.strimzi.kafka.oauth.server.authorizer.KeycloakRBACAuthorizer) [data-plane-kafka-request-handler-7]"

Last working entry: … Then failure: … There are a lot of failures for a lot of tokens. Shouldn't the broker refresh the token when it gets this error?
The way it works is that when the token expires, the broker should automatically require the Kafka client to reauthenticate (provided that …). When in this … So, one option to try is to set … Another option is to catch the exception on the client, close your current KafkaProducer / KafkaConsumer, and create a new instance, which will start a new session (see the README). Yet another option is to enable OAUTHBEARER mode rather than using PLAIN.
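The "catch the exception, close the client, create a new instance" option can be sketched generically. This is not the Kafka client API; `make_client` and `AuthError` are stand-ins for your producer/consumer factory and the authentication exception your client library throws:

```python
# Generic sketch of the "close and recreate on auth failure" pattern.
# `make_client` is a hypothetical factory (e.g. one that builds a new
# KafkaProducer); `AuthError` stands in for the client library's
# authentication exception. Illustrative shape only.

class AuthError(Exception):
    pass

def send_with_relogin(make_client, payload, max_retries=1):
    client = make_client()
    for attempt in range(max_retries + 1):
        try:
            return client.send(payload)
        except AuthError:
            client.close()            # drop the session with the expired token
            if attempt == max_retries:
                raise                 # give up after the configured retries
            client = make_client()    # new instance -> fresh SASL session
```

The key point is that the new instance performs a fresh login, so it picks up a newly issued token instead of reusing the expired session.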
Should `connections.max.reauth.ms` be set on the broker or the client?
One thing I note is that I'm using the new KRaft mode, and from the README it should be using KeycloakAuthorizer, but from the logging it looks like I'm using KeycloakRBACAuthorizer. I see there is also oauth.max.token.expiry.seconds, but it says this would only be used to force more frequent refreshes than necessary.
That's a broker-side setting. If using the Strimzi operator, use …
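For reference, if I'm reading the Strimzi API right, the operator exposes this via `maxSecondsWithoutReauthentication` on the oauth listener authentication, which it translates into `connections.max.reauth.ms` on the listener. A sketch only; field names are from memory, so verify against the docs for your Strimzi version:

```yaml
# Sketch, assuming Strimzi's KafkaListenerAuthenticationOAuth supports
# maxSecondsWithoutReauthentication; verify for your Strimzi version.
spec:
  kafka:
    listeners:
      - name: tls
        port: 9093
        type: internal
        tls: true
        authentication:
          type: oauth
          maxSecondsWithoutReauthentication: 300   # 5 minutes
```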
You should see both in the log, because KeycloakAuthorizer instantiates and delegates to KeycloakRBACAuthorizer. At the start of the Kafka broker log there is a full dump of the strimzi.properties file; the authorizer installed there should be KeycloakAuthorizer.
That's a client-side setting for OAUTHBEARER, which would not have any effect for you since your clients use PLAIN.
Great, that is as you say.
OK, thanks. We have another 3 deployments of Strimzi and they have not experienced this issue as far as I know, but this is the only deployment using KRaft. We are using the operator, so I'll add …
That's an interesting observation. |
What's the difference between the …?
The …
The issue is back, even with the …
Have you tried disabling PLAIN mode and only using OAUTHBEARER on your clients? I can only troubleshoot further if I can reproduce it locally, so you would have to be able to reliably reproduce it and provide instructions on how. Ideally, put together a reproducer: an example setup that anyone can start up locally and see the failures happen.
It might be that this only happens under load, and it's not a simple, quiet setup that reproduces this.
We've been running Kafka authenticating with Keycloak for about a month without issue, when last week all authentication suddenly failed.
Clients were getting the error:
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TopicAuthorizationException: Topic authorization failed.
and there were no logs showing anything interesting on Keycloak or Kafka.
Turning on debugging for Kafka, I saw hundreds of messages of the form:
"2024-03-15 10:20:57,439 DEBUG Authorization DENIED due to token expiry - The token expired at: 1710498016000 (2024-03-15T10:20:16 UTC), for token: eyJh**86eA (io.strimzi.kafka.oauth.server.authorizer.KeycloakRBACAuthorizer.deny) [data-plane-kafka-request-handler-7]"
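As a quick sanity check on that DENIED message, the expiry epoch it reports does decode to the human-readable UTC time printed alongside it, so the broker's clock reading is internally consistent:

```python
# Decode the expiry epoch (milliseconds) from the "Authorization DENIED due
# to token expiry" log line and confirm it matches the printed UTC time.
from datetime import datetime, timezone

expiry_ms = 1710498016000
expiry = datetime.fromtimestamp(expiry_ms / 1000, tz=timezone.utc)
print(expiry.isoformat())  # 2024-03-15T10:20:16+00:00
```

Note the deny happened at 10:20:57, about 41 seconds after the token expired, so the broker had been rejecting that token for a while before this entry.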
I tried restarting the clients and that made no difference, but I believe that when I restarted all the brokers with the new config the error seemed to stop.
I'm a little confused how restarting the brokers could fix this, yet I also saw this error after the restart.
If this is a non-recoverable error, then perhaps DEBUG is too low a logging level for it?
I already have set
"connections.max.reauth.ms" = 120000
and checked on the brokers, and I can see that config in /tmp/strimzi.properties.
Our config is:
Keycloak settings - I think these are the relevant ones:
Do you know what timeout would be causing the token expiry (I would have assumed the token lifespan), and why the token isn't auto-refreshing? It sounded from the config for
oauth.max.token.expiry.seconds
like it should automatically refresh.