MACProvider NullPointerException when validating JWT #209
Which Micronaut version and which version of Micronaut Security?
micronaut-bom:1.3.3, micronaut-security-jwt:1.3.0, micronaut.configuration:micronaut-security-oauth2:1.3.0
We are experiencing the identical intermittent failure on a system that we are trying to prepare for production. For us it is happening on AWS Elastic Beanstalk, with Micronaut version 1.3.4 and the Auth0 OAuth setup. We see exactly the same stack trace. It appears that in class

For us, it appears that it's an intermittent startup problem. Sometimes the EB instance starts up fine and everything runs with no problem. Other times something seems to go wrong and the config is not picked up correctly. We've not seen it fail after it has started up OK. I added a test endpoint (anonymous) to display the JVM environment variables, and the expected env vars are present and have the expected values as set in the AWS config. The handful of unprotected endpoints we have seem to respond as expected, but anything that requires JWT validation blows up.

We set the system-specific properties via env vars. In the case of the

This appears to be a race condition or some other timing-dependent issue. When it happens, we can bounce the app and it will usually start up with no problems. Sometimes it will have the problem several times in a row. We're not sure, but it seems to be happening more frequently now.

It would be helpful if this null pointer were handled more gracefully, preferably by preventing startup and reporting an explanation in the log. As it is, end users see something that looks broken and the stack trace in the log does not offer any clues.

Please advise when a fix is available for this. It's holding up our upgrade to production (planned for this week) until we can rely on deployments running smoothly. Please let me know if I can provide any additional information. Unfortunately, I don't know where to go from here in terms of debugging the code myself.
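For anyone wanting to run the same kind of diagnostics, a minimal sketch of such an anonymous debug endpoint is below. The class name, routes, and the property path being checked are illustrative assumptions, not taken from the original comment; the property path assumes the standard secret-signature configuration.

```java
import io.micronaut.context.env.Environment;
import io.micronaut.http.annotation.Controller;
import io.micronaut.http.annotation.Get;
import io.micronaut.security.annotation.Secured;
import io.micronaut.security.rules.SecurityRule;

import java.util.Map;

@Controller("/debug")
@Secured(SecurityRule.IS_ANONYMOUS) // anonymous so it keeps working even when JWT validation is broken
public class EnvDebugController {

    private final Environment environment;

    public EnvDebugController(Environment environment) {
        this.environment = environment;
    }

    // Raw environment variables as seen by the JVM process.
    @Get("/env")
    public Map<String, String> env() {
        return System.getenv();
    }

    // Whether Micronaut itself can resolve the JWT signing secret property.
    @Get("/jwt-secret")
    public String jwtSecret() {
        return environment.getProperty(
                "micronaut.security.token.jwt.signatures.secret.generator.secret", String.class)
                .map(s -> "resolved (" + s.length() + " chars)")
                .orElse("NOT resolved");
    }
}
```

Comparing the raw environment variable with what the Micronaut Environment actually resolves helps narrow down whether the problem is in the platform (the variable is missing) or in configuration binding.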
Thanks for the report. The NPE does need to be handled more gracefully, but I believe a better fix is to add more validation at startup. Could you try this as a workaround whilst we investigate:

```java
import io.micronaut.context.annotation.Context;
import io.micronaut.context.annotation.Property;
import io.micronaut.security.token.jwt.signature.secret.SecretSignature;
import io.micronaut.security.token.jwt.signature.secret.SecretSignatureConfiguration;

import javax.inject.Named;

@Context
@Named("generator")
class GeneratorSecretSignature extends SecretSignature {

    GeneratorSecretSignature(@Property(name = "generator.secret") String secret) {
        super(createSecretConfig(secret));
    }

    private static SecretSignatureConfiguration createSecretConfig(String secret) {
        SecretSignatureConfiguration config = new SecretSignatureConfiguration("generator");
        config.setSecret(secret);
        return config;
    }
}
```

Then set `generator.secret` (for example via a `GENERATOR_SECRET` environment variable). If you are able to provide an example with steps to reproduce (even if it happens intermittently) that would help too.
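For what it's worth, Micronaut's relaxed binding maps environment variables onto configuration properties, so an environment variable named GENERATOR_SECRET should resolve as the generator.secret property injected by the constructor above; any other variable or property source that resolves generator.secret should work just as well.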
Thank you for the workaround. I was about to do something like that anyway, so you've saved me the trouble of creating the bean myself. I only hope it doesn't have the same issue resolving that property as it does the original one. If it does, I guess that will be a useful thing to know.

In terms of handling the exception more gracefully, I was referring to startup as well as a more helpful error message. JWT signing/verification is an essential service and the app is useless without it. Perhaps it's a good idea to do the JWT initialization eagerly so that Micronaut can abort startup if there's a problem, rather than waiting for the first use of JWT validation. When it's deferred as it is now, the AWS console shows that the app has started up and is responding fine to heartbeat requests. Even when users are seeing errors, the status of the app shows green.

I can share info about the AWS Elastic Beanstalk instance we're using if that would be helpful. It's pretty generic; we use Terraform to configure them. We've had similar instances running Micronaut apps for well over a year with no problems. We're building and deploying with CodeFresh (Shippable before that). Only on the dev branch where we're implementing the switch to Auth0 has this issue popped up, so I think it's something specific to that codebase. When we deploy a new build, or change environment variables and reboot, this problem randomly occurs.

We could probably give you access to the instance when it's experiencing the problem (though I don't know what you could discover from that). We'd also be happy to test candidate fixes for you if need be. This is a showstopper bug for us at the moment (fingers crossed for the workaround). Thanks.
Yes, I agree that the verification should be done as early as possible. Thanks for your feedback.
Thanks for the workaround. We will test and post the results here.
Tried it, but we now receive a 401 response.
Some notes:
Yes, with my suggested workaround you have to remove MICRONAUT_SECURITY_TOKEN_JWT_SIGNATURES_SECRET_GENERATOR_SECRET.
I removed MICRONAUT_SECURITY_TOKEN_JWT_SIGNATURES_SECRET_GENERATOR_SECRET and added GENERATOR_SECRET with the workaround, and it didn't work (I was getting 401 responses). Then I removed the workaround, left only the GENERATOR_SECRET environment variable, and it worked.
Just to update: just renaming the environment variable MICRONAUT_SECURITY_TOKEN_JWT_SIGNATURES_SECRET_GENERATOR_SECRET to GENERATOR_SECRET didn't seem to work. I think we had some problems with the Kubernetes pod configuration that kept the old variable set.
I have improved the error message; however, without an example that reproduces the issue of the config being null, I don't know what to do there. If there is an issue, it is likely in core, as this library reads config in a standard way.
I did some experimentation with this yesterday and can reliably and easily reproduce this problem. It basically boils down to the fact that specifying the property via an environment variable doesn't work. There's also an issue that not specifying the signing secret does not cause a startup failure. Without the workaround bean above in place, I tried the four combinations of defining the property in application.yml and/or via the environment variable.

The config we had was the second: the secret defined in the environment with a fallback in application.yml.

It's worrying that startup is not aborted when this critical property is not defined. This leads to mysterious and inexplicable behavior that's very hard to debug. I'm guessing that a

The workaround bean is working fine for me. I discovered that with it in scope, and

I added an additional check to your workaround bean, just FYI.
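The additional check itself didn't survive in this thread; a plausible shape for it, assuming it simply refuses to build the configuration when the injected secret is empty (so that startup fails with a readable message instead of a later NPE), might look like this. The javax.inject imports assume Micronaut 1.x/2.x.

```java
import io.micronaut.context.annotation.Context;
import io.micronaut.context.annotation.Property;
import io.micronaut.core.util.StringUtils;
import io.micronaut.security.token.jwt.signature.secret.SecretSignature;
import io.micronaut.security.token.jwt.signature.secret.SecretSignatureConfiguration;

import javax.inject.Named;

@Context
@Named("generator")
class GeneratorSecretSignature extends SecretSignature {

    GeneratorSecretSignature(@Property(name = "generator.secret") String secret) {
        super(createSecretConfig(secret));
    }

    private static SecretSignatureConfiguration createSecretConfig(String secret) {
        // Guess at the "additional check": abort eagerly with a clear message if the
        // secret resolves to nothing, rather than letting the first JWT validation NPE.
        if (StringUtils.isEmpty(secret)) {
            throw new IllegalStateException("JWT signing secret 'generator.secret' is not set or is empty");
        }
        SecretSignatureConfiguration config = new SecretSignatureConfiguration("generator");
        config.setSecret(secret);
        return config;
    }
}
```

Because the class is annotated with @Context, the bean is created when the application context starts, so the exception surfaces at startup rather than on the first authenticated request.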
Please let me know if you need any further information from me. Cheers.
Thanks for the comprehensive feedback. @jameskleeh the change we should make is to initialize the config early and add validation to fail fast. This should be done for all security-related configuration, IMO.
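To illustrate the fail-fast idea on the application side (this is not the library change being discussed, and the property path shown is an assumption based on the standard secret-signature configuration), a startup listener can verify the secret is resolvable and abort if it is not:

```java
import io.micronaut.context.env.Environment;
import io.micronaut.context.event.ApplicationEventListener;
import io.micronaut.context.event.StartupEvent;

import javax.inject.Singleton;

@Singleton
public class JwtSecretStartupCheck implements ApplicationEventListener<StartupEvent> {

    private final Environment environment;

    public JwtSecretStartupCheck(Environment environment) {
        this.environment = environment;
    }

    @Override
    public void onApplicationEvent(StartupEvent event) {
        // Check that the JWT signing secret resolves; adjust the path if your config differs.
        boolean present = environment.getProperty(
                "micronaut.security.token.jwt.signatures.secret.generator.secret",
                String.class).isPresent();
        if (!present) {
            throw new IllegalStateException("JWT signing secret is not configured; aborting startup");
        }
    }
}
```

The intent is that the failure surfaces while the context is starting, so a broken deployment dies visibly instead of reporting green while every JWT validation fails.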
If the secret is null, it will now fail on application startup.
@jameskleeh can you reference the commit with the change?
Thanks for the workaround. I did some testing and made a Kotlin variation of the code written by @ronhitchens, and now it is working. I was probably facing some problems with the environment variables.
Still having a similar issue with AWS Cognito and Micronaut. It happens when I restart the application or upload a new version to the server.
We have a Micronaut application deployed via Kubernetes, using 3 pods with the same image. Our Micronaut setup has JWT enabled with OAuth2 via Cognito for authentication. Our main development language is Kotlin.

Locally, the setup works. However, in the production and staging/develop environments we are getting intermittent NullPointerExceptions from MACProvider.java after the pods have been running for a while. When a pod starts throwing this exception, it throws it for every request, even if you log out and log in again or use a token that was previously working. A pod failure seems not to affect the other running pods, and it appears to happen randomly. So when your request is routed to the failing pod, it returns a 500 HTTP status with {"message":"Internal Server Error: null"}; when your request is routed to a working pod, it works. Requests therefore become intermittent: sometimes they work, sometimes they fail.

We have no idea what causes it, nor how to properly debug it. This led us to think this may be a bug or unintended behaviour in the Micronaut Security library.

The stack trace is as follows: