Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fire OIDC security event when OIDC server not available #37760

Conversation

michalvavrik
Copy link
Contributor

closes: #35920

@quarkus-bot

This comment has been minimized.

@sberyozkin
Copy link
Member

@michalvavrik Thanks, great follow up. I'm no sure right now how many times it will be sent, for example, if the URL is not even set or mistyped then the server will not start - I believe this event should not be sent in these cases, please check.

Also, if a connection error happens for statically configured OIDCs, then a retry will be made at the request time. I wonder if sending this event at the first failed attempt time, will confuse the consumers, as opposed to sending it after the retry time failure ? They may fire some red alarm ex in RH Insights, but then 1 min later OIDC will be up...

May be we should support an OIDC_AVAILABLE event ?

@michalvavrik
Copy link
Contributor Author

@michalvavrik Thanks, great follow up. I'm no sure right now how many times it will be sent,

  • it is fired every time Quarkus tries to connect to the OIDC server and OIDC endpoint is not available
  • you can see it in the logs as OIDC recorder logs information in the very same situation (equal): OIDC server is not available at the '...' URL.

for example, if the URL is not even set

I've tried following steps:

  1. quarkus create app org.acme:security-openid-connect-multi-tenancy-quickstart --extension='oidc,resteasy-reactive-jackson' --no-code
  2. cd security-openid-connect-multi-tenancy-quickstart/
  3. mvn clean package
  4. java -jar target/quarkus-app/quarkus-run.jar

with result 'quarkus.oidc.auth-server-url' property must be configured which is fired here

throw new ConfigurationException("'quarkus.oidc.auth-server-url' property must be configured");

therefore I don't think that could be a case.

or mistyped

Validation happens here

OidcCommonUtils.verifyEndpointUrl(oidcConfig.getAuthServerUrl().get());

while the long flow of unis that at certain point when no other uni failed starts here (which is later):

return createOidcProvider(oidcConfig, tlsConfig, vertx)

then the server will not start - I believe this event should not be sent in these cases, please check.

It is not. If you need me to write tests, I didn't test when this "should not happen" because I suppose list could be long (potentially endless), however I'll add any test you require.

Also, if a connection error happens for statically configured OIDCs, then a retry will be made at the request time. I wonder if sending this event at the first failed attempt time, will confuse the consumers, as opposed to sending it after the retry time failure ? They may fire some red alarm ex in RH Insights, but then 1 min later OIDC will be up...

We are talking about Uni, which should mean there is item or failure once. Therefore idea that retry will lead to multiple exceptions -> multiple events is not correct. I checked here https://smallrye.io/smallrye-mutiny/2.5.3/tutorials/retrying/ and I think this sentence should give us certainty: If despite multiple attempts, it still fails, the failure is propagated downstream.

For that reason, I believe that retry like this one

that later leads to reporting event

return toOidcException(t, oidcConfig.authServerUrl.get());

is fired when OIDC server is not available.

They may fire some red alarm ex in RH Insights, but then 1 min later OIDC will be up...

1 min in OpenShift where Deployment may scale in a seconds and downscale again based on load and POD health is eternity. Why would you like to disclose information from users? I believe your worries needs to be documented, rather than not fire event.

It sounds to me like you believe there is nothing reasonable users can do with this security event when 1 min later OIDC might be available. But based on that event, they can re-check health of OIDC server from whole different place than this Quarkus app.

May be we should support an OIDC_AVAILABLE event ?

That is sort of thing users can check easily with call to OIDC server from Quarkus app when they know there is a problem. I can add it if you want, but it wasn't part of #35920 so I didn't add it here. Your call.

Thanks @sberyozkin , I'll wait for feedback before I do any changes, as right now it can be anywhere between drop constructor and add SERVER URL and rewrite whole PR because your arguments are not sound enough.

@michalvavrik michalvavrik force-pushed the feature/oidc-event-for-endpoint-not-available branch from 39a0e35 to 9dbaca4 Compare December 15, 2023 11:11
@michalvavrik
Copy link
Contributor Author

michalvavrik commented Dec 15, 2023

I've decided to address the comment I agree with after all. Done.

@quarkus-bot

This comment has been minimized.

@sberyozkin
Copy link
Member

Hi @michalvavrik Sorry for a delay.

As far as a retry is concerned: I don't refer to that Uni retry, if the connection fails at the start up, then it is pending a retry when the first client request arrives, so at that point a new attempt to get that Uni based connection (with its own retries) is done.

Does it clarify it ?

@michalvavrik
Copy link
Contributor Author

Hi @michalvavrik Sorry for a delay.

No hurry, I sense this PR will take a while.

As far as a retry is concerned: I don't refer to that Uni retry, if the connection fails at the start up, then it is pending a retry when the first client request arrives, so at that point a new attempt to get that Uni based connection (with its own retries) is done.

I'm well aware, I tried to answer this here #37760 (comment). The idea that it is alright that we do not fire event because it might later be alright means users is loosing opportunity to react ASAP. When it fails on first request, it is worse situation than when it fails on startup. What matters are user requests.

Does it clarify it ?

Yes.

@sberyozkin
Copy link
Member

@michalvavrik The PR is perfect, I'm just thinking about that retry thing, the existing oidc-wiremock test on main demonstrates it, it fails to connect at the start up, but then Wiremock server is started and the connection is established when the test makes a request. I believe there should be an option to observe that OIDC server is available (meaning the previous alarm can be dropped)

@michalvavrik
Copy link
Contributor Author

@michalvavrik The PR is perfect, I'm just thinking about that retry thing, the existing oidc-wiremock test on main demonstrates it, it fails to connect at the start up, but then Wiremock server is started and the connection is established when the test makes a request. I believe there should be an option to observe that OIDC server is available (meaning the previous alarm can be dropped)

Sure, I'll add the OIDC_AVAILABLE event. I'm not sure about name, I'll use OIDC_SERVER_AVAILABLE but you can always request changes later. Thanks

@michalvavrik michalvavrik force-pushed the feature/oidc-event-for-endpoint-not-available branch from 9dbaca4 to b2bd004 Compare December 15, 2023 18:49
@michalvavrik
Copy link
Contributor Author

Done, I became worried that we will keep sending events for every tenant even though there were no issues every time there is an attempt to connect to the OIDC endpoint, therefore the event reporting that OIDC server is available is only reported once for the tenant that had previous connection failure.

@michalvavrik michalvavrik force-pushed the feature/oidc-event-for-endpoint-not-available branch from b2bd004 to 6ea3670 Compare December 15, 2023 19:21
@quarkus-bot

This comment has been minimized.

@michalvavrik michalvavrik marked this pull request as draft December 15, 2023 20:42
@michalvavrik michalvavrik marked this pull request as ready for review December 15, 2023 21:30
@michalvavrik michalvavrik force-pushed the feature/oidc-event-for-endpoint-not-available branch from 6ea3670 to de65362 Compare December 15, 2023 21:30
@quarkus-bot

This comment has been minimized.

@michalvavrik michalvavrik force-pushed the feature/oidc-event-for-endpoint-not-available branch from de65362 to 1c5af65 Compare December 16, 2023 11:32
@michalvavrik michalvavrik force-pushed the feature/oidc-event-for-endpoint-not-available branch from 5853136 to 8811efc Compare December 16, 2023 11:39
@quarkus-bot quarkus-bot bot added this to To do in Quarkus Documentation Dec 16, 2023
Copy link

github-actions bot commented Dec 16, 2023

🙈 The PR is closed and the preview is expired.

@quarkus-bot
Copy link

quarkus-bot bot commented Dec 16, 2023

✔️ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

}

private static boolean shouldFireOidcServerAvailableEvent(String tenantId) {
return tenantsExpectingServerAvailableEvents.contains(tenantId);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michalvavrik This is the last one I think, that should probably remove the tenant from the map, I guess, otherwise, when a dynamic TenantConfigResolver is used, it might keep growing even if only when connection failures occur

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sberyozkin I'll need little more to understand Sergey. Can you give me hint what you mean, please? Please note that tenant is removed when OIDC available event is reported. I don't want to remove it before that happens, because then I wouldn't have certainty that event were reported.

@sberyozkin sberyozkin self-requested a review December 17, 2023 22:04
Quarkus Documentation automation moved this from To do to Reviewer approved Dec 17, 2023
@sberyozkin sberyozkin merged commit cdbb8e1 into quarkusio:main Dec 17, 2023
24 checks passed
Quarkus Documentation automation moved this from Reviewer approved to Done Dec 17, 2023
@quarkus-bot quarkus-bot bot added the kind/enhancement New feature or request label Dec 17, 2023
@quarkus-bot quarkus-bot bot added this to the 3.7 - main milestone Dec 17, 2023
@michalvavrik michalvavrik deleted the feature/oidc-event-for-endpoint-not-available branch December 17, 2023 22:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Unable to add additional handling when OIDC provider is not available
2 participants