Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jpa Map Storage: locked user sessions lead to read timeouts #13348

Closed
sguilhen opened this issue Jul 27, 2022 · 1 comment · Fixed by #13889
Closed

Jpa Map Storage: locked user sessions lead to read timeouts #13348

sguilhen opened this issue Jul 27, 2022 · 1 comment · Fixed by #13889
Assignees
Labels
area/storage Indicates an issue that touches storage (change in data layout or data manipulation) impact/medium kind/bug Categorizes a PR related to a bug
Milestone

Comments

@sguilhen
Copy link
Contributor

sguilhen commented Jul 27, 2022

Describe the bug

When running the broker tests, the logout process ends up with exceptions like the one bellow:

23:15:40,417 WARN  [org.keycloak.broker.oidc.OIDCIdentityProvider] Failed backchannel broker logout to: https://localhost:8543/auth/realms/provider/protocol/openid-connect/logout?state=2af21a4b-5222-452e-9339-cd76999045a8&id_token_hint=eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJhWWxYbVFhbERjVWM1d2JFNVBON0RKSURGRTQzXzNvZEpJQ2cyNFlFaWxRIn0.eyJleHAiOjE2NTg4ODg0MzQsImlhdCI6MTY1ODg4ODEzNCwiYXV0aF90aW1lIjoxNjU4ODg4MTM0LCJqdGkiOiIwZGQ2Y2Q1MC1iN2NlLTRkOTUtOTAwMC0wNTRkYjVmNDA4MjAiLCJpc3MiOiJodHRwczovL2xvY2FsaG9zdDo4NTQzL2F1dGgvcmVhbG1zL3Byb3ZpZGVyIiwiYXVkIjoiYnJva2VyYXBwIiwic3ViIjoiZmE2MzEyZmQtZjc3MS00NGNiLWFiYzQtNGNkZDlkN2NjZTUyIiwidHlwIjoiSUQiLCJhenAiOiJicm9rZXJhcHAiLCJub25jZSI6Im5talkzYjhRMmlhZF8xVkNGWjF1ZEEiLCJzZXNzaW9uX3N0YXRlIjoiZTM3OWY1YWUtNjM2ZC00NGI5LWE0ZmMtYWExNTk2NWE1MzQ3IiwiYXRfaGFzaCI6IlMxdE1aNWlkdTNUYXNka0FGSmh1QmciLCJhY3IiOiIxIiwic2lkIjoiZTM3OWY1YWUtNjM2ZC00NGI5LWE0ZmMtYWExNTk2NWE1MzQ3IiwidXNlci1jbGFpbSI6eyJ0ZXN0IjoidmFsdWUifSwiZW1haWxfdmVyaWZpZWQiOnRydWUsInByZWZlcnJlZF91c2VybmFtZSI6InRlc3R1c2VyIiwiZW1haWwiOiJ1c2VyQGxvY2FsaG9zdC5jb20ifQ.XUR4DCJ_wRWOFz_SU3kfFNyUEx7h8aqr_qn7SJ2oshnSTzGqPJFqLBJMrV-nwT2OefcZPly8NphmkvC_xzoJyqSbaY8gY-UbUaGZyd4OZPGSySL8V-aEQHQoKhSC5L6xn0_e-FgFDMgmrmtjlc_tVen4n4-pYxnf9V8LuP973IrZ5zTrRd4RMV9FxLH2WaOu3CrYv581LlSxEdz6bAlkKKz9L500QkokHdUT7qdoWTCykkVujtYlz614-VmUmeEvCEvzB8r_pcUzyQPADEP00r7SnpldzzrEQHVc5Oy0cqzb_RRcAYGK_j61iRB2oV1w0EuaiXUdH2kcagy8P3nDfA
java.net.SocketTimeoutException: Read timed out
	at java.base/java.net.SocketInputStream.socketRead0(Native Method)
	at java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
	at java.base/java.net.SocketInputStream.read(SocketInputStream.java:168)
	at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)

And this happens because the requests exchanged between the consumer and producer logout endpoints attempt to lock the same user session:

  1. AbstractBaseBrokerTest.logoutRealm ends up calling the LogoutEndpoint.logoutConfirmAction on the consumer endpoint, which proceeds with the browser logout that locks the user session and that at some point sends a logout request to the provider endpoint. Relevant call stack up to this point:
	at org.keycloak.broker.oidc.OIDCIdentityProvider.backchannelLogout(OIDCIdentityProvider.java:139)
	at org.keycloak.broker.oidc.OIDCIdentityProvider.keycloakInitiatedBrowserLogout(OIDCIdentityProvider.java:155)
	at org.keycloak.services.managers.AuthenticationManager.browserLogout(AuthenticationManager.java:649)
	at org.keycloak.protocol.oidc.endpoints.LogoutEndpoint.initiateBrowserLogout(LogoutEndpoint.java:750)
	at org.keycloak.protocol.oidc.endpoints.LogoutEndpoint.doBrowserLogout(LogoutEndpoint.java:447)
	at org.keycloak.protocol.oidc.endpoints.LogoutEndpoint.logoutConfirmAction(LogoutEndpoint.java:376)
  1. The provider logout endpoint proceeds with the logout until it sends a k_logout request back to the consumer. Relevant stack:
	at org.keycloak.services.managers.ResourceAdminManager.sendLogoutRequest(ResourceAdminManager.java:301)
	at org.keycloak.services.managers.ResourceAdminManager.logoutClientSessions(ResourceAdminManager.java:167)
	at org.keycloak.services.managers.ResourceAdminManager.logoutClientSession(ResourceAdminManager.java:123)
	at org.keycloak.protocol.oidc.OIDCLoginProtocol.backchannelLogout(OIDCLoginProtocol.java:342)
	at org.keycloak.services.managers.AuthenticationManager.backchannelLogoutClientSession(AuthenticationManager.java:509)
	at org.keycloak.services.managers.AuthenticationManager.backchannelLogoutAll(AuthenticationManager.java:393)
	at org.keycloak.services.managers.AuthenticationManager.backchannelLogout(AuthenticationManager.java:296)
	at org.keycloak.services.managers.AuthenticationManager.backchannelLogout(AuthenticationManager.java:253)
	at org.keycloak.protocol.oidc.endpoints.LogoutEndpoint.doBrowserLogout(LogoutEndpoint.java:458)
	at org.keycloak.protocol.oidc.endpoints.LogoutEndpoint.logout(LogoutEndpoint.java:307)
  1. This request is now handled by the KeycloakOIDCIdentityProvider.backchannelLogout endpoint, and at this method, in line 105, attempts to lock the user session
                    UserSessionModel userSession = lockUserSessionsForModification(session, () -> session.sessions().getUserSessionByBrokerSessionId(realm, brokerSessionId));

The request is now waiting for the user session that was locked in step 1, and the deadlock is only undone when the Read timed out exception is thrown, wihch allows the first request to proceed and release the lock.

This happens with both PostgreSQL and CockroachDB.

Version

main

Expected behavior

Broker logout logic shouldn't get stuck on a deadlock where multiple requests are trying to lock the same user sessions

Actual behavior

No response

How to Reproduce?

1- Setup CockroachDB according to the instructions in #9596
2- Using the IDE, run any broker test (for example, ExternalKeycloakRoleToRoleMapperTest) with the following VM options:

-Dpageload.timeout=90000
-Dkeycloak.profile.feature.map_storage=enabled
-Dkeycloak.authSession.provider=map
-Dkeycloak.authorization.provider=map
-Dkeycloak.client.provider=map
-Dkeycloak.clientScope.provider=map
-Dkeycloak.eventsStore.provider=map
-Dkeycloak.group.provider=map
-Dkeycloak.loginFailure.provider=map
-Dkeycloak.realm.provider=map
-Dkeycloak.role.provider=map
-Dkeycloak.actionToken.provider=map
-Dkeycloak.singleUseObject.provider=map
-Dkeycloak.user.provider=map
-Dkeycloak.userSession.provider=map
-Dkeycloak.userSession.map.storage.provider=jpa
-Dkeycloak.publicKeyStorage.provider=map
-Dkeycloak.realmCache.enabled=false
-Dkeycloak.userCache.enabled=false
-Dkeycloak.authorizationCache.enabled=false
-Dkeycloak.publicKeyCache.enabled=false
-Dkeycloak.userFederatedStorage.provider=
-Dkeycloak.userSessionPersister.provider=
-Dkeycloak.dblock.provider=none
-Dkeycloak.map.storage.connectionsJpa.url=jdbc:postgresql://localhost:5432/keycloak
-Dkeycloak.map.storage.connectionsJpa.user=keycloak
-Dkeycloak.map.storage.connectionsJpa.password=pass

For CRDB adjust the connection port to 26257

Anything else?

No response

@sguilhen sguilhen added kind/bug Categorizes a PR related to a bug area/storage Indicates an issue that touches storage (change in data layout or data manipulation) team/storage-sig labels Jul 27, 2022
@sguilhen
Copy link
Contributor Author

sguilhen commented Aug 3, 2022

After pairing with @hmlnarik and trying an approach based on short-lived transactions that locks the user session only when changes are made (for example to the session's state), the original deadlock seems to be gone.

Link to branch with changes: https://github.com/sguilhen/keycloak/tree/wip-deadlock

However, running all broker tests revealed a couple of issues:
1- KcOidcBrokerTest#testLogoutWorksWithTokenTimeout always gets stuck in a deadlock. Apparently when LogoutEndpoint,doBrowserLogout is called, line 442 is locking the entity even though it is not using PESSIMISTIC_WRITE. So when initiateBrowserLogout is called and attempts to lock the entity in a separate transaction, it gets stuck. I couldn't figure out why this is happening.

To reproduce, simply run this test using the VM parameters described above. Happens with both databases.

2- Changing user session state in separate transaction seems to be needed for the SamlService.logoutRequest as well.

3- Some tests occasionally show this exception in the logs, but are still green:

org.keycloak.models.ModelException: javax.persistence.OptimisticLockException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) : [org.keycloak.models.map.storage.jpa.userSession.entity.JpaUserSessionEntity#62250ac2-0809-4424-80b0-4014b163acfc]
	at org.keycloak.models.map.storage.jpa.PersistenceExceptionConverter.convert(PersistenceExceptionConverter.java:84)
	at org.keycloak.models.map.storage.jpa.JpaTransactionWrapper.commit(JpaTransactionWrapper.java:52)
	at org.keycloak.services.DefaultKeycloakTransactionManager.commit(DefaultKeycloakTransactionManager.java:136)
	at org.keycloak.services.filters.AbstractRequestFilter.close(AbstractRequestFilter.java:64)
	at org.keycloak.services.filters.AbstractRequestFilter.filter(AbstractRequestFilter.java:49)
	at org.keycloak.testsuite.UndertowRequestFilter.doFilter(UndertowRequestFilter.java:47)
        ...
Caused by: javax.persistence.OptimisticLockException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) : [org.keycloak.models.map.storage.jpa.userSession.entity.JpaUserSessionEntity#62250ac2-0809-4424-80b0-4014b163acfc]
	at org.hibernate.internal.ExceptionConverterImpl.wrapStaleStateException(ExceptionConverterImpl.java:226)
	at org.hibernate.internal.ExceptionConverterImpl.convert(ExceptionConverterImpl.java:93)
	at org.hibernate.internal.ExceptionConverterImpl.convert(ExceptionConverterImpl.java:181)
	at org.hibernate.internal.ExceptionConverterImpl.convertCommitException(ExceptionConverterImpl.java:65)
	at org.hibernate.engine.transaction.internal.TransactionImpl.commit(TransactionImpl.java:107)
	at org.keycloak.models.map.storage.jpa.JpaTransactionWrapper.commit(JpaTransactionWrapper.java:50)
	... 39 more
Caused by: org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) : [org.keycloak.models.map.storage.jpa.userSession.entity.JpaUserSessionEntity#62250ac2-0809-4424-80b0-4014b163acfc]
	at org.hibernate.persister.entity.AbstractEntityPersister.check(AbstractEntityPersister.java:2541)
	at org.hibernate.persister.entity.AbstractEntityPersister.update(AbstractEntityPersister.java:3376)
	at org.hibernate.persister.entity.AbstractEntityPersister.updateOrInsert(AbstractEntityPersister.java:3250)
	at org.hibernate.persister.entity.AbstractEntityPersister.update(AbstractEntityPersister.java:3651)
	at org.hibernate.action.internal.EntityUpdateAction.execute(EntityUpdateAction.java:146)
	at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:604)
	at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:478)
	at org.hibernate.event.internal.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:356)
	at org.hibernate.event.internal.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:39)
	at org.hibernate.internal.SessionImpl.doFlush(SessionImpl.java:1472)
	at org.hibernate.internal.SessionImpl.managedFlush(SessionImpl.java:512)
	at org.hibernate.internal.SessionImpl.flushBeforeTransactionCompletion(SessionImpl.java:3310)
	at org.hibernate.internal.SessionImpl.beforeTransactionCompletion(SessionImpl.java:2506)
	at org.hibernate.engine.jdbc.internal.JdbcCoordinatorImpl.beforeTransactionCompletion(JdbcCoordinatorImpl.java:447)
	at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl.beforeCompletionCallback(JdbcResourceLocalTransactionCoordinatorImpl.java:178)
	at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl.access$300(JdbcResourceLocalTransactionCoordinatorImpl.java:39)
	at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl$TransactionDriverControlImpl.commit(JdbcResourceLocalTransactionCoordinatorImpl.java:271)
	at org.hibernate.engine.transaction.internal.TransactionImpl.commit(TransactionImpl.java:104)

This seems sensitive to timing - when debugging some tests I was able to get this exception to show more often than when running the test with no breakpoints - and is not always reproducible for a particular test. I think it might have something to do with the last session refresh that happens when the logout endpoint validates the identity token on AuthenticationManager.authenticateIdentityCookie, line 913, that might lead hibernate to try to update this entity after it has been removed by another tx at the end of the logout process. I can't yet explain why it is not always visible - perhaps the tx that deletes the entity runs after the main one and the error doesn't happen?

sguilhen added a commit to sguilhen/keycloak that referenced this issue Aug 19, 2022
…in logout transaction commits.

- This also fixes broker test failures with CockroachDB

Closes keycloak#13348
Closes keycloak#13212
Closes keycloak#13214
stianst pushed a commit that referenced this issue Aug 23, 2022
…in logout transaction commits. (#13889)

- This also fixes broker test failures with CockroachDB

Closes #13348
Closes #13212
Closes #13214
@stianst stianst added this to the 20.0.0 milestone Oct 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/storage Indicates an issue that touches storage (change in data layout or data manipulation) impact/medium kind/bug Categorizes a PR related to a bug
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants