Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEYCLOAK-19230 Calling getTopLevelGroups is slow inside GroupLDAPStorageMapper#getLDAPGroupMappingsConverted #8430

Merged
merged 20 commits into from
Sep 20, 2023

Conversation

bohmber
Copy link
Contributor

@bohmber bohmber commented Sep 14, 2021

Calling getTopLevelGroups is slow inside GroupLDAPStorageMapper#getLDAPGroupMappingsConverted

Closes #14820

KEYCLOAK-19230

Copy link
Contributor

@mposolda mposolda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bohmber Thanks for the PR! Few points:

  • The PR introduces new method to the model API and hence I am adding @hmlnarik as a reviewer to doublecheck if this matches with the requirements for the new store and doublecheck if implementation in the new stores (Map store) is ok
  • Added 2 comments regarding possible optimization of the SQL in the JPA layer to call only single SQL in "getGroupByName" instead of two separate queries (one just for return ID and another for lookup group by this ID).
  • Will be good to doublecheck if we have DB index by parent and name. This can add even more performance points. Could you doublecheck if such index exists in the DB?
  • I wonder that we can change KeycloakModelUtils.findGroupByPath to be more optimal and leverage the new API method for getGroupByName and parent.
  • I've added few more comments below regarding cache invalidations and some other minor things...

When you do some more optimizations, can you please share in the JIRA some results of the performance before this change and after this change? If possible, it will be ideal to share:

  • Count of LDAP top level groups
  • Do you use some custom "group path" for the Group LDAP mapper or just top level groups?
  • How the login of the user performs before and after this change? How many groups your typical user has?

Thanks!

@hmlnarik WDYT about the points above and about this PR in general?

@mposolda mposolda self-assigned this Sep 15, 2021
@mposolda mposolda added area/storage Indicates an issue that touches storage (change in data layout or data manipulation) area/user-federation impact/medium labels Sep 15, 2021
@bohmber
Copy link
Contributor Author

bohmber commented Sep 15, 2021

  • Will be good to doublecheck if we have DB index by parent and name. This can add even more performance points. Could you doublecheck if such index exists in the DB?

At least there is a @UniqueConstraint(columnNames = {"REALM_ID", "PARENT_GROUP", "NAME"})

@stianst stianst added area/ldap and removed area/user-federation area/storage Indicates an issue that touches storage (change in data layout or data manipulation) labels Nov 3, 2021
@mposolda
Copy link
Contributor

@bohmber Still interested in this PR? If yes, please comment here.

There will be at least rebase needed and align with latest Keycloak codebase, which changed since this PR was sent. But possibly some more things will be needed to be updated in this PR. But before going further, I would like to know if you are still interested in this PR or not anymore.

@bohmber
Copy link
Contributor Author

bohmber commented Sep 27, 2022

@bohmber Still interested in this PR? If yes, please comment here.

Thanks @mposolda
Yes I'm interested in the PR. I know I need to rebase and update the pr. Currently we have almost the same change for rh-sso 7.3 and 7.6 running in production and we are happy with the results. I need at least 2-3 week until I can update the pr busy with a 2nd 7.6 migration

@bohmber
Copy link
Contributor Author

bohmber commented Nov 30, 2022

@mposolda pr updated. I'm looking forward to your feedback

Calling getTopLevelGroups is slow inside GroupLDAPStorageMapper#getLDAPGroupMappingsConverted
@ah-ping-luk
Copy link

I also want to vote for this fix and in my company, with very large AD, there are many users (especially those senior staffs) are assigned to 100+ LDAPS groups. We also suffer from this issue.

@stianst stianst requested review from ahus1 and removed request for hmlnarik July 17, 2023 15:04
@ahus1
Copy link
Contributor

ahus1 commented Jul 25, 2023

Next steps to evaluate the impact / scenarios this PR is about to solve:

  • Does this affect only LDAP or also JPA-only store?
  • Prepare a locally reproducible setup for the problem to measure the response times (how many groups need to be setup for this)
  • Is it slow only in the admin console, or also during the log in of the user?

@bohmber
Copy link
Contributor Author

bohmber commented Jul 25, 2023

Only LDAP is affected. There should be an internal RH issue with more details. It's hard to reproduce you need Group modifications in ldap, a high group count and many calls to getTopLevelGroups. If the cache is invalidated it will hammer the database with Group calls. https://issues.redhat.com/browse/KEYCLOAK-19230 has some information and the internal support case should have some data from a jfr session. Token endpoint is affected

@ah-ping-luk
Copy link

Actually, it is not difficult to reproduce it…

You just need to create 100 ldap groups. Create an ldap user. Assign the user into the groups created.

Config user federation with the ldap group mapper.

Login with the ldap user created, you will find it takes more than 10 seconds to login (time taken depends on cpu speed) for the first time (or cases that the user not loaded into cache).

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unreported flaky test detected, please review

@ghost
Copy link

ghost commented Sep 19, 2023

Unreported flaky test detected

If the below flaky tests below are affected by the changes, please review and update the changes accordingly. Otherwise, a maintainer should report the flaky tests prior to merging the PR.

org.keycloak.testsuite.ui.account2.ApplicationsTest#navigationTest

Keycloak CI - Account Console IT (firefox)

org.awaitility.core.ConditionTimeoutException: Assertion condition defined as a org.keycloak.testsuite.util.URLAssert URL expected to begin with: https://localhost:8543/auth/realms/test/protocol/openid-connect/auth ; actual URL: https://localhost:8543/auth/realms/test/account/#/applications within 10 seconds.
	at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:167)
	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)
	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31)
	at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:985)
...

Report flaky test

@mhajas
Copy link
Contributor

mhajas commented Sep 19, 2023

We are waiting for @mposolda whether the @keycloak/core team wants to have a look.

@mhajas mhajas removed the missing/tests Tests are missing label Sep 19, 2023
Calling getTopLevelGroups is slow inside GroupLDAPStorageMapper#getLDAPGroupMappingsConverted
fix default impl and MapGroupProvider
@bohmber
Copy link
Contributor Author

bohmber commented Sep 20, 2023

@mhajas Thanks for the Test. Found a misunderstanding of searchForGroupByNameStream in the default Method GroupLookupProvider#getGroupByName and a missing branch in MapGroupProvider#getGroupByName

Copy link
Contributor

@mhajas mhajas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the additions @bohmber, however, I am not following the latest changes, could you elaborate more on why you did it?

Calling getTopLevelGroups is slow inside GroupLDAPStorageMapper#getLDAPGroupMappingsConverted
default impl in GroupLookupProvider
Copy link
Contributor

@mhajas mhajas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution @bohmber! I am approving and merging now

@ahus1
Copy link
Contributor

ahus1 commented Sep 20, 2023

Awesome, thank you for everyone involved in preparing and review this!

@bohmber
Copy link
Contributor Author

bohmber commented Sep 21, 2023

@mhajas @sguilhen @mposolda @ahus1 Thanks to everyone. From the title the pr looks small. But in a case the toplevel group cache becomes invalid every call for group mapping will hammer the database for all groups for each group in the ldap response. Depending on the request count and group count you will get really bad response times quicky. Now there is a direct mapping between name and group and you will not affected when a group has added/changed.

ahus1 pushed a commit to ahus1/keycloak that referenced this pull request Oct 13, 2023
…APGroupMappingsConverted (keycloak#8430)

Closes keycloak#14820
---------
Co-authored-by: Michal Hajas <mhajas@redhat.com>

(cherry picked from commit bb2f59d)
mhajas pushed a commit that referenced this pull request Oct 16, 2023
…APGroupMappingsConverted (#8430)

Closes #14820
---------
Co-authored-by: Michal Hajas <mhajas@redhat.com>

(cherry picked from commit bb2f59d)
@ahus1
Copy link
Contributor

ahus1 commented Oct 17, 2023

To those watching this issue: The related PR has been backported to 22.0.5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Calling getTopLevelGroups is slow inside GroupLDAPStorageMapper#getLDAPGroupMappingsConverted
7 participants