New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve validatePublicUsername() performance #3091
Conversation
… groups Signed-off-by: Jonathan Treffler <jonathan.treffler@rwth-aachen.de>
b2b11a3
to
6169251
Compare
First of all, thanks for your excellent analysis. 12k groups. Wow. Never thought that this is a valid use case. Your suggestion will speed up it for sure, but it will eliminate the thought of avoiding spoofing by using exiting displaynames. But I have to rethink about @svenseeberg's comment. Maybe it would be better to remove the check against group names, because groups can't be a participants. The check as it is will probably lose its value. To be honest: I have to dig in my memory why we introduced it back then. |
Ah, wait. Reading helps. I did not get the fact, that the search also includes the search for the group's displayname. In this case, adding the serach term to the group search will improve it indeed. Nevertheless, we have to rethink the check. |
Tracking this down a little bit further: Since the IGroup:search() does an exact case insensitive search, it would be enough to simply check ( This way the object creation could be skipped also. same for usernames in Nextcloud and among the shares. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this should work...
Please merge by yourself, unless you plan further improvements. |
Not exactly, IGroup:search() on the default database group backend does a SQL iLike query with That is why I left in the for loop checking the returned entries, because both exact match and partial matches will get returned from the search. |
If we detect other issues in the roll out of the app onto our huge instance we will probably be back for other optimizations :) , but for now I think this can be merged. |
Sure. All contributions are welcome here. Feeling less lonly. 😆 |
Thank you all :) |
Hello there, We hope that the review process is going smooth and is helpful for you. We want to ensure your pull request is reviewed to your satisfaction. If you have a moment, our community management team would very much appreciate your feedback on your experience with this PR review process. Your feedback is valuable to us as we continuously strive to improve our community developer experience. Please take a moment to complete our short survey by clicking on the following link: https://cloud.nextcloud.com/apps/forms/s/i9Ago4EQRZ7TWxjfmeEpPkf6 Thank you for contributing to Nextcloud and we hope to hear from you soon! |
On Instances with many groups the validatePublicUsername endpoint ends with a timeout, because the code checking if the anonymous users chosen userName matches any group on the server is quite inefficient. It does not utilize the search parameter of the Group::search() function and therefore loads all groups from all group backends from the database into RAM, creates a php entity for every one of them and calls two functions on every one of those entities.
I created a performance profile of a simple call of the /check/username endpoint on an instance with ~12k groups (which is certainly a lot, but I assure you every one of those has an important role on our instance and other apps have no problems working with this amount of groups):
In total this request took 71.6 seconds, causing the connection to the client to drop during the request with a timeout (the php process still runs to completion).
In the performance profile you can see, that most time (73%) is spent in the nextcloud core just fetching the data from the database and putting it into php entities.
In total this one request causes 54 THOUSAND requests to QueryBuilder->execute(), which AFAIK is below the cache layer and that means every one of those is a sql query to the database. I hope we can all agree that kind of amplification from one request is not great.
But the current implementation makes sense looking at the timeline: The first usage of this empty search parameter I could track down through various refactors is the initial implementation of Public Pages back in 2019 in #664 .
This was before the default database group backend started searching by display names in 2020 (introduced in nextcloud/server#21358), so it makes sense a custom search was implemented on this high level. But now we can use the MUCH faster filtering on the database query level 🥳 .
It is still true, that every group backend decides for itself, how it handles the search parameter and theoretically a group backend could decide only to search by gid, but I think it is reasonable to assume, that any group backend, that uses DisplayNames (ours with the 12k groups does not) follows the core implementation and also searches display names.
On our instance this small 9 characters of code patch brought the request time of the /check/username endpoint to 714 ms down from 71.6 seconds, a 10200% performance improvement 🥳 .