-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Severe performance issue with large repositories #323
Comments
This is weird, I tested the performance fix on our develpment server and there's almost no difference between the createrepo_c v0.19.0 (without the fix from PR#324):
and v0.20.0 (contains the fix):
https://copr.stg.fedoraproject.org/coprs/praiskup/rubygems/builds/ (1., 4. and 5. builds are 0.20, 2. and 3. are 0.19). |
Copr runs createrepo_c like:
|
@praiskup 0.20.0 doesn't contain the fix, 0.20.1 does. Unless Fedora patched it in separately. https://github.com/rpm-software-management/createrepo_c/commits/master |
Ah, it makes much more sense now - and the speedup is awesome! From 37 to 5 minutes. Thank you.
|
With large repositories,
createrepo_c
spends far more time sorting the queue of tasks than it does performing useful work.With the repository https://download.copr.fedorainfracloud.org/results/@rubygems/rubygems/fedora-rawhide-x86_64/repodata/
Performing the command
createrepo_c --skip-stat --update --recycle-pkglist .
Produces the following flamegraph:
You can see that 85% of the time is spent inside of
g_queue_insert_sorted()
, of which about half is spent performing string comparisons via the sort operator. The goal is to add new tasks to the threadpool in a sorted order based on the name of the file they operate on - but because this is a queue backed by a linked-list, and this repo contains more than 270,000 packages, this requires an enormous number of comparisons and pointer traversals, leading to extreme inefficiency.https://github.com/rpm-software-management/createrepo_c/blob/ace4c87a392b8c25fd7127da49516dba52d397c9/src/createrepo_c.c#L112=
Theoretically you might expect that if the tasks are already in a roughly-sorted order prior to being added to the queue, there might be magnification based on the fact that every new task would need to be inserted near the end. Thus, roughly O(N^2)
The text was updated successfully, but these errors were encountered: