Severe performance issue with large repositories #323

Closed
dralley opened this issue Jun 24, 2022 · 4 comments · Fixed by #324

@dralley
Contributor

dralley commented Jun 24, 2022

With large repositories, createrepo_c spends far more time sorting the queue of tasks than it does performing useful work.

With the repository https://download.copr.fedorainfracloud.org/results/@rubygems/rubygems/fedora-rawhide-x86_64/repodata/

Performing the command createrepo_c --skip-stat --update --recycle-pkglist .

Produces the following flamegraph:

[flamegraph image]

The flamegraph shows that 85% of the time is spent inside g_queue_insert_sorted(), and about half of that is spent on string comparisons in the comparison function. The goal is to add new tasks to the thread pool sorted by the name of the file they operate on. But because the queue is backed by a linked list and this repo contains more than 270,000 packages, this requires an enormous number of comparisons and pointer traversals, which is extremely inefficient.

https://github.com/rpm-software-management/createrepo_c/blob/ace4c87a392b8c25fd7127da49516dba52d397c9/src/createrepo_c.c#L112=

In theory, if the tasks are already in roughly sorted order before being added to the queue, the cost is amplified: every new task has to be inserted near the end, so each insertion walks almost the entire list to get there. The total is therefore roughly O(N^2).
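To make the O(N^2) estimate concrete, here is a minimal sketch in plain C. It does not use GLib and is not the createrepo_c code: `insert_sorted()` is a hand-written stand-in that, like a sorted insert into a linked list, walks from the head to find the insertion point. `make_sorted_names()` and the `pkg-NNNNNN.rpm` names are hypothetical test data. The second function, which copies the names and sorts them once with `qsort()`, is only a point of comparison and is not necessarily what the fix in #324 does.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Counts string comparisons so both strategies are measured the same way. */
static unsigned long long n_cmp;

static int counted_strcmp(const char *a, const char *b) {
    n_cmp++;
    return strcmp(a, b);
}

typedef struct Node {
    const char *name;
    struct Node *next;
} Node;

/* Walk from the head until the first element that is not smaller than
 * `name`, then link the new node in front of it. Returns the new head. */
static Node *insert_sorted(Node *head, const char *name) {
    Node *node = malloc(sizeof *node);
    if (!node) abort();
    node->name = name;
    Node **link = &head;
    while (*link && counted_strcmp((*link)->name, name) < 0)
        link = &(*link)->next;
    node->next = *link;
    *link = node;
    return head;
}

static int qsort_cmp(const void *a, const void *b) {
    return counted_strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical test data: n zero-padded names, already in sorted order. */
char **make_sorted_names(size_t n) {
    char **names = malloc((n ? n : 1) * sizeof *names);
    if (!names) abort();
    for (size_t i = 0; i < n; i++) {
        names[i] = malloc(32);
        if (!names[i]) abort();
        snprintf(names[i], 32, "pkg-%06zu.rpm", i);
    }
    return names;
}

void free_names(char **names, size_t n) {
    for (size_t i = 0; i < n; i++)
        free(names[i]);
    free(names);
}

/* Strategy A: sorted insertion of every name into a linked list. */
unsigned long long comparisons_insert_sorted(char *const *names, size_t n) {
    Node *head = NULL;
    n_cmp = 0;
    for (size_t i = 0; i < n; i++)
        head = insert_sorted(head, names[i]);
    while (head) {
        Node *next = head->next;
        free(head);
        head = next;
    }
    return n_cmp;
}

/* Strategy B: collect all names first, then sort the array once. */
unsigned long long comparisons_sort_once(char *const *names, size_t n) {
    if (n == 0) return 0;
    const char **copy = malloc(n * sizeof *copy);
    if (!copy) abort();
    memcpy(copy, names, n * sizeof *copy);
    n_cmp = 0;
    qsort(copy, n, sizeof *copy, qsort_cmp);
    free(copy);
    return n_cmp;
}
```

For N names that arrive already sorted, `comparisons_insert_sorted()` in this stand-in performs exactly N(N-1)/2 comparisons, because the i-th insertion compares against all i existing elements. With N around 270,000 that is on the order of 3.6 × 10^10 string comparisons, whereas a single sort typically needs on the order of N log N.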

@praiskup
Member

praiskup commented Aug 1, 2022

This is weird. I tested the performance fix on our development server and there's almost no difference between createrepo_c v0.19.0 (without the fix from PR #324):

[2022-07-31 18:33:16,382][  INFO][PID:127143] Running command 'copr-repo --batched /var/lib/copr/public_html/results/praiskup/rubygems/fedora-rawhide-x86_64 --add 02909554-dummy-pkg' as PID 134025
[2022-07-31 19:10:52,652][  INFO][PID:127143] Finished with code 0 (copr-repo --batched /var/lib/copr/public_html/results/praiskup/rubygems/fedora-rawhide-x86_64 --add 02909554-dummy-pkg)

and v0.20.0 (contains the fix):

[2022-08-01 04:40:46,808][  INFO][PID:325728] Running command 'copr-repo --batched /var/lib/copr/public_html/results/praiskup/rubygems/fedora-rawhide-x86_64 --add 02909561-dummy-pkg' as PID 330542
[2022-08-01 05:21:05,829][  INFO][PID:325728] Finished with code 0 (copr-repo --batched /var/lib/copr/public_html/results/praiskup/rubygems/fedora-rawhide-x86_64 --add 02909561-dummy-pkg)

https://copr.stg.fedoraproject.org/coprs/praiskup/rubygems/builds/ (builds 1, 4 and 5 use 0.20; builds 2 and 3 use 0.19).

@praiskup
Member

praiskup commented Aug 1, 2022

Copr runs createrepo_c like:

$ cat /var/lib/copr/public_html/results/praiskup/rubygems/fedora-rawhide-x86_64/.copr-createrepo-pkglist
02909562-dummy-pkg/dummy-pkg-20220801_0753-1.fc37.src.rpm
02909562-dummy-pkg/dummy-pkg-20220801_0753-1.fc37.x86_64.rpm
$ /usr/bin/createrepo_c /var/lib/copr/public_html/results/praiskup/rubygems/fedora-rawhide-x86_64 --database --ignore-lock --local-sqlite --cachedir /tmp/ --workers 8 --update --skip-stat --recycle-pkglist --pkglist /var/lib/copr/public_html/results/praiskup/rubygems/fedora-rawhide-x86_64/.copr-createrepo-pkglist

@dralley
Contributor Author

dralley commented Aug 1, 2022

@praiskup 0.20.0 doesn't contain the fix, 0.20.1 does. Unless Fedora patched it in separately.

https://github.com/rpm-software-management/createrepo_c/commits/master

@praiskup
Member

praiskup commented Aug 1, 2022

Ah, it makes much more sense now - and the speedup is awesome! From 37 to 5 minutes. Thank you.

[2022-08-01 14:30:21,758][  INFO][PID:549549] Running command 'copr-repo --batched /var/lib/copr/public_html/results/praiskup/rubygems/fedora-rawhide-x86_64 --add 02909563-dummy-pkg' as PID 557770
[2022-08-01 14:35:38,064][  INFO][PID:549549] Finished with code 0 (copr-repo --batched /var/lib/copr/public_html/results/praiskup/rubygems/fedora-rawhide-x86_64 --add 02909563-dummy-pkg)
