Improve publish speed by between 2x and 20x #493

Merged
merged 1 commit into from Apr 6, 2021

@dralley dralley commented Apr 5, 2021

Use a more complex single query rather than N small queries to iterate over
content artifacts.

Using a repository containing 20,000 content units, the publish time
improves from ~23 seconds to around 1 second if the content was synced
immediately, and to 14 seconds (also from ~23 seconds) if the content is on_demand.

Complexity (in terms of the number of queries performed, not taking into
account the complexity of each query) was reduced from 2N + 1 to either a
constant or N, depending on whether the content was synced immediately or on-demand.
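
To illustrate the query-count difference described above, here is a minimal sketch using Python's standard `sqlite3` module. The three-table schema, the `QueryCounter` helper, and the function names are hypothetical stand-ins chosen for illustration; they are not Pulp's actual models or the code changed in this PR. It assumes each content unit has exactly one content artifact.

```python
import sqlite3


def build_db(n):
    """Create an in-memory database with n content units, one artifact each.

    The schema is a simplified, hypothetical stand-in for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE artifact (id INTEGER PRIMARY KEY, sha256 TEXT);
        CREATE TABLE content (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE content_artifact (
            id INTEGER PRIMARY KEY,
            content_id INTEGER REFERENCES content(id),
            artifact_id INTEGER REFERENCES artifact(id),
            relative_path TEXT
        );
        """
    )
    for i in range(1, n + 1):
        conn.execute("INSERT INTO artifact VALUES (?, ?)", (i, f"sha-{i}"))
        conn.execute("INSERT INTO content VALUES (?, ?)", (i, f"unit-{i}"))
        conn.execute(
            "INSERT INTO content_artifact VALUES (?, ?, ?, ?)",
            (i, i, i, f"path/{i}"),
        )
    conn.commit()
    return conn


class QueryCounter:
    """Thin wrapper that counts every SELECT issued through it."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def fetchall(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def publish_per_unit(db):
    """Per-unit pattern: 1 query for content, then 2 per content unit."""
    result = []
    for (content_id,) in db.fetchall("SELECT id FROM content ORDER BY id"):
        rows = db.fetchall(
            "SELECT artifact_id, relative_path FROM content_artifact"
            " WHERE content_id = ?",
            (content_id,),
        )
        for artifact_id, path in rows:
            (sha,) = db.fetchall(
                "SELECT sha256 FROM artifact WHERE id = ?", (artifact_id,)
            )[0]
            result.append((path, sha))
    return result


def publish_joined(db):
    """Single joined query that returns the same (path, sha256) pairs."""
    return db.fetchall(
        """
        SELECT ca.relative_path, a.sha256
        FROM content_artifact ca
        JOIN content c ON c.id = ca.content_id
        JOIN artifact a ON a.id = ca.artifact_id
        ORDER BY c.id
        """
    )
```

With N content units, `publish_per_unit` issues 2N + 1 queries through the counter, while `publish_joined` issues one, and both return the same rows. The joined query is more complex for the database to execute, which is the trade-off noted in the description.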

The benefit is very noticeable.

closes: #8508
https://pulp.plan.io/issues/8508

pep8speaks commented Apr 5, 2021

Hello @dralley! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-04-06 15:15:45 UTC

pulpbot commented Apr 5, 2021

Attached issue: https://pulp.plan.io/issues/8508

@dralley dralley force-pushed the publish-improvement branch 3 times, most recently from 13c1040 to 3536d4c Compare April 5, 2021 16:18
@dralley dralley changed the title Improve publish speed by 2x to 20x Improve publish speed by between 2x and 20x Apr 5, 2021
@mdellweg mdellweg left a comment

👍

If I read it correctly, it scales accordingly for mixed repo versions: 1 + #(on-demand artifacts)
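
As a rough worked example of the counts discussed in this thread, the sketch below compares the old 2N + 1 figure with the 1 + #(on-demand artifacts) figure suggested above. The function names and the 5,000 on-demand figure are hypothetical, picked only to show the arithmetic; the 20,000-unit repository size comes from the PR description.

```python
def queries_before(n_units):
    """Old behaviour per the PR description: 2N + 1 queries."""
    return 2 * n_units + 1


def queries_after(n_on_demand):
    """New behaviour as read in the review: 1 + number of on-demand artifacts."""
    return 1 + n_on_demand


# Hypothetical mixed repository: 20,000 units, 5,000 of them on-demand.
before = queries_before(20_000)
after = queries_after(5_000)
print(before, after)
```

Under these assumptions the count drops from 40,001 to 5,001, and a fully immediate repository would need a single query.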

ipanova commented Apr 6, 2021

Nice!

    pk__in=publication.repository_version.content
).order_by("-pulp_created"):
    for content_artifact in content.contentartifact_set.all():
        artifact = find_artifact()

2N + 1 queries

Use a more complex single query rather than N small queries to iterate
content artifacts.

Using a repository containing 20,000 content units, the publish time
is improved from ~23 seconds to 1 second if the content is synced
immediately, and to 14 seconds (also from ~23 seconds) if the content is on_demand.

closes: #8508
https://pulp.plan.io/issues/8508
@dralley dralley merged commit 3aebee2 into pulp:master Apr 6, 2021
@dralley dralley deleted the publish-improvement branch April 6, 2021 15:48
5 participants