Works around a sync-time performance regression on PG12 #4609
Conversation
With this, we got down to ~25m of sync time on the Satellite 6.14 environment (instead of 2 hours), syncing 36 repositories to a capsule with the on-demand policy. We got the same result as with Pulpcore 3.22.12, where the commit in question (#4275) was not included.
It would be interesting to see if this change improved the performance on PG13 even more.
@@ -850,7 +850,9 @@ def add_content(self, content):
            raise ResourceImmutableError(self)

        repo_content = []
-       to_add = set(content.exclude(pk__in=self.content).values_list("pk", flat=True))
+       to_add = set(content.values_list("pk", flat=True)) - set(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the net here is that we are moving from "let Postgres do the set theory via exclude()" to "do the set theory explicitly, in Python, via set(A) - set(B)". I have nothing to add that clears up any of the mysteries noted; just clarifying what we're doing.
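A runnable illustration of that equivalence, with plain lists standing in for the querysets (the variable names here are invented for the sketch, not taken from the code):

```python
# `content_pks` plays the role of content.values_list("pk", flat=True),
# and `existing_pks` the pks already in self.content.
content_pks = [10, 11, 12, 13]
existing_pks = [11, 13]

# Old approach: let the database filter, as exclude(pk__in=...) would.
excluded = [pk for pk in content_pks if pk not in existing_pks]

# New approach: fetch both pk lists and take the difference in Python.
to_add = set(content_pks) - set(existing_pks)

assert to_add == set(excluded) == {10, 12}
```

Both compute the same result; the difference is only where the filtering work happens (the database versus the Python process).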
Are we using set() just to get the difference between those two sets?
yes
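For reference, a quick demonstration that the `-` operator on sets is exactly set difference (the values here are arbitrary):

```python
a = {3775, 7899, 1725}
b = {7899, 434}

# `-` on sets and set.difference() compute the same thing.
assert a - b == a.difference(b) == {3775, 1725}
```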
Hello folks. I'm just exploring some issues to get familiar with the codebase and had a small idea on this.
We can achieve the same set difference with some performance gains (about 2x on a small sample) by not creating a second set. If this is a meaningful point to optimize, it may be worth trying out.
from timeit import timeit
from typing import Set


def set_diff(a: Set, b: tuple):
    # Removes matches from `a` in place instead of building a second set.
    for obj in b:
        if obj in a:
            a.remove(obj)
    return a


def set_diff_normal(a: Set, b: tuple):
    return a - set(b)


a = {3775, 4642, 1725, 576, 1467, 1152, 1718,
     6721, 7899, 4756, 2213, 7843, 768, 6464, 7427}
b = (3775, 7899, 434, 15048, 7843)

if __name__ == "__main__":
    # Check against a copy, since set_diff mutates its argument.
    assert set_diff(set(a), b) == set_diff_normal(a, b)
    # Caveat: set_diff empties the overlap from `a` on its first timed run,
    # so later runs only pay for membership checks; treat these as rough numbers.
    result_a = timeit(
        "set_diff(a, b)",
        setup="from __main__ import set_diff, a, b")
    result_b = timeit(
        "set_diff_normal(a, b)",
        setup="from __main__ import set_diff_normal, a, b")
    print(result_a)  # output: ~0.10
    print(result_b)  # output: ~0.24
The performance issue has nothing to do with the set creation (which would take milliseconds at most), but rather with the way the query is constructed and how PostgreSQL plans its execution.
Previously we were generating one large query, which different versions of PostgreSQL seemed to handle differently, and PostgreSQL 12 was having severe performance issues. By gathering the results of some smaller queries into a set in Python, we keep the queries smaller and simpler, and avoid causing PostgreSQL to go a little haywire.
This is really just a workaround, since the fix was time-sensitive. We're still trying to understand why this happens.
I appreciate the enthusiasm @pedro-psb :) But you'll have plenty of time to learn this stuff when you're being paid to do so, there's no need to do it in your "free" time!
@dralley Yeah, I thought maybe it could be irrelevant in the problem context, thanks for the clarification.
And I dig your tip, I'm really a little over-enthusiastic!
As noted, I can't explain any of the mysteries here, and that makes me a little nervous. But if this fixes the observed performance problem, I'm good to move ahead with it.
Backport to 3.16: backport PR created, backported as #4610 (@patchback)
Backport to 3.18: backport PR created, backported as #4611 (@patchback)
Backport to 3.21: backport PR created, backported as #4612 (@patchback)
Backport to 3.22: backport PR created, backported as #4613 (@patchback)
Backport to 3.23: backport PR created, backported as #4614 (@patchback)
Backport to 3.28: backport PR created, backported as #4615 (@patchback)
Backport to 3.38: backport PR created, backported as #4620 (@patchback)
Backport to 3.39: backport PR created, backported as #4623 (@patchback)
closes #4591
We still don't really "know":