cloud: use the cloned column to filter by clone status #11932
Codecov Report
@@ Coverage Diff @@
## master #11932 +/- ##
==========================================
- Coverage 50.13% 47.89% -2.25%
==========================================
Files 1516 1412 -104
Lines 88577 80238 -8339
Branches 6664 6764 +100
==========================================
- Hits 44412 38428 -5984
+ Misses 40221 38216 -2005
+ Partials 3944 3594 -350
You should also test these changes end to end manually: look at the progress status indicator and the site admin repositories listing pages with the different clone filters.
@@ -739,6 +750,7 @@ SELECT
 external_service_id,
 external_id,
 archived,
+cloned,
I think that this is OK, since the Update call in NewDiff by the Syncer doesn't care about the cloned field, so it won't reset the value to the default of the src version produced by each Source.
Approving for code owned by campaigns.
Left some comments — most important one is to add some tests for the boolean fields being positive :)
-// As for the number: 1250 is the result of local benchmarks where
-// it yielded the best performance/resources tradeoff, before
-// diminishing returns set in
-opt2.Limit += 1250
Farewell and adieu, long comment and weird magic number! I can't say I'm proud to have added you, but... well, let's say it was interesting and I'm not sad to see you go! 👋
cmd/repo-updater/repos/store.go
Outdated
 RETURNING id
 )
 UPDATE repo SET cloned = false
 FROM c
-WHERE cloned AND repo.id != c.id;
+WHERE cloned AND repo.id NOT IN (SELECT id FROM c);
How about this? Haven't tested it, but I think it goes in the right direction.
WITH names AS (
SELECT UNNEST(%s) AS name
),
cloned AS (
UPDATE repo SET cloned = true
FROM names
WHERE NOT cloned AND name = names.name
)
UPDATE repo SET cloned = false
FROM names
WHERE cloned AND name != names.name;
Approving to unblock, no need to ask for review again.
cmd/repo-updater/repos/store.go
Outdated
 RETURNING id
 )
 UPDATE repo SET cloned = false
 FROM c
-WHERE cloned AND repo.id != c.id;
+WHERE cloned AND repo.id NOT IN (SELECT id FROM c);
What is the difference between repo.id != c.id and NOT IN (SELECT id FROM c)?
Apparently, repo.id != c.id only works on the first row of c, not on all the rows. I had to use NOT IN (SELECT id FROM c) to make it work as intended.
Ah gotcha, that is surprising, TIL :) So you'd have to adapt my proposed query.
cmd/repo-updater/repos/store.go
Outdated
@@ -457,12 +431,26 @@ const setClonedReposQueryFmtstr = `
 -- source: cmd/repo-updater/repos/store.go:DBStore.SetClonedRepos
 WITH c AS (
 UPDATE repo SET cloned = true
-WHERE NOT cloned AND name in (%s)
+WHERE name IN (%s)
Unfortunately this brings us back to the performance issue of updating rows that didn't change. Can you think of a way to avoid that?
Second thing I just thought of: Postgres has a limit of 32767 bind parameters per statement. In a large installation, we'd have more repos than that limit. We should be able to circumvent that by passing in a JSON array with the names and using jsonb_array_elements_text to convert that to a set of rows.
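A minimal sketch of how the two suggestions in this comment could fit together, assuming the repo names are sent as one JSON array parameter. The query text, the CTE names, and the helper setClonedReposArgs are hypothetical and untested against a real database; they are not the code that landed in this PR. Both UPDATEs keep a guard on the current value of cloned, so rows whose status did not change are not rewritten.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// setClonedReposQuery is a hypothetical rewrite, not the query from the PR.
// All names travel in a single bind parameter ($1) as a JSON array and are
// expanded into rows with jsonb_array_elements_text.
const setClonedReposQuery = `
WITH names AS (
  SELECT jsonb_array_elements_text($1::jsonb) AS name
),
newly_cloned AS (
  UPDATE repo SET cloned = true
  WHERE NOT repo.cloned AND repo.name IN (SELECT names.name FROM names)
)
UPDATE repo SET cloned = false
WHERE repo.cloned AND repo.name NOT IN (SELECT names.name FROM names);
`

// setClonedReposArgs returns the query and its single argument.
func setClonedReposArgs(names []string) (string, []interface{}, error) {
	if names == nil {
		names = []string{} // marshal as [] rather than null
	}
	b, err := json.Marshal(names)
	if err != nil {
		return "", nil, err
	}
	return setClonedReposQuery, []interface{}{string(b)}, nil
}

func main() {
	_, args, err := setClonedReposArgs([]string{"github.com/a/b", "github.com/c/d"})
	if err != nil {
		panic(err)
	}
	// One bind parameter, no matter how many names are passed.
	fmt.Println(len(args), args[0])
}
```

The returned query and args could then be handed to something like database/sql's ExecContext; the parameter count stays at one however many repos are cloned.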
Unfortunately this brings us back to the performance issue of updating rows that didn't change. Can you think of a way to avoid that?
I will try, yes 👍
Second thing I just thought of: Postgres has a limit of 32767 bind parameters per statement. In a large installation, we'd have more repos than that limit. We should be able to circumvent that by passing in a JSON array with the names and using jsonb_array_elements_text to convert that to a set of rows.
Good point
@asdine: Once the overall issue is done, please add a CHANGELOG entry and tell @uwedeportivo we can roll this out to certain affected customers.
Co-authored-by: ᴜɴᴋɴᴡᴏɴ <joe@sourcegraph.com>
Co-authored-by: Thorsten Ball <mrnugget@gmail.com>
d12e080 to 49aa369
@asdine got it, thanks
This makes the frontend store aware of the new cloned column and adds filtering capabilities based on its value. It also removes the application-level filtering done by the repositoryResolver, which used to ask Gitserver for clone status, and uses the database instead. Related to #11029
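As a rough illustration of what filtering on the cloned column in the store might look like, here is a small sketch. The listOptions type, its OnlyCloned and NoCloned fields, and the helper functions are assumptions made for this example, not names taken from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// listOptions is a hypothetical options struct; the field names are
// assumptions for illustration, not taken from the PR.
type listOptions struct {
	OnlyCloned bool // keep only repos with cloned = true
	NoCloned   bool // keep only repos with cloned = false
}

// cloneStatusConds turns the clone filters into SQL predicates on the
// cloned column, so the database does the filtering instead of Gitserver.
func cloneStatusConds(opt listOptions) []string {
	var conds []string
	if opt.OnlyCloned {
		conds = append(conds, "cloned")
	}
	if opt.NoCloned {
		conds = append(conds, "NOT cloned")
	}
	return conds
}

// whereClause joins the predicates; it returns "" when there are none.
func whereClause(conds []string) string {
	if len(conds) == 0 {
		return ""
	}
	return "WHERE " + strings.Join(conds, " AND ")
}

func main() {
	fmt.Println(whereClause(cloneStatusConds(listOptions{OnlyCloned: true})))
}
```

Covering both flags with the value true, as the review asks for the boolean fields, would exercise each branch of cloneStatusConds.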