Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cloud: use the cloned column to filter by clone status #11932

Merged
merged 15 commits into from Jul 10, 2020

Conversation

asdine
Copy link
Contributor

@asdine asdine commented Jul 3, 2020

This makes the frontend store know about the new cloned column and add filtering capabilities based on its value. It also removes the application level filtering done in by the repositoryResolver which used to ask Gitserver for clone status, and uses the database instead.

Related to #11029

@asdine asdine requested a review from slimsag as a code owner July 3, 2020 12:26
@asdine asdine requested a review from a team July 3, 2020 12:26
@codecov
Copy link

codecov bot commented Jul 3, 2020

Codecov Report

Merging #11932 into master will decrease coverage by 2.24%.
The diff coverage is 92.00%.

@@            Coverage Diff             @@
##           master   #11932      +/-   ##
==========================================
- Coverage   50.13%   47.89%   -2.25%     
==========================================
  Files        1516     1412     -104     
  Lines       88577    80238    -8339     
  Branches     6664     6764     +100     
==========================================
- Hits        44412    38428    -5984     
+ Misses      40221    38216    -2005     
+ Partials     3944     3594     -350     
Flag Coverage Δ
#go 52.08% <92.00%> (-2.47%) ⬇️
#storybook 11.66% <ø> (ø)
#typescript 36.78% <ø> (ø)
#unit 47.39% <92.00%> (-2.30%) ⬇️
Impacted Files Coverage Δ
cmd/frontend/graphqlbackend/repository.go 25.88% <ø> (-7.06%) ⬇️
cmd/repo-updater/repos/store.go 86.17% <77.77%> (-0.39%) ⬇️
cmd/frontend/db/repos.go 72.43% <100.00%> (-5.09%) ⬇️
cmd/frontend/graphqlbackend/repositories.go 41.66% <100.00%> (+4.01%) ⬆️
cmd/repo-updater/repos/observability.go 92.99% <100.00%> (ø)
cmd/repo-updater/repos/testing.go 81.41% <100.00%> (-3.28%) ⬇️
cmd/frontend/db/orgs_mock.go 0.00% <0.00%> (-100.00%) ⬇️
cmd/frontend/db/users_mock.go 0.00% <0.00%> (-100.00%) ⬇️
internal/extsvc/github/codehost.go 0.00% <0.00%> (-100.00%) ⬇️
internal/extsvc/gitlab/codehost.go 0.00% <0.00%> (-100.00%) ⬇️
... and 263 more

Copy link
Contributor

@tsenart tsenart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should test these changes end to end manually too, look at the progress status indicator and the site admin repositories listing pages with the different clone filters.

cmd/frontend/graphqlbackend/repositories.go Outdated Show resolved Hide resolved
@@ -739,6 +750,7 @@ SELECT
external_service_id,
external_id,
archived,
cloned,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this is OK, since the Update call in NewDiff by the Syncer doesn't care about the cloned field, so it won't reset the value to the default of the src version produce by each Source.

cmd/frontend/graphqlbackend/repositories.go Outdated Show resolved Hide resolved
Base automatically changed from cloud/add-count-cloned-repos to master July 6, 2020 10:37
@asdine asdine requested a review from a team as a code owner July 7, 2020 16:15
Copy link
Member

@eseliger eseliger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving for code owned by campaigns.

@asdine asdine requested a review from tsenart July 8, 2020 13:08
cmd/frontend/db/repos_db_test.go Outdated Show resolved Hide resolved
cmd/frontend/graphqlbackend/repositories.go Show resolved Hide resolved
cmd/frontend/graphqlbackend/repositories_test.go Outdated Show resolved Hide resolved
cmd/repo-updater/shared/main.go Show resolved Hide resolved
Copy link
Contributor

@mrnugget mrnugget left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments — most important one is to add some tests for the boolean fields being positive :)

cmd/repo-updater/repos/observability.go Outdated Show resolved Hide resolved
cmd/repo-updater/repos/observability.go Outdated Show resolved Hide resolved
cmd/repo-updater/repos/observability.go Outdated Show resolved Hide resolved
cmd/repo-updater/repos/store_test.go Show resolved Hide resolved
cmd/frontend/db/repos_db_test.go Show resolved Hide resolved
// As for the number: 1250 is the result of local benchmarks where
// it yielded the best performance/resources tradeoff, before
// diminishing returns set in
opt2.Limit += 1250
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Farewell and audieu, long comment and weird magic number! I can't say I'm proud to have added you, but... well, let's say it was interesting and I'm not sad to see you go! 👋

RETURNING id
)
UPDATE repo SET cloned = false
FROM c
WHERE cloned AND repo.id != c.id;
WHERE cloned AND repo.id NOT IN (SELECT id FROM c);
Copy link
Contributor Author

@asdine asdine Jul 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to update the repo table by setting all of the repos cloned column to false, except the ones that are passed in the IN clause, all of this while limiting the number of writes.
I am not sure this is the best way to do this.
@tsenart @mrnugget @unknwon What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this? Haven't tested it, but I think it goes in the right direction.

WITH names AS (
    SELECT UNNEST(%s) AS name
),
cloned AS (
    UPDATE repo SET cloned = true
    FROM names
    WHERE NOT cloned AND name = names.name
)
UPDATE repo SET cloned = false
FROM names
WHERE cloned AND name != names.name;

Copy link
Contributor

@tsenart tsenart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to unblock, no need to ask for review again.

RETURNING id
)
UPDATE repo SET cloned = false
FROM c
WHERE cloned AND repo.id != c.id;
WHERE cloned AND repo.id NOT IN (SELECT id FROM c);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between repo.id != c.id and NOT IN (SELECT id FROM c)?

Copy link
Contributor Author

@asdine asdine Jul 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently, repo.id != c.id only works on the first row of c, not on all the rows. I had to use NOT IN (SELECT id FROM c) to make it work as intended

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah gotcha, that is surprising, TIL :) So you 'd have to adapt my proposed query.

@@ -457,12 +431,26 @@ const setClonedReposQueryFmtstr = `
-- source: cmd/repo-updater/repos/store.go:DBStore.SetClonedRepos
WITH c AS (
UPDATE repo SET cloned = true
WHERE NOT cloned AND name in (%s)
WHERE name IN (%s)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this brings us back to the performance issue of updating rows that didn't change. Can you think of a way to avoid that?

Second thing I just thought of: Postgres as limit of 32767 bind parameters per statement. In a large installation, we'd have more repos than that limit. We should be able to circumvent that by passing in a JSON array with the names and using jsonb_array_elements_text to convert that to a set of rows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this brings us back to the performance issue of updating rows that didn't change. Can you think of a way to avoid that?

I will try yes 👍

Second thing I just thought of: Postgres as limit of 32767 bind parameters per statement. In a large installation, we'd have more repos than that limit. We should be able to circumvent that by passing in a JSON array with the names and using jsonb_array_elements_text to convert that to a set of rows.

Good point

@tsenart
Copy link
Contributor

tsenart commented Jul 9, 2020

@asdine: Once the overall issue is done, please add a CHANGELOG entry and tell @uwedeportivo we can roll this out to certain affected customers.

@asdine asdine force-pushed the cloud/filter-by-cloned-status branch from d12e080 to 49aa369 Compare July 10, 2020 08:03
@asdine asdine merged commit cab1a64 into master Jul 10, 2020
@asdine asdine deleted the cloud/filter-by-cloned-status branch July 10, 2020 08:39
@uwedeportivo
Copy link
Contributor

@asdine got it, thanks

asdine added a commit that referenced this pull request Jul 13, 2020
asdine added a commit that referenced this pull request Jul 13, 2020
asdine added a commit that referenced this pull request Jul 14, 2020
@asdine asdine restored the cloud/filter-by-cloned-status branch July 14, 2020 08:37
asdine added a commit that referenced this pull request Jul 15, 2020
asdine added a commit that referenced this pull request Jul 16, 2020
…s"" (#12149)

* Revert "Revert "cloud: use the cloned column to filter by clone status (#11932)" (#12128)"

This reverts commit 52130ea.

* Fix cloned repositories listing and counting (#12152)
daxmc99 pushed a commit that referenced this pull request Jul 16, 2020
…s"" (#12149)

This is cherry-pick for the 3.18.0 release

* Revert "Revert "cloud: use the cloned column to filter by clone status (#11932)" (#12128)"

This reverts commit 52130ea.

* Fix cloned repositories listing and counting (#12152)
@asdine asdine deleted the cloud/filter-by-cloned-status branch January 15, 2021 13:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants