New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle renames and deletions of mirrored repositories #914

Closed
sqs opened this Issue Nov 9, 2018 · 10 comments

Comments

Projects
None yet
6 participants
@sqs
Copy link
Member

sqs commented Nov 9, 2018

This is an umbrella issue that covers generally cleaning up the problem where code host entries in site config (and repos.list) merely create repositories, they do not "manage" the active set of repositories. This causes the following (specific) problems:

  • If a GitHub token is used and grants access to repos a and b, then the token's owner no longer has access to b, the b repo remains on the Sourcegraph instance (but is not git-updated)
  • If a repository is renamed on the code host, both the old and new repos will be added on Sourcegraph
  • If a repository is deleted on the code host, it remains on Sourcegraph (from the backend POV, this is equivalent to the token-no-longer-can-access case above)
  • If a repository's metadata is changed such that it is no longer in the set of repositories-to-sync (e.g., a GitHub repositoryQuery excluding archived repos and a someone archives a repo on GitHub), it remains on Sourcegraph

Fixing this issue also addresses the following problems:

In general, the principle we want is: "You tell Sourcegraph which repositories you want on Sourcegraph. Sourcegraph upholds that contract continuously." So if you want archived repositories to not exist on Sourcegraph, then you need to specify a repositoryQuery that omits archived repositories. If a GitHub repository is archived, then Sourcegraph will automatically remove it. Another way to say this is that the site config is essentially a constraint you specify that Sourcegraph always seeks to satisfy in the current instant.

Robustness

However, we don't want this to result in data loss. It needs to be robust to (1) unexpected errors from the code host API and (2) typos in site config. If one of those things happens and suddenly a previously included repository is no longer included, we do not want to permanently delete all data associated with the repository. (Today this really just includes code discussions but will likely include other info in the future.) So if Sourcegraph detects that a repository is no longer included, it should disable it on Sourcegraph.

This entails another addition, I think: there should be a "disable reason" to explain to site admins why a repo was disabled (e.g., "repository deleted on GitHub.com" or "repository not included by repositoryQuery")

Of course, this should be smart in the case of renames; if there is a rename, it should just rename the repo on Sourcegraph, too, not disable the old name and create a new repo with the new name.

@beyang

This comment has been minimized.

Copy link
Member

beyang commented Nov 11, 2018

FYI discussion thread: https://sourcegraph.slack.com/archives/C89KCDK5J/p1541899122058100. To me the question reduces to, "What is the repo metadata source of truth?"

A few potential answers:

  1. Whatever the corresponding extsvc/$CODEHOST package decides to use as the source of truth.
  2. repo-updater. I think this is a special case of (1).
  3. The DB (repos table)
@keegancsmith

This comment has been minimized.

Copy link
Member

keegancsmith commented Nov 12, 2018

We should make the repos table the source of truth for everything that reads repo information, but repo-updater responsible for maintaining the repos table (via config + syncing from codehost). We are actually quite close to this, we just don't have repo-updater reading the full repos table and making decisions about disabling/updating/etc. Currently it only reads from code hosts.

@sqs sqs changed the title Offer guarantees that deconfigured repos are no longer available Handle renames and deletions of mirrored repositories Nov 13, 2018

@nicksnyder

This comment has been minimized.

Copy link
Member

nicksnyder commented Nov 19, 2018

To solve all of the above problems well, we need to have a first class model for "external services" and each row in the repo table will be owned by a single external service.

An external service should have a stable ID that is independent from information that might need to change about that external service (e.g. access token). Requiring a user to manage connection ids in a config file is error prone and hard to document, so the proposal is to create a new database table (with an auto increment id). Users can then update an expired/revoked access token without causing a lot of churn in the db.

We don't have a lot of time between now and when we want to ship 3.0-preview and it is important to get the breaking changes in now.

Plan for 3.0-preview

  • Remove all UI paths to manually add repos (#1068, #1357)
    • Preserve behavior that user visiting repo for first time will have it transparently added on sourcegraph.com
    • Deploy to sourcegraph.com and dogfood
  • Create database table to store external services (#1077)
  • Create UI to create/edit external services behind feature flag (#1103, #1229)
  • Refactor repository sync workers to not watch the config
    • Phabricator (#1157)
    • GitHub (#1267)
    • GitLab (#1273)
    • AWS CodeCommit (#1273)
    • Bitbucket Server (#1273)
    • Gitolite (already not watching)
  • Add codepath to read config from database table (#1160, #1332)
  • Migrate dogfood and sourcegraph.com to read from external services table as source of truth
    • Document or automate how to do migration (for customers too)
    • Copy over config to database
    • Enable feature flag so config is read from database
    • Remove code host config from config file
    • Enable feature flag by default and update documentation (#1342)
  • clean up deprecated site config options (#1378)
  • need solution for fixture external services in dev environment (https://sourcegraph.slack.com/archives/C07KZF47K/p1544568020167300)
  • unify constant definitions for external service kinds and codehost service types (https://github.com/sourcegraph/sourcegraph/pull/1397/files#r241564948)
  • Remove repos.list from site configuration (#1391)
  • Update repo-updater to read external service state from the DB (via internal http api), sync from code hosts, write back to repo table in DB (adding/removing/updating repos).
    • How do we handle repos from multiple connections?
    • What happens if a repository is renamed?
    • What happens to data that is attached to repositories (e.g. code discussions) when access tokens are updated?
    • What happens when a token is deleted from the database?
  • Remove "Delete this repository" button?
    image
@sqs

This comment has been minimized.

Copy link
Member Author

sqs commented Nov 21, 2018

Added needs-design label for "Create UI to create/edit codehosts" @francisschmaltz

@nicksnyder

This comment has been minimized.

Copy link
Member

nicksnyder commented Nov 28, 2018

This is at risk of not being ready for 3.0-preview. The UI portion was more of a slog than I anticipated. My current UI PR (#1103) is probably good enough for internal use but not polished enough for external use and probably will get a lot of review comments since there is a decent amount of copy pasta.

@francisschmaltz

This comment has been minimized.

Copy link
Member

francisschmaltz commented Nov 28, 2018

@nicksnyder can you add screen shots of the current state so I can see what work might have to be done?

@nicksnyder

This comment has been minimized.

Copy link
Member

nicksnyder commented Nov 28, 2018

@francisschmaltz I updated the PR description with an animated gif of the current state

@sqs sqs modified the milestones: 3.0-preview, 3.0 Dec 4, 2018

@sqs

This comment has been minimized.

Copy link
Member Author

sqs commented Dec 4, 2018

I moved this to milestone 3.0 on the roadmap and added the External services UI #1103 subset to 3.0-preview.

@nicksnyder nicksnyder modified the milestones: 3.0-preview.2, 3.1 Dec 14, 2018

@tsenart tsenart referenced this issue Dec 17, 2018

Closed

Proposal: Repository mirroring improvements #1467

0 of 3 tasks complete

@sqs sqs referenced this issue Dec 20, 2018

Closed

Delete repo list #1529

@nicksnyder nicksnyder removed their assignment Jan 16, 2019

@sqs

This comment has been minimized.

Copy link
Member Author

sqs commented Jan 25, 2019

Some subset of this and other repository-related work will be planned for 3.1 and described at #2025; see #2025 for more info.

@sqs

This comment has been minimized.

Copy link
Member Author

sqs commented Feb 19, 2019

@sourcegraph/core-services Can you close this issue since #2025 is more up to date and covers the same stuff?

@tsenart tsenart closed this Feb 19, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment