
Sync single external service #13483

Merged
merged 57 commits into main from sync_single_external_service Sep 9, 2020

Conversation


@ryanslade ryanslade commented Aug 31, 2020

This PR makes two main changes:

  • We sync each external service independently
  • The syncing process now happens using the workerutil package

Every minute, Syncer.Run enqueues into a job queue any external services that are due to be synced.

These jobs are picked up by up to three sync workers concurrently; each worker syncs one service using the new Syncer.SyncExternalService method.

We rely on database triggers to mark repos as deleted once they no longer exist in any external service. This is based on the state of the external_service_repos table which tracks the relationship between repos and external services.

Triggers also exist that remove rows from external_service_repos when a repo or external service is deleted or soft deleted.

@keegancsmith We'd love your feedback, specifically on the code related to SyncSubset, as we are not 100% sure that this still works as expected.

ryanslade and others added 11 commits August 27, 2020 12:50
Also removes a test case that tests a sync of multiple external services,
as we no longer support that.
Code now compiles but a lot of tests still fail.

Added a cleanup function that needs to be called when we stop
a worker, to unregister its Prometheus metrics so that registration doesn't
panic when we start another worker in a subsequent test.

ryanslade and others added 12 commits September 1, 2020 17:14
If a job is processing when repo-updater dies, that job would never
complete. We would then never requeue the associated external service for
syncing. To be safe, we delete non-locked processing rows on startup.
This will cause a sync to be triggered ASAP for the saved external
service.

On save, we trigger a call to repo-updater which causes us to
enqueue any pending sync jobs. As we've just updated next_sync_at, the job
for the newly saved external service will be queued.
We no longer trigger a full sync but instead enqueue pending
sync jobs.
We now have a trigger that does the same thing
@ryanslade ryanslade marked this pull request as ready for review September 2, 2020 15:38
@ryanslade
Contributor Author

> LGTM. Too much code to review, so I'm not confident I have validated all issues. I think with this sort of change a better approach would be for me to also have a test plan to review. Our old tests worked in a global context, so the switch to per external service may miss things. So I think a few things need to be manually tested:
>
> 1. Two external services returning the same repo, but with different data. The easiest conflict here is on the name (e.g. a different name template). Is this deterministic? Does the name flip-flop as each syncer runs?

Yes, if different data is returned then we'll end up flip-flopping as each service syncs. Unfortunately this isn't something we considered.

Perhaps we can make it eventually consistent on upsert. Currently newDiff sorts the sourced repos deterministically and merges them in that order. Maybe whenever we update a repo we also store the external service ID that was used to source the repo. So we record the last service to "touch" the repo. If the IDs are the same we know it's a simple update: just replace everything. If the IDs are different, the merge operation happens in a deterministic order based on source ID, which is pretty much what is happening here:

func (r *Repo) Less(s *Repo) bool {

> 2. Streaming insertion of new repos

I've tested this and it works as expected.

> 3. SyncSubset still works. This is all that sourcegraph.com uses. I can't remember if non dot-com instances use this; maybe streaming inserts?

In the new implementation all instances will use this since, as you guessed, it's used in the streaming inserter. So given that it's used in all code paths now, and we've included a bunch of new tests as well as done a fair amount of manual testing, I'm fairly confident that it still works.

@ryanslade ryanslade mentioned this pull request Sep 4, 2020
18 tasks
@keegancsmith
Member

I just remembered another case we came across. Not sure how it would apply now:

  1. Sync repo {Name=A ExternalID=1}
  2. A is deleted and a new repo is created: {Name=A ExternalID=2}
  3. Sync repo. In the old code we would detect this (or eventually become consistent) as a brand new repo and delete the old one.

We also have cases we handle like repos swapping names between syncs:

  1. Sync {Name=A ExternalID=1} and {Name=B ExternalID=2}
  2. Funky renames happen
  3. Sync {Name=A ExternalID=2} and {Name=B ExternalID=1}

Throw in different external services to make it more fun. E.g., to make this applicable, imagine that ExternalID=1 and ExternalID=2 are synced by different external services. Because the old code treated this globally, it would pick one winner for a given name, so we would never violate the unique name constraint.

> 1. Two external services returning the same repo, but with different data. The easiest conflict here is on the name (e.g. a different name template). Is this deterministic? Does the name flip-flop as each syncer runs?
>
> Yes, if different data is returned then we'll end up flip-flopping as each service syncs. Unfortunately this isn't something we considered.

This flip-flopping may be fine. For example, you could enforce that name templates are the same for the same codehost. Then if a codehost flip-flops data we could maybe just accept that as the codehost doing something bad. I can't remember if we ever came across something like that; we only specifically cared about name.

> Perhaps we can make it eventually consistent on upsert. Currently newDiff sorts the sourced repos deterministically and merges them in that order. Maybe whenever we update a repo we also store the external service ID that was used to source the repo. So we record the last service to "touch" the repo. If the IDs are the same we know it's a simple update: just replace everything. If the IDs are different, the merge operation happens in a deterministic order based on source ID, which is pretty much what is happening here:
>
> func (r *Repo) Less(s *Repo) bool {

I think for merges, picking a winner (e.g. only min(external_service_id) gets to choose the name) may work. But you get into issues like min(external_service_id) having its access token revoked and never syncing (this seems somewhat likely for the sourcegraph.com use case). I think flip-flopping is acceptable, as long as we don't flip-flop on name.

> 2. Streaming insertion of new repos
>
> I've tested this and it works as expected.

Great

> 3. SyncSubset still works. This is all that sourcegraph.com uses. I can't remember if non dot-com instances use this; maybe streaming inserts?
>
> In the new implementation all instances will use this since, as you guessed, it's used in the streaming inserter. So given that it's used in all code paths now, and we've included a bunch of new tests as well as done a fair amount of manual testing, I'm fairly confident that it still works.

Sounds good.

@asdine
Contributor

asdine commented Sep 7, 2020

@keegancsmith We have added some logic for resolving name conflicts when they happen, this should prevent the flip-flopping when a repo is renamed for any reason.


@keegancsmith keegancsmith left a comment


LGTM; some concerns over the size of the names list, which I commented on inline. One wishes names were not unique and ExternalRepoSpec was the primary key :)

What is the behaviour when syncers run in parallel? I am guessing this conflicting code may run in a strange way. However, I believe that is probably fine to ignore since a future run will converge to the correct values.

In cmd/repo-updater/repos/syncer.go:

	return errors.Wrap(err, "syncer.sync.store.list-repos")
	}
	conflicting = conflicting.Filter(func(r *Repo) bool {
		for _, id := range r.ExternalServiceIDs() {
			if id == externalServiceID {
keegancsmith commented on this snippet:
probably not worth it, but it seems this filter could relatively easily be represented in SQL.

Reply (Contributor):
We wanted to modify ListRepos to allow filtering by external service ID and names, but apparently it is not supported. We decided to keep that change for another PR, even though it feels wrong doing that filtering at runtime.

@ryanslade
Contributor Author

> What is the behaviour when syncers run in parallel? I am guessing this conflicting code may run in a strange way. However, I believe that is probably fine to ignore since a future run will converge to the correct values.

Yes, there's a chance that two syncers could try to modify the same repo in different transactions. One will fail, be requeued, and should succeed on the next sync, so they'll eventually converge.

@ryanslade ryanslade merged commit d00157a into main Sep 9, 2020
@ryanslade ryanslade deleted the sync_single_external_service branch September 9, 2020 12:22
ryanslade added a commit that referenced this pull request Sep 9, 2020
ryanslade added a commit that referenced this pull request Sep 9, 2020