Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use streaming in repo-updater to make syncing process more responsive #5145

mrnugget opened this issue Aug 9, 2019 · 2 comments · Fixed by #5645

Use streaming in repo-updater to make syncing process more responsive #5145

mrnugget opened this issue Aug 9, 2019 · 2 comments · Fixed by #5645


Copy link

@mrnugget mrnugget commented Aug 9, 2019

Currently the syncing process in repo-updater works in batches: it loads all repositories from each configured external service and then adds/modifies/deletes them from the database.

The drawback of this approach is that if the external service yields thousands and thousands of repositories it takes a long time for a repository to end up in the database and show up in the UI.

That leads to confusion with users thinking that the repo has not been added:

The idea is to change the architecture of repo-updater to stream new and modified repositories. That would make new repos show up immediately after adding a new external service.

  • Slack conversation about slow sync & streaming architecture: Slack
  • @tsenart's WIP streaming branch: b3209a2

This comment has been minimized.

Copy link

@tsenart tsenart commented Sep 14, 2019

@mrnugget: Changing the milestone to 3.9 since I don't think this will make it in until Monday. We should absolutely allocate a few days to test and ship your work on #5526, including code review by @keegancsmith and/or me.

@tsenart tsenart modified the milestones: 3.8, 3.9 Sep 14, 2019

This comment has been minimized.

Copy link
Contributor Author

@mrnugget mrnugget commented Sep 16, 2019

Little update on the state of this ticket.

#5526 has been open for a few days now. It contains a feature-flagged streaming version of the repo-updater syncer.

It is feature-complete and all integration tests are passing,ut there are still some TODOs that should be tackled:

  • Proper review by @keegancsmith (PR has been reviewed by @kzh and @unknwon already, but I'd appreciate Keegan to take a look at this, since he has the most context)
  • Testing the streaming syncer locally. Add external service. Enable feature flag. Make sure that repositories stay as they are. Then: add external service, remove external service, modify external service configuration and make sure the correct repositories are synced. Do the same with multiple external services.
  • Testing the streaming syncer on Enable feature flag and make sure nothing changes. Then to the same as locally: add/remove/modify external services and make sure it behaves correctly.
  • Locally: analyze the "intermediate phases" of a sync (i.e. do we delete and insert the same repos in a single sync? If so, how many useless writes do we do)
  • If applicable: port the naming conflict tests from the TestNewDiff and TestSyncSubset tests to the streaming-sync tests

@tsenart and me decided that, even though my focus shifted from this ticket to A8N, I'll keep pursuing these TODOs in the coming weeks "on the side" so that we don't run into "PR rot" and get the PR merged as soon as possible.

The goal is to get #5526 merged ASAP with the feature disabled behind a flag. Then we can test the feature on and on local dev machines until we feel confident to enable it for customers, possibly in 3.9.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet
2 participants
You can’t perform that action at this time.