Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repo-updater: Updates every repository on every syncer sync #8501

Closed
keegancsmith opened this issue Feb 19, 2020 · 4 comments
Closed

repo-updater: Updates every repository on every syncer sync #8501

keegancsmith opened this issue Feb 19, 2020 · 4 comments

Comments

@keegancsmith
Copy link
Member

@keegancsmith keegancsmith commented Feb 19, 2020

A customer complained that every repository was being "git fetched" even though the UpdateSchedule had a much later interval. I believe the problem is that we enqueue the repository onto the update queue every time we run the repo metadata syncer!

The fix may be as simple as removing this line

updated = s.updateQueue.enqueue(repo, priorityLow)
log15.Debug("scheduler.updateQueue.enqueued", "repo", r.Name, "updated", updated)

A new repository should be enqueued by the scheduler, so we should still clone new repositories. (This will need to be tested). The only other thing upsert needs to do is update the remote URL on the update queue if it has changed.

Context: Our git scheduler has two parts. The schedule and the updateQueue. The updateQueue is consumed and issues git fetch/git clone against gitserver. schedule stores all repositories and periodically puts items into the updateQueue to actually be updated. The upsert function linked to above is how the scheduler/updatequeue is informed of the list of repositories.

See sourcegraph/customer#31
This may also explain the issues seen with enabling autoGitUpdates at #8458. So is likely closely related to the graphs seen in #8400 (comment)

@keegancsmith keegancsmith added this to the 3.14 milestone Feb 19, 2020
@keegancsmith keegancsmith self-assigned this Feb 19, 2020
@keegancsmith keegancsmith changed the title repo-updater: Updates every repository on every syncer sync. repo-updater: Updates every repository on every syncer sync Feb 19, 2020
@unknwon unknwon assigned unknwon and unassigned keegancsmith Feb 19, 2020
@slimsag

This comment has been minimized.

Copy link
Member

@slimsag slimsag commented Feb 19, 2020

Can someone with context comment when this regression occurred? If it is a recent regression then we can consider notifying customers not to upgrade until we release a patch, if it's been ongoing for several versions then likely it is not a problem

@keegancsmith

This comment has been minimized.

Copy link
Member Author

@keegancsmith keegancsmith commented Feb 19, 2020

I just checked out 3.10 and the same problem exists from reading the code. I then checked out 3.8 and it has the same issue. I would almost expect this problem has existed for a very long time, it's just large scale customers worked around it by adjusting the concurrent git clones / disabled auto git update or had a code host who could handle the git traffic.

@tsenart

This comment has been minimized.

Copy link
Contributor

@tsenart tsenart commented Feb 20, 2020

This is the commit that introduced the regression: 29f961a?w=1

I squashed the update and add logic into upsert, and that was wrong for repos that were only modified.

@unknwon

This comment has been minimized.

Copy link
Member

@unknwon unknwon commented Feb 21, 2020

@keegancsmith:

The only other thing upsert needs to do is update the remote URL on the update queue if it has changed.

After working on the issue, I think this is not a problem? We update whatever in the queue with a new configuredRepo2, and it contains a URL field which I suppose it is the remote URL?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

4 participants
You can’t perform that action at this time.