Performance of Repositories.GetAllTags() #1105
Comments
@ryangribble what version of GitHub Enterprise are you running?
@ryangribble I'm also not aware of a real-world example with that sheer number of tags that I could test on my end to understand the behaviour better. Asking around.
Just ran a simple test against https://github.com/fsprojects/Paket to illustrate some of the pain that's occurring here (note that Paket only has 1185 tags): [timing screenshot omitted] Or statistically: [chart omitted] Look at that cascading, it's so beautiful... Switching over to
We haven't tweaked the pagination defaults in Octokit (so it defaults to 30, as per the docs), but that could be a quick win for situations like this. I'm thinking about ways to introduce this broadly across the codebase so the caller can control it if they want, but that's tracked in #760 and has kinda stalled.
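For reference, later Octokit.net releases expose `ApiOptions` overloads that give the caller the pagination control discussed above. A hedged sketch (the property and overload names assume that newer API surface; the repository is the Paket repo mentioned in this thread):

```csharp
using Octokit;

var client = new GitHubClient(new ProductHeaderValue("tag-perf-test"));

// Ask for the largest page the GitHub API allows, and only the first page,
// instead of Octokit's default behaviour of walking every 30-item page.
var options = new ApiOptions
{
    PageSize = 100,
    PageCount = 1,
    StartPage = 1
};

var tags = await client.Repository.GetAllTags("fsprojects", "Paket", options);
Console.WriteLine($"Fetched {tags.Count} tags in one round trip");
```

With `PageSize = 100`, the ~7000-tag repository from this issue would need roughly 70 round trips instead of 233, even before considering the per-call latency problem.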
👋 Hey Friends, this issue has been automatically marked as
Just ran into an interesting situation today where we can reproducibly flatline our Enterprise server CPU by calling the GetAllTags() method on a repository that has around 7000 tags. After about 20 minutes we had to kill the command as it was affecting internal users.
var tags = await this._gitHubClient.Repository.GetAllTags(owner, repo);
We found that when we re-implemented it using the References client, it returned almost instantly:
var tags = await this._gitHubClient.Git.Reference.GetAllForSubNamespace(owner, repo, "tags");
I know that Octokit's GetAll() actually fetches every page of the paginated results, and it looks like the page size is 30, so that's a fair few calls (233 calls)... BUT to have run for more than 20 minutes (with maxed CPU) and still not finished means each call for 30 tags is taking approx 5+ seconds, which doesn't seem right. AND the exact same number of tags (and thus 233 round-trip calls) would be made by the References call, which took maybe a second or two to return all of them... The servers adhere to the required GitHub Enterprise hardware specs, so I don't think it's a hardware issue...
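The arithmetic behind those figures can be sanity-checked with a standalone sketch (the constants are the numbers quoted in this issue; no Octokit calls involved):

```csharp
using System;

// Figures from the issue: ~7000 tags, Octokit's default page size of 30,
// and a run that was killed after roughly 20 minutes.
const int tagCount = 7000;
const int pageSize = 30;
const double totalSeconds = 20 * 60;

// 7000 / 30 ≈ 233–234 paginated round trips.
int calls = (int)Math.Ceiling(tagCount / (double)pageSize);

// Spread 20 minutes across those calls: ~5 seconds per 30-tag page,
// matching the "approx 5+ seconds" estimate above.
double secondsPerCall = totalSeconds / calls;

Console.WriteLine($"{calls} calls, ~{secondsPerCall:F1} s each");
```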
Any ideas? I was thinking surely GitHub wouldn't be happy to have the GitHub API accessible, with octokit.net doing "get all the paginated results every time" type stuff out in the wild, if it could hammer a server so badly... what's to stop a user calling GetAllTags on some mammoth repository on github.com itself and having such a disproportionate server impact?
But I can't see what we (or even octokit.net) are really doing "wrong" here, as all the calls seem to go straight through to the API...
For extra info, I also tried calling the endpoint through Octokit using the Connection.Get<> class so that I only got one page of 30 tags, and that worked pretty quickly.
It makes me wonder whether I should also re-implement things like
Repositories.GetAllBranches()
to use the Git.Reference.GetAllForSubNamespace(owner, repo, "heads")
call, in case the Repository branches endpoint is inefficient on large repos like the tags one seems to be... Any insight appreciated!