
Performance of Repositories.GetAllTags() #1105

Closed · ryangribble opened this issue Feb 12, 2016 · 4 comments
Labels: Status: Stale

Comments

@ryangribble (Contributor)
Just ran into an interesting situation today where we can reproducibly flatline our Enterprise server CPU by calling the GetAllTags() method on a repository that has around 7000 tags. After about 20 minutes we had to kill the command as it was affecting internal users.
var tags = await this._gitHubClient.Repository.GetAllTags(owner, repo);

We found that when we re-implemented it using the References client, it returned almost instantly:
var tags = await this._gitHubClient.Git.Reference.GetAllForSubNamespace(owner, repo, "tags");
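
For anyone wanting to reproduce the comparison, a rough sketch along these lines times the two calls side by side (owner, repo and the token are placeholders, and an Enterprise instance would pass its base address to the GitHubClient constructor):

using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Octokit;

class TagTimingComparison
{
    static async Task Main()
    {
        // Placeholder credentials and repository; substitute your own.
        var client = new GitHubClient(new ProductHeaderValue("tag-timing-test"))
        {
            Credentials = new Credentials(Environment.GetEnvironmentVariable("GITHUB_TOKEN"))
        };
        const string owner = "some-org";
        const string repo = "some-repo";

        var sw = Stopwatch.StartNew();
        var tags = await client.Repository.GetAllTags(owner, repo);
        Console.WriteLine($"GetAllTags: {tags.Count} tags in {sw.Elapsed}");

        sw.Restart();
        var refs = await client.Git.Reference.GetAllForSubNamespace(owner, repo, "tags");
        Console.WriteLine($"GetAllForSubNamespace: {refs.Count} refs in {sw.Elapsed}");
    }
}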


I know that Octokit's GetAll() methods fetch every page of the paginated results, and it looks like the page size is 30, so that's a fair few calls (233 of them)... BUT to have run for more than 20 minutes (with maxed CPU) and still not finished means each call for 30 tags is taking approximately 5+ seconds, which doesn't seem right. AND the exact same number of tags (and thus 233 round-trip calls) would be fetched by the References call, which took maybe a second or two to return all of them... The servers meet the required GitHub Enterprise hardware specs, so I don't think it's a hardware issue...

Any ideas? I was thinking surely GitHub wouldn't be happy to have the GitHub API accessible, with octokit.net doing "get all the paginated results every time" type stuff out in the wild, if it could hammer a server this badly... what's to stop a user calling GetAllTags() on some mammoth repository on github.com itself and having such a disproportionate server impact?
But I can't see what we (or even octokit.net) are really doing "wrong" here, as all the calls seem to go straight through to the API...

For extra info, I also tried calling the endpoint through Octokit using the Connection.Get<T>() method so that I only fetched a single page of 30 tags, and that returned quickly.
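
For reference, a minimal sketch of that single-page call through the low-level connection might look like this (the relative URL and the per_page/page parameters are just the standard GitHub REST API query parameters, not something specific to this issue):

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Octokit;

static async Task<IReadOnlyList<RepositoryTag>> GetFirstPageOfTags(
    IGitHubClient client, string owner, string repo)
{
    // Connection.Get<T> issues a single HTTP request and does no automatic pagination,
    // so only the first page of (by default) 30 tags comes back.
    var response = await client.Connection.Get<IReadOnlyList<RepositoryTag>>(
        new Uri($"repos/{owner}/{repo}/tags", UriKind.Relative),
        new Dictionary<string, string> { ["per_page"] = "30", ["page"] = "1" },
        "application/vnd.github.v3+json");

    return response.Body;
}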

It makes me wonder whether I should also re-implement things like Repositories.GetAllBranches() to use the Git.Reference.GetAllForSubNamespace(owner, repo, "heads") call, in case the branches endpoint is as inefficient on large repos as the tags one seems to be...
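
In case it helps anyone else, a minimal sketch of that branches-via-refs approach (not something from the issue itself): branch refs live under "heads", so the branch name is just the ref with the "refs/heads/" prefix stripped.

using System.Linq;
using System.Threading.Tasks;
using Octokit;

static async Task<string[]> GetBranchNamesViaRefs(IGitHubClient client, string owner, string repo)
{
    // Same sub-namespace trick as for tags, but against refs/heads/*.
    var refs = await client.Git.Reference.GetAllForSubNamespace(owner, repo, "heads");

    return refs
        .Select(r => r.Ref.Substring("refs/heads/".Length))
        .ToArray();
}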

Any insight appreciated!

@shiftkey (Member)

@ryangribble what version of GitHub Enterprise are you running?

@shiftkey (Member)

@ryangribble I'm also not aware of a real-world example with that sheer number of tags that I could test on my end to understand the behaviour better. Asking around.

@shiftkey (Member)

Just ran a simple test against https://github.com/fsprojects/Paket to illustrate some of the pain that's occurring here (note that Paket only has 1185 tags):

Or statistically:

Request Count:   40
Bytes Sent:      15,169     (headers:15,169; body:0)
Bytes Received:  116,598        (headers:58,451; body:58,147)

ACTUAL PERFORMANCE
--------------
Requests started at:        13:34:26.051
Responses completed at: 13:34:40.223
Sequence (clock) duration:  00:00:14.1719851
Aggregate Session duration: 00:00:13.948

RESPONSE CODES
--------------
HTTP/200:   40

RESPONSE BYTES (by Content-Type)
--------------
       ~headers~: 58,451
application/json: 58,147

Look at that cascading, it's so beautiful...

Switching over to this._gitHubClient.Git.Reference.GetAllForSubNamespace(owner, repo, "tags"); only requires one request to get all the required references.

Request Count:   1
Bytes Sent:      405        (headers:405; body:0)
Bytes Received:  344,903        (headers:1,165; body:343,738)

ACTUAL PERFORMANCE
--------------
ClientConnected:    13:36:02.963
ClientBeginRequest: 13:36:03.854
GotRequestHeaders:  13:36:03.854
ClientDoneRequest:  13:36:03.854
Determine Gateway:  0ms
DNS Lookup:         0ms
TCP/IP Connect: 0ms
HTTPS Handshake:    0ms
ServerConnected:    13:36:03.244
FiddlerBeginRequest:    13:36:03.854
ServerGotRequest:   13:36:03.854
ServerBeginResponse:    13:36:04.229
GotResponseHeaders: 13:36:04.229
ServerDoneResponse: 13:36:04.745
ClientBeginResponse:    13:36:04.745
ClientDoneResponse: 13:36:04.745

    Overall Elapsed:    00:00:00.8917767

RESPONSE BYTES (by Content-Type)
--------------
application/json: 343,738
       ~headers~: 1,165

We haven't tweaked the pagination defaults in Octokit (so it defaults to 30, as per the docs), but that could be a quick win for situations like this. I'm thinking about ways to introduce this broadly across the codebase, so the caller can control it if they want, but that's tracked in #760 and has kinda stalled.
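
That kind of per-caller control is what ApiOptions provides in later Octokit.net releases. As a sketch, assuming a version where GetAllTags has an overload that accepts ApiOptions, bumping the page size to GitHub's maximum of 100 would cut the ~233 requests for 7000 tags down to roughly 70:

using System;
using System.Threading.Tasks;
using Octokit;

static async Task ListTagsWithLargerPages(IGitHubClient client, string owner, string repo)
{
    var options = new ApiOptions
    {
        PageSize = 100   // GitHub's maximum per_page value
    };

    var tags = await client.Repository.GetAllTags(owner, repo, options);
    Console.WriteLine($"Fetched {tags.Count} tags");
}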

shiftkey mentioned this issue on Feb 14, 2016
shiftkey added the meta label on Mar 11, 2016
@github-actions (bot)

👋 Hey Friends, this issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Please add the pinned label if you feel that this issue needs to remain open/active. Thank you for your contributions and help in keeping things tidy!

github-actions bot added the Stale label on Jul 12, 2022
nickfloyd added the Status: Stale label and removed the Stale label on Oct 27, 2022