
Performance of Repositories.GetAllTags() #1105

Closed · ryangribble opened this issue Feb 12, 2016 · 4 comments
Labels: Status: Stale

Comments

@ryangribble (Contributor)
Just ran into an interesting situation today where we can reproducibly flatline our Enterprise server CPU by calling the GetAllTags() method on a repository that has around 7000 tags. After about 20 minutes we had to kill the command as it was affecting internal users.
var tags = await this._gitHubClient.Repository.GetAllTags(owner, repo);

We found that when we re-implemented it using the References client, it returned almost instantly:
var tags = await this._gitHubClient.Git.Reference.GetAllForSubNamespace(owner, repo, "tags");
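
For anyone wanting to reproduce the comparison, a rough sketch along these lines times the two calls side by side (owner, repo and the token are placeholders, and an Enterprise instance would pass its base address to the GitHubClient constructor):

using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Octokit;

class TagTimingComparison
{
    static async Task Main()
    {
        // Placeholder credentials and repository; substitute your own.
        var client = new GitHubClient(new ProductHeaderValue("tag-timing-test"))
        {
            Credentials = new Credentials(Environment.GetEnvironmentVariable("GITHUB_TOKEN"))
        };
        const string owner = "some-org";
        const string repo = "some-repo";

        var sw = Stopwatch.StartNew();
        var tags = await client.Repository.GetAllTags(owner, repo);
        Console.WriteLine($"GetAllTags: {tags.Count} tags in {sw.Elapsed}");

        sw.Restart();
        var refs = await client.Git.Reference.GetAllForSubNamespace(owner, repo, "tags");
        Console.WriteLine($"GetAllForSubNamespace: {refs.Count} refs in {sw.Elapsed}");
    }
}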


I know that Octokit's GetAll() methods fetch every page of the paginated results, and it looks like the page size is 30, so that's a fair few calls (233 of them)... BUT to have run for more than 20 minutes (with maxed CPU) and still not finished means each call for 30 tags is taking approximately 5+ seconds, which doesn't seem right. AND the exact same number of tags (and thus 233 round-trip calls) would be fetched by the References call, which took maybe a second or two to return all of them... The servers meet the required GitHub Enterprise hardware specs, so I don't think it's a hardware issue...

Any ideas? I was thinking surely GitHub wouldn't be happy to have the GitHub API accessible, with octokit.net doing "get all the paginated results every time" type stuff out in the wild, if it could hammer a server this badly... what's to stop a user calling GetAllTags() on some mammoth repository on github.com itself and having such a disproportionate server impact?
But I can't see what we (or even octokit.net) are really doing "wrong" here, as all the calls seem to go straight through to the API...

For extra info, I also tried calling the endpoint through Octokit using the Connection.Get<T>() method so that I only fetched a single page of 30 tags, and that returned quickly.
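
For reference, a minimal sketch of that single-page call through the low-level connection might look like this (the relative URL and the per_page/page parameters are just the standard GitHub REST API query parameters, not something specific to this issue):

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Octokit;

static async Task<IReadOnlyList<RepositoryTag>> GetFirstPageOfTags(
    IGitHubClient client, string owner, string repo)
{
    // Connection.Get<T> issues a single HTTP request and does no automatic pagination,
    // so only the first page of (by default) 30 tags comes back.
    var response = await client.Connection.Get<IReadOnlyList<RepositoryTag>>(
        new Uri($"repos/{owner}/{repo}/tags", UriKind.Relative),
        new Dictionary<string, string> { ["per_page"] = "30", ["page"] = "1" },
        "application/vnd.github.v3+json");

    return response.Body;
}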

It makes me wonder whether I should also re-implement things like Repositories.GetAllBranches() to use the Git.Reference.GetAllForSubNamespace(owner, repo, "heads") call, in case the branches endpoint is as inefficient on large repos as the tags one seems to be...
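
In case it helps anyone else, a minimal sketch of that branches-via-refs approach (not something from the issue itself): branch refs live under "heads", so the branch name is just the ref with the "refs/heads/" prefix stripped.

using System.Linq;
using System.Threading.Tasks;
using Octokit;

static async Task<string[]> GetBranchNamesViaRefs(IGitHubClient client, string owner, string repo)
{
    // Same sub-namespace trick as for tags, but against refs/heads/*.
    var refs = await client.Git.Reference.GetAllForSubNamespace(owner, repo, "heads");

    return refs
        .Select(r => r.Ref.Substring("refs/heads/".Length))
        .ToArray();
}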

Any insight appreciated!

@shiftkey (Member)

@ryangribble what version of GitHub Enterprise are you running?

@shiftkey (Member)

@ryangribble I'm also not aware of a real-world example with that sheer number of tags that I could test on my end to understand the behaviour better. Asking around.

@shiftkey (Member)

Just ran a simple test against https://github.com/fsprojects/Paket to illustrate some of the pain that's occurring here (note that Paket only has 1185 tags):

Or statistically:

Request Count:   40
Bytes Sent:      15,169     (headers:15,169; body:0)
Bytes Received:  116,598        (headers:58,451; body:58,147)

ACTUAL PERFORMANCE
--------------
Requests started at:        13:34:26.051
Responses completed at: 13:34:40.223
Sequence (clock) duration:  00:00:14.1719851
Aggregate Session duration: 00:00:13.948

RESPONSE CODES
--------------
HTTP/200:   40

RESPONSE BYTES (by Content-Type)
--------------
       ~headers~: 58,451
application/json: 58,147

Look at that cascading, it's so beautiful...

Switching over to this._gitHubClient.Git.Reference.GetAllForSubNamespace(owner, repo, "tags"); only requires one request to get all the required references.

Request Count:   1
Bytes Sent:      405        (headers:405; body:0)
Bytes Received:  344,903        (headers:1,165; body:343,738)

ACTUAL PERFORMANCE
--------------
ClientConnected:    13:36:02.963
ClientBeginRequest: 13:36:03.854
GotRequestHeaders:  13:36:03.854
ClientDoneRequest:  13:36:03.854
Determine Gateway:  0ms
DNS Lookup:         0ms
TCP/IP Connect: 0ms
HTTPS Handshake:    0ms
ServerConnected:    13:36:03.244
FiddlerBeginRequest:    13:36:03.854
ServerGotRequest:   13:36:03.854
ServerBeginResponse:    13:36:04.229
GotResponseHeaders: 13:36:04.229
ServerDoneResponse: 13:36:04.745
ClientBeginResponse:    13:36:04.745
ClientDoneResponse: 13:36:04.745

    Overall Elapsed:    00:00:00.8917767

RESPONSE BYTES (by Content-Type)
--------------
application/json: 343,738
       ~headers~: 1,165

We haven't tweaked the pagination defaults in Octokit (so it defaults to 30, as per the docs), but that could be a quick win for situations like this. I'm thinking about ways to introduce this broadly across the codebase, so the caller can control it if they want, but that's tracked in #760 and has kinda stalled.
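
That kind of per-caller control is what ApiOptions provides in later Octokit.net releases. As a sketch, assuming a version where GetAllTags has an overload that accepts ApiOptions, bumping the page size to GitHub's maximum of 100 would cut the ~233 requests for 7000 tags down to roughly 70:

using System;
using System.Threading.Tasks;
using Octokit;

static async Task ListTagsWithLargerPages(IGitHubClient client, string owner, string repo)
{
    var options = new ApiOptions
    {
        PageSize = 100   // GitHub's maximum per_page value
    };

    var tags = await client.Repository.GetAllTags(owner, repo, options);
    Console.WriteLine($"Fetched {tags.Count} tags");
}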

shiftkey mentioned this issue on Feb 14, 2016
shiftkey added the meta label on Mar 11, 2016
@github-actions (bot)

👋 Hey Friends, this issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Please add the pinned label if you feel that this issue needs to remain open/active. Thank you for your contributions and help in keeping things tidy!

github-actions bot added the Stale label on Jul 12, 2022
nickfloyd added the Status: Stale label and removed the Stale label on Oct 27, 2022