
Improve performance of Repo.CommitsCount() #3022

Open
tycho opened this issue Apr 26, 2016 · 15 comments
Labels
💊 bug Something isn't working ⛔ do not send pull request Don't ever think about it!


@tycho

tycho commented Apr 26, 2016

The "Repo.CommitsCount" API really slows things down.

I have a mirror of Linus' linux.git tree, and viewing that in the gogs web interface is a miserable experience. It takes several seconds for it to render the page. I dug through and found that gogs was executing this command:

# time git rev-list --count bcc981e9ed84c678533299d7eff17d2c81e4d5de
589485

real   0m6.707s
user   0m6.550s
sys    0m0.150s

One way to improve this is to cache the constant data. You could permanently cache the rev-list --count values for tags, as that value would not change (as long as it's keyed by the tag hash rather than the tag name):

# time git describe --tags --abbrev=0 bcc981e9ed84c678533299d7eff17d2c81e4d5de
v4.6-rc5

real    0m0.185s
user    0m0.170s
sys 0m0.010s
# time git show-ref v4.6-rc5
8ef3ad9a813abdf0817acb0b2be30be70bf25a9b refs/tags/v4.6-rc5

real    0m0.009s
user    0m0.000s
sys 0m0.000s
# time git rev-list --count 8ef3ad9a813abdf0817acb0b2be30be70bf25a9b
589482

real    0m6.800s
user    0m6.690s
sys 0m0.100s

So for the commit in question, the most recent tag in its ancestry is v4.6-rc5, and we could permanently cache the rev-list --count for the corresponding tag hash.

Once the cache is populated, calculating the rev-list --count for an arbitrary hash would go something like this:

# time git describe --tags --abbrev=0 bcc981e9ed84c678533299d7eff17d2c81e4d5de
v4.6-rc5

real    0m0.187s
user    0m0.160s
sys 0m0.020s
# time git rev-list --count v4.6-rc5..bcc981e9ed84c678533299d7eff17d2c81e4d5de
3

real    0m0.009s
user    0m0.000s
sys 0m0.000s

If you add that '3' to the tag's cached rev-list --count value, then you get the final CommitsCount(). You could cache that value as well, but give it a more sensible expiration.

@DeX77

DeX77 commented May 11, 2016

This is even worse on the "Releases" page of a large repo with many releases, since it gets called for every tag.

Example repo with 162 releases (tags):

Gogs Version: 0.9.22.0425 Page: 85559ms Template: 33ms

@tycho
Author

tycho commented May 11, 2016

@DeX77 Nice catch, I missed that. All the more reason to cache the rev-list --count at each tag...

A reproduction is up here if anyone wants to see this painful experience in action: https://git.uplinklabs.net/gogs/steven/linux

@unknwon unknwon added 💊 bug Something isn't working ⛔ do not send pull request Don't ever think about it! labels May 12, 2016
@Thibauth

Note that there is a relatively new and little-known git feature called bitmap indexes (I think it is not enabled by default) that already does something very close to this: it records, for selected commits, which objects are reachable from them, and that could serve as a cache for rev-list --count. This article is an excellent introduction to the feature: http://githubengineering.com/counting-objects/

Note that once it is activated, rev-list --count automatically takes advantage of it. So fixing this issue could probably be as simple as activating bitmap indexes on the gogs side.

@tycho
Author

tycho commented May 12, 2016

Handy. I'm adding this to /etc/gitconfig on my git host and repacking all the repos to see if that helps:

[pack]
    writeBitmapHashCache = true
[repack]
    writeBitmaps = true

@tycho
Author

tycho commented May 12, 2016

Hmm, that didn't seem to help. I verified the bitmap was created:

# find . | grep pack-
./objects/pack/pack-02667941ad4b27d69d787dcdafe2e2cc8bff78fc.pack
./objects/pack/pack-02667941ad4b27d69d787dcdafe2e2cc8bff78fc.idx
./objects/pack/pack-02667941ad4b27d69d787dcdafe2e2cc8bff78fc.bitmap

But I don't think git rev-list --count is taking advantage of the new bitmap indexes:

# time git rev-list --count 8ef3ad9a813abdf0817acb0b2be30be70bf25a9b
589482

real    0m6.875s
user    0m6.810s
sys     0m0.050s

@tycho
Author

tycho commented May 12, 2016

I looked at the rev-list.c source and apparently there's an extra flag that makes it behave. Wonder if there's a config option too...

# time git rev-list --use-bitmap-index --count 8ef3ad9a813abdf0817acb0b2be30be70bf25a9b
589482

real    0m0.426s
user    0m0.360s
sys     0m0.060s

@Thibauth

Great! So should this become the default configuration for Gogs?

@tycho
Author

tycho commented May 12, 2016

I don't think it's the full solution. The repository below is fully repacked and has the bitmap indexes. I've also patched gogs to add the --use-bitmap-index flag to rev-list --count invocations.

https://git.uplinklabs.net/gogs/steven/linux

This particular page loads significantly faster. Well... relatively. It's still very slow compared to e.g. GitHub. The "Releases" page is still completely inoperable though -- Chrome gives up waiting for it after a while. So it still seems like some caching mechanism is needed, even if that doesn't particularly help initial page load times.

@tycho
Author

tycho commented May 12, 2016

The commit count will make file history a problem too, e.g. via Repo.FileCommitsCount():

$ time git rev-list --use-bitmap-index --count HEAD -- MAINTAINERS
6145

real    0m3.321s
user    0m3.220s
sys 0m0.080s

I think the simplest solution is to stop caring about the number of commits unless a way to cheaply calculate them can be found. In a lot of places, the rev-list count is only a mildly interesting statistic, without any practical use.

@DeX77

DeX77 commented May 18, 2016

In a lot of places, the rev-list count is only a mildly interesting statistic, without any practical use.

Indeed, this info is mostly not very useful anyway. I don't know if Gogs can do that, but what about doing that async and displaying it "when it's done"?

@tycho
Author

tycho commented May 18, 2016

what about doing that async and displaying it "when it's done"?

I think that just shifts the pain from the end user to the server admin. The work still happens, it's just not apparent to the end user. Personally I'd rather not have my server spend the CPU cycles on it.

@unknwon unknwon added this to the 0.11.0 milestone Jul 25, 2016
@unknwon
Member

unknwon commented Jul 25, 2016

Note: show Commits and Releases only on the repository home page, as GitHub does.

@unknwon
Member

unknwon commented Aug 26, 2016

Merging this thread into #3518.

@unknwon unknwon closed this as completed Aug 26, 2016
@tycho
Author

tycho commented Aug 26, 2016

@unknwon I don't see how the aesthetic design issue you referenced is relevant to a crippling performance issue.

@unknwon
Member

unknwon commented Aug 26, 2016

Hmm, you're right.

@unknwon unknwon reopened this Aug 26, 2016
@unknwon unknwon changed the title Repo.CommitsCount() slows down page renders for big repos Improve performance of Repo.CommitsCount() Aug 26, 2016
@unknwon unknwon removed this from the 0.11.0 milestone Feb 18, 2017
ethantkoenig added a commit to ethantkoenig/gogs that referenced this issue Dec 11, 2017
4 participants