Slow repository browsing response times #1518
Unknwon added the kind/enhancement label Aug 20, 2015
Unknwon (Member) commented Aug 20, 2015

Thanks for your detailed feedback!

> Any idea of what might be causing this?

In my opinion, it's not so much a question of what causes this; it's that Gogs doesn't yet have a cache system for repository Git data (it reloads everything every time you visit a page). The speed should improve in a future release. Hope my explanation helps you understand.
cameron314 commented Aug 20, 2015

Ah OK, so this is normal?
Unknwon (Member) commented Aug 20, 2015

At least it's expected to me, given:

> I have a repository with 30 entries in the root, and 147 folders in a nested folder.
cameron314 commented Aug 20, 2015

Got it :-) It's really a pretty small repository; it's just spread over a lot of files.
Unknwon (Member) commented Aug 20, 2015

Yeah, so parsing file header info takes time; a cache (when implemented) would definitely help!
cameron314 commented Aug 20, 2015

Hmm, I wonder -- would it help much if the calls to git were batched by changeset? Often many files were changed at the same time. The calls to `git log -1` for each SHA-1 could have multiple paths appended instead of just one at a time.
cameron314 commented Aug 20, 2015

Hmm, turns out `git log -1 path1 path2` doesn't work that way -- it yields the top commit among the set of commits for both files together (one output), instead of the top commit per file (two outputs). Too bad.

This is related to #684.
hasufell commented Oct 1, 2015

This is currently a show-stopper for something like this repo: https://github.com/gentoo/gentoo
Unknwon added this to the 0.7.5 milestone Oct 1, 2015
Unknwon modified the milestones: 0.7.5, 0.8.0 Nov 15, 2015
Performance is enhanced.
Unknwon removed this from the 0.9.0 milestone Feb 7, 2016
Rukenshia (Contributor) commented Mar 14, 2016

This is still a thing: https://try.gogs.io/Rukenshia/loonix

It is impossible to browse this repository properly. Loading the "Documentation" folder takes over 45 seconds for me. I don't know how Gogs handles viewing the tree, but maybe one idea would be to load the file info (last commit on each file/directory) lazily. I think GitLab does this too.

@Unknwon can you tell me whether Gogs loads the complete tree of the repository when viewing the repo?
hasufell commented Mar 14, 2016

Yeah, I too think that caching is not the only answer. The answer is to just show a loading icon for the git commit info, gather the information in the background, and show it when it's available, while still allowing other operations that don't need that information.
cameron314 commented Mar 14, 2016

Gogs finds only the information necessary for the immediate files in the folder being displayed, not the whole tree. But internally, git itself goes through its entire history looking for each child file/folder to get the most recent commit. Git's data structures are really not made for that kind of query.
hasufell commented Mar 14, 2016

> Gogs finds only the information necessary for the immediate files in the folder being displayed,

I don't think you need the commit hash, commit message, and modification date just to browse the repository, do you (and these are what take so long)? So this information can be loaded while the tree itself is already shown.
cameron314 commented Mar 14, 2016

Sorry, I wasn't clear. I meant as opposed to loading the information for the whole tree, as Rukenshia was wondering.

I agree that if the information cannot be obtained instantly, preventing the UI from blocking by fetching it in the background is a good idea.
Unknwon (Member) commented Mar 14, 2016

I might use https://github.com/src-d/go-git to test whether we can get any performance gain.
Unknwon added this to the 0.10.0 milestone Mar 14, 2016
cameron314 commented Mar 14, 2016

I tried at one point to implement the same functionality using the git C API directly (from C code), taking in the whole list of files as input instead of a single one at a time. No matter what I tried, it was still slower than invoking the git process one file at a time, which of course is already too slow.

Fundamentally, whatever client is used, git's data structures don't allow this sort of query to be done quickly.
Unknwon (Member) commented Mar 14, 2016

@cameron314 One git process execution is under 100ms on my dev machine, and I could speed things up by allowing unlimited processes to run at the same time, but this causes many problems on machines with little memory, so right now it is hard-limited to a maximum of 10.

Therefore, I think a cache layer is the ultimate solution, and cache-ahead browsing is another way to improve the viewing experience.
juanfra684 commented Mar 23, 2016

I have the same problem with a big repo. I uploaded it to try.gogs.io, so you can see the performance on a known machine/config. The same repo works pretty well with cgit (with the cache enabled).

Probably the slowest directory in the repo is this: https://try.gogs.io/juanfra684/openbsd-ports/src/master/devel (Gogs Version: 0.9.14.0321, Page: 118641ms, Template: 214ms)
Unknwon added the dont send pull request label Mar 23, 2016
Unknwon referenced this issue Mar 23, 2016: Cannot allocate memory when viewing repository #2464 (closed)
Thibauth commented Apr 2, 2016

An alternative strategy for very large directories, instead of calling git log for every file, would be to walk the git commit history and check for each commit whether or not a file in the directory was modified in that commit. Using the git command, this would be something like:

`git log --name-only --format="%cd" <dirname>`

This should be much more efficient than calling git log for each file, because internally git log has to walk the commit history until it finds a commit at which the file was modified. Instead of walking the commit history n times, where n is the number of files, this would walk it only once and collect information about all files in one pass.

Furthermore, using a git library instead of making calls to the git command could make this solution reasonably efficient.

Another advantage of this approach is that it allows trading off running time for accuracy: there could be a limit on the number of commits to visit. After that, the files for which the date of last modification could not be determined could simply render as "more than x months ago", where x is determined based on the commit date of the last commit visited.

As mentioned above, the git data model was not designed for this kind of query, so it seems that eventually some caching will be necessary to get both efficient and exact answers for large directories/commit histories.
cameron314 commented Apr 3, 2016

@Thibauth: I tried pretty much this, using libgit in C. It was slower than invoking git for each file (which doesn't pass through libgit if I understand correctly, but uses even lower-level code).

But I like the idea of limiting the search to give a reasonable worst case instead of degenerating.
Thibauth commented Apr 3, 2016

@cameron314 thanks for your answer. It is interesting that you tried and didn't find any performance gain by using libgit. Do you still have the code you experimented with (maybe you can put it in a gist or something)? I think there are several ways to go about this, and I agree with you that making it efficient will require doing it at a pretty low level, similar to how it is done in git.

Also, following the discussion in #2592, to implement this kind of approach in Go, we will have to use either git2go (Go bindings to the C library libgit2) or the more recent pure-Go go-git. @Unknwon seems to prefer a pure-Go solution for portability reasons. I don't have much time right now, but I am also considering experimenting with all this a couple of weeks from now. I also have a few thoughts about how the caching could be done.
cameron314 commented Apr 4, 2016

@Thibauth Unfortunately I lost the source code. It was just a crude test, so I was a little careless with it. I remember basing most of the code on the log example, with the key being `git_pathspec_match_tree` (I tried a few other ways too, can't quite remember now). But even if the path matching could be made very fast, walking the commit log is fundamentally too slow in the worst case. Precomputing the desired per-file/directory information and updating it as commits are pushed is probably a better solution, though obviously there is a non-negligible space (and complexity) cost to that.
Thibauth commented Apr 4, 2016

Thanks, I will keep that in mind when experimenting. Regarding the caching, I think the space cost should really be close to negligible: storing the hash of the last commit for each entry in the tree should take the same amount of space as storing a single tree associated with a single commit, so this is basically the size of a single commit.
Thibauth commented Apr 6, 2016

OK, so I did a very preliminary experiment using libgit2 and the approach described above. The code is very ugly and not fully functional, but it already gives an idea of the gain which could be obtained.

Testing it on the root directory of the Gentoo repository, I am able to cut the running time by a factor of two when using the above approach rather than calling git log on each file. A couple of comments about this:

- I ran into some limitations of libgit2 and, profiling the code, I believe that by using a customized pure-Go implementation of git we could cut another factor of 2
- the gain would be even larger for larger directories: calling `git log` for each file grows linearly with the number of files, while the above approach grows sublinearly with the number of files

The gain is not mind-blowing, and it is a design choice whether to pay for it in code complexity. Ultimately, caching is probably the way to go, and the more I think about it, the more I think it wouldn't be too hard to implement.
cameron314 commented Apr 7, 2016

A bit late, but I found the source code of my test, if you want to compare implementations. Here you go: https://gist.github.com/cameron314/c9d55a82cc91e45496ab0c38a31e69cb
NiklasRosenstein commented Apr 8, 2016

Maybe taking a look into how GitLab implements repository browsing might be worth a try. I just gave it a shot in a virtual machine here on my tower PC. GitLab displays the "Files" tab of the Git repository in very little time (~1s), while Gogs needs ~23s to display the (semantically equivalent) "Code" tab. :(
sapk (Contributor) commented Apr 8, 2016

I don't know about GitLab, but I think GitHub takes the approach of displaying the file list directly and retrieving the hashes afterwards from an API. This would also permit dividing the problem and focusing on optimizing (maybe by caching) the hash retrieval separately.
tycho commented May 3, 2016

Related? #3022
@tycho kind of, but not the exact same problem.
Thibauth commented May 12, 2016

I think it is related in that, if we go for a caching solution for this issue, we might as well cache the commit count as well and fix both problems at once.
toxinu commented Jun 7, 2016

Loading the Django repository (~23k commits) with Gogs takes about ~5 seconds.
Unknwon modified the milestones: 0.10.0, 0.11.0 Jul 16, 2016
BurakDev commented Jul 18, 2016

What do you think about this? It could be a nice feature for large directories like https://github.com/DefinitelyTyped/DefinitelyTyped
Unknwon referenced this issue Aug 9, 2016: Long page generation (20sec+) for directories with ~1000 files totalling ~100mb #3402 (closed)
klingtnet commented Sep 1, 2016

I have made a mirror of github.com/torvalds/linux, and page loads take between 6 and 40 seconds for this repository on my machine. This process runs at 100% CPU while the page is loading: `git rev-list --count <SOME_HASH>`. Maybe we can cache the output of `git rev-list --count ...`?
tkausl (Contributor) commented Oct 12, 2016

Is someone working on this? This issue is tagged with "dont send pull request", so I'm not going to send one, but through simple caching I was able to speed up a repository's home (overview) page by 300% on subsequent requests (the first one still takes its time). It's kind of a pain to work with a repository which takes three seconds to load :/
tgulacsi commented Oct 12, 2016

My biggest problem with this issue is that in the underlying gogits/git-module, there's an artificial 0.2s delay for every non-commit object. By commenting that Sleep out, a 20s page rendering goes down to 1.6s, without caching!
gerasiov commented Mar 4, 2017

I should mention that on my server, browsing a linux kernel repo clone is slow but possible, while loading the tags (releases page) takes a very long time.
Unknwon modified the milestones: 0.11, 0.12 Mar 7, 2017
Unknwon removed this from the 0.12 milestone Apr 7, 2017
simoesp commented Apr 10, 2017

My problem is that if I have plenty of files in a folder (1500 to be precise), it takes a lot of processing -- almost 13 minutes to open the folder in Gogs :(
tgulacsi commented Apr 10, 2017

I transferred to Gitea, as this issue is still unresolved here...
A commit was pushed to unfoldingWord-dev/gogs that referenced this issue Apr 28, 2017
agherzan commented Feb 22, 2018

Do we have any progress on this?
cameron314 commented Aug 20, 2015

Configuring Gogs, browsing the commit log, searching, pushing/pulling, etc. is all fairly snappy, but browsing the files of a repository is (comparatively) very, very slow.

For example, I have a repository with 30 entries in the root, and 147 folders in a nested folder. The total page time (as seen at the bottom of the page) is ~700ms for the root, and ~4000ms for the larger nested folder. I realize that's still only 23-27ms per item, but the sum total lag is significant, and makes it difficult to browse between folders. The template rendering time itself is negligible (~10ms).

Below are the top results of an `strace -cfp $(pidof gogs)` for a refresh of the 147-item folder. Note this only shows the time spent in syscalls, which is only about a quarter of the total time (it takes twice as long when strace is running).

Gogs version: 0.6.3.0802 Beta
Git version: git version 2.5.0
System: Fresh install of CentOS 7 in a single-core VM on a Windows 8 host (Hyper-V). No anti-virus.

Any idea of what might be causing this?