Slow repository browsing response times #1518

Open
cameron314 opened this issue Aug 20, 2015 · 39 comments
Labels
⛔ do not send pull request (Don't ever think about it!) · 🔨 enhancement (Make it better, faster)

Comments

@cameron314

Configuring Gogs, browsing the commit log, searching, pushing/pulling, etc. is all fairly snappy, but browsing the files of a repository is (comparatively) very, very slow.

For example, I have a repository with 30 entries in the root, and 147 folders in a nested folder. The total page time (as seen at the bottom of the page) is ~700ms for the root, and ~4000ms for the larger nested folder. I realize that's still only 23-27ms per item, but the sum total lag is significant, and makes it difficult to browse between folders. The template rendering time itself is negligible (~10ms).

Below are the top results of strace -cfp `pidof gogs` for a refresh of the 147-item folder. Note this only shows the time spent in syscalls, which is only about a quarter of the total time (it takes twice as long when strace is running):

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 30.50    0.749541         603      1242           futex
 17.68    0.434417      217209         2         1 restart_syscall
 17.55    0.431308         936       461           epoll_wait
 14.96    0.367632        1585       232           wait4
 13.43    0.329996        2750       120           sched_yield
  2.48    0.061040          90       676           select
  1.07    0.026240           5      5618           read
  0.46    0.011213           2      5789           mmap

Gogs version: Gogs version 0.6.3.0802 Beta
Git version: git version 2.5.0
System: Fresh install of CentOS 7 in a single-core VM on a Windows 8 host (Hyper-V). No anti-virus.

Any idea of what might be causing this?

@unknwon unknwon added the 🔨 enhancement Make it better, faster label Aug 20, 2015
@unknwon
Member

unknwon commented Aug 20, 2015

Thanks for your detailed feedback!

Any idea of what might be causing this?

In my opinion, it's not so much a question of what causes this; it's that Gogs doesn't yet have a cache system for repository Git data (it reloads everything every time you visit a page).

Speed should be improved in a future release.

Hope my explanation helps you understand. 😄

@cameron314
Author

Ah OK, so this is normal?

@unknwon
Member

unknwon commented Aug 20, 2015

At least it is expected for me when:

I have a repository with 30 entries in the root, and 147 folders in a nested folder.

@cameron314
Author

Got it :-) It's really a pretty small repository, it's just spread over a lot of files.

@unknwon
Member

unknwon commented Aug 20, 2015

Yeah, so parsing the file header info takes time; a cache (when implemented) would definitely help!

@cameron314
Author

Hmm, I wonder -- would it help much if the calls to git were batched by changeset? Often many files were changed at the same time.

The calls to git log -1 for each sha1 could have multiple paths appended instead of just one at a time.

@cameron314
Author

Hmm, turns out git log -1 path1 path2 doesn't work that way -- it yields the top commit among the set of commits for both files together (one output), instead of the top commit per file (two outputs). Too bad.

This is related to #684.

@hasufell

hasufell commented Oct 1, 2015

this is currently a show-stopper for something like this repo: https://github.com/gentoo/gentoo

@unknwon unknwon added this to the 0.7.5 milestone Oct 1, 2015
@unknwon unknwon modified the milestones: 0.7.5, 0.8.0 Nov 15, 2015
@unknwon
Member

unknwon commented Dec 20, 2015

Performance is enhanced.

@unknwon unknwon removed this from the 0.9.0 milestone Feb 7, 2016
@Rukenshia
Contributor

This is still a thing: https://try.gogs.io/Rukenshia/loonix

It is impossible to browse this repository properly. Loading the "Documentation" folder takes over 45 seconds for me. I don't know how Gogs handles viewing the tree, but one idea would be to load the file info (last commit on each file/directory) with a delay. I think GitLab does this too.

@unknwon can you tell me whether gogs loads the complete tree of the repository when viewing the repo?

@hasufell

Yeah, I too think that caching is not the only answer. The answer is to just show a loading icon for the git commit info, have the information gathered in the background, and show it when it's available, while still allowing other operations that don't need that information.

@cameron314
Author

Gogs finds only the information necessary for the immediate files in the folder being displayed, not the whole tree. But internally, git itself goes through its entire history looking for each child file/folder to get the most recent commit. Git's data structures are really not made for that kind of query.
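
For illustration, here is a minimal Go sketch of the per-entry query pattern being described. This is not Gogs's actual code; the repository path and entry names are placeholders. Each directory entry triggers its own git log -1, and each of those calls makes git walk history until it finds a commit touching that path.

package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// lastCommitPerEntry runs one "git log -1" per directory entry. This is the
// slow pattern described above: each call makes git walk history until it
// finds a commit touching that path.
func lastCommitPerEntry(repo string, entries []string) (map[string]string, error) {
	result := make(map[string]string, len(entries))
	for _, entry := range entries {
		out, err := exec.Command("git", "-C", repo,
			"log", "-1", "--format=%H %ct %s", "--", entry).Output()
		if err != nil {
			return nil, err
		}
		result[entry] = strings.TrimSpace(string(out))
	}
	return result, nil
}

func main() {
	// Placeholder repository path and entry names; adjust for a real repository.
	info, err := lastCommitPerEntry(".", []string{"README.md", "docs"})
	if err != nil {
		panic(err)
	}
	for entry, commit := range info {
		fmt.Printf("%-20s %s\n", entry, commit)
	}
}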

@hasufell

Gogs finds only the information necessary for the immediate files in the folder being displayed,

I don't think you need the commit hash, commit message and modification date just to browse the repository, do you (and those are what take so long)? So that information can be loaded while the tree itself is already shown.

@cameron314
Author

Sorry, I wasn't clear. I meant as opposed to loading the information for the whole tree, as Rukenshia was wondering.

I agree that if the information cannot be obtained instantly, preventing the UI from blocking by fetching it in the background is a good idea.

@unknwon
Member

unknwon commented Mar 14, 2016

I might use https://github.com/src-d/go-git to test whether we can get any performance gain.

@unknwon unknwon added this to the 0.10.0 milestone Mar 14, 2016
@cameron314
Author

I tried at one point to implement the same functionality using the git C API directly (from C code), taking in the whole list of files as input instead of a single one at a time. No matter what I tried, it was still slower than invoking the git process one file at a time, which of course is already too slow.

Fundamentally, whatever client is used, git's data structures don't allow this sort of query to be done quickly.

@unknwon
Member

unknwon commented Mar 14, 2016

@cameron314 One git process execution is under 100ms on my dev machine, and I can speed it up with unlimited processes running at the same time, but this causes many problems on machines with little memory, so right now it is hard-limited to 10 at maximum.

Therefore, I think a cache layer is the ultimate solution, and caching ahead of browsing is another way to improve the viewing experience.
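
For reference, a hard cap like this is commonly enforced in Go with a buffered channel used as a semaphore. The sketch below only illustrates the pattern; it is not Gogs's actual implementation, the entry names are placeholders, and the capacity of 10 simply mirrors the limit mentioned above.

package main

import (
	"fmt"
	"os/exec"
	"sync"
)

// sem acts as a semaphore limiting the number of git processes running at
// once; the capacity of 10 mirrors the hard limit mentioned above.
var sem = make(chan struct{}, 10)

// runGit blocks until a slot is free, then runs a single git command.
func runGit(repo string, args ...string) ([]byte, error) {
	sem <- struct{}{}        // acquire a slot
	defer func() { <-sem }() // release it when done
	return exec.Command("git", append([]string{"-C", repo}, args...)...).Output()
}

func main() {
	entries := []string{"README.md", "docs", "cmd"} // placeholder entries
	var wg sync.WaitGroup
	for _, e := range entries {
		wg.Add(1)
		go func(entry string) {
			defer wg.Done()
			out, err := runGit(".", "log", "-1", "--format=%H", "--", entry)
			if err != nil {
				fmt.Println(entry, "error:", err)
				return
			}
			fmt.Printf("%s: %s", entry, out)
		}(e)
	}
	wg.Wait()
}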

@juanfra684

I have the same problem with a big repo. I uploaded it to try.gogs.io, so you can see the performance in a known machine/config. The same repo works pretty well with cgit (with the cache enabled).

Probably the slowest directory in the repo is this: https://try.gogs.io/juanfra684/openbsd-ports/src/master/devel (Gogs Version: 0.9.14.0321 Page: 118641ms Template: 214ms)

@Thibauth

Thibauth commented Apr 2, 2016

An alternative strategy for very large directories, instead of calling git log for every file, would be to walk the git commit history and check for each commit whether or not a file in the directory was modified in this commit. Using the git command, this would be something like:

git log --name-only --format="%cd" <dirname>

This should be much more efficient than calling git log for each file, because internally git log has to walk the commit history until it finds a commit at which the file was modified. Instead of walking the git commit history n times where n is the number of files, this would only walk it once and collect information about all files in one pass.

Furthermore, using a git library instead of making calls to the git command could make this solution reasonably efficient.

Another advantage of this approach is that it allows trading off running time for accuracy: there could be a limit on the number of commits to visit. After that, the files whose date of last modification could not be determined could simply render as "more than x months ago", where x is determined based on the commit date of the last commit visited.

As mentioned above, the git data model was not designed for this kind of query, so it seems that eventually some caching will be necessary to get both efficient and exact answers for large directories/commit histories.
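
A rough Go sketch of this single-pass idea follows, under stated assumptions: it shells out to git log --name-only rather than using a library, the output format, directory, entry names and commit limit are made up for illustration, and attributing a path to the direct child of the directory it falls under is simplified. Entries still unresolved when the commit limit is hit would render as described above.

package main

import (
	"bufio"
	"fmt"
	"os/exec"
	"strings"
)

// lastChangeSinglePass walks "git log --name-only" once and records, for each
// wanted directory entry, the first (i.e. most recent) commit that touches it.
// It stops early once every entry is resolved or maxCommits commits were read.
func lastChangeSinglePass(repo, dir string, wanted []string, maxCommits int) (map[string]string, error) {
	cmd := exec.Command("git", "-C", repo,
		"log", "--name-only", "--format=commit %H", "--", dir)
	out, err := cmd.StdoutPipe()
	if err != nil {
		return nil, err
	}
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	pending := make(map[string]bool, len(wanted))
	for _, w := range wanted {
		pending[w] = true
	}
	result := make(map[string]string, len(wanted))

	var current string
	commits := 0
	scanner := bufio.NewScanner(out)
scanLoop:
	for scanner.Scan() {
		line := scanner.Text()
		switch {
		case strings.HasPrefix(line, "commit "):
			current = strings.TrimPrefix(line, "commit ")
			commits++
			if commits > maxCommits || len(pending) == 0 {
				break scanLoop
			}
		case line == "":
			// blank separator between commits
		default:
			// line is a path touched by the current commit; map it to the
			// direct child of dir it falls under.
			entry := strings.SplitN(strings.TrimPrefix(line, dir+"/"), "/", 2)[0]
			if pending[entry] {
				result[entry] = current
				delete(pending, entry)
			}
		}
	}
	cmd.Process.Kill() // we may stop before git has finished writing
	cmd.Wait()
	return result, scanner.Err()
}

func main() {
	// Placeholder directory and entry names; adjust for a real repository.
	info, err := lastChangeSinglePass(".", "docs", []string{"intro.md", "images"}, 10000)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for entry, hash := range info {
		fmt.Printf("%-20s last changed in %s\n", entry, hash)
	}
}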

@Thibauth

Thibauth commented Apr 6, 2016

Ok, so I did a very preliminary experiment using libgit2 and the approach described above. The code is very ugly and not fully functional, but it already gives an idea of the gain that could be obtained.

Testing it on the root directory of the Gentoo repository, I am able to cut the running time by a factor of two when using the above approach rather than calling git log on each file. A couple of comments about this:

  • I ran into some limitations of libgit2 and, profiling the code, I believe that with a customized pure Go implementation of Git we could cut another factor of 2
  • the gain would be even larger for larger directories: calling git log for each file grows linearly with the number of files, while the above approach grows sublinearly with the number of files

The gain is not mind-blowing, and whether to pay for it in code complexity is a design choice. Ultimately, caching is probably the way to go, and the more I think about it, the more I think it wouldn't be too hard to implement.

@cameron314
Author

A bit late, but I found the source code of my test, if you want to compare implementations. Here you go: https://gist.github.com/cameron314/c9d55a82cc91e45496ab0c38a31e69cb

@NiklasRosenstein

Maybe taking a look at how GitLab implements repository browsing would be worth a try. I just gave it a shot on a virtual machine here on my tower PC. GitLab displays the "Files" tab of the Git repository in very little time (~1s), while Gogs needs ~23s to display the (semantically equivalent) "Code" tab. :(

@sapk
Contributor

sapk commented Apr 8, 2016

I don't know about GitLab, but I think GitHub takes the approach of displaying the file list directly and retrieving the hashes afterwards from an API. This would also divide the problem and let us focus on optimizing (maybe by caching) the hash retrieval separately.
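
For illustration, a minimal Go sketch of that deferred-loading split, assuming a hypothetical /api/last-commit endpoint that is not part of Gogs: the tree page would render immediately, and the client would fetch the last-commit info for each visible entry afterwards.

package main

import (
	"encoding/json"
	"net/http"
	"os/exec"
	"strings"
)

// lastCommitInfo is the payload the page would fetch asynchronously for each
// visible entry after the tree itself has already been rendered.
type lastCommitInfo struct {
	Path    string `json:"path"`
	Commit  string `json:"commit"`
	Summary string `json:"summary"`
}

func main() {
	repo := "." // placeholder repository path

	// Hypothetical endpoint (not part of Gogs): GET /api/last-commit?path=<entry>
	http.HandleFunc("/api/last-commit", func(w http.ResponseWriter, r *http.Request) {
		// Note: a real implementation must validate path against the tree.
		path := r.URL.Query().Get("path")
		out, err := exec.Command("git", "-C", repo,
			"log", "-1", "--format=%H %s", "--", path).Output()
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		info := lastCommitInfo{Path: path}
		if parts := strings.SplitN(strings.TrimSpace(string(out)), " ", 2); len(parts) == 2 {
			info.Commit, info.Summary = parts[0], parts[1]
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(&info)
	})

	http.ListenAndServe(":8080", nil)
}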

@tycho

tycho commented May 3, 2016

Related? #3022

@unknwon
Member

unknwon commented May 12, 2016

@tycho kind of, but not the exact same problem.

@Thibauth

I think it is related in that, if we go for a caching solution for this issue, we might as well cache the commit count too and fix both problems at once.

@toxinu

toxinu commented Jun 7, 2016

Loading Django repository (~23k commits) with gogs takes about ~5 seconds.

@unknwon unknwon modified the milestones: 0.10.0, 0.11.0 Jul 16, 2016
@BurakDev

[image attachment]

What do you think about this? It could be a nice feature for large directories like https://github.com/DefinitelyTyped/DefinitelyTyped

@klingtnet

klingtnet commented Sep 1, 2016

I have made a mirror of github.com/torvalds/linux and page loads are between 6 and 40 seconds for this repository on my machine. This process runs at 100% CPU while the page is loading: git rev-list --count <SOME_HASH>. Maybe we can cache the output of git rev-list --count ...?
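
A minimal sketch of that idea, with made-up helper names: memoize the count per (repository, hash) pair, since the count for a fixed commit hash never changes.

package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"sync"
)

// countCache memoizes the output of "git rev-list --count <hash>". The count
// for a fixed commit hash never changes, so entries never need invalidating.
var (
	countMu    sync.Mutex
	countCache = map[string]int64{}
)

func commitCount(repo, hash string) (int64, error) {
	key := repo + "|" + hash
	countMu.Lock()
	if n, ok := countCache[key]; ok {
		countMu.Unlock()
		return n, nil
	}
	countMu.Unlock()

	out, err := exec.Command("git", "-C", repo, "rev-list", "--count", hash).Output()
	if err != nil {
		return 0, err
	}
	n, err := strconv.ParseInt(strings.TrimSpace(string(out)), 10, 64)
	if err != nil {
		return 0, err
	}
	countMu.Lock()
	countCache[key] = n
	countMu.Unlock()
	return n, nil
}

func main() {
	// "HEAD" is used for convenience; a real cache should key on the resolved
	// commit hash, since HEAD moves. The second call is served from the cache.
	for i := 0; i < 2; i++ {
		n, err := commitCount(".", "HEAD")
		if err != nil {
			panic(err)
		}
		fmt.Println("commits:", n)
	}
}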

@tkausl
Contributor

tkausl commented Oct 12, 2016

Is someone working on this? This issue is tagged with "don't send pull request" so I'm not going to send one, but through simple caching I was able to speed up a repository's home (overview) page by 300% on subsequent requests (the first one still takes its time). It's kind of a pain to work with a repository that takes three seconds to load :/

@tgulacsi

My biggest problem with this issue is that in the underlying gogits/git-module, there's an artificial 0.2s delay for every non-commit object. By commenting that Sleep out, a 20s page rendering goes down to 1.6s, without caching!

@gerasiov

gerasiov commented Mar 4, 2017

I should mention that on my server, browsing a Linux kernel repo clone is slow but possible, while loading the tags (releases page) takes a very long time.

@unknwon unknwon modified the milestones: 0.11, 0.12 Mar 7, 2017
@unknwon unknwon removed this from the 0.12 milestone Apr 7, 2017
@simoesp

simoesp commented Apr 10, 2017

My problem is when I have plenty of files in a folder (1,500 to be precise): it takes a lot of time to process, almost 13 minutes to open the folder in Gogs :(

@tgulacsi

tgulacsi commented Apr 10, 2017 via email

richmahn referenced this issue in unfoldingWord/dcs Apr 28, 2017
@agherzan

Do we have any progress on this?
