Slow repository browsing response times #1518

Open
cameron314 opened this issue Aug 20, 2015 · 39 comments
Labels
⛔ do not send pull request (Don't ever think about it!) · 🔨 enhancement (Make it better, faster)

Comments

@cameron314

Configuring Gogs, browsing the commit log, searching, pushing/pulling, etc. is all fairly snappy, but browsing the files of a repository is (comparatively) very, very slow.

For example, I have a repository with 30 entries in the root, and 147 folders in a nested folder. The total page time (as seen at the bottom of the page) is ~700ms for the root, and ~4000ms for the larger nested folder. I realize that's still only 23-27ms per item, but the sum total lag is significant, and makes it difficult to browse between folders. The template rendering time itself is negligible (~10ms).

Below are the top results of strace -cfp `pidof gogs` for a refresh of the 147-item folder. Note this only shows the time spent in syscalls, which is only about a quarter of the total time (it takes twice as long when strace is running):

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 30.50    0.749541         603      1242           futex
 17.68    0.434417      217209         2         1 restart_syscall
 17.55    0.431308         936       461           epoll_wait
 14.96    0.367632        1585       232           wait4
 13.43    0.329996        2750       120           sched_yield
  2.48    0.061040          90       676           select
  1.07    0.026240           5      5618           read
  0.46    0.011213           2      5789           mmap

Gogs version: Gogs version 0.6.3.0802 Beta
Git version: git version 2.5.0
System: Fresh install of CentOS 7 in a single-core VM on a Windows 8 host (Hyper-V). No anti-virus.

Any idea of what might be causing this?

@unknwon unknwon added the 🔨 enhancement Make it better, faster label Aug 20, 2015
@unknwon
Member

unknwon commented Aug 20, 2015

Thanks for your detailed feedback!

Any idea of what might be causing this?

In my opinion, it's not so much a question of what causes this; it's that Gogs doesn't yet have a cache system for repository Git data (it reloads everything every time you visit a page).

Speed should be improved in a future release.

Hope my explanation helps you understand. 😄

@cameron314
Author

Ah OK, so this is normal?

@unknwon
Member

unknwon commented Aug 20, 2015

At least it is expected for me when:

I have a repository with 30 entries in the root, and 147 folders in a nested folder.

@cameron314
Author

Got it :-) It's really a pretty small repository, it's just spread over a lot of files.

@unknwon
Member

unknwon commented Aug 20, 2015

Yeah, so parsing the file header info takes time; a cache (when implemented) would definitely help!

@cameron314
Author

Hmm, I wonder -- would it help much if the calls to git were batched by changeset? Often many files were changed at the same time.

The calls to git log -1 for each sha1 could have multiple paths appended instead of just one at a time.

@cameron314
Author

Hmm, turns out git log -1 path1 path2 doesn't work that way -- it yields the top commit among the set of commits for both files together (one output), instead of the top commit per file (two outputs). Too bad.

This is related to #684.

@hasufell

hasufell commented Oct 1, 2015

this is currently a show-stopper for something like this repo: https://github.com/gentoo/gentoo

@unknwon unknwon added this to the 0.7.5 milestone Oct 1, 2015
@unknwon unknwon modified the milestones: 0.7.5, 0.8.0 Nov 15, 2015
@unknwon
Member

unknwon commented Dec 20, 2015

Performance is enhanced.

@unknwon unknwon removed this from the 0.9.0 milestone Feb 7, 2016
@Rukenshia
Contributor

This is still a thing: https://try.gogs.io/Rukenshia/loonix

It is impossible to browse this repository properly. Loading the "Documentation" folder takes over 45 seconds for me. I don't know how Gogs handles viewing the tree, but one idea would be to load the file info (last commit on each file/directory) with a delay. I think GitLab does this too.

@unknwon can you tell me whether gogs loads the complete tree of the repository when viewing the repo?

@hasufell

Yeah, I too think that caching is not the only answer. The answer is to just show a loading icon for the git commit info, have the information gathered in the background, and show it when it's available, while still allowing other operations that don't need that information.

@cameron314
Author

Gogs finds only the information necessary for the immediate files in the folder being displayed, not the whole tree. But internally, git itself goes through its entire history looking for each child file/folder to get the most recent commit. Git's data structures are really not made for that kind of query.
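
For illustration, here is a minimal Go sketch of the per-entry query pattern being described. This is not Gogs's actual code; the repository path and entry names are placeholders. Each directory entry triggers its own git log -1, and each of those calls makes git walk history until it finds a commit touching that path.

package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// lastCommitPerEntry runs one "git log -1" per directory entry. This is the
// slow pattern described above: each call makes git walk history until it
// finds a commit touching that path.
func lastCommitPerEntry(repo string, entries []string) (map[string]string, error) {
	result := make(map[string]string, len(entries))
	for _, entry := range entries {
		out, err := exec.Command("git", "-C", repo,
			"log", "-1", "--format=%H %ct %s", "--", entry).Output()
		if err != nil {
			return nil, err
		}
		result[entry] = strings.TrimSpace(string(out))
	}
	return result, nil
}

func main() {
	// Placeholder repository path and entry names; adjust for a real repository.
	info, err := lastCommitPerEntry(".", []string{"README.md", "docs"})
	if err != nil {
		panic(err)
	}
	for entry, commit := range info {
		fmt.Printf("%-20s %s\n", entry, commit)
	}
}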

@hasufell

Gogs finds only the information necessary for the immediate files in the folder being displayed,

I don't think you need the commit hash, commit message and modification date just to browse the repository, do you (and those are what take so long)? So that information can be loaded while the tree itself is already shown.

@cameron314
Author

Sorry, I wasn't clear. I meant as opposed to loading the information for the whole tree, as Rukenshia was wondering.

I agree that if the information cannot be obtained instantly, preventing the UI from blocking by fetching it in the background is a good idea.

@unknwon
Member

unknwon commented Mar 14, 2016

I might use https://github.com/src-d/go-git to test whether we can get any performance gain.

@unknwon unknwon added this to the 0.10.0 milestone Mar 14, 2016
@cameron314
Author

I tried at one point to implement the same functionality using the git C API directly (from C code), taking in the whole list of files as input instead of a single one at a time. No matter what I tried, it was still slower than invoking the git process one file at a time, which of course is already too slow.

Fundamentally, whatever client is used, git's data structures don't allow this sort of query to be done quickly.

@unknwon
Member

unknwon commented Mar 14, 2016

@cameron314 One git process execution is under 100ms on my dev machine, and I can speed it up with unlimited processes running at the same time, but this causes many problems on machines with little memory, so right now it is hard-limited to 10 at maximum.

Therefore, I think a cache layer is the ultimate solution, and caching ahead of browsing is another way to improve the viewing experience.
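
For reference, a hard cap like this is commonly enforced in Go with a buffered channel used as a semaphore. The sketch below only illustrates the pattern; it is not Gogs's actual implementation, the entry names are placeholders, and the capacity of 10 simply mirrors the limit mentioned above.

package main

import (
	"fmt"
	"os/exec"
	"sync"
)

// sem acts as a semaphore limiting the number of git processes running at
// once; the capacity of 10 mirrors the hard limit mentioned above.
var sem = make(chan struct{}, 10)

// runGit blocks until a slot is free, then runs a single git command.
func runGit(repo string, args ...string) ([]byte, error) {
	sem <- struct{}{}        // acquire a slot
	defer func() { <-sem }() // release it when done
	return exec.Command("git", append([]string{"-C", repo}, args...)...).Output()
}

func main() {
	entries := []string{"README.md", "docs", "cmd"} // placeholder entries
	var wg sync.WaitGroup
	for _, e := range entries {
		wg.Add(1)
		go func(entry string) {
			defer wg.Done()
			out, err := runGit(".", "log", "-1", "--format=%H", "--", entry)
			if err != nil {
				fmt.Println(entry, "error:", err)
				return
			}
			fmt.Printf("%s: %s", entry, out)
		}(e)
	}
	wg.Wait()
}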

@juanfra684

I have the same problem with a big repo. I uploaded it to try.gogs.io, so you can see the performance in a known machine/config. The same repo works pretty well with cgit (with the cache enabled).

Probably the slowest directory in the repo is this: https://try.gogs.io/juanfra684/openbsd-ports/src/master/devel (Gogs Version: 0.9.14.0321 Page: 118641ms Template: 214ms)

@Thibauth

Thibauth commented Apr 2, 2016

An alternative strategy for very large directories, instead of calling git log for every file, would be to walk the git commit history and check for each commit whether or not a file in the directory was modified in this commit. Using the git command, this would be something like:

git log --name-only --format="%cd" <dirname>

This should be much more efficient than calling git log for each file, because internally git log has to walk the commit history until it finds a commit at which the file was modified. Instead of walking the git commit history n times where n is the number of files, this would only walk it once and collect information about all files in one pass.

Furthermore, using a git library instead of making calls to the git command could make this solution reasonably efficient.

Another advantage of this approach is that it allows trading off running time for accuracy: there could be a limit on the number of commits to visit. After that, the files whose date of last modification could not be determined could simply render as "more than x months ago", where x is determined based on the commit date of the last commit visited.

As mentioned above, the git data model was not designed for this kind of query, so it seems that eventually some caching will be necessary to get both efficient and exact answers for large directories/commit histories.
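
A rough Go sketch of this single-pass idea follows, under stated assumptions: it shells out to git log --name-only rather than using a library, the output format, directory, entry names and commit limit are made up for illustration, and attributing a path to the direct child of the directory it falls under is simplified. Entries still unresolved when the commit limit is hit would render as described above.

package main

import (
	"bufio"
	"fmt"
	"os/exec"
	"strings"
)

// lastChangeSinglePass walks "git log --name-only" once and records, for each
// wanted directory entry, the first (i.e. most recent) commit that touches it.
// It stops early once every entry is resolved or maxCommits commits were read.
func lastChangeSinglePass(repo, dir string, wanted []string, maxCommits int) (map[string]string, error) {
	cmd := exec.Command("git", "-C", repo,
		"log", "--name-only", "--format=commit %H", "--", dir)
	out, err := cmd.StdoutPipe()
	if err != nil {
		return nil, err
	}
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	pending := make(map[string]bool, len(wanted))
	for _, w := range wanted {
		pending[w] = true
	}
	result := make(map[string]string, len(wanted))

	var current string
	commits := 0
	scanner := bufio.NewScanner(out)
scanLoop:
	for scanner.Scan() {
		line := scanner.Text()
		switch {
		case strings.HasPrefix(line, "commit "):
			current = strings.TrimPrefix(line, "commit ")
			commits++
			if commits > maxCommits || len(pending) == 0 {
				break scanLoop
			}
		case line == "":
			// blank separator between commits
		default:
			// line is a path touched by the current commit; map it to the
			// direct child of dir it falls under.
			entry := strings.SplitN(strings.TrimPrefix(line, dir+"/"), "/", 2)[0]
			if pending[entry] {
				result[entry] = current
				delete(pending, entry)
			}
		}
	}
	cmd.Process.Kill() // we may stop before git has finished writing
	cmd.Wait()
	return result, scanner.Err()
}

func main() {
	// Placeholder directory and entry names; adjust for a real repository.
	info, err := lastChangeSinglePass(".", "docs", []string{"intro.md", "images"}, 10000)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for entry, hash := range info {
		fmt.Printf("%-20s last changed in %s\n", entry, hash)
	}
}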

@Thibauth

Thibauth commented Apr 6, 2016

Ok, so I did a very preliminary experiment using libgit2 and the approach described above. The code is very ugly and not fully functional, but it already gives an idea of the gain that could be obtained.

Testing it on the root directory of the Gentoo repository, I am able to cut the running time by a factor of two when using the above approach rather than calling git log on each file. A couple of comments about this:

  • I ran into some limitations of libgit2 and, profiling the code, I believe that with a customized pure Go implementation of Git we could cut another factor of 2
  • the gain would be even larger for larger directories: calling git log for each file grows linearly with the number of files, while the above approach grows sublinearly with the number of files

The gain is not mind-blowing, and whether to pay for it in code complexity is a design choice. Ultimately, caching is probably the way to go, and the more I think about it, the more I think it wouldn't be too hard to implement.

@cameron314
Author

A bit late, but I found the source code of my test, if you want to compare implementations. Here you go: https://gist.github.com/cameron314/c9d55a82cc91e45496ab0c38a31e69cb

@NiklasRosenstein

Maybe taking a look at how GitLab implements repository browsing would be worth a try. I just gave it a shot on a virtual machine here on my tower PC. GitLab displays the "Files" tab of the Git repository in very little time (~1s), while Gogs needs ~23s to display the (semantically equivalent) "Code" tab. :(

@sapk
Contributor

sapk commented Apr 8, 2016

I don't know about GitLab, but I think GitHub takes the approach of displaying the file list directly and retrieving the hashes afterwards from an API. This would also divide the problem and let us focus on optimizing (maybe by caching) the hash retrieval separately.
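
For illustration, a minimal Go sketch of that deferred-loading split, assuming a hypothetical /api/last-commit endpoint that is not part of Gogs: the tree page would render immediately, and the client would fetch the last-commit info for each visible entry afterwards.

package main

import (
	"encoding/json"
	"net/http"
	"os/exec"
	"strings"
)

// lastCommitInfo is the payload the page would fetch asynchronously for each
// visible entry after the tree itself has already been rendered.
type lastCommitInfo struct {
	Path    string `json:"path"`
	Commit  string `json:"commit"`
	Summary string `json:"summary"`
}

func main() {
	repo := "." // placeholder repository path

	// Hypothetical endpoint (not part of Gogs): GET /api/last-commit?path=<entry>
	http.HandleFunc("/api/last-commit", func(w http.ResponseWriter, r *http.Request) {
		// Note: a real implementation must validate path against the tree.
		path := r.URL.Query().Get("path")
		out, err := exec.Command("git", "-C", repo,
			"log", "-1", "--format=%H %s", "--", path).Output()
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		info := lastCommitInfo{Path: path}
		if parts := strings.SplitN(strings.TrimSpace(string(out)), " ", 2); len(parts) == 2 {
			info.Commit, info.Summary = parts[0], parts[1]
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(&info)
	})

	http.ListenAndServe(":8080", nil)
}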

@tycho

tycho commented May 3, 2016

Related? #3022

@unknwon
Member

unknwon commented May 12, 2016

@tycho kind of, but not the exact same problem.

@Thibauth

I think it is related in that, if we go for a caching solution for this issue, we might as well cache the commit count too and fix both problems at once.

@toxinu

toxinu commented Jun 7, 2016

Loading Django repository (~23k commits) with gogs takes about ~5 seconds.

@unknwon unknwon modified the milestones: 0.10.0, 0.11.0 Jul 16, 2016
@BurakDev

[image attachment]

What do you think about this? It could be a nice feature for large directories like https://github.com/DefinitelyTyped/DefinitelyTyped

@klingtnet

klingtnet commented Sep 1, 2016

I have made a mirror of github.com/torvalds/linux and page loads are between 6 and 40 seconds for this repository on my machine. This process runs at 100% CPU while the page is loading: git rev-list --count <SOME_HASH>. Maybe we can cache the output of git rev-list --count ...?
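
A minimal sketch of that idea, with made-up helper names: memoize the count per (repository, hash) pair, since the count for a fixed commit hash never changes.

package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"sync"
)

// countCache memoizes the output of "git rev-list --count <hash>". The count
// for a fixed commit hash never changes, so entries never need invalidating.
var (
	countMu    sync.Mutex
	countCache = map[string]int64{}
)

func commitCount(repo, hash string) (int64, error) {
	key := repo + "|" + hash
	countMu.Lock()
	if n, ok := countCache[key]; ok {
		countMu.Unlock()
		return n, nil
	}
	countMu.Unlock()

	out, err := exec.Command("git", "-C", repo, "rev-list", "--count", hash).Output()
	if err != nil {
		return 0, err
	}
	n, err := strconv.ParseInt(strings.TrimSpace(string(out)), 10, 64)
	if err != nil {
		return 0, err
	}
	countMu.Lock()
	countCache[key] = n
	countMu.Unlock()
	return n, nil
}

func main() {
	// "HEAD" is used for convenience; a real cache should key on the resolved
	// commit hash, since HEAD moves. The second call is served from the cache.
	for i := 0; i < 2; i++ {
		n, err := commitCount(".", "HEAD")
		if err != nil {
			panic(err)
		}
		fmt.Println("commits:", n)
	}
}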

@tkausl
Contributor

tkausl commented Oct 12, 2016

Is someone working on this? This issue is tagged with "don't send pull request" so I'm not going to send one, but through simple caching I was able to speed up a repository's home (overview) page by 300% on subsequent requests (the first one still takes its time). It's kind of a pain to work with a repository that takes three seconds to load :/

@tgulacsi

My biggest problem with this issue is that in the underlying gogits/git-module, there's an artificial 0.2s delay for every non-commit object. By commenting that Sleep out, a 20s page rendering goes down to 1.6s, without caching!

@gerasiov

gerasiov commented Mar 4, 2017

I should mention that on my server, browsing a Linux kernel repo clone is slow but possible, while loading the tags (releases page) takes a very long time.

@unknwon unknwon modified the milestones: 0.11, 0.12 Mar 7, 2017
@unknwon unknwon removed this from the 0.12 milestone Apr 7, 2017
@simoesp

simoesp commented Apr 10, 2017

My problem is when I have plenty of files in a folder (1,500 to be precise): it takes a lot of time to process, almost 13 minutes to open the folder in Gogs :(

@tgulacsi

tgulacsi commented Apr 10, 2017 via email

richmahn referenced this issue in unfoldingWord/dcs Apr 28, 2017
@agherzan

Do we have any progress on this?
