Git checkout slow with many LFS files #931

Closed
larsxschneider opened this Issue Jan 13, 2016 · 22 comments

@larsxschneider
Contributor

Hi,

We are using Git-LFS at my company and in general we are really happy, as it nicely solves the binary file problem in Git. We started using Git-LFS for all binary files and experienced very slow checkouts for a repository with 10k+ binaries.

E.g. consider this repo with 15k LFS files:
https://github.com/larsxschneider/lfstest-manyfiles

Checkout time on my Mac with SSD is around 10min (which is OK).
Checkout time on Windows with SSD is well over an hour (which is not so nice).

The reason for this slow checkout is, as you probably already know, the invocation of the Git-LFS process via the Git attribute filters. 10k files means 10k process executions which is, of course, horribly slow on Windows. As far as I understand the design of Git-LFS there is not much we can do about it.

However, what if we could convince the Git core team to merge a patch that (optionally) makes Git talk to a local socket instead of executing a process for Git filters? Then we could run a Git-LFS daemon of sorts, and I imagine checkout and other operations could be much faster. Of course, the daemon should be optional.

Do you think this idea is worth exploring? Do you see an easier way to tackle this problem?

Thanks,
Lars

@peff
peff commented Jan 13, 2016

Where does the time go on Windows? Is the CPU pegged? Or are we getting killed by the latency of starting many processes serially?

@strich
Contributor
strich commented Jan 16, 2016

You can work around the single-process-per-file problem by temporarily disabling the smudge filter and then using git lfs pull to grab the files in a single pass. Example here: #911 (comment).
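For reference, the workaround amounts to something like the following sketch (`fast_lfs_clone` is an invented name for this note, not an actual git-lfs command):

```shell
# Skip the per-file smudge filter during checkout, then fetch all LFS
# objects in one batch with `git lfs pull`.
fast_lfs_clone() {
    url=$1; dir=$2
    GIT_LFS_SKIP_SMUDGE=1 git clone "$url" "$dir"   # checkout writes pointer files only
    (cd "$dir" && git lfs pull)                     # one process fetches everything
}
# usage: fast_lfs_clone https://github.com/larsxschneider/lfstest-manyfiles.git repo
```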

@larsxschneider
Contributor

@strich I know. However, disabling the smudge filter with a variable is still slow.

The git-lfs binary is ~10MB and it takes quite some time to run even if, due to the GIT_LFS_SKIP_SMUDGE flag, it does nothing (test A). Ideally we would deactivate the LFS filter on the initial clone altogether, which I tried with test C. The problem with this approach is that LFS is deactivated globally, and that might cause trouble with parallel Git/Git-LFS processes. Unfortunately I was not able to run git clone with a deactivated config. The best I came up with was to set the LFS filter to the cat command (test B1); see this gist for a git clone wrapper using this approach. I also tested the true command (test B2) to measure the overhead of calling an external process.

Here are the clone timings for the test repo mentioned above (exact test run commands below):

Test / time in sec                        OS X   Windows
A   GIT_LFS_SKIP_SMUDGE=1                  210      2749
B1  --config filter.lfs.smudge=cat          33       432
B2  --config filter.lfs.smudge=true         29       229
C   git-lfs uninstall                        4        37

As you can see the "no smudge filter" solution is more than 50 times faster than the GIT_LFS_SKIP_SMUDGE flag on my machines.

@peff: I will try to create a git-core patch that allows deactivating a git config value on git clone (e.g. git clone --unset-config filter.lfs.smudge <repository>). Maybe it gets accepted.

@technoweenie: Do you see an issue with disabling the filter in the clone statement temporarily?


OS X - Test A
$ rm -rf lfstest-manyfiles && time GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/larsxschneider/lfstest-manyfiles.git
Cloning into 'lfstest-manyfiles'...
remote: Counting objects: 15010, done.
remote: Compressing objects: 100% (15008/15008), done.
remote: Total 15010 (delta 0), reused 15010 (delta 0), pack-reused 0
Receiving objects: 100% (15010/15010), 2.02 MiB | 1.61 MiB/s, done.
Checking connectivity... done.
Checking out files: 100% (15001/15001), done.
GIT_LFS_SKIP_SMUDGE=1 git clone   64.56s user 133.35s system 94% cpu 3:29.81 total
OS X - Test B1/2
$ rm -rf lfstest-manyfiles && time git clone --config filter.lfs.smudge=true https://github.com/larsxschneider/lfstest-manyfiles.git
Cloning into 'lfstest-manyfiles'...
remote: Counting objects: 15010, done.
remote: Compressing objects: 100% (15008/15008), done.
remote: Total 15010 (delta 0), reused 15010 (delta 0), pack-reused 0
Receiving objects: 100% (15010/15010), 2.02 MiB | 1.79 MiB/s, done.
Checking connectivity... done.
Checking out files: 100% (15001/15001), done.
git clone --config filter.lfs.smudge=true   9.85s user 16.43s system 92% cpu 28.551 total
OS X - Test C
$ rm -rf lfstest-manyfiles && git-lfs uninstall && time git clone https://github.com/larsxschneider/lfstest-manyfiles.git
Global Git LFS configuration has been removed.
Cloning into 'lfstest-manyfiles'...
remote: Counting objects: 15010, done.
remote: Compressing objects: 100% (15008/15008), done.
remote: Total 15010 (delta 0), reused 15010 (delta 0), pack-reused 0
Receiving objects: 100% (15010/15010), 2.02 MiB | 1.60 MiB/s, done.
Checking connectivity... done.
git clone https://github.com/larsxschneider/lfstest-manyfiles.git  0.28s user 1.31s system 46% cpu 3.435 total

Windows - Test A
$ rm -rf lfstest-manyfiles && time GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/larsxschneider/lfstest-manyfiles.git
Cloning into 'lfstest-manyfiles'...
remote: Counting objects: 15010, done.
remote: Compressing objects: 100% (15008/15008), done.
remote: Total 15010 (delta 0), reused 15010 (delta 0), pack-reused 0
Receiving objects: 100% (15010/15010), 2.02 MiB | 1.63 MiB/s, done.
Checking connectivity... done.
Checking out files: 100% (15001/15001), done.

real    45m48.289s
user    0m0.000s
sys     0m0.015s
Windows - Test B1/2
$ rm -rf lfstest-manyfiles && time git clone --config filter.lfs.smudge=true https://github.com/larsxschneider/lfstest-manyfiles.git
Cloning into 'lfstest-manyfiles'...
remote: Counting objects: 15010, done.
remote: Compressing objects: 100% (15008/15008), done.
remote: Total 15010 (delta 0), reused 15010 (delta 0), pack-reused 0
Receiving objects: 100% (15010/15010), 2.02 MiB | 1.55 MiB/s, done.
Checking connectivity... done.
Checking out files: 100% (15001/15001), done.

real    3m48.358s
user    0m0.015s
sys     0m0.000s
Windows - Test C
$ rm -rf lfstest-manyfiles && git-lfs uninstall && time git clone https://github.com/larsxschneider/lfstest-manyfiles.git
Global Git LFS configuration has been removed.
Cloning into 'lfstest-manyfiles'...
remote: Counting objects: 15010, done.
remote: Compressing objects: 100% (15008/15008), done.
remote: Total 15010 (delta 0), reused 15010 (delta 0), pack-reused 0
Receiving objects: 100% (15010/15010), 2.02 MiB | 1.51 MiB/s, done.
Checking connectivity... done.
Checking out files: 100% (15001/15001), done.

real    0m36.816s
user    0m0.000s
sys     0m0.000s
@strich
Contributor
strich commented Jan 18, 2016

Good analysis @larsxschneider. In addition to what you've discussed already, can you comment on why the OSX and Windows tests are so wildly different? I'd be very interested in looking into resolving that myself.

@larsxschneider
Contributor

Process creation on Windows is just slower (see here for a few arguments). This slowness has nothing to do with git or git-lfs.

For the fun of it you can start the cat command a 1000 times on different systems:
time bash -c 'for i in {1..1000}; do cat /dev/null; done'
On my OS X machine that takes 1.5 sec and on Windows 26 sec 😄

@peff
peff commented Jan 18, 2016

@peff: I will try to create a git-core patch that allows deactivating a git config value on git clone (e.g. git clone --unset-config filter.lfs.smudge <repository>). Maybe it gets accepted.

Unfortunately, I think you will find it quite difficult because of the way git's config code is structured. The config code reads the various config files and hands the values to an arbitrary callback in a streaming fashion. So you cannot "unset" a variable in the local config after it has been set in the user-wide config. You can only tell the callback "here is another value" and hope that it overwrites.

One thing that would probably work is to teach git -c, the per-invocation config override mechanism, to keep a blacklist of config keys and avoid passing them to the callbacks. That works because git -c config can be parsed before any of the other files (anything else would involve 2 passes over the config, or moving to a non-streaming system).

@technoweenie
Member

Do you see an issue with disabling the filter in the clone statement temporarily?

Are you suggesting a git lfs clone command to wrap that behavior up? I'm in favor of that. The current skip-smudge behavior is as good of a workaround as I can come up with for now. Maybe a clone command could do something strange like:

  1. Bare clone (no checkout, so filters are not used)
  2. Unset filter.lfs.* for the locally cloned repo
  3. Checkout to a working dir
  4. Run git lfs pull
  5. Undo local repo config from step 2.
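
The five steps above might be scripted roughly like this (a sketch only; `lfs_clone` is an invented name, a real implementation would need error handling, and it uses --no-checkout rather than a literal bare clone since the working tree is needed in step 3):

```shell
lfs_clone() {
    url=$1; dir=$2
    git clone --no-checkout "$url" "$dir"     # 1. clone without checkout (no filters run)
    cd "$dir" || return 1
    git config filter.lfs.smudge ""           # 2. disable the smudge filter locally
    git config filter.lfs.required false
    git reset --hard HEAD                     # 3. populate the working directory
    git lfs pull                              # 4. fetch all LFS objects in one batch
    git config --unset filter.lfs.smudge      # 5. undo the local config from step 2
    git config --unset filter.lfs.required
}
```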
@strich
Contributor
strich commented Jan 19, 2016

I would be for that - I'm currently manually doing those exact steps for our projects with new clones.

@peff
peff commented Jan 19, 2016

Anything that is clone-specific feels a bit like a hack. The real problem is "you have a lot of LFS files". So it's going to be an issue in other cases, too. E.g., init + fetch + checkout. Or just checkout between two distant points in history.

Rather than a socket interface, I wonder if git could "batch" files that are going to be fed to the same filter, start the filter once, and then feed them all over its stdin. Like a socket interface, that requires defining a totally new interface for the filters, but it's going to be a lot simpler and more portable than sockets.

@peff
peff commented Jan 19, 2016

The real problem is "you have a lot of LFS files".

Re-reading this, it looks like I might be implying that you're doing something wrong. You're not. I just mean "the problem is that we're moving from point A to B, and that there are a lot of different LFS files, so we have to kick off LFS a lot of times". The fact that point A is empty in a clone makes it a common place to run into this, but not the only one.

@larsxschneider
Contributor

@technoweenie @strich: see this gist for my current wrapper command: https://gist.github.com/larsxschneider/85652462dcb442cc9344

@peff: Thanks! I get your point 😄! Batching the filter sounds like the right solution. Although this is probably a bit more involved.

@larsxschneider larsxschneider referenced this issue in git-for-windows/build-extra Jan 20, 2016
@dscho dscho Add a script to generate the Portable Application
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
084ad98
@larsxschneider
Contributor

I looked into git config for an easy way to disable certain configs. As @peff already indicated, it is a lot more complicated than I initially thought. I made quite some progress to get git -c !filter.lfs.smudge clone <repository> working when I realized that git -c filter.lfs.smudge= -c filter.lfs.required=false clone <repository> is an equally good solution and it even works with the current Git. The problem with the latter solution is that Git still tries to execute the filter and prints an error for every processed file (which adds unnecessary time to the clone execution).

Printed error using Git on OS X/Linux:

error: cannot run : No such file or directory
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'

error: cannot fork to run external filter
error: external filter  failed

Printed error using Git for Windows:

error: cannot spawn : No such file or directory
error: cannot fork to run external filter
error: external filter  failed

I submitted a patch to the Git mailing list that disables filters set to empty strings (topic "convert: legitimately disable clean/smudge filter with an empty override"). With this patch the clone on OS X is as fast as Test C (git-lfs uninstall). I haven't compiled/tested the solution on Windows, but I expect a similar result.

With the current Git we can still use the approach and ignore the errors (Test B3):

git clone --config filter.lfs.smudge= --config filter.lfs.required=false <repository> 2>&1 | sed '/error: cannot run : No such file or directory/{N;N;N;N;N;N;d;}' | sed '/error: cannot spawn : No such file or directory/{N;N;d;}'
Test / time in sec                                                   OS X   Windows
A   GIT_LFS_SKIP_SMUDGE=1                                             210      2749
B1  --config filter.lfs.smudge=cat                                     33       432
B2  --config filter.lfs.smudge=true                                    29       229
B3  --config filter.lfs.smudge= --config filter.lfs.required=false     19       207
C   git-lfs uninstall                                                   4        37

This makes the clone even faster, especially on Windows (twice as fast as the previous cat solution). The only downside is that the clone progress output is lost because of the redirect. I updated the LFS fast clone gist with this code.

@technoweenie: Do you think it would make sense to add a git lfs clone command that wraps the git clone using my gist?

@technoweenie
Member

Yes, this seems like a better idea than the skip smudge setting.

This was referenced Jan 25, 2016
@sinbad
Contributor
sinbad commented Feb 5, 2016

I'm having a go at this: sinbad@463abf5

Seems to work great for me, I just need to write some integration tests. I'm out of time today, will pick this up on Monday & submit a PR.

@larsxschneider
Contributor

@sinbad Thank you! It is great to see that this approach is natively adopted by Git-LFS! 👍

@tyen901
tyen901 commented Mar 3, 2016

When might we see a release with 'git lfs clone' built in? It's the only thing stopping my team from using git lfs for our large projects.

@rjbell4
Contributor
rjbell4 commented Jun 17, 2016 edited

Hasn't this been resolved with the addition of the git lfs clone command? Or is there something still pending here? Enterprise-config-for-git still implements its own clone command: https://github.com/Autodesk/enterprise-config-for-git/blob/master/clone.sh

@larsxschneider
Contributor

@rjbell4 Yes, the git lfs clone command is the way to go. I haven't updated the enterprise-config repo with our latest internal changes plus we do this internally ... I will tackle that soon.
BTW: are you actually using enterprise-config? If yes, then I would like to get in contact. It is not easily usable out of the box and I wonder if I can do something to improve that.

@rjbell4
Contributor
rjbell4 commented Jun 17, 2016

In the process of using enterprise-config, yes. (Other people here are driving it.) I think we/they would absolutely love to be in contact.

@0day-ci 0day-ci pushed a commit to 0day-ci/git that referenced this issue Jun 27, 2016
@larsxschneider @fengguang larsxschneider + fengguang Native access to Git LFS cache
Hi,

I found a way to make Git LFS faster up to a factor of 100x in
repositories with a large number of Git LFS files. I am looking
for comments if my approach would be acceptable by the Git community.

## What is Git LFS?
Git LFS [1] is an extension to Git that handles large files for Git
repositories. The project gained quite some momentum as almost all major
Git hosting services support it (GitHub [1], Atlassian Bitbucket [2],
GitLab [4]).

## What is the problem with Git LFS?
Git LFS is an application that is executed via Git clean/smudge filter.
The process invocation of these filters requires noticeable time (especially
on Windows) even if the Git LFS executable only accesses its local cache.

Based on my previous findings [5] Steve Streeting (@sinbad) improved the
clone times of Git LFS repositories with a lot of files by a factor of 10
or more [6][7].

Unfortunately that fix helps only with cloning. Any local Git operation
that invokes the clean/smudge filter (e.g. switching branches) is still
slow. Even on the Git mailing list a user reported that issue [8].

## Proposed solution
Git LFS caches its objects under .git/lfs/objects. Most of the time Git
LFS objects are already available in the cache (e.g. if you switch branches
back and forth). I implemented these "cache hits" natively in Git.
Please note that this implementation is just a quick and dirty proof of
concept. If the Git community agrees that this kind of approach would be
acceptable then I will start to work on a proper patch series with cross
platform support and unit tests.
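
To illustrate the idea in shell (not the actual C implementation, and the
OID below is a made-up example): a cache hit means the object for a
pointer's OID already exists under the sharded path in .git/lfs/objects,
so Git could copy it into the working tree without spawning the filter.

```shell
oid=98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4  # example OID
obj=".git/lfs/objects/$(echo "$oid" | cut -c1-2)/$(echo "$oid" | cut -c3-4)/$oid"
if [ -f "$obj" ]; then
    cp "$obj" path/to/working-file   # cache hit: bypass the smudge filter entirely
fi
```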

## Performance tests
I executed both test runs on a 2.5 GHz Intel Core i7 with SSD and OS X.
A test run is the consecutive execution of four Git commands:
 1. clone the repo
 2. checkout to the "removed-files" branch
 3. timed: checkout the "master" branch
 4. timed: checkout "removed-files" branch

Test command:
set -x; git lfs clone https://github.com/larsxschneider/lfstest-manyfiles.git repo; cd repo; git checkout removed-files; time git checkout master; time git checkout removed-files

I compiled Git with the following flags:
NO_GETTEXT=YesPlease NEEDS_SSL_WITH_CRYPTO=YesPlease make -j 8 CFLAGS="-I/usr/local/opt/openssl/include" LDFLAGS="-L/usr/local/opt/openssl/lib"

### TEST RUN A -- Default Git 2.9 (ab7797d) and Git LFS 1.2.1
+ git lfs clone https://github.com/larsxschneider/lfstest-manyfiles.git repo
Cloning into 'repo'...
warning: templates not found /Users/lars/share/git-core/templates
remote: Counting objects: 15012, done.
remote: Total 15012 (delta 0), reused 0 (delta 0), pack-reused 15012
Receiving objects: 100% (15012/15012), 2.02 MiB | 1.77 MiB/s, done.
Checking connectivity... done.
Checking out files: 100% (15001/15001), done.
Git LFS: (15000 of 15000 files) 0 B / 77.04 KB
+ cd repo
+ git checkout removed-files
Branch removed-files set up to track remote branch removed-files from origin.
Switched to a new branch 'removed-files'
+ git checkout master
Checking out files: 100% (12000/12000), done.
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.

real    6m2.979s
user    2m39.066s
sys 2m41.610s
+ git checkout removed-files
Switched to branch 'removed-files'
Your branch is up-to-date with 'origin/removed-files'.

real    0m1.310s
user    0m0.385s
sys 0m0.881s

### TEST RUN B -- Default Git 2.9 with native LFS cache and Git LFS 1.2.1
https://github.com/larsxschneider/git/tree/lfs-cache
+ git lfs clone https://github.com/larsxschneider/lfstest-manyfiles.git repo
Cloning into 'repo'...
warning: templates not found /Users/lars/share/git-core/templates
remote: Counting objects: 15012, done.
remote: Total 15012 (delta 0), reused 0 (delta 0), pack-reused 15012
Receiving objects: 100% (15012/15012), 2.02 MiB | 1.44 MiB/s, done.
Checking connectivity... done.
Git LFS: (15001 of 15000 files) 0 B / 77.04 KB
+ cd repo
+ git checkout removed-files
Branch removed-files set up to track remote branch removed-files from origin.
Switched to a new branch 'removed-files'
+ git checkout master
Checking out files: 100% (12000/12000), done.
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.

real    0m2.267s
user    0m0.295s
sys 0m1.948s
+ git checkout removed-files
Switched to branch 'removed-files'
Your branch is up-to-date with 'origin/removed-files'.

real    0m0.715s
user    0m0.072s
sys 0m0.672s

### Results
Default Git:                      6m2.979s + 0m1.310s = 364s
Git with native LFS cache access: 0m2.267s + 0m0.715s = 3s

The native cache solution is over 100x faster when switching branches
on my local machine with a test repository containing 15,000 Git LFS files.
Based on my previous experience with Git LFS clone I expect even more
dramatic results on Windows.

Thanks,
Lars

[1] https://git-lfs.github.com/
[2] https://github.com/blog/1986-announcing-git-large-file-storage-lfs
[3] http://blogs.atlassian.com/2016/02/git-lfs-for-designers-game-developers-architects/
[4] https://about.gitlab.com/2015/11/23/announcing-git-lfs-support-in-gitlab/
[5] git-lfs/git-lfs#931 (comment)
[6] git-lfs/git-lfs#988
[7] https://developer.atlassian.com/blog/2016/04/git-lfs-12-clone-faster/
[8] http://article.gmane.org/gmane.comp.version-control.git/297809
a7647a4
@samskiter
samskiter commented Dec 2, 2016 edited

Having a lot of issues with this on Jenkins - anyone have any tips?

EDIT: Found this old issue on jenkins - looks like an outstanding issue there?

@ttaylorr
Member
ttaylorr commented Dec 2, 2016

Having a lot of issues with this on Jenkins - anyone have any tips?

Yes. Git LFS v1.5.0 implements support for the new process filter in Git v2.11.0. If you upgrade to both of those on your Jenkins instance, the time it takes to checkout will decrease by a factor of 80 or so (depending on platform/architecture).
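
For reference, `git lfs install` with those versions sets up the long-running filter in ~/.gitconfig roughly like this (the `process` key is the new part):

```
[filter "lfs"]
	clean = git-lfs clean -- %f
	smudge = git-lfs smudge -- %f
	process = git-lfs filter-process
	required = true
```

Git then starts `git-lfs filter-process` once per operation and streams all files through it, instead of spawning one process per file.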

@ttaylorr ttaylorr closed this Dec 2, 2016
@larsxschneider
Contributor

@samskiter The default Jenkins Git plugin does not take advantage of the git lfs clone command... therefore cloning repos with many LFS files takes ages. Either you perform the Git LFS commands as part of your Jenkins job yourself or you wait for this (I am working on it).
