WIP: long running clean/smudge filter protocol #1382

Closed

larsxschneider
Member

@larsxschneider larsxschneider commented Jul 18, 2016

This is my first WIP version of a Git/Git-LFS stream filter. Please keep in
mind that I have little go-lang knowledge and experience. Therefore
I would be happy to receive a very strict review to improve my go-lang
skills 😄 👍

What is the problem with Git LFS?

Git LFS is an application that is executed via Git clean/smudge filter.
The process invocation of these filters requires noticeable time (especially
on Windows). An individual filter process is required for every single file
that Git touches during its operations (e.g. checkout etc).

Proposed solution

Instead of a single Git LFS process per file, I propose a single Git LFS
process per Git invocation. That means Git invokes the filter process
(e.g. Git LFS) only once and then continuously talks to the same filter
process via pipes.

You can find the corresponding WIP Git core implementation here:
https://github.com/larsxschneider/git/tree/filter-stream

Performance tests

I executed both test runs on a 2.5 GHz Intel Core i7 with SSD and OS X.
A test run is the consecutive execution of four Git commands:

  1. clone the repo
  2. checkout to the "removed-files" branch
  3. timed: checkout the "master" branch
  4. timed: checkout "removed-files" branch

Test command:

set -x; git lfs clone https://github.com/larsxschneider/lfstest-manyfiles.git repo; cd repo; git checkout removed-files; time git checkout master; time git checkout removed-files

I compiled Git with the following flags:

NO_OPENSSL=YesPlease APPLE_COMMON_CRYPTO=YesPlease NO_GETTEXT=YesPlease make -j 8

TEST RUN A -- Default Git 2.9 (ab7797d) and Git LFS 1.2.1

+ git lfs clone https://github.com/larsxschneider/lfstest-manyfiles.git repo
Cloning into 'repo'...
warning: templates not found /Users/lars/share/git-core/templates
remote: Counting objects: 15012, done.
remote: Total 15012 (delta 0), reused 0 (delta 0), pack-reused 15012
Receiving objects: 100% (15012/15012), 2.02 MiB | 1.77 MiB/s, done.
Checking connectivity... done.
Checking out files: 100% (15001/15001), done.
Git LFS: (15000 of 15000 files) 0 B / 77.04 KB
+ cd repo
+ git checkout removed-files
Branch removed-files set up to track remote branch removed-files from origin.
Switched to a new branch 'removed-files'
+ git checkout master
Checking out files: 100% (12000/12000), done.
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.

real    6m2.979s
user    2m39.066s
sys 2m41.610s
+ git checkout removed-files
Switched to branch 'removed-files'
Your branch is up-to-date with 'origin/removed-files'.

real    0m1.310s
user    0m0.385s
sys 0m0.881s

TEST RUN B -- Git and Git LFS with stream filter support

+ git lfs clone https://github.com/larsxschneider/lfstest-manyfiles.git repo
Cloning into 'repo'...
warning: templates not found /Users/lars/share/git-core/templates
remote: Counting objects: 15012, done.
remote: Total 15012 (delta 0), reused 0 (delta 0), pack-reused 15012
Receiving objects: 100% (15012/15012), 2.02 MiB | 1.30 MiB/s, done.
Checking connectivity... done.
Git LFS: (15000 of 15000 files) 0 B / 77.04 KB
+ cd repo
+ git checkout removed-files
Branch removed-files set up to track remote branch removed-files from origin.
Switched to a new branch 'removed-files'
+ git checkout master
Checking out files: 100% (12000/12000), done.
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.

real    0m2.528s
user    0m0.209s
sys 0m1.602s
+ git checkout removed-files
Switched to branch 'removed-files'
Your branch is up-to-date with 'origin/removed-files'.

real    0m2.280s
user    0m0.066s
sys 0m0.637s

Results

Default Git:                                6m2.979s + 0m1.310s = 364s
Git and Git LFS with stream filter support: 0m2.528s + 0m2.280s = 5s

The Git stream filter solution is almost ✨ 70x faster ✨ when switching branches
on my local machine with a test repository containing 12,000 Git LFS files.
Based on my previous experience with Git LFS clone I expect even more
dramatic results on Windows.

Next Steps

  1. Make Travis-CI tests pass (does anyone have an idea what is wrong with the "clone with submodules" test 1a24a7c?)
  2. Make the pipe protocol more robust against errors (e.g. by adding ACK messages). Do you have other suggestions for the protocol?
  3. Clean up the Git-core patch and propose it to the mailing list (/cc heads up @peff).
  4. Clean up code duplication in command_smudge.go/command_clean.go and command_filter.go.

Questions

  1. Would you be OK with this approach in general? /cc @technoweenie @sinbad @ttaylorr
  2. How should I handle integration tests? Git LFS would need to support and test both protocols ("per file" and "filter stream"). I was thinking about running all integration tests twice with different Git filter configs.

Thanks,
Lars

esac;
make install;
export PATH=$PWD:$PATH;
cd ..;
Member Author

These Travis-CI changes won't be in the final patch, of course.

@sinbad
Contributor

sinbad commented Jul 18, 2016

This is going to be great, it will remove the need for more lfs-specific commands and eventually could lead to us deprecating git lfs clone for recent git versions.

I've started a few comments but haven't finished reviewing yet, will pick up again tomorrow.

for {
buf := make([]byte, 4)
readBytes, err := reader.Read(buf)
if readBytes == 0 {
Contributor

Are you checking whether or not you're at the end of stdin? Checking this may be more appropriate:

if _, err := reader.Read(buf); err == io.EOF {
        return
}

// <snip>

@ttaylorr
Contributor

I like this approach. This will definitely be a huge speed gain for what are currently pretty slow operations under certain scenarios.

I made a few comments on some of the code above, mainly boiling down to a few suggestions:

  1. An InputDataHdr struct
  2. Using binary.Read instead of buffering and calling binary.LittleEndian.Uint32(...)
  3. A Processor type

I think that the header/data idea is a good one for this protocol. I am wondering, however, about the benefits of implementing a multiplexed chunked protocol. Would it make sense to interleave the data that was being sent across these file descriptors? I am not entirely sure.

On one hand, this would allow the clean and smudge filters to start processing one file before they had finished others, which would enable increased parallelism.

On the other hand, perhaps this sort of optimization is not necessary. One concern with this approach is the additional complexity that would be incurred by this sort of muxing. I am thinking in particular of the approach that is implemented in the RTMP protocol, which is certainly complex. The relevant parts of the documentation can be found here, and a reference implementation that I wrote in Go can be found here (docs). There is far more going on in that chunk package than we would implement in LFS, but it'd still be more than we have now.

Your thoughts?

@larsxschneider
Member Author

@sinbad and @ttaylorr: Thanks a lot for your feedback!

Re: Multiplexing
Multiplexing would considerably complicate the protocol and is therefore
more error prone. Plus, with the Git clean/smudge interface that we have
today, the parallelism wouldn't buy us anything as Git processes the
files sequentially anyway. However, I have some vague ideas how we could
approach that. Therefore, I plan to define a "protocol version" field
that allows Git to support different filter protocols in the future.

if fileNameLen > 0 {
buf := make([]byte, fileNameLen)
readLen, err := r.Read(buf)
if err != nil || readLen != int(fileNameLen) {
Contributor

Hmm, I would split out these two cases individually. Returning the error if it's != nil would be the first, and the unexpected EOF would be the second. This makes things a little clearer, since the read can succeed and still return fewer bytes than we wanted. I'm thinking:

if readLen, err := r.Read(buf); err != nil {
        return errutil.Errorf(err, "Unexpected error")
} else if readLen != int(fileNameLen) {
        return fmt.Errorf("unexpected EOF when reading file (got %d, wanted %d)", readLen, fileNameLen)
}

@@ -733,15 +733,18 @@ func CloneWithoutFilters(flags CloneFlags, args []string) error {
// not working. You can get around that with https://github.com/kr/pty but that
// causes difficult issues with passing through Stdin for login prompts
// This way is simpler & more practical.
filterOverride := ""
filterDriverOverride := ""
Contributor

Empty variable initializations are typically done in Go using a var block. I would write this like so:

var (
        filterDriverOverride bool
        smudgeFilterOverride string
)

and then use the appropriate fmt verbs to turn the above string and bool into cmdargs down below 😄

Member Author

Thanks! 👍 9aebb9c

Although "false" is actually a string here, because I am referring to the Bash builtin:
http://tldp.org/LDP/abs/html/internal.html

false: A command that returns an unsuccessful exit status, but does nothing else.

@larsxschneider
Member Author

relevant to Git LFS in general:
I am writing tests for the Git core side of the protocol and I discovered that Git calls clean way more than necessary: http://thread.gmane.org/gmane.comp.version-control.git/300028

lfs.InstallHooks(false)

reader := bufio.NewReader(os.Stdin)
writer := bufio.NewWriter(os.Stdout)
Contributor

stdio buffering should already be handled by the kernel, is there a reason we're buffering stdout here?

@rubyist
Contributor

rubyist commented Jul 22, 2016

It'll be pretty awesome if this makes it into git! I have two primary critiques so far. I think there's too much use of Panic() in deeper parts of the code, and the protocol parsing is happening in multiple places. I don't think the InputFileHdr is quite the right abstraction and the Read method on it is a little awkward. As it is, this parsing code would be very difficult to unit test.

I think something similar to bufio.Scanner would look pretty clean here, leaving us with something like this:

func filterCommand() {
    // ...

    scanner := NewObjectScanner(os.Stdin)
    for scanner.Scan() {
        obj := scanner.Object()
        r := obj.Reader()

        // ...

        switch obj.Command {
        case cmdClean:
            clean(r, obj.Name)
        case cmdSmudge:
            smudge(r, obj.Name)
        }
    }

    if err := scanner.Err(); err != nil {
        // ...
    }

    // write output
}

Here's a gist with a quick pass at an implementation.

@ttaylorr
Contributor

Oh snap, that's way better. I like the idea of an object scanner, that puts the parsing implementation in the right place I think. Still not sold on the switch block, there may be some other abstraction that we can reach for, but I think this is awesome.

larsxschneider added a commit to larsxschneider/git that referenced this pull request Oct 7, 2016
Git's clean/smudge mechanism invokes an external filter process for
every single blob that is affected by a filter. If Git filters a lot of
blobs then the startup time of the external filter processes can become
a significant part of the overall Git execution time.

In a preliminary performance test this developer used a clean/smudge
filter written in golang to filter 12,000 files. This process took 364s
with the existing filter mechanism and 5s with the new mechanism. See
details here: git-lfs/git-lfs#1382

This patch adds the `filter.<driver>.process` string option which, if
used, keeps the external filter process running and processes all blobs
with the packet format (pkt-line) based protocol over standard input and
standard output. The full protocol is explained in detail in
`Documentation/gitattributes.txt`.

A few key decisions:

* The long running filter process is referred to as filter protocol
  version 2 because the existing single shot filter invocation is
  considered version 1.
* Git sends a welcome message and expects a response right after the
  external filter process has started. This ensures that Git will not
  hang if a version 1 filter is incorrectly used with the
  filter.<driver>.process option for version 2 filters. In addition,
  Git can detect this kind of error and warn the user.
* The status of a filter operation (e.g. "success" or "error") is set
  before the actual response and (if necessary!) re-set after the
  response. The advantage of this two step status response is that if
  the filter detects an error early, then the filter can communicate
  this and Git does not even need to create structures to read the
  response.
* All status responses are pkt-line lists terminated with a flush
  packet. This allows us to send other status fields with the same
  protocol in the future.

Helped-by: Martin-Louis Bright <mlbright@gmail.com>
Reviewed-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
larsxschneider added a commit to larsxschneider/git that referenced this pull request Oct 8, 2016
gitster pushed a commit to git/git that referenced this pull request Oct 10, 2016
larsxschneider added a commit to larsxschneider/git that referenced this pull request Oct 16, 2016
gitster pushed a commit to git/git that referenced this pull request Oct 17, 2016
@larsxschneider
Member Author

Junio merged the Git core patch series required for this PR to next ( git/git@ffd0de0 ) and it is scheduled for master ( http://public-inbox.org/git/xmqqk2d2ein7.fsf@gitster.mtv.corp.google.com/ ) 🎉 🎉 🎉 .
I'll update this PR with a working version soon, so that we can discuss how to use the new Git filter protocol feature in the most efficient way.

@sinbad
Contributor

sinbad commented Oct 24, 2016

Woooo, amazing work Lars. 🤘

@ttaylorr
Contributor

@larsxschneider great to hear. I'm looking forward to seeing your up-to-date version of this PR once it's ready. I'll be around to provide any help or answer any questions that you might have.

@technoweenie
Contributor

@larsxschneider Congrats, great work!

@larsxschneider
Member Author

I didn't have much time today and I won't have much time in the next 1.5 weeks. That's why I am pushing a very rough version here. A few thoughts:

  1. The implementation as it is right now would smudge files from the Git LFS cache or download files if they are not in the cache. That works, but we cannot take advantage of parallel downloads. Ideally I would extend the filter protocol so that a filter can tell Git "I can't process this file just yet, ask me again later". However, this wouldn't be straightforward in Git core and the filter protocol change was already pretty large. Until the filter protocol is improved that way we could use a trick, though. If Git asks Git LFS for a file that is not in the cache, then Git LFS could return the pointer file as-is and download the file in the background (maybe even batched with other files). When Git shuts down, it sends an EOF to Git LFS via the pipe and waits until Git LFS terminates. In this step Git LFS could finish the downloads and replace the pointer files with the actual content. This should give us git lfs clone-like speed and we could get rid of the wrapper commands. Credit: I think @sinbad posted that idea somewhere but I can't find the link.
  2. ReadRequest reads the entire content into a []byte which is then passed to the clean/smudge functions. I would rather pass a Reader to clean/smudge.
  3. clean/smudge returns a []byte which is then written to the pipe in PktLine format using WriteResponse. I would rather pass a Writer to clean/smudge.
  4. clean/smudge functions in command_filter.go are pretty much the same as the ones in command_clean.go and command_smudge.go. I want to reuse code here.
  5. The process filter takes precedence over clean/smudge filter. That's nice as it makes the filter update for the user very easy. However, I would like to ensure in the tests that Git versions > 2.10 actually use the process filter. I am thinking in the direction of an env variable (see GIT_LFS_USE_LEGACY_FILTER) to control that. Any good idea how to do that in a nice way?
  6. Unit tests are missing for git/git_filter_protocol.go

for _, pair := range requestList {
v := strings.Split(pair, "=")
requestMap[v[0]] = v[1]
}
Member Author

@peff I finally grokked your "parse as dictionary" comment completely. As you have already recognized it parses this nicely:

packet:          git> command=smudge
packet:          git> pathname=path/testfile.dat

But not this:

packet:          git> capability=clean
packet:          git> capability=smudge

... I wonder what you think about this?

packet:          git> clean=capability
packet:          git> smudge=capability

However, it's probably too late to change it. I guess in the end it is not that important and I don't want to annoy Junio with this kind of late change now that the code is in next.
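
If the packet format stays as it is, one way to cope with repeated keys on the filter side is to keep a list of values per key. This is a sketch under that assumption and `parseList` is a hypothetical helper. It also splits only at the first `=`, so a pathname that itself contains `=` is kept intact.

```go
package main

import (
	"fmt"
	"strings"
)

// parseList turns "key=value" packets into a map that keeps every value
// seen for a key, in the order the packets arrived.
func parseList(pairs []string) (map[string][]string, error) {
	out := make(map[string][]string)
	for _, pair := range pairs {
		// Split at the first "=" only; the value may contain "=" itself.
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("malformed packet %q", pair)
		}
		out[kv[0]] = append(out[kv[0]], kv[1])
	}
	return out, nil
}

func main() {
	caps, err := parseList([]string{"capability=clean", "capability=smudge"})
	if err != nil {
		panic(err)
	}
	fmt.Println(caps["capability"])

	req, err := parseList([]string{"command=smudge", "pathname=path/a=b.dat"})
	if err != nil {
		panic(err)
	}
	fmt.Println(req["command"], req["pathname"])
}
```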

Well, it's always nice to be vindicated eventually. ;)

If you feel there's room for improvement in the protocol, I don't think being in next is too late. You are welcome to submit patches on top, and nothing is cemented until the feature is in a released version of git.

It's up to you whether you think the change is worth making on top.

It's up to you whether you think the change is worth making on top.

To clarify: this is whether you think it's worth your time in dealing with the patch and any additional review.

From the maintainer's perspective, I think "I implemented a protocol, and now that it is close to cemented, I was fleshing out the other end of the protocol and realized there are some deficiencies" seems like a perfectly good reason to add more patches to the original topic.

@technoweenie
Contributor

@larsxschneider 👍 on your 6 points. Regarding the first one:

When Git shuts down, then it sends an EOF to Git LFS via pipe and waits until Git LFS terminates.

How is this handled in the filter? Would it be after the for loop exits, just before filterCommand() returns?

Contributor

@technoweenie technoweenie left a comment

I was going to do a full review, but I think Taylor or I will start tackling items 2-6 in your list. I want us to start with the protocol tests first, make some of the interface changes, and then end up with a working version. I think adding the bg downloading should be done in a separate PR after this is a functioning filter.

So, enjoy my single, totally super important review comment :trollface:

)

// Private function copied from "github.com/xeipuuv/gojsonschema/utils.go"
// TODO: Is there a way to reuse this?
Contributor

Nope, it's private in gojsonschema too. A little copied code isn't too bad :)

@ttaylorr
Contributor

Unit tests are missing for git/git_filter_protocol.go

I'm working on this right now and will post a PR to merge into this one as soon as I have the tests fleshed out! 🤘

@sinbad
Contributor

sinbad commented Oct 26, 2016

Short of time this week but I totally agree with your "trick" in point 1 (I think we discussed it before as you said) - just smudging to the pointer in serial and doing the actual download in batch/parallel in the background, gated on the termination at the end seems like the best approach. The one Q I had outstanding about that is whether the stat info in the index would then be out of date and would need a git update-index to avoid files being displayed as modified.

@larsxschneider
Member Author

@sinbad I haven't worked with git update-index yet. Here is what would need to happen:

  1. Files that Git LFS does not have in its cache would end up as LFS pointer files in the working tree.
  2. After Git is done, Git LFS replaces the pointer files with the actual content. I guess we then need to tell Git that the new content is OK and the working tree is clean. Is that what git update-index is for?

@sinbad
Contributor

sinbad commented Oct 27, 2016

@larsxschneider yeah what I'm not sure about is exactly when Git updates the index for a file; it must be after the smudge filter is run so the size & date are correct, but I don't know if it does it immediately after calling the filter, or at the end of the entire checkout. If it can do it at any time before LFS replaces the pointer file with the real content then the stat in the index will be out of date and probably a git update-index will be needed to avoid it appearing to be modified.

@larsxschneider
Member Author

I am closing this PR as the work continues in #1617.

This pull request was closed.