Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter Protocol Support #1617

Merged
merged 90 commits into from
Nov 11, 2016
Merged

Filter Protocol Support #1617

merged 90 commits into from
Nov 11, 2016

Conversation

ttaylorr
Copy link
Contributor

@ttaylorr ttaylorr commented Nov 2, 2016

This pull-request is a continuation of #1382, originally authored by @larsxschneider.

It should serve as a place to merge in all of PRs in the filter-stream series (see description below), and then be merged into master as one huge thing. I'm suggesting this since the PR as of 46a8ee5 is incomplete, and not in a state that I would feel comfortable landing into master.

In terms of the PR series, here's what I'm thinking:

  1. Filter Protocol Support #1617: Initial implementation of the filter protocol from @larsxschneider, that's this PR.
  2. filter_stream: Integration tests #1618: Integration tests to make sure that all subsequent PRs keep working
  3. filter_stream: Protocol unit tests #1619: Unit tests around the protocol reading/writing to verify that its behavior is correct.
  4. filter_stream: Make ObjectScanner behave like a Scanner #1620: Make the ObjectScanner behave like a Scanner.
  5. filter_stream: provide io.Reader implementation #1621: An io.Reader implementation for reading data off of the filter stream, and wiring that io.Reader up throughout the code
  6. filter_stream: provide io.Writer implementation #1622: An io.Writer implementation for writing streams of data to the filter stream, and wiring that up
  7. [pending]: A cleanup PR to rename types, document anything that I missed, and generally apply some 💅 EDIT: this was done in place

/cc @larsxschneider @technoweenie @sinbad @rubyist @sschuberth

@ttaylorr ttaylorr added the 1.5.0 label Nov 2, 2016
@ttaylorr ttaylorr added this to the Filter protocol support milestone Nov 2, 2016
// instead of a pointer.
//
// TODO(taylor): figure out if there is more data on the reader,
// and buffer that as well.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if I understand that 100% but I think according to the filter protocol there can't be more data on the reader. Because we need to acknowledge the data with a status packet before Git sends more data, no?

Do you want to remove or address this TODO before merge?

Copy link
Contributor Author

@ttaylorr ttaylorr Nov 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing in meat-space, the consensus was there will be no more data on the io.Reader coming from Git, since we will have decoded the entire pointer (or at least, tried to) by this point, so I went ahead and removed the TODO.

@larsxschneider can clarify anything that I left out.

EDIT: Went ahead and removed the TODO note in 480ec1b.

}

func filterCommand(cmd *cobra.Command, args []string) {
requireStdin("This command should be run by the Git filter process")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: This seems not to work on Windows (same is true for the legacy clean/smudge functions). However, I don't think that should be addressed in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome catch. I agree that this is out of scope of the current PR, so I opened #1639 to address.


if err := s.Err(); err != nil && err != io.EOF {
ExitWithError(err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s.Err() == io.EOF is a legitimate signal from Git telling GitLFS that it should shut down. Maybe we should document/make this more clear here?

More importantly: If we get an error during w.Flush() in L101 then we wouldn't exit with error here, right?

if err != nil {
ptr.Encode(to)
// Download declined error is ok to skip if we weren't requesting download
if !(errors.IsDownloadDeclinedError(err) && !download) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we get a download error if we don't request a download?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. This happens since we treat downloading a file excluded with the -X flag as optional. If we tried to download the file (i.e., repository was public), we just won't do the smudge in order to speed up the checkout. If the repo was private and we tried to download, we could get an error due to missing credentials, but that's safe to ignore, since we don't need the contents of the file anyway.

/refs: 417fd73 #759

@@ -49,42 +97,18 @@ func smudgeCommand(cmd *cobra.Command, args []string) {
Exit(err.Error())
}

stat, err := os.Stat(localPath)
if err != nil {
if stat, err := os.Stat(localPath); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious: What is the advantage of this style over the other?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things: It's 1 less line of code, and the err value is scoped to the if stat, err block.

"smudge": []string{"git-lfs smudge %f"},
"clean": []string{"git-lfs clean %f"},
"smudge": []string{"git-lfs smudge %f"},
"process": []string{"git-lfs filter"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important: Shouldn't that be filter-process here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. This is in the Upgradeables section of the setup code, meaning it's legit to overwrite this value so long as it matches one of the values in that slice. Since there was code publicly available during the course of this PR that referred to git-lfs filter instead of git-lfs filter-process, I'm going to keep this.

"smudge": []string{"git-lfs smudge --skip %f"},
"clean": []string{"git-lfs clean %f"},
"smudge": []string{"git-lfs smudge --skip %f"},
"process": []string{"git-lfs filter --skip"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important: Shouldn't that be filter-process here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# Once 2.11 is released, replace this with:
#
# ```
# ensure_git_version_isnt $VERSION_LOWER "2.11.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should teach ensure_git_version the g format?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good thought. Since adding support for the git-describe format would complicate this function (and I'm not planning on gating any more LFS features behind Git versions other than those released), I'm going to forgo this change for now, unless we need something like this in the future.


diff -u <(echo "$expected") <(echo "$got")
)
end_test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would prefer it if we set clean and smudge to false here to ensure they are not used.


printf "$smudge" | grep "git-lfs smudge"
printf "$clean" | grep "git-lfs clean"
printf "$filter" | grep "git-lfs filter"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/lfs filter/lfs filter-process/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, ha! Fixed it right up in: 4ae8afc.

@larsxschneider
Copy link
Member

The PR looks really good from my point of view. During my review I found 2 important problems that I think need to be addressed before merge and a couple of nits that could be/should be addressed.

(Yeah, I should have used the "Request changes" button instead of "Approve" ... but well 😄 )

Copy link
Contributor

@technoweenie technoweenie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work 🤘

@ttaylorr ttaylorr self-assigned this Nov 10, 2016
// Truncate "p" such that it no longer includes the data that we
// have in the internal buffer.
p = p[m:]
w.buf = append(w.buf, p[n:n+m]...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know what clean/smudge operations triggered this bug with your special repo? Can we extract that case and add it to the integration tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question. The bug was actually not here, but in the io.Reader, as patched in: 021511a. I'm not sure an integration test is appropriate for that level of detail, but the unit tests (specifically this one) do assert for that behavior.

As for the change in the io.Writer implementation, this is due to a contract that I failed to uphold when originally implementing this type:

Write must not modify the slice data, even temporarily.

Implementations must not retain p.

(see: https://golang.org/pkg/io#Writer).

That behavior is tested in this unit test, which would fail if the underlying dataums b1 and b2 were modified.

@ttaylorr ttaylorr merged commit 4935486 into master Nov 11, 2016
@ttaylorr ttaylorr deleted the filter-stream branch November 11, 2016 21:08
@ttaylorr
Copy link
Contributor Author

Thanks everyone! 🎉

@eminence
Copy link

Tremendous work!

lucamilanesio pushed a commit to GerritCodeReview/git-repo that referenced this pull request May 29, 2019
During sync, repo runs `git read-tree --reset -u -v HEAD` which causes
git-lfs's smudge filter to run, which fails because git-lfs does not
work with bare repositories.

This was fixed in I091ff37998131e2e6bbc59aa37ee352fe12d7fcd to
automatically disable this smudge filter. However, later versions of
Git (2.11.0) introduced a new filter protocol [1], to avoid spawning
a new command for each filtered file. This was implemented in Git-LFS
1.5.0 [2].

This patch fixes the issue by setting the git lfs process filter, in
addition to the smudge filter. For any projects that have LFS objects,
`git lfs pull` must still be executed manually afterwards.

[1] git/git@edcc858
[2] git-lfs/git-lfs#1617

Bug: https://crbug.com/gerrit/10911
Change-Id: I277fc68fdefc91514a2412b3887e3be9106cab48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants