New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lfs detection inconsistent #1057

Closed
dnewmarch opened this Issue Mar 3, 2016 · 16 comments

Comments

Projects
None yet
3 participants
@dnewmarch
Copy link

dnewmarch commented Mar 3, 2016

I'm using Ubuntu 14.04, git version 1.9.1, git-lfs/1.1.1 (GitHub; linux amd64; go 1.5.3).
I seem to be able to create a situation where a committed binary file doesn't get picked up by lfs when lfs tracking is added later. I've used the following series of commands to demonstrate the issue:

git init git-lfs-bug
cd git-lfs-bug
echo 0 > lfs.bin
git add lfs.bin
git commit -m "add binary file"
git lfs track *.bin
git add .gitattributes
git commit -m "track with lfs"

After this series of commands I expect that the lfs.bin file should be identified as modified, because it is matched by the git lfs tracking pattern, but the pointer file hasn't been committed yet. However, when I run git status it says there are no modifications. And if I run gitk it shows two commits, and no modified files. Interestingly, gitk shows the first commit as adding a pointer file, which is also incorrect.

One last detail is that I tried adding the above commands to a shell script and running them that way, but when run from the script the problem did not happen, ie. git status reports lfs.bin as modified and wants me to commit the pointer file.

@technoweenie

This comment has been minimized.

Copy link
Member

technoweenie commented Mar 3, 2016

First, use double quotes when tracking paths or extensions: git lfs track "*.bin". Otherwise, your shell may expand that to all of the *.bin files in the local directory. If you have none in there, it won't track anything.

Git LFS works by intercepting the git add command through a "clean" filter that is run on file patterns in .gitattributes. Since *.bin wasn't in .gitattributes when you added lfs.bin, it gets added to the repository as a normal git blob.

@dnewmarch

This comment has been minimized.

Copy link
Author

dnewmarch commented Mar 3, 2016

Yep, fair call on the double quotes. The .gitattributes does indeed just have just lfs.bin tracked.
However, it is the point that lfs.bin is initially add as a normal git blob. My understanding of how lfs is supposed to work is still evolving, so I may have an unrealistic expectation, but I was expecting that when I add an entry to the .gitattributes that matches a file already in the repo it will identify that file as modified because it has replaced it with a pointer file. But this doesn't appear to happen.
This has caused me confusion due to the incorrect history reported by gitk and inconsistent git status.

@sinbad

This comment has been minimized.

Copy link
Contributor

sinbad commented Mar 3, 2016

Unfortunately when git tries to figure out whether a file is changed, the first thing it does is check the last record it had in the index, checking the file size on disk and the timestamp. Provided the index timestamp is later and the file size is still the same, it won't run the clean filter on the file to check if it's now different because of filter configuration change. Unfortunately there's nothing we can do about that, unstaging the file would have triggered it if all you've done is add it & not committed yet.

If you'd already committed, you wouldn't save any space in your git repo by re-committing the file as-is anyway, even though it would turn into a pointer, because the original content is already in git verbatim (from a previous commit). If you want to change that & rewrite history so that the binaries are not in git any more, take a look at https://github.com/rtyley/bfg-repo-cleaner/releases/tag/v1.12.5. Be aware that because this changes history so will require all users of this repo to re-clone it, so take all the usual precautions like keeping a backup and making sure everyone knows a disruptive change is going to happen to the repo.

@sinbad sinbad closed this Mar 3, 2016

@dnewmarch

This comment has been minimized.

Copy link
Author

dnewmarch commented Mar 3, 2016

Firstly, I just wanted to say thanks for the responses, I really appreciate both of your time.
Thanks for the advice regarding bfg, and the value of recommitting the binary as a pointer, that is good advice, but I am aware of both of those things and I really only want to figure out if git lfs can present a more consistent state.
I guess I can see that the detection is more difficult than I had realised, and perhaps that's just something to be aware of.
I'm probably more concerned about gitk incorrectly reporting the first commit as being a pointer commit when it wasn't.
I also have another inconsistent scenario which is what led me to investigate this in the first place. If I perform the above steps, and then follow it up with these steps:

git rm lfs.bin
git commit -m "remove file"
git checkout HEAD~1

when I run git status lfs.bin shows as modified. In this state, I can't stash the file, or reset to remove the modification, so my only options to get back to master are to either commit it or do force checkout. For a new user to git lfs, this seems like a pretty awkward situation to be in, even if the behaviour is legitimate and there is no way around it.

@sinbad

This comment has been minimized.

Copy link
Contributor

sinbad commented Mar 3, 2016

The problem is that you're in the 'middle ground' where your .gitattributes represents one setup (that this file must go through a clean/smudge filter and be stored as a pointer) and the state of the index / HEAD disagrees with that because the latest version of that file was committed without a filter. In order to be consistent, these things must agree - and usually, they do. The problems you're having are entirely related to trying to bridge the gap between being stored in LFS and not being stored in LFS. To reset that file for example, you would actually have to delete the entry in .gitattributes as well, because otherwise git would apply the filter during the reset/checkout (it has no concept that .gitattributes is different to when that file was last committed - it's not that smart).

So yes I can entirely see your issues with this ambiguity, but you should be aware that it would only occur in this awkward transition state that you have right now. Once you've re-committed the file again as an LFS pointer, crucially with the .gitattributes file change at the same time, checkouts at this commit and before will work. But, generally speaking it's still better to add new files to LFS or convert the history rather than leave this transition point, because when you're sitting right on the fence there are some edge cases. But those go away once you're off this boundary, they're not a persistent issue. I imagine if you did the LFS conversion piecemeal, per file, and without rewriting history you'd create more problems for yourself though.

@dnewmarch

This comment has been minimized.

Copy link
Author

dnewmarch commented Mar 3, 2016

Thanks again for your detailed explanation.
I guess I did feel as though this was an edge case, and I knew that it was the result of less than ideal use of lfs. And I now understand much better why it has occurred and my opportunities for dealing with it. It's just a bit of a nasty edge case because it's the sort of thing that is likely to bite new users who can't justify rewriting history while transitioning to using lfs, which I imagine will be many as lfs use rises.
Is there a way to force running of the filter to make sure all candidates are converted to pointer files at the time of adding the .gitattributes file?
Finally, do you have an explanation for why the incorrect history being reported for the first commit.
Thanks

@sinbad

This comment has been minimized.

Copy link
Contributor

sinbad commented Mar 3, 2016

Is there a way to force running of the filter to make sure all candidates are converted to pointer files at the time of adding the .gitattributes file

If you're not planning on converting history using BFG, there's not much point re-committing all the binary files again, converting them to pointer files. It won't save you any space and doesn't necessarily need to be done. Just commit the tracking entry in .gitattributes and next time the file is changed, the content will be externalised. Checkouts of history will continue to work because the old version of .gitattributes didn't have the filter installed. However, you may get some funkiness in some history commands because of .gitattributes being a snapshot at your current checkout (I suspect this is your issue in gitk's history). You're going to get that regardless if you don't BFG convert, although maybe recommitting all might help some of those cases to decrease the number of cross-over points.

If you do want to re-commit, just explicitly running git add on all the files you track should let you do that. You can probably also force the status refresh using git update-index --refresh <path> on the affected files. I still think it's better generally to either use LFS for new files or to convert history, personally, as disruptive as that is.

@dnewmarch

This comment has been minimized.

Copy link
Author

dnewmarch commented Mar 4, 2016

I was nearly going to leave the issue, but I got bitten by another scenario today.
In an attempt to avoid the confusion surrounding unconverted git blobs, we touched all the files which would match our git lfs tracking rules. This caused them to be converted to pointers which we committed to master. But now when I go from master where the conversion has happened to an older branch, all of the bin files which we just converted are marked as modified. As far as I understand my only options to get out of this state are to commit the files, or force checkout master again. So I can resolve it, but once again it's awkward and unexpected so I thought it was worth mentioning.

@sinbad

This comment has been minimized.

Copy link
Contributor

sinbad commented Mar 4, 2016

I've managed to reproduce this but only when .gitattributes is still modified locally. When on a clean checkout, bouncing between pre-lfs track and post-lfs track commits works OK for me.

~/t/git-lfs-test> git cat-file blob master:spine_strips.dds
version https://git-lfs.github.com/spec/v1
oid sha256:29afe29dc594032eb2d0f8e8d6e81f2ae8d475dcd7437f54c33fced5665ddfdf
size 65664
~/t/git-lfs-test> git cat-file blob 0f326b3:spine_strips.dds
DDS  DXT5����� (binary data continues...)
~/t/git-lfs-test> git status
On branch master
Your branch is ahead of 'origin/master' by 2 commits.
  (use "git push" to publish your local commits)
nothing to commit, working directory clean
~/t/git-lfs-test> git checkout 0f326b3
Note: checking out '0f326b3'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 0f326b3... Commit binary file without lfs tracking
~/t/git-lfs-test> git status
HEAD detached at 0f326b3
nothing to commit, working directory clean
~/t/git-lfs-test> git show
commit 0f326b33ed9618037d5da82c944c2b7895786bd3
Author: Steve Streeting <steve@stevestreeting.com>
Date:   Fri Mar 4 10:31:09 2016 +0000

    Commit binary file without lfs tracking

diff --git a/spine_strips.dds b/spine_strips.dds
new file mode 100644
index 0000000..51538ca
Binary files /dev/null and b/spine_strips.dds differ
~/t/git-lfs-test> git checkout master
Previous HEAD position was 0f326b3... Commit binary file without lfs tracking
Switched to branch 'master'
Your branch is ahead of 'origin/master' by 2 commits.
  (use "git push" to publish your local commits)
~/t/git-lfs-test> git status
On branch master
Your branch is ahead of 'origin/master' by 2 commits.
  (use "git push" to publish your local commits)
nothing to commit, working directory clean
~/t/git-lfs-test> git show
commit 862702904dd605ba5a87dd80c1f5a8757595ecb9
Author: Steve Streeting <steve@stevestreeting.com>
Date:   Fri Mar 4 10:34:11 2016 +0000

    Recommit DDS when tracking

diff --git a/spine_strips.dds b/spine_strips.dds
index 51538ca..223dbee 100644
Binary files a/spine_strips.dds and b/spine_strips.dds differ

The only thing of note here is that git show on the final commit, where the file was turned into a pointer, still says "Binary files .. differ" - but this is just because git does this if one side of the diff is binary.

@dnewmarch

This comment has been minimized.

Copy link
Author

dnewmarch commented Mar 6, 2016

I've extended my initial set of commands listed above to attempt to demonstrate the issue. The following should work as a script (due to the sleep), and demonstrates the issue for me.

git init git-lfs-test
cd git-lfs-test
echo 0 > lfs.bin
sleep 1
git add lfs.bin
git commit -m "add file"

git lfs track "*.bin"
git add .gitattributes
git commit -m "track with lfs"

git checkout -b topic
touch foo.txt
git add foo.txt
git commit -m "added another file"
git status

git checkout master
git status
touch lfs.bin
git add lfs.bin
git commit -m "converted to pointer"

git checkout topic
git status

The git status just before switching off the topic branch shows no modifications, but the final git status when I switch back to the topic branch shows lfs.bin as modified.

@sinbad

This comment has been minimized.

Copy link
Contributor

sinbad commented Mar 7, 2016

Ah OK, I understand now - the reason is that on branch 'topic' you have tracked *.bin in LFS but haven't converted it at the same time, but ALSO because you already made changes to it in the working copy you had to return to topic from, it had to be checked out when you went back to topic. The .gitattributes forces *.bin to be put through the smudge filter, changing its contents and thus making it dirty. I've never come across this before because I've never gone backwards to this 'middle' state. Thanks for the script.

So perhaps this changes the guidance that to avoid this situation people should commit any existing files in git that are affected by 'git lfs track' in the same commit as the .gitattributes. That's what people usually do TBH - the vast majority of cases it's new files or they use BFG to convert history. Maybe we should warn about already tracked files when 'git lfs track' is executed. We could even automatically 'touch' files that are already tracked in git, match the lfs track command & do not show up in status.

@sinbad sinbad reopened this Mar 7, 2016

@sinbad

This comment has been minimized.

Copy link
Contributor

sinbad commented Mar 7, 2016

/cc @technoweenie what do you think about automatically touching files that match in git lfs track and are already committed, so they appear in git status and avoid this limbo state?

@technoweenie

This comment has been minimized.

Copy link
Member

technoweenie commented Mar 7, 2016

That sounds okay. We'd have to make sure LFS scans the files the same way that Git does, taking the filesystem's case sensitivity into account. Also gives the track command something more useful to do.

@sinbad

This comment has been minimized.

Copy link
Contributor

sinbad commented Mar 7, 2016

Yeah I think I can re-use the logic for include/exclude in fetch, that's based on the same matching (gitignore style).

@dnewmarch

This comment has been minimized.

Copy link
Author

dnewmarch commented Mar 7, 2016

This sounds like a good resolution to me. We've created a small mess for ourselves, and it might be atypical usage, but I'd wager we won't be the last people to go down this path, especially as more people take up git lfs usage who are wary of rewriting their history.

Thanks for your work.

sinbad added a commit that referenced this issue Mar 22, 2016

'git lfs track' now automatically touches matching files already in git
This makes them appear as modified in 'git status' which indicates they
need to be re-committed to be moved into LFS if already in git.
See #1057
@sinbad

This comment has been minimized.

Copy link
Contributor

sinbad commented Mar 22, 2016

Resolved by #1104

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment