Teach status to recognize multiple files with identical contents #1550
This pull request teaches the `git lfs status` command to recognize multiple files that share the same contents. By modifying the behavior of `lfs.ScanIndex`, we now maintain a slice of file metadata associated with a given OID, instead of a one-to-one association (as before). `ScanIndex` now only accepts unique pointer OIDs (even if there are multiple pointers for a given OID, in the case of many files sharing the same contents), and then resolves them all at once to multiple files, per the data stored in the `indexFileMap`.

While working on this, @technoweenie and I found what may be a Git core bug having to do with checking a file that is both in the index and working tree (i.e., it appears in both `git diff-index` and `git diff-index --cached`). I stashed a failing test that I wrote to demonstrate this behavior in this Gist.

As an aside, this PR is a temporary measure. Maintaining these extra caches is not a long-term solution that I'd like to keep; this is more of a hack to solve this bug quickly.
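
For reviewers, the one-to-many association between an OID and its files can be sketched roughly as follows. This is an illustration only, not the code in this PR: the `indexFile` struct, its fields, and the `Add`, `UniqueOIDs`, and `FilesFor` methods are hypothetical names, and the layout of `indexFileMap` is assumed.

```go
package main

import "fmt"

// indexFile is a hypothetical stand-in for the per-file metadata
// tracked for each entry found while scanning the index.
type indexFile struct {
	Name   string
	Status string
}

// indexFileMap (layout assumed for illustration) keeps every file that
// shares an OID, rather than a single file per OID.
type indexFileMap struct {
	files map[string][]indexFile
	order []string // unique OIDs in first-seen order
}

func newIndexFileMap() *indexFileMap {
	return &indexFileMap{files: make(map[string][]indexFile)}
}

// Add records f under oid and reports whether oid had not been seen
// before, so a caller can pass each OID to the scanner only once.
func (m *indexFileMap) Add(oid string, f indexFile) bool {
	_, seen := m.files[oid]
	m.files[oid] = append(m.files[oid], f)
	if !seen {
		m.order = append(m.order, oid)
	}
	return !seen
}

// UniqueOIDs returns each OID once, however many files point at it.
func (m *indexFileMap) UniqueOIDs() []string { return m.order }

// FilesFor resolves one OID back to every file that shares it.
func (m *indexFileMap) FilesFor(oid string) []indexFile { return m.files[oid] }

func main() {
	m := newIndexFileMap()
	m.Add("abc123", indexFile{Name: "a.dat", Status: "A"})
	m.Add("abc123", indexFile{Name: "copy-of-a.dat", Status: "A"})
	m.Add("def456", indexFile{Name: "b.dat", Status: "M"})

	for _, oid := range m.UniqueOIDs() {
		for _, f := range m.FilesFor(oid) {
			fmt.Println(oid, f.Status, f.Name)
		}
	}
}
```

In this sketch the OID list is deduplicated on the way in, and the fan-out to multiple files happens only when results are resolved, which mirrors the behavior described above.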
/cc @technoweenie @sinbad @rubyist