New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index tree-bob collision #2085

Merged
merged 9 commits into from Jan 30, 2014

Conversation

Projects
None yet
5 participants
@vmg
Copy link
Member

vmg commented Jan 29, 2014

Here's @arrbee's test and my proposed fix, which is ported verbatim from Git and doesn't quite work. We're now getting a couple errors about collisions from merge. @ethomson, could you please give this a look?

I am not too excited about this because the fix has been backported quite haphazardly from Core Git. It seems we're doing a lot of things poorly in our index implementation... I am strongly considering scratching it and porting it verbatim from Core Git, as we have licensing permission for most of the code in that file. The Index is one of the oldest parts of libgit2 and it's starting to show its age. Any feedback?

@ethomson

This comment has been minimized.

Copy link
Member

ethomson commented Jan 29, 2014

Hmm. I will take a look at this as soon as I can, but from the test that's failing, I think that it is not taking stage numbers into account. A typechange conflict will result in a file in the index at high stages and a file in the index with 0-stage, with that previous file as its parent.

All those tests were produced using core git's outputs as the expected values.

It's easier to show it. In this example, I have 'file' as a blob in ancestor and master. In the branch I'm merging in, it's a tree that contains a single entry 'not_a_file_anymore`:

C:\Temp\TestRepos\foo65>git merge branch
Adding file/not_a_file_anymore
CONFLICT (modify/delete): file deleted in branch and modified in HEAD. Version HEAD of file left in tree at file~HEAD.
Automatic merge failed; fix conflicts and then commit the result.

C:\Temp\TestRepos\foo65>git ls-files --stage
100644 f73f3093ff865c514c6c51f867e35f693487d0d3 1       file
100644 1ff865098b7b96faec79e9235fa43282eed7c0a4 2       file
100644 b852a0ac91f620d18c7316333e8758558549c967 0       file/not_a_file_anymore

It's also possible those tests are not accurate. I'll take a look when I have a moment.

@vmg

This comment has been minimized.

Copy link
Member

vmg commented Jan 29, 2014

@peff: I'd appreciate a couple extra eyes. <3

@ethomson

This comment has been minimized.

Copy link
Member

ethomson commented Jan 29, 2014

Ah, nevermind, obviously that was me being optimistic. It looks like my index merging isn't happy, looking into it now.

arrbee and others added some commits Jan 29, 2014

Give index_isrch the same semantics as index_srch
In case insensitive index mode, we would stop at a prefixed entry,
treating the provided search key length as a substring, not the
length of the string to match.
Two-phase index merging
When three-way merging indexes, we previously changed each path
as we read them, which would lead to us adding an index entry for
'foo', then removing an index entry for 'foo/file'.  With the new
index requirements, this is not allowed.  Removing entries in the
merged index, then adding them, resolves this.  In the previous
example, we now remove 'foo/file' before adding 'foo'.
@ethomson

This comment has been minimized.

Copy link
Member

ethomson commented Jan 29, 2014

So the issue here is that the index merging relied on us being able to (temporarily) have a file and dir at the same spot in the index as it applied changes in path order. So if you had a file named foo/file in your index and the merge result would remove foo/file and create foo, then the steps would be to add foo to the index, then remove foo/file. It did this by calling the normal old git_index_add functions that everybody else calls and that do sane error checking. Or at least they do now.

I turned this into a two-stage process where we apply the deletes then apply the adds. This makes it a little less efficient, so I may come back and move this into index.c and make this work like it did before, only without the aforementioned checks. But I will probably wait until I come through and do some other perf improvements on merge that I have in mind.

Also - when running on a case insensitive filesystem, I was seeing additional failures. It turns out that the index_isrch was substring matching. Doh! I gave it the same semantics as index_srch.

@ethomson

This comment has been minimized.

Copy link
Member

ethomson commented Jan 29, 2014

I don't yet know why those checkout tests are failing. I can take a look at this later, if nobody beats me to it, but I have to run at the moment.

@arrbee

This comment has been minimized.

Copy link
Member

arrbee commented Jan 30, 2014

I'll look into the checkout issues - this is most likely an issue similar to the merge problem where checkout is updating the index while checking out a typechange. Checkout is already written to process deletes before adds, so I would have expected it to be immune to these issues, but perhaps there is a case where that does not happen (maybe with submodules?). I'll check it out...

@vmg

This comment has been minimized.

Copy link
Member

vmg commented Jan 30, 2014

I'll check it out...

ha ha ha

src/index.c Outdated
return retval;
}

static int check_file_directory_conflict(git_index *index,

This comment has been minimized.

@ethomson

ethomson Jan 30, 2014

Member

Can we call this a "collision" instead of a "conflict"? We use "conflict" elsewhere in the index handling to mean a high-stage entry, and we explicitly use the terminology "directory/file conflict" to mean something else.

This comment has been minimized.

@arrbee

arrbee Jan 30, 2014

Member

I prefer that too - updated. @vmg feel free to remove if you want to keep the old name.

arrbee added some commits Jan 30, 2014

Force explicit remove of files instead of defer
The checkout code used to defer removal of "blocking" files in
checkouts until the blocked item was actually being written (since
we have already checked that the removing the block is acceptable
according to the update rules).  Unfortunately, this resulted in
an intermediate index state where both the blocking and new items
were in the index which is no longer allowed.  Now we just remove
the blocking item in the first pass so it never needs to coexist.

In cases where there are typechanges, this could result in a bit
more churn of removing and recreating intermediate directories,
but I'm going to assume that is an unusual case and the churn will
not be too costly.
Fix checkout NONE to not remove file
If you are checking out NONE, then don't remove.
@arrbee

This comment has been minimized.

Copy link
Member

arrbee commented Jan 30, 2014

Okay, so the checkout problem appears to be that we were relying on a lazy delete in the case of a typechange from a blob to a tree (or vice versa) and this was allowing the deleted index entry to still be present when the new item was being added to the index. I made the removal of files a little more proactive and fixed the error.

@ethomson

This comment has been minimized.

Copy link
Member

ethomson commented Jan 30, 2014

👍

@vmg

This comment has been minimized.

Copy link
Member

vmg commented Jan 30, 2014

Man, one of the first times where my code actually worked as expected and other code did not. ;)

Thank you so so much @ethomson and @arrbee for going ham on this stuff. I'm in the middle of some internal stuff atm, so I'm gonna let this sit for a few hours, in case anybody wants throw more eyes at it.

Also yeah definitely use collision everywhere. Index conflicts have a very distinct meaning.

@peff

This comment has been minimized.

Copy link
Member

peff commented Jan 30, 2014

Overall your approach looks OK. I think the CE_REMOVE stuff from git gave you some confusion. That is not so much "we are planning an operation to remove this from the index" as "our data structure does not support removing the element efficiently, so mark it with a flag and ignore it" (of course we use the latter when doing the former). It looks like you just shorten the vector directly, which is simpler, though you may be paying the price in the resulting memmove.

vmg added a commit that referenced this pull request Jan 30, 2014

@vmg vmg merged commit 8646b0a into development Jan 30, 2014

1 check passed

default The Travis CI build passed
Details

rtyley added a commit to rtyley/bfg-repo-cleaner that referenced this pull request Feb 1, 2014

Add option to fix tree-blob filename collision corruption
The CocoaPods Specs repo (https://github.com/CocoaPods/Specs) suffered
corruption due to an issue in libgit2, resulting in an identically named
tree and blob occupying the same folder:

libgit2/libgit2#2085 (comment)
http://blog.cocoapods.org/Repairing-Our-Broken-Specs-Repository/

Hopefully this issue has not affected many more repositories, but it was
interesting to see how history in affected repositories might be
re-written to remove the corruption.

Prior to this change, The BFG was not reliable at removing this kind of
corruption (using ie '--delete-files 2.0.0'). Due to the way it models Git
trees (a model that doesn't permit duplicate filenames) it couldn't see
that the affected tree contained a file with the supplied name (as it was
masked by the identically-named folder) and so didn't generate a cleaned
tree.
@rtyley

This comment has been minimized.

Copy link
Contributor

rtyley commented Feb 1, 2014

Just a note for any users who've been affected by this issue (hopefully there aren't many) - the latest release of The BFG (v1.11.1) adds a switch to make removing the duplicate tree-entries easy (as an alternative to using git-filter-branch):

$ bfg --fix-filename-duplicates-preferring tree

I've tested this against a copy of the corrupted CocoaPods/Specs repo, getting a clean git fsck in just over 2 minutes.

@arrbee arrbee deleted the rb/index-tree-blob-collision branch Feb 3, 2014

phatblat pushed a commit to phatblat/libgit2 that referenced this pull request Sep 13, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment