Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Better way to identify conflict in a status #271

Open
yishaigalatzer opened this Issue Dec 11, 2012 · 30 comments

Comments

Projects
None yet
5 participants

When Calling Index.RetrieveStatus(path) and the underlying file is in a conflict state the returned value is currently multiple flags.

This behavior is changing from version to version and a bit hard to code reliably against.

It would be beneficial to consider a FileStatus.Conflict and return it in this case.

Member

nulltoken commented Dec 11, 2012

@yishaigalatzer What version of LibGit2Sharp are you running against (a call to Repository.Version should answer that)? Would you please be so kind as to provide us with some simple repro steps as well?

@ethomson I'm pretty sure libgit2/libgit2#1111 has been included in the latest binary upgrade.

Owner

ethomson commented Dec 12, 2012

Indeed it is. @yishaigalatzer you shouldn't be seeing conflicts in the status. Are you saying you want to see conflicts?

Version = 0.9.5-unknown-16e6cee (x86)

What I'm saying is I did a pull from command line, that indeed ends up in a conflict state. So far it's fine.

I'm looking to get a proper conflict state from libgit2sharp

Member

nulltoken commented Dec 13, 2012

I'm looking to get a proper conflict state from libgit2sharp

I thought that we were discussing about a bug in the way Index.RetrieveStatus(path) was working ( "the returned value is currently multiple flags" ). And now, I'm unsure of what you're after ( "I'm looking to get a proper conflict state from libgit2sharp" ).

Let's try another approach. Hopefully, it will get through my thick skull.

  • What are the repro steps to put the repository in the initial state?
  • What's the code you're running that returns an unexpected value?
  • What other value do you think should be returned?
Member

arrbee commented Dec 13, 2012

I suspect you all already know this, but to make it explicit: libgit2 currently ignores files in a conflict state in both diff and status. Unfortunately, I don't think I'll have time to tackle that situation until after the libgit2 1.0 release (i.e. in the next month) but at some point, I do think we should expand status to indicate that a file has conflicts and expand diff to show the 3-way diff that core git does when diffing a file with conflicts.

I'd love to have a discussion about how this should be exposed so we'll have a good API ready to go when we're ready to start on implementation.

Member

nulltoken commented Dec 22, 2012

@yishaigalatzer See (loosely) related #279

@arrbee I agree with your view of things. I do have a good working solution until that happens.

Feel free to close, or keep around to track @arrbee's suggestion

Member

dahlbyk commented Feb 13, 2013

To provide a specific example, in the conflict test repo provided by #334 our status looks like this:
+0 ~2 -5 | +7 ~0 -0 | i0

In git.git:

PS [master +2 ~2 -2 !4 | +2 ~0 -2 !4 !]> git status -sb
## master
UD ancestor-and-ours.txt
DU ancestor-and-theirs.txt
DD ancestor-only.txt
UU conflicts-one.txt
UU conflicts-two.txt
M  one.txt
AA ours-and-theirs.txt
AU ours-only.txt
UA theirs-only.txt
M  two.txt

From an API standpoint, I would think we need new status codes to stand in for a U above, e.g.:

    GIT_STATUS_INDEX_CONFLICT = (1u << 5),
    ...
    GIT_STATUS_WT_CONFLICT = (1u << 11),
Owner

ethomson commented Feb 13, 2013

@dahlbyk I'm not sure that I'm following what you mean when you say to stand in for the U...? Also, what's a WT_CONFLICT - a working tree conflict? I don't see status raising one of these, but I haven't thought much about the checkout case - does it return a GIT_STATUS?

Owner

ethomson commented Feb 13, 2013

Wait, I think I'm being dense, you're talking about the U as an Unmerged status entry, I think.

If you want a report like the one above, I think you'll need an enum that can cover all the conflict cases (sans rename conflicts -- although that's an interesting discussion for another day): https://github.com/ethomson/libgit2/blob/merge/include/git2/diff_tree.h#L54

Owner

ethomson commented Feb 13, 2013

But to be honest, I'm not sure I entirely understand the output of git status -sb, so there's that, also.

Member

dahlbyk commented Feb 13, 2013

Yeah, you're right... I didn't really think this all the way through. git status docs give a nice summary of the possible combinations:

A  A  unmerged, both added       # CONFLICT_BOTH_NEW
A  U  unmerged, added by us      # CONFLICT_OURS_NEW
U  A  unmerged, added by them    # CONFLICT_THEIRS_NEW
D  D  unmerged, both deleted     # CONFLICT_BOTH_DELETED
D  U  unmerged, deleted by us    # CONFLICT_OURS_DELETED
U  D  unmerged, deleted by them  # CONFLICT_THEIRS_DELETED
U  U  unmerged, both modified    # CONFLICT_BOTH_MODIFIED

Or perhaps it would be better to represent conflict status in OURS/THEIRS pairs?

A  A  unmerged, both added       # CONFLICT_OURS_NEW      | CONFLICT_THEIRS_NEW
A  U  unmerged, added by us      # CONFLICT_OURS_NEW      | CONFLICT_THEIRS_MODIFIED
U  A  unmerged, added by them    # CONFLICT_OURS_MODIFIED | CONFLICT_THEIRS_NEW
D  D  unmerged, both deleted     # CONFLICT_OURS_DELETED  | CONFLICT_THEIRS_DELETED
D  U  unmerged, deleted by us    # CONFLICT_OURS_DELETED  | CONFLICT_THEIRS_MODIFIED
U  D  unmerged, deleted by them  # CONFLICT_OURS_MODIFIED | CONFLICT_THEIRS_DELETED
U  U  unmerged, both modified    # CONFLICT_OURS_MODIFIED | CONFLICT_THEIRS_MODIFIED
Member

dahlbyk commented Feb 13, 2013

Or in the context of the test repo:

Current status:

ancestor-and-ours.txt    Removed, Untracked
ancestor-and-theirs.txt  Untracked
ancestor-only.txt        Nonexistent
conflicts-one.txt        Removed, Untracked
conflicts-two.txt        Removed, Untracked
ours-and-theirs.txt      Removed, Untracked
ours-only.txt            Removed, Untracked
theirs-only.txt          Untracked

Instead, maybe something like this?

ancestor-and-ours.txt    ConflictOursModified, ConflictTheirsDeleted
ancestor-and-theirs.txt  ConflictOursDeleted, ConflictTheirsModified
ancestor-only.txt        ConflictOursDeleted, ConflictTheirsDeleted
conflicts-one.txt        ConflictOursModified, ConflictTheirsModified
conflicts-two.txt        ConflictOursModified, ConflictTheirsModified
ours-and-theirs.txt      ConflictOursUntracked, ConflictTheirsUntracked
ours-only.txt            ConflictOursUntracked, ?
theirs-only.txt          ?, ConflictTheirsUntracked

I'm not really sure how you get to the ours-only or theirs-only state...

How about that also FileStatus will be including a new flag Unmerged without all the details. You can then access a conflict or unmerged collection and get the details above in one of @dahlbyk suggestions

Owner

ethomson commented Feb 13, 2013

The *-only.txt came as a result of a rename conflict. I renamed ancestor-only.txt to ours-only.txt in ours and theirs-only.txt in theirs. core git doesn't do any rename detection after a merge, so you don't get a lot of insight into rename conflicts.

As a person who doesn't use the API much, I don't feel like I have good insight into how I would expect this to look.

Member

dahlbyk commented Feb 13, 2013

How about that also FileStatus will be including a new flag Unmerged without all the details. You can then access a conflict or unmerged collection and get the details above in one of @dahlbyk suggestions

I agree that cluttering up RepositoryStatus with six new collections is not a good idea, but having the changes partitioned into lists would be useful. Maybe RepositoryStatus.Conflicted with subcollections for OursModified, TheirsDeleted, etc?

As a person who doesn't use the API much, I don't feel like I have good insight into how I would expect this to look.

From a posh-git standpoint, I would prefer a separate list for each conflict type since I provide tab completion for git add, git rm, etc. as appropriate to aid in marking conflicts as resolved.

Member

dahlbyk commented Feb 13, 2013

@ethomson thanks for clarifying *-only.txt. I guess we also need OURS_RENAMED and THEIRS_RENAMED for that case?

ancestor-only.txt        ConflictOursDeleted, ConflictTheirsDeleted
ours-only.txt            ConflictOursRenamed, N/A
theirs-only.txt          N/A, ConflictOursRenamed
Owner

ethomson commented Feb 13, 2013

Generally, I would not be in favor of adding information about rename conflicts and would prefer to call it "added by us" or "added by them" like core git does, since there's so little context in the index. One renames get slightly complicated (eg, 2->1, both renamed, etc), all hope of identifying them as such is lost. So we can really only surface simple rename conflicts and 1->2 conflicts as "renamed in ours" or "renamed in theirs".

More importantly, since the index makes no distinction about renames, it's not impossible that a merge tool could simply call an add a conflict and you have a conflict with only one side that does not represent a rename.

I'm not aware of a merge strategy that does this, but in the future, I would imagine that add/add conflicts that differ only in case could be dealt with this way. Currently, this is a clean merge, but realistically this is probably not what you would expect if you have a case insensitive working tree. I would love to see a merge option that does a case insensitive compare, thus you could end up with an add/add conflict between BAR and bar:

Ancestor Ours Theirs
foo
BAR
bar

So in the above example, you cannot discern between a both renamed conflict and an add/add conflict with different case.

Again, none of git-merge-* behaves this way, but there's also nothing stopping one from doing so.

Member

dahlbyk commented Feb 13, 2013

Generally, I would not be in favor of adding information about rename conflicts and would prefer to call it "added by us" or "added by them" like core git does, since there's so little context in the index.

Works for me.

Member

arrbee commented Feb 13, 2013

Thanks to @dahlbyk for giving this discussion a good kick and bringing the knowledge! Let me try to jump in...

If we take the table from the git status help page and translate it to libgit2 codes, I think for the non-conflicted cases we get:

X          Y     libgit2
-------------------------------------------------
          [MD]   WT_MODIFIED or WD_DELETED
M        [ MD]   INDEX_MODIFIED + (none, WT_MODIFIED, WT_DELETED)
A        [ MD]   INDEX_ADDED + (none, WT_MODIFIED, WT_DELETED)
D         [ M]   INDEX_DELETED + (none or WT_MODIFIED)
R        [ MD]   INDEX_RENAMED + (none, WT_MODIFIED, WT_DELETED)
C        [ MD]   INDEX_COPIED (*missing*) + (none, WT_MODIFIED, WT_DELETED)
[MARC]           (INDEX_NEW, INDEX_MODIFIED, INDEX_DELETED, INDEX_COPIED*)
[ MARC]     M    (none, INDEX_NEW, INDEX_MODIFIED, INDEX_DELETED, INDEX_COPIED*) + WT_MODIFIED
[ MARC]     D    (none, INDEX_NEW, INDEX_MODIFIED, INDEX_DELETED, INDEX_COPIED*) + WT_DELETED
?           ?    WT_NEW
!           !    IGNORED

That table is a bit odd since there is overlap between the items. I don't know that it's particularly easier to read than just enumerating the permutations. But whatever.

Also, while we do have GIT_STATUS_INDEX_RENAMED, we don't currently have a version for COPIED. Our status doesn't currently have an option for invoking rename / copy detection.

It is probably worth pointing out that, like core git, we don't (or at least shouldn't) have a way to give INDEX_NEW + WT_NEW or INDEX_DELETED + WT_DELETED as output when we don't consider conflicts. Although the constants exist, there should never be a circumstance in the current code that would generate those pairings. Thus, as with current git, it is acceptable (albeit confusing to my mind) to use those combinations to represent conflicts.

So, what could the conflict states look like? The simplest proposal to match core git would probably be:

X          Y     libgit2 (proposed)
-------------------------------------------------
D           D    INDEX_DELETED + WT_DELETED
A           U    INDEX_ADDED + WT_UNMERGED*
U           D    INDEX_UNMERGED* + WT_DELETED
U           A    INDEX_UNMERGED* + WT_ADDED
D           U    INDEX_DELETED + WT_UNMERGED*
A           A    INDEX_ADDED + WT_ADDED
U           U    INDEX_UNMERGED* + WT_UNMERGED*
-------------------------------------------------

I say this is the "simplest" proposal because it maximizes the reuse of existing status flags (introducing just GIT_STATUS_INDEX_UNMERGED and GIT_STATUS_WT_UNMERGED) and emulates the git output. To my mind, this is also an extremely confusing result because we chose to label the X column as INDEX and the Y column as WT which is true for the non-conflict entries, but not right for the conflict entries.

This behavior is reasonable, I guess, since if you have an unresolved conflict on file X, since there is a good chance that conflict markers have been inserted into the working tree version of X, and it doesn't make sense for status to even bother with examining the working directory contents.

Alternatively, we could introduce a new set of constants: GIT_STATUS_OURS_DELETED, GIT_STATUS_OURS_ADDED, GIT_STATUS_OURS_MODIFIED, GIT_STATUS_THEIRS_DELETED, GIT_STATUS_THEIRS_ADDED, GIT_STATUS_THEIRS_MODIFIED and use those instead when there is a conflict.

As a third alternative, we could defined GIT_STATUS_OURS_DELETED to the same value as GIT_STATUS_INDEX_DELETED, etc. and introduce a single additional bit flag GIT_STATUS_CONFLICT. Then conflict entries would be marked as such with that single extra bit and the use would know to use the OURS/THEIRS interpretation of the flags instead of the INDEX/WT interpretation (although the numeric values would be identical).

Thoughts?

Member

dahlbyk commented Feb 13, 2013

Thanks for chiming in @arrbee!

As a variation on the third alternative, could you define GIT_STATUS_OURS_DELETED as GIT_STATUS_INDEX_DELETED | GIT_STATUS_CONFLICT?

Using 3 over 2 would conserve several bits, but are we really worried about running out? Using all new constants would bring us to using 17 of 31.

Owner

ethomson commented Feb 13, 2013

@dahlbyk i've run out of sanity and hair, i fear that i'm going to run out of bits next.

Member

arrbee commented Feb 13, 2013

@dahlbyk You're right, it's true we're in no huge risk of running out of bits (although looking at the API, I'm wondering if we should convert the status_flags parameter in the git_status_cb from and unsigned int to something of a definitive bit length like uint32_t or such). At some level, aesthetically I find it displeasing to have two sets of parallel constants that are mutually exclusive to one another, which is why I lean towards sharing the values, but I would rather let real-world usage patterns dictate what makes sense.

One use case is emulating git's output. In that case, it might be a minor advantage to use the existing ADDED and DELETED constants, since the porcelain / short output would output an A or D for those cases and you would just have to special case MODIFIED to output U (provided we don't add completely novel values for the UNMERGED constants).

What do you think about this:

GIT_STATUS_CONFLICT         = (1u << 16),

GIT_STATUS_OURS_ADDED       = GIT_STATUS_INDEX_NEW | GIT_STATUS_CONFLICT,
GIT_STATUS_OURS_DELETED     = GIT_STATUS_INDEX_DELETED | GIT_STATUS_CONFLICT,
GIT_STATUS_OURS_UNMERGED    = (1u << 17) | GIT_STATUS_CONFLICT,

GIT_STATUS_THEIRS_ADDED     = GIT_STATUS_WT_NEW | GIT_STATUS_CONFLICT,
GIT_STATUS_THEIRS_DELETED   = GIT_STATUS_WT_DELETED | GIT_STATUS_CONFLICT,
GIT_STATUS_THEIRS_UNMERGED  = (1u << 18) | GIT_STATUS_CONFLICT,

That adds new values for the two UNMERGED cases, instead of reusing the modified bits, since git status -s doesn't reuse the M and uses a U in the output instead, but does reuse the A and D. I also like the idea of always blending the GIT_STATUS_CONFLICT flag with these constants.

Member

arrbee commented Feb 13, 2013

I don't want you to think I lean too heavily towards that last suggestion. I actually find it a little confusing that git doesn't call the UNMERGED cases "MODIFIED", so maybe we should just use MODIFIED in libgit2 (because if there is a conflict, then the file was modified, right?). I want to find that line of making the libgit2 values easy to understand but also make emulating core git easy as well. Please keep the alternate suggestions coming!

Owner

ethomson commented Feb 13, 2013

I was not in love with your option #1 (above) but now that I see it, it's grown on me. Let's consider the cases, and I apologize, I find it much easier to visualize at the index level.

To note, I don't think there's a case where you can have OURS_UNMERGED without THEIRS_UNMERGED, so I feel like maybe those should be collapsed to a BOTH_UNMERGED? Although you could take this a step further and just use CONFLICT for that. Eg:

Anc Our Their Code Description Bits
X X X UU Unmerged (both modified) GIT_STATUS_CONFLICT
X X AA Unmerged (both added) GIT_STATUS_OURS_ADDED + GIT_STATUS_THEIRS_ADDED
X X DU Unmerged (delete by us) GIT_STATUS_OURS_DELETED
X X UD Unmerged (deleted by them) GIT_STATUS_THEIRS_DELETED
X DD Both deleted GIT_STATUS_OURS_DELETED + GIT_STATUS_THEIRS_DELETED
X AU Unmerged (added by us) GIT_STATUS_OURS_ADDED
X UA Unmerged (added by them) GIT_STATUS_THEIRS_ADDED
Member

arrbee commented Feb 13, 2013

Hmm, I kind of like that. What do you feel about OURS_ADDED == INDEX_ADDED, etc.? We certainly have space to add the 5 additional constants that this dictates (CONFLICT, O_A, O_D, T_A, T_D) without reuse, so I could go either way.

If it will make it easier in the libgit2sharp bindings to have these as completely separate constants, then let's go that way - someone more expert in C# aesthetics than me will probably have comment on that, however.

By the way, my current thinking on libgit2 implementation is to handling this inside git_status_foreach completely for now (i.e. don't modify the diff code; instead just have status check explicitly for conflicts when it detects a "missing" index entry in diffs). Eventually it would be nice to have diff be conflict aware, but I started looking at the code to do that and it appears to be substantially more complex.

Owner

ethomson commented Feb 13, 2013

Despite my earlier protestations, after thinking about it, I have no complaints with OURS_ADDED == CONFLICT + INDEX_ADDED, if that's what you're asking. I assume you're not suggesting they be strictly equal?

Member

dahlbyk commented Feb 13, 2013

although looking at the API, I'm wondering if we should convert the status_flags parameter in the git_status_cb from and unsigned int to something of a definitive bit length like uint32_t or such

👍

@ethomson Even if OURS_DELETED implies THEIRS_MODIFIED or vice versa (though with renames/copies, it feels like there might be a DELETED+ADDED edge case somehow), there might be value in being explicit about the other side of the conflict.

@arrbee I'm also ambivalent about *_UNMERGED vs *_MODIFIED. The tie-breaker in my mind is the symmetry with the other OURS/THEIRS definitions - why is modification a special case? If one wishes to emulate git.git it's easy enough to check for CONFLICT in that one special case.

If it will make it easier in the libgit2sharp bindings to have these as completely separate constants, then let's go that way - someone more expert in C# aesthetics than me will probably have comment on that, however.

It will work fine either way. Enum members can be defined in terms of others, and our status foreach handler is already designed to look for exact matches, so the current code will work as is until we decide what to do with conflict entries.

Owner

ethomson commented Feb 13, 2013

There's no rename breaking in merge, so there's no possibility of a delete+add at the moment. (At least not with git-merge-recursive, again the whole pluggable nature makes things sort of questionable.) But still there's no way to determine that from the index format anyway without smacking those files over the head with a rename heuristic... Which I think would be very interesting, but probably not a 1.0 sort of feature.

Member

dahlbyk commented Feb 13, 2013

smacking those files over the head with a rename heuristic

A lovely mental image, this.

@yorah yorah referenced this issue Apr 23, 2013

Closed

Add a bunch of stuffs to Index.Remove() #398

4 of 4 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment