Update checkout with new strategies & behavior #1016
I don't think that changes that improve performance without adding clutter can be "questionable". I don't care if accessing diff internals for checkout is not "elegant" -- it does seem to be faster and just as clean. The fileops changes look great to me. Haven't looked at the actual checkout code. |
Conflictio progress callbackio |
Rebased |
@nulltoken @ben Okay, this is just an idea about direction for checkout. I'm just looking for feedback - not that this is actually ready to move forward with... Instead of just doing a diff between the index and the working directory, we would do something more like a merged status diff, where the HEAD, index, and working directory were all factored in. The code would be trying to make the working directory look like the index, but the differences with the HEAD would be accounted for. What do you think if the checkout flags became something like:
/** A dry run with no changes applied */
GIT_CHECKOUT_NONE = 0,
/** Update files in working dir that match HEAD to match index;
*
* Possible errors: untracked file exists that is in new HEAD, modified
* file exists that has to be created/deleted for new HEAD, unmerged
* entries exist in index.
*/
GIT_CHECKOUT_SAFE = (1u << 0),
/** Update files to match index even if they don't match HEAD; ignores
* unmerged entries in index; errors: untracked file exists that is in
* new HEAD.
*/
GIT_CHECKOUT_HARD = (1u << 1),
/** Allow overwrite of existing working dir files even if not in HEAD */
GIT_CHECKOUT_OVERWRITE_CONFLICTS = (1u << 2),
/** Remove working dir files not in index (and not ignored) */
GIT_CHECKOUT_REMOVE_UNTRACKED = (1u << 3),
/** For unmerged files, checkout stage 2 from index */
GIT_CHECKOUT_USE_OURS = (1u << 4),
/** For unmerged files, checkout stage 3 from index */
GIT_CHECKOUT_USE_THEIRS = (1u << 5),
/** Checkout submodules if submodule HEAD moved in parent tree */
GIT_CHECKOUT_UPDATE_SUBMODULES_IF_CHANGED = (1u << 6),
/** Checkout submodule with same options; i.e. HARD checkout will do
* HARD update of submodule, too
*/
GIT_CHECKOUT_UPDATE_SUBMODULES = (1u << 7),
/** Normal checkout will preflight the entire operation to make sure
* there are no conflicts before making any actual changes. This
* flag skips the preflight and just starts doing the work
*/
GIT_CHECKOUT_NON_ATOMIC = (1u << 10),
If the working directory had been modified from the HEAD, then with In
The Lastly, the Again, I only post this speculatively based on some thinking about what I'd like to see in the API. Please feel free to tear it apart. I really want to get this right. |
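To make the proposal a bit more concrete, here is a minimal sketch of how a caller might combine and inspect these strategy bits. The enum is an abridged copy of the proposed flags above (not necessarily what ends up in `checkout.h`). The helpers `strategy_is_dry_run`, `strategy_is_valid` and `strategy_conflict_stage` are hypothetical names for illustration, and the rule that OURS and THEIRS are mutually exclusive is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Abridged copy of the proposed flags; values as written in the proposal. */
enum {
    GIT_CHECKOUT_NONE                = 0,
    GIT_CHECKOUT_SAFE                = (1u << 0),
    GIT_CHECKOUT_HARD                = (1u << 1),
    GIT_CHECKOUT_OVERWRITE_CONFLICTS = (1u << 2),
    GIT_CHECKOUT_REMOVE_UNTRACKED    = (1u << 3),
    GIT_CHECKOUT_USE_OURS            = (1u << 4),
    GIT_CHECKOUT_USE_THEIRS          = (1u << 5)
};

/* Hypothetical helper: a dry run is simply the absence of any bit. */
static bool strategy_is_dry_run(unsigned int strategy)
{
    return strategy == GIT_CHECKOUT_NONE;
}

/* Assumed rule: stage 2 and stage 3 cannot both be requested. */
static bool strategy_is_valid(unsigned int strategy)
{
    unsigned int sides = GIT_CHECKOUT_USE_OURS | GIT_CHECKOUT_USE_THEIRS;
    return (strategy & sides) != sides;
}

/* Index stage to check out for an unmerged file; 0 means "leave it". */
static int strategy_conflict_stage(unsigned int strategy)
{
    assert(strategy_is_valid(strategy));
    if (strategy & GIT_CHECKOUT_USE_OURS)
        return 2;
    if (strategy & GIT_CHECKOUT_USE_THEIRS)
        return 3;
    return 0;
}
```

A caller would OR the bits together, e.g. `GIT_CHECKOUT_SAFE | GIT_CHECKOUT_REMOVE_UNTRACKED | GIT_CHECKOUT_USE_THEIRS`, and the checkout code would test individual bits with `&`.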
Let me see if I'm synthesizing this right:
I like this mapping a lot better than what we have now. We were tending to think of it as a copy (which made sense for cloning), but I guess when you start looking at it like a merge, the approach becomes more clear. I don't think there are any other flags to core git's checkout that make sense here ( tl;dr: 👍 This is way better. |
I agree with @ben. Those flags make much more sense. Thanks for having Two questions:
|
@nulltoken Thanks for your thoughts!
|
If by
I agree. I'd even say that provided the process can notify the users of conflicts and checks out as much as it's allowed to (taking into account the flags), it could even be seen as a safe multi-pass/incremental operation. |
Okay, so that was more complicated than I thought it would be. I just force pushed a big update here with new checkout strategy flags that resemble the proposal. When I actually sat down to implement it I made a few changes, but I think it is similar to what we discussed.
There are still two test failures in the code. One related to submodules I need to investigate. The other is related to reset behavior. The test
There are a number of other refactorings embedded in this PR now that laid the groundwork for the rewrite of checkout. I tried to break them out into separate commits. We might want to cherry pick some of them into a separate PR and get them merged (e.g. lazy eval of ignores by iterators, moving pathspec code into a separate file) just to keep this PR focused, or we can just leave them in here and focus on getting this resolved.
At this point, I'd love it if folks could start by reviewing the updated include/git2/checkout.h and giving feedback on the reworked API. I'd love to hear opinions on that while I fix the remaining broken test and do some valgrinding. |
Okay, so I fixed the one regression that I hadn't had time to analyze. The second issue is with the reset test. I don't think the old behavior is correct. Unfortunately, I rewrote the reset hard test to check the behavior that I believe it should have, and we still don't match the results of With the rewritten checkout, it is probably not that much work to rearrange the reset code to do things in the right order, but this PR has already turned into a lot of code. I'll wait for some discussion before making any further changes (although I may push my extended reset test, even though it is still failing). |
@arrbee You're right, this test is incorrect. It lacks a commit and wrongly asserts that subdir should be removed. It should rather reflect the following
|
Okay, so I updated the reset test in a slightly different manner, but my plan is to open a new PR shortly to fix the reset behavior. In that PR I will implement some new reset tests including the one you propose above @nulltoken. I think there are potentially two things that might be considered missing in the PR.
Does that sound reasonable? |
Travis fixes on the way... |
There is still a memory leak in here. I'm working on it... |
* Rework GIT_DIRREMOVAL values to GIT_RMDIR flags, allowing combinations of flags
* Add GIT_RMDIR_EMPTY_PARENTS flag to remove parent dirs that are left empty after removal
* Add GIT_MKDIR_VERIFY_DIR to give an error if item is a file, not a dir (previously an EEXISTS error was ignored, even for files) and enable this flag for git_futils_mkpath2file call
* Improve accuracy of error messages from git_futils_mkdir
So, @nulltoken created a failing test case for checkout that proved to be particularly daunting. If checkout is given only a very limited strategy mask (e.g. just GIT_CHECKOUT_CREATE_MISSING) then it is possible for typechange/rename modifications to leave it unable to complete the request. That's okay, but the existing code did not have enough information not to generate an error (at least for tree/blob conflicts). This led me to a significant reorganization of the code to handle the failing case, but it has three benefits:
1. The test case is handled correctly (I think).
2. The new code should actually be much faster than the old code since I decided to make checkout aware of diff list internals.
3. The progress value accuracy is hugely increased since I added a fourth pass which calculates exactly what work needs to be done before doing anything.
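The third point can be sketched roughly as follows: count the actionable entries first, then report progress against that exact total while applying them. This is a simplified two-pass illustration, not the checkout code in this PR. The `action_t` codes, `count_work`, `apply_actions` and the `record_progress` callback are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical action codes; not libgit2 names. */
typedef enum { ACT_NONE, ACT_CREATE, ACT_UPDATE, ACT_REMOVE } action_t;

typedef void (*progress_cb)(size_t completed, size_t total, void *payload);

/* Example payload that remembers the most recent progress report. */
struct progress {
    size_t calls, completed, total;
};

static void record_progress(size_t completed, size_t total, void *payload)
{
    struct progress *p = payload;
    p->calls++;
    p->completed = completed;
    p->total = total;
}

/* Counting pass: how many entries will actually be touched? */
static size_t count_work(const action_t *actions, size_t n)
{
    size_t i, total = 0;
    for (i = 0; i < n; i++)
        if (actions[i] != ACT_NONE)
            total++;
    return total;
}

/* Apply pass: progress is reported against the exact precomputed total. */
static size_t apply_actions(const action_t *actions, size_t n,
                            progress_cb cb, void *payload)
{
    size_t i, done = 0, total = count_work(actions, n);
    for (i = 0; i < n; i++) {
        if (actions[i] == ACT_NONE)
            continue;
        /* real code would create, update or remove a file here */
        done++;
        if (cb)
            cb(done, total, payload);
    }
    assert(done == total);
    return done;
}
```

Because the total is known before any file is touched, the callback never sees the denominator change partway through.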
This makes it so that the check if a file is ignored will be deferred until requested on the workdir iterator, instead of aggressively evaluating the ignore rules for each entry. This should improve performance because there will be no need to check ignore rules for files that are already in the index.
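As a rough illustration of the deferred check, an entry can cache a tri-state "ignored" value that is only computed the first time someone asks. This is a sketch under assumptions: `wd_entry`, `entry_is_ignored` and the toy `match_ignore_rules` (which treats `.o` files as ignored) are hypothetical and are not the iterator code in this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum {
    IGNORE_UNKNOWN = -1,   /* rules not evaluated yet */
    IGNORE_NO = 0,
    IGNORE_YES = 1
} ignore_state;

typedef struct {
    const char *path;
    ignore_state ignored;  /* cached answer; UNKNOWN until first asked */
} wd_entry;

static int rule_checks = 0; /* counts how often the costly rules ran */

/* Stand-in for real ignore rules: treat "*.o" files as ignored. */
static bool match_ignore_rules(const char *path)
{
    size_t len = strlen(path);
    rule_checks++;
    return len >= 2 && strcmp(path + len - 2, ".o") == 0;
}

/* Evaluate the rules only on first request, then reuse the cached value. */
static bool entry_is_ignored(wd_entry *e)
{
    if (e->ignored == IGNORE_UNKNOWN)
        e->ignored = match_ignore_rules(e->path) ? IGNORE_YES : IGNORE_NO;
    assert(e->ignored != IGNORE_UNKNOWN);
    return e->ignored == IGNORE_YES;
}
```

Entries that are already in the index never reach `entry_is_ignored`, so their rules are never evaluated at all.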
Diff uses a `git_strarray` of path specs to represent a subset of all files to be processed. It is useful to be able to reuse this filtering in other places outside diff, so I've moved it into a standalone set of utilities.
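For readers unfamiliar with the idea, a standalone pathspec filter could look something like the sketch below. It only does exact and directory-prefix matching (no globbing), which is a simplifying assumption. `spec_list` is a stand-in that mirrors the `strings`/`count` shape of `git_strarray`, and `spec_matches` and `pathspec_match` are hypothetical names rather than the functions added in this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for git_strarray: a counted list of strings. */
typedef struct {
    const char **strings;
    size_t count;
} spec_list;

/* A spec matches a path if it equals it or is a directory prefix of it. */
static bool spec_matches(const char *spec, const char *path)
{
    size_t len = strlen(spec);
    if (strncmp(spec, path, len) != 0)
        return false;
    /* exact match, path continues with '/', or spec itself ends in '/' */
    return path[len] == '\0' || path[len] == '/' ||
           (len > 0 && spec[len - 1] == '/');
}

/* An empty list means "no filtering": every path matches. */
static bool pathspec_match(const spec_list *specs, const char *path)
{
    size_t i;
    assert(specs != NULL && path != NULL);
    if (specs->count == 0)
        return true;
    for (i = 0; i < specs->count; i++)
        if (spec_matches(specs->strings[i], path))
            return true;
    return false;
}
```

Keeping the matcher independent of diff means checkout, status, or any other walker can apply the same filter to each path it visits.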
There are some diff functions that are useful in a rewritten checkout and this lays some groundwork for that. This contains three main things:
1. Share the function diff uses to calculate the OID for a file in the working directory (now named `git_diff__oid_for_file`).
2. Add a `git_diff__paired_foreach` function to iterate over two diff lists concurrently. Convert status to use it.
3. Move all the string/prefix/index entry comparisons into function pointers inside the `git_diff_list` object so they can be switched between case sensitive and insensitive versions. This makes them easier to reuse in various functions without replicating logic. As part of this, move a couple of index functions out of diff.c and into index.c.
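Points 2 and 3 combine naturally, as the following sketch tries to show: two sorted path lists are walked together, and the comparison is a function pointer so the same loop works for case sensitive and insensitive orderings. This is an illustration over plain string arrays, not the actual `git_diff__paired_foreach` signature. `paired_foreach`, `cmp_icase`, `pair_counts` and `count_pairs` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef int (*path_cmp)(const char *a, const char *b);

/* Portable case-insensitive compare (avoids non-standard strcasecmp). */
static int cmp_icase(const char *a, const char *b)
{
    for (;; a++, b++) {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb || ca == '\0')
            return ca - cb;
    }
}

/* Called with one side NULL when a path exists in only one list. */
typedef int (*pair_cb)(const char *left, const char *right, void *payload);

typedef struct { int both, left_only, right_only; } pair_counts;

/* Example callback: tally how the two lists line up. */
static int count_pairs(const char *left, const char *right, void *payload)
{
    pair_counts *pc = payload;
    if (left && right)
        pc->both++;
    else if (left)
        pc->left_only++;
    else
        pc->right_only++;
    return 0;
}

/* Walk two lists (each sorted according to cmp), pairing equal paths. */
static int paired_foreach(const char **l, size_t nl,
                          const char **r, size_t nr,
                          path_cmp cmp, pair_cb cb, void *payload)
{
    size_t i = 0, j = 0;
    assert(cmp != NULL && cb != NULL);
    while (i < nl || j < nr) {
        int c = (i >= nl) ? 1 : (j >= nr) ? -1 : cmp(l[i], r[j]);
        const char *left = (c <= 0) ? l[i++] : NULL;
        const char *right = (c >= 0) ? r[j++] : NULL;
        int err = cb(left, right, payload);
        if (err != 0)
            return err; /* stop early, propagating the callback's code */
    }
    return 0;
}
```

Both inputs must already be sorted with the same comparator that is passed in; swapping `strcmp` for `cmp_icase` only works if the lists were ordered case-insensitively to begin with.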
This is a major reworking of checkout strategy options. The checkout code is now sensitive to the contents of the HEAD tree and the new options allow you to update the working tree so that it will match the index content only when it previously matched the contents of the HEAD. This allows you, for example, to distinguish between removing files that are in the HEAD but not in the index, vs just removing all untracked files. Because of various corner cases that arise, etc., this required some additional capabilities in rmdir and other utility functions. This includes the beginnings of an implementation of code to read a partial tree into the index based on a pathspec, but that is not enabled because of the possibility of creating conflicting index entries.
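The core rule (only touch a working directory file if it still matches HEAD) can be sketched as a three-way comparison. This is a deliberately simplified illustration: ids are plain strings with NULL meaning "absent", and `decision`, `same_id` and `safe_decision` are hypothetical names, not the logic in `checkout.c`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum { DO_NOTHING, DO_UPDATE, DO_CONFLICT } decision;

/* Ids are plain strings here; NULL means the file is absent on that side. */
static bool same_id(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}

/* Decide what to do with one path given its HEAD, index and workdir ids. */
static decision safe_decision(const char *head, const char *index,
                              const char *workdir, bool force)
{
    /* a path that exists nowhere should never reach this function */
    assert(head != NULL || index != NULL || workdir != NULL);
    if (same_id(workdir, index))
        return DO_NOTHING;   /* already what the index wants */
    if (same_id(workdir, head) || force)
        return DO_UPDATE;    /* unmodified since HEAD, or forced */
    return DO_CONFLICT;      /* a local change would be lost */
}
```

With this rule, a file that is in HEAD but dropped from the index gets removed (workdir matches HEAD, index is absent), while an untracked file sitting where the index wants to write is reported as a conflict unless the caller forces it.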
The `git_reset` API with the HARD option is still slightly broken, but this test now does exercise the ability of the command to revert modified files.
This fixes a number of warnings and problems with cross-platform builds. Among other things, it's not safe to name a member of a structure "strcmp" because that may be #defined.
Okay, I think I've fixed everything. Since I was valgrinding, I rebased onto the latest development (even though there were no merge conflicts in this PR) so I would be testing against the latest... Let's see what Travis has to say. |
This fixes various warnings that showed up in Travis, a couple of uses of uninitialized memory, and one memory leak.
By the way, this does not currently fix #1047. I started to lay the groundwork for doing so in |
✨ 🏇 ✨ |
So, @nulltoken created a failing test case for checkout that proved to be particularly daunting. If checkout is given only a very limited strategy mask (e.g. just `GIT_CHECKOUT_CREATE_MISSING`) then it is possible for typechange/rename modifications to leave it unable to complete the request. That's okay, but the existing code did not have enough information not to generate an error (at least for tree/blob conflicts).
This led me down a long reworking of existing code. Initially I just approached this with a substantial reorganization of the existing code, and you can still see that in the first couple of commits, but at some point I realized that there were two issues:
After much rewriting, this PR now tries to deal with those cases correctly.
In addition to those core issues being fixed, there are a number of other things going on in this PR:
`rmdir` stuff)
`pathspec` stuff out of `diff.c` and into a standalone file
I think the main facet of this PR that should be examined closely is the direction that I went with the checkout options structure and strategy flags. I'm worried that I went overboard and it is now going to be more complicated to use, but I suspect that most users will just want either the `GIT_CHECKOUT_SAFE` or `GIT_CHECKOUT_FORCE` flags (which are actually combinations of more granular flags).
BTW, I've edited this description of the PR to be more up to date with the current actual content of the branch...