Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Checkout, reset and others enhancements #887

Merged
merged 14 commits into from

6 participants

@nulltoken
Collaborator
  • git_checkout_commit()
  • git_checkout_tree()
  • git_checkout_index()
  • git_reset(GIT_RESET_HARD)
@travisbot

This pull request fails (merged d80062eb into 5c27da1).

@travisbot

This pull request fails (merged 4ead6a83 into 5c27da1).

@travisbot

This pull request fails (merged 1039e288 into c358814).

@travisbot

This pull request fails (merged b3d21403 into c358814).

@travisbot

This pull request fails (merged 58b8bb84 into b2be351).

@travisbot

This pull request fails (merged d411200d into b2be351).

@travisbot

This pull request fails (merged 85024211 into b2be351).

src/checkout.c
((264 lines not shown))
- assert(repo);
- if (!opts) opts = &default_opts;
- if (!stats) stats = &dummy_stats;
+ if (git_repository_config(&cfg, repo) < 0)
+ return -1;
+
+ error = git_config_get_bool((int *)can_symlink, cfg, "core.symlinks");
@nulltoken Collaborator

@carlosmn This is weird. This very line returns 0 on Windows and GIT_ENOTFOUND on Linux.

The codeblock below looks only required on Linux. This makes me feel uneasy... Any idea?

@scunz
scunz added a note

from man git-config:

The default is true, except git-clone(1) or git-init(1) will probe and set core.symlinks false if appropriate when the repository is created.

So, the *can_symlink = 0; below is wrong; But since we might have come here from a clone-operation: Maybe we should probe it here unconditionally?

@scunz
scunz added a note

To answer this myself: No, we should not probe it here. Since on Windows it is expected to be set to false during git-clone or git-init, @nulltoken already states that this part has been run successfully.

@nulltoken Collaborator

@nulltoken already states that this part has been run successfully.

I indeed assume this. Checkout is an operation of its own, after all ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@travisbot

This pull request fails (merged 0f4196e2 into b2be351).

@travisbot

This pull request fails (merged 3d201eec into b2be351).

@nulltoken
Collaborator

@ben Ok. I'm not done yet, but I'm starting to feel rather confident about this PR. The diff based approach "feels" the right way to work this out.

If you've got comments about the code, now would be a good time, as I think I've done most of what I'd primarily envisioned. ;-)

Two quickies:

  • What arethe use cases of git_checkout_opts.[file_open_flags|dir_mode|file_mode]?
  • How would you like the stats to be handled? There's the tree reading with its own progress, then the diffing which exposes a "progress"... Beside the different kinds of progress implementation, I wonder if this doesn't put under the light a need for a multi-step progress indicator (17% of step 2 of 3)
@travisbot

This pull request fails (merged 872f2072 into 697665c).

@nulltoken
Collaborator

So, the *can_symlink = 0; below is wrong;

@scunz You're of course right! I've changed this in the code.

When no config entry is found, the hosting platform is supposed to be able to handle symlinks. However, nothing is stored in the configuration store (this should rather be handled by init/clone..)

@travisbot

This pull request fails (merged 0337d59b into 697665c).

@travisbot

This pull request passes (merged 0d8f6c27 into 697665c).

@travisbot

This pull request fails (merged e14a1fbf into 697665c).

@travisbot

This pull request fails (merged 06f60d83 into 697665c).

@travisbot

This pull request passes (merged ec0f0399 into 697665c).

@travisbot

This pull request passes (merged 432f2173 into 697665c).

@ben

@nulltoken I have two answers!

What are the use cases of git_checkout_opts.[file_open_flags|dir_mode|file_mode]?

@arrbee suggested that a checkout caller might want to control the flags passed to open, or the default file or directory mode. He couldn't think of a solid use case at the time. If these are the sorts of things that core Git allows, we should probably allow them as well (file_mode and dir_mode probably more than the ability to use O_APPEND), but if not they're probably not necessary.

How would you like the stats to be handled?

Let's leave this as-is for this PR, but it's clear we need something better going forward. Let's discuss that over on #890, shall we?

@travisbot

This pull request passes (merged 9cbaf45a into 697665c).

@travisbot

This pull request passes (merged 70cee741 into 5fdc41e).

@travisbot

This pull request passes (merged 88dc301d into 5fdc41e).

@travisbot

This pull request passes (merged 354a97dd into 5fdc41e).

@travisbot

This pull request passes (merged 024d341e into c920e16).

@nulltoken
Collaborator

@ben @carlosmn I've reworked the PR. It should be ready to be reviewed, now.

@travisbot

This pull request passes (merged 753e8fdb into c920e16).

@travisbot

This pull request passes (merged b7175b56 into 1168410).

src/checkout.c
((194 lines not shown))
+{
+ struct checkout_diff_data *data;
+ int error = -1;
+
+ data = (struct checkout_diff_data *)cb_data;
+
+ data->stats->processed = (unsigned int)(data->stats->total * progress);
+
+ git_buf_truncate(data->path, data->workdir_len);
+ if (git_buf_joinpath(data->path, git_buf_cstr(data->path), delta->new_file.path) < 0)
+ return -1;
+
+ switch (delta->status) {
+ case GIT_DELTA_UNTRACKED:
+ if (!git__suffixcmp(delta->new_file.path, "/"))
+ error = git_futils_rmdir_r(git_buf_cstr(data->path), GIT_DIRREMOVAL_FILES_AND_DIRS);
@ben
ben added a note

Is it intended that the library do a "clean" as well? Neither git reset --hard nor git checkout remove untracked files. Am I misunderstanding?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@ben

:sunglasses:

One other thing we should think about is what the external interface needs in order to support selecting a merge strategy. This is probably means replacing git_checkout_opts.existing_file_action with merge_strategy, and providing a "clobber" option for a forced checkout. What do you think?

@travisbot

This pull request passes (merged 758e6cf2 into af6bcd8).

@nulltoken
Collaborator

@ben @carlosmn I'm starting to update the code in order to make git_checkout_index() support something like the following

diff --git a/include/git2/checkout.h b/include/git2/checkout.h
index a2deff6..c6a00f0 100644
--- a/include/git2/checkout.h
+++ b/include/git2/checkout.h
@@ -21,13 +21,16 @@
  */
 GIT_BEGIN_DECL

-
-#define GIT_CHECKOUT_OVERWRITE_EXISTING 0 /* default */
-#define GIT_CHECKOUT_SKIP_EXISTING 1
+enum {
+   GIT_CHECKOUT_NONE               = (1 << 0),
+   GIT_CHECKOUT_OVERWRITE_MODIFIED = (1 << 1),
+   GIT_CHECKOUT_CREATE_MISSING     = (1 << 2),
+   GIT_CHECKOUT_REMOVE_UNTRACKED   = (1 << 3),
+};

 /* Use zeros to indicate default settings */
 typedef struct git_checkout_opts {
-   int existing_file_action; /* default: GIT_CHECKOUT_OVERWRITE_EXISTING */
+   int checkout_strategy; /* default: GIT_CHECKOUT_NONE */
    int disable_filters;
    int dir_mode; /* default is 0755 */
    int file_mode; /* default is 0644 */
@nulltoken
Collaborator

How about this? ^^

@arrbee
Owner

Hey @nulltoken and @ben, okay, so first let me apologize that I am coming to this work so late in the game. I have a bunch of comments and if I'm just out of touch with discussion that has already occurred, please feel free to tell me I am off track! So, that being said, here goes...


Let me start off by saying that overall this looks great! The amount of functionality and the level of control is amazing. Really nice work! :sparkles: It is also really easy to follow to code and see how everything interrelates, which is a great accomplishment.


I notice that git_checkout_commit and git_checkout_tree have identical signatures, just one resolving to a commit and one to a tree; however they have significantly different behaviors. I wish there was a way we could name the functions in a way that made it clearer what was going on.

Do you think we could reorganize the API slightly to distinguish plumbing and porcelain a little bit more, and then make it more transparent how the porcelain functions break down into the plumbing? To be more specific, I guess I feel like we have plumbing that does:

  1. check out the index to the working directory (with lots of options)
  2. load a tree into the index (with an easy way to get the HEAD tree)
  3. move the HEAD to a particular reference and/or commit

Then layered on those utilities we have:

  • git_checkout_head -> 2 (load HEAD to index) + 1 (checkout index)
  • git_checkout_reference -> 3 (move HEAD to reference, possibly detached) + 2 + 1
  • git_checkout_commit -> 3 (move HEAD to commit, detached) + 2 + 1
  • git_checkout_index -> 1
  • git_checkout_tree -> 2 (load tree to index) + 1 (checkout index)

Are the git_checkout_reference and git_checkout_commit APIs even necessary to preserve? Given APIs that make it easy to move the HEAD around, could we reduce the API to the HEAD, tree, and index checkout commands?

Am I misunderstanding things? Apologies in advance, if so...


In blob_content_to_file, if filters are disabled and/or no filters apply to a given blob, it is somewhat unfortunate that we have to make a full copy of the blob contents in memory (into the content git_buf) instead of just writing the blob directly out to disk. In other similar code, I have sometimes created a "fake" git_buf directly from the blob data pointer that I can pass to function expecting a git_buf, but then just don't free the fake buffer. I realize that it may make the code path slightly more complex have a "there are filters" path and "there are no filters" path, but with large blobs, eliminating the extra allocation and copy would be nice.


Is there a reason why you didn't put the paths strarray into the git_checkout_opts structure and instead making it a separate parameter? It feels a bit like another checkout option to me...


The documentation for the paths strarray should probably note that those paths are actually fnmatch path patterns. The tests check *.txt as the path spec (which is great, because this path matching probably needs more testing) but from the header comments it might not be obvious that you can do that. Also, I'd love to see a test for checkout with a directory/prefix/ type of path to confirm that just that subdirectory get's checked out, since I suspect that will be a common use case for the path matching.

@nulltoken
Collaborator

@arrbee I rewrote the history a bit.

  • git_checkout_commit never happened.
  • git_checkout_reference has been dropped
  • git_checkout_opts exposes a path property and bears a comment to explain its target usage

Remaining to be done:

  • Optimize git_buf usage
  • Cover with test the checkout of a subdirectory
@arrbee
Owner

@nulltoken Looking good to me. :metal: Is it less code than before? How are you feeling about it?

Do we need functions to make it super simple to move the HEAD around so that it will still be easy to achieve the git_checkout_commit and git_checkout_reference type of functionality, or do feel like those behaviors will still be obvious to emulate?

@nulltoken
Collaborator

Is it less code than before? [...] do you feel like those behaviors will still be obvious to emulate?

Not much less code, but you were right, it was redundant. The API is much clearer, now.

How are you feeling about it?

I'm always happy to remove some code!

@ben

Do we need functions to make it super simple to move the HEAD around [...]?

Let's take a walk down that road: git_head_set(repo, commit) would be a convenience wrapper around this:

git_reference_lookup(&ref, repo, GIT_HEAD_FILE);
git_reference_set_oid(ref, git_commit_id(commit));

... that validates that it's a commit. It's likely to be a common operation, and having a built-in call would prevent some of the pitfalls that come up.

Having this also implies having an API namespace for managing HEAD. Is there anything else that would be handy to have in there? git_head_lookup(&ref, repo)?

@carlosmn
Owner

We already have git_repository_head and friends. git_repository_set_head seems like a natural name.

@scunz

Then I'd suggest making two of them:

  • git_repository_set_head(git_repository*, const git_oid* commitish) and
  • git_repository_set_head_symbolic(git_repository*, const char* refname).
@nulltoken
Collaborator

@scunz :heart: these. I'm going to add them as part of this PR

@carlosmn
Owner

Usually you'd want HEAD to stay attached, so I'd rather have that be set_head. We can have something like detach_head or set_head_detached for the cases where we want to do that.

@scunz

That sounds even more reasonable, indeed.

@nulltoken
Collaborator

So we'd have:

  • git_repository_set_head(git_repository*, const char* refname)
  • git_repository_set_head_detached(git_repository*, const git_oid* commitish)
  • git_repository_detach_head(git_repository*)
@nulltoken
Collaborator

or should we go with git_repository_set_head(git_repository*, git_reference *ref)?

@nulltoken
Collaborator

How about this?

/**
 * Make the repository HEAD point to the specified reference.
 *
 * If the provided reference doesn't exist, the HEAD is unaltered and 
 * GIT_ENOTFOUND is returned.
 *
 * If the provided reference points to a Tree or a Blob, the HEAD is
 * unaltered and -1 is returned.
 *
 * If the provided reference points to a branch, the HEAD will point
 * to that branch, staying attched.
 *
 * Otherwise, the HEAD will be detached and will directly point to
 * the Commit.
 *
 * @param repo Repository pointer
 * @param refname Canonical name of the reference the HEAD should point at
 * @return 0 on success, or an error code
 */
GIT_EXTERN(int) git_repository_set_head(
    git_repository* repo,
    const char* refname);

/**
 * Make the repository HEAD directly point to the Commit.
 *
 * If the provided committish cannot be found in the repository, the HEAD
 * is unaltered and GIT_ENOTFOUND is returned.
 *
 * If the provided commitish cannot be peeled into a commit, the HEAD
 * is unaltered and -1 is returned.
 *
 * Otherwise, the HEAD will eventually be detached and will directly point to
 * the peeled Commit.
 *
 * @param repo Repository pointer
 * @param commitish Object id of the Commit the HEAD should point to
 * @return 0 on success, or an error code
 */
GIT_EXTERN(int) git_repository_set_head_detached(
    git_repository* repo,
    const git_oid* commitish);

/**
 * Detach the HEAD.
 *
 * If the HEAD already directly points to a Commit, 0 is returned.
 *
 * If the HEAD directly points to a Tag, the HEAD is updated into
 * making it point to the peeled Commit, and 0 is returned.
 *
 * If the HEAD directly points to a non commitish, the HEAD is 
 * unaletered, and -1 is returned.
 *
 * Otherwise, the HEAD will eventually be detached and will directly point to
 * the peeled Commit.
 *
 * @param repo Repository pointer
 * @return 0 on success, or an error code
 */
GIT_EXTERN(int) git_repository_detach_head(
    git_repository* repo);
@scunz

set_head:

 * If the provided reference points to a branch, the HEAD will point
 * to that branch, staying attched.

"or become attached if it isn't yet"?

detach_head:

* If the HEAD directly points to a non commitish, the HEAD is 
* unaletered, and -1 is returned.

Does this 'directly' here mean: "If head is detached, but points to a tree/blob/tag?"

Other than that looks good to me.

@carlosmn
Owner

So git_repository_detach_head would be git checkout HEAD~0? I don't see that it deserves its own function.

I see a problem with both set_head and set_head_detached detaching HEAD. Instead of a ref, you can make it take a branch so we know that it's really a local branch.

@nulltoken
Collaborator

I see a problem with both set_head and set_head_detached detaching HEAD. Instead of a ref, you can make it take a branch so we know that it's really a local branch.

I may be wrong, but I think we don't have a git_branch type. ;-)

However, in order to try and prevent the user from shooting into its own foot, following counter-measures have been deployed:

  • set_head ensures the name points to a valid reference, and invoke git_reference_isbranch()
  • set_head_detached ensure the oid points to a valid object and that this object is peelable into a commit.

FWIW, git doesn't seem to complain with non branches

$ git checkout refs/tags/e90810b
Note: checking out 'refs/tags/e90810b'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at e90810b... Test commit 2
@carlosmn
Owner

I may be wrong, but I think we don't have a git_branch type. ;-)

This keeps changing from under me, I can't keep it straight.

The comment was for set_head. For detaching it, it obviously doesn't matter what it is, as the point is that we're not on a branch.

@nulltoken
Collaborator

As previously proposed, we could go with git_repository_set_head(git_repository*, git_reference *ref), but we'd still have to internally decide if HEAD is going to be detached or not, depending of git_reference_isbranch(ref) result.

Would that suit you?

@carlosmn
Owner

Would that allow us to set HEAD to an unborn branch?

@nulltoken
Collaborator

Would that allow us to set HEAD to an unborn branch?

I'm going try to tweak the char * based proposal in order to make this happen.

@nulltoken
Collaborator

Would that allow us to set HEAD to an unborn branch?

I'm going try to tweak the char * based proposal in order to make this happen.

@carlosmn Done ^^

@nulltoken
Collaborator

@arrbee Remaining task is working on lowering the memory usage.

@nulltoken
Collaborator

Remaining task is working on lowering the memory usage.

@arrbee Done ^^

@arrbee
Owner

Wow, @nulltoken! That is a ton of great work! :star: :star2: :star: :star2:

To me, this feels like a less overlapping API and at the same time, a ton of functionality.

At some future point (i.e. not needed in this PR), we should probably have a documentation page that explains "If you are trying to do the equivalent of git command XYZ, then you do XYZ' in libgit2" which will help understand how to use these APIs in combination with one another. But the beauty of this work is that the XYZ' version will generally only be 2 or 3 easy API calls at worst.

The memory reduction is exactly what I was talking about. I don't know how much that type of thing matters, but since it both reduces memory usage and increases performance, I'm glad you made the change.

My only comment is that the git_strarray *paths in the options structure could just be a git_strarray paths since an empty list of paths can be indicated by a zeroed out structure. You don't need to have a pointer to the array. However, this is more preference than anything else, and for all I know, it might actually be harder in the bindings to embed the git_strarray structure inside the other structure, so I defer to your judgement on this.

I think it's looking very solid. Anyone else that needs to comment before we merge? /cc @ben @carlosmn

@nulltoken
Collaborator

My only comment is that the git_strarray *paths in the options structure could just be a git_strarray paths

Please, hold off the merge. I'm going to fix this.

@ben

I'm :heart:ing it. Doo-doot-doot-doot-doo!

@nulltoken
Collaborator

My only comment is that the git_strarray *paths in the options structure could just be a git_strarray paths

Please, hold off the merge. I'm going to fix this.

Fixed.

@arrbee arrbee merged commit 411cb01 into from
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.