Skip to content

Conversation

niik
Copy link
Contributor

@niik niik commented Sep 5, 2014

GitHub for Windows needs a way to blow away the entire index (undoing the initial commit). Up until now we've done this through shelling out but I'm moving that over to libgit2sharp now.

Clearing the index can be done right now in l2gs by explicitly creating an empty tree object and resetting to that but that involves a few more moving parts of actually creating the object in the odb and I was thinking there should be a way to explicitly clear as well.

Initially I though about adding an Index.Clear that calls git_index_clear but I'm concerned that people will confuse Index.Clear with something that just unstages everything, not clearing the index completely so I opted for allowing passing in null to Index.Reset and clearly documenting it.

There has been some discussion of libgit2 supporting the empty tree sha (libgit2/libgit2#1713 (comment)). If libgit2 ends up supporting that a Tree.Empty static property might be a nice touch.

@nulltoken
Copy link
Member

There has been some discussion of libgit2 supporting the empty tree sha (libgit2/libgit2#1713 (comment)).

Could you please take a look at this test? Empty Tree looks supported 😉

@carlosmn From an index perspective, what would be the difference between clearing it and replacing its content by reading an empty Tree?

@niik
Copy link
Contributor Author

niik commented Sep 5, 2014

Could you please take a look at this test? Empty Tree looks supported

I'm sorry for being unclear. I wasn't saying that it isn't supported, merely that core Git has special case support for the empty tree sha 4b825dc642cb6eb9a060e54bf8d69288fbee4904.

libgit2/libgit2#1713 (comment)

FWIW, core git does this at the object retrieval layer. We have "fake" cached objects, including a builtin empty tree object. See https://github.com/git/git/blob/master/sha1_file.c#L56. So you can not only use the tree lookup functions on it, but do things like `cat-file 4b825dc642cb6eb9a060e54bf8d69288fbee4904 and get the raw tree contents (which are, of course, empty).

It is indeed possible to do Repository.Index.Reset(Repository.ObjectDatabase.CreateTree(new TreeDefinition()));. This just shortcuts that by allowing you to clear without explicitly creating an empty tree first. To the best of my knowledge clearing it is conceptually the same as replacing with an empty tree but since there's an explicit API for clearing we might as well take advantage of any optimizations that can provide.

@carlosmn
Copy link
Member

carlosmn commented Sep 5, 2014

They are practically the same thing, as you first clear out the index anyway and then you just don't refill it. But this is more about how to best make the method call expressive. I would go with Index.Clear() though. Things like Index.Reset(null), a method call with a single argument that's null looks like a code smell to me.

@nulltoken
Copy link
Member

I agree with @carlosmn regarding Index.Reset(null).

However, regarding Index.Clear(), my first feeling is that it may be error prone. The way, I see it, the Index namespace is one of the very few namespace the average new LibGit2Sharp user will play with.

He/Her may not understand exactly what is the Index. Making the whole index dropped by simply calling Clear() may seem like an unexpected result to him/her. I'd be more inclined with making this kind of thing a bit more difficult to achieve (by explicitly passing an empty Tree to Reset(), for instance).

However, it's possible that I may be lacking some perspective on this subject. How would one perform such operation with git.git? Is it a common operation that git users rely on frequently?

/cc @dahlbyk @KindDragon

@niik
Copy link
Contributor Author

niik commented Sep 5, 2014

Things like Index.Reset(null), a method call with a single argument that's null looks like a code smell to me.

I took the inspiration from this from @arrbee's comment about empty trees in libgit2 —  "in most cases where you would use the empty tree in core Git, we just allow you to pass NULL to indicate an empty tree"

However, regarding Index.Clear(), my first feeling is that it may be error prone

This was my concern as well and before I submitted this PR I asked some of the other native devs at GitHub what they thought an Index.Clear method would do without any other context and from my very small sample enough thought that it would unstage already staged changes.

I'd be more inclined with making this kind of thing a bit more difficult to achieve (by explicitly passing an empty Tree to Reset(), for instance).

Yes, and that's why I suggest that if/when libgit2 gains the same level of fake-object support for the empty tree hash libgit2sharp can expose a static readonly property on the Tree object but since that's currently in libgit2 there's no way to get such a tree other than actually creating a tree object in the odb. Which seems excessive seeing as there's already a method in libgit2 that accomplishes exactly this (namely _clear).

How would one perform such operation with git.git?

git read-tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904. Note that no such tree has to exist in the odb for this to work due to git/git special casing this sha.

Is it a common operation that git users rely on frequently?

No, definitely not. I can only think of a select few cases which is why I tried to make it slightly obfuscated.

@ethomson
Copy link
Member

ethomson commented Sep 6, 2014

Yes, and that's why I suggest that if/when libgit2 gains the same level of fake-object support for the empty tree hash libgit2sharp can expose a static readonly property on the Tree object

I am of the mind that libgit2 should not do this. This seems very porcelainy to me, though that's just my opinion.

How hard would it be to make a static empty tree without libgit2's assistance? I suspect this is a lot of churn, presumably Tree becomes an interface or some such and now there's a ActualReallyATree and an EmptyTree which seems like a lot of moving things around just for this.

I guess Reset could also take an ITreeLikeInterface but that seems a little weird too.

@niik
Copy link
Contributor Author

niik commented Sep 6, 2014

I feel like this discussion is becoming bigger than the scope of this PR. I would like to be able to use git_index_clear through libgit2sharp in some manner. If exposing it as a separate Clear method is the way to move this forward I'm happy to do that and document clearly that it's not an entry point for simply unstaging staged files.

While I personally like that Clear maps straight to the libgit2 operation We could discuss alternative/more "dangerous sounding" names for the method (Destroy, Erase etc).

@carlosmn
Copy link
Member

carlosmn commented Sep 6, 2014

git read-tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904.

That's the plumbing way. The porcelain way would be git rm --cached '*' (there's also rm .git/index but I wouldn't recommend that).

While we use NULL in C in places where you could use an empty tree, we do it (generally) as optional parameters, where you may pass in a tree or not, rather than to indicate "the empty tree". Specifying the empty tree is a convenient way to use the shell interface, but we shouldn't be trying to replicate it.

While I personally like that Clear maps straight to the libgit2 operation We could discuss alternative/more "dangerous sounding" names for the method (Destroy, Erase etc).

If someone things the index is log-based, would think anything differently from clear to destroy or erase?

@niik
Copy link
Contributor Author

niik commented Sep 6, 2014

While we use NULL in C in places where you could use an empty tree, we do it (generally) as optional parameters, where you may pass in a tree or not, rather than to indicate "the empty tree"

Thanks, this context is most useful.

If someone things the index is log-based, would think anything differently from clear to destroy or erase?

Unclear, maybe? As I said I'm pro Clear. I've updated the PR as there seems to be consensus against the Reset(null).

@nulltoken
Copy link
Member

from my very small sample enough thought that it would unstage already staged changes.

I think that's the point. Unless I'm wrong, "unstaging" is different from "clearing" the index.

  • "Unstaging" would actually replace the content of the Index with the content of HEAD
  • "Clearing" would make all the content as Index seen as "deleted" and all the content of the workdir as "untracked"

there's no way to get such a [fake] tree other than actually creating a tree object in the odb.

What's the drawback of creating one when you need it? Then GHfW could hold on the generated Tree instance and use it as a parameter to Index.Reset() when needed.

@niik niik changed the title Exposing clearing the index through Index.Reset Exposing clearing the index Sep 8, 2014
@niik
Copy link
Contributor Author

niik commented Sep 8, 2014

I'm wrong, "unstaging" is different from "clearing" the index.

It is.

What's the drawback of creating one when you need it? Then GHfW could hold on the generated Tree instance and use it as a parameter to Index.Reset() when needed.

We didn't have to create a tree when we were shelling out and now we do. That is the drawback. Why should we have to create a tree when there's a way to do it that doesn't require jumping through hoops?

The way I see this there's a right way to do it and then there's a workaround for the right way not being exposed through libgit2sharp. How can we move this PR forward to address that?

@nulltoken
Copy link
Member

That is the drawback. Why should we have to create a tree when there's a way to do it that doesn't require jumping through hoops?

@niik I'm not sure to understand the drawback you're referring to. I don't think creating the Tree in the ObjectDatabase would cause a performance issue. Besides, as this Tree won't be reachable through any reference, it won't be transmitted over the wire. Moreover, the object is going to be pruned the next time git gc is run.

The way I see this there's a right way to do it and then there's a workaround for the right way not being exposed through libgit2sharp. How can we move this PR forward to address that?

I'm not sure there's a definitive "right way' 😉

However, one proposal may be for instance, to make the ObjectDatabase expose a lazy property EmptyTree. Resolving this property would actually create the Tree in the odb and return the generated Tree. Thus, this object would live (and be easily accessible) for the whole life of the Repository instance.

Clearing the index, would then involve a call to repo.Index.Reset(repo.ObjectDatabase.EmptyTree).

Later, when libgit2 exposes way to cope with this magic empty tree (without requiring it to actually exist in the odb), we'll just have to tweak the LibGit2Sharp implementation without changes to the consumed API.

Thoughts?

@jspahrsummers
Copy link

I'm sorry, but I don't understand what's really being discussed here.

Is there a philosophical objection to exposing functionality that exists in libgit2? If so, that seems really unfortunate for a wrapper library.

If the concern is that it might be offputting to newbie users, that can be solved with documentation, or a separate API that is akin to libgit2's sys/ namespace.

As with any framework, I strongly believe that things should be safe/easy/simple by default. However, in general—and especially for our product (GitHub for Windows)—there needs to be a way to break out of that box and access “advanced user” features when necessary. If libgit2sharp can't or won't provide this, we'll effectively need a different wrapper library for libgit2.

@carlosmn
Copy link
Member

carlosmn commented Sep 9, 2014

libgit2sharp fulfills a role on Windows which would often be fulfilled by the git tool on other operating systems. The main concern is how to expose something that's not too dangerous for someone who isn't familiar with the Git system. This presents an issue of where to put the delineation.

If the concern is that it might be offputting to newbie users, that can be solved with documentation, or a separate API that is akin to libgit2's sys/ namespace.

Indeed. Though the thing with documentation is that you barely read it before you get into trouble (well, at least the cases you often end up having to support).

Chatting with @nulltoken about where to draw lines between providing helpers and direct access to the data structures, it appears we have reached an understanding. The methods under Index will be the ones which handle the index as a data structure, while those methods which provide the abstractions/indirections will embrace the fact that doing a Stage() is really about working at a whole-repository level and move to Repository() since they have no particular affinity to the index itself.

I think we shouldn't deleay this and merge this as-is (minus the leftover comments about Reset(null)) and then proceed to deprecate and move the staging/unstaging/whatever methods to Repository.

/// <param name="source">
/// The <see cref="Tree"/> to read the entries from.
/// If null the index will be reset to an empty tree.
/// </param>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This documentation change is no longer accurate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I'll fix this asap

@niik
Copy link
Contributor Author

niik commented Sep 9, 2014

Fixed the documentation leftovers and rebased on top of vNext!

@nulltoken
Copy link
Member

If the concern is that it might be offputting to newbie users, that can be solved with documentation, or a separate API that is akin to libgit2's sys/ namespace.

Yeah, that's more or less the concern. I feel wary of making the same type exposing high level operations user friendly operations and low level bare-bone ones.

I was thinking about something along the lines of keeping the high level ones under repo.Index and moving the lower level ones under repo.Advanced.Index (and migrating the repo.ObjectDatabase also under repo.Advanced), but that didn't feel very right.

@carlosmn's proposal (moving Stage and friends) under IRepository and making the low level methods exposed by the Index type indeed feels like a better idea.

@nulltoken
Copy link
Member

@niik

@nulltoken nulltoken deleted the niik/clear-index branch September 9, 2014 21:11
@jspahrsummers
Copy link

I think we shouldn't deleay this and merge this as-is (minus the leftover comments about Reset(null)) and then proceed to deprecate and move the staging/unstaging/whatever methods to Repository.

Sweet, that sounds like a great way forward. ✨

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants