New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shallow Clone Support #6396
Shallow Clone Support #6396
Conversation
This represents (old-style) grafted commits, a.k.a an array of overridden parents for a commit's OID.
This wires git_repository to open the .git/info/grafts file and load its contents as git_commit_grafts objects.
In order to increase maintainability in the future, we should try to make structures as self-contained and opaque to its users as possible. Thus it is probably not a good idea to just typedef `git_graftmap` to `git_oidmap`, as that will make it a lot harder in the future to extend the API in the future, if this need ever arises. Refactor the code to instead declare a real structure `git_grafts`, which is completely opaque to its callers.
Instead of using the newly introduced `git_buf_foreach_line`, which modifies the buffer itself, we should try to use our existing parsing infrastructure in "parse.h". Convert the grafts parsing code to make use of `git_parse_ctx`. Remove the `git_buf_foreach_line` macro, as grafts have been its sole user.
Parsing of grafts files is currently contained in the repository code. To make grafts-related logic more self-contained, move it into "grafts.c" instead.
The shallow roots are in fact another user of the grafting mechanism, and in essence they do use the same file format for grafted commits. Thus, instead of hand-coding the parsing logic a second time, we can just reuse the `git_grafts` structure for shallow commits, as well.
When loading shallow grafts, we add each of the grafting commits to the grafts backed by ".git/info/grafts". Keeping track of both grafts separately, but partially storing them in a common grafts structure is likely to lead to inconsistencies. In fact, there already are inconsistencies if refreshing shallow grafts at a later point, as we only refresh the shallows, but not the normal grafts in that case. Disentangle both grafting stores and instead check both separately when parsing commits.
The refresh logic for both "normal" and shallow grafts are currently part of the repository code and implemented twice. Unify them into the grafts code by introducing two new functions to create grafts from a file and to refresh a grafts structure.
If replacing an already existing graft in the grafts map, then we need to free the previous `git_commit_graft` structure.
Currently, we expose the function `git_repository_shallow_roots` to get all grafted roots of the repository. This already paints us into a corner, though, as we certainly need to experiment with some functionality of the grafting mechanism before we can happily expose some of its functionality. Most importantly, we need to get right when to refresh grafts and when not. Thus, this commit removes the public function with no public replacement. We should first try and see what usecases people come up with to e.g. expose the `git_grafts` mechanism directly in the future or do something different altogether. Instead, we provide an internal interface to get weak pointers to the grafting structs part of the repository itself.
Co-authored-by: Qix <Qix-@users.noreply.github.com>
…libgit2 into shallow-clone-network
Co-authored-by: Qix <Qix-@users.noreply.github.com>
I just merged main in and updated some issues flagged by Qix. |
@lrm29 looks like we're back on track :) Any chance of a review now? Getting this merged finally would be incredible! |
hey guys, any progress here for shallow clone? |
Ping @lrm29, any chance of a review? |
I'm not an approving reviewer, and nor should I be :D |
Oops, my bad, pinged the wrong person 🙃 sorry for the noise. The failures seem transient or at least unrelated:
Perhaps I'll get the right ping this time :D @ethomson, perhaps you'd have time for a review now? :) |
👋 I had a few minutes over the weekend, so I started taking a look at this. I'm starting by merging main into this branch to make sure that it queries the underlying oid type for the repository and/or remote. On the whole this looks really good and I'm looking forward to making good progress on it. For visibility: this is my top libgit2 priority right now, but libgit2 is not my top priority right now. 😓 |
So! I was able to merge latest in and update the code, because a few things have changed since it was written, namely:
I also made some changes that were unrelated to the merge:
I commented on all the lines to indicate my thinking - I didn't want to just change code, I also hoped to sort of explain the rationale. Finally, I didn't make changes here, but I have some questions:
I pushed up https://github.com/libgit2/libgit2/compare/ethomson/shallow?expand=1 with those changes. Review on my changes welcome, and if you have opinions on (1) and (2), that would be very helpful. 🙏 |
* | ||
* The default is -1. | ||
*/ | ||
int depth; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't default to -1
, it should default to 0
. We shouldn't have multiple values (0
and all negative values) that mean the same thing, let's just make 0
mean complete history and disallow negative values. We may want to use them for something meaningful in the future.
@@ -442,6 +450,11 @@ GIT_EXTERN(int) git_smart_subtransport_ssh( | |||
git_transport *owner, | |||
void *param); | |||
|
|||
GIT_EXTERN(size_t) git_shallowarray_count(git_shallowarray *array); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be in transport.h? Does this really need to be public at all?
@@ -420,7 +420,9 @@ static int clone_into( | |||
|
|||
memcpy(&fetch_opts, opts, sizeof(git_fetch_options)); | |||
fetch_opts.update_fetchhead = 0; | |||
fetch_opts.download_tags = GIT_REMOTE_DOWNLOAD_TAGS_ALL; | |||
|
|||
if (opts->depth <= 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't allow negative values, we should error on them.
@@ -169,31 +172,55 @@ int git_fetch_negotiate(git_remote *remote, const git_fetch_options *opts) | |||
|
|||
remote->need_pack = 0; | |||
|
|||
if (!opts) | |||
remote->nego.depth = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default shouldn't be -1
, it should be 0
.
if (filter_wants(remote, opts) < 0) | ||
return -1; | ||
|
||
/* Don't try to negotiate when we don't want anything */ | ||
if (!remote->need_pack) | ||
return 0; | ||
|
||
if (opts && opts->unshallow && opts->depth > 0) { | ||
git_error_set(GIT_ERROR_INVALID, "options '--depth' and '--unshallow' cannot be used together"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--depth
and --unshallow
are command line options for git. libgit2 users should not be given error messages as if they were using git. They're not.
char oid[GIT_OID_SHA1_HEXSIZE]; | ||
git_str shallow_buf = GIT_STR_INIT; | ||
|
||
git_oid_fmt(oid, git_shallowarray_get(wants->shallow_roots, i)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to cope with different oid types.
|
||
git_pkt_free((git_pkt *) pkt); | ||
|
||
if (error < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we generally don't wrap single lines in { }
.
git_pkt_free((git_pkt *) pkt); | ||
} | ||
|
||
git_pkt_free((git_pkt *) pkt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed this does look like a double-free. May be harmless today -- I didn't look -- but this is a red flag.
len -= 9; | ||
|
||
if (len >= GIT_OID_SHA1_HEXSIZE) { | ||
git_oid__fromstr(&pkt->oid, line + 1, GIT_OID_SHA1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like in the shallow_pkt
case, we're blindly looking at line + 1
without ever validating that the character that we're skipping over is a space.
* | ||
* The default is 0, which means the flag is off. | ||
*/ | ||
int unshallow; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need depth
and unshallow
? It looks like unshallow
just implies a large depth
. 🙁
Thanks for looking into this, and making the necessary changes that only you could probably make :). The comments you've made are very helpful and we appreciate you taking the time to make them. My response to your key points:
No it's not needed, please remove the feature flag. We added it for our own purposes in order to be able to completely turn it off if needed (not happened yet!).
This was raised by @Qix- too. I think these options have leaked down from our own usage. I would be very happy with this change, although you'll have to document INT_MAX equating to unshallow? We read this page which states that 2147483647 has the special meaning of unshallow (seemingly it's not just a "large number").
I don't think it needs exposing so if it's obvious to you how to hide it please do. In future we might like to know what the shallow roots are but I certainly don't need that to be part of this change. You also mentioned the double free in your review too, that definitely needs fixing! |
Thanks @lrm29 - funny, when you put a number to I'll think on this but I'm excited that this is nearly there. Great work by everybody involved. (Btw, I don't know if you're in Cambridge but if so, we should grab a coffee and talk through how you're using libgit2.) |
I think it's a good idea to keep the change is small as possible, in which case ditch the "unshallow" option and just mention that setting depth to INT_MAX will do the job (it'll have to anyway). "unshallow" isn't that great anyway as it leaves you with the impression it'll properly convert your repository to a "full" one, but you need to also change the fetch spec too as it's still set to single branch.
I am, that would be good 👍. My email is in my github profile. |
support for shallow clones and fetches with `gitoxide` This PR makes it possible to enable shallow clones and fetches for git dependencies and crate indices independently with the `-Zgitoxide=fetch,shallow_deps` and `-Zgitoxide=fetch,shallow_index` respectively. ### Tasks * [x] setup the shallow option when fetching, differentiated by 'registry' and 'git-dependency' * [x] validate registries are cloned shallowly *and* fetched shallowly * [x] validate git-dependencies are cloned shallowly *and* fetched shallowly * [x] a test to show what happens if a shallow index is opened with `git2` (*it can open it and fetch like normal, no issues*) * [x] assure that `git2` can safely operate on a shallow clone - we unshallow it beforehand, both for registries and git dependencies * [x] assure git-deps with revisions are handled correctly (they should just not be shallow, and they should unshallow themselves if they are) * [x] make sure shallow index clones aren't seen by older cargo's * [x] make sure shallow git dependency clones aren't seen by older cargo's * [x] shallow.lock test and more test-suite runs with shallow clones enabled for everything * [x] release new version of `gix` with full shallow support and use it here * [x] check why `shallow` files remain after unshallowing. Should they not rather be deleted if empty? - Yes, `git` does so as well, implemented [with this commit](Byron/gitoxide@2cd5054) * ~~see if it can be avoided to ever unshallow an existing `-shallow` clone by using the right location from the start. If not, test that we can go `shallow->unshallow->shallow` without a hitch.~~ Cannot happen anymore as it can predict the final location perfectly. * [x] `Cargo.lock` files don't prevent shallow clones * [x] assure all other tests work with shallow cloning enabled (or fix the ones that don't with regression protection) * [x] can the 'split-brain' issue be solved for good? ### Review Notes * there is a chance of 'split brain' in git-dependencies as the logic for determining whether the clone/fetch is shallow is repeated in two places. This isn't the case for registries though. ### Notes * I am highlighting that this is the `gitoxide` version of shallow clones as the `git2` version [might soon be available](libgit2/libgit2#6396) as well. Having that would be good as it would ensure interoperability remains intact. * Maybe for when `git2` has been phased out, i.e. everything else is working, I think (unscientifically) there might be benefits in using worktrees for checkouts. Admittedly I don't know the history of why they weren't used in the first place. Also: `gitoxide` doesn't yet support local clones and might not have to if worktrees were used instead.
Shallow (#6396) with some fixes from review
This PR combines #4747 and #5254 to support:
If preferred, can be separated into support for local shallow clones and network.
This will be used in MATLAB to support users working with shallow clones, and a feature flag is added for the features.
Tests were added for all three features and so far, the features are functional.
To clone a shallow repository, set
clone_opts.fetch_opts.depth
to be a positive integer depth.To fetch unshallow a shallow repository, set
fetch_opts.unshallow = 1;
Any feedback is greatly appreciated!