Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shallow Clone Support #6396

Merged
merged 81 commits into from May 9, 2023
Merged

Conversation

lya001
Copy link
Contributor

@lya001 lya001 commented Aug 31, 2022

This PR combines #4747 and #5254 to support:

  1. Performing git operations on local shallow repositories
  2. Git cloning with a depth to obtain a shallow repository
  3. Performing git fetch unshallow on shallow repositories, to convert them into full clones

If preferred, can be separated into support for local shallow clones and network.

This will be used in MATLAB to support users working with shallow clones, and a feature flag is added for the features.

Tests were added for all three features and so far, the features are functional.

To clone a shallow repository, set clone_opts.fetch_opts.depth to be a positive integer depth.

To fetch unshallow a shallow repository, set fetch_opts.unshallow = 1;

Any feedback is greatly appreciated!

tiennou and others added 30 commits June 27, 2020 14:33
This represents (old-style) grafted commits, a.k.a an array of
overridden parents for a commit's OID.
This wires git_repository to open the .git/info/grafts file and load its
contents as git_commit_grafts objects.
In order to increase maintainability in the future, we should try to
make structures as self-contained and opaque to its users as possible.
Thus it is probably not a good idea to just typedef `git_graftmap` to
`git_oidmap`, as that will make it a lot harder in the future to extend
the API in the future, if this need ever arises.

Refactor the code to instead declare a real structure `git_grafts`,
which is completely opaque to its callers.
Instead of using the newly introduced `git_buf_foreach_line`, which
modifies the buffer itself, we should try to use our existing parsing
infrastructure in "parse.h".

Convert the grafts parsing code to make use of `git_parse_ctx`. Remove
the `git_buf_foreach_line` macro, as grafts have been its sole user.
Parsing of grafts files is currently contained in the repository code.
To make grafts-related logic more self-contained, move it into
"grafts.c" instead.
The shallow roots are in fact another user of the grafting mechanism,
and in essence they do use the same file format for grafted commits.
Thus, instead of hand-coding the parsing logic a second time, we can
just reuse the `git_grafts` structure for shallow commits, as well.
When loading shallow grafts, we add each of the grafting commits to the
grafts backed by ".git/info/grafts". Keeping track of both grafts
separately, but partially storing them in a common grafts structure is
likely to lead to inconsistencies. In fact, there already are
inconsistencies if refreshing shallow grafts at a later point, as we
only refresh the shallows, but not the normal grafts in that case.

Disentangle both grafting stores and instead check both separately when
parsing commits.
The refresh logic for both "normal" and shallow grafts are currently
part of the repository code and implemented twice. Unify them into the
grafts code by introducing two new functions to create grafts from a
file and to refresh a grafts structure.
If replacing an already existing graft in the grafts map, then we need
to free the previous `git_commit_graft` structure.
Currently, we expose the function `git_repository_shallow_roots` to get
all grafted roots of the repository. This already paints us into a
corner, though, as we certainly need to experiment with some
functionality of the grafting mechanism before we can happily expose
some of its functionality. Most importantly, we need to get right when
to refresh grafts and when not.

Thus, this commit removes the public function with no public
replacement. We should first try and see what usecases people come up
with to e.g. expose the `git_grafts` mechanism directly in the future
or do something different altogether. Instead, we provide an internal
interface to get weak pointers to the grafting structs part of the
repository itself.
lmcglash and others added 4 commits March 10, 2023 09:30
Co-authored-by: Qix <Qix-@users.noreply.github.com>
Co-authored-by: Qix <Qix-@users.noreply.github.com>
@lrm29
Copy link
Contributor

lrm29 commented Mar 10, 2023

I was able to "rebase" this PR on top of v1.6.2. Well, it's technically not a git rebase because I ended up hand-writing the code as Git merged parts of the code that I didn't want it to. Feature-wise it works well, just haven't rebased the test over and tested it yet, which I'm planning to do over the weekend. Shall I submit a new PR for the shallow clone support then?

I just merged main in and updated some issues flagged by Qix.

@Qix-
Copy link
Contributor

Qix- commented Mar 10, 2023

@lrm29 looks like we're back on track :) Any chance of a review now? Getting this merged finally would be incredible!

@shaybenh7
Copy link

hey guys, any progress here for shallow clone?

@Qix-
Copy link
Contributor

Qix- commented Apr 5, 2023

Ping @lrm29, any chance of a review?

@lrm29
Copy link
Contributor

lrm29 commented Apr 5, 2023

Ping @lrm29, any chance of a review?

I'm not an approving reviewer, and nor should I be :D
Note that two checks failed but I do not think they are related to this pull request, there seems to be a sporadic issue as they passed before and all I did was fix some typos.

@Qix-
Copy link
Contributor

Qix- commented Apr 5, 2023

Oops, my bad, pinged the wrong person 🙃 sorry for the noise. The failures seem transient or at least unrelated:

Perhaps I'll get the right ping this time :D @ethomson, perhaps you'd have time for a review now? :)

@ethomson
Copy link
Member

ethomson commented Apr 24, 2023

👋 I had a few minutes over the weekend, so I started taking a look at this. I'm starting by merging main into this branch to make sure that it queries the underlying oid type for the repository and/or remote. On the whole this looks really good and I'm looking forward to making good progress on it.

For visibility: this is my top libgit2 priority right now, but libgit2 is not my top priority right now. 😓

@ethomson
Copy link
Member

So! I was able to merge latest in and update the code, because a few things have changed since it was written, namely:

  1. We don't do assert anymore, we use GIT_ASSERT and friends.
  2. We don't assume SHA1, we look up the repo/remote oid type

I also made some changes that were unrelated to the merge:

  1. I don't see any utility in feature flagging this. This should just be in the product. Is this wrong?

I commented on all the lines to indicate my thinking - I didn't want to just change code, I also hoped to sort of explain the rationale.

Finally, I didn't make changes here, but I have some questions:

  1. Do we really need depth and unshallow? It looks like unshallow just implies a large depth. 🙁 Two options that are basically the same is not a great DX. I think that we should just have depth and let users pass INT_MAX to indicate unshallow.
  2. Do we really need to expose git_shallowarray? Even limiting it to sys, it's sort of not great. I need to think through the mechanisms here that users would ever see this.

I pushed up https://github.com/libgit2/libgit2/compare/ethomson/shallow?expand=1 with those changes. Review on my changes welcome, and if you have opinions on (1) and (2), that would be very helpful. 🙏

*
* The default is -1.
*/
int depth;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't default to -1, it should default to 0. We shouldn't have multiple values (0 and all negative values) that mean the same thing, let's just make 0 mean complete history and disallow negative values. We may want to use them for something meaningful in the future.

@@ -442,6 +450,11 @@ GIT_EXTERN(int) git_smart_subtransport_ssh(
git_transport *owner,
void *param);

GIT_EXTERN(size_t) git_shallowarray_count(git_shallowarray *array);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be in transport.h? Does this really need to be public at all?

@@ -420,7 +420,9 @@ static int clone_into(

memcpy(&fetch_opts, opts, sizeof(git_fetch_options));
fetch_opts.update_fetchhead = 0;
fetch_opts.download_tags = GIT_REMOTE_DOWNLOAD_TAGS_ALL;

if (opts->depth <= 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't allow negative values, we should error on them.

@@ -169,31 +172,55 @@ int git_fetch_negotiate(git_remote *remote, const git_fetch_options *opts)

remote->need_pack = 0;

if (!opts)
remote->nego.depth = -1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default shouldn't be -1, it should be 0.

if (filter_wants(remote, opts) < 0)
return -1;

/* Don't try to negotiate when we don't want anything */
if (!remote->need_pack)
return 0;

if (opts && opts->unshallow && opts->depth > 0) {
git_error_set(GIT_ERROR_INVALID, "options '--depth' and '--unshallow' cannot be used together");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--depth and --unshallow are command line options for git. libgit2 users should not be given error messages as if they were using git. They're not.

char oid[GIT_OID_SHA1_HEXSIZE];
git_str shallow_buf = GIT_STR_INIT;

git_oid_fmt(oid, git_shallowarray_get(wants->shallow_roots, i));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to cope with different oid types.


git_pkt_free((git_pkt *) pkt);

if (error < 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we generally don't wrap single lines in { }.

git_pkt_free((git_pkt *) pkt);
}

git_pkt_free((git_pkt *) pkt);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed this does look like a double-free. May be harmless today -- I didn't look -- but this is a red flag.

len -= 9;

if (len >= GIT_OID_SHA1_HEXSIZE) {
git_oid__fromstr(&pkt->oid, line + 1, GIT_OID_SHA1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like in the shallow_pkt case, we're blindly looking at line + 1 without ever validating that the character that we're skipping over is a space.

*
* The default is 0, which means the flag is off.
*/
int unshallow;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need depth and unshallow? It looks like unshallow just implies a large depth. 🙁

@lrm29
Copy link
Contributor

lrm29 commented Apr 25, 2023

Thanks for looking into this, and making the necessary changes that only you could probably make :). The comments you've made are very helpful and we appreciate you taking the time to make them. My response to your key points:

1. I don't see any utility in feature flagging this. This should just be in the product. Is this wrong?

No it's not needed, please remove the feature flag. We added it for our own purposes in order to be able to completely turn it off if needed (not happened yet!).

Finally, I didn't make changes here, but I have some questions:

1. Do we really need depth and unshallow? It looks like unshallow just implies a large depth. 🙁 Two options that are basically the same is not a great DX. I think that we should just have `depth` and let users pass `INT_MAX` to indicate unshallow.

This was raised by @Qix- too. I think these options have leaked down from our own usage. I would be very happy with this change, although you'll have to document INT_MAX equating to unshallow? We read this page which states that 2147483647 has the special meaning of unshallow (seemingly it's not just a "large number").

2. Do we really need to expose `git_shallowarray`? Even limiting it to `sys`, it's sort of not great. I need to think through the mechanisms here that users would ever see this.

I don't think it needs exposing so if it's obvious to you how to hide it please do. In future we might like to know what the shallow roots are but I certainly don't need that to be part of this change.

You also mentioned the double free in your review too, that definitely needs fixing!

@ethomson
Copy link
Member

Thanks @lrm29 - funny, when you put a number to INT_MAX, it feels less great. 😓

I'll think on this but I'm excited that this is nearly there. Great work by everybody involved.

(Btw, I don't know if you're in Cambridge but if so, we should grab a coffee and talk through how you're using libgit2.)

@lrm29
Copy link
Contributor

lrm29 commented Apr 27, 2023

Thanks @lrm29 - funny, when you put a number to INT_MAX, it feels less great. 😓

I think it's a good idea to keep the change is small as possible, in which case ditch the "unshallow" option and just mention that setting depth to INT_MAX will do the job (it'll have to anyway). "unshallow" isn't that great anyway as it leaves you with the impression it'll properly convert your repository to a "full" one, but you need to also change the fetch spec too as it's still set to single branch.

(Btw, I don't know if you're in Cambridge but if so, we should grab a coffee and talk through how you're using libgit2.)

I am, that would be good 👍. My email is in my github profile.

bors added a commit to rust-lang/cargo that referenced this pull request May 3, 2023
support for shallow clones and fetches with `gitoxide`

This PR makes it possible to enable shallow clones and fetches for git dependencies and crate indices independently with the `-Zgitoxide=fetch,shallow_deps` and `-Zgitoxide=fetch,shallow_index` respectively.

### Tasks

* [x] setup the shallow option when fetching, differentiated by 'registry' and 'git-dependency'
* [x] validate registries are cloned shallowly *and* fetched shallowly
* [x] validate git-dependencies are cloned shallowly *and* fetched shallowly
* [x] a test to show what happens if a shallow index is opened with `git2` (*it can open it and fetch like normal, no issues*)
* [x] assure that `git2` can safely operate on a shallow clone - we unshallow it beforehand, both for registries and git dependencies
* [x] assure git-deps with revisions are handled correctly (they should just not be shallow, and they should unshallow themselves if they are)
* [x] make sure shallow index clones aren't seen by older cargo's
* [x] make sure shallow git dependency clones aren't seen by older cargo's
* [x] shallow.lock test and more test-suite runs with shallow clones enabled for everything
* [x] release new version of `gix` with full shallow support and use it here
* [x] check why `shallow` files remain after unshallowing. Should they not rather be deleted if empty? - Yes, `git` does so as well, implemented [with this commit](Byron/gitoxide@2cd5054)
* ~~see if it can be avoided to ever unshallow an existing `-shallow` clone by using the right location from the start. If not, test that we can go `shallow->unshallow->shallow` without a hitch.~~ Cannot happen anymore as it can predict the final location perfectly.
* [x] `Cargo.lock` files don't prevent shallow clones
* [x] assure all other tests work with shallow cloning enabled (or fix the ones that don't with regression protection)
* [x] can the 'split-brain' issue be solved for good?

### Review Notes

* there is a chance of 'split brain' in git-dependencies  as the logic for determining whether the clone/fetch is shallow is repeated in two places. This isn't the case for registries though.

### Notes

* I am highlighting that this is the `gitoxide` version of shallow clones as the `git2` version [might soon be available](libgit2/libgit2#6396) as well. Having that would be good as it would ensure interoperability remains intact.
* Maybe for when `git2` has been phased out, i.e. everything else is working, I think (unscientifically) there might be benefits  in using worktrees for checkouts. Admittedly I don't know the history of why they weren't used in the first place. Also: `gitoxide` doesn't yet support local clones and might not have to if worktrees were used instead.
ethomson added a commit that referenced this pull request May 9, 2023
Shallow (#6396) with some fixes from review
@ethomson ethomson merged commit 190a4c5 into libgit2:main May 9, 2023
24 of 26 checks passed
@Qix-
Copy link
Contributor

Qix- commented May 9, 2023

Amazing, been a long time coming. Really happy to see this land! Thanks everyone!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants