Sparse checkout support #2263

Open
jspahrsummers opened this issue Apr 11, 2014 · 28 comments

@jspahrsummers
Contributor

A user can set core.sparsecheckout to true, then populate .git/info/sparse-checkout with the specific subtrees that should be checked out. See http://jasonkarns.com/blog/subdirectory-checkouts-with-git-sparse-checkout/ for a basic explanation.

I don't know whether it makes sense for libgit2 to support this in its own checkout operations, but sparse checkouts should be taken into consideration for diffs and status (in that omitted subtrees should not appear).
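
For readers unfamiliar with the setup described in this comment, here is a minimal sketch using the git command line (not libgit2). It assumes an existing non-bare clone whose git directory is `.git`; the helper name `enable_sparse_checkout` is hypothetical and only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable_sparse_checkout REPO PATTERN...
# Turns on core.sparseCheckout, writes the given patterns to
# .git/info/sparse-checkout, and re-reads HEAD so the working tree
# is trimmed to the matching subtrees.
# Assumption: REPO is a non-bare clone with a plain .git directory.
enable_sparse_checkout() {
    local repo="$1"; shift
    git -C "$repo" config core.sparseCheckout true
    mkdir -p "$repo/.git/info"
    # One pattern per line, e.g. "some/subtree/"
    printf '%s\n' "$@" > "$repo/.git/info/sparse-checkout"
    # Re-apply HEAD to the index and working tree, honoring the patterns
    git -C "$repo" read-tree -mu HEAD
}
```

For example, `enable_sparse_checkout /path/to/clone "some/subtree/"` should leave only that subtree in the working tree, while the object database still holds the full history.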

@calavera
Contributor

I understand from the lack of traffic here that this is not on any roadmap, but I thought I could ask again. @ethomson, @carlosmn, is there any possibility that this can be implemented? I don't really understand the whole scope of this request, so "no" is a very valid answer.

@carlosmn
Member

carlosmn commented Jul 1, 2016

It is highly unlikely that either of us will work on this. Maybe @johnhaley81 is interested in this since he works on a client-side app.

@johnhaley81
Contributor

It won't be anytime soon but that is something that I'm looking to tackle eventually :)

@jlindner

Is there any update on this functionality possibly making it into libgit2? This has pretty much become a showstopper for us in using git, since we have massive monorepos but only need to perform diffs of certain directories. Currently, this lack of support has kept us from moving everything from Subversion to git. Since this was discussed a couple of years ago, I was hoping somebody might have an update.

Thanks

@ethomson
Member

@tiennou has done some work here, but this is an insanely complex topic and remains unimplemented.

@tiennou
Contributor

tiennou commented Jul 19, 2018

I think you're mixing this up with shallow support? At least I don't remember going anywhere deep down checkout yet 😉.

From a cursory look, sparse checkouts ought to be easier to implement than shallow is (there seem to be no transport-level/pkt-protocol changes needed, as the repo is still downloaded in full).

I wonder though, doesn't diff provide a pathspec limiter already? Am I missing something for your use case, @jlindner?

@ethomson
Member

Yes I was conflating the two; I read this issue too quickly and thought it was about shallow not sparse. Sorry!

@jlindner

Yeah, I believe I definitely need the sparseCheckout ability. Basically, I have about 1500 directories (each directory contains a C++ project) in one git repo. I need to extract one directory (one project), without the rest of the directories (projects), and be able to get data from it. I know I can use the git command line to get to this data, but I was hoping that I could do it with libgit2, without shelling out to the command line.

@jlindner

With the diff, which I haven't looked into that much, will it allow me to extract a specific directory at a specific commit from the git repo? I don't need to diff, but if it can get those files out to where I could access them, that might work.

Basically, what is happening is for a directory/commit, I extract the files in that location, and run some scanners on it making sure they don't contain security flaws and a few other checks. Since the user gives me the directory and commit, for performance reasons, I would like to just get that out, and nothing else. Cloning the whole git repository locally isn't great either, as it is pretty good sized.

@tiennou
Contributor

tiennou commented Jul 19, 2018

Sorry, I'm still unclear on what you need, and your clarifications seem to imply you really need the files to exist in the filesystem at some point… If that's the case, that's sparse checkout. If not, you might be able to get to the "contents" of the directory/files by looking up the commit and traversing its tree to get at the "project" path; you'll then have another tree whose contents are accessible (in memory at least, if your scanner supports that).

Another caveat is that all this will require a git repository "locally" (though what you meant eludes me), i.e. libgit2 will need to open a "clone" of your repository, either the canonical one (because your application is "local" to the machine that has the repository), or a brand new one (i.e. clone it).
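
To illustrate the idea of reading one directory at one commit from a repository that is already available locally, here is a rough command-line sketch. It uses the git CLI rather than libgit2, so it only mirrors the commit-to-tree-to-path traversal described above; both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list_dir_at_commit REPO COMMIT DIR
# Prints the files recorded under DIR in COMMIT (paths relative to DIR),
# without touching the working tree.
list_dir_at_commit() {
    git -C "$1" ls-tree -r --name-only "$2:$3"
}

# Hypothetical helper: export_dir_at_commit REPO COMMIT DIR DEST
# Writes the files of DIR as of COMMIT into DEST (as DEST/DIR/...),
# roughly like "svn export" but from a local repository.
export_dir_at_commit() {
    local repo="$1" commit="$2" dir="$3" dest="$4"
    mkdir -p "$dest"
    # Subshell so pipefail does not leak into the caller's shell
    (
        set -o pipefail
        git -C "$repo" archive --format=tar "$commit" -- "$dir" \
            | tar -xf - -C "$dest"
    )
}
```

Note that both helpers still need the repository's objects on disk, which is the caveat raised in this comment.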

@jlindner

Yes, what I need is the ability to get one directory out of GitHub without bringing the whole repository down locally first. I don't want a local copy of the repository, as that would require large amounts of disk space. I am basically looking for the git equivalent of SVN export on a specific directory. I don't need the git repo stuff; I need the files in that location, and only that location, so I can run scanners on them. From my understanding, to do so in git, you would use sparse checkout. If there is another way to get that specific directory out, I am all for it, but I haven't been able to find one.

With the command line git.exe, I have been able to perform this exact task (but I think it creates a repo locally; it just doesn't have anything in that repo except this one directory). That is acceptable for my use case.

I am just looking for the best way, to get one directory out of a git repo, so I can perform actions on our files that we have committed.

@jlindner

Do you know a better way to accomplish what I need to do? Do you think sparseCheckout is what I need, or do you know of a better way to basically export one directory from a larger GitHub repository? I am open to any ideas as to the best way to get the files out. I don't need a checkout/clone, I just need the actual files that I have checked in.

@ethomson
Member

> With the command line git.exe, I have been able to perform this exact task (but I think it creates a repo locally; it just doesn't have anything in that repo except this one directory). That is acceptable for my use case.

If you're doing a sparse checkout, you are indeed downloading the entire repository, you're just limiting what's checked out. You can emulate the checkout part of this by specifying paths to the checkout command. (However, this is not true sparse checkout support, as we do not set up the sparse checkout file). It sounds like this is not burdensome for your workflows.

But this does download more than what you're checking out. If you want individual files, this is not part of the git protocol (at the moment), so you will need to use the API for the hosting provider you are using, whether that's GitHub, Bitbucket, GitLab, VSTS, etc.
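
The path-limited checkout mentioned in this comment can be approximated on the command line roughly as follows. This is a sketch with the git CLI, not libgit2, and the helper name `checkout_only_paths` is hypothetical. As noted above, the full repository is still downloaded and no sparse-checkout file is set up, so git will treat the remaining paths as missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checkout_only_paths URL DEST PATH...
# Clones the whole repository without populating the working tree,
# then checks out only the requested paths from HEAD.
# This emulates the "checkout" half of sparse checkout only.
checkout_only_paths() {
    local url="$1" dest="$2"; shift 2
    git clone --quiet --no-checkout "$url" "$dest"
    git -C "$dest" checkout --quiet HEAD -- "$@"
}
```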

@jlindner

Using sparse checkout from git.exe, I have been able to get just one directory out. The entire repo is currently about 6GB, with several thousand projects. The local repo created for the sparse checkout ends up being about 115k, so it is not bringing down the entire repo; it just brings down that one directory, checked out. It does create the .git folders, which I can live with, because I can just delete them, but it isn't bringing down the entire repo and taking up 6GB of disk space. I was hoping for an API that could perform this action, since I am actually doing this task from a REST web service that the user calls. Shelling out to the command line in a web service may not be the world's smartest idea, but if there is no other way around it, that may be my only option, since none of the APIs seem to support this task.

@ethomson
Member

> It does create the .git folders, which I can live with, because I can just delete them, but it isn't bringing down the entire repo

That is the repo.

I’m confused. What is the size of the .git directory with git? And with libgit2?

@tiennou
Contributor

tiennou commented Jul 23, 2018

Can you provide the git commands you're executing? That would clarify a lot, as right now, unless you're running those commands on the same machine the repo is on (triggering an "optimized" local clone), you're likely going through the git protocol, which has no support for fetching only some specific paths.

@jlindner

After doing more testing with sparseCheckout, I have figured out that it isn't going to work for what I need. I was testing sparseCheckout on a little repository with only a few directories in it. It seemed really fast, and I wasn't looking at the logs closely enough to realize that it was in fact cloning the entire repo locally before it ran the sparseCheckout functionality to give me just the files I wanted. As soon as I pointed it at the real repository, you are correct, it cloned the whole thing locally first and then limited the directory to only what I wanted. Sorry for the confusion. I don't think this is going to work for what I need, and we are going to have to take a different approach on the repo.

Thanks for all your time. Keep up the good work.

James

@ethomson
Member

Thanks for the confirmation, @jlindner. What you've experienced aligns with my expectations.

Work on "narrow" or "partial" clone support in git is continuing but this is in-progress and not something that libgit2 supports yet. I would encourage you to check out the appropriate API for your hosting provider in the meantime.

Thanks!

@mathomp4

mathomp4 commented Jul 22, 2019

Hello all. The linked issue above is from me. The project I'm working on uses sparse checkout and apparently powerlevel10k prompt doesn't handle sparse checkout because libgit2 doesn't handle it.

Note if someone has time and wants to see what the repos I'm using look like, it should be fairly simple to get the code checked out.

@mathomp4

As an aside, it looks like git itself has been focusing on sparse checkout:

https://github.blog/2020-01-17-bring-your-monorepo-down-to-size-with-sparse-checkout/
https://github.blog/2020-03-22-highlights-from-git-2-26/#updates-to-git-sparse-checkout

So hopefully that will make it more common and might get sparse-checkout support in libgit2 up the priority tree.

@ghost

ghost commented Oct 19, 2020

Is it possible to use
git pull --depth=1 origin master
with
git fetch --filter=sparse:oid=master:shiny-app/.gitfilterspec origin
As a workaround for sparse checkout?

https://docs.gitlab.com/ee/topics/git/partial_clone.html
https://stackoverflow.com/questions/600079/git-how-do-i-clone-a-subdirectory-only-of-a-git-repository/28039894#28039894
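
For context on the question above, here is one way a partial clone and sparse checkout can be combined with the git CLI (again, not libgit2). This is a sketch under several assumptions: Git 2.25 or newer on the client, and a server that accepts `--filter` (otherwise git warns and falls back to fetching everything). The helper name `partial_sparse_clone` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: partial_sparse_clone URL DEST DIR...
# Clones without file contents (blob:none), with history truncated to
# one commit, and with sparse checkout enabled; then restricts the
# working tree to the given directories. Missing blobs are fetched
# on demand when the server supports partial clone.
partial_sparse_clone() {
    local url="$1" dest="$2"; shift 2
    git clone --quiet --filter=blob:none --sparse --depth=1 "$url" "$dest"
    git -C "$dest" sparse-checkout set "$@"
}
```

Whether this actually reduces what is transferred depends on the hosting provider honoring the filter.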

@jairbubbles
Contributor

Looks like there is a PR on the subject: #5833

@zentron

zentron commented May 19, 2022

PRs #5833 and #6169 both look to provide a workable solution to sparse checkouts. Is anything further needed for these to progress and merge?

@Nikitae57

There are 2 PRs ready to fix this issue. C'mon, it's been 8 years.

@ethomson
Member

@Nikitae57 have you tested them? Do they work? What sort of problems do they still have? Where are the performance issues?

@mathomp4

I am thinking that #6394 by @YuKitsune is probably the PR to focus on. I think it's a successor to #5833 by @jochenhz?

@shaybenh7

Hey, any news here? This thread is already 9 years old! Do you plan on supporting sparse checkout?

@wtfacoconut

Raising my hand to keep this issue alive. Sparse checkout is indeed a much needed capability.
