Sparse checkout support #2263
Comments
It is highly unlikely that either of us will work on this. Maybe @johnhaley81 is interested in this since he works on a client-side app. |
It won't be anytime soon but that is something that I'm looking to tackle eventually :) |
Is there any update on this functionality possibly making it into libgit2? This has pretty much become a show stopper for us to use git since we have massive mono repos, but only need to perform diffs of certain directories. Currently, this lack of support has kept us from moving everything from Subversion to git. Since this was talked about a couple of years ago, I was hoping somebody would maybe have an update on this. Thanks |
@tiennou has done some work here, but this is an insanely complex topic and remains unimplemented. |
I think you're mixing this up with shallow support? At least I don't remember going anywhere deep down checkout yet 😉. From a cursory look at it, sparse checkouts ought to be easier to implement than shallow is (there seem to be no transport-level/pkt-protocol changes needed, since the repo is still downloaded in full). I wonder though, doesn't diff provide a pathspec limiter already? Am I missing something for your use case @jlindner? |
Yes I was conflating the two; I read this issue too quickly and thought it was about shallow not sparse. Sorry! |
Yeah. I definitely need the sparseCheckout ability, I believe. Basically, I have about 1500 directories (each directory contains a C++ project) in one git repo. I need to extract one directory (one project), without the rest of the directories (projects), and be able to get data from it. I know I can use the git command line to get to this data, but I was hoping that I could do it from the libgit2 functionality, without shelling out to the command line. |
With the diff, which I haven't looked into much, will it allow me to extract a specific directory at a specific commit from the git repo? I don't need to diff, but if it can get that out to where I could access the files, that might work. Basically, for a given directory/commit, I extract the files in that location and run some scanners on them to make sure they don't contain security flaws, plus a few other checks. Since the user gives me the directory and commit, for performance reasons I would like to get just that out, and nothing else. Cloning the whole git repository locally isn't great either, as it is pretty good sized. |
Sorry, I'm still unclear on what you need, and your clarifications seem to imply you really need the files to exist in the filesystem at some point… If that's the case, that's sparse checkout. If that's not, you might be able to get to the "contents" of the directory/files by looking up the commit and traversing its tree to get at the "project" path, and you'll have another tree whose contents are accessible (in memory at least, if your scanner supports that). Another caveat is that all this will require a git repository "locally" (though what you meant eludes me), i.e. libgit2 will need to open a "clone" of your repository, either the canonical one (because your application is "local" to the machine that has the repository), or a brand new one (i.e. clone it). |
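The commit and tree lookup described in this comment can be sketched with the git command line driven from Python. This is only an illustration under stated assumptions: it shells out to `git` (`ls-tree` and `cat-file`) instead of calling libgit2, it still needs a local clone, and the function names `list_files_at` and `read_file_at` are hypothetical.

```python
import subprocess


def list_files_at(repo, commit, subdir):
    """List blob paths under `subdir` as recorded in `commit` (no checkout needed)."""
    out = subprocess.run(
        ["git", "-C", repo, "ls-tree", "-r", "--name-only", "-z", commit, subdir],
        check=True,
        capture_output=True,
    ).stdout
    # -z separates entries with NUL bytes, so unusual file names survive intact.
    return [p.decode("utf-8") for p in out.split(b"\0") if p]


def read_file_at(repo, commit, path):
    """Return the raw bytes of `path` as stored in `commit`."""
    return subprocess.run(
        ["git", "-C", repo, "cat-file", "blob", f"{commit}:{path}"],
        check=True,
        capture_output=True,
    ).stdout
```

Iterating over `list_files_at(repo, commit, "projects/foo")` and passing each path to `read_file_at` would hand file contents to a scanner in memory, without writing anything to the working tree (the path `projects/foo` is a made-up example).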
Yes, what I need is the ability to get one directory out of GitHub, without bringing the whole repository down locally first. I don't want a local copy of the repository, as that would require large amounts of disk space. I am basically looking for the git equivalent of SVN export on a specific directory. Basically, I don't need the git repo stuff, I need the files in that location, and only that location, so I can run scanners on them. From my understanding, to do so in git, you would use the sparse checkout. If there is another way to get that specific directory out, I am all for it, but I haven't been able to find a way to do that. With the command line git.exe, I have been able to perform this exact task (but I think it creates a repo locally; it just doesn't have anything in that repo except this one directory). That is acceptable for my use case. I am just looking for the best way to get one directory out of a git repo, so I can perform actions on our files that we have committed. |
Do you know a better way to accomplish what I need to do? Do you think the sparseCheckout is what I need, or do you know of a better way to basically export one directory from a larger GitHub repository? I am open to any ideas as to the best way to get the files out. I don't need a checkout/clone, I just need the actual files that I have checked in. |
If you're doing a sparse checkout, you are indeed downloading the entire repository, you're just limiting what's checked out. You can emulate the checkout part of this by specifying But this does download more than what you're checking out. If you want individual files, this is not part of the git protocol (at the moment), so you will need to use the API for the hosting provider you are using, whether that's GitHub, Bitbucket, GitLab, VSTS, etc. |
Using the sparsecheckout from git.exe, I have been able to get just one directory out. The size of the entire repo is about 6GB currently, with several thousand projects. The local repo created for the sparsecheckout to work ends up being about 115k, so it is not bringing down the entire repo, it just brings down that one directory, checked out. It does create the .git folders, which I can live with, because I can just delete them, but it isn't bringing down the entire repo and taking up 6GB of disk space. I was hoping for an API that would be able to perform this action, since I am actually doing this task from a REST web service that the user calls to perform the action. Shelling out to the command line in a web service may not be the world's smartest idea, but if there is no other way around it, that may be my only option since none of the APIs seem to support this task. |
That is the repo. I’m confused. What is the size of the |
Can you provide the git commands you're executing ? That would clarify a lot, as right now unless you're running those commands on the same machine the repo is in (triggering an "optimized" local clone), you're likely going through the git protocol, which has no support for fetching only some specific paths. |
After doing more testing with the sparseCheckout, I have figured out that it isn't going to work for what I need. I was testing the sparseCheckout on a little repository, with only a few directories in it. It seemed really fast, and I wasn't looking at the logs of the action closely enough to realize that it was in fact cloning the entire repo locally before it ran the sparseCheckout functionality to give me just the files I wanted. As soon as I pointed it at the real repository, you are correct, it clones the whole thing locally first, and then sets the directory to only what I wanted. Sorry for the confusion. I don't think this is going to work for what I need, and we are going to need to take a different approach on the repo. Thanks for all your time. Keep up the good work. James |
Thanks for the confirmation, @jlindner. What you've experienced aligns with my expectations. Work on "narrow" or "partial" clone support in git is continuing but this is in-progress and not something that libgit2 supports yet. I would encourage you to check out the appropriate API for your hosting provider in the meantime. Thanks! |
Hello all. The linked issue above is from me. The project I'm working on uses sparse checkout and apparently powerlevel10k prompt doesn't handle sparse checkout because libgit2 doesn't handle it. Note if someone has time and wants to see what the repos I'm using look like, it should be fairly simple to get the code checked out. |
As an aside, it looks like git itself has been focusing on sparse checkout: https://github.blog/2020-01-17-bring-your-monorepo-down-to-size-with-sparse-checkout/ So hopefully that will make it more common and might get sparse-checkout support in libgit2 up the priority tree. |
Is it possible to use https://docs.gitlab.com/ee/topics/git/partial_clone.html |
Looks like there is a PR on the subject: #5833 |
There are 2 PRs ready to fix this issue. C'mon, it's been 8 years |
@Nikitae57 have you tested them? Do they work? What sort of problems do they still have? Where are the performance issues? |
I am thinking that #6394 by @YuKitsune is probably the PR to focus on. I think it's a successor to #5833 by @jochenhz? |
hey any news here? this thread is already 9 years old! do you plan on supporting sparse checkout? |
Raising my hand to keep this issue alive. Sparse checkout is indeed a much needed capability. |
A user can set `core.sparsecheckout` to `true`, then populate `.git/info/sparse-checkout` with the specific subtrees that should be checked out. See http://jasonkarns.com/blog/subdirectory-checkouts-with-git-sparse-checkout/ for a basic explanation.

I don't know whether it makes sense for libgit2 to support this in its own checkout operations, but sparse checkouts should be taken into consideration for diffs and status (in that omitted subtrees should not appear).
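The setup described here can be sketched as a small Python helper that drives the git command line. It is a minimal sketch, not libgit2 functionality: the function name `enable_sparse_checkout` is hypothetical, it assumes a non-bare repository whose metadata lives in a `.git` directory, and it only records the configuration without updating the working tree.

```python
import os
import subprocess


def enable_sparse_checkout(repo, patterns):
    """Turn on core.sparseCheckout and write the patterns file; return its path."""
    subprocess.run(
        ["git", "-C", repo, "config", "core.sparseCheckout", "true"],
        check=True,
    )
    # Assumption: `repo` is a normal (non-bare) repository with a .git directory.
    info_dir = os.path.join(repo, ".git", "info")
    os.makedirs(info_dir, exist_ok=True)
    path = os.path.join(info_dir, "sparse-checkout")
    with open(path, "w", encoding="utf-8") as f:
        # One pattern per line, e.g. "projects/foo/" for a single subtree.
        for pattern in patterns:
            f.write(pattern + "\n")
    return path
```

After a call such as `enable_sparse_checkout(".", ["projects/foo/"])` (the pattern is a made-up example), running `git read-tree -mu HEAD` is the traditional way to have git apply the patterns to the working tree.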