perf: faster blobless git clones #290

Merged
merged 1 commit into from
Apr 4, 2022

Contributor

@yajo yajo commented Feb 11, 2022

Cloning a repo with git can be really slow if the repo is big.

Adding `--filter=blob:none` to the `git clone` command makes it much faster and slimmer, because blobs are downloaded lazily on checkout, and only commit and tree metadata is fetched for whatever is not checked out.

See "Blobless Clones" in https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/ for a good explanation.
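
As a rough illustration of the change described above, here is a minimal sketch of how a clone command line could gain the filter. The helper name `blobless_clone_args` and its parameters are hypothetical and are not taken from the poetry-core code base; only `git clone` and `--filter=blob:none` come from this PR.

```python
from __future__ import annotations


def blobless_clone_args(url: str, target: str, blobless: bool = True) -> list[str]:
    """Build an argument list for `git clone` (hypothetical helper).

    With blobless=True the clone fetches commits and trees up front and
    leaves file contents (blobs) to be downloaded lazily on checkout.
    """
    args = ["git", "clone"]
    if blobless:
        args.append("--filter=blob:none")
    # "--" keeps a URL or path that starts with "-" from being read as an option.
    args += ["--", url, target]
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch needs no network.
    print(" ".join(blobless_clone_args("https://example.com/repo.git", "repo")))
```

The resulting list can be handed to `subprocess.run(args, check=True)` when an actual clone is wanted.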

Also interesting to see that pip 21.3 itself landed this enhancement: pypa/pip#9086.

@moduon MT-83

@radoering
Member

I wonder if even a shallow fetch with depth 1 would be sufficient? This could result in an even bigger performance boost. There is a follow-up post with some performance measurements: https://github.blog/2020-12-22-git-clone-a-data-driven-study-on-cloning-behaviors/

In my understanding (which may be wrong), poetry always does a clone, followed by a checkout of a specific reference. checkout is never used several times with different references on an existing repo, is it?

Of course, using shallow fetch would be a more significant change compared to blobless clone. clone and checkout could not be considered separately anymore. The (simplified) sequence of git commands should probably look like this (in order to maximize the performance benefit):

```
git init
git remote add origin <repo-url>
git fetch --depth 1 origin <ref>
git checkout FETCH_HEAD
```

Shallow fetch should probably not replace the existing clone and checkout methods of the Git class anyway. It would rather be an additional option that poetry could use, while other downstream projects might not. Thus, it could be considered in a separate PR.
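
To make the four-command sequence above more concrete, here is a minimal sketch of how it could be driven from Python. The names `shallow_fetch_commands` and `shallow_checkout` are hypothetical and are not part of the existing Git class. Note that fetching a raw commit hash with `--depth 1` may depend on how the server is configured, so treat this as an assumption-laden sketch rather than a drop-in replacement.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def shallow_fetch_commands(url: str, ref: str) -> list[list[str]]:
    """Return the git commands for a depth-1 fetch of a single ref."""
    return [
        ["git", "init"],
        ["git", "remote", "add", "origin", url],
        ["git", "fetch", "--depth", "1", "origin", ref],
        ["git", "checkout", "FETCH_HEAD"],
    ]


def shallow_checkout(url: str, ref: str, target: Path) -> None:
    """Run the commands inside `target`, raising on the first failure."""
    target.mkdir(parents=True, exist_ok=True)
    for command in shallow_fetch_commands(url, ref):
        subprocess.run(command, cwd=target, check=True)


if __name__ == "__main__":
    # Only print the commands here; call shallow_checkout() to execute them.
    for command in shallow_fetch_commands("https://example.com/repo.git", "main"):
        print(" ".join(command))
```

Because clone and checkout collapse into one step here, such a helper would sit next to the existing methods instead of replacing them.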

@yajo
Contributor Author

yajo commented Feb 14, 2022

It could work. And it could be better. However, that would be more drastic. For example, if poetry kept a real cache, it would be more useful with a blobless clone than with a shallow clone.

This enhancement is good and makes practically no difference to current behavior. Actually, if all tests go green, that is enough proof that this won't change anything deeply important.

I think that can be considered as a separate future enhancement, but I don't see that as a blocker for this enhancement.

OTOH, neither of these fixes the problem explained in python-poetry/poetry#5188 (quadruplicated clones). Again, not a blocker for this enhancement IMHO.

@radoering
Member

In the linked pip PR, the git version is checked. Maybe, we should also check the version.
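
For illustration, a version check of this kind could look roughly like the sketch below. It assumes, as the linked pip change does, that partial clone needs git 2.17 or newer. All function names here are hypothetical, and this is not the code that was merged in this PR.

```python
from __future__ import annotations

import re
import subprocess

# Assumption: partial clone (--filter) is available from git 2.17 onwards.
MIN_PARTIAL_CLONE_VERSION = (2, 17)


def parse_git_version(output: str) -> tuple[int, int]:
    """Extract (major, minor) from `git --version` output, or (0, 0) if unknown."""
    match = re.search(r"git version (\d+)\.(\d+)", output)
    if match is None:
        return (0, 0)
    return (int(match.group(1)), int(match.group(2)))


def supports_partial_clone(version_output: str) -> bool:
    """Tell whether the reported git version is new enough for --filter."""
    return parse_git_version(version_output) >= MIN_PARTIAL_CLONE_VERSION


def installed_git_version_output() -> str:
    """Ask the local git binary for its version string."""
    result = subprocess.run(
        ["git", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    print(supports_partial_clone(installed_git_version_output()))
```

Falling back to `(0, 0)` on unparsable output means an unknown git silently gets a plain full clone, which seems the safer default.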

@yajo
Contributor Author

yajo commented Feb 14, 2022

What is the lowest version supported here for git?

@radoering
Member

Probably, this is not specified anywhere.

@yajo
Contributor Author

yajo commented Feb 17, 2022

Version checked.

@sonarcloud

sonarcloud bot commented Feb 17, 2022

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No coverage information
0.0% Duplication

neersighted previously approved these changes Feb 28, 2022
@radoering
Member

@yajo Can you please rebase your branch in order to allow merging? Probably, you have to adapt the type hints.

@yajo
Contributor Author

yajo commented Apr 4, 2022

I think it should be good now.

@sonarcloud

sonarcloud bot commented Apr 4, 2022

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No coverage information
0.6% Duplication

@radoering radoering merged commit f405311 into python-poetry:master Apr 4, 2022
@yajo yajo deleted the git-clone-blobless branch April 5, 2022 14:06
@finswimmer finswimmer mentioned this pull request May 20, 2022
bostonrwalker pushed a commit to bostonrwalker/poetry-core that referenced this pull request Aug 29, 2022
DavidVujic pushed a commit to DavidVujic/poetry-core that referenced this pull request Aug 31, 2022
3 participants