git checkout could/should be done with --depth=1 #2432

anguslees · 2015-02-20T06:25:06Z

pip can install directly from git, great!

When pip installs from git, it downloads the entire history of the git branch - when all it wants is the target version. I believe this can be achieved using the --depth=1 flag to various git subcommands. The speedup on projects with a long history can be significant.

cancan101 · 2015-02-23T21:43:54Z

See: #344

anguslees · 2015-02-23T22:58:29Z

@cancan101: thanks for the pointer, I assumed it had to be already discussed somewhere.

That PR was withdrawn, with what looks like some confusion as to why - it seems an obvious improvement to that PR would have been to use --depth=1 more selectively, or with some other variation of git command line.

In most uses of pip's git backend, we don't need the full git history - only the target head/branch/tag. This change adds --depth=1 to git fetch and clone commands.

NB: "Shallow" checkouts are not supported from arbitrary git commits, so this change removes the ability to install from git SHA1 commit IDs. Tags and branches continue to be supported.

Fixes pypa#2432

expect_stderr was added to a few tests due to the following messages:

- *warning: --depth is ignored in local clones; use file:// instead.*
- *You are in 'detached HEAD' state...* (This is to be expected when the revision shallow-cloned is a tag. As the git-clone manpage says "--branch can also take tags and detaches the HEAD at that commit in the resulting repository.")
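
The change described above can be sketched roughly as follows. This is a minimal illustration, not pip's actual code: the helper name `build_clone_command` and its parameters are hypothetical, and only the `git clone` options `--quiet`, `--depth` and `--branch` are real git flags.

```python
from typing import List, Optional


def build_clone_command(url: str, dest: str, rev: Optional[str] = None,
                        shallow: bool = True) -> List[str]:
    """Build (but do not run) a git clone command line.

    Hypothetical helper for illustration; not part of pip.
    """
    cmd = ["git", "clone", "--quiet"]
    if shallow:
        # Fetch only the most recent commit of the requested ref.
        cmd.append("--depth=1")
    if rev is not None:
        # --branch accepts branch names and tags, but not arbitrary SHA1s,
        # which is why a shallow clone cannot target a bare commit ID.
        cmd += ["--branch", rev]
    cmd += [url, dest]
    return cmd


print(build_clone_command("https://example.com/myproject.git",
                          "build/myproject", rev="v1.0"))
```

Building the argument list separately from running it keeps the shallow/full decision easy to test without touching the network.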

dennybaa · 2016-01-22T15:40:13Z

yeah seriously it's not fun... +1 to shallow clone!
Any reason pip grabs the whole git tree!?

RonnyPfannschmidt · 2016-01-22T16:24:32Z

This has potential to break setuptools-scm

dennybaa · 2016-01-23T07:56:03Z

Potentially :) So why not give it a shot and test it with setuptools-scm... pip is already at version 8, but such an obvious optimization hasn't been included yet...

drocco007 · 2016-02-10T14:15:24Z

+1

Even just having an option or setting to do this would be a huge help.

RonnyPfannschmidt · 2016-02-10T15:15:36Z

@dennybaa a change such as this pretty much completely breaks git describe, thus breaking setuptools_scm for certain, which means such a change would break git installs for pretty much all packages that use scm metadata on their own or via pbr/setuptools_scm

ryneeverett · 2016-02-10T22:17:12Z

@RonnyPfannschmidt Couldn't we just add a git fetch to setuptools_scm?

RonnyPfannschmidt · 2016-02-10T22:30:13Z

there is a very intense difference between asking for scm metadata that is local and orchestrating network operations

my choice of words intentionally makes this issue seem larger

ryneeverett · 2016-02-10T22:46:35Z

@RonnyPfannschmidt Sure, but that's something that only needs to happen once, so more like:

if [ "$(git rev-list --count HEAD)" -eq 1 ]; then
    # only one commit reachable: assume a shallow clone and deepen it
    git fetch --unshallow
fi

My assumption is that setuptools_scm is called when the package using it is installed. (Is this incorrect?) So if you're using setuptools_scm the behavior would be the same as deep cloning it to begin with; the fetch would occur when network activity is already required.
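
The check-then-deepen idea above could look something like this in Python. It is a sketch under assumptions: `ensure_full_history` and the injectable `run` callable are hypothetical names, and this is not how setuptools_scm or pip actually behave. It relies on `git rev-parse --is-shallow-repository` (available in git 2.15 and later) and `git fetch --unshallow`.

```python
import subprocess
from typing import Callable, List


def run_git(args: List[str], cwd: str) -> str:
    """Run a git command in cwd and return its stripped stdout."""
    result = subprocess.run(["git"] + args, cwd=cwd, check=True,
                            stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout.strip()


def ensure_full_history(repo_dir: str,
                        run: Callable[[List[str], str], str] = run_git) -> bool:
    """Deepen repo_dir if it is a shallow clone.

    Returns True if a network fetch was triggered, False otherwise.
    """
    is_shallow = run(["rev-parse", "--is-shallow-repository"], repo_dir)
    if is_shallow != "true":
        return False  # full history already present, nothing to do
    # One-time network operation: convert the shallow clone to a full one.
    run(["fetch", "--unshallow"], repo_dir)
    return True
```

Asking git directly whether the repository is shallow avoids the false positive that a commit count of 1 would give for a genuinely single-commit project.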

RonnyPfannschmidt · 2016-02-11T07:14:38Z

From my POV an opt-in for shallow clone is just as hard and lets us avoid getting into causing network operations in setuptools-scm

mrmachine · 2016-08-16T01:09:28Z

@RonnyPfannschmidt if the specific version being shallow cloned was tagged, would setuptools_scm still work? So it would only break if you git install an untagged commit?

dusenberrymw · 2016-09-13T19:16:08Z

This would definitely be useful as an opt-in flag, such as depth=1.

ryneeverett · 2016-09-13T19:38:53Z

I'd much rather see an opt-out flag. The use cases for opting out are narrow and automated, so why should humans have to pass a flag for optimal behaviour?

RonnyPfannschmidt · 2016-09-13T20:20:51Z

it's completely broken behaviour for a growing number of my projects ^^

in future even ones like pytest

ryneeverett · 2016-09-13T22:36:26Z

@RonnyPfannschmidt Why would pytest access the git history of a dependency?

RonnyPfannschmidt · 2016-09-14T08:45:40Z

There is a plan to use setuptools-scm for release automation

ryneeverett · 2016-09-14T14:07:49Z

I'm all aboard on supporting setuptools-scm. It seems like the ideal solution would be to clone just the commit history but git doesn't seem to have a way to do that yet. In the meantime, what if pip just did a deep clone if use_scm_version == True?

RonnyPfannschmidt · 2016-09-14T14:30:28Z

there is no acceptable clean way to do that atm, and the peps that enable it would need time to grow before pip can implement them

ryneeverett · 2016-09-14T14:34:28Z

It might be faster to get a patch merged and released in git than wait for the peps.

RonnyPfannschmidt · 2016-09-14T14:37:23Z

feel free to try then

ryneeverett · 2016-09-14T18:18:30Z

To sum up the maintainers' position:

We cannot change implementation details on which setuptools-scm depends.
We cannot explicitly support setuptools-scm without a pep.
We would rather add a crippled* cli flag (which would have to be maintained indefinitely for backwards compatibility) than an if-statement (which would only trigger an optimization and could thus be removed at any time).

* Crippled because it would be in the hands of users to make sure that their dependencies don't use setuptools-scm when using the flag and one would have to track setuptools-scm dependencies in a separate requirements file.

dennybaa · 2016-09-14T18:21:16Z

true ^^

dstufft · 2017-04-01T02:44:10Z

> Why being so severely "conservative"? The first group is already "safe" everything works, but why totally ignoring another group, for who you might just make a cli switch?

> @dstufft Who does benefit from your approach?

The underlying question here is: what is the fundamental cost of adding an option? One of the foundational pieces of writing behind my viewpoint on this is Havoc Pennington's Choosing our Preferences, which speaks specifically about GUI applications, but its points extend to any kind of tool as well.

To go into some more detail though, every option/preference we add has a cost. It costs increased complexity, because every option adds another state that your software can be in, and the more of them you have the more total states your software can be (and this explodes combinatorially for each option you have). It is not unusual at all for software to fail only when multiple specific options are selected because the way the different options interplayed with each other wasn't fully obvious at first. This makes it harder to actually test the software fully because you can quickly get to a point where you have thousands of different possible ways to configure the software.
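
To make the combinatorial growth concrete, here is a small sketch that enumerates every on/off combination of a handful of boolean options. The flag names are placeholders chosen for illustration (they are not a claim about pip's real option set), and options that take values rather than just on/off would grow the space even faster.

```python
from itertools import product
from typing import Dict, Iterator, List


def all_configurations(flags: List[str]) -> Iterator[Dict[str, bool]]:
    """Yield every on/off assignment for the given boolean flags."""
    for values in product([False, True], repeat=len(flags)):
        yield dict(zip(flags, values))


# Placeholder flag names, for illustration only.
flags = ["--flag-a", "--flag-b", "--flag-c", "--flag-d"]
configs = list(all_configurations(flags))
print(len(configs))  # 2 ** 4 == 16 distinct states

# Each additional boolean flag doubles the number of states to test.
print(len(list(all_configurations(flags + ["--flag-e"]))))  # 32
```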

Different options (and features in general) also mean more code, and that additional code needs maintenance over time. Thus for each option, we're increasing the burden of maintaining pip (because each option adds additional code).

Another aspect of this is that, since this is something that end users would turn on and that individual projects would have no control over, this flag would essentially be a "break me" flag for a subset of projects that expect to be able to get a full git version history when they are in a git tree. Thus not only are we adding another option that we have to support and which adds to our combinatorial list of possible states, but now we're also forcing those downstream projects to deal with that fact (and no matter what, they will get people asking them to support that if pip supports it).

Beyond its direct effect on us, though, there is the cognitive overhead it places on end users. There is a concept called Hick's Law, which essentially states that the time it takes for a user to make a decision increases logarithmically with the number of options they have in making that decision. That means that for every option we add, we're making it fundamentally harder for someone to use pip, because when they do a pip install -h or look at our documentation, they are presented with more choices, which increases the cognitive burden of trying to make pip do something they want to do.
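
For reference, Hick's Law is commonly written as T = b * log2(n + 1), where n is the number of equally likely choices and b is an empirically fitted constant. The sketch below just evaluates that formula; the value of `b` is an arbitrary placeholder, not a measured figure for pip users.

```python
import math


def hick_decision_time(n_choices: int, b: float = 1.0) -> float:
    """Estimated decision time under Hick's Law: T = b * log2(n + 1).

    b is a placeholder constant here; in practice it is fitted from data.
    """
    if n_choices < 0:
        raise ValueError("n_choices must be non-negative")
    return b * math.log2(n_choices + 1)


for n in (1, 3, 7, 15):
    print(n, hick_decision_time(n))
```

The growth is slow but never stops: each added option still costs the user a little more time.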

If you do any research on some of the fundamentals around designing a good user experience, particularly around options/preferences, one of the largest common themes you'll find is that deciphering when to say Yes and when to say No to a feature is one of the single most important things you can do for the overall user experience of a piece of software... and failing to do that correctly can utterly destroy the usability of your software.

Now, with all of that said, you obviously can't have software with zero options (at least, most software is not small enough to have zero options), but trying to weigh the impact of adding an option is part of managing a project. In this case it is my belief that the downsides of adding the option in question outweigh the upsides, nor do I think the upsides outweigh the downsides of just always doing this. I'm not the only maintainer, so it is entirely possible that one of the other @pypa/pip-committers feels differently to me here (and if they do, they are free to reopen this issue!), but unless someone comes up with upsides to this that I have missed (thereby increasing the amount of good it would do) or they come up with a plan (as @pfmoore suggested) to decrease or remove the downsides, I personally am -1 on this.

dennybaa · 2017-04-01T15:12:07Z

@dstufft What is said above sounds very theoretical, though it might be true.

> Another aspect of this is that, since this is something that end users would turn on and that individual projects would have no control over, this flag would essentially be a "break me" flag for a subset of projects that expect to be able to get a full git version history when they are in a git tree. Thus not only are we adding another option that we have to support and which adds to our combinatorial list of possible states, but now we're also forcing those downstream projects to deal with that fact (and no matter what, they will get people asking them to support that if pip supports it).

Don't use --depth=x if you expect a full git history. If you do, don't cry: you broke your project YOURSELF, and that's not the fault of the pip maintainers or anybody else. Or don't use projects when you don't know how they work...

IMO, as pip maintainers:

1. you can never have a 100% guarantee against dumb users.
2. you can never fully cover all compatibility aspects in tests.
3. you should not try to protect the fate of downstream projects (especially those which are "non standard conforming"); let these projects live and evolve themselves.

Besides, if things don't break they don't evolve, and that might be even worse for the pip project than what you describe. But anyway, we all have our opinions. This is just my personal one, and I cannot insist on anything; I'm just sharing it with you guys.

ryneeverett · 2017-04-01T18:01:35Z

@dennybaa I want this solved as much as anyone. I even wrote a patch implementing the --depth flag. I am now opposed to the flag.

I think you'd be surprised at how many packages a flag would break, and I disagree that we can only progress by breaking things. There have been several proposed solutions in this thread that do not involve fragmenting the python packaging ecosystem. I suggest you pursue one of them instead.

> you can never have a 100% guarantee against dumb users.

Are you really a dumb user if you don't realize that one of your indirect dependencies uses one of the popular auto-versioning tools?

pfmoore · 2017-04-01T19:06:55Z

@dennybaa Let's just say that the pip maintainers don't agree with how you suggest we treat users confused by an option we add to pip. So we try to avoid adding options that will make using pip a frustrating and confusing experience for our users, no matter what level of experience they have (you may describe them as "dumb", I don't take that view).

dennybaa · 2017-04-01T22:19:52Z

Maybe there's misunderstanding. We've got different opinions.

@ryneeverett, @pfmoore I didn't call anyone who uses projects providing auto-versioning to the python packaging ecosystem "dumb". That was just a list of obvious notions which are solely my opinion. Anyway, my bad if it read like that...

> Are you really a dumb user if you don't realize that one of your indirect dependencies uses one of the popular auto-versioning tools?

I was saying that such projects SHOULD document their behavioral specificity rather than pip taking this burden for all. That's why there was number 3) which I mentioned.

pradyunsg · 2017-12-28T11:02:15Z

My thought was that as the frontend, pip's job soon will be to just fetch the right requirements and call the backend (to put it dumbly). This would sort of make it more efficient at one specific case.

At the end of the day, this is really more of a nice-to-have thing. I find this to be a good enhancement but also understand that this is a potentially breaking change. I agree with what @pfmoore suggests in #2432 (comment) though, so I'm gonna let this sit. Even I don't like the idea of a flag for it.

ghost · 2017-12-28T17:59:02Z

How about rolling this into PEP 518? pip could fetch a shallow clone by default and then deepen it (before calling the build system) if pyproject.toml contained a key that said requires_deep or if there was no pyproject.toml.
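
A rough sketch of that decision logic, assuming the proposed `requires_deep` key lives in the `[build-system]` table. Both the key and its location are hypothetical: PEP 518 defines no such key, and `should_deepen` is an illustrative name. The function takes an already-parsed pyproject.toml as a dict, or None when the file is absent.

```python
from typing import Any, Dict, Optional


def should_deepen(pyproject: Optional[Dict[str, Any]]) -> bool:
    """Decide whether a shallow clone must be deepened before building.

    `requires_deep` is the hypothetical key proposed in this thread;
    it is not defined by PEP 518.
    """
    if pyproject is None:
        # No pyproject.toml: be conservative and fetch the full history.
        return True
    build_system = pyproject.get("build-system", {})
    return bool(build_system.get("requires_deep", False))


print(should_deepen(None))
print(should_deepen({"build-system": {"requires": ["setuptools"]}}))
print(should_deepen({"build-system": {"requires_deep": True}}))
```

Under this scheme, projects that rely on git history opt in themselves, so end users never have to know which of their dependencies need it.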

pradyunsg · 2017-12-28T18:03:02Z

I mean, it sort of makes sense to me but I am wary of like what happened with PEP 426 - we don't want to try and do everything as if we have infinite resources.

ghost · 2017-12-28T18:26:10Z

We have a chance to correct build backend behavior in a backwards-compatible way. We should take it.

It's clear from this issue that most people (in this issue) want the default to be shallow clones. The reasons for not implementing this are:

1. Some participants don't think that in principle pip should do a shallow clone.
2. There is an excuse given that pip doesn't have the resources to implement it.

The second point doesn't seem accurate; from experience the problem is not resources but getting reviewers to accept your changes. It's also a non-sequitur because the issue was closed.

The first point is an intellectually-honest position. However, the practical implications of setuptools_scm invoking network activity are that it could fail, but why this is more likely than pip failing is unclear. Also there is the apparent position that it shouldn't be done, which is a more ideological position. Attempting to deepen the shallow clone if that's needed has no practical drawbacks, at least to me.

I think the process forward here should not be to attempt to get this added to a PEP. I think the process forward should be to survey the broader community about whether this should be the default and then if most agree, simply require it for projects with pyproject.toml. Then we can think about allowing projects to opt out of this and fetch the full history if they want. If I'm wrong and the survey shows differently, then we can simply point to it when people ask about this further.

tzickel · 2018-02-07T19:37:48Z

How about adding another scheme which will be explicit about intentions?
pip install git_shallow+https://example.com/myproject.git

blueyed · 2018-02-07T20:34:02Z

@tzickel
Nice idea, but maybe have it as an opt then?
pip install git+https://example.com/myproject.git?shallow=1 - not sure how well this is supported across tooling and if it makes sense altogether. Instead of using "query" it could also be in the hash maybe: pip install git+https://example.com/myproject.git#egg=foo,shallow=1.
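
To illustrate the query-string variant, here is a sketch of how a hypothetical `shallow` parameter could be pulled out of a `git+https` requirement URL using only the standard library. Neither the parameter nor the `parse_vcs_url` helper exists in pip; they are assumptions for the sake of the example.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlsplit, urlunsplit


def parse_vcs_url(url: str) -> Tuple[str, bool]:
    """Split a 'git+<transport>://...' URL into (clone_url, shallow).

    The `shallow` query parameter is hypothetical, taken from the
    suggestion in this thread.
    """
    parts = urlsplit(url)
    vcs, _, transport = parts.scheme.partition("+")
    if vcs != "git" or not transport:
        raise ValueError("expected a git+<transport>:// URL")
    query = parse_qs(parts.query)
    shallow = query.get("shallow", ["0"])[0] == "1"
    # Rebuild the plain URL that would be handed to git, dropping the
    # pip-specific query string and fragment.
    clone_url = urlunsplit((transport, parts.netloc, parts.path, "", ""))
    return clone_url, shallow


print(parse_vcs_url("git+https://example.com/myproject.git?shallow=1"))
```

A URL without the parameter keeps today's behaviour, so the option stays explicit and per-requirement rather than global.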

ohadperry · 2018-03-07T08:33:19Z

+1 on this. Would be greatly appreciated.

KOLANICH · 2018-06-27T09:32:52Z

Just an opinion:

Version restriction is not needed. Usually projects have a policy that master (or another default branch) must contain a working version, so there is no need to use any version except master. If a project breaks when a dependency's version goes up, that is a bug in the project, not in the dependency, and it must be fixed. Every project must be compatible with the latest versions of its dependencies. If the bug was in the dependency's master and that bug is intolerable, it is still a bug in the project, because it means the dependency is badly maintained (i.e. bad test coverage or no policy of a workable master) and it was an error to rely on that dependency's repo. In that case the dependency's repo should be manually cloned into another repo (this may be outsourced to companies that make a business of auditing dependencies), manually checked, and this own repo used as the source.

naught101 · 2018-06-28T01:32:25Z

> Every project must be compatible with the latest versions of its dependencies.

That's ridiculous. Major version changes often include non-backward-compatible API changes. There is always going to be a lag time between a project's dependencies being updated and the project updating to use the new version.

And it is not true that master must contain a working version; for example, Drupal doesn't even HAVE a master branch.

KOLANICH · 2018-06-28T06:46:18Z

> Major version changes often include non-backward-compatible API changes. There is always going to be a lag time between a project's dependencies being updated and the project updating to use the new version.

Incompatible changes are usually announced beforehand and developed in separate branches before merging them into the main branch. And usually it's not that hard to port the software to a new version.

> for example Drupal doesn't even HAVE a master branch.

Anyway, there exists a branch with the code considered to be bleeding edge.

naught101 · 2018-06-29T00:49:35Z

> Anyway, there exists a branch with the code considered to be bleeding edge.

Yes, there does. And you're not supposed to use it for production.

> And usually it's not that hard to port the software to a new version.

Sure. I guess that's why there are still popular python 2 libraries that aren't completely converted to python 3, 8 years after it came out (like twisted).

lock · 2019-06-02T10:03:54Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

anguslees mentioned this issue Feb 24, 2015

Use --depth=1 when checking out git repositories #2439

Closed

xavfernandez mentioned this issue Nov 17, 2015

Use shallow clones for installs from git #3251

Closed

sharmaeklavya2 added a commit to zulip/truncated-django that referenced this issue Apr 28, 2016

Squash all upstream commits.

8c87939

Workaround for pypa/pip#2432.

xavfernandez mentioned this issue Apr 7, 2017

Shallow clone from git. Resolve #2432. #3739

Closed

tumido mentioned this issue Apr 18, 2017

Add support to list future subscription pools candlepin/python-rhsm#200

Closed

yajo mentioned this issue Oct 24, 2017

[RFC] Future of Runbot OCA/runbot-addons#144

Closed

pztrick mentioned this issue Nov 28, 2017

pip unexpectedly not installing latest version of git package with branch/commit pinning #2837

Closed

pradyunsg mentioned this issue Dec 28, 2017

Shallow git clones #4938

Closed

pradyunsg mentioned this issue Jan 25, 2018

Pip does unnecessary git clone for github dependencies #4996

Closed

pradyunsg mentioned this issue Mar 7, 2018

install from ssh private bitbucket takes 10x than installing from source #5058

Closed

benoit-pierre mentioned this issue Jul 31, 2018

easy_install: use --depth 1 with git clone when downloading requirements pypa/setuptools#1446

Closed

ubunatic mentioned this issue Jul 31, 2018

use --depth 1 for git clone + use separate git fetch to download rev pypa/setuptools#1447

Closed

KOLANICH mentioned this issue Nov 19, 2018

Artifacts jpype-project/jpype#383

Merged

cjerdonek mentioned this issue Jan 6, 2019

Improve performance of the Bazaar VCS backend #5445

Closed

lock bot added the auto-locked label (Outdated issues that have been locked by automation) Jun 2, 2019

lock bot locked as resolved and limited conversation to collaborators Jun 2, 2019
