Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

git checkout could/should be done with --depth=1 #2432

Closed
anguslees opened this issue Feb 20, 2015 · 55 comments
Closed

git checkout could/should be done with --depth=1 #2432

anguslees opened this issue Feb 20, 2015 · 55 comments
Labels
auto-locked Outdated issues that have been locked by automation

Comments

@anguslees
Copy link

pip can install directly from git, great!

When pip installs from git, it downloads the entire history of the git branch - when all it wants is the target version. I believe this can be achieved using the --depth=1 flag to various git subcommands. The speedup on projects with a long history can be significant.

@cancan101
Copy link

See: #344

@anguslees
Copy link
Author

@cancan101: thanks for the pointer, I assumed it had to be already discussed somewhere.

That PR was withdrawn, with what looks like some confusion as to why - it seems an obvious improvement to that PR would have been to use --depth=1 more selectively, or with some other variation of git command line.

anguslees added a commit to anguslees/pip that referenced this issue Mar 30, 2015
In most uses of pip's git backend, we don't need the full git history -
only the target head/branch/tag.  This change adds --depth=1 to git
fetch and clone commands.

NB: "Shallow" checkouts are not supported from arbitrary git commits, so
this change removes the ability to install from git SHA1 commit IDs.
Tags and branches continue to be supported.

Fixes pypa#2432
ryneeverett added a commit to ryneeverett/pip that referenced this issue Aug 22, 2015
expect_stderr was added to a few tests due to the following messages:

- *warning: --depth is ignored in local clones; use file:// instead.*
- *You are in 'detached HEAD' state...* (This is to be expected when the revision shallow-cloned is a tag. As the git-clone manpage says "--branch can also take tags and detaches the HEAD at that commit in the resulting repository.")
@dennybaa
Copy link

yeah seriously it's not fun... +1 to shallow clone!
Any reason pip grabs the whole git tree!?

@RonnyPfannschmidt
Copy link
Contributor

This has potential to break setuptools-scm

@dennybaa
Copy link

potentially) so why not give it a shot, test it with setuptools-scm... pip is already 8, but such an obvious optimization hasn't been included yet...

@drocco007
Copy link

+1

Even just having an option or setting to do this would be a huge help.

@RonnyPfannschmidt
Copy link
Contributor

@dennybaa a change such as this pretty much completely breaks git describe thus breaking setuptools_scm for certain, which means sucha change would break git for pretty much all packages that use scm metadata on their own or via pbr/setuptools_scm

@ryneeverett
Copy link
Contributor

@RonnyPfannschmidt Couldn't we just add a git fetch to setuptools_scm?

@RonnyPfannschmidt
Copy link
Contributor

there is a very intense difference between asking for scm metadata that is local and orchestrating network operations

my choice of words intentionally makes this issue seem larger

@ryneeverett
Copy link
Contributor

@RonnyPfannschmidt Sure, but that's something that only needs to happen once, so more like:

if $(rev-list HEAD --count) == 1:
    git fetch

My assumption is that setuptools_scm is called when the package using it is installed. (Is this incorrect?) So if you're using setuptools_scm the behavior would be the same as deep cloning it to begin with; the fetch would occur when network activity is already required.

@RonnyPfannschmidt
Copy link
Contributor

From my POV a opt in for shallow clone is just as hard and let's us avoid getting into causing network operations in setuptools-scm

sharmaeklavya2 added a commit to zulip/truncated-django that referenced this issue Apr 28, 2016
@mrmachine
Copy link

@RonnyPfannschmidt if the specific version being shallow cloned was tagged, would setuptools_scm still work? So it would only break if you git install an untagged commit?

@dusenberrymw
Copy link

This would definitely be useful as an opt-in flag, such as depth=1.

@ryneeverett
Copy link
Contributor

I'd much rather see an opt-out flag. The use cases for opting out are narrow and automated, so why should humans have to pass a flag for optimal behaviour?

@RonnyPfannschmidt
Copy link
Contributor

its completely broken behaviour for a growing number of my projects ^^

in future even ones like pytest

@ryneeverett
Copy link
Contributor

@RonnyPfannschmidt Why would pytest access the git history of a dependency?

@RonnyPfannschmidt
Copy link
Contributor

There is a plan to use setuptools-scm for release automation

@ryneeverett
Copy link
Contributor

I'm all aboard on supporting setuptools-scm. It seems like the ideal solution would be to clone just the commit history but git doesn't seem to have a way to do that yet. In the meantime, what if pip just did a deep clone if use_scm_version == True?

@RonnyPfannschmidt
Copy link
Contributor

there is no acceptable clean way to do that atm, and the peps that enable it would need time to grow before pip an implement them

@ryneeverett
Copy link
Contributor

It might be faster to get a patch merged and released in git than wait for the peps.

@RonnyPfannschmidt
Copy link
Contributor

feel free to try then

@ryneeverett
Copy link
Contributor

To sum up the maintainers' position:

  • We cannot change implementation details on which setuptools-scm depends.
  • We cannot explicitly support setuptools-scm without a pep.
  • We would rather add a crippled* cli flag (which would have to be maintained indefinitely for backwards compatibility) than an if-statement (which would only trigger an optimization and could thus be removed at any time).

* Crippled because it would be in the hands of users to make sure that their dependencies don't use setuptools-scm when using the flag and one would have to track setuptools-scm dependencies in a separate requirements file.

@dennybaa
Copy link

true ^^

@dstufft
Copy link
Member

dstufft commented Apr 1, 2017

Why being so severely "conservative"? The first group is already "safe" everything works, but why totally ignoring another group, for who you might just make a cli switch?

@dstufft Who does benefit from your approach?

The underlying question here is, what is the fundamental cost of adding an option? One of the foundational pieces of writing for my view point on this comes from Havoc Pennington's Choosing our Preferences which is speaking specifically about GUI applications but they can extend to any kind of tool as well.

To go into some more detail though, every option/preference we add has a cost. It costs increased complexity, because every option adds another state that your software can be in, and the more of them you have the more total states your software can be (and this explodes combinatorially for each option you have). It is not unusual at all for software to fail only when multiple specific options are selected because the way the different options interplayed with each other wasn't fully obvious at first. This makes it harder to actually test the software fully because you can quickly get to a point where you have thousands of different possible ways to configure the software.

Different options (and features in general) also mean more code, and that additional code needs maintenance over time. Thus for each option, we're increasing the burden of maintaining pip (because each option adds additional code).

Another aspect of this is, is since this is something that end users would turn on that individual projects would have no control over, is that this flag would essentially be a "break me" flag for a subset of projects that expect to be able to get a full git version history when they are in a git tree. Thus not only are we adding another option that we have to support and which adds to our combinatorial list of possible states, but now we're also forcing those downstream projects to also deal with that fact (and no matter what, they will get people asking them to support that if pip supports it).

Beyond it's direct effect on us though, is the cognitive overhead it places on end users. There is a concept called Hick's Law, which essentially states that the time it takes for a user to make a decision increases logarithmically for every possible option they have in making that decision. That means that for every option we add, we're making it fundamentally harder for someone to use pip, because when they do a pip install -h or look at our documentation, they are presented with more choices, which increases the cognitive burden of trying to make pip do something they want to do.

If you do any research on some of the fundamentals around designing a good user experience, particularly around options/preferences, one of the largest common themes you'll find is that deciphering when to say Yes and when to say No to a feature is one of the single most important things you can do for the over all user experience of a piece of software... and failing to do that correctly can utterly destroy the usability of your software.

Now, with all of that you obviously can't have software with zero options (at least, most software is not the kind of software that is small enough to have zero options) but trying to weigh the impact of adding an option is part of managing a project. In this case it is my belief that the downsides of adding the option in question outweigh the upsides nor do I think the upsides outweigh the upsides of just always doing this. I'm not the only maintainer, so it is entirely possible that one of the other @pypa/pip-committers feels differently to me here (and if they do, they are free to reopen this issue!) but unless someone comes up upsides to this that I have missed (thereby increasing the amount of good it would do) or they come up with a plan (as @pfmoore) to decrease or remove the downsides, I personally am -1 on this.

@dennybaa
Copy link

dennybaa commented Apr 1, 2017

@dstufft The said above sounds very theoretical, but though might be true.

Another aspect of this is, is since this is something that end users would turn on that individual projects would have no control over, is that this flag would essentially be a "break me" flag for a subset of projects that expect to be able to get a full git version history when they are in a git tree. Thus not only are we adding another option that we have to support and which adds to our combinatorial list of possible states, but now we're also forcing those downstream projects to also deal with that fact (and no matter what, they will get people asking them to support that if pip supports it).

Don't use --depth=x if you expect a full git history. If you do don't cry, you broke your project YOURSELF that's not the fault of pip maintainers or anybody else. Or don't use projects which you don't know how they work...

IMO, as pip maintainers:

  1. you can never have 100% guaranty of dumb users.
  2. you can never fully cover all compatibility aspects in tests.
  3. you should not try to protect the fate of downstream projects (especially those which are "non standard conforming"), let these projects live and evolve themselves .

Besides if things don't break they don't evolve, that's might be even worse for the pip project than that what you describe. But anyways, we all have our opinions. It's just my personal, and I can not insist on anything, just shared to you guys.

@ryneeverett
Copy link
Contributor

@dennybaa I want this solved as much as anyone. I even wrote a patch implementing the --depth flag. I am now opposed to the flag.

I think you'd be surprised at how many packages a flag would break, and I disagree that we can only progress by breaking things. There have been several proposed solutions in this thread that do not involve fragmenting the python packaging ecosystem. I suggest you pursue one of them instead.

you can never have 100% guaranty of dumb users.

Are you really a dumb user if you don't realize that one of your indirect dependencies uses one of the popular auto-versioning tools?

@pfmoore
Copy link
Member

pfmoore commented Apr 1, 2017

@dennybaa Let's just say that the pip maintainers don't agree with how you suggest we treat users confused by an option we add to pip. So we try to avoid adding options that will make using pip a frustrating and confusing experience for our users, no matter what level of experience they have (you may describe them as "dumb", I don't take that view).

@dennybaa
Copy link

dennybaa commented Apr 1, 2017

Maybe there's misunderstanding. We've got different opinions.

@ryneeverett , @pfmoore I din't call anyone who uses projects providing auto-versioning to python packaging ecosystem "dumb". There was just a list of obvious notions which are my sole opinion. Anyways my bad if you might read it like that...

Are you really a dumb user if you don't realize that one of your indirect dependencies uses one of the popular auto-versioning tools?

I was saying that such projects SHOULD document their behavioral specificity rather than pip taking this burden for all. That's why there was number 3) which I mentioned.

@pradyunsg
Copy link
Member

My thought was that as the frontend, pip's job soon will be to just fetch the right requirements and call the backend (to put it dumbly). This would sort of make it more efficient at one specific case.

At the end of the day, this is really more of nice-to-have thing. I find this to be good enhancement but also understand that this is a potentially breaking change. I agree with what @pfmoore suggests in #2432 (comment) though so, I'm gonna let this sit. Even I don't like the idea of a flag for it.

@ghost
Copy link

ghost commented Dec 28, 2017

How about rolling this into PEP 518? pip could fetch a shallow clone by default and then deepen it (before calling the build system) if pyproject.toml contained a key that said requires_deep or if there was no pyproject.toml.

@pradyunsg
Copy link
Member

I mean, it sort of makes sense to me but I am wary of like what happened with PEP 426 - we don't want to try and do everything as if we have infinite resources.

@ghost
Copy link

ghost commented Dec 28, 2017

We have a chance to correct build backend behavior in a backwards-compatible way. We should take it.

It's clear from this issue that most people (in this issue) want the default to be shallow clones. The reasons for not implementing this are:

  1. Some participants don't think that in principle pip should do a shallow clone.
  2. There is an excuse given that pip doesn't have the resources to implement it.

The second point doesn't seem accurate; from experience the problem is not resources but getting reviewers to accept your changes. It's also a non-sequitur because the issue was closed.

The first point is an intellectually-honest position. However, the practical implications of setuptools_scm invoking network activity are that it could fail, but why this is more likely than pip failing is unclear. Also there is the apparent position that it shouldn't be done, which is a more ideological position. Attempting to deepen the shallow clone if that's needed has no practical drawbacks, at least to me.

I think the process forward here should not be to attempt to get this added to a PEP. I think the process forward should be to survey the broader community about whether this should be the default and then if most agree, simply require it for projects with pyproject.toml. Then we can think about allowing projects to opt out of this and fetch the full history if they want. If I'm wrong and the survey shows differently, then we can simply point to it when people ask about this further.

@tzickel
Copy link

tzickel commented Feb 7, 2018

How about adding another scheme which will be explicit about intentions ?
pip install git_shallow+https://example.com/myproject.git

@blueyed
Copy link
Contributor

blueyed commented Feb 7, 2018

@tzickel
Nice idea, but maybe have it as an opt then?
pip install git+https://example.com/myproject.git?shallow=1 - not sure how well this is supported across tooling and if it makes sense altogether. Instead of using "query" it could also be in the hash maybe: pip install git+https://example.com/myproject.git#egg=foo,shallow=1.

@ohadperry
Copy link

+1 on this. would be greatly appriciated.

@KOLANICH
Copy link
Contributor

KOLANICH commented Jun 27, 2018

Just an opinion:

version restriction is not needed. Usually projects have a policy that master (or another default brancu) must contain working version. So there is no need to use any version except master. If a project is broken if its dependency version goes up, it's a bug in the project, not in the dependency, it must be fixed. Every project must be compatible with the latest versions of its dependencies. If the bug was in the dependency master and that bug is intolerable, it is still bug in the project, because this means that this dependency is maintained badly (i.e. bad test coverage or no policy of workable master) and it was an error to rely on that dependency repo, for this case dependency repo should be manually cloned into another repo (may be outsourced to the companies making business auditing dependencies), manually checked and this own repo used as a source.

@naught101
Copy link

Every project must be compatible with the latest versions of its dependencies.

That's ridiculous. Major version changes often include non-backward-compatible API changes. There is always going to be a lag time between a project's dependencies being update, and the project updating to use the new version.

And it is not true that master must contain a working version, for example Drupal doesn't even HAVE a master branch.

@KOLANICH
Copy link
Contributor

KOLANICH commented Jun 28, 2018

Major version changes often include non-backward-compatible API changes. There is always going to be a lag time between a project's dependencies being update, and the project updating to use the new version.

Incompatible changes are usually announced beforehand and developed in separate branches before merging them into the main branch. And usually it's not that hard to port the software to a new version.

for example Drupal doesn't even HAVE a master branch.

Anyway there exist a branch with the code considered to be bleeding edge.

@naught101
Copy link

Anyway there exist a branch with the code considered to be bleedeing edge.

Yes, there does. And you're not supposed to use it for production.

And usually it's not that hard to port the software to a new version.

Sure. I guess that's why there are still popular python 2 libraries that aren't completely converted to python 3, 8 years after it came out (like twisted)..

@lock
Copy link

lock bot commented Jun 2, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Jun 2, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jun 2, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation
Projects
None yet
Development

No branches or pull requests