PEP 440 Version and Specifiers #1894

Merged
merged 3 commits into from Dec 13, 2014

3 participants
@dstufft
Member

dstufft commented Jun 25, 2014

So basically this uses a slightly modified copy of pypa/packaging#1 to implement PEP 440 in pip. This is mostly a proof of concept for right now however it "works" in that I can install stuff successfully with it. It implements all of the specifiers from PEP 440 including ~=, ==X.*, and ===. These likely cannot be used inside of an install_requires because setuptools does not support them however they can be used in a requirements.txt file and on the command line. Additionally they could also be used inside of a Wheel file. It does give the new semantics towards < and > as well as the semantics around local versions.

Some notes for those not up to date on the latest PEP 440 draft:

  • All versions consider a "local" version (such as 1.0+debian3) to be semantically equivalent to whatever the public version is (1.0 in the example). So 1.0+debian3 is ==1.0.
  • < and > work smarter with regard to pre-releases, so <3 does not match 3.0.dev1.
  • Versions which are not PEP 440 compatible are excluded by default, however you can still depend on them by using the === operator which causes things to fall back to a simple case insensitive string comparison.
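The three notes above can be illustrated with a deliberately tiny sketch. This is not the packaging library's code: the regex, the helper names (parse, matches_eq, matches_lt, matches_arbitrary) and the simplified ordering are all assumptions made for illustration, and only release numbers, a single pre/dev marker, and a local segment are handled.

```python
import re

# Toy grammar (an assumption, far smaller than PEP 440):
# release, optional pre/dev marker, optional local segment.
_TOY = re.compile(
    r"^(\d+(?:\.\d+)*)((?:a|b|c|rc)\d+|\.dev\d+)?(?:\+([a-z0-9.]+))?$",
    re.IGNORECASE,
)


def parse(version):
    """Return (release, marker, local), or None if the toy grammar rejects it."""
    m = _TOY.match(version)
    if m is None:
        return None
    release = tuple(int(part) for part in m.group(1).split("."))
    while len(release) > 1 and release[-1] == 0:
        release = release[:-1]  # so that 3 and 3.0 compare equal
    return release, m.group(2), m.group(3)


def matches_eq(candidate, wanted):
    """== ignores the candidate's local segment unless the specifier names one."""
    c, w = parse(candidate), parse(wanted)
    if c is None or w is None:
        return False  # unparseable versions are excluded
    if w[2] is None:
        return c[:2] == w[:2]
    return c == w


def matches_lt(candidate, bound):
    """< refuses pre-releases of the bound itself (so 3.0.dev1 is not <3)."""
    c, b = parse(candidate), parse(bound)
    if c is None or b is None:
        return False
    if c[1] is not None and b[1] is None and c[0] == b[0]:
        return False
    return c[0] < b[0]  # toy ordering: release numbers only


def matches_arbitrary(candidate, wanted):
    """=== is a plain case-insensitive string comparison."""
    return candidate.lower() == wanted.lower()
```

With these helpers, matches_eq("1.0+debian3", "1.0") holds, matches_lt("3.0.dev1", "3") does not, and a non PEP 440 string such as 1.3-fork1 can only be selected through matches_arbitrary.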

There are probably lots of bugs and corner cases and the like (hence proof of concept) but I think it's pretty cool!

@dstufft

Member

dstufft commented Jun 25, 2014

Oh yea, and all the actual changes to pip are in dstufft@09f612c

@dstufft

Member

dstufft commented Jun 25, 2014

Oh one other thing about this. The packaging library (and pip's extensions of it) properly handles combining the same requirement from two different packages. This PR doesn't fix that longstanding issue in pip, but it could be leveraged to do so.

@qwcode

Contributor

qwcode commented Jun 26, 2014

neat!

handles combining the same requirement from two different packages

i.e. combining top-level "double requirements"? and not just doing "first found wins" for sub-requirements?

@qwcode

Contributor

qwcode commented Jun 26, 2014

does this indirectly handle #1505? or no?

one of drivers of that was versions like 1.3-fork1 (and not wanting them to be called pre-releases)

how would that get handled here?

@dstufft

Member

dstufft commented Jun 26, 2014

Yes on combining requirements. This doesn't handle it yet, but basically it'd just need code so that if you have two reqs you do req = Requirement("Django>1.4") & Requirement("Django>=1.6"). Basically the mechanics are there but this PR doesn't attempt to utilize them to make it happen.
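As a rough picture of what combining two requirements with & could look like, here is a self-contained sketch. ToyRequirement is a hypothetical stand-in written for this example, not the class from packaging or pip; it only collects specifier strings and does not evaluate them.

```python
import re


class ToyRequirement:
    """Hypothetical stand-in for the Requirement class mentioned above."""

    def __init__(self, text):
        # Split "Django>1.4" into a project name and the specifier text.
        m = re.match(r"\s*([A-Za-z0-9._-]+)\s*(.*)$", text)
        if m is None:
            raise ValueError(f"cannot parse requirement: {text!r}")
        self.name = m.group(1)
        self.specifiers = frozenset(
            part.strip() for part in m.group(2).split(",") if part.strip()
        )

    def __and__(self, other):
        # Combining means "both sets of constraints must hold".
        if self.name.lower() != other.name.lower():
            raise ValueError("cannot combine requirements for different projects")
        combined = ToyRequirement(self.name)
        combined.specifiers = self.specifiers | other.specifiers
        return combined

    def __str__(self):
        return self.name + ",".join(sorted(self.specifiers))


req = ToyRequirement("Django>1.4") & ToyRequirement("Django>=1.6")
```

The combined object carries both constraints, so a resolver could check candidates against the union instead of letting the first requirement found win.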

So the latest round of PEP 440 (which this implements) uses 1.3+fork1, which this properly handles. You can do pip install foo==1.3 and if it sees the fork it'll install it, or you can do pip install foo==1.3+fork1 or any other specifier. It won't be counted as a pre-release.

This does not give some global way to just blindly allow non PEP 440 versions, and it also does not have the --pre flag allow non PEP 440 versions. If you want to use a non PEP 440 version you have to use ===<non pep440 version>. Basically the === is an escape hatch that says "don't interpret the version, just do basic string equality on it".

We do not attempt to sort versions that we cannot parse with PEP 440, which is why you can only pin to a non PEP 440 version and you can't do anything else.

@dstufft

Member

dstufft commented Jun 26, 2014

Obviously there are still edge cases broken in the PR, hence it being a proof of concept.

@qwcode

Contributor

qwcode commented Jun 26, 2014

why you can only pin to a non PEP 440 version and you can't do anything else.

hmm, no attempt at sorting? this will break stuff where people want to sort pkg_resources-style fork versions (i.e. "1.3-fork1"). We have forks like this at my job.

Basically the === is an escape hatch

so, only an escape hatch for your own top-level requirements? #1505 was imagining a --allow-nonstandard-version flag which would theoretically cut across the whole dependency tree, and handle cases where install_requires has non-standard versioning. not saying it's the end of the world, but just trying to be clear what's possible with this hatch.

@dstufft

Member

dstufft commented Jun 26, 2014

To be specific, if you use 1.3+fork1 instead of 1.3-fork1 it'll sort just fine as that's a PEP 440 local version and it has a defined sort.

@qwcode

Contributor

qwcode commented Jun 26, 2014

yes, I got that, just thinking of legacy packaging.

@dstufft

Member

dstufft commented Jun 26, 2014

The === escape hatch can technically be used in a project's metadata too, however setuptools itself won't allow that at the moment.

Here's the compatibility numbers by the way:

Total Version Compatibility:              231807/239450 (96.81%)
Total Sorting Compatibility (Unfiltered): 43095/45505 (94.70%)
Total Sorting Compatibility (Filtered):   45481/45505 (99.95%)
Projects with No Compatible Versions:     802/45505 (1.76%)
Projects with Differing Latest Version:   1163/45505 (2.56%)

An option we could do is if we can't find any PEP 440 compatible versions is fall back to pkg_resources. Which would allow places that have all incompatible versions to still operate as they did today. I'm not a massive fan and I didn't care enough to try to do that for the PoC but it's a possibility.
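A minimal sketch of that fallback idea, under stated assumptions: the regex is a simplified approximation of a PEP 440 version, and candidate_versions and the "pep440"/"legacy" labels are names invented for this example. The legacy branch only signals that the caller would sort with pkg_resources-style rules; that sorting is not shown.

```python
import re

# Simplified approximation of a PEP 440 version (an assumption for this sketch).
_PEP440ISH = re.compile(
    r"^\d+(?:\.\d+)*(?:(?:a|b|c|rc)\d+)?(?:\.post\d+)?(?:\.dev\d+)?"
    r"(?:\+[a-z0-9]+(?:\.[a-z0-9]+)*)?$",
    re.IGNORECASE,
)


def candidate_versions(all_versions):
    """Prefer parseable versions; fall back to everything if none parse."""
    compatible = [v for v in all_versions if _PEP440ISH.match(v)]
    if compatible:
        return compatible, "pep440"
    # No compatible versions at all: behave as before and let the
    # caller apply legacy (pkg_resources-style) ordering.
    return list(all_versions), "legacy"
```

A project with at least one parseable version never reaches the legacy branch, so the fallback only affects projects that would otherwise have nothing installable.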

One of the important things here, I think, is that if we can't parse a version then we really don't know how it should sort. An extreme example: which is the latest version, bob or dog (both of which are allowed by pkg_resources)? Unfortunately pkg_resources used the - symbol for more than one thing and on PyPI it was used a lot for -dev, -alpha etc, so the PEP 440 rules normalize it to things like 1.0-dev -> 1.0.dev0.

It's possible that PEP 440 could be expanded to treat -<anything that's not a|b|c|rc|alpha|beta|dev> as a "local version" as if you had done +whatever. So in the example of 1.0-fork1 it'd normalize to 1.0+fork1 and work fine.

@dstufft

Member

dstufft commented Jun 26, 2014

Those numbers are from what's on PyPI btw, which of course is unlikely to have many forks.

@dstufft

Member

dstufft commented Jun 26, 2014

Paging @ncoghlan to get an opinion on normalizing -anything to +anything as long as it doesn't match one of the pre-release normalizations.

@ncoghlan

Member

ncoghlan commented Jun 26, 2014

I quite like the idea of trying to use the more permissive local version
parsing to get a defined sort order for even more legacy versions. I see a
couple of possibilities:

  • for each hyphen (starting from the right), replace it with "+" to see if
    that results in a valid public+local version
  • if that still doesn't work, prepend a "0+" to treat the whole string
    as a local version

Only if even the mostly-lexical sorting offered by local versions failed
would we give up entirely.
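The two steps proposed here could be sketched roughly as follows. This is only an illustration of the idea as stated in this comment: the version regex is a simplified approximation of PEP 440, and coerce_legacy is a hypothetical helper name.

```python
import re

# Simplified public and local grammars (assumptions for this sketch).
_PUBLIC = r"\d+(?:\.\d+)*(?:(?:a|b|c|rc)\d+)?(?:\.post\d+)?(?:\.dev\d+)?"
_LOCAL = r"[a-z0-9]+(?:\.[a-z0-9]+)*"
_VERSION = re.compile(rf"{_PUBLIC}(?:\+{_LOCAL})?", re.IGNORECASE)


def coerce_legacy(version):
    """Try to reinterpret a legacy version as public+local, else return None."""
    if _VERSION.fullmatch(version):
        return version
    # Step 1: starting from the right, try each hyphen as the "+" boundary.
    hyphens = [i for i, ch in enumerate(version) if ch == "-"]
    for i in reversed(hyphens):
        candidate = version[:i] + "+" + version[i + 1:]
        if _VERSION.fullmatch(candidate):
            return candidate
    # Step 2: treat the whole string as a local version of release 0.
    candidate = "0+" + version
    if _VERSION.fullmatch(candidate):
        return candidate
    # Give up: even the mostly-lexical local ordering does not apply.
    return None
```

Under this sketch 1.3-fork1 becomes 1.3+fork1 and bob becomes 0+bob, while a string such as 1.0-a-b is still rejected.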

@dstufft

Member

dstufft commented Jun 26, 2014

Ok, I'll see about implementing that in packaging and seeing what that does to our compatibility.

@dstufft

Member

dstufft commented Jul 4, 2014

I started with compatibility numbers that looked like:

$ invoke check.pep440 --cached
Total Version Compatibility:              233644/241356 (96.80%)
Total Sorting Compatibility (Unfiltered): 43231/45649 (94.70%)
Total Sorting Compatibility (Filtered):   45625/45649 (99.95%)
Projects with No Compatible Versions:     800/45649 (1.75%)
Projects with Differing Latest Version:   1169/45649 (2.56%)

Then I made it so that - is an alternate spelling for +, unless it matches one of the other rules (-dev, -alpha, etc). This gave us compatibility numbers that look like:

$ invoke check.pep440 --cached
Total Version Compatibility:              237764/241356 (98.51%)
Total Sorting Compatibility (Unfiltered): 44379/45649 (97.22%)
Total Sorting Compatibility (Filtered):   45526/45649 (99.73%)
Projects with No Compatible Versions:     398/45649 (0.87%)
Projects with Differing Latest Version:   576/45649 (1.26%)

Then I made it so that - is also an alternate spelling for the . inside of a local version. This gave us compatibility numbers that look like this:

$ invoke check.pep440 --cached
Total Version Compatibility:              238325/241356 (98.74%)
Total Sorting Compatibility (Unfiltered): 44501/45649 (97.49%)
Total Sorting Compatibility (Filtered):   45509/45649 (99.69%)
Projects with No Compatible Versions:     343/45649 (0.75%)
Projects with Differing Latest Version:   508/45649 (1.11%)

Then I made it so that we allow a v at the start of a version (which we ignore and normalize away) and that got us compatibility numbers that look like this:

$ invoke check.pep440 --cached
Total Version Compatibility:              238684/241356 (98.89%)
Total Sorting Compatibility (Unfiltered): 44562/45649 (97.62%)
Total Sorting Compatibility (Filtered):   45488/45649 (99.65%)
Projects with No Compatible Versions:     292/45649 (0.64%)
Projects with Differing Latest Version:   464/45649 (1.02%)

Then I made it so that we allow a - or a . between a pre/post/dev marker and the numeral. This allows things like 1.0.alpha.0 which normalize as 1.0a0. This got us compatibility numbers that look like this:

$ invoke check.pep440 --cached
Total Version Compatibility:              238879/241356 (98.97%)
Total Sorting Compatibility (Unfiltered): 44618/45649 (97.74%)
Total Sorting Compatibility (Filtered):   45495/45649 (99.66%)
Projects with No Compatible Versions:     278/45649 (0.61%)
Projects with Differing Latest Version:   438/45649 (0.96%)

I'm not sure what else we can add to get even more compatible. This brings us back to the question of how compatible is compatible enough. So far every change slightly lowers our filtered sorting compatibility (this tells us how closely our sorting matches pkg_resources once the versions we can't sort are removed) but it also increases our total compatibility.

Here's the list of versions which I still treat as invalid: https://gist.github.com/dstufft/c24bcacef202a3837600. I really only want to consider things that can be implemented by adjusting the regex/parsing and nothing that requires transforming the version itself. I think they are way easier to explain and implement and are far less likely to have bugs related to applying stacks of text transforms.

@dstufft

Member

dstufft commented Jul 4, 2014

Also I'm not sure which of the above modifications we want to allow, all of them? some of them?

@dstufft

Member

dstufft commented Jul 4, 2014

Here's another thing I tried, I allowed . as an alternate spelling for +. This results in a significant (comparatively) jump in accepted versions, but I'm really nervous about it. It feels like this one has the potential to misinterpret versions and make it confusing and/or surprising about how a version will be parsed much more so than any of the other changes I've tried above. However the compatibility numbers for this are:

$ invoke check.pep440 --cached
Total Version Compatibility:              240446/241356 (99.62%)
Total Sorting Compatibility (Unfiltered): 44884/45649 (98.32%)
Total Sorting Compatibility (Filtered):   45301/45649 (99.24%)
Projects with No Compatible Versions:     171/45649 (0.37%)
Projects with Differing Latest Version:   304/45649 (0.67%)
@dstufft

Member

dstufft commented Jul 4, 2014

To be specific, the . as an alternate spelling for + means that 1.0.abcdefg would get interpreted as 1.0+abcdefg, but it also means that 1.0a0.1 would get interpreted as 1.0a0+1. The first one seems reasonable but the second one seems very wrong to me.
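To make the ambiguity concrete, here is a small sketch of the permissive rule. It is not the code used for the numbers above; the simplified grammar and the helper name parse_with_dot_as_plus are assumptions for illustration.

```python
import re

# Simplified public and local grammars (assumptions for this sketch).
_PUBLIC = re.compile(
    r"\d+(?:\.\d+)*(?:(?:a|b|c|rc)\d+)?(?:\.post\d+)?(?:\.dev\d+)?",
    re.IGNORECASE,
)
_LOCAL = re.compile(r"[a-z0-9]+(?:\.[a-z0-9]+)*", re.IGNORECASE)


def parse_with_dot_as_plus(version):
    """Accept "." as well as "+" in front of the local segment."""
    m = _PUBLIC.match(version)
    if m is None:
        return None
    public, rest = version[:m.end()], version[m.end():]
    if not rest:
        return public
    # Whatever follows the public part is read as a local version,
    # whether it was introduced by "+" or by ".".
    if rest[0] in "+." and _LOCAL.fullmatch(rest[1:]):
        return public + "+" + rest[1:]
    return None
```

The same rule that rescues 1.0.abcdefg also silently turns the trailing .1 of 1.0a0.1 into a local segment, which is the surprising reading described above.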

@dstufft

Member

dstufft commented Jul 4, 2014

Another thing I tried, I allow _ anywhere that - and . were allowed. This got us numbers like this (without the above . as a stand in for + that I think is dangerous):

$ invoke check.pep440 --cached
Total Version Compatibility:              238967/241356 (99.01%)
Total Sorting Compatibility (Unfiltered): 44634/45649 (97.78%)
Total Sorting Compatibility (Filtered):   45483/45649 (99.64%)
Projects with No Compatible Versions:     271/45649 (0.59%)
Projects with Differing Latest Version:   432/45649 (0.95%)

The same rule, but including the dangerous thing above:

$ invoke check.pep440 --cached
Total Version Compatibility:              240533/241356 (99.66%)
Total Sorting Compatibility (Unfiltered): 44903/45649 (98.37%)
Total Sorting Compatibility (Filtered):   45294/45649 (99.22%)
Projects with No Compatible Versions:     165/45649 (0.36%)
Projects with Differing Latest Version:   296/45649 (0.65%)
@dstufft

Member

dstufft commented Jul 4, 2014

Another thing I thought of, and another one that I'm not sure about, though I don't think it's as dangerous as the other one: an implied leading "0" on any version which does not have a leading numeral for the release segment. This allows versions like .1 to be normalized to 0.1 and versions like dev to be normalized to 0.dev0.

Compatibility numbers with this also applied are (without the above dangerous change):

$ invoke check.pep440 --cached
Total Version Compatibility:              239196/241356 (99.11%)
Total Sorting Compatibility (Unfiltered): 44740/45649 (98.01%)
Total Sorting Compatibility (Filtered):   45481/45649 (99.63%)
Projects with No Compatible Versions:     209/45649 (0.46%)
Projects with Differing Latest Version:   367/45649 (0.80%)

With the dangerous change:

$ invoke check.pep440 --cached
Total Version Compatibility:              240768/241356 (99.76%)
Total Sorting Compatibility (Unfiltered): 45007/45649 (98.59%)
Total Sorting Compatibility (Filtered):   45287/45649 (99.21%)
Projects with No Compatible Versions:     102/45649 (0.22%)
Projects with Differing Latest Version:   233/45649 (0.51%)
@dstufft

Member

dstufft commented Jul 4, 2014

Another thing: Allowing rev, r, and pre as an alternate spelling of dev. This gives us numbers that look like:

Without the "dangerous" change:

$ invoke check.pep440 --cached
Total Version Compatibility:              239714/241356 (99.32%)
Total Sorting Compatibility (Unfiltered): 44797/45649 (98.13%)
Total Sorting Compatibility (Filtered):   45422/45649 (99.50%)
Projects with No Compatible Versions:     170/45649 (0.37%)
Projects with Differing Latest Version:   328/45649 (0.72%)

With the "dangerous" change:

$ invoke check.pep440 --cached
Total Version Compatibility:              240878/241356 (99.80%)
Total Sorting Compatibility (Unfiltered): 45019/45649 (98.62%)
Total Sorting Compatibility (Filtered):   45280/45649 (99.19%)
Projects with No Compatible Versions:     90/45649 (0.20%)
Projects with Differing Latest Version:   223/45649 (0.49%)
@dstufft

Member

dstufft commented Jul 4, 2014

Oh, a side effect of the implicit leading 0 change is that an empty string is a valid version, which gets normalized to 0.

@dstufft

Member

dstufft commented Jul 4, 2014

Ok, another change. In the release segment, allow omitting a numeral anywhere, which is an implicit 0. This makes a version like 1. normalize to 1.0 and 1... normalize to 1.0.0.0. This gives us numbers like:

Without "dangerous" change:

$ invoke check.pep440 --cached
Total Version Compatibility:              239747/241356 (99.33%)
Total Sorting Compatibility (Unfiltered): 44822/45649 (98.19%)
Total Sorting Compatibility (Filtered):   45422/45649 (99.50%)
Projects with No Compatible Versions:     163/45649 (0.36%)
Projects with Differing Latest Version:   321/45649 (0.70%)

With "dangerous" change:

$ invoke check.pep440 --cached
Total Version Compatibility:              240907/241356 (99.81%)
Total Sorting Compatibility (Unfiltered): 45043/45649 (98.67%)
Total Sorting Compatibility (Filtered):   45283/45649 (99.20%)
Projects with No Compatible Versions:     83/45649 (0.18%)
Projects with Differing Latest Version:   216/45649 (0.47%)
@ncoghlan

Member

ncoghlan commented Jul 4, 2014

Scanning the list of "still incompatible" options, I see the following major points:

  • hashes as part of the revision (this is likely the most significant factor in the compatibility jump for treating "." as "+", but I agree with you that it's a problem from a semantic perspective)
  • leading and trailing "-" and "." characters
  • "r", "rev", "p" and "pre" as component labels (with and without a numeric part)

So, I like most of your changes, except:

  • I don't like the "treat . as +" change. Yes, we'll treat hashes as orderable if someone includes them in a local version, but we shouldn't allow that implicitly (I know that contradicts my "implied 0+" prefix suggestion from earlier, but after seeing the list, I realised it was a bad idea)
  • "r", "rev" and "p" don't read as "dev" equivalents to me, they're more like "post". That suggests attempting to normalise them may be a bit closer to guessing than we would like.
@dstufft

Member

dstufft commented Jul 4, 2014

And Another change! This time, allow "empty" segments in the local version, essentially allowing 1.0+1...0, or even trailing segments like 1.0+abc-. Each "empty" segment is an implicit 0. This gives us numbers like:

Without "dangerous" change:

$ invoke check.pep440 --cached
Total Version Compatibility:              239843/241356 (99.37%)
Total Sorting Compatibility (Unfiltered): 44875/45649 (98.30%)
Total Sorting Compatibility (Filtered):   45420/45649 (99.50%)
Projects with No Compatible Versions:     157/45649 (0.34%)
Projects with Differing Latest Version:   310/45649 (0.68%)

With "dangerous" change:

$ invoke check.pep440 --cached
Total Version Compatibility:              241004/241356 (99.85%)
Total Sorting Compatibility (Unfiltered): 45097/45649 (98.79%)
Total Sorting Compatibility (Filtered):   45282/45649 (99.20%)
Projects with No Compatible Versions:     76/45649 (0.17%)
Projects with Differing Latest Version:   204/45649 (0.45%)
@dstufft

Member

dstufft commented Jul 4, 2014

Now I really am out of ideas :)

@dstufft

Member

dstufft commented Jul 4, 2014

So, the reason I treated rev and r as dev releases is that, if I recall correctly, setuptools has a routine that will autogenerate versions from SVN and it uses either rev or r. I may be remembering that wrong, but in my mind autogenerated from VCS == development version.

@dstufft

Member

dstufft commented Jul 4, 2014

Ok, so that's two people who don't like the implicit . is + idea, so I'll drop that out of my stack.

@ncoghlan

Member

ncoghlan commented Jul 4, 2014

Could you put together two lists? The versions from the "no compatible versions" projects and the old selection & new selection for the "latest version changed" projects?

@ncoghlan

Member

ncoghlan commented Jul 4, 2014

Ah, if the "r" is referring to svn revisions, then yes, it would count as a "dev" release. I don't really mind including that one, since it would only change the sort order if that was used together with a/b/c style numbering.

@ncoghlan

Member

ncoghlan commented Jul 4, 2014

Forgot to say: +1 for dropping the "implicit 0" rules for the release and local version segments.

@ncoghlan

Member

ncoghlan commented Jul 4, 2014

The "-" -> "+" rule is one where it might be nice to emit a warning when we're having to rely on it, just as we would for falling back to pkg_resources entirely due to a lack of PEP 440 compatible versions. Although if we're going to emit a warning anyway, then perhaps we should just leave it as "not compatible", and let the pkg_resources fallback deal with it.

@dstufft

Member

dstufft commented Jul 4, 2014

I think the thing that bothers me the most about it, is that 1.0-r1-r1 will be 1.0.post1+r1. The same characters mean something different depending on where in the version string they are.

@dstufft

Member

dstufft commented Jul 4, 2014

The same holds true for any of our pre/dev/post release specifiers of course. 1.0-dev-dev is 1.0.dev0+dev. It feels strange to me.

@dstufft

Member

dstufft commented Jul 4, 2014

Oh, and for the record, I'm against warnings. Warnings are for build/publishing tools that allow you to use an invalid version (e.g. one you'd need to use === to specify). They aren't for the core spec because 99% of the time the people getting those warnings are going to be people installing things who have no control over what the authors of the packages do.

@dstufft

Member

dstufft commented Jul 4, 2014

I'm gonna go to bed and think about this some more. For what it's worth switching between allowing - and being strict about + is a tiny code change, so I can swap back and forth easily.

@ncoghlan

Member

ncoghlan commented Jul 4, 2014

I think you've convinced me that including the "-" -> "+" transformation will entrench the kind of weird inconsistencies that PEP 440 is designed to eliminate. Let's leave it out, and instead deal with it by having pip fall back to pkg_resources if it can't find any matching versions. In terms of reporting compatibility numbers, that would mean changing the last two lines of the report to:

Falls back to pkg_resources: X%
Installs a different "latest version": Y%

The second last line would be the same number as the current "Projects with no compatible versions", the last line would be the current "Projects with differing latest versions" minus the projects with no compatible versions.
Falling back to pkg_resources isn't a big deal - over time, that will fix itself as setuptools gets pickier and projects adopt compliant version numbering.
It's the "installs a different latest version" that we need to be wary of, since that may cause inadvertent downgrades.

@ncoghlan

Member

ncoghlan commented Jul 4, 2014

Also, agreed that the warnings should come into play on the setuptools and twine side of things, rather than the pip side.

@dstufft

Member

dstufft commented Jul 4, 2014

I wonder if we even need to maintain compatibility here by falling back to pkg_resources. We're looking at total compatibility as a whole but that assumes that each version and each project are equally important. We do weight things some with the last two numbers; however, not being able to install the latest version of a library released 10 years ago and which only works on some old version of Python is significantly different than not being able to install, say, lxml.

@dstufft

Member

dstufft commented Jul 4, 2014

IOW, I'm going to look at what those projects actually are, to see if they are projects that we need to worry about.

@dstufft

Member

dstufft commented Aug 5, 2014

Coming back around to this, what I've got now is:

  1. Allow a preceding literal v character which is simply ignored, normalizes to being omitted.
  2. Allow literal -, ., or _ character as well as no character preceding or following the pre-release, post-release, or dev-release signifier (dev, post, etc).
  3. Allow additional spellings of alpha, beta, rc, pre, preview which mean a, b, c, c, and c respectively.
  4. Allow additional spellings of r and rev which mean post.
  5. Allow the local version segment separator (inside the local version, not the preceding +) to be the literal character ., _, or -, which normalizes to a literal . character.
  6. Ignore leading and trailing whitespace (space, \t, \n, \r, \f, \v). This comes before the ignored v character if one exists.

Notable things which I am not including:

  • Local versions must be designated with a +, the relaxed rules were simply too confusing and too ambiguous.
  • No implicit 0 for missing release segments (e.g. 1. is not 1.0).
  • No implicit 0 for missing local version segments (e.g. 1+1. is not 1+1.0).
  • I experimented with allowing - and _ in addition to . between release segments (e.g. 1-0 is 1.0). However I decided against it because this was different than how pkg_resources interprets it and in a way that affects ordering. There are projects which use date based versions like 2014-08-04, however we can't pick those up without also incorrectly allowing the documented use of the - in pkg_resources.
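The six accepted rules and the exclusions above can be approximated with a single regex plus a small normalizer. This is a sketch only, not the code in pypa/packaging: the pattern is simplified, normalize is a hypothetical name, and rc/pre/preview map to c because that is what rule 3 in this comment says.

```python
import re

_SEP = r"[-_.]?"  # rule 2: optional -, _ or . around a signifier
_VERSION = re.compile(
    rf"""
    v?                                                    # rule 1
    (?P<release>\d+(?:\.\d+)*)
    (?:{_SEP}(?P<pre_l>alpha|beta|preview|pre|rc|a|b|c){_SEP}(?P<pre_n>\d+)?)?
    (?:{_SEP}(?P<post_l>post|rev|r){_SEP}(?P<post_n>\d+)?)?
    (?:{_SEP}(?P<dev_l>dev){_SEP}(?P<dev_n>\d+)?)?
    (?:\+(?P<local>[a-z0-9]+(?:[-_.][a-z0-9]+)*))?        # rule 5
    """,
    re.VERBOSE | re.IGNORECASE,
)
# Rule 3: alternate pre-release spellings.
_PRE = {"alpha": "a", "beta": "b", "rc": "c", "pre": "c", "preview": "c"}


def normalize(version):
    """Return the normalized form, or None if the version is rejected."""
    # Rule 6: whitespace is stripped before anything else is looked at.
    m = _VERSION.fullmatch(version.strip().lower())
    if m is None:
        return None
    out = m.group("release")
    if m.group("pre_l"):
        out += _PRE.get(m.group("pre_l"), m.group("pre_l"))
        out += m.group("pre_n") or "0"
    if m.group("post_l"):  # rule 4: r and rev mean post
        out += ".post" + (m.group("post_n") or "0")
    if m.group("dev_l"):
        out += ".dev" + (m.group("dev_n") or "0")
    if m.group("local"):
        out += "+" + re.sub(r"[-_.]", ".", m.group("local"))
    return out
```

Because everything is decided by one pattern match, rejected inputs such as 1. or 1.0-fork1 simply fail to match, with no chain of text transforms to reason about.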

This brings us to this:

$ invoke check.pep440 --cached
Total Version Compatibility:              244249/250042 (97.68%)
Total Sorting Compatibility (Unfiltered): 45081/47058 (95.80%)
Total Sorting Compatibility (Filtered):   47002/47058 (99.88%)
Projects with No Compatible Versions:     622/47058 (1.32%)
Projects with Differing Latest Version:   904/47058 (1.92%)

My method for determining which things to accept and which not to accept is fairly simple.

  1. Does it introduce likely ambiguity? (Likely being a keyword here; of course maybe someone uses some scheme where v1.0 is different than 1.0 but it's not likely.)
  2. Does it break compatibility with pkg_resources, particularly in how it is ordered?
  3. Does it make versions which are unreasonably hard to read as a human being?

Of these, the first rule is the strictest one, with the second one being a close followup. The third rule is basically the "well including this rule isn't very helpful, but is it harmful?" differentiation.

@ncoghlan

Member

ncoghlan commented Aug 5, 2014

Assuming you meant 'c' rather than 'C' those rules sound reasonable to me. Your guidelines and a reference to this issue will likely cover most of the "rationale" section of the PEP update :)

So we have 622 projects that become uninstallable, and 282 where the version installed changes. A quick scan of both lists would likely be useful (especially the second one, since that's a silent change when folks aren't using version pinning)
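The arithmetic behind these two figures follows the earlier suggestion to report "falls back" and "installs a different latest version" separately. A small sketch, using the numbers from the report above; fallback_report and its dictionary keys are names made up for this example.

```python
def fallback_report(total_projects, no_compatible, differing_latest):
    """Split "differing latest version" into its two components."""
    # Projects with no compatible versions are, by definition, also counted
    # among those with a differing latest version, so subtract them out.
    silently_different = differing_latest - no_compatible
    return {
        "falls_back_count": no_compatible,
        "falls_back_pct": round(100 * no_compatible / total_projects, 2),
        "different_latest_count": silently_different,
        "different_latest_pct": round(100 * silently_different / total_projects, 2),
    }


# Figures from the Aug 5 report: 47058 projects, 622 with no compatible
# versions, 904 with a differing latest version.
report = fallback_report(47058, 622, 904)
```

Here report["different_latest_count"] is 904 - 622 = 282, the number of projects whose installed version would change silently.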

@dstufft

Member

dstufft commented Aug 5, 2014

Oh, did I type C somewhere? Yea I meant c.

@dstufft

Member

dstufft commented Aug 5, 2014

For the record, I'm updating pypa/packaging at the moment to match these rules in all locations, as well as finish implementing this PR and such. Once that's done I'll consider this idea validated and update the PEP itself. Then I think we can call the PR ready for distutils-sig and hopefully we can get it accepted.

@dstufft

Member

dstufft commented Aug 5, 2014

Here is the list of projects and versions with no compatible versions: https://gist.github.com/dstufft/5c291d6ae03eaa67e96d
Here is the list of projects and versions which have a differing latest version: https://gist.github.com/dstufft/6092a3f8c1ebfb3d246a

Quick reminder that differing latest version includes by definition no compatible version projects (since being able to parse something is different than not being able to parse something) and is represented by a null in the pep440 key.

@dstufft

Member

dstufft commented Aug 5, 2014

Just to possibly make things a little bit easier, here's a list of projects which have a differing latest version and which does not include the stuff where there are no valid PEP 440 versions. This will represent the 282 versions which will get a different latest version silently.

Mostly this looks like things where the latest version of a project doesn't successfully parse with PEP 440 but older versions do successfully parse. However I do notice at least one case where the PEP 440 version appears to be more correct, e.g. Django-base which pkg_resources says the latest version is 0.91 however PEP 440 says the latest version is v1.1. So it's probably important to point out that some of these differences are where we got better not just different. See: https://gist.github.com/dstufft/412717a3435342ad125f

I also see other ones like MangoEngine where pkg_resources says the latest version is 0.1-rc.1 however we say the latest version is v1.0.0-rc.2.

Another one is bda.recipe.deployment where pkg_resources says 2.0beta and PEP 440 says 2.0b8.

@dstufft

Member

dstufft commented Aug 5, 2014

It looks like all the tests are passing on this now. This can't be merged as is or anything like that since PEP 440 needs to be accepted first and it's likely we want to get this into setuptools as well (maybe even prior to getting it into pip).

@ncoghlan

Member

ncoghlan commented Aug 5, 2014

Yep, I think this one looks like a winner. It would be nice to support the "YYYY-MM-DD" date based releases along with the "-N" patch level notation, but I agree that lets too much nonsense through and makes things overly confusing.

@ncoghlan

Member

ncoghlan commented Aug 5, 2014

And yes, it would be good to get setuptools applying the normalisation (and complaining about incompatible versions) before we publish a corresponding version of pip. It would also give us an easier way to advise owners of incompatible packages to update their version numbers.

@dstufft

Member

dstufft commented Aug 5, 2014

Ok I lied, I have one more possible thing.

Technically pkg_resources supports -<any alpha string> and this represents a patch level release which comes after the same version without that -<any alpha string>. We have two constructs which sort after a version: post releases and local versions. We attempted to use this for local versions, however we were not successful because of the ambiguity it creates.

What we did not try is normalizing -<whatever> into a post release. This is actually a more accurate translation of the meaning of the -<whatever> syntax in pkg_resources. The problem is that while pkg_resources supports anything after the - character, our post releases can only contain numbers. However we could simply limit support for this to -<numerals>.

This should not be ambiguous if we only allow the - character (and perhaps _) and not the . character. If we included the . then we couldn't tell it apart from another digit on the release segment. Both pre-releases and dev releases will still require some additional characters in order to be specified so this shouldn't be ambiguous with them, and local versions use the + signifier so it shouldn't be ambiguous with that either. This would mean that 1.0-mypatch1 is considered invalid but 1.0-1 is valid and is normalized to 1.0.post1.
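A minimal sketch of that rule in isolation, assuming a release segment followed by exactly one -<numerals> suffix; normalize_implicit_post is a hypothetical helper name, and pre/dev/local segments are deliberately left out.

```python
import re

# Release digits, then a single hyphen and numerals; "." is not accepted
# as the separator because it would look like another release segment.
_IMPLICIT_POST = re.compile(r"(?P<release>\d+(?:\.\d+)*)-(?P<number>\d+)")


def normalize_implicit_post(version):
    """Map 1.0-1 to 1.0.post1; return None for anything else."""
    m = _IMPLICIT_POST.fullmatch(version.strip())
    if m is None:
        return None
    return f"{m.group('release')}.post{int(m.group('number'))}"
```

With this pattern 1.0-1 normalizes to 1.0.post1, while 1.0-mypatch1 and date-style strings such as 2014-08-04 are still rejected.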

A quick look at what this does on PyPI is it brings our numbers down to:

$ invoke check.pep440 --cached
Total Version Compatibility:              245340/250042 (98.12%)
Total Sorting Compatibility (Unfiltered): 45330/47058 (96.33%)
Total Sorting Compatibility (Filtered):   46936/47058 (99.74%)
Projects with No Compatible Versions:     499/47058 (1.06%)
Projects with Differing Latest Version:   709/47058 (1.51%)

This adds an additional 123 projects which couldn't be installed previously but now can, and reduces the number of projects which can be installed but for which the latest version is silently different from 282 to 210. It also gives us the last remaining style of version from pkg_resources that we were not compatible with and for which we can be without re-introducing ambiguity.

@dstufft

Member

dstufft commented Aug 5, 2014

I checked the difference between allowing only -N and allowing either -N or _N and the only difference was we went from 709 to 708 projects in "Projects with Differing Latest Version". I'm going to declare that we only support - for that field unless we think that it makes sense to support _ for symmetry with the other locations where we support -, _, and ..

@ncoghlan

Member

ncoghlan commented Aug 6, 2014

Allowing a trailing "-N" by normalising it to ".postN" sounds good to me. I think that change will also greatly increase the odds of the new answer being better than the pkg_resources answer when they're different.

@dstufft dstufft changed the title from Proof of Concept: PEP 440 Version and Specifiers to PEP 440 Version and Specifiers Dec 13, 2014

dstufft added a commit that referenced this pull request Dec 13, 2014

@dstufft dstufft merged commit bff1145 into pypa:develop Dec 13, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed

@dstufft dstufft deleted the dstufft:use-packaging branch Dec 13, 2014