PEP 440 Version and Specifiers #1894

Merged
merged 3 commits into from Dec 13, 2014

Owner

dstufft commented Jun 25, 2014

So basically this uses a slightly modified copy of pypa/packaging#1 to implement PEP 440 in pip. This is mostly a proof of concept right now; however, it "works" in that I can install stuff successfully with it. It implements all of the specifiers from PEP 440, including ~=, ==X.*, and ===. These likely cannot be used inside of an install_requires because setuptools does not support them; however, they can be used in a requirements.txt file and on the command line. They could also be used inside of a Wheel file. It also gives < and > their new semantics, as well as the new semantics around local versions.

Some notes for those not up to date on the latest PEP 440 draft:

  • All versions consider a "local" version (such as 1.0+debian3) to be semantically equivalent to whatever the public version is (1.0 in the example). So 1.0+debian3 is ==1.0.
  • < and > are smarter about pre-releases, so <3 does not match 3.0.dev1.
  • Versions which are not PEP 440 compatible are excluded by default, however you can still depend on them by using the === operator which causes things to fall back to a simple case insensitive string comparison.
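A minimal sketch of the first and third notes, using plain string handling rather than the packaging library. The function names are hypothetical, and no version normalization is attempted (so 1.0 and 1.0.0 would not compare equal here):

```python
def public_version(version):
    """Drop the local segment: '1.0+debian3' -> '1.0'."""
    return version.split("+", 1)[0]


def matches_equal(candidate, wanted):
    """Sketch of ==: a specifier without a local segment ignores the candidate's."""
    if "+" not in wanted:
        candidate = public_version(candidate)
    return candidate == wanted


def matches_arbitrary(candidate, wanted):
    """Sketch of ===: no parsing at all, just a case-insensitive string comparison."""
    return candidate.lower() == wanted.lower()
```

With these, matches_equal("1.0+debian3", "1.0") holds, while a version that is not PEP 440 compatible can only be selected through matches_arbitrary.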

There are probably lots of bugs and corner cases and the like (hence proof of concept) but I think it's pretty cool!

Owner

dstufft commented Jun 25, 2014

Oh yea, and all the actual changes to pip are in dstufft/pip@09f612c

Owner

dstufft commented Jun 25, 2014

Oh one other thing about this. The packaging library (and pip's extensions of it) properly handles combining the same requirement from two different packages. This PR doesn't fix that longstanding issue in pip, but it could be leveraged to do so.

Contributor

qwcode commented Jun 26, 2014

neat!

handles combining the same requirement from two different packages

i.e. combining top-level "double requirements"? and not just doing "first found wins" for sub-requirements?

Contributor

qwcode commented Jun 26, 2014

does this indirectly handle #1505? or no?

one of drivers of that was versions like 1.3-fork1 (and not wanting them to be called pre-releases)

how would that get handled here?

Owner

dstufft commented Jun 26, 2014

Yes on combining requirements. This doesn't handle it yet, but basically it'd just need code so that if you have two reqs you do req = Requirement("Django>1.4") & Requirement("Django>=1.6"). Basically the mechanics are there but this PR doesn't attempt to utilize them to make it happen.
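To illustrate the & idea, here is a small self-contained sketch. This Requirement class is hypothetical and is not the packaging library's API: it only collects specifier strings and merges them, without evaluating whether they can all be satisfied at once.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """Hypothetical stand-in: a project name plus a set of specifier strings."""

    name: str
    specifiers: frozenset

    @classmethod
    def parse(cls, text):
        match = re.match(r"^\s*([A-Za-z0-9._-]+)\s*(.*)$", text)
        if match is None:
            raise ValueError(f"cannot parse requirement: {text!r}")
        name, rest = match.groups()
        specs = frozenset(s.strip() for s in rest.split(",") if s.strip())
        return cls(name, specs)

    def __and__(self, other):
        # Combining only makes sense for the same project.
        if self.name.lower() != other.name.lower():
            raise ValueError("cannot combine requirements for different projects")
        return Requirement(self.name, self.specifiers | other.specifiers)

    def __str__(self):
        return self.name + ",".join(sorted(self.specifiers))


combined = Requirement.parse("Django>1.4") & Requirement.parse("Django>=1.6")
print(combined)  # Django>1.4,>=1.6
```

The point of the design is that both constraints survive the merge, instead of the first one found winning.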

So the latest round of PEP 440 (which this implements) uses 1.3+fork1, which this properly handles. You can do pip install foo==1.3 and if it sees the fork it'll install it, or you can do pip install foo==1.3+fork1 or any other specifier. It won't be counted as a pre-release.

This does not give some global way to just blindly allow non PEP 440 versions; however, it also does not have the --pre flag allow non PEP 440 versions. If you want to use a non PEP 440 version you have to use ===<non pep440 version>. Basically the === is an escape hatch that says "don't interpret the version, just do basic string equality on it".

We do not attempt to sort versions that we cannot parse with PEP 440, which is why you can only pin to a non PEP 440 version and you can't do anything else.

Owner

dstufft commented Jun 26, 2014

Obviously there are still edge cases broken in the PR, hence it being a proof of concept.

Contributor

qwcode commented Jun 26, 2014

why you can only pin to a non PEP 440 version and you can't do anything else.

hmm, no attempt at sorting? this will break stuff where people want to sort pkg_resources-style fork versions (i.e "1.3-fork1"). We have forks like this at my job.

Basically the === is an escape hatch

so, only an escape hatch for your own top-level requirements? #1505 was imagining a --allow-nonstandard-version flag which would theoretically cut across the whole dependency tree, and handle cases where install_requires has non-standard versioning. not saying it's the end of the world, but just trying to be clear what's possible with this hatch.

Owner

dstufft commented Jun 26, 2014

To be specific, if you use 1.3+fork1 instead of 1.3-fork1 it'll sort just fine as that's a PEP 440 local version and it has a defined sort.

Contributor

qwcode commented Jun 26, 2014

yes, I got that, just thinking of legacy packaging.

Owner

dstufft commented Jun 26, 2014

The === escape hatch can technically be used in a project's metadata too; however, setuptools itself won't allow that at the moment.

Here's the compatibility numbers by the way:

Total Version Compatibility:              231807/239450 (96.81%)
Total Sorting Compatibility (Unfiltered): 43095/45505 (94.70%)
Total Sorting Compatibility (Filtered):   45481/45505 (99.95%)
Projects with No Compatible Versions:     802/45505 (1.76%)
Projects with Differing Latest Version:   1163/45505 (2.56%)

An option we could do is if we can't find any PEP 440 compatible versions is fall back to pkg_resources. Which would allow places that have all incompatible versions to still operate as they did today. I'm not a massive fan and I didn't care enough to try to do that for the PoC but it's a possibility.

One of the important things here, I think, is that if we can't parse a version then we really don't know how it should sort. An extreme example: which is the latest version, bob or dog (something allowed by pkg_resources)? Unfortunately pkg_resources used the - symbol for more than one thing, and on PyPI it was used a lot for -dev, -alpha, etc., so the PEP 440 rules normalize it to things like 1.0-dev -> 1.0.dev0.

It's possible that PEP 440 could be expanded to treat -<anything that's not a|b|c|rc|alpha|beta|dev> as a "local version", as if you had done +whatever. So in the example of 1.0-fork1 it'd normalize to 1.0+fork1 and work fine.
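A rough sketch of that idea, assuming only the first hyphen is considered and using the tag list from the comment above. The function name and the regex are illustrative assumptions, not what packaging does:

```python
import re

# Tags that keep their pre-release/dev meaning after a hyphen.
_TAG = re.compile(r"^(dev|alpha|beta|rc|a|b|c)[-._]?(\d*)$")


def reinterpret_hyphen(version):
    """Treat '-x' as '+x' unless x is a known pre-release or dev tag."""
    public, sep, rest = version.lower().partition("-")
    if not sep:
        return public
    tag = _TAG.match(rest)
    if tag is None:
        # Anything else becomes a local version.
        return f"{public}+{rest}"
    label, number = tag.group(1), tag.group(2) or "0"
    if label == "dev":
        return f"{public}.dev{number}"
    label = {"alpha": "a", "beta": "b"}.get(label, label)
    return f"{public}{label}{number}"
```

Under this sketch 1.0-dev still becomes 1.0.dev0, while 1.0-fork1 becomes 1.0+fork1.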

Owner

dstufft commented Jun 26, 2014

Those numbers are from what's on PyPI btw, which of course is unlikely to have many forks.

Owner

dstufft commented Jun 26, 2014

Paging @ncoghlan to get an opinion on normalizing -anything to +anything as long as it doesn't match one of the pre-release normalizations.

Member

ncoghlan commented Jun 26, 2014

I quite like the idea of trying to use the more permissive local version
parsing to get a defined sort order for even more legacy versions. I see a
couple of possibilities:

  • for each hyphen (starting from the right), replace it with "+" to see if
    that results in a valid public+local version
  • if that still doesn't work, prepend a "0+" to treat the whole string
    as a local version

Only if even the mostly-lexical sorting offered by local versions failed
would we give up entirely.
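The two fallbacks could be sketched roughly as follows. The _STRICT pattern is a deliberately simplified stand-in for the real PEP 440 grammar, and coerce is a hypothetical name. Note that the "0+" step is the suggestion as made at this point in the thread; it is withdrawn further down.

```python
import re

# Simplified stand-in for a strict PEP 440 pattern (an assumption, not the real regex).
_STRICT = re.compile(
    r"^\d+(?:\.\d+)*"                      # release segment
    r"(?:(?:a|b|c|rc)\d+)?"                # optional pre-release
    r"(?:\.post\d+)?"                      # optional post-release
    r"(?:\.dev\d+)?"                       # optional dev release
    r"(?:\+[a-z0-9]+(?:\.[a-z0-9]+)*)?$"   # optional local version
)


def coerce(version):
    """Return a version the strict pattern accepts, or None if every fallback fails."""
    text = version.lower()
    if _STRICT.match(text):
        return text
    # Try each hyphen, starting from the right, as the local version separator.
    hyphens = [i for i, char in enumerate(text) if char == "-"]
    for index in reversed(hyphens):
        candidate = text[:index] + "+" + text[index + 1:]
        if _STRICT.match(candidate):
            return candidate
    # Last resort: treat the whole string as a local version of release 0.
    candidate = "0+" + text
    if _STRICT.match(candidate):
        return candidate
    return None
```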

Owner

dstufft commented Jun 26, 2014

Ok, I'll see about implementing that in packaging and seeing what that does to our compatibility.

Owner

dstufft commented Jul 4, 2014

I started with compatibility numbers that looked like:

$ invoke check.pep440 --cached
Total Version Compatibility:              233644/241356 (96.80%)
Total Sorting Compatibility (Unfiltered): 43231/45649 (94.70%)
Total Sorting Compatibility (Filtered):   45625/45649 (99.95%)
Projects with No Compatible Versions:     800/45649 (1.75%)
Projects with Differing Latest Version:   1169/45649 (2.56%)

Then I made it so that - is an alternate spelling for +, unless it matches one of the other rules (-dev, -alpha, etc). This gave us compatibility numbers that look like:

$ invoke check.pep440 --cached
Total Version Compatibility:              237764/241356 (98.51%)
Total Sorting Compatibility (Unfiltered): 44379/45649 (97.22%)
Total Sorting Compatibility (Filtered):   45526/45649 (99.73%)
Projects with No Compatible Versions:     398/45649 (0.87%)
Projects with Differing Latest Version:   576/45649 (1.26%)

Then I made it so that - is also an alternate spelling for the . inside of a local version. This gave us compatibility numbers that look like this:

$ invoke check.pep440 --cached
Total Version Compatibility:              238325/241356 (98.74%)
Total Sorting Compatibility (Unfiltered): 44501/45649 (97.49%)
Total Sorting Compatibility (Filtered):   45509/45649 (99.69%)
Projects with No Compatible Versions:     343/45649 (0.75%)
Projects with Differing Latest Version:   508/45649 (1.11%)

Then I made it so that we allow a v at the start of a version (which we ignore and normalize away) and that got us compatibility numbers that look like this:

$ invoke check.pep440 --cached
Total Version Compatibility:              238684/241356 (98.89%)
Total Sorting Compatibility (Unfiltered): 44562/45649 (97.62%)
Total Sorting Compatibility (Filtered):   45488/45649 (99.65%)
Projects with No Compatible Versions:     292/45649 (0.64%)
Projects with Differing Latest Version:   464/45649 (1.02%)

Then I made it so that we allow a - or a . between a pre/post/dev marker and the numeral. This allows things like 1.0.alpha.0 which normalize as 1.0a0. This got us compatibility numbers that look like this:

$ invoke check.pep440 --cached
Total Version Compatibility:              238879/241356 (98.97%)
Total Sorting Compatibility (Unfiltered): 44618/45649 (97.74%)
Total Sorting Compatibility (Filtered):   45495/45649 (99.66%)
Projects with No Compatible Versions:     278/45649 (0.61%)
Projects with Differing Latest Version:   438/45649 (0.96%)
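
The separator rule just described (1.0.alpha.0 normalizing to 1.0a0) can be sketched like this. The pattern covers only a release segment followed by a pre-release marker, and both the regex and the function name are assumptions made for illustration:

```python
import re

_PRE = re.compile(
    r"^(?P<release>\d+(?:\.\d+)*)"
    r"[-.]?(?P<label>alpha|beta|rc|a|b|c)"   # optional separator before the marker
    r"[-.]?(?P<number>\d+)?$"                # optional separator before the numeral
)
_SHORT = {"alpha": "a", "beta": "b"}


def normalize_pre(version):
    """Normalize a pre-release spelling, or return None if it is not one."""
    match = _PRE.match(version.lower())
    if match is None:
        return None
    label = _SHORT.get(match.group("label"), match.group("label"))
    number = match.group("number") or "0"  # a missing numeral is an implicit 0
    return f"{match.group('release')}{label}{number}"
```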

I'm not sure what else we can add to get even more compatible. This brings us back to the question of how compatible is compatible enough. So far nearly every change slightly lowers our filtered sorting compatibility (this tells us how closely our sort order matches pkg_resources once the versions we can't sort are removed) but it also increases our total compatibility.

Here's the list of versions which I still treat as invalid: https://gist.github.com/dstufft/c24bcacef202a3837600. I really only want to consider things that can be implemented by adjusting the regex/parsing and nothing that requires transforming the version itself. I think they are way easier to explain and implement and are far less likely to have bugs related to applying stacks of text transforms.

Owner

dstufft commented Jul 4, 2014

Also I'm not sure which of the above modifications we want to allow, all of them? some of them?

Owner

dstufft commented Jul 4, 2014

Here's another thing I tried, I allowed . as an alternate spelling for +. This results in a significant (comparatively) jump in accepted versions, but I'm really nervous about it. It feels like this one has the potential to misinterpret versions and make it confusing and/or surprising about how a version will be parsed much more so than any of the other changes I've tried above. However the compatibility numbers for this are:

$ invoke check.pep440 --cached
Total Version Compatibility:              240446/241356 (99.62%)
Total Sorting Compatibility (Unfiltered): 44884/45649 (98.32%)
Total Sorting Compatibility (Filtered):   45301/45649 (99.24%)
Projects with No Compatible Versions:     171/45649 (0.37%)
Projects with Differing Latest Version:   304/45649 (0.67%)
Owner

dstufft commented Jul 4, 2014

To be specific, the . as an alternate spelling for + means that 1.0.abcdefg would get interpreted as 1.0+abcdefg, but it also means that 1.0a0.1 would get interpreted as 1.0a0+1. The first one seems reasonable but the second one seems very wrong to me.
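A small sketch that reproduces both interpretations. The pattern is a simplified assumption (release plus an optional a/b/c/rc pre-release, then a local version introduced by either + or .), not the regex used in the PR:

```python
import re

_DOT_AS_PLUS = re.compile(
    r"^(?P<public>\d+(?:\.\d+)*(?:(?:a|b|c|rc)\d+)?)"
    r"(?:[+.](?P<local>[a-z0-9]+(?:\.[a-z0-9]+)*))?$"  # '.' accepted in place of '+'
)


def interpret(version):
    """Show how a version reads when '.' may also start the local segment."""
    match = _DOT_AS_PLUS.match(version.lower())
    if match is None:
        return None
    public, local = match.group("public", "local")
    return public if local is None else f"{public}+{local}"
```

An ordinary release such as 1.0.1 is unaffected, which is part of why the misreading of 1.0a0.1 would be easy to miss.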

Owner

dstufft commented Jul 4, 2014

Another thing I tried, I allow _ anywhere that - and . were allowed. This got us numbers like this (without the above . as a stand in for + that I think is dangerous):

$ invoke check.pep440 --cached
Total Version Compatibility:              238967/241356 (99.01%)
Total Sorting Compatibility (Unfiltered): 44634/45649 (97.78%)
Total Sorting Compatibility (Filtered):   45483/45649 (99.64%)
Projects with No Compatible Versions:     271/45649 (0.59%)
Projects with Differing Latest Version:   432/45649 (0.95%)

The same rule, but including the dangerous thing above:

$ invoke check.pep440 --cached
Total Version Compatibility:              240533/241356 (99.66%)
Total Sorting Compatibility (Unfiltered): 44903/45649 (98.37%)
Total Sorting Compatibility (Filtered):   45294/45649 (99.22%)
Projects with No Compatible Versions:     165/45649 (0.36%)
Projects with Differing Latest Version:   296/45649 (0.65%)
Owner

dstufft commented Jul 4, 2014

Another thing I thought, this is another one that I'm not sure about. I don't think it's as dangerous as the other one though. An implied leading "0" on any version which does not have a leading numeral for the release segment. This allows versions like .1 to be normalized to 0.1 and versions like dev to be normalized to 0.dev0.

Compatibility numbers with this also applied are (without the above dangerous change):

$ invoke check.pep440 --cached
Total Version Compatibility:              239196/241356 (99.11%)
Total Sorting Compatibility (Unfiltered): 44740/45649 (98.01%)
Total Sorting Compatibility (Filtered):   45481/45649 (99.63%)
Projects with No Compatible Versions:     209/45649 (0.46%)
Projects with Differing Latest Version:   367/45649 (0.80%)

With the dangerous change:

$ invoke check.pep440 --cached
Total Version Compatibility:              240768/241356 (99.76%)
Total Sorting Compatibility (Unfiltered): 45007/45649 (98.59%)
Total Sorting Compatibility (Filtered):   45287/45649 (99.21%)
Projects with No Compatible Versions:     102/45649 (0.22%)
Projects with Differing Latest Version:   233/45649 (0.51%)
Owner

dstufft commented Jul 4, 2014

Another thing: Allowing rev, r, and pre as an alternate spelling of dev. This gives us numbers that look like:

Without the "dangerous" change:

$ invoke check.pep440 --cached
Total Version Compatibility:              239714/241356 (99.32%)
Total Sorting Compatibility (Unfiltered): 44797/45649 (98.13%)
Total Sorting Compatibility (Filtered):   45422/45649 (99.50%)
Projects with No Compatible Versions:     170/45649 (0.37%)
Projects with Differing Latest Version:   328/45649 (0.72%)

With the "dangerous" change:

$ invoke check.pep440 --cached
Total Version Compatibility:              240878/241356 (99.80%)
Total Sorting Compatibility (Unfiltered): 45019/45649 (98.62%)
Total Sorting Compatibility (Filtered):   45280/45649 (99.19%)
Projects with No Compatible Versions:     90/45649 (0.20%)
Projects with Differing Latest Version:   223/45649 (0.49%)
Owner

dstufft commented Jul 4, 2014

Oh, a side effect of the implicit leading 0 change is that an empty string is a valid version, which gets normalized to 0.

Owner

dstufft commented Jul 4, 2014

Ok, another change. In the release segment, allow omitting a numeral anywhere, which is treated as an implicit 0. This makes a version like 1. normalize to 1.0 and 1... normalize to 1.0.0.0. This gives us numbers like:

Without "dangerous" change:

$ invoke check.pep440 --cached
Total Version Compatibility:              239747/241356 (99.33%)
Total Sorting Compatibility (Unfiltered): 44822/45649 (98.19%)
Total Sorting Compatibility (Filtered):   45422/45649 (99.50%)
Projects with No Compatible Versions:     163/45649 (0.36%)
Projects with Differing Latest Version:   321/45649 (0.70%)

With "dangerous" change:

$ invoke check.pep440 --cached
Total Version Compatibility:              240907/241356 (99.81%)
Total Sorting Compatibility (Unfiltered): 45043/45649 (98.67%)
Total Sorting Compatibility (Filtered):   45283/45649 (99.20%)
Projects with No Compatible Versions:     83/45649 (0.18%)
Projects with Differing Latest Version:   216/45649 (0.47%)
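
The implicit zeros in the release segment amount to something like the following sketch. It handles a bare release segment only, and fill_release is a hypothetical name:

```python
import re


def fill_release(release):
    """Replace every omitted numeral in a dotted release segment with 0."""
    parts = release.split(".")
    if not all(re.fullmatch(r"[0-9]*", part) for part in parts):
        raise ValueError(f"not a release segment: {release!r}")
    return ".".join(str(int(part)) if part else "0" for part in parts)
```

This also shows the side effect mentioned earlier: the empty string splits into a single empty part, so it normalizes to 0.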
Member

ncoghlan commented Jul 4, 2014

Scanning the list of "still incompatible" options, I see the following major points:

  • hashes as part of the revision (this is likely the most significant factor in the compatibility jump for treating "." as "+", but I agree with you that it's a problem from a semantic perspective)
  • leading and trailing "-" and "." characters
  • "r", "rev", "p" and "pre" as component labels (with and without a numeric part)

So, I like most of your changes, except:

  • I don't like the "treat . as +" change. Yes, we'll treat hashes as orderable if someone includes them in a local version, but we shouldn't allow that implicitly (I know that contradicts my "implied 0+" prefix suggestion from earlier, but after seeing the list, I realised it was a bad idea)
  • "r", "rev" and "p" don't read as "dev" equivalents to me, they're more like "post". That suggests attempting to normalise them may be a bit closer to guessing than we would like.
Owner

dstufft commented Jul 4, 2014

And Another change! This time, allow "empty" segments in the local version, essentially allowing 1.0+1...0, or even trailing segments like 1.0+abc-. Each "empty" segment is an implicit 0. This gives us numbers like:

Without "dangerous" change:

$ invoke check.pep440 --cached
Total Version Compatibility:              239843/241356 (99.37%)
Total Sorting Compatibility (Unfiltered): 44875/45649 (98.30%)
Total Sorting Compatibility (Filtered):   45420/45649 (99.50%)
Projects with No Compatible Versions:     157/45649 (0.34%)
Projects with Differing Latest Version:   310/45649 (0.68%)

With "dangerous" change:

$ invoke check.pep440 --cached
Total Version Compatibility:              241004/241356 (99.85%)
Total Sorting Compatibility (Unfiltered): 45097/45649 (98.79%)
Total Sorting Compatibility (Filtered):   45282/45649 (99.20%)
Projects with No Compatible Versions:     76/45649 (0.17%)
Projects with Differing Latest Version:   204/45649 (0.45%)
Owner

dstufft commented Jul 4, 2014

Now I really am out of ideas :)

Owner

dstufft commented Jul 4, 2014

So, the reason I treated rev and r as dev releases is that, if I recall correctly, setuptools has a routine that will autogenerate versions from SVN and it uses either rev or r. I may be remembering that wrong, but in my mind autogenerated from VCS == development version.

Owner

dstufft commented Jul 4, 2014

Ok, so that's two people who don't like the implicit . is + idea, so I'll drop that out of my stack.

Member

ncoghlan commented Jul 4, 2014

Could you put together two lists? The versions from the "no compatible versions" projects and the old selection & new selection for the "latest version changed" projects?

Member

ncoghlan commented Jul 4, 2014

Ah, if the "r" is referring to svn revisions, then yes, it would count as a "dev" release. I don't really mind including that one, since it would only change the sort order if that was used together with a/b/c style numbering.

Owner

dstufft commented Jul 4, 2014

Sorry, not sure what you're asking. Is it, you want the list of all the versions from "no compatible" using the original rules? e.g. what I started with? And then what's the second thing you want?

Member

ncoghlan commented Jul 4, 2014

I was asking about the remaining lists after the changes discussed here (except the dangerous one we both agree is a dubious idea):

  • the available versions for the 157 projects with nothing currently compatible
  • for the 310 projects where the latest version changed, what is the setuptools answer vs the patched answer?
Owner

dstufft commented Jul 4, 2014

Ok, I can do that, moment.

Owner

dstufft commented Jul 4, 2014

Do you care which project those versions are attached to, or do you just want a list of versions? E.g. do you want a list of versions, or a dictionary where key = project, and value = list of versions?

Member

ncoghlan commented Jul 4, 2014

Mostly curious about the versions, to see if we can spot patterns.

Owner

dstufft commented Jul 4, 2014

Ok, two different files in this gist, this is with everything I mentioned above minus the "dangerous" change (it doesn't allow p for pre though it could, but i don't think I saw any like that). Location: https://gist.github.com/dstufft/ff0fb00f887ccd703908

Do we want any sort of list of the actual changes proposed as well as what the relative amounts of compatibility is? I can try to do something like that though it's harder because often times two changes combined will be a greater number of versions than either change alone.

The biggest thing I worry about is overly complicating the rules for what is an "allowed" version, although honestly none of these changes were particularly difficult. Here's the diff against packaging.version for everything thus far: https://gist.github.com/dstufft/a705828ea1dd5f5f3fcc or here it is in its actual form (the regex at least): https://gist.github.com/dstufft/9ae109ff8553b181ac2c. These don't include changes that are required in the specifiers code, but it's basically just copying some stuff from this regex down there too.

Owner

dstufft commented Jul 4, 2014

One thing I'm not sure of: if there are two optional greedy parts of a regex, is it guaranteed that the first one will be matched? Example: ^(?P<thing>dev)?(?P<frob>[a-z]+)?$. Will dev always match in the thing group, even though it's also valid for the frob group?

Member

ncoghlan commented Jul 4, 2014

Yes, regexes work from left to right grabbing as much of the data stream as they can (except for non-greedy matches, but even those will try to match, even if the next field could consume the same data)
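This is easy to check with Python's re module, writing both groups with the (?P<name>...) syntax. One caveat worth knowing: the left-to-right preference only holds as long as the overall match can still succeed, because a backtracking engine will give text back to later groups when it has to.

```python
import re

pattern = re.compile(r"^(?P<thing>dev)?(?P<frob>[a-z]+)?$")


def groups(text):
    """Return the (thing, frob) groups for a matching string."""
    match = pattern.match(text)
    return match.group("thing"), match.group("frob")


print(groups("dev"))    # ('dev', None): the first group wins
print(groups("devel"))  # ('dev', 'el'): the rest falls through to frob
print(groups("alpha"))  # (None, 'alpha')

# If frob is required, the engine backtracks and hands "dev" to frob instead.
required = re.compile(r"^(?P<thing>dev)?(?P<frob>[a-z]+)$")
print(required.match("dev").group("thing", "frob"))  # (None, 'dev')
```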

Owner

dstufft commented Jul 4, 2014

Ok, thought that was the case but I couldn't remember if that was part of the regex spec or not.

Member

ncoghlan commented Jul 4, 2014

Scanning the "differing latest versions" list:

  • a few examples where a hash is a legal version by sheer dumb luck (only non-numeric entry is an "a", "b" or "c")
  • several examples where the implicit leading zero in PEP 440 is giving a better answer than the pkg_resources answer
  • several examples of a "-rN" suffix, where PEP 440 is either ignoring them, or treating as pre-releases. That looks dubious, but would happen regardless of how we treat -r suffixes.

So I don't see any other obvious enhancements that would improve compatibility without increasing confusability. Regarding "strict" vs "acceptable", I think it may be worth including both regexes directly in the PEP. The strict regex should be used for testing the output of code that generates versions, while the relaxed one would be for consuming versions.

Owner

dstufft commented Jul 4, 2014

To make this a little bit easier, here are the "core" things I've done thus far:

  1. Allow _ and - as an alternate spelling for + to denote local version, unless it matches a -dev, -alpha, etc.
  2. Allow - and _ as an alternate spelling for . inside of a local version
  3. Allow a no-op v at the beginning of a version
  4. Allow a -, ., or _ in between pre/dev/post versions and their numerals
  5. Allow rev, and r as alternate spellings of dev
  6. Allow pre, preview, p as an alternate spelling of c, this matches what setuptools does according to the docs [1].
  7. Allow implicit zeros for missing segments in the release field
  8. Allow implicit zeros for missing segments in the local version field

Of these I think that 1 is the most useful for "sane" versions, but it's also the hardest to explain (although it's not terribly difficult to explain) but it does have the effect that we would mis-categorize some versions such as 1.0-git.abcdef which would go from a prerelease to a final release with a local version.

I think 2 is relatively low damage, although when you combine it with 1 you'll have versions like 2014-07-03 turn into 2014+07.03 but I don't think we have more than a handful of those.

I think 3 is incredibly safe, but it only buys us a few versions, so I'm meh on it in either direction.

I think 4 is mostly safe, I can't think of any versions off hand it wouldn't categorize correctly.

I know I said 5 should be categorized as dev, but apparently setuptools (again looking at the docs) calls these post releases and sorts them after the normal release. The one sticking point here is that setuptools allows you to make a post release of a dev release, which we do not allow. So even though in my mind a -r or a -rev sounds like a dev release, the docs say otherwise.

6, this comes straight from the setuptools docs. It's different than how I was normalizing in my numbers (I was using dev rather than c, but the setuptools docs claim c).

7, this allows some ugly versions, I'm not sure how I feel about it but it's not particularly troublesome I don't think. Of note I don't really like that it makes an empty string legal, or things like 1........., however those are mostly visual things for me.

8 is basically the same thing as 7 except for the local version.

[1] https://pythonhosted.org/setuptools/setuptools.html#specifying-your-project-s-version

Owner

dstufft commented Jul 4, 2014

Oh I should double check to make sure that setuptools actually follows its documentation for versions.

Owner

dstufft commented Jul 4, 2014

Ah to correct myself, setuptools does not allow p to stand for c, but it does allow pre and preview so I think those are no brainers (looking at the actual behavior).

It also does treat -r and -rev as post releases, so I think we should do that as well.
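Taken together with the previous comment, the label handling could be sketched as an alias table: pre and preview map to c, r and rev map to post, and p is left unmapped. The regex, the table, and the function name below are illustrative assumptions, not the packaging implementation:

```python
import re

# Alternate spellings mapped to the label they normalize to (p is deliberately absent).
LABEL_ALIASES = {
    "alpha": "a",
    "beta": "b",
    "pre": "c",
    "preview": "c",
    "r": "post",
    "rev": "post",
}
CANONICAL = {"a", "b", "c", "rc", "post", "dev"}

_SUFFIX = re.compile(
    r"^(?P<release>\d+(?:\.\d+)*)[-._]?(?P<label>[a-z]+)[-._]?(?P<number>\d*)$"
)


def normalize_label(version):
    """Normalize a single trailing label, or return None if it is not recognized."""
    match = _SUFFIX.match(version.lower())
    if match is None:
        return None
    label = LABEL_ALIASES.get(match["label"], match["label"])
    if label not in CANONICAL:
        return None
    number = match["number"] or "0"
    separator = "." if label in ("post", "dev") else ""
    return f"{match['release']}{separator}{label}{number}"
```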

Member

ncoghlan commented Jul 4, 2014

"p" tends to stand for "patch release" in my experience. So I'd be OK with mapping all of "p", "r" and "rev" to "post". (I'm also OK with just leaving out "p" from the normalisation entirely)

As far as mischaracterising releases containing VCS hashes goes, I don't think that's avoidable at this point - we can mishandle those due to random chance in the hash details.

Owner

dstufft commented Jul 4, 2014

Here are some more numbers, i left p out and I've made all the adjustments I've mentioned from looking at setuptools docs/code/behavior.

$ invoke check.pep440 --cached
Total Version Compatibility:              239849/241356 (99.38%)
Total Sorting Compatibility (Unfiltered): 44954/45649 (98.48%)
Total Sorting Compatibility (Filtered):   45501/45649 (99.68%)
Projects with No Compatible Versions:     154/45649 (0.34%)
Projects with Differing Latest Version:   281/45649 (0.62%)

Just for kicks, I went ahead and ran the numbers without the 2nd rule in my post (so - and _ are not allowed in local versions) and I got this:

$ invoke check.pep440 --cached
Total Version Compatibility:              239311/241356 (99.15%)
Total Sorting Compatibility (Unfiltered): 44826/45649 (98.20%)
Total Sorting Compatibility (Filtered):   45515/45649 (99.71%)
Projects with No Compatible Versions:     199/45649 (0.44%)
Projects with Differing Latest Version:   344/45649 (0.75%)

So like most of these, we're trading away some amount of filtered compatibility (filtered is how pip will be sorting them) in exchange for making some number of projects installable at all or giving them the correct latest version. I'm slightly worried about the 2014-07-03 case, but not enough to fight for it, so I'm happy either with or without the 2nd rule.

Owner

dstufft commented Jul 4, 2014

And just for kicks, I added p as an alias for post, it made a minor amount of difference (same order as above, with and without the _ and - in local versions):

$ invoke check.pep440 --cached
Total Version Compatibility:              239988/241356 (99.43%)
Total Sorting Compatibility (Unfiltered): 44991/45649 (98.56%)
Total Sorting Compatibility (Filtered):   45501/45649 (99.68%)
Projects with No Compatible Versions:     150/45649 (0.33%)
Projects with Differing Latest Version:   267/45649 (0.58%)
$ invoke check.pep440 --cached
Total Version Compatibility:              239449/241356 (99.21%)
Total Sorting Compatibility (Unfiltered): 44861/45649 (98.27%)
Total Sorting Compatibility (Filtered):   45515/45649 (99.71%)
Projects with No Compatible Versions:     195/45649 (0.43%)
Projects with Differing Latest Version:   330/45649 (0.72%)

I don't have a strong opinion on how it should be handled. I agree that in my experience p has denoted patch releases, however setuptools will treat it as a pre-release and we have pre and preview in pre-releases and post in post releases so it could go either way.

Owner

dstufft commented Jul 4, 2014

Treated as a pre-release but not as an alias for c, setuptools just allows any letter to denote a pre-release.

Owner

dstufft commented Jul 4, 2014

We should probably note in the PEP that the normalization allows some crazy versions and that people really really should use the normal forms.

Owner

dstufft commented Jul 4, 2014

In fact, maybe we should say that tooling SHOULD normalize versions wherever they are input, so for example if you do something like setup(version="...") then setuptools should act like you put 0.0.0.

Member

ncoghlan commented Jul 4, 2014

Yes, we really want the build and publishing tools to do the normalisation as well. pip still needs to do it as a backwards compatibility measure, but build tools SHOULD normalise things. Postel's Law applies :)

I think we need to compare your original invalid list above with the latest invalid list, as it appears the new rules have combined to create some undesirable outcomes. "-main-.-VLazy.object.at.0x1006edf10-" was my favourite from the original version list, but the combination of the "implied zero for missing numeric segments" and "interpret '-' as '+' or '.' as appropriate" makes it a valid PEP 440 version that would normalise to "0+main.0.0.VLazy.object.at.0x1006edf10.0". This is... not ideal :)

If we don't allow implicit local versions at all, or if we continue to disallow '-' in local versions, then it would remain invalid.
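
To see how those rules stack up on that example, here is a sketch that applies only the three rules in question: the first hyphen read as +, any of - . _ read as a separator inside the local version, and an implicit 0 for every empty segment. The function name is hypothetical and the release part is not validated at all:

```python
import re


def permissive_normalize(version):
    """Apply the hyphen-as-plus, separator, and implicit-zero rules, and nothing else."""
    release, _, local = version.partition("-")  # first hyphen acts as '+'
    release = release or "0"                    # implicit 0 for a missing release
    if not local:
        return release
    segments = [segment or "0" for segment in re.split(r"[-._]", local)]
    return f"{release}+{'.'.join(segments)}"


print(permissive_normalize("-main-.-VLazy.object.at.0x1006edf10-"))
# 0+main.0.0.VLazy.object.at.0x1006edf10.0
```

Dropping either the implicit zeros or the hyphen handling inside local versions is enough to make this input fail again.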

Also, a question about the "differing latest versions" stats - does that number include the ones where the latest PEP 440 version is "not found"? There seemed to be quite a few 'null' entries in the JSON output.

Owner

dstufft commented Jul 4, 2014

Do you just want a copy of the latest invalid list? I can make one of those easily.

For the differing versions if pkg_resources gets a latest version but PEP 440 does not then that's counted as a differing version. Basically I just do:

left = sorted(versions, key=pkg_resources.parse_version)
right = sorted(versions, key=packaging.version.Version)

if left[-1:] != right[-1:]:
   pass  # Is Differing

So if PEP 440 can't parse it, then right[-1:] will be [].
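
A self-contained sketch of that comparison, with unparseable versions filtered out before sorting. The two key functions are stand-ins I am assuming for illustration (plain string ordering for the permissive side, dotted integers for the strict side); they are not pkg_resources.parse_version or packaging.version.Version:

```python
def latest(versions, parse):
    """Return the highest version as a 0- or 1-element list, skipping unparseable ones."""
    parsed = []
    for version in versions:
        try:
            parsed.append((parse(version), version))
        except ValueError:
            continue
    parsed.sort(key=lambda pair: pair[0])
    return [version for _, version in parsed[-1:]]


def legacy_key(version):
    # Stand-in for a parser that accepts anything: plain string ordering.
    return version


def strict_key(version):
    # Stand-in for a strict parser: dotted integers only, ValueError otherwise.
    return tuple(int(part) for part in version.split("."))


def is_differing(versions):
    """True when the two parsers disagree on the latest version."""
    return latest(versions, legacy_key) != latest(versions, strict_key)
```

A project whose versions are all unparseable on the strict side yields an empty list there, so it is counted as differing too.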

Member

ncoghlan commented Jul 4, 2014

The fact that "-main-.-VLazy.object.at.0x1006edf10-" is now a valid version meant I became interested in the list of versions which were invalid, but have become valid after the latest tweaks. I think those should drive the decision on which rules we judge to be too permissive (I think we need to drop either the 1st or 2nd change in your last numbered list)

As far as the "differing latest version goes", we may want to be clearer that the last number in the stats is a superset of the "no compatible version" number - I had previously been adding them together, which isn't correct.

Owner

dstufft commented Jul 4, 2014

I can make a quick list of which versions changed status between original PEP 440 rules and the current rules, moment.

Owner

dstufft commented Jul 4, 2014

Here are the versions which were invalid prior to any of the normalizations that were added today and are now valid with the latest set of normalizations: https://gist.github.com/dstufft/72edb47d9a38474b153e

Member

ncoghlan commented Jul 4, 2014

Looking at those, I think the change to drop is the one that allows "-" to be treated like "." inside a local version. The problem is that "-" is the setuptools escape character, so all sorts of nonsense will appear as "-" in a version number. We see that with the repr() example, where there are several '-' entries.

I'd replace it with a narrower rule, which ignores any leading or trailing '-' character entirely.

Owner

dstufft commented Jul 4, 2014

The ones that stand out to me as ones that we should fail on, or we are likely parsing "wrong":

  • -
  • -VERSION-
  • -main-.-VLazy.object.at.0x1006edf10-
  • -n.0.18.44-
  • .
  • 0.0.0dev-r4238 (The -r4238 will be seen as a local version)
  • 0.0.11-pre-alpha (The -alpha will be seen as a local version)

Looking at them, I think that we should actually drop 7 and 8, which gives us these numbers:

$ invoke check.pep440 --cached
Total Version Compatibility:              239384/241356 (99.18%)
Total Sorting Compatibility (Unfiltered): 44757/45649 (98.05%)
Total Sorting Compatibility (Filtered):   45510/45649 (99.70%)
Projects with No Compatible Versions:     241/45649 (0.53%)
Projects with Differing Latest Version:   372/45649 (0.81%)

And gives us a difference of these versions: https://gist.github.com/dstufft/1ef5fd1690e44b3ce002

The implicit-ness of it bothers me. I don't mind the implicit epoch of 0, because that is the common case, however it isn't common at all for there to be a 1. version or such, and I think that the fact ... and -.- are valid versions is kind of silly.

That doesn't solve the second set of "wrong", but I don't think we can solve that without being strict about the separator for local versions being +.

Owner

dstufft commented Jul 4, 2014

And yes, - is used by setuptools as the escape character, however it's also the "divide two segments of things in the version" character. The official documented way in setuptools to do a "patch release" is 1.0-3.

Owner

dstufft commented Jul 4, 2014

Just for kicks, I went back and made it so we had a strict requirement for + for local version separator. Looking at the compatibility numbers it's clear that this rule is one of our best, if we drop it we drop down to 97.73% compatibility in parsing and 1.33% / 1.94% for no compat / differing.

However, part of me worries that we're muddying the waters in an attempt to get nicer looking numbers. Setuptools does not have the concept of local versions; the closest thing it has is the "post-release tag", which it describes as follows: "Post-release tags are generally used to separate patch numbers, port numbers, build numbers, revision numbers, or date stamps from the release number."

My problem comes from the fact that I'm not sure how I feel about defining this to mean "API compatible with the public version" since setuptools has no such semantics attached to it.

In other words, are we really 99% compatible, or are we just munging data until it looks like we are?

Owner

dstufft commented Jul 4, 2014

So I was interested in how exactly we were parsing those versions compared to the string that they put in originally, and it's here: https://gist.github.com/dstufft/9b1b7d7b705ff9fc297c.

Owner

dstufft commented Jul 4, 2014

Ugh, looking at that list I'm really starting to think the - separator for local versions is a bad idea. In setuptools world 0.6a9.dev-r41475 is the r41475 post release of the in development version of the 9th alpha of 0.6.

We don't have any way to model that in PEP 440; you can't make post releases of a development release. With the - local version rule, that gets parsed to 0.6a9.dev0+r41475, which is "the zeroth development release of the 9th alpha of 0.6 with some API compatible patches applied to it denoted by r41475".

IOW we're making bad guesses about what a version means :(

Member

ncoghlan commented Jul 4, 2014

I think it's that "patch release" notion in setuptools that makes the mapping to local versions defensible. Looking at the examples, I mostly see:

  • incrementing numbers (perhaps with a leading text prefix)
  • version control identifiers (which includes the "-r41475" stuff - that's a subversion commit id, not any kind of post release)
  • dates

The ones that look most dubious could likely be fixed by also allowing '-' as the separator within the release segment. For example, "0-6-0" currently gets normalised as "0+6.0", but would be better interpreted as "0.6.0".

Owner

dstufft commented Jul 4, 2014

Yea I suppose, it feels kind of grody but I don't have a better answer besides not allowing it, and that stinks too.

Member

ncoghlan commented Jul 4, 2014

Forgot to say: +1 for dropping the "implicit 0" rules for the release and local version segments.

Member

ncoghlan commented Jul 4, 2014

The "-" -> "+" rule is one where it might be nice to emit a warning when we're having to rely on it, just as we would for falling back to pkg_resources entirely due to a lack of PEP 440 compatible versions. Although if we're going to emit a warning anyway, then perhaps we should just leave it as "not compatible", and let the pkg_resources fallback deal with it.

Owner

dstufft commented Jul 4, 2014

I think the thing that bothers me the most about it, is that 1.0-r1-r1 will be 1.0.post1+r1. The same characters mean something different depending on where in the version string they are.

Owner

dstufft commented Jul 4, 2014

The same holds true for any of our pre/dev/post release specifiers of course. 1.0-dev-dev is 1.0.dev0+dev. It feels strange to me.

Owner

dstufft commented Jul 4, 2014

Oh, and for the record, I'm against warnings. Warnings are for build/publishing tools that allow you to use an invalid version (e.g. one you'd need to use === to specify). They aren't for the core spec because 99% of the time the people getting those warnings are going to be people installing things who have no control over what the authors of the packages do.

Owner

dstufft commented Jul 4, 2014

I'm gonna go to bed and think about this some more. For what it's worth switching between allowing - and being strict about + is a tiny code change, so I can swap back and forth easily.

Member

ncoghlan commented Jul 4, 2014

I think you've convinced me that including the "-" -> "+" transformation will entrench the kind of weird inconsistencies that PEP 440 is designed to eliminate. Let's leave it out, and instead deal with it by having pip fall back to pkg_resources if it can't find any matching versions. In terms of reporting compatibility numbers, that would mean changing the last two lines of the report to:

Falls back to pkg_resources: X%
Installs a different "latest version": Y%

The second last line would be the same number as the current "Projects with no compatible versions", the last line would be the current "Projects with differing latest versions" minus the projects with no compatible versions.
Falling back to pkg_resources isn't a big deal - over time, that will fix itself as setuptools gets pickier and projects adopt compliant version numbering.
It's the "installs a different latest version" that we need to be wary of, since that may cause inadvertent downgrades.
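
A short sketch of how the two proposed report lines fall out of the existing ones. The function names are hypothetical, and the counts are the 241 and 372 (of 45649 projects) from the most recent report quoted earlier in this thread, used purely as sample inputs rather than as the numbers the stricter rule would actually produce.

```python
def split_report(no_compatible, differing_latest):
    """Derive the two proposed report lines from the existing counts.

    "Differing latest" is a superset of "no compatible versions", so the
    projects that silently get a different version are the remainder.
    """
    falls_back = no_compatible
    silently_different = differing_latest - no_compatible
    return falls_back, silently_different


def pct(count, total):
    """Format a count as a percentage of the total, to two decimals."""
    return f"{100 * count / total:.2f}%"


total_projects = 45649  # sample input taken from the earlier report
falls_back, silently_different = split_report(241, 372)

print("Falls back to pkg_resources:", pct(falls_back, total_projects))
print('Installs a different "latest version":', pct(silently_different, total_projects))
```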

Member

ncoghlan commented Jul 4, 2014

Also, agreed that the warnings should come into play on the setuptools and twine side of things, rather than the pip side.

Owner

dstufft commented Jul 4, 2014

I wonder if we even need to maintain compatibility here by falling back to pkg_resources. We're looking at total compatibility as a whole, but that assumes that each version and each project is equally important. We do weight things somewhat with the last two numbers, however not being able to install the latest version of a library released 10 years ago, and which only works on some old version of Python, is significantly different than not being able to install, say, lxml.

Owner

dstufft commented Jul 4, 2014

IOW, I'm going to look at what those projects actually are, to see if they are projects that we need to worry about.

Owner

dstufft commented Aug 5, 2014

Coming back around to this, what I've got now is:

  1. Allow a preceding literal v character which is simply ignored, normalizes to being omitted.
  2. Allow literal -, ., or _ character as well as no character preceding or following the pre-release, post-release, or dev-release signifier (dev, post, etc).
  3. Allow the additional spellings alpha, beta, rc, pre, and preview, which mean a, b, c, c, and c respectively.
  4. Allow additional spellings of r and rev which mean post.
  5. Allow the local version segment separator (inside the local version, not the preceding +) to be the literal characters ., _, or -, which normalizes to a literal . character.
  6. Ignore leading and trailing whitespace (space, \t, \n, \r, \f, \v). This is stripped before the ignored v character, if one exists.

Notable things which I am not including:

  • Local versions must be designated with a +, the relaxed rules were simply too confusing and too ambiguous.
  • No implicit 0 for missing release segments (e.g. 1. is not 1.0).
  • No implicit 0 for missing local version segments (e.g. 1+1. is not 1+1.0).
  • I experimented with allowing - and _ in addition to . between release segments (e.g. 1-0 is 1.0). However, I decided against it because this was different than how pkg_resources interprets it, and in a way that affects ordering. There are projects which use date based versions like 2014-08-04, however we can't pick those up without also misinterpreting the documented use of the - in pkg_resources.
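
The accepted and rejected rules above can be sketched as a single regular expression plus a normalization step. This is an illustrative approximation under stated assumptions, not the code in pypa/packaging: the names _PATTERN, _PRE, and normalize are hypothetical, epochs are left out for brevity, and rc/pre/preview are mapped to c as described in this comment.

```python
import re

# Hypothetical pattern covering rules 1-5; rule 6 is applied before matching.
_PATTERN = re.compile(
    r"""
    v?                                                    # rule 1: leading v
    (?P<release>[0-9]+(?:\.[0-9]+)*)                      # release, dots only
    (?:[-_.]?(?P<pre_l>alpha|beta|preview|pre|rc|a|b|c)   # rules 2 and 3
       [-_.]?(?P<pre_n>[0-9]+)?)?
    (?:[-_.]?(?P<post_l>post|rev|r)                       # rules 2 and 4
       [-_.]?(?P<post_n>[0-9]+)?)?
    (?:[-_.]?(?P<dev_l>dev)[-_.]?(?P<dev_n>[0-9]+)?)?     # rule 2
    (?:\+(?P<local>[a-z0-9]+(?:[-_.][a-z0-9]+)*))?        # rule 5, plus sign required
    """,
    re.VERBOSE,
)

# Alternate pre-release spellings and the letter each one means.
_PRE = {"alpha": "a", "beta": "b", "rc": "c", "pre": "c", "preview": "c"}


def _num(text):
    """Normalize an optional number; a missing number means 0."""
    return str(int(text)) if text else "0"


def normalize(version):
    """Return the normalized version string, or None if it is rejected."""
    text = version.strip(" \t\n\r\f\v").lower()  # rule 6 runs first
    match = _PATTERN.fullmatch(text)
    if match is None:
        return None
    parts = [match.group("release")]
    if match.group("pre_l"):
        letter = _PRE.get(match.group("pre_l"), match.group("pre_l"))
        parts.append(letter + _num(match.group("pre_n")))
    if match.group("post_l"):
        parts.append(".post" + _num(match.group("post_n")))
    if match.group("dev_l"):
        parts.append(".dev" + _num(match.group("dev_n")))
    if match.group("local"):
        parts.append("+" + re.sub(r"[-_.]", ".", match.group("local")))
    return "".join(parts)


for candidate in ["v1.0", " 1.0-alpha_2\n", "1.0rev3", "1.0+ubuntu-1_2",
                  "1.", "1-0", "1.0-debian3"]:
    print(repr(candidate), "->", normalize(candidate))
```

In this sketch the last three candidates are rejected, matching the exclusions listed above: no implicit 0 after a trailing dot, no - between release segments, and no local version without a +.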

This brings us to this:

$ invoke check.pep440 --cached
Total Version Compatibility:              244249/250042 (97.68%)
Total Sorting Compatibility (Unfiltered): 45081/47058 (95.80%)
Total Sorting Compatibility (Filtered):   47002/47058 (99.88%)
Projects with No Compatible Versions:     622/47058 (1.32%)
Projects with Differing Latest Version:   904/47058 (1.92%)

My method for determining which things to accept and which not to accept is fairly simple.

  1. Does it introduce likely ambiguity? (Likely being a keyword here, of course maybe someone uses some scheme where v1.0 is different than 1.0 but it's not likely).
  2. Does it break compatibility with pkg_resources, particularly in how it is ordered?
  3. Does it make versions which are unreasonably hard to read as a human being?

Of these, the first rule is the strictest one, with the second one being a close followup. The third rule is basically the "well including this rule isn't very helpful, but is it harmful?" differentiation.

Member

ncoghlan commented Aug 5, 2014

Assuming you meant 'c' rather than 'C' those rules sound reasonable to me. Your guidelines and a reference to this issue will likely cover most of the "rationale" section of the PEP update :)

So we have 622 projects that become uninstallable, and 282 where the version installed changes. A quick scan of both lists would likely be useful (especially the second one, since that's a silent change when folks aren't using version pinning)

Owner

dstufft commented Aug 5, 2014

Oh, did I type C somewhere? Yea I meant c.

Owner

dstufft commented Aug 5, 2014

For the record, I'm updating pypa/packaging at the moment to match these rules in all locations, as well as finish implementing this PR and such. Once that's done I'll consider this idea validated and update the PEP itself. Then I think we can call the PR ready for distutils-sig and hopefully we can get it accepted.

Owner

dstufft commented Aug 5, 2014

Here is the list of projects and versions with no compatible versions: https://gist.github.com/dstufft/5c291d6ae03eaa67e96d
Here is the list of projects and versions which have a differing latest version: https://gist.github.com/dstufft/6092a3f8c1ebfb3d246a

Quick reminder that differing latest version includes by definition no compatible version projects (since being able to parse something is different than not being able to parse something) and is represented by a null in the pep440 key.

Owner

dstufft commented Aug 5, 2014

Just to possibly make things a little bit easier, here's a list of projects which have a differing latest version and which does not include the stuff where there are no valid PEP 440 versions. This represents the 282 projects which will get a different latest version silently.

Mostly this looks like things where the latest version of a project doesn't successfully parse with PEP 440 but older versions do successfully parse. However I do notice at least one case where the PEP 440 version appears to be more correct, e.g. Django-base which pkg_resources says the latest version is 0.91 however PEP 440 says the latest version is v1.1. So it's probably important to point out that some of these differences are where we got better not just different. See: https://gist.github.com/dstufft/412717a3435342ad125f

I also see other ones like MangoEngine where pkg_resources says the latest version is 0.1-rc.1 however we say the latest version is v1.0.0-rc.2.

Another one is bda.recipe.deployment where pkg_resources says 2.0beta and PEP 440 says 2.0b8.

Owner

dstufft commented Aug 5, 2014

It looks like all the tests are passing on this now. This can't be merged as is or anything like that, since PEP 440 needs to be accepted first and it's likely we want to get this into setuptools as well (maybe even prior to getting it into pip).

Member

ncoghlan commented Aug 5, 2014

Yep, I think this one looks like a winner. It would be nice to support the "YYYY-MM-DD" date based releases along with the "-N" patch level notation, but I agree that lets too much nonsense through and makes things overly confusing.

Member

ncoghlan commented Aug 5, 2014

And yes, it would be good to get setuptools applying the normalisation (and complaining about incompatible versions) before we publish a corresponding version of pip. It would also give us an easier way to advise owners of incompatible packages to update their version numbers.

Owner

dstufft commented Aug 5, 2014

Ok I lied, I have one more possible thing.

Technically pkg_resources supports -<any alpha string>, and this represents a patch level release which comes after the same version without that -<any alpha string>. We have two constructs which sort after a version: post releases and local versions. We attempted to use this for local versions, however we were not successful because of the ambiguity it creates.

What we did not try is normalizing -<whatever> into a post release. This is actually a more accurate translation of the meaning of the -<whatever> syntax in pkg_resources. The problem is that while pkg_resources supports anything after the - character, our post releases can only contain numbers. However, we could simply limit support for this to -<numerals>.

This should not be ambiguous if we only allow the - character (and perhaps _) and not the . character. If we included the . then we couldn't tell it apart from another digit on the release segment. Both pre-releases and dev releases will still require some additional characters in order to be specified so this shouldn't be ambiguous with them, and local versions use the + signifier so it shouldn't be ambiguous with that either. This would mean that 1.0-mypatch1 is considered invalid but 1.0-1 is valid and is normalized to 1.0.post1.
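
A minimal sketch of just this rule, assuming a hypothetical helper named normalize_implicit_post that only handles the bare release-N form (it returns None for anything else, including plain releases). It is meant to show why restricting the suffix to numerals keeps the rule unambiguous, not to reproduce the full parser.

```python
import re

# A dotted release followed by exactly one "-<numerals>" suffix.
_IMPLICIT_POST = re.compile(r"(?P<release>[0-9]+(?:\.[0-9]+)*)-(?P<n>[0-9]+)")


def normalize_implicit_post(version):
    """Map "X.Y-N" to "X.Y.postN"; return None for any other form."""
    match = _IMPLICIT_POST.fullmatch(version)
    if match is None:
        return None
    return f"{match.group('release')}.post{int(match.group('n'))}"


for candidate in ["1.0-1", "1.0-mypatch1", "1.0_1", "2014-08-04"]:
    print(candidate, "->", normalize_implicit_post(candidate))
```

Only the first candidate is accepted here. A date such as 2014-08-04 is still rejected, because only a single trailing -N is allowed and - is not accepted between release segments.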

A quick look at what this does on PyPI shows that it brings our numbers to:

$ invoke check.pep440 --cached
Total Version Compatibility:              245340/250042 (98.12%)
Total Sorting Compatibility (Unfiltered): 45330/47058 (96.33%)
Total Sorting Compatibility (Filtered):   46936/47058 (99.74%)
Projects with No Compatible Versions:     499/47058 (1.06%)
Projects with Differing Latest Version:   709/47058 (1.51%)

This makes an additional 123 projects installable which couldn't be installed previously, and reduces the number of projects which can be installed but whose latest version is silently different from 282 to 210. It also covers the last remaining style of version from pkg_resources that we were not compatible with, and that we can support without re-introducing ambiguity.

Owner

dstufft commented Aug 5, 2014

I checked the difference between allowing only -N and allowing either -N or _N and the only difference was we went from 709 to 708 projects in "Projects with Differing Latest Version". I'm going to declare that we only support - for that field unless we think that it makes sense to support _ for symmetry with the other locations where we support -, _, and ..

Member

ncoghlan commented Aug 6, 2014

Allowing a trailing "-N" by normalising it to ".postN" sounds good to me. I think that change will also greatly increase the odds of the new answer being better than the pkg_resources answer when they're different.

@dstufft dstufft changed the title from Proof of Concept: PEP 440 Version and Specifiers to PEP 440 Version and Specifiers Dec 13, 2014

@dstufft dstufft added a commit that referenced this pull request Dec 13, 2014

@dstufft dstufft Merge pull request #1894 from dstufft/use-packaging
PEP 440 Version and Specifiers
bff1145

@dstufft dstufft merged commit bff1145 into pypa:develop Dec 13, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed

@dstufft dstufft deleted the dstufft:use-packaging branch Dec 13, 2014