Improve version comparison for non released versions #7980

lesteve · 2016-12-05T09:45:49Z

Moved from #7902.

Our code comparing versions (for example in sklearn.utils.fixes which implement backports of older numpy, scipy, etc... functionalities) doesn't deal very well with non-released versions:

`sklearn.utils.fixes._parse_version`

parses a version string into a tuple and comparing the tuple afterwards. This will either give the wrong answer or raise an exception:

from sklearn.utils.fixes import _parse_version

_parse_version('1.12.0b1') < (1, 12)  # False although 1.12.0b1 is before 1.12

_parse_version('1.12b1') < (1, 12, 0)  # Exception trying to compare str and int in Python 3

`distutils.version.LooseVersion`

doesn't deal correctly with beta version either:

from distutils.version import LooseVersion
LooseVersion('1.12b1') < LooseVersion('1.12')  # False

`pkg_resources.parse_version`

does deal with beta versions:

from pkg_resources import parse_version

parse_version('1.12b1') < parse_version('1.12')  # True

Making scikit-learn depend on setuptools is controversial so porting a reasonable implementation in sklearn.utils was deemed the best option in #7902 (comment).

The text was updated successfully, but these errors were encountered:

amueller · 2016-12-06T00:14:48Z

see https://github.com/pypa/setuptools/blob/master/pkg_resources/__init__.py#L146 ?

lesteve · 2016-12-08T09:52:43Z

see https://github.com/pypa/setuptools/blob/master/pkg_resources/__init__.py#L146 ?

This is the deprecated version comparison which does not support PEP0440 (for example the regex does not support beta it seems). The new one is there:
https://github.com/pypa/setuptools/blob/master/pkg_resources/__init__.py#L146 ?

amueller · 2016-12-08T16:57:33Z

@lesteve that is the same link...

lesteve · 2016-12-08T18:12:37Z

@lesteve that is the same link...

Oops copy and paste gone wrong https://github.com/pypa/packaging/blob/master/packaging/version.py#L159.

dalmia · 2016-12-17T08:16:00Z

I want to work on the issue. Could you please help me get started on it?
Thanks.

lesteve · 2016-12-18T12:23:15Z

The idea is to take the code we need from https://github.com/pypa/packaging/blob/master/packaging/version.py (and possibly simplify it, we can probably get rid of legacy versions and classes that implement __lt__) in order to have parse_version in sklearn.utils. Then we would use it across the whole scikit-learn code base replacing sklearn.utils.fixes._parse_version and LooseVersion usages.

cmarmo · 2020-06-17T12:05:01Z

@lesteve, is this issue still relevant? Also, is the related PR a reusable solution for someone willing to help on this? Thanks for your help!

lesteve · 2020-06-17T12:59:06Z

I think it is still somewhat relevant but the cost/benefit ratio seems to be in favour of a "won't fix":

not that easy to fix without some significant effort
the benefit does not seem worth it since the problem only happens when using beta or development versions of dependencies (numpy, scipy, etc ...).
no that many issues linked for the past 2 years so probably not often encountered in practice

The simplest thing to do for the PR writer seems to make scikit-learn depend on setuptools and use pkg_resources.parse_version but it may be not that easy to convince maintainers ...

The #8301 PR went the consensual way (at the time) to copy the necessary code from setuptools into sklearn.utils.fixes. I would think that this approach stand very little chance to having a PR merged in the near future.

rth · 2020-06-18T17:32:23Z

is this issue still relevant?

yes, it is. Just run into it today. And after digging through CPython issues https://bugs.python.org/issue14894 it doesn't look like distutils.version.LooseVersion is ever going to be fixed, so we shouldn't use it.

The simplest thing to do for the PR writer seems to make scikit-learn depend on setuptools

Yeah, setuptools is already a build dependency. I'm also undecided between making it a runtime dependency or vendoring the part from pkg_utils/packaging under sklearn.externals. Neither is ideal.

Went with the first solution in scikit-learn-contrib/scikit-learn-extra#42 but has a much smaller user base, so an extra dependency doesn't matter.

Actually what we could do is to make add a wrapper that would use pkg_utils / packaging if it is installed and fallback to LooseVersion otherwise. That way at least people who run into this (who tend to be more advanced users) can fix it by installing the corresponding package (if it isn't already: I would estimate that setuptools/pkg_utils is installed in 95% of cases already). Otherwise right now there is nothing one can do to fix it.

lesteve · 2020-06-19T03:51:24Z

Actually what we could do is to make add a wrapper that would use pkg_utils / packaging if it is installed and fallback to LooseVersion otherwise. That way at least people who run into this (who tend to be more advanced users) can fix it by installing the corresponding package (if it isn't already: I would estimate that setuptools/pkg_utils is installed in 95% of cases already).

I did not think of this and this sounds a very reasonable approach indeed!

So something like this somewhere in sklearn/utils could be a start:

try:
    from pkg_resources import parse_version
except ImportError:
    parse_version = LooseVersion

After that you would need to replace all the LooseVersion and other ways we compare versions in scikit-learn (see my old comment #8301 (comment) which may still be relevant).

A bit tedious, but maybe this is a reasonable one for the sprint after all!

rth · 2020-06-19T06:55:49Z

Yes, I think we could just adapt #8301 I imagine most of the replacements there are still relevant.

ogrisel · 2020-06-23T06:48:52Z

I have tried to create a new conda env with conda create -y -n tmp python=3.6 (oldest supported version as we speak) and pkg_resources is installed.

I have also tried to install a python3.x-minimal on ubuntu and again from pkg_resources import parse_version works as well.

Same in docker run --rm -ti python:3.6 bash.

So I am pretty sure that at least 99% of our our users will have the pkg_resources package installed by default. It better to keep the above try / except pattern for safety though.

hs-nazuna · 2020-06-23T09:40:28Z

I want to tackle this issue.

The approach is to add code like the following (mentioned by @lesteve):

try:
    from pkg_resources import parse_version
except ImportError:
    parse_version = LooseVersion

and fix lines using version comparison function.

Is it good?

ogrisel · 2020-06-23T09:48:13Z

@hs-nazuna yes!

rth · 2020-06-23T10:13:32Z

@hs-nazuna The idea is to define a parse_version function in sklearn.utils with those contents. Then continue PR #8301 where after conflicts are resolved most changed aside from the function definition should still be correct.

hs-nazuna · 2020-06-23T10:23:47Z

@rth Thank you for your comment. I will do so.

lesteve · 2020-06-23T12:26:38Z

@hs-nazuna hard to tell which strategy is the best without actually doing it, but if I were you, I would create a PR from scratch since #8301 has a lot of conflicts and it may be a bit painful to fix all of them.

If you feel more optimistic than me, the alternative is to try to rebase #8301 on master and try to fix conflicts for 15 (maybe 30) minutes. If you don't succeed in this much time, start a PR from scratch.

rth · 2020-06-23T12:34:49Z

to rebase #8301 on master and try to fix conflicts for 15 (maybe 30) minutes

I would say more like 10. It's much better to merge upstream/master in to sync, rather than rebase as that leads to fewer conflicts that needs to be resolved. With rebase one would need to fix conflicts for each commit in that PR..

hs-nazuna · 2020-06-23T12:35:52Z

but if I were you, I would create a PR from scratch since #8301 has a lot of conflicts and it may be a bit painful to fix all of them.

I think so too. And now I am trying to create a new PR.

lesteve added Enhancement Need Contributor labels Dec 5, 2016

amueller mentioned this issue Dec 8, 2016

Numpy import error #7999

Closed

amueller mentioned this issue Dec 12, 2016

Version Error with Numpy when version contains string "<int>, <int>, <string>" #8034

Closed

dalmia mentioned this issue Feb 6, 2017

[MRG] added parse_version to sklearn.utils #8301

Closed

lesteve removed the Need Contributor label Feb 15, 2017

lesteve mentioned this issue Mar 15, 2017

[Filing bugs] Error：scikit-learn/sklearn/utils/fixes.py #7898

Closed

lesteve mentioned this issue Jul 26, 2017

[MRG] Make all version comparison use LooseVersion #9452

Closed

cmarmo added the help wanted label Jun 13, 2020

cmarmo removed the help wanted label Jun 17, 2020

rth mentioned this issue Jun 18, 2020

New feature RobustWeightedEstimator : code + examples scikit-learn-contrib/scikit-learn-extra#42

Merged

cmarmo added the help wanted label Jun 19, 2020

hs-nazuna mentioned this issue Jun 23, 2020

MNT define a parse_version function in sklearn.utils.fixes #17670

Merged

cmarmo removed the help wanted label Jun 23, 2020

rth closed this as completed in #17670 Jun 24, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve version comparison for non released versions #7980

Improve version comparison for non released versions #7980

lesteve commented Dec 5, 2016 •

edited

amueller commented Dec 6, 2016

lesteve commented Dec 8, 2016 •

edited

amueller commented Dec 8, 2016

lesteve commented Dec 8, 2016

dalmia commented Dec 17, 2016

lesteve commented Dec 18, 2016 •

edited

cmarmo commented Jun 17, 2020

lesteve commented Jun 17, 2020

rth commented Jun 18, 2020

lesteve commented Jun 19, 2020 •

edited by ogrisel

rth commented Jun 19, 2020

ogrisel commented Jun 23, 2020

hs-nazuna commented Jun 23, 2020

ogrisel commented Jun 23, 2020

rth commented Jun 23, 2020

hs-nazuna commented Jun 23, 2020

lesteve commented Jun 23, 2020

rth commented Jun 23, 2020 •

edited

hs-nazuna commented Jun 23, 2020

Improve version comparison for non released versions #7980

Improve version comparison for non released versions #7980

Comments

lesteve commented Dec 5, 2016 • edited

sklearn.utils.fixes._parse_version

distutils.version.LooseVersion

pkg_resources.parse_version

amueller commented Dec 6, 2016

lesteve commented Dec 8, 2016 • edited

amueller commented Dec 8, 2016

lesteve commented Dec 8, 2016

dalmia commented Dec 17, 2016

lesteve commented Dec 18, 2016 • edited

cmarmo commented Jun 17, 2020

lesteve commented Jun 17, 2020

rth commented Jun 18, 2020

lesteve commented Jun 19, 2020 • edited by ogrisel

rth commented Jun 19, 2020

ogrisel commented Jun 23, 2020

hs-nazuna commented Jun 23, 2020

ogrisel commented Jun 23, 2020

rth commented Jun 23, 2020

hs-nazuna commented Jun 23, 2020

lesteve commented Jun 23, 2020

rth commented Jun 23, 2020 • edited

hs-nazuna commented Jun 23, 2020

lesteve commented Dec 5, 2016 •

edited

`sklearn.utils.fixes._parse_version`

`distutils.version.LooseVersion`

`pkg_resources.parse_version`

lesteve commented Dec 8, 2016 •

edited

lesteve commented Dec 18, 2016 •

edited

lesteve commented Jun 19, 2020 •

edited by ogrisel

rth commented Jun 23, 2020 •

edited