Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve version comparison for non released versions #7980

Closed
lesteve opened this issue Dec 5, 2016 · 19 comments · Fixed by #17670
Closed

Improve version comparison for non released versions #7980

lesteve opened this issue Dec 5, 2016 · 19 comments · Fixed by #17670

Comments

@lesteve
Copy link
Member

lesteve commented Dec 5, 2016

Moved from #7902.

Our code comparing versions (for example in sklearn.utils.fixes which implement backports of older numpy, scipy, etc... functionalities) doesn't deal very well with non-released versions:

sklearn.utils.fixes._parse_version

parses a version string into a tuple and comparing the tuple afterwards. This will either give the wrong answer or raise an exception:

from sklearn.utils.fixes import _parse_version

_parse_version('1.12.0b1') < (1, 12)  # False although 1.12.0b1 is before 1.12

_parse_version('1.12b1') < (1, 12, 0)  # Exception trying to compare str and int in Python 3

distutils.version.LooseVersion

doesn't deal correctly with beta version either:

from distutils.version import LooseVersion
LooseVersion('1.12b1') < LooseVersion('1.12')  # False

pkg_resources.parse_version

does deal with beta versions:

from pkg_resources import parse_version

parse_version('1.12b1') < parse_version('1.12')  # True

Making scikit-learn depend on setuptools is controversial so porting a reasonable implementation in sklearn.utils was deemed the best option in #7902 (comment).

@amueller
Copy link
Member

amueller commented Dec 6, 2016

@lesteve
Copy link
Member Author

lesteve commented Dec 8, 2016

see https://github.com/pypa/setuptools/blob/master/pkg_resources/__init__.py#L146 ?

This is the deprecated version comparison which does not support PEP0440 (for example the regex does not support beta it seems). The new one is there:
https://github.com/pypa/setuptools/blob/master/pkg_resources/__init__.py#L146 ?

@amueller
Copy link
Member

amueller commented Dec 8, 2016

@lesteve that is the same link...

@lesteve
Copy link
Member Author

lesteve commented Dec 8, 2016

@lesteve that is the same link...

Oops copy and paste gone wrong https://github.com/pypa/packaging/blob/master/packaging/version.py#L159.

@dalmia
Copy link
Contributor

dalmia commented Dec 17, 2016

I want to work on the issue. Could you please help me get started on it?
Thanks.

@lesteve
Copy link
Member Author

lesteve commented Dec 18, 2016

The idea is to take the code we need from https://github.com/pypa/packaging/blob/master/packaging/version.py (and possibly simplify it, we can probably get rid of legacy versions and classes that implement __lt__) in order to have parse_version in sklearn.utils. Then we would use it across the whole scikit-learn code base replacing sklearn.utils.fixes._parse_version and LooseVersion usages.

@cmarmo
Copy link
Member

cmarmo commented Jun 17, 2020

@lesteve, is this issue still relevant? Also, is the related PR a reusable solution for someone willing to help on this? Thanks for your help!

@lesteve
Copy link
Member Author

lesteve commented Jun 17, 2020

I think it is still somewhat relevant but the cost/benefit ratio seems to be in favour of a "won't fix":

  • not that easy to fix without some significant effort
  • the benefit does not seem worth it since the problem only happens when using beta or development versions of dependencies (numpy, scipy, etc ...).
  • no that many issues linked for the past 2 years so probably not often encountered in practice

The simplest thing to do for the PR writer seems to make scikit-learn depend on setuptools and use pkg_resources.parse_version but it may be not that easy to convince maintainers ...

The #8301 PR went the consensual way (at the time) to copy the necessary code from setuptools into sklearn.utils.fixes. I would think that this approach stand very little chance to having a PR merged in the near future.

@rth
Copy link
Member

rth commented Jun 18, 2020

is this issue still relevant?

yes, it is. Just run into it today. And after digging through CPython issues https://bugs.python.org/issue14894 it doesn't look like distutils.version.LooseVersion is ever going to be fixed, so we shouldn't use it.

The simplest thing to do for the PR writer seems to make scikit-learn depend on setuptools

Yeah, setuptools is already a build dependency. I'm also undecided between making it a runtime dependency or vendoring the part from pkg_utils/packaging under sklearn.externals. Neither is ideal.

Went with the first solution in scikit-learn-contrib/scikit-learn-extra#42 but has a much smaller user base, so an extra dependency doesn't matter.

Actually what we could do is to make add a wrapper that would use pkg_utils / packaging if it is installed and fallback to LooseVersion otherwise. That way at least people who run into this (who tend to be more advanced users) can fix it by installing the corresponding package (if it isn't already: I would estimate that setuptools/pkg_utils is installed in 95% of cases already). Otherwise right now there is nothing one can do to fix it.

@lesteve
Copy link
Member Author

lesteve commented Jun 19, 2020

Actually what we could do is to make add a wrapper that would use pkg_utils / packaging if it is installed and fallback to LooseVersion otherwise. That way at least people who run into this (who tend to be more advanced users) can fix it by installing the corresponding package (if it isn't already: I would estimate that setuptools/pkg_utils is installed in 95% of cases already).

I did not think of this and this sounds a very reasonable approach indeed!

So something like this somewhere in sklearn/utils could be a start:

try:
    from pkg_resources import parse_version
except ImportError:
    parse_version = LooseVersion

After that you would need to replace all the LooseVersion and other ways we compare versions in scikit-learn (see my old comment #8301 (comment) which may still be relevant).

A bit tedious, but maybe this is a reasonable one for the sprint after all!

@rth
Copy link
Member

rth commented Jun 19, 2020

Yes, I think we could just adapt #8301 I imagine most of the replacements there are still relevant.

@ogrisel
Copy link
Member

ogrisel commented Jun 23, 2020

I have tried to create a new conda env with conda create -y -n tmp python=3.6 (oldest supported version as we speak) and pkg_resources is installed.

I have also tried to install a python3.x-minimal on ubuntu and again from pkg_resources import parse_version works as well.

Same in docker run --rm -ti python:3.6 bash.

So I am pretty sure that at least 99% of our our users will have the pkg_resources package installed by default. It better to keep the above try / except pattern for safety though.

@hs-nazuna
Copy link
Contributor

I want to tackle this issue.

The approach is to add code like the following (mentioned by @lesteve):

try:
    from pkg_resources import parse_version
except ImportError:
    parse_version = LooseVersion

and fix lines using version comparison function.

Is it good?

@ogrisel
Copy link
Member

ogrisel commented Jun 23, 2020

@hs-nazuna yes!

@rth
Copy link
Member

rth commented Jun 23, 2020

@hs-nazuna The idea is to define a parse_version function in sklearn.utils with those contents. Then continue PR #8301 where after conflicts are resolved most changed aside from the function definition should still be correct.

@hs-nazuna
Copy link
Contributor

@rth Thank you for your comment. I will do so.

@lesteve
Copy link
Member Author

lesteve commented Jun 23, 2020

@hs-nazuna hard to tell which strategy is the best without actually doing it, but if I were you, I would create a PR from scratch since #8301 has a lot of conflicts and it may be a bit painful to fix all of them.

If you feel more optimistic than me, the alternative is to try to rebase #8301 on master and try to fix conflicts for 15 (maybe 30) minutes. If you don't succeed in this much time, start a PR from scratch.

@rth
Copy link
Member

rth commented Jun 23, 2020

to rebase #8301 on master and try to fix conflicts for 15 (maybe 30) minutes

I would say more like 10. It's much better to merge upstream/master in to sync, rather than rebase as that leads to fewer conflicts that needs to be resolved. With rebase one would need to fix conflicts for each commit in that PR..

@hs-nazuna
Copy link
Contributor

but if I were you, I would create a PR from scratch since #8301 has a lot of conflicts and it may be a bit painful to fix all of them.

I think so too. And now I am trying to create a new PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment