Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+2] Add sklearn show_versions() method #11596

Merged
merged 31 commits into from Jul 31, 2018

Conversation

Projects
None yet
9 participants
@aboucaud
Copy link
Contributor

aboucaud commented Jul 17, 2018

PR that adds a show_versions() method convenient for debugging.

Reference Issues/PRs

Fixes #11522

What does this implement/fix? Explain your changes.

Fork of the pandas.show_versions().
It displays optionally the system BLAS information.

@rth
Copy link
Member

rth left a comment

Nice! Could you please provide an example of output (e.g. on your system)?

Also let's add a test that simply runs this. You can use the capsys fixture to capture output (see https://docs.pytest.org/en/2.8.7/capture.html#accessing-captured-output-from-a-test-function) and check that e.g. "numpy" key word is in the output.

("Cython", lambda mod: mod.__version__),
("pandas", lambda mod: mod.__version__),
("matplotlib", lambda mod: mod.__version__),
# ("sphinx", lambda mod: mod.__version__),

This comment has been minimized.

Copy link
@rth

rth Jul 17, 2018

Member

Let's remove all commented dependencies below.

@@ -0,0 +1,163 @@
"""
Utility methods to print system info for debugging

This comment has been minimized.

Copy link
@rth

rth Jul 17, 2018

Member

It might be better to make this module private _print_versions

@lesteve

This comment has been minimized.

Copy link
Member

lesteve commented Jul 17, 2018

You have a conflict in __init__.py it looks like.

@aboucaud

This comment has been minimized.

Copy link
Contributor Author

aboucaud commented Jul 17, 2018

Could you please provide an example on output (e.g. on your system)?

You mean here or in a docstring ?

What it looks like on my computer
System info
-----------
commit: 2d7567f9f7f89ece9b8f540d7b67fadb325ca9c1
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8

BLAS info
---------
include_dirs: ['/usr/local/include']
language: c
define_macros: [('HAVE_CBLAS', None), ('ATLAS_INFO', '"\\"3.10.1\\""')]
library_dirs: ['/usr/local/lib']
CBLAS libs: ['ptf77blas', 'ptcblas', 'atlas', 'ptf77blas', 'ptcblas']

Python libs info
----------------
pip: 10.0.1
setuptools: 40.0.0
numpy: 1.14.5
scipy: 1.1.0
Cython: 0.28.4
pandas: 0.23.3
matplotlib: None

You can use the capsys fixture to capture output

Thanks @rth, I'll have a look at it.

@aboucaud aboucaud force-pushed the aboucaud:show-versions branch from dd71a52 to 7a7d831 Jul 17, 2018

@aboucaud

This comment has been minimized.

Copy link
Contributor Author

aboucaud commented Jul 17, 2018

You have a conflict in init.py it looks like.

That's because you're merging on master too fast for me :)
Solved !

@rth

This comment has been minimized.

Copy link
Member

rth commented Jul 17, 2018

Here, just so that reviewers could see the result without running the code.

@aboucaud

This comment has been minimized.

Copy link
Contributor Author

aboucaud commented Jul 17, 2018

See above in the clickable part of the comment.

aboucaud added some commits Jul 17, 2018

@rth
Copy link
Member

rth left a comment

Thanks @aboucaud ! There are a few flake8 failures.

I'll add a separate comment about formatting below.

return dict(deps_blob)


def show_versions(as_json=False, with_blas=False):

This comment has been minimized.

Copy link
@rth

rth Jul 17, 2018

Member

We need a docstring here explaining what this function does.

Let's set with_blas=True by default. Actually I would even remove the with_blas parameter altogether, and always show this information. I can't see a use case where we wouldn't care about it.

This comment has been minimized.

Copy link
@lesteve

lesteve Jul 17, 2018

Member

+1 for always putting the blas info

except:
blas_info = {'error': "Could not retrieve BLAS information"}

if (as_json):

This comment has been minimized.

Copy link
@rth

rth Jul 17, 2018

Member

no need for parenthesis.

This branch of the code is not covered, we should add a test for it.

This comment has been minimized.

Copy link
@lesteve

lesteve Jul 17, 2018

Member

And remove the parenthesis, this is not C ;-)

This comment has been minimized.

Copy link
@aboucaud

aboucaud Jul 17, 2018

Author Contributor

tell that to pandas devs.. I know you have one in the room :-)

print(j)
else:
with codecs.open(as_json, "wb", encoding='utf8') as f:
json.dump(j, f, indent=2)

This comment has been minimized.

Copy link
@rth

rth Jul 17, 2018

Member

I'm not convinced we actually need this. Maybe remove the as_json parameter altogether. WDYT @lesteve ?

This comment has been minimized.

Copy link
@lesteve

lesteve Jul 17, 2018

Member

I kind of agree. This is useful for bug reports, I would not think of it as a programmatic way of accessing this information.

This comment has been minimized.

Copy link
@aboucaud

aboucaud Jul 17, 2018

Author Contributor

ok, I'll just completely clean the code

cblas_libs, blas_info = get_blas_info()
blas_info['cblas_libs'] = cblas_libs
except:
blas_info = {'error': "Could not retrieve BLAS information"}

This comment has been minimized.

Copy link
@rth

rth Jul 17, 2018

Member

'error' -> 'message'


# get full commit hash
commit = None
if os.path.isdir(".git") and os.path.isdir("sklearn"):

This comment has been minimized.

Copy link
@rth

rth Jul 17, 2018

Member

This code branch is currently not tested. I'm not sure we actually want it. It would be nice, not sure we should bother with it. @lesteve ?

This comment has been minimized.

Copy link
@lesteve

lesteve Jul 17, 2018

Member

I don't think it is really useful to include the git commit info. In my mind this is useful for users that come with a bug report. In 99.9% of the use cases they are using a released version.

In the case they are not using a released version, then the person can very easily provide this info.

This comment has been minimized.

Copy link
@aboucaud

aboucaud Jul 17, 2018

Author Contributor

ok it will remove complexity altogether

@@ -21,6 +21,7 @@
from ..utils.fixes import _Sequence as Sequence
from .deprecation import deprecated
from .. import get_config
from ._print_versions import show_versions

This comment has been minimized.

Copy link
@rth

rth Jul 17, 2018

Member

I think we wanted to import this in sklearn/__init__.py instead?

This comment has been minimized.

Copy link
@lesteve
@rth

This comment has been minimized.

Copy link
Member

rth commented Jul 17, 2018

System info
commit: 2d7567f
python: 3.6.5.final.0
python-bits: 64

I would just use sys.version here similarly to what is suggested in http://scikit-learn.org/stable/developers/contributing.html#filing-bugs

>>> import sys; print("Python", sys.version)
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56) 
[GCC 7.2.0]

It also gives information about GCC & whether conda is used.

OS: Darwin

similarly, replace this,

OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little

with the output of,

>>> python -c "import platform; print(platform.platform())"
Linux-4.11.6-gentoo-x86_64-Intel-R-_Core-TM-_i5-6200U_CPU_@_2.30GHz-with-gentoo-2.4.1

LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8

I would remove this as non essential.

BLAS info
include_dirs: ['/usr/local/include']
language: c
define_macros: [('HAVE_CBLAS', None), ('ATLAS_INFO', '"\"3.10.1\""')]
library_dirs: ['/usr/local/lib']
CBLAS libs: ['ptf77blas', 'ptcblas', 'atlas', 'ptf77blas', 'ptcblas']

I would remove the "language" and "include_dirs" keys. Then for "library_dirs" run ":".join on the list CBLAS libs run, ", ".join on the list. Finally make define macros print,

define_macros: HAVE_CBLAS=None, ATLAS_INFO="\\"3.10.1\\""

(or something similar without all the tuples/lists).

Python libs info
pip: 10.0.1
setuptools: 40.0.0
numpy: 1.14.5
scipy: 1.1.0
Cython: 0.28.4
pandas: 0.23.3
matplotlib: None

I don't think we care about matplotlib in most cases.

@lesteve might disagree with some of these comments.

@lesteve
Copy link
Member

lesteve left a comment

A few comments.


# get full commit hash
commit = None
if os.path.isdir(".git") and os.path.isdir("sklearn"):

This comment has been minimized.

Copy link
@lesteve

lesteve Jul 17, 2018

Member

I don't think it is really useful to include the git commit info. In my mind this is useful for users that come with a bug report. In 99.9% of the use cases they are using a released version.

In the case they are not using a released version, then the person can very easily provide this info.

return dict(deps_blob)


def show_versions(as_json=False, with_blas=False):

This comment has been minimized.

Copy link
@lesteve

lesteve Jul 17, 2018

Member

+1 for always putting the blas info

except:
blas_info = {'error': "Could not retrieve BLAS information"}

if (as_json):

This comment has been minimized.

Copy link
@lesteve

lesteve Jul 17, 2018

Member

And remove the parenthesis, this is not C ;-)

print(j)
else:
with codecs.open(as_json, "wb", encoding='utf8') as f:
json.dump(j, f, indent=2)

This comment has been minimized.

Copy link
@lesteve

lesteve Jul 17, 2018

Member

I kind of agree. This is useful for bug reports, I would not think of it as a programmatic way of accessing this information.

@@ -21,6 +21,7 @@
from ..utils.fixes import _Sequence as Sequence
from .deprecation import deprecated
from .. import get_config
from ._print_versions import show_versions

This comment has been minimized.

Copy link
@lesteve
(sysname, nodename, release,
version, machine, processor) = platform.uname()
blob.extend([
("python", '.'.join(map(str, sys.version_info))),

This comment has been minimized.

Copy link
@lesteve

lesteve Jul 17, 2018

Member

I don't remember whether I commented already, but maybe sys.executable would be nice? This way we know whether it is a system/user/conda install. WDYT?

This comment has been minimized.

Copy link
@aboucaud

aboucaud Jul 17, 2018

Author Contributor

@rth mentioned sys.version for the same thing.

Which one is the best ? I don't have an opinion on this.

@aboucaud

This comment has been minimized.

Copy link
Contributor Author

aboucaud commented Jul 17, 2018

thx for the useful comments @rth and @lesteve, I'll work on that tomorrow.

("scipy", lambda mod: mod.version.version),
("Cython", lambda mod: mod.__version__),
("pandas", lambda mod: mod.__version__),
("matplotlib", lambda mod: mod.__version__),

This comment has been minimized.

Copy link
@jeremiedbb

jeremiedbb Jul 18, 2018

Contributor

is pandas necessary as it's not a requirement for scikit-learn ?
And shouldn't we add scikit-learn's version ?

This comment has been minimized.

Copy link
@aboucaud

aboucaud Jul 18, 2018

Author Contributor

yeah, probably nice to have 😄

This comment has been minimized.

Copy link
@rth

rth Jul 18, 2018

Member

pandas is often used together with matplotlib and would be nice to have to debug ColumnTransformer etc issues.

@aboucaud

This comment has been minimized.

Copy link
Contributor Author

aboucaud commented Jul 18, 2018

latest output

System
------
    python: 3.6.5 (default, Mar 30 2018, 06:42:10)
executable: /Users/aboucaud/codes/python_venvs/sklearn-dev/bin/python3.6
  compiler: [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)]
   machine: Darwin-16.7.0-x86_64-i386-64bit

BLAS
----
   message: Could not retrieve BLAS information

Python deps
-----------
       pip: 10.0.1
setuptools: 40.0.0
   sklearn: 0.20.dev0
     numpy: 1.14.5
     scipy: 1.1.0
    Cython: 0.28.4
    pandas: 0.23.3
matplotlib: None
@aboucaud

This comment has been minimized.

Copy link
Contributor Author

aboucaud commented Jul 18, 2018

I don't understand why calling the builtin get_blas_info() sometimes raises an exception and sometimes works.

aboucaud added some commits Jul 18, 2018

@aboucaud aboucaud changed the title [WIP] Add sklearn show_versions() method [MRG] Add sklearn show_versions() method Jul 18, 2018

aboucaud added some commits Jul 19, 2018

@jeremiedbb

This comment has been minimized.

Copy link
Contributor

jeremiedbb commented Jul 19, 2018

I guess we won't ask you to revert now, but in the future, you should not fix existing flake8 or pep8 errors.
I makes the review hard to follow.

@aboucaud

This comment has been minimized.

Copy link
Contributor Author

aboucaud commented Jul 19, 2018

Well I did not..personally. I just have automatic whitespace trimming in my editor to avoid adding errors.

And if it happened, it means these errors were pushed on master recently.. so what is worse ? 😄

@jnothman
Copy link
Member

jnothman left a comment

This should probably be added to doc/modules/classes.rst

@aboucaud

This comment has been minimized.

Copy link
Contributor Author

aboucaud commented Jul 23, 2018

This should probably be added to doc/modules/classes.rst

Done.

@aboucaud

This comment has been minimized.

Copy link
Contributor Author

aboucaud commented Jul 23, 2018

Ready for review.
cc @lesteve @jakirkham @qinhanmin2014

@rth

rth approved these changes Jul 23, 2018

Copy link
Member

rth left a comment

Minor comments below, otherwise LGTM. Thanks @aboucaud !

.. note::

This utility function is only available in scikit-learn v0.20+. For all
previous versions, one has to manually dig such information e.g. snippet::

This comment has been minimized.

Copy link
@rth

rth Jul 23, 2018

Member
For previous versions, one has to explicitly run::

     import platform; print(platform.platform())
     import sys; print("Python", sys.version)
     import numpy; print("NumPy", numpy.__version__)
     import scipy; print("SciPy", scipy.__version__)
     import sklearn; print("Scikit-Learn", sklearn.__version__)


def show_versions():
"Print useful debugging information to the terminal"

This comment has been minimized.

Copy link
@rth

rth Jul 23, 2018

Member

remove "to the terminal"

@jnothman
Copy link
Member

jnothman left a comment

Let's name the module _show_versions.py, or just put it in sklearn/_config.py. "print" is unhelpful in the name

("setuptools", lambda mod: mod.__version__),
("sklearn", lambda mod: mod.__version__),
("numpy", lambda mod: mod.__version__),
("scipy", lambda mod: mod.version.version),

This comment has been minimized.

Copy link
@jnothman

jnothman Jul 24, 2018

Member

I still don't get why we can't use __version__ here and hence in all cases

This comment has been minimized.

Copy link
@rth

rth Jul 24, 2018

Member

We can and should...

This comment has been minimized.

Copy link
@aboucaud

aboucaud Jul 24, 2018

Author Contributor

It slipt off my head. I changed it to avoid the lambda function.

@amueller

This comment has been minimized.

Copy link
Member

amueller commented Jul 24, 2018

I vaguely remember someone arguing above to remove matplotlib? I would have included it but no strong opinion. Seems fine to me and we can always iterate easily on this.

This is what this looks like for me btw

System
------
    python: 3.7.0 (default, Jun 28 2018, 13:15:42)  [GCC 7.2.0]
executable: /home/andy/anaconda3/envs/py37/bin/python
   machine: Linux-4.13.0-46-generic-x86_64-with-debian-stretch-sid

BLAS
----
    macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
  lib_dirs: /home/andy/anaconda3/envs/py37/lib
cblas_libs: mkl_rt, pthread

Python deps
-----------
       pip: 10.0.1
setuptools: 39.2.0
   sklearn: 0.20.dev0
     numpy: 1.14.5
     scipy: 1.1.0
    Cython: 0.28.3
    pandas: 0.23.3

Is it clear that that means I have mkl? I guess I'm not very good at reading these lol

@amueller

This comment has been minimized.

Copy link
Member

amueller commented Jul 24, 2018

This excludes the compiler version, though, right? Earlier the PR had compiler versions. Why did we want that removed? Seems helpful. I think the [GCC 7.2.0] is the compiler the python was compiled with, not the compiler that's available in that env (i.e. what scikit-learn would be compiled with).

@aboucaud

This comment has been minimized.

Copy link
Contributor Author

aboucaud commented Jul 24, 2018

Is it clear that that means I have mkl? I guess I'm not very good at reading these lol

I do share the feeling.

Both the name macros and the results SCIPY_MKL_H=None, HAVE_CBLAS=None are not very explicit ; but maybe enough for devs to debug.
I am not really competent regarding BLAS but @rth, @jakirkham or @ogrisel might have a better opinion.

This excludes the compiler version, though, right? Earlier the PR had compiler versions. Why did we want that removed? Seems helpful. I think the [GCC 7.2.0] is the compiler the python was compiled with, not the compiler that's available in that env (i.e. what scikit-learn would be compiled with).

The "compiler" was the Python one (split of the Python info into Python + Python compiler) which failed on Windows and is now included in the same line.

I'm not sure how I would have access to the compiler used to build scikit-learn.

@aboucaud

This comment has been minimized.

Copy link
Contributor Author

aboucaud commented Jul 25, 2018

FYI I'll be afk until Tuesday 31st. Please take over if need be.

@rth

This comment has been minimized.

Copy link
Member

rth commented Jul 27, 2018

Is it clear that that means I have mkl? I guess I'm not very good at reading these lol

Both the name macros and the results SCIPY_MKL_H=None, HAVE_CBLAS=None are not very explicit ; but maybe enough for devs to debug.

Yeah, it's not that explicit, mostly cblas_libs: mkl_rt, pthread is enough to make the determination. The advantage is this should work for all the systems where scikit-learn was installed (as the underlying function is used in setup.py). If we wanted to print something more user friendly unless we test it for different BLAS (which we probably won't do), there is no certainty that it won't fail in some configurations. So I think it might be preferable to leave it like this.

I'm not sure how I would have access to the compiler used to build scikit-learn.

We could store show_version result (including the BLAS setup) at build time in a dynamically generated Cython file as suggested in #11596 (comment) but I think we can add that in a subsequent PR. This should already address most cases, I believe.

WDYT?

@rth

This comment has been minimized.

Copy link
Member

rth commented Jul 27, 2018

We can add back matplotlib if you prefer..

@rth

This comment has been minimized.

Copy link
Member

rth commented Jul 27, 2018

To be more specific, it's possible to determine the most common BLAS implementations by matching cblas_libs contents against this dictionary but again there are cases where it will fail (other BLAS implementations, possibly Windows etc). tomMoral/loky#135 might do a better job of it, but in any case I don't think it's worth the effort here , as we just want some debug information..

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Jul 29, 2018

@GaelVaroquaux
Copy link
Member

GaelVaroquaux left a comment

I'm approving this PR, and people should feel free to merge it. But if there is time, I'd like my comment about the code snippet to be addressed.

@@ -112,7 +112,7 @@ following rules before submitting:
`new algorithm requirements
<http://scikit-learn.org/stable/faq.html#what-are-the-inclusion-criteria-for-new-algorithms>`_.

- If you are submitting a bug report, we strongly encourage you to follow the guidelines in

This comment has been minimized.

Copy link
@GaelVaroquaux

GaelVaroquaux Jul 29, 2018

Member

I know it's already been mentioned to you, but I am going to say it a second time: such unrelated changes are a very bad practice (they create merge conflicts, and destroy the usefulness of "git blame").

@@ -140,6 +140,13 @@ feedback:
your **Python, scikit-learn, numpy, and scipy versions**. This information
can be found by running the following code snippet::

import sklearn; sklearn.show_versions()

This comment has been minimized.

Copy link
@GaelVaroquaux

GaelVaroquaux Jul 29, 2018

Member

It would be better to have this as a code snippet:

::

    >>> import sklearn
    >>> sklearn.show_versions() # doctests: +SKIP

@GaelVaroquaux GaelVaroquaux changed the title [MRG] Add sklearn show_versions() method [MRG+2] Add sklearn show_versions() method Jul 29, 2018

aboucaud added some commits Jul 31, 2018

@jnothman jnothman merged commit b21f9b4 into scikit-learn:master Jul 31, 2018

7 checks passed

ci/circleci: deploy Your tests passed on CircleCI!
Details
ci/circleci: python2 Your tests passed on CircleCI!
Details
ci/circleci: python3 Your tests passed on CircleCI!
Details
codecov/patch 100% of diff hit (target 95.29%)
Details
codecov/project 95.3% (+<.01%) compared to 5140762
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Jul 31, 2018

@aboucaud

This comment has been minimized.

Copy link
Contributor Author

aboucaud commented Aug 1, 2018

@jnothman I just noticed I forgot to rename the test file according to the new filename
test_print_versions.py => test_show_versions.py
Shall I push a quick fix in a separate PR ?

@rth

This comment has been minimized.

Copy link
Member

rth commented Aug 1, 2018

@aboucaud If you have time. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.