
[MRG] Add plotting module with heatmaps for confusion matrix and grid search results #9173

Open · wants to merge 142 commits into master
Conversation

aarshayj (Contributor) commented Jun 20, 2017

Continues PR 8082

Changes made:

  • Added plot functions: plot_confusion_matrix, plot_gridsearch_results
  • Updated examples: plot_rbf_parameters.py, plot_confusion_matrix.py
  • Added unit tests for the new plot modules
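For readers unfamiliar with the proposed helpers, the heart of a confusion-matrix heatmap is something like the sketch below, in plain NumPy/matplotlib. The PR's `plot_confusion_matrix` signature is still under review, so none of this is its final API.

```python
# Sketch of what a confusion-matrix heatmap does under the hood; the PR's
# actual plot_confusion_matrix API is not final, so nothing here is its API.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

y_true = np.array([0, 1, 2, 2, 0, 1])
y_pred = np.array([0, 2, 2, 2, 0, 0])

# Confusion matrix by hand: cm[i, j] = count of true class i predicted as j.
n_classes = 3
cm = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(cm, (y_true, y_pred), 1)

fig, ax = plt.subplots()
im = ax.imshow(cm, interpolation="nearest", cmap="viridis")
fig.colorbar(im, ax=ax)
ax.set(xlabel="Predicted label", ylabel="True label")
plt.close(fig)
```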
amueller (Member) commented Jun 20, 2017

can you fix the flake8 issues?

amueller (Member) left a comment

very first round ;)

:no-inherited-members:

This module is experimental. Use at your own risk.
Use of this module requires the matplotlib library.

amueller (Member) · Jun 20, 2017

Maybe state version? 1.5?

aarshayj (Author) · Jun 21, 2017

Modified to: "Use of this module requires the matplotlib library,
version 1.5 or later (preferably 2.0)."

:toctree: generated/
:template: function.rst

plot_heatmap

amueller (Member) · Jun 20, 2017

add functions here

aarshayj (Author) · Jun 21, 2017

Added. This was WIP in the previous PR.

NelleV (Member) · Jul 27, 2017

I wouldn't count on users to read this message. My two cents is that if you really want users to notice that the plotting module is experimental, you'd have to put it in a sub module "experimental" or "future", and only move it to the main namespace when the API is stable.

amueller (Member) · Jul 28, 2017

It's a bit unclear to me what it would mean for the API to be stable, and I really don't like forcing people to change their code later. I would probably just remove the warning here and then do standard deprecation cycles.

NelleV (Member) · Aug 10, 2017

Doing deprecation cycles is forcing people to change their code eventually, with the additional risk that they won't know that this code is experimental :)

amueller (Member) · Aug 11, 2017

Deprecation cycles only sometimes require users to change their code - if they are actually using the feature you're deprecating. That's not very common for most deprecations in scikit-learn. And that is only if there is actually a change.
And I'm not sure what's experimental about this code. The experiment is more having plotting inside scikit-learn at all. Since it's plotting and therefore user-facing, I'd rather have a warning on every call than put it in a different module.

I guess the thing we are trying to communicate is "don't build long-term projects relying on the presence of plotting in scikit-learn because we might remove it again".
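One way to implement the per-call warning amueller describes is a decorator like the sketch below. This is a hypothetical illustration, not code from the PR; `experimental` and the stand-in `plot_heatmap` are invented names.

```python
# Hypothetical sketch of a "warn on every call" decorator for
# experimental functions; not the PR's code.
import functools
import warnings


def experimental(func):
    """Emit a FutureWarning each time the wrapped function is called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            "%s is experimental and may change or be removed in a future "
            "release." % func.__name__,
            FutureWarning,
        )
        return func(*args, **kwargs)
    return wrapper


@experimental
def plot_heatmap(values):  # stand-in for the PR's plotting function
    return values
```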

generate the heatmap.
"""

import matplotlib.pyplot as plt

amueller (Member) · Jun 20, 2017

not sure if we want actual tests of the functionality (I'm leaning no), but I think we want at least smoke-tests.

aarshayj (Author) · Jun 21, 2017

added smoke-tests which just run the code without asserts.
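That pattern looks roughly like the sketch below; the real tests would call the PR's plot functions rather than raw matplotlib, and the function name here is illustrative.

```python
# Sketch of a smoke-test: exercise the plotting path, assert nothing
# about the resulting image, and skip cleanly when matplotlib is missing.
import unittest


def test_plot_heatmap_smoke():
    try:
        import matplotlib
        matplotlib.use("Agg")  # headless backend, no display required
        import matplotlib.pyplot as plt
    except ImportError:
        raise unittest.SkipTest("matplotlib not installed")
    fig, ax = plt.subplots()
    ax.imshow([[0, 1], [2, 3]])  # stand-in for the PR's plot_heatmap(...)
    plt.close(fig)
```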

amueller (Member) · Jun 28, 2017

hm looks like the config we use for coverage doesn't have matplotlib. We should change that...

if normalize:
    values = values.astype('float') / values.sum(axis=1)[:, np.newaxis]

print(title)

amueller (Member) · Jun 20, 2017

print?
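Setting the stray `print` aside, the row-normalization in that snippet can be checked in isolation:

```python
import numpy as np

values = np.array([[5, 1], [2, 2]])
# Same operation as in the diff: divide each row by its sum, so each row
# of the normalized confusion matrix sums to 1.
normalized = values.astype('float') / values.sum(axis=1)[:, np.newaxis]
```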

from sklearn.plot import plot_heatmap


def plot_gridsearch_results(cv_results, param_grid, metric='mean_test_score',

amueller (Member) · Jun 20, 2017

This only works for 2d grid-searches, right? I think we should also support 1d, and error for anything else.
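A sketch of that validation, with a hypothetical helper name (not code from the PR):

```python
# Hypothetical validation helper: accept grid searches over 1 or 2
# parameters, raise for anything else.
def check_param_grid_dim(param_grid):
    n_params = len(param_grid)
    if n_params not in (1, 2):
        raise ValueError(
            "plot_gridsearch_results supports grid searches over 1 or 2 "
            "parameters, got %d." % n_params)
    return n_params
```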

except ImportError:
    raise SkipTest("Not testing plot_heatmap, matplotlib not installed.")

import matplotlib.pyplot as plt

amueller (Member) · Jun 20, 2017

hm it looks like matplotlib is not installed for the service that computes coverage? hm...

aarshayj (Author) · Jun 21, 2017

Yeah, this seems to be a problem, because my tests cover much more than what Codecov is showing. I wasn't able to check coverage locally: I'm on a Mac, and running matplotlib from the terminal gives errors I haven't been able to resolve yet.

amueller (Member) · Jun 28, 2017

have you resolved those?

aarshayj (Author) · Jun 29, 2017

Nope, these are tricky to resolve. It would require re-installing Python by a different method and then setting up my scikit-learn development environment again. Since sklearn doesn't test much against matplotlib, I'm just using a Jupyter notebook for this PR.
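(A common cause of matplotlib failing from a macOS terminal is the GUI backend; the actual error isn't shown above, so this is only an assumption, but the standard workaround is to force a non-interactive backend before importing pyplot:)

```python
# Assumption: the terminal errors are backend-related. Selecting a
# non-interactive backend before importing pyplot means matplotlib never
# touches the macOS GUI framework, so rendering works with no display.
import io

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
buf = io.BytesIO()
fig.savefig(buf, format="png")  # render entirely off-screen
plt.close(fig)
```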

amueller (Member) · Aug 11, 2017

We should discuss this in person.

codecov bot commented Jun 21, 2017

Codecov Report

Merging #9173 into master will decrease coverage by 0.17%.
The diff coverage is 23.68%.


@@            Coverage Diff             @@
##           master    #9173      +/-   ##
==========================================
- Coverage    96.3%   96.13%   -0.18%     
==========================================
  Files         332      337       +5     
  Lines       60549    60754     +205     
==========================================
+ Hits        58314    58403      +89     
- Misses       2235     2351     +116
Impacted Files Coverage Δ
sklearn/plot/__init__.py 100% <100%> (ø)
sklearn/plot/tests/test_heatmap.py 32.43% <32.43%> (ø)
sklearn/plot/_confusion_matrix.py 33.33% <33.33%> (ø)
sklearn/plot/_heatmap.py 6.66% <6.66%> (ø)
sklearn/plot/_gridsearch_results.py 8.57% <8.57%> (ø)
sklearn/utils/testing.py 89.5% <0%> (-0.04%) ⬇️
sklearn/decomposition/tests/test_pca.py 100% <0%> (ø) ⬆️
sklearn/decomposition/pca.py 94.5% <0%> (ø) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0d5d842...b5e5823. Read the comment docs.

aarshayj added 2 commits Jun 21, 2017
aarshayj (Contributor) commented Jun 21, 2017

I think we should also give the user an option to plot everything as a single line graph. This would give a readable plot when there are 3 or more tuned parameters but the total number of combinations is small, say within 20.

Adding x-axis labels will be a challenge, but we can give each combination an ID and print a table below the plot showing the parameter values for each ID.

Thoughts?
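That combo-ID idea can be sketched like this (hypothetical illustration, nothing from the PR):

```python
# Enumerate parameter combinations, give each one an integer ID, and keep
# an ID -> parameters table to print under the plot; the x-axis then shows
# the IDs, and the score is drawn as a single line over them.
from itertools import product

param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}
names = sorted(param_grid)                               # ['C', 'gamma']
combos = list(product(*(param_grid[n] for n in names)))
id_table = {i: dict(zip(names, combo)) for i, combo in enumerate(combos)}
# Scores (e.g. mean_test_score, one per combo) would be plotted against
# the IDs 0..len(combos)-1, with id_table decoding each ID.
```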

aarshayj (Contributor) commented Jun 21, 2017

Also, could you help me understand the CircleCI error (details here):

Exception occurred:
  File "/home/ubuntu/miniconda/envs/testenv/lib/python2.7/site-packages/docutils/writers/_html_base.py", line 671, in depart_document
    assert not self.context, 'len(context) = %s' % len(self.context)
AssertionError: len(context) = 1
The full traceback has been saved in /tmp/sphinx-err-RgKdVr.log, if you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error message can be provided next time.
A bug report can be filed in the tracker at https://github.com/sphinx-doc/sphinx/issues. Thanks!
make: *** [html] Error 1

./build_tools/circle/build_doc.sh returned exit code 2

Action failed: ./build_tools/circle/build_doc.sh

amueller (Member) commented Jun 28, 2017

@aarshayj can you merge master? This is an issue with old sphinx, I think, which should be fixed in master.

amueller (Member) commented on sklearn/linear_model/tests/test_logistic.py in 4dafa52, Aug 20, 2018

Why not 1/(1+exp(-decision))?
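(For reference, the expression being asked about is the logistic sigmoid; a quick numerical sanity check of its basic identity, unrelated to the specific test file under discussion:)

```python
import math


def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


# The sigmoid maps a decision value to a probability in (0, 1), with
# sigmoid(0) == 0.5 and the symmetry sigmoid(-x) == 1 - sigmoid(x).
checks = [abs(sigmoid(d) + sigmoid(-d) - 1.0) for d in (-3.0, -0.5, 0.0, 2.0)]
```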

amueller (Member) replied Aug 20, 2018

oh wait this is the old thing
