Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] check length of feature_names in export_graphviz (#8477) #8512

Merged
merged 15 commits into from Apr 26, 2017

Conversation

@aikinogard
Copy link
Contributor

@aikinogard aikinogard commented Mar 4, 2017

Reference Issue

Fixes #8477

What does this implement/fix? Explain your changes.

check length of feature_names in export_graphviz

  • raise ValueError if len(feature_names) > tree.n_features
  • add unit test for len(feature_names) > tree.n_features
  • change the comment of existing unit test for len(feature_names) < tree.n_features
  • raise error when length features_names != n_features
  • add unit test

Any other comments?

#8477
check length of feature_names in export_graphviz

- raise ValueError if len(feature_names) > tree.n_features
- add unit test for len(feature_names) > tree.n_features
- change the comment of existing unit test for len(feature_names) <
tree.n_features
@aikinogard
Copy link
Contributor Author

@aikinogard aikinogard commented Mar 4, 2017

The fail of check on AppVeyor
Environment: PYTHON=C:\Python35, PYTHON_VERSION=3.5.0, PYTHON_ARCH=32
FAIL: sklearn.decomposition.tests.test_pca.test_singular_values

but this is not the change I made. My changing is in sklearn.tree. This is weird.

out = StringIO()
assert_raises(IndexError, export_graphviz, clf, out, feature_names=[])

# Check feature_names more than number of features error
out = StringIO()

This comment has been minimized.

@amueller

amueller Mar 4, 2017
Member

Maybe use "outfile="none" instead of using StringIO.

out = StringIO()
assert_raises(IndexError, export_graphviz, clf, out, feature_names=[])

# Check feature_names more than number of features error
out = StringIO()
assert_raises(ValueError, export_graphviz, clf, out,

This comment has been minimized.

@amueller

amueller Mar 4, 2017
Member

Please check the error message with assert_raise_message

@amueller
Copy link
Member

@amueller amueller commented Mar 4, 2017

i think you should show a warning for either too few or too many features so that instead of an index error there is a more helpful error message.

@aikinogard
Copy link
Contributor Author

@aikinogard aikinogard commented Mar 4, 2017

@amueller It is okay to show a warning for too many feature_names. However, for too few feature_names, some features will not be able to get a name. Are you suggesting to padding the feature_names list if user's input shorter than number of features in tree?

@amueller
Copy link
Member

@amueller amueller commented Mar 4, 2017

No, i'm suggesting to provide a good error message whenever the length doesn't match.

- include length of feature_names and number of features in tree in the
error and warning message.
- raise error for too few feature_names
- for too much feature_names, will use the first n_features. raise an
warning for users
- use assert_raise_message and assert_warns_message in test to check
message.
@codecov
Copy link

@codecov codecov bot commented Mar 4, 2017

Codecov Report

Merging #8512 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #8512      +/-   ##
==========================================
+ Coverage   95.48%   95.48%   +<.01%     
==========================================
  Files         342      342              
  Lines       60913    60917       +4     
==========================================
+ Hits        58160    58164       +4     
  Misses       2753     2753
Impacted Files Coverage Δ
sklearn/tree/export.py 93.19% <100%> (+0.07%)
sklearn/tree/tests/test_export.py 100% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 75c892c...3fb244b. Read the comment docs.

@codecov
Copy link

@codecov codecov bot commented Mar 4, 2017

Codecov Report

Merging #8512 into master will increase coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8512      +/-   ##
==========================================
+ Coverage   95.48%   95.52%   +0.04%     
==========================================
  Files         342      342              
  Lines       60913    61371     +458     
==========================================
+ Hits        58160    58622     +462     
+ Misses       2753     2749       -4
Impacted Files Coverage Δ
sklearn/tree/export.py 93.22% <100%> (+0.1%) ⬆️
sklearn/tree/tests/test_export.py 100% <100%> (ø) ⬆️
sklearn/datasets/base.py 92.3% <0%> (-0.52%) ⬇️
sklearn/datasets/twenty_newsgroups.py 25.31% <0%> (-0.47%) ⬇️
sklearn/model_selection/_split.py 98.6% <0%> (-0.17%) ⬇️
sklearn/utils/metaestimators.py 97.7% <0%> (-0.03%) ⬇️
sklearn/cross_validation.py 98.6% <0%> (-0.01%) ⬇️
sklearn/kernel_approximation.py 100% <0%> (ø) ⬆️
sklearn/decomposition/kernel_pca.py 97.82% <0%> (ø) ⬆️
sklearn/linear_model/ridge.py 93.88% <0%> (ø) ⬆️
... and 92 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 75c892c...3f25c11. Read the comment docs.

@aikinogard
Copy link
Contributor Author

@aikinogard aikinogard commented Mar 4, 2017

@amueller I have updated the code

  • include length of feature_names and number of features in tree in the
    error and warning message.
  • raise error for too few feature_names
  • for too much feature_names, will use the first n_features. raise an
    warning for users
  • use assert_raise_message and assert_warns_message in test to check
    message.
@aikinogard
Copy link
Contributor Author

@aikinogard aikinogard commented Mar 8, 2017

@amueller please let me know if you want any improvement for this PR. Thanks.

@@ -216,6 +217,19 @@ def node_to_str(tree, node_id, criterion):
if tree.children_left[node_id] != _tree.TREE_LEAF:
# Always write node decision criteria, except for leaves
if feature_names is not None:
# Check length of feature_names
# raise error for too few feature_names
if len(feature_names) < tree.n_features:

This comment has been minimized.

@MechCoder

MechCoder Apr 21, 2017
Member

Failing early is always better.

In other words, you should check this once at the beginning when export_graphviz is called rather than when node_to_str is called. You can do that by accessing the attribute n_features_ of the provided decision_tree.

…will fail early for wrong length of feature_names
@aikinogard
Copy link
Contributor Author

@aikinogard aikinogard commented Apr 22, 2017

@MechCoder Thanks for the code review. I agree with the point to make the failing happen earlier. I have made the change of moving the feature_name length check into the export_graphviz. It passed the unit test I wrote before suggested by @amueller . Please let me know if there are any other suggestions.

decision_tree.n_features_))
# for too much feature_names, will use the first n_features
# raise an warning for users
if len(feature_names) > decision_tree.n_features_:

This comment has been minimized.

@MechCoder

MechCoder Apr 22, 2017
Member

Not sure that this is appropriate, I would just raise an error they are not equal. It is dangerous to guess that just selecting the first n values of feature_names is what the user intended.

This comment has been minimized.

@aikinogard

aikinogard Apr 22, 2017
Author Contributor

Before I worked on this issue, it used the first n_features in features_names by default if feature names was longer than n_features. I make it a warning to keep it consistent with the previous code.

Do you want me to raise error for length features_names != n_features?

This comment has been minimized.

@MechCoder

MechCoder Apr 22, 2017
Member

Good point. My vote would be still to be raise an error and document it as a bug-fix over here (https://github.com/scikit-learn/scikit-learn/blob/master/doc/whats_new.rst)

Something on the lines of export_graphviz now raises an error if length of feature_names is not the same as the number of features.

This comment has been minimized.

@aikinogard

aikinogard Apr 22, 2017
Author Contributor

Ok. I will make the change to raise error for feature_names != n_features and modify the unit test. Thanks.

…res in the decision tree
@aikinogard
Copy link
Contributor Author

@aikinogard aikinogard commented Apr 22, 2017

@MechCoder Please review the new change. I raise error when length features_names != n_features and modified the unit test.

Copy link
Member

@MechCoder MechCoder left a comment

LGTM

@MechCoder
Copy link
Member

@MechCoder MechCoder commented Apr 22, 2017

Ah did not see that Travis is faliing due to a pep8 error. Could you please look at the build and fix it?

@MechCoder MechCoder changed the title [MRG] check length of feature_names in export_graphviz (#8477) [MRG+1] check length of feature_names in export_graphviz (#8477) Apr 22, 2017
@aikinogard
Copy link
Contributor Author

@aikinogard aikinogard commented Apr 22, 2017

@MechCoder I removed assert_warns_message in the import in the unit test. It should work now. For adding the bug fix doc in
https://github.com/scikit-learn/scikit-learn/blob/master/doc/whats_new.rst
shall I add it together n this pull request or in a separate way?

@MechCoder
Copy link
Member

@MechCoder MechCoder commented Apr 22, 2017

@aikinogard
Copy link
Contributor Author

@aikinogard aikinogard commented Apr 22, 2017

@MechCoder I have added the bug fix in the doc. Thank you for the review.

@aikinogard
Copy link
Contributor Author

@aikinogard aikinogard commented Apr 23, 2017

I didn't see the point of conflict since both line
from sklearn.utils.testing import assert_raise_message
from sklearn.exceptions import NotFittedError
can exist.

@aikinogard
Copy link
Contributor Author

@aikinogard aikinogard commented Apr 25, 2017

@amueller @MechCoder The change and unit test are finished. Please take a look on the pull request and merge it. The PR has been here for almost 2 months.

@MechCoder
Copy link
Member

@MechCoder MechCoder commented Apr 25, 2017

Can you fix the merge conflicts in sklearn/tree/tests/test_export.py?

For 2nd review random pings: @lesteve @raghavrv (It is a small PR)

@aikinogard
Copy link
Contributor Author

@aikinogard aikinogard commented Apr 25, 2017

@MechCoder It requires write access to resolve the conflict.
from sklearn.utils.testing import assert_raise_message
from sklearn.exceptions import NotFittedError
we can keep both of them in the test_export.py

@MechCoder
Copy link
Member

@MechCoder MechCoder commented Apr 25, 2017

Look at the portions starting with Note: over here (http://scikit-learn.org/dev/developers/contributing.html#how-to-contribute) to know how to resolve merge conflicts.

Let me know if you need any help.

@aikinogard
Copy link
Contributor Author

@aikinogard aikinogard commented Apr 25, 2017

@MechCoder I found I can solve the conflict directly on github. :-)
The conflict is resolved. Thanks!
@lesteve @raghavrv please review it at your convenience.

@aikinogard
Copy link
Contributor Author

@aikinogard aikinogard commented Apr 26, 2017

@jnothman @TomDLT could you review my PR at your convenience?

Copy link
Member

@raghavrv raghavrv left a comment

Thanks @aikinogard... LGTM pending (mostly) nitpicks...

@@ -276,6 +276,11 @@ Bug fixes
- Fixed a bug in :class:`svm.OneClassSVM` where it returned floats instead of
integer classes. :issue:`8676` by :user:`Vathsala Achar <VathsalaAchar>`.

- Fix a bug where :func:`sklearn.tree.export_graphviz` will raise an error
when the length features_names does not match n_features in the decision

This comment has been minimized.

@raghavrv

raghavrv Apr 26, 2017
Member

length of

This comment has been minimized.

@raghavrv

raghavrv Apr 26, 2017
Member

nitpick: backticks for feature_names and n_features

@@ -276,6 +276,11 @@ Bug fixes
- Fixed a bug in :class:`svm.OneClassSVM` where it returned floats instead of
integer classes. :issue:`8676` by :user:`Vathsala Achar <VathsalaAchar>`.

- Fix a bug where :func:`sklearn.tree.export_graphviz` will raise an error

This comment has been minimized.

@raghavrv

raghavrv Apr 26, 2017
Member

nitpick again: past tense is good I think - ...raised an error...

- Fix a bug where :func:`sklearn.tree.export_graphviz` will raise an error
when the length features_names does not match n_features in the decision
tree.
:issue:`8512` by `Li Li <https://github.com/aikinogard>`_.
@@ -399,6 +402,16 @@ def recurse(tree, node_id, criterion, parent=None, depth=0):
return_string = True
out_file = six.StringIO()

# Check length of feature_names before get into the tree node

This comment has been minimized.

@raghavrv

raghavrv Apr 26, 2017
Member

nickpick (feel free to ignore): before getting into the...

# n_features_ in the decision_tree
if feature_names is not None:
if len(feature_names) != decision_tree.n_features_:
raise ValueError("Length of feature_names=%d "

This comment has been minimized.

@raghavrv

raghavrv Apr 26, 2017
Member

Length of feature_name, %d

# Check feature_names error
out = StringIO()
assert_raises(IndexError, export_graphviz, clf, out, feature_names=[])
# Check feature_names less than number of features error

This comment has been minimized.

@raghavrv

raghavrv Apr 26, 2017
Member

--> Check if it errors when length of feature_names mismatches with number of features

out = None
message = "Length of feature_names=3 does not " + \
"match number of features=2"
assert_raise_message(ValueError, message, export_graphviz, clf, out,

This comment has been minimized.

@raghavrv

raghavrv Apr 26, 2017
Member

directly pass out=None

out = None
message = "Length of feature_names=1 does not " + \
"match number of features=2"
assert_raise_message(ValueError, message, export_graphviz, clf, out,

This comment has been minimized.

@raghavrv

raghavrv Apr 26, 2017
Member

directly pass out=None

assert_raises(IndexError, export_graphviz, clf, out, feature_names=[])
# Check feature_names less than number of features error
out = None
message = "Length of feature_names=1 does not " + \

This comment has been minimized.

@raghavrv

raghavrv Apr 26, 2017
Member

use bracket and avoid \


# Check feature_names greater than number of features warning
out = None
message = "Length of feature_names=3 does not " + \

This comment has been minimized.

@raghavrv

raghavrv Apr 26, 2017
Member

This needs to be fixed after the changes above...

@raghavrv
Copy link
Member

@raghavrv raghavrv commented Apr 26, 2017

Thanks, will merge once CIs turn green.

@aikinogard
Copy link
Contributor Author

@aikinogard aikinogard commented Apr 26, 2017

@raghavrv CIs turned green. Please merge it at your convenience. Thanks!

@raghavrv raghavrv merged commit 2f1c978 into scikit-learn:master Apr 26, 2017
5 checks passed
5 checks passed
ci/circleci Your tests passed on CircleCI!
Details
codecov/patch 100% of diff hit (target 95.52%)
Details
codecov/project 95.52% (+<.01%) compared to 676f878
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@raghavrv
Copy link
Member

@raghavrv raghavrv commented Apr 26, 2017

🎉

Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
…n#8477) (scikit-learn#8512)

* scikit-learn#8477

check length of feature_names in export_graphviz

- raise ValueError if len(feature_names) > tree.n_features
- add unit test for len(feature_names) > tree.n_features
- change the comment of existing unit test for len(feature_names) <
tree.n_features

* fix error and warning

- include length of feature_names and number of features in tree in the
error and warning message.
- raise error for too few feature_names
- for too much feature_names, will use the first n_features. raise an
warning for users
- use assert_raise_message and assert_warns_message in test to check
message.

* move the error and warning from node_to_str to export_graphviz so it will fail early for wrong length of feature_names

* raise error if length of feature_names does not match number of features in the decision tree

* fix pep8

* remove unused assert_warns_message import in test_export.py

* add bug fix in doc/whats_new.rst

* fix the english in doc/whats_new.rst

* fix the format and english in sklearn/tree/export.py

* fix contributor format in doc/whats_new.rst

* fix english, use bracket and avoid \ in error message

* fix pep8

* fix pep8
dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
…n#8477) (scikit-learn#8512)

* scikit-learn#8477

check length of feature_names in export_graphviz

- raise ValueError if len(feature_names) > tree.n_features
- add unit test for len(feature_names) > tree.n_features
- change the comment of existing unit test for len(feature_names) <
tree.n_features

* fix error and warning

- include length of feature_names and number of features in tree in the
error and warning message.
- raise error for too few feature_names
- for too much feature_names, will use the first n_features. raise an
warning for users
- use assert_raise_message and assert_warns_message in test to check
message.

* move the error and warning from node_to_str to export_graphviz so it will fail early for wrong length of feature_names

* raise error if length of feature_names does not match number of features in the decision tree

* fix pep8

* remove unused assert_warns_message import in test_export.py

* add bug fix in doc/whats_new.rst

* fix the english in doc/whats_new.rst

* fix the format and english in sklearn/tree/export.py

* fix contributor format in doc/whats_new.rst

* fix english, use bracket and avoid \ in error message

* fix pep8

* fix pep8
dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
…n#8477) (scikit-learn#8512)

* scikit-learn#8477

check length of feature_names in export_graphviz

- raise ValueError if len(feature_names) > tree.n_features
- add unit test for len(feature_names) > tree.n_features
- change the comment of existing unit test for len(feature_names) <
tree.n_features

* fix error and warning

- include length of feature_names and number of features in tree in the
error and warning message.
- raise error for too few feature_names
- for too much feature_names, will use the first n_features. raise an
warning for users
- use assert_raise_message and assert_warns_message in test to check
message.

* move the error and warning from node_to_str to export_graphviz so it will fail early for wrong length of feature_names

* raise error if length of feature_names does not match number of features in the decision tree

* fix pep8

* remove unused assert_warns_message import in test_export.py

* add bug fix in doc/whats_new.rst

* fix the english in doc/whats_new.rst

* fix the format and english in sklearn/tree/export.py

* fix contributor format in doc/whats_new.rst

* fix english, use bracket and avoid \ in error message

* fix pep8

* fix pep8
NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
…n#8477) (scikit-learn#8512)

* scikit-learn#8477

check length of feature_names in export_graphviz

- raise ValueError if len(feature_names) > tree.n_features
- add unit test for len(feature_names) > tree.n_features
- change the comment of existing unit test for len(feature_names) <
tree.n_features

* fix error and warning

- include length of feature_names and number of features in tree in the
error and warning message.
- raise error for too few feature_names
- for too much feature_names, will use the first n_features. raise an
warning for users
- use assert_raise_message and assert_warns_message in test to check
message.

* move the error and warning from node_to_str to export_graphviz so it will fail early for wrong length of feature_names

* raise error if length of feature_names does not match number of features in the decision tree

* fix pep8

* remove unused assert_warns_message import in test_export.py

* add bug fix in doc/whats_new.rst

* fix the english in doc/whats_new.rst

* fix the format and english in sklearn/tree/export.py

* fix contributor format in doc/whats_new.rst

* fix english, use bracket and avoid \ in error message

* fix pep8

* fix pep8
paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
…n#8477) (scikit-learn#8512)

* scikit-learn#8477

check length of feature_names in export_graphviz

- raise ValueError if len(feature_names) > tree.n_features
- add unit test for len(feature_names) > tree.n_features
- change the comment of existing unit test for len(feature_names) <
tree.n_features

* fix error and warning

- include length of feature_names and number of features in tree in the
error and warning message.
- raise error for too few feature_names
- for too much feature_names, will use the first n_features. raise an
warning for users
- use assert_raise_message and assert_warns_message in test to check
message.

* move the error and warning from node_to_str to export_graphviz so it will fail early for wrong length of feature_names

* raise error if length of feature_names does not match number of features in the decision tree

* fix pep8

* remove unused assert_warns_message import in test_export.py

* add bug fix in doc/whats_new.rst

* fix the english in doc/whats_new.rst

* fix the format and english in sklearn/tree/export.py

* fix contributor format in doc/whats_new.rst

* fix english, use bracket and avoid \ in error message

* fix pep8

* fix pep8
AishwaryaRK added a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017
…n#8477) (scikit-learn#8512)

* scikit-learn#8477

check length of feature_names in export_graphviz

- raise ValueError if len(feature_names) > tree.n_features
- add unit test for len(feature_names) > tree.n_features
- change the comment of existing unit test for len(feature_names) <
tree.n_features

* fix error and warning

- include length of feature_names and number of features in tree in the
error and warning message.
- raise error for too few feature_names
- for too much feature_names, will use the first n_features. raise an
warning for users
- use assert_raise_message and assert_warns_message in test to check
message.

* move the error and warning from node_to_str to export_graphviz so it will fail early for wrong length of feature_names

* raise error if length of feature_names does not match number of features in the decision tree

* fix pep8

* remove unused assert_warns_message import in test_export.py

* add bug fix in doc/whats_new.rst

* fix the english in doc/whats_new.rst

* fix the format and english in sklearn/tree/export.py

* fix contributor format in doc/whats_new.rst

* fix english, use bracket and avoid \ in error message

* fix pep8

* fix pep8
maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
…n#8477) (scikit-learn#8512)

* scikit-learn#8477

check length of feature_names in export_graphviz

- raise ValueError if len(feature_names) > tree.n_features
- add unit test for len(feature_names) > tree.n_features
- change the comment of existing unit test for len(feature_names) <
tree.n_features

* fix error and warning

- include length of feature_names and number of features in tree in the
error and warning message.
- raise error for too few feature_names
- for too much feature_names, will use the first n_features. raise an
warning for users
- use assert_raise_message and assert_warns_message in test to check
message.

* move the error and warning from node_to_str to export_graphviz so it will fail early for wrong length of feature_names

* raise error if length of feature_names does not match number of features in the decision tree

* fix pep8

* remove unused assert_warns_message import in test_export.py

* add bug fix in doc/whats_new.rst

* fix the english in doc/whats_new.rst

* fix the format and english in sklearn/tree/export.py

* fix contributor format in doc/whats_new.rst

* fix english, use bracket and avoid \ in error message

* fix pep8

* fix pep8
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
…n#8477) (scikit-learn#8512)

* scikit-learn#8477

check length of feature_names in export_graphviz

- raise ValueError if len(feature_names) > tree.n_features
- add unit test for len(feature_names) > tree.n_features
- change the comment of existing unit test for len(feature_names) <
tree.n_features

* fix error and warning

- include length of feature_names and number of features in tree in the
error and warning message.
- raise error for too few feature_names
- for too much feature_names, will use the first n_features. raise an
warning for users
- use assert_raise_message and assert_warns_message in test to check
message.

* move the error and warning from node_to_str to export_graphviz so it will fail early for wrong length of feature_names

* raise error if length of feature_names does not match number of features in the decision tree

* fix pep8

* remove unused assert_warns_message import in test_export.py

* add bug fix in doc/whats_new.rst

* fix the english in doc/whats_new.rst

* fix the format and english in sklearn/tree/export.py

* fix contributor format in doc/whats_new.rst

* fix english, use bracket and avoid \ in error message

* fix pep8

* fix pep8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

4 participants
You can’t perform that action at this time.