Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Issue#5803 : Regression Test added #8112

Merged
merged 7 commits into from Jun 8, 2017

Conversation

amanp10
Copy link
Contributor

@amanp10 amanp10 commented Dec 24, 2016

Fixes: #5803

Regression Test for testing the functioning of predict for hard voting.

Tests whether the fix for np.bincount by changing the datatype of predictions array to int works properly.

@jnothman
Copy link
Member

I don't think this failed before #5802's patch:

$ git fetch https://github.com/scikit-learn/scikit-learn refs/pull/8099/head
From https://github.com/scikit-learn/scikit-learn
 * branch            refs/pull/8099/head -> FETCH_HEAD
$ git checkout FETCH_HEAD
HEAD is now at 0a0492a... Merge branch 'master' of https://github.com/amanp10/scikit-learn into Branch1
$ git revert a161f2c
[detached HEAD 5e96367] Revert "predictions are float but np.bincount can only be used on integers, thus an error comes out when using it"
 1 file changed, 1 insertion(+), 1 deletion(-)
$ nosetests sklearn/ensemble/tests/test_voting_classifier.py
..............
----------------------------------------------------------------------
Ran 14 tests in 1.711s

OK

@MechCoder
Copy link
Member

Unless I'm reading the old issue wrongly, isn't the output of the method of the _predict output already encoded? That seems to be enforced here (https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/voting_classifier.py#L164) no?

@amanp10
Copy link
Contributor Author

amanp10 commented Dec 28, 2016

I have used regression algorithms as classifiers for the test because only they return prediction values as float.
There were also some PEP8 errors from before in the test file. Should I remove them?

@amanp10
Copy link
Contributor Author

amanp10 commented Dec 28, 2016

@MechCoder sorry I didn't get you.

@jnothman
Copy link
Member

jnothman commented Jan 9, 2017

The point is, @amanp10, that the success of this test was not affected by the patch in a161f2c. We want a test that would have failed at the last release, but succeeds now.

@amanp10
Copy link
Contributor Author

amanp10 commented Jan 10, 2017

When I run the test on my machine it does give the following error before the patch
TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe'
Also, it seems logical since algorithms returning float values as predictions would produce an error from np.bincount.

@jnothman
Copy link
Member

Really? Is the bug a function of numpy version?

$ git checkout 0.18.1
HEAD is now at a5ab948... skip tree-test on 32bit
$ ipython
Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:52:12)
Type "copyright", "credits" or "license" for more information.

IPython 5.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: %paste
from sklearn.ensemble.tests.test_voting_classifier import *
from sklearn.linear_model import Ridge, Lasso

def test_predict_for_hard_voting():
  # Test predictions array data type error
  clf1 = Ridge(random_state=123)
  clf2 = Lasso(random_state=123)
  clf3 = SVC(probability=True, random_state=123)
  eclf1 = VotingClassifier(estimators=[
      ('rd', clf1), ('la', clf2), ('svc', clf3)], weights=[1, 2, 3],
      voting='hard')

  eclf1.fit(X, y)
  eclf1.predict(X)
## -- End pasted text --

In [2]:

In [2]:

In [2]:

In [2]: test_predict_for_hard_voting()

In [3]:

@amanp10
Copy link
Contributor Author

amanp10 commented Jan 10, 2017

What I have done is manually reverted the changes in the commit a161f2c since it was only a one line change. Then I ran the test and it failed as expected. Here is the error log.

Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/home/amanp10/gsoc/scikit-learn/sklearn/ensemble/tests/test_voting_classifier.py", line 290, in test_predict_for_hard_voting
eclf1.predict(X)
File "/home/amanp10/gsoc/scikit-learn/sklearn/ensemble/voting_classifier.py", line 195, in predict
arr=predictions)
File "/usr/lib/python2.7/dist-packages/numpy/lib/shape_base.py", line 79, in apply_along_axis
res = func1d(arr[tuple(i.tolist())],*args)
File "/home/amanp10/gsoc/scikit-learn/sklearn/ensemble/voting_classifier.py", line 193, in
weights=self.weights)),
TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe'


Ran 14 tests in 2.368s

FAILED (errors=1)

@amanp10
Copy link
Contributor Author

amanp10 commented Jan 11, 2017

I am waiting for your final response, meanwhile I may take up another issue if its okay?

@jnothman
Copy link
Member

You're right. I can get that test to fail if I revert the patch.

But now I'm confused as to why other tests aren't failing in the same situation. There are other tests performing hard voting, with weights, where predict is called, such as test_predict_on_toy_problem (although there the weights are all equal) and test_gridsearch.

I'm sorry that I don't know the code well, and this is exacerbated by a very poor description of the issue at #5802 and #5803; there is no changelog entry for #5802 to tell me what's fixed exactly, either.

Could you please explain... and add a changelog entry? Thanks so much.

@amanp10
Copy link
Contributor Author

amanp10 commented Jan 12, 2017

I understand your dilemma. The test test_predict_on_toy_problem and test_gridsearch passed because they used classification algorithms for the voting classifier, thus returning INT prediction values (class labels). However, the fix was for FLOAT prediction values which we get only if regression algorithms are used in the voting classifier. Hence, my test failed due to FLOAT prediction values, which is indeed fixed in the commit a161f2c

@jnothman
Copy link
Member

Oh, of course. And yet, I'm not sure that we should be admitting regressors here. Those regressors are not outputting ints, and an astype(int) transformation isn't going to make sense of a regressor trained on values (0, 1), or (-1, 1). I suspect we should require that VotingClassifier only accept constituent classifiers. Really if a classifier is fit on integer y it should return integer y, just as it is expected to return strings if it is fit on strings. That's what LabelEncoder and classes_'s job is. If all of scikit-learn's classifiers are well-behaved, you're only going to be able to construct a test by building an ill-behaved classifier in the test :\

I'm glad I finally understand what's going on. In the future it would be appreciated if you made sure your PR description is explicit about what you're fixing (particularly if the issue description is lacking).

@amanp10
Copy link
Contributor Author

amanp10 commented Jan 12, 2017

How do you suggest I do it? I will need some guidance here on making an ill-behaved classifier.

@amanp10
Copy link
Contributor Author

amanp10 commented Jan 12, 2017

@alvarouc Please tell me what exactly failed before a161f2c so I can put it in the regression test.

@jnothman
Copy link
Member

class DodgySVC(SVC):
    def predict(self, X):
        return super(DodgySVC, self).predict(X).astype(float)

?
:P

@amanp10
Copy link
Contributor Author

amanp10 commented Jan 12, 2017

I used this class as it is and it fulfills our need. It fails before the patch and works after it. Do I need to add more classifiers like this or just one is okay?

@jnothman
Copy link
Member

jnothman commented Jan 12, 2017 via email

@amanp10
Copy link
Contributor Author

amanp10 commented Jan 12, 2017

What do you suggest I do? There seems to be no other way to make the test.

@jnothman
Copy link
Member

I'm okay with using that. I might not call it Dodgy if I were you... but I would leave a comment that it is dubiously conforming to our API.

@amanp10
Copy link
Contributor Author

amanp10 commented Jan 12, 2017

I have made some changes, please have a look @jnothman.

@amanp10
Copy link
Contributor Author

amanp10 commented Jan 15, 2017

@jnothman I am waiting for the final reviews on this one, so that I can make the corrections soon as I wont be available for a week.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I'm not sure we should care about this case, but this LGTM.

@jnothman jnothman changed the title [MRG] Issue#5803 : Regression Test added [MRG+1] Issue#5803 : Regression Test added Jan 18, 2017
@agramfort
Copy link
Member

LGTM

shall I merge?



def test_predict_for_hard_voting():
# Test predictions array data type error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary comment

@glemaitre
Copy link
Member

LGTM +1 after removing the comment

@agramfort agramfort merged commit b93be1f into scikit-learn:master Jun 8, 2017
@agramfort
Copy link
Member

thx @glemaitre

i'll remove the comment in master

@amanp10 amanp10 deleted the Branch1 branch June 8, 2017 15:58
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
* Issue#5803 Regression Test added

* Float predictions values used for testing

* Custom Classifier used for test
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
* Issue#5803 Regression Test added

* Float predictions values used for testing

* Custom Classifier used for test
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
* Issue#5803 Regression Test added

* Float predictions values used for testing

* Custom Classifier used for test
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
* Issue#5803 Regression Test added

* Float predictions values used for testing

* Custom Classifier used for test
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
* Issue#5803 Regression Test added

* Float predictions values used for testing

* Custom Classifier used for test
AishwaryaRK pushed a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017
* Issue#5803 Regression Test added

* Float predictions values used for testing

* Custom Classifier used for test
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
* Issue#5803 Regression Test added

* Float predictions values used for testing

* Custom Classifier used for test
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
* Issue#5803 Regression Test added

* Float predictions values used for testing

* Custom Classifier used for test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error when using VotingClassifier on data with more than 2 classes
5 participants