
Add common tests for consistent decision_function behavior in binary case #10175

Closed

amueller opened this issue Nov 20, 2017 · 12 comments

@amueller
Member

I think there is no common test right now to check that the decision_function behavior is consistent across all estimators. We check that it is consistent with predict_proba and predict if the classes are [0, 1], but I don't think there is a test for [-1, 1] or arbitrary strings. I vaguely recall hard-coded cases for this, so I think adding a test would be really good.
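
For concreteness, a minimal sketch of what such a common check could assert, assuming a binary classifier that exposes decision_function (the helper name check_decision_function_predict is made up for illustration):

import numpy as np

def check_decision_function_predict(classifier, X, y):
    # Sketch of a common check: for a fitted binary classifier, the sign
    # of decision_function should select the same label as predict,
    # whatever the class labels are ([0, 1], [-1, 1], strings, ...).
    classifier.fit(X, y)
    # binary decision_function should be 1-D (or a single column)
    decision = np.ravel(classifier.decision_function(X))
    # positive scores map to classes_[1], non-positive to classes_[0]
    expected = classifier.classes_[(decision > 0).astype(int)]
    np.testing.assert_array_equal(expected, classifier.predict(X))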

@NarineK
Contributor

NarineK commented Nov 26, 2017

Do you still need help with this issue? May I work on it, @amueller?

@jnothman
Member

Sure you may.

@NarineK
Contributor

NarineK commented Dec 15, 2017

@amueller
Member Author

That test already tests something like this, but uses integers for classes.
You could check out https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/estimator_checks.py#L1394, which is multi-class, and maybe add a binary case and also check the decision function there?
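
A hedged sketch of such an extension, reusing the pattern of the existing multi-class check but looping over several binary label encodings (the function name is hypothetical):

import numpy as np
from sklearn.datasets import make_blobs

def check_classifiers_classes_binary(classifier):
    # Sketch: fit on the same binary blobs under several label
    # encodings and assert decision_function agrees with predict.
    X, y = make_blobs(n_samples=30, centers=2, random_state=0)
    for labels in ([0, 1], [-1, 1], ["neg", "pos"]):
        y_mapped = np.asarray(labels)[y]
        classifier.fit(X, y_mapped)
        decision = np.ravel(classifier.decision_function(X))
        # classes_ is sorted, so classes_[1] is the "positive" class
        predicted = classifier.classes_[(decision > 0).astype(int)]
        np.testing.assert_array_equal(predicted, classifier.predict(X))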

@NarineK
Contributor

NarineK commented Dec 15, 2017

I see. Let me check that one.

@NarineK
Contributor

NarineK commented Dec 28, 2017

As I was writing the tests, I noticed that the output of decision_function for NuSVC and SVC is inconsistent with predict when decision_function_shape='ovo'. It works fine with 'ovr'.
Here is an example:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.svm import NuSVC
from sklearn.utils import shuffle
from sklearn.utils.estimator_checks import pairwise_estimator_convert_X
from sklearn.utils.testing import set_random_state

classifier = NuSVC(decision_function_shape='ovo')
X, y = make_blobs(n_samples=30, random_state=0, cluster_std=0.1)
X, y = shuffle(X, y, random_state=7)
X = StandardScaler().fit_transform(X)
X -= X.min() - .1
X = pairwise_estimator_convert_X(X, classifier)
y_names = np.array(["one", "two", "three"])[y]
set_random_state(classifier)
classifier.fit(X, y_names)
y_pred = classifier.predict(X)

# take the argmax over the decision_function columns and map back to labels
decision = classifier.decision_function(X)
decision_y = np.argmax(decision, axis=1).astype(int)
dec_func = classifier.classes_[decision_y]

Output:

dec_func:

array(['two', 'three', 'two', 'two', 'two', 'one', 'two', 'one', 'one',
       'three', 'two', 'one', 'one', 'three', 'two', 'one', 'one', 'one',
       'two', 'one', 'one', 'three', 'two', 'three', 'three', 'two',
       'three', 'one', 'one', 'one'],
      dtype='|S5')

y_pred:

array(['three', 'one', 'three', 'three', 'three', 'one', 'three', 'one',
       'two', 'one', 'three', 'two', 'two', 'one', 'three', 'two', 'two',
       'two', 'three', 'two', 'two', 'one', 'three', 'one', 'one', 'three',
       'one', 'two', 'two', 'one'],
      dtype='|S5')

Are you aware of this, @amueller @jnothman?
Should I exclude it from the tests?
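
For what it's worth, with decision_function_shape='ovo' the columns of the decision matrix are pairwise comparisons, not per-class scores; with 3 classes both shapes happen to be (n_samples, 3), which hides the mismatch. A hedged sketch of the vote counting that predict effectively performs, assuming libsvm's convention that a positive value favours the first class of a pair:

import numpy as np

def ovo_decision_to_predictions(classifier, decision):
    # Columns of an 'ovo' decision matrix correspond to class pairs
    # (i, j) with i < j, in the order (0,1), (0,2), ..., (1,2), ...
    n_classes = len(classifier.classes_)
    votes = np.zeros((decision.shape[0], n_classes))
    k = 0
    for i in range(n_classes):
        for j in range(i + 1, n_classes):
            # a positive value votes for class i, otherwise class j
            votes[decision[:, k] > 0, i] += 1
            votes[decision[:, k] <= 0, j] += 1
            k += 1
    return classifier.classes_[np.argmax(votes, axis=1)]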

@jnothman
Member

jnothman commented Dec 31, 2017 via email

@NarineK
Contributor

NarineK commented Dec 31, 2017

I didn't find an existing issue for exactly this, so I created #10388.
Should I skip the SVC test cases or wait for the issue to be fixed?

@jnothman
Member

jnothman commented Jan 1, 2018 via email

@NarineK
Contributor

NarineK commented Jan 2, 2018

I see, I'll do the checks only for binary then.

@NarineK
Contributor

NarineK commented Feb 2, 2018

@jnothman, @amueller, have you had time to take a look at my pull request?

@jnothman
Member

jnothman commented Feb 3, 2018

Sorry, I'm not sure how this flew under my radar. Will try to take a look soon.
