
[MRG+2] Change ovr_decision_function scale #10440

Merged
merged 22 commits on Feb 28, 2019

Conversation

6 participants
@Johayon
Contributor

Johayon commented Jan 10, 2018

Reference Issues/PRs

Fixes #9174

What does this implement/fix? Explain your changes.

TODO:

  • add unit tests for _ovr_decision_function main properties

  • decision values stay near the vote values (within the (-.5, +.5) range)

  • can resolve a tie for a sample (information in decision_function row values)

  • higher confidence means higher decision values (information in decision_function column values)

  • subset invariance (not checked for the current implementation)

  • change the scaling implementation in _ovr_decision_function

  • x/(2*(abs(x)+1))
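The plan in the TODO list above can be sketched end to end: keep the integer votes and add a bounded, monotone transform of the confidence sums as a tie-breaker. This is only an illustration of the idea (the helper name `ovr_decision_sketch` is hypothetical; it is not the code that was merged into scikit-learn):

```python
import numpy as np

def ovr_decision_sketch(votes, sum_of_confidences):
    # Bounded, monotone transform of the confidence sums: values lie in
    # (-0.5, 0.5), so they can break ties between equal vote counts but
    # (in exact arithmetic) never reorder different vote counts.
    transformed = sum_of_confidences / (2 * (np.abs(sum_of_confidences) + 1))
    return votes + transformed

votes = np.array([[2.0, 2.0, 0.0]])   # classes 0 and 1 tied on votes
conf = np.array([[1.0, 4.0, -5.0]])   # class 1 has the higher confidence
dec = ovr_decision_sketch(votes, conf)
assert np.argmax(dec) == 1            # tie resolved by confidence
```

Because the added term stays strictly inside (-0.5, 0.5), the tie-breaking never overturns a one-vote difference.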

@Johayon

Contributor Author

Johayon commented Jan 10, 2018

@jnothman do you have any other unit tests that I could add for the behaviour of _ovr_decision_function?

Johayon added some commits Jan 10, 2018

max_abs_confidence = max(abs(max_confidences), abs(min_confidences))
scale = (0.5 - eps) / max_abs_confidence
return votes + sum_of_confidences * scale
sigmoid_confidences = ((np.exp(sum_of_confidences/10) - 1) /


@jnothman

jnothman Jan 10, 2018

Member

On what basis do you choose /10?


@Johayon

Johayon Jan 10, 2018

Author Contributor

Not sure myself. I am worried about the behaviour of the sigmoid if the sum_of_confidences values are high.


@Johayon

Johayon Jan 10, 2018

Author Contributor

maybe a simple function like

lambda x: x/(2*(np.abs(x)+1))

would be better, because we don't have to worry about overflow as with the exponential.
The only thing we need, to satisfy the conditions we want, is a strictly increasing function that maps (-inf, +inf) into (-.5, +.5).
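The two requirements stated here (strictly increasing, image inside (-.5, +.5)) are easy to spot-check numerically; a minimal sketch of the proposed mapping (illustrative only, not the PR's final code):

```python
import numpy as np

def squash(x):
    # Strictly increasing map from (-inf, +inf) into (-0.5, +0.5).
    x = np.asarray(x, dtype=float)
    return x / (2 * (np.abs(x) + 1))

vals = squash(np.array([-1e6, -1.0, 0.0, 1.0, 1e6]))
assert np.all(np.diff(vals) > 0)   # strictly increasing
assert np.all(np.abs(vals) < 0.5)  # stays inside (-0.5, 0.5)
```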

# as confidence is higher on the 3rd sample for the
# two last classifier, dec value should also be higher
# for the third class
assert_true(dec_values[2, 2] > dec_values[3, 2])


@jnothman

jnothman Jan 10, 2018

Member

I'm not sure I get this... Try rewording the comment?


@Johayon

Johayon Jan 10, 2018

Author Contributor

The idea here is to check that the ranks inside the columns are meaningful: a sample with higher confidence in the classification should have a higher decision value.

Johayon added some commits Jan 10, 2018

@Johayon Johayon changed the title [WIP] Change ovr_decision_function scale [MRG] Change ovr_decision_function scale Jan 11, 2018

@jnothman
Member

jnothman left a comment

LGTM!

scale = (0.5 - eps) / max_abs_confidence
return votes + sum_of_confidences * scale
transformed_confidences = (sum_of_confidences /
((2 + eps) * (np.abs(sum_of_confidences) + 1)))


@jnothman

jnothman Jan 15, 2018

Member

Nice choice of function!


@jnothman

jnothman Jan 15, 2018

Member

Maybe comment on the purpose of eps here. Is it to ensure we never reach the limits?


@Johayon

Johayon Jan 15, 2018

Author Contributor

yes, that's it.

@jnothman

Member

jnothman commented Jan 15, 2018

Please add an entry to the change log at doc/whats_new/v0.20.rst under bug fixes. Also please mention the changed decision_function behaviour in the changed models section up top. Like the other entries in the changelog, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

@jnothman jnothman changed the title [MRG] Change ovr_decision_function scale [MRG+1] Change ovr_decision_function scale Jan 15, 2018


if max_confidences == min_confidences:
return votes

# Scale the sum_of_confidences to (-0.5, 0.5) and add it with votes.


@jnothman

jnothman Jan 16, 2018

Member

scale -> monotonically transform

@albertcthomas
Contributor

albertcthomas left a comment

A few comments, otherwise LGTM. Thanks @Johayon! One small remark/question, more for my own understanding than for this PR: is it really necessary to add eps in the scaling? The fact that we use abs(x) + 1 should be sufficient to ensure that the returned values are in (-0.5, 0.5) for any finite x. I agree that the limits at - and + infinity would be -0.5 and 0.5 respectively, but are we going to observe this in practice?


- Fixed a bug in :func:`utils.multiclass._ovr_decision_function` which would give
different results when applied to a subset due to local scaling method.
:issue:`10440` :user:`Jonathan Ohayon <Johayon>`


@albertcthomas

albertcthomas Mar 11, 2018

Contributor

issue by user

assert_array_equal(np.argmax(dec_values, axis=1), expected_prediction)

# third and fourth sample have the same vote but third sample
# have higher confidence, this should reflect on the decision values


@albertcthomas

albertcthomas Mar 11, 2018

Contributor

has a higher confidence

assert_allclose(votes, dec_values, atol=0.5)

# check that the prediction are what we expect
# highest vote or highest confidence if there is a tie


@albertcthomas

albertcthomas Mar 11, 2018

Contributor

add a full stop after "if there is a tie" to indicate that the next line is a new sentence

@Johayon

Contributor Author

Johayon commented Mar 12, 2018

@albertcthomas because of float imprecision, it is possible that the returned value is equal to the upper (or lower) bound. It turns out that one epsilon is not even enough: it should be 2+2*eps, because 2+eps is still 2.0 as a floating-point number. I should have checked that.

import numpy as np
eps = np.finfo('float').eps

f = lambda x: x / (2 * (np.abs(x) + 1))
print(f(1e16) < 0.5, f(-1e16) > -0.5)   # False False

f = lambda x: x / ((2 + eps) * (np.abs(x) + 1))
print(f(1e16) < 0.5, f(-1e16) > -0.5)   # False False (2 + eps rounds back to 2.0)

f = lambda x: x / ((2 + 2 * eps) * (np.abs(x) + 1))
print(f(1e16) < 0.5, f(-1e16) > -0.5)   # True True

I will add a test to check that this is ok, it seems to be missing.

@albertcthomas

Contributor

albertcthomas commented Mar 12, 2018

Thanks for the clarification. I had just compared the first and second case of your snippet and observed that they seemed to be the same.

@albertcthomas

Contributor

albertcthomas commented Mar 12, 2018

So in master the values of sum_of_confidences are scaled to (-0.5, 0.5) whatever the min and max values of sum_of_confidences. I guess this was ensuring a kind of continuity of the returned _ovr_decision_function. With this PR and the suggested monotonically increasing transform, it seems to me that we are no longer ensuring this continuity, just preserving the rankings. In that case, can we use 1/3 instead of 1/2?

@Johayon

Contributor Author

Johayon commented Mar 12, 2018

@albertcthomas I don't follow what you mean by "continuity" here, but I think you are correct: we should use 1/3 in order to escape the float imprecision.

I changed one of the examples in the test to check for float imprecision; this test fails on both master and this PR if we take 1/2, even if we add an epsilon.
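The float-imprecision failure mode can be reproduced in a couple of lines (float64, illustrative only): at x = 1e16, abs(x) + 1 rounds back to abs(x), so the 1/2 scaling lands exactly on the open bound, while 1/3 keeps a wide margin:

```python
import numpy as np

x = 1e16                        # large enough that x + 1 == x in float64
assert np.abs(x) + 1 == np.abs(x)

half = x / (2 * (np.abs(x) + 1))
third = x / (3 * (np.abs(x) + 1))

assert half == 0.5              # reaches the bound exactly due to rounding
assert third < 0.5              # strictly inside, with a wide margin
```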

Johayon added some commits Mar 12, 2018

@albertcthomas

Contributor

albertcthomas commented Mar 12, 2018

Sorry, I was wrong about the continuity... If we just want to preserve the rankings, 1/3, although arbitrary, should be OK. I wonder if there was a good reason to choose 0.5...

scale = (0.5 - eps) / max_abs_confidence
return votes + sum_of_confidences * scale
# Use 1/3 instead of 1/2 to ensure that we won't reach the limits
transformed_confidences = (sum_of_confidences /


@amueller

amueller May 23, 2018

Member

Does this transformation have a name?

@amueller

Member

amueller commented May 23, 2018

Looks good, I think. Does this transform have a name? It's a bit ad hoc.

@jnothman

Member

jnothman commented May 23, 2018

@amueller

Member

amueller commented May 24, 2018

Fair. But why this and not a sigmoid, for example? I guess an arbitrary choice is fine. Can we at least make sure the user guide and docstring are useful ;)

@jnothman

Member

jnothman commented Dec 17, 2018

Just merge master to update the Circle config

@glemaitre glemaitre self-requested a review Jan 28, 2019

sdpython added a commit to sdpython/scikit-learn that referenced this pull request Feb 14, 2019

@albertcthomas

Contributor

albertcthomas commented Feb 25, 2019

@Johayon could you please rebase and fix the conflicts? I have good hopes that this PR could be merged this week.

@agramfort

Member

agramfort commented Feb 25, 2019

@albertcthomas you should be able to push on this PR and do the rebase yourself.

@albertcthomas

Contributor

albertcthomas commented Feb 25, 2019

@Johayon I don't have the rights to push commits to your PR. If you don't have time to do this, I can take over your PR (keeping all your commits) to finish it during the scikit-learn sprint currently happening in Paris.

@Johayon

Contributor Author

Johayon commented Feb 27, 2019

Hi,
sorry about the delay; this should resolve the conflicts in the estimator checks.
Have a good sprint.

....................
- |Fix| Fixed an issue in :func:`utils.multiclass._ovr_decision_function` which would give
different results when applied to a subset due to local scaling method.
:pr:`10440` by :user:`Jonathan Ohayon <Johayon>`


@agramfort

agramfort Feb 27, 2019

Member

you should document the public functions/classes impacted by this PR, not the private functions. The private functions are not part of the documentation.


@Johayon

Johayon Feb 27, 2019

Author Contributor

OK, is it better like this?

"one-against-one" classifiers to a decision function of shape ``(n_samples,
n_classes)``::
n_classes)`` using the function :func:`_ovr_decision_function` :ref:`multiclass`::


@agramfort

agramfort Feb 27, 2019

Member

still refers to a private function

:func:`utils.multiclass._ovr_decision_function`. :pr:`10440` by :user:`Jonathan Ohayon <Johayon>`

:mod:`sklearn.multiclass`
....................


@agramfort

agramfort Feb 27, 2019

Member

not enough ...


@Johayon

Johayon Feb 27, 2019

Author Contributor

should be fine this time.


@agramfort

agramfort Feb 27, 2019

Member

no. You should have as many "." as there are characters in the header title


@Johayon

Johayon Feb 27, 2019

Author Contributor

sorry, I did not know about that.

Johayon added some commits Feb 27, 2019

@albertcthomas

Contributor

albertcthomas commented Feb 27, 2019

Thanks @Johayon! Maybe @jnothman would like to have a second look, or @amueller?

..................

- |Fix| Fixed an issue in :func:`svm.SVC.decision_function` which would give
different results when applied to a subset due to local scaling method.


@jnothman

jnothman Feb 28, 2019

Member

I don't think this wording is especially clear, but I think the important thing to note is that results change only for decision_function_shape='ovr'


@Johayon

Johayon Feb 28, 2019

Author Contributor

maybe something like this :

Fixed an issue in :func:`svm.SVC.decision_function` which would give inconsistent
results for a sample due to a scaling method that depends on the dataset when
decision_function_shape='ovr'.


@albertcthomas

albertcthomas Feb 28, 2019

Contributor

Might be better to put when decision_function_shape='ovr' at the top if this is the important information, and to use the problem described in the original issue, something like

Fixed an issue in :func:`svm.SVC.decision_function` when decision_function_shape='ovr'.
The decision_function value of a given sample was different depending on whether the
decision_function was evaluated on the sample alone or on a batch containing this same
sample due to the scaling used in decision_function.
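The subset-dependence described here comes from the old batch-level scaling; the contrast with the elementwise transform can be sketched without sklearn (hypothetical one-dimensional confidences; old_scale is a simplified version of master's scaling, illustrative only):

```python
import numpy as np

def old_scale(c):
    # Simplified sketch of master: scale by the batch's max absolute
    # confidence, so each value depends on the other samples in the batch.
    return c * (0.5 / np.max(np.abs(c)))

def new_scale(c):
    # This PR: elementwise transform, independent of the rest of the batch.
    return c / (3 * (np.abs(c) + 1))

conf = np.array([-3.0, 0.5, 10.0])

# The old value for sample 0 changes when the sample is evaluated alone:
assert not np.isclose(old_scale(conf)[0], old_scale(conf[:1])[0])
# The new value does not:
assert np.isclose(new_scale(conf)[0], new_scale(conf[:1])[0])
```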

@agramfort agramfort changed the title [MRG+1] Change ovr_decision_function scale [MRG+2] Change ovr_decision_function scale Feb 28, 2019

@agramfort agramfort added this to Needs review in Sprint Paris 2019 Feb 28, 2019

@GaelVaroquaux
Member

GaelVaroquaux left a comment

LGTM. Merging. Thank you!

@GaelVaroquaux GaelVaroquaux merged commit d045c1a into scikit-learn:master Feb 28, 2019

10 checks passed

LGTM analysis: C/C++ (no code changes detected)
LGTM analysis: JavaScript (no code changes detected)
LGTM analysis: Python (no new or fixed alerts)
ci/circleci: deploy (passed)
ci/circleci: doc (passed)
ci/circleci: doc-min-dependencies (passed)
ci/circleci: lint (passed)
codecov/patch (100% of diff hit, target 92.51%)
codecov/project (92.55%, +0.04% compared to adc1e59)
continuous-integration/travis-ci/pr (passed)

Sprint Paris 2019 automation moved this from Needs review to Done Feb 28, 2019

@Johayon

Contributor Author

Johayon commented Feb 28, 2019

Thank you all.

@jnothman

Member

jnothman commented Feb 28, 2019

Kiku-git added a commit to Kiku-git/scikit-learn that referenced this pull request Mar 4, 2019

[MRG+2] Change ovr_decision_function scale (scikit-learn#10440)
* unit test for _ovr_decision_function

* sigmoid scale

* error on subset test

* clearer comment on test

* new scaling function forget sigmoid

* remove exception in subset check for SVC decision_function

* add comment and update whats_new

* update comment

* change test for float imprecision

* use 1/3 instead of 1/2

* merge conflict error

* flake8 correction and assert_true replacement

* better comment

* refer to public functions

* remove reference to private function

* rst dots

* change fix comments