fixed a bug in isotonic regression #9484

Merged

merged 22 commits into scikit-learn:master on Sep 13, 2017

Conversation

5 participants
dallascard (Contributor) commented Aug 2, 2017

Reference Issue

Fixes #9432

What does this implement/fix? Explain your changes.

There was an apparent bug in the preprocessing step for isotonic regression (_make_unique): the weights for duplicate points were averaged rather than summed, effectively down-weighting duplicate points in the actual isotonic regression. This fix changes the behaviour to summing the weights, so that isotonic regression now returns the correct predictions. However, I also implemented an option (do_weight_averaging; default=False) to preserve the previous behaviour.
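To make the mechanics concrete, here is a minimal sketch of this kind of duplicate-collapsing preprocessing; the function below is illustrative and is not the actual _make_unique implementation:

import numpy as np

def collapse_duplicates(x, y, w, average_weights=False):
    # Pool tied targets by their weighted mean; carry the pooled weight
    # forward either summed (the fix) or averaged (the old behaviour).
    x_unique = np.unique(x)
    y_out = np.empty(len(x_unique))
    w_out = np.empty(len(x_unique))
    for i, xi in enumerate(x_unique):
        mask = x == xi
        y_out[i] = np.average(y[mask], weights=w[mask])
        w_out[i] = w[mask].mean() if average_weights else w[mask].sum()
    return x_unique, y_out, w_out

For x = [0, 1, 1, 2, 3] with unit weights, summing yields pooled weights [1, 2, 1, 1], while averaging yields [1, 1, 1, 1], which is what down-weighted the duplicates.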

Any other comments?

Here is a simple example to demonstrate the change:

import numpy as np
from sklearn.isotonic import IsotonicRegression

# Two duplicate points at x = 1 with conflicting targets.
x = np.array([0, 1, 1, 2, 3])
y = np.array([0, 0, 1, 0, 1])
weights = np.ones(len(y))
ir = IsotonicRegression()
ir.fit(x, y, sample_weight=weights)
y_pred = ir.predict(x)
print("Predictions: %s" % str(y_pred))
# Weighted squared error of the fit.
squared_error = np.dot(weights, (y - y_pred) ** 2)
print("squared error = %0.3f" % squared_error)

Under the old code, this produced:

Predictions: [ 0. 0.25 0.25 0.25 1. ]
squared error = 0.688

After the update, this produces:

Predictions: [ 0. 0.33333333 0.33333333 0.33333333 1. ]
squared error = 0.667

Setting do_weight_averaging to True (i.e. ir = IsotonicRegression(do_weight_averaging=True)) restores the original result.
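As a hand check of these numbers: collapsing the duplicate x = 1 pools its targets to (0 + 1) / 2 = 0.5, with weight 2 under summing or weight 1 under averaging. The pool-adjacent-violators step then merges the out-of-order block {0.5, 0} at x = 1, 2: with summed weights this gives (2 × 0.5 + 1 × 0) / 3 = 1/3, matching the fixed output; with averaged weights it gives (1 × 0.5 + 1 × 0) / 2 = 0.25, reproducing the old, down-weighted result.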

lesteve (Member) commented Aug 3, 2017

Thanks a lot for the PR! You need to fix the Travis errors; they look genuine:

  • flake8 violations
  • docstring problems

> However, I also implemented an option (do_weight_averaging; default=False) to preserve the previous behaviour.

I don't know much about IsotonicRegression, but it does feel like this is a bug, so I would think that we do not need a parameter to preserve the previous behaviour.

lesteve (Member) commented Aug 3, 2017

Also, you will need to add a non-regression test.

lesteve (Member) commented Aug 3, 2017

On top of the smaller squared error, which is already quite convincing, I quickly checked with R, and it agrees with your fix:

x <- c(0, 1, 1, 2, 3)
y <- c(0, 0, 1, 0, 1)
m <- isoreg(x, y)
m$yf

Output:

[1] 0.0000000 0.3333333 0.3333333 0.3333333 1.0000000
ogrisel (Member) commented Aug 3, 2017

It indeed looks like a bug (which the R check you did seems to confirm). In that respect, I am not sure we need to introduce a new option to restore the old (buggy) behavior.

WDYT @amueller @jnothman?

In any case, @dallascard, please add your reproduction case as a new non-regression test in sklearn/tests/test_isotonic.py.
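For readers following along, a non-regression test for this case could look roughly like the following; the test name and exact assertions are illustrative, not necessarily what was merged:

import numpy as np
from numpy.testing import assert_array_almost_equal
from sklearn.isotonic import IsotonicRegression

def test_isotonic_duplicate_x_weights_summed():
    # Non-regression test for issue #9432: weights of duplicate x values
    # must be summed, not averaged, so tied points count fully in the fit.
    x = np.array([0, 1, 1, 2, 3])
    y = np.array([0, 0, 1, 0, 1])
    ir = IsotonicRegression().fit(x, y, sample_weight=np.ones(len(y)))
    assert_array_almost_equal(ir.predict(x), [0, 1 / 3, 1 / 3, 1 / 3, 1])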

dallascard (Contributor) commented Aug 3, 2017

Okay, will do.

I think it's a bug too, and that would certainly simplify the changes required, but I'm not sure what the person who originally authored the code had in mind.

jnothman (Member) commented Aug 5, 2017

I don't know much about isotonic regression, and am not able to investigate, though it looks like the author of that code was very intentional about averaging, so I can see why @dallascard is hesitant. But yes, if it's certainly a bug, it might not be necessary to provide an option.

I think @NelleV usually has input on isotonic.

[Outdated review thread on sklearn/_isotonic.pyx]
lesteve (Member) commented Aug 7, 2017

> though it looks like the author of that code was very intentional about averaging, so I can see why @dallascard is hesitant

Anything in particular that makes you say that? The vague feeling I had when I looked at it was that the Xs need to be averaged, and that the weights may have been averaged just by mistake, e.g. by copying and pasting a few lines of code.

dallascard (Contributor) commented Aug 7, 2017

Yeah, the more I think about it, the more I think this was a bug in the original code. Perhaps there is an alternate formulation of isotonic regression, but the documentation pretty clearly states that we are using the "secondary" method from de Leeuw (1977) for breaking ties. Averaging the weights seems to contradict both the objective function and the details of that paper.
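For context, the algebra here is standard (the notation below is editorial, not from the PR): weighted isotonic regression solves

\[
\min_{\hat y_1 \le \cdots \le \hat y_n} \; \sum_{i=1}^{n} w_i \,(y_i - \hat y_i)^2 ,
\]

and if a group \(G\) of tied points must share a single fitted value \(\hat y\), then

\[
\sum_{i \in G} w_i (y_i - \hat y)^2 \;=\; W(\bar y - \hat y)^2 + \text{const},
\qquad W = \sum_{i \in G} w_i, \quad \bar y = \frac{1}{W} \sum_{i \in G} w_i y_i ,
\]

so the collapsed point must carry the summed weight \(W\); replacing \(W\) with the mean weight shrinks the group's influence by a factor of \(|G|\).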

dallascard (Contributor) commented Aug 17, 2017

Okay, following the discussion above, I decided to remove the code to preserve the old behaviour, since it seemed to be a bug. The PR is now a much simpler change (it simply removes the weight averaging). Hopefully this is now ready to merge.

lesteve (Member):

This looks good to me. Can you fix the minor comments and add an entry in doc/whats_new.rst to document the breaking change?

[Outdated review thread on sklearn/isotonic.py]

Review comment on this snippet:

    https://github.com/scikit-learn/scikit-learn/issues/9432
    Compare against output in R:
    > library("isotone")

lesteve (Member) commented Aug 21, 2017

Not an R expert, but I would use something like this to make sure that the isotone package is installed:

require('isotone') || (install.packages('isotone') && require('isotone'))
dallascard (Contributor) commented Aug 22, 2017

Thanks for the feedback @lesteve! I've made the changes you suggested.

agramfort (Member) commented Aug 24, 2017

LGTM

Please update what's new and let's merge.

dallascard added some commits Sep 12, 2017

Revert "updating whats_new"
This reverts commit d843940.
dallascard (Contributor) commented Sep 13, 2017

Okay, sorry for the delay. I guess I missed the release of 0.19, but hopefully we can merge this into 0.20dev.

Also, sincere apologies for the last dozen commits. I got a bit confused by the splitting of the whats_new file, and then had to figure out how to undo some changes. But it all seems to be in good shape now.

lesteve (Member) commented Sep 13, 2017

@dallascard thanks for merging master into your branch! Kudos for figuring out the whats_new refactoring, and sorry for the pain!

I pushed a minor change and will merge this one when the CIs are green.

lesteve merged commit a01443b into scikit-learn:master on Sep 13, 2017

6 checks passed:

  • ci/circleci: Your tests passed on CircleCI!
  • codecov/patch: 100% of diff hit (target 96.17%)
  • codecov/project: 96.17% (+<.01%) compared to 21b0bb4
  • continuous-integration/appveyor/pr: AppVeyor build succeeded
  • continuous-integration/travis-ci/pr: The Travis CI build passed
  • lgtm analysis (Python): No alert changes
lesteve (Member) commented Sep 13, 2017

Merged, thanks a lot @dallascard!

massich added a commit to massich/scikit-learn that referenced this pull request Sep 15, 2017

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
