New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+2] Adding return_std options for models in linear_model/bayes.py #7838

Merged
merged 32 commits into from Dec 1, 2016

Conversation

Projects
None yet
5 participants
@sergeyf
Contributor

sergeyf commented Nov 7, 2016

Reference Issue

The reason for this pull request appears in a conversation for #4844

What does this implement/fix? Explain your changes.

This is the first of two pull requests. The ultimate goal is to add the MICE imputation algorithm to scikit-learn. To do so, we need sklearn's Bayesian regression algorithms to be able to return standard deviations as well as predictions.

This pull requests adds the option return_std to the predict methods of both BayesianRidge and ARDRegression.

Any other comments?

Once this is accepted, I will make a pull request that implements MICE using BayesianRidge by default (which seems more robust to small sample sizes than ARD in my limited experience).

@raghavrv raghavrv added this to the 0.19 milestone Nov 7, 2016

@raghavrv raghavrv added the New Feature label Nov 7, 2016

Show outdated Hide outdated sklearn/linear_model/bayes.py

@sergeyf sergeyf closed this Nov 7, 2016

@sergeyf sergeyf reopened this Nov 7, 2016

@sergeyf sergeyf changed the title from [MRG] Adding predict_std options for models in linear_model/bayes.py to [MRG] Adding return_std options for models in linear_model/bayes.py Nov 7, 2016

@sergeyf

This comment has been minimized.

Show comment
Hide comment
@sergeyf

sergeyf Nov 7, 2016

Contributor

@jnothman I changed all instances of predict_std to return_std. Thanks for pointing that one out.

Contributor

sergeyf commented Nov 7, 2016

@jnothman I changed all instances of predict_std to return_std. Thanks for pointing that one out.

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Nov 7, 2016

Member

Could you put the new figure in the fourth quadrant of the existing plots?

Member

jnothman commented Nov 7, 2016

Could you put the new figure in the fourth quadrant of the existing plots?

@sergeyf

This comment has been minimized.

Show comment
Hide comment
@sergeyf

sergeyf Nov 7, 2016

Contributor

The problem is that it the uncertainty grows linearly. I tried zooming out further, but the plot looks qualitatively the same.

One idea is to have a non-linear function f(x), and then use a polynomial kernel to estimate it. I'll give this a shot.

And yes, I'll put the figures in the fourth quadrant.

Thanks.

Contributor

sergeyf commented Nov 7, 2016

The problem is that it the uncertainty grows linearly. I tried zooming out further, but the plot looks qualitatively the same.

One idea is to have a non-linear function f(x), and then use a polynomial kernel to estimate it. I'll give this a shot.

And yes, I'll put the figures in the fourth quadrant.

Thanks.

@sergeyf

This comment has been minimized.

Show comment
Hide comment
@sergeyf

sergeyf Nov 8, 2016

Contributor

@jnothman I tried the polynomial regression thing. Check it out. It works better with BayesianRidge than with ARD, but the idea is there in both I think.

Contributor

sergeyf commented Nov 8, 2016

@jnothman I tried the polynomial regression thing. Check it out. It works better with BayesianRidge than with ARD, but the idea is there in both I think.

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Nov 8, 2016

Member

ARD:
ARD
BRR:
BRR

Inserted with:

ARD:
![ARD](https://$CIRCLE_BUILDNO-843222-gh.circle-artifacts.com/0/home/ubuntu/scikit-learn/doc/_build/html/stable/_images/sphx_glr_plot_ard_004.png)
BRR:   
![BRR](https://$CIRCLE_BUILDNO-843222-gh.circle-artifacts.com/0/home/ubuntu/scikit-learn/doc/_build/html/stable/_images/sphx_glr_plot_bayesian_ridge_004.png)
Member

jnothman commented Nov 8, 2016

ARD:
ARD
BRR:
BRR

Inserted with:

ARD:
![ARD](https://$CIRCLE_BUILDNO-843222-gh.circle-artifacts.com/0/home/ubuntu/scikit-learn/doc/_build/html/stable/_images/sphx_glr_plot_ard_004.png)
BRR:   
![BRR](https://$CIRCLE_BUILDNO-843222-gh.circle-artifacts.com/0/home/ubuntu/scikit-learn/doc/_build/html/stable/_images/sphx_glr_plot_bayesian_ridge_004.png)

Sergey Feldman added some commits Nov 8, 2016

@sergeyf

This comment has been minimized.

Show comment
Hide comment
@sergeyf

sergeyf Nov 8, 2016

Contributor

That's pretty weird. On my machine the ARD plot looks like this:

image

I guess it takes a huge portion of the 30 polynomial features and decides they are not "relevant." I changed it as follows: clf_poly = ARDRegression(threshold_lambda=1e5) Hopefully it won't get rid of all of these dimensions now.

Contributor

sergeyf commented Nov 8, 2016

That's pretty weird. On my machine the ARD plot looks like this:

image

I guess it takes a huge portion of the 30 polynomial features and decides they are not "relevant." I changed it as follows: clf_poly = ARDRegression(threshold_lambda=1e5) Hopefully it won't get rid of all of these dimensions now.

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Nov 8, 2016

Member

yes that looks better. I'll look at the code later.

ARD:
ARD
BRR:
BRR

Member

jnothman commented Nov 8, 2016

yes that looks better. I'll look at the code later.

ARD:
ARD
BRR:
BRR

sergeyf added some commits Nov 8, 2016

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Nov 8, 2016

Member

The main code looks good. I now think you should probably pull out those plots as a separate example specifically about return_std. Otherwise, there needs to be more emphasis (e.g. in the example docstring) on how the new plot relates to the rest.

Member

jnothman commented Nov 8, 2016

The main code looks good. I now think you should probably pull out those plots as a separate example specifically about return_std. Otherwise, there needs to be more emphasis (e.g. in the example docstring) on how the new plot relates to the rest.

@sergeyf

This comment has been minimized.

Show comment
Hide comment
@sergeyf

sergeyf Nov 10, 2016

Contributor

Hey @jnothman sorry for the delay. I hit a big patch of work, so I took the easy way out and added a few sentences to the doc string.

Contributor

sergeyf commented Nov 10, 2016

Hey @jnothman sorry for the delay. I hit a big patch of work, so I took the easy way out and added a few sentences to the doc string.

Show outdated Hide outdated sklearn/linear_model/bayes.py
@sergeyf

This comment has been minimized.

Show comment
Hide comment
@sergeyf

sergeyf Nov 16, 2016

Contributor

@jnothman I've added your latest suggestions. Thanks for your patience, and sorry about my uneducated documentation style.

Contributor

sergeyf commented Nov 16, 2016

@jnothman I've added your latest suggestions. Thanks for your patience, and sorry about my uneducated documentation style.

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Nov 16, 2016

Member

No apologies required!

Member

jnothman commented Nov 16, 2016

No apologies required!

@jnothman jnothman changed the title from [MRG] Adding return_std options for models in linear_model/bayes.py to [MRG+1] Adding return_std options for models in linear_model/bayes.py Nov 16, 2016

@sergeyf

This comment has been minimized.

Show comment
Hide comment
@sergeyf

sergeyf Nov 20, 2016

Contributor

@jnothman One of the checks is failing. It says FAIL: sklearn.linear_model.tests.test_logistic.test_multinomial_logistic_regression_string_inputs but I didn't modify that test at all. Why did it start failing?

Contributor

sergeyf commented Nov 20, 2016

@jnothman One of the checks is failing. It says FAIL: sklearn.linear_model.tests.test_logistic.test_multinomial_logistic_regression_string_inputs but I didn't modify that test at all. Why did it start failing?

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Nov 20, 2016

Member

I agree those test failures don't appear to be your problem.

Member

jnothman commented Nov 20, 2016

I agree those test failures don't appear to be your problem.

@lesteve

This comment has been minimized.

Show comment
Hide comment
@lesteve

lesteve Nov 21, 2016

Member

@sergeyf you still have flake8 errors in Travis. My advice is to take some time to have on-the-fly flake8 linting in your editor of choice, it will save you a lot of time in the long run.

Member

lesteve commented Nov 21, 2016

@sergeyf you still have flake8 errors in Travis. My advice is to take some time to have on-the-fly flake8 linting in your editor of choice, it will save you a lot of time in the long run.

@sergeyf

This comment has been minimized.

Show comment
Hide comment
@sergeyf

sergeyf Nov 21, 2016

Contributor

@lesteve Thanks for the suggestion. I use Spyder - I think it has pyflakes integration. I'll dig around. The last two should be fixed now in any case.

Contributor

sergeyf commented Nov 21, 2016

@lesteve Thanks for the suggestion. I use Spyder - I think it has pyflakes integration. I'll dig around. The last two should be fixed now in any case.

@amueller

Looks ok, haven't checked the math (yet?)

Show outdated Hide outdated sklearn/linear_model/bayes.py
Show outdated Hide outdated sklearn/linear_model/bayes.py
Show outdated Hide outdated sklearn/linear_model/bayes.py
Show outdated Hide outdated sklearn/linear_model/bayes.py
return np.dot(X, w) + b
def f_noise(X, noise_mult):
return f(X) + np.random.randn(X.shape[0])*noise_mult

This comment has been minimized.

@amueller

amueller Nov 23, 2016

Member

is that pep8 without space around *? hm

@amueller

amueller Nov 23, 2016

Member

is that pep8 without space around *? hm

This comment has been minimized.

@sergeyf
@sergeyf

This comment has been minimized.

@amueller

amueller Nov 30, 2016

Member

fair

@amueller
Show outdated Hide outdated sklearn/linear_model/tests/test_bayes.py
Show outdated Hide outdated sklearn/linear_model/tests/test_bayes.py
Sergey Feldman
@sergeyf

This comment has been minimized.

Show comment
Hide comment
@sergeyf

sergeyf Nov 24, 2016

Contributor

@amueller I've made the requested fixes. Thanks!

Contributor

sergeyf commented Nov 24, 2016

@amueller I've made the requested fixes. Thanks!

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Nov 30, 2016

Member

can you fix pep8? Integration is failing.

Member

amueller commented Nov 30, 2016

can you fix pep8? Integration is failing.

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Nov 30, 2016

Member

Other than that LGTM

Member

amueller commented Nov 30, 2016

Other than that LGTM

@amueller amueller changed the title from [MRG+1] Adding return_std options for models in linear_model/bayes.py to [MRG+2] Adding return_std options for models in linear_model/bayes.py Nov 30, 2016

@sergeyf

This comment has been minimized.

Show comment
Hide comment
@sergeyf

sergeyf Nov 30, 2016

Contributor

Thanks @amueller, just fixed that last one.

Contributor

sergeyf commented Nov 30, 2016

Thanks @amueller, just fixed that last one.

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Dec 1, 2016

Member

thanks!

Member

amueller commented Dec 1, 2016

thanks!

@amueller amueller merged commit 67a85b8 into scikit-learn:master Dec 1, 2016

2 of 3 checks passed

continuous-integration/appveyor/pr AppVeyor build failed
Details
ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@sergeyf sergeyf deleted the sergeyf:MICE branch Dec 2, 2016

adis300 added a commit to adis300/scikit-learn that referenced this pull request Dec 13, 2016

[MRG+2] Adding return_std options for models in linear_model/bayes.py (
…#7838)

* initial commit for return_std

* initial commit for return_std

* adding tests, examples, ARD predict_std

* adding tests, examples, ARD predict_std

* a smidge more documentation

* a smidge more documentation

* Missed a few PEP8 issues

* Changing predict_std to return_std #1

* Changing predict_std to return_std #2

* Changing predict_std to return_std #3

* Changing predict_std to return_std final

* adding better plots via polynomial regression

* trying to fix flake error

* fix to ARD plotting issue

* fixing some flakes

* Two blank lines part 1

* Two blank lines part 2

* More newlines!

* Even more newlines

* adding info to the doc string for the two plot files

* Rephrasing "polynomial" for Bayesian Ridge Regression

* Updating "polynomia" for ARD

* Adding more formal references

* Another asked-for improvement to doc string.

* Fixing flake8 errors

* Cleaning up the tests a smidge.

* A few more flakes

* requested fixes from Andy

* Mini bug fix

* Final pep8 fix

* pep8 fix round 2

* Fix beta_ to alpha_ in the comments

sergeyf added a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017

[MRG+2] Adding return_std options for models in linear_model/bayes.py (
…#7838)

* initial commit for return_std

* initial commit for return_std

* adding tests, examples, ARD predict_std

* adding tests, examples, ARD predict_std

* a smidge more documentation

* a smidge more documentation

* Missed a few PEP8 issues

* Changing predict_std to return_std #1

* Changing predict_std to return_std #2

* Changing predict_std to return_std #3

* Changing predict_std to return_std final

* adding better plots via polynomial regression

* trying to fix flake error

* fix to ARD plotting issue

* fixing some flakes

* Two blank lines part 1

* Two blank lines part 2

* More newlines!

* Even more newlines

* adding info to the doc string for the two plot files

* Rephrasing "polynomial" for Bayesian Ridge Regression

* Updating "polynomia" for ARD

* Adding more formal references

* Another asked-for improvement to doc string.

* Fixing flake8 errors

* Cleaning up the tests a smidge.

* A few more flakes

* requested fixes from Andy

* Mini bug fix

* Final pep8 fix

* pep8 fix round 2

* Fix beta_ to alpha_ in the comments

Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017

[MRG+2] Adding return_std options for models in linear_model/bayes.py (
…#7838)

* initial commit for return_std

* initial commit for return_std

* adding tests, examples, ARD predict_std

* adding tests, examples, ARD predict_std

* a smidge more documentation

* a smidge more documentation

* Missed a few PEP8 issues

* Changing predict_std to return_std #1

* Changing predict_std to return_std #2

* Changing predict_std to return_std #3

* Changing predict_std to return_std final

* adding better plots via polynomial regression

* trying to fix flake error

* fix to ARD plotting issue

* fixing some flakes

* Two blank lines part 1

* Two blank lines part 2

* More newlines!

* Even more newlines

* adding info to the doc string for the two plot files

* Rephrasing "polynomial" for Bayesian Ridge Regression

* Updating "polynomia" for ARD

* Adding more formal references

* Another asked-for improvement to doc string.

* Fixing flake8 errors

* Cleaning up the tests a smidge.

* A few more flakes

* requested fixes from Andy

* Mini bug fix

* Final pep8 fix

* pep8 fix round 2

* Fix beta_ to alpha_ in the comments

NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017

[MRG+2] Adding return_std options for models in linear_model/bayes.py (
…#7838)

* initial commit for return_std

* initial commit for return_std

* adding tests, examples, ARD predict_std

* adding tests, examples, ARD predict_std

* a smidge more documentation

* a smidge more documentation

* Missed a few PEP8 issues

* Changing predict_std to return_std #1

* Changing predict_std to return_std #2

* Changing predict_std to return_std #3

* Changing predict_std to return_std final

* adding better plots via polynomial regression

* trying to fix flake error

* fix to ARD plotting issue

* fixing some flakes

* Two blank lines part 1

* Two blank lines part 2

* More newlines!

* Even more newlines

* adding info to the doc string for the two plot files

* Rephrasing "polynomial" for Bayesian Ridge Regression

* Updating "polynomia" for ARD

* Adding more formal references

* Another asked-for improvement to doc string.

* Fixing flake8 errors

* Cleaning up the tests a smidge.

* A few more flakes

* requested fixes from Andy

* Mini bug fix

* Final pep8 fix

* pep8 fix round 2

* Fix beta_ to alpha_ in the comments

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

[MRG+2] Adding return_std options for models in linear_model/bayes.py (
…#7838)

* initial commit for return_std

* initial commit for return_std

* adding tests, examples, ARD predict_std

* adding tests, examples, ARD predict_std

* a smidge more documentation

* a smidge more documentation

* Missed a few PEP8 issues

* Changing predict_std to return_std #1

* Changing predict_std to return_std #2

* Changing predict_std to return_std #3

* Changing predict_std to return_std final

* adding better plots via polynomial regression

* trying to fix flake error

* fix to ARD plotting issue

* fixing some flakes

* Two blank lines part 1

* Two blank lines part 2

* More newlines!

* Even more newlines

* adding info to the doc string for the two plot files

* Rephrasing "polynomial" for Bayesian Ridge Regression

* Updating "polynomia" for ARD

* Adding more formal references

* Another asked-for improvement to doc string.

* Fixing flake8 errors

* Cleaning up the tests a smidge.

* A few more flakes

* requested fixes from Andy

* Mini bug fix

* Final pep8 fix

* pep8 fix round 2

* Fix beta_ to alpha_ in the comments

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

[MRG+2] Adding return_std options for models in linear_model/bayes.py (
…#7838)

* initial commit for return_std

* initial commit for return_std

* adding tests, examples, ARD predict_std

* adding tests, examples, ARD predict_std

* a smidge more documentation

* a smidge more documentation

* Missed a few PEP8 issues

* Changing predict_std to return_std #1

* Changing predict_std to return_std #2

* Changing predict_std to return_std #3

* Changing predict_std to return_std final

* adding better plots via polynomial regression

* trying to fix flake error

* fix to ARD plotting issue

* fixing some flakes

* Two blank lines part 1

* Two blank lines part 2

* More newlines!

* Even more newlines

* adding info to the doc string for the two plot files

* Rephrasing "polynomial" for Bayesian Ridge Regression

* Updating "polynomia" for ARD

* Adding more formal references

* Another asked-for improvement to doc string.

* Fixing flake8 errors

* Cleaning up the tests a smidge.

* A few more flakes

* requested fixes from Andy

* Mini bug fix

* Final pep8 fix

* pep8 fix round 2

* Fix beta_ to alpha_ in the comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment