Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG + 1] Fix numerical instability in LassoLars when alpha=0 (#7778) #7849

Merged
merged 10 commits into from Nov 12, 2016

Conversation

jmontoyam
Copy link

This pull request fixes issue #7778: sklearn LassoLars implementation does not give the same result that the LassoLars implementation available in R (lars library).

What does this implement/fix? Explain your changes.

This mismatch is due to a bug in the file least_angle.py, in lines 406 and 407 (lars_path method).
The bug is related to the way these two lines use the np.resize function.
According to the docs of the np.resize function, If the new array is larger than the original array, then the new array is filled with repeated copies of a, which causes a subtle error, because in the lars_path implementation those new values are used as if they were equal to zero.

@jmontoyam
Copy link
Author

@agramfort, @tguillemot, thank you very much for all your help and advices. This is my first pull request to sklearn, I'm very happy to start contributing to this amazing project!

@raghavrv raghavrv changed the title Fix bug 7778 [MRG] Fix numerical instability in LassoLars when alpha=0 (#7778) Nov 9, 2016
@raghavrv raghavrv added this to the 0.19 milestone Nov 9, 2016
@raghavrv raghavrv added the Bug label Nov 9, 2016
@raghavrv
Copy link
Member

raghavrv commented Nov 9, 2016

@jmontoyam Yohooo! Looking forward to more such awesome PRs :)

coefs = np.resize(coefs, (n_iter + add_features, n_features))
alphas = np.resize(alphas, n_iter + add_features)
coefs = np.concatenate((coefs,
np.zeros((add_features, n_features))),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a space before np... to match the indentation

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@raghavrv Thank you very much!, I have added the space you suggested me ;)

Copy link
Author

@jmontoyam jmontoyam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@raghavrv Thank you very much!, I have added the space you suggested me ;)

coefs = np.resize(coefs, (n_iter + add_features, n_features))
alphas = np.resize(alphas, n_iter + add_features)
coefs = np.concatenate((coefs,
np.zeros((add_features, n_features))),
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@raghavrv Thank you very much!, I have added the space you suggested me ;)

alphas = np.resize(alphas, n_iter + add_features)
coefs = np.concatenate((coefs,
np.zeros((add_features, n_features))),
axis=0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you bench if this is faster:

coefs = np.resize(coefs, (n_iter + add_features, n_features))
coefs[:-add_features, :] = 0.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mean:
coefs[-add_features:] = 0

The version you propose is a little bit faster (but the difference is very tiny, it is approx. 0.2ms faster according to my toy benchmark).

I will modify the code following your advice ;)

Thanks!

@agramfort
Copy link
Member

thx @jmontoyam

can you check why the tests don't pass?

@agramfort
Copy link
Member

LGTM

can you just update what's new page to document the bug fix?

thanks heaps !

@raghavrv raghavrv changed the title [MRG] Fix numerical instability in LassoLars when alpha=0 (#7778) [MRG + 1] Fix numerical instability in LassoLars when alpha=0 (#7778) Nov 10, 2016
@srmcc
Copy link

srmcc commented Nov 10, 2016

Thank you for figuring this out! It's so much help...

@jmontoyam
Copy link
Author

@agramfort @raghavrv
Please help me:
Conflicting files: doc/whats_new.rst is
This is because this file has changed since the time I made the PR.
Please excuse me!, I know this is a very beginner question: what do you suggest me to do to solve this error?
If I update the upstream and at the same time some else modify again this file, there will be a conflict again, right?

@jmontoyam
Copy link
Author

@agramfort @raghavrv
I think I have solved the conflict...but I don't know if my solution was the cleanest one :)

@NelleV NelleV dismissed raghavrv’s stale review November 12, 2016 02:50

Review is out of date.

@NelleV NelleV merged commit 5230382 into scikit-learn:master Nov 12, 2016
@NelleV
Copy link
Member

NelleV commented Nov 12, 2016

Thanks!

@raghavrv raghavrv modified the milestones: 0.18.1, 0.19 Nov 12, 2016
@raghavrv
Copy link
Member

@amueller Maybe you can sqeeze this into 0.18.1

@tguillemot
Copy link
Contributor

@jmontoyam Congrats for your first PR !!!

@jmontoyam
Copy link
Author

@tguillemot , thank you very much for all the advices and suggestions!, and for answering very kindly all the beginner questions that I asked you :)

sergeyf pushed a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017
…-learn#7778) (scikit-learn#7849)

* Fix bug 7778

* Add test_lasso_lars_vs_R_implementation

* Add a space to match the indentation

* Solve E501 line too long (80 > 79 characters)

* assert_array_almost_equal up to 12 decimals

* Tiny modification for increasing performance

* Update what's new page

* Trying to solve conflicts

* Solve conflict in doc/whats_new.rst
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
…-learn#7778) (scikit-learn#7849)

* Fix bug 7778

* Add test_lasso_lars_vs_R_implementation

* Add a space to match the indentation

* Solve E501 line too long (80 > 79 characters)

* assert_array_almost_equal up to 12 decimals

* Tiny modification for increasing performance

* Update what's new page

* Trying to solve conflicts

* Solve conflict in doc/whats_new.rst
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
…-learn#7778) (scikit-learn#7849)

* Fix bug 7778

* Add test_lasso_lars_vs_R_implementation

* Add a space to match the indentation

* Solve E501 line too long (80 > 79 characters)

* assert_array_almost_equal up to 12 decimals

* Tiny modification for increasing performance

* Update what's new page

* Trying to solve conflicts

* Solve conflict in doc/whats_new.rst
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
…-learn#7778) (scikit-learn#7849)

* Fix bug 7778

* Add test_lasso_lars_vs_R_implementation

* Add a space to match the indentation

* Solve E501 line too long (80 > 79 characters)

* assert_array_almost_equal up to 12 decimals

* Tiny modification for increasing performance

* Update what's new page

* Trying to solve conflicts

* Solve conflict in doc/whats_new.rst
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
…-learn#7778) (scikit-learn#7849)

* Fix bug 7778

* Add test_lasso_lars_vs_R_implementation

* Add a space to match the indentation

* Solve E501 line too long (80 > 79 characters)

* assert_array_almost_equal up to 12 decimals

* Tiny modification for increasing performance

* Update what's new page

* Trying to solve conflicts

* Solve conflict in doc/whats_new.rst
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants