[MRG + 1] Fix numerical instability in LassoLars when alpha=0 (#7778) #7849
@agramfort, @tguillemot, thank you very much for all your help and advice. This is my first pull request to sklearn, and I'm very happy to start contributing to this amazing project!
@jmontoyam Yohooo! Looking forward to more such awesome PRs :)
coefs = np.resize(coefs, (n_iter + add_features, n_features))
alphas = np.resize(alphas, n_iter + add_features)
coefs = np.concatenate((coefs,
np.zeros((add_features, n_features))),
Could you add a space before np...
to match the indentation
@raghavrv Thank you very much! I have added the space you suggested ;)
alphas = np.resize(alphas, n_iter + add_features)
coefs = np.concatenate((coefs,
np.zeros((add_features, n_features))),
axis=0)
can you bench if this is faster:
coefs = np.resize(coefs, (n_iter + add_features, n_features))
coefs[:-add_features, :] = 0.
I think you mean:
coefs[-add_features:] = 0
The version you propose is a little bit faster, but the difference is tiny: approx. 0.2 ms according to my toy benchmark.
I will modify the code following your advice ;)
Thanks!
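For readers following the thread, here is a minimal sketch of the two approaches being compared. The helper names `grow_with_concatenate` and `grow_with_resize` and the toy array are assumptions made for illustration; they are not taken from least_angle.py. It only checks that both approaches produce the same array. It does not reproduce the timing mentioned above.

```python
import numpy as np


def grow_with_concatenate(coefs, add_features):
    """Append add_features rows of zeros (hypothetical helper)."""
    n_features = coefs.shape[1]
    return np.concatenate((coefs, np.zeros((add_features, n_features))),
                          axis=0)


def grow_with_resize(coefs, add_features):
    """Grow with np.resize, then zero the new trailing rows.

    Assumes add_features > 0: with add_features == 0 the slice
    out[-0:] would select (and zero) the whole array.
    """
    n_rows, n_features = coefs.shape
    out = np.resize(coefs, (n_rows + add_features, n_features))
    out[-add_features:] = 0  # the LAST add_features rows are the new ones
    return out


# Toy data standing in for the coefficient path.
coefs = np.arange(1.0, 13.0).reshape(4, 3)
a = grow_with_concatenate(coefs, 2)
b = grow_with_resize(coefs, 2)

# The slice from the first suggestion, coefs[:-add_features, :] = 0.,
# would instead zero the existing rows and keep the repeated ones.
wrong = np.resize(coefs, (6, 3))
wrong[:-2, :] = 0.

print(np.array_equal(a, b))
print(np.array_equal(a, wrong))
```

Both helpers return a new array and leave the input untouched, so either can replace the other as long as `add_features` is positive.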
thx @jmontoyam can you check why the tests don't pass?
LGTM, can you just update the what's new page to document the bug fix? Thanks heaps!
Thank you for figuring this out! It's so much help...
@agramfort @raghavrv
Thanks!
@amueller Maybe you can squeeze this into 0.18.1
@jmontoyam Congrats on your first PR!!!
@tguillemot, thank you very much for all the advice and suggestions, and for very kindly answering all the beginner questions that I asked you :)
…-learn#7778) (scikit-learn#7849)
* Fix bug 7778
* Add test_lasso_lars_vs_R_implementation
* Add a space to match the indentation
* Solve E501 line too long (80 > 79 characters)
* assert_array_almost_equal up to 12 decimals
* Tiny modification for increasing performance
* Update what's new page
* Trying to solve conflicts
* Solve conflict in doc/whats_new.rst
This pull request fixes issue #7778: the sklearn LassoLars implementation does not give the same result as the LassoLars implementation available in R (lars library).
What does this implement/fix? Explain your changes.
This mismatch is due to a bug in the file least_angle.py, in lines 406 and 407 (lars_path function).
The bug is related to the way these two lines use the np.resize function.
According to the docs of the np.resize function, "If the new array is larger than the original array, then the new array is filled with repeated copies of a". This causes a subtle error, because the lars_path implementation uses those new values as if they were equal to zero.
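The behaviour described above can be reproduced in isolation. This is a minimal sketch on a toy array; the values and the variable names `resized` and `padded` are assumptions chosen for illustration, not the actual data or code from lars_path.

```python
import numpy as np

# Hypothetical stand-in for the coefficient path array in lars_path.
coefs = np.arange(1.0, 7.0).reshape(2, 3)
add_features = 2
n_rows, n_features = coefs.shape

# np.resize (the function) fills the extra rows with repeated copies
# of the original data, not with zeros.
resized = np.resize(coefs, (n_rows + add_features, n_features))

# Concatenating an explicit block of zeros gives the padding that the
# rest of the algorithm expects.
padded = np.concatenate((coefs, np.zeros((add_features, n_features))),
                        axis=0)

print(resized[-add_features:])  # copies of the original rows
print(padded[-add_features:])   # zeros
```

Note that the `ndarray.resize` method behaves differently from the `np.resize` function: the method pads with zeros, the function repeats the data.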