-
-
Notifications
You must be signed in to change notification settings - Fork 25.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG + 2] Allow f_regression to accept a sparse matrix with centering #8065
Conversation
Allows f_regression to accept a sparse matrix when centering=True.
thanks this looks nice. We have a bit of a backlog on reviews though |
LGTM +1 for merge after a what's new update. thx @acadiansith |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pending a minor clarification (for my understanding), this LGTM. There is no algorithmic change and we now support sparse...
Thanks!!
n_samples = X.shape[0] | ||
|
||
# compute centered values | ||
# note that E[(x - mean(x))*(y - mean(y))] = E[x*(y - mean(y))], so we |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is applicable only when y is mean centered correct? In which case it would be E[x*y]
? (Sorry if I'm misunderstanding)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is applicable even when y
is not centered.
Yet you are right, here we compute E[x*(y - mean(y))
by first centering y
and then computing E[x*y]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes sorry my math is a bit rusty... Thanks heaps for the clarification (online and offline)!
This needs a whatsnew entry as observed by @agramfort ... |
Thanks @acadiansith |
…scikit-learn#8065) * Updated centering for f_regression Allows f_regression to accept a sparse matrix when centering=True. * Fixed E226 spacing issue. * Added f_regression sparse update to whats_new.rst
…scikit-learn#8065) * Updated centering for f_regression Allows f_regression to accept a sparse matrix when centering=True. * Fixed E226 spacing issue. * Added f_regression sparse update to whats_new.rst
…scikit-learn#8065) * Updated centering for f_regression Allows f_regression to accept a sparse matrix when centering=True. * Fixed E226 spacing issue. * Added f_regression sparse update to whats_new.rst
…scikit-learn#8065) * Updated centering for f_regression Allows f_regression to accept a sparse matrix when centering=True. * Fixed E226 spacing issue. * Added f_regression sparse update to whats_new.rst
…scikit-learn#8065) * Updated centering for f_regression Allows f_regression to accept a sparse matrix when centering=True. * Fixed E226 spacing issue. * Added f_regression sparse update to whats_new.rst
Reference Issue
N/A
What does this implement/fix? Explain your changes.
f_regression
currently doesn't accept sparse matrices whencenter=True
to avoid allocating a dense matrix of the centeredX
values, but the computation can be completed without this dense matrix. The numerators can take advantage of the observation thatE[(X - E[X])(Y - E[Y])] = E[X(Y - E[Y])]
, and the denominator can useE[(X - E[X])^2] = E[X^2] - E[X]^2
.I've also included a unit test to verify that the output is the same for sparse and dense versions of a matrix.
Any other comments?
The output is the same as before (I've checked by hand), but I don't have any tests included to confirm this.