[MRG + 2] Allow f_regression to accept a sparse matrix with centering #8065

Merged: 3 commits merged into scikit-learn:master on Dec 20, 2016


acadiansith

Reference Issue

N/A

What does this implement/fix? Explain your changes.

f_regression currently rejects sparse matrices when center=True, to avoid allocating a dense matrix of the centered X values. The computation does not actually need that dense matrix: the numerator can use the identity E[(X - E[X])(Y - E[Y])] = E[X(Y - E[Y])], and the denominator can use E[(X - E[X])^2] = E[X^2] - (E[X])^2.
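As a rough illustration of the two identities above, here is a minimal sketch that computes the per-feature Pearson correlation with y from a sparse X without ever building the centered dense matrix. The helper name `centered_corr_sparse` is hypothetical and is not the code in this PR; it assumes SciPy's `csr_matrix` and that no column of X is constant (a constant column would make the denominator zero).

```python
import numpy as np
from scipy import sparse


def centered_corr_sparse(X, y):
    """Correlation of each column of sparse X with y, without densifying X.

    Hypothetical helper for illustration only; not the scikit-learn code.
    """
    X = sparse.csr_matrix(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    n_samples = X.shape[0]

    # Center y only; y is a dense vector, so this is cheap.
    y_centered = y - y.mean()

    # Numerator: sum_i x_ij * (y_i - mean(y)), which equals the fully
    # centered cross-product sum_i (x_ij - mean_j) * (y_i - mean(y)).
    numerator = np.asarray(X.T @ y_centered).ravel()

    # Denominator: sum_i (x_ij - mean_j)^2 = sum_i x_ij^2 - n * mean_j^2.
    X_means = np.asarray(X.mean(axis=0)).ravel()
    X_sq_sums = np.asarray(X.multiply(X).sum(axis=0)).ravel()
    X_norms = np.sqrt(X_sq_sums - n_samples * X_means ** 2)

    # Assumes no constant columns; those would divide by zero here.
    return numerator / (X_norms * np.linalg.norm(y_centered))


# Compare against a plain dense computation on mostly-zero data.
rng = np.random.RandomState(0)
X_dense = rng.rand(50, 4) * (rng.rand(50, 4) < 0.3)
y = rng.rand(50)

corr_sparse = centered_corr_sparse(sparse.csr_matrix(X_dense), y)
corr_dense = np.array(
    [np.corrcoef(X_dense[:, j], y)[0, 1] for j in range(X_dense.shape[1])]
)
```

The two arrays should agree up to floating-point error. Only y is centered explicitly, so the sparsity pattern of X is never touched.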

I've also included a unit test to verify that the output is the same for sparse and dense versions of a matrix.

Any other comments?

The output is the same as before this change (I've checked by hand), but I haven't included any tests to confirm this.

Allows f_regression to accept a sparse matrix when center=True.
@amueller

Thanks, this looks nice. We have a bit of a backlog on reviews, though.

@agramfort

LGTM

+1 for merge after a what's new update.

thx @acadiansith

@agramfort changed the title from "[MRG] Allow f_regression to accept a sparse matrix with centering" to "[MRG+1] Allow f_regression to accept a sparse matrix with centering" on Dec 18, 2016

@raghavrv left a comment


Pending a minor clarification (for my understanding), this LGTM. There is no algorithmic change and we now support sparse...

Thanks!!

n_samples = X.shape[0]

# compute centered values
# note that E[(x - mean(x))*(y - mean(y))] = E[x*(y - mean(y))], so we

This comment is applicable only when y is mean-centered, correct? In which case it would be E[x*y]? (Sorry if I'm misunderstanding.)


The comment is applicable even when y is not centered.
Yet you are right: here we compute E[x*(y - mean(y))] by first centering y and then computing E[x*y].
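For readers following this exchange, the reason the identity holds for any y, centered or not, can be written out in one line. The derivation below is a sketch in standard expectation notation, not text from the PR; it uses only linearity of expectation and the fact that E[y - E[y]] = 0.

```latex
\begin{aligned}
E\big[(x - E[x])(y - E[y])\big]
  &= E\big[x\,(y - E[y])\big] - E[x]\,E\big[y - E[y]\big] \\
  &= E\big[x\,(y - E[y])\big] - E[x]\cdot 0 \\
  &= E\big[x\,(y - E[y])\big].
\end{aligned}
```

So centering y alone is enough for the numerator; subtracting the mean of x as well would only add a term that vanishes.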


Ah yes sorry my math is a bit rusty... Thanks heaps for the clarification (online and offline)!

@raghavrv changed the title from "[MRG+1] Allow f_regression to accept a sparse matrix with centering" to "[MRG + 2] Allow f_regression to accept a sparse matrix with centering" on Dec 19, 2016
@raghavrv

This needs a whatsnew entry as observed by @agramfort ...

@TomDLT merged commit 456fb56 into scikit-learn:master on Dec 20, 2016
@TomDLT

TomDLT commented Dec 20, 2016

Thanks @acadiansith

sergeyf pushed a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017
…scikit-learn#8065)

* Updated centering for f_regression

Allows f_regression to accept a sparse matrix when center=True.

* Fixed E226 spacing issue.

* Added f_regression sparse update to whats_new.rst
@Przemo10 Przemo10 mentioned this pull request Mar 17, 2017