MAINT:lstsq: Switch to transposed problem if the array is tall #12206

Merged
merged 2 commits into from Jun 9, 2020

ilayn
Member

@ilayn ilayn commented May 24, 2020

Reference issue

Closes #12196

What does this implement/fix?

scipy.linalg.pinv computes the pseudo-inverse by solving a least-squares problem AX=B, where B is the identity and the solution X is the pseudo-inverse of A. When A is tall, B must have as many rows as A, so it is a square identity that is equally tall. However, because the pseudo-inverse of A.T is the transpose of the pseudo-inverse of A, one can also solve A.T X.T = B, and in this case B is only as large as the column count of A, which is substantially smaller when A is substantially taller than it is wide. This PR enables this behavior via an internal trans flag.
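As a rough illustration of the equivalence described above (not the code in this PR), the sketch below solves both formulations with the public scipy.linalg.lstsq and compares them. The sizes, the seed, and the variable names are arbitrary choices made for the example.

```python
import numpy as np
from scipy.linalg import lstsq

rng = np.random.default_rng(0)
m, n = 200, 20                      # tall: many more rows than columns
A = rng.standard_normal((m, n))

# Direct problem: A X = I_m, so the right-hand side is m-by-m.
X_direct, *_ = lstsq(A, np.eye(m))

# Transposed problem: A.T Y = I_n, so the right-hand side is only n-by-n.
# The pseudo-inverse of A is then recovered as Y.T.
Y, *_ = lstsq(A.T, np.eye(n))
X_transposed = Y.T

print(X_direct.shape, X_transposed.shape)
print(np.allclose(X_direct, X_transposed))
print(np.allclose(X_direct, np.linalg.pinv(A)))
```

Both right-hand sides are identities; only their size differs, which is where the saving for tall arrays would come from.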

The array is treated as tall when its number of rows exceeds its number of columns by 10 percent. This switching point is somewhat arbitrary, although it is derived from some numerical experiments. I would appreciate a more rigorous derivation of the switch point.
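A minimal sketch of such a switching rule, assuming the comparison is a strict "rows greater than 1.1 times columns"; the helper names and the exact form of the comparison are hypothetical and are not taken from the PR diff. The second helper only counts the entries of the identity right-hand side in each formulation.

```python
def use_transposed_problem(m: int, n: int, ratio: float = 1.1) -> bool:
    """Hypothetical rule: treat an m-by-n array as tall when m > ratio * n."""
    return m > ratio * n


def identity_entries(m: int, n: int) -> tuple:
    """Entries in the identity right-hand side: (direct, transposed)."""
    return m * m, n * n


for m, n in [(105, 100), (111, 100), (200, 20)]:
    direct, transposed = identity_entries(m, n)
    print(m, n, use_transposed_problem(m, n), direct, transposed)
```

For a 200-by-20 array the direct right-hand side has 40000 entries against 400 for the transposed one, which is the kind of gap the heuristic is meant to exploit.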

@ilayn ilayn added scipy.linalg maintenance Items related to regular maintenance tasks labels May 24, 2020
@ilayn ilayn added this to the 1.5.0 milestone May 24, 2020
@ilayn
Member Author

ilayn commented May 25, 2020

Pinging @sturlamolden and others. It is a relatively simple fix, but a second pair of eyes won't hurt.

@ev-br
Member

ev-br commented May 26, 2020

Looks good

@sturlamolden
Contributor

It needs a testcase.

@ilayn
Member Author

ilayn commented May 26, 2020

Hmm, I don't actually know how to test this, since the behavior is not changed. Any ideas, anyone?

@sturlamolden
Contributor

Just compute the pinv for two arrays, one above and one below the threshold for transposition. Then compare with some other means of computing pinv, like

cho_solve(cho_factor(A.T@A),np.eye(p))@A.T
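One way this suggestion could be turned into a check is sketched below. It is not the test that was added in the PR: the shapes, seed, tolerances, and the helper name pinv_normal_equations are assumptions, and the normal-equations reference is only valid when A has full column rank.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, pinv


def pinv_normal_equations(A):
    """Reference pseudo-inverse (A.T A)^-1 A.T; assumes full column rank."""
    p = A.shape[1]
    return cho_solve(cho_factor(A.T @ A), np.eye(p)) @ A.T


rng = np.random.default_rng(1234)
# One shape below and one above the 10 percent rows-over-columns threshold.
shapes = [(21, 20), (60, 20)]
max_abs_diff = {}
for m, n in shapes:
    A = rng.standard_normal((m, n))
    diff = pinv(A) - pinv_normal_equations(A)
    max_abs_diff[(m, n)] = float(np.max(np.abs(diff)))

print(max_abs_diff)
```

The normal equations square the condition number of A, so the comparison should use a loose tolerance rather than exact equality.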

@sturlamolden
Contributor

sturlamolden commented May 26, 2020

And since you are returning a transpose, it might also be possible to see if this happened by looking at the .flags and .strides for the returned pinv array.
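The layout effect mentioned here can be seen on a plain NumPy array, as in the sketch below. It only shows what a transpose does to .flags and .strides in general; whether the array returned by pinv exposes this layout depends on the implementation and is not verified here.

```python
import numpy as np

Y = np.ones((3, 5))   # freshly allocated, C-contiguous
X = Y.T               # transpose is a view of the same memory

print(Y.flags.c_contiguous, Y.strides)
print(X.flags.f_contiguous, X.strides)
print(np.shares_memory(X, Y))
```

A transposed result therefore shows up as Fortran-contiguous with reversed strides, which is the signal a test could look for.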

@tylerjereddy tylerjereddy modified the milestones: 1.5.0, 1.6.0 May 27, 2020
@miladsade96 miladsade96 self-requested a review May 28, 2020 12:28
Member

@miladsade96 miladsade96 left a comment

LGTM!
Thanks @ilayn for this PR

@sturlamolden
Contributor

LGTM

@tylerjereddy
Contributor

Reviews are positive here, CI is green, no old tests were modified, just a new test added, so merging.

Thanks @ilayn and reviewers.

@tylerjereddy tylerjereddy merged commit 99b8660 into scipy:master Jun 9, 2020
@ilayn ilayn deleted the tall_pinv branch June 9, 2020 07:56

Successfully merging this pull request may close these issues.

PERF: scipy.linalg.pinv is very slow compared to numpy.linalg.pinv