Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul SVDS input validation and test suite #20

Merged
merged 35 commits into from
Jun 29, 2021
Merged

Conversation

mdhaber
Copy link
Collaborator

@mdhaber mdhaber commented Jun 27, 2021

Some work is not complete:

  • I haven't looked closely at the accuracy tests Update: I confirmed that the basic test makes sense. I refactored it to cut lines, add cases, and strengthen the checks. I haven't looked carefully at test_svd_linop. I can't tell if it's an important addition or not, so rather than improving it or cutting it, I just left it alone. One thing that I think is missing is an accuracy test for a real-world, sparse matrix, but actually that is already in the svdp test suite, so I don't think I'll add it here.
  • I haven't looked closely at the edge cases Update: I confirmed that the edge case tests make sense and refactored them to cut lines. Note that we're still not perfect here, especially with PROPACK.
  • There is no support for the PROPACK-specific options. Update: save this for follow-up work.

There are a few kinks to work out:

  • The meaning of maxiter for PROPACK should be investigated more carefully. I think you had hooked it up to maxiter of svdp, but that is only used if irl_mode is True, so it seemed more appropriate to hook it up to kmax which is described as "Maximal number of iterations...". However, there is a requirement that kmax >= k that is more stringent than the usual requirements on maxiter. If we want to keep it this way, we should adjust the input validation test for solver='propack'. But I don't really know what's going on under the hood, so we should talk about this.
  • For arpack and lobpcg, v0 is supposed to be of length min(A.shape). This is not the case for PROPACK, and this is causing failure of one of the IV tests and also test_svd_linop. I think we can change the documentation and IV for PROPACK and this will fix the tests. Update: fixed.
  • The behavior of return_singular_value is really restrictive for PROPACK; it doesn't care about the shape of the matrix when you tell it to not calculate some of the eigenvectors. Currently I am enforcing the shape restrictions of ARPACK/LOBPCG, but that should be reconsidered.Update: PROPACK now respects the return_singular_value option regardless of matrix shape.
  • PROPACK is inaccurate when return_singular_vectors is FALSE in one test case. Weird. Update: this was fixed by changing the random state.
  • PROPACK is failing one of the edge case tests. I think you noticed this before. I think I looked into it a bit and it might not be appropriate to raise an error here. If I remember correctly
    • there is only one singular vector/value, and PROPACK finds it.
    • When the remaining singular values are zero, it doesn't really attempt to find orthogonal singular vectors because they are not unique. I think the test checks for orthogonality of the singular vectors, so even when I commented out the error, PROPACK failed the test - but I think the test can be relaxed.
  • The documentation of return_singular_value did not reflect the behavior of ARPACK and LOBPCG after all. I think the inequalities on the shapes need to switch which is strict and which isn't to reflect the actual behavior.
  • AFAIK propack can handle k=min(A.shape); it does not need to be a strict inequality. This requirement of the IV should be relaxed. Update: done.
  • I added random_state keyword support for PROPACK; I need to include it for the others. Update: added for the others, but peculiarities of existing behavior needs to be documented (or we should drop the requirement that the outputs must be exactly the same as they were before.)
  • I've found a bunch of accuracy issues in LOBPCG in informal testing.
  • I should probably open an issue about LOBPCG's treatment of the v0 parameter.

I am continually tempted to abandon svds and just expose PROPACK in a function with the interface that PROPACK wants. But I figure I'll do the harder thing first - force PROPACK into SVDS - and then splitting it into its own function, if desired, would be very easy.

There was never a guarantee that solver=None was supported;
solver='arpack' has always been the default value. No need
to carry around this baggage.
If specified, must satistify ``k + 1 < ncv < N``; ``ncv > 2*k`` is
recommended.
If specified, must satistify ``k + 1 < ncv < min(M, N)``; ``ncv > 2*k``
is recommended.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seemed to be wrong. I let's double check master before changing this.

if k <= 0 or k >= min(n, m):
raise ValueError("k must be between 1 and min(A.shape), k=%d" % k)

if isinstance(A, LinearOperator):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just converted everything to a LinearOperator in the input validation so we can get rid of this branching.

@@ -1,8 +1,10 @@
import re
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up basically rewriting all this. There were just a lot of holes, and it didn't make sense to leave the existing tests and add new ones that were more comprehensive.

@@ -823,4 +823,5 @@ def aslinearoperator(A):
rmatmat=rmatmat, dtype=dtype)

else:
raise TypeError('type not understood')
A = np.atleast_2d(np.asarray(A))
return MatrixLinearOperator(A)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mdhaber mdhaber changed the title Overhaul PROPACK input validation and test suite Overhaul SVDS input validation and test suite Jun 27, 2021

# input validation/standardization for `maxiter`
if maxiter is not None and (int(maxiter) != maxiter or maxiter <= 0):
message = "`maxiter` must be a non-negative integer."
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"must be a positive integer"


largest = (which == 'LM')
n, m = A.shape
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this where svds doesn't match docs? I would have expected m, n = A.shape

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, surprisingly. Despite flipping n, m between code and documentation, everything is right except for the inequality strictness being swapped in the documentation.

@mdhaber
Copy link
Collaborator Author

mdhaber commented Jun 29, 2021

I think I'm done with this PR. With these changes and any other cleanups you want to do, I think we're about ready to open a PR on the main repo. When we do, there are a few things I'd like to bring to the attention of reviewers there:

  • To keep things simple, I decided not to expose any PROPACK-specific options yet. We proposed to add a wrapper for PROPACK; we've done that and much more. There are additional things that would be nice to have, but I don't think that should stop merging a PR that is already huge.
  • PROPACK doesn't return orthogonal singular vectors corresponding with zero singular values. This is basically the same as svds returning nans for zero input matrix scipy/scipy#3452 reported for ARPACK. I see now why you questioned how in svds you initially just returned the results of svdp without looking for small singular values. At the time, I didn't know PROPACK would have this problem, so it seemed reasonable to just return the results given by PROPACK. But yes, it would be nice to re-use some of that code that ensures that zeros out sufficiently small eigenvalues and makes sure that corresponding entries of u and vh are orthogonal. I don't think this is essential for the initial PR, but a reviewer might disagree.
  • Reviewers should think carefully about how we call svdp from svds - that is, how the keywords passed to svds map to PROPACK arguments. We've had to make some decisions about that and how to do the input validation. I think it's reasonable, but it's all stuff that could use another set of eyes.
  • Reviewers should note how the random_state parameter is implemented. Did I need to be so careful to avoid changing the default behavior, or should the default be like other functions with a random_state keyword and default to using the np.RandomState singleton? Also, it was intentional to use random_state, supporting the old np.RandomState, which we technically don't need to do anymore. I understand that use of the singleton is not considered good practice when you care about randomness, but I don't think we do care about randomness, and I think it is more convenient to set np.random.seed once to get repeatable results than it is to pass a random seed into every single function call.

@mckib2 mckib2 merged commit 33d8c1b into mckib2:propack Jun 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants