
[WIP] LOBPCG Solver #2772

Closed · wants to merge 11 commits

Conversation

@venkywonka (Contributor) commented Aug 28, 2020

  • This is a WIP initial implementation of the locally optimal block preconditioned conjugate gradient (LOBPCG) solver in CuPy
  • Used to solve for the largest/smallest k eigenpairs of a generalized eigenvalue problem (more info)
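For context, a minimal sketch of the call this solver mirrors, using SciPy's reference `lobpcg` (the CuPy version is meant to match this API; the matrix and variable names below are illustrative, not from the PR):

```python
# Sketch of the target API, using SciPy's reference lobpcg implementation.
import numpy as np
from scipy.sparse.linalg import lobpcg

rng = np.random.default_rng(0)
n, k = 100, 3
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)      # symmetric positive definite matrix
X = rng.standard_normal((n, k))  # initial guess for k eigenvectors

# largest=True asks for the k largest eigenpairs of A x = lambda x
eigvals, eigvecs = lobpcg(A, X, largest=True, maxiter=100, tol=1e-8)
```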

@venkywonka venkywonka requested a review from a team as a code owner August 28, 2020 11:56
@GPUtester (Contributor):

Can one of the admins verify this patch?


@JohnZed (Contributor) commented Aug 28, 2020

add to whitelist

@GPUtester (Contributor):

Please update the changelog in order to start CI tests.

View the gpuCI docs here.


@cjnolet added labels on Sep 2, 2020: '2 - In Progress' (currently a work in progress), 'Cython / Python' (Cython or Python issue), 'New Prim' (for tracking new prims that will be added to our existing collection)
@cjnolet (Member) commented Sep 4, 2020

@venkywonka, I’m leaving a note to let you know that I have been reviewing this. Can you provide some tests for your implementation as well? Pytests against scipy would be great.

Are we planning to also build a prim in RAFT for this? Do you guys think there’s a lot of performance to be gained by going closer to CUDA?

@venkywonka (Contributor Author):

> @venkywonka, I’m leaving a note to let you know that I have been reviewing this. Can you provide some tests for your implementation as well? Pytests against scipy would be great.
>
> Are we planning to also build a prim in RAFT for this? Do you guys think there’s a lot of performance to be gained by going closer to CUDA?

Sure, thanks @cjnolet, I will update the tests soon. From what I know, cupy is not supported in RAFT, so @teju85 asked me to implement this in cuml first. I'm not sure how much performance would improve by moving to CUDA C/C++, but there should definitely be some.

@teju85 (Member) commented Sep 4, 2020

Hi @cjnolet,
Venkat has not yet done extensive benchmarking to know how much perf we are leaving on the table. Based on that, we can take a call on whether to stick with the current cupy implementation or migrate to RAFT. What say?

@cjnolet (Member) commented Sep 8, 2020

@teju85 I think a call is a good idea, but it will be a better discussion after we have prototyped a TSVD/PCA implementation with it in Python, so we can better measure end-to-end performance. What do you think?

* tests for blopex lobpcg (the algorithm used in scipy's implementation), comparing with scipy's implementation
* updated CHANGELOG.md
* updated `__init__.py` in `cuml/python/cuml/`
@venkywonka (Contributor Author):

rerun tests


@teju85 (Member) commented Sep 26, 2020

Tagging @vinaydes to help review this PR, once our 0.16 deliverables are done.

@cjnolet (Member) commented Oct 20, 2020

@venkywonka @teju85, @anaruse has been building implementations of scipy.sparse eigsh and svds for cupy. Since lobpcg is also an option in svds, what do you say we contribute this implementation directly to cupy?

Ref: cupy/cupy#4155

@cjnolet cjnolet closed this Oct 20, 2020
@cjnolet cjnolet reopened this Oct 21, 2020
@cjnolet (Member) commented Oct 21, 2020

My apologies, I did not mean to close this PR.

@cjnolet (Member) commented Oct 21, 2020

rerun tests

@venkywonka venkywonka changed the base branch from branch-0.16 to branch-0.17 October 22, 2020 09:36
@venkywonka venkywonka requested review from a team as code owners October 22, 2020 09:36
@teju85 (Member) commented Oct 22, 2020

@cjnolet overall I think it is a good idea to contribute this to the cupy repo instead. The only unknown here is what would happen if we later want to move this inside raft as a C++ prim? Will we then have cupy-based and raft-based lobpcg solvers duplicated?

@ajschmidt8 ajschmidt8 removed the request for review from a team October 23, 2020 15:01
@ajschmidt8 (Member):

Removing ops-codeowners from the required reviews since it doesn't seem there are any file changes that we're responsible for. Feel free to add us back if necessary.

@cjnolet (Member) commented Oct 28, 2020

> The only unknown here is what would happen if we later want to move this inside raft as a C++ prim? Will we then have a cupy-based lobpcg and raft-based lobpcg solvers duplicated?

@teju85, ideally, if we ended up with an LOBPCG prim inside RAFT then both cuml and cupy could use that instead. I've been hoping that at some point (when RAFT is mature enough) we can have Cupy just use RAFT for cases like this. There are some graph-based prims that could be used in cupy as well. This would allow us to share the same prims across multiple projects in the ecosystem, which is the goal of RAFT. What do you think?

@cjnolet (Member) commented Oct 28, 2020

@venkywonka, I'm continuing the discussion from cupy/cupy#2359 (comment) here. One of the requirements we have for sparse PCA (from the links provided in the referenced issue) is that we need the ability to mean center very sparse matrices but cannot just subtract the mean directly because doing so would convert the sparse matrix into a dense matrix. The "implicit mean centering trick" performs the mean centering in the optimizer using closures (the LinearOperator API) that are evaluated during optimization. Is there a way to accomplish this with the current implementation?
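One way to get that behavior, sketched here with NumPy/SciPy since CuPy's LinearOperator support is exactly what is under discussion (all names below are illustrative): wrap the sparse matrix in a LinearOperator whose matvec/rmatvec subtract the column means on the fly, so the matrix itself is never densified.

```python
# Sketch of the implicit mean centering trick with SciPy's LinearOperator;
# a CuPy port would mirror this API.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator

X = sp.random(1000, 50, density=0.01, random_state=0, format="csr")
mu = np.asarray(X.mean(axis=0)).ravel()  # dense 1-D vector of column means

def centered_matvec(v):
    # (X - 1 mu^T) v = X v - (mu . v) 1   -- X stays sparse throughout
    return X @ v - np.full(X.shape[0], mu @ v)

def centered_rmatvec(v):
    # (X - 1 mu^T)^T v = X^T v - (sum v) mu
    return X.T @ v - v.sum() * mu

Xc = LinearOperator(X.shape, matvec=centered_matvec,
                    rmatvec=centered_rmatvec, dtype=X.dtype)
```

The optimizer only ever calls `matvec`/`rmatvec`, so the centering cost is one dense vector per product rather than a densified matrix.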

@cjnolet (Member) left a comment:

I'm really excited to contribute this implementation to Cupy and have it available for cuML! As mentioned in the review, my primary concern is that we're able to provide a lazy custom linear operator so that we can implement the PCA implicit mean centering.

I do sincerely apologize that it took a little while to get to this review, but I've done a first pass for cuML's needs (and as initial prep for the Cupy contribution).

"""


def is_sparse(A):
Member:

Would cupyx.scipy.sparse.isspmatrix work for this?

Contributor Author:

Yes, it should, @cjnolet. This is a helper script exclusively for robust_lobpcg (the algorithm used in PyTorch). It should be pruned along with robust_lobpcg for now, while our focus is on supporting scipy's implementation.
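For reference, the check the reviewer points at, sketched with SciPy (cupyx.scipy.sparse exposes the same `isspmatrix` helper):

```python
import numpy as np
import scipy.sparse as sp

def is_sparse(A):
    # delegate to the library's own check instead of a hand-rolled one
    return sp.isspmatrix(A)

print(is_sparse(sp.eye(3, format="csr")))  # True
print(is_sparse(np.eye(3)))                # False
```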

arr_cols = cp.array([])
for col in row:  # col is a cupy ndarray
    if col.ndim == 0:
        col = cp.array([col])
Member:

I think we might be able to achieve this with cp.newaxis.

Contributor Author:

will update accordingly 👍
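The suggestion above, sketched with NumPy (`cp.newaxis` behaves identically): indexing a 0-d array with `newaxis` promotes it to shape `(1,)` without wrapping it in a Python list first.

```python
import numpy as np

col = np.array(2.5)        # a 0-d array, as in the loop being reviewed
col_1d = col[np.newaxis]   # shape (1,); avoids cp.array([col])

print(col_1d.shape)  # (1,)
```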

cp_error_flag = False
scipy_error_flag = False

A = np.random.rand(n, n)
Member:

This solver is of particular interest to us in cuML because of 1) its ability to find the eigenvalues when given a sparse input and 2) its ability to accept a custom linear operator that we can use to perform mean centering at the vector level. It's important that we test sparse inputs here as well.

Contributor Author:

The thing is, in scipy, LinearOperator abstracts sparsity/density away from the operations performed. I will add tests for sparse inputs as soon as I add support for LinearOperator (and, subsequently, sparsity).

# TODO: debug and run tests for method ortho and basic
# currently they aren't that accurate
@pytest.mark.parametrize("method", [
# "ortho",
Member:

What do you think about pulling the robust_lobpcg out of this initial implementation to start, then adding them back in when they are able to be tested?

Contributor Author:

Yes, I was thinking the same. Since it preceded blopex_lobpcg, I had left it dangling around. Will clean it up! 👍

])
@pytest.mark.parametrize("maxiter", [
20,
# 200, #Not practically required
Member:

Are these values just taking too long to run? It would be helpful to try a couple of different values here, just to verify correctness. I could understand that something less than 20 might lower the chances of convergence, but perhaps try a couple of values >20, within an order of magnitude?

Contributor Author:

Yes, in fact they were... scipy sets its cap on maxiter (if unspecified) to 20 (see this line). That was the basis on which I decided higher orders of magnitude were unnecessary.
But yes, you're definitely right; I will add a few more values >20 👍

print('order-updated lambda:', _lambda)

# # Normalize eigenvectors!
# aux = cp.sum( eigBlockVector.conj() * eigBlockVector, 0 )
Member:

Are these no longer needed?

Contributor Author:

That's an artifact from the original scipy code 😅 I don't think they are needed; for some reason they remain in the scipy code-base. Will prune them! 👍

def _applyConstraints(blockVectorV, YBY, blockVectorBY, blockVectorY):
    """Changes blockVectorV in place."""
    YBV = cp.dot(blockVectorBY.T.conj(), blockVectorV)
    tmp = solve(YBY, YBV)
Member:

@daxiongshu is in the process of contributing cho_solve to cupy, if it helps here: cupy/cupy#4172

Contributor Author:

thanks @cjnolet ! That should definitely help 👍
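To illustrate the suggestion (SciPy shown here; `cho_solve` is what cupy/cupy#4172 adds): since `YBY` is a Gram-style symmetric positive definite matrix, a Cholesky factorization can replace the general solve. The names mirror the snippet under review, but the data below is synthetic.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, solve

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
YBY = M @ M.T + 5 * np.eye(5)      # SPD, like Y^H B Y in the solver
YBV = rng.standard_normal((5, 3))

tmp_general = solve(YBY, YBV)            # general LU-based solve (current code)
c, low = cho_factor(YBY)                 # factor once...
tmp_cholesky = cho_solve((c, low), YBV)  # ...then cheap triangular solves
```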

aux.shape = (ar.shape[0], 1)
return aux


Member:

Are we able to include and use _makeoperator in here as well? It looks like this version follows the Scipy version pretty closely.

Contributor Author:

Yes Corey, we will basically have to implement this in cupy. (This will also take us a step closer to completing this comparison table!)

Contributor Author:

At the time of writing, sparsity was not as much of a priority as the feature itself, so I prioritized functionality for dense input. I think it's time to prioritize sparsity ✌️

definite (SPD) generalized eigenproblems.
Parameters
----------
A : {cupy.ndarray}
Member:

We are able to support cupyx.scipy.sparse arrays here too, right?

@venkywonka (Contributor Author) commented Oct 29, 2020:

Technically yes, but LinearOperator abstracts this out and enables matmuls in scipy. The same cannot be done in cupy (as of now), so one has to densify and then do a matmul. Once LinearOperator support is added (which is next on the agenda), cupyx.scipy.sparse arrays will be supported automatically.

# 2000
])
@pytest.mark.parametrize("isB", [True, False])
def test_lobpcg(dtype, k, n, method, largest, maxiter, isB):
Member:

Since the blopex implementation follows so closely to the Scipy implementation, it would be helpful to also include many of their tests directly: https://github.com/scipy/scipy/blob/v1.5.3/scipy/sparse/linalg/eigen/lobpcg/tests/test_lobpcg.py. This will also help verify that not only does the implementation provide the proper eigenvalues, but all the potential inputs that we support are handled properly.

Contributor Author:

Thank you @cjnolet, those are some very elaborate tests that are definitely missing here! Will add them 👍
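As one concrete example of the style of check in SciPy's lobpcg test suite (a hedged sketch; the function name below is illustrative, not from that suite): solve a problem with a known spectrum and compare against the exact eigenvalues.

```python
import numpy as np
from scipy.sparse.linalg import lobpcg

def smallest_eigs(n=50, k=4, seed=0):
    # diagonal operator with known spectrum 1, 2, ..., n
    rng = np.random.default_rng(seed)
    A = np.diag(np.arange(1.0, n + 1.0))
    X = rng.standard_normal((n, k))  # random initial block
    w, _ = lobpcg(A, X, largest=False, maxiter=200, tol=1e-9)
    return np.sort(w)

vals = smallest_eigs()
```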

@teju85 (Member) commented Oct 29, 2020

> @teju85, ideally, if we ended up with an LOBPCG prim inside RAFT then both cuml and cupy could use that instead. I've been hoping that at some point (when RAFT is mature enough) we can have Cupy just use RAFT for cases like this. There are some graph-based prims that could be used in cupy as well. This would allow us to share the same prims across multiple projects in the ecosystem, which is the goal of RAFT. What do you think?

I like this idea, @cjnolet. I also agree that this primitive is better off in cupy-land for now. Let's continue with our cupy-based implementation, and in case we need more perf and/or we need this operation in cuml C++, think about moving it to raft and updating cupy to use raft.

@teju85 (Member) commented Oct 29, 2020

Initially, when Venkat started, we did have plans to support LinearOperator API over the course of time. @vinaydes can you comment on what the plans are for supporting LinearOperator API? Any ETAs?

@cjnolet (Member) commented Nov 18, 2020

Closing this PR because this feature is being implemented directly in cupy.

@cjnolet cjnolet closed this Nov 18, 2020