Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added jvp rule for eigh, tests #358

Merged
merged 4 commits into from Feb 14, 2019
Merged

added jvp rule for eigh, tests #358

merged 4 commits into from Feb 14, 2019

Conversation

levskaya
Copy link
Collaborator

This adds the jvp rule for the symmetric eigendecomposition eigh operator. I followed Matthew's approach with the cholesky op of symmetrizing the input argument. I'd be happy to change that up if it's not how we want to deal with symmetry-bound operators.

@levskaya
Copy link
Collaborator Author

levskaya commented Feb 12, 2019

😠sorry, all these tests pass on our internal testing suite, will investigate what's going on

@hawkinsp
Copy link
Member

A couple of things to watch out for that might explain different results on the CI builder:

  • we randomly sample tests to run, since we have so many. Different environments might have different RNG behavior and run a different subset. You can run a larger sample by setting the environment variable JAX_NUM_GENERATED_CASES=1000 (default is 10), and it's usually worth doing this when adding new tests.

  • the random initialization of tensors will also likely differ across machines/numpy versions.

@levskaya
Copy link
Collaborator Author

Yeah the tests probably weren't being run with enough examples initially. There's definitely a real problem with a subset of complex matrices here, though I'm a little baffled by what's going on - will try to hunt down the root cause soon.

@levskaya
Copy link
Collaborator Author

After trying to solve it again I finally decided to think.

The gradient of the complex eigenvector coordinates is ill-posed - every column of the eigenvector matrix has a complex degree of freedom (2*n real degrees of freedom for a nxn hermitian problem). e.g. onp.eig/onp.eigh will produce radically different coordinates given the differences in phase conventions in the underlying algorithms. Unless you were to backprop through the actual solver algorithms you're not going to match the particular coordinate gradients accurately due to all the irrelevant degrees of freedom of the problem. This is fine, it just means a simple numerical gradient check will fail.

What is true and can be tested is that (v+dv) are close to being true eigenvectors of the new eigenvalues of (A+dA). I suppose I should write a different sort of unit test for this case.

@levskaya
Copy link
Collaborator Author

Ok, there's a first version of eigh grad. Just let me know if anything is lacking. (Sometime later I should review the degenerate theory to see how hard it would be to make the grad safer for degeneracy, at least for the common case where the perturbation naturally splits the degeneracy...)

@mattjj
Copy link
Member

mattjj commented Feb 14, 2019

Great thinking! That makes a lot of sense; why didn't we realize that before!

These are some pretty deep cuts into numerical linear algebra, and fantastic contributions. I'll take a look over the code now.

@mattjj mattjj self-requested a review February 14, 2019 16:43
Copy link
Member

@mattjj mattjj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is beautiful. Seriously, amazing comments with references. I learned some things about the mathematics of differentiating eigh (and eigh algorithms).

@mattjj mattjj merged commit fc4c8bd into google:master Feb 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants