Fixed rerr computation in lobpcg #152789
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152789
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 0ca3bd9 with merge base 56879f6.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Didn't find following labels among repository labels: release notes: bug fix
@pytorchbot label "module: linear algebra"
Fair enough. Can you add a regression test?
A few comments:
python test_linalg.py TestLinalgCPU.test_lobpcg_ortho_cpu_float64
Mismatched elements: 2 / 3000 (0.1%)
Greatest absolute difference: 0.0005147071740729814 at index (1, 2, 46, 0) (up to 0.0005 allowed)
Greatest relative difference: 6.17151121231244e-05 at index (1, 2, 46, 0) (up to 0 allowed)
python test_linalg.py TestLinalgCUDA.test_lobpcg_ortho_cuda_float64
Mismatched elements: 1 / 108 (0.9%)
Greatest absolute difference: 0.0006575654327463099 at index (0, 1, 2, 0) (up to 0.0005 allowed)
Greatest relative difference: 3.5065780611903477e-05 at index (0, 1, 2, 0) (up to 0 allowed)
I have also added a check in [1]
This is actually a weird one – using
python test_linalg.py TestLinalgCUDA.test_lobpcg_ortho_cuda_float64
Mismatched elements: 2 / 3000 (0.1%)
Greatest absolute difference: 0.0007664745520128396 at index (1, 2, 95, 0) (up to 0.0005 allowed)
Greatest relative difference: 0.39717887328301443 at index (1, 2, 25, 0) (up to 0 allowed)
Mainly, the relative error is non-negligible. The difference between the two is just that the former uses Gaussian sampling, while the latter uses uniform sampling in [-9, 9] (as defined in make_tensor). I tried changing the scale of the Gaussian sampler in
I am not sure what is going on there – whether it's an issue with one of the sampling schemes or with the LOBPCG implementation. For now, I have skipped changing the sampling to
I have a minor nit, otherwise LGTM!
Thanks, @ignasa007!
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here
Successfully rebased
5f77962 to 6ed38f2
Better to use random_symmetric_pd_matrix here. A Gaussian matrix has rather well-behaved eigenvalues, while a matrix with uniform values on [-9, 9] may have very large eigenvalues. I think that's the issue you are seeing.
Mm, alright. Should I add a comment warning against changing the sampling scheme, and/or removing the
Changing the sampling scheme is off-topic for this PR, so I suggest using the original sampling function, per @lezcano's suggestion. Re the TODO item: I believe it was created with the assumption that using
To make the PR ready to land, please address the lint failures.
From what I understand, the 11 errors seem to come from
I have a suggestion to fix the CI failures, and I think we are still missing (read: it is not obvious that we have) the regression tests that reproduce the original issue.
The faulty SciPy version check –
Use
as it is used elsewhere in pytorch/test/.
d4d3059 to 27e380c
Done!
LGTM! Thanks, @ignasa007!
@lezcano @nikitaved if the PR looks good to you, could you trigger the merging process?
LGTM. Thank you!
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Alright, thanks everyone for guiding me through my first contribution to open source! :)
Congrats, @ignasa007! Hope you had fun! :)
Sure did! Looking forward to more collaborations :')
Fixes #101075
This PR fixes an issue with the computation of residuals in the LOBPCG algorithm.
Bug: Line 788 is supposed to compute the denominator in Equation 9 of Duersch et al., 2018, as also suggested in line 776, but it uses the raw eigenvalue estimates instead of their absolute values.
Consequence: This made the algorithm's success sensitive to the initialization of the eigenvectors.
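For illustration, here is a minimal sketch of the relative residual in plain Python. It is not the actual torch.lobpcg code: it assumes the residual has the form r_norm / (x_norm * (a_norm + |lambda| * b_norm)) and that convergence is declared when this value falls below a tolerance. The function name, the tolerance, and all numbers are hypothetical.

```python
def relative_residual(r_norm, x_norm, a_norm, b_norm, eigenvalue, use_abs=True):
    """Relative residual: r_norm / (x_norm * (a_norm + |eigenvalue| * b_norm)).

    With use_abs=False the raw eigenvalue estimate is used instead,
    which mimics the bug described above.
    """
    lam = abs(eigenvalue) if use_abs else eigenvalue
    return r_norm / (x_norm * (a_norm + lam * b_norm))


# Hypothetical norms and a negative eigenvalue estimate (illustration only).
r_norm, x_norm, a_norm, b_norm = 0.5, 1.0, 2.0, 1.0
eigenvalue = -3.0
tol = 1e-6  # assumed convergence threshold

buggy = relative_residual(r_norm, x_norm, a_norm, b_norm, eigenvalue, use_abs=False)
fixed = relative_residual(r_norm, x_norm, a_norm, b_norm, eigenvalue, use_abs=True)

# The raw estimate makes the denominator negative, so the "residual" is
# negative and passes any positive tolerance even though r_norm is large.
buggy_converged = buggy < tol   # True: falsely reported as converged
fixed_converged = fixed < tol   # False: residual is still 0.1

print(buggy, fixed)  # -0.5 0.1
```

Under these assumptions, a negative eigenvalue estimate can shrink or flip the sign of the denominator, so whether the check passes depends on which estimates the iteration happens to visit. This is one plausible way the outcome could become sensitive to the initial eigenvectors.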
Tests:
Let me know if further test cases or benchmarks are needed.
cc @jianyuh @nikitaved @mruberry @walterddr @xwang233 @lezcano