Fixed rerr computation in lobpcg #152789

ignasa007 · 2025-05-04T21:36:22Z

This PR fixes an issue with the computation of residuals in the LOBPCG algorithm.

Bug: Line 788 is supposed to compute the denominator of Equation 9 in Duersch et al., 2018, as line 776 also suggests, but it uses the raw eigenvalue estimates instead of their absolute values.

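Consequence: When eigenvalue estimates are negative, the denominator becomes too small or even negative, so the relative residuals are wrong (possibly negative). This made the algorithm's success sensitive to the initialization of the eigenvectors.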
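To illustrate the failure mode, here is a minimal plain-Python sketch of a per-pair relative residual of the form ||r|| / (||x|| (||A|| + |λ| ||B||)), which is my reading of the denominator discussed above. The function name and all numbers are hypothetical, not taken from torch/_lobpcg.py. In a generalized problem (B not the identity), |λ| ||B|| can exceed ||A||, so dropping the absolute value can turn the denominator negative; the "residual" then falls below any positive tolerance and the pair is wrongly counted as converged.

```python
def relative_residual(r_norm, x_norm, lam, A_norm, B_norm, use_abs=True):
    """Relative residual ||r|| / (||x|| * (||A|| + |lam| * ||B||)) for one
    eigenpair estimate (lam, x) with residual r = A x - lam B x.
    use_abs=False mimics the pre-fix denominator, which used lam directly."""
    lam_term = abs(lam) if use_abs else lam
    return r_norm / (x_norm * (A_norm + lam_term * B_norm))


tol = 1e-6
# Hypothetical values for a generalized problem (B != I), where
# |lam| * ||B|| can exceed ||A||; the pair is clearly not converged.
r_norm, x_norm, lam, A_norm, B_norm = 0.5, 1.0, -3.0, 1.0, 1.0

buggy = relative_residual(r_norm, x_norm, lam, A_norm, B_norm, use_abs=False)
fixed = relative_residual(r_norm, x_norm, lam, A_norm, B_norm)

print(buggy, buggy < tol)  # -0.25 True   (wrongly treated as converged)
print(fixed, fixed < tol)  # 0.125 False  (correctly keeps iterating)
```

With the absolute value, the denominator is a sum of non-negative terms, so the ratio can never go negative.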

Tests:

I ran @jtorde's script for a few minutes without hitting any assertion errors (the original implementation fails within a few seconds).
I also tried @pearu's specific test case, which now executes successfully: the residuals remain positive, and the final output matches the one returned by SciPy (with and without enforcing the use of LOBPCG).
I extracted the relevant test cases from test/test_autograd.py and test/test_linalg.py, and they ran successfully.

Let me know if further test cases or benchmarks are needed.

cc @jianyuh @nikitaved @mruberry @walterddr @xwang233 @lezcano

pytorch-bot · 2025-05-04T21:36:26Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152789

✅ No Failures

As of commit 0ca3bd9 with merge base 56879f6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla · 2025-05-04T21:36:26Z

The committers listed above are authorized under a signed CLA.

✅ login: ignasa007 / name: Jasraj Singh (59defff, 447675d, 57f20bb, 0ca3bd9)

pytorch-bot · 2025-05-04T21:53:11Z

Didn't find following labels among repository labels: release notes: bug fix

ignasa007 · 2025-05-04T22:04:58Z

@pytorchbot label "module: linear algebra"

lezcano

Fair enough. Can you add a regression test?

torch/_lobpcg.py

ignasa007 · 2025-05-05T19:31:41Z

A few comments:

I think the bug was not being caught because A and B were being sampled as SPD matrices, which forces all eigenvalues to be positive. The problem only arises when eigenvalue estimates are negative, which apparently never happens in the current tests.¹
Moreover, in the symmetric-definite generalized eigenvalue problem only B needs to be SPD; A only needs to be symmetric. After changing the tests to sample a symmetric A, they immediately pick up the error in the original implementation: 4 out of 5 tests fail, each with 100% mismatched elements.
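A quick way to see why SPD sampling hides the bug: LOBPCG's eigenvalue estimates are Ritz values, i.e. Rayleigh quotients xᵀAx / xᵀBx of the current Ritz vectors, and with B SPD their sign is decided by xᵀAx alone. The plain-Python sketch below uses hypothetical 2×2 matrices and B = I for simplicity: for an SPD A the quotient is positive for every nonzero x, while a symmetric indefinite A readily produces negative estimates.

```python
def quad_form(M, x):
    """Compute x^T M x for a small dense matrix given as nested lists."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))


def rayleigh_quotient(A, B, x):
    """Eigenvalue estimate x^T A x / x^T B x. B is assumed SPD, so the
    denominator is positive and the sign comes from x^T A x alone."""
    return quad_form(A, x) / quad_form(B, x)


I2 = [[1.0, 0.0], [0.0, 1.0]]
A_spd = [[2.0, 1.0], [1.0, 2.0]]    # eigenvalues 3 and 1: SPD
A_indef = [[1.0, 2.0], [2.0, 1.0]]  # eigenvalues 3 and -1: symmetric, indefinite

x = [1.0, -1.0]
print(rayleigh_quotient(A_spd, I2, x))    # 1.0  (positive for any x when A is SPD)
print(rayleigh_quotient(A_indef, I2, x))  # -1.0 (a negative estimate)
```

In the test setting this corresponds to sampling A as merely symmetric while keeping B SPD.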
With the fixed implementation, 2 tests still fail, but I think that is due to numerical imprecision:

python test_linalg.py TestLinalgCPU.test_lobpcg_ortho_cpu_float64
Mismatched elements: 2 / 3000 (0.1%)
Greatest absolute difference: 0.0005147071740729814 at index (1, 2, 46, 0) (up to 0.0005 allowed)
Greatest relative difference: 6.17151121231244e-05 at index (1, 2, 46, 0) (up to 0 allowed)

python test_linalg.py TestLinalgCUDA.test_lobpcg_ortho_cuda_float64
Mismatched elements: 1 / 108 (0.9%)
Greatest absolute difference: 0.0006575654327463099 at index (0, 1, 2, 0) (up to 0.0005 allowed)
Greatest relative difference: 3.5065780611903477e-05 at index (0, 1, 2, 0) (up to 0 allowed)

I have also added a check in test_linalg.py that the residuals stay non-negative across iterations; the previous implementation violated it in 4 of the 5 tests.
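The non-negativity check can be sketched as a tracker-style callback; torch.lobpcg accepts a tracker callable that, as I understand its documentation, is invoked once per iteration with the solver instance. This is a hypothetical stand-in, not the code added to test_linalg.py: it assumes the instance exposes the current relative residuals as tvars["rerr"] (an assumption about the worker's layout), and FakeWorker exists only to drive the example.

```python
class RerrNonNegativeTracker:
    """Hypothetical tracker-style callback for an LOBPCG-like solver.

    Assumes it is called once per iteration with an object whose
    tvars["rerr"] holds the current relative residuals. Every call is
    recorded, and an AssertionError is raised as soon as any residual is
    negative, since a ratio of non-negative norms can never be < 0.
    """

    def __init__(self):
        self.history = []  # one list of residuals per iteration

    def __call__(self, worker):
        rerr = [float(v) for v in worker.tvars["rerr"]]
        self.history.append(rerr)
        negative = [v for v in rerr if v < 0]
        if negative:
            raise AssertionError(
                f"iteration {len(self.history)}: negative relative residuals {negative}"
            )


class FakeWorker:
    """Illustration-only stand-in for the solver state passed to the tracker."""

    def __init__(self, rerr):
        self.tvars = {"rerr": rerr}


tracker = RerrNonNegativeTracker()
tracker(FakeWorker([0.3, 0.1]))  # fine: all residuals >= 0
try:
    tracker(FakeWorker([0.05, -0.25]))  # what the pre-fix formula could produce
except AssertionError as exc:
    print(exc)  # iteration 2: negative relative residuals [-0.25]
```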

[1] This one is actually strange: sampling A and B with random_symmetric_pd_matrix does not fail any tests. However, replacing both with make_symmetric_pd_matrices, as suggested, fails the following test:

python test_linalg.py TestLinalgCUDA.test_lobpcg_ortho_cuda_float64
Mismatched elements: 2 / 3000 (0.1%)
Greatest absolute difference: 0.0007664745520128396 at index (1, 2, 95, 0) (up to 0.0005 allowed)
Greatest relative difference: 0.39717887328301443 at index (1, 2, 25, 0) (up to 0 allowed)

The main concern is that the relative error is non-negligible. The only difference between the two functions is that the former samples from a Gaussian, while the latter samples uniformly from [-9, 9] (as defined in make_tensor). I tried changing the scale of the Gaussian sampler in random_symmetric_pd_matrix to 9, but the test still passes.
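For reference, the per-entry variances of the samplers mentioned above can be compared directly, assuming make_tensor draws entries uniformly from [-9, 9] and that "scale 9" means a Gaussian with standard deviation 9:

```latex
\operatorname{Var}\big[\mathcal{U}(-9, 9)\big] = \frac{(9 - (-9))^2}{12} = \frac{324}{12} = 27,
\quad \sigma = \sqrt{27} \approx 5.20;
\qquad
\operatorname{Var}\big[\mathcal{N}(0, 1)\big] = 1;
\qquad
\operatorname{Var}\big[\mathcal{N}(0, 9^2)\big] = 81.
```

By this crude per-entry measure, the rescaled Gaussian has the widest entries of the three, which is consistent with the observation that rescaling it did not reproduce the failure.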

I am not sure what is going on there, i.e., whether it is an issue with one of the sampling schemes or with the LOBPCG implementation. For now, I have left the sampling as is rather than switching to make_symmetric_pd_matrices.

pearu

I have a minor nit; otherwise, LGTM!

Thanks, @ignasa007!

torch/_lobpcg.py

pearu · 2025-05-06T07:05:07Z

@pytorchbot rebase

pytorchmergebot · 2025-05-06T07:06:41Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict.

pytorchmergebot · 2025-05-06T07:06:45Z

Successfully rebased fix-lobpcg-rerr onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout fix-lobpcg-rerr && git pull --rebase)

lezcano

Better to use random_symmetric_pd_matrix here. A Gaussian matrix has rather well-behaved eigenvalues, while a matrix with entries drawn uniformly from [-9, 9] may really have very large eigenvalues. I think that's the issue you are seeing.

ignasa007 · 2025-05-06T08:04:04Z

Mm, alright. Should I add a comment warning against changing the sampling scheme and/or against removing the random_symmetric_pd_matrix function altogether, as the TODO suggests?

pearu · 2025-05-06T08:50:46Z

Changing the sampling scheme is off-topic for this PR, so I suggest using the original sampling function, per @lezcano's suggestion.

Re the TODO item: I believe it was created on the assumption that using make_symmetric_pd_matrices instead of random_symmetric_pd_matrix (both serve the same purpose) would not cause accuracy problems like the ones seen here. The problem here indicates that make_tensor or make_symmetric_pd_matrices ought to be generalized to support selecting the random distribution, or similar, before the TODO item can be resolved. I suggest leaving it as a follow-up task.

pearu · 2025-05-06T08:51:53Z

To make the PR ready to land, please address the lint failures.

ignasa007 · 2025-05-06T19:10:00Z

From what I understand, the 11 failures come from test/test_linalg.py and are due to numerical imprecision: they pass with atol=2e-3, rtol=2e-3 for the generalized eigenvalue problem with the smallest eigenvalues.
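torch.testing.assert_close treats values as close when |actual - expected| <= atol + rtol * |expected|. A plain-Python sketch of that criterion, replayed on the TestLinalgCPU failure quoted earlier (absolute difference 0.0005147..., checked with atol = 5e-4 and rtol = 0), shows why loosening both tolerances makes such mismatches pass. The helper name is hypothetical, and |expected| is backed out from the two reported differences, assuming the relative difference is the absolute difference divided by |expected|.

```python
def within_tolerance(abs_diff, expected_abs, atol, rtol):
    """Closeness criterion: |actual - expected| <= atol + rtol * |expected|."""
    return abs_diff <= atol + rtol * expected_abs


# Numbers from the test_lobpcg_ortho_cpu_float64 failure log.
abs_diff = 0.0005147071740729814
rel_diff = 6.17151121231244e-05
expected_abs = abs_diff / rel_diff  # about 8.34, backed out from the log

print(within_tolerance(abs_diff, expected_abs, atol=5e-4, rtol=0.0))   # False
print(within_tolerance(abs_diff, expected_abs, atol=2e-3, rtol=2e-3))  # True
```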

pearu

I have a suggestion for fixing the CI failures. Also, I think we are still missing (read: it is not obvious that we have) regression tests that reproduce the original issue.

test/test_linalg.py

ignasa007 · 2025-05-07T09:00:44Z

The faulty SciPy version check, scipy.__version__ < '1.4.1', keeps the test_linalg.py script from testing against the SciPy implementation: it compares strings lexicographically, so mine skips the test because '1.13.1' < '1.4.1' as strings. A simple fix is to use from packaging.version import Version and check Version(scipy.__version__) < Version('1.4.1'). I can push a commit for this, along with the regression test.
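The pitfall is easy to reproduce with the standard library alone. The sketch below uses a hypothetical numeric_version helper as a simplified stand-in for packaging's version parsing; it only understands plain dotted integers, so it is an illustration of the ordering problem rather than a replacement for the fix proposed here.

```python
def numeric_version(v: str) -> tuple:
    """Simplified stand-in for packaging's version parsing.

    Handles only plain dotted integers such as "1.13.1"; pre-release or
    local suffixes (e.g. "1.4.1rc1") would raise ValueError here.
    """
    return tuple(int(part) for part in v.split("."))


installed, minimum = "1.13.1", "1.4.1"

# String comparison is lexicographic: the third characters are "1" and "4",
# and "1" < "4", so 1.13.1 is wrongly treated as older than 1.4.1.
print(installed < minimum)  # True (wrong)

# Comparing integer components gives the intended order: (1, 13, 1) > (1, 4, 1).
print(numeric_version(installed) < numeric_version(minimum))  # False (correct)
```

Real version strings can carry pre-release and post-release tags, which a proper parser such as packaging's handles and this split-based helper does not.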

pearu · 2025-05-07T09:39:16Z

Use

version.parse(scipy.__version__) < version.parse("1.4.1")

as it is used elsewhere in pytorch/test/.

test/test_linalg.py

ignasa007 · 2025-05-07T10:52:44Z

Done!

pearu

LGTM! Thanks, @ignasa007!

pearu · 2025-05-08T09:41:13Z

@lezcano @nikitaved if the PR looks good to you, could you trigger the merging process?

lezcano

LGTM. Thank you!

lezcano · 2025-05-08T09:47:13Z

@pytorchbot merge

pytorchmergebot · 2025-05-08T09:49:11Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

ignasa007 · 2025-05-08T12:25:46Z

Alright, thanks everyone for guiding me through my first open-source contribution! :)

nikitaved · 2025-05-08T13:49:37Z

Congrats, @ignasa007 ! Hope you had fun! :)

ignasa007 · 2025-05-08T14:24:18Z

Sure did! Looking forward to more collaborations :')

ignasa007 mentioned this pull request May 4, 2025

torch.lobpcg producing different largest eigenvalue than scipy and np.linalg.eig #101075 (closed)

pytorchbot added the open source label May 4, 2025

pytorch-bot added the module: linear algebra label May 4, 2025

lezcano reviewed May 5, 2025

torch/_lobpcg.py (outdated, resolved)

lezcano removed the module: linear algebra label May 5, 2025

pytorch-bot added the release notes: linalg_frontend label May 5, 2025

ignasa007 force-pushed the fix-lobpcg-rerr branch from e26f9c0 to a498da3 May 5, 2025 19:35

ignasa007 marked this pull request as ready for review May 5, 2025 19:36

ignasa007 requested review from nikitaved and IvanYashchuk as code owners May 5, 2025 19:36

pearu approved these changes May 5, 2025

torch/_lobpcg.py (outdated, resolved)

pytorchmergebot force-pushed the fix-lobpcg-rerr branch from 5f77962 to 6ed38f2 May 6, 2025 07:06

lezcano approved these changes May 6, 2025

ignasa007 force-pushed the fix-lobpcg-rerr branch from 6ed38f2 to 93374a4 May 6, 2025 10:16

pearu requested changes May 7, 2025

test/test_linalg.py (outdated, resolved)

test/test_linalg.py (outdated, resolved)

ignasa007 force-pushed the fix-lobpcg-rerr branch from 93374a4 to 569d68b May 7, 2025 09:38

pearu reviewed May 7, 2025

test/test_linalg.py (outdated, resolved)

ignasa007 force-pushed the fix-lobpcg-rerr branch 2 times, most recently from d4d3059 to 27e380c May 7, 2025 10:44

Fixed residual computation to use torch.abs(eigenvalue)

57f20bb

ignasa007 force-pushed the fix-lobpcg-rerr branch from 27e380c to 1181acd May 7, 2025 10:49

ignasa007 added 3 commits May 7, 2025 17:43

regression test for issue pytorch#101075

59defff

remove unnecessary .real from convergence check

447675d

fixed scipy version check

0ca3bd9

ignasa007 force-pushed the fix-lobpcg-rerr branch from 1181acd to 0ca3bd9 May 7, 2025 16:44

pearu approved these changes May 7, 2025

pearu requested a review from lezcano May 7, 2025 19:28

lezcano approved these changes May 8, 2025

pytorch-bot added the ciflow/trunk label May 8, 2025

pytorchmergebot added the merging label May 8, 2025

pytorchmergebot added the Merged label May 8, 2025

pytorchmergebot closed this in 22c3104 May 8, 2025

pytorchmergebot removed the merging label May 8, 2025

ignasa007 deleted the fix-lobpcg-rerr branch July 2, 2025 22:21
