
Updated derivative rules for complex svd and pinverse #47761

Closed
wants to merge 47 commits

Conversation

IvanYashchuk
Collaborator

Updated svd_backward to work correctly for complex-valued inputs.
Updated common_methods_invocations.py to take dtype and device arguments for input construction.
Removed test_pinverse from test_autograd.py; it is replaced by entries in common_methods_invocations.py.
Added svd and pinverse to list of complex tests.

References for complex-valued SVD differentiation:

- https://giggleliu.github.io/2019/04/02/einsumbp.html
- https://arxiv.org/abs/1909.02659

The derived rules assume gauge invariance of loss functions, so the result would not be correct for loss functions that are not gauge invariant.
https://re-ra.xyz/Gauge-Problem-in-Automatic-Differentiation/

The same rule is implemented in TensorFlow and BackwardsLinalg.jl.

Ref. #33152
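
To make the gauge-invariance caveat concrete, here is a minimal sketch (not part of this PR) contrasting a gauge-invariant loss with one that is not:

import torch

# The complex SVD A = U diag(S) V^H is only determined up to a phase per
# singular vector: U -> U diag(e^{i*phi}), V -> V diag(e^{i*phi}) leaves A
# unchanged. A loss is "gauge invariant" if it does not depend on that
# arbitrary phase choice.
a = torch.randn(3, 3, dtype=torch.cdouble, requires_grad=True)
u, s, v = torch.svd(a)

# Gauge-invariant loss: the singular values do not depend on the phase
# convention, so the gradient produced by svd_backward is well defined.
s.sum().backward()

# A loss built directly from the entries of U or V, e.g. u.real.sum(), is not
# gauge invariant: its value depends on which phases the underlying
# LAPACK/MAGMA routine happens to return, so its "gradient" is not well
# defined and the derived rule is not expected to produce a meaningful result for it.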

@IvanYashchuk added the module: complex, module: linear algebra, and complex_autograd labels Nov 11, 2020
@dr-ci

dr-ci bot commented Nov 11, 2020

💊 CI failures summary and remediations

As of commit cb7590e (more details on the Dr. CI page):


  • 6/6 failures possibly* introduced in this PR
    • 1/6 non-CircleCI failure(s)

🕵️ 5 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (1/5)

Step: "Report results" (full log | diagnosis details | 🔁 rerun)

Dec 20 11:07:43 ModuleNotFoundError: No module named 'boto3'
Dec 20 11:07:43 + CIRCLE_BRANCH=pull/47761
Dec 20 11:07:43 + export CIRCLE_JOB=pytorch_linux_xenial_py3_6_gcc5_4_test
Dec 20 11:07:43 + CIRCLE_JOB=pytorch_linux_xenial_py3_6_gcc5_4_test
Dec 20 11:07:43 + export CIRCLE_WORKFLOW_ID=0275807c-f472-416b-b480-59f01926ccb8
Dec 20 11:07:43 + CIRCLE_WORKFLOW_ID=0275807c-f472-416b-b480-59f01926ccb8
Dec 20 11:07:43 + cd workspace
Dec 20 11:07:43 + python test/print_test_stats.py test
Dec 20 11:07:43 Traceback (most recent call last):
Dec 20 11:07:43   File "test/print_test_stats.py", line 12, in <module>
Dec 20 11:07:43     import boto3
Dec 20 11:07:43 ModuleNotFoundError: No module named 'boto3'


Exited with code exit status 1

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_test (2/5)

Step: "Report results" (full log | diagnosis details | 🔁 rerun)

Dec 20 10:43:08 ModuleNotFoundError: No module named 'boto3'
Dec 20 10:43:08 + CIRCLE_BRANCH=pull/47761
Dec 20 10:43:08 + export CIRCLE_JOB=pytorch_xla_linux_bionic_py3_6_clang9_test
Dec 20 10:43:08 + CIRCLE_JOB=pytorch_xla_linux_bionic_py3_6_clang9_test
Dec 20 10:43:08 + export CIRCLE_WORKFLOW_ID=0275807c-f472-416b-b480-59f01926ccb8
Dec 20 10:43:08 + CIRCLE_WORKFLOW_ID=0275807c-f472-416b-b480-59f01926ccb8
Dec 20 10:43:08 + cd workspace
Dec 20 10:43:08 + python test/print_test_stats.py test
Dec 20 10:43:08 Traceback (most recent call last):
Dec 20 10:43:08   File "test/print_test_stats.py", line 12, in <module>
Dec 20 10:43:08     import boto3
Dec 20 10:43:08 ModuleNotFoundError: No module named 'boto3'


Exited with code exit status 1

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_jit_legacy_test (3/5)

Step: "Report results" (full log | diagnosis details | 🔁 rerun)

Dec 20 09:41:26 ModuleNotFoundError: No module named 'boto3'
Dec 20 09:41:26 + CIRCLE_BRANCH=pull/47761
Dec 20 09:41:26 + export CIRCLE_JOB=pytorch_linux_xenial_py3_6_gcc5_4_jit_legacy_test
Dec 20 09:41:26 + CIRCLE_JOB=pytorch_linux_xenial_py3_6_gcc5_4_jit_legacy_test
Dec 20 09:41:26 + export CIRCLE_WORKFLOW_ID=0275807c-f472-416b-b480-59f01926ccb8
Dec 20 09:41:26 + CIRCLE_WORKFLOW_ID=0275807c-f472-416b-b480-59f01926ccb8
Dec 20 09:41:26 + cd workspace
Dec 20 09:41:26 + python test/print_test_stats.py test
Dec 20 09:41:26 Traceback (most recent call last):
Dec 20 09:41:26   File "test/print_test_stats.py", line 12, in <module>
Dec 20 09:41:26     import boto3
Dec 20 09:41:26 ModuleNotFoundError: No module named 'boto3'


Exited with code exit status 1

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test2 (4/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Dec 20 10:35:54 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Dec 20 10:35:54 At:
Dec 20 10:35:54   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(120): serialize
Dec 20 10:35:54   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(172): serialize
Dec 20 10:35:54 
Dec 20 10:35:54 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Dec 20 10:35:54 
Dec 20 10:35:54 At:
Dec 20 10:35:54   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(120): serialize
Dec 20 10:35:54   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(172): serialize
Dec 20 10:35:54 
Dec 20 10:35:54 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Dec 20 10:35:54 
Dec 20 10:35:54 At:
Dec 20 10:35:54   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(120): serialize
Dec 20 10:35:54   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(172): serialize
Dec 20 10:35:54 
Dec 20 10:35:54 [W tensorpipe_agent.cpp:547] RPC agent for worker2 encountered error when reading incoming request from worker1: EOF: end of file (this is expected to happen during shutdown)
Dec 20 10:35:54 [W tensorpipe_agent.cpp:547] RPC agent for worker0 encountered error when reading incoming request from worker3: EOF: end of file (this is expected to happen during shutdown)
Dec 20 10:35:55 ok (2.458s)
Dec 20 10:35:56   test_return_future_remote (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:547] RPC agent for worker3 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown)
Dec 20 10:35:56 [W tensorpipe_agent.cpp:547] RPC agent for worker2 encountered error when reading incoming request from worker1: EOF: end of file (this is expected to happen during shutdown)

See CircleCI build pytorch_linux_backward_compatibility_check_test (5/5)

Step: "Report results" (full log | diagnosis details | 🔁 rerun)

Dec 20 09:38:59 ModuleNotFoundError: No module named 'boto3'
Dec 20 09:38:59 + CIRCLE_BRANCH=pull/47761
Dec 20 09:38:59 + export CIRCLE_JOB=pytorch_linux_backward_compatibility_check_test
Dec 20 09:38:59 + CIRCLE_JOB=pytorch_linux_backward_compatibility_check_test
Dec 20 09:38:59 + export CIRCLE_WORKFLOW_ID=0275807c-f472-416b-b480-59f01926ccb8
Dec 20 09:38:59 + CIRCLE_WORKFLOW_ID=0275807c-f472-416b-b480-59f01926ccb8
Dec 20 09:38:59 + cd workspace
Dec 20 09:38:59 + python test/print_test_stats.py test
Dec 20 09:38:59 Traceback (most recent call last):
Dec 20 09:38:59   File "test/print_test_stats.py", line 12, in <module>
Dec 20 09:38:59     import boto3
Dec 20 09:38:59 ModuleNotFoundError: No module named 'boto3'


Exited with code exit status 1


1 job timed out:

  • pytorch_linux_bionic_py3_8_gcc9_coverage_test2


@ailzhang added the triaged label Nov 11, 2020
@facebook-github-bot
Contributor

This pull request has been reverted by f5b68e7.

@mruberry
Collaborator

Hey @IvanYashchuk, bad news / interesting news. I had to unland this because it caused a significant increase in test timings, like adding 30 minutes to some test builds. That's the bad news.

The interesting news is now we get to have a discussion about testing thoroughness vs. cost.

cc @malfet, @ngimel, @albanD, @kurtamohler

For example, here's a CI build after this PR:

https://app.circleci.com/pipelines/github/pytorch/pytorch/252033/workflows/b9605b69-1006-4c0e-a49f-e89f85a469ad/jobs/9675634

And here's the same build before it:

https://app.circleci.com/pipelines/github/pytorch/pytorch/251979/workflows/dcf1c55f-a041-497d-a1c0-11c0e8d153f6/jobs/9674037

Is this discrepancy really caused by this PR? Perhaps surprisingly: yes.

While most of the tests this PR adds run quickly, there are some standouts:

test_method_grad_svd_cuda_complex128 (__main__.TestGradientsCUDA) ... ok (95.033s)
test_method_grad_svd_cuda_float64 (__main__.TestGradientsCUDA) ... ok (47.357s)
test_method_gradgrad_svd_cuda_complex128 (__main__.TestGradientsCUDA) ... ok (294.281s)
test_method_gradgrad_svd_cuda_float64 (__main__.TestGradientsCUDA) ... ok (99.565s)
test_fn_gradgrad_svd_cuda_complex128 (__main__.TestGradientsCUDA) ... ok (307.492s)
test_fn_gradgrad_svd_cuda_float64 (__main__.TestGradientsCUDA) ... ok (104.823s)
test_fn_grad_svd_cuda_complex128 (__main__.TestGradientsCUDA) ... ok (93.672s)
test_fn_grad_svd_cuda_float64 (__main__.TestGradientsCUDA) ... ok (47.417s)

For just CUDA grad and gradgrad testing of SVD that's almost 20 minutes! Other tests and the pinverse tests are much faster.

Here's some of the tests from above running on CPU, for example:

test_method_gradgrad_svd_cpu_complex128 (__main__.TestGradientsCPU) ... ok (37.549s)
test_method_gradgrad_svd_cpu_float64 (__main__.TestGradientsCPU) ... ok (10.695s)

While these tests aren't super fast, they take less than a minute, compared to the corresponding CUDA tests, which take over six minutes.
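
As a rough, hedged back-of-the-envelope (assuming the sample inputs at the time were batches of three 3x3 matrices, the sizes mentioned later in this thread), the cost is dominated by how many perturbed evaluations the numerical checks need:

# Rough estimate only, not a measurement. Numerical gradcheck needs on the
# order of one (in practice a couple of) extra forward evaluations per real
# input element it perturbs.
batch, n = 3, 3
real_dofs = batch * n * n * 2   # each complex entry has a real and an imaginary part
print(real_dofs)                # roughly 54 perturbed SVD evaluations per sample input
# gradgradcheck then applies the same procedure to the backward function
# (whose inputs also include grad_outputs), so the number of small SVD calls
# grows much further -- and each tiny CUDA SVD pays kernel-launch and
# synchronization overhead, which helps explain why the CUDA variants are so
# much slower than the CPU ones.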

A few things here:

  • it's clear we have to be smarter about what we're testing vs. the cost of testing it
  • a check for build time regressions in the GitHub CI would be nice so people don't have to compare before and after timings for different builds regularly, since that's time-consuming

Some options for being smarter:

  • We could stop testing method grad and gradgrad and rely on function grad and gradgrad tests (@albanD)
  • We could stop testing svd grad and gradgrad on CUDA; if svd_backward is a composite operation without custom CUDA bits (and a cursory look suggests it is), this seems reasonable given how long those tests take
  • We could limit SVD's sample inputs

I'd like to suggest starting with the first two options. Maybe we should stop testing the CUDA grad and gradgrad of other functions whose backward is a composite operation, too?

Separately, we should follow up on CUDA SVD performance. Luckily @xwang233 already has a PR that may address some of these issues: #48436.

@rgommers
Collaborator

> a check for build time regressions in the GitHub CI would be nice so people don't have to compare before and after timings for different builds regularly, since that's time-consuming

That seems hard to do reliably. Normally you'd notice this anyway when adding the tests, since you're running only those tests while developing. So perhaps it's best to handle this with a combination of setting expectations and cleaning up outliers once in a while? Outliers are easy to find:

$ pytest test/test_nn.py --durations 10
...
17.34s call     test/test_nn.py::TestNN::test_conv_double_backward
10.99s call     test/test_nn.py::TestNNInit::test_trunc_normal
8.08s call     test/test_nn.py::TestNN::test_TransformerEncoderLayer_relu_activation
6.33s call     test/test_nn.py::TestNNDeviceTypeCPU::test_triplet_margin_with_distance_loss_cpu
5.77s call     test/test_nn.py::TestNN::test_LocalResponseNorm_3d_custom_params
5.34s call     test/test_nn.py::TestNN::test_FractionalMaxPool3d_asymsize
5.33s call     test/test_nn.py::TestNN::test_grid_sample
5.22s call     test/test_nn.py::TestNNDeviceTypeCPU::test_grid_sample_large_index_2d_cpu_float64
4.63s call     test/test_nn.py::TestNN::test_interpolate_nearest_scale_3d
4.12s call     test/test_nn.py::TestNN::test_l1_loss_correct

@mruberry
Collaborator

> > a check for build time regressions in the GitHub CI would be nice so people don't have to compare before and after timings for different builds regularly, since that's time-consuming
>
> That seems hard to do reliably. Normally you'd notice this anyway when adding the tests, since you're running only those tests while developing.

I don't know if it is so challenging. What if we diffed the test files produced for new tests, for example, and summed their total time and reported if it was over a minute?
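
A minimal sketch of what such a check could look like (hypothetical helper, assuming per-test durations for the base build and the PR build are available as name-to-seconds mappings; the numbers echo the timings quoted above):

from typing import Dict

def report_new_slow_tests(base: Dict[str, float], pr: Dict[str, float],
                          budget_s: float = 60.0) -> None:
    # Tests present in the PR build but not in the base build are "new".
    new_tests = {name: t for name, t in pr.items() if name not in base}
    total_new = sum(new_tests.values())
    if total_new > budget_s:
        print(f"New tests add {total_new:.1f}s of runtime (budget: {budget_s:.0f}s)")
        for name, t in sorted(new_tests.items(), key=lambda kv: -kv[1]):
            print(f"  {t:8.2f}s  {name}")

report_new_slow_tests(
    base={"test_fn_grad_svd_cuda_float64": 47.4},
    pr={"test_fn_grad_svd_cuda_float64": 47.4,
        "test_fn_gradgrad_svd_cuda_complex128": 307.5},
)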

@rgommers
Collaborator

> I don't know if it is so challenging. What if we diffed the test files produced for new tests, for example, and summed their total time and reported if it was over a minute?

You will have the same problem that codecov has: diffing against a baseline that was run under different circumstances, or that may be far from the branch point of your PR, can be wrong or nonexistent for lots of reasons. I'd expect the false-positive rate to be high.

@malfet
Contributor

malfet commented Dec 17, 2020

@mruberry thank you very much for the revert. Please note that adding all those tests pushed the total test time of pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_test beyond the 5-hour limit. We can always shard it into test1 and test2, but it would be good to understand why those tests need to be so slow.

@albanD
Collaborator

albanD commented Dec 17, 2020

I do agree with Mike that for such cases it is OK to skip the grad and gradgrad checks on CUDA.
Or maybe do only one with a 2x2 matrix to make sure it actually runs.

> We could stop testing method grad and gradgrad and rely on function grad and gradgrad tests

You mean that right now, when an op is both a method and a function, we (grad)gradcheck both?

@mruberry
Collaborator

> I do agree with Mike that for such cases it is OK to skip the grad and gradgrad checks on CUDA.
> Or maybe do only one with a 2x2 matrix to make sure it actually runs.
>
> > We could stop testing method grad and gradgrad and rely on function grad and gradgrad tests
>
> You mean that right now, when an op is both a method and a function, we (grad)gradcheck both?

Yes. We currently re-run the grad and gradgrad checks for both the function variant and the method variant of each operation. Seems like we can stop running them on the method variant?
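
For illustration, a hedged sketch of that duplication (not the actual test-harness code):

import torch
from torch.autograd import gradcheck

# A small, well-conditioned matrix so the SVD gradient is numerically stable.
a = torch.tensor([[2.0, 0.1], [0.3, 1.0]], dtype=torch.double, requires_grad=True)

# The function variant and the method variant dispatch to the same kernel and
# the same backward, so running gradcheck (and gradgradcheck) on both roughly
# doubles the cost of these tests without adding coverage.
gradcheck(torch.svd, (a,))             # function variant: torch.svd(a)
gradcheck(lambda x: x.svd(), (a,))     # method variant: a.svd()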

@albanD
Collaborator

albanD commented Dec 17, 2020

For all the ops that are automatically bound to Python via codegen from native_functions.yaml, we can definitely do that, yes.

@mruberry
Collaborator

@ngimel and I looked at the current test timings for SVD CUDA backward, and the existing tests take about 5 minutes. By creating an OpInfo there's a 4x expansion of these tests ({function, method} x {double, cdouble}). Removing the method variants would still create a 2x expansion. @ngimel mentioned she'd like the opportunity to review SVD's CUDA backwards before we decide on a course of action.

@mruberry
Collaborator

Thanks to @ngimel and @albanD, we're going to stop testing method grad and gradgrad for the moment, and we suggest also stopping the CUDA svd gradgrad tests. @IvanYashchuk, sorry to ask you to make another revision of this PR, but would you add skips for svd's CUDA gradgrad tests?

With these changes we expect svd's CUDA gradient test time to be lower than in builds today.

@IvanYashchuk
Collaborator Author

IvanYashchuk commented Dec 18, 2020

@mruberry I added skips for the CUDA gradchecks. I also marked the gradgrad tests as slow.
I removed one test case and reduced the batch size from 3 and the matrix size from 3x3 to 2 (18651b4).
Here are the timings for my local tests before changing the dimensions:

217.71s call     test/test_ops.py::TestGradientsCUDA::test_fn_grad_svd_cuda_complex128
101.21s call     test/test_ops.py::TestGradientsCUDA::test_fn_grad_svd_cuda_float64
9.02s call     test/test_ops.py::TestGradientsCPU::test_fn_grad_svd_cpu_complex128
4.67s call     test/test_ops.py::TestGradientsCPU::test_fn_grad_svd_cpu_float64
747.57s call     test/test_ops.py::TestGradientsCUDA::test_fn_gradgrad_svd_cuda_complex128
251.72s call     test/test_ops.py::TestGradientsCUDA::test_fn_gradgrad_svd_cuda_float64
150.30s call     test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_svd_cpu_complex128
43.29s call     test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_svd_cpu_float64

and after:

26.10s call     test/test_ops.py::TestGradientsCUDA::test_fn_grad_svd_cuda_complex128
12.88s call     test/test_ops.py::TestGradientsCUDA::test_fn_grad_svd_cuda_float64
3.34s call     test/test_ops.py::TestGradientsCPU::test_fn_grad_svd_cpu_complex128
1.76s call     test/test_ops.py::TestGradientsCPU::test_fn_grad_svd_cpu_float64
178.80s call     test/test_ops.py::TestGradientsCUDA::test_fn_gradgrad_svd_cuda_complex128
58.08s call     test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_svd_cpu_complex128
56.15s call     test/test_ops.py::TestGradientsCUDA::test_fn_gradgrad_svd_cuda_float64
16.74s call     test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_svd_cpu_float64

# cuda gradchecks are very slow
# see discussion https://github.com/pytorch/pytorch/pull/47761#issuecomment-747316775
SkipInfo('TestGradients', 'test_fn_gradgrad', device_type='cuda'),
SkipInfo('TestGradients', 'test_fn_grad', device_type='cuda'))),
Collaborator

I think we can keep the grad checks going, let's just exclude the gradgrad checks on CUDA for now
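
A hedged sketch of what the entry could look like with that change, following the SkipInfo pattern in the snippet above (keep the CUDA grad check, skip only the CUDA gradgrad check):

# cuda gradgrad checks are very slow; the much cheaper grad check keeps running
# see discussion https://github.com/pytorch/pytorch/pull/47761#issuecomment-747316775
SkipInfo('TestGradients', 'test_fn_gradgrad', device_type='cuda'),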

Collaborator Author

Should the CUDA gradgrad check always be skipped, or should it be tested only with PYTORCH_TEST_WITH_SLOW?

Collaborator

Let's just unconditionally skip them on svd for now. @albanD and I are going to look at our test matrix more closely in January.

@mruberry
Collaborator

New test timings look good. Let's keep CUDA grad for now and just skip CUDA gradgrad. Then let's merge this.

@IvanYashchuk
Collaborator Author

Done. Let's try merging it.

Contributor

@facebook-github-bot left a comment

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

hwangdeyu pushed a commit to hwangdeyu/pytorch that referenced this pull request Jan 6, 2021
Summary:
Updated `svd_backward` to work correctly for complex-valued inputs.
Updated `common_methods_invocations.py` to take dtype, device arguments for input construction.
Removed `test_pinverse` from `test_autograd.py`, it is replaced by entries to `common_methods_invocations.py`.
Added `svd` and `pinverse` to list of complex tests.

References for complex-valued SVD differentiation:

- https://giggleliu.github.io/2019/04/02/einsumbp.html
- https://arxiv.org/abs/1909.02659

The derived rules assume gauge invariance of loss functions, so the result would not be correct for loss functions that are not gauge invariant.
https://re-ra.xyz/Gauge-Problem-in-Automatic-Differentiation/

The same rule is implemented in Tensorflow and [BackwardsLinalg.jl](https://github.com/GiggleLiu/BackwardsLinalg.jl).

Ref. pytorch#33152

Pull Request resolved: pytorch#47761

Reviewed By: izdeby

Differential Revision: D25574962

Pulled By: mruberry

fbshipit-source-id: 832b61303e883ad3a451b84850ccf0f36763a6f6
hwangdeyu pushed a commit to hwangdeyu/pytorch that referenced this pull request Jan 6, 2021
Summary: same as above.

Pull Request resolved: pytorch#47761

Reviewed By: ngimel

Differential Revision: D25658897

Pulled By: mruberry

fbshipit-source-id: ba33ecbbea3f592238c01e62c7f193daf22a9d01
Labels: cla signed, complex_autograd, Merged, module: complex, module: linear algebra, open source, triaged