
Add torch.eig complex forward (CPU, CUDA) #49168

Closed · wants to merge 14 commits

Conversation

@antocuni (Contributor)

Related to issue #42666

@dr-ci bot commented Dec 10, 2020

💊 CI failures summary and remediations

As of commit e1c1ebe (more details on the Dr. CI page):



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test1 (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Dec 24 20:00:45 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Dec 24 20:00:45 At:
Dec 24 20:00:45   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(120): serialize
Dec 24 20:00:45   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(172): serialize
Dec 24 20:00:45 
Dec 24 20:00:45 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Dec 24 20:00:45 
Dec 24 20:00:45 At:
Dec 24 20:00:45   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(120): serialize
Dec 24 20:00:45   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(172): serialize
Dec 24 20:00:45 
Dec 24 20:00:45 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Dec 24 20:00:45 
Dec 24 20:00:45 At:
Dec 24 20:00:45   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(120): serialize
Dec 24 20:00:45   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(172): serialize
Dec 24 20:00:45 
Dec 24 20:00:45 [W tensorpipe_agent.cpp:547] RPC agent for worker0 encountered error when reading incoming request from worker3: EOF: end of file (this is expected to happen during shutdown)
Dec 24 20:00:45 [W tensorpipe_agent.cpp:547] RPC agent for worker0 encountered error when reading incoming request from worker2: EOF: end of file (this is expected to happen during shutdown)
Dec 24 20:00:45 [W tensorpipe_agent.cpp:547] RPC agent for worker0 encountered error when reading incoming request from worker1: EOF: end of file (this is expected to happen during shutdown)
Dec 24 20:00:46 ok (2.346s)
Dec 24 20:00:47   test_return_future_remote (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:547] RPC agent for worker2 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown)

1 job timed out:

  • pytorch_linux_bionic_py3_8_gcc9_coverage_test1

🚧 5 fixed upstream failures:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch.

If your commit is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

Check out the recency history of this "viable master" tracking branch.


This comment was automatically generated by Dr. CI.

@antocuni antocuni changed the title WIP Add torch.eig complex forward (CPU, CUDA) Add torch.eig complex forward (CPU, CUDA) Dec 24, 2020
@antocuni antocuni marked this pull request as ready for review December 24, 2020 21:31
@antocuni (Contributor Author)

It seems that all the failing tests also fail upstream, so I think this PR is ready for review.

@mruberry mruberry self-requested a review December 29, 2020 17:36
@mruberry mruberry added the module: linear algebra and triaged labels Dec 29, 2020
@mruberry (Collaborator)

Test failures are unrelated. + @anjali411 to review, too.

This looks correct to me.

@nikitaved (Collaborator)

Is there any progress on this? I'm thinking about rewriting eig_backward with complex input support (alas, the real case with complex eigenvalues is still desired).

@mruberry (Collaborator)

Thanks for the ping, @nikitaved, looks like this got lost over the holidays. I'll take a note to schedule it.


// the API is slightly different for the complex vs real case: if the input
// is complex, eigenvals will be a vector of complex. If the input is real,
// eigenvals will be a (n, 2) matrix containing the real and imaginary parts
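
(For context, a minimal sketch of the two conventions described in the comment above; the complex branch is the behavior this PR adds, so treat the exact shapes and dtypes as assumptions:)

import torch

a = torch.tensor([[1., 2.], [-2., 1.]])

# real input: eigenvalues come back as an (n, 2) matrix of [real, imag] columns
w_real, _ = torch.eig(a)                    # w_real.shape == (2, 2), real dtype

# complex input (this PR): eigenvalues come back as a length-n complex vector
w_cplx, _ = torch.eig(a.to(torch.cdouble))  # w_cplx.shape == (2,), complex dtype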
Contributor

As discussed in #43081, I think we should always return a complex tensor. This will be a BC-breaking change, so we should only update this behavior for torch.linalg.eig.

@antocuni (Contributor Author)

I thought that the point of torch.linalg.* was to be as compatible with NumPy as possible. This would be an unnecessary breakage for people porting their code from NumPy to PyTorch.
What about adding a flag such as returns_complex=False (or =True, if we want the correct-but-NumPy-incompatible behavior by default) to let the user choose?
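
(A hypothetical sketch of the proposed opt-in; returns_complex is not a real parameter and was never adopted, so both calls are shown commented out:)

# hypothetical flag letting callers pick between the (n, 2) real layout and
# a complex eigenvalue vector; never adopted (see the NumPy example below):
# w, v = torch.linalg.eig(x, returns_complex=False)  # (n, 2) [real, imag] layout
# w, v = torch.linalg.eig(x, returns_complex=True)   # complex eigenvalue vector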

Collaborator

Hey @antocuni, it looks like NumPy is already doing it:

In [18]: x = numpy.array([[1, 2], [-2, 1]], numpy.double)

In [19]: x
Out[19]: 
array([[ 1.,  2.],
       [-2.,  1.]])

In [20]: numpy.linalg.eig(x)
Out[20]: 
(array([1.+2.j, 1.-2.j]),
 array([[0.        -0.70710678j, 0.        +0.70710678j],
        [0.70710678+0.j        , 0.70710678-0.j        ]]))

@antocuni (Contributor Author) commented Jan 18, 2021

Ouch, my fault, I overlooked it. OK, I'll implement the "proper" behavior then; thanks for pointing it out.

@antocuni (Contributor Author)

@anjali411 @nikitaved I just realized that torch.linalg.eig doesn't exist yet!
So I think the current behavior of my branch is correct. I agree that torch.linalg.eig should always return complex, but I don't think there is anything we can do about it in this branch.

@antocuni (Contributor Author)

@anjali411 @mruberry:

  • I addressed the issue about using self.is_complex() (thank you for the suggestion, it's much better now!).
  • I think that the issue about the future behaviour of torch.linalg.eig is not relevant to this PR.
  • I merged upstream/master into this branch to make sure the branch still works. It seems it didn't introduce any new failures.

I think the branch is ready for a final round of review and, hopefully, merging.

@mruberry (Collaborator) left a comment

Cool! Would you just add a test that this throws a runtime error when trying to do complex backward through it, @antocuni? Then just ping me and we'll get this merged.

Adding these functions makes a lot of sense since I imagine we'll reuse them for torch.linalg.eig, but maybe we should implement torch.linalg.eig next and not bother implementing complex backward for torch.eig?
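
(A hedged sketch of the kind of test being requested; where exactly the RuntimeError surfaces and its message are assumptions, not the PR's actual test:)

import torch

def test_complex_eig_backward_raises():
    a = torch.randn(3, 3, dtype=torch.complex128, requires_grad=True)
    w, v = torch.eig(a, eigenvectors=True)
    try:
        # reduce to a real scalar so backward() needs no explicit gradient
        (w.abs().sum() + v.abs().sum()).backward()
    except RuntimeError:
        pass  # expected: complex backward for torch.eig is not implemented
    else:
        raise AssertionError("expected a RuntimeError for complex eig backward")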

@codecov bot commented Jan 19, 2021

Codecov Report

Merging #49168 (01c168d) into master (c458558) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master   #49168      +/-   ##
==========================================
- Coverage   80.64%   80.64%   -0.01%     
==========================================
  Files        1913     1913              
  Lines      208061   208077      +16     
==========================================
+ Hits       167797   167807      +10     
- Misses      40264    40270       +6     

@antocuni (Contributor Author)

> Cool! Would you just add a test that this throws a runtime error when trying to do complex backward through it, @antocuni? Then just ping me and we'll get this merged.

@mruberry done in commit 01c168d.

> Adding these functions makes a lot of sense since I imagine we'll reuse them for torch.linalg.eig, but maybe we should implement torch.linalg.eig next and not bother implementing complex backward for torch.eig?

It's very likely that we will be able to implement torch.linalg.eig in terms of torch.eig (or vice versa) using dispatch: Math, as we did for svd and qr. In that case, once we implement complex backward for one, we get the other "for free".
So the order in which we implement torch.eig's backward and torch.linalg.eig should be irrelevant.

I think @nikitaved planned to work on complex backward for torch.eig, so maybe he has an opinion on this.
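
(A minimal sketch, not the PR's code, of how torch.linalg.eig could be composed over torch.eig so that one backward serves both; re-pairing the eigenvectors for real inputs with complex eigenvalues is deliberately omitted:)

import torch

def linalg_eig_sketch(a):
    w, v = torch.eig(a, eigenvectors=True)
    if not a.is_complex():
        # torch.eig returns real eigenvalues as an (n, 2) [real, imag] matrix;
        # view it as a length-n complex vector, NumPy-style
        w = torch.view_as_complex(w.contiguous())
    return w, v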

@mruberry (Collaborator)

> It's very likely that we will be able to implement torch.linalg.eig in terms of torch.eig (or vice versa) using dispatch: Math, as we did for svd and qr. In that case, once we implement complex backward for one, we get the other "for free".
> So the order in which we implement torch.eig's backward and torch.linalg.eig should be irrelevant.

That's probably true. Whatever @nikitaved thinks best.

@facebook-github-bot (Contributor) left a comment

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@nikitaved (Collaborator)

I agree, the order does not matter: the backward for real inputs with complex eigenvalues is not implemented anyway, so we will just need to insert a check on the dtype or shape (if the tensor representing the eigenvalues is real but holds real and imaginary parts).
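
(A hedged Python-level sketch of that check; the function name and messages are illustrative, not the PR's actual code:)

import torch

def check_eig_backward_supported(eigenvalues):
    if eigenvalues.is_complex():
        raise RuntimeError("eig_backward: not implemented for complex eigenvalues")
    # real (n, 2) layout: the second column holds the imaginary parts
    if eigenvalues.dim() == 2 and eigenvalues.size(-1) == 2:
        if bool((eigenvalues[:, 1] != 0).any()):
            raise RuntimeError("eig_backward: not implemented for eigenvalues "
                               "with nonzero imaginary parts")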

@mruberry (Collaborator)

Unfortunately it looks like this is hitting some internal errors, like this:

clahqr.c:function clahqr_: error: undefined reference to 'c_sqrt'
clahqr.c:function clahqr_: error: undefined reference to 'c_sqrt'
claqr0.c:function claqr0_: error: undefined reference to 'c_sqrt'

cc @IvanYashchuk and @ngimel

I have to run but can take a closer look later, too.

@IvanYashchuk (Collaborator)

From the filename, my guess is that CLAPACK is used somewhere and it is missing the -lm flag when compiling.

@mruberry (Collaborator)

> From the filename, my guess is that CLAPACK is used somewhere and it is missing the -lm flag when compiling.

Good guess:

clapack/clapack/SRC/clahqr.c:448: error: undefined reference to 'c_sqrt'

@mruberry (Collaborator)

Update: @malfet has identified the issue. Some builds were excluding c_sqrt because of issues compiling it on Windows. He's testing a fix now.

@facebook-github-bot (Contributor)

@mruberry merged this pull request in 880f007.

@mruberry (Collaborator)

The attribution above is erroneous: @malfet fixed and merged this PR by fixing the Windows builds that use CLAPACK so they could build c_sqrt properly, then fixing the build rules for those builds so they included it again.

Labels: cla signed · Merged · module: linear algebra · open source · triaged

7 participants