add mv operator to SparseTensor #21782

Closed
wants to merge 14 commits into from

shihongzhi
Contributor

Fixes #21266 add mv operator to SparseTensor

add mv operator to SparseTensor
@pytorchbot pytorchbot added module: operators module: sparse Related to torch.sparse labels Jun 14, 2019
@shihongzhi shihongzhi changed the title Fixes #21266 add mv operator to SparseTensor Jun 14, 2019
@zhangguanheng66 zhangguanheng66 requested a review from ezyang June 14, 2019 18:14
@zhangguanheng66
Contributor

@ezyang if you have time to review this PR. Thanks.

@zhangguanheng66 zhangguanheng66 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jun 14, 2019
Collaborator

@ssnl ssnl left a comment

I would love to see a doc update saying that this does not work on CUDA sparse tensors. Also, @ezyang, what was our resolution for the torch.sparse namespace? Should this be in there or not?

@shihongzhi
Contributor Author

@pytorchbot retest this please.

@shihongzhi
Contributor Author

@ezyang I think this is ready for review. Thanks

Contributor

@ezyang ezyang left a comment

Hi @shihongzhi, unfortunately, writing a (naive) matrix-vector multiply by hand is not the recommended way of solving the issue. One reason it's not recommended is that in your PR, there is no implementation of the kernel in CUDA; ideally, we'd also have a CUDA implementation.

Since a matrix-vector multiply is simply a matrix-matrix multiply where the second operand is an n x 1 matrix, the easiest implementation is to unsqueeze the vector into a matrix, call the existing matrix-matrix multiply, and then squeeze the result back into a vector.
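
The unsqueeze, multiply, squeeze idea can be sketched in plain Python. This is only an illustration under stated assumptions: the (row, col, value) triple layout and the names `sparse_mm` and `sparse_mv` are hypothetical stand-ins, not PyTorch's API and not the code in this PR.

```python
# Pure-Python stand-in for the idea; the data layout and helper names are
# hypothetical and do not correspond to any PyTorch function.

def sparse_mm(entries, n_rows, dense):
    """Multiply a sparse matrix, given as (row, col, value) triples,
    by a dense matrix given as a list of rows."""
    n_cols_out = len(dense[0]) if dense else 0
    out = [[0.0] * n_cols_out for _ in range(n_rows)]
    for row, col, value in entries:
        for j in range(n_cols_out):
            out[row][j] += value * dense[col][j]
    return out


def sparse_mv(entries, n_rows, vec):
    """Matrix-vector multiply expressed through the matrix-matrix multiply."""
    column = [[v] for v in vec]                    # "unsqueeze": length n -> n x 1
    product = sparse_mm(entries, n_rows, column)   # reuse the existing multiply
    return [row[0] for row in product]             # "squeeze": n_rows x 1 -> vector


# The 2 x 3 matrix [[1, 0, 2], [0, 3, 0]] multiplied by a vector of ones.
entries = [(0, 0, 1.0), (0, 2, 2.0), (1, 1, 3.0)]
result = sparse_mv(entries, 2, [1.0, 1.0, 1.0])
```

Because the vector path only reshapes its input and output, any backend that already supports the matrix-matrix multiply is covered without a separate kernel.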

@ezyang
Contributor

ezyang commented Jun 17, 2019

Also, @ezyang, what was our resolution for the torch.sparse namespace? Should this be in there or not?

I don't remember what our resolution was. Since @weiyangfb is not working on sparse anymore, we will probably end up relitigating this when someone else picks it up. However, since matmul is not in the sparse namespace, this probably should not be in there either (for consistency, for now).

(FWIW, these days, I'm a lot more on the side of "dense semantics don't have to match sparse semantics exactly, no namespace needed, you never really want a densified gradient flowing to a sparse input")

@shihongzhi
Contributor Author

@ezyang Thank you for the review. I will use the matrix-matrix multiply and also add a CUDA implementation.

@ezyang
Contributor

ezyang commented Jun 18, 2019

Good, this is an improvement. But you need to do more:

  1. Make mv_sparse_cpu CPU/CUDA agnostic, by:
    a. Renaming it to mv_sparse
    b. Removing the device checks
  2. Test both cases by removing the decorator

Also I'm pretty sure you need a squeeze at the end.

Don't forget to dismiss the "changes requested" when you want more review.

2. adapt mv_sparse for cpu and cuda both
@shihongzhi
Contributor Author

@pytorchbot rebase this please.

@shihongzhi
Contributor Author

@pytorchbot rebase this please.

@shihongzhi
Contributor Author

@ezyang I think it's ready for review again.

# Conflicts:
#	aten/src/ATen/native/native_functions.yaml
@shihongzhi shihongzhi requested review from ezyang and ssnl March 20, 2020 03:24
@ezyang ezyang requested review from kurtamohler and nikitaved March 23, 2020 18:33
@ezyang
Contributor

ezyang commented Mar 23, 2020

@kurtamohler and @nikitaved, since y'all are getting more involved in sparse, do you think you could help review this?

y = torch.ones(3, device=self.device)

self.assertEqual(self.value_tensor([3, 9]), x.matmul(y))

Collaborator

We should probably have test cases that make sure your asserts are triggered correctly, with self.assertRaisesRegex.

Contributor Author

Sorry, I haven't used self.assertRaisesRegex before, so I don't know how or why to use it.

Collaborator

self.assertRaisesRegex is for checking to make sure functions raise exceptions when expected. The way to use it is:

with self.assertRaisesRegex(<type of exception>, <regex string that matches the expected error message>):
    <call the function with arguments that will cause the expected exception>

The error messages you're regex-matching against are the messages you wrote in your TORCH_CHECK calls. There are many examples in test_sparse.py.
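
As a concrete instance of the template above, here is a minimal, self-contained sketch using only the standard `unittest` module. The function `check_mv_shapes` and its error message are hypothetical stand-ins for an operator guarded by a TORCH_CHECK; they are not taken from this PR or from test_sparse.py.

```python
import unittest


def check_mv_shapes(mat_cols, vec_len):
    # Hypothetical stand-in for an operator that validates its input shapes.
    if mat_cols != vec_len:
        raise RuntimeError("mv: expected matrix columns to equal vector length")
    return True


class MvShapeTest(unittest.TestCase):
    def test_mismatch_raises(self):
        # Passes only if the block raises RuntimeError and the regex is
        # found somewhere in the exception message.
        with self.assertRaisesRegex(RuntimeError, "expected matrix columns"):
            check_mv_shapes(3, 4)


# Run the test case programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MvShapeTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

If the call did not raise, or raised with a message the pattern does not match, the test would fail, which is what makes it a check on the assert itself.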

Contributor Author

Done. Thank you for your detailed explanation.

@dr-ci

dr-ci bot commented Mar 24, 2020

💊 CircleCI build failures summary and remediations

As of commit 45c7079 (more details on the Dr. CI page):


  • 2/2 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following build failures do not appear to be due to upstream breakages (reran 1 job to discount flakiness):

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (1/2)

Step: "Test" (full log | pattern match details) <confirmed not flaky by 2 failures>

Apr 01 04:41:44 caused by: Connection refused (os error 111)
Apr 01 04:41:44 +++ eval 'extract_trap_cmd ' 
Apr 01 04:41:44 ++++ extract_trap_cmd 
Apr 01 04:41:44 ++++ printf '%s\n' '' 
Apr 01 04:41:44 +++ printf '%s\n' cleanup 
Apr 01 04:41:44 ++ trap -- ' 
Apr 01 04:41:44 cleanup' EXIT 
Apr 01 04:41:44 ++ which sccache 
Apr 01 04:41:44 ++ sccache --stop-server 
Apr 01 04:41:44 Stopping sccache server... 
Apr 01 04:41:44 error: couldn't connect to server 
Apr 01 04:41:44 caused by: Connection refused (os error 111) 
Apr 01 04:41:44 ++ true 
Apr 01 04:41:44 ++ rm /var/lib/jenkins/sccache_error.log 
Apr 01 04:41:44 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 
Apr 01 04:41:44 ++ SCCACHE_IDLE_TIMEOUT=1200 
Apr 01 04:41:44 ++ RUST_LOG=sccache::server=error 
Apr 01 04:41:44 ++ sccache --start-server 
Apr 01 04:41:44 Starting sccache server... 
Apr 01 04:41:44 ++ sccache --zero-stats 
Apr 01 04:41:44 Compile requests                 0 
Apr 01 04:41:44 Compile requests executed        0 

See CircleCI build pytorch_xla_linux_xenial_py3_6_clang7_test (2/2)

Step: "Test" (full log | pattern match details) <confirmed not flaky by 2 failures>

Apr 01 05:04:34 AssertionError: 0.22364577306378963 not less than or equal to 0.2 :
Apr 01 05:04:34   File "/var/lib/jenkins/workspace/xla/test/../../test/test_torch.py", line 9764, in helper 
Apr 01 05:04:34     self.assertEqual(t_transform(q[99:100]).std(), std_transform(1), 0.2) 
Apr 01 05:04:34   File "/var/lib/jenkins/workspace/xla/test/pytorch_test_base.py", line 370, in assertEqual 
Apr 01 05:04:34     **kwargs) 
Apr 01 05:04:34   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 802, in assertEqual 
Apr 01 05:04:34     allow_inf=allow_inf, exact_dtype=exact_dtype) 
Apr 01 05:04:34   File "/var/lib/jenkins/workspace/xla/test/pytorch_test_base.py", line 370, in assertEqual 
Apr 01 05:04:34     **kwargs) 
Apr 01 05:04:34   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 925, in assertEqual 
Apr 01 05:04:34     super(TestCase, self).assertLessEqual(abs(x - y), prec, message) 
Apr 01 05:04:34 AssertionError: 0.22364577306378963 not less than or equal to 0.2 :  
Apr 01 05:04:34  
Apr 01 05:04:34 ---------------------------------------------------------------------- 
Apr 01 05:04:34 Ran 839 tests in 2039.290s 
Apr 01 05:04:34  
Apr 01 05:04:34 FAILED (failures=1, skipped=291) 
Apr 01 05:04:34  
Apr 01 05:04:34 Generating XML reports... 
Apr 01 05:04:34 Generated XML report: test-reports/python-unittest/TEST-TestTorchDeviceTypeXLA-20200401043034.xml 
Apr 01 05:04:35 + cleanup 
Apr 01 05:04:35 + retcode=1 

This comment was automatically generated by Dr. CI.

This comment has been revised 20 times.

@shihongzhi shihongzhi requested a review from kurtamohler March 24, 2020 15:02
@shihongzhi
Contributor Author

@kurtamohler, could you help review the code again? Thanks.

@ezyang
Contributor

ezyang commented Mar 31, 2020

Looks like you still need to fix some tests

======================================================================
FAIL: test_mv (__main__.TestUncoalescedSparse)
----------------------------------------------------------------------
RuntimeError: expected self.size(-1) == vec.size(-1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_sparse.py", line 1539, in test_mv
    test_shape(10, 100, 10, 20)
AssertionError: "expected self.size(-1) == vec.size(-1)" does not match "expected self.size(-1) == vec.size(-1)"
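
The two strings in that AssertionError look identical, which suggests the expected message was passed to assertRaisesRegex unescaped. This is one plausible reading and is not confirmed in the thread. The sketch below uses only the standard `re` module to show why a message containing parentheses does not match itself when treated as a pattern.

```python
import re

message = "expected self.size(-1) == vec.size(-1)"

# Used directly as a pattern, "(-1)" is a group that matches the text "-1",
# so the literal parentheses in the message are never matched.
raw_match = re.search(message, message)

# re.escape turns every regex metacharacter into a literal character.
escaped_match = re.search(re.escape(message), message)
```

Here `raw_match` is `None` while `escaped_match` covers the whole message, so escaping the expected text (or backslash-escaping the parentheses by hand) is one way to make such a check pass.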

@shihongzhi
Contributor Author

Done. I fixed this bug.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ezyang merged this pull request in 74ef0ad.

Labels
Merged module: sparse Related to torch.sparse open source triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Development

Successfully merging this pull request may close these issues.

SparseTensor multiplication with 1D vector
8 participants