add mv operator to SparseTensor #21782

shihongzhi · 2019-06-14T15:15:08Z

Fixes #21266 add mv operator to SparseTensor


zhangguanheng66 · 2019-06-14T18:15:00Z

@ezyang, could you review this PR when you have time? Thanks.

ssnl

I would love to see a doc update saying that this does not work on CUDA sparse tensors. Also, @ezyang, what was our resolution for the torch.sparse namespace? Should this be in there or not?

shihongzhi · 2019-06-15T02:44:57Z

@pytorchbot retest this please.

shihongzhi · 2019-06-17T01:55:49Z

@ezyang I think this is ready for review. Thanks

ezyang

Hi @shihongzhi, unfortunately, writing a (naive) matrix-vector multiply by hand is not the recommended way of solving the issue. One reason it's not recommended is that in your PR, there is no implementation of the kernel in CUDA; ideally, we'd also have a CUDA implementation.

Since a matrix-vector multiply is simply a matrix-matrix multiply in which the second operand is an n x 1 matrix, the easiest implementation is to unsqueeze the vector into a matrix, call the existing matrix-matrix multiply, and then squeeze the result back into a vector.
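A minimal sketch of the unsqueeze, matrix-matrix multiply, squeeze approach, written in plain Python so it runs without PyTorch. The COO tuple layout and the helpers `sparse_mm`, `unsqueeze_last`, `squeeze_last`, and `sparse_mv` are hypothetical stand-ins for a sparse COO tensor, `torch.mm`, `Tensor.unsqueeze(-1)`, and `Tensor.squeeze(-1)`; this is not the code that was merged. Only the mv size-check message is taken from the test failure later in this thread.

```python
from typing import List, Tuple

# Hypothetical COO layout: (rows, cols, [(row, col, value), ...]).
# Duplicate (row, col) pairs are allowed, as in an uncoalesced sparse tensor.
SparseCOO = Tuple[int, int, List[Tuple[int, int, float]]]
Dense = List[List[float]]


def sparse_mm(a: SparseCOO, b: Dense) -> Dense:
    """Multiply a sparse n x k matrix by a dense k x m matrix (k >= 1 assumed)."""
    n, k, entries = a
    if k == 0:
        # A list-of-lists cannot record m when k == 0, so this sketch rejects it.
        raise ValueError("this sketch assumes the sparse matrix has at least one column")
    if len(b) != k:
        raise RuntimeError("size mismatch between sparse matrix and dense matrix")
    m = len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i, j, v in entries:
        # Each stored entry a[i][j] adds v * b[j][c] to out[i][c].
        for c in range(m):
            out[i][c] += v * b[j][c]
    return out


def unsqueeze_last(vec: List[float]) -> Dense:
    """Turn a length-k vector into a k x 1 column matrix (like vec.unsqueeze(-1))."""
    return [[x] for x in vec]


def squeeze_last(mat: Dense) -> List[float]:
    """Turn an n x 1 matrix back into a length-n vector (like mat.squeeze(-1))."""
    return [row[0] for row in mat]


def sparse_mv(a: SparseCOO, vec: List[float]) -> List[float]:
    """Sparse matrix-vector product computed as a matrix-matrix product."""
    _, k, _ = a
    if k != len(vec):
        raise RuntimeError("expected self.size(-1) == vec.size(-1)")
    return squeeze_last(sparse_mm(a, unsqueeze_last(vec)))


x = (2, 3, [(0, 0, 1.0), (0, 2, 2.0), (1, 1, 4.0), (1, 2, 5.0)])
print(sparse_mv(x, [1.0, 1.0, 1.0]))  # [3.0, 9.0]
```

Because `sparse_mm` accumulates every stored entry, duplicate (uncoalesced) indices are summed, so coalesced and uncoalesced inputs give the same result. In the spirit of the review's suggestion, `sparse_mv` has no kernel of its own: it only reshapes and delegates to the matrix-matrix product, and the final squeeze is what turns the n x 1 result back into a vector.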

ezyang · 2019-06-17T14:14:25Z

Quoting ssnl: "Also, @ezyang, what was our resolution for the torch.sparse namespace? Should this be in there or not?"

I don't remember what our resolution was. Since @weiyangfb is not working on sparse anymore, we will probably end up relitigating this when someone else picks it up. However, since matmul is not in sparse namespace, this probably should not be in there either (for consistency, for now).

(FWIW, these days, I'm a lot more on the side of "dense semantics don't have to match sparse semantics exactly, no namespace needed, you never really want a densified gradient flowing to a sparse input")

shihongzhi · 2019-06-17T14:32:46Z

@ezyang Thanks for the review. I will use a matrix-matrix multiply and also add a CUDA implementation.

test/test_sparse.py

aten/src/ATen/native/sparse/SparseTensorMath.cpp

ezyang · 2019-06-18T14:44:59Z

Good, this is an improvement. But you need to do more:

Make mv_sparse_cpu CPU/CUDA agnostic by:
    a. Renaming it to mv_sparse
    b. Removing the device checks

Test both cases by removing the decorator.

Also I'm pretty sure you need a squeeze at the end.

Don't forget to dismiss the "changes requested" when you want more review.


shihongzhi · 2019-07-24T06:46:28Z

@pytorchbot rebase this please.

shihongzhi · 2019-07-30T11:17:27Z

@pytorchbot rebase this please.

shihongzhi · 2019-07-31T01:05:44Z

@ezyang I think it's ready for review again.


ezyang · 2020-03-23T18:34:07Z

@kurtamohler and @nikitaved, since y'all are getting more involved in sparse, do you think you could help review this?

aten/src/ATen/native/sparse/SparseTensorMath.cpp

kurtamohler · 2020-03-23T19:39:25Z

test/test_sparse.py

+        y = torch.ones(3, device=self.device)
+
+        self.assertEqual(self.value_tensor([3, 9]), x.matmul(y))
+


We should probably have test cases that make sure your asserts are triggered correctly, with self.assertRaisesRegex.

Sorry, I haven't used self.assertRaisesRegex before, so I don't know how or why to use it.

self.assertRaisesRegex checks that a function raises an exception when expected. The way to use it is:

    with self.assertRaisesRegex(<type of exception>, <regex string that matches the expected error message>):
        <call the function with arguments that will cause the expected exception>

The error messages you're regex-matching against are the messages you wrote in your TORCH_CHECK calls. There are many examples in test_sparse.py.
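A small, self-contained sketch of that pattern. `mv_dense` is a hypothetical pure-Python stand-in (not a PyTorch function) whose error messages play the role of the TORCH_CHECK messages; the test class and the messages are illustrative only.

```python
import unittest


def mv_dense(mat, vec):
    """Hypothetical dense stand-in for a matrix-vector multiply, used only to illustrate the test pattern."""
    if any(isinstance(x, list) for x in vec):
        raise RuntimeError("vec must be a 1-D vector")
    if len(mat[0]) != len(vec):
        raise RuntimeError(
            "size mismatch: matrix has %d columns but vector has %d entries"
            % (len(mat[0]), len(vec))
        )
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]


class TestMvErrors(unittest.TestCase):
    def test_rejects_non_vector(self):
        # The second argument is a regex; it must be found (re.search) in str(exception).
        with self.assertRaisesRegex(RuntimeError, "must be a 1-D vector"):
            mv_dense([[1.0, 2.0]], [[1.0], [2.0]])

    def test_rejects_size_mismatch(self):
        with self.assertRaisesRegex(RuntimeError, r"has 2 columns but vector has 3"):
            mv_dense([[1.0, 2.0]], [1.0, 2.0, 3.0])

    def test_valid_input(self):
        self.assertEqual(mv_dense([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0]), [3.0, 7.0])


if __name__ == "__main__":
    unittest.main()
```

If the call inside the `with` block raises nothing, or raises a message the regex does not match, the test fails; that is what makes it a check on the error path rather than only on the happy path.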

Done. Thank you for the detailed explanation.

test/test_sparse.py

dr-ci · 2020-03-24T03:46:19Z

💊 CircleCI build failures summary and remediations

As of commit 45c7079:

2/2 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following build failures do not appear to be due to upstream breakages (reran 1 job to discount flakiness):

pytorch_linux_xenial_py3_6_gcc5_4_test (1/2)

Step: "Test" (confirmed not flaky by 2 failures)

Apr 01 04:41:44 caused by: Connection refused (os error 111)

Apr 01 04:41:44 +++ eval 'extract_trap_cmd ' 
Apr 01 04:41:44 ++++ extract_trap_cmd 
Apr 01 04:41:44 ++++ printf '%s\n' '' 
Apr 01 04:41:44 +++ printf '%s\n' cleanup 
Apr 01 04:41:44 ++ trap -- ' 
Apr 01 04:41:44 cleanup' EXIT 
Apr 01 04:41:44 ++ which sccache 
Apr 01 04:41:44 ++ sccache --stop-server 
Apr 01 04:41:44 Stopping sccache server... 
Apr 01 04:41:44 error: couldn't connect to server 
Apr 01 04:41:44 caused by: Connection refused (os error 111) 
Apr 01 04:41:44 ++ true 
Apr 01 04:41:44 ++ rm /var/lib/jenkins/sccache_error.log 
Apr 01 04:41:44 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 
Apr 01 04:41:44 ++ SCCACHE_IDLE_TIMEOUT=1200 
Apr 01 04:41:44 ++ RUST_LOG=sccache::server=error 
Apr 01 04:41:44 ++ sccache --start-server 
Apr 01 04:41:44 Starting sccache server... 
Apr 01 04:41:44 ++ sccache --zero-stats 
Apr 01 04:41:44 Compile requests                 0 
Apr 01 04:41:44 Compile requests executed        0

pytorch_xla_linux_xenial_py3_6_clang7_test (2/2)

Step: "Test" (confirmed not flaky by 2 failures)

Apr 01 05:04:34 AssertionError: 0.22364577306378963 not less than or equal to 0.2 :

Apr 01 05:04:34   File "/var/lib/jenkins/workspace/xla/test/../../test/test_torch.py", line 9764, in helper 
Apr 01 05:04:34     self.assertEqual(t_transform(q[99:100]).std(), std_transform(1), 0.2) 
Apr 01 05:04:34   File "/var/lib/jenkins/workspace/xla/test/pytorch_test_base.py", line 370, in assertEqual 
Apr 01 05:04:34     **kwargs) 
Apr 01 05:04:34   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 802, in assertEqual 
Apr 01 05:04:34     allow_inf=allow_inf, exact_dtype=exact_dtype) 
Apr 01 05:04:34   File "/var/lib/jenkins/workspace/xla/test/pytorch_test_base.py", line 370, in assertEqual 
Apr 01 05:04:34     **kwargs) 
Apr 01 05:04:34   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 925, in assertEqual 
Apr 01 05:04:34     super(TestCase, self).assertLessEqual(abs(x - y), prec, message) 
Apr 01 05:04:34 AssertionError: 0.22364577306378963 not less than or equal to 0.2 :  
Apr 01 05:04:34  
Apr 01 05:04:34 ---------------------------------------------------------------------- 
Apr 01 05:04:34 Ran 839 tests in 2039.290s 
Apr 01 05:04:34  
Apr 01 05:04:34 FAILED (failures=1, skipped=291) 
Apr 01 05:04:34  
Apr 01 05:04:34 Generating XML reports... 
Apr 01 05:04:34 Generated XML report: test-reports/python-unittest/TEST-TestTorchDeviceTypeXLA-20200401043034.xml 
Apr 01 05:04:35 + cleanup 
Apr 01 05:04:35 + retcode=1

This comment was automatically generated by Dr. CI.

shihongzhi · 2020-03-31T02:26:58Z

@kurtamohler, could you review the code again? Thanks.

ezyang · 2020-03-31T21:21:08Z

Looks like you still need to fix some tests

======================================================================
FAIL: test_mv (__main__.TestUncoalescedSparse)
----------------------------------------------------------------------
RuntimeError: expected self.size(-1) == vec.size(-1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_sparse.py", line 1539, in test_mv
    test_shape(10, 100, 10, 20)
AssertionError: "expected self.size(-1) == vec.size(-1)" does not match "expected self.size(-1) == vec.size(-1)"

shihongzhi · 2020-04-01T06:32:34Z

Done. Fixed this bug
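The confusing failure above ("... does not match ..." with two identical-looking strings) is what one would expect when a message containing regex metacharacters is used verbatim as the pattern: in `expected self.size(-1) == vec.size(-1)`, `(-1)` is a capturing group that matches `-1`, so the pattern looks for `size-1` and never finds `size(-1)`. The thread does not show the exact fix (the commit is titled "Fix regax pattern"); the sketch below illustrates one common approach, escaping the literal message with `re.escape` or by hand.

```python
import re

message = "expected self.size(-1) == vec.size(-1)"

# Used verbatim as a regex, "(-1)" is a group matching "-1" and "." matches any
# character, so this pattern looks for "size-1" and does not match "size(-1)".
naive_pattern = "expected self.size(-1) == vec.size(-1)"
print(re.search(naive_pattern, message))  # None

# re.escape backslash-escapes the metacharacters, so the pattern matches the literal text.
escaped_pattern = re.escape(message)
print(re.search(escaped_pattern, message) is not None)  # True

# Escaping by hand in a raw string works as well.
manual_pattern = r"expected self\.size\(-1\) == vec\.size\(-1\)"
print(re.search(manual_pattern, message) is not None)  # True
```

In a test this would read `self.assertRaisesRegex(RuntimeError, re.escape(message))`, which keeps the expected message readable while matching it literally.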

facebook-github-bot

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2020-04-02T00:31:01Z

@ezyang merged this pull request in 74ef0ad.

Fixes pytorch#21266

b2bf5bc

add mv operator to SparseTensor

pytorchbot added the module: operators and module: sparse labels Jun 14, 2019

shihongzhi changed the title from "Fixes #21266" to "add mv operator to SparseTensor" Jun 14, 2019

ezyang added the open source label Jun 14, 2019

zhangguanheng66 requested a review from ezyang June 14, 2019 18:14

zhangguanheng66 added the triaged label Jun 14, 2019

ssnl reviewed Jun 14, 2019


Fix python lint error

a135e43

ezyang requested changes Jun 17, 2019


use unsqueeze

55ecf89

ezyang reviewed Jun 18, 2019


test/test_sparse.py (outdated)

ezyang reviewed Jun 18, 2019


aten/src/ATen/native/sparse/SparseTensorMath.cpp

ezyang reviewed Jun 18, 2019


aten/src/ATen/native/sparse/SparseTensorMath.cpp (outdated)

1. squeeze the result

2f9b186

2. adapt mv_sparse for cpu and cuda both

pytorchbot and others added 2 commits July 24, 2019 06:46

Merge remote-tracking branch 'origin/master' into HEAD

a82ce96

fix gpu test error

de1aefc

Merge remote-tracking branch 'origin/master' into HEAD

7d63053

Merge branch 'master' into feature/fix_21266

51dbe1b

# Conflicts: # aten/src/ATen/native/native_functions.yaml

shihongzhi requested review from ezyang and ssnl March 20, 2020 03:24

ezyang requested review from kurtamohler and nikitaved March 23, 2020 18:33

kurtamohler requested changes Mar 23, 2020


tmp: resolve PR comment

4687cae

shihongzhi added 2 commits March 24, 2020 22:09

fix

8b85466

fix device bug

7646100

shihongzhi requested a review from kurtamohler March 24, 2020 15:02

Merge branch 'master' into feature/fix_21266

bc40933

add assertRaisesRegex in test

e5fec23

kurtamohler approved these changes Mar 31, 2020


ezyang approved these changes Mar 31, 2020


Fix regax pattern

45c7079

facebook-github-bot reviewed Apr 1, 2020


facebook-github-bot closed this in 74ef0ad Apr 1, 2020

facebook-github-bot added the merged label Apr 2, 2020

mruberry added the Merged label Oct 28, 2020
