Bmm sparse dense #33430

kurtamohler · 2020-02-17T22:29:33Z

Add sparse-dense BMM operation for CUDA and CPU.
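
For readers new to the operation, here is a minimal pure-Python sketch of what a batched sparse-dense matrix multiply computes. It is a reference illustration only, not the CUDA or CPU kernel added by this PR. The function name `bmm_sparse_dense` and the list-based COO layout are assumptions made for the example.

```python
def bmm_sparse_dense(indices, values, shape, dense):
    """Reference batched sparse-dense matmul (illustrative, not the PR's kernel).

    indices: list of (batch, row, col) triples for the sparse operand's nonzeros
    values:  list of nonzero values, same length as indices
    shape:   (num_batches, n, m) of the sparse operand
    dense:   nested lists with shape (num_batches, m, p)
    """
    num_batches, n, m = shape
    p = len(dense[0][0])
    out = [[[0.0] * p for _ in range(n)] for _ in range(num_batches)]
    for (b, i, k), v in zip(indices, values):
        # Each nonzero A[b, i, k] adds v * B[b, k, :] to out[b, i, :].
        # Duplicate indices simply accumulate, so the input need not be coalesced.
        for j in range(p):
            out[b][i][j] += v * dense[b][k][j]
    return out
```

Only the stored nonzeros are visited, which is why the cost scales with nnz rather than with the full size of the sparse operand.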

dr-ci · 2020-02-17T22:49:07Z

💊 Build failures summary and remediations

As of commit d01b921 (more details on the Dr. CI page):

1/1 failures introduced in this PR

XLA failure

Job pytorch_xla_linux_xenial_py3_6_clang7_test is failing. Please create an issue with title prefixed by [PT_BREAK] in pytorch/xla and link to this PR. If you have questions, please reach out to @ailzhang / @dlibenzi / @JackCaoG.

This comment was automatically generated by Dr. CI.

This comment has been revised 266 times.

kurtamohler · 2020-02-18T16:24:19Z

The clang-tidy check is failing to install clang-tidy:

+ sudo apt-get install -y clang-tidy-8
Reading package lists...
Building dependency tree...
Reading state information...
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 clang-tidy-8 : Depends: libllvm8 (= 1:8.0.1+svn369350-1~exp1~20200112113617.82) but 1:8.0.1+svn369350-1~exp1~20200114191400.80 is to be installed
                Depends: clang-tools-8 but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
##[error]Process completed with exit code 100.

ezyang · 2020-02-19T16:03:33Z

@pearu @nikitaved How about you guys do the first pass reviewing the algorithm? I can help with more framework review but I'd like you guys to do the bulk of the actual algorithm review.

kurtamohler · 2020-02-20T21:46:58Z

Below are charts of sparse-dense bmm's performance compared to a workaround function that someone posted to the original issue. My performance script is here: https://github.com/kurtamohler/pytorch-perf-test-scripts/blob/master/bmm-sparse-dense/bmm_perf.py

The "input cases" that the x-axis refers to are the input size combinations found in the same order in the table below the charts. "Pre-coalescing" means that before bmm() is called, I'm coalescing the sparse matrix outside the timed loop so that it does not have to be done inside the timed bmm() call. Measuring with and without the pre-coalescing step allows us to see how much of an impact coalescing has on the run time. Note that the way the workaround works, it does not need a coalesced sparse matrix, so it can skip that step. It looks like coalescing increases run time significantly in some cases but not in others.
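
As background on the term: coalescing a COO sparse tensor means summing entries that share an index and ordering the result by index. The sketch below shows that merge-and-sort step in plain Python. It is an illustration under assumed names (`coalesce`, tuple indices), not PyTorch's implementation, but it shows the kind of work that pre-coalescing moves out of the timed loop.

```python
def coalesce(indices, values):
    """Sum values that share an index and return entries sorted by index.

    indices: list of index tuples, e.g. (batch, row, col)
    values:  list of numbers, same length as indices
    """
    merged = {}
    for idx, v in zip(indices, values):
        # Duplicate indices are combined by addition.
        merged[idx] = merged.get(idx, 0) + v
    sorted_idx = sorted(merged)  # lexicographic order over the index tuples
    return sorted_idx, [merged[i] for i in sorted_idx]
```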

The builtin methods are almost always faster than the workarounds, and CUDA is almost always faster than CPU.

Input cases:

num_matrices	squ_mat_elements	output_size	sparsity	nnz
10	100	10000	0.9	100
10	100	100000	0	1000
10	10000	1000000	0.999	100
10	10000	10000000	0.99	1000
10	10000	100000000	0.9	10000
10	10000	1000000000	0	100000
10	1000000	10000000000	0.999	10000
10	1000000	100000000000	0.99	100000
100	100	100000	0.9	1000
100	100	1000000	0	10000
100	10000	10000000	0.999	1000
100	10000	100000000	0.99	10000
100	10000	1000000000	0.9	100000
100	1000000	100000000000	0.999	100000
1000	100	1000000	0.9	10000
1000	100	10000000	0	100000
1000	10000	100000000	0.999	10000
1000	10000	1000000000	0.99	100000
10000	100	10000000	0.9	100000
10000	10000	1000000000	0.999	100000
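
The nnz column in the table appears to be consistent with num_matrices × squ_mat_elements × (1 − sparsity). That relation is inferred from the rows above rather than taken from the benchmark script, and the helper name `expected_nnz` is hypothetical. A small check:

```python
def expected_nnz(num_matrices, squ_mat_elements, sparsity):
    """Nonzero count implied by the table columns (an inferred relation)."""
    # round() absorbs floating-point error in (1 - sparsity).
    return round(num_matrices * squ_mat_elements * (1 - sparsity))
```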

kurtamohler · 2020-03-03T22:51:41Z

I'm getting some CI errors related to JIT:

Feb 21 00:52:01 ======================================================================
Feb 21 00:52:01 ERROR [0.108s]: test_nested2 (__main__.EagerModePostTrainingQuantTest)
Feb 21 00:52:01 ----------------------------------------------------------------------
Feb 21 00:52:01 Traceback (most recent call last):
Feb 21 00:52:01   File "test_quantization.py", line 195, in test_nested2
Feb 21 00:52:01     checkQuantized(model)
Feb 21 00:52:01   File "test_quantization.py", line 193, in checkQuantized
Feb 21 00:52:01     self.checkScriptable(model, self.calib_data)
Feb 21 00:52:01   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_quantization.py", line 140, in checkScriptable
Feb 21 00:52:01     self._checkScriptable(orig_mod, traced, calib_data, check_save_load)
Feb 21 00:52:01   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_quantization.py", line 144, in _checkScriptable
Feb 21 00:52:01     self._checkModuleCorrectnessAgainstOrig(orig_mod, script_mod, calib_data)
Feb 21 00:52:01   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_quantization.py", line 161, in _checkModuleCorrectnessAgainstOrig
Feb 21 00:52:01     scripted_output = test_mod(inp)
Feb 21 00:52:01   File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
Feb 21 00:52:01     result = self.forward(*input, **kwargs)
Feb 21 00:52:01   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 90, in prof_meth_call
Feb 21 00:52:01     return prof_callable(meth_call, *args, **kwargs)
Feb 21 00:52:01   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 84, in prof_callable
Feb 21 00:52:01     return callable(*args, **kwargs)
Feb 21 00:52:01 RuntimeError: 
Feb 21 00:52:01 Couldn't find an operator for aten::bmm(Tensor self, Tensor mat2) -> Tensor. Do you have to update a set of hardcoded JIT ops?

I'm not really sure what I need to do.

ezyang · 2020-03-04T22:55:36Z

@kurtamohler Alright, it's this problem. Can you split bmm into two functions, bmm(Tensor, Tensor) and _bmm(Tensor, Tensor, bool deterministic)? Then have bmm just call _bmm with the deterministic flag set, but define autograd separately on bmm and _bmm. I think that will suffice to make this error go away.

cc'ing @eellison if you have a better protocol. (I am aware that we could also just bash this out by finding the schema string in JIT and updating it, but I kind of don't want to do that here; I feel the JIT doesn't want to see the determinism flag.)

eellison · 2020-03-04T23:35:02Z

@ezyang @kurtamohler is the error in shape analysis because aten::bmm(Tensor self, Tensor mat2) -> Tensor doesn't exist? If that's the case, I would say just update it and hope that we are done deprecating it completely soon (cc @Krovatkin). Otherwise I'm not sure.

kurtamohler · 2020-03-05T20:45:34Z

Alright well I'm not sure if what I just changed fixed the issue or not. I haven't figured out how to manually run the failing test. So I'll just let CI run it.

kurtamohler · 2020-03-10T20:13:03Z

Yesterday I discovered that my CPU implementation has a flaw in how it searches the 3-D sparse tensor's indices for each 2-D matrix. Apparently, depending on how you create the sparse tensor, the tensor of indices can be laid out in either row-major or column-major order in memory. It looks like the dense_tensor.to_sparse() method gives the opposite index-matrix ordering from what you get if you create a sparse matrix directly with something like torch.sparse.Tensor(). My implementation handled only one of these cases, and the following gave the wrong result:

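import torch

a = torch.rand([2,2,2]).to_sparse()  # sparse operand built via to_sparse()
b = torch.rand([2,2,2])              # dense operand
a.bmm(b)                             # gave the wrong result before the fix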

Here is an example of how two different sparse tensors' index matrices can have different strides:

>>> b = torch.sparse.FloatTensor(torch.LongTensor([[0,0,1],[0,1,1],[0,1,1]]),torch.FloatTensor([1,2,3]),torch.Size([2,2,2])).coalesce()
>>> b.indices().stride()
(3, 1)
>>> b = b.to_dense().to_sparse()
>>> b.indices().stride()
(1, 3)

I have a working fix to my search function that takes this into account, and I will push it shortly.
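
One way to make such a search independent of memory layout is to read every index through its strides instead of assuming contiguity. The sketch below is a plain-Python illustration of that idea, not the fix pushed to this PR. The class and function names are hypothetical, and the use of binary search over coalesced (sorted) indices is an assumption. The two buffers in the usage note mirror the (3, 1) and (1, 3) strides shown in the example above.

```python
class StridedIndices:
    """Read-only view of a 2-D index matrix stored in a flat buffer."""

    def __init__(self, storage, shape, strides):
        self.storage = storage
        self.shape = shape      # (num_dims, nnz)
        self.strides = strides  # element strides for (dim, nnz position)

    def at(self, dim, k):
        # Honour both strides rather than assuming row-major storage.
        s0, s1 = self.strides
        return self.storage[dim * s0 + k * s1]


def lower_bound(indices, batch):
    """First nnz position whose batch index is >= batch (indices must be sorted)."""
    lo, hi = 0, indices.shape[1]
    while lo < hi:
        mid = (lo + hi) // 2
        if indices.at(0, mid) < batch:
            lo = mid + 1
        else:
            hi = mid
    return lo


def batch_range(indices, batch):
    """Half-open range [start, end) of nonzeros that belong to one 2-D matrix."""
    return lower_bound(indices, batch), lower_bound(indices, batch + 1)
```

With the index matrix [[0,0,1],[0,1,1],[0,1,1]], the row-major buffer is `[0,0,1,0,1,1,0,1,1]` with strides `(3, 1)` and the column-major buffer is `[0,0,0,0,1,1,1,1,1]` with strides `(1, 3)`; `batch_range` gives the same ranges for both.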

ezyang · 2020-03-11T13:31:58Z

re the hip errors (cc @iotamudelta )

22:43:07 /var/lib/jenkins/workspace/aten/src/ATen/native/sparse/hip/SparseHIPTensorMath.hip:793:1: error: unknown type name 'hipDataType_t'
22:43:07 hipDataType_t getTensorCudaDataType(Tensor self) {
22:43:07 ^
22:43:07 /var/lib/jenkins/workspace/aten/src/ATen/native/sparse/hip/SparseHIPTensorMath.hip:794:3: error: unknown type name 'hipDataType_t'
22:43:07   hipDataType_t cuda_data_type;
22:43:07   ^
22:43:07 /var/lib/jenkins/workspace/aten/src/ATen/native/sparse/hip/SparseHIPTensorMath.hip:797:24: error: use of undeclared identifier 'hipR32F'
22:43:07       cuda_data_type = hipR32F;
22:43:07                        ^
22:43:07 /var/lib/jenkins/workspace/aten/src/ATen/native/sparse/hip/SparseHIPTensorMath.hip:800:24: error: use of undeclared identifier 'hipR64F'
22:43:07       cuda_data_type = hipR64F;
22:43:07                        ^
22:43:07 /var/lib/jenkins/workspace/aten/src/ATen/native/sparse/hip/SparseHIPTensorMath.hip:880:3: error: unknown type name 'cusparseSpMMAlg_t'
22:43:07   cusparseSpMMAlg_t mm_alg = deterministic ? CUSPARSE_COOMM_ALG2 : CUSPARSE_COOMM_ALG1;
22:43:07   ^
22:43:07 /var/lib/jenkins/workspace/aten/src/ATen/native/sparse/hip/SparseHIPTensorMath.hip:880:46: error: use of undeclared identifier 'CUSPARSE_COOMM_ALG2'
22:43:07   cusparseSpMMAlg_t mm_alg = deterministic ? CUSPARSE_COOMM_ALG2 : CUSPARSE_COOMM_ALG1;
22:43:07                                              ^
22:43:07 /var/lib/jenkins/workspace/aten/src/ATen/native/sparse/hip/SparseHIPTensorMath.hip:880:68: error: use of undeclared identifier 'CUSPARSE_COOMM_ALG1'
22:43:07   cusparseSpMMAlg_t mm_alg = deterministic ? CUSPARSE_COOMM_ALG2 : CUSPARSE_COOMM_ALG1;
22:43:07                                                                    ^
22:43:07 /var/lib/jenkins/workspace/aten/src/ATen/native/sparse/hip/SparseHIPTensorMath.hip:899:11: error: unknown type name 'hipDataType_t'
22:43:07           hipDataType_t cuda_data_type = getTensorCudaDataType(mat2_contig);
22:43:07           ^
22:43:07 /var/lib/jenkins/workspace/aten/src/ATen/native/sparse/hip/SparseHIPTensorMath.hip:904:11: error: unknown type name 'cusparseSpMatDescr_t'; did you mean

for now you should just ifdef out the code in the HIP case and say that it's not supported on HIP

iotamudelta · 2020-03-11T15:40:01Z

It looks like either missing features in ROCm or (more likely) a mishipification. Agreed with @ezyang that this can be ifdef'd out for now, we'll have a look what's up there and may also ping you if we need some input?

kurtamohler · 2020-03-11T15:42:36Z

Alright, thanks for letting me know. I'll put in the ifdef.

kurtamohler · 2020-04-14T04:16:01Z

@peterjc123, I think you're right, WIN32_ should be enough. I think I was using TORCH_INTERNAL_ASSERT incorrectly. I incorrectly thought that it was an unconditional version of TORCH_CHECK and only needed one argument (the error message). So, changing back to using TORCH_CHECK with the conditional set to false must have been the real reason why the error is now being thrown given the correct conditions. I think this might be a demonstration of why preprocessor macros should be avoided if possible.

facebook-github-bot

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ezyang · 2020-04-15T14:01:52Z

Drat, it looks like our internal version of cusparse is too old. Is there a way you can add macro ifdefs that appropriately test the version of cusparse before making API calls?

stderr: caffe2/aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu(807): warning: statement is unreachable
caffe2/aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu(898): error: identifier "cusparseSpMMAlg_t" is undefined
caffe2/aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu(898): error: identifier "CUSPARSE_COOMM_ALG2" is undefined
caffe2/aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu(898): error: identifier "CUSPARSE_COOMM_ALG1" is undefined
caffe2/aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu(902): error: identifier "cusparseSpMatDescr_t" is undefined
caffe2/aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu(902): error: identifier "CUSPARSE_INDEX_32I" is undefined
caffe2/aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu(902): error: identifier "cusparseCreateCoo" is undefined
caffe2/aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu(902): error: identifier "cusparseDnMatDescr_t" is undefined
caffe2/aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu(902): error: identifier "CUSPARSE_ORDER_COL" is undefined
caffe2/aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu(902): error: identifier "cusparseCreateDnMat" is undefined
caffe2/aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu(902): error: identifier "cusparseDnMatDescr_t" is undefined
caffe2/aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu(902): error: identifier "CUSPARSE_ORDER_COL" is undefined
caffe2/aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu(902): error: identifier "cusparseCreateDnMat" is undefined
caffe2/aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu(902): error: identifier "cusparseSpMM_bufferSize" is undefined
caffe2/aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu(902): error: identifier "cusparseSpMM" is undefined
caffe2/aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu(902): error: identifier "cusparseDestroySpMat" is undefined
caffe2/aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu(902): error: identifier "cusparseDestroyDnMat" is undefined
caffe2/aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu(902): error: identifier "cusparseDestroyDnMat" is undefined

kurtamohler · 2020-04-15T17:58:46Z

Alright, I changed the ifdefs so that an error is thrown if the cuda version is less than 10.1. I also added a test to make sure the error is being thrown correctly and skip the other tests if less than cuda 10.1.

kurtamohler · 2020-04-15T20:09:05Z

Looks like the macos environment doesn't have the torch._C._cuda_getCompiledVersion() function that I was using to decide whether to skip tests. I changed it to use torch.version.cuda instead, hopefully that's available in all the testing environments.

facebook-github-bot

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

kurtamohler · 2020-04-15T21:47:15Z

Darnit, the version string comparison module I was using doesn't exist on macos and windows:

    from packaging import version
ModuleNotFoundError: No module named 'packaging'

kurtamohler · 2020-04-15T22:07:23Z

I removed the packaging module import and chose to use this method of comparison instead:

[int(x) for x in torch.version.cuda.split(".")] >= [10, 1]

Hopefully that's robust enough not to break.

kurtamohler · 2020-04-15T23:33:29Z

Wasn't robust enough. I think this will be.
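
The commit titles later in this thread mention avoiding the comparison when torch.version.cuda is None. A dependency-free check along those lines might look like the sketch below. It is not the code from this PR. The helper name `cuda_version_at_least` is hypothetical, and it takes the version string as an argument so that it runs without torch installed.

```python
def cuda_version_at_least(version_string, minimum):
    """Return True if a dotted version string is >= `minimum`.

    version_string: e.g. "10.1" or "10.1.243", or None for builds without CUDA
    minimum:        tuple of ints, e.g. (10, 1)
    """
    if version_string is None:
        # No CUDA version is reported, so the requirement is not met.
        return False
    parts = []
    for piece in version_string.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component
        parts.append(int(piece))
    # Tuples compare element by element, so (10, 1, 243) >= (10, 1) holds.
    return tuple(parts) >= tuple(minimum)
```

In a test file this would typically be called as `cuda_version_at_least(torch.version.cuda, (10, 1))` inside a skip condition.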

facebook-github-bot

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ezyang · 2020-04-16T17:40:45Z

This looks like it was sufficient. Unfortunately it looks like we need to merge with master.

facebook-github-bot

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

kurtamohler · 2020-04-17T22:15:59Z

@ezyang, do you know what caused Facebook Internal to fail?

ezyang · 2020-04-20T13:52:14Z

it's fake, I think

facebook-github-bot · 2020-04-20T18:16:56Z

@ezyang merged this pull request in c7cf4c1.

kurtamohler requested review from ezyang, nikitaved and pearu February 17, 2020 22:29

pytorchbot added the open source label Feb 17, 2020

kurtamohler force-pushed the bmm-sparse-dense-5672 branch 2 times, most recently from 05ef9cc to 06b33ee Compare February 18, 2020 05:47

kurtamohler changed the title from "Bmm sparse dense 5672" to "Bmm sparse dense" Feb 18, 2020

kurtamohler commented Feb 18, 2020

aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu (outdated, resolved)

ezyang reviewed Feb 19, 2020

aten/src/ATen/native/native_functions.yaml (outdated, resolved)

ezyang reviewed Feb 19, 2020

aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu (outdated, resolved)

yf225 added the triaged and module: sparse labels Feb 19, 2020

ezyang mentioned this pull request Feb 20, 2020

Bad error message when _out and non-out variants don't match #33547

Closed

kurtamohler requested a review from apaszke as a code owner March 5, 2020 20:44

kurtamohler force-pushed the bmm-sparse-dense-5672 branch from 716b27c to f1aba62 Compare March 5, 2020 23:31

kurtamohler force-pushed the bmm-sparse-dense-5672 branch 2 times, most recently from 035faa4 to 61c1f64 Compare March 10, 2020 22:24

kurtamohler force-pushed the bmm-sparse-dense-5672 branch from 61c1f64 to dbc89bb Compare March 11, 2020 18:59

facebook-github-bot reviewed Apr 14, 2020

kurtamohler added 2 commits April 15, 2020 12:49

Throw error if bmm sparse-dense is run with CUDA less than 10.1

ce6e907

Fix bmm cuda version test skipIf conditions

9824aec

Fix cuda version comparison

aa06aa8

facebook-github-bot reviewed Apr 15, 2020

Fix version string comparison

89c9821

Avoid version compare if torch.version.cuda is None

f4280a0

Fix flake error

63d79df

facebook-github-bot reviewed Apr 16, 2020

Merge remote-tracking branch 'origin/master' into bmm-sparse-dense-5672

d01b921

facebook-github-bot reviewed Apr 16, 2020

ezyang approved these changes Apr 20, 2020

facebook-github-bot closed this in c7cf4c1 Apr 20, 2020

facebook-github-bot added the merged label Apr 20, 2020

xwang233 mentioned this pull request Aug 2, 2020

Relax cusparse windows guard on cuda 11 #42412

Closed

mruberry added the Merged label Oct 28, 2020

xwang233 mentioned this pull request Mar 17, 2022

Can this cudaDeviceSynchronize call be removed? #74391

Open
