
Issue: support auto generation of device check for sparse tensors #59058

Open

aocsa opened this issue May 27, 2021 · 6 comments
Labels
module: codegen Issues related to the codegen for Aten and Autograd module: sparse Related to torch.sparse triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@aocsa
Contributor

aocsa commented May 27, 2021

🐛 Bug

To Reproduce

@dtypes(torch.double)
def test_matmul_device_mismatch(self, device, dtype):
    cpu = torch.rand((10, 10))
    cuda = cpu.cuda()
    for s, m1, m2 in itertools.product((cpu, cuda), repeat=3):
        print(s.device, m1.device, m2.device)
        csr = m1.to_sparse()  # csr = m1.to_csr_sparse()
        if s.device == csr.device == m2.device:
            torch.addmm(s, csr, m2)
        else:
            with self.assertRaisesRegex(RuntimeError, "Expected all tensors to be on the same device"):
                torch.addmm(s, csr, m2)

If we disable the checks

TORCH_INTERNAL_ASSERT(t.device().type() == kCPU);

AT_ASSERT(!t.is_cuda()); // the dispatch argument

the test fails with Segmentation fault (core dumped) when (s.device, m1.device, m2.device) are cpu, cpu, cuda:0. Basically gen_device_check->common_device_check_failure is never called. It fails even when the second parameter of addmm is a COO or CSR sparse tensor.

Expected behavior

code-gen calls gen_device_check->common_device_check_failure.
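For reference, the behavior the generated check should produce can be sketched as follows. This is a hypothetical stand-in for common_device_check_failure; check_same_device and its message format are illustrative, not the actual ATen code:

```python
# Hypothetical sketch of the device check the codegen is expected to emit
# for every tensor argument; mirrors the RuntimeError tested in the repro.
def check_same_device(*devices):
    common = devices[0]
    for d in devices[1:]:
        if d != common:
            raise RuntimeError(
                "Expected all tensors to be on the same device, "
                f"but found at least two devices, {common} and {d}!"
            )

check_same_device("cpu", "cpu", "cpu")  # passes silently
try:
    check_same_device("cpu", "cpu", "cuda:0")  # raises instead of segfaulting
except RuntimeError as e:
    print(e)
```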

Additional info

Following PR #56872

cc @aocsa @nikitaved @pearu @cpuhrsch @IvanYashchuk @mruberry

@ngimel
Collaborator

ngimel commented May 27, 2021

cc @wenleix

@aocsa aocsa added the module: sparse Related to torch.sparse label May 27, 2021
@aocsa aocsa added this to To do in Sparse tensors via automation May 27, 2021
@wenleix
Contributor

wenleix commented May 28, 2021

Device guard is only generated for the CUDA dispatch key:

device_guard = "// DeviceGuard omitted"  # default
if f.device_guard and is_cuda_dispatch_key(self.backend_index.dispatch_key):
    has_tensor_options = any(isinstance(a.argument, TensorOptionsArguments) for a in args)
    if has_tensor_options:
        # kernel is creating a tensor
        device_guard = """globalContext().lazyInitCUDA();
  const DeviceGuard device_guard(device_or_default(device));"""
    else:
        # kernel is operating on existing tensors

        # There is precedence for which argument we use to do
        # device guard. This describes the precedence order.
        self_arg = [f.func.arguments.self_arg.argument] if f.func.arguments.self_arg is not None else []
        candidate_args = itertools.chain(
            self_arg,
            f.func.arguments.out,
            f.func.arguments.flat_positional
        )
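The precedence logic in the else branch can be sketched roughly as follows. first_device_guard_arg is a hypothetical name for illustration; the real codegen works on argument metadata, not strings:

```python
import itertools

# Rough sketch of the precedence order described in the snippet above:
# the self argument first, then out arguments, then flat positional arguments.
def first_device_guard_arg(self_arg, out_args, positional_args):
    candidates = itertools.chain(self_arg, out_args, positional_args)
    return next(candidates, None)

assert first_device_guard_arg(["self"], ["out"], ["a", "b"]) == "self"
assert first_device_guard_arg([], ["out"], ["a", "b"]) == "out"
assert first_device_guard_arg([], [], ["a", "b"]) == "a"
```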

I incorrectly generate the device check only for the CUDA dispatch key as well; in your example it is a CPU dispatch, so no device check is generated.

I think the following lines need to be removed (likely some tests will need fixing as well):

if is_cuda_dispatch_key(self.backend_index.dispatch_key):

if is_cuda_dispatch_key(self.backend_index.dispatch_key):

Will work on this.
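In other words, the proposed fix is to drop the CUDA-only guard around device-check generation. A hypothetical before/after sketch, with function names assumed for illustration:

```python
# Before (assumed): device checks were emitted only for CUDA dispatch keys.
def should_gen_device_check_before(device_check_enabled, is_cuda_key):
    return device_check_enabled and is_cuda_key

# After (proposed): device checks are emitted for every dispatch key,
# so e.g. SparseCPU kernels also validate argument devices.
def should_gen_device_check_after(device_check_enabled, is_cuda_key):
    return device_check_enabled

assert not should_gen_device_check_before(True, False)  # CPU key: no check (the bug)
assert should_gen_device_check_after(True, False)       # CPU key: check generated
```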

@ngimel
Collaborator

ngimel commented May 28, 2021

It shouldn't be a CPU dispatch if at least one of the tensors is CUDA; are we doing multi-dispatch incorrectly in this case? I don't know the priority between the sparse/dense cuda/cpu keys, @ezyang do you know?

@ngimel ngimel added module: codegen Issues related to the codegen for Aten and Autograd triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Jun 1, 2021
@ezyang
Contributor

ezyang commented Jun 1, 2021

Priority is SparseCUDA, SparseCPU, CUDA, CPU. So yes, if there is a sparse CUDA tensor anywhere, it will dispatch to SparseCUDA.
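That priority order can be sketched as follows (a hypothetical resolver; the real dispatcher computes a key set from all arguments, but the effect on this example is the same):

```python
# Hypothetical multi-dispatch resolution, assuming the stated priority:
# SparseCUDA > SparseCPU > CUDA > CPU.
PRIORITY = ["SparseCUDA", "SparseCPU", "CUDA", "CPU"]

def resolve_dispatch_key(arg_keys):
    # the highest-priority key among the arguments wins
    return min(arg_keys, key=PRIORITY.index)

assert resolve_dispatch_key({"CPU", "CUDA"}) == "CUDA"
# The problematic mix: a SparseCPU tensor plus a dense CUDA tensor
# dispatches to SparseCPU, which has no autogenerated device check.
assert resolve_dispatch_key({"SparseCPU", "CUDA"}) == "SparseCPU"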

@ngimel
Collaborator

ngimel commented Jun 1, 2021

So SparseCPU + dense CUDA will dispatch to SparseCPU, and SparseCPU won't have device checks autogenerated. We don't have this problem in dense land (if you are dispatched to dense CPU, there can't possibly be any CUDA tensors around).

@ezyang
Contributor

ezyang commented Jun 2, 2021

Good catch, @ngimel
