
Issue: support auto generation of device check for sparse tensors #59058

Open

aocsa opened this issue May 27, 2021 · 6 comments
Labels
module: codegen Issues related to the codegen for Aten and Autograd module: sparse Related to torch.sparse triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@aocsa
Contributor

aocsa commented May 27, 2021

🐛 Bug

To Reproduce

@dtypes(torch.double)
def test_matmul_device_mismatch(self, device, dtype):
    cpu = torch.rand((10, 10))
    cuda = cpu.cuda()
    for s, m1, m2 in itertools.product((cpu, cuda), repeat=3):
        print(s.device, m1.device, m2.device)
        csr = m1.to_sparse()  # csr = m1.to_csr_sparse()
        if s.device == csr.device == m2.device:
            torch.addmm(s, csr, m2)
        else:
            with self.assertRaisesRegex(RuntimeError, "Expected all tensors to be on the same device"):
                torch.addmm(s, csr, m2)

If we disable the checks

TORCH_INTERNAL_ASSERT(t.device().type() == kCPU);

AT_ASSERT(!t.is_cuda()); // the dispatch argument

the test fails with Segmentation fault (core dumped) when (s.device, m1.device, m2.device) are cpu, cpu, cuda:0. Basically gen_device_check->common_device_check_failure is never called. It fails even when the second parameter of addmm is a COO or CSR sparse tensor.

Expected behavior

code-gen calls gen_device_check->common_device_check_failure.
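For reference, the behavior the generated check should produce can be sketched as follows. This is a hypothetical stand-in for common_device_check_failure; check_same_device and its message format are illustrative, not the actual ATen code:

```python
# Hypothetical sketch of the device check the codegen is expected to emit
# for every tensor argument; mirrors the RuntimeError tested in the repro.
def check_same_device(*devices):
    common = devices[0]
    for d in devices[1:]:
        if d != common:
            raise RuntimeError(
                "Expected all tensors to be on the same device, "
                f"but found at least two devices, {common} and {d}!"
            )

check_same_device("cpu", "cpu", "cpu")  # passes silently
try:
    check_same_device("cpu", "cpu", "cuda:0")  # raises instead of segfaulting
except RuntimeError as e:
    print(e)
```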

Additional info

Following PR #56872

cc @aocsa @nikitaved @pearu @cpuhrsch @IvanYashchuk @mruberry

@ngimel
Collaborator

ngimel commented May 27, 2021

cc @wenleix

@aocsa aocsa added the module: sparse Related to torch.sparse label May 27, 2021
@aocsa aocsa added this to To do in Sparse tensors via automation May 27, 2021
@wenleix
Contributor

wenleix commented May 28, 2021

Device guard is only generated for the CUDA dispatch key:

device_guard = "// DeviceGuard omitted"  # default
if f.device_guard and is_cuda_dispatch_key(self.backend_index.dispatch_key):
    has_tensor_options = any(isinstance(a.argument, TensorOptionsArguments) for a in args)
    if has_tensor_options:
        # kernel is creating a tensor
        device_guard = """globalContext().lazyInitCUDA();
  const DeviceGuard device_guard(device_or_default(device));"""
    else:
        # kernel is operating on existing tensors

        # There is precedence for which argument we use to do
        # device guard. This describes the precedence order.
        self_arg = [f.func.arguments.self_arg.argument] if f.func.arguments.self_arg is not None else []
        candidate_args = itertools.chain(
            self_arg,
            f.func.arguments.out,
            f.func.arguments.flat_positional
        )
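The precedence logic in the else branch can be sketched roughly as follows. first_device_guard_arg is a hypothetical name for illustration; the real codegen works on argument metadata, not strings:

```python
import itertools

# Rough sketch of the precedence order described in the snippet above:
# the self argument first, then out arguments, then flat positional arguments.
def first_device_guard_arg(self_arg, out_args, positional_args):
    candidates = itertools.chain(self_arg, out_args, positional_args)
    return next(candidates, None)

assert first_device_guard_arg(["self"], ["out"], ["a", "b"]) == "self"
assert first_device_guard_arg([], ["out"], ["a", "b"]) == "out"
assert first_device_guard_arg([], [], ["a", "b"]) == "a"
```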

I incorrectly generate the device check only for the CUDA dispatch key as well; in your example it is a CPU dispatch, so no device check is generated.

I think the following lines need to be removed (likely some tests will need fixing as well):

if is_cuda_dispatch_key(self.backend_index.dispatch_key):

if is_cuda_dispatch_key(self.backend_index.dispatch_key):

Will work on this.
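In other words, the proposed fix is to drop the CUDA-only guard around device-check generation. A hypothetical before/after sketch, with function names assumed for illustration:

```python
# Before (assumed): device checks were emitted only for CUDA dispatch keys.
def should_gen_device_check_before(device_check_enabled, is_cuda_key):
    return device_check_enabled and is_cuda_key

# After (proposed): device checks are emitted for every dispatch key,
# so e.g. SparseCPU kernels also validate argument devices.
def should_gen_device_check_after(device_check_enabled, is_cuda_key):
    return device_check_enabled

assert not should_gen_device_check_before(True, False)  # CPU key: no check (the bug)
assert should_gen_device_check_after(True, False)       # CPU key: check generated
```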

@ngimel
Collaborator

ngimel commented May 28, 2021

It shouldn't be a CPU dispatch if at least one of the tensors is CUDA; are we doing multi-dispatch incorrectly in this case? I don't know the priority between the sparse/dense cuda/cpu keys, @ezyang do you know?

@ngimel ngimel added module: codegen Issues related to the codegen for Aten and Autograd triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Jun 1, 2021
@ezyang
Contributor

ezyang commented Jun 1, 2021

Priority is SparseCUDA, SparseCPU, CUDA, CPU. So yes, if there is a sparse CUDA tensor anywhere, it will dispatch to SparseCUDA.
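That priority order can be sketched as follows (a hypothetical resolver; the real dispatcher computes a key set from all arguments, but the effect on this example is the same):

```python
# Hypothetical multi-dispatch resolution, assuming the stated priority:
# SparseCUDA > SparseCPU > CUDA > CPU.
PRIORITY = ["SparseCUDA", "SparseCPU", "CUDA", "CPU"]

def resolve_dispatch_key(arg_keys):
    # the highest-priority key among the arguments wins
    return min(arg_keys, key=PRIORITY.index)

assert resolve_dispatch_key({"CPU", "CUDA"}) == "CUDA"
# The problematic mix: a SparseCPU tensor plus a dense CUDA tensor
# dispatches to SparseCPU, which has no autogenerated device check.
assert resolve_dispatch_key({"SparseCPU", "CUDA"}) == "SparseCPU"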

@ngimel
Collaborator

ngimel commented Jun 1, 2021

So SparseCPU + dense CUDA will dispatch to SparseCPU, and SparseCPU won't have device checks autogenerated. We don't have this problem in dense land (if you are dispatched to dense CPU, there can't possibly be any CUDA tensors around).

@ezyang
Contributor

ezyang commented Jun 2, 2021

Good catch, @ngimel
