
torch.to_dense backward ignores unspecified elements in sparse inputs #95550

Closed

pearu opened this issue Feb 25, 2023 · 6 comments
Assignees: pearu
Labels: bug · module: autograd (Related to torch.autograd, and the autograd engine in general) · module: bc-breaking (Related to a BC-breaking change) · module: sparse (Related to torch.sparse) · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

pearu (Collaborator) commented Feb 25, 2023

Issue description

For historical reasons, the backward of torch.to_dense on sparse inputs implements masked semantics, which contradicts the current interpretation of sparse tensors: a sparse tensor is semantically equivalent to a strided tensor, and the use of a sparse format is considered merely a memory optimization that does not define a mask for operations on sparse tensors.
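
To make the reported behavior concrete, here is a minimal sketch (exact results depend on the PyTorch version; the expected values in the comments assume the masked to_dense backward described above):

```python
import torch

# (0, 0) becomes an unspecified element because to_sparse() drops zeros.
a = torch.tensor([[0., 1.], [2., 3.]], dtype=torch.float64).to_sparse().requires_grad_()
a.to_dense().sum().backward()

# With the masked backward reported here, the gradient is projected onto the
# input's sparsity pattern, so the (0, 0) entry is silently dropped:
#   a.grad.to_dense() -> [[0., 1.], [1., 1.]]
# Treating the sparse tensor as equivalent to a strided tensor would instead
# give an all-ones gradient.
print(a.grad.to_dense())
```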

Masked tensor semantics is currently implemented in torch.masked (the future home of masked semantics) and torch.sparse (to be deprecated).

This issue breaks autograd on sparse tensors because the to_dense method is required for gradcheck on operations that produce sparse outputs, for example:

>>> import torch
>>> a = torch.tensor([[0, 1], [2, 3]], dtype=torch.float64).to_sparse().requires_grad_()
>>> torch.autograd.gradcheck(torch.Tensor.t, a)
<snip>
ValueError: Sparse output is not supported at gradcheck yet. Please call to_dense() on the output of fn for gradcheck.

Code example

Following the recommendation from the error message above, gradcheck under non-masked semantics fails:

>>> torch.autograd.gradcheck(lambda x: torch.Tensor.t(x).to_dense(), a, masked=False)
<snip>
torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor([[1.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 1.0000, 0.0000],
        [0.0000, 1.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 1.0000]], dtype=torch.float64)
analytical:tensor([[0., 0., 0., 0.],
        [0., 0., 1., 0.],
        [0., 1., 0., 0.],
        [0., 0., 0., 1.]], dtype=torch.float64)

but succeeds under masked semantics:

>>> torch.autograd.gradcheck(lambda x: torch.Tensor.t(x).to_dense(), a, masked=True, check_sparse_nnz=True)
True

Possible solutions

  1. One solution is implemented in #94728 (Add sparse semantics context manager), which introduces a global Context flag that defines how operations and their backward implementations should interpret the unspecified elements of sparse tensors (a hypothetical usage sketch appears after this list).

  2. Introduce a masked kw argument (default False) to to_dense that enables explicit control of the semantics when using the to_dense method. For example, the following calls should succeed:

torch.autograd.gradcheck(lambda x: torch.Tensor.t(x).to_dense(masked=False), a, masked=False)

torch.autograd.gradcheck(lambda x: torch.Tensor.t(x).to_dense(masked=True), a, masked=True)
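
For option 1, a hypothetical usage sketch; the name `sparse_masked_semantics` and the module-level flag below are invented for illustration and may not match the actual API proposed in #94728:

```python
from contextlib import contextmanager

_MASKED_SEMANTICS = False  # hypothetical global default: non-masked semantics

@contextmanager
def sparse_masked_semantics(enabled: bool):
    # Hypothetical context manager, not the #94728 API: toggles how sparse
    # operations and their backwards interpret unspecified elements.
    global _MASKED_SEMANTICS
    previous, _MASKED_SEMANTICS = _MASKED_SEMANTICS, enabled
    try:
        yield
    finally:
        _MASKED_SEMANTICS = previous

# A backward implementation could then branch on the flag, e.g.
#   grad_input = grad.sparse_mask(input) if _MASKED_SEMANTICS else grad.to_sparse()
```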

Discussion

Both solutions have pros and cons (see the discussion in #94728), and both will be BC-breaking because the default semantics of operations in the torch namespace needs to switch to non-masked semantics. This cannot be avoided when pursuing the idea that sparse tensors are semantically equivalent to strided tensors. Fortunately, most tensor operations (from the torch namespace) on sparse tensors already implement non-masked semantics, for example:

>>> torch.autograd.gradcheck(lambda x: torch.Tensor.t(x)[0, 0], a, masked=False)
True
>>> torch.autograd.gradcheck(lambda x: torch.Tensor.t(x)[0, 0], a, masked=True, check_sparse_nnz=True)
<snip>
torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor([[0.],
        [0.],
        [0.],
        [0.]], dtype=torch.float64)
analytical:tensor([[1.],
        [0.],
        [0.],
        [0.]], dtype=torch.float64)
>>> torch.autograd.gradcheck(lambda x: torch.Tensor.t(x)[0, 1], a, masked=True, check_sparse_nnz=True)
True
>>> torch.autograd.gradcheck(lambda x: torch.Tensor.t(x)[0, 1], a, masked=False)
True

The failure above is expected because both transposing and indexing use non-masked semantics (the indexing op circumvents the "Sparse output is not supported at gradcheck yet" exception as well as the current issue with to_dense).

In the future, when masked and non-masked semantics are well separated between the torch.masked and torch namespaces, the masked kw argument in both gradcheck and to_dense will become unnecessary because the semantics will be defined by the operations themselves (whether they come from torch.masked or torch). Ironically, this sounds very similar to the original plan of having the torch.sparse and torch namespaces carry different semantics. The main difference is that the memory-optimization and masked-semantics aspects of using sparse tensors become decoupled features.

System Info

  • PyTorch version: a build that includes #95405, which implements masked kw argument support in gradcheck.

cc @alexsamardzic @nikitaved @cpuhrsch @amjames @bhosmer @ezyang @gchanan @albanD @zou3519 @gqchen @soulitzer @lezcano @Varal7

pearu added the module: sparse, module: bc-breaking, module: autograd, and bug labels on Feb 25, 2023
vadimkantorov (Contributor) commented:
Personally, I'm in favor of more explicit solutions first (such as masked=True arguments), and then, if needed, context managers.

nikitaved (Collaborator) commented Feb 27, 2023

I am not sure we have to enforce any masked semantics, be it in autograd (to_dense specifically), torch.masked, and/or context managers, because:

  1. We have the tools to implement any sparse semantics with differentiable sparse_mask (#95165).
    For example, using sparse_mask it is easy to mimic the behavior of torch.sparse.mm:
In [1]: import torch

In [2]: x_data = torch.rand(3, 3)

In [3]: y_data = torch.rand(3, 3)

In [4]: x1 = x_data.mul(x_data < 0.5).to_sparse().requires_grad_(True)

In [5]: x2 = x_data.mul(x_data < 0.5).to_sparse().requires_grad_(True)

In [6]: y1 = y_data.mul(y_data < 0.5).to_sparse().requires_grad_(True)

In [7]: y2 = y_data.mul(y_data < 0.5).to_sparse().requires_grad_(True)

In [8]: def custom_sparse_mm(x, y):
   ...:     x = x.sparse_mask(x)
   ...:     y = y.sparse_mask(y)
   ...:     res = x @ y
   ...:     res = res.sparse_mask(res)
   ...:     return res
   ...: 

In [9]: torch.autograd.grad(custom_sparse_mm(y1, x1), (y1, x1), torch.ones(3, 3).to_sparse())
<ipython-input-8-4d9400ee8449>:4: UserWarning: Sparse CSR tensor support is in beta state. If you miss a functionality in the sparse tensor support, please submit a feature request to https://github.com/pytorch/pytorch/issues. (Triggered internally at /home/nik/git/Quansight/pytorch/aten/src/ATen/SparseCsrTensorImpl.cpp:54.)
  res = x @ y
Out[9]: 
(tensor(indices=tensor([[0, 1, 2, 2, 2],
                        [1, 0, 0, 1, 2]]),
        values=tensor([0.8071, 0.6476, 0.6476, 0.8071, 0.2095]),
        size=(3, 3), nnz=5, layout=torch.sparse_coo),
 tensor(indices=tensor([[0, 0, 0, 1, 1, 1, 2],
                        [0, 1, 2, 0, 1, 2, 2]]),
        values=tensor([0.1705, 0.1705, 0.1705, 0.6261, 0.6261, 0.6261, 0.3517]),
        size=(3, 3), nnz=7, layout=torch.sparse_coo))

In [10]: torch.autograd.grad(torch.sparse.mm(y2, x2), (y2, x2), torch.ones(3, 3).to_sparse())
Out[10]: 
(tensor(indices=tensor([[0, 1, 2, 2, 2],
                        [1, 0, 0, 1, 2]]),
        values=tensor([0.8071, 0.6476, 0.6476, 0.8071, 0.2095]),
        size=(3, 3), nnz=5, layout=torch.sparse_coo),
 tensor(indices=tensor([[0, 0, 0, 1, 1, 1, 2],
                        [0, 1, 2, 0, 1, 2, 2]]),
        values=tensor([0.1705, 0.1705, 0.1705, 0.6261, 0.6261, 0.6261, 0.3517]),
        size=(3, 3), nnz=7, layout=torch.sparse_coo))


In [11]: x1
Out[11]: 
tensor(indices=tensor([[0, 0, 0, 1, 1, 1, 2],
                       [0, 1, 2, 0, 1, 2, 2]]),
       values=tensor([0.0543, 0.1362, 0.4572, 0.0271, 0.4451, 0.3349, 0.2095]),
       size=(3, 3), nnz=7, layout=torch.sparse_coo, requires_grad=True)

In [12]: y1
Out[12]: 
tensor(indices=tensor([[0, 1, 2, 2, 2],
                       [1, 0, 0, 1, 2]]),
       values=tensor([0.3481, 0.0818, 0.0887, 0.2780, 0.3517]),
       size=(3, 3), nnz=5, layout=torch.sparse_coo, requires_grad=True)

In general, any composition of operations `(f_n ∘ ... ∘ f_1)(x)` can be turned into a "masked" operation `[(m_n ∘ f_n) ∘ ... ∘ (m_1 ∘ f_1)](x.sparse_mask(x))`, where `m_i(x) := id(x)` or `x.sparse_mask(mask_i)`, with `mask_i` potentially equal to `x` (a generic sketch of this pattern appears at the end of this comment).

This approach:
a. Is very flexible and does not rely on any global behavior; sparse remains just an optimization layout.
b. Masking of the form x.sparse_mask(x) is a trivial operation in the forward pass, with the only overhead being an additional node in the computational graph.
c. Masks can trivially be shared along the composition of functions for a reduced memory footprint.
d. There is no need to duplicate our codebase at all! In fact, we can remove some parts :)

  2. Masked semantics (just like with sparse.mm: gradient projection onto the specified elements) implies a sparse parametrization only in linear cases.

  3. Instead of masked semantics, it is better to have a notion of sparse parametrization. This is what is expected from sparse in some cases: Triangular solver for sparse matrices #87358 (comment).

  4. Autograd can treat sparse tensors just the same as dense tensors. If needed, the output can be modified with output = output.sparse_mask(output); return output.to_dense(), so we can eliminate sparse-specific parts of the gradcheck code such as check_sparse_nnz and related. But I am not sure it is that easy, cc @albanD.
    The example of the failing gradcheck can be modified to not rely on any explicit masked=True semantics:

>>> a = torch.tensor([[0, 1], [2, 3]], dtype=torch.float64).to_sparse().requires_grad_()
>>> def f(a):
...     a = a.sparse_mask(a)
...     res = torch.Tensor.t(a)
...     res = res.sparse_mask(res)
...     return res.to_dense()
... 
>>> torch.autograd.gradcheck(lambda a: f(a), a, check_sparse_nnz=True)
True

Alternatively, without explicit masked=True and check_sparse_nnz=True:

>>> a = torch.tensor([[0, 1], [2, 3]], dtype=torch.float64).requires_grad_(True)
>>> mask = a.detach().to_sparse()
>>> def f(a, mask):
...     res = a.sparse_mask(mask)
...     res = torch.Tensor.t(res)
...     res = res.sparse_mask(res)
...     return res.to_dense()
... 
>>> torch.autograd.gradcheck(lambda a: f(a, mask), (a,))
True
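
A generic sketch of the composition pattern described earlier in this comment; the helper `masked_compose` is illustrative only and not an existing PyTorch API, and it assumes the differentiable sparse_mask from #95165:

```python
import torch

def masked_compose(x: torch.Tensor, *steps) -> torch.Tensor:
    # Illustrative only: `steps` are callables, either the operations f_i or
    # masking steps m_i such as `lambda t: t.sparse_mask(mask_i)`; self-masking
    # steps like `lambda t: t.sparse_mask(t)` are allowed as well.
    out = x.sparse_mask(x)  # seed the computation with x's own pattern
    for step in steps:
        out = step(out)
    return out

# For example, custom_sparse_mm(x1, y1) from above would roughly correspond to
#   masked_compose(x1, lambda t: t @ y1.sparse_mask(y1), lambda t: t.sparse_mask(t))
```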

pearu (Collaborator, Author) commented Feb 27, 2023

> I am not sure we have to enforce any masked semantics, be it autograd (looking at to_dense specifically), torch.masked and/or context managers.

@nikitaved Do you agree or disagree that the to_dense_backward implementation is currently wrong in that it applies the input as a mask to grad when the input is a tensor with a sparse layout?
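
For reference, a rough COO-oriented Python sketch of the two behaviors under discussion; the actual implementation lives in derivatives.yaml / C++, so this is illustrative only:

```python
import torch

def to_dense_backward_masked(grad: torch.Tensor, input: torch.Tensor) -> torch.Tensor:
    # The reported behavior: project the incoming (dense) gradient onto the
    # sparse input's pattern, so gradients of unspecified elements are dropped.
    return grad.sparse_mask(input.coalesce())

def to_dense_backward_unmasked(grad: torch.Tensor, input: torch.Tensor) -> torch.Tensor:
    # The non-masked alternative: keep every gradient entry and only convert
    # the layout back to sparse.
    return grad.to_sparse()
```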

nikitaved (Collaborator) commented Feb 27, 2023

@pearu, I agree with that statement. But I also propose an alternative solution to the ones you posited.

pearu (Collaborator, Author) commented Feb 27, 2023

> @pearu, I agree with that statement. But I also propose an alternative solution to the ones you posited.

Great!

IIUC, your solution is to have the masked kw argument in neither gradcheck nor to_dense; both will assume non-masked semantics. That is:

  • in the gradcheck implementation, sparse inputs will always be densified for numerical Jacobian calculations (a rough densification sketch follows this list)
  • in the to_dense backward implementation, remove the sparse_mask call
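
A rough sketch of what densifying sparse inputs for the numerical Jacobian could look like; this is illustrative only and not gradcheck's actual implementation:

```python
import torch

def densify_inputs(inputs):
    # Illustrative helper: replace every sparse input with an equivalent
    # strided leaf tensor, so finite-difference perturbations reach all
    # elements, including those left unspecified by the sparse layout.
    out = []
    for t in inputs:
        if isinstance(t, torch.Tensor) and t.layout != torch.strided:
            out.append(t.detach().to_dense().requires_grad_(t.requires_grad))
        else:
            out.append(t)
    return tuple(out)
```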

With the above, the numerical Jacobian of torch.sparse.mm will be the same as the numerical Jacobian of torch.mm, but the corresponding analytical Jacobians will be different.

So, how would you test torch.sparse.mm autograd correctness using gradcheck?

ezyang added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Feb 27, 2023
nikitaved (Collaborator) commented Feb 27, 2023

@pearu, like this. Based on our PRs:

diff --git a/tools/autograd/derivatives.yaml b/tools/autograd/derivatives.yaml
index 71ee6f03773..bc0a1bae4cb 100644
--- a/tools/autograd/derivatives.yaml
+++ b/tools/autograd/derivatives.yaml
@@ -1679,7 +1679,7 @@
 # - name: to_dense(Tensor self, ScalarType? dtype=None) -> Tensor
 #
 - name: _to_dense(Tensor self, ScalarType? dtype=None) -> Tensor
-  self: to_dense_backward(grad, self)
+  self: grad.to_sparse()

>>> x = torch.tensor([[0, 1], [2, 3]], dtype=torch.float64).to_sparse().requires_grad_()
>>> mask = x.detach()
>>> torch.autograd.gradcheck(lambda t: t.sparse_mask(mask).to_dense(), (x,), masked=False)
True
>>> torch.autograd.gradcheck(lambda t: torch.Tensor.t(t.sparse_mask(mask)).to_dense(), (x,), masked=False)
True
>>> def f(x, mask):
...     x = x.sparse_mask(mask)
...     x = torch.sparse.mm(x, x)
...     return x.to_dense()
>>> torch.autograd.gradcheck(lambda x: f(x, mask), (x,), masked=False)
True

sparse_mask enforces sparse semantics and makes sure that both forward and backward respect it.

@pearu pearu self-assigned this Mar 6, 2023
pearu added a commit that referenced this issue on Mar 6, 2023

As in the title.

The masked kw argument is required for `to_dense` backward to distinguish the expected semantics of sparse tensors. `masked=True` means that the `to_dense` backward will apply a mask to the returned gradient, where the mask is defined by the input indices. The default semantics implies `masked==False`.

The PR is BC-breaking in the sense that masked semantics has been the default for `to_dense` (its backward ignores unspecified elements in the input), and this PR makes non-masked semantics the default.

As a consequence, existing code that is run through the autograd engine must replace `.to_dense()` calls with `.to_dense(masked=True)`. For example,
```python
torch.autograd.gradcheck(lambda x: torch.sparse.sum(x, [0]).to_dense())
```
must be updated to
```python
torch.autograd.gradcheck(lambda x: torch.sparse.sum(x, [0]).to_dense(masked=True))
```

Fixes #95550

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10
pearu added a commit that referenced this issue on Mar 9, 2023

As in the title.

The masked kw argument is required for `to_dense` backward to distinguish the expected semantics of sparse tensors. `masked=True` means that the `to_dense` backward will apply a mask to the returned gradient, where the mask is defined by the input indices. The default semantics implies `masked==False` for BC, but see the [comment](https://github.com/pytorch/pytorch/pull/96095/files#diff-d4df180433a09071e891d552426911c227b30ae9b8a8e56da31046e7ecb1afbeR501-R513) in `to_dense_backward`.

~~The PR is BC-breaking in the sense that masked semantics has been the default for `to_dense` (its backward ignores unspecified elements in the input), and this PR makes non-masked semantics the default.~~

As a consequence, existing code that is run through the autograd engine must replace `.to_dense()` calls with `.to_dense(masked=False)`. For example,
```python
torch.autograd.gradcheck(lambda x: torch.sum(x, [0]).to_dense())
torch.autograd.gradcheck(lambda x: torch.sparse.sum(x, [0]).to_dense())
```
(recall that gradcheck has `masked=False` as the default) must be updated to
```python
torch.autograd.gradcheck(lambda x: torch.sum(x, [0]).to_dense(masked=False))
torch.autograd.gradcheck(lambda x: torch.sparse.sum(x, [0]).to_dense(masked=True), masked=True)
```

Fixes #95550

cc alexsamardzic nikitaved cpuhrsch amjames bhosmer ezyang albanD zou3519 gqchen soulitzer Lezcano Varal7 jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10
pearu added a commit that referenced this issue on Mar 14, 2023

As in the title.

The `masked_grad` kw argument is required for `to_dense` backward to distinguish the expected semantics of sparse tensors. `masked_grad=True` means that the `to_dense` backward will apply a mask to the returned gradient, where the mask is defined by the input indices. The default semantics implies `masked_grad==True` for BC, but see the [comment](https://github.com/pytorch/pytorch/pull/96095/files#diff-d4df180433a09071e891d552426911c227b30ae9b8a8e56da31046e7ecb1afbeR501-R513) in `to_dense_backward`.

As a consequence, existing code that is run through the autograd engine must replace `.to_dense()` calls with `.to_dense(masked_grad=False)`. For example,
```python
torch.autograd.gradcheck(lambda x: torch.sum(x, [0]).to_dense())
torch.autograd.gradcheck(lambda x: torch.sparse.sum(x, [0]).to_dense())
```
(recall that gradcheck has `masked=False` as the default) must be updated to
```python
torch.autograd.gradcheck(lambda x: torch.sum(x, [0]).to_dense(masked_grad=False))
torch.autograd.gradcheck(lambda x: torch.sparse.sum(x, [0]).to_dense(masked_grad=True), masked=True)
```

Fixes #95550

cc alexsamardzic nikitaved cpuhrsch amjames bhosmer ezyang albanD zou3519 gqchen soulitzer Lezcano Varal7 jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10
cyyever pushed a commit to cyyever/pytorch_private that referenced this issue on Mar 23, 2023 and again on Mar 27, 2023 (same commit message as above).

Pull Request resolved: pytorch/pytorch#96095
Approved by: https://github.com/cpuhrsch