gradcheck produces false positives with sparse inputs when masked=False #103518

Comments
I think this is the key point for determining whether the raised issue is valid. Recall that in non-masked semantics, the "sparsity pattern" of a tensor (in terms of its indices set) does not define a mask, because non-masked semantics is layout-agnostic.

Btw, the title "gradcheck produces false positives..." may be misleading, as the reported issue depends on the backward formula of `sampled_addmm`.
…ient wrt self" As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
Ah, right. So, the requirements for …
The numerical path in `gradcheck` … Could you propose a fix to the backward formula of `sampled_addmm`?
I am not sure that such a fix will cut it. The usage of … Suppose I have a function:
```python
def wrapped_sampled_addmm(self, mask, *args, **kwargs):
    return sampled_addmm(self.sparse_mask(mask), *args, **kwargs).sparse_mask(mask)

gradcheck(lambda self, mask: wrapped_sampled_addmm(self, mask, ...), (self, self.detach() != 0), masked=False)
```
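The idea, as I read it, is that the mask travels as an explicit, non-differentiable input, so any densification of `self` cannot silently change the sampling pattern.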
…ient wrt self" As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
Right. This is exactly what …
One of the fundamental assumptions in the gradcheck numerical path is that the inputs are perturbed in-place (for efficiency). The densification of sparse inputs allowed a more-or-less transparent addition of sparse tensor support to autograd for non-masked semantics. The alternative to the densification pre-processing step is to perturb the inputs by creating a new input tensor (…).
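For concreteness, here is a minimal sketch of that alternative (a hypothetical helper, not gradcheck's actual code): perturb only the specified values of a sparse CSR input, building a fresh tensor per probe, so the sparsity pattern never changes.

```python
import torch

def central_diff_wrt_values(f, x, eps=1e-6):
    # Perturb only the *specified* values of the sparse CSR input, building a
    # new sparse tensor for each probe instead of densifying and perturbing
    # in-place.
    vals = x.values()
    grads = torch.zeros_like(vals)
    for i in range(vals.numel()):
        for sign in (1.0, -1.0):
            v = vals.clone()
            v[i] += sign * eps
            xp = torch.sparse_csr_tensor(x.crow_indices(), x.col_indices(), v, size=x.shape)
            grads[i] += sign * f(xp).to_dense().sum()
    return grads / (2 * eps)

m1 = torch.eye(2, dtype=torch.double) * 2
m2 = torch.eye(2, dtype=torch.double) * 3
x = torch.tensor([[1.0, 0.0], [0.0, 0.0]], dtype=torch.double).to_sparse_csr()
print(central_diff_wrt_values(lambda t: torch.sparse.sampled_addmm(t, m1, m2), x))
```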
I think the fix to the backward formula would be a good way to get some attention to your suggestion, as it would provide a reproducer for the possible gradcheck problem. Another approach for testing what you want is to rephrase the problem using strided tensors only (replace a sparse input with a pair of strided values and a mask tensor), as sketched below.
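A sketch of that rephrasing, under the assumption that the masked semantics here is `alpha * (m1 @ m2) * mask + beta * s * mask` (the helper name and tensors are mine):

```python
import torch
from torch.autograd import gradcheck

def sampled_addmm_strided(s_dense, mask, m1, m2, alpha=1.0, beta=1.0):
    # Dense restatement: the sparse input is replaced by a strided values
    # tensor plus a fixed, non-differentiable boolean mask.
    return (alpha * (m1 @ m2) + beta * s_dense) * mask

s = torch.tensor([[1.0, 0.0], [0.0, 0.0]], dtype=torch.double, requires_grad=True)
mask = s.detach() != 0
m1 = torch.eye(2, dtype=torch.double) * 2
m2 = torch.eye(2, dtype=torch.double) * 3
print(gradcheck(lambda s: sampled_addmm_strided(s, mask, m1, m2), (s,)))
```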
Modification of inputs is wrong in general, …
Sorry, what is the fundamental assumption of gradcheck? I always thought that PyTorch follows the principle of correctness first.
There is nothing to fix in backward; it is correct. … It is just so much easier for …
Oh, I feel like I should just reference https://github.com/pytorch/pytorch/blob/main/torch/autograd/gradcheck.py instead of trying to explain here how gradcheck works with sparse inputs in detail.

pytorch/torch/autograd/gradcheck.py, line 74 at 7a2a006
As a user, you should not be concerned about the …
How can I get a full Jacobian of a sparse function?
I am not certain about this. As I see it, with …
This does not make sense to me. There is one and only one function used everywhere in …
OK, the backward formula for … Now, let's assume …

But …
@nikitaved your example has many typos, but when eliminating these and taking into account that …
No. Two functions are equal when their values are equal for all possible argument combinations. Within non-masked semantics, … In the case of … That said, notice that there is a discrepancy in the current implementation of `sampled_addmm`:

```python
>>> m1 = torch.eye(2, 2) * 2
>>> m2 = torch.eye(2, 2) * 3
>>> x = torch.tensor([[1, 0], [0, 0.]]).to_sparse_csr()
>>> densified_x = (x.to_dense() - 99).to_sparse_csr() + 99. * torch.ones(x.shape).to_sparse_csr()
>>> x
tensor(crow_indices=tensor([0, 1, 1]),
       col_indices=tensor([0]),
       values=tensor([1.]), size=(2, 2), nnz=1, layout=torch.sparse_csr)
>>> densified_x
tensor(crow_indices=tensor([0, 2, 4]),
       col_indices=tensor([0, 1, 0, 1]),
       values=tensor([1., 0., 0., 0.]), size=(2, 2), nnz=4,
       layout=torch.sparse_csr)
>>> torch.sparse.sampled_addmm(x, m1, m2)
tensor(crow_indices=tensor([0, 1, 1]),
       col_indices=tensor([0]),
       values=tensor([7.]), size=(2, 2), nnz=1, layout=torch.sparse_csr)
>>> torch.sparse.sampled_addmm(densified_x, m1, m2)
tensor(crow_indices=tensor([0, 2, 4]),
       col_indices=tensor([0, 1, 0, 1]),
       values=tensor([7., 0., 0., 6.]), size=(2, 2), nnz=4,
       layout=torch.sparse_csr)
```

That is, the implementation considers specified elements as non-zeros (when evaluating …
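A quick check of the numbers (mine): `m1 @ m2 = 6 * I`, so the result at a specified element `(i, j)` is `6 * (i == j) + x[i, j]`. For `x` the pattern contains only `(0, 0)`, giving the single value `7.`; for `densified_x` the pattern covers all four elements, so `(1, 1)` additionally picks up the `6.` from `m1 @ m2`. The two calls therefore disagree exactly at the elements that densification turned from unspecified into specified zeros.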
The order of the approximation does not matter, really; it only applies to differentiable functions. This function is not differentiable, since its right derivative is not finite, which the example from above shows. These functions are different because they operate over different masks, and the mask is a hidden parameter. Specified indices specify parameters, and these are different.
Right. What matters is the usage of the central finite difference scheme, which will never give …
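As an aside, a tiny illustration (my own, not from the thread) of how the central difference can hide a kink: for `f(x) = |x|` at `x = 0` the one-sided derivatives are `-1` and `+1`, yet the central estimate is exactly `0`.

```python
import torch

f = torch.abs
x, eps = torch.tensor(0.0), 1e-6
central = (f(x + eps) - f(x - eps)) / (2 * eps)  # symmetric probes cancel the kink
right = (f(x + eps) - f(x)) / eps                # one-sided probe sees slope +1
print(central.item(), right.item())  # ≈ 0.0 and 1.0
```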
Such functions are not using non-masked semantics. Hence, using …
…ient wrt self" As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
…ient wrt self" As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
…ient wrt self" As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
…ient wrt self" As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
…ient wrt self" As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
…ient wrt self" As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
…ient wrt self" As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
…ient wrt self" As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
…ient wrt self" As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
…ient wrt self" As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
…ient wrt self" As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
…ient wrt self" As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
…ient wrt self" As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
…ient wrt self" As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
As per title. Previous gradient was only correct under the Sparse Semantics, i.e. with`alpha * (mat1 @ mat2)` ignored. However, then, it is wrongly parametrized in the backward pass, as we need to project the gradient in a generic case. Under this parametrization we can expect `gradcheck` to succeed with either `masked=True` or `masked=False` even after #103518 is fixed. cc pearu . cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang gchanan albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
🐛 Describe the bug
As per title. As an example, let's consider the `sampled_addmm` method, which is semantically equivalent to

`sampled_addmm(s, m1, m2, alpha, beta) := alpha * (m1 @ m2).sparse_mask(s) + beta * s`
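For reference, this definition can be written out directly (a sketch; the helper name is mine, and it assumes dense `m1`, `m2` and a sparse CSR `s`):

```python
import torch

def sampled_addmm_ref(s, m1, m2, alpha=1.0, beta=1.0):
    # Dense matmul, restricted to the sparsity pattern of s, plus beta * s.
    return alpha * (m1 @ m2).sparse_mask(s) + beta * s
```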
If we inspect the subgradient of `sampled_addmm` wrt `s` in `derivatives.yaml`, we find the following: …

Note that under the assumption of masked semantics this formula is correct, even though it does not account for the `(mat1 @ mat2).sparse_mask(self)` part. This follows from the sparse semantics, which implies `self.indices == (self + perturbation_of_self).indices`. Hence we can expect `gradcheck` to work with `masked=True`.

However, the situation is reversed for `masked=False`. In this case the backward formula for `self` should take `alpha * (m1 @ m2).sparse_mask(self)` into consideration, so `gradcheck` with `masked=False` is expected to fail. This, however, does not happen:
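A minimal reproducer sketch along these lines (the snippets from the original report did not survive extraction; the exact tensors here are my choice):

```python
import torch
from torch.autograd import gradcheck

m1 = torch.eye(2, dtype=torch.double) * 2
m2 = torch.eye(2, dtype=torch.double) * 3
s = torch.tensor([[1.0, 0.0], [0.0, 0.0]], dtype=torch.double).to_sparse_csr().requires_grad_()

f = lambda s: torch.sparse.sampled_addmm(s, m1, m2).to_dense()
print(gradcheck(f, (s,), masked=True))   # passes, as expected
print(gradcheck(f, (s,), masked=False))  # also passes -- the reported false positive
```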
As per @pearu's insight, this happens during the densification process in gradcheck. Namely, it sometimes expands `self.indices` to full dimensions while producing a new sparse input `self_densified`. Unfortunately, `sampled_addmm(self)` and `sampled_addmm(self_densified)` are not equivalent in backward, because `sampled_addmm(self_densified)` should pass gradcheck with either `masked=True` or `masked=False`, since its mask is the whole space.

Versions
Current master.
cc @alexsamardzic @pearu @cpuhrsch @amjames @bhosmer @ezyang @albanD @zou3519 @gqchen @soulitzer @lezcano @Varal7