Implement sparse semantics support in gradcheck (2nd try) #95405
Conversation
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 2 jobs have failed; the first few of them are: periodic / cuda11.7-py3.10-gcc7-sm86-periodic-dynamo-benchmarks / test (aot_eager_all, 1, 1, linux.g5.4xlarge.nvidia.gpu), periodic / linux-bionic-cuda11.7-py3-gcc7-slow-gradcheck / test (default, 2, 2, linux.4xlarge.nvidia.gpu). Details for Dev Infra team: raised by workflow job.
Do we actually benefit from it in the long run? FWIW, #95550 (comment).
This PR fixes gradcheck for sparse inputs under non-masked semantics while keeping the masked-semantics support. So, under the assumption that tensors with sparse layouts are semantically equivalent to strided tensors, I don't understand the question. Do you challenge this assumption?
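To spell out that assumption, a minimal sketch (toy values; the only point is that the two layouts denote the same tensor):

```python
import torch

dense = torch.tensor([[0., 1.], [2., 3.]], dtype=torch.float64)
sparse = dense.to_sparse()  # same tensor, COO layout; the zero at (0, 0) is simply unspecified

# Semantically equivalent: every index carries the same value, with the
# unspecified elements of the sparse layout read as zeros.
assert torch.equal(sparse.to_dense(), dense)
assert torch.equal(torch.mm(sparse, dense), torch.mm(dense, dense))
```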
@pearu, I would argue there is a better fix, namely, that `to_dense` should not impose any masking, and that if the masking behavior is tested, it could be done in the forward function with the use of the differentiable `sparse_mask`.
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 2 jobs have failed; the first few of them are: periodic / linux-bionic-cuda11.7-py3-gcc7-slow-gradcheck / test (default, 2, 2, linux.4xlarge.nvidia.gpu), periodic / cuda11.7-py3.10-gcc7-sm86-periodic-dynamo-benchmarks / test (aot_eager_all, 1, 1, linux.g5.4xlarge.nvidia.gpu). Details for Dev Infra team: raised by workflow job.
From the discussion in #95550 (comment), the comments above, and DMs with @nikitaved, there exists another approach to fixing `gradcheck` for sparse inputs. First, we'll postulate the following requirements:

The above should be sufficient for `gradcheck`. Now, if the user-specified operation uses operations whose backward uses masked semantics, the user must also specify the corresponding masks that are applied within the user-specified function to the inputs, using `sparse_mask`.
@pytorchbot merge -m "Leftover failures are unrelated"
❌ 🤖 pytorchbot command failed. Try `@pytorchbot --help` for more info.
@pytorchbot merge -f "Leftover failures are unrelated"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pearu, I am not sure that providing masks is a good strategy:

How do we even define Masked Semantics anyway?
What do you mean by "perturbations that result in a different mask"? When masks are provided as inputs, these are typically bool tensors that are never perturbed. In general, only float or complex input tensors that have the `requires_grad` flag set are perturbed.

I agree that the masked-semantics behavior here can be confusing. Consider the following example:

```python
>>> def foo(x):
...     y = torch.sparse.mm(x, x)
...     z = torch.mm(y, x)
...     return z
...
>>> t = torch.tensor([[0, 1], [2, 3]], dtype=torch.float64).to_sparse().requires_grad_()
>>> torch.autograd.gradcheck(lambda x: foo(x).to_dense(), t, masked=False)
<snip>
torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor([[ 4.0000,  3.0000,  6.0000,  2.0000],
        [ 6.0000, 13.0000,  4.0000, 12.0000],
        [ 3.0000,  1.0000, 13.0000,  6.0000],
        [ 2.0000,  6.0000, 12.0000, 31.0000]], dtype=torch.float64)
analytical:tensor([[ 2.,  0.,  6.,  0.],
        [ 6., 13.,  4., 12.],
        [ 3.,  1., 13.,  6.],
        [ 2.,  6., 12., 31.]], dtype=torch.float64)
>>> torch.autograd.gradcheck(lambda x: foo(x).to_dense(), t, masked=True, check_sparse_nnz=True)
<snip>
torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor([[ 0.0000,  0.0000,  0.0000,  0.0000],
        [ 6.0000, 13.0000,  4.0000, 12.0000],
        [ 3.0000,  1.0000, 13.0000,  6.0000],
        [ 2.0000,  6.0000, 12.0000, 31.0000]], dtype=torch.float64)
analytical:tensor([[ 2.,  0.,  6.,  0.],
        [ 6., 13.,  4., 12.],
        [ 3.,  1., 13.,  6.],
        [ 2.,  6., 12., 31.]], dtype=torch.float64)
```

Clearly, the above example makes sense only in masked semantics (because it uses operations whose backwards are implemented for masked semantics). The numerical Jacobian becomes correct when the input mask is applied explicitly via `sparse_mask`:

```python
>>> def foo(x, x_mask):
...     x = x.sparse_mask(x_mask)
...     y = torch.sparse.mm(x, x)
...     z = torch.mm(y, x)
...     return z
...
>>> t_mask = torch.tensor([[0, 1], [1, 1]], dtype=torch.bool).to_sparse()
>>> torch.autograd.gradcheck(lambda *args: foo(*args).to_dense(), (t, t_mask), masked=True, check_sparse_nnz=True)
True
```

(Btw, here I am using a PyTorch version that suffers from issue #95550. When applying the patch from #95550 (comment), the example above still works.)
Oh, I thought the point was to manipulate the mask inside gradcheck. My mistake.
Use https://pytorch.org/docs/master/masked.html as a starting point for the answer. One can also ask "what is a tensor anyway?". A tensor is a set of values arranged over a regular grid whose nodes are indexed; an index-value pair is called a tensor element. One can define various high-level operations on tensors using operations defined on the tensor values, taking into account the regular arrangement of the values on the grid.
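For concreteness, a small illustration of this index-value view (toy values, COO layout):

```python
import torch

a = torch.tensor([[0., 1.], [2., 3.]])

# A tensor as a set of (index, value) pairs on a regular grid; each pair is
# one tensor element.
elements = {(i, j): a[i, j].item()
            for i in range(a.shape[0]) for j in range(a.shape[1])}
print(elements)       # {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 3.0}

# A COO sparse tensor stores only a subset of these pairs explicitly; the
# remaining grid nodes are unspecified.
s = a.to_sparse()
print(s.indices())    # indices of the specified elements
print(s.values())     # their values
```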
Understanding "what is a tensor" is important for understanding what a sparse tensor is and what the functions that manipulate them do. Let me just gather some thoughts...

A real tensor (a matrix of size m × n, for simplicity) is an element of R^(m×n). There is a clear bijection between a sparse matrix and its dense counterpart: the specified (index, value) elements determine the dense matrix once the unspecified elements are filled with zeros, and vice versa.

Masked Semantics, or better called Sparse Parametrization(?), is more tricky when translating the mappings defined over dense inputs to sparse inputs, because dense mappings map full bases to full bases, not subsets of bases to subsets of bases. Recall the representation of a sparse tensor as values attached to a fixed subset of the index grid (see the small sketch after this comment). Disabling the masked semantics in `to_dense` ...

The Jacobian for ... My whole point above and here, @pearu, was just to understand this better, and to advocate that this is a better way to test Masked Semantics grads than having just ... Although masking the result should not be necessary if the function is assumed to map to "dense" instead, provided that it projects the grads correctly onto the inputs' subspaces... So, I believe it is important to understand what Masked Semantics is, its relationship to dense tensors, and the domains/co-domains of the functions that manipulate them, so as to understand where things are differentiable and where they are not, because the current implementation of `to_dense` imposes masking.

So, more reasons to fix `to_dense`.
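A minimal sketch of the parametrization view above (my own toy code: a fixed index set I, the values as the parameters, and the zero-fill embedding into the dense space):

```python
import torch

# Fix the index set I = {(0, 1), (1, 0), (1, 1)} and parametrize a dense
# 2x2 matrix by the values living on I.
indices = torch.tensor([[0, 1, 1],
                        [1, 0, 1]])
values = torch.tensor([1., 2., 3.], dtype=torch.float64, requires_grad=True)

def densify(values):
    # The embedding R^|I| -> R^(2x2): scatter the values onto I, zeros elsewhere.
    return torch.sparse_coo_tensor(indices, values, (2, 2)).to_dense()

x = densify(values)            # [[0., 1.], [2., 3.]]
loss = (x ** 2).sum()
loss.backward()

# The pullback of the dense gradient 2*x is its restriction to I, i.e. the
# projection onto the subspace spanned by the basis vectors indexed by I.
print(values.grad)             # tensor([2., 4., 6.], dtype=torch.float64)
```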
@nikitaved thanks for the formalized discussion! I think we are mostly on the same page, but I have a few notes: ...
Replaces pytorch/pytorch#94714 that was reverted due to pytorch/pytorch#94714 (comment) Pull Request resolved: pytorch/pytorch#95405 Approved by: https://github.com/albanD
👍 @pearu, thank you for your comments and this discussion! Regarding 2: I believe we were careful in defining Sparse Parametrization. Even though, under zero nse, a sparse tensor is represented as ...
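For reference, the zero-nse case in a tiny toy snippet (my own, just to make the representation concrete):

```python
import torch

# A sparse COO tensor with zero specified elements (nse == 0) still denotes a
# well-defined tensor: the all-zeros tensor of its shape.
empty = torch.sparse_coo_tensor(
    torch.empty((2, 0), dtype=torch.int64),   # no indices
    torch.empty((0,), dtype=torch.float64),   # no values
    size=(2, 2),
)
print(empty._nnz())                                         # 0
print(torch.equal(empty.to_dense(),
                  torch.zeros(2, 2, dtype=torch.float64)))  # True
```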
Revert "Implement sparse semantics support in gradcheck (2nd try)" (pytorch#95405). This reverts commit b89fda5.
Replaces #94714 that was reverted due to #94714 (comment)
Stack from ghstack (oldest at bottom):
cc @alexsamardzic @nikitaved @cpuhrsch @amjames @bhosmer @ezyang @albanD @zou3519 @gqchen @soulitzer @lezcano @Varal7