-
Notifications
You must be signed in to change notification settings - Fork 21.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
illegal memory access
for torch.sparse.mm(src, other) / deg.view(-1, 1).clamp_(min=1)
#111574
Comments
With $ git diff
diff --git a/torch_geometric/utils/spmm.py b/torch_geometric/utils/spmm.py
index d951b7e0e..dbdd80e93 100644
--- a/torch_geometric/utils/spmm.py
+++ b/torch_geometric/utils/spmm.py
@@ -96,7 +96,7 @@ def spmm(src: Adj, other: Tensor, reduce: str = "sum") -> Tensor:
deg = scatter(torch.ones_like(src.values()), src.row_indices(),
dim=0, dim_size=src.size(0), reduce='sum')
- return torch.sparse.mm(src, other) / deg.view(-1, 1).clamp_(min=1)
+ return torch.sparse.mm(src.cpu(), other.cpu()).cuda() / deg.view(-1, 1).clamp_(min=1) (that works fine), I confirm that the issue is in |
I think there is something wrong with the way sparse CSR matrix is stored. If I do something like this: torch.sparse.mm(src.to_sparse_csc(), other) / deg.view(-1, 1).clamp_(min=1) or even this: torch.sparse.mm(src.to_sparse_csc().to_sparse_csr(), other) / deg.view(-1, 1).clamp_(min=1) it works fine as well. However, the same trick using |
With the current reports, I think we can only say that something is wrong:) Doing return torch.sparse.mm(src.clone(), other) / deg.view(-1, 1).clamp_(min=1) also works fine.
$ git diff
diff --git a/torch_geometric/utils/sparse.py b/torch_geometric/utils/sparse.py
index fa15e547d..2db918108 100644
--- a/torch_geometric/utils/sparse.py
+++ b/torch_geometric/utils/sparse.py
@@ -264,7 +264,7 @@ def to_torch_csr_tensor(
adj = torch.sparse_csr_tensor(
crow_indices=index2ptr(edge_index[0], size[0]),
col_indices=edge_index[1],
- values=edge_attr,
+ values=edge_attr.contiguous(),
size=tuple(size) + edge_attr.size()[1:],
device=edge_index.device,
) |
The issue was caused by the fact that
|
@pearu thanks for the fix :). How soon do you expect the fix PRs linked above to land? if it could be by the end of this week it would probably be simpler to have @Aidyn-A cherry pick the fix into our pytorch containers instead of me changing pyg to use the temporary workaround and then reverting it when the real fix lands. |
@puririshi98 The PR is now approved and is in the process of merging. If CI's will be all green, the PR should land shortly. |
@puririshi98 can you please clarify whether or not this is a regression? (I.e. can you run the same sample using torch-2.0.0 and pass or will it crash as well?) |
FWIW, from reading pytorch_geometric code, the crash should occur with torch-2.0 as well. |
i just confirmed the same issue occurs w/ torch2.0 |
…Descriptor (pytorch#111742) Fixes pytorch#111574 Pull Request resolved: pytorch#111742 Approved by: https://github.com/amjames, https://github.com/cpuhrsch
…Descriptor (pytorch#111742) Fixes pytorch#111574 Pull Request resolved: pytorch#111742 Approved by: https://github.com/amjames, https://github.com/cpuhrsch
…Descriptor (pytorch#111742) Fixes pytorch#111574 Pull Request resolved: pytorch#111742 Approved by: https://github.com/amjames, https://github.com/cpuhrsch
Validated this issue with final rc:
|
🐛 Describe the bug
Original Issue from PyG: pyg-team/pytorch_geometric#8213
Failing example: https://github.com/pyg-team/pytorch_geometric/blob/master/examples/rev_gnn.py
Versions
cc @ezyang @gchanan @zou3519 @kadeng @alexsamardzic @nikitaved @pearu @cpuhrsch @amjames @bhosmer @ptrblck
The text was updated successfully, but these errors were encountered: