Add sparse tensors support to dataloader. #112842
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112842
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures — as of commit f9b9a4b with merge base 826ab0e. This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: 8354ec8b237636db4683caee2830194e42dfeb88 Pull Request resolved: #112842
Great! Thank you.
Fixes #106837 cc alexsamardzic nikitaved cpuhrsch amjames bhosmer [ghstack-poisoned]
ghstack-source-id: d179a299a94ccf34035f1cac51330c3c262395f0 Pull Request resolved: #112842
if elem.layout in {torch.sparse_coo, torch.sparse_csr, torch.sparse_bsr, torch.sparse_csc, torch.sparse_bsc}:
    raise RuntimeError(
        "Batches of sparse tensors are not currently supported by the default collate_fn; "
        "please provide a custom collate_fn to handle them appropriately."
    )
Do users commonly know how to handle sparse tensors in a custom collate_fn?
I think there is no definite answer. However, I trust that users who work with sparse tensors are also able to implement a custom collate_fn that meets their application's requirements for how to stack batches of sparse tensors together. Some options are listed below.
IIUC, for strided tensors the purpose of the collate function is to stack the strided batches together, while users are given a way to define their own stacking method.
For sparse tensors, there likewise exist multiple approaches to stacking batches. For instance, stacking COO tensors could mean stacking indices and values to form a tensor with one more sparse dimension than its batch members have. Or, if the indices of all COO batch members are the same, one might stack only the values to form a hybrid tensor with extra dense dimensions. A sketch of the first approach is given below.
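For illustration only (this sketch is not part of the PR, and the helper name sparse_coo_collate is hypothetical), stacking 2-D COO samples along a new leading sparse dimension might look like:

import torch
from torch.utils.data import DataLoader

def sparse_coo_collate(batch):
    # Stack indices and values so the result gains one extra leading
    # sparse (batch) dimension relative to the individual samples.
    batch = [t.coalesce() for t in batch]
    indices = torch.cat(
        [
            # Prepend the batch index as the new leading sparse dimension.
            torch.cat(
                [torch.full((1, t.values().size(0)), i, dtype=torch.int64), t.indices()]
            )
            for i, t in enumerate(batch)
        ],
        dim=1,
    )
    values = torch.cat([t.values() for t in batch])
    return torch.sparse_coo_tensor(indices, values, (len(batch),) + batch[0].shape)

# Hypothetical usage, assuming `dataset` yields 2-D sparse COO tensors:
# loader = DataLoader(dataset, batch_size=4, collate_fn=sparse_coo_collate)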
Similarly, there exist multiple stacking methods for batches of sparse compressed tensors. For instance, one could create a sparse compressed tensor with a batch dimension, or, if the indices of all batch members match, one could stack only the values by introducing an extra dense dimension (sketched below).
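Again purely as an illustrative sketch (not part of the PR; csr_hybrid_collate is a hypothetical name), the values-only approach for CSR batches with matching indices could be:

import torch

def csr_hybrid_collate(batch):
    # Assumes every sample is a 2-D CSR tensor with an identical sparsity
    # pattern; stacks only the values into one trailing dense dimension.
    first = batch[0]
    if not all(
        torch.equal(t.crow_indices(), first.crow_indices())
        and torch.equal(t.col_indices(), first.col_indices())
        for t in batch
    ):
        raise ValueError("indices of all batch members must match")
    values = torch.stack([t.values() for t in batch], dim=-1)  # shape (nnz, batch)
    return torch.sparse_csr_tensor(
        first.crow_indices(),
        first.col_indices(),
        values,
        size=first.shape + (len(batch),),
    )

This yields a hybrid CSR tensor of shape (nrows, ncols, batch) with one dense dimension, i.e. the values-only stacking described above.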
@@ -158,6 +158,11 @@ def collate_tensor_fn(batch, *, collate_fn_map: Optional[Dict[Union[Type, Tuple[
        "Batches of nested tensors are not currently supported by the default collate_fn; "
        "please provide a custom collate_fn to handle them appropriately."
    )
if elem.layout in {torch.sparse_coo, torch.sparse_csr, torch.sparse_bsr, torch.sparse_csc, torch.sparse_bsc}: |
Is this set of sparse layouts available elsewhere? Curious to know whether it could expand in the future, which would require updating all instances of such sets.
No, we have not created a public API that would provide all sparse layouts as a set.
Various tools have implemented such a set internally, e.g. in pytorch/torch/sparse/__init__.py (line 522 at commit c6f435b):

sparse_layouts = {torch.sparse_coo, torch.sparse_csr, torch.sparse_csc, torch.sparse_bsr, torch.sparse_bsc}
On the other hand, if a new sparse layout is ever added (which, by the way, is a rare event given the high threshold for introducing new sparse layouts), relying on such a shared set here would not remove the need to update dataloader support for the new layout, because that support depends on the details of each sparse tensor implementation.
@gokulavasan - Do you have time to take another look?
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki.
Merge failed. Reason: 1 job failed: trunk / macos-12-py3-arm64 / test (default, 1, 3, macos-m1-12).
Fixes #106837 cc alexsamardzic nikitaved cpuhrsch amjames bhosmer [ghstack-poisoned]
ghstack-source-id: 6304fb430cae40dbe4fa33346567c601ee096f6c Pull Request resolved: #112842
Fixes #106837 cc alexsamardzic nikitaved cpuhrsch amjames bhosmer [ghstack-poisoned]
ghstack-source-id: 8fdaa4b3b7f5347b1956b9b175f1df34ee0c4a4b Pull Request resolved: #112842
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki.
Fixes #106837
Stack from ghstack (oldest at bottom):
cc @alexsamardzic @nikitaved @cpuhrsch @amjames @bhosmer