
Add sparse tensors support to dataloader. #112842

Closed
wants to merge 4 commits

Conversation

@pearu
Collaborator

pearu commented Nov 3, 2023

@pearu requested a review from ejguan as a code owner November 3, 2023 10:58
@pytorch-bot added the release notes: dataloader label Nov 3, 2023

pytorch-bot commented Nov 3, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112842

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f9b9a4b with merge base 826ab0e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pearu added a commit that referenced this pull request Nov 3, 2023
ghstack-source-id: 8354ec8b237636db4683caee2830194e42dfeb88
Pull Request resolved: #112842
@pearu added the module: sparse, open source, and topic: new features labels Nov 3, 2023
@pearu added this to In progress in Sparse tensors via automation Nov 3, 2023
Sparse tensors automation moved this from In progress to Reviewer approved Nov 3, 2023
@cpuhrsch
Contributor

cpuhrsch commented Nov 3, 2023

Great! Thank you.

Fixes #106837

cc alexsamardzic nikitaved cpuhrsch amjames bhosmer

[ghstack-poisoned]
pearu added a commit that referenced this pull request Nov 4, 2023
ghstack-source-id: d179a299a94ccf34035f1cac51330c3c262395f0
Pull Request resolved: #112842
@cpuhrsch requested a review from albanD November 6, 2023 17:36
if elem.layout in {torch.sparse_coo, torch.sparse_csr, torch.sparse_bsr, torch.sparse_csc, torch.sparse_bsc}:
    raise RuntimeError(
        "Batches of sparse tensors are not currently supported by the default collate_fn; "
        "please provide a custom collate_fn to handle them appropriately."
    )


Do users commonly know how to handle sparse tensors in a custom collate_fn?

@pearu
Collaborator Author

> Do users commonly know how to handle sparse tensors in a custom collate_fn?

I think there is no definite answer. However, I trust that users who work with sparse tensors are also able to implement a custom collate_fn that meets their application's requirements for how to stack batches of sparse tensors together. Some options are listed below.

IIUC, for strided tensors, the purpose of the collate function is to stack the strided batches together, while users are given a way to define their own stacking method.

For sparse tensors, there also exist multiple approaches to stacking batches together. For instance, stacking COO tensors could mean stacking their indices and values to form a tensor with one more sparse dimension than its members have. Or, if the indices of all COO members are the same, one may want to stack only the values to form a hybrid tensor with extra dense dimensions.
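For illustration, a minimal sketch of the first approach, assuming all batch members are equal-shape sparse COO tensors (the function name collate_sparse_coo is hypothetical):

import torch

def collate_sparse_coo(batch):
    # Stack equal-shape sparse COO tensors into one sparse COO tensor
    # with a new leading sparse (batch) dimension.
    indices, values = [], []
    for i, t in enumerate(batch):
        t = t.coalesce()  # .indices()/.values() require a coalesced tensor
        nnz = t.values().size(0)
        # A row holding the batch index of every specified element.
        batch_row = torch.full((1, nnz), i, dtype=torch.int64, device=t.device)
        indices.append(torch.cat([batch_row, t.indices()], dim=0))
        values.append(t.values())
    return torch.sparse_coo_tensor(
        torch.cat(indices, dim=1),
        torch.cat(values),
        size=(len(batch), *batch[0].shape),
    )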

Similarly, there exist multiple stacking methods for batches of sparse compressed tensors. For instance, one could create a sparse compressed tensor with a batch dimension, or, if the indices of all members match, one could stack only the values, introducing an extra dense dimension.
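Correspondingly, minimal sketches of both compressed-tensor options, assuming equal-shape 2-D CSR members (both function names are hypothetical; note that a batched CSR tensor requires the same number of specified elements in every member):

import torch

def collate_sparse_csr_batched(batch):
    # Option 1: a CSR tensor with a leading batch dimension. torch.stack
    # only succeeds if every member has the same nnz, which batched CSR
    # requires anyway.
    crow = torch.stack([t.crow_indices() for t in batch])
    col = torch.stack([t.col_indices() for t in batch])
    vals = torch.stack([t.values() for t in batch])
    return torch.sparse_csr_tensor(
        crow, col, vals, size=(len(batch), *batch[0].shape)
    )

def collate_sparse_csr_shared_indices(batch):
    # Option 2: if all members share identical crow/col indices, stack
    # only the values, adding a trailing dense dimension (a hybrid tensor).
    elem = batch[0]
    vals = torch.stack([t.values() for t in batch], dim=-1)  # (nnz, B)
    return torch.sparse_csr_tensor(
        elem.crow_indices(), elem.col_indices(), vals,
        size=(*elem.shape, len(batch)),
    )

Any such function would then be passed to the loader, e.g. DataLoader(dataset, batch_size=4, collate_fn=collate_sparse_csr_batched).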

@@ -158,6 +158,11 @@ def collate_tensor_fn(batch, *, collate_fn_map: Optional[Dict[Union[Type, Tuple[
            "Batches of nested tensors are not currently supported by the default collate_fn; "
            "please provide a custom collate_fn to handle them appropriately."
        )
    if elem.layout in {torch.sparse_coo, torch.sparse_csr, torch.sparse_bsr, torch.sparse_csc, torch.sparse_bsc}:


Is this set of sparse types available elsewhere? Curious to know whether it can expand in the future and would then require updating all instances of such sets.

@pearu
Collaborator Author

No, we have not created a public API that would provide all sparse layouts as a set.

While various tools have implemented such a set, e.g.,

sparse_layouts = {torch.sparse_coo, torch.sparse_csr, torch.sparse_csc, torch.sparse_bsr, torch.sparse_bsc}

they are typically private to the corresponding tools.

On the other hand, if a new sparse layout were added (which, btw, is a very rare event given the high threshold for introducing new sparse layouts), relying on such a set here would not remove the need to update the dataloader's sparse support, because that support depends on the details of the sparse tensor implementations.
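For illustration, such a tool-private set might be wrapped as follows (the names _SPARSE_LAYOUTS and is_sparse are hypothetical; the members are exactly those quoted above):

import torch

# The set of sparse layouts quoted above, frozen as a module-private constant.
_SPARSE_LAYOUTS = frozenset({
    torch.sparse_coo, torch.sparse_csr, torch.sparse_csc,
    torch.sparse_bsr, torch.sparse_bsc,
})

def is_sparse(t: torch.Tensor) -> bool:
    # True for any of the five sparse layouts; strided tensors return False.
    return t.layout in _SPARSE_LAYOUTS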

@cpuhrsch
Contributor

@gokulavasan - Do you have time to take another look?

@pearu
Collaborator Author

pearu commented Nov 19, 2023

@pytorchbot merge

@pytorch-bot added the ciflow/trunk label Nov 19, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-12-py3-arm64 / test (default, 1, 3, macos-m1-12)

Details for Dev Infra team. Raised by workflow job.

pearu added a commit that referenced this pull request Nov 19, 2023
ghstack-source-id: 6304fb430cae40dbe4fa33346567c601ee096f6c
Pull Request resolved: #112842
pearu added a commit that referenced this pull request Nov 19, 2023
ghstack-source-id: 8fdaa4b3b7f5347b1956b9b175f1df34ee0c4a4b
Pull Request resolved: #112842
@pearu
Collaborator Author

pearu commented Nov 19, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

Sparse tensors automation moved this from Reviewer approved to Done Nov 19, 2023
@facebook-github-bot deleted the gh/pearu/130/head branch November 23, 2023 15:28
Labels
ciflow/trunk, Merged, module: sparse, open source, release notes: dataloader, topic: new features
Projects
Sparse tensors

Development

Successfully merging this pull request may close these issues.

Multiprocess DataLoader doesn't work with sparse tensor as it'll try to access the underlying storage
4 participants