
Memory Consumption of DistributedSampler too large when dealing with huge datasets #45427

@YuxianMeng

Description


🐛 Bug

To Reproduce

Steps to reproduce the behavior:

  1. create a dataset with a very large __len__
  2. use DistributedSampler as its sampler
  3. observe that the .tolist() operation consumes several times more memory than the original torch.Tensor of indices (see the illustration after this list)
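
A rough illustration of the overhead (a sketch; the dataset size is made up and the exact ratio depends on the Python build):

```python
import sys
import torch

n = 10_000_000                      # hypothetical dataset with 1e7 samples
indices = torch.randperm(n)         # int64 tensor: n * 8 bytes ≈ 80 MB
tensor_bytes = indices.element_size() * indices.nelement()

as_list = indices.tolist()          # one Python int object per element
# the list stores an 8-byte pointer per element, plus ~28 bytes per int object
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(i) for i in as_list)

print(f"tensor: {tensor_bytes / 1e6:.0f} MB, list: {list_bytes / 1e6:.0f} MB")
```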

Expected behavior

We can remove the .tolist() operation and write a simple iterator instead, to avoid the huge memory consumption:
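
A minimal sketch of such a sampler (hypothetical; the class name, padding, and shuffling details are illustrative rather than the actual torch.utils.data.DistributedSampler code). It performs the same shuffle/pad/subsample steps but yields plain ints from the index tensor instead of materializing a full Python list:

```python
import math
import torch
import torch.distributed as dist
from torch.utils.data import Sampler


class LazyDistributedSampler(Sampler):
    """Hypothetical DistributedSampler variant that never calls .tolist();
    per-rank indices are yielded one at a time from the index tensor."""

    def __init__(self, dataset, num_replicas=None, rank=None, shuffle=True, seed=0):
        self.num_replicas = num_replicas if num_replicas is not None else dist.get_world_size()
        self.rank = rank if rank is not None else dist.get_rank()
        self.dataset = dataset
        self.shuffle = shuffle
        self.seed = seed
        self.epoch = 0
        self.num_samples = math.ceil(len(dataset) / self.num_replicas)
        self.total_size = self.num_samples * self.num_replicas

    def set_epoch(self, epoch):
        self.epoch = epoch

    def __iter__(self):
        if self.shuffle:
            g = torch.Generator()
            g.manual_seed(self.seed + self.epoch)
            indices = torch.randperm(len(self.dataset), generator=g)
        else:
            indices = torch.arange(len(self.dataset))
        # pad with the first few indices so the total is evenly divisible across replicas
        padding = self.total_size - len(indices)
        if padding > 0:
            indices = torch.cat([indices, indices[:padding]])
        # take this rank's strided slice and yield Python ints lazily,
        # avoiding the large list that .tolist() would allocate
        for idx in indices[self.rank:self.total_size:self.num_replicas]:
            yield int(idx)

    def __len__(self):
        return self.num_samples
```

Such a sampler would be used the same way as the built-in one, e.g. DataLoader(dataset, sampler=LazyDistributedSampler(dataset)), with set_epoch(epoch) called at the start of each epoch to reshuffle.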

Environment

Please copy and paste the output from our environment collection script (or fill out the checklist below manually).

You can get the script and run it with:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py

  • PyTorch Version (e.g., 1.0):
  • OS (e.g., Linux):
  • How you installed PyTorch (conda, pip, source):
  • Build command you used (if compiling from source):
  • Python version:
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • Any other relevant information:

Additional context

cc @ssnl @VitalyFedyunin

    Labels

    feature (A request for a proper, new feature.)
    module: dataloader (Related to torch.utils.data.DataLoader and Sampler.)
    triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module.)
