
Memory Consumption of DistributedSampler too large when dealing with huge datasets #45427

@YuxianMeng

Description


🐛 Bug

To Reproduce

Steps to reproduce the behavior:

  1. create a dataset with a very large __len__
  2. use DistributedSampler as its sampler
  3. observe that the .tolist() operation consumes several times more memory than the original torch.Tensor of indices (see the illustration after this list)
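
A rough illustration of the overhead (a sketch; the dataset size is made up and the exact ratio depends on the Python build):

```python
import sys
import torch

n = 10_000_000                      # hypothetical dataset with 1e7 samples
indices = torch.randperm(n)         # int64 tensor: n * 8 bytes ≈ 80 MB
tensor_bytes = indices.element_size() * indices.nelement()

as_list = indices.tolist()          # one Python int object per element
# the list stores an 8-byte pointer per element, plus ~28 bytes per int object
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(i) for i in as_list)

print(f"tensor: {tensor_bytes / 1e6:.0f} MB, list: {list_bytes / 1e6:.0f} MB")
```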

Expected behavior

We can remove the .tolist() operation and write a simple iterator instead, to avoid the huge memory consumption:
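
A minimal sketch of such a sampler (hypothetical; the class name, padding, and shuffling details are illustrative rather than the actual torch.utils.data.DistributedSampler code). It performs the same shuffle/pad/subsample steps but yields plain ints from the index tensor instead of materializing a full Python list:

```python
import math
import torch
import torch.distributed as dist
from torch.utils.data import Sampler


class LazyDistributedSampler(Sampler):
    """Hypothetical DistributedSampler variant that never calls .tolist();
    per-rank indices are yielded one at a time from the index tensor."""

    def __init__(self, dataset, num_replicas=None, rank=None, shuffle=True, seed=0):
        self.num_replicas = num_replicas if num_replicas is not None else dist.get_world_size()
        self.rank = rank if rank is not None else dist.get_rank()
        self.dataset = dataset
        self.shuffle = shuffle
        self.seed = seed
        self.epoch = 0
        self.num_samples = math.ceil(len(dataset) / self.num_replicas)
        self.total_size = self.num_samples * self.num_replicas

    def set_epoch(self, epoch):
        self.epoch = epoch

    def __iter__(self):
        if self.shuffle:
            g = torch.Generator()
            g.manual_seed(self.seed + self.epoch)
            indices = torch.randperm(len(self.dataset), generator=g)
        else:
            indices = torch.arange(len(self.dataset))
        # pad with the first few indices so the total is evenly divisible across replicas
        padding = self.total_size - len(indices)
        if padding > 0:
            indices = torch.cat([indices, indices[:padding]])
        # take this rank's strided slice and yield Python ints lazily,
        # avoiding the large list that .tolist() would allocate
        for idx in indices[self.rank:self.total_size:self.num_replicas]:
            yield int(idx)

    def __len__(self):
        return self.num_samples
```

Such a sampler would be used the same way as the built-in one, e.g. DataLoader(dataset, sampler=LazyDistributedSampler(dataset)), with set_epoch(epoch) called at the start of each epoch to reshuffle.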

Environment

Please copy and paste the output from our environment collection script (or fill out the checklist below manually).

You can get the script and run it with:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py

  • PyTorch Version (e.g., 1.0):
  • OS (e.g., Linux):
  • How you installed PyTorch (conda, pip, source):
  • Build command you used (if compiling from source):
  • Python version:
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • Any other relevant information:

Additional context

cc @ssnl @VitalyFedyunin

    Labels

    feature (A request for a proper, new feature.)
    module: dataloader (Related to torch.utils.data.DataLoader and Sampler.)
    triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module.)
