composer.datasets

DataloaderHparams contains the torch.utils.data.DataLoader settings that are common across both the training and eval datasets (see the sketch after this list):

  • num_workers
  • prefetch_factor
  • persistent_workers
  • pin_memory
  • timeout
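
As a rough illustration, DataloaderHparams can be thought of as a simple dataclass over these fields. This is a minimal sketch, not the library's actual definition; the default values are assumptions:

    from dataclasses import dataclass

    @dataclass
    class DataloaderHparams:
        """Dataloader settings shared by the training and eval dataloaders."""
        num_workers: int = 8             # assumed default; tune per machine
        prefetch_factor: int = 2         # batches prefetched per worker
        persistent_workers: bool = True  # keep workers alive between epochs
        pin_memory: bool = True          # page-locked memory for faster host-to-GPU copies
        timeout: float = 0.0             # seconds to wait for a batch (0 = no timeout)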

Each DatasetHparams is then responsible for returning a DataloaderSpec, a NamedTuple of dataset-specific settings (see the sketch after this list) such as:

  • dataset
  • drop_last
  • shuffle
  • collate_fn
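
A minimal sketch of such a NamedTuple, assuming only the four fields listed above (the actual DataloaderSpec may carry additional fields):

    from typing import Callable, NamedTuple, Optional

    import torch.utils.data

    class DataloaderSpec(NamedTuple):
        dataset: torch.utils.data.Dataset      # the dataset to iterate over
        drop_last: bool = False                # drop the final partial batch
        shuffle: bool = True                   # reshuffle every epoch
        collate_fn: Optional[Callable] = None  # custom batching logic, if any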

This indirection (instead of directly creating the dataloader at the start) is needed because for multi-GPU training, dataloaders require the global rank to initialize their torch.utils.data.distributed.DistributedSampler.

As a result, our trainer uses the DataloaderSpec and DataloaderHparams to create the dataloaders after DDP has forked the processes.
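
A hedged sketch of that final step, using a hypothetical build_dataloader helper (the trainer's real construction logic differs in its details). Once torch.distributed is initialized, the global rank and world size are available to the DistributedSampler:

    import torch.distributed as dist
    from torch.utils.data import DataLoader
    from torch.utils.data.distributed import DistributedSampler

    def build_dataloader(spec: DataloaderSpec, hparams: DataloaderHparams) -> DataLoader:
        # The global rank is only known after the process group is initialized,
        # which is why the dataloader is built here rather than up front.
        sampler = DistributedSampler(
            spec.dataset,
            num_replicas=dist.get_world_size(),
            rank=dist.get_rank(),
            shuffle=spec.shuffle,
        )
        return DataLoader(
            spec.dataset,
            sampler=sampler,
            drop_last=spec.drop_last,
            collate_fn=spec.collate_fn,
            num_workers=hparams.num_workers,
            prefetch_factor=hparams.prefetch_factor,
            persistent_workers=hparams.persistent_workers,
            pin_memory=hparams.pin_memory,
            timeout=hparams.timeout,
        )

Note that prefetch_factor and persistent_workers require num_workers > 0.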

Base Classes and Hyperparameters

  • DataloaderHparams
  • DataloaderSpec
  • DatasetHparams

Datasets

  • MNISTDatasetHparams
  • CIFAR10DatasetHparams
  • ImagenetDatasetHparams
  • LMDatasetHparams
  • SyntheticDatasetHparams
  • BratsDatasetHparams