Hi, thanks for your attention.
While reading the transformers source code, I cannot understand the implementation of _get_train_sampler in trainer.py. Why is the default data sampler a RandomSampler rather than a DistributedSampler? How does the Trainer handle the sampler for data-parallel training?
reference code: https://github.com/huggingface/transformers/blob/main/src/transformers/trainer.py#L975
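For comparison, this is the pattern I would normally expect for data-parallel training in plain PyTorch DDP (a minimal sketch of my own, not taken from the transformers source; the dataset and helper names are just placeholders):

```python
import os
import torch
from torch.utils.data import DataLoader, Dataset, DistributedSampler


class ToyDataset(Dataset):
    """Placeholder dataset used only to make the sketch self-contained."""

    def __len__(self):
        return 1000

    def __getitem__(self, idx):
        return torch.tensor([idx], dtype=torch.float32)


def build_ddp_dataloader(dataset, batch_size=8):
    # Read rank/world size from the environment so the sketch also runs
    # standalone (defaults to a single process).
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    rank = int(os.environ.get("RANK", 0))

    # DistributedSampler shards the dataset across processes so each rank
    # sees a disjoint subset of the data every epoch.
    sampler = DistributedSampler(
        dataset, num_replicas=world_size, rank=rank, shuffle=True
    )
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler), sampler


if __name__ == "__main__":
    loader, sampler = build_ddp_dataloader(ToyDataset())
    for epoch in range(2):
        # set_epoch changes the shuffling seed so the per-rank shards
        # are reshuffled differently each epoch.
        sampler.set_epoch(epoch)
        for batch in loader:
            pass
```

Given this pattern, I don't see where the per-rank sharding happens when _get_train_sampler only returns a RandomSampler.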