PyTorch-Distributed-Tutorials

A detailed blog on the various distributed training strategies can be read here.

To train with the standalone (single-process) PyTorch script, run:

python train.py
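
The repository's `train.py` is not reproduced here; a minimal sketch of what a standalone training loop typically looks like (model, data, and hyperparameters below are placeholders) is:

```python
import torch
import torch.nn as nn

# Hypothetical model, data, and loop; the actual train.py may differ.
model = nn.Linear(1024, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    inputs = torch.randn(64, 1024).cuda()
    targets = torch.randint(0, 10, (64,)).cuda()
    loss = loss_fn(model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```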

To train with the DataParallel PyTorch script, run:

python train_dataparallel.py
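
As a rough sketch (the actual `train_dataparallel.py` may differ), `nn.DataParallel` wraps the model so each input batch is split across the visible GPUs on a single machine:

```python
import torch
import torch.nn as nn

# Hypothetical model; DataParallel replicates it on every visible GPU
# and scatters each batch across the replicas.
model = nn.Linear(1024, 10)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.cuda()

inputs = torch.randn(64, 1024).cuda()
outputs = model(inputs)  # outputs are gathered back on the default GPU
```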

To train with the DistributedDataParallel (DDP) PyTorch script, run:

torchrun --nnodes=1 --nproc-per-node=4 train_ddp.py
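
`torchrun` launches one process per GPU and sets the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables. A minimal sketch of the DDP setup such a script usually contains (the model and loop are placeholders, not the actual `train_ddp.py`) is:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# torchrun provides RANK, LOCAL_RANK, and WORLD_SIZE to each process.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(1024, 10).cuda(local_rank)  # hypothetical model
model = DDP(model, device_ids=[local_rank])

# ... training loop: each process trains on its own data shard and
# gradients are all-reduced automatically during backward() ...

dist.destroy_process_group()
```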

To train with the FullyShardedDataParallel (FSDP) PyTorch script, run:

torchrun --nnodes=1 --nproc-per-node=4 train_fsdp.py
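
FSDP uses the same `torchrun` launch, but instead of replicating the full model on every rank it shards parameters, gradients, and optimizer state across ranks. A rough sketch under the same assumptions as above (not the actual `train_fsdp.py`):

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(1024, 10).cuda(local_rank)  # hypothetical model
# FSDP shards parameters, gradients, and optimizer state across ranks,
# gathering full parameters only for the layers currently being computed.
model = FSDP(model)

# ... training loop as usual; create the optimizer after wrapping with FSDP ...

dist.destroy_process_group()
```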