Replies: 2 comments
-
@tjruwase can you help me with this?
-
Historically,
-
Hi, I am new to distributed training and am using Hugging Face to train large models. I see many options to run distributed training. Can I know what the difference is between the following options:
python train.py .....<ARGS>
python -m torch.distributed.launch <ARGS>
deepspeed train.py <ARGS>
I did not expect option 1 to use distributed training. But it even seems to use some sort of torch distributed training? In that case, what's the difference between option 1 and option 2?
Does deepspeed use torch.distributed in the background?
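For context on what I mean, here is a minimal sketch (illustrative only, not from my actual script; the helper name `init_distributed` is made up) of how I understand a training script picks up the environment that options 2 and 3 set up, while option 1 would run as a single process:
import os
import torch
import torch.distributed as dist

def init_distributed():
    # Both torch.distributed.launch and the deepspeed launcher spawn one
    # process per GPU and export RANK, WORLD_SIZE, and LOCAL_RANK for each.
    if "WORLD_SIZE" in os.environ and int(os.environ["WORLD_SIZE"]) > 1:
        # init_process_group defaults to the env:// init method, so it reads
        # MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE from the environment.
        dist.init_process_group(backend="nccl")
        torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
        return True
    # Plain `python train.py` launches a single process, so no process group
    # is created and training falls back to one device.
    return False
Is this roughly what happens under the hood, and does deepspeed ultimately call into torch.distributed the same way?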