[Trainer] Add optional communication backends for torch.distributed when using GPU #22247
What does this PR do?
Adds optional communication backends for torch.distributed when using GPU. I want to use other communication backends, as described in the pytorch_distribution_tutorial, but I found that Trainer only uses nccl when self.no_cuda is False.
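A minimal sketch of the idea, not the PR's actual diff: instead of hardcoding "nccl" whenever CUDA is in use, the backend becomes a configurable option that is forwarded to torch.distributed.init_process_group. The function name and the `backend` parameter below are illustrative assumptions, not the names used in Trainer.

```python
import torch
import torch.distributed as dist

def init_distributed(backend: str = "nccl", rank: int = 0, world_size: int = 1):
    """Sketch: initialize torch.distributed with a user-selectable backend.

    In the real Trainer, these values come from TrainingArguments and
    environment variables; `backend` stands in for the new option this
    PR proposes instead of the hardcoded "nccl".
    """
    # "nccl" is the usual choice for GPU training; torch.distributed also
    # supports "gloo" and (if built with it) "mpi".
    dist.init_process_group(backend=backend, rank=rank, world_size=world_size)

if __name__ == "__main__":
    # Example: fall back to gloo when no GPU is available.
    chosen = "nccl" if torch.cuda.is_available() else "gloo"
    print(f"Would initialize torch.distributed with backend={chosen}")
```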
Fixes # (issue)
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?