Add a `parallel_mode` property to TrainingArguments #8877
Conversation
src/transformers/training_args.py
Outdated
class DistributedEnvironment(Enum):
    SINGLE = "single"
    PARALLEL = "parallel"
    DISTRIBUTED_PARALLEL = "distributed_parallel"
    TPU = "tpu"
perhaps sticking to the commonly used abbreviations: dp/ddp?
- dp = DataParallel
- ddp = DistributedDataParallel
- tpu = TPU
and "single" is too ambiguous - single gpu or single process?
Possible ideas:
- dnp = DataNotParallel (goes well with dp/ddp, but not mp/pp)
- np = NotParallel
- ds = DataSingle or DataSimple (possible confusion with DeepSpeed)
- sp = SimpleProcess (ambiguous processing unit-wise)
- ss = SinglegpuorcpuSingleprocess
- 11 = 1 cpu/gpu 1 process (variation of ss)
- pu = ProcessingUnit (G or C) (ambiguous process-wise)
- bu = Basic Unit
- one = one of cpu/gpu and of process (not an abbreviation, like the rest)
- basic = Well, basic (not an abbreviation, like the rest)
- simple = Same as basic (not an abbreviation, like the rest)
I think I like "np" the most - as it works well with dp/ddp/mp/pp
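To make the proposal concrete, here is an illustrative sketch (not code from this PR) of what the enum could look like with the abbreviations suggested above, using `NP` for the non-parallel case:

```python
# Illustrative only: the enum as it might look with the proposed
# abbreviations (np/dp/ddp/tpu); not the code merged in the PR.
from enum import Enum

class DistributedEnvironment(Enum):
    NP = "np"    # NotParallel: one process, one cpu/gpu
    DP = "dp"    # torch.nn.DataParallel
    DDP = "ddp"  # torch.nn.parallel.DistributedDataParallel
    TPU = "tpu"  # TPU via XLA
```

The point of the short names is that conditionals read tersely, e.g. `if env is DistributedEnvironment.DDP:`.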
We will need a mode for MP (ModelParallel) and PP (PipeParallel) too. But of course these can be added when needed. Just mentioning these so that it will be easier to build a nicely mapped enum set.
- mp = ModelParallel
- pp = PipeParallel
A three letter abbreviation won't be clear for users who are not super familiar with PyTorch and we generally try to avoid them in the Transformers codebase.
Sure, but there are existing conventions - why reinvent names when we already have them? We can use the full `DistributedDataParallel`, etc. if abbreviations are not fitting.
Given our discussion yesterday, I'm not sure `distributed_env` is the right name. Given that, with the exception of tpu, all of dp/ddp/mp/pp are SomethingParallel, should it be called `parallel_mode`? I don't know anything about tpu, so it's hard for me to know where it fits. But it's probably not distributed either. And not parallel either. So perhaps it needs a name of its own.
src/transformers/training_args.py
Outdated
class ParallelMode(Enum):
    NO = "no"
    NOT_DISTRIBUTED = "not_distributed"
    DISTRIBUTED = "distributed"
    TPU = "tpu"
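For context, a hedged sketch of how such a `parallel_mode` property might be derived from the arguments' state; the stub class and the field names (`local_rank`, `n_gpu`, `is_tpu`) are assumptions for illustration, not the actual TrainingArguments implementation:

```python
# Sketch only: a stub standing in for TrainingArguments, showing one
# plausible way to map runtime state onto the enum from the diff above.
from dataclasses import dataclass
from enum import Enum

class ParallelMode(Enum):
    NO = "no"
    NOT_DISTRIBUTED = "not_distributed"
    DISTRIBUTED = "distributed"
    TPU = "tpu"

@dataclass
class Args:
    local_rank: int = -1   # set to a rank >= 0 by torch.distributed launchers
    n_gpu: int = 1
    is_tpu: bool = False   # illustrative flag standing in for XLA detection

    @property
    def parallel_mode(self) -> ParallelMode:
        if self.is_tpu:
            return ParallelMode.TPU
        if self.local_rank != -1:
            return ParallelMode.DISTRIBUTED      # DistributedDataParallel
        if self.n_gpu > 1:
            return ParallelMode.NOT_DISTRIBUTED  # DataParallel
        return ParallelMode.NO                   # single process/device
```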
OK, this is a different way. Let's try to apply these:

`if self.parallel_mode == ParallelMode.DISTRIBUTED` - works.

But `if self.parallel_mode == ParallelMode.NO` is weird and can be confused with `ParallelMode.NOT_DISTRIBUTED`. I'd rename `NO` => `NOT_PARALLEL`, so `if self.parallel_mode == ParallelMode.NOT_PARALLEL` is more readable, no?

Does this one make sense: `if self.parallel_mode == TPU` (again, given that you know TPU)?
`self.parallel_mode == ParallelMode.TPU` works fine; OK to change `NO` to `NOT_PARALLEL`.
And I assume we will add `ParallelMode.PIPE` and `ParallelMode.MODEL` for PP and MP, right?
changed the title: Add a `distributed_env` property to TrainingArguments → Add a `parallel_mode` property to TrainingArguments
LGTM, @sgugger!
Cool names :)
* Add a `distributed_env` property to TrainingArguments
* Change name
* Address comment
What does this PR do?

This PR adds a `distributed_env` property to the `TrainingArguments`, making it clear if we are in: a single-device setup, a parallel (DataParallel) setup, a distributed parallel (DistributedDataParallel) setup, or on TPU.

Fixes #8858