
Add a parallel_mode property to TrainingArguments #8877

Merged (3 commits) on Dec 1, 2020
Conversation

@sgugger (Collaborator) commented Dec 1, 2020

What does this PR do?

This PR adds a distributed_env property to TrainingArguments, making it clear whether we are in:

  • a single process (CPU or one GPU)
  • a parallel setting (one process but several GPUs)
  • a distributed parallel setting (several processes, one per GPU)
  • a TPU setting

Fixes #8858
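The four settings above could be detected along these lines. This is a minimal sketch, not the actual TrainingArguments implementation: the helper name and its arguments are hypothetical, though local_rank == -1 meaning "not launched by torch.distributed" follows the usual PyTorch convention.

```python
from enum import Enum

class DistributedEnvironment(Enum):
    SINGLE = "single"                              # one process, CPU or one GPU
    PARALLEL = "parallel"                          # one process, several GPUs (DataParallel)
    DISTRIBUTED_PARALLEL = "distributed_parallel"  # several processes, one per GPU
    TPU = "tpu"

def detect_environment(n_gpu: int, local_rank: int, is_tpu: bool = False) -> DistributedEnvironment:
    # Hypothetical detection logic mirroring the four cases listed above.
    if is_tpu:
        return DistributedEnvironment.TPU
    if local_rank != -1:
        # torch.distributed launched one process per GPU
        return DistributedEnvironment.DISTRIBUTED_PARALLEL
    if n_gpu > 1:
        # a single process driving several GPUs
        return DistributedEnvironment.PARALLEL
    return DistributedEnvironment.SINGLE
```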

Comment on lines 500 to 504
class DistributedEnvironment(Enum):
    SINGLE = "single"
    PARALLEL = "parallel"
    DISTRIBUTED_PARALLEL = "distributed_parallel"
    TPU = "tpu"
@stas00 (Contributor) commented Dec 1, 2020

Perhaps stick to the commonly used abbreviations, dp/ddp?

  • dp = DataParallel
  • ddp = DistributedDataParallel
  • tpu = TPU

and "single" is too ambiguous - single gpu or single process?

Possible ideas:

  • dnp = DataNotParallel (goes well with dp/ddp, but not mp/pp)
  • np = NotParallel
  • ds = DataSingle or DataSimple (possible confusion with DeepSpeed)
  • sp = SimpleProcess (ambiguous processing unit-wise)
  • ss = SinglegpuorcpuSingleprocess
  • 11 = 1 cpu/gpu 1 process (variation of ss)
  • pu = ProcessingUnit (G or C) (ambiguous process-wise)
  • bu = Basic Unit
  • one = one of cpu/gpu and of process (not an abbreviation, like the rest)
  • basic = Well, basic (not an abbreviation, like the rest)
  • simple = Same as basic (not an abbreviation, like the rest)

I think I like "np" the most - as it works well with dp/ddp/mp/pp

@stas00 (Contributor) commented Dec 1, 2020

We will need a mode for MP (ModelParallel) and PP (PipeParallel) too. But of course these can be added when needed. Just mentioning these so that it will be easier to build a nicely mapped enum set.

  • mp = ModelParallel
  • pp = PipeParallel

Collaborator (Author):

A three-letter abbreviation won't be clear to users who are not super familiar with PyTorch, and we generally try to avoid them in the Transformers codebase.

@stas00 (Contributor) commented Dec 1, 2020

Sure, but there are existing conventions; why reinvent names when we already have them? We can use the full DistributedDataParallel, etc., if abbreviations don't fit.

@stas00 (Contributor) commented Dec 1, 2020

Given our discussion yesterday, I'm not sure distributed_env is a fitting name. Since you convinced me that DP is not distributed by PyTorch conventions, if self.distributed_env == "dp" becomes confusing again.

Given that, with the exception of tpu, all of dp/ddp/mp/pp are SomethingParallel, should it be called parallel_mode?

I don't know anything about tpu, so it's hard for me to know where it fits. But it's probably not distributed either. And not parallel either.

So perhaps we call it compute_env

Comment on lines 509 to 513
class ParallelMode(Enum):
    NO = "no"
    NOT_DISTRIBUTED = "not_distributed"
    DISTRIBUTED = "distributed"
    TPU = "tpu"
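For reference, the snippet under review can be run in isolation to see the comparison style the thread debates. This is a sketch: the mode values here are set by hand rather than read from a real TrainingArguments instance.

```python
from enum import Enum

class ParallelMode(Enum):
    NO = "no"
    NOT_DISTRIBUTED = "not_distributed"
    DISTRIBUTED = "distributed"
    TPU = "tpu"

# Writing out the comparisons the review discusses:
mode = ParallelMode.DISTRIBUTED
is_distributed = mode is ParallelMode.DISTRIBUTED  # reads naturally

mode = ParallelMode.NO
is_single = mode is ParallelMode.NO  # "NO" reads oddly next to NOT_DISTRIBUTED
```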
@stas00 (Contributor) commented Dec 1, 2020

OK, this is a different approach. Let's try applying these:

So if self.parallel_mode == ParallelMode.DISTRIBUTED reads well.

But if self.parallel_mode == ParallelMode.NO is weird and can be confused with ParallelMode.NOT_DISTRIBUTED. I'd rename NO => NOT_PARALLEL, so that if self.parallel_mode == ParallelMode.NOT_PARALLEL is more readable, no?

Does this one make sense: if self.parallel_mode == ParallelMode.TPU? (Again, deferring to you, since you know TPU.)

Collaborator (Author):

self.parallel_mode == ParallelMode.TPU works fine, ok to change NO to NOT_PARALLEL
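A sketch of the enum with the agreed rename applied: NO becomes NOT_PARALLEL, everything else as in the snippet under review (the comments restate the four cases from the PR description; the string values are assumptions).

```python
from enum import Enum

class ParallelMode(Enum):
    NOT_PARALLEL = "not_parallel"        # single process, CPU or one GPU
    NOT_DISTRIBUTED = "not_distributed"  # one process, several GPUs (DataParallel)
    DISTRIBUTED = "distributed"          # one process per GPU (DistributedDataParallel)
    TPU = "tpu"

# Both comparisons now read unambiguously:
mode = ParallelMode.NOT_PARALLEL
not_parallel = mode is ParallelMode.NOT_PARALLEL
tpu_mode = ParallelMode.TPU is ParallelMode("tpu")
```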

Contributor:

and I assume we will add ParallelMode.PIPE and ParallelMode.MODEL for PP and MP, right?

@stas00 stas00 changed the title Add a distributed_env property to TrainingArguments Add a parallel_mode property to TrainingArguments Dec 1, 2020
@stas00 (Contributor) commented Dec 1, 2020

LGTM, @sgugger!

@LysandreJik (Member) left a comment:
Cool names :)

@sgugger sgugger merged commit b08843c into master Dec 1, 2020
@sgugger sgugger deleted the distributed_env branch December 1, 2020 18:46
stas00 pushed a commit to stas00/transformers that referenced this pull request Dec 5, 2020
* Add a `distributed_env` property to TrainingArguments

* Change name

* Address comment
Successfully merging this pull request may close these issues.

[trainer] add distributed_env to TrainingArguments