
deepspeed enhancements and fixes #676

Merged
merged 4 commits into huggingface:main from samngrul/deepspeed-fixes on Sep 6, 2022

Conversation

pacman100
Collaborator

What does this PR do?

  1. Addresses feature request of https://discuss.huggingface.co/t/when-using-deepspeed-why-do-i-need-to-pass-dataloaders-to-the-accelerator-prepare/22432/2
  2. Addresses issue of https://discuss.huggingface.co/t/deepspeed-deepspeed-config-file-path-is-lowered-case/22464/2
  3. Since DeepSpeed already handles loss scaling by gradient_accumulation_steps in its backward pass, fixes the related bug (sketched below).
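
For item 3, a minimal sketch of the intended behaviour (illustrative only, not the actual Accelerator.backward implementation): DeepSpeed's engine.backward() already scales the loss by the configured gradient accumulation steps, so the wrapper should not divide it again.

def backward(loss, deepspeed_engine=None, gradient_accumulation_steps=1):
    # Illustrative sketch: avoid scaling the loss twice when DeepSpeed is used.
    if deepspeed_engine is not None:
        # DeepSpeed already divides the loss by gradient_accumulation_steps
        # inside engine.backward(), so pass the raw loss through.
        deepspeed_engine.backward(loss)
    else:
        # Plain PyTorch path: scale the loss here before calling backward().
        (loss / gradient_accumulation_steps).backward()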

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Sep 3, 2022

The documentation is not available anymore as the PR was closed or merged.

@pacman100 pacman100 changed the title from /deepspeed enhancements and fixes to deepspeed enhancements and fixes on Sep 3, 2022
Collaborator

@sgugger sgugger left a comment


Thanks for all those fixes!

@pacman100 pacman100 merged commit 200546c into huggingface:main Sep 6, 2022
@pacman100 pacman100 deleted the samngrul/deepspeed-fixes branch September 6, 2022 12:04
@pacman100 pacman100 mentioned this pull request Sep 6, 2022
@stas00
Contributor

stas00 commented Nov 17, 2022

Hmm, I tried to remove the fake dataloader workaround that was discussed at https://discuss.huggingface.co/t/when-using-deepspeed-why-do-i-need-to-pass-dataloaders-to-the-accelerator-prepare/22432

and it's super cumbersome. It appears that the only way to get the batch size is from a dataloader? Why can't it be derived from a batch_size argument, or is it by design that batch_size is derived from the dataloader?

Specifically to this PR:

  1. The original idea behind auto was to keep the ds config file as simple as possible with respect to hardcoded values and have command-line args set these things once. So this doesn't work:
    "train_micro_batch_size_per_gpu": "auto",

As a reminder, the intention behind creating auto values in the DS config was to avoid mismatches of the same configuration value used by different sub-systems. So the way I was thinking when integrating DS into the HF Trainer was: let's have one place where batch_size is set, and have all other sub-systems inherit that value, rather than relying on the user to remember to change the same value in many places. I hope this makes sense. I don't understand the design here, so I'm surely missing something important.

I was trying to remove the originally used workaround
https://github.com/huggingface/m4/pull/610
but I think the workaround with the dummy dataloader (sketched below) is way more user-friendly than a hardcoded train_micro_batch_size_per_gpu config value.
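
For reference, a minimal sketch of what such a dummy-dataloader workaround can look like (illustrative names, not the actual m4 code; the dataset contents are irrelevant, the dataloader exists only so that accelerate can read a batch size from it and fill in the auto value):

import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

model = torch.nn.Linear(4, 2)                      # stand-in model
optimizer = torch.optim.AdamW(model.parameters())  # stand-in optimizer
micro_batch_size = 8                               # the real per-device batch size

# A dummy dataloader whose only job is to carry the batch size so that
# "train_micro_batch_size_per_gpu": "auto" in the DeepSpeed config can be resolved.
dummy_dataset = TensorDataset(torch.zeros(micro_batch_size, 1))
dummy_dataloader = DataLoader(dummy_dataset, batch_size=micro_batch_size)

model, optimizer, _ = accelerator.prepare(model, optimizer, dummy_dataloader)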

  2. Also, while this feature supporting the train_micro_batch_size_per_gpu config was added, the end user has no idea it's supported without reading the source code - the error message is still:

You must specify a training or evaluation dataloader in accelerate.prepare() when using DeepSpeed.

Should it perhaps say:

When using DeepSpeed, accelerate.prepare() requires you to pass at least one training or evaluation dataloader, or alternatively to set an integer value for train_micro_batch_size_per_gpu in the DeepSpeed config file.

or something of that kind? Otherwise the feature is there but nobody knows about it.

The API doc could also say that, except it's private so there is no documentation.

Thank you for reading.

@pacman100
Collaborator Author

Hello Stas,

  1. In line with the intention of using auto values, we use the dataloader to infer the batch_size. One reason is that the accelerator.prepare function (or the accelerator object) doesn't have access to command-line arguments, i.e., the batch_size argument from the main code. Another is that the single place where batch_size gets set is when the user creates dataloaders as part of conventional training; e.g., the user can have a batch_size argument which they may modify because of certain custom logic, and the modified batch_size then gets used to create the dataloaders. As we have no control over the training loop/code, unlike the Trainer, it makes sense to infer batch_size directly from the dataloaders.

Would the alternative suggestion work? Setting AcceleratorState().deepspeed_plugin.deepspeed_config["train_micro_batch_size_per_gpu"] = my_batch_size before calling accelerator.prepare()? (See the sketch at the end of this comment.)

  2. Thank you for the pointer on the error message; I will update it.
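
Spelled out, that alternative would look roughly like this (a sketch; my_batch_size stands in for whatever value the training script computes):

from accelerate import Accelerator
from accelerate.state import AcceleratorState

accelerator = Accelerator()
my_batch_size = 8  # whatever the training script computed

deepspeed_plugin = AcceleratorState().deepspeed_plugin
if deepspeed_plugin is not None:
    # Overwrite the "auto" entry before accelerator.prepare() is called,
    # so no dataloader is needed just to carry the batch size.
    deepspeed_plugin.deepspeed_config["train_micro_batch_size_per_gpu"] = my_batch_size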

@stas00
Contributor

stas00 commented Nov 17, 2022

Hi Sourab!

It appears that accelerator.prepare relies on the deepspeed_plugin.* settings - if those were already parsed so that any autos got replaced with the proper values, then it would have access to the correct setting, no? Or is it still too early in the game - perhaps it's a matter of the order of execution?

Perhaps there should be another wrapper that a user calls explicitly for DeepSpeed with args like the batch size early in the code, so that no auto values remain and it'd then be easy to rely on the actual values later on. All the values should be available early on; e.g., in the HF Trainer we only had to wait until later for num_training_steps:
https://github.com/huggingface/transformers/blob/0a144b8c6bd16f7b08119a233f8e7cbd33fe5bfd/src/transformers/deepspeed.py#L167

> Would the alternative suggestion work? Setting AcceleratorState().deepspeed_plugin.deepspeed_config["train_micro_batch_size_per_gpu"] = my_batch_size before calling accelerator.prepare()?

That would definitely work.

My first reaction is that this suggestion could potentially be much more problematic should the user also set the value in the DS config file, as it might then be an unexpected override (even though, if written correctly, it should be the same value). Somehow this feels like replacing one hack with another hack.

I think the dataloader wrapped around a dummy dataset is a much cleaner approach than the above, especially if the code isn't necessarily always using the DeepSpeed backend.

If this is the best that can be done, and there is no simpler way, let's just leave it as is.

@pacman100
Collaborator Author

pacman100 commented Nov 17, 2022

> Perhaps there should be another wrapper that a user calls explicitly for DeepSpeed with args like the batch size early in the code, so that no auto values remain and it'd then be easy to rely on the actual values later on

There is already a way to do this: deepspeed_plugin.deepspeed_config_process(**kwargs) ("Process the DeepSpeed config with the values from the kwargs."): https://github.com/huggingface/accelerate/blob/main/src/accelerate/utils/dataclasses.py#L417
Example below:

from accelerate.state import AcceleratorState

# `accelerator` and `my_batch_size` are assumed to have been created earlier in the user code
if AcceleratorState().deepspeed_plugin is not None:
    # fill in the `auto` entries of the DeepSpeed config with concrete values
    # before `accelerator.prepare` is called
    kwargs = {
        "fp16.enabled": True,
        "optimizer.params.lr": 5e-5,
        "optimizer.params.weight_decay": 0.0,
        "scheduler.params.warmup_min_lr": 0.0,
        "scheduler.params.warmup_max_lr": 5e-5,
        "scheduler.params.warmup_num_steps": 0,
        "train_micro_batch_size_per_gpu": my_batch_size,
        "gradient_clipping": 1.0,
    }
    AcceleratorState().deepspeed_plugin.deepspeed_config_process(must_match=True, **kwargs)

# call `accelerator.prepare` below; the `_` are just dummy placeholders
_, _, _ = accelerator.prepare(_, _, _)

...

@stas00
Contributor

stas00 commented Nov 17, 2022

Should deepspeed_config_process call that last accelerator.prepare command internally? It looks super weird with _ input and _ output args, unless you meant something else there.

@pacman100
Collaborator Author

pacman100 commented Nov 18, 2022

Hello @stas00, the example above is user code; I was just showing that deepspeed_config_process should be called before accelerator.prepare. As mentioned, deepspeed_config_process only sets the auto values from the given kwargs dict; it doesn't internally call accelerator.prepare at all, and those _ were placeholders 😅.

@stas00
Contributor

stas00 commented Nov 18, 2022

Thank you for clarifying, @pacman100! It's crystal clear now.
