
deepspeed enhancements and fixes #676

Merged
merged 4 commits into huggingface:main from samngrul/deepspeed-fixes on Sep 6, 2022

Conversation

pacman100
Collaborator

What does this PR do?

  1. Addresses feature request of https://discuss.huggingface.co/t/when-using-deepspeed-why-do-i-need-to-pass-dataloaders-to-the-accelerator-prepare/22432/2
  2. Addresses issue of https://discuss.huggingface.co/t/deepspeed-deepspeed-config-file-path-is-lowered-case/22464/2
  3. Since DeepSpeed already handles loss scaling by gradient_accumulation_steps in its backward pass, fixes the related bug (sketched below).
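
For item 3, a minimal sketch of the intended behaviour (illustrative only, not the actual Accelerator.backward implementation): DeepSpeed's engine.backward() already scales the loss by the configured gradient accumulation steps, so the wrapper should not divide it again.

def backward(loss, deepspeed_engine=None, gradient_accumulation_steps=1):
    # Illustrative sketch: avoid scaling the loss twice when DeepSpeed is used.
    if deepspeed_engine is not None:
        # DeepSpeed already divides the loss by gradient_accumulation_steps
        # inside engine.backward(), so pass the raw loss through.
        deepspeed_engine.backward(loss)
    else:
        # Plain PyTorch path: scale the loss here before calling backward().
        (loss / gradient_accumulation_steps).backward()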

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Sep 3, 2022

The documentation is not available anymore as the PR was closed or merged.

@pacman100 pacman100 changed the title from /deepspeed enhancements and fixes to deepspeed enhancements and fixes on Sep 3, 2022
Collaborator

@sgugger sgugger left a comment


Thanks for all those fixes!

@pacman100 pacman100 merged commit 200546c into huggingface:main Sep 6, 2022
@pacman100 pacman100 deleted the samngrul/deepspeed-fixes branch September 6, 2022 12:04
@pacman100 pacman100 mentioned this pull request Sep 6, 2022
@stas00
Contributor

stas00 commented Nov 17, 2022

Hmm, I tried to remove the fake dataloader workaround that was discussed at https://discuss.huggingface.co/t/when-using-deepspeed-why-do-i-need-to-pass-dataloaders-to-the-accelerator-prepare/22432

and it's super cumbersome. It appears that the only way to get the batch size is from a dataloader? Why can't it be derived from a batch_size argument, or is it by design that batch_size is derived from the dataloader?

Specifically to this PR:

  1. The original idea behind auto was to keep the ds config file as simple as possible with respect to hardcoded values and have command-line args set these things once. So this doesn't work:
    "train_micro_batch_size_per_gpu": "auto",

As a reminder, the intention behind creating auto values in the DS config was to avoid mismatches of the same configuration value used by different sub-systems. So the way I was thinking when integrating DS into the HF Trainer was: let's have one place where batch_size is set, and have all other sub-systems inherit that value, rather than relying on the user to remember to change the same value in many places. I hope this makes sense. I don't understand the design here, so I'm surely missing something important.

I was trying to remove the originally used workaround
https://github.com/huggingface/m4/pull/610
but I think the workaround with the dummy dataloader (sketched below) is way more user-friendly than a hardcoded train_micro_batch_size_per_gpu config value.
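
For reference, a minimal sketch of what such a dummy-dataloader workaround can look like (illustrative names, not the actual m4 code; the dataset contents are irrelevant, the dataloader exists only so that accelerate can read a batch size from it and fill in the auto value):

import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

model = torch.nn.Linear(4, 2)                      # stand-in model
optimizer = torch.optim.AdamW(model.parameters())  # stand-in optimizer
micro_batch_size = 8                               # the real per-device batch size

# A dummy dataloader whose only job is to carry the batch size so that
# "train_micro_batch_size_per_gpu": "auto" in the DeepSpeed config can be resolved.
dummy_dataset = TensorDataset(torch.zeros(micro_batch_size, 1))
dummy_dataloader = DataLoader(dummy_dataset, batch_size=micro_batch_size)

model, optimizer, _ = accelerator.prepare(model, optimizer, dummy_dataloader)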

  2. Also, while this feature supporting the train_micro_batch_size_per_gpu config was added, the end user has no idea it's supported without reading the source code - the error message is still:

You must specify a training or evaluation dataloader in accelerate.prepare() when using DeepSpeed.

Should it perhaps say:

When using DeepSpeed, accelerate.prepare() requires you to pass at least one training or evaluation dataloader, or alternatively to set an integer value for train_micro_batch_size_per_gpu in the DeepSpeed config file.

or something of that kind? Otherwise the feature is there but nobody knows about it.

The API doc could also say that, except it's private so there is no documentation.

Thank you for reading.

@pacman100
Collaborator Author

Hello Stas,

  1. In line with the intention of using auto values, we use the dataloader to infer the batch_size. One reason is that the accelerator.prepare function (or the accelerator object) doesn't have access to command-line arguments, i.e., the batch_size argument from the main code. Another is that the single place where batch_size gets set is when the user creates dataloaders as part of conventional training; e.g., the user can have a batch_size argument which they may modify because of certain custom logic, and the modified batch_size then gets used to create the dataloaders. As we have no control over the training loop/code, unlike the Trainer, it makes sense to infer batch_size directly from the dataloaders.

Would the alternative suggestion work? Setting AcceleratorState().deepspeed_plugin.deepspeed_config["train_micro_batch_size_per_gpu"] = my_batch_size before calling accelerator.prepare()? (See the sketch at the end of this comment.)

  2. Thank you for the pointer on the error message; I will update it.
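
Spelled out, that alternative would look roughly like this (a sketch; my_batch_size stands in for whatever value the training script computes):

from accelerate import Accelerator
from accelerate.state import AcceleratorState

accelerator = Accelerator()
my_batch_size = 8  # whatever the training script computed

deepspeed_plugin = AcceleratorState().deepspeed_plugin
if deepspeed_plugin is not None:
    # Overwrite the "auto" entry before accelerator.prepare() is called,
    # so no dataloader is needed just to carry the batch size.
    deepspeed_plugin.deepspeed_config["train_micro_batch_size_per_gpu"] = my_batch_size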

@stas00
Contributor

stas00 commented Nov 17, 2022

Hi Sourab!

It appears that accelerator.prepare relies on the deepspeed_plugin.* settings - if those were already parsed so that any autos got replaced with the proper values, then it would have access to the correct setting, no? Or is it still too early in the game - perhaps it's a matter of the order of execution?

Perhaps there should be another wrapper that a user calls explicitly for DeepSpeed with args like the batch size early in the code, so that no auto values remain and it'd then be easy to rely on the actual values later on. All the values should be available early on; e.g., in the HF Trainer we only had to wait until later for num_training_steps:
https://github.com/huggingface/transformers/blob/0a144b8c6bd16f7b08119a233f8e7cbd33fe5bfd/src/transformers/deepspeed.py#L167

> Would the alternative suggestion work? Setting AcceleratorState().deepspeed_plugin.deepspeed_config["train_micro_batch_size_per_gpu"] = my_batch_size before calling accelerator.prepare()?

That would definitely work.

My first reaction is that this suggestion could potentially be much more problematic should the user also set the value in the DS config file, as it might then be an unexpected override (even though, if written correctly, it should be the same value). Somehow this feels like replacing one hack with another hack.

I think the dataloader wrapped around a dummy dataset is a much cleaner approach than the above, especially if the code isn't necessarily always using the DeepSpeed backend.

If this is the best that can be done, and there is no simpler way, let's just leave it as is.

@pacman100
Collaborator Author

pacman100 commented Nov 17, 2022

> Perhaps there should be another wrapper that a user calls explicitly for DeepSpeed with args like the batch size early in the code, so that no auto values remain and it'd then be easy to rely on the actual values later on

There is already a way to do this: deepspeed_plugin.deepspeed_config_process(**kwargs) ("Process the DeepSpeed config with the values from the kwargs."): https://github.com/huggingface/accelerate/blob/main/src/accelerate/utils/dataclasses.py#L417
Example below:

from accelerate.state import AcceleratorState

# `accelerator` and `my_batch_size` are assumed to have been created earlier in the user code
if AcceleratorState().deepspeed_plugin is not None:
    # fill in the `auto` entries of the DeepSpeed config with concrete values
    # before `accelerator.prepare` is called
    kwargs = {
        "fp16.enabled": True,
        "optimizer.params.lr": 5e-5,
        "optimizer.params.weight_decay": 0.0,
        "scheduler.params.warmup_min_lr": 0.0,
        "scheduler.params.warmup_max_lr": 5e-5,
        "scheduler.params.warmup_num_steps": 0,
        "train_micro_batch_size_per_gpu": my_batch_size,
        "gradient_clipping": 1.0,
    }
    AcceleratorState().deepspeed_plugin.deepspeed_config_process(must_match=True, **kwargs)

# call `accelerator.prepare` below; the `_` are just dummy placeholders
_, _, _ = accelerator.prepare(_, _, _)

...

@stas00
Contributor

stas00 commented Nov 17, 2022

Should deepspeed_config_process call that last accelerator.prepare command internally? It looks super weird with _ input and _ output args, unless you meant something else there.

@pacman100
Collaborator Author

pacman100 commented Nov 18, 2022

Hello @stas00, the example above is user code; I was just showing that deepspeed_config_process should be called before accelerator.prepare. As mentioned, deepspeed_config_process only sets the auto values from the given kwargs dict; it doesn't internally call accelerator.prepare at all, and those _ were placeholders 😅.

@stas00
Contributor

stas00 commented Nov 18, 2022

Thank you for clarifying, @pacman100! It's crystal clear now.
