Support deepspeed dynamo #2460

Closed
oraluben wants to merge 20 commits

Conversation

@oraluben commented Feb 18, 2024

What does this PR do?

This PR tries to support microsoft/DeepSpeed#4878 (DeepSpeed's torch.compile integration) in 🤗 accelerate/transformers.
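
For reference, the DeepSpeed side of this feature is driven by a compile section in the DeepSpeed config; the full config dump later in this thread shows the same shape:

"compile": {
    "enabled": true,
    "backend": "inductor"
}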

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@pacman100 since it's deepspeed related, and @tohtana since you implemented the deepspeed part.

@oraluben marked this pull request as ready for review, February 18, 2024 09:45
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@pacman100 (Contributor) left a comment

Thank you @oraluben for adding torch compile support for DeepSpeed ✨! It would be great to have related tests in the tests/deepspeed/test_deepspeed.py file.

@pacman100 (Contributor)

Hello, an overall comment: I get the error below when I run the following test:

pytest -sv tests/deepspeed/test_deepspeed.py -k test_basic_dynamo_run

[Screenshot: error output, 2024-03-15 1:11 PM]

@pacman100 (Contributor) left a comment

Hello, thank you @oraluben for the changes. Please look at the reply to your comment; the basic run fails with compile, so we first need to see how to make this feature usable.

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
@oraluben (Author)

Thanks! I've committed your dynamo fix, and I'll look at the failed test.

@oraluben (Author)

This is ready to be reviewed again :) @pacman100

@pacman100 (Contributor) left a comment

Hello @oraluben, thank you for the updates to the PR; I left a comment. The PR is almost good to merge. Also, I spent quite some time testing this. Here is a complete minimal example using DeepSpeed + Dynamo and the env variable you suggested:

  1. Clone repo https://github.com/sgugger/torchdynamo-tests
  2. Change the config configs/dynamo_fp16.yaml to include DeepSpeed:
command_file: null
commands: null
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: 'DEEPSPEED'
downcast_bf16: 'no'
dynamo_backend: INDUCTOR
fsdp_config: {}
gpu_ids: all
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp16
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_name: null
tpu_zone: null
use_cpu: false
  3. Run the command below:
TORCHDYNAMO_DEBUG_FUNCTION=forward  accelerate launch --config_file configs/dynamo_fp16.yaml scripts/text_classification.py --task_name mrpc --dynamo_backend "inductor" --batch_size 8

Output:

json = {
    "train_batch_size": 16, 
    "train_micro_batch_size_per_gpu": 8, 
    "gradient_accumulation_steps": 1, 
    "zero_optimization": {
        "stage": 3, 
        "offload_optimizer": {
            "device": "none", 
            "nvme_path": null
        }, 
        "offload_param": {
            "device": "none", 
            "nvme_path": null
        }, 
        "stage3_gather_16bit_weights_on_model_save": true
    }, 
    "gradient_clipping": 1.0, 
    "compile": {
        "enabled": true, 
        "backend": "inductor"
    }, 
    "steps_per_print": inf, 
    "fp16": {
        "enabled": false
    }, 
    "bf16": {
        "enabled": false
    }, 
    "zero_allow_untested_optimizer": true
}
...

Training Accuracy for backend inductor at epoch 0: {'accuracy': 0.7355349344978166, 'f1': 0.8169280181371624}
 67%|█████████████████████████████████████████████████████████████████████████████████████████▏                                            | 457/687 [01:49<00:44,  5.15it/s]03/20/2024 07:15:49 - INFO - accelerate.accelerator - The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.
 67%|█████████████████████████████████████████████████████████████████████████████████████████▎                                            | 458/687 [01:49<00:44,  5.16it/s]Training Accuracy for backend inductor at epoch 1: {'accuracy': 0.8681768558951966, 'f1': 0.902127659574468}
Training Accuracy for backend inductor at epoch 1: {'accuracy': 0.8681768558951966, 'f1': 0.902127659574468}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊| 686/687 [02:35<00:00,  5.09it/s]03/20/2024 07:16:36 - INFO - accelerate.accelerator - The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 687/687 [02:35<00:00,  5.11it/s]Training Accuracy for backend inductor at epoch 2: {'accuracy': 0.954967248908297, 'f1': 0.9664838513101768}
Training finished.
First iteration took: 17.61s
Average time after the first iteration: 201.59msTraining Accuracy for backend inductor at epoch 2: {'accuracy': 0.954967248908297, 'f1': 0.9664838513101768}

Training finished.
First iteration took: 17.84s
Average time after the first iteration: 201.58ms
[rank1]:[2024-03-20 07:16:42,607] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (8)
[rank1]:[2024-03-20 07:16:42,607] torch._dynamo.convert_frame: [WARNING]    function: 'forward' (/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:46)
[rank1]:[2024-03-20 07:16:42,607] torch._dynamo.convert_frame: [WARNING]    last reason: tensor 'L['input']' requires_grad mismatch. expected requires_grad=1
[rank1]:[2024-03-20 07:16:42,607] torch._dynamo.convert_frame: [WARNING] To log all recompilation reasons, use TORCH_LOGS="recompiles".
[rank1]:[2024-03-20 07:16:42,607] torch._dynamo.convert_frame: [WARNING] To diagnose recompilation issues, see https://pytorch.org/docs/master/compile/troubleshooting.html.
[rank0]:[2024-03-20 07:16:42,628] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (8)
[rank0]:[2024-03-20 07:16:42,628] torch._dynamo.convert_frame: [WARNING]    function: 'forward' (/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:46)
[rank0]:[2024-03-20 07:16:42,628] torch._dynamo.convert_frame: [WARNING]    last reason: tensor 'L['input']' requires_grad mismatch. expected requires_grad=1
[rank0]:[2024-03-20 07:16:42,628] torch._dynamo.convert_frame: [WARNING] To log all recompilation reasons, use TORCH_LOGS="recompiles".
[rank0]:[2024-03-20 07:16:42,628] torch._dynamo.convert_frame: [WARNING] To diagnose recompilation issues, see https://pytorch.org/docs/master/compile/troubleshooting.html.
03/20/2024 07:16:45 - INFO - accelerate.accelerator - The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.
Evaluation finished.
First iteration took: 7.44s
Average time after the first iteration: 77.50ms
Evaluation finished.
First iteration took: 6.44s
Average time after the first iteration: 119.29ms
Test Accuracy for backend inductor: {'accuracy': 0.8525, 'f1': 0.8952042628774423}
Test Accuracy for backend inductor: {'accuracy': 0.8525, 'f1': 0.8952042628774423}
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 687/687 [02:45<00:00,  4.16it/s]

Speed-up:

Dynamo          FP16 (train avg / eval avg)
z3 + none       188.64ms / 111.11ms
z3 + inductor   201.58ms / 119.29ms

Don't see any savings at all 😅.
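
As an aside, the cache_size_limit warnings in the log above mean dynamo stopped recompiling forward after hitting 8 cached variants. A minimal debugging workaround (a sketch only; 64 is an arbitrary value, the default printed above is 8):

import torch._dynamo

# Allow more recompiled variants per function; once the limit is hit,
# dynamo falls back to eager for new variants of that frame.
torch._dynamo.config.cache_size_limit = 64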

# dynamo itself still has some issues and can hit graph breaks or outright failures;
# use the variable below to compile only `forward` for testing.
# On the DeepSpeed side, `deepspeed.utils.z3_leaf_module.[un]set_z3_leaf_modules`
# addresses the similar need of letting users compile or skip specific modules.
"TORCHDYNAMO_DEBUG_FUNCTION": "forward",
@pacman100 (Contributor)

Why is this required? The above comment is unclear; a more detailed explanation in this PR, and a slightly more detailed code comment, would help clarify what needs to be done to get DeepSpeed + compile working.

@oraluben (Author) commented Mar 20, 2024

Dynamo still doesn't support every Python op, which can cause graph breaks and even failures during compilation. I didn't dive into the details of this failure; instead I used this whitelist to tell dynamo to compile only the forward function, which also breaks a module into several separately compiled functions. That is why you didn't see any improvement comparing inductor against no dynamo.

Based on my experience using dynamo with large models, users should specify which modules they want compiled as a single unit; that can be done with the DeepSpeed API mentioned above, with some modification to user code (see the sketch below). The tests in this PR focus on determining whether dynamo is enabled; I did not evaluate its performance.
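
For illustration, a minimal sketch of that approach, assuming an already-constructed HF Llama model bound to model (the API lives in deepspeed.utils; LlamaDecoderLayer stands in for whichever block a user wants compiled as a unit):

from deepspeed.utils import set_z3_leaf_modules
from transformers.models.llama.modeling_llama import LlamaDecoderLayer

# Mark every LlamaDecoderLayer as a ZeRO-3 "leaf": its parameters are fetched
# as a whole rather than through per-parameter hooks, so each layer's forward
# can be treated as a single compile unit.
set_z3_leaf_modules(model, [LlamaDecoderLayer])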

@oraluben (Author)

In our internal scenario, dynamo delivers a ~10% speedup when each LlamaDecoderLayer is compiled as one unit.
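
A hypothetical sketch of that per-layer setup with plain torch.compile (the model.model.layers path assumes a HF Llama model; attribute names vary by model class):

import torch

# Compile each decoder layer as its own unit; torch.compile on an nn.Module
# returns an OptimizedModule that can be swapped into the ModuleList in place.
for idx, layer in enumerate(model.model.layers):
    model.model.layers[idx] = torch.compile(layer, backend="inductor")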

@pacman100 (Contributor)

Without using the env variable TORCHDYNAMO_DEBUG_FUNCTION=forward, I get the following error:

File "/raid/sourab/transformers/src/transformers/models/bert/modeling_bert.py", line 286, in forward
        mixed_query_layer = self.query(hidden_states)result = forward_call(*args, **kwargs)

  File "/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
  File "/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
  File "/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 655, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state)
  File "/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 727, in _convert_frame
    return self._call_impl(*args, **kwargs)
  File "/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = inner_convert(frame, cache_entry, hooks, frame_state)
  File "/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 383, in _convert_frame_assert
    compiled_product = _compile(
  File "/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 665, in _compile
    result = forward_call(*args, **kwargs)
  File "/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 116, in forward
    raise InternalTorchDynamoError(str(e)).with_traceback(
  File "/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 646, in _compile
    return F.linear(input, self.weight, self.bias)
  File "/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 655, in catch_errors
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 244, in time_wrapper
    return callback(frame, cache_entry, hooks, frame_state)
  File "/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 727, in _convert_frame
    r = func(*args, **kwargs)
  File "/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 626, in compile_inner
    result = inner_convert(frame, cache_entry, hooks, frame_state)
  File "/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 383, in _convert_frame_assert
    check_fn = CheckFunctionManager(
  File "/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/torch/_dynamo/guards.py", line 1011, in __init__
    compiled_product = _compile(
  File "/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 665, in _compile
    raise InternalTorchDynamoError(str(e)).with_traceback(
    guard.create(builder)

  File "/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 646, in _compile
  File "/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/torch/_guards.py", line 246, in create
    return self.create_fn(builder, self)
  File "/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/torch/_dynamo/guards.py", line 448, in CONSTANT_MATCH
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 244, in time_wrapper
    val = self.get(guard.name)
  File "/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/torch/_dynamo/guards.py", line 258, in get
    r = func(*args, **kwargs)
  File "/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 626, in compile_inner
    return eval(name, self.scope, CLOSURE_VARS)
  File "<string>", line 1, in <module>
    check_fn = CheckFunctionManager(
  File "/raid/sourab/miniconda3/envs/hf/lib/python3.10/site-packages/torch/_dynamo/guards.py", line 1011, in __init__
torch._dynamo.exc.InternalTorchDynamoError: type object 'FunctionMeta' has no attribute 'forward'


You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True

cc @tohtana and @tjruwase in case you have idea about this and the steps to overcome this.

@tohtana commented Mar 27, 2024

Hi @oraluben, @pacman100, thank you for your report! Sorry for the late response.

We found that the error is caused by no_grad during evaluation. Currently we reuse the compiled model even after the grad mode has changed. I thought dynamo automatically recompiles a model when necessary, but it seems that is not always the case.

I will try to fix this by compiling again when the grad mode is changed.
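
For context, a minimal sketch of the pattern involved (a toy model, not the repro above; dynamo guards on the global grad mode, so flipping to no_grad for evaluation should trigger a recompile rather than reuse of the training-time graph):

import torch

model = torch.nn.Linear(8, 8)
compiled = torch.compile(model)

x = torch.randn(4, 8)
compiled(x)            # first compile happens with grad mode enabled

with torch.no_grad():  # grad mode changes: the guards fail here, and dynamo
    compiled(x)        # recompiles instead of reusing the earlier graph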

@tohtana commented Mar 28, 2024

Hi @oraluben, @pacman100,
I found that torch does recompile the model when the grad mode changes. The actual problems are the following two issues.

  1. Custom linear module
    DeepSpeed enables its custom linear module when Z3 is enabled. However, that module does not work with torch.compile, so we disable it when torch.compile is enabled: DeepSpeed disables it in Init(), where it checks whether compile is enabled. We expect enabled in the compile config to be a boolean, but auto is passed to Init() ('compile': {'enabled': 'auto', 'backend': 'auto'}), so DeepSpeed does not disable the custom function.
    It seems auto is set at

    config["compile"] = {"enabled": "auto", "backend": "auto"}

    Is this expected behavior? On the other hand, DeepSpeedEngine receives 'compile': {'enabled': True, 'backend': 'inductor'}. Can we pass the same to Init()?

  2. Z3 hook function
    Dynamo fails to compile one of the functions in the Z3 hooks. We can exclude that function from the compilation target, as in "Disable compile for Z3 hook function" (microsoft/DeepSpeed#5325); we have already excluded some other Z3 hook functions this way.

When I forcibly disabled the custom Linear function and the Z3 hook function, the above example worked.
Can you give us a suggestion for the first issue? If that works, we can merge microsoft/DeepSpeed#5325 for the second one.
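
A hypothetical sketch of the resolution being asked for, assuming the fix belongs wherever accelerate finalizes the config before calling zero.Init() (the function and parameter names are illustrative, not an existing API):

def resolve_compile_config(ds_config, dynamo_backend):
    # Replace the "auto" placeholders so that zero.Init() sees the same
    # concrete values that DeepSpeedEngine later receives.
    compile_cfg = ds_config.get("compile", {})
    if compile_cfg.get("enabled") == "auto":
        compile_cfg["enabled"] = dynamo_backend is not None
    if compile_cfg.get("backend") == "auto":
        compile_cfg["backend"] = (dynamo_backend or "inductor").lower()
    ds_config["compile"] = compile_cfg
    return ds_config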

@oraluben (Author)

> We expect enabled in the compile config is boolean but auto is passed to Init ('compile': {'enabled': 'auto', 'backend': 'auto'}).

That sounds like I'm initializing the config in the wrong place; can you give some advice on the proper way? @pacman100

Separately, I've submitted a patch to torch: pytorch/pytorch#124273. I think it's safe to land this PR once that patch goes into torch.

@tjruwase

@umchand, FYI

github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@oraluben (Author)

not stale, still working on that

github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions (bot) closed this Jun 30, 2024