
Deepspeed param check #1015

Merged · 5 commits · Feb 1, 2023
Conversation

@dhar174 (Contributor) commented Jan 26, 2023

In `set_module_tensor_to_device()`, adding a check on line 146 for DeepSpeed parameters in the `kwargs` object, and not passing them when present, resolved the error I was receiving about the ds parameters not being recognized by `torch.nn.Parameter.__new__()`. With my admittedly limited knowledge, it seemed to me that the kwargs do not need to be passed when using DeepSpeed with Accelerate, and this bears out: the model loaded fine with zero-3 CPU parameter and buffer offload on a single-GPU machine, and produced perfectly comprehensible inference outputs (slowly) on the GPU.

The error, in my case, occurred here when called from Accelerator's `dispatch_model()`.

Please let me know if my thinking on this is in any way wrong! This fix worked for me.
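The idea described above can be sketched with stand-in classes (illustrative mocks, not the real `torch.nn.Parameter` or accelerate code): if the old parameter's `__dict__` carries DeepSpeed zero-3 bookkeeping such as a `ds_tensor` entry, the kwargs are dropped instead of being forwarded to a constructor that would reject them.

```python
class Parameter:
    """Stand-in for torch.nn.Parameter: rejects unexpected keyword arguments."""

    def __init__(self, data, requires_grad=True):
        self.data = data
        self.requires_grad = requires_grad

    def to(self, device):
        self.device = device
        return self


def rebuild_param(old_param, new_value, device):
    """Rebuild old_param as a fresh parameter holding new_value on device.

    Mirrors the check proposed in this PR: if the old parameter's __dict__
    carries DeepSpeed zero-3 bookkeeping (a `ds_tensor` entry), do not
    forward the kwargs, since a plain Parameter constructor rejects them.
    """
    param_cls = type(old_param)
    kwargs = dict(old_param.__dict__)
    # The stand-in stores data/requires_grad/device in __dict__; the real
    # torch.nn.Parameter keeps only extra Python attributes there.
    for builtin in ("data", "requires_grad", "device"):
        kwargs.pop(builtin, None)
    if kwargs.get("ds_tensor") is not None:
        return param_cls(new_value, requires_grad=old_param.requires_grad).to(device)
    return param_cls(new_value, requires_grad=old_param.requires_grad, **kwargs).to(device)
```

Passing the DeepSpeed attribute straight through (`Parameter(data, ds_tensor=...)`) raises a `TypeError` in this mock, just as the real constructor complained in the reported error.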

- `transformers` version: 4.26.0
- Platform: Linux-5.15.83.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
- Python version: 3.10.6
- Huggingface_hub version: 0.11.1
- PyTorch version (GPU?): 1.13.1+cu117 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: Yes and no (zero-3 on a single machine)

@HuggingFaceDocBuilderDev commented Jan 26, 2023

The documentation is not available anymore as the PR was closed or merged.

@sgugger (Collaborator) commented Jan 26, 2023

Thanks for your PR!

@younesbelkada the kwargs added to support bnb and 8-bit loading seem to clash with DeepSpeed, so we should probably change the test to only pass them along when `param_cls` is one of the bnb classes, to be on the safe side?

@younesbelkada (Contributor) commented:

Thanks for the heads-up!
I tried to run the bnb slow tests with this patch and everything seems to work fine.
This is because in transformers we call `set_8bit_module_tensor_to_device` before calling `dispatch_model`. In `dispatch_model`, this line:

```python
set_module_tensor_to_device(module, name, self.execution_device)
```

is called recursively, so the condition `elif value is not None` is never met (no `value` is passed, so it stays `None`).

In any case I agree with @sgugger; we should add a safety check to be on the safe side.
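As a hedged illustration of the point above (names simplified; not the real accelerate signatures): the dispatch-time call supplies no `value`, so the branch that rebuilds the parameter, where the kwargs would be forwarded, is skipped.

```python
log = []


def set_module_tensor_to_device_sketch(module, name, device, value=None):
    # Simplified stand-in for the real function: the parameter-rebuilding
    # path (which would forward the stashed kwargs) is guarded by
    # `value is not None`, among other conditions.
    if value is not None:
        log.append(("rebuild", name))
    else:
        log.append(("move_only", name))


# dispatch_model-style usage passes only module, name, and device:
for name in ("weight", "bias"):
    set_module_tensor_to_device_sketch(object(), name, "cuda:0")
```

With `value` left at its default of `None`, only the move-only path runs for every parameter.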

@sgugger (Collaborator) left a review comment:

Noted. @dhar174, can you edit the test a bit, then? It should still fix the issue, since it's a stronger check.

```diff
@@ -143,7 +143,12 @@ def set_module_tensor_to_device(
     elif value is not None or torch.device(device) != module._parameters[tensor_name].device:
         param_cls = type(module._parameters[tensor_name])
         kwargs = module._parameters[tensor_name].__dict__
-        new_value = param_cls(new_value, requires_grad=old_value.requires_grad, **kwargs).to(device)
+        if kwargs.get("ds_tensor") is not None:
```
@sgugger (Collaborator) commented on this diff:

So let's make this check a bit stronger, e.g. test whether `param_cls.__name__` is the bnb 8-bit parameter class or not, since we only want to pass the kwargs for that class.

@dhar174 (Contributor, author) replied:

Yeah, no problem.

@younesbelkada (Contributor) commented:

FYI, the bnb parameter name is `Int8Params` ;)!

146-150: check for Int8 arguments; if found, send the kwargs as well as the value.
@dhar174 (Contributor, author) left a review comment:

Does a simple change like this look good?

```diff
@@ -143,7 +143,12 @@ def set_module_tensor_to_device(
     elif value is not None or torch.device(device) != module._parameters[tensor_name].device:
         param_cls = type(module._parameters[tensor_name])
         kwargs = module._parameters[tensor_name].__dict__
-        new_value = param_cls(new_value, requires_grad=old_value.requires_grad, **kwargs).to(device)
+        if param_cls.__name__ == "Int8Params":
```
@dhar174 (Contributor, author) commented:

Changed to test `param_cls.__name__` for the Int8 parameter class (`"Int8Params"`) and only pass the kwargs if it matches; otherwise the kwargs are not passed.

@sgugger (Collaborator) commented:

Perfect, thanks!

@sgugger (Collaborator) left a review:

Thanks for iterating! Can you just run `make style` on your branch to fix the formatting?


@dhar174 (Contributor, author) commented Feb 1, 2023

Apologies for that. I've run `make style`, and it changed 5 files. However, it still seems to fail the quality test. Did I do something wrong when using `make style`?

@muellerzr (Collaborator) commented:

@dhar174 when you installed black and flake8, did you use `pip install -e .[quality]`? There are specific pinned versions we use :)

@dhar174 (Contributor, author) commented Feb 1, 2023

Ah yes, that must be the issue.

@sgugger sgugger merged commit 57cbcab into huggingface:main Feb 1, 2023
@dhar174 dhar174 deleted the patch-1 branch February 1, 2023 16:19