
Deepspeed param check #1015

Merged · 5 commits · Feb 1, 2023
Conversation

@dhar174 (Contributor) commented Jan 26, 2023

In `set_module_tensor_to_device()`, adding a check on line 146 for DeepSpeed parameters in the `kwargs` object, and not passing them when present, resolved the error I was receiving about the ds parameters not being recognized by `torch.nn.Parameter.__new__()`. With my admittedly limited knowledge, it seemed to me that the kwargs do not need to be passed when using DeepSpeed with Accelerate, and this bears out: the model loaded fine with zero-3 CPU parameter and buffer offload on a single-GPU machine, and produced perfectly comprehensible inference outputs (slowly) on the GPU.

The error, in my case, occurred here when called from Accelerator's `dispatch_model()`.

Please let me know if my thinking on this is in any way wrong! This fix worked for me.
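The idea described above can be sketched with stand-in classes (illustrative mocks, not the real `torch.nn.Parameter` or accelerate code): if the old parameter's `__dict__` carries DeepSpeed zero-3 bookkeeping such as a `ds_tensor` entry, the kwargs are dropped instead of being forwarded to a constructor that would reject them.

```python
class Parameter:
    """Stand-in for torch.nn.Parameter: rejects unexpected keyword arguments."""

    def __init__(self, data, requires_grad=True):
        self.data = data
        self.requires_grad = requires_grad

    def to(self, device):
        self.device = device
        return self


def rebuild_param(old_param, new_value, device):
    """Rebuild old_param as a fresh parameter holding new_value on device.

    Mirrors the check proposed in this PR: if the old parameter's __dict__
    carries DeepSpeed zero-3 bookkeeping (a `ds_tensor` entry), do not
    forward the kwargs, since a plain Parameter constructor rejects them.
    """
    param_cls = type(old_param)
    kwargs = dict(old_param.__dict__)
    # The stand-in stores data/requires_grad/device in __dict__; the real
    # torch.nn.Parameter keeps only extra Python attributes there.
    for builtin in ("data", "requires_grad", "device"):
        kwargs.pop(builtin, None)
    if kwargs.get("ds_tensor") is not None:
        return param_cls(new_value, requires_grad=old_param.requires_grad).to(device)
    return param_cls(new_value, requires_grad=old_param.requires_grad, **kwargs).to(device)
```

Passing the DeepSpeed attribute straight through (`Parameter(data, ds_tensor=...)`) raises a `TypeError` in this mock, just as the real constructor complained in the reported error.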

- `transformers` version: 4.26.0
- Platform: Linux-5.15.83.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
- Python version: 3.10.6
- Huggingface_hub version: 0.11.1
- PyTorch version (GPU?): 1.13.1+cu117 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: Yes and no (zero-3 on a single machine)

@HuggingFaceDocBuilderDev commented Jan 26, 2023

The documentation is not available anymore as the PR was closed or merged.

@sgugger (Collaborator) commented Jan 26, 2023

Thanks for your PR!

@younesbelkada the kwargs added to support bnb and 8-bit loading seem to clash with DeepSpeed, so we should probably change the test to only pass them along when `param_cls` is one of the bnb classes, to be on the safe side?

@younesbelkada (Contributor) commented:

Thanks for the heads-up!
I tried to run the bnb slow tests with this patch and everything seems to work fine.
This is because in transformers we call `set_8bit_module_tensor_to_device` before calling `dispatch_model`. In `dispatch_model`, this line:

```python
set_module_tensor_to_device(module, name, self.execution_device)
```

is called recursively, so the condition `elif value is not None` is never met (no `value` is passed, so it stays `None`).

In any case I agree with @sgugger; we should add a safety check to be on the safe side.
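As a hedged illustration of the point above (names simplified; not the real accelerate signatures): the dispatch-time call supplies no `value`, so the branch that rebuilds the parameter, where the kwargs would be forwarded, is skipped.

```python
log = []


def set_module_tensor_to_device_sketch(module, name, device, value=None):
    # Simplified stand-in for the real function: the parameter-rebuilding
    # path (which would forward the stashed kwargs) is guarded by
    # `value is not None`, among other conditions.
    if value is not None:
        log.append(("rebuild", name))
    else:
        log.append(("move_only", name))


# dispatch_model-style usage passes only module, name, and device:
for name in ("weight", "bias"):
    set_module_tensor_to_device_sketch(object(), name, "cuda:0")
```

With `value` left at its default of `None`, only the move-only path runs for every parameter.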

@sgugger (Collaborator) left a review comment:

Noted. @dhar174, can you edit the test a bit, then? It should still fix the issue, since it's a stronger check.

```diff
@@ -143,7 +143,12 @@ def set_module_tensor_to_device(
     elif value is not None or torch.device(device) != module._parameters[tensor_name].device:
         param_cls = type(module._parameters[tensor_name])
         kwargs = module._parameters[tensor_name].__dict__
-        new_value = param_cls(new_value, requires_grad=old_value.requires_grad, **kwargs).to(device)
+        if kwargs.get("ds_tensor") is not None:
```
@sgugger (Collaborator) commented on this diff:

So let's make this check a bit stronger, e.g. test whether `param_cls.__name__` is the bnb 8-bit parameter class or not, since we only want to pass the kwargs for that class.

@dhar174 (Contributor, author) replied:

Yeah, no problem.

@younesbelkada (Contributor) commented:

FYI, the bnb parameter name is `Int8Params` ;)!

146-150: check for Int8 arguments; if found, send the kwargs as well as the value.
@dhar174 (Contributor, author) left a review comment:

Does a simple change like this look good?

```diff
@@ -143,7 +143,12 @@ def set_module_tensor_to_device(
     elif value is not None or torch.device(device) != module._parameters[tensor_name].device:
         param_cls = type(module._parameters[tensor_name])
         kwargs = module._parameters[tensor_name].__dict__
-        new_value = param_cls(new_value, requires_grad=old_value.requires_grad, **kwargs).to(device)
+        if param_cls.__name__ == "Int8Params":
```
@dhar174 (Contributor, author) commented:

Changed to test `param_cls.__name__` for the Int8 parameter class (`"Int8Params"`) and only pass the kwargs if it matches; otherwise the kwargs are not passed.

@sgugger (Collaborator) commented:

Perfect, thanks!

@sgugger (Collaborator) left a review:

Thanks for iterating! Can you just run `make style` on your branch to fix the formatting?


@dhar174 (Contributor, author) commented Feb 1, 2023

Apologies for that. I've run `make style`, and it changed 5 files. However, it still seems to fail the quality test. Did I do something wrong when using `make style`?

@muellerzr (Collaborator) commented:

@dhar174 when you installed black and flake8, did you use `pip install -e .[quality]`? There are specific pinned versions we use :)

@dhar174 (Contributor, author) commented Feb 1, 2023

Ah yes, that must be the issue.

@sgugger sgugger merged commit 57cbcab into huggingface:main Feb 1, 2023
@dhar174 dhar174 deleted the patch-1 branch February 1, 2023 16:19