Disable removing shared tensors by default #28630
Hi @imoneoi
1- Warn users if they are using DS to not save their model with --> in general we encourage users to use safetensors, so I would say option 2 might be the best solution here. Would you be happy to open a PR with one of these solutions? cc @amyeroberts @pacman100 @muellerzr, what do you think?
Hmmm, I think what @imoneoi is reporting is a different issue than what you're describing, @younesbelkada. We're definitely aiming for this to be frictionless, so the more insight we have into the code that fails, the better we'll be able to help. Thanks @muellerzr for the minimal reproducer on the other thread; I'm pasting it below:
cc @Narsil if you have the bandwidth to take a look, this looks like it's impacting quite a few deepspeed users. Thanks a lot 🙌
Temporary solution: set `safe_serialization=False`.
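As a sketch of why this workaround helps: with `safe_serialization=False`, `save_pretrained` falls back to `torch.save`, which pickles aliased storages without complaint, so tensor views survive a round trip. A minimal, hypothetical illustration (the tensor names are made up, not from the reproducer):

```python
import io

import torch

# Hypothetical flat parameter buffer; q/k/v are views into it
# (the same shape of aliasing as ZeRO flat buffers or fused QKV weights).
flat = torch.arange(6, dtype=torch.float32)
state_dict = {"q": flat[:2], "k": flat[2:4], "v": flat[4:]}

# torch.save pickles the shared storage as-is, no shared-tensor check.
buf = io.BytesIO()
torch.save(state_dict, buf)
buf.seek(0)
loaded = torch.load(buf)
print(loaded["q"].tolist())  # [0.0, 1.0]
print(loaded["v"].tolist())  # [4.0, 5.0]
```

This is exactly the check that safetensors performs (and `torch.save` skips), which is why flipping the flag sidesteps the error.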
I did look it up, and this snippet works for me with the latest revisions of accelerate, deepspeed, and transformers.
Hello,
Observations:
with config:
output:
Possible Solutions:
I think the reproducer from Zach needs fixes. With the change below, which runs the save only on the main process, it works:

```python
import torch
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin, HfDeepSpeedConfig
from transformers import AutoModelForCausalLM
from transformers.modeling_utils import unwrap_model

transformers_config = HfDeepSpeedConfig({
    "train_micro_batch_size_per_gpu": 2,
    "gradient_accumulation_steps": 2,
    "gradient_clipping": 1.0,
    "offload_optimizer_device": None,
    "offload_param_device": None,
    "zero3_init_flag": False,
    "zero_optimization": {
        "stage": 3,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
})
plugin = DeepSpeedPlugin(transformers_config)
accelerator = Accelerator(deepspeed_plugin=plugin)

model_name = "bert-base-cased"
model = AutoModelForCausalLM.from_pretrained(model_name)
opt = torch.optim.Adam(model.parameters(), lr=1e-5)
model, opt = accelerator._prepare_deepspeed(model, opt)
state_dict = accelerator.get_state_dict(model)

# Change: only save on the main process
if accelerator.is_main_process:
    model = unwrap_model(model)
    model.save_pretrained(
        "remove",
        state_dict=state_dict,
        safe_serialization=True,
    )
```
@pacman100 @younesbelkada Thanks for your observations! Should we consider disabling the removal of shared tensors by default?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
Who can help?
@younesbelkada @Narsil
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Minimal reproduction on DeepSpeed can be found at #27293, where disabling `safe_serialization` solves this issue.
Related (DeepSpeed): #27293
Expected behavior
Consider disabling the removal of shared tensors by default in https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L2409-L2452. This piece of code identifies shared tensors by their storage locations, but in many cases tensors are views of a larger tensor and therefore share the same location.
One example is when `q_proj`, `k_proj`, and `v_proj` are views of `qkv_proj`; another is DeepSpeed ZeRO, where all parameters are views of a single large flat tensor. We've observed failures in both cases. Besides, keeping shared tensors will not usually cause a large storage overhead, as common shared tensors (such as tied embeddings) take up only a small fraction of the total parameters.