[QUESTION/HELP] ZERO3 weight modification after load #5326
Comments
In DeepSpeed ZeRO-3, model parameters are partitioned at model initialization (i.e., inside Hugging Face's `from_pretrained`). For your question, you can use a helper function like this:

```python
import torch
import torch.nn as nn
import deepspeed

model = BertLMHeadModelQF.from_pretrained(
    "bert-base-uncased", config=encoder_config
)
state_dict = torch.load(model_path, map_location="cpu")


def load(module: nn.Module, prefix=""):
    # Because ZeRO-3 puts placeholders in the model params, this context
    # manager gathers (unpartitions) the params of the current layer,
    # loads them from the state dict, and then re-partitions them.
    with deepspeed.zero.GatheredParameters(
        list(module.parameters(recurse=False)), modifier_rank=0
    ):
        if deepspeed.comm.get_rank() == 0:
            # _load_from_state_dict takes the full positional signature:
            # (state_dict, prefix, local_metadata, strict,
            #  missing_keys, unexpected_keys, error_msgs)
            module._load_from_state_dict(
                state_dict, prefix, {}, True, [], [], []
            )
    for name, child in module._modules.items():
        if child is not None:
            load(child, prefix + name + ".")


load(model, prefix="")
```
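Stripped of the DeepSpeed-specific pieces, the per-module walk can be tried standalone. This is a minimal sketch in plain PyTorch (toy `nn.Sequential` models standing in for the BERT model in the thread) showing the recursive `_load_from_state_dict` pattern copying a checkpoint into a model one module at a time:

```python
import torch
import torch.nn as nn

# Toy model standing in for the Q-Former / BERT encoder.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# A second model whose state_dict plays the role of the checkpoint.
reference = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
state_dict = reference.state_dict()


def load(module: nn.Module, prefix=""):
    # _load_from_state_dict only touches this module's own (non-recursive)
    # parameters; children are handled by the explicit recursion below.
    module._load_from_state_dict(state_dict, prefix, {}, True, [], [], [])
    for name, child in module._modules.items():
        if child is not None:
            load(child, prefix + name + ".")


load(model)
print(torch.equal(model[0].weight, reference[0].weight))
```

In the DeepSpeed version, the only difference is that each `_load_from_state_dict` call is wrapped in `deepspeed.zero.GatheredParameters` so the partitioned parameters are materialized before the copy.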
Thank you so much for your reply! Your advice has been incredibly helpful, and you've done me a real favor.
Hello, I ran into the same problem as you. May I ask whether this method completely solved it for you? I still get the original error after applying it.
Following your method, I ran into the problem below. How can I solve it?
Hello, I would like to ask how to solve the problem I encountered.
I'm training llama-vid, which provides a zero2 config. However, due to VRAM limits, I encountered an OOM error despite using zero2_offload, so I switched to the zero3.json provided by llava. I then ran into problems when loading the qformer. First, I loaded "bert-base-uncased" through transformers:
I'm not very familiar with DeepSpeed, and I'm not sure whether ZeRO-3 handles this part of the loading process. Later, when loading the pretrained qformer, an error occurs:
Error:

```
bert.encoder.layer.0.attention.self.query.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([0]).
```
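For context on the error itself: `torch.Size([0])` is the empty placeholder that ZeRO-3 leaves in place of each partitioned parameter, so a plain `load_state_dict` against it fails with exactly this size mismatch. A minimal reproduction (toy layer, not the actual model from the thread):

```python
import torch
import torch.nn as nn

layer = nn.Linear(768, 768, bias=False)
checkpoint = {"weight": torch.randn(768, 768)}

# Simulate the ZeRO-3 placeholder: the local param is an empty tensor.
layer.weight = nn.Parameter(torch.empty(0))

err = None
try:
    layer.load_state_dict(checkpoint)
except RuntimeError as e:
    err = e

# The failure mirrors the one in the thread:
# "copying a param with shape torch.Size([768, 768]) from checkpoint,
#  the shape in current model is torch.Size([0])"
print("torch.Size([0])" in str(err))
```

This is why the gather-then-load pattern is needed: the parameters must be materialized to their full shape before any state-dict copy.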
Simply put, the code aims to modify some parameters again via torch's `load_state_dict` after loading the weights with transformers' `from_pretrained`.
Do you know how to handle this? Any help would be greatly appreciated.