
[QUESTION/HELP] ZERO3 weight modification after load #5326

Closed

xxtars opened this issue Mar 28, 2024 · 4 comments


xxtars commented Mar 28, 2024

Hello, I would like to ask how to solve the problem I encountered.

I'm training llama-vid, which provides a zero2 config. Due to VRAM limits I hit an OOM error even with zero2_offload, so I switched to the zero3.json provided by llava. With zero3, however, I ran into problems when loading the qformer. First, I load "bert-base-uncased" through transformers:

mm_model = BertLMHeadModelQF.from_pretrained(
    "bert-base-uncased", config=encoder_config
)

I'm not very familiar with DeepSpeed, and I'm not sure if zero3 handles this part of the loading process. Later, when loading the pretrained qformer, an error occurs:

self.vlm_att_projector.load_state_dict(get_w(att_projector_weights, 'vlm_att_projector'))

Error: bert.encoder.layer.0.attention.self.query.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([0]).

In short, the code first loads the weights with transformers' from_pretrained and then overwrites some parameters via torch's load_state_dict.
Do you know how to handle this under zero3? Any help would be greatly appreciated.
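
For reference, a quick way to confirm that the 0-sized tensors are ZeRO-3 placeholders is to inspect the parameters. This is a minimal sketch, assuming DeepSpeed attaches ds_id/ds_shape attributes to partitioned parameters; the loop below is illustrative and not part of the original code:

# Sketch: list parameters that ZeRO-3 has partitioned into placeholders.
# Assumes DeepSpeed's ds_id/ds_shape attributes; illustrative only.
for name, param in mm_model.named_parameters():
    if hasattr(param, "ds_id"):
        # the local shape is torch.Size([0]); ds_shape holds the full shape
        print(name, tuple(param.shape), tuple(param.ds_shape))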


mklf commented Mar 28, 2024

In DeepSpeed ZeRO-3, model parameters are partitioned at model initialization (Hugging Face's from_pretrained wraps the ZeRO-3 init process). The original parameters become placeholders that keep a reference to the partitioned tensor, which is why they show up as 0-sized.

For your question, you can use the helper function deepspeed.zero.GatheredParameters. Here is an example usage:

import torch
import torch.nn as nn
import deepspeed

model = BertLMHeadModelQF.from_pretrained(
    "bert-base-uncased", config=encoder_config
)

state_dict = torch.load(model_path, map_location="cpu")

def load(module: nn.Module, prefix=""):
    # because zero3 puts placeholders in model params, this context
    # manager gathers (unpartitions) the params of the current layer, then loads from
    # the state dict and then re-partitions them again
    with deepspeed.zero.GatheredParameters(list(module.parameters(recurse=False)), modifier_rank=0):
        if deepspeed.comm.get_rank() == 0:
            module._load_from_state_dict(state_dict, prefix)

    for name, child in module._modules.items():
        if child is not None:
            load(child, prefix + name + ".")

load(model, prefix="")
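
Note that modifier_rank=0 tells GatheredParameters that only rank 0 will modify the gathered (full) parameters; when the context exits, the values from that rank are re-partitioned and synchronized to the other ranks, which is why the copy from the state dict is guarded by the rank check.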


xxtars commented Mar 28, 2024

Thank you so much for your reply!

Your advice has been incredibly helpful.

@szbcasia

Hello, I ran into the same problem. May I ask whether this method fully solved it for you? I still get the original error after applying it.

@szbcasia


I get the following error when following your method:

TypeError: Module._load_from_state_dict() missing 5 required positional arguments: 'local_metadata', 'strict', 'missing_keys', 'unexpected_keys', and 'error_msgs'

How can I solve this?
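
For what it's worth, Module._load_from_state_dict in PyTorch takes local_metadata, strict, missing_keys, unexpected_keys, and error_msgs in addition to state_dict and prefix, so they need to be passed explicitly. A minimal sketch of the adjusted helper (the empty metadata dict and bookkeeping lists are plain defaults, not from the original reply):

def load(module: nn.Module, prefix=""):
    missing_keys, unexpected_keys, error_msgs = [], [], []
    with deepspeed.zero.GatheredParameters(list(module.parameters(recurse=False)), modifier_rank=0):
        if deepspeed.comm.get_rank() == 0:
            # pass the full argument list expected by _load_from_state_dict
            module._load_from_state_dict(
                state_dict, prefix, {}, True,
                missing_keys, unexpected_keys, error_msgs,
            )

    for name, child in module._modules.items():
        if child is not None:
            load(child, prefix + name + ".")

load(model, prefix="")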
