[QUESTION/HELP] ZERO3 weight modification after load #5326
Comments
In DeepSpeed ZeRO-3, model parameters are partitioned at model initialization (i.e., inside Hugging Face's `from_pretrained`). For your question, you can use a helper function like this:

```python
import torch
import torch.nn as nn
import deepspeed

model = BertLMHeadModelQF.from_pretrained(
    "bert-base-uncased", config=encoder_config
)
state_dict = torch.load(model_path, map_location="cpu")


def load(module: nn.Module, prefix=""):
    # Because ZeRO-3 puts placeholders in the model params, this context
    # manager gathers (unpartitions) the params of the current layer,
    # loads them from the state dict, and then re-partitions them.
    with deepspeed.zero.GatheredParameters(
        list(module.parameters(recurse=False)), modifier_rank=0
    ):
        if deepspeed.comm.get_rank() == 0:
            # _load_from_state_dict takes the full positional signature:
            # (state_dict, prefix, local_metadata, strict,
            #  missing_keys, unexpected_keys, error_msgs)
            module._load_from_state_dict(
                state_dict, prefix, {}, True, [], [], []
            )
    for name, child in module._modules.items():
        if child is not None:
            load(child, prefix + name + ".")


load(model, prefix="")
```
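Stripped of the DeepSpeed-specific pieces, the per-module walk can be tried standalone. This is a minimal sketch in plain PyTorch (toy `nn.Sequential` models standing in for the BERT model in the thread) showing the recursive `_load_from_state_dict` pattern copying a checkpoint into a model one module at a time:

```python
import torch
import torch.nn as nn

# Toy model standing in for the Q-Former / BERT encoder.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# A second model whose state_dict plays the role of the checkpoint.
reference = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
state_dict = reference.state_dict()


def load(module: nn.Module, prefix=""):
    # _load_from_state_dict only touches this module's own (non-recursive)
    # parameters; children are handled by the explicit recursion below.
    module._load_from_state_dict(state_dict, prefix, {}, True, [], [], [])
    for name, child in module._modules.items():
        if child is not None:
            load(child, prefix + name + ".")


load(model)
print(torch.equal(model[0].weight, reference[0].weight))
```

In the DeepSpeed version, the only difference is that each `_load_from_state_dict` call is wrapped in `deepspeed.zero.GatheredParameters` so the partitioned parameters are materialized before the copy.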
Thank you so much for your reply! Your advice has been incredibly helpful, and you've done me a real favor.
Hello, I ran into the same problem as you. May I ask whether this method completely solved it for you? I still get the original error after applying it.
Following your method, I ran into the problem below. How can I solve it?
Hello, I would like to ask how to solve the problem I encountered.
I'm training llama-vid, which provides a zero2 config. However, due to VRAM limits, I encountered an OOM error despite using zero2_offload, so I switched to the zero3.json provided by llava. I then ran into problems when loading the qformer. First, I loaded "bert-base-uncased" through transformers:
I'm not very familiar with DeepSpeed, and I'm not sure whether ZeRO-3 handles this part of the loading process. Later, when loading the pretrained qformer, an error occurs:
Error:

```
bert.encoder.layer.0.attention.self.query.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([0]).
```
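For context on the error itself: `torch.Size([0])` is the empty placeholder that ZeRO-3 leaves in place of each partitioned parameter, so a plain `load_state_dict` against it fails with exactly this size mismatch. A minimal reproduction (toy layer, not the actual model from the thread):

```python
import torch
import torch.nn as nn

layer = nn.Linear(768, 768, bias=False)
checkpoint = {"weight": torch.randn(768, 768)}

# Simulate the ZeRO-3 placeholder: the local param is an empty tensor.
layer.weight = nn.Parameter(torch.empty(0))

err = None
try:
    layer.load_state_dict(checkpoint)
except RuntimeError as e:
    err = e

# The failure mirrors the one in the thread:
# "copying a param with shape torch.Size([768, 768]) from checkpoint,
#  the shape in current model is torch.Size([0])"
print("torch.Size([0])" in str(err))
```

This is why the gather-then-load pattern is needed: the parameters must be materialized to their full shape before any state-dict copy.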
Simply put, the code aims to modify some parameters again via torch's `load_state_dict` after loading the weights with transformers' `from_pretrained`.
Do you know how to handle this? Any help would be greatly appreciated.