GraniteMoE’s implementation is not compatible with HF’s peft #2545

@lhkhiem28

Feature request

When fine-tuning GraniteMoE (granite-3.1-1b-a400m-instruct) with LoRA, the MoE experts are not added as trainable LoRA modules. This is because the current implementation stores the expert weights in a single nn.Parameter, and peft's LoRA cannot inject adapters into bare nn.Parameter weights, only into supported module types such as nn.Linear.

transformers/models/granitemoe/modeling_granitemoe.py

import torch
from torch import nn


class GraniteMoeParallelExperts(nn.Module):
    def __init__(self, num_experts: int, input_size: int, output_size: int) -> None:
        """
        Initialize the GraniteMoeParallelExperts module.
        The expert weights are stored in [num_experts, output_size, input_size] format, so that they are compatible
        with many MoE libraries, such as [MegaBlocks](https://github.com/databricks/megablocks) and
        [ScatterMoE](https://github.com/shawntan/scattermoe), as well as the
        [MoE kernel](https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/fused_moe/fused_moe.py)
        used in vLLM.
        Args:
            num_experts (int):
                Number of experts.
            input_size (int):
                Size of the input.
            output_size (int):
                Size of the output.
        """
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_experts, output_size, input_size))
        self.num_experts = num_experts
        self.input_size = input_size
        self.output_size = output_size
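
To make the failure mode concrete, here is a minimal sketch; the hub id "ibm-granite/granite-3.1-1b-a400m-instruct" and the input_linear/output_linear module names are assumptions based on the issue and the granitemoe modeling code. The attention projections are ordinary nn.Linear layers and get LoRA adapters, while the expert weights are never wrapped and stay frozen:

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Hub id assumed from the model name mentioned in the issue.
model = AutoModelForCausalLM.from_pretrained("ibm-granite/granite-3.1-1b-a400m-instruct")

# The attention projections are plain nn.Linear, so LoRA wraps them fine.
config = LoraConfig(target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
peft_model = get_peft_model(model, config)

# The expert weights live on GraniteMoeParallelExperts.weight, a bare
# nn.Parameter, so no adapter is injected and they remain frozen.
for name, param in peft_model.named_parameters():
    if "input_linear" in name or "output_linear" in name:
        print(name, param.requires_grad)  # prints False for every expert weight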

Would it be possible to change the nn.Parameter in this implementation to nn.Linear modules, as in other MoE models? One possible shape of such a change is sketched below.
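
A sketch of one possible replacement (not the maintainers' design), assuming the same forward contract as the original module: inputs holds tokens already sorted by expert, and expert_size[i] is the token count routed to expert i. Each expert becomes its own nn.Linear inside an nn.ModuleList, which peft's target_modules matching can see and wrap:

import torch
import torch.nn as nn


class GraniteMoeParallelExpertsLinear(nn.Module):
    def __init__(self, num_experts: int, input_size: int, output_size: int) -> None:
        super().__init__()
        # An nn.ModuleList of per-expert nn.Linear layers instead of one
        # stacked nn.Parameter; LoRA can then target the experts by name.
        self.experts = nn.ModuleList(
            nn.Linear(input_size, output_size, bias=False) for _ in range(num_experts)
        )
        self.num_experts = num_experts
        self.input_size = input_size
        self.output_size = output_size

    def forward(self, inputs: torch.Tensor, expert_size: list) -> torch.Tensor:
        # Split the pre-sorted batch into per-expert chunks, apply each
        # expert's linear layer, and concatenate the results back together.
        chunks = inputs.split(expert_size, dim=0)
        outputs = [expert(chunk) for expert, chunk in zip(self.experts, chunks)]
        return torch.cat(outputs, dim=0)

The tradeoff is that a per-expert nn.Linear layout gives up the stacked [num_experts, output_size, input_size] tensor that the docstring keeps for MegaBlocks/ScatterMoE/vLLM kernel compatibility, so existing checkpoints would need a weight conversion.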

Motivation

When I fine-tuned GraniteMoE, I found that the MoE experts were not being fine-tuned. After looking into it, I traced the cause to peft's LoRA not supporting nn.Parameter weights.

See also: #1272

Your contribution

cc: @ArthurZucker @mayank31398
