Feature request
When fine-tuning GraniteMoE (granite-3.1-1b-a400m-instruct) with LoRA, the MoE experts are not added as trainable LoRA modules. This is because the current implementation stores the expert weights in a single nn.Parameter, and peft's LoRA does not support nn.Parameter, only module types such as nn.Linear.
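For context, here is a minimal sketch of how the problem shows up with peft. The checkpoint ID, the module path (`model.model.layers[0].block_sparse_moe.input_linear`), and the exact failure mode are assumptions based on my reading of modeling_granitemoe.py and peft's LoRA dispatch, not verified output:

```python
# Sketch only: model ID and module paths are assumptions, not verified output.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("ibm-granite/granite-3.1-1b-a400m-instruct")

# The expert container holds a single nn.Parameter and exposes no nn.Linear
# children, so there is nothing for LoRA to wrap.
experts = model.model.layers[0].block_sparse_moe.input_linear
print(type(experts).__name__)                    # GraniteMoeParallelExperts
print([n for n, _ in experts.named_children()])  # [] -> no submodules to target

# Explicitly targeting the expert modules is expected to fail, because peft's
# LoRA only knows how to wrap nn.Linear / nn.Embedding / nn.Conv* style modules.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    task_type="CAUSAL_LM",
    target_modules=["input_linear", "output_linear"],
)
peft_model = get_peft_model(model, config)  # expected: "Target module ... is not supported"
```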
transformers/models/granitemoe/modeling_granitemoe.py
import torch
import torch.nn as nn

class GraniteMoeParallelExperts(nn.Module):
    def __init__(self, num_experts: int, input_size: int, output_size: int) -> None:
        """
        Initialize the GraniteMoeParallelExperts module.
        The experts weights are stored in [num_experts, output_size, input_size] format, such that it's compatible with
        many MoE libraries, such as [Megablock](https://github.com/databricks/megablocks) and
        [ScatterMoE](https://github.com/shawntan/scattermoe), as well as the
        [MoE kernel](https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/fused_moe/fused_moe.py)
        used in vllm.

        Args:
            num_experts (int):
                Number of experts.
            input_size (int):
                Size of the input.
            output_size (int):
                Size of the output.
        """
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_experts, output_size, input_size))
        self.num_experts = num_experts
        self.input_size = input_size
        self.output_size = output_size
Would it be possible to change the nn.Parameter in this implementation to nn.Linear modules, as is done in other MoE models?
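For illustration only, below is a minimal sketch (not the transformers implementation, and not a finished proposal) of what an nn.Linear-based expert container could look like. It assumes the forward pass receives tokens already sorted by expert and splits them by expert_size, mirroring the current GraniteMoeParallelExperts.forward; with one nn.Linear per expert, peft's LoRA could target the experts directly:

```python
import torch
import torch.nn as nn


class ParallelExpertsLinear(nn.Module):
    """Hypothetical nn.Linear-based variant of GraniteMoeParallelExperts (sketch)."""

    def __init__(self, num_experts: int, input_size: int, output_size: int) -> None:
        super().__init__()
        # One bias-free nn.Linear per expert instead of a single
        # [num_experts, output_size, input_size] nn.Parameter.
        self.experts = nn.ModuleList(
            [nn.Linear(input_size, output_size, bias=False) for _ in range(num_experts)]
        )
        self.num_experts = num_experts
        self.input_size = input_size
        self.output_size = output_size

    def forward(self, inputs: torch.Tensor, expert_size) -> torch.Tensor:
        # `inputs` holds tokens already sorted by expert; `expert_size[i]` is the
        # number of rows routed to expert i (assumed to mirror the current forward).
        chunks = inputs.split(expert_size, dim=0)
        outputs = [self.experts[i](chunks[i]) for i in range(self.num_experts)]
        return torch.cat(outputs, dim=0)
```

The obvious trade-off is the one the docstring above points at: the batched parameter layout is what keeps the weights compatible with fused MoE kernels (Megablocks, ScatterMoE, vLLM), so a per-expert nn.Linear layout would likely need a conversion path or give up those kernels; the alternative would be for peft to learn how to adapt this parameter layout directly.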
Motivation
When I fine-tuned GraniteMoE, I found that the MoE experts were not being fine-tuned. After digging into it, I traced the cause to peft's LoRA not supporting nn.Parameter modules.
Your contribution