Feature request
When fine-tuning GraniteMoE (granite-3.1-1b-a400m-instruct) with LoRA, the MoE experts are not added as trainable LoRA modules. This is because the current implementation stores the expert weights in a single nn.Parameter, and peft's LoRA does not support nn.Parameter, only module types such as nn.Linear.
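For context, here is a minimal sketch of how the problem shows up with peft. The checkpoint ID, the module path (`model.model.layers[0].block_sparse_moe.input_linear`), and the exact failure mode are assumptions based on my reading of modeling_granitemoe.py and peft's LoRA dispatch, not verified output:

```python
# Sketch only: model ID and module paths are assumptions, not verified output.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("ibm-granite/granite-3.1-1b-a400m-instruct")

# The expert container holds a single nn.Parameter and exposes no nn.Linear
# children, so there is nothing for LoRA to wrap.
experts = model.model.layers[0].block_sparse_moe.input_linear
print(type(experts).__name__)                    # GraniteMoeParallelExperts
print([n for n, _ in experts.named_children()])  # [] -> no submodules to target

# Explicitly targeting the expert modules is expected to fail, because peft's
# LoRA only knows how to wrap nn.Linear / nn.Embedding / nn.Conv* style modules.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    task_type="CAUSAL_LM",
    target_modules=["input_linear", "output_linear"],
)
peft_model = get_peft_model(model, config)  # expected: "Target module ... is not supported"
```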
transformers/models/granitemoe/modeling_granitemoe.py
import torch
import torch.nn as nn

class GraniteMoeParallelExperts(nn.Module):
    def __init__(self, num_experts: int, input_size: int, output_size: int) -> None:
        """
        Initialize the GraniteMoeParallelExperts module.
        The experts weights are stored in [num_experts, output_size, input_size] format, such that it's compatible with
        many MoE libraries, such as [Megablock](https://github.com/databricks/megablocks) and
        [ScatterMoE](https://github.com/shawntan/scattermoe), as well as the
        [MoE kernel](https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/fused_moe/fused_moe.py)
        used in vllm.

        Args:
            num_experts (int):
                Number of experts.
            input_size (int):
                Size of the input.
            output_size (int):
                Size of the output.
        """
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_experts, output_size, input_size))
        self.num_experts = num_experts
        self.input_size = input_size
        self.output_size = output_size
Would it be possible to change the nn.Parameter in this implementation to nn.Linear modules, as is done in other MoE models?
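For illustration only, below is a minimal sketch (not the transformers implementation, and not a finished proposal) of what an nn.Linear-based expert container could look like. It assumes the forward pass receives tokens already sorted by expert and splits them by expert_size, mirroring the current GraniteMoeParallelExperts.forward; with one nn.Linear per expert, peft's LoRA could target the experts directly:

```python
import torch
import torch.nn as nn


class ParallelExpertsLinear(nn.Module):
    """Hypothetical nn.Linear-based variant of GraniteMoeParallelExperts (sketch)."""

    def __init__(self, num_experts: int, input_size: int, output_size: int) -> None:
        super().__init__()
        # One bias-free nn.Linear per expert instead of a single
        # [num_experts, output_size, input_size] nn.Parameter.
        self.experts = nn.ModuleList(
            [nn.Linear(input_size, output_size, bias=False) for _ in range(num_experts)]
        )
        self.num_experts = num_experts
        self.input_size = input_size
        self.output_size = output_size

    def forward(self, inputs: torch.Tensor, expert_size) -> torch.Tensor:
        # `inputs` holds tokens already sorted by expert; `expert_size[i]` is the
        # number of rows routed to expert i (assumed to mirror the current forward).
        chunks = inputs.split(expert_size, dim=0)
        outputs = [self.experts[i](chunks[i]) for i in range(self.num_experts)]
        return torch.cat(outputs, dim=0)
```

The obvious trade-off is the one the docstring above points at: the batched parameter layout is what keeps the weights compatible with fused MoE kernels (Megablocks, ScatterMoE, vLLM), so a per-expert nn.Linear layout would likely need a conversion path or give up those kernels; the alternative would be for peft to learn how to adapt this parameter layout directly.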
Motivation
When I fine-tuned GraniteMoE, I found that the MoE experts were not being fine-tuned. After digging into it, I traced the cause to peft's LoRA not supporting nn.Parameter modules.
Your contribution