# DoRA: Weight-Decomposed Low-Rank Adaptation

DoRA (Weight-Decomposed Low-Rank Adaptation) decomposes each pre-trained weight matrix into two components, a magnitude and a direction, and fine-tunes both. The direction is updated with LoRA, which keeps the number of trainable parameters small, while the magnitude is trained directly as a separate vector. This improves on LoRA's learning capacity and training stability, and because the decomposed weight can be merged back into a single matrix after training, it adds no inference overhead. The DoRA authors report that it consistently outperforms LoRA when fine-tuning LLaMA, LLaVA, and VL-BART on downstream tasks such as commonsense reasoning, visual instruction tuning, and image/video-text understanding.
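
The decomposition can be written compactly. The symbols below follow the notation of the DoRA paper as I understand it and do not appear elsewhere in this notebook: $W_0$ is the frozen pre-trained weight, $B$ and $A$ are the trainable LoRA low-rank matrices, $m$ is the trainable magnitude vector, and $\lVert \cdot \rVert_c$ is the vector-wise norm taken over each column.

$$
W' = m \, \frac{W_0 + BA}{\lVert W_0 + BA \rVert_c}, \qquad m \text{ initialized to } \lVert W_0 \rVert_c
$$

With the usual LoRA initialization ($B = 0$), the fraction reduces to $W_0 / \lVert W_0 \rVert_c$, so $W' = W_0$ and fine-tuning starts from the unmodified pre-trained model.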

In [1]:
# Import necessary modules
import logging
import warnings
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, get_peft_model

logging.getLogger("transformers.modeling_utils").setLevel(logging.ERROR)

warnings.filterwarnings("ignore")

# Function to count total and trainable parameters
def count_parameters(model):
    total_params = sum(p.numel() for p in model.parameters())
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total_params, trainable_params

# Load a small pretrained model that can run on a local MacBook
model_name = 'prajjwal1/bert-tiny'  # A tiny BERT model suitable for demonstration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Count parameters in the base model
total_params, trainable_params = count_parameters(model)
print(f"Base Model Total Parameters: {total_params}")
print(f"Base Model Trainable Parameters: {trainable_params}\n")

# Apply LoRA to the model
lora_config = LoraConfig(
    r=4,  # Rank
    lora_alpha=32,
    target_modules=["query", "value"],
    lora_dropout=0.1,
    bias="none",
)

lora_model = get_peft_model(model, lora_config)

# Count parameters after applying LoRA
lora_total_params, lora_trainable_params = count_parameters(lora_model)
print(f"LoRA Model Total Parameters: {lora_total_params}")
print(f"LoRA Model Trainable Parameters: {lora_trainable_params}\n")

# Reload the base model: get_peft_model injects the adapters into `model` in place,
# so a fresh copy is needed before applying DoRA
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Apply DoRA to the model
dora_config = LoraConfig(
    r=4,  # Same rank as LoRA
    lora_alpha=32,
    target_modules=["query", "value"],
    lora_dropout=0.1,
    bias="none",
    use_dora=True,  # Enable DoRA
)

dora_model = get_peft_model(model, dora_config)

# Count parameters after applying DoRA
dora_total_params, dora_trainable_params = count_parameters(dora_model)
print(f"DoRA Model Total Parameters: {dora_total_params}")
print(f"DoRA Model Trainable Parameters: {dora_trainable_params}\n")

# Compare the number of trainable parameters.
# Note: both percentages use the base model's parameter count (total_params) as the denominator.
print(f"Difference in Trainable Parameters (DoRA - LoRA): {dora_trainable_params - lora_trainable_params}")
print(f"Percentage of Trainable Parameters in LoRA: {100 * lora_trainable_params / total_params:.4f}%")
print(f"Percentage of Trainable Parameters in DoRA: {100 * dora_trainable_params / total_params:.4f}%")


Base Model Total Parameters: 4386178
Base Model Trainable Parameters: 4386178

LoRA Model Total Parameters: 4390274
LoRA Model Trainable Parameters: 4096

DoRA Model Total Parameters: 4390786
DoRA Model Trainable Parameters: 4608

Difference in Trainable Parameters (DoRA - LoRA): 512
Percentage of Trainable Parameters in LoRA: 0.0934%
Percentage of Trainable Parameters in DoRA: 0.1051%
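
The counts above can be reproduced by hand. The sketch below is a back-of-the-envelope check and does not call PEFT. It assumes that `prajjwal1/bert-tiny` has a hidden size of 128 and 2 encoder layers, and that DoRA adds one magnitude value per output feature of each adapted layer. The helper names `lora_param_count` and `dora_param_count` are hypothetical and exist only for this illustration.

```python
def lora_param_count(d_in, d_out, r):
    """Parameters in one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


def dora_param_count(d_in, d_out, r):
    """One LoRA pair plus a magnitude vector with one entry per output
    feature (assumed layout of the DoRA magnitude parameter)."""
    return lora_param_count(d_in, d_out, r) + d_out


hidden_size = 128        # assumed hidden size of bert-tiny
num_layers = 2           # assumed number of encoder layers in bert-tiny
targets_per_layer = 2    # target_modules=["query", "value"]
rank = 4                 # r=4, as in both configs above

num_adapted = num_layers * targets_per_layer
lora_total = num_adapted * lora_param_count(hidden_size, hidden_size, rank)
dora_total = num_adapted * dora_param_count(hidden_size, hidden_size, rank)

print(f"LoRA trainable parameters: {lora_total}")
print(f"DoRA trainable parameters: {dora_total}")
print(f"Difference (DoRA - LoRA): {dora_total - lora_total}")
```

Under these assumptions, each of the 4 adapted projections contributes 1024 LoRA parameters, and DoRA adds a 128-entry magnitude vector to each, which accounts for the difference of 512 reported above.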
