I have written a detailed blog post about this repo: https://shreyassk.substack.com/p/visualising-dora-weight-decomposed
This work is inspired by Sebastian Raschka's blog on DoRA - https://magazine.sebastianraschka.com/p/lora-and-dora-from-scratch
This repository contains the code to reproduce the Llama-2-7B results on the Cleaned Alpaca instruction-tuning task described in the research paper. Checkpoints of LoRA and DoRA are stored at intermediate training steps and later used to visualize the direction and magnitude updates of the query and value weight matrices.
The official implementation of DoRA can be found here, and DoRA is now fully supported by HuggingFace PEFT.
Research Paper - https://arxiv.org/abs/2402.09353
DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning, specifically employing LoRA for the directional updates to keep the number of trainable parameters small. By employing DoRA, the learning capacity and training stability of LoRA can be enhanced while avoiding any additional inference overhead. In the paper's evaluations, DoRA consistently outperforms LoRA.
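Concretely, the decomposition during fine-tuning can be sketched as below (paper notation: W_0 is the pre-trained weight, m the learnable magnitude vector, BA the low-rank LoRA update, and ||.||_c the column-wise vector norm):

```math
W' = m \cdot \frac{W_0 + BA}{\lVert W_0 + BA \rVert_c}, \qquad m \text{ initialized to } \lVert W_0 \rVert_c
```

Only m and the LoRA matrices B and A are trained; after training, the merged W' is a plain dense matrix, which is why there is no extra inference cost.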
While fine-tuning with DoRA using the same configuration as LoRA already achieves better results most of the time, reaching optimal performance still requires some hyperparameter adjustments relative to LoRA.
Start with a slightly lower learning rate than the one used for LoRA, and experiment with different LoRA dropout ratios.
You can also start with half the rank of the LoRA configuration, which often already yields comparable or even superior accuracy to LoRA.
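As a starting point, a PEFT configuration following these tips might look like the minimal sketch below. The rank, alpha, dropout, and target-module values are illustrative assumptions, not the exact settings used in this repo's scripts.

```python
# Minimal sketch: enabling DoRA through HuggingFace PEFT's LoraConfig.
# Hyperparameter values are illustrative, not this repo's exact settings.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

dora_config = LoraConfig(
    r=8,                                  # try half the rank you would use for LoRA
    lora_alpha=16,
    lora_dropout=0.05,                    # experiment with different dropout ratios
    target_modules=["q_proj", "v_proj"],  # query and value projections, as in this repo
    use_dora=True,                        # switch from plain LoRA to DoRA
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, dora_config)
model.print_trainable_parameters()
```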
HuggingFace PEFT and other dependencies

DoRA is now supported by the HuggingFace PEFT package. You can install the PEFT package and the remaining dependencies using:
pip install git+https://github.com/huggingface/peft.git -q
pip install -r requirements.txt -q
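To confirm that the installed PEFT build supports DoRA (the use_dora flag was added around PEFT 0.9.0), a quick sanity check like this can help:

```python
# Sanity check: verify that this PEFT install exposes the DoRA flag on LoraConfig.
import inspect
import peft
from peft import LoraConfig

print(peft.__version__)
print("use_dora" in inspect.signature(LoraConfig.__init__).parameters)  # should print True
```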
Clone the TRL repo:
git clone https://github.com/huggingface/trl.git
Copy these Python scripts into trl/examples/scripts:
cp cleaned-alpaca-llama-2-dora.py trl/examples/scripts
cp cleaned-alpaca-llama-2-lora.py trl/examples/scripts
Using the DeepSpeed ZeRO-1 config on 4x A10 GPUs (24 GB each) with LoRA and 4-bit quantization (run from inside the trl directory):
accelerate launch --config_file=examples/accelerate_configs/deepspeed_zero1.yaml examples/scripts/cleaned-alpaca-llama-2-lora.py
Using the DeepSpeed ZeRO-1 config on 4x A10 GPUs (24 GB each) with DoRA and 4-bit quantization (run from inside the trl directory):
accelerate launch --config_file=examples/accelerate_configs/deepspeed_zero1.yaml examples/scripts/cleaned-alpaca-llama-2-dora.py
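The intermediate checkpoints saved during these runs can then be used to track how the magnitude and direction of the query/value weights drift over training, as shown in the blog post. Below is a rough sketch of that analysis for a single weight matrix; the tensors W0 and Wt are assumed to be the pre-trained and fine-tuned q_proj (or v_proj) weights, e.g. obtained after merging an adapter checkpoint into the base model.

```python
# Sketch of the magnitude/direction analysis for one weight matrix.
# W0: pre-trained weight, Wt: weight at some checkpoint step (hypothetical inputs).
import torch

def magnitude_direction_change(W0: torch.Tensor, Wt: torch.Tensor):
    # Column-wise norms capture the magnitude component of each weight vector.
    m0 = W0.norm(dim=0)
    mt = Wt.norm(dim=0)
    delta_m = (mt - m0).abs().mean()

    # Cosine similarity between corresponding columns measures directional drift.
    cos = torch.nn.functional.cosine_similarity(W0, Wt, dim=0)
    delta_d = (1.0 - cos).mean()

    return delta_m.item(), delta_d.item()

# Hypothetical usage: compare a checkpoint at step t against the base weights.
# delta_m, delta_d = magnitude_direction_change(base_q_proj, step_t_q_proj)
```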