<a href="https://www.kaggle.com/code/rraydata/multi-lora-example?scriptVersionId=147764913" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Fine-Tuning with Llama 2, ASPEN

Today we'll explore fine-tuning the Llama 2 model available on Kaggle Models using Multi-lora.
- Multi-lora: [ASPEN: Efficient LLM Model Fine-tune and Inference via Multi-Lora Optimization](https://github.com/TUDB-Labs/multi-lora-fine-tune#experiment-results) - ASPEN is an open-source framework for fine-tuning Large Language Models (LLMs) with multiple LoRA/QLoRA adapters. Key features of ASPEN include:
  1. Efficient LoRA/QLoRA: ASPEN optimizes the fine-tuning process, significantly reducing GPU memory usage by sharing a single frozen base model across adapters.
  2. Multiple LoRA adapters: support for concurrent fine-tuning of multiple LoRA/QLoRA adapters.
- QLoRA: [Quantized Low Rank Adapters](https://arxiv.org/abs/2305.14314): This is a method for fine-tuning LLMs that keeps the base model quantized and frozen and trains only a small number of adapter parameters, which limits the cost of training. These small parameter sets can also be attached to the model efficiently, so you can fine-tune on many datasets and swap the resulting "adapters" in and out of your model when necessary.
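To make the "shared frozen base model plus swappable adapters" idea concrete, here is a minimal pure-Python sketch of the LoRA update `y = W x + (alpha / r) * B (A x)`. It is not ASPEN's implementation: the class `LoRAAdapter`, the helper names, and the tiny dimensions are all hypothetical and chosen only for illustration.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRAAdapter:
    """Hypothetical low-rank adapter: delta(x) = (alpha / r) * B @ (A @ x)."""

    def __init__(self, d_in, d_out, r, alpha, seed):
        rng = random.Random(seed)
        self.scale = alpha / r
        # A starts with small random values, B starts at zero,
        # so a fresh adapter leaves the base output unchanged.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def delta(self, x):
        return [self.scale * v for v in matvec(self.B, matvec(self.A, x))]


def forward(base_w, adapter, x):
    """Frozen base projection plus the adapter's low-rank correction."""
    return [b + d for b, d in zip(matvec(base_w, x), adapter.delta(x))]


rng = random.Random(0)
d = 6  # toy hidden size, far smaller than a real model
base_w = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]  # frozen

# Two adapters share the same base weights; only A and B differ per adapter.
adapters = {
    name: LoRAAdapter(d, d, r=2, alpha=4, seed=i)
    for i, name in enumerate(["lora_0", "lora_1"])
}

x = [1.0] * d
base_out = matvec(base_w, x)
outputs = {name: forward(base_w, a, x) for name, a in adapters.items()}

# Pretend a training step changed one entry of lora_0's B matrix.
adapters["lora_0"].B[0][0] = 1.0
tuned_out = forward(base_w, adapters["lora_0"], x)
other_out = forward(base_w, adapters["lora_1"], x)
```

Updating `lora_0` changes only its own output; `lora_1` and the base weights are untouched, which is what lets several adapters train against one copy of the model.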

## 1. Clone multi-lora repository

In [1]:
import os
os.chdir('/kaggle/working/')

In [2]:
!git clone https://github.com/TUDB-Labs/multi-lora-fine-tune.git

Cloning into 'multi-lora-fine-tune'...
remote: Enumerating objects: 615, done.
remote: Counting objects: 100% (302/302), done.
remote: Compressing objects: 100% (113/113), done.
remote: Total 615 (delta 197), reused 250 (delta 186), pack-reused 313
Receiving objects: 100% (615/615), 2.21 MiB | 27.98 MiB/s, done.
Resolving deltas: 100% (327/327), done.


## 2. Install dependencies

In [3]:
!pip install -r /kaggle/working/multi-lora-fine-tune/requirements.txt

Collecting torch==2.0.1 (from -r /kaggle/working/multi-lora-fine-tune/requirements.txt (line 1))
  Downloading torch-2.0.1-cp310-cp310-manylinux1_x86_64.whl (619.9 MB)
Collecting einops==0.6.1 (from -r /kaggle/working/multi-lora-fine-tune/requirements.txt (line 2))
  Downloading einops-0.6.1-py3-none-any.whl (42 kB)
Collecting accelerate==0.21.0 (from -r /kaggle/working/multi-lora-fine-tune/requirements.txt (line 3))
  Downloading accelerate-0.21.0-py3-none-any.whl (244 kB)
Collecting transformers==4.31.0 (from -r /kaggle/working/multi-lora-fine-tune/requirements.txt (line 4))
  Downloading transformers

## 3. Configure fine-tuning datasets and parameters

You can add multiple LoRA parameter sets and datasets.

ASPEN can be used on:

1) Domain-Specific Fine-Tuning: This involves adapting a single model to one domain using several different parameter settings.

2) Cross-Domain Fine-Tuning: This approach uses one base model to fine-tune multiple models, each targeting a different domain, with datasets drawn from the same or different domains.

The demo data and prompt below are for demonstration purposes only.

In [4]:
!cat /kaggle/working/multi-lora-fine-tune/data/data_demo.json

[
    {
        "instruction": "Instruction demo.",
        "input": "Input demo.",
        "output": "Output demo."
    },
    {
        "instruction": "Instruction demo.",
        "output": "Output demo."
    }
]
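Before training on your own data, a quick shape check can catch malformed records early. The sketch below assumes, based only on the demo file above, that `instruction` and `output` are required and `input` is optional; the function `validate_records` is a hypothetical helper, not part of ASPEN.

```python
import json


def validate_records(records):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for i, rec in enumerate(records):
        # Assumed required fields, inferred from the demo data.
        for field in ("instruction", "output"):
            if not isinstance(rec.get(field), str) or not rec[field].strip():
                problems.append(f"record {i}: missing or empty '{field}'")
        # 'input' is optional, but must be a string when present.
        if "input" in rec and not isinstance(rec["input"], str):
            problems.append(f"record {i}: 'input' must be a string")
    return problems


demo = json.loads("""
[
    {"instruction": "Instruction demo.", "input": "Input demo.", "output": "Output demo."},
    {"instruction": "Instruction demo.", "output": "Output demo."}
]
""")

problems = validate_records(demo)
print(problems or "all records look well-formed")
```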

In [5]:
!cat /kaggle/working/multi-lora-fine-tune/template/template_demo.json

{
    "description": "",
    "parameter": [
        "input",
        "output",
        "instruction"
    ],
    "prompt": "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Output:\n{output}\n",
    "prompt_no_input": "### Instruction:\n{instruction}\n\n### Output:\n{output}\n"
}
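The template has two variants, one for records with an `input` field and one for records without. A minimal sketch of how a record could be rendered into a training prompt follows; the selection rule and the `render` helper are assumptions for illustration, and ASPEN's own loader may differ in detail.

```python
# Prompt strings copied from template_demo.json above.
template = {
    "prompt": "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Output:\n{output}\n",
    "prompt_no_input": "### Instruction:\n{instruction}\n\n### Output:\n{output}\n",
}

# Records copied from data_demo.json above.
records = [
    {"instruction": "Instruction demo.", "input": "Input demo.", "output": "Output demo."},
    {"instruction": "Instruction demo.", "output": "Output demo."},
]


def render(record, template):
    """Pick the template variant by whether the record has a non-empty input."""
    key = "prompt" if record.get("input") else "prompt_no_input"
    return template[key].format(**record)


prompts = [render(record, template) for record in records]
for prompt in prompts:
    print(prompt)
    print("-" * 20)
```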

In [6]:
config_string = """
{
    "cutoff_len": 256,
    "group_by_length": false,
    "expand_right": true,
    "pad_token_id": -1,
    "save_step": 2000,
    "early_stop_test_step": 2000,
    "train_lora_candidate_num": 4,
    "train_lora_simultaneously_num": 2,
    "train_strategy": "optim",
    "lora": [
        {
            "name": "lora_0",
            "output": "lora_0",
            "optim": "adamw",
            "lr": 3e-4,
            "batch_size": 16,
            "micro_batch_size": 4,
            "test_batch_size": 64,
            "num_epochs": 3,
            "r": 8,
            "alpha": 16,
            "dropout": 0.05,
            "target_modules": {
                "q_proj": true,
                "k_proj": false,
                "v_proj": true,
                "o_proj": false,
                "w1_proj": false,
                "w2_proj": false,
                "w3_proj": false
            },
            "data": "/kaggle/working/multi-lora-fine-tune/data/data_demo.json",
            "test_data": "/kaggle/working/multi-lora-fine-tune/data/data_demo.json",
            "prompt": "/kaggle/working/multi-lora-fine-tune/template/template_demo.json"
        },
        {
            "name": "lora_1",
            "output": "lora_1",
            "optim": "adamw",
            "lr": 3e-4,
            "batch_size": 16,
            "micro_batch_size": 4,
            "test_batch_size": 64,
            "num_epochs": 3,
            "r": 8,
            "alpha": 16,
            "dropout": 0.05,
            "target_modules": {
                "q_proj": true,
                "k_proj": false,
                "v_proj": true,
                "o_proj": false,
                "w1_proj": false,
                "w2_proj": false,
                "w3_proj": false
            },
            "data": "/kaggle/working/multi-lora-fine-tune/data/data_demo.json",
            "test_data": "/kaggle/working/multi-lora-fine-tune/data/data_demo.json",
            "prompt": "/kaggle/working/multi-lora-fine-tune/template/template_demo.json"
        }
    ]
}
"""

with open("./config.json", "w") as f:
    f.write(config_string)
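A few quantities implied by each adapter's settings can be worked out by hand. The sketch below copies the relevant fields from `lora_0` and assumes two things not stated in the config: that `batch_size / micro_batch_size` is the number of gradient-accumulation steps (the common convention), and that the base model has Llama 2 7B's dimensions (hidden size 4096, 32 decoder layers, square attention projections).

```python
# Fields copied from the "lora_0" entry of the config above.
lora_cfg = {
    "batch_size": 16,
    "micro_batch_size": 4,
    "r": 8,
    "alpha": 16,
    "target_modules": {
        "q_proj": True, "k_proj": False, "v_proj": True, "o_proj": False,
        "w1_proj": False, "w2_proj": False, "w3_proj": False,
    },
}

# Assumed Llama 2 7B dimensions; adjust for other base models.
HIDDEN_SIZE = 4096
NUM_LAYERS = 32

enabled = [name for name, on in lora_cfg["target_modules"].items() if on]

# Assumed meaning: micro-batches accumulated before each optimizer step.
accum_steps = lora_cfg["batch_size"] // lora_cfg["micro_batch_size"]

# LoRA scales the low-rank update by alpha / r.
scaling = lora_cfg["alpha"] / lora_cfg["r"]

# Each adapted hidden x hidden projection adds A (r x hidden) and B (hidden x r).
# This formula only holds for the attention projections enabled here; the
# feed-forward projections (w1/w2/w3) have a different shape.
params_per_module = lora_cfg["r"] * (HIDDEN_SIZE + HIDDEN_SIZE)
trainable_params = params_per_module * len(enabled) * NUM_LAYERS

print(f"enabled modules:      {enabled}")
print(f"accumulation steps:   {accum_steps}")
print(f"LoRA scaling:         {scaling}")
print(f"trainable parameters: {trainable_params:,}")
```

Under these assumptions each adapter trains about 4.2 million parameters, a small fraction of the roughly 7 billion in the frozen base model.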

## 4. Add the base model path and config file path to start fine-tuning

Remember to check that the session is running on a GPU. Without one, the run stops with the CUDA error shown in the output below.
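One way to check for a GPU before launching the training script is to ask PyTorch directly. This is a small sketch: `cuda_available` is a hypothetical helper, and it only reports what the installed PyTorch build can see, which may differ from the exact check ASPEN performs.

```python
def cuda_available() -> bool:
    """Return True if PyTorch is installed and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        # No PyTorch at all, so there is certainly no usable CUDA backend.
        return False
    return torch.cuda.is_available()


if cuda_available():
    import torch
    print("GPU found:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device visible; enable a GPU accelerator for this session.")
```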

In [7]:
!python /kaggle/working/multi-lora-fine-tune/mlora.py \
  --base_model /kaggle/input/llama-2/pytorch/7b-hf/1 \
  --config /kaggle/working/config.json \
  --load_8bit


Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
CUDA SETUP: Loading binary /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
ASPEN requires NVIDIA CUDA computing capacity. Please check your PyTorch installation.


## 5. Download the fine-tuned adapters

After training finishes, two outputs (lora_0 and lora_1) appear in the current directory. These are the fine-tuned model adapters, and we can download them.
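Bundling each adapter into a single zip file can make the download easier. The sketch below assumes each adapter is saved as a directory named after its `output` value in the config; `archive_adapters` is a hypothetical helper, and any adapter that is not found is skipped rather than treated as an error.

```python
import shutil
from pathlib import Path


def archive_adapters(names, workdir="."):
    """Zip each adapter directory under workdir; return the archive paths."""
    archives = []
    for name in names:
        path = Path(workdir) / name
        if not path.is_dir():
            print(f"skipping {name}: no such directory in {workdir}")
            continue
        # Writes <workdir>/<name>.zip containing the directory's contents.
        archives.append(shutil.make_archive(str(path), "zip", root_dir=str(path)))
    return archives


archives = archive_adapters(["lora_0", "lora_1"])
print(archives)
```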