# Fine-tuning an LLM (Large Language Model) with LoRA

In this exercise, we will fine-tune a large language model (LLM). Specifically, we will fine-tune LLaMA, a language model pre-trained by Meta, on the Alpaca instruction dataset so that the model behaves like a chatbot.

We use the 7B version of LLaMA, which has 7 billion parameters. To make training a model of this size feasible, we train with [LoRA](https://arxiv.org/abs/2106.09685) and [INT8](https://arxiv.org/abs/2208.07339) quantization.

Training takes a long time and the hyperparameters have not been tuned, so don't worry if the results are mediocre. The goal of this exercise is hands-on practice with PyTorch Lightning and LLM training.
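
To get a feel for why INT8 matters at this scale, the following back-of-the-envelope sketch estimates the memory needed just to hold the weights at different precisions. It assumes exactly 7 billion parameters and ignores activations, gradients, and optimizer state, so treat the numbers as rough lower bounds rather than measurements.

```python
# Rough weight-memory estimate for a 7-billion-parameter model (weights only).
# Assumption: exactly 7e9 parameters; activations, gradients and optimizer
# state are ignored, so real usage during training is higher.
NUM_PARAMS = 7_000_000_000
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the memory needed to store the weights, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(NUM_PARAMS, dtype):.1f} GiB")
```

Halving the bytes per parameter halves the weight footprint, which is what lets a model of this size fit on a single GPU.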

## Prerequisites

### Install Requirements

In [1]:
!pip install -qU \
  accelerate \
  bitsandbytes \
  datasets \
  lightning \
  peft \
  sentencepiece \
  transformers \
  wandb


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0.1[0m[39;49m -> [0m[32;49m23.2.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


### Pre-download the LLaMA weights

Officially, you must fill out [this form](https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform?usp=send_form) and wait for Meta's approval before you can download the original LLaMA weights, and then convert them to the HuggingFace format with HuggingFace's conversion script. For teaching convenience, we instead download already-converted LLaMA weights from [here](https://huggingface.co/huggyllama/llama-7b).

In [1]:
import torch 
from torch.utils.data import DataLoader
import lightning as L
from transformers import PreTrainedTokenizer
from datasets import load_dataset
from huggingface_hub import snapshot_download

snapshot_download('huggyllama/llama-7b', local_dir='llama-7b', ignore_patterns='*safetensors*')


'/home/joeyliang/112新生訓練/Day5/llama-7b'

### Log in to Weights & Biases

1. Sign up for Weights & Biases at https://wandb.ai if you don't have an account.
2. Run the cell below and follow the steps to log in.
3. If the login succeeded, the second execution of the command displays your username.

    ```
    wandb: Currently logged in as: xxx. Use `wandb login --relogin` to force relogin
    ```

In [2]:
!wandb login
!wandb login

[34m[1mwandb[0m: Currently logged in as: [33mjoeyliang[0m. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Currently logged in as: [33mjoeyliang[0m. Use [1m`wandb login --relogin`[0m to force relogin


## Data Preparation

The two classes below implement the data-processing logic.

### `DataCollatorForSupervisedFineTuning`

This class is essentially the `collate_fn` that `DataLoader` expects. It applies the prompt template to each raw example, tokenizes the result, and converts the batch into the tensors the model needs.

In [3]:
class DataCollatorForSupervisedFineTuning:
  def __init__(self, tokenizer: PreTrainedTokenizer):
    self.tokenizer = tokenizer

    assert 'pad_token' in self.tokenizer.special_tokens_map
    assert self.tokenizer.padding_side == 'right'

    self.template = 'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n'
    self.template_wo_input = 'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:\n'

  def get_prompt(self, x: dict[str, str], with_output: bool):
    if not x['input']:
      prompt = self.template_wo_input.format_map(x)
    else:
      prompt = self.template.format_map(x)

    if with_output:
      prompt += x['output'] + self.tokenizer.eos_token

    return prompt

  def __call__(self, batch: list):
    batch_text = []
    batch_prompt_length = []
    for x in batch:
      prompt = self.get_prompt(x, with_output=False)
      prompt_length = self.tokenizer(prompt, return_length=True)['length']
      batch_prompt_length.append(prompt_length)
      batch_text.append(self.get_prompt(x, with_output=True))

    batch_encoding = self.tokenizer(batch_text, return_tensors='pt', padding=True)
    batch_labels = batch_encoding['input_ids'].masked_fill(batch_encoding['input_ids'] == self.tokenizer.pad_token_id, -100)
    for i, prompt_length in enumerate(batch_prompt_length):
      batch_labels[i, :prompt_length] = -100 # Mask out the prompt to only train on the output

    return {
      **batch_encoding,
      'labels': batch_labels,
    }
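
The masking idea in the collator can be illustrated without a real tokenizer. The sketch below is a simplified stand-in, not the collator itself: `toy_tokenize`, `PAD_ID`, and the shortened `TEMPLATE` are hypothetical names introduced only for illustration. It shows how prompt positions and padding positions both end up as `-100` in the labels, so only the response tokens contribute to the loss.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default
PAD_ID = 0           # hypothetical padding token ID

# Shortened stand-in for the Alpaca template (assumption, not the full prompt).
TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"


def toy_tokenize(text: str) -> list[int]:
    """Hypothetical tokenizer: one ID per whitespace-separated word (its length)."""
    return [len(word) for word in text.split()]


def build_example(instruction: str, output: str) -> tuple[list[int], list[int]]:
    """Return (input_ids, labels) with the prompt positions masked out."""
    prompt = TEMPLATE.format(instruction=instruction)
    prompt_length = len(toy_tokenize(prompt))
    input_ids = toy_tokenize(prompt + output)
    labels = [IGNORE_INDEX] * prompt_length + input_ids[prompt_length:]
    return input_ids, labels


def pad_batch(examples: list[tuple[list[int], list[int]]]) -> dict[str, list[list[int]]]:
    """Right-pad every example to the longest one in the batch."""
    max_len = max(len(input_ids) for input_ids, _ in examples)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for input_ids, labels in examples:
        n_pad = max_len - len(input_ids)
        batch["input_ids"].append(input_ids + [PAD_ID] * n_pad)
        batch["attention_mask"].append([1] * len(input_ids) + [0] * n_pad)
        batch["labels"].append(labels + [IGNORE_INDEX] * n_pad)  # padding is never a target
    return batch


batch = pad_batch([
    build_example("Say hi", "Hello there"),
    build_example("Say hi", "Hi"),
])
for key, rows in batch.items():
    print(key, rows)
```

A real subword tokenizer may split text differently at the prompt/response boundary, which is why the collator above measures the prompt length with the same tokenizer it uses for the full text.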

### `DataModuleForSupervisedFineTuning`

This class inherits from `LightningDataModule`, an API provided by PyTorch Lightning that was not covered in the presentation.

Its main purpose is to encapsulate all data-related logic: preprocessing, loading, and splitting. The class below is a simple demonstration of its usage; for detailed API information, see the [documentation](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.core.LightningDataModule.html#lightning.pytorch.core.LightningDataModule).

In [4]:
class DataModuleForSupervisedFineTuning(L.LightningDataModule):
  @property
  def dataloader_kwargs(self):
    return dict(
      batch_size=self.hparams.batch_size,
      num_workers=self.hparams.num_workers,
      pin_memory=self.hparams.pin_memory,
    )

  def __init__(
    self,
    tokenizer: PreTrainedTokenizer,
    data_path: str,
    batch_size: int = 1,
    num_workers: int = 1,
    pin_memory: bool = True,
  ):
    super().__init__()

    self.save_hyperparameters(ignore=['tokenizer'])

    self.tokenizer = tokenizer

  def setup(self, stage: str | None = None):
    self.dataset = load_dataset(self.hparams.data_path)['train']
    self.dataset = self.dataset.train_test_split(0.1, seed=42)
    self.dataset['val'] = self.dataset.pop('test')

  def train_dataloader(self):
    return DataLoader(
      self.dataset['train'],
      shuffle=True,
      collate_fn=DataCollatorForSupervisedFineTuning(self.tokenizer),
      **self.dataloader_kwargs
    )

  def val_dataloader(self):
    return DataLoader(
      self.dataset['val'],
      shuffle=False,
      collate_fn=DataCollatorForSupervisedFineTuning(self.tokenizer),
      **self.dataloader_kwargs
    )
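
The `setup` method holds out 10% of the data for validation with a fixed seed. As a rough illustration of what a seeded split does, here is a minimal pure-Python sketch. The function `train_val_split` is hypothetical and is not the algorithm used by the `datasets` library; it only demonstrates that a fixed seed makes the split reproducible across runs.

```python
import random


def train_val_split(examples: list, val_fraction: float = 0.1, seed: int = 42) -> dict[str, list]:
    """Shuffle a copy of `examples` with a fixed seed and hold out a validation part."""
    shuffled = list(examples)              # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # a private RNG keeps the split reproducible
    n_val = int(len(shuffled) * val_fraction)
    return {"train": shuffled[n_val:], "val": shuffled[:n_val]}


split = train_val_split(list(range(100)))
print(len(split["train"]), len(split["val"]))
```

Because the seed is fixed, resuming training in a new session sees the same validation examples, which keeps validation losses comparable between runs.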

## Write The Lightning Module

!! Modifications Required Here !!

Requirements:
1. Save hyperparameters.
2. Set the optimizer to `bitsandbytes.optim.AdamW8bit` and pass the hyperparameter `learning_rate` to the optimizer.
3. Define the training step logic and record the training loss as `Loss/Train`.
4. Define the validation step logic and record the validation loss as `Loss/Val`.

Hints:
- [`LightningModule`](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.core.LightningModule.html#lightning.pytorch.core.LightningModule)
- Inside the `LightningModule`, use `self.hparams` to access hyperparameters.
- Compute loss as follows:
    ```python3
    output = self.model(
        input_ids=batch['input_ids'],
        attention_mask=batch['attention_mask'],
        labels=batch['labels'],
        use_cache=False,
    )
    loss = output.loss
    ```

In [5]:
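import bitsandbytes as bnb
import lightning as L
from transformers import LlamaTokenizer, LlamaForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, get_peft_model_state_dict, set_peft_model_state_dict


class LightningModuleForSupervisedFineTuning(L.LightningModule):
  def __init__(
    self,
    model_path: str,
    lora_r: int,
    lora_alpha: int,
    lora_dropout: float,
    lora_target_modules: list[str],
    learning_rate: float
  ):
    super().__init__()

    # Requirement 1: store the constructor arguments in `self.hparams` (and in checkpoints)
    self.save_hyperparameters()

    self.tokenizer = LlamaTokenizer.from_pretrained(model_path, legacy=False, pad_token='<pad>')

    self.model = LlamaForCausalLM.from_pretrained(
      model_path,
      torch_dtype=torch.half,
      low_cpu_mem_usage=True,
      load_in_8bit=True,
      device_map={'': 0}
    )
    # The added `<pad>` token needs its own row in the embedding matrix
    self.model.resize_token_embeddings(len(self.tokenizer))
    self.model = prepare_model_for_kbit_training(self.model, use_gradient_checkpointing=True)
    self.model = get_peft_model(self.model, LoraConfig(
      task_type='CAUSAL_LM',
      r=lora_r,
      lora_alpha=lora_alpha,
      lora_dropout=lora_dropout,
      target_modules=lora_target_modules,
    ))

  def compute_loss(self, batch):
    output = self.model(
      input_ids=batch['input_ids'],
      attention_mask=batch['attention_mask'],
      labels=batch['labels'],
      use_cache=False,
    )
    return output.loss

  # Requirement 3: training step, logged as `Loss/Train`
  def training_step(self, batch, batch_idx):
    loss = self.compute_loss(batch)
    self.log('Loss/Train', loss, batch_size=batch['input_ids'].size(0))
    return loss

  # Requirement 4: validation step, logged as `Loss/Val`
  def validation_step(self, batch, batch_idx):
    loss = self.compute_loss(batch)
    self.log('Loss/Val', loss, batch_size=batch['input_ids'].size(0))
    return loss

  # Requirement 2: 8-bit AdamW over the trainable (LoRA) parameters only
  def configure_optimizers(self):
    trainable_params = [p for p in self.parameters() if p.requires_grad]
    return bnb.optim.AdamW8bit(trainable_params, lr=self.hparams.learning_rate)

  # Save and load only the LoRA weights instead of the full 7B model
  def state_dict(self, **kwargs):
    return get_peft_model_state_dict(self.model, self.model.state_dict(**kwargs))

  def load_state_dict(self, state_dict, strict: bool = True):
    return set_peft_model_state_dict(self.model, state_dict)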

## Training

### Construct Model

In [None]:
model = LightningModuleForSupervisedFineTuning(
    model_path='llama-7b',
    lora_r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    lora_target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],
    learning_rate=1e-4
)
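
To see how small the trainable part is with these settings, the arithmetic below counts the parameters LoRA adds for `r=16` on the four attention projections. It is a sketch under stated assumptions: LLaMA-7B is taken to have 32 decoder layers and a hidden size of 4096, with each of `q_proj`, `k_proj`, `v_proj`, and `o_proj` being a 4096 x 4096 linear layer. The helper names are hypothetical.

```python
# Assumed LLaMA-7B architecture: 32 decoder layers, hidden size 4096,
# and every targeted projection is a 4096 x 4096 linear layer.
NUM_LAYERS = 32
HIDDEN_SIZE = 4096


def lora_params_per_linear(r: int, in_features: int, out_features: int) -> int:
    """LoRA adds two matrices to one linear layer: A (r x in) and B (out x r)."""
    return r * in_features + out_features * r


def total_lora_params(r: int, num_target_modules: int) -> int:
    """Total LoRA parameters when every layer gets adapters on the target modules."""
    per_module = lora_params_per_linear(r, HIDDEN_SIZE, HIDDEN_SIZE)
    return NUM_LAYERS * num_target_modules * per_module


trainable = total_lora_params(r=16, num_target_modules=4)
print(f"{trainable:,} trainable LoRA parameters")
print(f"fraction of a 7e9-parameter model: {trainable / 7e9:.4%}")
```

Under these assumptions only a fraction of a percent of the weights receive gradients, which is why the optimizer state stays small even though the base model is large.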

### Construct Data Module

In [None]:
datamodule = DataModuleForSupervisedFineTuning(
    model.tokenizer,
    data_path='yahma/alpaca-cleaned',
    batch_size=1,
    num_workers=2,
)

### Set-up the Logger

!! Modification Required Here !!

Requirements:
- Use Wandb as the logger.
- Set which project to save this run to.
- Assign a name to this run.

Hint:
- [WandbLogger](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.loggers.wandb.html#module-lightning.pytorch.loggers.wandb)

In [None]:
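from lightning.pytorch.loggers import WandbLogger
logger = WandbLogger(project='llama-7b', name='llama-7b-lora-alpaca')  # the run name is arbitrary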

### Set-up Callbacks

!! Modification Required Here !!

Requirements:
1. Record the learning rate.
2. Automatically stop training when the validation loss stops decreasing.
3. Save one checkpoint per epoch and keep all of them.
4. Save one checkpoint every 500 steps, but keep only the latest.
5. Keep the one checkpoint with the lowest validation loss, and name the file `val_loss=xxx`, where `xxx` is the validation loss at that point.

Hints:
- https://lightning.ai/docs/pytorch/stable/api_references.html#callbacks
- Requirement 5 will require using `auto_insert_metric_name` and `filename`.
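
Requirement 2 relies on a patience rule: training stops once the monitored value has failed to improve for a given number of consecutive checks. The sketch below shows that rule in plain Python. `PatienceTracker` is a hypothetical class written for illustration, not Lightning's `EarlyStopping` implementation, and the loss values are made up.

```python
class PatienceTracker:
    """Minimal early-stopping rule for a metric that should decrease."""

    def __init__(self, patience: int, min_delta: float = 0.0):
        self.patience = patience      # checks without improvement tolerated before stopping
        self.min_delta = min_delta    # smallest decrease that counts as an improvement
        self.best = float("inf")
        self.bad_checks = 0

    def should_stop(self, value: float) -> bool:
        if value < self.best - self.min_delta:
            self.best = value         # improvement: remember it and reset the counter
            self.bad_checks = 0
        else:
            self.bad_checks += 1      # no improvement at this validation check
        return self.bad_checks >= self.patience


# Made-up validation losses, one per validation run.
losses = [1.20, 1.05, 1.07, 1.06, 1.08]
tracker = PatienceTracker(patience=3)
stopped_at = next((i for i, v in enumerate(losses) if tracker.should_stop(v)), None)
print(f"best={tracker.best}, stopped at check index {stopped_at}")
```

Note that patience counts validation checks, not epochs, so validating several times per epoch makes early stopping react sooner.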

In [None]:
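# Import from `lightning.pytorch`, the same package that provides `L.Trainer`
from lightning.pytorch.callbacks import EarlyStopping, LearningRateMonitor, ModelCheckpoint

checkpoint_dir = 'check_point'
callbacks = [
    LearningRateMonitor(logging_interval='epoch'),  # 1. record the learning rate
    # 2. stop when the validation loss (logged as `Loss/Val`) stops improving
    EarlyStopping(monitor='Loss/Val', patience=5, mode='min'),
    ModelCheckpoint(  # 3. one checkpoint per epoch, keep all of them
        dirpath=checkpoint_dir,
        filename='checkpoint_epoch_{epoch:03d}',
        auto_insert_metric_name=False,
        every_n_epochs=1,
        save_on_train_epoch_end=True,
        save_top_k=-1,
        verbose=True,
    ),
    ModelCheckpoint(  # 4. one checkpoint every 500 steps, keep only the latest
        dirpath=checkpoint_dir,
        filename='latest_{step}',
        every_n_train_steps=500,
        save_top_k=1,
    ),
    ModelCheckpoint(  # 5. keep the single best checkpoint, named `val_loss=xxx`
        dirpath=checkpoint_dir,
        filename='val_loss={Loss/Val:.4f}',
        auto_insert_metric_name=False,  # the metric name contains a slash, so write it by hand
        monitor='Loss/Val',
        mode='min',
        save_top_k=1,
    ),
]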

### Set-up the Trainer

!! Modification Required Here !!

Requirements:
1. Use FP16 mixed-precision training.
2. Use gradient clipping (any value).
3. Use gradient accumulation (any value).
4. Set the maximum number of training epochs (any value).
5. Set how many steps to run between validations (any value).

Hint:
- [`Trainer`](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.trainer.trainer.Trainer.html#lightning.pytorch.trainer.trainer.Trainer)
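
Two of these requirements are easier to reason about with a small example. The sketch below shows, in plain Python, what clipping by global norm does to a gradient and how gradient accumulation changes the effective batch size. Both functions are hypothetical helpers for illustration; the clipping is a simplified version of the idea, not the exact routine PyTorch or Lightning runs.

```python
import math


def clip_by_global_norm(grads: list[float], max_norm: float) -> list[float]:
    """Scale all gradients by one factor so their L2 norm is at most `max_norm`."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)            # already small enough, leave unchanged
    scale = max_norm / total_norm     # one shared factor preserves the direction
    return [g * scale for g in grads]


def effective_batch_size(batch_size: int, accumulate_grad_batches: int, num_devices: int = 1) -> int:
    """Number of examples that contribute to each optimizer step."""
    return batch_size * accumulate_grad_batches * num_devices


clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)  # original norm is 5.0
print(clipped, math.hypot(*clipped))
print(effective_batch_size(batch_size=1, accumulate_grad_batches=16))
```

With a per-device batch size of 1, as used in this notebook, accumulation is the main way to get a reasonably sized batch per optimizer step without using more GPU memory.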

In [None]:
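trainer = L.Trainer(
    precision='16-mixed', gradient_clip_val=1.0,  # FP16 mixed precision; clip the gradient norm
    accumulate_grad_batches=16, max_epochs=1,  # example values, adjust as needed
    val_check_interval=1000,  # run validation every 1000 training batches
    logger=logger,
    callbacks=callbacks,
)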

### Start Training

Remember to visit [W&B](https://wandb.ai) to observe the training process and try it out for yourself.

In [None]:
trainer.fit(model, datamodule)

## Test the trained model

Hint: You might want to restart the kernel to free the GPU memory

In [None]:
ckpt_path = '' # Choose a checkpoint file
model = LightningModuleForSupervisedFineTuning.load_from_checkpoint(ckpt_path)

In [None]:
from transformers import GenerationConfig

def generate(model: LightningModuleForSupervisedFineTuning, prompt: str):
    x = model.tokenizer(prompt, return_tensors='pt').to(model.device)
    l = x['input_ids'].size(1)
    x = model.model.generate(**x, generation_config=GenerationConfig(max_new_tokens=32))
    x = x[:, l:]
    x = model.tokenizer.batch_decode(x, skip_special_tokens=True, clean_up_tokenization_spaces=True)[0]
    return x

In [None]:
prompts = [
    'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nRearrange the following sentence to make the sentence more interesting.\n\n### Input:\nShe left the party early\n\n### Response:\n',
    'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nLet \n f(x) = {[ -x - 3 if x ≤ 1,; x/2 + 1 if x > 1. ].\nFind the sum of all values of x such that f(x) = 0.\n\n### Response:\n',
    'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nCompose a haiku poem about a summer day.\n\n### Response:\n',
    'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat methods can be used to improve the accuracy of machine learning models?\n\n### Response:\n',
    'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nFill in the blanks to complete the sentence.\n\n### Input:\nGlobal warming can be reversed by reducing ________ and __________.\n\n### Response:\n'
]

for p in prompts:
    print(p)
    print(generate(model, p) + '\n')
    print('=' * 100)