## QLoRA Fine-tuning for Product Descriptions
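### Fine-tuning goal
- The prompt provides a product name and the product's category
- The model must respond with a description of that product
### Experiment goal
- Analyze how the main LoRA fine-tuning hyperparameters, r and target_modules, affect the results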

### Prepare Dataset
1. load dataset
2. field(product, category, description, text) -> field(instruction, description)
3. field(instruction, description) -> field(prompt, response)
4. field(prompt, response) -> field(text)
5. DataFrame -> Dataset
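Steps 2 to 4 can be sketched for a single record in plain Python. This is a minimal illustration, not the notebook's pandas code: the helper names (`to_instruction`, `to_prompt_response`, `build_text`) are hypothetical, while the instruction wording, the Alpaca-style template and the `### End` marker are copied from the cells below.

```python
# Alpaca-style template, copied from the notebook cell that defines `template`
TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "\n"
    "### Instruction:\n"
    "{}\n"
    "\n"
    "### Response:\n"
)


def to_instruction(product: str, category: str) -> str:
    """Step 2: (product, category) -> instruction."""
    return (
        "Create a detailed description for the following product: "
        + product
        + ", belonging to category: "
        + category
    )


def to_prompt_response(instruction: str, description: str) -> tuple[str, str]:
    """Step 3: wrap the instruction in the template and append the end marker to the response."""
    return TEMPLATE.format(instruction), description + "\n### End"


def build_text(product: str, category: str, description: str) -> str:
    """Step 4: concatenate prompt and response into the single `text` field."""
    prompt, response = to_prompt_response(to_instruction(product, category), description)
    return prompt + response


if __name__ == "__main__":
    print(build_text("V33", "Video Camera", "The V33 livestreaming video camera ..."))
```

The `### End` marker gives the model an explicit stopping point to learn, which the evaluation helper `extract_response_text` later relies on.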

#### 1. load dataset

In [1]:
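import os

# Point the datasets cache at the external drive before importing `datasets`,
# which reads HF_DATASETS_CACHE when it is first imported
os.environ["HF_DATASETS_CACHE"] = "/media/shin/T7/huggingface/datasets"

from datasets import load_dataset, Dataset, concatenate_datasets
import numpy as np
import pandas as pd
import random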



In [2]:
rd_ds = load_dataset("xiyuez/red-dot-design-award-product-description", cache_dir="/media/shin/T7/huggingface/datasets")
rd_df = pd.DataFrame(rd_ds['train'])
rd_df.head(2)

| | product | category | description | text |
|---|---|---|---|---|
| 0 | Biamp Rack Products | Digital Audio Processors | “High recognition value, uniform aesthetics an... | Product Name: Biamp Rack Products;\n\nProduct ... |
| 1 | V33 | Video Camera | The V33 livestreaming video camera ensures hig... | Product Name: V33;\n\nProduct Category: Video ... |


In [3]:
rd_df_sample = rd_df.sample(n=5000, random_state=42)
rd_df.shape, rd_df_sample.shape

((21183, 4), (5000, 4))

#### 2. field(product, category, description, text) -> field(instruction, description)

In [4]:
rd_df_sample['instruction'] = \
    'Create a detailed description for the following product: '\
    + rd_df_sample['product']\
    +', belonging to category: '\
    + rd_df_sample['category']

In [5]:
rd_df_sample = rd_df_sample[['instruction', 'description']]

In [6]:
rd_df_sample.head(2)

| | instruction | description |
|---|---|---|
| 18952 | Create a detailed description for the followin... | The CG8565 is a gaming PC offering space for h... |
| 12584 | Create a detailed description for the followin... | The iSHOXS BullBar ProX mount can be used to a... |


#### 3. field(instruction, description) -> field(prompt, response)

In [7]:
template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:\n"""

In [8]:
rd_df_sample['prompt'] = rd_df_sample["instruction"].apply(lambda x: template.format(x))
rd_df_sample.rename(columns={"description": "response"}, inplace=True)

In [9]:
rd_df_sample['response'] = rd_df_sample['response'] +  "\n### End"
rd_df_sample = rd_df_sample[['prompt', 'response']]

In [10]:
rd_df_sample.head(2)

| | prompt | response |
|---|---|---|
| 18952 | Below is an instruction that describes a task.... | The CG8565 is a gaming PC offering space for h... |
| 12584 | Below is an instruction that describes a task.... | The iSHOXS BullBar ProX mount can be used to a... |


#### 4. field(prompt, response) -> field(text)

In [11]:
rd_df_sample['text'] = rd_df_sample["prompt"]+rd_df_sample["response"]
rd_df_sample.drop(columns=['prompt', 'response'], inplace=True)
rd_df_sample.head(2)

| | text |
|---|---|
| 18952 | Below is an instruction that describes a task.... |
| 12584 | Below is an instruction that describes a task.... |


In [12]:
print(rd_df_sample['text'][0])

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Create a detailed description for the following product: Biamp Rack Products, belonging to category: Digital Audio Processors

### Response:
“High recognition value, uniform aesthetics and practical scalability – this has been impressively achieved with the Biamp brand language,” the jury statement said. The previous design of the digital audio processors was not only costly to produce, but also incompatible with newer system architectures. With the new concept, the company is making a visual statement that allows for differences in dimension, connectivity and application. Design elements include consistent branding, a soft curve on the top and bottom edges, and two red bars on the left and right margins of the products. The two-part black front panel can be used for various products.
### End


#### 5. DataFrame -> Dataset

In [13]:
from datasets import Dataset
dataset = Dataset.from_pandas(rd_df_sample).train_test_split(test_size=0.05, seed=42)

### 2. Testing model performance before fine-tuning

#### Load tokenizer & model

In [29]:
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

model_path = 'openlm-research/open_llama_3b_v2'
tokenizer = LlamaTokenizer.from_pretrained(model_path, cache_dir="/media/shin/T7/huggingface/tokenizers")
model = LlamaForCausalLM.from_pretrained(
    model_path, load_in_8bit=True, device_map='auto', cache_dir="/media/shin/T7/huggingface/models"
)

tokenizer_config.json: 100%|██████████| 593/593 [00:00<00:00, 1.46MB/s]
tokenizer.model: 100%|██████████| 512k/512k [00:00<00:00, 12.3MB/s]
special_tokens_map.json: 100%|██████████| 330/330 [00:00<00:00, 859kB/s]
config.json: 100%|██████████| 506/506 [00:00<00:00, 1.03MB/s]
pytorch_model.bin: 100%|██████████| 6.85G/6.85G [05:25<00:00, 21.1MB/s]  
generation_config.json: 100%|██████████| 137/137 [00:00<00:00, 308kB/s]


#### general prompt format

In [30]:
prompt = 'Q: Create a detailed description for the following product: Corelogic Smooth Mouse, belonging to category: Optical Mouse\nA:'
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)  # move inputs to the model's device

generation_output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(generation_output[0]))



<s>Q: Create a detailed description for the following product: Corelogic Smooth Mouse, belonging to category: Optical Mouse
A: The Corelogic Smooth Mouse is a wireless optical mouse that has a 1000 dpi resolution. It has a 2.4 GHz wireless connection and a 2.4 GHz wireless receiver. It has a 2.4 GHz wireless connection and a 2.4 GHz wireless receiver. It has a 2.4 GHz wireless connection and a 2.4 GHz wireless receiver. It has a 2.4 GHz wireless connection and a 2.4 GHz wireless receiver. It has a 2.4 GHz wireless connection and a 2.4 G


#### Alpaca prompt format

In [31]:
prompt= """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Create a detailed description for the following product: Corelogic Smooth Mouse, belonging to category: Optical Mouse

### Response:"""

In [33]:
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)  # move inputs to the model's device

generation_output = model.generate(
    input_ids=input_ids, max_new_tokens=128
)

print(tokenizer.decode(generation_output[0]))

<s>Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Create a detailed description for the following product: Corelogic Smooth Mouse, belonging to category: Optical Mouse

### Response:

(the rest of the decoded output consists only of blank lines)

### QLoRA fine-tuning
#1 experiment
- LoRA target_modules = ['q_proj','v_proj']
- LoRA r = 8

#### 1. Lora Config

In [1]:
from peft import LoraConfig

target_modules = ['q_proj','v_proj']

lora_config = LoraConfig(
    r=8,
    lora_alpha=8,
    lora_dropout=0.05,
    bias="none",
    target_modules = target_modules,
    task_type="CAUSAL_LM",
)
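`r` and `lora_alpha` enter the LoRA update as h = W0·x + (lora_alpha / r)·B·A·x, where A is r×d_in, B is d_out×r, and only A and B are trained. A minimal pure-Python sketch of one adapted linear layer, assuming PEFT's default scaling `lora_alpha / r` (no rsLoRA) and ignoring `lora_dropout`; `matvec` and `lora_forward` are illustrative names, not PEFT's API. PEFT initializes B to zero by default, so training starts from the unchanged base model.

```python
def matvec(M: list[list[float]], x: list[float]) -> list[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(x, W0, A, B, r: int, lora_alpha: float) -> list[float]:
    """h = W0 x + (lora_alpha / r) * B (A x); W0 is frozen, only A and B are trained."""
    scaling = lora_alpha / r
    base = matvec(W0, x)              # frozen pretrained projection
    delta = matvec(B, matvec(A, x))   # low-rank update through an r-dimensional bottleneck
    return [b + scaling * d for b, d in zip(base, delta)]


if __name__ == "__main__":
    W0 = [[1.0, 0.0], [0.0, 1.0]]     # toy 2x2 "pretrained" weight
    A = [[1.0, 1.0]]                  # r=1, d_in=2
    B_init = [[0.0], [0.0]]           # zero-initialised B -> output equals the base layer
    print(lora_forward([1.0, 2.0], W0, A, B_init, r=1, lora_alpha=1))  # [1.0, 2.0]
    B = [[1.0], [0.0]]
    print(lora_forward([1.0, 2.0], W0, A, B, r=1, lora_alpha=1))       # [4.0, 2.0]
```

With this notebook's `lora_alpha=8`, the scale factor is 8/8 = 1.0 for `r=8` and 8/16 = 0.5 for `r=16`.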



#### 2. Training Arguments

In [26]:
from transformers import TrainingArguments
training_args = TrainingArguments(
    output_dir="/media/shin/T7/model_ckpt/qlora_r8",
    save_strategy="epoch",
    evaluation_strategy="epoch",
    num_train_epochs = 3.0,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    optim="adamw_hf",
    learning_rate=1e-5,
    fp16=True,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="linear",
    report_to="wandb"
)

#### 3. Quantization

In [2]:
import torch
from transformers import BitsAndBytesConfig
nf4_config = BitsAndBytesConfig(
  load_in_4bit=True,
  bnb_4bit_quant_type="nf4",
  bnb_4bit_use_double_quant=True,
  bnb_4bit_compute_dtype=torch.bfloat16
)
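A back-of-the-envelope estimate of the weight memory this configuration saves. Assumptions, taken from the QLoRA paper rather than from this notebook: NF4 stores 4 bits per weight plus one FP32 absmax constant per block of 64 weights (0.5 extra bits per weight); with `bnb_4bit_use_double_quant=True` those constants are themselves quantized to 8 bits, with one FP32 constant per 256 of them, cutting the overhead to about 0.127 bits per weight. `N_BASE_PARAMS` is the "all params" count printed by `print_trainable_parameters()` minus the LoRA parameters. In practice some layers (e.g. embeddings and `lm_head`) are not quantized, so the real footprint is somewhat higher.

```python
N_BASE_PARAMS = 3_426_473_600  # printed "all params" minus the LoRA params

BLOCK = 64    # weights per first-level quantization block (assumed QLoRA default)
BLOCK2 = 256  # first-level constants per second-level block under double quantization


def bits_per_weight(double_quant: bool) -> float:
    """4-bit NF4 code plus the amortised cost of the per-block absmax constants."""
    if double_quant:
        # 8-bit first-level constants, plus one FP32 second-level constant per 256 of them
        overhead = 8 / BLOCK + 32 / (BLOCK * BLOCK2)
    else:
        overhead = 32 / BLOCK  # one FP32 absmax per block
    return 4 + overhead


def weight_gb(n_params: int, bits: float) -> float:
    """Weight storage in decimal gigabytes (10**9 bytes)."""
    return n_params * bits / 8 / 1e9


if __name__ == "__main__":
    print(f"fp16:             {weight_gb(N_BASE_PARAMS, 16):.2f} GB")
    print(f"nf4:              {weight_gb(N_BASE_PARAMS, bits_per_weight(False)):.2f} GB")
    print(f"nf4 + double q.:  {weight_gb(N_BASE_PARAMS, bits_per_weight(True)):.2f} GB")
```

The fp16 estimate (about 6.85 GB) lines up with the 6.85G `pytorch_model.bin` download shown earlier, assuming that checkpoint is stored in fp16; the NF4 estimates come out at roughly 1.93 GB without and 1.77 GB with double quantization.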

#### 4. Load quantized model & tokenizer

In [3]:
from transformers import LlamaTokenizer, LlamaForCausalLM

model_path = 'openlm-research/open_llama_3b_v2'
tokenizer = LlamaTokenizer.from_pretrained(model_path, cache_dir="/media/shin/T7/huggingface/tokenizers")
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
model = LlamaForCausalLM.from_pretrained(
    model_path, device_map='auto', quantization_config=nf4_config, cache_dir="/media/shin/T7/huggingface/models"
)

You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


#### 5. Convert the model to a LoRA model

In [4]:
from peft import get_peft_model
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

trainable params: 2,662,400 || all params: 3,429,136,000 || trainable%: 0.07764054852300988
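The trainable-parameter count above can be reproduced by hand: a LoRA adapter on a `d_in -> d_out` linear layer adds `r*d_in` parameters (matrix A) plus `r*d_out` (matrix B). The layer shapes below are assumptions taken from the OpenLLaMA-3B configuration (hidden size 3200, MLP size 8640, 26 decoder layers), and `BASE_PARAMS` is derived from the printed totals. With these numbers the sketch reproduces both 2,662,400 (r=8, q/v) and the 25,425,920 printed later for the r=16 all-linear run.

```python
HIDDEN = 3200        # hidden_size of open_llama_3b_v2 (assumed from its config)
INTERMEDIATE = 8640  # MLP intermediate_size (assumed)
N_LAYERS = 26        # num_hidden_layers (assumed)
BASE_PARAMS = 3_426_473_600  # printed "all params" minus the LoRA params

# (d_in, d_out) of each projection inside one decoder layer
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(r: int, target_modules: list[str]) -> int:
    """Parameters added by LoRA: A is r x d_in and B is d_out x r for every targeted layer."""
    per_layer = sum(r * (SHAPES[m][0] + SHAPES[m][1]) for m in target_modules)
    return per_layer * N_LAYERS


def trainable_percent(r: int, target_modules: list[str]) -> float:
    """Trainable share of all parameters, computed like print_trainable_parameters()."""
    n = lora_params(r, target_modules)
    return 100 * n / (BASE_PARAMS + n)


if __name__ == "__main__":
    attention = ["q_proj", "v_proj"]
    all_linear = list(SHAPES)
    for r in (8, 16):
        for name, mods in (("attention", attention), ("all linear", all_linear)):
            print(f"r={r:2d} {name:10s} {lora_params(r, mods):>11,d}  {trainable_percent(r, mods):.2f}%")
```

Because every term scales with `r`, doubling the rank exactly doubles the LoRA parameter count, while adding the MLP projections matters more since their 8640-wide dimension dominates.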


#### 6. trainer

In [27]:
from trl import SFTTrainer
trainer = SFTTrainer(
    model,
    train_dataset=dataset['train'],
    eval_dataset = dataset['test'],
    dataset_text_field="text",
    max_seq_length=256,
    args=training_args,
)

Map: 100%|██████████| 4750/4750 [00:00<00:00, 25042.07 examples/s]
Map: 100%|██████████| 250/250 [00:00<00:00, 20441.67 examples/s]


In [28]:
import wandb

wandb.init(entity="sinjy1203", project="qlora_finetuning")
trainer.train()

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: sinjy1203. Use `wandb login --relogin` to force relogin

| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | No log | 2.071775 |
| 2 | 2.293400 | 1.94988 |
| 3 | 2.293400 | 1.927552 |


TrainOutput(global_step=891, training_loss=2.108344800544508, metrics={'train_runtime': 967.612, 'train_samples_per_second': 14.727, 'train_steps_per_second': 0.921, 'total_flos': 4.7632898009088e+16, 'train_loss': 2.108344800544508, 'epoch': 3.0})
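The step count and the `No log` entry follow from the training arguments. A sketch of the arithmetic, assuming a single GPU, the dataloader keeping its last partial batch (the default), the Trainer deriving optimizer steps per epoch by integer division by `gradient_accumulation_steps`, and the default `logging_steps=500` (it is not set in the TrainingArguments above):

```python
import math

N_TRAIN = 4750       # rows in dataset['train'] (see the Map progress bar)
BATCH_SIZE = 4       # per_device_train_batch_size
GRAD_ACCUM = 4       # gradient_accumulation_steps
EPOCHS = 3           # num_train_epochs
LOGGING_STEPS = 500  # assumed Trainer default

batches_per_epoch = math.ceil(N_TRAIN / BATCH_SIZE)           # 1188 dataloader batches
steps_per_epoch = batches_per_epoch // GRAD_ACCUM             # 297 optimizer steps
total_steps = steps_per_epoch * EPOCHS                        # 891, matching global_step
first_logged_epoch = LOGGING_STEPS // steps_per_epoch + 1     # epoch in which step 500 falls

print(steps_per_epoch, total_steps, first_logged_epoch)       # 297 891 2
```

So the only training-loss log happens at step 500, inside epoch 2; with no further log before step 891, epochs 2 and 3 show the same value (2.2934), which is also `train/loss` in the run summary. The `train_loss` in `TrainOutput` (2.108) is instead the average over all 891 steps, which is why the two numbers differ.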

In [29]:
wandb.finish()

Run history:

| metric | history |
|---|---|
| eval/loss | █▂▁ |
| eval/runtime | █▁▁ |
| eval/samples_per_second | ▁██ |
| eval/steps_per_second | ▁██ |
| train/epoch | ▁▃▅██ |
| train/global_step | ▁▃▅██ |
| train/learning_rate | ▁ |
| train/loss | ▁ |
| train/total_flos | ▁ |
| train/train_loss | ▁ |

Run summary:

| metric | value |
|---|---|
| eval/loss | 1.92755 |
| eval/runtime | 9.1437 |
| eval/samples_per_second | 27.341 |
| eval/steps_per_second | 3.5 |
| train/epoch | 3.0 |
| train/global_step | 891.0 |
| train/learning_rate | 0.0 |
| train/loss | 2.2934 |
| train/total_flos | 4.7632898009088e+16 |
| train/train_loss | 2.10834 |


### Evaluation

#### 1. Load model from checkpoint

In [2]:
from transformers import LlamaTokenizer, LlamaForCausalLM

model_path = 'openlm-research/open_llama_3b_v2'
tokenizer = LlamaTokenizer.from_pretrained(model_path, cache_dir="/media/shin/T7/huggingface/tokenizers")
tokenizer.add_special_tokens({'pad_token': '[PAD]'})

model = LlamaForCausalLM.from_pretrained(
    model_path, load_in_8bit=True, device_map='auto', cache_dir="/media/shin/T7/huggingface/models"
)

You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


In [3]:
peft_model_id = "/media/shin/T7/model_ckpt/qlora_r8/checkpoint-891"

In [4]:
from peft import PeftModel
peft_model = PeftModel.from_pretrained(model, peft_model_id)

#### 2. test

In [5]:
test_strings = ["Create a detailed description for the following product: Corelogic Smooth Mouse, belonging to category: Optical Mouse",
"Create a detailed description for the following product: Hoover Lightspeed, belonging to category: Cordless Vacuum Cleaner",
"Create a detailed description for the following product: Flattronic Cinematron, belonging to category: High Definition Flatscreen TV"]

In [6]:
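predictions = []
for test in test_strings:
  # NOTE: the two-space indentation inside this triple-quoted string becomes part of the prompt,
  # so it does not exactly match the (unindented) training template
  prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

  ### Instruction:
  {}

  ### Response:""".format(test)
  input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to('cuda')

  # PeftModel.from_pretrained injects the LoRA layers into `model` in place;
  # generating through peft_model makes it explicit that the adapter is used
  generation_output = peft_model.generate(
      input_ids=input_ids, max_new_tokens=156
  )
  predictions.append(tokenizer.decode(generation_output[0]))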
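Because the triple-quoted prompt in the loop above is indented, every line after the first starts with two spaces (visible in the decoded output printed later), and it stops at `### Response:` without the trailing newline the training `template` had. A minimal sketch of a prompt builder that reproduces the training-time prefix exactly, assuming the same template as in the data-preparation cells; `build_prompt` is an illustrative helper, not part of the original notebook.

```python
# Prompt part of the training template from the data-preparation cells
# (no indentation, trailing newline after "### Response:")
TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "\n"
    "### Instruction:\n"
    "{}\n"
    "\n"
    "### Response:\n"
)


def build_prompt(instruction: str) -> str:
    """Format an instruction exactly as the training prompts were formatted."""
    return TEMPLATE.format(instruction)


if __name__ == "__main__":
    p = build_prompt("Create a detailed description for the following product: "
                     "Hoover Lightspeed, belonging to category: Cordless Vacuum Cleaner")
    # No line carries the stray two-space indentation of the in-loop string
    assert not any(line.startswith(" ") for line in p.splitlines())
    print(p)
```

Inside the loop, `prompt = build_prompt(test)` would replace the indented string; the outputs shown in this notebook were generated with the indented version.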

In [8]:
def extract_response_text(input_string):
    start_marker = '### Response:'
    end_marker = '###'
    
    start_index = input_string.find(start_marker)
    if start_index == -1:
        return None
    
    start_index += len(start_marker)
    
    end_index = input_string.find(end_marker, start_index)
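    if end_index == -1:
        # no end marker: return everything after the response marker, stripped like the other branch
        return input_string[start_index:].strip()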
    
    return input_string[start_index:end_index].strip()

In [9]:
for i in range(3): 
  pred = predictions[i]
  text = test_strings[i]
  print(text+'\n')
  print(extract_response_text(pred))
  print('--------')

Create a detailed description for the following product: Corelogic Smooth Mouse, belonging to category: Optical Mouse

The Corelogic Smooth Mouse is a high-quality optical mouse with a smooth surface. The mouse is equipped with a 1000 DPI sensor and a 1000 Hz polling rate. The mouse is available in black and white.
--------
Create a detailed description for the following product: Hoover Lightspeed, belonging to category: Cordless Vacuum Cleaner

The Hoover Lightspeed is a cordless vacuum cleaner that is equipped with a lithium-ion battery. The battery is charged via a USB cable. The vacuum cleaner is equipped with a 2-in-1 brush and a motorized brush. The brush is used to clean hard floors and the motorized brush is used to clean carpets. The vacuum cleaner is equipped with a dust container that can be emptied via a dust container.
--------
Create a detailed description for the following product: Flattronic Cinematron, belonging to category: High Definition Flatscreen TV

The Flattroni

### QLoRA fine-tuning
#2 experiment
- LoRA target_modules = ['q_proj','k_proj','v_proj','o_proj','gate_proj','down_proj','up_proj','lm_head']
- LoRA r = 16

#3 experiment
- LoRA target_modules = ['q_proj','k_proj','v_proj','o_proj','gate_proj','down_proj','up_proj','lm_head']
- LoRA r = 8

(`lm_head` was later removed from target_modules because of an FP16 gradient error; see Trouble shooting)

#4 experiment
- LoRA target_modules = ['q_proj','v_proj']
- LoRA r = 16

In [1]:
from peft import LoraConfig

# target_modules = ['q_proj','k_proj','v_proj','o_proj','gate_proj','down_proj','up_proj','lm_head']
target_modules = ['q_proj','k_proj','v_proj','o_proj','gate_proj','down_proj','up_proj']
# target_modules = ['q_proj','v_proj']

lora_config = LoraConfig(
    r=16,
    # r=8,
    lora_alpha=8,
    lora_dropout=0.05,
    bias="none",
    target_modules = target_modules,
    task_type="CAUSAL_LM",
)



In [2]:
from transformers import TrainingArguments
training_args = TrainingArguments(
    # output_dir="/media/shin/T7/model_ckpt/qlora_r16",
    output_dir="/media/shin/T7/model_ckpt/qlora_r16_attention",
    save_strategy="epoch",
    evaluation_strategy="epoch",
    num_train_epochs = 3.0,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    optim="adamw_hf",
    learning_rate=1e-5,
    fp16=True,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="linear",
    report_to="wandb"
)

In [3]:
import torch
from transformers import BitsAndBytesConfig
nf4_config = BitsAndBytesConfig(
  load_in_4bit=True,
  bnb_4bit_quant_type="nf4",
  bnb_4bit_use_double_quant=True,
  bnb_4bit_compute_dtype=torch.bfloat16
)

In [4]:
from transformers import LlamaTokenizer, LlamaForCausalLM

model_path = 'openlm-research/open_llama_3b_v2'
tokenizer = LlamaTokenizer.from_pretrained(model_path, cache_dir="/media/shin/T7/huggingface/tokenizers")
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
model = LlamaForCausalLM.from_pretrained(
    model_path, device_map='auto', quantization_config=nf4_config, cache_dir="/media/shin/T7/huggingface/models"
)

You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


In [5]:
from peft import get_peft_model
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

trainable params: 25,425,920 || all params: 3,451,899,520 || trainable%: 0.7365776394325638


In [23]:
from trl import SFTTrainer
trainer = SFTTrainer(
    model,
    train_dataset=dataset['train'],
    eval_dataset = dataset['test'],
    dataset_text_field="text",
    max_seq_length=256,
    args=training_args,
)

Map: 100%|██████████| 4750/4750 [00:00<00:00, 17280.47 examples/s]
Map: 100%|██████████| 250/250 [00:00<00:00, 19454.82 examples/s]


In [24]:
import wandb

wandb.init(entity="sinjy1203", project="qlora_finetuning")
trainer.train()

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: sinjy1203. Use `wandb login --relogin` to force relogin

| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | No log | 2.073625 |
| 2 | 2.296200 | 1.952909 |
| 3 | 2.296200 | 1.930579 |


TrainOutput(global_step=891, training_loss=2.1122952342434766, metrics={'train_runtime': 978.8368, 'train_samples_per_second': 14.558, 'train_steps_per_second': 0.91, 'total_flos': 4.76710188060672e+16, 'train_loss': 2.1122952342434766, 'epoch': 3.0})

In [25]:
wandb.finish()

Run history:

| metric | history |
|---|---|
| eval/loss | █▂▁ |
| eval/runtime | ▄▁█ |
| eval/samples_per_second | ▅█▁ |
| eval/steps_per_second | ▆█▁ |
| train/epoch | ▁▃▅██ |
| train/global_step | ▁▃▅██ |
| train/grad_norm | ▁ |
| train/learning_rate | ▁ |
| train/loss | ▁ |
| train/total_flos | ▁ |

Run summary:

| metric | value |
|---|---|
| eval/loss | 1.93058 |
| eval/runtime | 9.2612 |
| eval/samples_per_second | 26.994 |
| eval/steps_per_second | 3.455 |
| train/epoch | 3.0 |
| train/global_step | 891.0 |
| train/grad_norm | 0.28933 |
| train/learning_rate | 0.0 |
| train/loss | 2.2962 |
| train/total_flos | 4.76710188060672e+16 |


In [1]:
from transformers import LlamaTokenizer, LlamaForCausalLM

model_path = 'openlm-research/open_llama_3b_v2'
tokenizer = LlamaTokenizer.from_pretrained(model_path, cache_dir="/media/shin/T7/huggingface/tokenizers")
tokenizer.add_special_tokens({'pad_token': '[PAD]'})

model = LlamaForCausalLM.from_pretrained(
    model_path, load_in_8bit=True, device_map='auto', cache_dir="/media/shin/T7/huggingface/models"
)

You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.


In [2]:
peft_model_id = "/media/shin/T7/model_ckpt/qlora_r8_linear/checkpoint-891"

In [3]:
from peft import PeftModel
peft_model = PeftModel.from_pretrained(model, peft_model_id)

In [4]:
test_strings = ["Create a detailed description for the following product: Corelogic Smooth Mouse, belonging to category: Optical Mouse",
"Create a detailed description for the following product: Hoover Lightspeed, belonging to category: Cordless Vacuum Cleaner",
"Create a detailed description for the following product: Flattronic Cinematron, belonging to category: High Definition Flatscreen TV"]

In [5]:
predictions = []
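for test in test_strings:
  # NOTE: the two-space indentation inside this triple-quoted string becomes part of the prompt,
  # so it does not exactly match the (unindented) training template
  prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

  ### Instruction:
  {}

  ### Response:""".format(test)
  input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to('cuda')

  # PeftModel.from_pretrained injects the LoRA layers into `model` in place;
  # generating through peft_model makes it explicit that the adapter is used
  generation_output = peft_model.generate(
      input_ids=input_ids, max_new_tokens=156
  )
  predictions.append(tokenizer.decode(generation_output[0]))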

In [7]:
print(predictions[0])

<s>Below is an instruction that describes a task. Write a response that appropriately completes the request.

  ### Instruction:
  Create a detailed description for the following product: Corelogic Smooth Mouse, belonging to category: Optical Mouse

  ### Response:
  The Corelogic Smooth Mouse is a wireless mouse with a smooth surface that is easy to clean. The mouse is equipped with a 2.4 GHz wireless connection and a USB receiver. The mouse is available in three different colours: black, white and red.
  ### End

  ### Input:
  “Create a detailed description for the following product: Corelogic Smooth Mouse, belonging to category: Optical Mouse”

### Output:
  “The Corelogic Smooth Mouse is a wireless mouse with a smooth surface that is easy to clean. The mouse is equipped with a 2.4 GHz wireless connection and a USB receiver. The mouse is available in three different colours: black, white and red.”




In [6]:
def extract_response_text(input_string):
    start_marker = '### Response:'
    end_marker = '###'
    
    start_index = input_string.find(start_marker)
    if start_index == -1:
        return None
    
    start_index += len(start_marker)
    
    end_index = input_string.find(end_marker, start_index)
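    if end_index == -1:
        # no end marker: return everything after the response marker, stripped like the other branch
        return input_string[start_index:].strip()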
    
    return input_string[start_index:end_index].strip()

In [7]:
for i in range(3): 
  pred = predictions[i]
  text = test_strings[i]
  print(text+'\n')
  print(extract_response_text(pred))
  print('--------')

Create a detailed description for the following product: Corelogic Smooth Mouse, belonging to category: Optical Mouse

The Corelogic Smooth Mouse is a wireless mouse with a smooth surface that is easy to clean. The mouse is equipped with a 2.4 GHz wireless connection and a USB receiver. The mouse is equipped with a scroll wheel and a button for left and right clicks. The mouse is available in black and white.
--------
Create a detailed description for the following product: Hoover Lightspeed, belonging to category: Cordless Vacuum Cleaner

The Hoover Lightspeed is a cordless vacuum cleaner that is equipped with a lithium-ion battery. The battery is charged via a USB-C connection. The vacuum cleaner is equipped with a 2-in-1 motorised brush and a motorised brush bar. The brush bar is designed to clean hard-to-reach areas. The vacuum cleaner is equipped with a 360-degree swivel steering system. The vacuum cleaner is equipped with a dust container that can be emptied directly into the bin

### Experiment results
|experiment|trainable params (%)|train runtime (s)|train loss|eval loss|
|------|---|---|---|---|
|r=8, attention (q_proj, v_proj)|0.08|967|2.108|1.928|
|r=16, attention (q_proj, v_proj)|0.16|978|2.112|1.931|
|r=8, all linear (except lm_head)|0.36|1081|1.939|1.867|
|r=16, all linear (except lm_head)|0.73|1083|1.942|1.869|
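The conclusions rest on small differences, so it helps to put the table's numbers side by side. A sketch that computes, for each rank, the runtime increase and eval-loss change from attention-only to all-linear targets, plus the approximate perplexity implied by each eval loss (exp of the mean token cross-entropy the Trainer reports as `eval_loss`); the values are copied from the table above and `compare` is an illustrative helper.

```python
import math

# (train runtime s, train loss, eval loss), copied from the results table
RESULTS = {
    ("r=8", "attention"): (967, 2.108, 1.928),
    ("r=16", "attention"): (978, 2.112, 1.931),
    ("r=8", "all linear"): (1081, 1.939, 1.867),
    ("r=16", "all linear"): (1083, 1.942, 1.869),
}


def compare(rank: str) -> tuple[float, float, float, float]:
    """Runtime increase (%), eval-loss change, and perplexities for attention vs all-linear."""
    t_att, _, eval_att = RESULTS[(rank, "attention")]
    t_lin, _, eval_lin = RESULTS[(rank, "all linear")]
    slowdown_pct = 100 * (t_lin - t_att) / t_att
    return slowdown_pct, eval_lin - eval_att, math.exp(eval_att), math.exp(eval_lin)


if __name__ == "__main__":
    for rank in ("r=8", "r=16"):
        s, d, p_att, p_lin = compare(rank)
        print(f"{rank}: runtime +{s:.1f}%, eval loss {d:+.3f}, perplexity {p_att:.2f} -> {p_lin:.2f}")
```

For both ranks, targeting all linear layers costs roughly 11 to 12% more training time and lowers the eval loss by about 0.06, while changing r moves the eval loss by only about 0.003.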

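### Conclusion
- LoRA rank (r)
    - Going from r=8 to r=16 barely changes the training time.
    - The loss shows no strong dependence on r; if anything it moves the other way, with r=16 giving a slightly higher loss than r=8.
- LoRA target modules
    - Training time grows when more modules are targeted (967 s to 1081 s for r=8), which is still an acceptable cost, possibly because the model is small.
    - The loss improves meaningfully: targeting all linear layers lowers both the train and eval loss.
- LoRA fine-tuning with r=8 on all linear layers gave the best results.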

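### Trouble shooting
- Fine-tuning with target_modules = ['q_proj','k_proj','v_proj','o_proj','gate_proj','down_proj','up_proj', 'lm_head'] raised an error
    - `Attempting to unscale FP16 gradients`
    - This error appears when mixed-precision training is enabled to speed up training and reduce memory use (TrainingArguments `fp16=True`): the gradient scaler refuses to unscale gradients that are themselves FP16
    - The trainable parameters (the LoRA weights) must therefore be float32
    - The LoRA parameters attached to lm_head were in float16, so the error was raised on those parameters
    - Removing only lm_head from target_modules made training run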
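Dropping `lm_head` is one workaround. Another option, untested in this notebook, is to make sure every trainable parameter is FP32 before training starts, which is what the gradient-scaler check behind `Attempting to unscale FP16 gradients` requires. A minimal PyTorch sketch; `upcast_trainable_params` is a hypothetical helper, and whether the FP16 `lm_head` then works with FP32 adapter weights in the forward pass depends on the PEFT version. PEFT's `prepare_model_for_kbit_training` performs a related upcast of non-quantized parameters for quantized models.

```python
import torch
from torch import nn


def upcast_trainable_params(model: nn.Module) -> list[str]:
    """Cast every trainable (requires_grad) parameter that is not FP32 to FP32, in place.

    Returns the names of the parameters that were changed; frozen parameters keep their dtype.
    """
    changed = []
    for name, param in model.named_parameters():
        if param.requires_grad and param.dtype != torch.float32:
            param.data = param.data.float()
            changed.append(name)
    return changed


if __name__ == "__main__":
    # Toy stand-in: a frozen FP16 "base" layer followed by a trainable FP16 layer
    toy = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4)).half()
    toy[0].requires_grad_(False)
    print(upcast_trainable_params(toy))  # ['1.weight', '1.bias']
```

In the notebook this would be called after `get_peft_model(...)` and before building the `SFTTrainer`, so that only the LoRA weights (the sole trainable parameters) are upcast while the frozen base stays quantized.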

### Reference
https://www.databricks.com/kr/blog/efficient-fine-tuning-lora-guide-llms