## LoRA (Low-Rank Adaptation)

Unlike full fine-tuning, LoRA does not update all of the original model's weights. The pretrained parameters are kept frozen, two small (low-rank) matrices are added to the model, and only those two matrices are trained.

Let W(0) be the original weight matrix and A, B the two newly added small matrices. Then:
- W = W(0) + BA

If the original weight W(0) has shape d x k, then:
- B has shape d x r
- A has shape r x k

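The product BA is well defined for any positive integer r, so r can be chosen freely. It is a hyperparameter of LoRA training, and in practice it is set much smaller than d and k.
For example, if the original matrix is 768 x 768, full fine-tuning updates 768 x 768 = 589,824 weights (about 590,000). With LoRA at r = 4, the trainable matrices A and B hold only 768 x 4 + 4 x 768 = 6,144 weights, roughly 1% of the original. Because far fewer weights are trained than in full fine-tuning, gradients and optimizer state take much less memory and training is typically cheaper.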
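The count above can be checked with a few lines of arithmetic. This is a minimal sketch: the helper name `lora_param_count` is introduced here for illustration only, and r = 4 is the rank implied by the 768 x 4 + 4 x 768 figure.

```python
def lora_param_count(d: int, k: int, r: int) -> int:
    """Trainable parameters in B (d x r) plus A (r x k)."""
    return d * r + r * k


d = k = 768
r = 4

full = d * k                       # weights updated by full fine-tuning
lora = lora_param_count(d, k, r)   # weights updated by LoRA

print(full, lora, f"{lora / full:.2%}")
```

The ratio simplifies to r * (d + k) / (d * k), so for a fixed rank the savings grow as the layer gets wider.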

In [1]:
import numpy as np 

In [2]:
d = 6
h = 8
W = np.random.rand(d, h)

In [4]:
W.size

48

In [5]:
r = 2 # Rank of the LoRA matrices

In [6]:
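# Note: both factors are stored with r as the first axis here, so the
# low-rank product is A.T @ B rather than BA as in the formula above.
A = np.random.randn(r, d)
B = np.random.randn(r, h)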

In [7]:
A.shape

(2, 6)

In [8]:
B.shape

(2, 8)

In [9]:
A.size + B.size

28

In [22]:
f'{(A.size + B.size) / W.size * 100:.2f}%'

'58.33%'

In [11]:
AB = np.dot(A.T, B)

In [12]:
AB.size

48

In [13]:
AB.shape

(6, 8)
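The cells above only compare sizes. As a complement, here is a minimal NumPy sketch of how a LoRA layer is applied, written in the W = W(0) + BA convention of the formula (B is d x r, A is r x k). The zero initialization of B and the alpha / r scaling follow the original LoRA paper; the value `alpha = 4` and the function name `lora_forward` are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 6, 8, 2
alpha = 4  # hypothetical scaling constant, chosen only for this example

W0 = rng.random((d, k))           # frozen pretrained weight
B = np.zeros((d, r))              # zero init, so BA = 0 before training
A = rng.standard_normal((r, k))   # random init


def lora_forward(x, W0, A, B, alpha, r):
    """x: (batch, k) -> (batch, d), without ever materializing BA."""
    return x @ W0.T + (alpha / r) * (x @ A.T) @ B.T


x = rng.standard_normal((3, k))

# At initialization the adapter is a no-op: the output equals the frozen layer's.
assert np.allclose(lora_forward(x, W0, A, B, alpha, r), x @ W0.T)

# Pretend training has moved B away from zero.
B = rng.standard_normal((d, r))
delta = (alpha / r) * (B @ A)

# The update has the full (d, k) shape but rank at most r.
print(delta.shape, np.linalg.matrix_rank(delta))

# After training, the adapter can be merged into a single dense weight.
merged = W0 + delta
assert np.allclose(lora_forward(x, W0, A, B, alpha, r), x @ merged.T)
```

Keeping the two factors separate during training is what saves memory. Merging is only a convenience for inference.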

In [None]:
d = 2048
h = 2048
r = 32 # low-rank dimension

In [16]:
d * h # number of parameters in the original matrix

4194304

In [18]:
adapter1 = h * r 
adapter2 = d * r 

total = adapter1 + adapter2
total

131072

In [21]:
f"{total / (d * h) * 100:.3f}%"

'3.125%'

### LoRA fine tuning with `LGAI-EXAONE/EXAONE-4.0-1.2B`

In [1]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from datasets import load_dataset 
from peft import LoraConfig, PeftModel # PEFT library classes for LoRA

import torch 

device = 'cuda' if torch.cuda.is_available() else "cpu"

model_ckpt = "LGAI-EXAONE/EXAONE-4.0-1.2B"

# 4-bit quantization settings
bnb_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_quant_type = 'nf4',
    bnb_4bit_compute_dtype = torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
# Pass bnb_config so the model is loaded with 4-bit quantization
model = AutoModelForCausalLM.from_pretrained(model_ckpt, quantization_config = bnb_config).to(device)
model



Exaone4ForCausalLM(
  (model): Exaone4Model(
    (embed_tokens): Embedding(102400, 2048, padding_idx=0)
    (layers): ModuleList(
      (0-29): 30 x Exaone4DecoderLayer(
        (self_attn): Exaone4Attention(
          (q_proj): Linear4bit(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear4bit(in_features=2048, out_features=512, bias=False)
          (v_proj): Linear4bit(in_features=2048, out_features=512, bias=False)
          (o_proj): Linear4bit(in_features=2048, out_features=2048, bias=False)
          (q_norm): Exaone4RMSNorm((64,), eps=1e-05)
          (k_norm): Exaone4RMSNorm((64,), eps=1e-05)
        )
        (mlp): Exaone4MLP(
          (gate_proj): Linear4bit(in_features=2048, out_features=4096, bias=False)
          (up_proj): Linear4bit(in_features=2048, out_features=4096, bias=False)
          (down_proj): Linear4bit(in_features=4096, out_features=2048, bias=False)
          (act_fn): SiLU()
        )
        (post_attention_layernorm): Exaone4RMS

### Checking EXAONE's total parameter count

In [7]:
total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params}")

Total parameters: 744617728
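The printed total is well below the 1.2B in the model name. A likely explanation, stated here as an assumption, is that bitsandbytes stores each `Linear4bit` weight packed, two 4-bit values per stored 8-bit element, so `numel()` counts only half of the logical weights in those layers. The sketch below checks this against the layer shapes in the printed architecture and against the "all params" and "trainable params" figures that PEFT reports later in this notebook.

```python
# (in_features, out_features) of every Linear4bit in one decoder layer,
# copied from the printed model structure.
linear_shapes = {
    "q_proj": (2048, 2048),
    "k_proj": (2048, 512),
    "v_proj": (2048, 512),
    "o_proj": (2048, 2048),
    "gate_proj": (2048, 4096),
    "up_proj": (2048, 4096),
    "down_proj": (4096, 2048),
}
num_layers = 30

logical = num_layers * sum(i * o for i, o in linear_shapes.values())
packed = logical // 2  # assumption: two 4-bit values per stored element

reported_numel = 744_617_728                  # sum of p.numel() above
unpacked_total = reported_numel + (logical - packed)

# PEFT's later report: all params minus the LoRA trainable params.
peft_base_params = 1_285_781_248 - 6_389_760

print(logical, packed, unpacked_total, peft_base_params)
```

Under that assumption the two totals agree exactly, which suggests the model was loaded in full and that only the counting convention differs.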


In [12]:
# Load the 'train' split of the SQuAD dataset.
squad_dataset = load_dataset("squad", split="train")

# Inspect the dataset structure and the first example.
print(squad_dataset)
print("--------------------------")
print("First example:")
print(squad_dataset[0])

Generating train split: 100%|██████████| 87599/87599 [00:00<00:00, 696152.65 examples/s]
Generating validation split: 100%|██████████| 10570/10570 [00:00<00:00, 680247.85 examples/s]


Dataset({
    features: ['id', 'title', 'context', 'question', 'answers'],
    num_rows: 87599
})
--------------------------
First example:
{'id': '5733be284776f41900661182', 'title': 'University_of_Notre_Dame', 'context': 'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.', 'question': 'To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France

In [28]:
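from transformers import TrainingArguments, Trainer 

# Preprocessing function for LoRA fine-tuning
def preprocess_function(examples):
    # Build the prompt by combining the context and the question
    inputs = [f"### Context:\n{c}\n\n### Question:\n{q}\n\n### Answer:\n"
              for c, q in zip(examples["context"], examples["question"])]
    
    max_length = 155 # truncation/padding length chosen for this example, not a limit of EXAONE
    
    # Answer text (first reference answer of each example)
    labels = [a["text"][0] for a in examples["answers"]]
    
    # Tokenize the prompts and the answers.
    # The deprecated `as_target_tokenizer()` context manager is not needed for a decoder-only tokenizer.
    model_inputs = tokenizer(inputs, max_length=max_length, truncation=True, padding="max_length")
    labels = tokenizer(labels, max_length=max_length, truncation=True, padding="max_length")
    
    # Store the answer token IDs under the 'labels' key.
    # Caveat: a causal LM compares labels with input_ids position by position, so
    # separately tokenized answers are not aligned with the prompt, and the padding
    # tokens in the labels still contribute to the loss (they are not set to -100).
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs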
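For a decoder-only model, labels are usually built from the same token sequence as the inputs: the prompt and the answer are concatenated, and every position that should not be scored (prompt and padding) is set to -100, which PyTorch's cross-entropy loss ignores by default. The following is a tokenizer-independent sketch of that idea, not the preprocessing used for the run in this notebook; the function name `build_causal_lm_example` and the toy token IDs are hypothetical.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the loss


def build_causal_lm_example(prompt_ids, answer_ids, eos_id, pad_id, max_length):
    """Concatenate prompt + answer + EOS, pad to max_length, and mask non-answer labels."""
    input_ids = (prompt_ids + answer_ids + [eos_id])[:max_length]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + answer_ids + [eos_id])[:max_length]
    attention_mask = [1] * len(input_ids)

    # Right-pad all three sequences to the same fixed length.
    pad = max_length - len(input_ids)
    input_ids += [pad_id] * pad
    attention_mask += [0] * pad
    labels += [IGNORE_INDEX] * pad

    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


# Toy IDs standing in for tokenizer output.
example = build_causal_lm_example(
    prompt_ids=[11, 12, 13], answer_ids=[21, 22], eos_id=2, pad_id=0, max_length=8
)
print(example)
```

With real data, `prompt_ids` and `answer_ids` would come from tokenizing the prompt and the answer without padding. The model shifts the labels by one position internally, so they do not need to be shifted here.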

In [29]:
tokenized_dataset = squad_dataset.map(preprocess_function, batched=True, remove_columns=squad_dataset.column_names)

Map: 100%|██████████| 87599/87599 [00:11<00:00, 7538.21 examples/s]


In [30]:
from peft import get_peft_model, prepare_model_for_kbit_training

# Prepare the quantized model for 4-bit (k-bit) training
model = prepare_model_for_kbit_training(model)

# LoRA configuration
lora_config = LoraConfig(
    r = 16,
    lora_alpha = 32,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"], # attention projections of the EXAONE model
    lora_dropout = 0.05,
    bias = 'none',
    task_type = "CAUSAL_LM",
)

peft_model = get_peft_model(model, lora_config)
# print_trainable_parameters() prints its report itself and returns None,
# so it is called directly instead of being wrapped in print().
peft_model.print_trainable_parameters()

trainable params: 6,389,760 || all params: 1,285,781,248 || trainable%: 0.4970
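The reported number of trainable parameters can be reproduced by hand. Each targeted projection with `in_features = i` and `out_features = o` gets one r x i matrix and one o x r matrix, that is r * (i + o) new weights. A short sketch using the shapes from the printed model structure and the settings above:

```python
r = 16
num_layers = 30

# (in_features, out_features) of the targeted attention projections
target_shapes = {
    "q_proj": (2048, 2048),
    "k_proj": (2048, 512),
    "v_proj": (2048, 512),
    "o_proj": (2048, 2048),
}

per_layer = sum(r * (i + o) for i, o in target_shapes.values())
trainable = num_layers * per_layer

all_params = 1_285_781_248  # "all params" from the PEFT report
print(per_layer, trainable, f"{100 * trainable / all_params:.4f}")
```

Adding the MLP projections (`gate_proj`, `up_proj`, `down_proj`) to `target_modules` would increase the count by the same formula.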




### Training the model

In [31]:
# Define the TrainingArguments
training_args = TrainingArguments(
    output_dir="./lora_squad_results",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    learning_rate=2e-4,
    fp16=True, # enable mixed-precision training (bf16=True would match the bfloat16 compute dtype set earlier, on hardware that supports it)
    save_strategy="epoch",
    logging_steps=10,
    report_to="tensorboard" # write training logs to TensorBoard
)

# Create the Trainer and start training
trainer = Trainer(
    model=peft_model, # model with LoRA applied
    args=training_args,
    train_dataset=tokenized_dataset, # preprocessed dataset
    data_collator=None # None uses the default collator, which is enough here because padding was done during preprocessing
)

trainer.train()

# Save the trained LoRA adapter
trainer.save_model("./final_peft_adapter")

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
  return fn(*args, **kwargs)


| Step | Training Loss |
|------|---------------|
| 10 | 17.7344 |
| 20 | 4.9384 |
| 30 | 0.3199 |
| 40 | 0.3238 |
| 50 | 0.2687 |
| 60 | 0.2969 |
| 70 | 0.3095 |
| 80 | 0.2354 |
| 90 | 0.2416 |
| 100 | 0.2635 |


KeyboardInterrupt: 
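The last logged step is 100, and the run ended with a `KeyboardInterrupt`, so only a small part of the epoch was seen and the `trainer.save_model(...)` line after `trainer.train()` presumably did not run. A rough estimate of the coverage, assuming a single GPU (the variable `num_devices` is an assumption, not something shown in the notebook):

```python
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
num_devices = 1          # assumption: training ran on one GPU
num_examples = 87_599    # rows in the SQuAD train split
logged_steps = 100       # last step shown in the training log

# Examples consumed per optimizer step
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
examples_seen = logged_steps * effective_batch

print(effective_batch, examples_seen, f"{examples_seen / num_examples:.2%}")
```

The loss values in the log therefore describe the first couple of percent of one epoch and should not be read as a converged result.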