<a href="https://colab.research.google.com/github/jeongminia/NLP_paper_study/blob/main/code/LoRA_0824.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-tuning a PLM with PEFT: LoRA
- Source: https://kjwony.tistory.com/8

In [1]:
!pip install peft

Collecting peft
  Downloading peft-0.12.0-py3-none-any.whl.metadata (13 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch>=1.13.0->peft)
  Using cached nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch>=1.13.0->peft)
  Using cached nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch>=1.13.0->peft)
  Using cached nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch>=1.13.0->peft)
  Using cached nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.1.3.1 (from torch>=1.13.0->peft)
  Using cached nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.0.2.54 (from torch>=1.13.0->peft)
  Using cached nvidia_cufft_cu12-11.

In [2]:
!pip install git+https://github.com/huggingface/peft

Collecting git+https://github.com/huggingface/peft
  Cloning https://github.com/huggingface/peft to /tmp/pip-req-build-w4md8am_
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/peft /tmp/pip-req-build-w4md8am_
  Resolved https://github.com/huggingface/peft to commit 900f96c40ddebae9d76bed374c8baed60e8b34e9
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: peft
  Building wheel for peft (pyproject.toml) ... done
  Created wheel for peft: filename=peft-0.12.1.dev0-py3-none-any.whl size=301274 sha256=39c65d43f8cdb747bd6ef538e8af1d6168ac547bd0105c73796d9c56416464d3
  Stored in directory: /tmp/pip-ephem-wheel-cache-8d2e92sw/wheels/4c/16/67/1002a2d4daa822eff130e6d85b90051b75d2ce0d26b9448e4a
Successfully built peft
Installing collected packages: peft
  Attempting uninstall: peft
    Fou

In [3]:
from peft import LoraConfig, TaskType

peft_config = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1)

In [11]:
from transformers import AutoModelForSeq2SeqLM

model_name_or_path = "bigscience/mt0-large"
tokenizer_name_or_path = "bigscience/mt0-large"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)

With PEFT, a large model can be fine-tuned far more efficiently than with conventional full fine-tuning, because only a small fraction of its parameters is trained.

In [5]:
from peft import get_peft_model

model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # report how many of the model's parameters are trainable

trainable params: 2,359,296 || all params: 1,231,940,608 || trainable%: 0.1915


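- `all params`: total number of parameters in the model
- `trainable params`: number of parameters that will be trained
- `trainable%`: percentage of the model's parameters that will be trained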
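
The figures printed above can be reproduced by hand. The sketch below rests on assumptions that the notebook does not state: that `bigscience/mt0-large` follows the mT5-large layout (hidden size 1024, 24 encoder blocks, 24 decoder blocks) and that PEFT's default LoRA targets for this architecture are the query and value projections. Under those assumptions, each adapted projection gains `r * (d_in + d_out)` parameters.

```python
# Assumed architecture values (not stated in the notebook): mT5-large layout.
D_MODEL = 1024             # hidden size; q and v projections assumed 1024 -> 1024
ENCODER_BLOCKS = 24        # one self-attention module per encoder block
DECODER_BLOCKS = 24        # self-attention + cross-attention per decoder block
TARGETS_PER_ATTENTION = 2  # assumed default LoRA targets: q and v
R = 8                      # rank from the LoraConfig above

# A LoRA adapter on a d_in x d_out projection adds A (r x d_in) and B (d_out x r).
params_per_projection = R * (D_MODEL + D_MODEL)
attention_modules = ENCODER_BLOCKS + 2 * DECODER_BLOCKS
trainable = params_per_projection * TARGETS_PER_ATTENTION * attention_modules

ALL_PARAMS = 1_231_940_608  # "all params" as printed by print_trainable_parameters()
trainable_pct = 100 * trainable / ALL_PARAMS

print(f"trainable params: {trainable:,}")
print(f"trainable%: {trainable_pct:.4f}")
```

If the assumptions hold, the script prints the same `trainable params` and `trainable%` values as the cell output, which suggests that only the LoRA matrices are trainable and the pretrained weights stay frozen.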

# KoAlpaca: Applying LoRA
https://github.com/Beomi/KoAlpaca

In [9]:
!pip install peft transformers accelerate



In [12]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import get_peft_model, LoraConfig, TaskType

# Set up the model and tokenizer
MODEL = 'beomi/KoAlpaca-Polyglot-5.8B'
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).to(device="cuda")

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Loading checkpoint shards:   0%|          | 0/13 [00:00<?, ?it/s]

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
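
The `RuntimeError` above is raised because `.to(device="cuda")` ran on a machine where no NVIDIA driver is visible, for example a Colab session without a GPU runtime. A minimal sketch of a guard follows; `pick_device` is a hypothetical helper name, not part of PyTorch or Transformers.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" only if PyTorch is installed and can see a GPU."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed in this environment
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Loading on: {device}")
```

Passing `device=pick_device()` to `.to(...)` would avoid the crash, but it does not make a CPU run practical: at a nominal 5.8B parameters, the float16 weights alone take roughly 11.6 GB, so switching the runtime to a GPU is likely the real fix.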

In [None]:
# LoRA configuration
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=8,  # rank of the low-rank matrices
    lora_alpha=16,  # LoRA alpha (scaling factor)
    lora_dropout=0.1,  # dropout probability
)

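**LoRA configuration**

- r: rank of the low-rank matrices
  - Controls how many trainable parameters each adapter adds; a smaller rank means fewer parameters
  - Values such as 4 or 8 are commonly used
- lora_alpha: LoRA scaling parameter
  - The low-rank update is scaled by lora_alpha / r, so this value controls how strongly LoRA affects the output, somewhat like a learning rate
- lora_dropout: dropout probability
  - Randomly zeroes part of the input to the LoRA layers during training to reduce overfitting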
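
To make the roles of `r` and `lora_alpha` concrete, here is a minimal pure-Python sketch of the LoRA update for a single linear layer, using the formulation h = Wx + (lora_alpha / r) · B(Ax). The toy matrices and the helper names `matvec` and `lora_forward` are made up for illustration and are not PEFT's API; dropout is omitted for brevity.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, r, lora_alpha):
    """Return W x + (lora_alpha / r) * B (A x)."""
    scaling = lora_alpha / r
    base = matvec(W, x)               # frozen pretrained path
    update = matvec(B, matvec(A, x))  # trainable low-rank path
    return [b + scaling * u for b, u in zip(base, update)]


# Toy sizes: d_in = 3, d_out = 2, r = 1 (hypothetical values for illustration).
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]        # frozen weight, shape d_out x d_in
A = [[0.5, 0.5, 0.0]]        # shape r x d_in
B_zero = [[0.0], [0.0]]      # shape d_out x r, zero-initialised
B_trained = [[1.0], [-1.0]]  # pretend values after some training
x = [1.0, 2.0, 3.0]

print(lora_forward(W, A, B_zero, x, r=1, lora_alpha=2))     # [7.0, 2.0], same as W x
print(lora_forward(W, A, B_trained, x, r=1, lora_alpha=2))  # [10.0, -1.0]
```

With B at zero the adapter leaves the base output unchanged, which is why a freshly wrapped model behaves like the original until it is trained. For the configuration above (`r=8`, `lora_alpha=16`) the scaling factor would be 2.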

In [None]:
# Apply LoRA to the model
model = get_peft_model(model, lora_config)
model.eval()

In [None]:
# Set up the text-generation pipeline
from transformers import pipeline

pipe = pipeline(
    'text-generation',
    model=model,
    tokenizer=tokenizer,
    device=0
)

In [None]:
# Define the question-asking function
def ask(x, context='', is_input_full=False):
    ans = pipe(
        f"### 질문: {x}\n\n### 맥락: {context}\n\n### 답변:" if context else f"### 질문: {x}\n\n### 답변:",
        do_sample=True,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        return_full_text=False,
        eos_token_id=2,
    )
    print(ans[0]['generated_text'])

In [None]:
ask("딥러닝이 뭐야?")  # "What is deep learning?"
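
The prompt that `ask` hands to the pipeline can be inspected without loading the model. The sketch below copies the f-string logic from `ask` into a standalone helper; `build_prompt` is a hypothetical name introduced here. The Korean section labels mean question (질문), context (맥락), and answer (답변), and they are left untranslated because they are part of the prompt format the notebook sends to the model.

```python
def build_prompt(question: str, context: str = "") -> str:
    """Build the prompt string the same way `ask` does."""
    if context:
        return f"### 질문: {question}\n\n### 맥락: {context}\n\n### 답변:"
    return f"### 질문: {question}\n\n### 답변:"


# Print the exact text the model would be asked to continue.
print(build_prompt("딥러닝이 뭐야?"))
```

Keeping prompt construction in its own function makes it easy to check the template before spending GPU time on generation.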