Training time:

https://api.wandb.ai/links/w3yfrl-none/kcq8ar15

Training memory usage:

https://api.wandb.ai/links/w3yfrl-none/nmhh0v43

Accuracy (logged as train/mean_token_accuracy):

https://api.wandb.ai/links/w3yfrl-none/ijc4eq61

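LoRA pros and cons

Advantages

* Memory efficient: rank=8 had the lowest peak GPU memory of the three runs (10.63 GB, versus 11.07 GB at rank=128 and 11.54 GB at rank=256)
* Fast training: with fewer trainable parameters, one epoch takes less time (14.54 s at rank=8 versus 16.13 s at rank=256)
* Parameter-efficient: rank=128 already captures most of the gain (mean token accuracy 0.626, versus 0.642 at rank=256), so reasonably high accuracy is reachable without full model fine-tuning
* Modular: the base model stays unchanged, and only the LoRA adapter needs to be saved separately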

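Disadvantages

* Too low a rank hurts performance: at rank=8 the final loss (2.50) and mean token accuracy (0.542) improved too little, which points to underfitting
* Higher rank costs more memory: rank=256 gives the best results but uses more memory (11.54 GB) and takes longer to train (16.13 s)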
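
The link between rank, parameter count, and cost in the lists above can be made concrete with a little arithmetic. The sketch below counts the LoRA weights added at each rank. It assumes OPT-350m's decoder has 24 layers with a hidden size of 1024 (values taken from the published model configuration, not from this notebook) and that each targeted projection is a square 1024x1024 linear layer. The helper name `lora_param_count` is made up for illustration.

```python
# Assumed OPT-350m dimensions (from the public model config, not from this notebook).
HIDDEN_SIZE = 1024
NUM_LAYERS = 24
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "out_proj"]
RANKS = [8, 128, 256]


def lora_param_count(rank, d_in=HIDDEN_SIZE, d_out=HIDDEN_SIZE,
                     modules_per_layer=len(TARGET_MODULES),
                     num_layers=NUM_LAYERS):
    """Count trainable LoRA weights.

    Each adapted linear layer gets two low-rank matrices:
    A with shape (rank, d_in) and B with shape (d_out, rank).
    """
    per_module = rank * (d_in + d_out)
    return per_module * modules_per_layer * num_layers


for r in RANKS:
    print(f"rank={r:>3}: {lora_param_count(r):,} LoRA parameters")
```

The count grows linearly with rank, which matches the direction of the memory and runtime trends in the logs. Measured peak memory rises far less than 32x between rank=8 and rank=256, presumably because most of it goes to the frozen base weights and activations rather than to the adapters.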


In [1]:
!pip install peft transformers trl accelerate datasets wandb

Collecting trl
  Downloading trl-0.17.0-py3-none-any.whl.metadata (12 kB)
Collecting datasets
  Downloading datasets-3.6.0-py3-none-any.whl.metadata (19 kB)
Collecting fsspec<=2025.3.0,>=2023.1.0 (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets)
  Downloading fsspec-2025.3.0-py3-none-any.whl.metadata (11 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=1.13.0->peft)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=1.13.0->peft)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch>=1.13.0->peft)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch>=1.13.0->peft)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-c

In [2]:
!wandb login

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize?ref=models
[34m[1mwandb[0m: Paste an API key from your profile and hit enter, or press ctrl+c to quit: 
[34m[1mwandb[0m: No netrc file found, creating one.
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mw3yfrl[0m ([33mw3yfrl-none[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


In [22]:
from transformers import AutoTokenizer, AutoModelForCausalLM
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, TaskType
from trl import SFTTrainer, SFTConfig
import torch, time, wandb

MODEL_NAME = "facebook/opt-350m"
DATASET_NAME = "sahil2801/CodeAlpaca-20k"
RANKS = [8, 128, 256]
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "out_proj"]

# === Prompt formatting function ===
def formatting_prompts_func(example):
    return {
        "text": f"### Instruction:\n{example['instruction']}\n### Input:\n{example['input']}\n### Response:\n{example['output']}"
    }

# === Dataset preparation (first 1000 training examples) ===
raw_dataset = load_dataset(DATASET_NAME)
train_dataset = raw_dataset["train"].select(range(1000))
formatted_dataset = train_dataset.map(formatting_prompts_func)

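# === Experiment loop ===
for rank in RANKS:
    print(f"\n[Rank {rank}] Initializing experiment...")
    # Reset the peak-memory counter so each rank reports its own peak,
    # not the maximum over all earlier runs in this process.
    torch.cuda.reset_peak_memory_stats()
    wandb.init(project="lora-rank-comparison", name=f"rank-{rank}", reinit=True)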

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

    # LoRA configuration (lora_alpha scales with rank, so alpha / r stays at 4)
    peft_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        inference_mode=False,
        r=rank,
        lora_alpha=rank * 4,
        lora_dropout=0.1,
        target_modules=TARGET_MODULES,
    )
    model = get_peft_model(model, peft_config)

    trainer = SFTTrainer(
        model=model,
        train_dataset=formatted_dataset,
        args=SFTConfig(
            output_dir=f"/tmp/lora-rank-{rank}",
            max_seq_length=128,
            per_device_train_batch_size=32,
            num_train_epochs=1,
            logging_steps=10,
            report_to="wandb",
            dataset_text_field="text"
        ),
        data_collator=None,
        formatting_func=None
    )

    # Train and time one epoch
    print("Training...")
    start_time = time.time()
    trainer.train()
    runtime = time.time() - start_time
    print(f"Runtime: {runtime:.2f} seconds")

    # Measure peak GPU memory allocated on device 0
    max_memory_gb = round(torch.cuda.max_memory_allocated(0) / 1024**3, 2)
    print(f"Max GPU Memory: {max_memory_gb} GB")

    # Log results to W&B
    wandb.log({
        "runtime_seconds": runtime,
        "max_memory_gb": max_memory_gb,
    })
    wandb.finish()

    # Drop both the trainer and the model; the trainer also holds a model reference
    del trainer, model
    torch.cuda.empty_cache()
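
One detail of the `LoraConfig` in the cell above is worth spelling out: `lora_alpha` is set to `rank * 4`. Assuming the standard LoRA scaling that PEFT applies by default (the low-rank update is multiplied by `lora_alpha / r`; rank-stabilized scaling is not enabled in this config), the effective weight of each adapted projection can be written as:

```latex
W' = W + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d_{\text{out}} \times r},\quad A \in \mathbb{R}^{r \times d_{\text{in}}}

\frac{\alpha}{r} = \frac{4r}{r} = 4
\quad \text{for } r \in \{8, 128, 256\}
```

Under that assumption the scaling factor is the same in all three runs, so the comparison changes the rank of the update without also changing how strongly it is weighted.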



[Rank 8] Initializing experiment...


No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Training...


Step,Training Loss
10,2.6929
20,2.5867
30,2.4986


Runtime: 14.54 seconds
Max GPU Memory: 10.63 GB


Run history:
max_memory_gb,▁
runtime_seconds,▁
train/epoch,▁▄▇█
train/global_step,▁▄▇██
train/grad_norm,▁█▁
train/learning_rate,█▄▁
train/loss,█▄▁
train/mean_token_accuracy,▁▂▅█
train/num_tokens,▁▄██

Run summary:
max_memory_gb,10.63
runtime_seconds,14.54162
total_flos,234186866688000.0
train/epoch,1.0
train/global_step,32.0
train/grad_norm,2.47358
train/learning_rate,0.0
train/loss,2.4986
train/mean_token_accuracy,0.54194
train/num_tokens,92858.0
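
The summary above reports `train/global_step` of 32, and the loss table has only three rows. Both follow from the training settings, as the short check below sketches. It assumes a single GPU, the default `gradient_accumulation_steps=1`, and that the final partial batch is kept; none of these is stated explicitly in the notebook.

```python
import math

NUM_EXAMPLES = 1000   # raw_dataset["train"].select(range(1000))
BATCH_SIZE = 32       # per_device_train_batch_size
NUM_EPOCHS = 1        # num_train_epochs
LOGGING_STEPS = 10    # logging_steps

# Assumptions (not stated in the notebook): one GPU,
# gradient_accumulation_steps=1, last partial batch kept.
steps_per_epoch = math.ceil(NUM_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS

# The loss is logged every LOGGING_STEPS optimizer steps.
logged_steps = list(range(LOGGING_STEPS, total_steps + 1, LOGGING_STEPS))

print(total_steps)   # 32
print(logged_steps)  # [10, 20, 30]
```

With only 32 optimizer steps per run, each loss curve has three logged points, so the trends between ranks are indicative rather than conclusive.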



[Rank 128] Initializing experiment...


No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Training...


Step,Training Loss
10,2.3312
20,1.9921
30,1.9018


Runtime: 14.71 seconds
Max GPU Memory: 11.07 GB


Run history:
max_memory_gb,▁
runtime_seconds,▁
train/epoch,▁▄▇█
train/global_step,▁▄▇██
train/grad_norm,█▆▁
train/learning_rate,█▄▁
train/loss,█▂▁
train/mean_token_accuracy,▁▅▇█
train/num_tokens,▁▄██

Run summary:
max_memory_gb,11.07
runtime_seconds,14.71281
total_flos,252306259968000.0
train/epoch,1.0
train/global_step,32.0
train/grad_norm,3.37472
train/learning_rate,0.0
train/loss,1.9018
train/mean_token_accuracy,0.62611
train/num_tokens,92858.0



[Rank 256] Initializing experiment...


No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Training...


Step,Training Loss
10,2.2164
20,1.8824
30,1.8099


Runtime: 16.13 seconds
Max GPU Memory: 11.54 GB


Run history:
max_memory_gb,▁
runtime_seconds,▁
train/epoch,▁▄▇█
train/global_step,▁▄▇██
train/grad_norm,█▇▁
train/learning_rate,█▄▁
train/loss,█▂▁
train/mean_token_accuracy,▁▆▇█
train/num_tokens,▁▄██

Run summary:
max_memory_gb,11.54
runtime_seconds,16.13068
total_flos,271633612800000.0
train/epoch,1.0
train/global_step,32.0
train/grad_norm,4.24784
train/learning_rate,0.0
train/loss,1.8099
train/mean_token_accuracy,0.64193
train/num_tokens,92858.0
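
The three run summaries above are easier to compare side by side. The sketch below copies the logged values into a dictionary and reports each rank relative to rank=8; the helper names `percent_change` and `compare_to_baseline` are illustrative and not part of the original notebook.

```python
# Values copied from the three W&B run summaries above.
RUNS = {
    8:   {"runtime_s": 14.54, "max_memory_gb": 10.63, "loss": 2.4986, "token_acc": 0.54194},
    128: {"runtime_s": 14.71, "max_memory_gb": 11.07, "loss": 1.9018, "token_acc": 0.62611},
    256: {"runtime_s": 16.13, "max_memory_gb": 11.54, "loss": 1.8099, "token_acc": 0.64193},
}


def percent_change(value, baseline):
    """Relative change of value with respect to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def compare_to_baseline(runs, baseline_rank=8):
    """Return one row per rank with changes relative to the baseline rank."""
    base = runs[baseline_rank]
    rows = []
    for rank in sorted(runs):
        m = runs[rank]
        rows.append({
            "rank": rank,
            "runtime_pct": percent_change(m["runtime_s"], base["runtime_s"]),
            "memory_pct": percent_change(m["max_memory_gb"], base["max_memory_gb"]),
            "loss_delta": m["loss"] - base["loss"],
            "acc_delta": m["token_acc"] - base["token_acc"],
        })
    return rows


for row in compare_to_baseline(RUNS):
    print(
        f"rank={row['rank']:>3}  runtime {row['runtime_pct']:+.1f}%  "
        f"memory {row['memory_pct']:+.1f}%  loss {row['loss_delta']:+.4f}  "
        f"token acc {row['acc_delta']:+.4f}"
    )
```

Each rank was run once, for one epoch on 1000 examples, so small gaps such as the runtime difference between rank=8 and rank=128 may be within run-to-run noise.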
