<a href="https://colab.research.google.com/github/hawe66/sql-learning-log/blob/main/slm_lora_toy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Upgrade pip
!pip install --upgrade pip

# Core libraries
!pip install torch transformers datasets accelerate evaluate matplotlib sagemaker boto3

# LoRA / QLoRA dependencies
!pip install -q --no-deps xformers trl peft accelerate
!pip install git+https://github.com/TimDettmers/bitsandbytes.git

# Evaluation metrics (ROUGE, etc.)
!pip install rouge_score


Collecting git+https://github.com/TimDettmers/bitsandbytes.git
  Cloning https://github.com/TimDettmers/bitsandbytes.git to /tmp/pip-req-build-w1gt2rm4
  Running command git clone --filter=blob:none --quiet https://github.com/TimDettmers/bitsandbytes.git /tmp/pip-req-build-w1gt2rm4
  Resolved https://github.com/TimDettmers/bitsandbytes.git to commit 39dd8471c1c0677001d0d20ba2218b14bf18fd00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: bitsandbytes
  Building wheel for bitsandbytes (pyproject.toml) ... done
  Created wheel for bitsandbytes: filename=bitsandbytes-0.48.0.dev0-cp312-cp312-linux_x86_64.whl size=121772 sha256=3e2e2ae5f9c326a6670e514dec42e2485af308ad55269ead4ee3c92689d5b261
  Stored in directory: /tmp/pip-ephem-wheel-cache-0p8v00x3/wheels/c8/ab/0c/

# Load dataset and model

In [None]:
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM

# [🔁 Model can be swapped] -- to use T5, BART, etc. instead of GPT, change this line.
model_name = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# GPT-style models have no pad_token, so reuse eos
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

cnn_dataset = load_dataset("cnn_dailymail", "3.0.0")


In [None]:
# Generate a sampled continuation of `text` (relies on the global `tokenizer`)
def distilgpt2_generate_text(text, model, max_new_tokens=128):
    # Tokenize to input_ids + attention_mask and move both to the model's device
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs,
        do_sample=True,
        top_k=50,
        top_p=0.95,
        temperature=0.7,
        max_new_tokens=max_new_tokens,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)


### Testing via example





In [None]:
highlight = cnn_dataset["train"][10]['highlights']
print("Reference Summary:")
print(highlight)
sentence = (
    "The city of San Francisco has introduced a new pilot program to reduce traffic congestion. "
    "Officials say the plan will involve rerouting certain streets and implementing new traffic signals."
)
tokens = tokenizer.tokenize(sentence)
token_ids = tokenizer.convert_tokens_to_ids(tokens)
print("Tokenized:")
print(tokens)
print("Token IDs:")
print(token_ids)

# Test sentence for generation (same text as `sentence` above)
sample_sentence = sentence

# Text generation helper
def generate_text(text, model):
    # Tokenize to input_ids + attention_mask and move both to the model's device
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        top_p=0.95,
        temperature=0.8,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.pad_token_id
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)

print(generate_text(sample_sentence, model))

Reference Summary:
President Bush to address the Veterans of Foreign Wars on Wednesday .
Bush to say that withdrawing from Vietnam emboldened today's terrorists .
Speech will be latest White House attempt to try to reframe the debate over Iraq .
Tokenized:
['The', 'Ġcity', 'Ġof', 'ĠSan', 'ĠFrancisco', 'Ġhas', 'Ġintroduced', 'Ġa', 'Ġnew', 'Ġpilot', 'Ġprogram', 'Ġto', 'Ġreduce', 'Ġtraffic', 'Ġcongestion', '.', 'ĠOfficials', 'Ġsay', 'Ġthe', 'Ġplan', 'Ġwill', 'Ġinvolve', 'Ġre', 'r', 'outing', 'Ġcertain', 'Ġstreets', 'Ġand', 'Ġimplementing', 'Ġnew', 'Ġtraffic', 'Ġsignals', '.']
Token IDs:
[464, 1748, 286, 2986, 6033, 468, 5495, 257, 649, 8022, 1430, 284, 4646, 4979, 28014, 13, 28244, 910, 262, 1410, 481, 6211, 302, 81, 13660, 1728, 6483, 290, 15427, 649, 4979, 10425, 13]
The city of San Francisco has introduced a new pilot program to reduce traffic congestion.
This year, the City Council approved two measures that would have eliminated one major transit lane on West Side streets: 1) elimina

# Task
Generate Python code to perform full fine-tuning and LoRA on DistilGPT-2 using a sampled subset of the cnn_dailymail dataset for a summarization task, and include code to measure training time, VRAM usage, and the number of tuned parameters for comparison.

## Data sampling and preprocessing

### Subtask:
Sample a subset of the cnn_dailymail dataset and preprocess it into the format required for model training.


**Reasoning**:
Sample the dataset, combine the article and highlights, tokenize the combined text, and format it for model training.
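One detail worth checking before tokenizing: the article and highlights are concatenated and then cut to a fixed token limit, so with the tokenizer's default right-side truncation (an assumption here, since `truncation_side` is not set in this notebook) a long article can push the highlights out of the window entirely. The sketch below shows one way to reserve room for the summary by trimming the article first. `fit_to_budget` is a hypothetical helper, not part of the notebook, and it works on plain lists of token IDs.

```python
def fit_to_budget(article_ids, sep_ids, summary_ids, max_length=512):
    """Trim the article so article + separator + summary fits in max_length tokens."""
    reserved = len(sep_ids) + len(summary_ids)
    if reserved >= max_length:
        # Degenerate case: separator + summary alone fill the window; keep their head.
        return (sep_ids + summary_ids)[:max_length]
    article_budget = max_length - reserved
    return article_ids[:article_budget] + sep_ids + summary_ids


# Toy token IDs stand in for real tokenizer output.
article = list(range(1000))
sep = [9001]
summary = [7, 8, 9]

combined = fit_to_budget(article, sep, summary, max_length=16)
print(len(combined))   # 16
print(combined[-4:])   # [9001, 7, 8, 9]
```

With real data, the three lists would come from tokenizing the article, the separator string, and the highlights separately, without padding.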



In [None]:
import torch

def preprocess_function(examples):
    inputs = [f"{a} [SEP] {h}" for a, h in zip(examples["article"], examples["highlights"])]
    model_inputs = tokenizer(inputs, max_length=512, truncation=True, padding="max_length")
    model_inputs["labels"] = model_inputs["input_ids"].copy()
    return model_inputs

# [🔁 Sample sizes can be changed]
train_sample_size = 5000
eval_sample_size = 500
sampled_train_dataset = cnn_dataset["train"].select(range(train_sample_size))
sampled_test_dataset = cnn_dataset["test"].select(range(eval_sample_size))

# Tokenize
tokenized_train_dataset = sampled_train_dataset.map(preprocess_function, batched=True)
tokenized_test_dataset = sampled_test_dataset.map(preprocess_function, batched=True)
tokenized_train_dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])
tokenized_test_dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])

Map:   0%|          | 0/5000 [00:00<?, ? examples/s]
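Because the labels are a straight copy of `input_ids` and the pad token reuses EOS, any padded positions also count toward the loss. A common remedy is to set the label at padded positions to `-100`, the index that PyTorch's cross-entropy loss ignores by default. The sketch below shows the masking step on plain lists; `mask_padding_labels` is a hypothetical helper, and applying it inside `preprocess_function` is left as an option rather than something this notebook did.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss


def mask_padding_labels(input_ids, attention_mask):
    """Copy input_ids into labels, replacing padded positions with IGNORE_INDEX."""
    return [
        token if keep == 1 else IGNORE_INDEX
        for token, keep in zip(input_ids, attention_mask)
    ]


# Left-padded toy example: 50256 is the GPT-2 EOS id reused as the pad token.
input_ids = [50256, 50256, 464, 1748]
attention_mask = [0, 0, 1, 1]

labels = mask_padding_labels(input_ids, attention_mask)
print(labels)  # [-100, -100, 464, 1748]
```

Masking by `attention_mask` rather than by token ID keeps a genuine EOS token at the end of a sequence in the loss.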

## Full fine-tuning

### Subtask:
Fully fine-tune the DistilGPT-2 model on the sampled dataset. Record training time and VRAM usage.


**Reasoning**:
Import necessary classes, set the device, create TrainingArguments, initialize the Trainer, start training, and record training time and VRAM usage.



**Reasoning**:
An earlier version of this cell failed because `evaluation_strategy` is not a valid argument for `TrainingArguments` in the installed transformers version; the argument is named `eval_strategy`. The configuration below uses `eval_strategy="epoch"`, logs every 50 steps (`logging_steps=50`), warms up for 500 steps (`warmup_steps=500`), and saves a checkpoint at the end of each epoch (`save_strategy="epoch"`). Nothing is pushed to the Hugging Face Hub, and external reporting is disabled with `report_to="none"`. Finally, `load_best_model_at_end=True` with `metric_for_best_model="loss"` reloads the checkpoint with the lowest validation loss when training finishes.
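To see what these settings imply, the step counts can be worked out by hand. The sketch below assumes the values used in this notebook (5,000 training examples, per-device batch size 4, 5 epochs, 500 warmup steps), a single device, and no gradient accumulation. `training_steps` is a hypothetical helper written for illustration.

```python
import math


def training_steps(num_examples, per_device_batch_size, epochs,
                   grad_accum_steps=1, num_devices=1):
    """Return (optimizer steps per epoch, total optimizer steps)."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch, steps_per_epoch * epochs


steps_per_epoch, total_steps = training_steps(5000, 4, 5)
warmup_fraction = 500 / total_steps

print(steps_per_epoch)           # 1250
print(total_steps)               # 6250
print(f"{warmup_fraction:.0%}")  # 8%
```

The total of 6,250 steps is consistent with the `checkpoint-6250` directory that the evaluation section loads, and 500 warmup steps cover the first 8% of training.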



In [None]:
from transformers import Trainer, TrainingArguments
import time

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# [🔁 Training settings can be changed]
training_args = TrainingArguments(
    output_dir="./results_baseline",
    eval_strategy="epoch",
    learning_rate=5e-5,                      # 🔁 higher learning rate than before
    num_train_epochs=5,                      # 🔁 more epochs
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    warmup_steps=500,                        # 🔁 explicit warmup steps
    weight_decay=0.01,
    logging_dir="./logs_baseline",
    logging_steps=50,
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="loss",
    report_to="none"
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_test_dataset,
    tokenizer=tokenizer
)

start_time = time.time()
trainer.train()
end_time = time.time()

print(f"\nTraining time: {end_time - start_time:.2f} seconds")
if device.type == 'cuda':
    print(f"Peak VRAM usage: {torch.cuda.max_memory_allocated(device)/1024**2:.2f} MB")
print(f"Number of parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad)}")


  trainer = Trainer(
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 50256}.
`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.


Epoch,Training Loss,Validation Loss
1,3.112,2.961987
2,2.9219,2.973611
3,2.8927,2.982705
4,2.7824,2.988582
5,2.7693,2.994571


There were missing keys in the checkpoint model loaded: ['lm_head.weight'].



Training time: 2654.71 seconds
Peak VRAM usage: 3284.71 MB
Number of parameters: 81912576
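The validation losses in the log above can be turned into perplexity, which is often easier to read: perplexity is the exponential of the mean token-level cross-entropy. The sketch below copies the five full fine-tuning validation losses from the log and assumes they are mean cross-entropy values in nats, which is what the causal LM loss reports as far as I know.

```python
import math

# Validation loss per epoch, copied from the full fine-tuning log above.
eval_loss = {1: 2.961987, 2: 2.973611, 3: 2.982705, 4: 2.988582, 5: 2.994571}


def perplexity(mean_cross_entropy):
    """Perplexity = exp(mean token-level cross-entropy in nats)."""
    return math.exp(mean_cross_entropy)


ppl = {epoch: perplexity(loss) for epoch, loss in eval_loss.items()}
best_epoch = min(eval_loss, key=eval_loss.get)

for epoch in sorted(ppl):
    print(f"epoch {epoch}: eval loss {eval_loss[epoch]:.4f} -> perplexity {ppl[epoch]:.2f}")
print(f"lowest validation loss at epoch {best_epoch}")
```

In this run the validation loss is lowest after epoch 1 and rises slightly every epoch afterwards while the training loss keeps falling, which suggests mild overfitting on the 5,000-example sample.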


In [None]:
model

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-5): 6 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D(nf=2304, nx=768)
          (c_proj): Conv1D(nf=768, nx=768)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=3072, nx=768)
          (c_proj): Conv1D(nf=768, nx=3072)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)
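The layer shapes in this printout are enough to reproduce the reported parameter count by hand. The sketch below is plain arithmetic with a hypothetical helper, `gpt2_param_count`. It assumes the output head shares its weight matrix with the token embedding (`wte`), so `lm_head` adds no parameters of its own.

```python
def gpt2_param_count(vocab=50257, n_positions=1024, d_model=768, n_layers=6):
    """Count GPT-2 style parameters from the layer shapes (lm_head tied to wte)."""
    d_ff = 4 * d_model                                    # MLP hidden width (3072 here)
    embeddings = vocab * d_model + n_positions * d_model  # wte + wpe
    layer_norm = 2 * d_model                              # weight + bias
    attention = (d_model * 3 * d_model + 3 * d_model) \
        + (d_model * d_model + d_model)                   # c_attn + c_proj
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)  # c_fc + c_proj
    block = 2 * layer_norm + attention + mlp              # ln_1, attn, ln_2, mlp
    return embeddings + n_layers * block + layer_norm     # plus final ln_f


print(gpt2_param_count())  # 81912576
```

The result matches the `Number of parameters: 81912576` line printed after training, which supports the tied-weights assumption.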

# Task
Continue the experiment by implementing and training DistilGPT-2 with LoRA on the sampled cnn_dailymail dataset, then evaluate and compare it with the full fine-tuned DistilGPT-2 model based on the provided requirements. If time and resources allow, proceed with the optional GPT-2 Medium experiments with LoRA and QLoRA. Finally, summarize all findings.

## Implement LoRA on DistilGPT-2

### Subtask:
Apply LoRA to the DistilGPT-2 model using the sampled cnn_dailymail dataset.


In [None]:
import time
import torch
from transformers import Trainer, TrainingArguments
from peft import LoraConfig, get_peft_model

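# Select the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# LoRA configuration
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["c_attn", "c_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Apply LoRA to the model.
# `checkpoint_path` is otherwise only defined in the evaluation section; it is
# repeated here so this cell runs top to bottom. Use `model_name` instead to
# start LoRA from the original pretrained weights.
checkpoint_path = "./results_baseline/checkpoint-6250"
pretrained_model = AutoModelForCausalLM.from_pretrained(checkpoint_path, device_map="auto")
lora_model = get_peft_model(pretrained_model, lora_config)
lora_model.print_trainable_parameters()
lora_model.to(device)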

# Record initial VRAM usage
if device.type == 'cuda':
    torch.cuda.reset_peak_memory_stats(device)
    initial_lora_vram = torch.cuda.memory_allocated(device) / 1024**2
    print(f"Initial VRAM usage (LoRA model): {initial_lora_vram:.2f} MB")

# TrainingArguments
lora_training_args = TrainingArguments(
    output_dir="./results_lora",
    eval_strategy="epoch",
    learning_rate=5e-5,                      # 🔁 higher learning rate than before
    num_train_epochs=5,                      # 🔁 more epochs
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    warmup_steps=500,                        # 🔁 explicit warmup steps
    weight_decay=0.01,
    logging_dir="./logs_lora",
    logging_steps=50,
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="loss",
    report_to="none"
)

# Initialize the Trainer
lora_trainer = Trainer(
    model=lora_model,
    args=lora_training_args,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_test_dataset,
    tokenizer=tokenizer
)

# Start timing the training run
start_time_lora = time.time()

# ✅ Resume support: a checkpoint can be specified if needed
# e.g. resume_from_checkpoint="./results_lora/checkpoint-500"
# lora_trainer.train(resume_from_checkpoint=True)
lora_trainer.train()

# Stop timing once training finishes
end_time_lora = time.time()
training_time_lora = end_time_lora - start_time_lora
print(f"\nTraining time (LoRA): {training_time_lora:.2f} seconds")

# Record peak VRAM usage
if device.type == 'cuda':
    peak_lora_vram = torch.cuda.max_memory_allocated(device) / 1024**2
    print(f"Peak VRAM usage during training (LoRA): {peak_lora_vram:.2f} MB")


Using device: cuda
trainable params: 405,504 || all params: 82,318,080 || trainable%: 0.4926
Initial VRAM usage (LoRA model): 1290.24 MB


  lora_trainer = Trainer(


Epoch,Training Loss,Validation Loss
1,3.3058,3.06206
2,3.1534,2.977643
3,3.1883,2.962579
4,3.1015,2.955986
5,3.1437,2.953976



Training time (LoRA): 1845.09 seconds
Peak VRAM usage during training (LoRA): 3659.93 MB


## Evaluate DistilGPT-2 models

### Subtask:
Calculate evaluation metrics (Perplexity, ROUGE) for both the full fine-tuned and LoRA models on the test dataset.


In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

# [NEW] Load checkpointed baseline model
checkpoint_path = "./results_baseline/checkpoint-6250"
baseline_model = AutoModelForCausalLM.from_pretrained(checkpoint_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(checkpoint_path)
generation_config = GenerationConfig.from_pretrained(checkpoint_path)
baseline_model.generation_config = generation_config


In [None]:
import time
import gc
import torch
import evaluate

# Load evaluation metrics
perplexity_metric = evaluate.load("perplexity")
rouge_metric = evaluate.load("rouge")

# Set tokenizer padding side for generation
tokenizer.padding_side = "left"

# Set the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move models to device
baseline_model.to(device)
lora_model.to(device)

# Define evaluation function
def compute_metrics(model, dataset, tokenizer, original_dataset, use_base_model=False, max_eval=100):
    model.eval()
    predictions = []
    references = []

    for i in range(min(len(dataset), max_eval)):
        print(f"Processing example {i+1}/{min(len(dataset), max_eval)}")

        try:
            # Get example
            example = dataset[i]
            reference = original_dataset[i]["highlights"]

            input_ids = example["input_ids"].unsqueeze(0).to(model.device)
            attention_mask = example["attention_mask"].unsqueeze(0).to(model.device)

            # Clean up memory before generate
            gc.collect()
            torch.cuda.empty_cache()

            with torch.no_grad():
                gen_model = getattr(model, "base_model", model) if use_base_model else model

                generated = gen_model.generate(
                    input_ids,
                    attention_mask=attention_mask,
                    max_new_tokens=128,
                    do_sample=False,
                    pad_token_id=tokenizer.eos_token_id
                )

            # Decode prediction
            decoded = tokenizer.batch_decode(generated[:, input_ids.shape[1]:], skip_special_tokens=True)
            predictions.extend(decoded)
            references.append(reference)

        except Exception as e:
            print(f"[Error] Example {i+1}: {e}")
            continue

    # Compute ROUGE
    rouge_scores = rouge_metric.compute(predictions=predictions, references=references, use_stemmer=True)
    return rouge_scores

# Run evaluation
print("🧪 Evaluating Full Fine-tuned Model:")
baseline_rouge = compute_metrics(baseline_model, tokenized_test_dataset, tokenizer, sampled_test_dataset)
print(f"📊 ROUGE (Full): {baseline_rouge}")

print("\n🧪 Evaluating LoRA Fine-tuned Model:")
lora_rouge = compute_metrics(lora_model, tokenized_test_dataset, tokenizer, sampled_test_dataset, use_base_model=True)
print(f"📊 ROUGE (LoRA): {lora_rouge}")

🧪 Evaluating Full Fine-tuned Model:
Processing example 1/100
Processing example 2/100
Processing example 3/100
Processing example 4/100
Processing example 5/100
Processing example 6/100
Processing example 7/100
Processing example 8/100
Processing example 9/100
Processing example 10/100
Processing example 11/100
Processing example 12/100
Processing example 13/100
Processing example 14/100
Processing example 15/100
Processing example 16/100
Processing example 17/100
Processing example 18/100
Processing example 19/100
Processing example 20/100
Processing example 21/100
Processing example 22/100
Processing example 23/100
Processing example 24/100
Processing example 25/100
Processing example 26/100
Processing example 27/100
Processing example 28/100
Processing example 29/100
Processing example 30/100
Processing example 31/100
Processing example 32/100
Processing example 33/100
Processing example 34/100
Processing example 35/100
Processing example 36/100
Processing example 37/100
Processing 
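For readers who want to see what the ROUGE score measures, here is a deliberately simplified ROUGE-1 F1 in plain Python. It is a stand-in for illustration only, not the `rouge_score` implementation used by `evaluate`: it lowercases, splits on alphanumeric runs, and does no stemming, so its numbers will differ from the library's. `rouge1_f1` is a hypothetical helper.

```python
import re
from collections import Counter


def rouge1_f1(prediction, reference):
    """Unigram-overlap F1 between a predicted and a reference summary."""
    pred_tokens = re.findall(r"[a-z0-9]+", prediction.lower())
    ref_tokens = re.findall(r"[a-z0-9]+", reference.lower())
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: each reference token can be matched at most once.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


reference = "President Bush to address the Veterans of Foreign Wars on Wednesday ."
print(rouge1_f1(reference, reference))            # 1.0
print(rouge1_f1("Bush to address veterans", reference))
```

A short prediction that only uses words from the reference gets perfect precision but low recall, which is why the F1 combines both.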

## Generate text samples

### Subtask:
Generate text samples from both the full fine-tuned and LoRA models using the same prompt for qualitative comparison.

**Reasoning**:
Define a prompt and use the `distilgpt2_generate_text` function to generate text from both the baseline (full fine-tuned) and LoRA models.

In [None]:
# Define a prompt for text generation
prompt = "Once upon a time..."

# Generate text using the full fine-tuned model
print("Generating text from Full Fine-tuned Model:")
baseline_generated_text = distilgpt2_generate_text(prompt, baseline_model)
print("Full Fine-tuned Model Output:")
print(baseline_generated_text)

# Generate text using the LoRA model
print("\nGenerating text from LoRA Model:")
lora_generated_text = distilgpt2_generate_text(prompt, lora_model)
print("LoRA Model Output:")
print(lora_generated_text)

Generating text from Full Fine-tuned Model:
Full Fine-tuned Model Output:
Once upon a time...

I was in the basement, and had never seen some kind of room before. It looked

Generating text from LoRA Model:
LoRA Model Output:
Once upon a time...
To help you identify which type of item you are referring to, I present a list of 10


## Optional: GPT-2 Medium Experiments with LoRA and QLoRA

Now, we can optionally explore applying LoRA and QLoRA to a larger model, GPT-2 Medium, to observe the differences in efficiency and performance compared to full fine-tuning. Please note that training GPT-2 Medium, even with LoRA or QLoRA, will require significant computational resources and may take a considerable amount of time in a Colab environment.

## Implement LoRA on GPT-2 Medium (Optional)

### Subtask:
Load the GPT-2 Medium model and apply LoRA to it.

**Reasoning**:
Load the GPT-2 Medium model, apply the LoRA configuration, and print the number of trainable parameters.

In [None]:
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
import torch

# Set the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")


# Load the GPT-2 Medium model
print("Loading GPT-2 Medium model...")
gpt2_medium_model = AutoModelForCausalLM.from_pretrained("gpt2-medium")
gpt2_medium_model.to(device)
print("GPT-2 Medium model loaded.")

# Define LoRA configuration (using the same config as for DistilGPT-2 for consistency)
# We can adjust these parameters later if needed for performance on the larger model.
lora_config_medium = LoraConfig(
    r=8,  # LoRA attention dimension
    lora_alpha=16,  # Alpha parameter for LoRA scaling
    target_modules=["c_attn", "c_proj"],  # Modules to apply LoRA to
    lora_dropout=0.05,  # Dropout probability for LoRA layers
    bias="none",  # Bias type
    task_type="CAUSAL_LM",  # Task type for causal language modeling
)

# Apply LoRA to the GPT-2 Medium model
print("Applying LoRA to GPT-2 Medium model...")
lora_gpt2_medium_model = get_peft_model(gpt2_medium_model, lora_config_medium)
print("LoRA applied to GPT-2 Medium model.")

# Print the number of trainable parameters in the LoRA GPT-2 Medium model
print("\nLoRA GPT-2 Medium model trainable parameters:")
lora_gpt2_medium_tuned_params = lora_gpt2_medium_model.print_trainable_parameters()

Using device: cpu
Loading GPT-2 Medium model...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/718 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.52G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

GPT-2 Medium model loaded.
Applying LoRA to GPT-2 Medium model...
LoRA applied to GPT-2 Medium model.

LoRA GPT-2 Medium model trainable parameters:
trainable params: 2,162,688 || all params: 356,985,856 || trainable%: 0.6058
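The trainable-parameter counts reported by `print_trainable_parameters()` can be reproduced by hand. LoRA adds two low-rank matrices per targeted layer, with `r * (in_features + out_features)` parameters in total, and the `"c_proj"` target matches both the attention output projection and the MLP output projection in each block. The sketch below assumes the standard GPT-2 layout (MLP width of four times the hidden size); `lora_trainable_params` is a hypothetical helper.

```python
def lora_trainable_params(d_model, n_layers, r=8):
    """Count LoRA parameters for target_modules=["c_attn", "c_proj"] on GPT-2."""
    d_ff = 4 * d_model  # GPT-2 MLP hidden width
    # (in_features, out_features) of each targeted projection in one block
    targeted = [
        (d_model, 3 * d_model),  # attn.c_attn
        (d_model, d_model),      # attn.c_proj
        (d_ff, d_model),         # mlp.c_proj (also matched by "c_proj")
    ]
    per_block = sum(r * (d_in + d_out) for d_in, d_out in targeted)
    return n_layers * per_block


print(lora_trainable_params(768, 6))    # 405504  (DistilGPT-2)
print(lora_trainable_params(1024, 24))  # 2162688 (GPT-2 Medium)
```

Both values agree with the counts printed in this notebook, so with the same `r=8` the adapter size scales with hidden size and depth rather than with the vocabulary.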




## Implement QLoRA on GPT-2 Medium (Optional)

### Subtask:
Apply QLoRA to the GPT-2 Medium model.

**Reasoning**:
Load the GPT-2 Medium model (if not already loaded) and apply the QLoRA configuration. Print the number of trainable parameters.

In [None]:
# === Run the QLoRA code after restarting the runtime ===
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Select the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# 4-bit quantized loading
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False
)

# Free memory before loading
import gc
if device.type == 'cuda':
    torch.cuda.empty_cache()
gc.collect()

print("Loading GPT-2 Medium with 4-bit quantization...")
gpt2_medium_model = AutoModelForCausalLM.from_pretrained(
    "gpt2-medium",
    quantization_config=bnb_config,
    device_map="auto"
)
print("Quantized model loaded.")

# Prepare for QLoRA
gpt2_medium_model = prepare_model_for_kbit_training(gpt2_medium_model)
print("Model prepared for k-bit training.")


Using device: cpu
Loading GPT-2 Medium with 4-bit quantization...



Quantized model loaded.
Model prepared for k-bit training.
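As a rough guide to why 4-bit loading matters, the memory taken by the weights alone can be estimated from the parameter count and the bits per parameter. The sketch below is a back-of-the-envelope estimate only: it assumes every weight is stored at the stated precision, whereas in practice 4-bit loading mainly affects the linear projection weights, embeddings and LayerNorm stay in higher precision, and quantization constants add overhead. The base parameter count is derived from the LoRA printout for GPT-2 Medium earlier in this notebook; `weight_memory_mib` is a hypothetical helper.

```python
BITS_PER_PARAM = {"fp32": 32, "bf16": 16, "int8": 8, "nf4": 4}

# All params minus LoRA params, both taken from the GPT-2 Medium LoRA printout.
GPT2_MEDIUM_BASE_PARAMS = 356_985_856 - 2_162_688


def weight_memory_mib(num_params, dtype):
    """Approximate weight storage in MiB if every parameter used `dtype`."""
    return num_params * BITS_PER_PARAM[dtype] / 8 / 1024**2


for dtype in BITS_PER_PARAM:
    mib = weight_memory_mib(GPT2_MEDIUM_BASE_PARAMS, dtype)
    print(f"{dtype}: {mib:,.0f} MiB")
```

The estimate covers weights only; activations, gradients for the adapter weights, and optimizer state come on top and depend on batch size and sequence length.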


# Task Conclusion

We have successfully completed the following parts of the task:

*   Loading the dataset and model (DistilGPT-2).
*   Sampling and preprocessing the dataset for training.
*   Performing full fine-tuning on DistilGPT-2 and measuring training time, VRAM usage, and tuned parameters.
*   Implementing and training DistilGPT-2 with LoRA and measuring training time, VRAM usage, and tuned parameters.
*   Generating text samples from both the full fine-tuned and LoRA DistilGPT-2 models for qualitative comparison.

We were unable to complete the evaluation metrics calculation due to a runtime error during the evaluation step.

The optional experiments with GPT-2 Medium, including implementing LoRA and QLoRA, could not be completed due to environmental limitations (lack of a CUDA-enabled GPU required for `bitsandbytes`).

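Based on the completed DistilGPT-2 experiments, LoRA cut the number of trainable parameters from 81,912,576 to 405,504 (about 0.5% of the full model) and reduced training time from 2654.71 to 1845.09 seconds (about 30% less). Peak VRAM usage was not lower in this run: 3659.93 MB for LoRA versus 3284.71 MB for full fine-tuning. The LoRA run started with 1290.24 MB already allocated, so the two peak figures may not be directly comparable.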

This concludes the task based on the feasible experiments in this environment.

## Summary of DistilGPT-2 Experiment Results

Here is a table summarizing the key metrics from the DistilGPT-2 experiments:

| Metric                        | Full Fine-tuning   | LoRA        |
| :---------------------------- | :----------------- | :---------- |
| Training Time (seconds)       | 2654.71            | 1845.09     |
| Peak VRAM Usage (MB)          | 3284.71            | 3659.93     |
| Number of Tuned Parameters    | 81,912,576         | 405,504     |

*Note: Evaluation metrics (Perplexity and ROUGE) could not be computed due to a runtime error during the evaluation step.*