# Fine-tuning Gemma 3 1B for a PMB Chatbot
## Final Project (Tugas Akhir): Developing a New Student Admission (PMB) Chatbot

**Metode:** QLoRA (Quantized Low-Rank Adaptation)  
**Base Model:** Google Gemma 3 1B Instruct  
**Dataset:** PMB Universitas Sains Al-Qur'an  
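
For reference, LoRA keeps each pretrained weight matrix $W_0$ frozen and learns a low-rank update next to it. QLoRA additionally stores $W_0$ in 4-bit NF4 while the adapter matrices are trained in higher precision. In the usual LoRA notation, with rank $r$ and scaling factor $\alpha$ (the `r` and `lora_alpha` values in the configuration of section 2.1):

$$
h = W_0 x + \frac{\alpha}{r} B A x, \qquad W_0 \in \mathbb{R}^{d \times k},\; B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
$$

With $r = 16$ and $\alpha = 32$ as configured in this notebook, the update is scaled by $\alpha / r = 2$. $B$ is initialized to zero, so training starts from the unchanged base model.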

---

## 📋 Setup & Dependencies

In [None]:
%pip install pandas torch torchvision matplotlib seaborn pyyaml numpy scikit-learn transformers datasets trl peft bitsandbytes accelerate sentencepiece protobuf wordcloud

import os
import sys
import json
import torch
import yaml
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
from pathlib import Path

# Plot style for charts
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Default figure size and font size
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11

print("✅ Libraries imported successfully!")
print(f"📅 Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"🐍 Python: {sys.version.split()[0]}")
print(f"🔥 PyTorch: {torch.__version__}")
print(f"💻 CUDA Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"🎮 GPU: {torch.cuda.get_device_name(0)}")
    print(f"💾 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

## 📊 1. Dataset Analysis

### 1.1 Load Dataset

**Choose one of the two methods:**
- **Method A (next cell):** auto-split the augmented dataset file (recommended).
- **Method B (the cell after that):** load train/eval/test files that have already been split. Use this method if you already have separate files. That cell also defines `val_data`, which later cells use, so run it after Method A as well.

### 1.1a Auto-Split Dataset (Recommended)

**This method loads the main Q&A file (`dataset_v2.txt`), converts it to JSON, wraps each pair in Gemma-style turn markers, and automatically splits the result into train/eval/test sets using the configured ratios.**
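
One detail worth knowing about the split below: scikit-learn's `train_test_split` rounds a float `test_size` up (ceil) to a whole number of samples, and `1 - TRAIN_RATIO` evaluates to `0.30000000000000004` in floating point. With 100 Q&A pairs, for example, 31 samples leave the training set and it ends up with 69 instead of 70. A minimal sketch of one way around this, assuming you are fine computing integer split sizes yourself (`split_sizes` is a hypothetical helper, not part of the pipeline):

```python
import math

def split_sizes(n_samples, train_ratio=0.7, eval_ratio=0.15):
    """Hypothetical helper: integer (train, eval, test) sizes computed with round()."""
    n_train = round(n_samples * train_ratio)
    n_eval = round(n_samples * eval_ratio)
    n_test = n_samples - n_train - n_eval  # the remainder goes to the test set
    return n_train, n_eval, n_test

# The floating-point detail behind the off-by-one split
print(1 - 0.7)                     # 0.30000000000000004
print(math.ceil((1 - 0.7) * 100))  # 31 samples taken out of training instead of 30
print(split_sizes(100))            # (70, 15, 15)
```

Integer sizes are treated by `train_test_split` as exact sample counts, so they can be passed directly, e.g. `train_test_split(data, train_size=n_train, test_size=n_eval + n_test, random_state=42)`.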

In [None]:
import json
import os
from sklearn.model_selection import train_test_split
from huggingface_hub import login

# ============================================================================
# KONFIGURASI
# ============================================================================

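# Hugging Face token: prefer the HF_TOKEN environment variable over hard-coding it in the notebook
HF_TOKEN = os.environ.get("HF_TOKEN", "your_huggingface_token_here")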

# File paths
INPUT_FILE = "dataset_v2.txt"
JSON_FILE = "dataset_v2.json"
FORMATTED_FILE = "dataset_formatted.json"

# Output directories
DATA_DIR = "data"
TRAIN_FILE = os.path.join(DATA_DIR, "train_pmb.json")
EVAL_FILE = os.path.join(DATA_DIR, "eval_pmb.json")
TEST_FILE = os.path.join(DATA_DIR, "test_pmb.json")

# Split ratios
TRAIN_RATIO = 0.7   # 70%
EVAL_RATIO = 0.15   # 15%
TEST_RATIO = 0.15   # 15%
assert abs(TRAIN_RATIO + EVAL_RATIO + TEST_RATIO - 1.0) < 1e-9, "Split ratios must sum to 1"

# System prompt
SYSTEM_PROMPT = (
    "Anda adalah asisten virtual untuk Penerimaan Mahasiswa Baru (PMB) di "
    "Universitas Sains Al-Qur'an (UNSIQ) Wonosobo.\n"
    "Tugas Anda adalah memberikan informasi yang akurat, jelas, dan membantu calon mahasiswa "
    "dalam proses pendaftaran.\n"
    "Jawab pertanyaan dengan ramah, informatif, dan profesional."
)

METADATA = {
    "topic": "PMB UNSIQ",
    "subtopic": "Informasi Umum"
}


# ============================================================================
# STEP 1: CONVERT TXT TO JSON
# ============================================================================

def convert_txt_to_json():
    """Convert dataset_v2.txt (alternating 'Q:' and 'A:' lines) into a list of {question, answer} dicts saved as JSON."""
    print("=" * 60)
    print("STEP 1: Converting TXT to JSON")
    print("=" * 60)
    
    data = []
    entry = {}
    
    with open(INPUT_FILE, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
                
            if line.lower().startswith("q:"):
                entry = {"question": line[2:].strip()}
            elif line.lower().startswith("a:") and "question" in entry:
                # Only keep answers that follow a question. Lines starting with neither
                # "q:" nor "a:" are skipped, so multi-line answers keep only their first line.
                entry["answer"] = line[2:].strip()
                data.append(entry)
                entry = {}
    
    # Save to JSON
    with open(JSON_FILE, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
    
    print(f"✅ Conversion complete!")
    print(f"   Total Q&A pairs: {len(data)}")
    print(f"   Saved to: {JSON_FILE}\n")
    
    return data


# ============================================================================
# STEP 2: FORMAT DATASET
# ============================================================================

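def format_dataset(data):
    """Wrap each Q&A pair in Gemma-style <start_of_turn>/<end_of_turn> markers.

    Gemma's official chat template has no dedicated system turn, so the `system`
    turn used here is a custom convention; the inference prompt must reproduce
    this exact layout, including the newline after each role name.
    """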
    print("=" * 60)
    print("STEP 2: Formatting Dataset")
    print("=" * 60)
    
    formatted_data = []
    
    for item in data:
        question = item.get("question", "")
        answer = item.get("answer", "")
        
        formatted_item = {
            "text": (
                f"<start_of_turn>system\n{SYSTEM_PROMPT}<end_of_turn>\n"
                f"<start_of_turn>user\n{question}<end_of_turn>\n"
                f"<start_of_turn>model\n{answer}<end_of_turn>"
            ),
            "question": question,
            "answer": answer,
            "metadata": METADATA
        }
        formatted_data.append(formatted_item)
    
    # Save the formatted dataset
    with open(FORMATTED_FILE, "w", encoding="utf-8") as f:
        json.dump(formatted_data, f, ensure_ascii=False, indent=2)
    
    print(f"✅ Formatting complete!")
    print(f"   Total formatted samples: {len(formatted_data)}")
    print(f"   Saved to: {FORMATTED_FILE}\n")
    
    return formatted_data


# ============================================================================
# STEP 3: SPLIT DATASET
# ============================================================================

def split_dataset(data):
    """Split the dataset into train, eval, and test sets."""
    print("=" * 60)
    print("STEP 3: Splitting Dataset")
    print("=" * 60)
    
    total = len(data)
    print(f"Total data: {total} samples")
    print(f"Split ratio: Train={TRAIN_RATIO*100:.0f}%, Eval={EVAL_RATIO*100:.0f}%, Test={TEST_RATIO*100:.0f}%\n")
    
    # First split: train vs. temp (eval + test)
    train_data, temp_data = train_test_split(
        data,
        test_size=(1 - TRAIN_RATIO),
        random_state=42,
        shuffle=True
    )
    
    # Second split: divide temp into eval and test; eval_size is eval's share of temp (0.15 / 0.30 = 0.5)
    eval_size = EVAL_RATIO / (EVAL_RATIO + TEST_RATIO)
    eval_data, test_data = train_test_split(
        temp_data,
        test_size=(1 - eval_size),
        random_state=42,
        shuffle=True
    )
    
    # Create the output directory if it does not exist
    os.makedirs(DATA_DIR, exist_ok=True)
    
    # Save each split
    with open(TRAIN_FILE, "w", encoding="utf-8") as f:
        json.dump(train_data, f, ensure_ascii=False, indent=2)
    
    with open(EVAL_FILE, "w", encoding="utf-8") as f:
        json.dump(eval_data, f, ensure_ascii=False, indent=2)
    
    with open(TEST_FILE, "w", encoding="utf-8") as f:
        json.dump(test_data, f, ensure_ascii=False, indent=2)
    
    print(f"✅ Split complete!")
    print(f"   • Training:   {len(train_data):4d} samples ({len(train_data)/total*100:.1f}%)")
    print(f"   • Evaluation: {len(eval_data):4d} samples ({len(eval_data)/total*100:.1f}%)")
    print(f"   • Test:       {len(test_data):4d} samples ({len(test_data)/total*100:.1f}%)")
    print(f"\n💾 Saved files:")
    print(f"   • {TRAIN_FILE}")
    print(f"   • {EVAL_FILE}")
    print(f"   • {TEST_FILE}\n")
    
    return train_data, eval_data, test_data


# ============================================================================
# MAIN
# ============================================================================

print("\n" + "=" * 60)
print("DATASET PREPARATION PIPELINE")
print("=" * 60 + "\n")

# Log in to Hugging Face (Gemma is a gated model; the token is needed to download it later)
print("🔐 Logging in to Hugging Face...")
login(HF_TOKEN)
print("✅ Login successful!\n")

# Step 1: Convert TXT to JSON
json_data = convert_txt_to_json()

# Step 2: Format dataset
formatted_data = format_dataset(json_data)

# Step 3: Split dataset
train_data, eval_data, test_data = split_dataset(formatted_data)

# Summary
print("=" * 60)
print("🎉 ALL STEPS COMPLETE!")
print("=" * 60)
print(f"📊 Total data: {len(formatted_data)} samples")
print(f"✅ Training set:   {len(train_data)} samples")
print(f"✅ Evaluation set: {len(eval_data)} samples")
print(f"✅ Test set:       {len(test_data)} samples")
print("\nDataset ready for training! 🚀\n")

In [None]:
# Method B: load the pre-split files (also defines val_data, used by later cells)
# Load training data
with open('data/train_pmb.json', 'r', encoding='utf-8') as f:
    train_data = json.load(f)

# Load evaluation data
with open('data/eval_pmb.json', 'r', encoding='utf-8') as f:
    eval_data = json.load(f)

# Load test data
with open('data/test_pmb.json', 'r', encoding='utf-8') as f:
    test_data = json.load(f)

print(f"📊 Dataset Statistics:")
print(f"  Training samples:   {len(train_data)}")
print(f"  Evaluation samples: {len(eval_data)}")
print(f"  Test samples:       {len(test_data)}")
print(f"  Total samples:      {len(train_data) + len(eval_data) + len(test_data)}")

total = len(train_data) + len(eval_data) + len(test_data)
print(f"\n📈 Split Ratio:")
print(f"  Train: {len(train_data)/total*100:.1f}%")
print(f"  Eval:  {len(eval_data)/total*100:.1f}%")
print(f"  Test:  {len(test_data)/total*100:.1f}%")

# Alias kept for compatibility with later cells that use val_data
val_data = eval_data

print(f"\n✅ Dataset loaded successfully!")

### 1.2 Text Length Distribution
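
The statistics below count whitespace-separated words, whereas `max_length` in the training configuration (512) counts tokens. SentencePiece tokenizers usually produce more tokens than words, so word counts give an optimistic picture. A small sketch of how one might read a length distribution against a budget; the helper names and the sample lengths are made up for illustration:

```python
import math

def nearest_rank_percentile(lengths, q):
    """q-th percentile (0 < q <= 100) using the nearest-rank method."""
    ordered = sorted(lengths)
    rank = math.ceil(q / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]

def fraction_within(lengths, budget):
    """Share of samples whose length does not exceed the budget."""
    return sum(1 for n in lengths if n <= budget) / len(lengths)

lengths = [40, 55, 60, 72, 80, 95, 110, 130, 180, 420]  # made-up word counts
print(nearest_rank_percentile(lengths, 95))  # 420
print(fraction_within(lengths, 200))         # 0.9
```

On the real data, `fraction_within(train_lengths, 512)` gives the share of training samples whose word count fits the budget; a token-level check would count tokens with the tokenizer instead.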

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import os

# Text length analysis (whitespace-separated words, not tokens)
train_lengths = [len(item['text'].split()) for item in train_data]
val_lengths = [len(item['text'].split()) for item in val_data]

# Statistics
print("📏 Text Length Statistics (in words):")
print(f"\nTraining Set:")
print(f"  Mean:   {np.mean(train_lengths):.2f}")
print(f"  Median: {np.median(train_lengths):.2f}")
print(f"  Min:    {np.min(train_lengths)}")
print(f"  Max:    {np.max(train_lengths)}")
print(f"  Std:    {np.std(train_lengths):.2f}")

print(f"\nValidation Set:")
print(f"  Mean:   {np.mean(val_lengths):.2f}")
print(f"  Median: {np.median(val_lengths):.2f}")
print(f"  Min:    {np.min(val_lengths)}")
print(f"  Max:    {np.max(val_lengths)}")
print(f"  Std:    {np.std(val_lengths):.2f}")

# Create the output directory if it does not exist
output_dir = 'outputs/figures'
os.makedirs(output_dir, exist_ok=True)

# Visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Histogram
ax1.hist(train_lengths, bins=30, alpha=0.7, label='Training', color='#2ecc71', edgecolor='black')
ax1.hist(val_lengths, bins=30, alpha=0.7, label='Validation', color='#3498db', edgecolor='black')
ax1.set_xlabel('Word Count', fontsize=12, fontweight='bold')
ax1.set_ylabel('Frequency', fontsize=12, fontweight='bold')
ax1.set_title('PMB Dataset Text Length Distribution', fontsize=14, fontweight='bold')
ax1.legend(fontsize=11)
ax1.grid(True, alpha=0.3)

# Box plot (`tick_labels` needs Matplotlib >= 3.9; older versions use `labels=`)
bp = ax2.boxplot([train_lengths, val_lengths],
                 tick_labels=['Training', 'Validation'],
                 patch_artist=True,
                 boxprops=dict(facecolor='#3498db', alpha=0.7),
                 medianprops=dict(color='red', linewidth=2))
ax2.set_ylabel('Word Count', fontsize=12, fontweight='bold')
ax2.set_title('Text Length Box Plot', fontsize=14, fontweight='bold')
ax2.grid(True, alpha=0.3, axis='y')

plt.tight_layout()

# Save the figure
output_path = os.path.join(output_dir, 'dataset_distribution.png')
plt.savefig(output_path, dpi=300, bbox_inches='tight')
plt.show()



### 1.3 Sample Data
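
Besides reading a few samples, it can be worth checking that the same question does not appear in both the training and test splits: with an augmented dataset, repeated questions would make test scores look better than they are. A minimal sketch that only catches exact duplicates after lowercasing and collapsing whitespace (paraphrases would need a fuzzier comparison); `find_overlap` is a hypothetical helper that relies on the `question` field each formatted record carries:

```python
def normalize(text):
    """Lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())

def find_overlap(split_a, split_b, key="question"):
    """Return the sorted normalized values of `key` that occur in both splits."""
    seen_a = {normalize(item[key]) for item in split_a}
    seen_b = {normalize(item[key]) for item in split_b}
    return sorted(seen_a & seen_b)

# Toy records with the same shape as the formatted dataset (made-up questions)
train_split = [{"question": "When does  registration open?"}, {"question": "How much is tuition?"}]
test_split = [{"question": "when does registration open?"}, {"question": "Where is the campus?"}]
print(find_overlap(train_split, test_split))  # ['when does registration open?']
```

On the real splits this would be called as `find_overlap(train_data, test_data)`.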

In [None]:
# Show the first 3 training examples
print("📝 Training Data Examples:")
print("="*80)
for i, sample in enumerate(train_data[:3], 1):
    print(f"\nSample {i}:")
    print(f"{sample['text'][:200]}...")
    print("-"*80)

In [None]:
# Load config
with open('../configs/qlora_config.yaml', 'r') as f:
    config = yaml.safe_load(f)

# Display config
print("⚙️  Training Configuration:")
print("="*80)
print(f"\n📦 Model:")
print(f"  Base Model: {config['model_config']['model_name']}")
print(f"\n🔧 LoRA Config:")
print(f"  Rank (r): {config['qlora_config']['r']}")
print(f"  Alpha: {config['qlora_config']['lora_alpha']}")
print(f"  Dropout: {config['qlora_config']['lora_dropout']}")
print(f"\n🎓 Training Args:")
print(f"  Epochs: {config['training_args']['num_train_epochs']}")
print(f"  Batch Size: {config['training_args']['per_device_train_batch_size']}")
print(f"  Learning Rate: {config['training_args']['learning_rate']}")
print(f"  Gradient Accumulation: {config['training_args']['gradient_accumulation_steps']}")
print(f"  Effective Batch Size: {config['training_args']['per_device_train_batch_size'] * config['training_args']['gradient_accumulation_steps']}")

## 🚀 2. Training Model

### 2.1 Define and Save Configuration
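
The LoRA settings below determine how many parameters are actually trained: an adapted weight that maps `d_in` to `d_out` features gets `r * (d_in + d_out)` new parameters (A is r × d_in, B is d_out × r). A back-of-the-envelope sketch; the layer dimensions are hypothetical round numbers, not Gemma 3 1B's real shapes, and in grouped-query attention `k_proj`/`v_proj` are smaller than hidden × hidden, so this overestimates those two. For the real count, see the output of `model.print_trainable_parameters()` in section 2.2.

```python
def lora_trainable_params(layer_shapes, r):
    """Parameters added by LoRA: A is (r x d_in) and B is (d_out x r) per adapted layer."""
    return sum(r * (d_in + d_out) for d_in, d_out in layer_shapes)

# Hypothetical single transformer block: hidden size 1024, MLP size 4096
hidden, mlp = 1024, 4096
block_shapes = (
    [(hidden, hidden)] * 4    # q_proj, k_proj, v_proj, o_proj
    + [(hidden, mlp)] * 2     # gate_proj, up_proj
    + [(mlp, hidden)]         # down_proj
)
r, lora_alpha = 16, 32
print(lora_trainable_params(block_shapes, r))  # 376832 per block
print(lora_alpha / r)                          # 2.0, the scale applied to B @ A
```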

In [None]:
# ============================================================
# ⚙️ QLORA CONFIGURATION SETUP (NO TRAINING YET)
# ============================================================

import os, json

config = {
    # ========== MODEL CONFIGURATION ==========
    'model_config': {
        'model_name': 'google/gemma-3-1b-it',  # ← change this to use a different model
        'use_cache': False,
        'trust_remote_code': True,
        'torch_dtype': 'bfloat16',  # bfloat16 is natively supported on A100 GPUs
    },
    
    # ========== QUANTIZATION CONFIG (QLoRA) ==========
    'quantization_config': {
        'load_in_4bit': True,
        'bnb_4bit_compute_dtype': 'bfloat16',
        'bnb_4bit_quant_type': 'nf4',
        'bnb_4bit_use_double_quant': True,
    },
    
    # ========== LORA CONFIGURATION ==========
    'qlora_config': {
        'r': 16,                    # LoRA rank (8–64)
        'lora_alpha': 32,           # scaling factor (commonly 2× r)
        'lora_dropout': 0.05,
        'bias': 'none',
        'task_type': 'CAUSAL_LM',
        'target_modules': [
            'q_proj', 'k_proj', 'v_proj',
            'o_proj', 'gate_proj', 'up_proj', 'down_proj'
        ],
    },
    
    # ========== DATASET CONFIG ==========
    'dataset_config': {
        'train_file': 'data/train_pmb.json',   # ← training set path
        'eval_file': 'data/eval_pmb.json',      # ← evaluation set path
        'max_length': 512,
        'text_field': 'text',
    },
    
    # ========== TRAINING ARGUMENTS (A100 80GB) ==========
    'training_args': {
        'output_dir': '../outputs/gemma-pmb',
        'overwrite_output_dir': True,
        'num_train_epochs': 3,
        'per_device_train_batch_size': 8,
        'per_device_eval_batch_size': 8,
        'gradient_accumulation_steps': 4,
        'gradient_checkpointing': True,
        'optim': 'paged_adamw_8bit',
        'learning_rate': 2e-4,
        'weight_decay': 0.01,
        'max_grad_norm': 1.0,
        'lr_scheduler_type': 'cosine',
        'warmup_ratio': 0.03,
        'eval_strategy': 'epoch',
        'eval_steps': 100,             # only used when eval_strategy is 'steps'
        'save_strategy': 'epoch',
        'save_total_limit': 2,
        'load_best_model_at_end': True,
        'metric_for_best_model': 'eval_loss',
        'logging_strategy': 'steps',
        'logging_steps': 10,
        'report_to': 'none',
        'bf16': True,
        'bf16_full_eval': True,
        'dataloader_num_workers': 4,
        'group_by_length': True,
        'ddp_find_unused_parameters': False,
    }
}

# ============================================================
# 💾 SAVE CONFIG TO FILE
# ============================================================

os.makedirs('../configs', exist_ok=True)
config_file = '../configs/qlora_config.json'

with open(config_file, 'w', encoding='utf-8') as f:
    json.dump(config, f, indent=2, ensure_ascii=False)

# ============================================================
# 🧾 PRINT SUMMARY
# ============================================================

print("⚙️  QLORA CONFIGURATION - NVIDIA A100 80GB")
print("=" * 80)
print(f"\n📦 MODEL:\n  Model Name      : {config['model_config']['model_name']}")
print(f"  Precision       : {config['model_config']['torch_dtype']}")
print(f"  Quantization    : 4-bit NF4 (QLoRA)")

print(f"\n🔧 LORA CONFIG:")
print(f"  Rank (r)        : {config['qlora_config']['r']}")
print(f"  Alpha           : {config['qlora_config']['lora_alpha']}")
print(f"  Dropout         : {config['qlora_config']['lora_dropout']}")
print(f"  Target Modules  : {len(config['qlora_config']['target_modules'])} modules")

print(f"\n📊 DATASET:")
print(f"  Train File      : {config['dataset_config']['train_file']}")
print(f"  Eval File       : {config['dataset_config']['eval_file']}")
print(f"  Max Length      : {config['dataset_config']['max_length']} tokens")

print(f"\n🎓 TRAINING PARAMETERS:")
print(f"  Epochs          : {config['training_args']['num_train_epochs']}")
print(f"  Batch Size      : {config['training_args']['per_device_train_batch_size']}")
print(f"  Gradient Accum  : {config['training_args']['gradient_accumulation_steps']}")
print(f"  Effective Batch : {config['training_args']['per_device_train_batch_size'] * config['training_args']['gradient_accumulation_steps']}")
print(f"  Learning Rate   : {config['training_args']['learning_rate']}")
print(f"  Optimizer       : {config['training_args']['optim']}")

print(f"\n💾 MEMORY OPTIMIZATION:")
print("  4-bit Quantization      : ✅")
print("  Double Quantization     : ✅")
print("  Gradient Checkpointing  : ✅")
print("  Paged AdamW 8-bit       : ✅")

print(f"\n💾 Config saved to: {config_file}")
print("\n💡 TIPS:")
print("  • Change model: edit 'model_name' and re-run this cell")
print("  • For Gemma 3 4B: model_name = 'google/gemma-3-4b-it'")
print("  • Larger model? Reduce batch_size or increase gradient_accumulation_steps")
print("\n✅ Configuration ready! Continue to the next step to start training.")


### 2.2 Training Process
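
Before starting a run it helps to know roughly how many optimizer steps the configuration implies, since `warmup_ratio`, `logging_steps` and the learning-rate schedule all play out over that count. A rough sketch, assuming a single GPU and the `Trainer` convention of one optimizer step per `gradient_accumulation_steps` batches; the exact rounding differs slightly between `transformers` versions, and the training-set size of 700 is hypothetical:

```python
import math

def estimate_steps(n_train, batch_size, grad_accum, epochs, warmup_ratio):
    """Approximate (optimizer steps per epoch, total steps, warmup steps) on one device."""
    batches_per_epoch = math.ceil(n_train / batch_size)
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return steps_per_epoch, total_steps, warmup_steps

# Batch size, accumulation, epochs and warmup from the config; n_train=700 is hypothetical
print(estimate_steps(n_train=700, batch_size=8, grad_accum=4, epochs=3, warmup_ratio=0.03))
# (22, 66, 2)
```

With only a few dozen optimizer steps, `logging_steps=10` produces only a handful of training-loss points, which is worth keeping in mind when reading the training curves later in the notebook.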

In [None]:
# ============================================================
# TRAINING QLORA - FINAL VERSION
# ============================================================

import os
import json
import time
from datetime import datetime
import torch
from transformers import (
    AutoTokenizer, 
    AutoModelForCausalLM,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling,
    BitsAndBytesConfig
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from datasets import Dataset

os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = "1"

print("🚀 STARTING QLORA TRAINING")
print("=" * 80)

# ========== LOAD CONFIG ==========
with open("../configs/qlora_config.json", "r") as f:
    config = json.load(f)

# ========== 1. LOAD TOKENIZER ==========
print("\n📥 Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(
    config['model_config']['model_name'],
    trust_remote_code=config['model_config']['trust_remote_code']
)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

print(f"✅ Tokenizer loaded: {config['model_config']['model_name']}")
print(f"   Vocab size: {len(tokenizer)}")

# ========== 2. LOAD DATASET ==========
print("\n📊 Loading dataset...")
with open(config['dataset_config']['train_file'], 'r', encoding='utf-8') as f:
    train_data_raw = json.load(f)
with open(config['dataset_config']['eval_file'], 'r', encoding='utf-8') as f:
    eval_data_raw = json.load(f)

print(f"✅ Dataset loaded: {len(train_data_raw)} train, {len(eval_data_raw)} eval")

train_dataset = Dataset.from_list(train_data_raw)
eval_dataset = Dataset.from_list(eval_data_raw)

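def tokenize_function(examples):
    # No fixed-length padding here: DataCollatorForLanguageModeling pads each batch
    # dynamically, so short samples waste less compute and group_by_length can group
    # samples of similar length (with padding='max_length' every sample has the same length).
    return tokenizer(
        examples[config['dataset_config']['text_field']],
        truncation=True,
        max_length=config['dataset_config']['max_length'],
    )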

print("\n🔄 Tokenizing...")
tokenized_train = train_dataset.map(tokenize_function, batched=True, remove_columns=train_dataset.column_names)
tokenized_eval = eval_dataset.map(tokenize_function, batched=True, remove_columns=eval_dataset.column_names)
print("✅ Tokenization done")

# ========== 3. LOAD MODEL (4-BIT) ==========
print("\n📦 Loading model with 4-bit quantization...")

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True
)

model = AutoModelForCausalLM.from_pretrained(
    config['model_config']['model_name'],
    quantization_config=bnb_config,
    device_map='auto',
    trust_remote_code=True,
    use_cache=False
)

print(f"✅ Model loaded, Memory: {torch.cuda.memory_allocated(0) / 1e9:.2f} GB")

model = prepare_model_for_kbit_training(model)

# ========== 4. ADD LORA ==========
print("\n🔧 Adding LoRA adapters...")

peft_config = LoraConfig(
    r=config['qlora_config']['r'],
    lora_alpha=config['qlora_config']['lora_alpha'],
    lora_dropout=config['qlora_config']['lora_dropout'],
    bias=config['qlora_config']['bias'],
    task_type=config['qlora_config']['task_type'],
    target_modules=config['qlora_config']['target_modules']
)

model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

# ========== 5. TRAINER SETUP ==========
print("\n⚙️  Setting up trainer...")

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_dir = f"{config['training_args']['output_dir']}_{timestamp}"

# All values come from the config saved in section 2.1, so edits there take effect here
ta = config['training_args']
training_args = TrainingArguments(
    output_dir=output_dir,
    overwrite_output_dir=ta['overwrite_output_dir'],
    num_train_epochs=ta['num_train_epochs'],
    per_device_train_batch_size=ta['per_device_train_batch_size'],
    per_device_eval_batch_size=ta['per_device_eval_batch_size'],
    gradient_accumulation_steps=ta['gradient_accumulation_steps'],
    gradient_checkpointing=ta['gradient_checkpointing'],
    optim=ta['optim'],
    learning_rate=ta['learning_rate'],
    weight_decay=ta['weight_decay'],
    max_grad_norm=ta['max_grad_norm'],
    lr_scheduler_type=ta['lr_scheduler_type'],
    warmup_ratio=ta['warmup_ratio'],
    eval_strategy=ta['eval_strategy'],  # named `evaluation_strategy` in older transformers releases
    save_strategy=ta['save_strategy'],
    save_total_limit=ta['save_total_limit'],
    load_best_model_at_end=ta['load_best_model_at_end'],
    metric_for_best_model=ta['metric_for_best_model'],
    logging_strategy=ta['logging_strategy'],
    logging_steps=ta['logging_steps'],
    report_to=ta['report_to'],
    bf16=ta['bf16'],
    bf16_full_eval=ta['bf16_full_eval'],
    dataloader_num_workers=ta['dataloader_num_workers'],
    group_by_length=ta['group_by_length'],
)

data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    data_collator=data_collator,
)

print(f"✅ Trainer ready, Output: {output_dir}")

# ========== 6. TRAIN ==========
print("\n" + "=" * 80)
print("🎓 TRAINING START...")
print("=" * 80)

start_time = time.time()

try:
    train_result = trainer.train()
    duration = time.time() - start_time

    print("\n" + "=" * 80)
    print("✅ TRAINING DONE!")
    print("=" * 80)
    print(f"⏱️  Time: {duration/60:.2f} min")
    print(f"📉 Final loss: {train_result.metrics.get('train_loss', 0):.4f}")

    # Save model and metrics
    print("\n💾 Saving...")
    # Save the trainer state (writes trainer_state.json, read by the log analysis later)
    trainer.save_state()
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)

    with open(f"{output_dir}/training_metrics.json", 'w') as f:
        json.dump({
            'train_loss': float(train_result.metrics.get('train_loss', 0)),
            'duration_minutes': duration / 60,
            'config': config
        }, f, indent=2)

    print(f"✅ Saved to: {output_dir}")

except Exception as e:
    print(f"\n❌ Error: {e}")
    raise


### 2.3 Merge Model
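
`merge_and_unload()` folds each LoRA update back into the weight it modifies, so each adapted weight becomes W0 + (alpha / r) · B · A and the adapter modules are removed. The merged model should therefore give the same outputs as base plus adapter, up to rounding (here the base is reloaded in bf16 rather than 4-bit). A toy check of that identity with plain Python lists; the matrices and helper functions are made up for illustration and are not PEFT's implementation:

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def add_scaled(X, Y, scale):
    """Element-wise X + scale * Y."""
    return [[x + scale * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

# Toy shapes: W0 is 2x3 (d x k), B is 2x1 (d x r), A is 1x3 (r x k), so r = 1
W0 = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
B = [[0.5], [-1.0]]
A = [[2.0, 1.0, 0.0]]
r, alpha = 1, 2
scale = alpha / r
x = [[1.0], [2.0], [3.0]]  # input as a column vector (k x 1)

# Adapter path: W0 x + scale * B (A x)
unmerged = add_scaled(matmul(W0, x), matmul(B, matmul(A, x)), scale)
# Merged path: (W0 + scale * B A) x
merged = matmul(add_scaled(W0, matmul(B, A), scale), x)
print(unmerged, merged)  # both [[11.0], [-9.0]]
```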

In [None]:
# ============================================================
# 🧩 MERGE THE QLORA ADAPTER INTO THE BASE MODEL
# ============================================================

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# --- Paths ---
base_model_name = "google/gemma-3-1b-it"                     # base model (must match the one used in training)
adapter_path = "../outputs/gemma-pmb_20251030_180329"        # LoRA training output: use your own timestamped folder from section 2.2
merged_model_path = "../outputs/gemma-pmb_merged_final"      # merged output

# --- 1️⃣ Load base model (in bf16, not 4-bit, so the adapter is merged into unquantized weights) & tokenizer ---
print(f"📦 Loading base model: {base_model_name}")
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

print(f"📥 Loading tokenizer from base model...")
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# --- 2️⃣ Load the adapter & merge it into the base model ---
print("\n🔄 Loading LoRA adapters and merging...")
model = PeftModel.from_pretrained(base_model, adapter_path)
torch.cuda.empty_cache()
model = model.merge_and_unload()

# --- 3️⃣ Save the merged model ---
print("\n💾 Saving merged model...")
model.save_pretrained(merged_model_path, safe_serialization=True)
tokenizer.save_pretrained(merged_model_path)

print(f"\n✅ Merge complete!")
print(f"💾 Final merged model saved to: {merged_model_path}")
print(f"📏 Vocab size: {model.config.vocab_size}")


### 2.4 Inference Test
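
A fine-tuned model behaves most predictably when the inference prompt is exactly the beginning of a training example: same turn markers, same role names, same newlines, up to the point where the answer starts. A small self-check, assuming the turn layout produced by `format_dataset()` in section 1.1a; both builder functions are hypothetical copies written for this test, and the system prompt and Q&A pair are placeholders:

```python
def build_training_text(system_prompt, question, answer):
    """Same layout as format_dataset(): system, user and model turns."""
    return (
        f"<start_of_turn>system\n{system_prompt}<end_of_turn>\n"
        f"<start_of_turn>user\n{question}<end_of_turn>\n"
        f"<start_of_turn>model\n{answer}<end_of_turn>"
    )

def build_inference_prompt(system_prompt, question):
    """Everything up to the first token of the answer."""
    return (
        f"<start_of_turn>system\n{system_prompt}<end_of_turn>\n"
        f"<start_of_turn>user\n{question}<end_of_turn>\n"
        f"<start_of_turn>model\n"
    )

system = "You are the PMB assistant."  # placeholder, not the real system prompt
question, answer = "When does registration open?", "(answer text)"  # placeholder pair

training_text = build_training_text(system, question, answer)
prompt = build_inference_prompt(system, question)
print(training_text.startswith(prompt))  # True

# A prompt that puts a space instead of a newline after the role name fails the check
bad_prompt = prompt.replace("user\n", "user ")
print(training_text.startswith(bad_prompt))  # False
```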

In [None]:
# ============================================================
# 🤖 INFERENCE (Prompt Engineered) - GEMMA 3 1B QLORA (MERGED)
# ============================================================

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# --- Path to the merged model ---
model_dir = "../outputs/gemma-pmb_merged_final"

# --- 1️⃣ Load model & tokenizer ---
print(f"📦 Loading merged model from: {model_dir}")
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# --- 2️⃣ Prompt template: must match the training format from format_dataset() exactly ---
SYSTEM_PROMPT = (
    "Anda adalah asisten virtual untuk Penerimaan Mahasiswa Baru (PMB) di "
    "Universitas Sains Al-Qur'an (UNSIQ) Wonosobo.\n"
    "Tugas Anda adalah memberikan informasi yang akurat, jelas, dan membantu calon mahasiswa "
    "dalam proses pendaftaran.\n"
    "Jawab pertanyaan dengan ramah, informatif, dan profesional."
)

def build_prompt(user_question: str) -> str:
    # Same turn markers and newlines as the training text, ending where the answer starts
    return (
        f"<start_of_turn>system\n{SYSTEM_PROMPT}<end_of_turn>\n"
        f"<start_of_turn>user\n{user_question}<end_of_turn>\n"
        f"<start_of_turn>model\n"
    )

# --- 3️⃣ Enter a question ---
user_question = "fasilitas apa saja yang ada di unsiq?"  # "What facilities are available at UNSIQ?"
prompt = build_prompt(user_question)

# --- 4️⃣ Tokenisasi & generate ---
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

print("\n⚙️ Generating output...")
with torch.inference_mode():
    output_tokens = model.generate(
        **inputs,
        max_new_tokens=300,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

# --- 5️⃣ Decode only the newly generated tokens (the prompt is skipped) ---
prompt_length = inputs["input_ids"].shape[1]
result = tokenizer.decode(output_tokens[0][prompt_length:], skip_special_tokens=False)

# Keep only the text before the first end-of-turn marker and drop a trailing <eos>
if "<end_of_turn>" in result:
    result = result.split("<end_of_turn>")[0]
result = result.replace("<eos>", "")

print("\n🧠 Model Response:")
print("=" * 80)
print(result.strip())


## 📈 3. Training Analysis

### 3.1 Training Log Summary
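
`trainer_state.json` stores `log_history` as a list of dicts: training entries carry `loss`, evaluation entries carry `eval_loss`, and both carry `step`; the final summary entry uses `train_loss` instead of `loss`, so filtering on `loss` skips it. A pandas-free sketch of the same summary that also reports perplexity, exp(eval loss), a common way to express causal-LM loss. The toy log is made up, not taken from a real run:

```python
import math

def summarize_log_history(log_history):
    """Initial/final/best training loss, improvement %, and best eval loss/perplexity."""
    train = [e["loss"] for e in log_history if "loss" in e]
    evals = [e["eval_loss"] for e in log_history if "eval_loss" in e]
    summary = {}
    if train:
        summary["train_initial"] = train[0]
        summary["train_final"] = train[-1]
        summary["train_best"] = min(train)
        summary["improvement_pct"] = (1 - train[-1] / train[0]) * 100
    if evals:
        summary["eval_best"] = min(evals)
        summary["eval_best_perplexity"] = math.exp(min(evals))
    return summary

toy_log = [
    {"step": 10, "loss": 2.0, "learning_rate": 2e-4},
    {"step": 20, "loss": 1.0, "learning_rate": 1e-4},
    {"step": 22, "eval_loss": 0.5},
    {"step": 22, "train_loss": 1.5},  # final summary entry, ignored by the "loss" filter
]
print(summarize_log_history(toy_log))
```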

In [None]:
# ============================================================
# ANALISIS TRAINING LOGS
# ============================================================

import glob

# Find the most recent training run: timestamped output folders that contain trainer_state.json
training_dirs = glob.glob('../outputs/gemma-pmb_*')
training_dirs = sorted(
    [d for d in training_dirs if os.path.isfile(os.path.join(d, 'trainer_state.json'))],
    key=os.path.getmtime,
    reverse=True,
)

if training_dirs:
    latest_dir = training_dirs[0]
    print(f"📁 Latest training: {os.path.basename(latest_dir)}")
    
    # Load trainer_state.json
    state_file = os.path.join(latest_dir, 'trainer_state.json')
    
    if os.path.exists(state_file):
        with open(state_file, 'r') as f:
            trainer_state = json.load(f)
        
        # Extract log history
        log_history = trainer_state.get('log_history', [])
        
        if log_history:
            df_logs = pd.DataFrame(log_history)
            
            print(f"\n📊 Training Log Summary:")
            print("=" * 80)
            print(f"Total log entries: {len(df_logs)}")
            print(f"\nColumns: {', '.join(df_logs.columns.tolist())}")
            
            # Show first few entries
            print(f"\n📋 First entries:")
            print(df_logs.head(10).to_string())
            
            # Training loss stats
            train_logs = df_logs[df_logs['loss'].notna()]
            if not train_logs.empty:
                print(f"\n📉 Training Loss:")
                print(f"  Initial: {train_logs['loss'].iloc[0]:.4f}")
                print(f"  Final: {train_logs['loss'].iloc[-1]:.4f}")
                print(f"  Best: {train_logs['loss'].min():.4f}")
                print(f"  Improvement: {(1 - train_logs['loss'].iloc[-1]/train_logs['loss'].iloc[0])*100:.2f}%")
            
            # Eval loss stats
            eval_logs = df_logs[df_logs['eval_loss'].notna()]
            if not eval_logs.empty:
                print(f"\n📊 Evaluation Loss:")
                print(f"  Initial: {eval_logs['eval_loss'].iloc[0]:.4f}")
                print(f"  Final: {eval_logs['eval_loss'].iloc[-1]:.4f}")
                print(f"  Best: {eval_logs['eval_loss'].min():.4f}")
            
            print("\n✅ Log analysis complete")
        else:
            print("⚠️  No log history found")
    else:
        print(f"⚠️  trainer_state.json not found in {latest_dir}")
else:
    print("⚠️  No training directories found")
    print("    Run training first (section 2.2)")

### 3.2 Training Curves
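
With `logging_steps=10` the training-loss curve can be noisy. One optional way to make the trend easier to read is an exponential moving average; this is an addition, not something the plots below do, and the default weight of 0.6 is an arbitrary choice:

```python
def ema(values, weight=0.6):
    """Exponential moving average: each point mixes the previous smoothed value and the new raw value."""
    smoothed = []
    for v in values:
        prev = smoothed[-1] if smoothed else v  # the first point is kept as-is
        smoothed.append(weight * prev + (1 - weight) * v)
    return smoothed

raw_loss = [2.0, 1.0, 1.0, 0.0]    # made-up values
print(ema(raw_loss, weight=0.5))   # [2.0, 1.5, 1.25, 0.625]
```

To overlay it on the training-loss panel, one could plot `ema(train_logs['loss'].tolist())` against `train_logs['step']`.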

In [None]:
import os, json, glob
import pandas as pd

# Find the most recent training output folder (one that contains trainer_state.json)
training_dirs = sorted(
    [d for d in glob.glob('../outputs/gemma-pmb_*')
     if os.path.isfile(os.path.join(d, 'trainer_state.json'))],
    key=os.path.getmtime,
    reverse=True
)

if training_dirs:
    latest_dir = training_dirs[0]
    print(f"📁 Latest training dir: {latest_dir}")

    state_file = os.path.join(latest_dir, 'trainer_state.json')

    if os.path.exists(state_file):
        with open(state_file, 'r') as f:
            trainer_state = json.load(f)
        log_history = trainer_state.get('log_history', [])
        if log_history:
            df_logs = pd.DataFrame(log_history)
            print(f"✅ Loaded {len(df_logs)} log entries.")
        else:
            print("⚠️ No log_history found in trainer_state.json")
    else:
        print(f"⚠️ trainer_state.json not found in {latest_dir}")
else:
    print("⚠️ No training directories found")


In [None]:
# Plot training curves
if 'df_logs' in locals() and not df_logs.empty:
    
    # Keep only rows that contain a training / evaluation loss value
    train_logs = df_logs[df_logs['loss'].notna()].copy()
    eval_logs = df_logs[df_logs['eval_loss'].notna()].copy()
    
    # Create figure
    fig, axes = plt.subplots(2, 2, figsize=(16, 10))
    
    # 1. Training Loss
    if not train_logs.empty:
        axes[0, 0].plot(train_logs['step'], train_logs['loss'], 
                       marker='o', linewidth=2, markersize=4, 
                       color='#e74c3c', label='Training Loss')
        axes[0, 0].set_xlabel('Steps', fontsize=12, fontweight='bold')
        axes[0, 0].set_ylabel('Loss', fontsize=12, fontweight='bold')
        axes[0, 0].set_title('Training Loss Curve', fontsize=14, fontweight='bold')
        axes[0, 0].grid(True, alpha=0.3)
        axes[0, 0].legend(fontsize=11)
    
    # 2. Validation Loss
    if not eval_logs.empty:
        axes[0, 1].plot(eval_logs['step'], eval_logs['eval_loss'], 
                       marker='s', linewidth=2, markersize=6,
                       color='#3498db', label='Validation Loss')
        axes[0, 1].set_xlabel('Steps', fontsize=12, fontweight='bold')
        axes[0, 1].set_ylabel('Loss', fontsize=12, fontweight='bold')
        axes[0, 1].set_title('Validation Loss Curve', fontsize=14, fontweight='bold')
        axes[0, 1].grid(True, alpha=0.3)
        axes[0, 1].legend(fontsize=11)
    
    # 3. Train vs Validation Loss (combined)
    if not train_logs.empty:
        axes[1, 0].plot(train_logs['step'], train_logs['loss'], 
                       marker='o', linewidth=2, markersize=4,
                       color='#e74c3c', label='Training Loss', alpha=0.7)
    if not eval_logs.empty:
        axes[1, 0].plot(eval_logs['step'], eval_logs['eval_loss'], 
                       marker='s', linewidth=2, markersize=6,
                       color='#3498db', label='Validation Loss', alpha=0.7)
    axes[1, 0].set_xlabel('Steps', fontsize=12, fontweight='bold')
    axes[1, 0].set_ylabel('Loss', fontsize=12, fontweight='bold')
    axes[1, 0].set_title('Training vs Validation Loss', fontsize=14, fontweight='bold')
    axes[1, 0].grid(True, alpha=0.3)
    axes[1, 0].legend(fontsize=11)
    
    # 4. Learning Rate Schedule
    lr_logs = df_logs[df_logs['learning_rate'].notna()].copy() if 'learning_rate' in df_logs.columns else df_logs.iloc[0:0]
    if not lr_logs.empty:
        axes[1, 1].plot(lr_logs['step'], lr_logs['learning_rate'], 
                       marker='o', linewidth=2, markersize=4,
                       color='#9b59b6', label='Learning Rate')
        axes[1, 1].set_xlabel('Steps', fontsize=12, fontweight='bold')
        axes[1, 1].set_ylabel('Learning Rate', fontsize=12, fontweight='bold')
        axes[1, 1].set_title('Learning Rate Schedule', fontsize=14, fontweight='bold')
        axes[1, 1].grid(True, alpha=0.3)
        axes[1, 1].ticklabel_format(style='scientific', axis='y', scilimits=(0,0))
        axes[1, 1].legend(fontsize=11)
    
    plt.tight_layout()
    os.makedirs('../outputs/figures', exist_ok=True)  # savefig raises if the folder is missing
    plt.savefig('../outputs/figures/training_curves.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    print("\n✅ Figure saved: outputs/figures/training_curves.png")
    
    # Print summary statistics
    print("\n📊 Training Summary:")
    print("="*80)
    if not train_logs.empty:
        print(f"\nTraining Loss:")
        print(f"  Initial: {train_logs['loss'].iloc[0]:.4f}")
        print(f"  Final: {train_logs['loss'].iloc[-1]:.4f}")
        print(f"  Best: {train_logs['loss'].min():.4f}")
        print(f"  Improvement: {(1 - train_logs['loss'].iloc[-1]/train_logs['loss'].iloc[0])*100:.2f}%")
    
    if not eval_logs.empty:
        print(f"\nValidation Loss:")
        print(f"  Initial: {eval_logs['eval_loss'].iloc[0]:.4f}")
        print(f"  Final: {eval_logs['eval_loss'].iloc[-1]:.4f}")
        print(f"  Best: {eval_logs['eval_loss'].min():.4f}")
        print(f"  Improvement: {(1 - eval_logs['eval_loss'].iloc[-1]/eval_logs['eval_loss'].iloc[0])*100:.2f}%")

else:
    print("⚠️  No training logs available for visualization")
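The same numbers the plotting cell prints can be computed straight from `log_history` with the stdlib. Checking whether the validation loss rises after its minimum adds an overfitting signal that the plots only show visually. This is a sketch that assumes entries shaped like the Hugging Face Trainer's `log_history` (keys `step`, `loss`, `eval_loss`). The toy numbers are made up and do not come from this run.

```python
from typing import Dict, List

def summarize_log_history(log_history: List[Dict]) -> Dict[str, float]:
    """Summarize Trainer-style logs: train entries carry 'loss', eval entries 'eval_loss'."""
    train = [(e["step"], e["loss"]) for e in log_history if "loss" in e]
    evals = [(e["step"], e["eval_loss"]) for e in log_history if "eval_loss" in e]
    summary: Dict[str, float] = {}
    if train:
        first, last = train[0][1], train[-1][1]
        summary["train_initial"] = first
        summary["train_final"] = last
        # Same formula as the printed "Improvement": 1 - final/initial, in percent
        summary["train_improvement_pct"] = (1 - last / first) * 100
    if evals:
        best_step, best_loss = min(evals, key=lambda step_loss: step_loss[1])
        summary["eval_best"] = best_loss
        summary["eval_best_step"] = best_step
        # A positive gap means eval loss rose after its minimum (possible overfitting)
        summary["eval_final_minus_best"] = evals[-1][1] - best_loss
    return summary

# Toy log_history with made-up values, shaped like trainer_state.json
history = [
    {"step": 10, "loss": 2.0, "learning_rate": 2e-4},
    {"step": 20, "loss": 1.5, "learning_rate": 1.5e-4},
    {"step": 20, "eval_loss": 1.6},
    {"step": 30, "loss": 1.0, "learning_rate": 1e-4},
    {"step": 30, "eval_loss": 1.7},
]
print(summarize_log_history(history))
```

With the real file, `trainer_state['log_history']` from the loading cell would replace `history`.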

### 3.2 Per-Category Analysis

In [None]:
"""
🧪 Quick evaluation script for the Gemma3-PMB model (25 varied questions)
Intended for the fine-tuned model loaded with Transformers (not Ollama)
Uses prompts that follow the dataset's turn structure (<start_of_turn>system / user / model)
"""

import os
import json
import time
from datetime import datetime
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# ============================================================
# ⚙️ CONFIGURATION
# ============================================================
MODEL_PATH = "../outputs/gemma-pmb_merged_final"
OUTPUT_DIR = "../outputs"

os.makedirs(OUTPUT_DIR, exist_ok=True)

# ============================================================
# 💬 PROMPT TEMPLATE
# ============================================================
SYSTEM_PROMPT = (
    "<start_of_turn>system "
    "Anda adalah asisten virtual untuk Penerimaan Mahasiswa Baru (PMB) di Universitas Sains Al-Qur'an (UNSIQ) Wonosobo. "
    "Tugas Anda adalah memberikan informasi yang akurat, jelas, dan membantu calon mahasiswa dalam proses pendaftaran. "
    "Jawab pertanyaan dengan ramah, informatif, dan profesional."
    "<end_of_turn>"
)

def build_prompt(user_question: str):
    """Build a prompt that follows the dataset's turn structure"""
    return (
        f"{SYSTEM_PROMPT}\n"
        f"<start_of_turn>user {user_question}<end_of_turn>\n"
        f"<start_of_turn>model "
    )

# ============================================================
# ❓ 25 Test Questions (5 categories)
# ============================================================
test_questions = [
    # PMB definition (6 questions: indices 0-5)
    "Apa itu PMB?", "PMB itu apa sih?", "Penjelasan tentang penerimaan mahasiswa baru",
    "Definisi PMB dong", "Apa kepanjangan PMB?", "Jelaskan apa yang dimaksud dengan PMB",

    # Registration requirements (4 questions: indices 6-9)
    "Syarat daftar PMB apa aja?", "Apa syarat pendaftaran mahasiswa baru?",
    "Dokumen apa aja buat daftar kuliah?", "Syarat administratif PMB dong",

    # How to register (5 questions: indices 10-14)
    "Gimana cara daftar kuliah?", "Cara mendaftar PMB bagaimana?",
    "Langkah-langkah daftar PMB", "Tahapan pendaftaran PMB apa aja?",
    "Cara registrasi PMB online",

    # Fees (5 questions: indices 15-19)
    "Biaya daftar PMB berapa?", "Berapa biaya pendaftaran kuliah?",
    "Biaya masuk kuliah berapa?", "Biaya administrasi PMB", "Ada biaya apa aja di PMB?",

    # Schedule (5 questions: indices 20-24)
    "Kapan jadwal PMB?", "PMB dibuka kapan?", "Deadline pendaftaran PMB",
    "Kapan terakhir daftar PMB?", "Timeline PMB gimana?"
]

# ============================================================
# 1️⃣ Load Model & Tokenizer
# ============================================================
print(f"📦 Loading merged model from: {MODEL_PATH}")
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# ============================================================
# 2️⃣ Run Evaluation
# ============================================================
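results = []
total_duration = 0
# A run counts as a success when generation finishes without raising an exception;
# this does not judge whether the answer itself is correct
success_count = 0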

print(f"\n🚀 Starting evaluation at {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("=" * 80)

for i, question in enumerate(test_questions, start=1):
    start_time = time.time()
    prompt = build_prompt(question)

    try:
        # Tokenize and run inference
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        with torch.inference_mode():
            output_tokens = model.generate(
                **inputs,
                max_new_tokens=200,
                temperature=0.7,
                top_p=0.9,
                repetition_penalty=1.1,
                do_sample=True,
                pad_token_id=tokenizer.eos_token_id
            )

        # Decode only the newly generated tokens (the prompt is sliced off first),
        # then keep the text up to the first end-of-turn marker and drop stray special tokens
        prompt_len = inputs["input_ids"].shape[-1]
        answer = tokenizer.decode(output_tokens[0][prompt_len:], skip_special_tokens=False)
        answer = answer.split("<end_of_turn>")[0]
        answer = answer.replace("<eos>", "").replace("<pad>", "").strip()

        duration = time.time() - start_time
        total_duration += duration
        success_count += 1

        # Show progress (append "..." only when the answer is actually truncated)
        print(f"\n[{i:02d}] Q: {question}")
        print(f"A: {answer[:200]}{'...' if len(answer) > 200 else ''}")
        print(f"⏱️ {duration:.2f}s")

        results.append({
            "index": i,
            "question": question,
            "answer": answer,
            "duration": duration,
            "success": True
        })

    except Exception as e:
        print(f"❌ Error on question {i}: {e}")
        results.append({
            "index": i,
            "question": question,
            "answer": None,
            "duration": 0,
            "success": False,
            "error": str(e)
        })

# ============================================================
# 3️⃣ Save Evaluation Results
# ============================================================
avg_duration = total_duration / max(success_count, 1)
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_file = f"{OUTPUT_DIR}/test_results_{timestamp}.json"

with open(output_file, "w", encoding="utf-8") as f:
    json.dump({
        "timestamp": datetime.now().isoformat(),
        "total_questions": len(test_questions),
        "success_count": success_count,
        "fail_count": len(test_questions) - success_count,
        "total_duration": total_duration,
        "avg_duration": avg_duration,
        "details": results
    }, f, indent=2, ensure_ascii=False)

# ============================================================
# 4️⃣ Summary
# ============================================================
print("\n" + "="*80)
print(f"✅ TEST FINISHED! Saved to: {output_file}")
print(f"📊 Total questions : {len(test_questions)}")
print(f"✅ Succeeded       : {success_count}")
print(f"❌ Failed          : {len(test_questions) - success_count}")
print(f"⏱️ Average time    : {avg_duration:.2f} s")
print("="*80)
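The evaluation cell cuts the model's reply out of the decoded text by string splitting. Moving that logic into a pure function makes it testable without loading the model. This is a sketch that assumes the notebook's own prompt layout from `build_prompt()` above. Gemma's stock chat template puts the role name on its own line and folds system text into the first user turn, so this parser fits only the custom layout. The function names are hypothetical.

```python
def build_prompt(system_text: str, user_question: str) -> str:
    """Same turn layout as build_prompt() in the evaluation cell, with the system text passed in."""
    return (
        f"<start_of_turn>system {system_text}<end_of_turn>\n"
        f"<start_of_turn>user {user_question}<end_of_turn>\n"
        f"<start_of_turn>model "
    )

def extract_model_turn(decoded: str) -> str:
    """Return only the model's reply from a decoded sequence that may still contain the prompt."""
    # Take the text after the last model-turn opener (the prompt ends with one)
    reply = decoded.rsplit("<start_of_turn>model", 1)[-1]
    # Stop at the first end-of-turn marker; anything after it is not part of this reply
    reply = reply.split("<end_of_turn>", 1)[0]
    # Drop stray special tokens left by decoding with skip_special_tokens=False
    for token in ("<eos>", "<pad>", "<bos>"):
        reply = reply.replace(token, "")
    return reply.strip()

prompt = build_prompt("Anda adalah asisten PMB.", "Apa itu PMB?")
decoded = "<bos>" + prompt + "PMB adalah Penerimaan Mahasiswa Baru.<end_of_turn>\n<eos>"
print(extract_model_turn(decoded))
```

Because it uses `rsplit`, the same function also works when only the newly generated tokens are decoded, since no model-turn opener is present then.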


In [None]:
# ============================================================
# 📊 PER-CATEGORY ANALYSIS
# ============================================================

import json, os
import numpy as np
import pandas as pd

# Path to the test-results file to analyse; point it at the JSON written by the evaluation cell above
latest_test_file = "../outputs/test_results_20251030_185454.json"

# Load the test-results JSON
with open(latest_test_file, "r", encoding="utf-8") as f:
    test_data = json.load(f)

# ✅ If the file holds a list of runs, use the first one
if isinstance(test_data, list):
    test_data = test_data[0]

# ✅ Make sure the 'details' key is present
if "details" not in test_data:
    raise KeyError("The test-results file has no 'details' key. Make sure the evaluation script saved its results in the expected format.")

# ============================================================
# 🔹 Split results by question category
# ============================================================

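# Slice bounds follow the order and size of each group in test_questions
# (definition has 6 questions, requirements 4, the other groups 5 each)
categories = {
    "Definisi PMB": test_data["details"][0:6],
    "Syarat Pendaftaran": test_data["details"][6:10],
    "Cara Pendaftaran": test_data["details"][10:15],
    "Biaya": test_data["details"][15:20],
    "Jadwal": test_data["details"][20:25]
}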

category_stats = []
for cat_name, cat_results in categories.items():
    success = sum(1 for r in cat_results if r.get("success"))
    durations = [r.get("duration", 0) for r in cat_results if r.get("success")]
    avg_duration = np.mean(durations) if durations else 0

    category_stats.append({
        "Kategori": cat_name,
        "Berhasil": success,
        "Total": len(cat_results),
        "Success Rate (%)": round((success / len(cat_results)) * 100, 2) if cat_results else 0,
        "Avg Duration (s)": round(avg_duration, 3)
    })

# ============================================================
# 📊 Show results
# ============================================================
df_categories = pd.DataFrame(category_stats)

print("\n📊 Performance by Category:")
print("=" * 80)
print(df_categories.to_string(index=False))

# ============================================================
# 💾 Save to CSV
# ============================================================
os.makedirs("../outputs", exist_ok=True)
output_csv = "../outputs/category_performance.csv"
df_categories.to_csv(output_csv, index=False)

print(f"\n✅ Data saved: {output_csv}")
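Hard-coded slice bounds silently assign questions to the wrong category whenever a group's size changes. In this question list the definition group has six questions, not five. The minimal sketch below derives the bounds from per-category counts instead. The counts are read from `test_questions` above, and the helper name is hypothetical. Storing the category next to each question in the results JSON would remove the slicing altogether, but the current script does not do that.

```python
from itertools import accumulate
from typing import Dict, List, Tuple

def category_slices(counts: List[Tuple[str, int]]) -> Dict[str, slice]:
    """Turn ordered (category, number_of_questions) pairs into slices over the results list."""
    ends = list(accumulate(n for _, n in counts))
    starts = [0] + ends[:-1]
    return {name: slice(s, e) for (name, _), s, e in zip(counts, starts, ends)}

# Counts taken from the order of test_questions in the evaluation cell
COUNTS = [("Definisi PMB", 6), ("Syarat Pendaftaran", 4), ("Cara Pendaftaran", 5),
          ("Biaya", 5), ("Jadwal", 5)]
slices = category_slices(COUNTS)
for name, sl in slices.items():
    print(f"{name:20s} -> details[{sl.start}:{sl.stop}]")
```

In the analysis cell, `categories = {name: test_data["details"][sl] for name, sl in slices.items()}` would then replace the literal bounds.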


### 3.3 Performance Visualization

In [None]:
# ============================================================
# 📈 MODEL EVALUATION VISUALIZATION
# ============================================================

import os
import numpy as np
import matplotlib.pyplot as plt

# Make sure the folder for saved figures exists
os.makedirs("../outputs/figures", exist_ok=True)

fig, axes = plt.subplots(2, 2, figsize=(16, 10))

# ============================================================
# 1️⃣ Success Rate per Category
# ============================================================
colors = ['#2ecc71', '#3498db', '#9b59b6', '#e74c3c', '#f39c12']
axes[0, 0].bar(df_categories['Kategori'], df_categories['Success Rate (%)'],
               color=colors, edgecolor='black', linewidth=1.5)
axes[0, 0].set_ylabel('Success Rate (%)', fontsize=12, fontweight='bold')
axes[0, 0].set_title('Success Rate per Question Category', fontsize=14, fontweight='bold')
axes[0, 0].set_ylim([0, 105])
axes[0, 0].grid(True, alpha=0.3, axis='y')

for i, v in enumerate(df_categories['Success Rate (%)']):
    axes[0, 0].text(i, v + 2, f'{v:.1f}%', ha='center', fontweight='bold', fontsize=10)
plt.setp(axes[0, 0].xaxis.get_majorticklabels(), rotation=45, ha='right')

# ============================================================
# 2️⃣ Average Duration per Category
# ============================================================
axes[0, 1].bar(df_categories['Kategori'], df_categories['Avg Duration (s)'],
               color=colors, edgecolor='black', linewidth=1.5)
axes[0, 1].set_ylabel('Time (s)', fontsize=12, fontweight='bold')
axes[0, 1].set_title('Average Inference Time per Category', fontsize=14, fontweight='bold')
axes[0, 1].grid(True, alpha=0.3, axis='y')

for i, v in enumerate(df_categories['Avg Duration (s)']):
    axes[0, 1].text(i, v + 0.05, f'{v:.2f}s', ha='center', fontweight='bold', fontsize=10)
plt.setp(axes[0, 1].xaxis.get_majorticklabels(), rotation=45, ha='right')

# ============================================================
# 3️⃣ Pie Chart Overall Success Rate
# ============================================================
overall_success = test_data.get('success_count', 0)
overall_fail = test_data.get('fail_count', 0)
total_questions = test_data.get('total_questions', overall_success + overall_fail)

axes[1, 0].pie([overall_success, overall_fail],
               labels=['Succeeded', 'Failed'],
               colors=['#2ecc71', '#e74c3c'],
               autopct='%1.1f%%',
               startangle=90,
               textprops={'fontsize': 12, 'fontweight': 'bold'})
axes[1, 0].set_title(
    f'Overall Success Rate\n({overall_success}/{total_questions} questions)',
    fontsize=14, fontweight='bold'
)

# ============================================================
# 4️⃣ Inference Time Distribution
# ============================================================
# Read durations from "details" (not "results")
all_durations = [r['duration'] for r in test_data.get('details', []) if r.get('success')]
if all_durations:
    axes[1, 1].hist(all_durations, bins=15, color='#3498db', edgecolor='black', alpha=0.7)
    axes[1, 1].axvline(np.mean(all_durations), color='r', linestyle='--', linewidth=2,
                       label=f'Mean: {np.mean(all_durations):.2f}s')
    axes[1, 1].axvline(np.median(all_durations), color='g', linestyle='--', linewidth=2,
                       label=f'Median: {np.median(all_durations):.2f}s')
    axes[1, 1].legend(fontsize=11)
else:
    axes[1, 1].text(0.5, 0.5, 'No duration data', ha='center', va='center', fontsize=12)

axes[1, 1].set_xlabel('Inference Time (s)', fontsize=12, fontweight='bold')
axes[1, 1].set_ylabel('Frequency', fontsize=12, fontweight='bold')
axes[1, 1].set_title('Inference Time Distribution', fontsize=14, fontweight='bold')
axes[1, 1].grid(True, alpha=0.3, axis='y')

# ============================================================
# 💾 Save Figure
# ============================================================
plt.tight_layout()
output_path = '../outputs/figures/evaluation_results.png'
plt.savefig(output_path, dpi=300, bbox_inches='tight')
plt.show()

print(f"\n✅ Figure saved: {output_path}")


### 3.4 Full Statistics Table

In [None]:
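# Create a comprehensive statistics table
# Recompute the durations so this cell does not depend on the plotting cell having run
all_durations = [r['duration'] for r in test_data.get('details', []) if r.get('success')]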
stats_summary = {
    'Metric': [
        'Total Questions',
        'Succeeded',
        'Failed',
        'Success Rate',
        'Avg Inference Time',
        'Min Inference Time',
        'Max Inference Time',
        'Median Inference Time',
        'Std Inference Time',
        'Total Duration',
        'Throughput'
    ],
    'Value': [
        test_data['total_questions'],
        test_data['success_count'],
        test_data['fail_count'],
        f"{test_data['success_count']/test_data['total_questions']*100:.1f}%",
        f"{test_data['avg_duration']:.2f} s",
        f"{min(all_durations):.2f} s",
        f"{max(all_durations):.2f} s",
        f"{np.median(all_durations):.2f} s",
        f"{np.std(all_durations):.2f} s",
        f"{test_data['total_duration']:.2f} s ({test_data['total_duration']/60:.2f} min)",
        f"{test_data['success_count']/test_data['total_duration']:.3f} q/s"
    ]
}

df_stats = pd.DataFrame(stats_summary)

print("\n📊 Full Evaluation Statistics:")
print("="*80)
print(df_stats.to_string(index=False))

# Save to CSV
df_stats.to_csv('../outputs/evaluation_statistics.csv', index=False)
print("\n✅ Data saved: outputs/evaluation_statistics.csv")
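The table above relies on NumPy. The same figures can be reproduced with the stdlib `statistics` module, which makes the definitions explicit. `np.std` defaults to the population standard deviation (`ddof=0`), so `pstdev` is the matching function. Throughput here divides successful answers by summed generation time, as the notebook does, so model loading is excluded. The durations are illustrative, not measured.

```python
import statistics
from typing import Dict, List

def latency_summary(durations: List[float], total_questions: int) -> Dict[str, float]:
    """Stdlib version of the statistics table; durations are per successful question, in seconds."""
    if not durations:
        raise ValueError("no successful runs to summarize")
    total = sum(durations)
    return {
        "success_rate_pct": len(durations) / total_questions * 100,
        "mean_s": statistics.mean(durations),
        "median_s": statistics.median(durations),
        "std_s": statistics.pstdev(durations),   # population std, like np.std's default
        "min_s": min(durations),
        "max_s": max(durations),
        # Sequential runs: answered questions per second of summed generation time
        "throughput_qps": len(durations) / total,
    }

# Illustrative durations in seconds, not measured values
print(latency_summary([2.0, 4.0, 6.0], total_questions=4))
```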

## 📈 4. Answer Quality Analysis

### 4.1 Sample Model Answers

In [None]:
# ============================================================
# 🧠 SAMPLE MODEL OUTPUT - 5 QUESTIONS FROM DIFFERENT CATEGORIES
# ============================================================

print("📝 Sample Questions and Model Answers:")
print("=" * 80)

# Take 5 examples, one from each category
sample_indices = [0, 6, 10, 15, 20]  # first question of each group (Definisi, Syarat, Cara, Biaya, Jadwal)

# Use whichever results key the JSON file provides
results = test_data.get('details', test_data.get('results', []))

if not results:
    print("⚠️  No test results found under 'details' or 'results'.")
else:
    for idx in sample_indices:
        if idx < len(results):
            result = results[idx]

            question = result.get('question', '(no question)')
            answer = result.get('answer', '(no answer)')
            duration = result.get('duration', 0)
            success = result.get('success', False)

            print(f"\n❓ Question {idx+1}: {question}")
            print(f"⏱️  Time: {duration:.2f}s")

            if success and answer:
                print("\n💬 Answer:")
                print(answer[:300] + ("..." if len(answer) > 300 else ""))
            else:
                print("\n⚠️  The model failed to answer this question.")

            print("-" * 80)
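The `success` flag recorded by the evaluation script only says that generation did not raise an exception. It says nothing about whether the answer is right. A cheap complementary check is to count how many reference keywords appear in each answer. The sketch below is an assumption, not the notebook's method. The expected keywords are hypothetical and would have to come from the PMB dataset's reference answers.

```python
import re
from typing import Dict, List

def keyword_hit_rate(answer: str, expected_keywords: List[str]) -> float:
    """Fraction of expected keywords that occur in the answer (case-insensitive, whole words)."""
    if not expected_keywords:
        return 0.0
    text = answer.lower()
    hits = sum(1 for kw in expected_keywords
               if re.search(r"\b" + re.escape(kw.lower()) + r"\b", text))
    return hits / len(expected_keywords)

# Hypothetical reference keywords; real ones would come from the dataset's answers
EXPECTED: Dict[str, List[str]] = {
    "Apa kepanjangan PMB?": ["penerimaan", "mahasiswa", "baru"],
}
answer = "PMB adalah singkatan dari Penerimaan Mahasiswa Baru di UNSIQ."
print(keyword_hit_rate(answer, EXPECTED["Apa kepanjangan PMB?"]))
```

Keyword overlap is a crude proxy: a correct paraphrase can score low, so it complements manual review rather than replacing it.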


### 4.2 Answer Word Cloud

In [None]:
# ============================================================
# ☁️ WORD CLOUD OF MODEL ANSWERS
# ============================================================

from wordcloud import WordCloud
import matplotlib.pyplot as plt
import os

# Make sure the figures folder exists
os.makedirs("../outputs/figures", exist_ok=True)

# Read the test results: use 'details' if present, otherwise fall back to 'results'
results = test_data.get("details", test_data.get("results", []))

# Join all successful answers into one string
all_answers = " ".join([
    r.get("answer", "")
    for r in results
    if r.get("success") and r.get("answer")
])

if not all_answers.strip():
    print("⚠️ No answers available for the word cloud (check that the model produced text).")
else:
    # Build the word cloud
    wordcloud = WordCloud(
        width=1200,
        height=600,
        background_color="white",
        colormap="viridis",
        max_words=100,
        relative_scaling=0.5,
        min_font_size=10
    ).generate(all_answers)

    # Show the result
    plt.figure(figsize=(14, 7))
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.title("Word Cloud - PMB Model Answers", fontsize=16, fontweight="bold", pad=20)
    plt.tight_layout()

    output_path = "../outputs/figures/answer_wordcloud.png"
    plt.savefig(output_path, dpi=300, bbox_inches="tight", facecolor="white")
    plt.show()

    print(f"✅ Figure saved: {output_path}")
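`WordCloud` filters only English stopwords by default, so Indonesian function words such as "dan", "yang" and "di" can dominate the picture. The sketch below shows the effect of an Indonesian stopword set on plain word counts. The stopword list is a small hypothetical sample, not a standard resource. The tokenizer keeps ASCII letters only, which fits Indonesian text but drops digits.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Small hypothetical Indonesian stopword list; extend it for real use
ID_STOPWORDS = {"dan", "yang", "di", "ke", "dari", "untuk", "dengan", "ini", "itu",
                "adalah", "atau", "anda", "dapat", "akan", "pada", "juga"}

def top_content_words(texts: Iterable[str], stopwords: set, k: int = 10) -> List[Tuple[str, int]]:
    """Count lowercase word tokens across texts, skipping stopwords and 1-letter tokens."""
    counts = Counter(
        w for t in texts for w in re.findall(r"[a-z]+", t.lower())
        if w not in stopwords and len(w) > 1
    )
    return counts.most_common(k)

answers = ["Biaya pendaftaran PMB dapat dibayar di bank.",
           "Pendaftaran PMB dibuka untuk calon mahasiswa."]
print(top_content_words(answers, ID_STOPWORDS, k=3))
```

The same set can be handed to the word cloud through its `stopwords` parameter, e.g. `WordCloud(stopwords=STOPWORDS | ID_STOPWORDS, ...)` with `STOPWORDS` imported from `wordcloud`.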


## 📑 5. Export for the Report

### 5.1 Generate Summary Report

In [None]:
# Create comprehensive report
report = f"""
{'='*80}
PMB CHATBOT MODEL EVALUATION REPORT
Fine-tuning Gemma 3 1B with QLoRA
{'='*80}

MODEL INFORMATION
{'-'*80}
Base Model        : {config['model_config']['model_name']}
Training Method   : QLoRA (Quantized Low-Rank Adaptation)
LoRA Rank (r)     : {config['qlora_config']['r']}
LoRA Alpha        : {config['qlora_config']['lora_alpha']}
Dropout           : {config['qlora_config']['lora_dropout']}

DATASET
{'-'*80}
Training Samples  : {len(train_data)}
Validation Samples: {len(val_data)}
Total Samples     : {len(train_data) + len(val_data)}
Avg Text Length   : {np.mean(train_lengths):.2f} words

TRAINING CONFIGURATION
{'-'*80}
Epochs            : {config['training_args']['num_train_epochs']}
Batch Size        : {config['training_args']['per_device_train_batch_size']}
Learning Rate     : {config['training_args']['learning_rate']}
Gradient Accum    : {config['training_args']['gradient_accumulation_steps']}
Effective Batch   : {config['training_args']['per_device_train_batch_size'] * config['training_args']['gradient_accumulation_steps']}

EVALUATION RESULTS
{'-'*80}
Total Questions   : {test_data['total_questions']}
Succeeded         : {test_data['success_count']}
Failed            : {test_data['fail_count']}
Success Rate      : {test_data['success_count']/test_data['total_questions']*100:.1f}%
Avg Inference     : {test_data['avg_duration']:.2f} seconds
Min Inference     : {min(all_durations):.2f} seconds
Max Inference     : {max(all_durations):.2f} seconds
Median Inference  : {np.median(all_durations):.2f} seconds
Throughput        : {test_data['success_count']/test_data['total_duration']:.3f} questions/second

PERFORMANCE PER CATEGORY
{'-'*80}
"""

for _, row in df_categories.iterrows():
    report += f"""
{row['Kategori']:20s} : {row['Berhasil']}/{row['Total']} ({row['Success Rate (%)']:.1f}%) - Avg: {row['Avg Duration (s)']:.2f}s
"""

report += f"""
{'='*80}
CONCLUSION
{'-'*80}
The Gemma 3 1B model fine-tuned with QLoRA generated a response for
{test_data['success_count']/test_data['total_questions']*100:.1f}% of the {test_data['total_questions']} test questions about Penerimaan Mahasiswa Baru (PMB),
with an average inference time of {test_data['avg_duration']:.2f} seconds.

Note: "success" only means that generation finished without a runtime error.
Answer correctness was not scored automatically, so the sample answers should be
reviewed manually before the model is used in a production chatbot.

Evaluation Date   : {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
{'='*80}
"""

# Save report
with open('../outputs/LAPORAN_EVALUASI.txt', 'w', encoding='utf-8') as f:
    f.write(report)

print(report)
print("\n✅ Report saved: outputs/LAPORAN_EVALUASI.txt")

### 5.2 List All Figures for the Report

In [None]:
import glob

# Create the figures directory if it does not exist
os.makedirs('../outputs/figures', exist_ok=True)

# List all generated figures
print("📊 Figures for the Thesis Report:")
print("="*80)

figures = glob.glob('../outputs/figures/*.png')
figures.sort()

for i, fig in enumerate(figures, 1):
    fig_name = os.path.basename(fig)
    fig_size = os.path.getsize(fig) / 1024  # KB
    print(f"{i}. {fig_name:40s} ({fig_size:.1f} KB)")

print("\n✅ All figures are saved in: outputs/figures/")
print("\nExpected figures:")
print("  1. dataset_distribution.png    - Dataset distribution")
print("  2. training_curves.png         - Training curves (loss, learning rate)")
print("  3. evaluation_results.png      - Evaluation results (success rate, duration)")
print("  4. answer_wordcloud.png        - Word cloud of model answers")

## 📊 6. Summary for Chapter IV (BAB IV)

### Key Data for the Report:

In [None]:
print("\n" + "="*80)
print("DATA SUMMARY FOR THESIS CHAPTER IV (BAB IV)")
print("="*80)

print("\n📊 4.1 DATASET CHARACTERISTICS")
print("-"*80)
print(f"- Training samples: {len(train_data)}")
print(f"- Validation samples: {len(val_data)}")
print(f"- Train:validation ratio: {len(train_data)/(len(train_data)+len(val_data))*100:.0f}%:{len(val_data)/(len(train_data)+len(val_data))*100:.0f}%")
print(f"- Average text length: {np.mean(train_lengths):.0f} words")
print(f"- Text length range: {np.min(train_lengths)}-{np.max(train_lengths)} words")

print("\n🔧 4.2 TRAINING CONFIGURATION")
print("-"*80)
print(f"- Base model: {config['model_config']['model_name']}")
print("- Method: QLoRA (4-bit quantization)")
print(f"- LoRA rank: {config['qlora_config']['r']}")
print(f"- Learning rate: {config['training_args']['learning_rate']}")
print(f"- Epochs: {config['training_args']['num_train_epochs']}")
print(f"- Effective batch size: {config['training_args']['per_device_train_batch_size'] * config['training_args']['gradient_accumulation_steps']}")

if 'train_logs' in locals() and not train_logs.empty:
    print("\n📈 4.3 TRAINING RESULTS")
    print("-"*80)
    print(f"- Training loss (initial): {train_logs['loss'].iloc[0]:.4f}")
    print(f"- Training loss (final): {train_logs['loss'].iloc[-1]:.4f}")
    print(f"- Loss reduction: {(1 - train_logs['loss'].iloc[-1]/train_logs['loss'].iloc[0])*100:.1f}%")
    if not eval_logs.empty:
        print(f"- Validation loss (best): {eval_logs['eval_loss'].min():.4f}")

print("\n🎯 4.4 EVALUATION RESULTS")
print("-"*80)
print(f"- Test questions: {test_data['total_questions']}")
print(f"- Success rate: {test_data['success_count']/test_data['total_questions']*100:.1f}%")
print(f"- Inference time (mean): {test_data['avg_duration']:.2f} s")
print(f"- Inference time (median): {np.median(all_durations):.2f} s")
print(f"- Throughput: {test_data['success_count']/test_data['total_duration']:.3f} questions/s")

print("\n📊 4.5 PERFORMANCE PER CATEGORY")
print("-"*80)
for _, row in df_categories.iterrows():
    print(f"- {row['Kategori']:20s}: {row['Success Rate (%)']:5.1f}% success, {row['Avg Duration (s)']:5.2f}s avg")

print("\n" + "="*80)
print("✅ The data above can be used directly in Chapter IV (BAB IV) of your thesis")
print("="*80)

In [73]:
from huggingface_hub import create_repo, upload_folder

# Replace these with your own Hugging Face username and model name
repo_id = "Pandusu/gemma3-pmb-unsiq-qlora"  # e.g. pamd/gemma3-pmb-unsiq
model_dir = "../outputs/gemma-pmb_merged_final"

# Create the repo if it does not exist yet (requires an earlier huggingface_hub login)
create_repo(repo_id, exist_ok=True)

# Upload the entire model folder
upload_folder(
    folder_path=model_dir,
    repo_id=repo_id,
    commit_message="🚀 Upload fine-tuned Gemma3-PMB model (UNSIQ chatbot)",
)



CommitInfo(commit_url='https://huggingface.co/Pandusu/gemma3-pmb-unsiq-qlora/commit/06bc953b09bea026152bbe8442568dd40ee40819', commit_message='🚀 Upload fine-tuned Gemma3-PMB model (UNSIQ chatbot)', commit_description='', oid='06bc953b09bea026152bbe8442568dd40ee40819', pr_url=None, repo_url=RepoUrl('https://huggingface.co/Pandusu/gemma3-pmb-unsiq-qlora', endpoint='https://huggingface.co', repo_type='model', repo_id='Pandusu/gemma3-pmb-unsiq-qlora'), pr_revision=None, pr_num=None)

## 🎉 Done!

### Generated Output Files:

**Figures for the Report:**
1. `outputs/figures/dataset_distribution.png` - Dataset distribution
2. `outputs/figures/training_curves.png` - Training curves
3. `outputs/figures/evaluation_results.png` - Evaluation results
4. `outputs/figures/answer_wordcloud.png` - Word cloud

**Tables:**
1. `outputs/category_performance.csv` - Performance per category
2. `outputs/evaluation_statistics.csv` - Full statistics

**Report:**
1. `outputs/LAPORAN_EVALUASI.txt` - Summary report
2. `outputs/test_results_*.json` - Raw test results

### How to Use These in the Thesis:

1. **Chapter III (Methodology):**
   - Figure: `dataset_distribution.png`
   - Data: dataset characteristics from the section 6 summary cell

2. **Chapter IV (Results and Discussion):**
   - Figure: `training_curves.png` for the loss curves
   - Figure: `evaluation_results.png` for the evaluation results
   - Table: `category_performance.csv`
   - Data: summary from the section 6 cell

3. **Chapter V (Conclusion):**
   - Data: success rate and throughput from `LAPORAN_EVALUASI.txt`

---

**All files are ready to use in the thesis report!** 🎓