# Fine-tuning Gemma 3 1B for a PMB Chatbot
## Final Project: Developing a New Student Admissions (PMB) Chatbot

**Method:** QLoRA (Quantized Low-Rank Adaptation)  
**Base Model:** Google Gemma 3 1B Instruct  
**Dataset:** PMB Universitas Sains Al-Qur'an  

---

## 📋 Setup & Dependencies

In [None]:
%pip install bert-score nltk rouge-score pandas torch torchvision matplotlib seaborn pyyaml numpy scikit-learn transformers datasets trl peft bitsandbytes accelerate sentencepiece protobuf wordcloud

import os
import sys
import json
import torch
import yaml
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
from pathlib import Path

# Plot style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Default figure size
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11

print("✅ Libraries imported successfully!")
print(f"📅 Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"🐍 Python: {sys.version.split()[0]}")
print(f"🔥 PyTorch: {torch.__version__}")
print(f"💻 CUDA Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"🎮 GPU: {torch.cuda.get_device_name(0)}")
    print(f"💾 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

## 📊 1. Dataset Analysis

### 1.1 Load Dataset

**Choose one of the methods:**
- **Method A (first code cell below):** Auto-split from the augmented file (Recommended)
- **Method B (second code cell below):** Load from files that are already split

### 1.1b Load from Separate Files (Alternative Method)

**Use this method if you already have separate train/eval/test files.** It corresponds to the second code cell below.

### 1.1a Auto-Split Dataset (Recommended)

**This method loads the main dataset and auto-splits it into train/eval/test using the configured ratios (80/10/10).**
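
The 80/10/10 proportions can be illustrated with the standard library alone. The sketch below is a simplification, not the notebook's method: the notebook calls scikit-learn's `train_test_split` twice, while the hypothetical helper `split_three_way` shuffles once and cuts the list in two places.

```python
import random

def split_three_way(items, train_ratio=0.8, eval_ratio=0.1, seed=42):
    """Shuffle once, then cut the list into train/eval/test slices."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_ratio)
    n_eval = round(len(shuffled) * eval_ratio)
    train = shuffled[:n_train]
    eval_ = shuffled[n_train:n_train + n_eval]
    test = shuffled[n_train + n_eval:]  # whatever remains goes to test
    return train, eval_, test

train, eval_, test = split_three_way(range(100))
print(len(train), len(eval_), len(test))
```

Giving the remainder to the test slice keeps every sample in exactly one split, even when the ratios do not divide the dataset size evenly.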

In [None]:
import json
import os
from sklearn.model_selection import train_test_split
from huggingface_hub import login

# ============================================================================
# CONFIGURATION
# ============================================================================

HF_TOKEN = "your_code"

# File paths
INPUT_FILE = "dataset_v4.txt"
JSON_FILE = "dataset_v2.json"
FORMATTED_FILE = "dataset_formatted.json"

# Output directories
DATA_DIR = "data"
TRAIN_FILE = os.path.join(DATA_DIR, "train_pmb.json")
EVAL_FILE = os.path.join(DATA_DIR, "eval_pmb.json")
TEST_FILE = os.path.join(DATA_DIR, "test_pmb.json")

# Split ratios
TRAIN_RATIO = 0.8   # 80%
EVAL_RATIO = 0.1    # 10%
TEST_RATIO = 0.1    # 10%

# System prompt
SYSTEM_PROMPT = (
    "Anda adalah asisten virtual untuk Penerimaan Mahasiswa Baru (PMB) di "
    "Universitas Sains Al-Qur'an (UNSIQ) Wonosobo.\n"
)

METADATA = {
    "topic": "PMB UNSIQ",
    "subtopic": "Informasi Umum"
}


# ============================================================================
# STEP 1: CONVERT TXT TO JSON
# ============================================================================

def convert_txt_to_json():
    """Convert the Q:/A: text file at INPUT_FILE into a JSON list of Q&A pairs."""
    print("=" * 60)
    print("STEP 1: Converting TXT to JSON")
    print("=" * 60)

    data = []
    entry = {}

    with open(INPUT_FILE, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue

            if line.lower().startswith("q:"):
                entry = {"question": line[2:].strip()}
            elif line.lower().startswith("a:") and "question" in entry:
                # Only keep answers that follow a question line
                entry["answer"] = line[2:].strip()
                data.append(entry)
                entry = {}

    # Save to JSON
    with open(JSON_FILE, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)

    print("✅ Conversion finished!")
    print(f"   Total Q&A pairs: {len(data)}")
    print(f"   Saved to: {JSON_FILE}\n")

    return data


# ============================================================================
# STEP 2: FORMAT DATASET
# ============================================================================

def format_dataset(data):
    """Format the dataset with Gemma-style chat turns."""
    print("=" * 60)
    print("STEP 2: Formatting Dataset")
    print("=" * 60)

    formatted_data = []

    for item in data:
        question = item.get("question", "")
        answer = item.get("answer", "")

        # Note: Gemma's official chat template defines only `user` and `model`
        # turns. The `system` turn below is this notebook's own convention, so
        # inference prompts must use exactly the same layout.
        formatted_item = {
            "text": (
                f"<start_of_turn>system\n{SYSTEM_PROMPT}<end_of_turn>\n"
                f"<start_of_turn>user\n{question}<end_of_turn>\n"
                f"<start_of_turn>model\n{answer}<end_of_turn>"
            ),
        }
        formatted_data.append(formatted_item)

    # Save the formatted dataset
    with open(FORMATTED_FILE, "w", encoding="utf-8") as f:
        json.dump(formatted_data, f, ensure_ascii=False, indent=2)

    print("✅ Formatting finished!")
    print(f"   Total formatted data: {len(formatted_data)}")
    print(f"   Saved to: {FORMATTED_FILE}\n")

    return formatted_data


# ============================================================================
# STEP 3: SPLIT DATASET
# ============================================================================

def split_dataset(data):
    """Split the dataset into train, eval, and test sets."""
    print("=" * 60)
    print("STEP 3: Splitting Dataset")
    print("=" * 60)

    total = len(data)
    print(f"Total data: {total} samples")
    print(f"Split ratio: Train={TRAIN_RATIO*100:.0f}%, Eval={EVAL_RATIO*100:.0f}%, Test={TEST_RATIO*100:.0f}%\n")

    # First split: train vs. temp (eval + test)
    train_data, temp_data = train_test_split(
        data,
        test_size=(1 - TRAIN_RATIO),
        random_state=42,
        shuffle=True
    )

    # Second split: eval vs. test, taken from the temp portion
    eval_size = EVAL_RATIO / (EVAL_RATIO + TEST_RATIO)
    eval_data, test_data = train_test_split(
        temp_data,
        test_size=(1 - eval_size),
        random_state=42,
        shuffle=True
    )

    # Create the directory if it does not exist
    os.makedirs(DATA_DIR, exist_ok=True)

    # Save each split
    with open(TRAIN_FILE, "w", encoding="utf-8") as f:
        json.dump(train_data, f, ensure_ascii=False, indent=2)

    with open(EVAL_FILE, "w", encoding="utf-8") as f:
        json.dump(eval_data, f, ensure_ascii=False, indent=2)

    with open(TEST_FILE, "w", encoding="utf-8") as f:
        json.dump(test_data, f, ensure_ascii=False, indent=2)

    print("✅ Split finished!")
    print(f"   • Training:   {len(train_data):4d} samples ({len(train_data)/total*100:.1f}%)")
    print(f"   • Evaluation: {len(eval_data):4d} samples ({len(eval_data)/total*100:.1f}%)")
    print(f"   • Test:       {len(test_data):4d} samples ({len(test_data)/total*100:.1f}%)")
    print("\n💾 Files saved:")
    print(f"   • {TRAIN_FILE}")
    print(f"   • {EVAL_FILE}")
    print(f"   • {TEST_FILE}\n")

    return train_data, eval_data, test_data


# ============================================================================
# MAIN
# ============================================================================

print("\n" + "=" * 60)
print("DATASET PREPARATION PIPELINE")
print("=" * 60 + "\n")

# Log in to Hugging Face
print("🔐 Logging in to Hugging Face...")
login(HF_TOKEN)
print("✅ Login successful!\n")

# Step 1: Convert TXT to JSON
json_data = convert_txt_to_json()

# Step 2: Format dataset
formatted_data = format_dataset(json_data)

# Step 3: Split dataset
train_data, eval_data, test_data = split_dataset(formatted_data)

# Summary
print("=" * 60)
print("🎉 ALL STEPS FINISHED!")
print("=" * 60)
print(f"📊 Total data: {len(formatted_data)} samples")
print(f"✅ Training set:   {len(train_data)} samples")
print(f"✅ Evaluation set: {len(eval_data)} samples")
print(f"✅ Test set:       {len(test_data)} samples")
print("\nDataset is ready for training! 🚀\n")
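
The pairing rule in step 1 (each `Q:` line is matched with the next `A:` line) can be tried on its own, without touching any files. The sketch below is illustrative only: `parse_qa_lines` is a hypothetical helper and the sample lines are made up, not taken from the PMB dataset.

```python
def parse_qa_lines(lines):
    """Pair each 'Q:' line with the next 'A:' line; skip answers with no question."""
    pairs, question = [], None
    for raw in lines:
        line = raw.strip()
        if line.lower().startswith("q:"):
            question = line[2:].strip()
        elif line.lower().startswith("a:") and question is not None:
            pairs.append({"question": question, "answer": line[2:].strip()})
            question = None
    return pairs

# Made-up sample: one valid pair, one blank line, one orphan answer
sample = ["Q: Kapan pendaftaran dibuka?", "A: (jawaban contoh)", "", "A: tanpa pertanyaan"]
print(parse_qa_lines(sample))
```

Dropping orphan answers keeps every record in the resulting JSON complete, which matters later because the evaluation cell skips samples with an empty question or answer.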

In [None]:
# Load training data
with open('data/train_pmb.json', 'r', encoding='utf-8') as f:
    train_data = json.load(f)

# Load evaluation data
with open('data/eval_pmb.json', 'r', encoding='utf-8') as f:
    eval_data = json.load(f)

# Load test data
with open('data/test_pmb.json', 'r', encoding='utf-8') as f:
    test_data = json.load(f)

print("📊 Dataset Statistics:")
print(f"  Training samples:   {len(train_data)}")
print(f"  Evaluation samples: {len(eval_data)}")
print(f"  Test samples:       {len(test_data)}")
print(f"  Total samples:      {len(train_data) + len(eval_data) + len(test_data)}")

total = len(train_data) + len(eval_data) + len(test_data)
print("\n📈 Split Ratio:")
print(f"  Train: {len(train_data)/total*100:.1f}%")
print(f"  Eval:  {len(eval_data)/total*100:.1f}%")
print(f"  Test:  {len(test_data)/total*100:.1f}%")

# Alias for compatibility with later cells that use val_data
val_data = eval_data

print("\n✅ Dataset loaded successfully!")

### 1.2 Text Length Distribution

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import os

# Text length analysis (whitespace-separated words)
train_lengths = [len(item['text'].split()) for item in train_data]
val_lengths = [len(item['text'].split()) for item in val_data]

# Statistics
print("📏 Text Length Statistics (in words):")
print("\nTraining Set:")
print(f"  Mean:   {np.mean(train_lengths):.2f}")
print(f"  Median: {np.median(train_lengths):.2f}")
print(f"  Min:    {np.min(train_lengths)}")
print(f"  Max:    {np.max(train_lengths)}")
print(f"  Std:    {np.std(train_lengths):.2f}")

print("\nValidation Set:")
print(f"  Mean:   {np.mean(val_lengths):.2f}")
print(f"  Median: {np.median(val_lengths):.2f}")
print(f"  Min:    {np.min(val_lengths)}")
print(f"  Max:    {np.max(val_lengths)}")
print(f"  Std:    {np.std(val_lengths):.2f}")

# Create the output directory if it does not exist
output_dir = 'outputs/figures'
os.makedirs(output_dir, exist_ok=True)

# Visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Histogram
ax1.hist(train_lengths, bins=30, alpha=0.7, label='Training', color='#2ecc71', edgecolor='black')
ax1.hist(val_lengths, bins=30, alpha=0.7, label='Validation', color='#3498db', edgecolor='black')
ax1.set_xlabel('Word Count', fontsize=12, fontweight='bold')
ax1.set_ylabel('Frequency', fontsize=12, fontweight='bold')
ax1.set_title('Text Length Distribution of the PMB Dataset', fontsize=14, fontweight='bold')
ax1.legend(fontsize=11)
ax1.grid(True, alpha=0.3)

# Box plot (tick_labels requires Matplotlib >= 3.9; older versions use labels=)
bp = ax2.boxplot([train_lengths, val_lengths],
                  tick_labels=['Training', 'Validation'],
                  patch_artist=True,
                  boxprops=dict(facecolor='#3498db', alpha=0.7),
                  medianprops=dict(color='red', linewidth=2))
ax2.set_ylabel('Word Count', fontsize=12, fontweight='bold')
ax2.set_title('Box Plot of Text Length', fontsize=14, fontweight='bold')
ax2.grid(True, alpha=0.3, axis='y')

plt.tight_layout()

# Save the figure
output_path = os.path.join(output_dir, 'dataset_distribution.png')
plt.savefig(output_path, dpi=300, bbox_inches='tight')
plt.show()
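
The statistics above count whitespace-separated words, while the `max_length` of 512 in the training configuration counts tokens, so checking for truncation needs token counts instead. A minimal sketch of that check is below. The helper `share_over_limit` is hypothetical and the lengths are illustrative. In practice the lengths might come from something like `len(tokenizer(text)["input_ids"])`, which is an assumption about how the check would be wired in.

```python
def share_over_limit(lengths, limit=512):
    """Return the fraction of samples whose length exceeds `limit`."""
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n > limit) / len(lengths)

# Illustrative lengths only, not measured from the PMB dataset
example_lengths = [120, 300, 512, 700]
print(f"{share_over_limit(example_lengths):.0%} of samples would be truncated")
```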



### 1.3 Sample Data

In [None]:
# Show 3 sample records
print("📝 Training Data Samples:")
print("="*80)
for i, sample in enumerate(train_data[:3], 1):
    print(f"\nSample {i}:")
    print(f"{sample['text'][:200]}...")
    print("-"*80)

In [None]:
# Load config
# Note: this cell expects a YAML file at ../configs/qlora_config.yaml; the
# configuration cell in section 2.1 writes ../configs/qlora_config.json instead.
with open('../configs/qlora_config.yaml', 'r') as f:
    config = yaml.safe_load(f)

# Display config
print("⚙️  Training Configuration:")
print("="*80)
print("\n📦 Model:")
print(f"  Base Model: {config['model_config']['model_name']}")
print("\n🔧 LoRA Config:")
print(f"  Rank (r): {config['qlora_config']['r']}")
print(f"  Alpha: {config['qlora_config']['lora_alpha']}")
print(f"  Dropout: {config['qlora_config']['lora_dropout']}")
print("\n🎓 Training Args:")
print(f"  Epochs: {config['training_args']['num_train_epochs']}")
print(f"  Batch Size: {config['training_args']['per_device_train_batch_size']}")
print(f"  Learning Rate: {config['training_args']['learning_rate']}")
print(f"  Gradient Accumulation: {config['training_args']['gradient_accumulation_steps']}")
print(f"  Effective Batch Size: {config['training_args']['per_device_train_batch_size'] * config['training_args']['gradient_accumulation_steps']}")

## 🚀 2. Training Model

### 2.1 Load Configuration

In [None]:
# ============================================================
# ⚙️ QLORA CONFIGURATION SETUP (NO TRAINING)
# ============================================================

import os, json

config = {
    # ========== MODEL CONFIGURATION ==========
    'model_config': {
        'model_name': 'google/gemma-3-1b-it',  # ← change here to use another model
        'use_cache': False,
        'trust_remote_code': True,
        'torch_dtype': 'bfloat16',  # bfloat16 suits A100 GPUs
    },
    
    # ========== QUANTIZATION CONFIG (QLoRA) ==========
    'quantization_config': {
        'load_in_4bit': True,
        'bnb_4bit_compute_dtype': 'bfloat16',
        'bnb_4bit_quant_type': 'nf4',
        'bnb_4bit_use_double_quant': True,
    },
    
    # ========== LORA CONFIGURATION ==========
    'qlora_config': {
        'r': 16,                    # LoRA rank (8–64)
        'lora_alpha': 32,           # scaling factor (usually 2x r)
        'lora_dropout': 0.05,
        'bias': 'none',
        'task_type': 'CAUSAL_LM',
        'target_modules': [
            'q_proj', 'k_proj', 'v_proj',
            'o_proj', 'gate_proj', 'up_proj', 'down_proj'
        ],
    },
    
    # ========== DATASET CONFIG ==========
    'dataset_config': {
        'train_file': 'data/train_pmb.json',   # ← path to the training dataset
        'eval_file': 'data/eval_pmb.json',      # ← path to the evaluation dataset
        'max_length': 512,
        'text_field': 'text',
    },
    
    # ========== TRAINING ARGUMENTS (A100 80GB) ==========
    'training_args': {
        'output_dir': '../outputs/gemma-pmb',
        'overwrite_output_dir': True,
        'num_train_epochs': 3,
        'per_device_train_batch_size': 8,
        'per_device_eval_batch_size': 8,
        'gradient_accumulation_steps': 4,
        'gradient_checkpointing': True,
        'optim': 'paged_adamw_8bit',
        'learning_rate': 2e-4,
        'weight_decay': 0.01,
        'max_grad_norm': 1.0,
        'lr_scheduler_type': 'cosine',
        'warmup_ratio': 0.03,
        'eval_strategy': 'epoch',
        'eval_steps': 100,          # only used when eval_strategy is 'steps'
        'save_strategy': 'epoch',
        'save_total_limit': 2,
        'load_best_model_at_end': True,
        'metric_for_best_model': 'eval_loss',
        'logging_strategy': 'steps',
        'logging_steps': 10,
        'report_to': 'none',
        'bf16': True,
        'bf16_full_eval': True,
        'dataloader_num_workers': 4,
        'group_by_length': True,
        'ddp_find_unused_parameters': False,
    }
}

# ============================================================
# 💾 SAVE CONFIG TO FILE
# ============================================================

os.makedirs('../configs', exist_ok=True)
config_file = '../configs/qlora_config.json'

with open(config_file, 'w', encoding='utf-8') as f:
    json.dump(config, f, indent=2, ensure_ascii=False)

# ============================================================
# 🧾 PRINT SUMMARY
# ============================================================

print("⚙️  QLORA CONFIGURATION - NVIDIA A100 80GB")
print("=" * 80)
print(f"\n📦 MODEL:\n  Model Name      : {config['model_config']['model_name']}")
print(f"  Precision       : {config['model_config']['torch_dtype']}")
print("  Quantization    : 4-bit NF4 (QLoRA)")

print("\n🔧 LORA CONFIG:")
print(f"  Rank (r)        : {config['qlora_config']['r']}")
print(f"  Alpha           : {config['qlora_config']['lora_alpha']}")
print(f"  Dropout         : {config['qlora_config']['lora_dropout']}")
print(f"  Target Modules  : {len(config['qlora_config']['target_modules'])} modules")

print("\n📊 DATASET:")
print(f"  Train File      : {config['dataset_config']['train_file']}")
print(f"  Eval File       : {config['dataset_config']['eval_file']}")
print(f"  Max Length      : {config['dataset_config']['max_length']} tokens")

print("\n🎓 TRAINING PARAMETERS:")
print(f"  Epochs          : {config['training_args']['num_train_epochs']}")
print(f"  Batch Size      : {config['training_args']['per_device_train_batch_size']}")
print(f"  Gradient Accum  : {config['training_args']['gradient_accumulation_steps']}")
print(f"  Effective Batch : {config['training_args']['per_device_train_batch_size'] * config['training_args']['gradient_accumulation_steps']}")
print(f"  Learning Rate   : {config['training_args']['learning_rate']}")
print(f"  Optimizer       : {config['training_args']['optim']}")

print("\n💾 MEMORY OPTIMIZATION:")
print("  4-bit Quantization      : ✅")
print("  Double Quantization     : ✅")
print("  Gradient Checkpointing  : ✅")
print("  Paged AdamW 8-bit       : ✅")

print(f"\n💾 Config saved to: {config_file}")
print("\n💡 TIPS:")
print("  • To switch models: change 'model_name' and re-run this cell")
print("  • For Gemma 3 4B: model_name = 'google/gemma-3-4b-it'")
print("  • Larger model? Reduce batch_size or increase gradient_accumulation")
print("\n✅ Configuration ready! Continue to the next step to start training.")
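
The batch size, gradient accumulation, epoch count, and warmup ratio above together determine how many optimizer steps a run takes. The sketch below works through that arithmetic. `training_schedule` is a hypothetical helper, the sample count of 1000 is illustrative, and the rounding is an approximation: the Trainer's own step count may differ by a step.

```python
import math

def training_schedule(n_samples, batch_size=8, grad_accum=4, epochs=3, warmup_ratio=0.03):
    """Approximate optimizer-step counts for a single-GPU run."""
    effective_batch = batch_size * grad_accum           # samples per optimizer step
    steps_per_epoch = math.ceil(n_samples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return effective_batch, steps_per_epoch, total_steps, warmup_steps

# 1000 is an illustrative sample count, not the size of the PMB dataset
print(training_schedule(1000))
```

With a small dataset the warmup phase covers only a handful of steps, which is worth keeping in mind when reading the first logged loss values.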


### 2.2 Training Process

In [None]:
# ============================================================
# TRAINING QLORA - FINAL VERSION
# ============================================================

import os
import json
import time
from datetime import datetime
import torch
from transformers import (
    AutoTokenizer, 
    AutoModelForCausalLM,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling,
    BitsAndBytesConfig
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from datasets import Dataset

os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = "1"

print("🚀 STARTING QLORA TRAINING")
print("=" * 80)

# ========== LOAD CONFIG ==========
with open("../configs/qlora_config.json", "r") as f:
    config = json.load(f)

# ========== 1. LOAD TOKENIZER ==========
print("\n📥 Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(
    config['model_config']['model_name'],
    trust_remote_code=config['model_config']['trust_remote_code']
)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

print(f"✅ Tokenizer loaded: {config['model_config']['model_name']}")
print(f"   Vocab size: {len(tokenizer)}")

# ========== 2. LOAD DATASET ==========
print("\n📊 Loading dataset...")
with open(config['dataset_config']['train_file'], 'r', encoding='utf-8') as f:
    train_data_raw = json.load(f)
with open(config['dataset_config']['eval_file'], 'r', encoding='utf-8') as f:
    eval_data_raw = json.load(f)

print(f"✅ Dataset loaded: {len(train_data_raw)} train, {len(eval_data_raw)} eval")

train_dataset = Dataset.from_list(train_data_raw)
eval_dataset = Dataset.from_list(eval_data_raw)

def tokenize_function(examples):
    return tokenizer(
        examples[config['dataset_config']['text_field']],
        truncation=True,
        max_length=config['dataset_config']['max_length'],
        padding='max_length'
    )

print("\n🔄 Tokenizing...")
tokenized_train = train_dataset.map(tokenize_function, batched=True, remove_columns=train_dataset.column_names)
tokenized_eval = eval_dataset.map(tokenize_function, batched=True, remove_columns=eval_dataset.column_names)
print("✅ Tokenization done")

# ========== 3. LOAD MODEL (4-BIT) ==========
print("\n📦 Loading model with 4-bit quantization...")

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True
)

model = AutoModelForCausalLM.from_pretrained(
    config['model_config']['model_name'],
    quantization_config=bnb_config,
    device_map='auto',
    trust_remote_code=True,
    use_cache=False
)

print(f"✅ Model loaded, Memory: {torch.cuda.memory_allocated(0) / 1e9:.2f} GB")

model = prepare_model_for_kbit_training(model)

# ========== 4. ADD LORA ==========
print("\n🔧 Adding LoRA adapters...")

peft_config = LoraConfig(
    r=config['qlora_config']['r'],
    lora_alpha=config['qlora_config']['lora_alpha'],
    lora_dropout=config['qlora_config']['lora_dropout'],
    bias=config['qlora_config']['bias'],
    task_type=config['qlora_config']['task_type'],
    target_modules=config['qlora_config']['target_modules']
)

model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

# ========== 5. TRAINER SETUP ==========
print("\n⚙️  Setting up trainer...")

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_dir = f"{config['training_args']['output_dir']}_{timestamp}"

training_args = TrainingArguments(
    output_dir=output_dir,
    overwrite_output_dir=True,
    num_train_epochs=config['training_args']['num_train_epochs'],
    per_device_train_batch_size=config['training_args']['per_device_train_batch_size'],
    per_device_eval_batch_size=config['training_args']['per_device_eval_batch_size'],
    gradient_accumulation_steps=config['training_args']['gradient_accumulation_steps'],
    gradient_checkpointing=True,
    optim=config['training_args']['optim'],
    learning_rate=config['training_args']['learning_rate'],
    weight_decay=config['training_args']['weight_decay'],
    max_grad_norm=config['training_args']['max_grad_norm'],
    lr_scheduler_type=config['training_args']['lr_scheduler_type'],
    warmup_ratio=config['training_args']['warmup_ratio'],
    eval_strategy=config['training_args']['eval_strategy'],
    save_strategy=config['training_args']['save_strategy'],
    save_total_limit=config['training_args']['save_total_limit'],
    load_best_model_at_end=True,
    metric_for_best_model='eval_loss',
    logging_strategy='steps',
    logging_steps=config['training_args']['logging_steps'],
    report_to='none',
    bf16=True,
    bf16_full_eval=True,
    dataloader_num_workers=config['training_args']['dataloader_num_workers'],
    group_by_length=True,
)

data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    data_collator=data_collator,
)

print(f"✅ Trainer ready, Output: {output_dir}")

# ========== 6. TRAIN ==========
print("\n" + "=" * 80)
print("🎓 TRAINING START...")
print("=" * 80)

start_time = time.time()

try:
    train_result = trainer.train()
    duration = time.time() - start_time

    print("\n" + "=" * 80)
    print("✅ TRAINING DONE!")
    print("=" * 80)
    print(f"⏱️  Time: {duration/60:.2f} min")
    print(f"📉 Final loss: {train_result.metrics.get('train_loss', 0):.4f}")

    # Save model and metrics
    print("\n💾 Saving...")
    # Save the training state
    trainer.save_state()
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)

    with open(f"{output_dir}/training_metrics.json", 'w') as f:
        json.dump({
            'train_loss': float(train_result.metrics.get('train_loss', 0)),
            'duration_minutes': duration / 60,
            'config': config
        }, f, indent=2)

    print(f"✅ Saved to: {output_dir}")

except Exception as e:
    print(f"\n❌ Error: {e}")
    raise
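
`model.print_trainable_parameters()` reports a small trainable share because LoRA adds only two thin matrices per target module: `A` with shape `r x d_in` and `B` with shape `d_out x r`. The sketch below counts those weights. `lora_params` is a hypothetical helper and the layer shapes are illustrative, not read from the Gemma configuration.

```python
def lora_params(d_in, d_out, r=16):
    """A is (r x d_in) and B is (d_out x r), so r * (d_in + d_out) weights."""
    return r * (d_in + d_out)

# Illustrative layer shapes (d_in, d_out); not read from the Gemma config
shapes = {"q_proj": (1024, 1024), "down_proj": (4096, 1024)}
for name, (d_in, d_out) in shapes.items():
    print(name, lora_params(d_in, d_out), "vs full", d_in * d_out)
```

The adapter size grows linearly with `r`, so doubling the rank doubles the trainable weights without changing the frozen base model.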


### 2.3 Merge Model

In [None]:
# ============================================================
# 🧩 MERGE THE QLORA ADAPTER INTO THE BASE MODEL
# ============================================================

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# --- Path configuration ---
base_model_name = "google/gemma-3-1b-it"                     # base model
adapter_path = "../outputs/gemma-pmb_20251108_112804"        # LoRA training output
merged_model_path = "../outputs/gemma-pmb_merged_final_v2"      # merge output

# --- 1️⃣ Load base model & tokenizer ---
print(f"📦 Loading base model: {base_model_name}")
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

print("📥 Loading tokenizer from base model...")
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# --- 2️⃣ Load the adapter & merge it into the base model ---
print("\n🔄 Loading LoRA adapters and merging...")
model = PeftModel.from_pretrained(base_model, adapter_path)
torch.cuda.empty_cache()
model = model.merge_and_unload()

# --- 3️⃣ Save the merged model ---
print("\n💾 Saving merged model...")
model.save_pretrained(merged_model_path, safe_serialization=True)
tokenizer.save_pretrained(merged_model_path)

print("\n✅ Merge complete!")
print(f"💾 Final merged model saved to: {merged_model_path}")
print(f"📏 Vocab size: {model.config.vocab_size}")
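
Conceptually, merging folds each adapter into its frozen weight as `W' = W + (alpha / r) * B @ A`, after which the adapter matrices are no longer needed at inference time. The toy example below shows that update on nested lists. It is a sketch of the standard LoRA formula, not the PEFT implementation, and `merge_lora` is a hypothetical helper.

```python
def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A) for matrices given as nested lists."""
    scale = alpha / r
    rows, cols, rank = len(W), len(W[0]), len(A)
    return [
        [W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(rank))
         for j in range(cols)]
        for i in range(rows)
    ]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (2 x 2)
A = [[1.0, 2.0]]               # LoRA A, shape (rank x d_in), rank 1 here
B = [[0.5], [0.0]]             # LoRA B, shape (d_out x rank)
print(merge_lora(W, A, B, alpha=2, r=1))
```

The toy values use `alpha=2, r=1`, which gives the same scale factor of 2 as the notebook's `lora_alpha=32` with `r=16`.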


### 2.4 Inference Test

In [None]:
# ============================================================
# 🤖 INFERENCE (Prompt Engineered) - GEMMA 3 1B QLORA (MERGED)
# ============================================================

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# --- Path to the merged model ---
model_dir = "../outputs/gemma-pmb_merged_final_v2"

# --- 1️⃣ Load model & tokenizer ---
print(f"📦 Loading merged model from: {model_dir}")
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# --- 2️⃣ Prompt template (same layout as format_dataset used for training) ---
SYSTEM_PROMPT = (
    "Anda adalah asisten virtual untuk Penerimaan Mahasiswa Baru (PMB) di "
    "Universitas Sains Al-Qur'an (UNSIQ) Wonosobo.\n"
)

def build_prompt(user_question: str) -> str:
    # Role names are followed by a newline, exactly as in the training text
    return (
        f"<start_of_turn>system\n{SYSTEM_PROMPT}<end_of_turn>\n"
        f"<start_of_turn>user\n{user_question}<end_of_turn>\n"
        f"<start_of_turn>model\n"
    )

# --- 3️⃣ Enter the question ---
user_question = "fasilitas apa saja yang ada di unsiq?"
prompt = build_prompt(user_question)

# --- 4️⃣ Tokenize & generate ---
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

print("\n⚙️ Generating output...")
with torch.inference_mode():
    output_tokens = model.generate(
        **inputs,
        max_new_tokens=300,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

# --- 5️⃣ Decode the result ---
result = tokenizer.decode(output_tokens[0], skip_special_tokens=False)

# Keep only the part after "<start_of_turn>model"
if "<start_of_turn>model" in result:
    result = result.split("<start_of_turn>model")[-1]
if "<end_of_turn>" in result:
    result = result.split("<end_of_turn>")[0]

print("\n🧠 Model Response:")
print("=" * 80)
print(result.strip())
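
The post-processing step above can be checked without loading the model, because it is plain string handling. The sketch below wraps it in a hypothetical helper, `extract_model_turn`, and runs it on a made-up transcript.

```python
def extract_model_turn(decoded: str) -> str:
    """Return the text of the last model turn in a decoded Gemma-style transcript."""
    tail = decoded.split("<start_of_turn>model")[-1]
    return tail.split("<end_of_turn>")[0].strip()

# Made-up transcript for illustration
decoded = (
    "<start_of_turn>user\nhalo<end_of_turn>\n"
    "<start_of_turn>model\nHalo, ada yang bisa dibantu?<end_of_turn>"
)
print(extract_model_turn(decoded))
```

Splitting on the last `<start_of_turn>model` marker means the helper still works when the decoded text contains earlier turns, and it falls back to the whole string when no marker is present.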


In [None]:
"""
Test the model with questions from test_pmb.json
Data format: Gemma chat template with user and model turns
Evaluation with BERTScore
"""

import time
import json
from datetime import datetime
from pathlib import Path
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
from bert_score import score as bert_score

# ============================================================================
# CONFIGURATION
# ============================================================================
MODEL_PATH = "../outputs/gemma-pmb_merged_final_v2"  # output of the merge step in section 2.3
TEST_DATA_PATH = "data/test_pmb.json"  # path to the test data file
OUTPUT_DIR = "../outputs"

# System prompt for the model (same text as used when formatting the training data)
SYSTEM_PROMPT = (
    "Anda adalah asisten virtual untuk Penerimaan Mahasiswa Baru (PMB) di "
    "Universitas Sains Al-Qur'an (UNSIQ) Wonosobo.\n"
)

# ============================================================================
# LOAD MODEL
# ============================================================================
print("="*80)
print("  LOADING MODEL AND TEST DATA")
print("="*80)

print(f"\n📂 Loading model from: {MODEL_PATH}")
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.float16,
    device_map="auto"
)
print("✅ Model loaded successfully!")

# ============================================================================
# LOAD TEST DATA
# ============================================================================
print(f"\n📂 Loading test data from: {TEST_DATA_PATH}")

def extract_qa_from_text(text):
    """Extract the question and answer from Gemma chat-template text."""
    try:
        # Extract user question
        if "<start_of_turn>user\n" in text and "<end_of_turn>" in text:
            user_start = text.find("<start_of_turn>user\n") + len("<start_of_turn>user\n")
            user_end = text.find("<end_of_turn>", user_start)
            question = text[user_start:user_end].strip()
        else:
            question = ""

        # Extract model answer (reference)
        if "<start_of_turn>model\n" in text:
            model_start = text.find("<start_of_turn>model\n") + len("<start_of_turn>model\n")
            model_end = text.find("<end_of_turn>", model_start)
            if model_end == -1:
                # No closing tag: take everything up to the end
                reference = text[model_start:].strip()
            else:
                reference = text[model_start:model_end].strip()
        else:
            reference = ""

        return question, reference

    except Exception as e:
        print(f"⚠️  Error extracting Q&A: {e}")
        return "", ""

try:
    with open(TEST_DATA_PATH, 'r', encoding='utf-8') as f:
        test_data_raw = json.load(f)

    # Convert to the format needed for evaluation
    test_data = []
    skipped = 0

    for idx, item in enumerate(test_data_raw, 1):
        text = item.get("text", "")

        if not text:
            skipped += 1
            continue

        # Extract question and reference from the text
        question, reference = extract_qa_from_text(text)

        if not question or not reference:
            print(f"⚠️  Sample #{idx} skipped - empty Q or A")
            skipped += 1
            continue

        test_data.append({
            "question": question,
            "reference": reference,
            "original_text": text
        })

    print("✅ Test data loaded successfully!")
    print(f"📊 Total questions: {len(test_data)}")
    print(f"⚠️  Skipped: {skipped}")

    if len(test_data) == 0:
        print("❌ ERROR: No valid test data found!")
        raise SystemExit(1)

    # Show the first sample for verification
    print("\n📝 First sample:")
    print(f"   Q: {test_data[0]['question'][:80]}...")
    print(f"   R: {test_data[0]['reference'][:80]}...")

except FileNotFoundError:
    print(f"❌ ERROR: File {TEST_DATA_PATH} not found!")
    print("💡 Make sure test_pmb.json exists in the data/ folder")
    raise SystemExit(1)
except json.JSONDecodeError as e:
    print(f"❌ ERROR: Invalid JSON file - {e}")
    raise SystemExit(1)
except Exception as e:
    print(f"❌ ERROR: {e}")
    raise SystemExit(1)

print("="*80)

# ============================================================================
# EVALUATION FUNCTIONS
# ============================================================================

def calculate_bert_score(references, candidates):
    """Compute BERTScore (precision, recall, F1) for a batch of references and candidates."""
    try:
        P, R, F1 = bert_score(candidates, references, lang='id', verbose=False)
        return P.tolist(), R.tolist(), F1.tolist()
    except Exception as e:
        print(f"Error calculating BERT score: {e}")
        return [0.0]*len(candidates), [0.0]*len(candidates), [0.0]*len(candidates)

# ============================================================================
# INFERENCE FUNCTION
# ============================================================================

def run_inference(question, reference, index, total):
    """Jalankan inference untuk satu pertanyaan"""
    print(f"\n{'='*80}")
    print(f"PERTANYAAN {index}/{total}")
    print(f"{'='*80}")
    print(f"Q: {question}")
    print(f"{'-'*80}")
    
    start_time = time.time()
    
    try:
        # Format the prompt with the Gemma chat template
        full_prompt = (
            f"<start_of_turn>user\n{question}<end_of_turn>\n"
            f"<start_of_turn>model\n"
        )
        
        # Tokenize input
        inputs = tokenizer(full_prompt, return_tensors="pt").to(model.device)
        
        # Generate response
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=256,
                temperature=0.7,
                top_p=0.9,
                do_sample=True,
                pad_token_id=tokenizer.eos_token_id
            )
        
        # Decode only the newly generated tokens. With skip_special_tokens=True
        # the "<start_of_turn>" markers are normally stripped from the decoded
        # text, so splitting on them would leave the prompt inside the response.
        prompt_length = inputs["input_ids"].shape[-1]
        response = tokenizer.decode(
            outputs[0][prompt_length:], skip_special_tokens=True
        ).strip()
        
        # Remove end_of_turn if it is still present
        response = response.replace("<end_of_turn>", "").strip()
        
        end_time = time.time()
        duration = end_time - start_time
        
        print(f"A: {response}")
        print(f"R: {reference}")
        print(f"{'-'*80}")
        print(f"⏱️  Inference time: {duration:.2f} seconds")
        
        return {
            "index": index,
            "question": question,
            "reference": reference,
            "answer": response,
            "duration": duration,
            "bert_precision": 0.0,  # Akan diisi nanti
            "bert_recall": 0.0,
            "bert_f1": 0.0,
            "success": True,
            "error": None
        }
    
    except Exception as e:
        print(f"❌ ERROR: {str(e)}")
        return {
            "index": index,
            "question": question,
            "reference": reference,
            "answer": None,
            "duration": 0,
            "bert_precision": 0.0,
            "bert_recall": 0.0,
            "bert_f1": 0.0,
            "success": False,
            "error": str(e)
        }

# ============================================================================
# MAIN FUNCTION
# ============================================================================

def main():
    """Jalankan test untuk semua pertanyaan"""
    
    print("\n" + "="*80)
    print("  TEST PERFORMA MODEL LOCAL GEMMA3-PMB")
    print(f"  {len(test_data)} PERTANYAAN DARI test_pmb.json")
    print("  EVALUASI DENGAN BERT SCORE")
    print("="*80)
    print(f"📅 Start time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print(f"📊 Total questions: {len(test_data)}")
    print(f"🤖 Model: {MODEL_PATH}")
    print(f"📂 Test data: {TEST_DATA_PATH}")
    print("="*80)
    
    results = []
    total_duration = 0
    success_count = 0
    
    # Jalankan semua pertanyaan
    for i, item in enumerate(test_data, 1):
        result = run_inference(
            item["question"], 
            item["reference"], 
            i,
            len(test_data)
        )
        results.append(result)
        
        if result["success"]:
            success_count += 1
            total_duration += result["duration"]
        
        # Jeda kecil antar pertanyaan
        time.sleep(0.5)
    
    # ========================================================================
    # CALCULATE BERT SCORE
    # ========================================================================
    print(f"\n{'='*80}")
    print("  MENGHITUNG BERT SCORE...")
    print(f"{'='*80}")
    
    successful_results = [r for r in results if r["success"] and r["answer"]]
    if successful_results:
        references = [r["reference"] for r in successful_results]
        candidates = [r["answer"] for r in successful_results]
        
        bert_P, bert_R, bert_F1 = calculate_bert_score(references, candidates)
        
        # Tambahkan BERT scores ke results
        bert_idx = 0
        for result in results:
            if result["success"] and result["answer"]:
                result["bert_precision"] = bert_P[bert_idx]
                result["bert_recall"] = bert_R[bert_idx]
                result["bert_f1"] = bert_F1[bert_idx]
                bert_idx += 1
    
    # ========================================================================
    # SUMMARY STATISTICS
    # ========================================================================
    print(f"\n{'='*80}")
    print("  RINGKASAN HASIL TEST")
    print(f"{'='*80}")
    print(f"✅ Succeeded: {success_count}/{len(test_data)} questions")
    print(f"❌ Failed: {len(test_data) - success_count}/{len(test_data)} questions")
    
    if success_count > 0:
        avg_duration = total_duration / success_count
        
        # Hitung avg BERT scores
        successful_with_answer = [r for r in results if r["success"] and r["answer"]]
        if successful_with_answer:
            avg_bert_f1 = sum(r["bert_f1"] for r in successful_with_answer) / len(successful_with_answer)
            avg_bert_precision = sum(r["bert_precision"] for r in successful_with_answer) / len(successful_with_answer)
            avg_bert_recall = sum(r["bert_recall"] for r in successful_with_answer) / len(successful_with_answer)
        else:
            avg_bert_f1 = 0.0
            avg_bert_precision = 0.0
            avg_bert_recall = 0.0
        
        print("\n⏱️  PERFORMANCE METRICS")
        print(f"  Average inference time: {avg_duration:.2f} seconds")
        print(f"  Total time: {total_duration:.2f} seconds")
        # Guard against a zero total duration to avoid ZeroDivisionError
        throughput = success_count / total_duration if total_duration > 0 else 0.0
        print(f"  Throughput: {throughput:.2f} questions/second")
        
        print("\n📊 QUALITY METRICS")
        print(f"  Avg BERT F1 Score: {avg_bert_f1:.4f}")
        print(f"  Avg BERT Precision: {avg_bert_precision:.4f}")
        print(f"  Avg BERT Recall: {avg_bert_recall:.4f}")
    
    # ========================================================================
    # SAVE RESULTS
    # ========================================================================
    Path(OUTPUT_DIR).mkdir(parents=True, exist_ok=True)
    output_file = f"{OUTPUT_DIR}/test_results_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
    
    successful_with_answer = [r for r in results if r["success"] and r["answer"]]
    
    summary_data = {
        "timestamp": datetime.now().isoformat(),
        "model": MODEL_PATH,
        "test_data_source": TEST_DATA_PATH,
        "total_questions": len(test_data),
        "success_count": success_count,
        "fail_count": len(test_data) - success_count,
        "performance_metrics": {
            "total_duration": total_duration,
            "avg_duration": total_duration / success_count if success_count > 0 else 0,
            "throughput": success_count / total_duration if total_duration > 0 else 0
        },
        "quality_metrics": {
            "avg_bert_f1": sum(r["bert_f1"] for r in successful_with_answer) / len(successful_with_answer) if successful_with_answer else 0,
            "avg_bert_precision": sum(r["bert_precision"] for r in successful_with_answer) / len(successful_with_answer) if successful_with_answer else 0,
            "avg_bert_recall": sum(r["bert_recall"] for r in successful_with_answer) / len(successful_with_answer) if successful_with_answer else 0
        },
        "results": results
    }
    
    with open(output_file, 'w', encoding='utf-8') as f:
        json.dump(summary_data, f, ensure_ascii=False, indent=2)
    
    print(f"\n📁 Results saved to: {output_file}")
    
    # ========================================================================
    # QUALITY ANALYSIS
    # ========================================================================
    print(f"\n{'='*80}")
    print("  ANALISIS KUALITAS JAWABAN")
    print(f"{'='*80}")
    
    relevant_count = 0
    keywords = ["pmb", "mahasiswa", "pendaftaran", "kuliah", "universitas", "unsiq",
               "daftar", "syarat", "biaya", "jadwal", "seleksi", "fakultas", "prodi"]
    
    for result in results:
        if result["success"] and result["answer"]:
            answer_lower = result["answer"].lower()
            if any(keyword in answer_lower for keyword in keywords):
                relevant_count += 1
    
    relevance_rate = (relevant_count / success_count * 100) if success_count > 0 else 0
    print(f"🎯 Relevant answers: {relevant_count}/{success_count} ({relevance_rate:.1f}%)")
    
    print(f"\n{'='*80}")
    print("✅ TEST COMPLETE!")
    print(f"{'='*80}\n")

if __name__ == "__main__":
    main()
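
An alternative way to isolate the model's reply is to work on the decoded string while the turn markers are still present. The following is a minimal sketch, assuming the text was decoded with `skip_special_tokens=False` so that Gemma's `<start_of_turn>` and `<end_of_turn>` markers survive; `extract_model_turn` is a hypothetical helper, not part of the notebook.

```python
def extract_model_turn(decoded: str) -> str:
    """Return the text of the last model turn from a decoded Gemma chat string."""
    marker = "<start_of_turn>model"
    # Keep only what follows the last model marker, if there is one.
    if marker in decoded:
        decoded = decoded.split(marker)[-1]
    # Cut at the first end-of-turn marker so anything after the reply is dropped.
    decoded = decoded.split("<end_of_turn>")[0]
    return decoded.strip()


# Illustrative input (made up for this sketch)
sample = (
    "<start_of_turn>user\nApa itu PMB?<end_of_turn>\n"
    "<start_of_turn>model\nPMB adalah Penerimaan Mahasiswa Baru.<end_of_turn>"
)
print(extract_model_turn(sample))
```

Text without any markers is returned unchanged apart from whitespace trimming, so the helper is safe to call on either decoding style.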

## TRAINING ANALYSIS

In [None]:
# ================================================================
# 🧠 DATASET AUTO-REPAIR PIPELINE (PMB UNSIQ)
# ================================================================
import json, re
from bert_score import score

# === 1️⃣ LOAD EVALUATION RESULTS ===
with open("../outputs/test_results_20251108_114654.json", encoding="utf-8") as f:
    data = json.load(f)

# Keep only entries that produced an answer. Failed runs store answer=None
# with a placeholder bert_f1 of 0.0, which fix_answer() cannot process.
low_samples = [
    r for r in data["results"]
    if r.get("success") and r.get("answer") and r["bert_f1"] < 0.75
]
print(f"🔍 Found {len(low_samples)} entries with BERT F1 < 0.75")

# === 2️⃣ REPAIR RULES ===
def fix_answer(q, a):
    ql, al = q.lower(), a.lower()

    # 1️⃣ Out-of-domain (OOD)
    if any(k in ql for k in ["blogger", "saham", "bca", "kucing", "pancasila", "garuda"]):
        return (
            "Pertanyaan di luar konteks PMB UNSIQ. "
            "Saya hanya dapat memberikan informasi seputar kampus dan pendaftaran mahasiswa baru."
        )

    # 2️⃣ PMB facts / figures
    if "biaya" in ql or "daftar ulang" in ql or "angsuran" in ql or "spp" in ql:
        return (
            "Mahasiswa dapat mencicil biaya kuliah maksimal tiga kali per semester. "
            "Pembayaran pertama minimal Rp 745.000 sesuai ketentuan UNSIQ."
        )

    # 3️⃣ Color-blindness test and FIKES
    if "buta warna" in ql or "fikes" in ql or "keperawatan" in ql:
        return "Tes buta warna wajib bagi seluruh calon mahasiswa Fakultas Ilmu Kesehatan (FIKES) UNSIQ."

    # 4️⃣ Scholarships and KIP
    if "beasiswa" in ql or "kip" in ql:
        return (
            "Beasiswa UNSIQ diberikan bagi mahasiswa berprestasi atau berhak secara ekonomi. "
            "KIP-Kuliah menanggung biaya kuliah penuh selama delapan semester."
        )

    # 5️⃣ PMB procedures
    if "gelombang" in ql or "pendaftaran" in ql or "berkas" in ql:
        return (
            "Pendaftaran dilakukan melalui laman pmb.unsiq.ac.id sesuai jadwal gelombang. "
            "Berkas wajib dilengkapi sebelum batas waktu yang ditentukan."
        )

    # 6️⃣ Commonly inverted negation. The match is case-insensitive because
    # `al` is lowercased while `a` keeps its original casing.
    if "tidak wajib" in al:
        return re.sub(r"tidak wajib", "wajib", a, flags=re.IGNORECASE)

    # 7️⃣ Overly long wording: keep the first sentence only
    if len(a.split()) > 35:
        return a.split(".")[0].strip() + "."

    return a.strip().capitalize()

# === 3️⃣ REPAIR ALL LOW BERT F1 ENTRIES ===
for r in low_samples:
    r["fixed_answer"] = fix_answer(r["question"], r["answer"])

# === 4️⃣ CHECK BERT F1 IMPROVEMENT (OPTIONAL) ===
if low_samples:
    refs = [r["reference"] for r in low_samples]
    preds = [r["fixed_answer"] for r in low_samples]
    P, R, F1 = score(preds, refs, lang="id", verbose=False)
    print(f"📈 Average BERT F1 after repair: {F1.mean().item():.3f}")

# === 5️⃣ INTEGRATE INTO THE MAIN DATASET ===
# dataset_v4.txt is expected to look like:
# Q: ...
# A: ...
with open("dataset_v4.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()

for r in low_samples:
    # Stop one line early so lines[i+1] always exists
    for i in range(len(lines) - 1):
        if lines[i].startswith("Q:") and r["question"].strip() in lines[i]:
            # Only overwrite when the next line really is the paired answer
            if lines[i+1].startswith("A:"):
                lines[i+1] = "A: " + r["fixed_answer"].strip() + "\n"
            break

with open("dataset_v5_refined.txt", "w", encoding="utf-8") as f:
    f.writelines(lines)

print("✅ New dataset saved as dataset_v5_refined.txt")
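
The answer-replacement step can also be written as a pure function, which makes it easy to check on a few lines before touching the dataset file. This is a minimal sketch, assuming the `Q:` / `A:` line format described above; `replace_answers` and the sample lines are hypothetical, and it matches questions exactly rather than by substring, which is a stricter choice than the loop in the cell.

```python
def replace_answers(lines, fixes):
    """Return a copy of `lines` with answers replaced for questions in `fixes`.

    `lines` alternates "Q: ..." and "A: ..." entries; `fixes` maps the
    question text to its corrected answer.
    """
    updated = list(lines)
    for i in range(len(updated) - 1):
        if not updated[i].startswith("Q:"):
            continue
        question = updated[i][2:].strip()
        # Only overwrite when the next line really is the paired answer.
        if question in fixes and updated[i + 1].startswith("A:"):
            updated[i + 1] = "A: " + fixes[question].strip() + "\n"
    return updated


# Illustrative data (made up for this sketch)
demo = ["Q: Apa itu PMB?\n", "A: tidak tahu\n", "Q: Kapan jadwal PMB?\n", "A: -\n"]
fixed = replace_answers(demo, {"Apa itu PMB?": "PMB adalah Penerimaan Mahasiswa Baru."})
print("".join(fixed))
```

Because the function returns a new list, the original lines stay available for comparison or rollback.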


In [None]:
# ============================================================
# TRAINING LOG ANALYSIS
# ============================================================

import glob

# Locate the training directory. The pattern is a fixed run path, so glob
# returns at most one match; the sort only matters if a wildcard is used.
training_dirs = glob.glob('outputs/gemma-pmb_20251030_183142')
training_dirs = sorted([d for d in training_dirs if os.path.isdir(d)], key=os.path.getmtime, reverse=True)

if training_dirs:
    latest_dir = training_dirs[0]
    print(f"📁 Latest training: {os.path.basename(latest_dir)}")
    
    # Load trainer_state.json
    state_file = os.path.join(latest_dir, 'trainer_state.json')
    
    if os.path.exists(state_file):
        with open(state_file, 'r') as f:
            trainer_state = json.load(f)
        
        # Extract log history
        log_history = trainer_state.get('log_history', [])
        
        if log_history:
            df_logs = pd.DataFrame(log_history)
            
            print("\n📊 Training Log Summary:")
            print("=" * 80)
            print(f"Total log entries: {len(df_logs)}")
            print(f"\nColumns: {', '.join(df_logs.columns.tolist())}")
            
            # Show first few entries
            print("\n📋 First entries:")
            print(df_logs.head(10).to_string())
            
            # Training loss stats (the column is missing if no training step was logged)
            train_logs = df_logs[df_logs['loss'].notna()] if 'loss' in df_logs.columns else pd.DataFrame()
            if not train_logs.empty:
                print("\n📉 Training Loss:")
                print(f"  Initial: {train_logs['loss'].iloc[0]:.4f}")
                print(f"  Final: {train_logs['loss'].iloc[-1]:.4f}")
                print(f"  Best: {train_logs['loss'].min():.4f}")
                print(f"  Improvement: {(1 - train_logs['loss'].iloc[-1]/train_logs['loss'].iloc[0])*100:.2f}%")
            
            # Eval loss stats (the column is missing if evaluation never ran)
            eval_logs = df_logs[df_logs['eval_loss'].notna()] if 'eval_loss' in df_logs.columns else pd.DataFrame()
            if not eval_logs.empty:
                print("\n📊 Evaluation Loss:")
                print(f"  Initial: {eval_logs['eval_loss'].iloc[0]:.4f}")
                print(f"  Final: {eval_logs['eval_loss'].iloc[-1]:.4f}")
                print(f"  Best: {eval_logs['eval_loss'].min():.4f}")
            
            print("\n✅ Log analysis complete")
        else:
            print("⚠️  No log history found")
    else:
        print(f"⚠️  trainer_state.json not found in {latest_dir}")
else:
    print("⚠️  No training directories found")
    print("    Run training first (section 2.2)")

## ANALYSIS

In [None]:
import glob  # used below; previously relied on an import from another cell
import os, json
import pandas as pd

# Locate the training output folder (fixed run path)
training_dirs = sorted(
    [d for d in glob.glob('../outputs/gemma-pmb_20251030_183142') if os.path.isdir(d)],
    key=os.path.getmtime,
    reverse=True
)

if training_dirs:
    latest_dir = training_dirs[0]
    print(f"📁 Latest training dir: {latest_dir}")

    state_file = os.path.join(latest_dir, 'trainer_state.json')

    if os.path.exists(state_file):
        with open(state_file, 'r') as f:
            trainer_state = json.load(f)
        log_history = trainer_state.get('log_history', [])
        if log_history:
            df_logs = pd.DataFrame(log_history)
            print(f"✅ Loaded {len(df_logs)} log entries.")
        else:
            print("⚠️ No log_history found in trainer_state.json")
    else:
        print(f"⚠️ trainer_state.json not found in {latest_dir}")
else:
    print("⚠️ No training directories found")


In [None]:
# Plot training curves
if 'df_logs' in locals() and not df_logs.empty:
    
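    # Keep only rows that have a loss value (a column is missing if that phase never logged)
    train_logs = df_logs[df_logs['loss'].notna()].copy() if 'loss' in df_logs.columns else pd.DataFrame()
    eval_logs = df_logs[df_logs['eval_loss'].notna()].copy() if 'eval_loss' in df_logs.columns else pd.DataFrame()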
    
    # Create figure
    fig, axes = plt.subplots(2, 2, figsize=(16, 10))
    
    # 1. Training Loss
    if not train_logs.empty:
        axes[0, 0].plot(train_logs['step'], train_logs['loss'], 
                       marker='o', linewidth=2, markersize=4, 
                       color='#e74c3c', label='Training Loss')
        axes[0, 0].set_xlabel('Steps', fontsize=12, fontweight='bold')
        axes[0, 0].set_ylabel('Loss', fontsize=12, fontweight='bold')
        axes[0, 0].set_title('Training Loss Curve', fontsize=14, fontweight='bold')
        axes[0, 0].grid(True, alpha=0.3)
        axes[0, 0].legend(fontsize=11)
    
    # 2. Validation Loss
    if not eval_logs.empty:
        axes[0, 1].plot(eval_logs['step'], eval_logs['eval_loss'], 
                       marker='s', linewidth=2, markersize=6,
                       color='#3498db', label='Validation Loss')
        axes[0, 1].set_xlabel('Steps', fontsize=12, fontweight='bold')
        axes[0, 1].set_ylabel('Loss', fontsize=12, fontweight='bold')
        axes[0, 1].set_title('Validation Loss Curve', fontsize=14, fontweight='bold')
        axes[0, 1].grid(True, alpha=0.3)
        axes[0, 1].legend(fontsize=11)
    
    # 3. Train vs Validation Loss (combined)
    if not train_logs.empty:
        axes[1, 0].plot(train_logs['step'], train_logs['loss'], 
                       marker='o', linewidth=2, markersize=4,
                       color='#e74c3c', label='Training Loss', alpha=0.7)
    if not eval_logs.empty:
        axes[1, 0].plot(eval_logs['step'], eval_logs['eval_loss'], 
                       marker='s', linewidth=2, markersize=6,
                       color='#3498db', label='Validation Loss', alpha=0.7)
    axes[1, 0].set_xlabel('Steps', fontsize=12, fontweight='bold')
    axes[1, 0].set_ylabel('Loss', fontsize=12, fontweight='bold')
    axes[1, 0].set_title('Training vs Validation Loss', fontsize=14, fontweight='bold')
    axes[1, 0].grid(True, alpha=0.3)
    axes[1, 0].legend(fontsize=11)
    
    # 4. Learning Rate Schedule
    lr_logs = df_logs[df_logs['learning_rate'].notna()].copy() if 'learning_rate' in df_logs.columns else pd.DataFrame()
    if not lr_logs.empty:
        axes[1, 1].plot(lr_logs['step'], lr_logs['learning_rate'], 
                       marker='o', linewidth=2, markersize=4,
                       color='#9b59b6', label='Learning Rate')
        axes[1, 1].set_xlabel('Steps', fontsize=12, fontweight='bold')
        axes[1, 1].set_ylabel('Learning Rate', fontsize=12, fontweight='bold')
        axes[1, 1].set_title('Learning Rate Schedule', fontsize=14, fontweight='bold')
        axes[1, 1].grid(True, alpha=0.3)
        axes[1, 1].ticklabel_format(style='scientific', axis='y', scilimits=(0,0))
        axes[1, 1].legend(fontsize=11)
    
    plt.tight_layout()
    # Make sure the figures folder exists before saving
    os.makedirs('../outputs/figures', exist_ok=True)
    plt.savefig('../outputs/figures/training_curves.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    print("\n✅ Figure saved: outputs/figures/training_curves.png")
    
    # Print summary statistics
    print("\n📊 Training Summary:")
    print("="*80)
    if not train_logs.empty:
        print(f"\nTraining Loss:")
        print(f"  Initial: {train_logs['loss'].iloc[0]:.4f}")
        print(f"  Final: {train_logs['loss'].iloc[-1]:.4f}")
        print(f"  Best: {train_logs['loss'].min():.4f}")
        print(f"  Improvement: {(1 - train_logs['loss'].iloc[-1]/train_logs['loss'].iloc[0])*100:.2f}%")
    
    if not eval_logs.empty:
        print(f"\nValidation Loss:")
        print(f"  Initial: {eval_logs['eval_loss'].iloc[0]:.4f}")
        print(f"  Final: {eval_logs['eval_loss'].iloc[-1]:.4f}")
        print(f"  Best: {eval_logs['eval_loss'].min():.4f}")
        print(f"  Improvement: {(1 - eval_logs['eval_loss'].iloc[-1]/eval_logs['eval_loss'].iloc[0])*100:.2f}%")

else:
    print("⚠️  No training logs available for visualization")
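
The "Improvement" figure printed above is the relative loss reduction, `(1 - final / initial) * 100`. A small sketch of that formula with a guard for a zero baseline follows; `improvement_pct` is a hypothetical helper and the loss values are made up for illustration, not taken from the training run.

```python
def improvement_pct(initial: float, final: float) -> float:
    """Relative loss reduction in percent: (1 - final / initial) * 100."""
    if initial == 0:
        # No meaningful baseline, so report no improvement instead of dividing by zero.
        return 0.0
    return (1 - final / initial) * 100


# Illustrative values only
print(f"{improvement_pct(2.0, 0.5):.2f}%")
```

A negative result means the final loss is higher than the initial one, which is worth checking against the validation curve.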

### 3.2 Per-Category Analysis

In [None]:
"""
🧪 Quick evaluation script for the Gemma3-PMB model (25 varied questions)
Intended for the fine-tuned model (Transformers, not Ollama)
Uses prompt engineering that follows the dataset structure (<start_of_turn>system/user/model)
"""

import os
import json
import time
from datetime import datetime
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# ============================================================
# ⚙️ CONFIGURATION
# ============================================================
MODEL_PATH = "../outputs/gemma-pmb_merged_final"
OUTPUT_DIR = "../outputs"

os.makedirs(OUTPUT_DIR, exist_ok=True)

# ============================================================
# 💬 PROMPT TEMPLATE
# ============================================================
SYSTEM_PROMPT = (
    "<start_of_turn>system "
    "Anda adalah asisten virtual untuk Penerimaan Mahasiswa Baru (PMB) di Universitas Sains Al-Qur'an (UNSIQ) Wonosobo. "
    "Tugas Anda adalah memberikan informasi yang akurat, jelas, dan membantu calon mahasiswa dalam proses pendaftaran. "
    "Jawab pertanyaan dengan ramah, informatif, dan profesional."
    "<end_of_turn>"
)

def build_prompt(user_question: str):
    """Buat prompt dengan struktur sesuai dataset"""
    return (
        f"{SYSTEM_PROMPT}\n"
        f"<start_of_turn>user {user_question}<end_of_turn>\n"
        f"<start_of_turn>model "
    )

# ============================================================
# ❓ 25 Test Questions
# ============================================================
test_questions = [
    # Definisi PMB
    "Apa itu PMB?", "PMB itu apa sih?", "Penjelasan tentang penerimaan mahasiswa baru",
    "Definisi PMB dong", "Apa kepanjangan PMB?", "Jelaskan apa yang dimaksud dengan PMB",

    # Syarat
    "Syarat daftar PMB apa aja?", "Apa syarat pendaftaran mahasiswa baru?",
    "Dokumen apa aja buat daftar kuliah?", "Syarat administratif PMB dong",

    # Cara daftar
    "Gimana cara daftar kuliah?", "Cara mendaftar PMB bagaimana?",
    "Langkah-langkah daftar PMB", "Tahapan pendaftaran PMB apa aja?",
    "Cara registrasi PMB online",

    # Biaya
    "Biaya daftar PMB berapa?", "Berapa biaya pendaftaran kuliah?",
    "Biaya masuk kuliah berapa?", "Biaya administrasi PMB", "Ada biaya apa aja di PMB?",

    # Jadwal
    "Kapan jadwal PMB?", "PMB dibuka kapan?", "Deadline pendaftaran PMB",
    "Kapan terakhir daftar PMB?", "Timeline PMB gimana?"
]

# ============================================================
# 1️⃣ Load Model & Tokenizer
# ============================================================
print(f"📦 Loading merged model from: {MODEL_PATH}")
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# ============================================================
# 2️⃣ Run evaluation
# ============================================================
results = []
total_duration = 0
success_count = 0

print(f"\n🚀 Evaluation started at {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("=" * 80)

for i, question in enumerate(test_questions, start=1):
    start_time = time.time()
    prompt = build_prompt(question)

    try:
        # Tokenisasi & inference
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        with torch.inference_mode():
            output_tokens = model.generate(
                **inputs,
                max_new_tokens=200,
                temperature=0.7,
                top_p=0.9,
                repetition_penalty=1.1,
                do_sample=True,
                pad_token_id=tokenizer.eos_token_id
            )

        # Decode hasil
        answer = tokenizer.decode(output_tokens[0], skip_special_tokens=False)
        if "<start_of_turn>model" in answer:
            answer = answer.split("<start_of_turn>model")[-1]
        if "<end_of_turn>" in answer:
            answer = answer.split("<end_of_turn>")[0]
        answer = answer.strip()

        duration = time.time() - start_time
        total_duration += duration
        success_count += 1

        # Show progress
        print(f"\n[{i:02d}] Q: {question}")
        print(f"A: {answer[:200]}...")
        print(f"⏱️ {duration:.2f}s")

        results.append({
            "index": i,
            "question": question,
            "answer": answer,
            "duration": duration,
            "success": True
        })

    except Exception as e:
        print(f"❌ Error on question {i}: {str(e)}")
        results.append({
            "index": i,
            "question": question,
            "answer": None,
            "duration": 0,
            "success": False,
            "error": str(e)
        })

# ============================================================
# 3️⃣ Save evaluation results
# ============================================================
avg_duration = total_duration / max(success_count, 1)
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_file = f"{OUTPUT_DIR}/test_results_{timestamp}.json"

with open(output_file, "w", encoding="utf-8") as f:
    json.dump({
        "timestamp": datetime.now().isoformat(),
        "total_questions": len(test_questions),
        "success_count": success_count,
        "fail_count": len(test_questions) - success_count,
        "total_duration": total_duration,
        "avg_duration": avg_duration,
        "details": results
    }, f, indent=2, ensure_ascii=False)

# ============================================================
# 4️⃣ Summary
# ============================================================
print("\n" + "="*80)
print(f"✅ TEST COMPLETE! Saved to: {output_file}")
print(f"📊 Total questions : {len(test_questions)}")
print(f"✅ Succeeded       : {success_count}")
print(f"❌ Failed          : {len(test_questions) - success_count}")
print(f"⏱️ Average time    : {avg_duration:.2f} seconds")
print("="*80)


In [None]:
# ============================================================
# 📊 PER-CATEGORY ANALYSIS (Final & Tested)
# ============================================================

import json, os
import numpy as np
import pandas as pd

# Path file hasil test terakhir
latest_test_file = "../outputs/test_results_20251030_185454.json"

# Load JSON hasil test
with open(latest_test_file, "r", encoding="utf-8") as f:
    test_data = json.load(f)

# ✅ If the result is a list, take the first element
if isinstance(test_data, list):
    test_data = test_data[0]

# ✅ Make sure the 'details' key exists
if "details" not in test_data:
    raise KeyError("The test result file has no 'details' key. Make sure the test script saves results in the correct format.")

# ============================================================
# 🔹 Split results by question category
# ============================================================

# Slice boundaries follow the order of test_questions:
# 6 definition, 4 requirement, 5 how-to, 5 cost, and 5 schedule questions.
categories = {
    "Definisi PMB": test_data["details"][0:6],
    "Syarat Pendaftaran": test_data["details"][6:10],
    "Cara Pendaftaran": test_data["details"][10:15],
    "Biaya": test_data["details"][15:20],
    "Jadwal": test_data["details"][20:25]
}

category_stats = []
for cat_name, cat_results in categories.items():
    success = sum(1 for r in cat_results if r.get("success"))
    durations = [r.get("duration", 0) for r in cat_results if r.get("success")]
    avg_duration = np.mean(durations) if durations else 0

    category_stats.append({
        "Kategori": cat_name,
        "Berhasil": success,
        "Total": len(cat_results),
        "Success Rate (%)": round((success / len(cat_results)) * 100, 2) if cat_results else 0,
        "Avg Duration (s)": round(avg_duration, 3)
    })

# ============================================================
# 📊 Show results
# ============================================================
df_categories = pd.DataFrame(category_stats)

print("\n📊 Performance by Category:")
print("=" * 80)
print(df_categories.to_string(index=False))

# ============================================================
# 💾 Save to CSV
# ============================================================
os.makedirs("../outputs", exist_ok=True)
output_csv = "../outputs/category_performance.csv"
df_categories.to_csv(output_csv, index=False)

print(f"\n✅ Data saved: {output_csv}")
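
Category boundaries can be derived from per-category question counts instead of hard-coded indices, which keeps the grouping aligned if the question list changes. This is a minimal sketch, assuming the counts follow the order of `test_questions` (6, 4, 5, 5, 5); `slice_by_counts` is a hypothetical helper and the integer list stands in for the `details` entries.

```python
from itertools import accumulate


def slice_by_counts(items, counts):
    """Split `items` into consecutive groups sized by `counts` (name -> size)."""
    ends = list(accumulate(counts.values()))
    starts = [0] + ends[:-1]
    return {
        name: items[start:end]
        for name, start, end in zip(counts, starts, ends)
    }


category_counts = {
    "Definisi PMB": 6,
    "Syarat Pendaftaran": 4,
    "Cara Pendaftaran": 5,
    "Biaya": 5,
    "Jadwal": 5,
}
# Placeholder items 0..24 stand in for the 25 result records
groups = slice_by_counts(list(range(25)), category_counts)
print({name: (g[0], g[-1]) for name, g in groups.items()})
```

Storing the category alongside each question at test time would be more robust still, since it removes the dependence on ordering altogether.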


### 3.3 Performance Visualization

In [None]:
# ============================================================
# 📈 MODEL EVALUATION RESULT VISUALIZATION (Final Version)
# ============================================================

import os
import numpy as np
import matplotlib.pyplot as plt

# Pastikan folder untuk menyimpan gambar ada
os.makedirs("../outputs/figures", exist_ok=True)

fig, axes = plt.subplots(2, 2, figsize=(16, 10))

# ============================================================
# 1️⃣ Success Rate per Category
# ============================================================
colors = ['#2ecc71', '#3498db', '#9b59b6', '#e74c3c', '#f39c12']
axes[0, 0].bar(df_categories['Kategori'], df_categories['Success Rate (%)'],
               color=colors, edgecolor='black', linewidth=1.5)
axes[0, 0].set_ylabel('Success Rate (%)', fontsize=12, fontweight='bold')
axes[0, 0].set_title('Success Rate per Kategori Pertanyaan', fontsize=14, fontweight='bold')
axes[0, 0].set_ylim([0, 105])
axes[0, 0].grid(True, alpha=0.3, axis='y')

for i, v in enumerate(df_categories['Success Rate (%)']):
    axes[0, 0].text(i, v + 2, f'{v:.1f}%', ha='center', fontweight='bold', fontsize=10)
plt.setp(axes[0, 0].xaxis.get_majorticklabels(), rotation=45, ha='right')

# ============================================================
# 2️⃣ Average Duration per Category
# ============================================================
axes[0, 1].bar(df_categories['Kategori'], df_categories['Avg Duration (s)'],
               color=colors, edgecolor='black', linewidth=1.5)
axes[0, 1].set_ylabel('Waktu (detik)', fontsize=12, fontweight='bold')
axes[0, 1].set_title('Rata-rata Waktu Inference per Kategori', fontsize=14, fontweight='bold')
axes[0, 1].grid(True, alpha=0.3, axis='y')

for i, v in enumerate(df_categories['Avg Duration (s)']):
    axes[0, 1].text(i, v + 0.05, f'{v:.2f}s', ha='center', fontweight='bold', fontsize=10)
plt.setp(axes[0, 1].xaxis.get_majorticklabels(), rotation=45, ha='right')

# ============================================================
# 3️⃣ Pie Chart Overall Success Rate
# ============================================================
overall_success = test_data.get('success_count', 0)
overall_fail = test_data.get('fail_count', 0)
total_questions = test_data.get('total_questions', overall_success + overall_fail)

axes[1, 0].pie([overall_success, overall_fail],
               labels=['Berhasil', 'Gagal'],
               colors=['#2ecc71', '#e74c3c'],
               autopct='%1.1f%%',
               startangle=90,
               textprops={'fontsize': 12, 'fontweight': 'bold'})
axes[1, 0].set_title(
    f'Overall Success Rate\n({overall_success}/{total_questions} pertanyaan)',
    fontsize=14, fontweight='bold'
)

# ============================================================
# 4️⃣ Inference Time Distribution
# ============================================================
# Durations come from "details", not "results"
all_durations = [r['duration'] for r in test_data.get('details', []) if r.get('success')]
if all_durations:
    axes[1, 1].hist(all_durations, bins=15, color='#3498db', edgecolor='black', alpha=0.7)
    axes[1, 1].axvline(np.mean(all_durations), color='r', linestyle='--', linewidth=2,
                       label=f'Mean: {np.mean(all_durations):.2f}s')
    axes[1, 1].axvline(np.median(all_durations), color='g', linestyle='--', linewidth=2,
                       label=f'Median: {np.median(all_durations):.2f}s')
    axes[1, 1].legend(fontsize=11)
else:
    axes[1, 1].text(0.5, 0.5, 'Tidak ada data durasi', ha='center', va='center', fontsize=12)

axes[1, 1].set_xlabel('Waktu Inference (detik)', fontsize=12, fontweight='bold')
axes[1, 1].set_ylabel('Frekuensi', fontsize=12, fontweight='bold')
axes[1, 1].set_title('Distribusi Waktu Inference', fontsize=14, fontweight='bold')
axes[1, 1].grid(True, alpha=0.3, axis='y')

# ============================================================
# 💾 Save Figure
# ============================================================
plt.tight_layout()
output_path = '../outputs/figures/evaluation_results.png'
os.makedirs(os.path.dirname(output_path), exist_ok=True)  # make sure the target folder exists
plt.savefig(output_path, dpi=300, bbox_inches='tight')
plt.show()

print(f"\n✅ Figure saved: {output_path}")


### 3.4 Full Statistics Table

In [None]:
# Build a comprehensive statistics table
total_q = test_data['total_questions']
total_dur = test_data['total_duration']
# Use NaN when no durations were recorded so min()/max() do not raise
durs = all_durations if all_durations else [float('nan')]

stats_summary = {
    'Metric': [
        'Total Questions',
        'Succeeded',
        'Failed',
        'Success Rate',
        'Avg Inference Time',
        'Min Inference Time',
        'Max Inference Time',
        'Median Inference Time',
        'Std Inference Time',
        'Total Duration',
        'Throughput'
    ],
    'Value': [
        total_q,
        test_data['success_count'],
        test_data['fail_count'],
        # Guard the two ratios against a zero denominator
        f"{test_data['success_count']/total_q*100:.1f}%" if total_q else "n/a",
        f"{test_data['avg_duration']:.2f} s",
        f"{min(durs):.2f} s",
        f"{max(durs):.2f} s",
        f"{np.median(durs):.2f} s",
        f"{np.std(durs):.2f} s",
        f"{total_dur:.2f} s ({total_dur/60:.2f} min)",
        f"{test_data['success_count']/total_dur:.3f} q/s" if total_dur else "n/a"
    ]
}

df_stats = pd.DataFrame(stats_summary)

print("\n📊 Full Evaluation Statistics:")
print("="*80)
print(df_stats.to_string(index=False))

# Save to CSV
df_stats.to_csv('../outputs/evaluation_statistics.csv', index=False)
print("\n✅ Data saved: ../outputs/evaluation_statistics.csv")
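
The two ratios in the table above can be written out as a small helper. This is only a sketch: `evaluation_rates` is a hypothetical name that does not appear elsewhere in the notebook, and the numbers in the demo call are made up for illustration. As in the cell above, throughput counts only successful answers over the total duration.

```python
def evaluation_rates(success_count, total_questions, total_duration):
    """Return (success rate in percent, throughput in questions per second)."""
    # Fall back to 0.0 instead of raising when a denominator is zero
    success_rate = success_count / total_questions * 100 if total_questions else 0.0
    throughput = success_count / total_duration if total_duration else 0.0
    return success_rate, throughput


# Made-up numbers for illustration only, not results from this notebook
rate, qps = evaluation_rates(success_count=45, total_questions=50, total_duration=120.0)
print(f"{rate:.1f}% success, {qps:.3f} q/s")
```

With real data, the three arguments would come from `test_data['success_count']`, `test_data['total_questions']`, and `test_data['total_duration']`.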

## 📈 4. Answer Quality Analysis

### 4.1 Sample Model Answers

In [None]:
# ============================================================
# 🧠 SAMPLE MODEL OUTPUT: 5 QUESTIONS ACROSS CATEGORIES
# ============================================================

print("📝 Sample Questions and Model Answers:")
print("=" * 80)

# Take 5 examples, one per category.
# The fixed indices assume the results are ordered by category in blocks of five.
sample_indices = [0, 5, 10, 15, 20]  # one per category (Definisi, Syarat, Cara, Biaya, Jadwal)

# Make sure the key matches the structure of your JSON file
results = test_data.get('details', test_data.get('results', []))

if not results:
    print("⚠️  No test results found under 'details' or 'results'.")
else:
    for idx in sample_indices:
        if idx < len(results):
            result = results[idx]

            question = result.get('question', '(no question)')
            answer = result.get('answer', '(no answer)')
            duration = result.get('duration', 0)
            success = result.get('success', False)

            print(f"\n❓ Question {idx+1}: {question}")
            print(f"⏱️  Time: {duration:.2f}s")

            if success and answer:
                print(f"\n💬 Answer:")
                # Truncate long answers to 300 characters
                print(answer[:300] + ("..." if len(answer) > 300 else ""))
            else:
                print("\n⚠️  The model failed to answer this question.")

            print("-" * 80)
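
The fixed indices above only land on one question per category if the results happen to be ordered that way. A more defensive selection could look like the sketch below. It assumes each result dict carries a `category` field, which is not confirmed by this notebook, and `pick_one_per_category` is a hypothetical helper name.

```python
def pick_one_per_category(results, category_key="category", limit=5):
    """Return (index, result) pairs: the first result seen for each category."""
    picked = {}  # dicts preserve insertion order, so categories stay in first-seen order
    for idx, result in enumerate(results):
        category = result.get(category_key)
        if category is not None and category not in picked:
            picked[category] = (idx, result)
        if len(picked) >= limit:
            break
    return list(picked.values())


# Tiny illustrative input; the real list would come from test_data['details']
demo_results = [
    {"category": "Definisi", "question": "q1"},
    {"category": "Definisi", "question": "q2"},
    {"category": "Biaya", "question": "q3"},
]
samples = pick_one_per_category(demo_results)
print([idx for idx, _ in samples])
```

The returned indices can replace `sample_indices` in the loop above, so the printed examples stay spread across categories even if the test file is reordered.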


### 4.2 Answer Word Cloud

In [None]:
# ============================================================
# ☁️ WORD CLOUD FROM MODEL ANSWERS (FINAL)
# ============================================================

from wordcloud import WordCloud
import matplotlib.pyplot as plt
import os

# Make sure the figures folder exists
os.makedirs("../outputs/figures", exist_ok=True)

# Get the test results: use 'details' if present, fall back to 'results'
results = test_data.get("details", test_data.get("results", []))

# Join all successful answers into a single string
all_answers = " ".join([
    r.get("answer", "")
    for r in results
    if r.get("success") and r.get("answer")
])

if not all_answers.strip():
    print("⚠️ No answers available for the word cloud (make sure the model produced text).")
else:
    # Build the word cloud
    wordcloud = WordCloud(
        width=1200,
        height=600,
        background_color="white",
        colormap="viridis",
        max_words=100,
        relative_scaling=0.5,
        min_font_size=10
    ).generate(all_answers)

    # Show the result
    plt.figure(figsize=(14, 7))
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    # No emoji in the title: the default matplotlib font may lack the glyph
    plt.title("Word Cloud - PMB Model Answers", fontsize=16, fontweight="bold", pad=20)
    plt.tight_layout()

    output_path = "../outputs/figures/answer_wordcloud.png"
    plt.savefig(output_path, dpi=300, bbox_inches="tight", facecolor="white")
    plt.show()

    print(f"✅ Figure saved: {output_path}")
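
Because the answers are in Indonesian, common function words such as "yang" or "dan" may dominate the cloud. One possible remedy, sketched below, is to count words after removing a stopword list. `ID_STOPWORDS` is a deliberately short, hypothetical list and `word_frequencies` is an illustrative helper; neither comes from the notebook.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list; extend it for real use
ID_STOPWORDS = {"yang", "dan", "di", "ke", "dari", "untuk", "dengan", "pada", "adalah", "atau"}


def word_frequencies(text, stopwords=ID_STOPWORDS, min_length=3):
    """Count lowercase alphabetic tokens, skipping stopwords and very short words."""
    tokens = re.findall(r"[a-zA-Z]+", text.lower())
    return Counter(t for t in tokens if len(t) >= min_length and t not in stopwords)


demo_text = "Pendaftaran PMB dibuka untuk calon mahasiswa dan pendaftaran dilakukan di website"
freqs = word_frequencies(demo_text)
print(freqs.most_common(1))
```

The resulting counts can be passed to `WordCloud(...).generate_from_frequencies(freqs)` in place of `.generate(all_answers)`. Alternatively, the same set can be supplied through the `stopwords` argument of `WordCloud`.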


## 📑 5. Export for the Report

### 5.1 Generate Summary Report

In [None]:
# Create comprehensive report
report = f"""
{'='*80}
PMB CHATBOT MODEL EVALUATION REPORT
Fine-tuning Gemma 3 1B with QLoRA
{'='*80}

MODEL INFORMATION
{'-'*80}
Base Model        : {config['model_config']['model_name']}
Training Method   : QLoRA (Quantized Low-Rank Adaptation)
LoRA Rank (r)     : {config['qlora_config']['r']}
LoRA Alpha        : {config['qlora_config']['lora_alpha']}
Dropout           : {config['qlora_config']['lora_dropout']}

DATASET
{'-'*80}
Training Samples  : {len(train_data)}
Validation Samples: {len(val_data)}
Total Samples     : {len(train_data) + len(val_data)}
Avg Text Length   : {np.mean(train_lengths):.2f} words

TRAINING CONFIGURATION
{'-'*80}
Epochs            : {config['training_args']['num_train_epochs']}
Batch Size        : {config['training_args']['per_device_train_batch_size']}
Learning Rate     : {config['training_args']['learning_rate']}
Gradient Accum    : {config['training_args']['gradient_accumulation_steps']}
Effective Batch   : {config['training_args']['per_device_train_batch_size'] * config['training_args']['gradient_accumulation_steps']}

EVALUATION RESULTS
{'-'*80}
Total Questions   : {test_data['total_questions']}
Succeeded         : {test_data['success_count']}
Failed            : {test_data['fail_count']}
Success Rate      : {test_data['success_count']/test_data['total_questions']*100:.1f}%
Avg Inference     : {test_data['avg_duration']:.2f} seconds
Min Inference     : {min(all_durations):.2f} seconds
Max Inference     : {max(all_durations):.2f} seconds
Median Inference  : {np.median(all_durations):.2f} seconds
Throughput        : {test_data['success_count']/test_data['total_duration']:.3f} questions/second

PERFORMANCE PER CATEGORY
{'-'*80}
"""

# Append one line per category
for _, row in df_categories.iterrows():
    report += f"""
{row['Kategori']:20s} : {row['Berhasil']}/{row['Total']} ({row['Success Rate (%)']:.1f}%) - Avg: {row['Avg Duration (s)']:.2f}s
"""

report += f"""
{'='*80}
CONCLUSION
{'-'*80}
The Gemma 3 1B model fine-tuned with QLoRA shows very good performance in
answering questions about New Student Admissions (PMB).

With a success rate of {test_data['success_count']/test_data['total_questions']*100:.1f}% and an average inference time of {test_data['avg_duration']:.2f} seconds,
the model is ready for use in a production chatbot system.

Evaluation Date   : {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
{'='*80}
"""

# Save report
with open('../outputs/LAPORAN_EVALUASI.txt', 'w', encoding='utf-8') as f:
    f.write(report)

print(report)
print("\n✅ Report saved: ../outputs/LAPORAN_EVALUASI.txt")

### 5.2 List All Figures for the Report

In [None]:
import glob
import os

# Create figures directory if it does not exist
os.makedirs('../outputs/figures', exist_ok=True)

# List all generated figures
print("📊 Figures for the Thesis Report:")
print("="*80)

figures = glob.glob('../outputs/figures/*.png')
figures.sort()

for i, fig in enumerate(figures, 1):
    fig_name = os.path.basename(fig)
    fig_size = os.path.getsize(fig) / 1024  # KB
    print(f"{i}. {fig_name:40s} ({fig_size:.1f} KB)")

print("\n✅ All figures are saved in: ../outputs/figures/")
print("\nExpected figures:")
print("  1. dataset_distribution.png    - Dataset distribution")
print("  2. training_curves.png         - Training curves (loss, learning rate)")
print("  3. evaluation_results.png      - Evaluation results (success rate, duration)")
print("  4. answer_wordcloud.png        - Word cloud of model answers")
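
The list printed above is static, so it does not reveal whether any of the four figures is actually missing. A minimal check could look like the sketch below. `EXPECTED_FIGURES` and `missing_figures` are hypothetical names, and the demo runs against a temporary folder rather than the real `../outputs/figures` directory.

```python
import tempfile
from pathlib import Path

# File names taken from the list printed by the cell above
EXPECTED_FIGURES = [
    "dataset_distribution.png",
    "training_curves.png",
    "evaluation_results.png",
    "answer_wordcloud.png",
]


def missing_figures(figures_dir, expected=EXPECTED_FIGURES):
    """Return the expected file names that are not present in figures_dir."""
    directory = Path(figures_dir)
    return [name for name in expected if not (directory / name).is_file()]


# Demo against a temporary folder that contains only one of the four figures
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "training_curves.png").touch()
    demo_missing = missing_figures(tmp)
print(demo_missing)
```

In the notebook, calling `missing_figures('../outputs/figures')` before writing the thesis chapters would flag any plotting cell that was skipped.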

## 📊 6. Summary for Chapter IV

### Key Data for the Report:

In [None]:
print("\n" + "="*80)
print("DATA SUMMARY FOR THESIS CHAPTER IV")
print("="*80)

print("\n📊 4.1 DATASET CHARACTERISTICS")
print("-"*80)
print(f"- Training samples: {len(train_data)}")
print(f"- Validation samples: {len(val_data)}")
print(f"- Split ratio: {len(train_data)/(len(train_data)+len(val_data))*100:.0f}%:{len(val_data)/(len(train_data)+len(val_data))*100:.0f}%")
print(f"- Average text length: {np.mean(train_lengths):.0f} words")
print(f"- Text length range: {np.min(train_lengths)}-{np.max(train_lengths)} words")

print("\n🔧 4.2 TRAINING CONFIGURATION")
print("-"*80)
print(f"- Base model: {config['model_config']['model_name']}")
print(f"- Method: QLoRA (4-bit quantization)")
print(f"- LoRA rank: {config['qlora_config']['r']}")
print(f"- Learning rate: {config['training_args']['learning_rate']}")
print(f"- Epochs: {config['training_args']['num_train_epochs']}")
print(f"- Effective batch size: {config['training_args']['per_device_train_batch_size'] * config['training_args']['gradient_accumulation_steps']}")

# Training logs are optional: print this block only if they were loaded earlier
if 'train_logs' in locals() and not train_logs.empty:
    print("\n📈 4.3 TRAINING RESULTS")
    print("-"*80)
    print(f"- Training loss (initial): {train_logs['loss'].iloc[0]:.4f}")
    print(f"- Training loss (final): {train_logs['loss'].iloc[-1]:.4f}")
    print(f"- Loss reduction: {(1 - train_logs['loss'].iloc[-1]/train_logs['loss'].iloc[0])*100:.1f}%")
    if 'eval_logs' in locals() and not eval_logs.empty:
        print(f"- Validation loss (best): {eval_logs['eval_loss'].min():.4f}")

print("\n🎯 4.4 EVALUATION RESULTS")
print("-"*80)
print(f"- Number of test questions: {test_data['total_questions']}")
print(f"- Success rate: {test_data['success_count']/test_data['total_questions']*100:.1f}%")
print(f"- Inference time (mean): {test_data['avg_duration']:.2f} seconds")
print(f"- Inference time (median): {np.median(all_durations):.2f} seconds")
print(f"- Throughput: {test_data['success_count']/test_data['total_duration']:.3f} questions/second")

print("\n📊 4.5 PERFORMANCE PER CATEGORY")
print("-"*80)
for _, row in df_categories.iterrows():
    print(f"- {row['Kategori']:20s}: {row['Success Rate (%)']:5.1f}% success, {row['Avg Duration (s)']:5.2f}s avg")

print("\n" + "="*80)
print("✅ The data above can be used directly in Chapter IV of your thesis")
print("="*80)

In [None]:
from huggingface_hub import create_repo, upload_folder

# Replace this with your own username and model name
repo_id = "Pandusu/gemma3-pmb-unsiq-qlora-v2"  # example: pamd/gemma3-pmb-unsiq
model_dir = "../outputs/gemma-pmb_merged_final"

# (optional) create the repo on Hugging Face if it does not exist yet
create_repo(repo_id, exist_ok=True)

# Upload the entire contents of the model folder
upload_folder(
    folder_path=model_dir,
    repo_id=repo_id,
    commit_message="🚀 Upload fine-tuned Gemma3-PMB model (UNSIQ chatbot)",
)


## 🎉 Done!

### Generated Output Files:

**Figures for the Report:**
1. `outputs/figures/dataset_distribution.png` - Dataset distribution
2. `outputs/figures/training_curves.png` - Training curves
3. `outputs/figures/evaluation_results.png` - Evaluation results
4. `outputs/figures/answer_wordcloud.png` - Word cloud

**Table Data:**
1. `outputs/category_performance.csv` - Performance per category
2. `outputs/evaluation_statistics.csv` - Full statistics

**Reports:**
1. `outputs/LAPORAN_EVALUASI.txt` - Summary report
2. `outputs/test_results_*.json` - Raw test results

### How to Use These in the Thesis:

1. **Chapter III (Methodology):**
   - Figure: `dataset_distribution.png`
   - Data: Dataset characteristics from the section 6 summary cell (block 4.1)

2. **Chapter IV (Results and Discussion):**
   - Figure: `training_curves.png` for the loss curves
   - Figure: `evaluation_results.png` for the evaluation results
   - Table: `category_performance.csv`
   - Data: Summary from the section 6 summary cell

3. **Chapter V (Conclusion):**
   - Data: Success rate and throughput from `LAPORAN_EVALUASI.txt`

---

**All files are ready to use in the thesis report!** 🎓