# Fine-tuning Gemma 3 1B with QLoRA - Optimal Configuration for Best Quality

**Model**: Google Gemma 3 1B-IT (Instruction Tuned)

**Optimized for**: Best output quality on an A100 GPU

## Gemma 3 1B Specifications:
- 1 Billion parameters
- Context: 32K tokens input, 8K output
- Trained on: 2 trillion tokens, 140+ languages
- Native BF16 support (trained on TPU)
- Text-only (image input is available only in the 4B, 12B, and 27B Gemma 3 variants)

## Optimal Configuration:
- **LoRA Rank**: 64 (high capacity)
- **Learning Rate**: 2e-4 (recommended by Google)
- **Max Length**: 1024 tokens
- **Epochs**: 3 (balanced quality)
- **Effective Batch Size**: 16
- **All Checkpoints Saved**: Unlimited

## 1. Install Dependencies

In [None]:
!pip install -q -U "torch>=2.4.0" "transformers>=4.50.0" accelerate bitsandbytes peft datasets trl tensorboard sentencepiece

## 2. Import Libraries

In [None]:
import torch
import json
import os
from pathlib import Path
from datetime import datetime

from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig

print(f"PyTorch version: {torch.__version__}")
print(f"Transformers version: {__import__('transformers').__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    gpu_props = torch.cuda.get_device_properties(0)
    print(f"GPU Memory: {gpu_props.total_memory / 1024**3:.2f} GB")
    print(f"GPU Compute Capability: {gpu_props.major}.{gpu_props.minor}")

## 3. Load Optimal Configuration for Gemma 3 1B

In [None]:
# Load optimal config
config_path = "qlora_config_gemma3_1b_OPTIMAL.json"

with open(config_path, 'r', encoding='utf-8') as f:
    config = json.load(f)

print("\n" + "="*80)
print("OPTIMAL CONFIGURATION FOR GEMMA 3 1B - BEST QUALITY")
print("="*80)
print(f"\nModel: {config['model_config']['model_name']}")
print(f"\nLoRA Configuration:")
print(f"  - Rank (r): {config['qlora_config']['r']}")
print(f"  - Alpha: {config['qlora_config']['lora_alpha']}")
print(f"  - Dropout: {config['qlora_config']['lora_dropout']}")
print(f"  - Target modules: {', '.join(config['qlora_config']['target_modules'])}")
print(f"\nTraining Configuration:")
print(f"  - Learning Rate: {config['training_args']['learning_rate']}")
print(f"  - Epochs: {config['training_args']['num_train_epochs']}")
print(f"  - Batch Size per Device: {config['training_args']['per_device_train_batch_size']}")
print(f"  - Gradient Accumulation: {config['training_args']['gradient_accumulation_steps']}")
print(f"  - Effective Batch Size: {config['training_args']['per_device_train_batch_size'] * config['training_args']['gradient_accumulation_steps']}")
print(f"  - Max Sequence Length: {config['dataset_config']['max_length']} tokens")
print(f"  - Save Steps: {config['training_args']['save_steps']}")
print(f"  - Save Total Limit: Unlimited (all checkpoints saved)")
print("\nRecommendations:")
for note in config['notes']['recommendations']:
    print(f"  ✓ {note}")
print("="*80)
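
The cell above reads `qlora_config_gemma3_1b_OPTIMAL.json`, but the file itself is not shown in this notebook. The sketch below is one possible structure covering only the keys this cell prints. Rank 64, learning rate 2e-4, 3 epochs, max length 1024, and the effective batch size of 16 come from the header of this notebook; every other value (the 4 x 4 batch split, alpha, dropout, target modules, save interval, the recommendation text) is an assumption made for illustration. Later cells read additional keys (`quantization_config`, more `training_args`, and so on) that are omitted here.

```python
import json

# Illustrative values only; entries marked "assumed" are not taken from the notebook.
example_config = {
    "model_config": {"model_name": "google/gemma-3-1b-it"},
    "qlora_config": {
        "r": 64,
        "lora_alpha": 128,                                            # assumed
        "lora_dropout": 0.05,                                         # assumed
        "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],   # assumed
    },
    "training_args": {
        "learning_rate": 2e-4,
        "num_train_epochs": 3,
        "per_device_train_batch_size": 4,   # assumed
        "gradient_accumulation_steps": 4,   # assumed; 4 * 4 = 16
        "save_steps": 50,                   # assumed
    },
    "dataset_config": {"max_length": 1024},
    "notes": {"recommendations": ["Watch eval loss for signs of overfitting"]},  # assumed
}


def effective_batch_size(cfg):
    """Samples contributing to one optimizer step on a single device."""
    args = cfg["training_args"]
    return args["per_device_train_batch_size"] * args["gradient_accumulation_steps"]


# Round-trip through JSON to confirm the structure serializes as-is
restored = json.loads(json.dumps(example_config))
print(effective_batch_size(restored))
```

Any split of per-device batch size and accumulation steps whose product is 16 gives the same effective batch size; the right split depends on available GPU memory.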

## 4. Set Up Model & Tokenizer with 4-bit QLoRA

In [None]:
model_name = config['model_config']['model_name']

# Quantization config for 4-bit NF4 with double quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=config['quantization_config']['load_in_4bit'],
    bnb_4bit_compute_dtype=getattr(torch, config['quantization_config']['bnb_4bit_compute_dtype']),
    bnb_4bit_quant_type=config['quantization_config']['bnb_4bit_quant_type'],
    bnb_4bit_use_double_quant=config['quantization_config']['bnb_4bit_use_double_quant'],
)

print(f"\nLoading Gemma 3 1B model: {model_name}")
print("This may take a few minutes...\n")

# Load the model with 4-bit quantization
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=config['model_config']['trust_remote_code'],
    torch_dtype=getattr(torch, config['model_config']['torch_dtype']),
    use_cache=config['model_config']['use_cache'],
    attn_implementation=config['model_config'].get('attn_implementation', 'eager'),
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)

# Set padding
tokenizer.padding_side = 'right'
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

print(f"\n✓ Model loaded successfully!")
print(f"  Total parameters: {model.num_parameters():,}")
print(f"  Tokenizer vocab size: {len(tokenizer):,}")
print(f"  Model dtype: {model.dtype}")
print(f"  Device map: {model.hf_device_map}")
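
To make the effect of NF4 with double quantization concrete, here is a rough storage estimate per quantized weight. It assumes the block sizes described for QLoRA (64 weights per block, 256 block scales per second-level block); these are assumptions about the quantizer's defaults, not values read from this notebook's config. Only the linear layers are quantized, so embeddings and norms add to the real footprint; `model.get_memory_footprint()` reports the actual number.

```python
def nf4_bits_per_param(block_size=64, double_quant=True, dq_block_size=256):
    """Approximate storage cost of one 4-bit quantized weight, in bits."""
    weight_bits = 4.0
    if double_quant:
        # One 8-bit scale per block, plus one fp32 constant shared by
        # dq_block_size block scales
        constant_bits = 8.0 / block_size + 32.0 / (block_size * dq_block_size)
    else:
        # One fp32 scale per block
        constant_bits = 32.0 / block_size
    return weight_bits + constant_bits


def quantized_weight_gib(num_quantized_params, **kwargs):
    """Estimated size in GiB of the quantized weights alone."""
    return num_quantized_params * nf4_bits_per_param(**kwargs) / 8 / 1024**3


print(nf4_bits_per_param(double_quant=False))
print(nf4_bits_per_param(double_quant=True))
```

Under these assumptions double quantization saves a little under 0.4 bits per weight, which matters more for large models than for a 1B model.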

## 5. Set Up LoRA - High Rank for Maximum Quality

In [None]:
# Prepare model for k-bit training
print("Preparing model for QLoRA training...")
model = prepare_model_for_kbit_training(
    model,
    use_gradient_checkpointing=config['training_args']['gradient_checkpointing']
)

# LoRA config with a high rank for best quality
lora_config = LoraConfig(
    r=config['qlora_config']['r'],
    lora_alpha=config['qlora_config']['lora_alpha'],
    lora_dropout=config['qlora_config']['lora_dropout'],
    bias=config['qlora_config']['bias'],
    task_type=config['qlora_config']['task_type'],
    target_modules=config['qlora_config']['target_modules'],
    modules_to_save=config['qlora_config'].get('modules_to_save', None),
)

# Apply LoRA adapters
model = get_peft_model(model, lora_config)

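# Calculate trainable parameters. bitsandbytes stores 4-bit weights packed two
# per element, so `all_params` undercounts the base model and the percentage
# below is an overestimate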
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
all_params = sum(p.numel() for p in model.parameters())
trainable_percent = 100 * trainable_params / all_params

print(f"\n✓ LoRA adapters applied!")
print(f"\nTrainable parameters breakdown:")
print(f"  Trainable params: {trainable_params:,}")
print(f"  All params: {all_params:,}")
print(f"  Trainable%: {trainable_percent:.4f}%")
print(f"\nWith LoRA rank {config['qlora_config']['r']}, you have HIGH capacity for learning!")
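
The trainable parameter count printed above follows directly from the rank: each adapted linear layer with `in_features` inputs and `out_features` outputs gains two matrices of shape `(r, in_features)` and `(out_features, r)`. The sketch below shows that arithmetic together with the default LoRA scaling factor `lora_alpha / r`. The layer shape in the example is hypothetical and is not Gemma's real dimension.

```python
def lora_param_count(layer_shapes, rank):
    """Trainable LoRA parameters for a list of (in_features, out_features) layers."""
    return sum(rank * (d_in + d_out) for d_in, d_out in layer_shapes)


def lora_scaling(lora_alpha, rank):
    """Factor applied to the low-rank update with the default (non-rsLoRA) scaling."""
    return lora_alpha / rank


# Hypothetical example: one 1024 x 1024 projection adapted at rank 64
print(lora_param_count([(1024, 1024)], rank=64))
print(lora_scaling(lora_alpha=128, rank=64))
```

Doubling the rank doubles the adapter size, while keeping `lora_alpha / r` constant keeps the magnitude of the update comparable between runs.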

## 6. Load the UNSIQ Dataset

In [None]:
# Load datasets
print("Loading UNSIQ datasets...\n")

train_dataset = load_dataset(
    'json',
    data_files=config['dataset_config']['train_file'],
    split='train'
)

eval_dataset = load_dataset(
    'json',
    data_files=config['dataset_config']['eval_file'],
    split='train'
)

print(f"✓ Datasets loaded:")
print(f"  Train: {len(train_dataset):,} samples")
print(f"  Eval: {len(eval_dataset):,} samples")
print(f"  Total: {len(train_dataset) + len(eval_dataset):,} samples")

# Show sample
print("\nSample data structure:")
print(json.dumps(train_dataset[0], indent=2, ensure_ascii=False))
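
The formatting step that follows expects every record to carry a `messages` list. A small validator like the sketch below can catch malformed records before training starts. The exact schema is an assumption: it presumes each message is an object with `role` and `content`, that roles are `system`, `user`, or `assistant`, and that each conversation ends with the assistant answer.

```python
VALID_ROLES = {"system", "user", "assistant"}  # assumed role names


def validate_example(example):
    """Return a list of problems found in one record; an empty list means it looks usable."""
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        if msg.get("role") not in VALID_ROLES:
            problems.append(f"message {i} has unexpected role {msg.get('role')!r}")
        content = msg.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i} has empty content")

    last = messages[-1]
    if isinstance(last, dict) and last.get("role") != "assistant":
        problems.append("last message should be the assistant answer")
    return problems


good = {"messages": [{"role": "user", "content": "Hi"},
                     {"role": "assistant", "content": "Hello"}]}
bad = {"messages": [{"role": "user", "content": ""}]}
print(validate_example(good))
print(validate_example(bad))
```

Running it over the dataset, for example `[validate_example(row) for row in train_dataset]`, gives a quick overview of which rows need attention.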

## 7. Format the Dataset with the Chat Template

In [None]:
def format_chat_template(example):
    """
    Format messages using the Gemma 3 chat template
    """
    messages = example['messages']
    
    # Apply tokenizer's chat template
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=False
    )
    
    return {'text': text}

print("Formatting datasets with Gemma 3 chat template...")

# Apply formatting
train_dataset = train_dataset.map(
    format_chat_template,
    desc="Formatting train dataset"
)

eval_dataset = eval_dataset.map(
    format_chat_template,
    desc="Formatting eval dataset"
)

print("\n✓ Datasets formatted!")
print("\nFormatted example (first 500 chars):")
print("-" * 80)
print(train_dataset[0]['text'][:500])
print("...")
print("-" * 80)
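
For readers who want to know roughly what the formatted text looks like, the sketch below approximates the Gemma turn format in plain Python: turns are wrapped in `<start_of_turn>` and `<end_of_turn>`, the assistant role is written as `model`, and a system message is folded into the first user turn. This is an illustration written from memory of the format, not the tokenizer's actual template; the output of `tokenizer.apply_chat_template` is authoritative.

```python
def render_gemma_style(messages, add_generation_prompt=False):
    """Approximate rendering of a conversation in Gemma's turn format."""
    turns = list(messages)
    system_prefix = ""
    if turns and turns[0]["role"] == "system":
        # The system text is prepended to the first user turn
        system_prefix = turns[0]["content"] + "\n\n"
        turns = turns[1:]

    parts = ["<bos>"]
    for i, msg in enumerate(turns):
        role = "model" if msg["role"] == "assistant" else msg["role"]
        content = msg["content"].strip()
        if i == 0:
            content = system_prefix + content
        parts.append(f"<start_of_turn>{role}\n{content}<end_of_turn>\n")

    if add_generation_prompt:
        # Leaves the model turn open so generation continues from here
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


print(render_gemma_style(
    [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello"}]
))
```

Training text uses `add_generation_prompt=False` because the assistant answer is already present; inference uses `True` so the model writes the answer itself.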

## 8. Training Configuration - Optimized for Quality

In [None]:
training_args_config = config['training_args']

# Create output directory
os.makedirs(training_args_config['output_dir'], exist_ok=True)

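# Training arguments. SFTConfig extends TrainingArguments with the dataset
# options that recent TRL releases read from the config object.
training_args = SFTConfig(
    # Output
    output_dir=training_args_config['output_dir'],
    overwrite_output_dir=training_args_config['overwrite_output_dir'],

    # SFT dataset options (`max_length` was `max_seq_length` in older TRL releases)
    dataset_text_field='text',
    max_length=config['dataset_config']['max_length'],
    packing=config['dataset_config'].get('packing', False),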
    
    # Training
    num_train_epochs=training_args_config['num_train_epochs'],
    per_device_train_batch_size=training_args_config['per_device_train_batch_size'],
    per_device_eval_batch_size=training_args_config['per_device_eval_batch_size'],
    gradient_accumulation_steps=training_args_config['gradient_accumulation_steps'],
    gradient_checkpointing=training_args_config['gradient_checkpointing'],
    gradient_checkpointing_kwargs=training_args_config.get('gradient_checkpointing_kwargs', {}),
    
    # Optimization
    optim=training_args_config['optim'],
    learning_rate=training_args_config['learning_rate'],
    weight_decay=training_args_config['weight_decay'],
    max_grad_norm=training_args_config['max_grad_norm'],
    
    # Scheduler
    lr_scheduler_type=training_args_config['lr_scheduler_type'],
    warmup_ratio=training_args_config['warmup_ratio'],
    warmup_steps=training_args_config.get('warmup_steps', 0),
    
    # Evaluation
    eval_strategy=training_args_config['eval_strategy'],
    eval_steps=training_args_config['eval_steps'],
    
    # Checkpointing - SAVE ALL
    save_strategy=training_args_config['save_strategy'],
    save_steps=training_args_config['save_steps'],
    save_total_limit=training_args_config['save_total_limit'],  # None = unlimited
    load_best_model_at_end=training_args_config['load_best_model_at_end'],
    metric_for_best_model=training_args_config['metric_for_best_model'],
    greater_is_better=training_args_config.get('greater_is_better', False),
    
    # Logging
    logging_strategy=training_args_config['logging_strategy'],
    logging_steps=training_args_config['logging_steps'],
    logging_first_step=training_args_config.get('logging_first_step', True),
    report_to=training_args_config['report_to'],
    
    # Mixed Precision
    bf16=training_args_config['bf16'],
    bf16_full_eval=training_args_config['bf16_full_eval'],
    
    # Data loading
    dataloader_num_workers=training_args_config['dataloader_num_workers'],
    dataloader_pin_memory=training_args_config.get('dataloader_pin_memory', True),
    group_by_length=training_args_config['group_by_length'],
    
    # DDP
    ddp_find_unused_parameters=training_args_config['ddp_find_unused_parameters'],
    
    # Reproducibility
    seed=training_args_config.get('seed', 42),
    data_seed=training_args_config.get('data_seed', 42),
)

# Calculate training stats
steps_per_epoch = len(train_dataset) // (
    training_args.per_device_train_batch_size * 
    training_args.gradient_accumulation_steps
)
total_steps = steps_per_epoch * training_args.num_train_epochs
num_checkpoints = total_steps // training_args.save_steps

print("\n" + "="*80)
print("TRAINING PLAN")
print("="*80)
print(f"Steps per epoch: {steps_per_epoch}")
print(f"Total training steps: {total_steps}")
print(f"Warmup steps: {int(total_steps * training_args.warmup_ratio)}")
print(f"Eval every: {training_args.eval_steps} steps")
print(f"Save checkpoint every: {training_args.save_steps} steps")
print(f"Expected checkpoints: ~{num_checkpoints}")
print(f"Save limit: {'ALL CHECKPOINTS' if training_args.save_total_limit is None else training_args.save_total_limit}")
print("="*80)
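
The plan printed above can be reproduced without a dataset or a trainer. The sketch below is an estimate under stated assumptions: it rounds up at both the batching and the accumulation stage, whereas the cell above rounds down, and the exact rounding inside `Trainer` depends on the transformers version. The warmup count uses the ceiling of `total_steps * warmup_ratio`. The sample count and warmup ratio in the example are hypothetical.

```python
import math


def training_plan(num_samples, per_device_batch, grad_accum, epochs,
                  warmup_ratio, save_steps, num_devices=1):
    """Estimate optimizer steps, warmup steps, and checkpoint count."""
    batches_per_epoch = math.ceil(num_samples / (per_device_batch * num_devices))
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    total_steps = steps_per_epoch * epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": math.ceil(total_steps * warmup_ratio),
        "checkpoints": total_steps // save_steps,
    }


# Hypothetical dataset of 1,000 samples with an effective batch size of 16
plan = training_plan(num_samples=1000, per_device_batch=4, grad_accum=4,
                     epochs=3, warmup_ratio=0.03, save_steps=50)
print(plan)
```

Since every checkpoint is kept, the checkpoint count multiplied by the adapter size gives a quick disk budget before training starts.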

## 9. Initialize SFTTrainer

In [None]:
print("Initializing SFTTrainer...\n")

# Recent TRL releases accept the tokenizer as `processing_class` and read the
# dataset options (text field, max length, packing) from `args`; when `args`
# does not set them, TRL's defaults apply ('text', 1024 tokens, no packing)
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    processing_class=tokenizer,
)

print("✓ Trainer initialized successfully!")
print("\nReady to start training...")

## 10. Start Training - Monitoring Real-time

In [None]:
print("\n" + "="*80)
print(f"STARTING TRAINING - {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("="*80)
print("\nTraining Gemma 3 1B on UNSIQ dataset...")
print("Monitor progress in TensorBoard: tensorboard --logdir " + training_args.output_dir)
print("\n" + "="*80 + "\n")

# Start training
train_result = trainer.train()

print("\n" + "="*80)
print(f"TRAINING COMPLETED - {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("="*80)
print(f"\nFinal Training Loss: {train_result.training_loss:.4f}")
print(f"Training Runtime: {train_result.metrics['train_runtime']:.2f} seconds")
print(f"Samples per second: {train_result.metrics['train_samples_per_second']:.2f}")
print(f"Steps per second: {train_result.metrics['train_steps_per_second']:.2f}")
print("="*80)

## 11. Save Final Model & Metrics

In [None]:
# Save final adapter
final_output_dir = f"{training_args.output_dir}/final_adapter"
print(f"Saving final adapter to: {final_output_dir}")

trainer.model.save_pretrained(final_output_dir)
tokenizer.save_pretrained(final_output_dir)

# Save training config
config_save_path = f"{training_args.output_dir}/training_config.json"
with open(config_save_path, 'w', encoding='utf-8') as f:
    json.dump(config, f, indent=2, ensure_ascii=False)

# Save training metrics
metrics_file = f"{training_args.output_dir}/training_metrics.json"
with open(metrics_file, 'w', encoding='utf-8') as f:
    json.dump(train_result.metrics, f, indent=2)

print(f"\n✓ Final adapter saved: {final_output_dir}")
print(f"✓ Training config saved: {config_save_path}")
print(f"✓ Training metrics saved: {metrics_file}")

## 12. Final Evaluation

In [None]:
print("\nRunning final evaluation...\n")

eval_results = trainer.evaluate()

print("="*80)
print("FINAL EVALUATION RESULTS")
print("="*80)
for key, value in eval_results.items():
    print(f"{key}: {value}")
print("="*80)

# Save evaluation results
eval_file = f"{training_args.output_dir}/final_eval_results.json"
with open(eval_file, 'w', encoding='utf-8') as f:
    json.dump(eval_results, f, indent=2)

print(f"\n✓ Evaluation results saved: {eval_file}")
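
The `eval_loss` reported above is a mean per-token cross-entropy in nats, so it can be turned into perplexity, which is often easier to compare across runs. A minimal sketch, using a hypothetical results dictionary in place of the real `eval_results`:

```python
import math


def perplexity(eval_loss):
    """Perplexity corresponding to a mean per-token cross-entropy loss (natural log)."""
    return math.exp(eval_loss)


# Hypothetical value; in the notebook this would be eval_results["eval_loss"]
example_results = {"eval_loss": 1.0}
print(f"Perplexity: {perplexity(example_results['eval_loss']):.3f}")
```

A perplexity of 1.0 would mean the model predicts every evaluation token with certainty, so values very close to 1 on a small dataset can hint at memorization.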

## 13. Test Inference - Quality Check

In [None]:
def generate_response(question, max_new_tokens=512):
    """
    Generate response using fine-tuned model
    """
    messages = [
        {
            "role": "system",
            "content": "Anda adalah asisten informasi UNSIQ (Universitas Sains Al-Qur'an) yang membantu menjawab pertanyaan tentang biaya kuliah, program studi, dan informasi akademik."
        },
        {
            "role": "user",
            "content": question
        }
    ]
    
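    # Format with the chat template
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    
    # Tokenize. The template already emits <bos>, so skip the tokenizer's own
    # special tokens to avoid a duplicated <bos> at the start of the prompt
    inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)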
    
    # Generate
    gen_config = config.get('generation_config', {})
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=gen_config.get('max_new_tokens', max_new_tokens),
            temperature=gen_config.get('temperature', 0.7),
            top_p=gen_config.get('top_p', 0.9),
            top_k=gen_config.get('top_k', 50),
            repetition_penalty=gen_config.get('repetition_penalty', 1.1),
            do_sample=gen_config.get('do_sample', True),
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )
    
    # Decode only the generated tokens
    generated_ids = outputs[0][inputs['input_ids'].shape[1]:]
    response = tokenizer.decode(generated_ids, skip_special_tokens=True)
    
    return response.strip()

# Test with a range of questions
test_questions = [
    "Berapa total biaya kuliah S1 Akuntansi di UNSIQ untuk 8 semester?",
    "Apa itu KIP Kuliah dan bagaimana cara mendaftarnya?",
    "Mengapa biaya semester 1 lebih mahal dari semester lainnya?",
    "Program studi apa saja yang tersedia di UNSIQ?",
    "Bagaimana sistem pembayaran kuliah di UNSIQ?"
]

print("\n" + "="*80)
print("TESTING FINE-TUNED MODEL - QUALITY CHECK")
print("="*80)

for i, question in enumerate(test_questions, 1):
    print(f"\n{'='*80}")
    print(f"TEST {i}/{len(test_questions)}")
    print(f"{'='*80}")
    print(f"\n❓ QUESTION: {question}")
    print(f"\n🤖 ANSWER:")
    
    response = generate_response(question)
    print(response)
    print(f"\n{'-'*80}")
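
The `top_k` and `top_p` settings used in `generate_response` restrict which tokens may be sampled at each step. The sketch below illustrates the idea on a toy distribution: keep the `top_k` most likely tokens, then keep the smallest prefix whose renormalized cumulative probability reaches `top_p`. It is a simplified illustration, not the implementation inside `model.generate`, and it leaves out temperature and repetition penalty.

```python
def filter_top_k_top_p(probs, top_k=50, top_p=0.9):
    """Return the renormalized distribution left after top-k then top-p filtering.

    probs: dict mapping token -> probability.
    """
    # Keep the top_k most likely tokens
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    mass = sum(p for _, p in ranked)

    # Keep the smallest prefix whose cumulative (renormalized) mass reaches top_p
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p / mass
        if cumulative >= top_p:
            break

    kept_mass = sum(p for _, p in kept)
    return {token: p / kept_mass for token, p in kept}


toy = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
print(filter_top_k_top_p(toy, top_k=50, top_p=0.7))
```

Lower `top_p` or `top_k` values make answers more deterministic, which tends to suit factual question answering such as fee and program information.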

## 14. List All Saved Checkpoints

In [None]:
import os

output_dir = training_args.output_dir
checkpoints = sorted(
    [d for d in os.listdir(output_dir) if d.startswith('checkpoint-')],
    key=lambda x: int(x.split('-')[-1])
)

print("\n" + "="*80)
print(f"ALL SAVED CHECKPOINTS ({len(checkpoints)} total)")
print("="*80)

total_size = 0
for i, cp in enumerate(checkpoints, 1):
    cp_path = os.path.join(output_dir, cp)
    
    # Calculate size
    size = 0
    for root, dirs, files in os.walk(cp_path):
        size += sum(os.path.getsize(os.path.join(root, f)) for f in files)
    
    total_size += size
    step_num = int(cp.split('-')[-1])
    
    print(f"{i:3d}. {cp:20s} | Step {step_num:5d} | Size: {size / 1024**2:7.2f} MB")

print("="*80)
print(f"Total checkpoint storage: {total_size / 1024**3:.2f} GB")
print("="*80)

print("\n💡 TIP: You can load a specific checkpoint with:")
print(f"   from peft import AutoPeftModelForCausalLM")
print(f"   model = AutoPeftModelForCausalLM.from_pretrained('{output_dir}/checkpoint-XXX')")
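
With every checkpoint kept, a natural follow-up is finding the one with the lowest evaluation loss. The trainer records its logs in `trainer.state.log_history`, and each checkpoint directory also stores them in `trainer_state.json` under `log_history`. The sketch below scans such a list; the entries shown are hypothetical.

```python
def best_eval_step(log_history):
    """Return (step, eval_loss) of the lowest eval_loss entry, or None if there is none."""
    evals = [(entry["eval_loss"], entry["step"])
             for entry in log_history if "eval_loss" in entry]
    if not evals:
        return None
    loss, step = min(evals)
    return step, loss


# Hypothetical log entries mixing training and evaluation records
example_history = [
    {"step": 50, "loss": 1.20},
    {"step": 50, "eval_loss": 1.10},
    {"step": 100, "eval_loss": 0.95},
    {"step": 150, "eval_loss": 1.01},
]
print(best_eval_step(example_history))
```

In the notebook, `best_eval_step(trainer.state.log_history)` would give the step to look for among the `checkpoint-*` directories, provided `eval_steps` and `save_steps` line up.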

## 15. TensorBoard Visualization

In [None]:
# Load tensorboard extension
%load_ext tensorboard

print("Opening TensorBoard...\n")
print("You can view:")
print("  • Training loss curve")
print("  • Evaluation loss curve")
print("  • Learning rate schedule")
print("  • Gradient norms")
print("\nMonitor all metrics to ensure quality training!\n")

%tensorboard --logdir {training_args.output_dir}

## 16. Compare Checkpoint Performance (Optional)

In [None]:
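# Evaluate a specific checkpoint
def evaluate_checkpoint(checkpoint_path, test_questions):
    """
    Load a specific checkpoint and generate answers with it
    """
    from peft import AutoPeftModelForCausalLM
    global model
    
    print(f"\nLoading checkpoint: {checkpoint_path}")
    
    # Load the base model plus the adapter stored in the checkpoint
    temp_model = AutoPeftModelForCausalLM.from_pretrained(
        checkpoint_path,
        device_map="auto",
        torch_dtype=torch.bfloat16,
    )
    temp_model.eval()
    
    # generate_response() reads the global `model`, so point it at the
    # checkpoint while generating and restore the trained model afterwards
    original_model, model = model, temp_model
    results = []
    try:
        for q in test_questions:
            response = generate_response(q)
            results.append({
                'question': q,
                'answer': response
            })
    finally:
        model = original_model
        del temp_model
        torch.cuda.empty_cache()
    
    return results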

# Example: Compare first, middle, and last checkpoint
print("\nTo compare checkpoints, uncomment and run:")
print("""# checkpoints_to_compare = [
#     f"{output_dir}/checkpoint-50",  # Early
#     f"{output_dir}/checkpoint-{len(checkpoints)//2 * 50}",  # Middle
#     f"{output_dir}/final_adapter",  # Final
# ]
# 
# for cp in checkpoints_to_compare:
#     if os.path.exists(cp):
#         results = evaluate_checkpoint(cp, test_questions[:2])
#         print(f"\\n{cp}:")
#         for r in results:
#             print(f"Q: {r['question']}")
#             print(f"A: {r['answer'][:200]}...\\n")
""")

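The commented example above hard-codes `checkpoint-50` and multiples of 50, which only matches when `save_steps` is 50. A sketch of picking the early, middle, and last checkpoints from whatever directory names actually exist, assuming the `checkpoint-<step>` naming used by the trainer:

```python
def pick_checkpoints(names):
    """Return the earliest, middle, and latest checkpoint names, without duplicates."""
    ordered = sorted(names, key=lambda name: int(name.split("-")[-1]))
    if not ordered:
        return []
    picks = [ordered[0], ordered[len(ordered) // 2], ordered[-1]]
    # dict.fromkeys keeps first occurrences in order, dropping repeats
    return list(dict.fromkeys(picks))


print(pick_checkpoints(["checkpoint-150", "checkpoint-50", "checkpoint-100"]))
```

In the notebook the input would be the `checkpoints` list built in section 14, with each result joined onto `output_dir` before being passed to `evaluate_checkpoint`.
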
## 17. Merge LoRA Adapters for Deployment (Optional)

In [None]:
# WARNING: Merging requires more RAM/VRAM!
# The result is a complete model that no longer needs a separately loaded adapter

print("\nMerging LoRA adapters into the base model...")
print("⚠️  WARNING: This requires significant memory!\n")

from peft import AutoPeftModelForCausalLM

# Load the model with its adapter
print("Loading model with adapters...")
merged_model = AutoPeftModelForCausalLM.from_pretrained(
    final_output_dir,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

print("Merging adapters into base model...")
# Merge adapters
merged_model = merged_model.merge_and_unload()

# Save merged model
merged_output_dir = f"{training_args.output_dir}/merged_model_full"
print(f"Saving merged model to: {merged_output_dir}")

merged_model.save_pretrained(
    merged_output_dir,
    safe_serialization=True,
    max_shard_size="2GB"
)
tokenizer.save_pretrained(merged_output_dir)

print(f"\n✓ Merged model saved to: {merged_output_dir}")
print("\nYou can now use this model directly without loading adapters!")
print(f"Model size: ~{sum(os.path.getsize(os.path.join(merged_output_dir, f)) for f in os.listdir(merged_output_dir)) / 1024**3:.2f} GB")
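
The size printed above sums `os.path.getsize` over the top-level entries only, so files inside subdirectories are not counted. A recursive helper is sketched below, along with a rough expectation for a bf16 model at two bytes per parameter. The temporary directory stands in for `merged_output_dir` so the snippet runs on its own.

```python
import tempfile
from pathlib import Path


def dir_size_bytes(path):
    """Total size of all regular files under path, including subdirectories."""
    return sum(p.stat().st_size for p in Path(path).rglob("*") if p.is_file())


def expected_bf16_gib(num_params):
    """Rough size of a model stored in bf16: two bytes per parameter."""
    return num_params * 2 / 1024**3


# Demo on a throwaway directory standing in for merged_output_dir
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model.safetensors").write_bytes(b"x" * 10)
    (Path(tmp) / "nested").mkdir()
    (Path(tmp) / "nested" / "extra.bin").write_bytes(b"y" * 5)
    demo_size = dir_size_bytes(tmp)

print(demo_size)
```

Comparing `dir_size_bytes(merged_output_dir)` against `expected_bf16_gib(merged_model.num_parameters())` is a quick sanity check that the merged weights were saved in the intended precision.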

## 18. Training Summary & Best Practices

In [None]:
print("\n" + "="*80)
print("TRAINING SUMMARY - GEMMA 3 1B OPTIMAL")
print("="*80)

summary = f"""
Model: {config['model_config']['model_name']}
Dataset: UNSIQ ({len(train_dataset)} train, {len(eval_dataset)} eval)

LoRA Configuration:
  • Rank: {config['qlora_config']['r']} (HIGH capacity)
  • Alpha: {config['qlora_config']['lora_alpha']}
  • Target modules: {len(config['qlora_config']['target_modules'])} module types
  • Trainable params: {trainable_params:,} ({trainable_percent:.4f}%)

Training Configuration:
  • Learning rate: {config['training_args']['learning_rate']}
  • Effective batch size: {config['training_args']['per_device_train_batch_size'] * config['training_args']['gradient_accumulation_steps']}
  • Epochs: {config['training_args']['num_train_epochs']}
  • Max sequence length: {config['dataset_config']['max_length']} tokens
  • Total steps: {total_steps}

Results:
  • Final training loss: {train_result.training_loss:.4f}
  • Final eval loss: {eval_results.get('eval_loss', 'N/A')}
  • Training time: {train_result.metrics['train_runtime']/60:.2f} minutes
  • Checkpoints saved: {len(checkpoints)}

Output Files:
  📁 {training_args.output_dir}/
     ├── final_adapter/ (LoRA weights)
     ├── checkpoint-*/ (all training checkpoints)
     ├── merged_model_full/ (merged model - if created)
     ├── training_config.json
     ├── training_metrics.json
     └── final_eval_results.json

Next Steps:
  1. Review the TensorBoard metrics to analyze the training run
  2. Test the model with more questions
  3. If needed, load a specific checkpoint for comparison
  4. Deploy the model (use final_adapter or merged_model_full)

🎉 TRAINING COMPLETE - MODEL READY FOR DEPLOYMENT!
"""

print(summary)
print("="*80)

# Save summary
summary_file = f"{training_args.output_dir}/TRAINING_SUMMARY.txt"
with open(summary_file, 'w', encoding='utf-8') as f:
    f.write(summary)

print(f"\n✓ Summary saved to: {summary_file}")