# 🚀 IndoBERT 3-Class Sentiment Analysis - Google Colab Version

**Dataset**: gojek_reviews_3class_clean.csv  
**Model**: IndoBERT (indobenchmark/indobert-base-p1)  
**Target**: High accuracy with good generalization (no overfitting)

## 📋 Before Running:

1. **Upload the data file to the `skripsi` folder in Google Drive:**
```
MyDrive/
└── skripsi/
    ├── gojek_reviews_3class_clean.csv   ← Upload this file
    ├── models/                           ← Created automatically
    └── (this notebook, optionally)
```

2. **Make sure the GPU runtime is enabled**: Runtime → Change runtime type → GPU

---

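### Anti-Overfitting Techniques Used:
1. **Data Balancing** - Undersample every class to the size of the minority class
2. **Dropout** - 0.5 on the pooled output, 0.3 inside the classifier head
3. **Label Smoothing** - 0.15 for soft labels
4. **Early Stopping** - Stop when validation F1 stops improving
5. **Weight Decay** - 0.02, applied through AdamW (biases and LayerNorm weights excluded)
6. **Learning Rate Warmup** - Linear increase over the first 10% of training steps
7. **Gradient Clipping** - Prevents exploding gradients (max norm 1.0)
8. **Data Augmentation** - Random word dropout, adjacent-word swap, and word duplication
9. **Layer Freezing** - Embeddings and the first 6 encoder layers stay frozen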

### Sentiment Classes:
- **0 = Negative** (Score 1-2)
- **1 = Neutral** (Score 3)
- **2 = Positive** (Score 4-5)
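
The class mapping above can be sketched as a small function. This is only an illustration: `score_to_label` is a hypothetical helper, and it assumes the raw reviews carry an integer 1-5 star score, which the cleaned CSV used later may or may not still contain.

```python
# Hypothetical helper: maps a 1-5 star score to the notebook's class ids.
LABEL_NAMES = ['negative', 'neutral', 'positive']

def score_to_label(score: int) -> int:
    """Return 0 (negative), 1 (neutral) or 2 (positive) for a 1-5 score."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError(f'score must be 1-5, got {score!r}')
    if score <= 2:
        return 0
    if score == 3:
        return 1
    return 2

for s in range(1, 6):
    print(s, '->', LABEL_NAMES[score_to_label(s)])
```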

In [None]:
# ============================================
# SETUP GOOGLE COLAB
# ============================================

# Install dependencies
!pip install transformers torch pandas numpy scikit-learn matplotlib seaborn tqdm -q

# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Path to the skripsi folder in Google Drive
DRIVE_PATH = '/content/drive/MyDrive/skripsi'

import os

# Check whether the folder exists
if os.path.exists(DRIVE_PATH):
    os.chdir(DRIVE_PATH)
    print(f'✓ Working directory: {os.getcwd()}')
    print(f'✓ Files in the skripsi folder:')
    for f in os.listdir('.'):
        print(f'   - {f}')
else:
    print(f'❌ Folder not found: {DRIVE_PATH}')
    print('Make sure the "skripsi" folder exists in your Google Drive')

# Check GPU
import torch
if torch.cuda.is_available():
    print(f'\n✓ GPU Available: {torch.cuda.get_device_name(0)}')
    print(f'✓ GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB')
else:
    print('\n⚠️ GPU not available, using CPU (this will be slower)')

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from tqdm.auto import tqdm
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from torch.optim import AdamW
from transformers import BertTokenizer, BertModel, get_linear_schedule_with_warmup
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    accuracy_score, precision_recall_fscore_support, 
    classification_report, confusion_matrix, f1_score
)
from sklearn.utils import resample
import random
import os
import copy
import json
from datetime import datetime

warnings.filterwarnings('ignore')

# Reproducibility
def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)

# Device setup (already checked in the previous cell)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'🖥️  Device: {device}')
if torch.cuda.is_available():
    print(f'🎮 GPU: {torch.cuda.get_device_name(0)}')

## 📊 1. Load & Explore Data

In [None]:
# Load the cleaned data from the skripsi folder in Google Drive
# Prefer the augmented file (15,000 samples) if it exists
DATA_FILES = [
    'gojek_reviews_final_augmented.csv',  # 15,000 samples - RECOMMENDED
    'gojek_reviews_3class_balanced.csv',
    'gojek_reviews_3class_clean.csv',
]

DATA_PATH = None
for f in DATA_FILES:
    if os.path.exists(f):
        DATA_PATH = f
        break
    # Check in data folder too
    if os.path.exists(f'data/{f}'):
        DATA_PATH = f'data/{f}'
        break

if DATA_PATH is None:
    print('❌ Data file not found!')
    print(f'\n📁 Files in the skripsi folder:')
    for f in os.listdir('.'):
        print(f'   - {f}')
    if os.path.exists('data'):
        print(f'\n📁 Files in the data folder:')
        for f in os.listdir('data'):
            print(f'   - data/{f}')
    print(f'\n💡 Upload "gojek_reviews_final_augmented.csv" to the skripsi folder')
    # Stop here: later cells need `df`, which is only defined when a file is found
    raise FileNotFoundError('No dataset CSV found in the skripsi folder')
else:
    print(f'✓ Using data file: {DATA_PATH}')
    df = pd.read_csv(DATA_PATH)
    
    print('=' * 60)
    print('📊 DATA OVERVIEW')
    print('=' * 60)
    print(f'Total samples: {len(df):,}')
    print(f'\nColumns: {df.columns.tolist()}')
    print(f'\n📈 Sentiment Distribution:')
    print(df['sentiment'].value_counts())
    
    # Visualize
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    
    # Bar plot
    colors = {'negative': '#e74c3c', 'neutral': '#95a5a6', 'positive': '#2ecc71'}
    sentiment_counts = df['sentiment'].value_counts()
    axes[0].bar(sentiment_counts.index, sentiment_counts.values, 
                color=[colors[s] for s in sentiment_counts.index])
    axes[0].set_title('Sentiment Distribution')
    axes[0].set_ylabel('Count')
    
    # Pie chart
    axes[1].pie(sentiment_counts.values, labels=sentiment_counts.index, 
                autopct='%1.1f%%', colors=[colors[s] for s in sentiment_counts.index])
    axes[1].set_title('Sentiment Percentage')
    
    plt.tight_layout()
    plt.show()

## ⚖️ 2. Balance Data (Undersampling)

In [None]:
# Check if data is already balanced
counts = df['sentiment'].value_counts()
min_count = counts.min()
max_count = counts.max()

# If already balanced (relative difference < 10%), skip undersampling
if (max_count - min_count) / max_count < 0.1:
    print('✓ Data is already balanced! Skipping undersampling.')
    df_balanced = df.copy()
else:
    # Balance the data by undersampling every class to the minority-class size
    print(f'⚠️ Data is imbalanced. Undersampling...')
    print(f'Minority class: {min_count} samples')
    
    # Sample min_count rows per class without replacement, then combine
    df_balanced = pd.concat([
        resample(df[df['sentiment'] == sentiment], replace=False,
                 n_samples=min_count, random_state=42)
        for sentiment in ['negative', 'neutral', 'positive']
    ])
    
    # Shuffle
    df_balanced = df_balanced.sample(frac=1, random_state=42).reset_index(drop=True)

print('\n' + '=' * 60)
print('⚖️  DATA FOR TRAINING')
print('=' * 60)
print(f'Total: {len(df_balanced):,}')
print(df_balanced['sentiment'].value_counts())

# Visualize balanced
plt.figure(figsize=(8, 4))
balanced_counts = df_balanced['sentiment'].value_counts()
colors = {'negative': '#e74c3c', 'neutral': '#95a5a6', 'positive': '#2ecc71'}
plt.bar(balanced_counts.index, balanced_counts.values, 
        color=[colors[s] for s in balanced_counts.index])
plt.title('Sentiment Distribution for Training')
plt.ylabel('Count')
for i, (label, count) in enumerate(balanced_counts.items()):
    plt.text(i, count + 50, str(count), ha='center', fontweight='bold')
plt.show()
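
To make the undersampling step concrete, here is a dependency-free sketch of the same idea on toy data. The `undersample` function and the toy rows are illustrative assumptions, not part of the notebook, which relies on `sklearn.utils.resample` instead.

```python
import random
from collections import Counter, defaultdict

def undersample(rows, label_key='sentiment', seed=42):
    """Downsample every class to the size of the smallest class."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    n_min = min(len(group) for group in by_class.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_class):
        # Sampling without replacement, like resample(..., replace=False)
        balanced.extend(rng.sample(by_class[label], n_min))
    rng.shuffle(balanced)
    return balanced

# Toy data (made up): 6 positive, 3 negative, 2 neutral reviews.
toy = ([{'sentiment': 'positive'}] * 6 + [{'sentiment': 'negative'}] * 3
       + [{'sentiment': 'neutral'}] * 2)
balanced = undersample(toy)
print(Counter(r['sentiment'] for r in balanced))
```

Every class ends up with as many rows as the smallest one, so the larger classes lose data. That is the trade-off of undersampling compared with class weights or oversampling.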

## 🏷️ 3. Prepare Labels & Split Data

In [None]:
# Label mapping
LABEL_MAP = {'negative': 0, 'neutral': 1, 'positive': 2}
LABEL_NAMES = ['negative', 'neutral', 'positive']
NUM_CLASSES = 3

df_balanced['label'] = df_balanced['sentiment'].map(LABEL_MAP)

# Split: 70% train, 15% validation, 15% test (stratified)
# A stratified split keeps the class distribution the same in every split
train_df, temp_df = train_test_split(
    df_balanced, test_size=0.3, random_state=42, 
    stratify=df_balanced['label']
)
val_df, test_df = train_test_split(
    temp_df, test_size=0.5, random_state=42, 
    stratify=temp_df['label']
)

print('=' * 60)
print('üìÇ DATA SPLITS')
print('=' * 60)
print(f'Train: {len(train_df):,} samples ({len(train_df)/len(df_balanced)*100:.1f}%)')
print(f'Val:   {len(val_df):,} samples ({len(val_df)/len(df_balanced)*100:.1f}%)')
print(f'Test:  {len(test_df):,} samples ({len(test_df)/len(df_balanced)*100:.1f}%)')

print(f'\nüìä Train label distribution:')
print(train_df['sentiment'].value_counts())
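
The 70/15/15 stratified split can be illustrated without scikit-learn. The sketch below is a simplified stand-in rather than `train_test_split`'s actual algorithm; `stratified_split` and the toy labels are assumptions made for the example.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists with per-class proportions kept."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * fractions[0])
        n_val = round(len(idxs) * fractions[1])
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]   # the remainder goes to the test split
    return train, val, test

labels = [0] * 100 + [1] * 100 + [2] * 100   # toy balanced labels
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
print(Counter(labels[i] for i in val))
```

Splitting each class separately is what keeps the three classes in the same ratio across train, validation, and test.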

## 🔧 4. Hyperparameters & Configuration

In [None]:
# === HYPERPARAMETERS ===
# TUNED to reduce overfitting and improve accuracy

CONFIG = {
    # Model
    'model_name': 'indobenchmark/indobert-base-p1',
    'max_length': 128,
    'num_classes': NUM_CLASSES,
    
    # Training - ADJUSTED
    'batch_size': 32,  # Larger batch for stability
    'epochs': 20,  # Upper bound; early stopping usually ends training sooner
    'learning_rate': 1e-5,  # Small learning rate for fine-tuning BERT
    
    # Anti-Overfitting - MORE AGGRESSIVE
    'dropout_rate': 0.5,  # Higher dropout
    'weight_decay': 0.02,  # Stronger weight decay
    'label_smoothing': 0.15,  # Stronger label smoothing
    'warmup_ratio': 0.1,
    'max_grad_norm': 1.0,
    'early_stopping_patience': 5,  # Wait longer for an improvement
    
    # Data Augmentation
    'word_dropout_prob': 0.15,  # Higher word dropout
    
    # Layer Freezing - number of BERT encoder layers to freeze (out of 12)
    'freeze_layers': 6,  # Freeze the first 6 of BERT's 12 encoder layers
}

print('=' * 60)
print('⚙️  OPTIMIZED CONFIGURATION')
print('=' * 60)
for key, value in CONFIG.items():
    print(f'{key}: {value}')
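
As a worked example of what `label_smoothing: 0.15` means for three classes, the sketch below builds the smoothed target by mixing the one-hot label with a uniform distribution. This follows my understanding of how PyTorch documents `CrossEntropyLoss(label_smoothing=...)`. The helper names and the predicted probabilities are made up for illustration.

```python
import math

def smoothed_targets(true_class, num_classes, eps):
    """Mix the one-hot target with a uniform distribution over all classes."""
    return [(1.0 - eps) * (1.0 if k == true_class else 0.0) + eps / num_classes
            for k in range(num_classes)]

def smoothed_cross_entropy(probs, true_class, eps):
    """Cross-entropy between the smoothed target and predicted probabilities."""
    targets = smoothed_targets(true_class, len(probs), eps)
    return -sum(t * math.log(p) for t, p in zip(targets, probs))

targets = smoothed_targets(true_class=2, num_classes=3, eps=0.15)
print([round(t, 2) for t in targets])   # true class 0.9, the other two 0.05 each
print(round(smoothed_cross_entropy([0.1, 0.2, 0.7], 2, 0.15), 4))
```

Because the target never reaches 1.0, the model is not rewarded for pushing a logit toward infinity. That is the regularizing effect this setting is after.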

## 📦 5. Dataset Class with Augmentation

In [None]:
# Load tokenizer
tokenizer = BertTokenizer.from_pretrained(CONFIG['model_name'])
print(f'✓ Tokenizer loaded: {CONFIG["model_name"]}')

class SentimentDataset(Dataset):
    """Dataset with ENHANCED augmentation"""
    
    def __init__(self, texts, labels, tokenizer, max_length=128, 
                 augment=False, word_dropout_prob=0.15):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length
        self.augment = augment
        self.word_dropout_prob = word_dropout_prob
    
    def __len__(self):
        return len(self.texts)
    
    def _augment_text(self, text):
        """Enhanced augmentation with multiple techniques"""
        if not self.augment:
            return text
            
        text = str(text)
        words = text.split()
        
        if len(words) <= 3:
            return text
        
        # Technique 1: Word dropout
        if random.random() < 0.5:
            words = [w for w in words if random.random() > self.word_dropout_prob]
        
        # Technique 2: Word swap (swap adjacent words)
        if random.random() < 0.3 and len(words) > 2:
            idx = random.randint(0, len(words) - 2)
            words[idx], words[idx + 1] = words[idx + 1], words[idx]
        
        # Technique 3: Random word duplication
        if random.random() < 0.2 and len(words) > 1:
            idx = random.randint(0, len(words) - 1)
            words.insert(idx, words[idx])
        
        return ' '.join(words) if words else text
    
    def __getitem__(self, idx):
        text = self._augment_text(self.texts[idx])
        
        # Call the tokenizer directly; encode_plus is deprecated in favor of __call__
        encoding = self.tokenizer(
            str(text),
            add_special_tokens=True,
            max_length=self.max_length,
            padding='max_length',
            truncation=True,
            return_attention_mask=True,
            return_tensors='pt'
        )
        
        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'label': torch.tensor(self.labels[idx], dtype=torch.long)
        }

# Create datasets
train_dataset = SentimentDataset(
    train_df['content_clean'].values,
    train_df['label'].values,
    tokenizer,
    max_length=CONFIG['max_length'],
    augment=True,  # Enable augmentation for training
    word_dropout_prob=CONFIG['word_dropout_prob']
)

val_dataset = SentimentDataset(
    val_df['content_clean'].values,
    val_df['label'].values,
    tokenizer,
    max_length=CONFIG['max_length'],
    augment=False
)

test_dataset = SentimentDataset(
    test_df['content_clean'].values,
    test_df['label'].values,
    tokenizer,
    max_length=CONFIG['max_length'],
    augment=False
)

# Create dataloaders
train_loader = DataLoader(train_dataset, batch_size=CONFIG['batch_size'], shuffle=True, drop_last=True)
val_loader = DataLoader(val_dataset, batch_size=CONFIG['batch_size'], shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=CONFIG['batch_size'], shuffle=False)

print(f'\n✓ Datasets created:')
print(f'  Train: {len(train_dataset)} samples, {len(train_loader)} batches')
print(f'  Val:   {len(val_dataset)} samples, {len(val_loader)} batches')
print(f'  Test:  {len(test_dataset)} samples, {len(test_loader)} batches')
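
The word-dropout part of the augmentation can be tried in isolation. The standalone `word_dropout` function below is a sketch that mirrors the drop step of `_augment_text`. It omits the swap and duplication steps and takes an injectable random generator so that runs are repeatable. The sample review is made up.

```python
import random

def word_dropout(text, drop_prob=0.15, rng=random):
    """Drop each word independently with probability drop_prob."""
    words = text.split()
    if len(words) <= 3:          # short reviews are left untouched
        return text
    kept = [w for w in words if rng.random() > drop_prob]
    # Fall back to the original text if every word was dropped
    return ' '.join(kept) if kept else text

rng = random.Random(0)
review = 'driver ramah dan cepat sampai tujuan dengan selamat'
for _ in range(3):
    print(word_dropout(review, 0.15, rng))
```

Since the augmentation runs inside `__getitem__`, the model sees a different variant of each review every epoch, while the validation and test datasets stay unmodified.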

## 🧠 6. Model Architecture

In [None]:
class IndoBERTSentimentClassifier(nn.Module):
    """
    IndoBERT with AGGRESSIVE regularization to prevent overfitting:
    - Freeze the embeddings and the first few BERT layers
    - High dropout (0.5)
    - Multi-layer dropout
    - Layer normalization
    """
    
    def __init__(self, model_name, num_classes, dropout_rate=0.5, freeze_layers=6):
        super(IndoBERTSentimentClassifier, self).__init__()
        
        # Load pretrained BERT
        self.bert = BertModel.from_pretrained(model_name)
        self.hidden_size = self.bert.config.hidden_size
        
        # === FREEZE BERT LAYERS ===
        # Freeze embeddings
        for param in self.bert.embeddings.parameters():
            param.requires_grad = False
        
        # Freeze first N encoder layers
        for i in range(freeze_layers):
            for param in self.bert.encoder.layer[i].parameters():
                param.requires_grad = False
        
        print(f'✓ Froze embeddings and first {freeze_layers} encoder layers')
        
        # Regularization layers - MORE AGGRESSIVE
        self.dropout1 = nn.Dropout(dropout_rate)
        self.dropout2 = nn.Dropout(dropout_rate * 0.6)  # Second dropout
        self.layer_norm = nn.LayerNorm(self.hidden_size)
        
        # Classifier with a hidden layer for more capacity
        self.fc1 = nn.Linear(self.hidden_size, self.hidden_size // 2)
        self.fc2 = nn.Linear(self.hidden_size // 2, num_classes)
        
        # Activation
        self.relu = nn.ReLU()
        
        # Initialize weights
        nn.init.xavier_uniform_(self.fc1.weight)
        nn.init.zeros_(self.fc1.bias)
        nn.init.xavier_uniform_(self.fc2.weight)
        nn.init.zeros_(self.fc2.bias)
    
    def forward(self, input_ids, attention_mask):
        # Get BERT output
        outputs = self.bert(
            input_ids=input_ids,
            attention_mask=attention_mask
        )
        
        # Use the pooled [CLS] representation (the [CLS] hidden state passed
        # through BERT's dense + tanh pooler)
        pooled_output = outputs.pooler_output
        
        # Apply regularization pipeline
        x = self.layer_norm(pooled_output)
        x = self.dropout1(x)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.dropout2(x)
        logits = self.fc2(x)
        
        return logits

# Initialize model
model = IndoBERTSentimentClassifier(
    model_name=CONFIG['model_name'],
    num_classes=CONFIG['num_classes'],
    dropout_rate=CONFIG['dropout_rate'],
    freeze_layers=CONFIG['freeze_layers']
).to(device)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
frozen_params = total_params - trainable_params

print(f'\n✓ Model initialized')
print(f'  Total parameters: {total_params:,}')
print(f'  Trainable parameters: {trainable_params:,} ({trainable_params/total_params*100:.1f}%)')
print(f'  Frozen parameters: {frozen_params:,} ({frozen_params/total_params*100:.1f}%)')
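
For a sense of scale, the size of the classification head can be worked out by hand. The arithmetic below assumes a hidden size of 768, which is typical for BERT-base encoders. The notebook reads the real value from `self.bert.config.hidden_size`, so treat these numbers as an estimate.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

hidden = 768          # assumed hidden size of a BERT-base encoder
num_classes = 3

layer_norm = 2 * hidden                        # scale and shift vectors
fc1 = linear_params(hidden, hidden // 2)       # 768 -> 384
fc2 = linear_params(hidden // 2, num_classes)  # 384 -> 3
head_total = layer_norm + fc1 + fc2

print(f'LayerNorm:  {layer_norm:,}')
print(f'fc1:        {fc1:,}')
print(f'fc2:        {fc2:,}')
print(f'Head total: {head_total:,}')
```

Under that assumption the head has roughly 0.3 million parameters, so nearly all trainable weights sit in the unfrozen encoder layers and the pooler.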

## 📉 7. Loss Function, Optimizer & Scheduler

In [None]:
# Loss function with label smoothing
criterion = nn.CrossEntropyLoss(label_smoothing=CONFIG['label_smoothing'])

# Optimizer - ONLY for trainable parameters
# Separate parameters that get weight decay from those that do not
no_decay = ['bias', 'LayerNorm.weight', 'layer_norm.weight']

# Keep only parameters with requires_grad=True
trainable_params_list = [(n, p) for n, p in model.named_parameters() if p.requires_grad]

optimizer_grouped_parameters = [
    {
        'params': [p for n, p in trainable_params_list if not any(nd in n for nd in no_decay)],
        'weight_decay': CONFIG['weight_decay']
    },
    {
        'params': [p for n, p in trainable_params_list if any(nd in n for nd in no_decay)],
        'weight_decay': 0.0
    }
]

optimizer = AdamW(optimizer_grouped_parameters, lr=CONFIG['learning_rate'])

# Learning rate scheduler with warmup
total_steps = len(train_loader) * CONFIG['epochs']
warmup_steps = int(total_steps * CONFIG['warmup_ratio'])

scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=warmup_steps,
    num_training_steps=total_steps
)

print(f'✓ Optimizer: AdamW (lr={CONFIG["learning_rate"]}, weight_decay={CONFIG["weight_decay"]})')
print(f'✓ Scheduler: Linear warmup ({warmup_steps} warmup steps, {total_steps} total steps)')
print(f'✓ Loss: CrossEntropy with label_smoothing={CONFIG["label_smoothing"]}')
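
The shape of the warmup schedule is easier to see with numbers. The function below is a sketch of the multiplier that, to my understanding, `get_linear_schedule_with_warmup` applies to the base learning rate: a linear ramp from 0 to 1 during warmup, then a linear decay back to 0. The step count of 1000 is made up. The notebook derives the real value from `len(train_loader) * CONFIG['epochs']`.

```python
def linear_warmup_factor(step, warmup_steps, total_steps):
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

base_lr = 1e-5
total_steps = 1000                       # made-up value for illustration
warmup_steps = int(total_steps * 0.1)    # warmup_ratio = 0.1

for step in (0, 50, 100, 550, 1000):
    lr = base_lr * linear_warmup_factor(step, warmup_steps, total_steps)
    print(f'step {step:4d}: lr = {lr:.2e}')
```

The decay is planned over all 20 epochs. If early stopping ends training sooner, the learning rate never reaches zero.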

## 🏋️ 8. Training Functions

In [None]:
def train_epoch(model, dataloader, criterion, optimizer, scheduler, device, max_grad_norm):
    """Train for one epoch"""
    model.train()
    total_loss = 0
    all_preds = []
    all_labels = []
    
    progress_bar = tqdm(dataloader, desc='Training', leave=False)
    
    for batch in progress_bar:
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['label'].to(device)
        
        optimizer.zero_grad()
        
        logits = model(input_ids, attention_mask)
        loss = criterion(logits, labels)
        
        loss.backward()
        
        # Gradient clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        
        optimizer.step()
        scheduler.step()
        
        total_loss += loss.item()
        preds = torch.argmax(logits, dim=1).cpu().numpy()
        all_preds.extend(preds)
        all_labels.extend(labels.cpu().numpy())
        
        progress_bar.set_postfix({'loss': f'{loss.item():.4f}'})
    
    avg_loss = total_loss / len(dataloader)
    accuracy = accuracy_score(all_labels, all_preds)
    f1 = f1_score(all_labels, all_preds, average='weighted')
    
    return avg_loss, accuracy, f1


def evaluate(model, dataloader, criterion, device):
    """Evaluate the model"""
    model.eval()
    total_loss = 0
    all_preds = []
    all_labels = []
    
    with torch.no_grad():
        for batch in dataloader:
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['label'].to(device)
            
            logits = model(input_ids, attention_mask)
            loss = criterion(logits, labels)
            
            total_loss += loss.item()
            preds = torch.argmax(logits, dim=1).cpu().numpy()
            all_preds.extend(preds)
            all_labels.extend(labels.cpu().numpy())
    
    avg_loss = total_loss / len(dataloader)
    accuracy = accuracy_score(all_labels, all_preds)
    f1 = f1_score(all_labels, all_preds, average='weighted')
    
    return avg_loss, accuracy, f1, all_preds, all_labels


class EarlyStopping:
    """Early stopping with improved metric monitoring"""
    
    def __init__(self, patience=5, min_delta=0.001, mode='min'):
        self.patience = patience
        self.min_delta = min_delta
        self.mode = mode  # 'min' for loss, 'max' for accuracy
        self.counter = 0
        self.best_score = None
        self.early_stop = False
        self.best_model = None
    
    def __call__(self, score, model):
        if self.mode == 'min':
            is_improvement = self.best_score is None or score < self.best_score - self.min_delta
        else:
            is_improvement = self.best_score is None or score > self.best_score + self.min_delta
        
        if is_improvement:
            self.best_score = score
            self.best_model = copy.deepcopy(model.state_dict())
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience:
                self.early_stop = True
        
        return self.early_stop

print('✓ Training functions defined')
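
Gradient clipping by global norm, as used in `train_epoch`, rescales all gradients together rather than clipping each value separately. The pure-Python sketch below illustrates the arithmetic, as I understand `clip_grad_norm_` to work. `clip_by_global_norm` and the toy gradients are assumptions for the example.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients together so their joint L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    # Scale factor is capped at 1.0, so small gradients pass through unchanged
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [[g * scale for g in vec] for vec in grads], total_norm

grads = [[3.0, 4.0], [0.0, 12.0]]        # toy gradients, global norm = 13
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
print(norm_before, round(norm_after, 4))
```

The direction of the update is preserved and only its length shrinks. That is why this form of clipping is usually preferred over clamping individual values.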

## 🚀 9. Training Loop

In [None]:
# Training history
history = {
    'train_loss': [], 'train_acc': [], 'train_f1': [],
    'val_loss': [], 'val_acc': [], 'val_f1': []
}

# Early stopping - monitor validation F1 (mode='max')
early_stopping = EarlyStopping(patience=CONFIG['early_stopping_patience'], mode='max')

print('=' * 60)
print('üöÄ TRAINING STARTED')
print('=' * 60)
print(f'Epochs: {CONFIG["epochs"]} | Early Stopping Patience: {CONFIG["early_stopping_patience"]}')
print(f'Learning Rate: {CONFIG["learning_rate"]} | Batch Size: {CONFIG["batch_size"]}')
print(f'Frozen Layers: {CONFIG["freeze_layers"]} | Dropout: {CONFIG["dropout_rate"]}')
print('-' * 60)

best_val_f1 = 0
best_epoch = 0

for epoch in range(CONFIG['epochs']):
    print(f'\nüìç Epoch {epoch + 1}/{CONFIG["epochs"]}')
    
    # Train
    train_loss, train_acc, train_f1 = train_epoch(
        model, train_loader, criterion, optimizer, scheduler, 
        device, CONFIG['max_grad_norm']
    )
    
    # Validate
    val_loss, val_acc, val_f1, _, _ = evaluate(
        model, val_loader, criterion, device
    )
    
    # Save history
    history['train_loss'].append(train_loss)
    history['train_acc'].append(train_acc)
    history['train_f1'].append(train_f1)
    history['val_loss'].append(val_loss)
    history['val_acc'].append(val_acc)
    history['val_f1'].append(val_f1)
    
    # Print metrics
    print(f'  Train - Loss: {train_loss:.4f} | Acc: {train_acc:.4f} | F1: {train_f1:.4f}')
    print(f'  Val   - Loss: {val_loss:.4f} | Acc: {val_acc:.4f} | F1: {val_f1:.4f}')
    
    # Track the best epoch with the same rule EarlyStopping uses (min_delta),
    # so best_epoch always matches the weights restored after training
    if (early_stopping.best_score is None
            or val_f1 > early_stopping.best_score + early_stopping.min_delta):
        best_val_f1 = val_f1
        best_epoch = epoch + 1
        print(f'  ⭐ New best validation F1!')
    
    # Check overfitting
    gap = train_acc - val_acc
    print(f'  📊 Train-Val Gap: {gap:.4f}', end='')
    if gap > 0.10:
        print(' ⚠️ Overfitting!')
    elif gap > 0.05:
        print(' ⚡ Slight gap')
    else:
        print(' ✅ Good')
    
    # Early stopping check - based on val_f1
    if early_stopping(val_f1, model):
        print(f'\n🛑 Early stopping triggered at epoch {epoch + 1}')
        print(f'   Best F1 was at epoch {best_epoch}')
        break

# Load best model
model.load_state_dict(early_stopping.best_model)
print(f'\n✓ Loaded best model from epoch {best_epoch}')
print(f'  Best Val F1: {best_val_f1:.4f}')
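
How `patience` and `min_delta` interact is easiest to see on a sequence of scores. The sketch below replays the `mode='max'` logic of the `EarlyStopping` class without a model. `early_stop_epoch` is a hypothetical helper, and the F1 values are made up rather than taken from a run of this notebook.

```python
def early_stop_epoch(scores, patience=5, min_delta=0.001):
    """Return (best_epoch, stop_epoch) for a list of per-epoch validation scores.

    Epochs are 1-based; stop_epoch is None if patience never runs out.
    """
    best_score, best_epoch, counter = None, None, 0
    for epoch, score in enumerate(scores, start=1):
        if best_score is None or score > best_score + min_delta:
            best_score, best_epoch, counter = score, epoch, 0
        else:
            counter += 1
            if counter >= patience:
                return best_epoch, epoch
    return best_epoch, None

# Made-up validation F1 values, not results from this notebook.
val_f1 = [0.70, 0.75, 0.78, 0.7805, 0.779, 0.777, 0.776, 0.775]
print(early_stop_epoch(val_f1, patience=5))
```

In this toy sequence the gain at epoch 4 is smaller than `min_delta`, so it does not count as an improvement and does not reset the patience counter.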

## 📈 10. Training Visualization

In [None]:
# Plot training history
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

epochs_range = range(1, len(history['train_loss']) + 1)

# Loss
axes[0].plot(epochs_range, history['train_loss'], 'b-', label='Train Loss', marker='o')
axes[0].plot(epochs_range, history['val_loss'], 'r-', label='Val Loss', marker='s')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Loss')
axes[0].set_title('Training & Validation Loss')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Accuracy
axes[1].plot(epochs_range, history['train_acc'], 'b-', label='Train Acc', marker='o')
axes[1].plot(epochs_range, history['val_acc'], 'r-', label='Val Acc', marker='s')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Accuracy')
axes[1].set_title('Training & Validation Accuracy')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

# F1 Score
axes[2].plot(epochs_range, history['train_f1'], 'b-', label='Train F1', marker='o')
axes[2].plot(epochs_range, history['val_f1'], 'r-', label='Val F1', marker='s')
axes[2].set_xlabel('Epoch')
axes[2].set_ylabel('F1 Score')
axes[2].set_title('Training & Validation F1 Score')
axes[2].legend()
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('training_history.png', dpi=150, bbox_inches='tight')
plt.show()

# Check for overfitting
# Note: this compares the last epoch that ran, which can differ from the
# restored best epoch when early stopping was triggered
final_gap = history['train_acc'][-1] - history['val_acc'][-1]
print(f'\n📊 Overfitting Analysis:')
print(f'  Final Train Accuracy: {history["train_acc"][-1]:.4f}')
print(f'  Final Val Accuracy:   {history["val_acc"][-1]:.4f}')
print(f'  Gap (Train - Val):    {final_gap:.4f}')

if final_gap < 0.03:
    print('  ✅ Model is NOT overfitting (gap < 3%)')
elif final_gap < 0.05:
    print('  ⚠️  Slight overfitting (gap 3-5%)')
else:
    print('  ❌ Model is overfitting (gap > 5%)')

## 🧪 11. Test Set Evaluation

In [None]:
# Evaluate on test set
print('=' * 60)
print('🧪 TEST SET EVALUATION')
print('=' * 60)

test_loss, test_acc, test_f1, test_preds, test_labels = evaluate(
    model, test_loader, criterion, device
)

print(f'\nüìä Test Results:')
print(f'  Loss:     {test_loss:.4f}')
print(f'  Accuracy: {test_acc:.4f} ({test_acc*100:.2f}%)')
print(f'  F1 Score: {test_f1:.4f}')

# Classification report
print('\n' + '=' * 60)
print('üìã CLASSIFICATION REPORT')
print('=' * 60)
print(classification_report(test_labels, test_preds, target_names=LABEL_NAMES, digits=4))

# Per-class metrics
precision, recall, f1, support = precision_recall_fscore_support(
    test_labels, test_preds, average=None, labels=[0, 1, 2]
)

print('\nüìä Per-Class Metrics:')
for i, label in enumerate(LABEL_NAMES):
    print(f'  {label.upper():10} - P: {precision[i]:.4f} | R: {recall[i]:.4f} | F1: {f1[i]:.4f} | N: {support[i]}')
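
The per-class numbers and the weighted F1 reported above can be reproduced by hand from a confusion matrix. The sketch below shows the arithmetic on made-up counts. It illustrates the definitions and is not scikit-learn's implementation.

```python
def per_class_metrics(cm):
    """Precision, recall, F1 and support per class from a square confusion matrix.

    cm[i][j] counts samples of true class i predicted as class j.
    """
    n = len(cm)
    rows = []
    for k in range(n):
        tp = cm[k][k]
        predicted = sum(cm[i][k] for i in range(n))   # column total
        support = sum(cm[k])                          # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        rows.append((precision, recall, f1, support))
    return rows

def weighted_f1(rows):
    """Support-weighted mean of the per-class F1 scores."""
    total = sum(r[3] for r in rows)
    return sum(r[2] * r[3] for r in rows) / total

cm = [[8, 1, 1],      # made-up counts, not results from this notebook
      [2, 6, 2],
      [0, 1, 9]]
rows = per_class_metrics(cm)
for name, (p, r, f, s) in zip(['negative', 'neutral', 'positive'], rows):
    print(f'{name:10} P={p:.3f} R={r:.3f} F1={f:.3f} N={s}')
print(f'weighted F1 = {weighted_f1(rows):.3f}')
```

With balanced classes, as in this notebook, the weighted F1 equals the plain average of the three per-class F1 scores.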

## 🔥 12. Confusion Matrix

In [None]:
# Confusion Matrix
cm = confusion_matrix(test_labels, test_preds)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Absolute numbers
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=LABEL_NAMES, yticklabels=LABEL_NAMES, ax=axes[0])
axes[0].set_xlabel('Predicted')
axes[0].set_ylabel('Actual')
axes[0].set_title('Confusion Matrix (Counts)')

# Normalized (percentages)
cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis] * 100
sns.heatmap(cm_normalized, annot=True, fmt='.1f', cmap='Blues',
            xticklabels=LABEL_NAMES, yticklabels=LABEL_NAMES, ax=axes[1])
axes[1].set_xlabel('Predicted')
axes[1].set_ylabel('Actual')
axes[1].set_title('Confusion Matrix (Percentages %)')

plt.tight_layout()
plt.savefig('confusion_matrix.png', dpi=150, bbox_inches='tight')
plt.show()

# Analysis
print('\nüìä Confusion Matrix Analysis:')
for i, label in enumerate(LABEL_NAMES):
    correct = cm[i, i]
    total = cm[i].sum()
    print(f'  {label.upper():10} - Correct: {correct}/{total} ({correct/total*100:.1f}%)')
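
Row normalization turns each row of the confusion matrix into per-class recall percentages. The sketch below shows the same computation in plain Python. Unlike the NumPy expression in the cell, it also guards against a class with no test samples. The counts are made up, and `row_normalize` is a hypothetical helper.

```python
def row_normalize(cm):
    """Convert each row of counts to percentages of that row's total."""
    normalized = []
    for row in cm:
        total = sum(row)
        # Guard against a class with no samples instead of dividing by zero
        normalized.append([100.0 * c / total if total else 0.0 for c in row])
    return normalized

cm = [[8, 1, 1],      # made-up counts, rows = actual class, columns = predicted
      [2, 6, 2],
      [0, 0, 0]]      # a class that never appears in the test set
for row in row_normalize(cm):
    print([f'{v:.1f}' for v in row])
```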

## 💾 13. Save Model

In [None]:
# Create the models directory in the skripsi folder (Google Drive)
os.makedirs('models', exist_ok=True)

# Save the model to Google Drive
model_path = 'models/indobert_sentiment_3class.pt'
torch.save({
    'model_state_dict': model.state_dict(),
    'config': CONFIG,
    'label_map': LABEL_MAP,
    'label_names': LABEL_NAMES,
    'test_accuracy': test_acc,
    'test_f1': test_f1,
    'history': history,
}, model_path)
print(f'✓ Model saved to: {DRIVE_PATH}/{model_path}')

# Save training history
history_path = 'models/training_history.json'
with open(history_path, 'w') as f:
    json.dump(history, f, indent=2)
print(f'✓ History saved to: {DRIVE_PATH}/{history_path}')

# Save tokenizer
tokenizer.save_pretrained('models/tokenizer')
print(f'✓ Tokenizer saved to: {DRIVE_PATH}/models/tokenizer/')

print(f'\n✅ All files saved to the Google Drive folder: {DRIVE_PATH}/models/')
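
The saved `training_history.json` can be inspected later without loading the model. The sketch below reads a history file and reports the best epoch for a metric. `best_epoch_from_history` is a hypothetical helper, and the demo writes a made-up history to a temporary file rather than touching the real one.

```python
import json
import os
import tempfile

def best_epoch_from_history(path, metric='val_f1'):
    """Return (1-based epoch, value) of the best score stored in a history file."""
    with open(path) as f:
        history = json.load(f)
    values = history[metric]
    best_idx = max(range(len(values)), key=values.__getitem__)
    return best_idx + 1, values[best_idx]

# Demo with a made-up history written to a temporary file.
demo = {'val_f1': [0.71, 0.78, 0.76], 'val_loss': [0.9, 0.7, 0.8]}
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'training_history.json')
    with open(demo_path, 'w') as f:
        json.dump(demo, f)
    result = best_epoch_from_history(demo_path)
print(result)
```

In the notebook itself the path would be `models/training_history.json`, relative to the skripsi folder.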

## 🔮 14. Inference Demo

In [None]:
def predict_sentiment(text, model, tokenizer, device, label_names):
    """Predict the sentiment of a single text"""
    model.eval()
    
    # Call the tokenizer directly; encode_plus is deprecated in favor of __call__
    encoding = tokenizer(
        text,
        add_special_tokens=True,
        max_length=128,
        padding='max_length',
        truncation=True,
        return_attention_mask=True,
        return_tensors='pt'
    )
    
    input_ids = encoding['input_ids'].to(device)
    attention_mask = encoding['attention_mask'].to(device)
    
    with torch.no_grad():
        logits = model(input_ids, attention_mask)
        probs = torch.softmax(logits, dim=1)
        pred = torch.argmax(probs, dim=1).item()
    
    return {
        'sentiment': label_names[pred],
        'confidence': probs[0][pred].item(),
        'probabilities': {
            label_names[i]: probs[0][i].item() 
            for i in range(len(label_names))
        }
    }

# Try it on a few example reviews
test_reviews = [
    "Aplikasi gojek sangat membantu, driver ramah dan cepat",
    "Driver nya lama banget, udah nunggu 1 jam gak datang datang",
    "Biasa aja sih aplikasinya",
    "Pelayanan buruk, driver tidak sopan, tidak akan pakai lagi",
    "Mantap, makanan sampai dengan selamat dan masih hangat",
    "Ongkirnya agak mahal tapi ya lumayan lah"
]

print('=' * 60)
print('üîÆ INFERENCE DEMO')
print('=' * 60)

for review in test_reviews:
    result = predict_sentiment(review, model, tokenizer, device, LABEL_NAMES)
    emoji = {'negative': '😠', 'neutral': '😐', 'positive': '😊'}[result['sentiment']]
    print(f'\n📝 "{review[:50]}..."' if len(review) > 50 else f'\n📝 "{review}"')
    print(f'   {emoji} Sentiment: {result["sentiment"].upper()} (Confidence: {result["confidence"]*100:.1f}%)')
    print(f'   Probabilities: Neg={result["probabilities"]["negative"]*100:.1f}% | '
          f'Neu={result["probabilities"]["neutral"]*100:.1f}% | '
          f'Pos={result["probabilities"]["positive"]*100:.1f}%')
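
The confidence values printed above come from a softmax over the three logits. The sketch below shows that step in plain Python with made-up logits. It is a stand-in for `torch.softmax`, not the notebook's actual inference path.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)                       # subtract the max to avoid overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

label_names = ['negative', 'neutral', 'positive']
logits = [-1.2, 0.3, 2.1]                 # made-up logits for one review
probs = softmax(logits)
pred = max(range(len(probs)), key=probs.__getitem__)
print(label_names[pred], f'{probs[pred] * 100:.1f}%')
```

The predicted class is the one with the largest logit. The softmax only rescales the scores so that they sum to 1.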

## 📊 15. Final Summary

In [None]:
# Final summary
print('=' * 60)
print('📊 FINAL SUMMARY')
print('=' * 60)

# Calculate final gap
final_gap = history['train_acc'][-1] - history['val_acc'][-1]
best_val_acc = max(history['val_acc'])

print(f'''
🎯 MODEL PERFORMANCE:
   • Test Accuracy: {test_acc*100:.2f}%
   • Test F1 Score: {test_f1*100:.2f}%
   • Best Validation Accuracy: {best_val_acc*100:.2f}%

📈 OVERFITTING CHECK:
   • Train-Val Gap: {final_gap*100:.2f}%
   • Status: {"✅ Not Overfitting" if final_gap < 0.05 else "⚠️ Potential Overfitting"}

⚙️ ANTI-OVERFITTING TECHNIQUES USED:
   • Data Balancing (Undersampling)
   • Layer Freezing: {CONFIG['freeze_layers']} layers
   • Dropout Rate: {CONFIG['dropout_rate']}
   • Label Smoothing: {CONFIG['label_smoothing']}
   • Weight Decay: {CONFIG['weight_decay']}
   • Early Stopping (Patience: {CONFIG['early_stopping_patience']})
   • Learning Rate Warmup
   • Gradient Clipping
   • Word Dropout Augmentation
   • Word Swap Augmentation
   • Batch Size: {CONFIG['batch_size']}

💾 SAVED FILES:
   • Model: {DRIVE_PATH}/models/indobert_sentiment_3class.pt
   • Tokenizer: {DRIVE_PATH}/models/tokenizer/
   • History: {DRIVE_PATH}/models/training_history.json
''')

print('=' * 60)
print('✅ Training completed successfully!')
print('=' * 60)

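## 📥 16. Download Model (Optional)

In [None]:
# Download the model from Colab to your local machine
from google.colab import files

# Zip the models folder for download. The working directory is still the
# skripsi folder (set by os.chdir in the setup cell), so models/ is resolved
# there and the archive is written to /content
!zip -r /content/model_sentiment_3class.zip models/

# Download
files.download('/content/model_sentiment_3class.zip')
print('✓ Model is ready to download!')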