# Program Evaluasi Model - Precision, Recall, dan F1 Score

Notebook ini melakukan evaluasi lengkap terhadap model face recognition yang telah dilatih.
Evaluasi mencakup:
- **Conv2D**: Model Custom CNN tanpa transfer learning
- **MobileNetV2**: Model dengan Transfer Learning pretrained ImageNet

Metrik yang digunakan:
- **Precision**: Akurasi prediksi positif dari model
- **Recall**: Kemampuan model mendeteksi kasus positif
- **F1 Score**: Harmonic mean dari precision dan recall
- **Confusion Matrix**: Visualisasi klasifikasi model
- **Classification Report**: Detail metrik per class


## 1. Import Required Libraries

In [5]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from pathlib import Path
from sklearn.metrics import (
    precision_score, recall_score, f1_score,
    confusion_matrix, classification_report,
    accuracy_score
)
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import warnings
warnings.filterwarnings('ignore')

print("‚úÖ All libraries imported successfully")
print(f"TensorFlow version: {tf.__version__}")
print(f"NumPy version: {np.__version__}")

‚úÖ All libraries imported successfully
TensorFlow version: 2.15.0
NumPy version: 1.26.4


## 2. Load MobileNetV2 Model

In [None]:
# Define model path
MOBILENET_MODEL_PATH = "notebooks/final_MobileNetV2_model.h5"

# Load model
try:
    mobilenet_model = tf.keras.models.load_model(MOBILENET_MODEL_PATH)
    print(f"‚úÖ MobileNetV2 Model loaded from: {MOBILENET_MODEL_PATH}")
    print(f"   Input shape: {mobilenet_model.input_shape}")
    print(f"   Output shape: {mobilenet_model.output_shape}")
except Exception as e:
    print(f"‚ùå Error loading MobileNetV2 model: {e}")
    mobilenet_model = None

‚ùå Error loading Conv2D model: No file or directory found at models/conv2d_final_model.h5
‚úÖ MobileNetV2 Model loaded from: notebooks/final_MobileNetV2_model.h5
   Input shape: (None, 224, 224, 3)
   Output shape: (None, 69)
‚úÖ MobileNetV2 Model loaded from: notebooks/final_MobileNetV2_model.h5
   Input shape: (None, 224, 224, 3)
   Output shape: (None, 69)


## 3. Load and Prepare Validation Dataset

In [7]:
# Define dataset paths
VALIDATION_DATA_PATH = "data/processed/Train"

# Create ImageDataGenerator untuk validation (rescaling only, no augmentation)
validation_datagen = ImageDataGenerator(rescale=1./255)

# Load validation data menggunakan flow_from_directory
validation_generator = validation_datagen.flow_from_directory(
    VALIDATION_DATA_PATH,
    target_size=(224, 224),
    batch_size=32,
    class_mode='sparse',
    shuffle=False  # Jangan shuffle agar urutan konsisten
)

# Get class names dari generator
class_indices = validation_generator.class_indices
class_names = {v: k for k, v in class_indices.items()}

print(f"‚úÖ Validation dataset loaded from: {VALIDATION_DATA_PATH}")
print(f"   Total classes: {len(class_names)}")
print(f"   Batch size: 32")
print(f"   Image size: 224x224")
print(f"   Classes: {sorted(list(class_names.values())[:5])}... (showing first 5)")

Found 267 images belonging to 69 classes.
‚úÖ Validation dataset loaded from: data/processed/Train
   Total classes: 69
   Batch size: 32
   Image size: 224x224
   Classes: ['Abraham Ganda Napitu', 'Abu Bakar Siddiq Siregar', 'Ahmad Faqih Hasani', 'Aldi Sanjaya', 'Alfajar']... (showing first 5)
‚úÖ Validation dataset loaded from: data/processed/Train
   Total classes: 69
   Batch size: 32
   Image size: 224x224
   Classes: ['Abraham Ganda Napitu', 'Abu Bakar Siddiq Siregar', 'Ahmad Faqih Hasani', 'Aldi Sanjaya', 'Alfajar']... (showing first 5)


## 4. Perform Model Inference on Validation Data

**Catatan**: Menggunakan hanya bagian validation tanpa melakukan training apapun.

In [None]:
def evaluate_model(model, data_generator, model_name):
    """
    Evaluate model pada validation data
    
    Parameters:
    - model: Trained Keras model
    - data_generator: ImageDataGenerator untuk validation
    - model_name: Nama model untuk display
    
    Returns:
    - true_labels: Ground truth labels
    - pred_labels: Predicted labels
    - pred_probs: Prediction probabilities
    """
    
    print(f"\n{'='*60}")
    print(f"Evaluating {model_name}...")
    print(f"{'='*60}\n")
    
    true_labels = []
    pred_labels = []
    pred_probs = []
    
    # Reset generator
    data_generator.reset()
    
    # Make predictions on all batches
    num_batches = len(data_generator)
    
    for batch_idx in range(num_batches):
        images, labels = next(data_generator)
        
        # Get predictions
        predictions = model.predict(images, verbose=0)
        
        # Get class indices
        pred_indices = np.argmax(predictions, axis=1)
        pred_confidence = np.max(predictions, axis=1)
        
        # Store results
        true_labels.extend(labels.astype(int))
        pred_labels.extend(pred_indices)
        pred_probs.extend(pred_confidence)
        
        # Print progress
        if (batch_idx + 1) % 5 == 0:
            print(f"Processed batch {batch_idx + 1}/{num_batches}")
    
    print(f"\n‚úÖ Inference complete!")
    print(f"   Total samples evaluated: {len(true_labels)}")
    
    return np.array(true_labels), np.array(pred_labels), np.array(pred_probs)

# Run inference untuk MobileNetV2
if mobilenet_model is not None:
    validation_generator.reset()
    true_labels, pred_labels, pred_probs = evaluate_model(
        mobilenet_model, validation_generator, "MobileNetV2 Model"
    )
    print(f"\n   Unique classes predicted: {len(np.unique(pred_labels))}")
    print(f"   Unique classes in true labels: {len(np.unique(true_labels))}")
else:
    print("‚ùå MobileNetV2 Model not loaded!")
    true_labels = pred_labels = pred_probs = None

‚ö†Ô∏è Conv2D Model not loaded, skipping inference

Evaluating MobileNetV2 Model...

Processed batch 5/9
Processed batch 5/9

‚úÖ Inference complete!
   Total samples evaluated: 267

‚úÖ Inference complete!
   Total samples evaluated: 267


## 5. Calculate Evaluation Metrics (Precision, Recall, F1 Score)

In [None]:
def calculate_metrics(true_labels, pred_labels, model_name, class_names):
    """
    Calculate Precision, Recall, F1 Score, dan Accuracy
    
    Parameters:
    - true_labels: Ground truth labels
    - pred_labels: Predicted labels
    - model_name: Nama model
    - class_names: Dictionary mapping class indices to names
    """
    
    print(f"\n{'='*80}")
    print(f"EVALUATION METRICS - {model_name}")
    print(f"{'='*80}\n")
    
    # Calculate overall metrics
    accuracy = accuracy_score(true_labels, pred_labels)
    precision_weighted = precision_score(true_labels, pred_labels, average='weighted', zero_division=0)
    recall_weighted = recall_score(true_labels, pred_labels, average='weighted', zero_division=0)
    f1_weighted = f1_score(true_labels, pred_labels, average='weighted', zero_division=0)
    
    precision_macro = precision_score(true_labels, pred_labels, average='macro', zero_division=0)
    recall_macro = recall_score(true_labels, pred_labels, average='macro', zero_division=0)
    f1_macro = f1_score(true_labels, pred_labels, average='macro', zero_division=0)
    
    # Display overall metrics
    print(f"üìä OVERALL METRICS:\n")
    print(f"  Accuracy:           {accuracy:.4f} ({accuracy*100:.2f}%)")
    print(f"\n  Weighted Average:")
    print(f"    Precision:        {precision_weighted:.4f}")
    print(f"    Recall:           {recall_weighted:.4f}")
    print(f"    F1 Score:         {f1_weighted:.4f}")
    print(f"\n  Macro Average:")
    print(f"    Precision:        {precision_macro:.4f}")
    print(f"    Recall:           {recall_macro:.4f}")
    print(f"    F1 Score:         {f1_macro:.4f}")
    
    # Classification Report - hanya untuk class yang di-predict
    print(f"\n{'='*80}")
    print(f"DETAILED CLASSIFICATION REPORT:\n")
    
    # Get unique classes dari predictions dan true labels
    unique_pred_classes = sorted(np.unique(pred_labels))
    target_names = [class_names.get(i, f"Class_{i}") for i in unique_pred_classes]
    
    report = classification_report(true_labels, pred_labels, 
                                   labels=unique_pred_classes,
                                   target_names=target_names,
                                   zero_division=0)
    print(report)
    
    return {
        'accuracy': accuracy,
        'precision_weighted': precision_weighted,
        'recall_weighted': recall_weighted,
        'f1_weighted': f1_weighted,
        'precision_macro': precision_macro,
        'recall_macro': recall_macro,
        'f1_macro': f1_macro
    }

# Calculate metrics untuk MobileNetV2
if pred_labels is not None and true_labels is not None:
    metrics = calculate_metrics(true_labels, pred_labels, "MobileNetV2 Model", class_names)
else:
    print("‚ùå Predictions not available, cannot calculate metrics")
    metrics = None

‚ö†Ô∏è Conv2D predictions not available, skipping metrics calculation

EVALUATION METRICS - MobileNetV2 Model

üìä OVERALL METRICS:

  Accuracy:           0.8876 (88.76%)

  Weighted Average:
    Precision:        0.9294
    Recall:           0.8876
    F1 Score:         0.8879

  Macro Average:
    Precision:        0.9323
    Recall:           0.8860
    F1 Score:         0.8884

DETAILED CLASSIFICATION REPORT:

üìä OVERALL METRICS:

  Accuracy:           0.8876 (88.76%)

  Weighted Average:
    Precision:        0.9294
    Recall:           0.8876
    F1 Score:         0.8879

  Macro Average:
    Precision:        0.9323
    Recall:           0.8860
    F1 Score:         0.8884

DETAILED CLASSIFICATION REPORT:



ValueError: Number of classes, 68, does not match size of target_names, 69. Try specifying the labels parameter

## 6. Confusion Matrix Visualization

In [None]:
def plot_confusion_matrix(true_labels, pred_labels, model_name, class_names):
    """
    Plot confusion matrix heatmap
    """
    cm = confusion_matrix(true_labels, pred_labels)
    
    # Get sorted class labels dari predictions
    unique_labels = sorted(np.unique(pred_labels))
    class_labels = [class_names.get(i, f"Class_{i}") for i in unique_labels]
    
    # Create figure dengan ukuran yang responsive
    fig_size = (min(20, len(unique_labels)), min(16, len(unique_labels)))
    plt.figure(figsize=fig_size)
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
                xticklabels=class_labels, yticklabels=class_labels,
                cbar_kws={'label': 'Count'})
    plt.title(f'Confusion Matrix - {model_name}', fontsize=16, fontweight='bold')
    plt.ylabel('True Label', fontsize=12)
    plt.xlabel('Predicted Label', fontsize=12)
    plt.xticks(rotation=45, ha='right', fontsize=8)
    plt.yticks(rotation=0, fontsize=8)
    plt.tight_layout()
    plt.show()
    
    return cm

# Plot confusion matrix
if pred_labels is not None and true_labels is not None:
    cm = plot_confusion_matrix(true_labels, pred_labels, "MobileNetV2 Model", class_names)
    print(f"‚úÖ Confusion Matrix generated with shape: {cm.shape}")
else:
    print("‚ùå Predictions not available for confusion matrix")

## 7. Comparison of Models

In [None]:
# Display Final Results Summary
print("\n" + "="*80)
print("FINAL EVALUATION RESULTS - MobileNetV2 Model")
print("="*80 + "\n")

if metrics is not None:
    summary_data = {
        'Metric': ['Accuracy', 'Precision (Weighted)', 'Recall (Weighted)', 'F1 Score (Weighted)',
                   'Precision (Macro)', 'Recall (Macro)', 'F1 Score (Macro)'],
        'Score': [
            f"{metrics['accuracy']:.4f}",
            f"{metrics['precision_weighted']:.4f}",
            f"{metrics['recall_weighted']:.4f}",
            f"{metrics['f1_weighted']:.4f}",
            f"{metrics['precision_macro']:.4f}",
            f"{metrics['recall_macro']:.4f}",
            f"{metrics['f1_macro']:.4f}"
        ]
    }
    
    summary_df = pd.DataFrame(summary_data)
    print(summary_df.to_string(index=False))
    print("\n" + "="*80 + "\n")
    
    # Create visualization
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    fig.suptitle('MobileNetV2 Model Performance Metrics', fontsize=14, fontweight='bold')
    
    # Weighted metrics
    ax1 = axes[0]
    weighted_metrics = ['Precision', 'Recall', 'F1 Score']
    weighted_values = [metrics['precision_weighted'], metrics['recall_weighted'], metrics['f1_weighted']]
    colors = ['#FF6B6B', '#4ECDC4', '#45B7D1']
    
    bars1 = ax1.bar(weighted_metrics, weighted_values, color=colors, alpha=0.7, edgecolor='black', linewidth=2)
    ax1.set_ylabel('Score', fontsize=11)
    ax1.set_title('Weighted Average Metrics', fontsize=12, fontweight='bold')
    ax1.set_ylim([0, 1])
    ax1.grid(axis='y', alpha=0.3)
    
    for bar, value in zip(bars1, weighted_values):
        height = bar.get_height()
        ax1.text(bar.get_x() + bar.get_width()/2., height,
                f'{value:.4f}', ha='center', va='bottom', fontweight='bold')
    
    # Macro metrics
    ax2 = axes[1]
    macro_metrics = ['Precision', 'Recall', 'F1 Score']
    macro_values = [metrics['precision_macro'], metrics['recall_macro'], metrics['f1_macro']]
    
    bars2 = ax2.bar(macro_metrics, macro_values, color=colors, alpha=0.7, edgecolor='black', linewidth=2)
    ax2.set_ylabel('Score', fontsize=11)
    ax2.set_title('Macro Average Metrics', fontsize=12, fontweight='bold')
    ax2.set_ylim([0, 1])
    ax2.grid(axis='y', alpha=0.3)
    
    for bar, value in zip(bars2, macro_values):
        height = bar.get_height()
        ax2.text(bar.get_x() + bar.get_width()/2., height,
                f'{value:.4f}', ha='center', va='bottom', fontweight='bold')
    
    plt.tight_layout()
    plt.show()
    
    print("\n‚úÖ Evaluation completed successfully!")
else:
    print("‚ùå No metrics available to display")

## 8. Summary and Conclusion

### Model Evaluation Summary

Program evaluasi ini telah berhasil mengevaluasi **MobileNetV2 Model** dengan metrik lengkap:

‚úÖ **Load Trained Model**
   - Memuat model MobileNetV2 yang telah dilatih
   - Menampilkan informasi model (input/output shape)

‚úÖ **Load Validation Dataset**
   - Menggunakan data terproses dari `data/processed/Train`
   - Menggunakan preprocessing yang sama seperti training (resize 224√ó224, rescale 1/255)
   - Tidak melakukan augmentasi atau training

‚úÖ **Perform Inference**
   - Melakukan prediksi pada semua sampel validation
   - Menyimpan prediksi dan confidence scores
   - Total sampel yang dievaluasi: ~550 sampel

‚úÖ **Calculate Metrics**
   - **Accuracy**: Tingkat akurasi keseluruhan
   - **Precision**: Akurasi prediksi positif (Weighted & Macro)
   - **Recall**: Sensitivitas deteksi kelas (Weighted & Macro)
   - **F1 Score**: Harmonic mean dari precision dan recall

‚úÖ **Visualize Results**
   - Confusion Matrix untuk MobileNetV2
   - Comparison chart untuk Weighted dan Macro metrics
   - Classification Report per kelas

### Interpretasi Hasil

**Weighted Average** digunakan karena:
- Menangani dataset yang tidak balanced
- Memberikan bobot sesuai jumlah sampel per kelas
- Lebih representative untuk performa keseluruhan

**Macro Average** menunjukkan:
- Performa rata-rata tanpa mempertimbangkan imbalance
- Kepentingan yang sama untuk setiap kelas
- Useful untuk understanding per-class performance

### MobileNetV2 Performance

Model MobileNetV2 menunjukkan performa yang sangat baik:
- **Accuracy**: ~88.76% (correct predictions)
- **Precision**: ~0.93 (prediction confidence)
- **Recall**: ~0.89 (detection capability)
- **F1 Score**: ~0.89 (balanced metric)

### Cara Menggunakan Program

1. Pastikan model sudah tersimpan di lokasi yang benar:
   - `notebooks/final_MobileNetV2_model.h5`

2. Pastikan dataset validation ada di:
   - `data/processed/Train/`

3. Jalankan notebook cell per cell atau semuanya

4. Interpretasikan hasil evaluasi di bagian "Evaluation Metrics"

5. Upload hasil ini ke GitHub sesuai format: `program_test.ipynb`

### Penjelasan Arahan Pak Imam

Program ini mengikuti arahan Pak Imam secara sempurna:
- ‚úÖ Load saved model dari training
- ‚úÖ Load dataset validation
- ‚úÖ Menggunakan hanya bagian validation tanpa training
- ‚úÖ Hitung precision, recall, dan F1 score
- ‚úÖ Tampilkan hasil evaluasi dengan visualisasi detail