<a href="https://colab.research.google.com/github/sartabaz/biometric-fusion/blob/main/Fingerprint_Feature_Extraction_using_EfficientNET.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Clear all the outputs

# 1. Load Libraries

This section imports the libraries needed to build and evaluate the EfficientNetV2S deep learning model. The imports are grouped by purpose.

1. **TensorFlow/Keras**:
   - Core framework for building neural networks
   - `applications` provides pre-trained models (EfficientNetV2, ResNet, VGG, etc.)
   - `ModelCheckpoint` helps save model weights during training

2. **Data Handling**:
   - Pandas for structured data operations
   - NumPy for numerical computations and array operations

3. **Visualization**:
   - Matplotlib for basic plots (accuracy/loss curves)
   - Seaborn for more sophisticated statistical visualizations

4. **Evaluation Metrics**:
   - ROC/AUC for binary classification performance
   - Classification report for precision/recall metrics
   - LabelEncoder for preparing categorical targets

5. **Similarity Metrics**:
   - Cosine distance for comparing feature vectors (you can use other distances)
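The relationship between cosine distance and cosine similarity can be sketched as follows. This is a dependency-free illustration written for this note, not the notebook's own code; the helper names `cosine_similarity` and `cosine_distance` are hypothetical. The notebook itself relies on `scipy.spatial.distance.cosine`, which returns the distance, so the similarity is obtained as `1 - cosine(u, v)`.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (pure Python)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_distance(u, v):
    """Cosine distance, defined as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(u, v)

# Vectors pointing in the same direction have similarity close to 1;
# orthogonal vectors have similarity 0 (distance 1).
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(round(same_direction, 6), round(orthogonal, 6))
```

Because the score depends only on the angle between vectors, scaling a feature vector does not change its similarity to another vector.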

### Best Practices:
- Keep imports organized by functionality
- Only import what you need, to keep the namespace clean
- For Colab, you may need to `!pip install` certain packages first

In [None]:
# Core deep learning frameworks
import tensorflow as tf
from tensorflow.keras import layers, models, applications  # Keras API for model building
from tensorflow.keras.callbacks import ModelCheckpoint  # For saving models during training

# Data manipulation and analysis
import pandas as pd  # Dataframes and CSV handling
import numpy as np   # Numerical operations

# Data visualization
import matplotlib.pyplot as plt  # Basic plotting
import seaborn as sns  # Enhanced visualizations

# Model evaluation metrics
from sklearn.metrics import roc_curve, auc, classification_report  # Classification metrics
from sklearn.preprocessing import LabelEncoder  # For label preprocessing

# Similarity metrics
from scipy.spatial.distance import cosine  # Cosine distance (similarity = 1 - distance)

# 2. Experiment Configuration

This section defines the key parameters that control the model training process. These hyperparameters should be carefully tuned based on your specific dataset and hardware constraints.

In [None]:
# Core parameters: these control the fundamental aspects of model training and data processing.

# Data Configuration
NUM_CLASSES = 140
SAMPLES_PER_CLASS = 12

# Image Processing
IMG_SIZE = 96  # Width and height of input images (square dimensions)
                # Common sizes: 128x128 for quick experiments, 224x224/256x256 for pretrained models
CHANNELS = 3              # Number of color channels (3 for RGB, 1 for grayscale)
RESCALE = 1./255          # Normalization factor (scale pixel values to 0-1)

# Training Process
BATCH_SIZE = 16           # Samples per gradient update (typically 8-64)
EPOCHS = 33               # Maximum number of passes over the training data (monitor for overfitting)
VALIDATION_SPLIT = 0.2    # Fraction of data reserved for validation
SEED = 42                 # Random seed for reproducibility

# Model Configuration
BASE_LEARNING_RATE = 1e-4 # Initial learning rate for optimizer
DROPOUT_RATE = 0.3        # Regularization strength (0-1)

# Test configuration
NUM_PAIRS = 30000         # Total number of genuine + impostor pairs requested for verification

# Path Configuration
CSV_PATH = 'fingerprint.csv'  # Update this path to your actual data location
MODEL_SAVE_PATH = '/models/checkpoint.keras'  # For ModelCheckpoint
PERFORMANCE_SAVE_PATH = 'performance_summary_Model_1.csv'
FEATURES_SAVE_PATH = 'Palm_features.csv'

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# 3. Define functions

A concise summary of all the functions, organized by purpose and key details:

---

### **1. Data Loading & Preprocessing**
#### `load_data(csv_path, num_classes, num_samples)`
- **Purpose**: Load and preprocess image data from CSV for deep learning.
- **Inputs**:
  - CSV path, number of classes, samples per class
- **Outputs**:
  - Image tensors, one-hot labels, integer labels, class names
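The CSV layout that `load_data` expects (label in the first column, then `IMG_SIZE * IMG_SIZE` flattened grayscale pixels) can be illustrated with a small sketch. It uses only the standard library and a hypothetical 2x2 image size so the result is easy to inspect; the helper `parse_row`, the synthetic rows, and the integer labels are assumptions made for this example. The notebook itself does this with pandas and NumPy on 96x96 images.

```python
import csv
import io

IMG_SIZE = 2          # hypothetical tiny size; the notebook uses 96
RESCALE = 1.0 / 255   # same normalization factor as the configuration cell

# Two synthetic rows: label first, then IMG_SIZE * IMG_SIZE pixel values.
csv_text = "7,0,255,128,64\n9,255,255,0,0\n"

def parse_row(row):
    """Split one CSV row into (label, image) with image shaped [H][W][3]."""
    label = int(row[0])
    pixels = [float(v) * RESCALE for v in row[1:IMG_SIZE * IMG_SIZE + 1]]
    # Reshape the flat list into rows, then repeat each gray value across 3 channels.
    image = [
        [[pixels[r * IMG_SIZE + c]] * 3 for c in range(IMG_SIZE)]
        for r in range(IMG_SIZE)
    ]
    return label, image

samples = [parse_row(row) for row in csv.reader(io.StringIO(csv_text))]
labels = [label for label, _ in samples]
print(labels, samples[0][1][0][1])
```

Repeating the gray value across three channels is what lets a grayscale fingerprint image be fed to a backbone pretrained on RGB images.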

### **2. Model Construction**
#### `create_model(num_classes)`
- **Purpose**: Build an EfficientNetV2 model for classification.
- **Architecture**:
  - **Base**: Pretrained EfficientNetV2S (fine-tuning enabled)
  - **Head**: Global pooling → dropout → softmax dense layer
- **Config**:
  - Input shape: `(IMG_SIZE, IMG_SIZE, 3)`
  - Optimizer: Adam (`lr=BASE_LEARNING_RATE`, gradient clipping)
  - Loss: Categorical crossentropy
- **Output**: Compiled Keras model
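The gradient clipping mentioned above (`clipnorm=1.0` in the optimizer) can be sketched in a few lines. This is a pure-Python illustration of norm clipping on a single gradient vector, with a hypothetical helper name `clip_by_norm`; Keras documents `clipnorm` as being applied to the gradient of each weight individually, and its actual implementation operates on tensors.

```python
import math

def clip_by_norm(gradient, clipnorm=1.0):
    """Rescale a gradient vector so its L2 norm is at most clipnorm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= clipnorm:
        return list(gradient)          # small gradients pass through unchanged
    scale = clipnorm / norm
    return [g * scale for g in gradient]

large = clip_by_norm([3.0, 4.0])       # norm 5.0, rescaled to norm 1.0
small = clip_by_norm([0.3, 0.4])       # norm 0.5, left as is
print(large, small)
```

Clipping changes only the length of the update, not its direction, which helps keep fine-tuning of a pretrained backbone stable when an occasional batch produces a very large gradient.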

### **3. Training Visualization**
#### `plot_training_history(history)`
- **Purpose**: Plot training/validation metrics over epochs.
- **Plots**:
  1. **Accuracy**: Train vs validation
  2. **Loss**: Train vs validation
- **Output**: Saves the figure as `training_history.png` and displays it.

### **4. Verification Metrics**
#### `compute_verification_metrics(features, labels, num_pairs)`
- **Purpose**: Compute similarity scores for genuine/impostor pairs.
- **Outputs**:
  - `genuine_scores`: Similarities of same-class pairs
  - `impostor_scores`: Similarities of different-class pairs
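How genuine and impostor pairs are formed from class labels can be sketched as below. This is a simplified, standard-library version written for illustration: the function name `make_pairs` and the toy labels are hypothetical, it uses `random` rather than NumPy, and unlike the notebook's function it does not cap the number of genuine pairs.

```python
import random
from collections import defaultdict

def make_pairs(labels, num_impostor, seed=42):
    """Return (genuine_pairs, impostor_pairs) as lists of index tuples."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    genuine = []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # Pair neighbours (0,1), (2,3), ... so each sample is used at most once.
        genuine.extend(zip(shuffled[0::2], shuffled[1::2]))

    classes = list(by_class)
    impostor = []
    while len(impostor) < num_impostor:
        c1, c2 = rng.sample(classes, 2)   # two distinct classes
        impostor.append((rng.choice(by_class[c1]), rng.choice(by_class[c2])))
    return genuine, impostor

labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
genuine_pairs, impostor_pairs = make_pairs(labels, num_impostor=6)
print(len(genuine_pairs), len(impostor_pairs))
```

With this pairing scheme a class with `n` samples contributes at most `n // 2` genuine pairs, so the number of genuine pairs is bounded by the dataset size even when many more pairs are requested.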

### **5. Performance Evaluation**
#### `plot_verification_metrics(genuine_scores, impostor_scores, data)`
- **Purpose**: Visualize verification performance metrics.
- **Plots**:
  1. **ROC Curve**: TPR vs FPR (with AUC)
  2. **FAR/FRR Curve**: Error rates vs threshold
  3. **Score Distributions**: KDE of genuine/impostor scores
  4. **DET Curve**: FAR vs FRR (log scale)
- **Key Metric**:
  - **EER (Equal Error Rate)**: Error rate at the threshold where FAR = FRR
- **Output**: Saves figure as `verification_metrics_{data}.png`.

### **Helper Functions**
#### `compute_far_frr(genuine_scores, impostor_scores)`
- **Purpose**: Calculate False Acceptance/Rejection Rates.
- **Outputs**:
  - `far`, `frr`, `thresholds` arrays for plotting.
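The FAR/FRR definitions and the EER search can be illustrated on a handful of made-up scores. The sketch below is pure Python with hypothetical helper names (`far_frr`, `approximate_eer`); it follows the same convention as the notebook (a pair is accepted when its score is at or above the threshold, thresholds are scanned over [0, 1]) but is not the notebook's NumPy implementation.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """FAR: impostors accepted (score >= threshold); FRR: genuines rejected."""
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr

def approximate_eer(genuine_scores, impostor_scores, steps=100):
    """Scan thresholds in [0, 1]; return (eer, threshold) where |FAR - FRR| is smallest."""
    best = None
    for i in range(steps):
        threshold = i / (steps - 1)
        far, frr = far_frr(genuine_scores, impostor_scores, threshold)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, threshold)
    return best[1], best[2]

genuine = [0.9, 0.8, 0.85, 0.4]      # made-up similarity scores
impostor = [0.1, 0.2, 0.3, 0.6]
eer, eer_threshold = approximate_eer(genuine, impostor)
print(eer, round(eer_threshold, 3))
```

Because the thresholds form a finite grid, the reported EER is an approximation: it is the average of FAR and FRR at the grid point where the two rates are closest.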

---

### **Dependencies**
- **Libraries**: TensorFlow, NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn, SciPy
- **Hardware**: GPU-accelerated (via TensorFlow) recommended.

In [None]:
# Load and preprocess data from CSV

def load_data(csv_path, num_classes, num_samples):
  """
  Load and preprocess data from CSV file for deep learning model.

  Args:
      csv_path (str): Path to CSV file containing image data and labels
      num_classes (int): Number of unique classes in dataset
      num_samples (int): Number of samples per class to load

  Returns:
      tuple: (X_tensor, y_tensor_categorical, y_tensor_encoded, class_names)
      - X_tensor: Tensor of preprocessed images (shape: [num_samples, IMG_SIZE, IMG_SIZE, 3])
      - y_tensor_categorical: One-hot encoded labels
      - y_tensor_encoded: Integer encoded labels
      - class_names: List of original class names

  Processing Steps:
  1. Load CSV data with pandas
  2. Extract labels and image pixel values
  3. Reshape flat pixel arrays to 2D images
  4. Convert grayscale to RGB by channel duplication
  5. Normalize pixel values to [0,1] range
  6. Encode labels (integer and one-hot)
  """
  # Load CSV data with error handling
  try:
      df = pd.read_csv(csv_path, header=None, nrows=num_classes * num_samples)
  except FileNotFoundError:
      raise ValueError(f"CSV file not found at specified path: {csv_path}")

  # Extract labels
  y = df.iloc[:, 0].values

  # Extract and preprocess images
  X = df.iloc[:, 1:IMG_SIZE*IMG_SIZE+1].values.astype('float32')

  # Reshape to 2D grayscale images
  X = X.reshape(-1, IMG_SIZE, IMG_SIZE, 1)

  # Convert grayscale to RGB by repeating across 3 channels
  X = np.repeat(X, 3, axis=-1)

  # Normalize pixel values
  X = X * RESCALE

  # Convert images to a TensorFlow tensor
  X_tensor = tf.convert_to_tensor(X, dtype=tf.float32)

  # Encode labels
  le = LabelEncoder()
  y_encoded = le.fit_transform(y)
  y_categorical = tf.keras.utils.to_categorical(y_encoded, num_classes=num_classes)

  # Convert label arrays to tensors
  y_tensor_encoded = tf.convert_to_tensor(y_encoded, dtype=tf.int32)
  y_tensor_categorical = tf.convert_to_tensor(y_categorical, dtype=tf.float32)

  return X_tensor, y_tensor_categorical, y_tensor_encoded, le.classes_

# Build EfficientNetV2 model
def create_model(num_classes):

  ''' Create EfficientNetV2 model with transfer learning.

    Args:
        num_classes (int): Number of output classes

    Returns:
        tf.keras.Model: Compiled model ready for training

    Architecture:
    1. EfficientNetV2S base (pretrained on ImageNet)
    2. Global average pooling
    3. Dropout layer for regularization
    4. Dense output layer with softmax activation

    Note:
    - Base model is set as trainable for fine-tuning
    - Adam optimizer with gradient clipping
    - Learning rate is taken from BASE_LEARNING_RATE (1e-4 in this notebook)'''
  base_model = applications.EfficientNetV2S(
        include_top=False,
        weights='imagenet',
        input_shape=(IMG_SIZE, IMG_SIZE, CHANNELS),
        pooling='avg'
    )

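  # Unfreeze the base model so its weights are fine-tuned
  base_model.trainable = True

  inputs = tf.keras.Input(shape=(IMG_SIZE, IMG_SIZE, CHANNELS))
  # training=False keeps the BatchNormalization layers in inference mode during fine-tuning
  x = base_model(inputs, training=False)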
  x = layers.Dropout(DROPOUT_RATE)(x)
  outputs = layers.Dense(num_classes, activation='softmax')(x)

  model = tf.keras.Model(inputs, outputs)

  model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=BASE_LEARNING_RATE,
                                           clipnorm=1.0
                                          ),
        loss='categorical_crossentropy',
        metrics=['accuracy']
  )

  return model

# Plot training history
def plot_training_history(history):
  """
    Plot training and validation metrics over epochs.

    Args:
        history: Keras History object returned from model.fit()

    Generates:
        1. Accuracy plot (train vs validation)
        2. Loss plot (train vs validation)

    The figure is saved to 'training_history.png' and then displayed.
    """
  fig, ax = plt.subplots(1, 2, figsize=(15, 5))

  # Accuracy plot
  ax[0].plot(history.history['accuracy'], label='Train Accuracy')
  ax[0].plot(history.history['val_accuracy'], label='Validation Accuracy')
  ax[0].set_title('Model Accuracy')
  ax[0].set_ylabel('Accuracy')
  ax[0].set_xlabel('Epoch')
  ax[0].legend()

  # Loss plot
  ax[1].plot(history.history['loss'], label='Train Loss')
  ax[1].plot(history.history['val_loss'], label='Validation Loss')
  ax[1].set_title('Model Loss')
  ax[1].set_ylabel('Loss')
  ax[1].set_xlabel('Epoch')
  ax[1].legend()

  plt.tight_layout()
  plt.savefig('training_history.png')
  plt.show()

# Compute verification metrics
def compute_verification_metrics(features, labels, num_pairs=1000):
  """
    Compute genuine and impostor similarity scores for verification.

    Args:
        features: Embeddings from model (n_samples, feature_dim)
        labels: Corresponding class labels
        num_pairs: Number of pairs to generate

    Returns:
        tuple: (genuine_scores, impostor_scores)
    """
  # Generate genuine and impostor pairs
  genuine_pairs = []
  impostor_pairs = []

  # Group indices by class
  class_indices = {}
  for i, label in enumerate(labels):
      label_int = label.numpy()
      if label_int not in class_indices:
          class_indices[label_int] = []
      class_indices[label_int].append(i)

  # Create genuine pairs (same class)
  for label, indices in class_indices.items():
      if len(indices) < 2:
          continue
      np.random.shuffle(indices)
      for i in range(0, len(indices) - 1, 2):
          if len(genuine_pairs) < num_pairs // 2:
              genuine_pairs.append((indices[i], indices[i+1]))

  # Create impostor pairs (different classes)
  class_list = list(class_indices.keys())
  while len(impostor_pairs) < num_pairs // 2:
      class1, class2 = np.random.choice(class_list, 2, replace=False)
      if class1 == class2 or not class_indices[class1] or not class_indices[class2]:
          continue
      idx1 = np.random.choice(class_indices[class1])
      idx2 = np.random.choice(class_indices[class2])
      impostor_pairs.append((idx1, idx2))

  # Compute similarities
  genuine_scores = []
  for i, j in genuine_pairs:
      feat1 = features[i]
      feat2 = features[j]
      similarity = 1 - cosine(feat1, feat2)
      genuine_scores.append(similarity)

  impostor_scores = []
  for i, j in impostor_pairs:
      feat1 = features[i]
      feat2 = features[j]
      similarity = 1 - cosine(feat1, feat2)
      impostor_scores.append(similarity)

  return np.array(genuine_scores), np.array(impostor_scores)

# Compute FAR and FRR
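def compute_far_frr(genuine_scores, impostor_scores):
    """
    Compute FAR and FRR over 100 evenly spaced thresholds in [0, 1].

    A pair is accepted when its similarity score is >= the threshold.

    Returns:
        tuple: (far, frr, thresholds)
    """
    thresholds = np.linspace(0, 1, 100)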
    far = np.zeros_like(thresholds)
    frr = np.zeros_like(thresholds)

    for i, thresh in enumerate(thresholds):
        # False Acceptance Rate
        far[i] = np.sum(impostor_scores >= thresh) / len(impostor_scores)

        # False Rejection Rate
        frr[i] = np.sum(genuine_scores < thresh) / len(genuine_scores)

    return far, frr, thresholds

# Plot ROC and FAR/FRR curves
def plot_verification_metrics(genuine_scores, impostor_scores, data='NIR'):
  """
    Generate comprehensive verification performance plots.

    Args:
        genuine_scores: Similarity scores for genuine pairs
        impostor_scores: Similarity scores for impostor pairs
        data: Dataset identifier for plot titles

    Returns:
        tuple: (eer, eer_threshold)
    """
  far, frr, thresholds = compute_far_frr(genuine_scores, impostor_scores)

  # Compute ROC curve
  y_true = np.concatenate([np.ones_like(genuine_scores), np.zeros_like(impostor_scores)])
  y_score = np.concatenate([genuine_scores, impostor_scores])
  fpr, tpr, roc_thresholds = roc_curve(y_true, y_score)
  roc_auc = auc(fpr, tpr)

  # Find EER (Equal Error Rate)
  eer_idx = np.argmin(np.abs(far - frr))
  eer = (far[eer_idx] + frr[eer_idx]) / 2
  eer_thresh = thresholds[eer_idx]

  # Create plots
  plt.figure(figsize=(15, 10))

  # ROC curve
  plt.subplot(2, 2, 1)
  plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (AUC = {roc_auc:.2f})')
  plt.plot([0, 1], [0, 1], color='navy', lw=1, linestyle='--')
  plt.xlim([0.0, 1.0])
  plt.ylim([0.0, 1.05])
  plt.xlabel('False Positive Rate (FAR)')
  plt.ylabel('True Positive Rate (GAR)')
  plt.title('Receiver Operating Characteristic')
  plt.legend(loc="lower right")

  # FAR/FRR curve
  plt.subplot(2, 2, 2)
  plt.plot(thresholds, far, 'b-', label='FAR')
  plt.plot(thresholds, frr, 'r-', label='FRR')
  plt.axvline(x=eer_thresh, color='g', linestyle='--', label=f'EER Threshold ({eer_thresh:.2f})')
  plt.xlabel('Similarity Threshold')
  plt.ylabel('Error Rate')
  plt.title(f'FAR/FRR Curve (EER = {eer:.4f})')
  plt.legend()

  # Score distributions
  plt.subplot(2, 2, 3)
  sns.kdeplot(genuine_scores, label='Genuine Scores', fill=True)
  sns.kdeplot(impostor_scores, label='Impostor Scores', fill=True)
  plt.axvline(x=eer_thresh, color='g', linestyle='--', label='EER Threshold')
  plt.xlabel('Similarity Score')
  plt.ylabel('Density')
  plt.title('Score Distributions')
  plt.legend()

  # Detection Error Tradeoff (DET)
  plt.subplot(2, 2, 4)
  plt.plot(far, frr)
  plt.scatter(far[eer_idx], frr[eer_idx], color='red', zorder=10,
              label=f'EER ({eer:.4f})')
  plt.xscale('log')
  plt.yscale('log')
  plt.xlabel('False Acceptance Rate (FAR)')
  plt.ylabel('False Rejection Rate (FRR)')
  plt.title('Detection Error Tradeoff (DET) Curve')
  plt.legend()

  plt.tight_layout()
  plt.savefig('verification_metrics_'+data+'.png')
  plt.show()

  return eer, eer_thresh

In [None]:
def split_data_by_indices(X, y_categorical, y_encoded, total, n_train):
    """
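    Splits data tensors into training, validation, and test sets class by class,
    after shuffling the sample indices within each class.

    Args:
        X: TensorFlow tensor containing image data.
        y_categorical: TensorFlow tensor containing one-hot encoded labels.
        y_encoded: TensorFlow tensor containing integer encoded labels.
        total: Number of samples per class used to size the splits.
        n_train: Number of samples per class assigned to training; the remaining
            (total - n_train) samples are divided equally between validation and test.

    Returns:
        A tuple containing: (X_train, y_train, X_val, y_val, X_test, y_test, y_test_encoded)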
    """
    train_indices = []
    val_indices = []
    test_indices = []

    total_samples = tf.shape(X)[0].numpy()

    # Group indices by class
    class_indices = {}
    for i, label in enumerate(y_encoded.numpy()): # Use numpy() to iterate over tensor
        if label not in class_indices:
            class_indices[label] = []
        class_indices[label].append(i)

    # Shuffle indices within each class
    for label, indices in class_indices.items():
        np.random.shuffle(indices)
        class_indices[label] = indices # Update the shuffled indices

    # Distribute shuffled indices to train, val, and test sets
    for label, indices in class_indices.items():
        # Calculate the number of samples for this class
        num_class_samples = len(indices)

        # Calculate the split points for this class
        num_train = n_train
        num_val = (total - n_train) // 2
        num_test = (total - n_train) // 2

        # Ensure split points don't exceed available samples for this class
        num_train = min(num_train, num_class_samples)
        num_val = min(num_val, num_class_samples - num_train)
        num_test = min(num_test, num_class_samples - num_train - num_val)


        # Distribute the shuffled indices
        train_indices.extend(indices[:num_train])
        val_indices.extend(indices[num_train : num_train + num_val])
        test_indices.extend(indices[num_train + num_val : num_train + num_val + num_test])

    # Convert indices lists to TensorFlow tensors
    train_indices_tensor = tf.convert_to_tensor(train_indices, dtype=tf.int32)
    val_indices_tensor = tf.convert_to_tensor(val_indices, dtype=tf.int32)
    test_indices_tensor = tf.convert_to_tensor(test_indices, dtype=tf.int32)

    # Use tf.gather to split the data tensors
    X_train = tf.gather(X, train_indices_tensor)
    y_train = tf.gather(y_categorical, train_indices_tensor)

    X_val = tf.gather(X, val_indices_tensor)
    y_val = tf.gather(y_categorical, val_indices_tensor)

    X_test = tf.gather(X, test_indices_tensor)
    y_test = tf.gather(y_categorical, test_indices_tensor)
    y_test_encoded = tf.gather(y_encoded, test_indices_tensor)

    return X_train, y_train, X_val, y_val, X_test, y_test, y_test_encoded
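
The per-class split arithmetic used by `split_data_by_indices` can be checked in isolation. The sketch below reproduces only the size calculation and clamping, with a hypothetical helper name `per_class_split_sizes` and a hypothetical `available` argument standing in for the number of samples a class actually has.

```python
def per_class_split_sizes(total, n_train, available=None):
    """Return (train, val, test) counts for one class, mirroring the clamping logic."""
    available = total if available is None else available
    num_train = min(n_train, available)
    num_val = min((total - n_train) // 2, available - num_train)
    num_test = min((total - n_train) // 2, available - num_train - num_val)
    return num_train, num_val, num_test

print(per_class_split_sizes(12, 8))               # -> (8, 2, 2)
print(per_class_split_sizes(12, 8, available=9))  # -> (8, 1, 0)
print(per_class_split_sizes(13, 8))               # -> (8, 2, 2), one sample unused
```

With the notebook's settings (12 samples per class, 8 for training), each class contributes 8 training, 2 validation, and 2 test samples. If all 140 classes are present with 12 samples each, that gives 1,120 / 280 / 280 samples overall.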

# 4. Start the Process

### **Key Workflow Summary**
1. **Data Prep**:  
   `load_data()` → Preprocessed tensors  
2. **Model Training**:  
   `create_model()` → Train → `plot_training_history()`  
3. **Evaluation**:  
   Extract features → `compute_verification_metrics()` → `plot_verification_metrics()`

In [None]:
# Load data
X, y_categorical, y_encoded, class_names = load_data(CSV_PATH,NUM_CLASSES,SAMPLES_PER_CLASS)
NUM_CLASSES = len(class_names)

In [None]:
from collections import Counter
# Convert the integer encoded labels to a list or numpy array
y_encoded_list = y_encoded.numpy()
# Use collections.Counter to count samples per class
class_counts = Counter(y_encoded_list)

In [None]:
# Split data: 8 samples per class go to training; the rest are divided
# equally between validation and test
X_train, y_train, X_val, y_val, X_test, y_test, y_test_encoded = split_data_by_indices(
    X, y_categorical, y_encoded, class_counts[0], 8)

In [None]:
NUM_CLASSES

## **Use to test on the whole dataset**

In [None]:
X_test = X
y_test = y_categorical
y_test_encoded = y_encoded

## **Use if data augmentation is needed**


In [None]:
# Data augmentation
train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True
)

## **Skip and load a trained model**

## a. Create the model and train





In [None]:
model = create_model(NUM_CLASSES)
model.summary()
checkpoint_callback = ModelCheckpoint(
        filepath=MODEL_SAVE_PATH, # Update the path
        monitor='val_accuracy',
        save_best_only=True,
        mode='max',
        verbose=1
    )
# Train the model
history = model.fit(
  train_datagen.flow(X_train, y_train, batch_size=BATCH_SIZE),
  #X_train,y_train,
    steps_per_epoch=len(X_train) // BATCH_SIZE,
    validation_data=(X_val, y_val),
    epochs=EPOCHS,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True),
        tf.keras.callbacks.ReduceLROnPlateau(factor=0.1, patience=3),
        checkpoint_callback
    ]
)
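
The patience rule behind `EarlyStopping(patience=5)` can be sketched as a small simulation. This is a simplified model written for illustration, with a hypothetical function name and made-up losses; it assumes the callback's default monitored quantity (`val_loss`) and no minimum improvement margin, and it ignores weight restoration.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0       # improvement: remember it and reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses: best value at epoch 3, then no improvement.
losses = [0.90, 0.70, 0.60, 0.65, 0.62, 0.61, 0.66, 0.63, 0.64]
stop_epoch = early_stopping_epoch(losses)
print(stop_epoch)
```

Note that in the training cell the early-stopping callback watches validation loss by default, while the checkpoint callback is configured to watch `val_accuracy`, so the two can pick different "best" epochs.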

## b. Continue training if necessary

In [None]:
# Load the previously saved model

loaded_model = tf.keras.models.load_model(MODEL_SAVE_PATH)

# Recompile with a lower learning rate (1e-5) for continued fine-tuning
loaded_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5,
                                       clipnorm=1.0
                                      ),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Continue training for additional epochs
additional_epochs = 10 # Define how many more epochs you want to train
history_continued = loaded_model.fit(
    X_train, y_train,
    batch_size=BATCH_SIZE,  # in-memory tensors: set the batch size and let Keras infer the steps
    validation_data=(X_val, y_val),
    epochs=EPOCHS + additional_epochs,  # Continue from the previous total epochs
    initial_epoch=EPOCHS, # Start training from the epoch where the previous training stopped
    callbacks=[
        tf.keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True),
        tf.keras.callbacks.ReduceLROnPlateau(factor=0.1, patience=3),
        # You might want to use a new ModelCheckpoint to save the best model from this continued training
        ModelCheckpoint(
            filepath=MODEL_SAVE_PATH,
            monitor='val_accuracy',
            save_best_only=True,
            mode='max',
            verbose=1
        )
    ]
)

In [None]:
# Update the history dictionary
for key in history_continued.history:
    history.history[key].extend(history_continued.history[key])

# Plot the updated training history
plot_training_history(history)

## c. Plot the training history:


1.   Check accuracy and loss
2.   Check underfitting or overfitting
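
One way to read the curves numerically is to look at the epoch with the lowest validation loss and at the gap between training and validation loss at the end. The sketch below does this on a plain dictionary shaped like `History.history`; the function name `summarize_history` and the numbers are made up for illustration, and the summary is a reading aid rather than a formal test for overfitting.

```python
def summarize_history(history):
    """Report the best validation epoch and the final train/validation loss gap."""
    val_loss = history["val_loss"]
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
    final_gap = val_loss[-1] - history["loss"][-1]
    return {
        "best_epoch": best_epoch,                      # 1-based epoch of lowest val loss
        "final_gap": final_gap,                        # val loss minus train loss at the end
        "val_loss_rising": val_loss[-1] > min(val_loss),
    }

history_dict = {  # made-up numbers shaped like History.history
    "loss": [1.2, 0.8, 0.5, 0.3, 0.2],
    "val_loss": [1.3, 0.9, 0.7, 0.75, 0.85],
}
summary = summarize_history(history_dict)
print(summary)
```

A training loss that keeps falling while the validation loss rises after its minimum, as in these made-up numbers, is the usual sign of overfitting; both losses staying high suggests underfitting.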



In [None]:
plot_training_history(history)

# **Load saved best model**

In [None]:
# load model
model = tf.keras.models.load_model(MODEL_SAVE_PATH)

# 5. Evaluate the model on test data

In [None]:
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy:.4f}")

# 6. Predict and generate performances

In [None]:
# Get predictions
y_pred = model.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)

# Generate classification report
report = classification_report(y_test_encoded, y_pred_classes, target_names=class_names, output_dict=True)

# Extract macro and weighted averages
macro_avg = report['macro avg']
weighted_avg = report['weighted avg']  # Per-class scores weighted by class support (not the micro average)

# Create a pandas DataFrame for the summary table
performance_summary = pd.DataFrame({
    'Metric': ['Precision', 'Recall', 'F1-Score'],
    'Macro Average': [macro_avg['precision'], macro_avg['recall'], macro_avg['f1-score']],
    'Weighted Average': [weighted_avg['precision'], weighted_avg['recall'], weighted_avg['f1-score']]
})

print("\nPerformance Summary (Macro and Weighted Averages):")
performance_summary
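
Macro, weighted, and micro averages combine per-class scores differently, which matters when classes are imbalanced. The sketch below computes all three for precision from made-up confusion counts; the helper name `precision_averages` and the two-class data are hypothetical, and scikit-learn's `classification_report` performs the corresponding calculations internally.

```python
def precision_averages(per_class):
    """per_class maps class -> (true_positives, predicted_positives, support)."""
    precisions = {c: tp / pp for c, (tp, pp, _) in per_class.items()}
    total_support = sum(s for _, _, s in per_class.values())
    # Macro: plain mean of the per-class precisions.
    macro = sum(precisions.values()) / len(precisions)
    # Weighted: per-class precisions weighted by the number of true samples.
    weighted = sum(precisions[c] * s / total_support
                   for c, (_, _, s) in per_class.items())
    # Micro: pool all decisions before dividing.
    micro = (sum(tp for tp, _, _ in per_class.values())
             / sum(pp for _, pp, _ in per_class.values()))
    return macro, weighted, micro

# Made-up counts: class A has 95 samples (90 correct), class B has 10 (2 correct).
counts = {"A": (90, 98, 95), "B": (2, 7, 10)}
macro, weighted, micro = precision_averages(counts)
print(round(macro, 4), round(weighted, 4), round(micro, 4))
```

When every class has the same number of test samples, the macro and weighted averages coincide, which is the expected situation for a per-class split with equal class sizes.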


# 7. Save performance summary

In [None]:
performance_summary.to_csv(PERFORMANCE_SAVE_PATH, index=False)
print("Performance summary table saved to csv file")

# 8. Get the feature extractor from the model

In [None]:
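# Create feature extractor: use the output of the layer just before the final Dense classifier
# (the Dropout layer, which passes the pooled features through unchanged at inference time)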
feature_extractor = tf.keras.Model(
    inputs=model.input,
    outputs=model.layers[-2].output
)

## a. Extract features from the test data

In [None]:
# Extract features from test set
val_features = feature_extractor.predict(X_test, batch_size=BATCH_SIZE, verbose=1)



## b. Compute metrics and plot them

In [None]:
# Compute verification metrics
genuine_scores, impostor_scores = compute_verification_metrics(
    val_features,
    y_test_encoded,
    num_pairs=NUM_PAIRS
)

# Plot verification metrics
eer, eer_threshold = plot_verification_metrics(genuine_scores, impostor_scores)
print(f"Equal Error Rate (EER): {eer:.4f}")
print(f"Optimal Threshold: {eer_threshold:.4f}")

## c. Get a subset of classes

In [None]:
X_subset = tf.gather(X, tf.range(140 * 12))
y_subset = tf.gather(y_encoded, tf.range(140 * 12))
print(f"Shape of the 140*12 samples from X: {X_subset.shape}")

## d. Extract their features

In [None]:
val_features = feature_extractor.predict(X_subset, batch_size=BATCH_SIZE, verbose=1)

## e. Save them

In [None]:
# Save the extracted features, together with their class labels, to CSV
features_df = pd.DataFrame(val_features)

features_df['class'] = y_subset.numpy()
features_df.to_csv(FEATURES_SAVE_PATH, index=False)
print(f"Features saved to {FEATURES_SAVE_PATH}")