**EfficientNet B0 Pytorch [Train]**

This notebook is designed to train a bird song classification model for the BirdCLEF 2025 competition.

Below is an explanation of the main sections and their content in the notebook:

1. Import Libraries:
    * Imports necessary libraries such as PyTorch, timm, librosa, etc., which are used for image processing, audio processing, building machine learning models, and data visualization.
2. Configuration:
    * The CFG class defines various settings for model training, including seed value, number of epochs, batch size, model name, learning rate, optimizer, scheduler, and more.
    * It also checks if a GPU is available.
    * There are settings for debug mode, which reduces the number of epochs and other parameters when enabled.
3. Dataset Preparation and Data Augmentations:
    * The `BirdCLEFDatasetFromNPY` class defines a custom dataset.
    * This dataset loads pre-computed mel spectrograms from a .npy file or generates them from audio files if necessary.
    * During training, it applies data augmentations to the spectrograms, such as time masking, frequency masking, and random brightness/contrast adjustments.
    * It also handles encoding labels into a one-hot vector format.
    * `collate_fn` is a custom function to handle batches with potentially different sized spectrograms (although the notebook seems to use fixed-size spectrograms).
4. Model Definition:
    * The `BirdCLEFModel` class defines the classification model.
    * It uses a pre-trained model from the tf_efficientnetv2_s.in21k_ft_in1k, as the backbone.
    * An adaptive average pooling layer and a linear classifier are added to the backbone's output.
    * Mixup, a data augmentation technique, is implemented as part of the model's forward pass during training.
5. Training Utilities:
    * The get_optimizer, get_scheduler, and get_criterion functions initialize the optimizer, learning rate scheduler, and loss function based on the defined configuration.
6. Training Functions:
    * train_one_epoch executes a single training epoch loop. It calculates the loss for each batch, performs backpropagation, and updates the optimizer.
    * Mixup is also applied within this function.
    * `validate` evaluates the model's performance. It calculates the loss and AUC on the validation set.
    * `calculate_auc` calculates the ROC AUC score.
7. Load Data:
    * Loads train.csv and taxonomy.csv as Pandas DataFrames.
    * Retrieves the list of bird species from taxonomy.csv and sets the number of classes (cfg.num_classes).
8. Run Training:
    * Starts the training process according to the configuration (cfg).
    * Loads the pre-computed mel spectrograms.
    * Sets up Stratified K-Fold cross-validation. This is a technique to split the data into multiple folds and train and evaluate the model using each fold as the validation set.
    * For each selected fold, it performs the following:
    * Splits the data into training and validation sets.
    * Creates BirdCLEFDatasetFromNPY for each set and sets up DataLoaders.
    * Initializes the model, optimizer, criterion (loss function), and scheduler.
    * Iterates through the defined number of epochs for training and validation.
    * Saves the model weights if the validation AUC improves.
    * Records the best AUC for each fold.
9. Cross Validation:
    * Prints the best AUC for each fold and the average AUC across all folds.
    * Overall, this notebook constructs a pipeline to efficiently train and evaluate a deep learning model for bird song classification using cross-validation.
    * It leverages pre-computed spectrograms and incorporates techniques like data augmentation and Mixup.

# Import Libraries

In [1]:
import time
import os
import random
import gc
import time
import cv2
import math
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score
import librosa

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.optim import lr_scheduler
from torch.utils.data import Dataset, DataLoader

import torchvision

import matplotlib.pyplot as plt
import seaborn as sns
from tqdm.auto import tqdm

import timm

# Configuration

In [2]:
# Check if gpu is available
torch.cuda.is_available()

True

In [3]:
class CFG:
    seed: int = 42
    debug: bool = False
    apex: bool = False
    print_freq: int = 100
    num_workers: int = 2

    OUTPUT_DIR: str = "kaggle/working/"

    train_data_dir: str = "/kaggle/input/birdclef-2025/train_audio"
    train_csv: str = "/kaggle/input/birdclef-2025/train.csv"
    test_soundscapes: str = "/kaggle/input/birdclef-2025/test_soundscapes"
    submission_csv: str = "/kaggle/input/birdclef-2025/sample_submission.csv"
    taxonomy_csv: str = "/kaggle/input/birdclef-2025/taxonomy.csv"
    spectrogram_npy: str = "/kaggle/input/3-transforming-audio-to-mel-spec/birdclef2025_melspec_5sec_256_256.npy"

    model_name: str = "tf_efficientnetv2_s.in21k_ft_in1k"
    pretrained: bool = True
    in_channels: int = 1

    TARGET_SHAPE: tuple[int, int] = (256, 256)

    device: str = "cuda" if torch.cuda.is_available() else "cpu"
    epochs: int = 10
    batch_size: int = 16
    criterion: str = "FocalLossBCE"

    n_fold: int = 5
    selected_folds: list[int] = [0, 1, 2, 3, 4]

    optimizer: str = "AdamW"
    lr: float = 5e-4
    weight_decay: float = 1e-5

    scheduler: str = "CosineAnnealingLR"
    # scheduler: str = "ReduceLROnPlateau"
    # scheduler: str = "StepLR"
    # scheduler: str = "OneCycleLR"
    min_lr: float = 1e-6
    T_max: int = epochs

    aug_prob: float = 0.5
    mixup_alpha: float = 0.5

    def update_debug_settings(self) -> None:
        if self.debug:
            self.epochs = 2
            self.selected_folds = [0]

cfg = CFG()

In [4]:
# Set seed for reproducibility:

random.seed(cfg.seed)
os.environ["PYTHONHASHSEED"] = str(cfg.seed)
np.random.seed(cfg.seed)
torch.manual_seed(cfg.seed)
torch.cuda.manual_seed(cfg.seed)
torch.cuda.manual_seed_all(cfg.seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# Dataset Preparation and Data Augmentations

In [5]:
class BirdCLEFDatasetFromNPY(Dataset):
    """
    Custom PyTorch Dataset class for the BirdCLEF 2025 dataset.
    It loads pre-computed mel spectrograms for training or validation
    and applies data augmentations if necessary.

    Args:
        df (pd.DataFrame): DataFrame containing the data (e.g., train.csv or a subset).
        cfg (CFG): Configuration object containing hyperparameters and settings.
        spectrograms (dict, optional): Dictionary of pre-computed mel spectrograms.
                                       Keys are sample `samplename`, values are the spectrogram numpy arrays.
                                       Defaults to None.
        mode (str, optional): Mode of the dataset, either 'train' or 'valid'.
                              Defaults to 'train'.
    """

    def __init__(self, df: pd.DataFrame, cfg: CFG, spectrograms: [dict[str, np.ndarray] | None]=None, mode: str="train") -> None:
        """
        Initializes the BirdCLEFDatasetFromNPY.
        Sets up the DataFrame, configuration, spectrogram data, and label encoding.
        """

        self.df: pd.DataFrame = df
        self.cfg: CFG = cfg
        self.mode: str = mode

        self.spectrograms: [dict[str, np.ndarray] | None] = spectrograms
        
        taxonomy_df = pd.read_csv(self.cfg.taxonomy_csv)
        self.species_ids: list[str] = taxonomy_df["primary_label"].tolist()
        self.num_classes: int = len(self.species_ids)
        self.label_to_idx: dict[str, int] = {label: idx for idx, label in enumerate(self.species_ids)}

        if "filepath" not in self.df.columns:
            self.df["filepath"] = self.cfg.train_data_dir + "/" + self.df.filename
        
        if "samplename" not in self.df.columns:
            self.df["samplename"] = self.df.filename.map(lambda x: x.split("/")[0] + "-" + x.split("/")[-1].split(".")[0])

        sample_names = set(self.df["samplename"])
        if self.spectrograms:
            found_samples = sum(1 for name in sample_names if name in self.spectrograms)
            print(f"Found {found_samples} matching spectrograms for {mode} dataset out of {len(self.df)} samples")
        
        if cfg.debug:
            self.df = self.df.sample(min(1000, len(self.df)), random_state=cfg.seed).reset_index(drop=True)
    
    def __len__(self) -> int:
        """
        Returns the number of samples in the dataset.
        """

        return len(self.df)
    
    def __getitem__(self, idx) -> dict[str, [torch.Tensor | str]]:
        """
        Loads an item at the given index, applies preprocessing and augmentations.

        Args:
            idx (int): Index of the sample.

        Returns:
            dict: A dictionary containing the following keys:
                  - 'melspec' (torch.Tensor): The mel spectrogram tensor.
                  - 'target' (torch.Tensor): The one-hot encoded target label tensor.
                  - 'filename' (str): The original filename.
        """

        row = self.df.iloc[idx]
        samplename = row["samplename"]
        spec = None

        if self.spectrograms and samplename in self.spectrograms:
            spec = self.spectrograms[samplename]
        elif not self.cfg.LOAD_DATA:
            spec = process_audio_file(row["filepath"], self.cfg)

        if spec is None:
            spec = np.zeros(self.cfg.TARGET_SHAPE, dtype=np.float32)
            if self.mode == "train":  # Only print warning during training
                print(f"Warning: Spectrogram for {samplename} not found and could not be generated")

        spec = torch.tensor(spec, dtype=torch.float32).unsqueeze(0)  # Add channel dimension

        if self.mode == "train" and random.random() < self.cfg.aug_prob:
            spec = self.apply_spec_augmentations(spec)
        
        target = self.encode_label(row["primary_label"])
        
        if "secondary_labels" in row and row["secondary_labels"] not in [[""], None, np.nan]:
            if isinstance(row["secondary_labels"], str):
                secondary_labels = eval(row["secondary_labels"])
            else:
                secondary_labels = row["secondary_labels"]
            
            for label in secondary_labels:
                if label in self.label_to_idx:
                    target[self.label_to_idx[label]] = 1.0
        
        return {
            "melspec": spec, 
            "target": torch.tensor(target, dtype=torch.float32),
            "filename": row["filename"]
        }
    
    def apply_spec_augmentations(self, spec: torch.Tensor) -> torch.Tensor:
        """
        Applies data augmentations to the mel spectrogram.
        Includes Time Masking, Frequency Masking, and random brightness/contrast adjustment.

        Args:
            spec (torch.Tensor): The mel spectrogram tensor to apply augmentations to.

        Returns:
            torch.Tensor: The mel spectrogram tensor with augmentations applied.
        """
    
        # Time masking (horizontal stripes)
        if random.random() < 0.5:
            num_masks = random.randint(1, 3)
            for _ in range(num_masks):
                width = random.randint(5, 20)
                start = random.randint(0, spec.shape[2] - width)
                spec[0, :, start:start+width] = 0
        
        # Frequency masking (vertical stripes)
        if random.random() < 0.5:
            num_masks = random.randint(1, 3)
            for _ in range(num_masks):
                height = random.randint(5, 20)
                start = random.randint(0, spec.shape[1] - height)
                spec[0, start:start+height, :] = 0
        
        # Random brightness/contrast
        if random.random() < 0.5:
            gain = random.uniform(0.8, 1.2)
            bias = random.uniform(-0.1, 0.1)
            spec = spec * gain + bias
            spec = torch.clamp(spec, 0, 1) 
            
        return spec
    
    def encode_label(self, label: str) -> np.ndarray:
        """
        Encodes a single label into a one-hot vector.

        Args:
            label (str): The label to encode.

        Returns:
            np.ndarray: The one-hot encoded label as a numpy array.
        """

        target = np.zeros(self.num_classes)
        if label in self.label_to_idx:
            target[self.label_to_idx[label]] = 1.0
        return target

In [6]:
def collate_fn(batch: list[dict[str, [torch.Tensor | str]]]) -> dict[str, [torch.Tensor | list[str]]]:
    """
    Custom collate function to handle batches from BirdCLEFDatasetFromNPY.
    This function is particularly useful for handling potential None items
    returned by __getitem__ and for correctly stacking tensors in the batch.

    Args:
        batch (list[Optional[dict[str, Union[torch.Tensor, str]]]]): A list of samples from the dataset.
                                                                      Each sample is a dictionary or None.

    Returns:
        dict[str, Union[torch.Tensor, list[str]]]: A dictionary where keys are the item names
                                                  (e.g., 'melspec', 'target', 'filename')
                                                  and values are either stacked tensors
                                                  (for 'melspec' and 'target' if shapes are uniform)
                                                  or a list (for 'filename').
                                                  Returns an empty dictionary if the input batch is empty
                                                  or contains only None values.
    """

    batch = [item for item in batch if item is not None]
    if len(batch) == 0:
        return {}
        
    result = {key: [] for key in batch[0].keys()}
    
    for item in batch:
        for key, value in item.items():
            result[key].append(value)
    
    for key in result:
        if key == "target" and isinstance(result[key][0], torch.Tensor):
            result[key] = torch.stack(result[key])
        elif key == "melspec" and isinstance(result[key][0], torch.Tensor):
            shapes = [t.shape for t in result[key]]
            if len(set(str(s) for s in shapes)) == 1:
                result[key] = torch.stack(result[key])
    
    return result

# Model Definition

In [7]:
class BirdCLEFModel(nn.Module):
    """Deep learning model for bird song classification using a pre-trained backbone."""
    
    def __init__(self, cfg) -> None:
        """
        Initializes the BirdCLEFModel.
        Sets up the backbone model and the classifier head.
        
        Args:
            cfg (CFG): Configuration object containing model settings and hyperparameters.
        """
        
        super().__init__()
        self.cfg = cfg
        
        taxonomy_df = pd.read_csv(cfg.taxonomy_csv)
        cfg.num_classes = len(taxonomy_df) # Update num_classes in cfg
        
        self.backbone = timm.create_model(
            cfg.model_name,
            pretrained=cfg.pretrained,
            in_chans=cfg.in_channels,
            drop_rate=0.2,
            drop_path_rate=0.2
        )

        # Determine the number of features from the backbone's output
        if "efficientnet" in cfg.model_name:
            backbone_out = self.backbone.classifier.in_features
            self.backbone.classifier = nn.Identity()
        elif "resnet" in cfg.model_name:
            backbone_out = self.backbone.fc.in_features
            self.backbone.fc = nn.Identity()
        else:
            # Generic approach for models with a get_classifier method
            backbone_out = self.backbone.get_classifier().in_features
            self.backbone.reset_classifier(0, "")
        
        self.pooling = nn.AdaptiveAvgPool2d(1)
            
        self.feat_dim = backbone_out
        
        self.classifier = nn.Linear(backbone_out, cfg.num_classes)
        
        self.mixup_enabled = hasattr(cfg, 'mixup_alpha') and cfg.mixup_alpha > 0
        if self.mixup_enabled:
            self.mixup_alpha = cfg.mixup_alpha
            
    def forward(self, x: torch.Tensor, targets: [torch.Tensor | None]=None) -> [torch.Tensor | tuple[torch.Tensor, torch.Tensor]]:
        """
        Forward pass of the model. Optionally applies Mixup during training.

        Args:
            x (torch.Tensor): Input tensor (spectrogram batch). Shape (batch_size, channels, height, width).
            targets (Optional[torch.Tensor], optional): Target labels for Mixup.
                                                      Shape (batch_size, num_classes). Defaults to None.

        Returns:
            Union[torch.Tensor, tuple[torch.Tensor, torch.Tensor]]: If training with Mixup, returns a tuple
                                                                  of (logits, loss). Otherwise, returns
                                                                  the logits tensor.
        """

        if self.training and self.mixup_enabled and targets is not None:
            mixed_x, targets_a, targets_b, lam = self.mixup_data(x, targets)
            x = mixed_x
        else:
            targets_a, targets_b, lam = None, None, None
        
        features = self.backbone(x)

        # Handle potential dictionary output from some backbones
        if isinstance(features, dict):
            features = features['features']

        # Apply pooling if the output is 4D (convolutional features)
        if len(features.shape) == 4:
            features = self.pooling(features)
            features = features.view(features.size(0), -1)
        
        logits = self.classifier(features)
        
        if self.training and self.mixup_enabled and targets is not None:
            loss = self.mixup_criterion(F.binary_cross_entropy_with_logits, 
                                       logits, targets_a, targets_b, lam)
            return logits, loss
            
        return logits
    
    def mixup_data(self, x: torch.Tensor, targets: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor, float]:
        """
        Applies mixup to the data batch and targets.

        Args:
            x (torch.Tensor): Input tensor batch. Shape (batch_size, channels, height, width).
            targets (torch.Tensor): Target labels batch. Shape (batch_size, num_classes).

        Returns:
            tuple[torch.Tensor, torch.Tensor, torch.Tensor, float]: A tuple containing:
                                                                  - mixed_x (torch.Tensor): The mixed input tensor.
                                                                  - targets_a (torch.Tensor): First set of targets.
                                                                  - targets_b (torch.Tensor): Second set of targets.
                                                                  - lam (float): The lambda value used for mixing.
        """

        batch_size = x.size(0)

        lam = np.random.beta(self.mixup_alpha, self.mixup_alpha)

        indices = torch.randperm(batch_size).to(x.device)

        mixed_x = lam * x + (1 - lam) * x[indices]
        
        return mixed_x, targets, targets[indices], lam
    
    def mixup_criterion(self, criterion: nn.Module, pred: torch.Tensor, y_a: torch.Tensor, y_b: torch.Tensor, lam: float) -> torch.Tensor:
        """
        Applies mixup to the loss function.

        Args:
            criterion (nn.Module): The loss function (e.g., nn.BCEWithLogitsLoss).
            pred (torch.Tensor): The model predictions (logits).
            y_a (torch.Tensor): The first set of targets.
            y_b (torch.Tensor): The second set of targets.
            lam (float): The lambda value used for mixing.

        Returns:
            torch.Tensor: The mixed loss.
        """

        return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)

# Training Utilities

In [8]:
def get_optimizer(model: nn.Module, cfg: CFG) -> optim.Optimizer:
    """
    Initializes and returns a PyTorch optimizer based on the configuration.

    Args:
        model (nn.Module): The PyTorch model for which the optimizer is created.
        cfg (CFG): Configuration object containing optimizer type, learning rate, and weight decay.

    Returns:
        torch.optim.Optimizer: An instance of a PyTorch optimizer.

    Raises:
        NotImplementedError: If the optimizer specified in the configuration is not implemented.
    """
    
    if cfg.optimizer == "Adam":
        optimizer = optim.Adam(
            model.parameters(),
            lr=cfg.lr,
            weight_decay=cfg.weight_decay
        )
    elif cfg.optimizer == "AdamW":
        optimizer = optim.AdamW(
            model.parameters(),
            lr=cfg.lr,
            weight_decay=cfg.weight_decay
        )
    elif cfg.optimizer == "SGD":
        optimizer = optim.SGD(
            model.parameters(),
            lr=cfg.lr,
            momentum=0.9,
            weight_decay=cfg.weight_decay
        )
    else:
        raise NotImplementedError(f"Optimizer {cfg.optimizer} not implemented")
        
    return optimizer

In [9]:
def get_scheduler(optimizer: optim.Optimizer, cfg: CFG) -> lr_scheduler._LRScheduler | lr_scheduler.ReduceLROnPlateau | None:
    """
    Initializes and returns a PyTorch learning rate scheduler based on the configuration.

    Args:
        optimizer (torch.optim.Optimizer): The optimizer for which the scheduler is created.
        cfg (CFG): Configuration object containing scheduler type and its specific parameters.

    Returns:
        torch.optim.lr_scheduler._LRScheduler | torch.optim.lr_scheduler.ReduceLROnPlateau | None:
            An instance of a PyTorch learning rate scheduler, or None if no scheduler is specified
            or the specified scheduler is 'OneCycleLR' (which might be handled differently).
    """
    
    if cfg.scheduler == "CosineAnnealingLR":
        scheduler = lr_scheduler.CosineAnnealingLR(
            optimizer,
            T_max=cfg.T_max,
            eta_min=cfg.min_lr
        )
    elif cfg.scheduler == "ReduceLROnPlateau":
        scheduler = lr_scheduler.ReduceLROnPlateau(
            optimizer,
            mode="min",
            factor=0.5,
            patience=2,
            min_lr=cfg.min_lr,
            verbose=True
        )
    elif cfg.scheduler == "StepLR":
        scheduler = lr_scheduler.StepLR(
            optimizer,
            step_size=cfg.epochs // 3,
            gamma=0.5
        )
    elif cfg.scheduler == "OneCycleLR":
        scheduler = None  
    else:
        scheduler = None

    return scheduler

In [10]:
""" 
    FocalLossBCE Use Example
"""
class FocalLossBCE(torch.nn.Module):
    def __init__(
            self,
            alpha: float = 0.25,
            gamma: float = 2,
            reduction: str = "mean",
            bce_weight: float = 0.6,
            focal_weight: float = 1.4,
    ):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma
        self.reduction = reduction
        self.bce = torch.nn.BCEWithLogitsLoss(reduction=reduction)
        self.bce_weight = bce_weight
        self.focal_weight = focal_weight

    def forward(self, logits, targets):
        focall_loss = torchvision.ops.focal_loss.sigmoid_focal_loss(
            inputs=logits,
            targets=targets,
            alpha=self.alpha,
            gamma=self.gamma,
            reduction=self.reduction,
        )
        bce_loss = self.bce(logits, targets)
        return self.bce_weight * bce_loss + self.focal_weight * focall_loss


In [11]:
def get_criterion(cfg: CFG) -> nn.Module:
    """
    Initializes and returns a PyTorch loss function based on the configuration.

    Args:
        cfg (CFG): Configuration object containing the criterion (loss function) type.

    Returns:
        torch.nn.Module: An instance of a PyTorch loss function.

    Raises:
        NotImplementedError: If the criterion specified in the configuration is not implemented.
    """
    
    if cfg.criterion == "BCEWithLogitsLoss":
        criterion = nn.BCEWithLogitsLoss()
    elif cfg.criterion == "FocalLossBCE":
        criterion = FocalLossBCE()
    else:
        raise NotImplementedError(f"Criterion {cfg.criterion} not implemented")

    return criterion

# Training Functions

In [12]:
def train_one_epoch(
    model: nn.Module,
    loader: DataLoader,
    optimizer: optim.Optimizer,
    criterion: nn.Module,
    device: str,
    scheduler: lr_scheduler._LRScheduler | lr_scheduler.ReduceLROnPlateau | None = None
) -> tuple[float, float]:
    """
    Performs one training epoch for the model.

    Iterates through the DataLoader, calculates the loss for each batch,
    performs backpropagation, and updates the model's weights using the optimizer.
    Optionally steps the learning rate scheduler if provided.

    Args:
        model (nn.Module): The PyTorch model to train.
        loader (DataLoader): DataLoader providing the training data batches.
        optimizer (torch.optim.Optimizer): The optimizer used for updating model weights.
        criterion (nn.Module): The loss function.
        device (str): The device to perform training on ('cuda' or 'cpu').
        scheduler (torch.optim.lr_scheduler._LRScheduler | torch.optim.lr_scheduler.ReduceLROnPlateau | None, optional):
            The learning rate scheduler. Expected to be stepped after each batch if it's a OneCycleLR,
            otherwise stepped outside this function after the epoch. Defaults to None.

    Returns:
        tuple[float, float]: A tuple containing the average training loss and the
                             average ROC AUC score for the epoch.
    """
    
    model.train()
    losses = []
    all_targets = []
    all_outputs = []
    
    pbar = tqdm(enumerate(loader), total=len(loader), desc="Training")
    
    for step, batch in pbar:
        # Handle the case where collate_fn might return lists of tensors
        # (although the current collate_fn primarily stacks fixed-size tensors)
    
        if isinstance(batch["melspec"], list):
            batch_outputs = []
            batch_losses = []
            
            for i in range(len(batch["melspec"])):
                # Ensure inputs and targets are tensors and on the correct device
                inputs = batch["melspec"][i].unsqueeze(0).to(device)
                target = batch["target"][i].unsqueeze(0).to(device)
                
                optimizer.zero_grad()
                output = model(inputs)
                # Assuming output is logits if mixup is not used in forward for single samples
                loss = criterion(output, target)
                loss.backward()
                
                batch_outputs.append(output.detach().cpu())
                batch_losses.append(loss.item())
            
            optimizer.step()
            outputs = torch.cat(batch_outputs, dim=0).numpy()
            loss = np.mean(batch_losses)
            targets = batch["target"].numpy()

        else:
            # Standard batch processing
            inputs = batch["melspec"].to(device)
            targets = batch["target"].to(device)
            
            optimizer.zero_grad()
            outputs = model(inputs) # Pass targets for potential mixup
            
            if isinstance(outputs, tuple):
                # Model returned (logits, loss) due to mixup
                outputs, loss = outputs  
            else:
                # Model returned logits
                loss = criterion(outputs, targets)
                
            loss.backward()
            optimizer.step()
            
            outputs = outputs.detach().cpu().numpy()
            targets = targets.detach().cpu().numpy()

        # Step scheduler if it's a OneCycleLR (stepped after each batch)
        if scheduler is not None and isinstance(scheduler, lr_scheduler.OneCycleLR):
            scheduler.step()
            
        all_outputs.append(outputs)
        all_targets.append(targets)
        losses.append(loss if isinstance(loss, float) else loss.item())
        
        pbar.set_postfix({
            'train_loss': np.mean(losses[-10:]) if losses else 0,
            'lr': optimizer.param_groups[0]['lr']
        })

    # Concatenate results from all batches
    all_outputs = np.concatenate(all_outputs)
    all_targets = np.concatenate(all_targets)

    # Calculate AUC for the epoch
    auc = calculate_auc(all_targets, all_outputs)
    avg_loss = np.mean(losses)
    
    return avg_loss, auc

In [13]:
def validate(
    model: nn.Module,
    loader: DataLoader,
    criterion: nn.Module,
    device: str
) -> tuple[float, float]:
    """
    Evaluates the model on the validation set.

    Iterates through the DataLoader in evaluation mode, calculates the loss
    and performance metric (AUC) without backpropagation.

    Args:
        model (nn.Module): The PyTorch model to evaluate.
        loader (DataLoader): DataLoader providing the validation data batches.
        criterion (nn.Module): The loss function.
        device (str): The device to perform evaluation on ('cuda' or 'cpu').

    Returns:
        tuple[float, float]: A tuple containing the average validation loss and the
                             average ROC AUC score for the validation set.
    """

    model.eval() # Set the model to evaluation mode
    losses = []
    all_targets = []
    all_outputs = []

    with torch.no_grad(): # Disable gradient calculation for evaluation
        for batch in tqdm(loader, desc="Validation"):
            # Handle the case where collate_fn might return lists of tensors
            if isinstance(batch["melspec"], list):
                batch_outputs = []
                batch_losses = []
                
                for i in range(len(batch["melspec"])):
                    # Ensure inputs and targets are tensors and on the correct device
                    inputs = batch["melspec"][i].unsqueeze(0).to(device)
                    target = batch["target"][i].unsqueeze(0).to(device)
                    
                    output = model(inputs)
                    loss = criterion(output, target)
                    
                    batch_outputs.append(output.detach().cpu())
                    batch_losses.append(loss.item())
                
                outputs = torch.cat(batch_outputs, dim=0).numpy()
                loss = np.mean(batch_losses)
                targets = batch["target"].numpy()
                
            else:
                inputs = batch["melspec"].to(device)
                targets = batch["target"].to(device)
                
                outputs = model(inputs) # No targets needed in forward for validation
                loss = criterion(outputs, targets)
                
                outputs = outputs.detach().cpu().numpy()
                targets = targets.detach().cpu().numpy()

            all_outputs.append(outputs)
            all_targets.append(targets)
            losses.append(loss if isinstance(loss, float) else loss.item())

    # Concatenate results from all batches
    all_outputs = np.concatenate(all_outputs)
    all_targets = np.concatenate(all_targets)

    # Calculate AUC for the validation set
    auc = calculate_auc(all_targets, all_outputs)
    avg_loss = np.mean(losses)
    
    return avg_loss, auc

In [14]:
def calculate_auc(targets: np.ndarray, outputs: np.ndarray) -> float:
    """
    Calculates the mean ROC AUC score across all classes.

    Computes the ROC AUC score for each class individually where there are
    positive samples in the target, and then returns the average of these scores.

    Args:
        targets (np.ndarray): Ground truth labels (one-hot encoded). Shape (num_samples, num_classes).
        outputs (np.ndarray): Model predictions (logits or probabilities). Shape (num_samples, num_classes).

    Returns:
        float: The mean ROC AUC score across all classes with at least one positive sample.
               Returns 0.0 if there are no classes with positive samples.
    """

    num_classes = targets.shape[1]
    aucs = []
    
    probs = 1 / (1 + np.exp(-outputs))
    
    for i in range(num_classes):
        
        if np.sum(targets[:, i]) > 0:
            class_auc = roc_auc_score(targets[:, i], probs[:, i])
            aucs.append(class_auc)
    
    return np.mean(aucs) if aucs else 0.0

# Load Data

In [15]:
train_df = pd.read_csv(cfg.train_csv)
taxonomy_df = pd.read_csv(cfg.taxonomy_csv)

In [16]:
train_df.head()

Unnamed: 0,primary_label,secondary_labels,type,filename,collection,rating,url,latitude,longitude,scientific_name,common_name,author,license
0,1139490,[''],[''],1139490/CSA36385.ogg,CSA,0.0,http://colecciones.humboldt.org.co/rec/sonidos...,7.3206,-73.7128,Ragoniella pulchella,Ragoniella pulchella,Fabio A. Sarria-S,cc-by-nc-sa 4.0
1,1139490,[''],[''],1139490/CSA36389.ogg,CSA,0.0,http://colecciones.humboldt.org.co/rec/sonidos...,7.3206,-73.7128,Ragoniella pulchella,Ragoniella pulchella,Fabio A. Sarria-S,cc-by-nc-sa 4.0
2,1192948,[''],[''],1192948/CSA36358.ogg,CSA,0.0,http://colecciones.humboldt.org.co/rec/sonidos...,7.3791,-73.7313,Oxyprora surinamensis,Oxyprora surinamensis,Fabio A. Sarria-S,cc-by-nc-sa 4.0
3,1192948,[''],[''],1192948/CSA36366.ogg,CSA,0.0,http://colecciones.humboldt.org.co/rec/sonidos...,7.28,-73.8582,Oxyprora surinamensis,Oxyprora surinamensis,Fabio A. Sarria-S,cc-by-nc-sa 4.0
4,1192948,[''],[''],1192948/CSA36373.ogg,CSA,0.0,http://colecciones.humboldt.org.co/rec/sonidos...,7.3791,-73.7313,Oxyprora surinamensis,Oxyprora surinamensis,Fabio A. Sarria-S,cc-by-nc-sa 4.0


In [17]:
taxonomy_df.head()

Unnamed: 0,primary_label,inat_taxon_id,scientific_name,common_name,class_name
0,1139490,1139490,Ragoniella pulchella,Ragoniella pulchella,Insecta
1,1192948,1192948,Oxyprora surinamensis,Oxyprora surinamensis,Insecta
2,1194042,1194042,Copiphora colombiae,Copiphora colombiae,Insecta
3,126247,126247,Leptodactylus insularum,Spotted Foam-nest Frog,Amphibia
4,1346504,1346504,Neoconocephalus brachypterus,Neoconocephalus brachypterus,Insecta


In [18]:
taxonomy_df = pd.read_csv(cfg.taxonomy_csv)
species_ids = taxonomy_df['primary_label'].tolist()
cfg.num_classes = len(species_ids)

In [19]:
taxonomy_df.head()

Unnamed: 0,primary_label,inat_taxon_id,scientific_name,common_name,class_name
0,1139490,1139490,Ragoniella pulchella,Ragoniella pulchella,Insecta
1,1192948,1192948,Oxyprora surinamensis,Oxyprora surinamensis,Insecta
2,1194042,1194042,Copiphora colombiae,Copiphora colombiae,Insecta
3,126247,126247,Leptodactylus insularum,Spotted Foam-nest Frog,Amphibia
4,1346504,1346504,Neoconocephalus brachypterus,Neoconocephalus brachypterus,Insecta


# Run Training

In [20]:
print(cfg.debug)
if cfg.debug:
    cfg.update_debug_settings()

False


In [21]:
spectrograms = None

In [22]:
print("Loading pre-computed mel spectrograms from NPY file...")
spectrograms = np.load(cfg.spectrogram_npy, allow_pickle=True).item()
print(f"Loaded {len(spectrograms)} pre-computed mel spectrograms")

Loading pre-computed mel spectrograms from NPY file...
Loaded 28564 pre-computed mel spectrograms


In [23]:
skf = StratifiedKFold(n_splits=cfg.n_fold, shuffle=True, random_state=cfg.seed)

In [24]:
best_scores = []

In [25]:
df = train_df.copy()

In [26]:
for fold, (train_idx, val_idx) in enumerate(skf.split(df, df['primary_label'])):
    if fold not in cfg.selected_folds:
        continue
        
    print(f'\n{"="*30} Fold {fold} {"="*30}')
    
    train_df = df.iloc[train_idx].reset_index(drop=True)
    val_df = df.iloc[val_idx].reset_index(drop=True)
    
    print(f'Training set: {len(train_df)} samples')
    print(f'Validation set: {len(val_df)} samples')
    
    train_dataset = BirdCLEFDatasetFromNPY(train_df, cfg, spectrograms=spectrograms, mode='train')
    val_dataset = BirdCLEFDatasetFromNPY(val_df, cfg, spectrograms=spectrograms, mode='valid')
    
    train_loader = DataLoader(
        train_dataset, 
        batch_size=cfg.batch_size, 
        shuffle=True, 
        num_workers=cfg.num_workers,
        pin_memory=True,
        collate_fn=collate_fn,
        drop_last=True
    )
    
    val_loader = DataLoader(
        val_dataset, 
        batch_size=cfg.batch_size, 
        shuffle=False, 
        num_workers=cfg.num_workers,
        pin_memory=True,
        collate_fn=collate_fn
    )
    
    model = BirdCLEFModel(cfg).to(cfg.device)
    optimizer = get_optimizer(model, cfg)
    criterion = get_criterion(cfg)
    
    if cfg.scheduler == 'OneCycleLR':
        scheduler = lr_scheduler.OneCycleLR(
            optimizer,
            max_lr=cfg.lr,
            steps_per_epoch=len(train_loader),
            epochs=cfg.epochs,
            pct_start=0.1
        )
    else:
        scheduler = get_scheduler(optimizer, cfg)
    
    best_auc = 0
    best_epoch = 0
    
    for epoch in range(cfg.epochs):
        print(f"\nEpoch {epoch+1}/{cfg.epochs}")
        
        train_loss, train_auc = train_one_epoch(
            model, 
            train_loader, 
            optimizer, 
            criterion, 
            cfg.device,
            scheduler if isinstance(scheduler, lr_scheduler.OneCycleLR) else None
        )
        
        val_loss, val_auc = validate(model, val_loader, criterion, cfg.device)

        if scheduler is not None and not isinstance(scheduler, lr_scheduler.OneCycleLR):
            if isinstance(scheduler, lr_scheduler.ReduceLROnPlateau):
                scheduler.step(val_loss)
            else:
                scheduler.step()

        print(f"Train Loss: {train_loss:.4f}, Train AUC: {train_auc:.4f}")
        print(f"Val Loss: {val_loss:.4f}, Val AUC: {val_auc:.4f}")
        
        if val_auc > best_auc:
            best_auc = val_auc
            best_epoch = epoch + 1
            print(f"New best AUC: {best_auc:.4f} at epoch {best_epoch}")

            torch.save({
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                'scheduler_state_dict': scheduler.state_dict() if scheduler else None,
                'epoch': epoch,
                'val_auc': val_auc,
                'train_auc': train_auc,
                'cfg': cfg
            }, f"model_fold{fold}.pth")
    
    best_scores.append(best_auc)
    print(f"\nBest AUC for fold {fold}: {best_auc:.4f} at epoch {best_epoch}")
    
    # Clear memory
    del model, optimizer, scheduler, train_loader, val_loader
    torch.cuda.empty_cache()
    gc.collect()


Training set: 22851 samples
Validation set: 5713 samples
Found 22851 matching spectrograms for train dataset out of 22851 samples
Found 5713 matching spectrograms for valid dataset out of 5713 samples




model.safetensors:   0%|          | 0.00/86.5M [00:00<?, ?B/s]


Epoch 1/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0227, Train AUC: 0.7578
Val Loss: 0.0153, Val AUC: 0.9157
New best AUC: 0.9157 at epoch 1

Epoch 2/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0143, Train AUC: 0.9088
Val Loss: 0.0132, Val AUC: 0.9492
New best AUC: 0.9492 at epoch 2

Epoch 3/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0118, Train AUC: 0.9510
Val Loss: 0.0120, Val AUC: 0.9545
New best AUC: 0.9545 at epoch 3

Epoch 4/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0099, Train AUC: 0.9733
Val Loss: 0.0113, Val AUC: 0.9639
New best AUC: 0.9639 at epoch 4

Epoch 5/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0081, Train AUC: 0.9856
Val Loss: 0.0115, Val AUC: 0.9650
New best AUC: 0.9650 at epoch 5

Epoch 6/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0062, Train AUC: 0.9931
Val Loss: 0.0112, Val AUC: 0.9646

Epoch 7/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0046, Train AUC: 0.9969
Val Loss: 0.0115, Val AUC: 0.9673
New best AUC: 0.9673 at epoch 7

Epoch 8/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0031, Train AUC: 0.9988
Val Loss: 0.0119, Val AUC: 0.9660

Epoch 9/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0022, Train AUC: 0.9995
Val Loss: 0.0123, Val AUC: 0.9657

Epoch 10/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0018, Train AUC: 0.9997
Val Loss: 0.0124, Val AUC: 0.9654

Best AUC for fold 0: 0.9673 at epoch 7

Training set: 22851 samples
Validation set: 5713 samples
Found 22851 matching spectrograms for train dataset out of 22851 samples
Found 5713 matching spectrograms for valid dataset out of 5713 samples

Epoch 1/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0228, Train AUC: 0.7541
Val Loss: 0.0153, Val AUC: 0.9115
New best AUC: 0.9115 at epoch 1

Epoch 2/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0141, Train AUC: 0.9142
Val Loss: 0.0126, Val AUC: 0.9488
New best AUC: 0.9488 at epoch 2

Epoch 3/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0117, Train AUC: 0.9509
Val Loss: 0.0120, Val AUC: 0.9563
New best AUC: 0.9563 at epoch 3

Epoch 4/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0098, Train AUC: 0.9733
Val Loss: 0.0114, Val AUC: 0.9637
New best AUC: 0.9637 at epoch 4

Epoch 5/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0079, Train AUC: 0.9853
Val Loss: 0.0109, Val AUC: 0.9677
New best AUC: 0.9677 at epoch 5

Epoch 6/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0061, Train AUC: 0.9929
Val Loss: 0.0112, Val AUC: 0.9670

Epoch 7/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0045, Train AUC: 0.9971
Val Loss: 0.0117, Val AUC: 0.9669

Epoch 8/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0031, Train AUC: 0.9988
Val Loss: 0.0119, Val AUC: 0.9683
New best AUC: 0.9683 at epoch 8

Epoch 9/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0022, Train AUC: 0.9995
Val Loss: 0.0123, Val AUC: 0.9695
New best AUC: 0.9695 at epoch 9

Epoch 10/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0018, Train AUC: 0.9996
Val Loss: 0.0125, Val AUC: 0.9691

Best AUC for fold 1: 0.9695 at epoch 9

Training set: 22851 samples
Validation set: 5713 samples
Found 22851 matching spectrograms for train dataset out of 22851 samples
Found 5713 matching spectrograms for valid dataset out of 5713 samples

Epoch 1/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0223, Train AUC: 0.7720
Val Loss: 0.0146, Val AUC: 0.9296
New best AUC: 0.9296 at epoch 1

Epoch 2/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0140, Train AUC: 0.9179
Val Loss: 0.0124, Val AUC: 0.9519
New best AUC: 0.9519 at epoch 2

Epoch 3/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0116, Train AUC: 0.9559
Val Loss: 0.0118, Val AUC: 0.9565
New best AUC: 0.9565 at epoch 3

Epoch 4/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0098, Train AUC: 0.9731
Val Loss: 0.0114, Val AUC: 0.9617
New best AUC: 0.9617 at epoch 4

Epoch 5/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0079, Train AUC: 0.9861
Val Loss: 0.0110, Val AUC: 0.9658
New best AUC: 0.9658 at epoch 5

Epoch 6/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0060, Train AUC: 0.9928
Val Loss: 0.0113, Val AUC: 0.9648

Epoch 7/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0044, Train AUC: 0.9970
Val Loss: 0.0113, Val AUC: 0.9684
New best AUC: 0.9684 at epoch 7

Epoch 8/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0030, Train AUC: 0.9989
Val Loss: 0.0120, Val AUC: 0.9648

Epoch 9/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0022, Train AUC: 0.9995
Val Loss: 0.0122, Val AUC: 0.9660

Epoch 10/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0017, Train AUC: 0.9997
Val Loss: 0.0122, Val AUC: 0.9674

Best AUC for fold 2: 0.9684 at epoch 7

Training set: 22851 samples
Validation set: 5713 samples
Found 22851 matching spectrograms for train dataset out of 22851 samples
Found 5713 matching spectrograms for valid dataset out of 5713 samples

Epoch 1/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0227, Train AUC: 0.7451
Val Loss: 0.0151, Val AUC: 0.9315
New best AUC: 0.9315 at epoch 1

Epoch 2/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0141, Train AUC: 0.9166
Val Loss: 0.0128, Val AUC: 0.9470
New best AUC: 0.9470 at epoch 2

Epoch 3/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0116, Train AUC: 0.9546
Val Loss: 0.0118, Val AUC: 0.9592
New best AUC: 0.9592 at epoch 3

Epoch 4/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0097, Train AUC: 0.9733
Val Loss: 0.0113, Val AUC: 0.9636
New best AUC: 0.9636 at epoch 4

Epoch 5/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0079, Train AUC: 0.9866
Val Loss: 0.0114, Val AUC: 0.9642
New best AUC: 0.9642 at epoch 5

Epoch 6/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0061, Train AUC: 0.9932
Val Loss: 0.0112, Val AUC: 0.9672
New best AUC: 0.9672 at epoch 6

Epoch 7/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0044, Train AUC: 0.9972
Val Loss: 0.0118, Val AUC: 0.9631

Epoch 8/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0031, Train AUC: 0.9989
Val Loss: 0.0121, Val AUC: 0.9646

Epoch 9/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0022, Train AUC: 0.9995
Val Loss: 0.0124, Val AUC: 0.9633

Epoch 10/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/358 [00:00<?, ?it/s]

Train Loss: 0.0018, Train AUC: 0.9997
Val Loss: 0.0126, Val AUC: 0.9638

Best AUC for fold 3: 0.9672 at epoch 6

Training set: 22852 samples
Validation set: 5712 samples
Found 22852 matching spectrograms for train dataset out of 22852 samples
Found 5712 matching spectrograms for valid dataset out of 5712 samples

Epoch 1/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/357 [00:00<?, ?it/s]

Train Loss: 0.0214, Train AUC: 0.7860
Val Loss: 0.0148, Val AUC: 0.9252
New best AUC: 0.9252 at epoch 1

Epoch 2/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/357 [00:00<?, ?it/s]

Train Loss: 0.0137, Train AUC: 0.9237
Val Loss: 0.0135, Val AUC: 0.9499
New best AUC: 0.9499 at epoch 2

Epoch 3/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/357 [00:00<?, ?it/s]

Train Loss: 0.0114, Train AUC: 0.9562
Val Loss: 0.0120, Val AUC: 0.9555
New best AUC: 0.9555 at epoch 3

Epoch 4/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/357 [00:00<?, ?it/s]

Train Loss: 0.0095, Train AUC: 0.9737
Val Loss: 0.0119, Val AUC: 0.9574
New best AUC: 0.9574 at epoch 4

Epoch 5/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/357 [00:00<?, ?it/s]

Train Loss: 0.0077, Train AUC: 0.9870
Val Loss: 0.0110, Val AUC: 0.9651
New best AUC: 0.9651 at epoch 5

Epoch 6/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/357 [00:00<?, ?it/s]

Train Loss: 0.0059, Train AUC: 0.9931
Val Loss: 0.0119, Val AUC: 0.9649

Epoch 7/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/357 [00:00<?, ?it/s]

Train Loss: 0.0042, Train AUC: 0.9975
Val Loss: 0.0120, Val AUC: 0.9670
New best AUC: 0.9670 at epoch 7

Epoch 8/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/357 [00:00<?, ?it/s]

Train Loss: 0.0029, Train AUC: 0.9989
Val Loss: 0.0123, Val AUC: 0.9684
New best AUC: 0.9684 at epoch 8

Epoch 9/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/357 [00:00<?, ?it/s]

Train Loss: 0.0020, Train AUC: 0.9996
Val Loss: 0.0128, Val AUC: 0.9671

Epoch 10/10


Training:   0%|          | 0/1428 [00:00<?, ?it/s]

Validation:   0%|          | 0/357 [00:00<?, ?it/s]

Train Loss: 0.0017, Train AUC: 0.9998
Val Loss: 0.0128, Val AUC: 0.9669

Best AUC for fold 4: 0.9684 at epoch 8


# Cross Validation

In [27]:
print("Cross-Validation Results:")
for fold, score in enumerate(best_scores):
    print(f"Fold {cfg.selected_folds[fold]}: {score:.4f}")
print(f"Mean AUC: {np.mean(best_scores):.4f}")

Cross-Validation Results:
Fold 0: 0.9673
Fold 1: 0.9695
Fold 2: 0.9684
Fold 3: 0.9672
Fold 4: 0.9684
Mean AUC: 0.9682
