# **BirdCLEF 2025 Training Notebook**

This is a baseline training pipeline for BirdCLEF 2025 using EfficientNetB0 with PyTorch and Timm(for pretrained EffNet). You can check inference and preprocessing notebooks in the following links: 

- [EfficientNet B0 Pytorch [Inference] | BirdCLEF'25](https://www.kaggle.com/code/kadircandrisolu/efficientnet-b0-pytorch-inference-birdclef-25)

  
- [Transforming Audio-to-Mel Spec. | BirdCLEF'25](https://www.kaggle.com/code/kadircandrisolu/transforming-audio-to-mel-spec-birdclef-25)  

Note that by default this notebook is in Debug Mode, so it will only train the model with 2 epochs, but the [weight](https://www.kaggle.com/datasets/kadircandrisolu/birdclef25-effnetb0-starter-weight) I used in the inference notebook was obtained after 10 epochs of training.

**Features**
* Implement with Pytorch and Timm
* Flexible audio processing with both pre-computed and on-the-fly mel spectrograms
* Stratified 5-fold cross-validation with ensemble capability
* Mixup training for improved generalization
* Spectrogram augmentations (time/frequency masking, brightness adjustment)
* AdamW optimizer with Cosine Annealing LR scheduling
* Debug mode for quick experimentation with smaller datasets

**Pre-computed Spectrograms**
For faster training, you can use pre-computed mel spectrograms from [this dataset](https://www.kaggle.com/datasets/kadircandrisolu/birdclef25-mel-spectrograms) by setting `LOAD_DATA = True`

## Libraries

In [1]:
import os
import logging
import random
import gc
import time
import cv2
import math
import warnings
from pathlib import Path

import numpy as np
import pandas as pd 
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score
import librosa

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.optim import lr_scheduler
from torch.utils.data import Dataset, DataLoader

import matplotlib.pyplot as plt
import seaborn as sns
from tqdm.auto import tqdm

import timm

warnings.filterwarnings("ignore")
logging.basicConfig(level=logging.ERROR)

## Configuration

In [None]:
class CFG:
    
    seed = 42
    debug = False  
    apex = False
    print_freq = 100
    num_workers = 2
    
    OUTPUT_DIR = '/kaggle/working/'

    train_datadir = '/kaggle/input/birdclef-2025/train_audio'
    train_csv = '/kaggle/input/birdclef-2025/train.csv'
    test_soundscapes = '/kaggle/input/birdclef-2025/test_soundscapes'
    submission_csv = '/kaggle/input/birdclef-2025/sample_submission.csv'
    taxonomy_csv = '/kaggle/input/birdclef-2025/taxonomy.csv'
    pretrain_data = "/kaggle/input/pretraining-dataset-ss"
    spectrogram_npy = '/kaggle/input/spectogram-with-start-crop/spectograms.npy'
 
    # model_name = 'efficientnet_b0'  
    model_name = "custom"
    pretrained = True
    in_channels = 1

    LOAD_DATA = True  
    FS = 32000
    TARGET_DURATION = 5.0
    TARGET_SHAPE = (256, 256)
    
    N_FFT = 1024
    HOP_LENGTH = 512
    N_MELS = 128
    FMIN = 50
    FMAX = 14000
    
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    epochs = 10  
    batch_size = 32  
    criterion = 'BCEWithLogitsLoss'

    n_fold = 5
    selected_folds = [0, 1, 2, 3, 4]   

    optimizer = 'AdamW'
    lr = 5e-4 
    weight_decay = 1e-5
  
    scheduler = 'CosineAnnealingLR'
    min_lr = 1e-6
    T_max = epochs

    aug_prob = 0.5  
    mixup_alpha = 0.5  
    
    def update_debug_settings(self):
        if self.debug:
            self.epochs = 2
            self.selected_folds = [0]

cfg = CFG()

## Utilities

In [3]:
def set_seed(seed=42):
    """
    Set seed for reproducibility
    """
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(cfg.seed)

## Pre-processing
These functions handle the transformation of audio files to mel spectrograms for model input, with flexibility controlled by the `LOAD_DATA` parameter. The process involves either loading pre-computed spectrograms from this [dataset](https://www.kaggle.com/datasets/kadircandrisolu/birdclef25-mel-spectrograms) (when `LOAD_DATA=True`) or dynamically generating them (when `LOAD_DATA=False`), transforming audio data into spectrogram representations, and preparing it for the neural network.

In [4]:
def audio2melspec(audio_data, cfg):
    """Convert audio data to mel spectrogram"""
    if np.isnan(audio_data).any():
        mean_signal = np.nanmean(audio_data)
        audio_data = np.nan_to_num(audio_data, nan=mean_signal)

    mel_spec = librosa.feature.melspectrogram(
        y=audio_data,
        sr=cfg.FS,
        n_fft=cfg.N_FFT,
        hop_length=cfg.HOP_LENGTH,
        n_mels=cfg.N_MELS,
        fmin=cfg.FMIN,
        fmax=cfg.FMAX,
        power=2.0
    )

    mel_spec_db = librosa.power_to_db(mel_spec, ref=np.max)
    mel_spec_norm = (mel_spec_db - mel_spec_db.min()) / (mel_spec_db.max() - mel_spec_db.min() + 1e-8)
    
    return mel_spec_norm

def process_audio_file(audio_path, cfg):
    """Process a single audio file to get the mel spectrogram"""
    try:
        audio_data, _ = librosa.load(audio_path, sr=cfg.FS)

        target_samples = int(cfg.TARGET_DURATION * cfg.FS)

        if len(audio_data) < target_samples:
            n_copy = math.ceil(target_samples / len(audio_data))
            if n_copy > 1:
                audio_data = np.concatenate([audio_data] * n_copy)

        # Extract center 5 seconds
        start_idx = max(0, int(len(audio_data) / 2 - target_samples / 2))
        end_idx = min(len(audio_data), start_idx + target_samples)
        # center_audio = audio_data[start_idx:end_idx]
        center_audio = audio_data[0:160000]
        

        if len(center_audio) < target_samples:
            center_audio = np.pad(center_audio, 
                                 (0, target_samples - len(center_audio)), 
                                 mode='constant')

        mel_spec = audio2melspec(center_audio, cfg)
        
        if mel_spec.shape != cfg.TARGET_SHAPE:
            mel_spec = cv2.resize(mel_spec, cfg.TARGET_SHAPE, interpolation=cv2.INTER_LINEAR)

        return mel_spec.astype(np.float32)
        
    except Exception as e:
        print(f"Error processing {audio_path}: {e}")
        return None

def generate_spectrograms(df, cfg):
    """Generate spectrograms from audio files"""
    print("Generating mel spectrograms from audio files...")
    start_time = time.time()

    all_bird_data = {}
    errors = []

    for i, row in tqdm(df.iterrows(), total=len(df)):
        if cfg.debug and i >= 1000:
            break
        
        try:
            samplename = row['samplename']
            filepath = row['filepath']
            
            mel_spec = process_audio_file(filepath, cfg)
            
            if mel_spec is not None:
                all_bird_data[samplename] = mel_spec
            
        except Exception as e:
            print(f"Error processing {row.filepath}: {e}")
            errors.append((row.filepath, str(e)))

    end_time = time.time()
    print(f"Processing completed in {end_time - start_time:.2f} seconds")
    print(f"Successfully processed {len(all_bird_data)} files out of {len(df)}")
    print(f"Failed to process {len(errors)} files")
    
    return all_bird_data

## Dataset Preparation and Data Augmentations
We'll convert audio to mel spectrograms and apply random augmentations with 50% probability each - including time stretching, pitch shifting, and volume adjustments. This randomized approach creates diverse training samples from the same audio files

In [5]:
class BirdCLEFDatasetFromNPY(Dataset):
    def __init__(self, df, cfg, spectrograms=None, mode="train"):
        self.df = df
        self.cfg = cfg
        self.mode = mode

        self.spectrograms = spectrograms
        
        taxonomy_df = pd.read_csv(self.cfg.taxonomy_csv)
        self.species_ids = taxonomy_df['primary_label'].tolist()
        self.num_classes = len(self.species_ids)
        self.label_to_idx = {label: idx for idx, label in enumerate(self.species_ids)}

        if 'filepath' not in self.df.columns:
            self.df['filepath'] = self.cfg.train_datadir + '/' + self.df.filename
        
        if 'samplename' not in self.df.columns:
            self.df['samplename'] = self.df.filename.map(lambda x: x.split('/')[0] + '-' + x.split('/')[-1].split('.')[0])

        sample_names = set(self.df['samplename'])
        if self.spectrograms:
            found_samples = sum(1 for name in sample_names if name in self.spectrograms)
            print(f"Found {found_samples} matching spectrograms for {mode} dataset out of {len(self.df)} samples")
        
        if cfg.debug:
            self.df = self.df.sample(min(1000, len(self.df)), random_state=cfg.seed).reset_index(drop=True)
    
    def __len__(self):
        return len(self.df)
    
    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        samplename = row['samplename']
        spec = None

        if self.spectrograms and samplename in self.spectrograms:
            spec = self.spectrograms[samplename]
        elif not self.cfg.LOAD_DATA:
            spec = process_audio_file(row['filepath'], self.cfg)

        if spec is None:
            spec = np.zeros(self.cfg.TARGET_SHAPE, dtype=np.float32)
            if self.mode == "train":  # Only print warning during training
                print(f"Warning: Spectrogram for {samplename} not found and could not be generated")

        spec = torch.tensor(spec, dtype=torch.float32).unsqueeze(0)  # Add channel dimension

        if self.mode == "train" and random.random() < self.cfg.aug_prob:
            spec = self.apply_spec_augmentations(spec)
        
        target = self.encode_label(row['primary_label'])
        
        if 'secondary_labels' in row and row['secondary_labels'] not in [[''], None, np.nan]:
            if isinstance(row['secondary_labels'], str):
                secondary_labels = eval(row['secondary_labels'])
            else:
                secondary_labels = row['secondary_labels']
            
            for label in secondary_labels:
                if label in self.label_to_idx:
                    target[self.label_to_idx[label]] = 1.0
        
        return {
            'melspec': spec, 
            'target': torch.tensor(target, dtype=torch.float32),
            'filename': row['filename']
        }
    
    def apply_spec_augmentations(self, spec):
        """Apply augmentations to spectrogram"""
    
        # Time masking (horizontal stripes)
        if random.random() < 0.5:
            num_masks = random.randint(1, 3)
            for _ in range(num_masks):
                width = random.randint(5, 20)
                start = random.randint(0, spec.shape[2] - width)
                spec[0, :, start:start+width] = 0
        
        # Frequency masking (vertical stripes)
        if random.random() < 0.5:
            num_masks = random.randint(1, 3)
            for _ in range(num_masks):
                height = random.randint(5, 20)
                start = random.randint(0, spec.shape[1] - height)
                spec[0, start:start+height, :] = 0
        
        # Random brightness/contrast
        if random.random() < 0.5:
            gain = random.uniform(0.8, 1.2)
            bias = random.uniform(-0.1, 0.1)
            spec = spec * gain + bias
            spec = torch.clamp(spec, 0, 1) 
            
        return spec
    
    def encode_label(self, label):
        """Encode label to one-hot vector"""
        target = np.zeros(self.num_classes)
        if label in self.label_to_idx:
            target[self.label_to_idx[label]] = 1.0
        return target

In [6]:
def collate_fn(batch):
    """Custom collate function to handle different sized spectrograms"""
    batch = [item for item in batch if item is not None]
    if len(batch) == 0:
        return {}
        
    result = {key: [] for key in batch[0].keys()}
    
    for item in batch:
        for key, value in item.items():
            result[key].append(value)
    
    for key in result:
        if key == 'target' and isinstance(result[key][0], torch.Tensor):
            result[key] = torch.stack(result[key])
        elif key == 'melspec' and isinstance(result[key][0], torch.Tensor):
            shapes = [t.shape for t in result[key]]
            if len(set(str(s) for s in shapes)) == 1:
                result[key] = torch.stack(result[key])
    
    return result

## Model Definition

In [7]:
def create_custom_encoder():
    model = EfficientNetMaskedModel()
    checkpoint_path = "/kaggle/input/preatraining_epcoh20/pytorch/default/1/best_model_9.pth"  # Change this to your checkpoint path
    checkpoint = torch.load(checkpoint_path, map_location="cpu") 
    model.load_state_dict(checkpoint, strict=False)
    return model.encoder

In [8]:
class BirdCLEFModel(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.cfg = cfg
        
        taxonomy_df = pd.read_csv(cfg.taxonomy_csv)
        cfg.num_classes = len(taxonomy_df)

        if cfg.model_name!= 'custom':
            self.backbone = timm.create_model(
                cfg.model_name,
                pretrained=cfg.pretrained,
                in_chans=cfg.in_channels,
                drop_rate=0.2,
                drop_path_rate=0.2
            )
        else:
            self.backbone = create_custom_encoder()
        
        if 'efficientnet' in cfg.model_name:
            backbone_out = self.backbone.classifier.in_features
            self.backbone.classifier = nn.Identity()
        elif 'resnet' in cfg.model_name:
            backbone_out = self.backbone.fc.in_features
            self.backbone.fc = nn.Identity()
        elif 'custom' in cfg.model_name:
            print("Using custom pretrained model")
            backbone_out = 1280
        else:
            backbone_out = self.backbone.get_classifier().in_features
            self.backbone.reset_classifier(0, '')
        
        self.pooling = nn.AdaptiveAvgPool2d(1)
            
        self.feat_dim = backbone_out
        
        self.classifier = nn.Linear(backbone_out, cfg.num_classes)
        
        self.mixup_enabled = hasattr(cfg, 'mixup_alpha') and cfg.mixup_alpha > 0
        if self.mixup_enabled:
            self.mixup_alpha = cfg.mixup_alpha
            
    def forward(self, x, targets=None):
    
        if self.training and self.mixup_enabled and targets is not None:
            mixed_x, targets_a, targets_b, lam = self.mixup_data(x, targets)
            x = mixed_x
        else:
            targets_a, targets_b, lam = None, None, None
        
        features = self.backbone(x)
        
        if isinstance(features, dict):
            features = features['features']
            
        if len(features.shape) == 4:
            features = self.pooling(features)
            features = features.view(features.size(0), -1)
        
        logits = self.classifier(features)
        
        if self.training and self.mixup_enabled and targets is not None:
            loss = self.mixup_criterion(F.binary_cross_entropy_with_logits, 
                                       logits, targets_a, targets_b, lam)
            return logits, loss
            
        return logits
    
    def mixup_data(self, x, targets):
        """Applies mixup to the data batch"""
        batch_size = x.size(0)

        lam = np.random.beta(self.mixup_alpha, self.mixup_alpha)

        indices = torch.randperm(batch_size).to(x.device)

        mixed_x = lam * x + (1 - lam) * x[indices]
        
        return mixed_x, targets, targets[indices], lam
    
    def mixup_criterion(self, criterion, pred, y_a, y_b, lam):
        """Applies mixup to the loss function"""
        return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)

## Training Utilities
We are configuring our optimization strategy with the AdamW optimizer, cosine scheduling, and the BCEWithLogitsLoss criterion.

In [9]:
def get_optimizer(model, cfg):
  
    if cfg.optimizer == 'Adam':
        optimizer = optim.Adam(
            model.parameters(),
            lr=cfg.lr,
            weight_decay=cfg.weight_decay
        )
    elif cfg.optimizer == 'AdamW':
        optimizer = optim.AdamW(
            model.parameters(),
            lr=cfg.lr,
            weight_decay=cfg.weight_decay
        )
    elif cfg.optimizer == 'SGD':
        optimizer = optim.SGD(
            model.parameters(),
            lr=cfg.lr,
            momentum=0.9,
            weight_decay=cfg.weight_decay
        )
    else:
        raise NotImplementedError(f"Optimizer {cfg.optimizer} not implemented")
        
    return optimizer

def get_scheduler(optimizer, cfg):
   
    if cfg.scheduler == 'CosineAnnealingLR':
        scheduler = lr_scheduler.CosineAnnealingLR(
            optimizer,
            T_max=cfg.T_max,
            eta_min=cfg.min_lr
        )
    elif cfg.scheduler == 'ReduceLROnPlateau':
        scheduler = lr_scheduler.ReduceLROnPlateau(
            optimizer,
            mode='min',
            factor=0.5,
            patience=2,
            min_lr=cfg.min_lr,
            verbose=True
        )
    elif cfg.scheduler == 'StepLR':
        scheduler = lr_scheduler.StepLR(
            optimizer,
            step_size=cfg.epochs // 3,
            gamma=0.5
        )
    elif cfg.scheduler == 'OneCycleLR':
        scheduler = None  
    else:
        scheduler = None
        
    return scheduler

def get_criterion(cfg):
 
    if cfg.criterion == 'BCEWithLogitsLoss':
        criterion = nn.BCEWithLogitsLoss()
    else:
        raise NotImplementedError(f"Criterion {cfg.criterion} not implemented")
        
    return criterion

## Training Loop

In [10]:
def train_one_epoch(model, loader, optimizer, criterion, device, scheduler=None):
    
    model.train()
    losses = []
    all_targets = []
    all_outputs = []
    
    pbar = tqdm(enumerate(loader), total=len(loader), desc="Training")
    
    for step, batch in pbar:
    
        if isinstance(batch['melspec'], list):
            batch_outputs = []
            batch_losses = []
            
            for i in range(len(batch['melspec'])):
                inputs = batch['melspec'][i].unsqueeze(0).to(device)
                target = batch['target'][i].unsqueeze(0).to(device)
                
                optimizer.zero_grad()
                output = model(inputs)
                loss = criterion(output, target)
                loss.backward()
                
                batch_outputs.append(output.detach().cpu())
                batch_losses.append(loss.item())
            
            optimizer.step()
            outputs = torch.cat(batch_outputs, dim=0).numpy()
            loss = np.mean(batch_losses)
            targets = batch['target'].numpy()
            
        else:
            inputs = batch['melspec'].to(device)
            targets = batch['target'].to(device)
            
            optimizer.zero_grad()
            outputs = model(inputs)
            
            if isinstance(outputs, tuple):
                outputs, loss = outputs  
            else:
                loss = criterion(outputs, targets)
                
            loss.backward()
            optimizer.step()
            
            outputs = outputs.detach().cpu().numpy()
            targets = targets.detach().cpu().numpy()
        
        if scheduler is not None and isinstance(scheduler, lr_scheduler.OneCycleLR):
            scheduler.step()
            
        all_outputs.append(outputs)
        all_targets.append(targets)
        losses.append(loss if isinstance(loss, float) else loss.item())
        
        pbar.set_postfix({
            'train_loss': np.mean(losses[-10:]) if losses else 0,
            'lr': optimizer.param_groups[0]['lr']
        })
    
    all_outputs = np.concatenate(all_outputs)
    all_targets = np.concatenate(all_targets)
    auc = calculate_auc(all_targets, all_outputs)
    avg_loss = np.mean(losses)
    
    return avg_loss, auc

def validate(model, loader, criterion, device):
   
    model.eval()
    losses = []
    all_targets = []
    all_outputs = []
    
    with torch.no_grad():
        for batch in tqdm(loader, desc="Validation"):
            if isinstance(batch['melspec'], list):
                batch_outputs = []
                batch_losses = []
                
                for i in range(len(batch['melspec'])):
                    inputs = batch['melspec'][i].unsqueeze(0).to(device)
                    target = batch['target'][i].unsqueeze(0).to(device)
                    
                    output = model(inputs)
                    loss = criterion(output, target)
                    
                    batch_outputs.append(output.detach().cpu())
                    batch_losses.append(loss.item())
                
                outputs = torch.cat(batch_outputs, dim=0).numpy()
                loss = np.mean(batch_losses)
                targets = batch['target'].numpy()
                
            else:
                inputs = batch['melspec'].to(device)
                targets = batch['target'].to(device)
                
                outputs = model(inputs)
                loss = criterion(outputs, targets)
                
                outputs = outputs.detach().cpu().numpy()
                targets = targets.detach().cpu().numpy()
            
            all_outputs.append(outputs)
            all_targets.append(targets)
            losses.append(loss if isinstance(loss, float) else loss.item())
    
    all_outputs = np.concatenate(all_outputs)
    all_targets = np.concatenate(all_targets)
    
    auc = calculate_auc(all_targets, all_outputs)
    avg_loss = np.mean(losses)
    
    return avg_loss, auc

def calculate_auc(targets, outputs):
  
    num_classes = targets.shape[1]
    aucs = []
    
    probs = 1 / (1 + np.exp(-outputs))
    
    for i in range(num_classes):
        
        if np.sum(targets[:, i]) > 0:
            class_auc = roc_auc_score(targets[:, i], probs[:, i])
            aucs.append(class_auc)
    
    return np.mean(aucs) if aucs else 0.0

In [11]:
pre_train = False

### Pretraining

In [12]:
import torch
import torch.nn as nn
from tqdm import tqdm
from IPython.display import clear_output

import torch.optim as optim
from torchvision import transforms, models
from torch.utils.data import Dataset, DataLoader, random_split
import os
import random
import numpy as np
from PIL import Image
import torch.nn.functional as F

# Define transformation for spectrogram images
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5])
])

# Dataset to load spectrograms and apply masking
class SpectrogramDataset(Dataset):
    def __init__(self, folder_path, transform=None, mask_ratio=0.5, patch_size=16):
        self.folder_path = folder_path
        self.transform = transform
        self.mask_ratio = mask_ratio
        self.patch_size = patch_size
        self.image_paths = [os.path.join(folder_path, f) for f in os.listdir(folder_path) if f.endswith('.png')]


    def mask_spectrogram(self, img):
        """Randomly mask patches of the spectrogram"""
        c, h, w = img.shape
        num_patches_h = h // self.patch_size
        num_patches_w = w // self.patch_size
    
        # Create binary mask with patches
        mask = torch.ones((num_patches_h, num_patches_w))
        num_masked = int(self.mask_ratio * num_patches_h * num_patches_w)
        masked_indices = random.sample(range(num_patches_h * num_patches_w), num_masked)
    
        for idx in masked_indices:
            i, j = divmod(idx, num_patches_w)
            mask[i, j] = 0  # Set masked patches to 0
    
        # Resize mask to match image size
        mask = mask.unsqueeze(0).unsqueeze(0)  # Shape (1,1,H,W)
        mask = F.interpolate(mask, size=(h, w), mode="bilinear", align_corners=False)  # Resize smoothly
        mask = mask.squeeze(0).squeeze(0)  # Remove extra dimensions
    
        return img * mask, mask



    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img_path = self.image_paths[idx]
        img = Image.open(img_path).convert("L")  # Convert to grayscale
        if self.transform:
            img = self.transform(img)
        masked_img, mask = self.mask_spectrogram(img)
        return masked_img, img, mask  # Masked spectrogram, original spectrogram, mask

# Define EfficientNet-B0 based encoder
class EfficientNetMaskedModel(nn.Module):
    def __init__(self, pretrained=True):
        super(EfficientNetMaskedModel, self).__init__()
        self.encoder = models.efficientnet_b0(pretrained=pretrained)
        
        # Modify first convolution layer to accept 1-channel input
        self.encoder.features[0][0] = nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1, bias=False)
        
        self.encoder.classifier = nn.Identity()  # Remove classification head
        
        self.decoder = nn.Sequential(
            nn.Conv2d(1280, 512, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(512, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(256, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),  # 7x7 → 14x14
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1),  # 14x14 → 28x28
            nn.ReLU(),
            nn.ConvTranspose2d(16, 8, kernel_size=4, stride=2, padding=1),   # 28x28 → 56x56
            nn.ReLU(),
            nn.ConvTranspose2d(8, 4, kernel_size=4, stride=2, padding=1),    # 56x56 → 112x112
            nn.ReLU(),
            nn.ConvTranspose2d(4, 1, kernel_size=4, stride=2, padding=1)     # 112x112 → 224x224
        )


    def forward(self, x):
        encoded = self.encoder.features(x)  # Extract features
        reconstructed = self.decoder(encoded)  # Reconstruct masked spectrogram
        return reconstructed



# Training setup with validation loop
def pre_train_model(data_folder, epochs=10, batch_size=16, lr=1e-3, val_split=0.2,model=None):
    dataset = SpectrogramDataset(data_folder, transform)
    
    # Split dataset into train and validation sets
    train_size = int((1 - val_split) * len(dataset))
    val_size = len(dataset) - train_size
    train_dataset, val_dataset = random_split(dataset, [train_size, val_size])

    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

    if model==None:
        model = EfficientNetMaskedModel(pretrained=True).cuda()
    criterion = nn.MSELoss()
    optimizer = optim.AdamW(model.parameters(), lr=lr)

    best_val_loss = float('inf')

    for epoch in range(epochs):
        # Training phase
        model.train()
        train_loss = 0
        progress_bar = tqdm(train_loader, desc=f"Epoch {epoch+1}/{epochs}", leave=True)
        for masked_spectrograms, original_spectrograms, _ in progress_bar:
            masked_spectrograms, original_spectrograms = masked_spectrograms.cuda(), original_spectrograms.cuda()
            optimizer.zero_grad()
            reconstructed = model(masked_spectrograms)
            # print(reconstructed.shape,original_spectrograms.shape)
            loss = criterion(reconstructed, original_spectrograms)
            loss.backward()
            optimizer.step()
            train_loss += loss.item()
            progress_bar.set_postfix({"Batch Loss": loss.item()})

        train_loss /= len(train_loader)

        # Validation phase
        model.eval()
        val_loss = 0
        with torch.no_grad():
            for masked_spectrograms, original_spectrograms, _ in val_loader:
                masked_spectrograms, original_spectrograms = masked_spectrograms.cuda(), original_spectrograms.cuda()
                reconstructed = model(masked_spectrograms)
                loss = criterion(reconstructed, original_spectrograms)
                val_loss += loss.item()

        val_loss /= len(val_loader)
        clear_output()
        # Save best model
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            torch.save(model.state_dict(), "best_model.pth")
            print(f"Best model saved at epoch {epoch + 1}")
        
        print(f"Epoch {epoch + 1}/{epochs}, Train Loss: {train_loss:.6f}, Val Loss: {val_loss:.6f}")

    return model

# Run training


## Training!

In [13]:


def run_training(df, cfg):
    """Training function that can either use pre-computed spectrograms or generate them on-the-fly"""

    taxonomy_df = pd.read_csv(cfg.taxonomy_csv)
    species_ids = taxonomy_df['primary_label'].tolist()
    cfg.num_classes = len(species_ids)
    
    if cfg.debug:
        cfg.update_debug_settings()

    spectrograms = None
    if cfg.LOAD_DATA:
        print("Loading pre-computed mel spectrograms from NPY file...")
        try:
            spectrograms = np.load(cfg.spectrogram_npy, allow_pickle=True).item()
            print(f"Loaded {len(spectrograms)} pre-computed mel spectrograms")
        except Exception as e:
            print(f"Error loading pre-computed spectrograms: {e}")
            print("Will generate spectrograms on-the-fly instead.")
            cfg.LOAD_DATA = False
    
    if not cfg.LOAD_DATA:
        print("Will generate spectrograms on-the-fly during training.")
        if 'filepath' not in df.columns:
            df['filepath'] = cfg.train_datadir + '/' + df.filename
        if 'samplename' not in df.columns:
            df['samplename'] = df.filename.map(lambda x: x.split('/')[0] + '-' + x.split('/')[-1].split('.')[0])
        
    skf = StratifiedKFold(n_splits=cfg.n_fold, shuffle=True, random_state=cfg.seed)
    
    best_scores = []
    
    for fold, (train_idx, val_idx) in enumerate(skf.split(df, df['primary_label'])):
        if fold not in cfg.selected_folds:
            continue
            
        print(f'\n{"="*30} Fold {fold} {"="*30}')
        
        train_df = df.iloc[train_idx].reset_index(drop=True)
        val_df = df.iloc[val_idx].reset_index(drop=True)
        
        print(f'Training set: {len(train_df)} samples')
        print(f'Validation set: {len(val_df)} samples')
        
        train_dataset = BirdCLEFDatasetFromNPY(train_df, cfg, spectrograms=spectrograms, mode='train')
        val_dataset = BirdCLEFDatasetFromNPY(val_df, cfg, spectrograms=spectrograms, mode='valid')
        
        train_loader = DataLoader(
            train_dataset, 
            batch_size=cfg.batch_size, 
            shuffle=True, 
            num_workers=cfg.num_workers,
            pin_memory=True,
            collate_fn=collate_fn,
            drop_last=True
        )
        
        val_loader = DataLoader(
            val_dataset, 
            batch_size=cfg.batch_size, 
            shuffle=False, 
            num_workers=cfg.num_workers,
            pin_memory=True,
            collate_fn=collate_fn
        )
        
        
        # model = EfficientNetMaskedModel().to(cfg.device)
        # checkpoint_path = "/kaggle/input/pretraining_module/pytorch/default/1/best_model.pth"  # Change this to your checkpoint path
        # checkpoint = torch.load(checkpoint_path) 
        # model.load_state_dict(checkpoint, strict=False)
        # model = model.encoder
        # model.classifier = nn.Linear(256,num_classes)

        # model  = create_custom_encoder().to(cfg.device)
        model = BirdCLEFModel(cfg).to(cfg.device)

        # if pre_train:
        #     # model = pre_train_model(model)
        #     data_folder = "/kaggle/input/pretraining-dataset-ss"
        #     model = pre_train_model(data_folder,batch_size = 128,model=model)
        optimizer = get_optimizer(model, cfg)
        criterion = get_criterion(cfg)
        
        if cfg.scheduler == 'OneCycleLR':
            scheduler = lr_scheduler.OneCycleLR(
                optimizer,
                max_lr=cfg.lr,
                steps_per_epoch=len(train_loader),
                epochs=cfg.epochs,
                pct_start=0.1
            )
        else:
            scheduler = get_scheduler(optimizer, cfg)
        
        best_auc = 0
        best_epoch = 0
        
        for epoch in range(cfg.epochs):
            print(f"\nEpoch {epoch+1}/{cfg.epochs}")
            
            train_loss, train_auc = train_one_epoch(
                model, 
                train_loader, 
                optimizer, 
                criterion, 
                cfg.device,
                scheduler if isinstance(scheduler, lr_scheduler.OneCycleLR) else None
            )
            
            val_loss, val_auc = validate(model, val_loader, criterion, cfg.device)

            if scheduler is not None and not isinstance(scheduler, lr_scheduler.OneCycleLR):
                if isinstance(scheduler, lr_scheduler.ReduceLROnPlateau):
                    scheduler.step(val_loss)
                else:
                    scheduler.step()

            print(f"Train Loss: {train_loss:.4f}, Train AUC: {train_auc:.4f}")
            print(f"Val Loss: {val_loss:.4f}, Val AUC: {val_auc:.4f}")
            
            if val_auc > best_auc:
                best_auc = val_auc
                best_epoch = epoch + 1
                print(f"New best AUC: {best_auc:.4f} at epoch {best_epoch}")

                torch.save({
                    'model_state_dict': model.state_dict(),
                    'optimizer_state_dict': optimizer.state_dict(),
                    'scheduler_state_dict': scheduler.state_dict() if scheduler else None,
                    'epoch': epoch,
                    'val_auc': val_auc,
                    'train_auc': train_auc,
                    'cfg': cfg
                }, f"model_fold{fold}.pth")
        
        best_scores.append(best_auc)
        print(f"\nBest AUC for fold {fold}: {best_auc:.4f} at epoch {best_epoch}")
        
        # Clear memory
        del model, optimizer, scheduler, train_loader, val_loader
        torch.cuda.empty_cache()
        gc.collect()
    
    print("\n" + "="*60)
    print("Cross-Validation Results:")
    for fold, score in enumerate(best_scores):
        print(f"Fold {cfg.selected_folds[fold]}: {score:.4f}")
    print(f"Mean AUC: {np.mean(best_scores):.4f}")
    print("="*60)

In [14]:
# model = EfficientNetMaskedModel()
# checkpoint_path = "/kaggle/input/pretraining_module/pytorch/default/1/best_model.pth"  # Change this to your checkpoint path
# checkpoint = torch.load(checkpoint_path, map_location="cpu") 
# model.load_state_dict(checkpoint, strict=False)

In [15]:
# taxonomy_df = pd.read_csv(cfg.taxonomy_csv)
# species_ids = taxonomy_df['primary_label'].tolist()
# num_classes = len(species_ids)
# model.encoder.classifier = nn.Linear(256,num_classes)

In [16]:
if __name__ == "__main__":
    import time
    
    print("\nLoading training data...")
    train_df = pd.read_csv(cfg.train_csv)
    taxonomy_df = pd.read_csv(cfg.taxonomy_csv)

    print("\nStarting training...")
    print(f"LOAD_DATA is set to {cfg.LOAD_DATA}")
    if cfg.LOAD_DATA:
        print("Using pre-computed mel spectrograms from NPY file")
    else:
        print("Will generate spectrograms on-the-fly during training")
    
    run_training(train_df, cfg)
    
    print("\nTraining complete!")


Loading training data...

Starting training...
LOAD_DATA is set to True
Using pre-computed mel spectrograms from NPY file
Loading pre-computed mel spectrograms from NPY file...
Loaded 28564 pre-computed mel spectrograms

Training set: 22851 samples
Validation set: 5713 samples
Found 22851 matching spectrograms for train dataset out of 22851 samples
Found 5713 matching spectrograms for valid dataset out of 5713 samples


Downloading: "https://download.pytorch.org/models/efficientnet_b0_rwightman-7f5810bc.pth" to /root/.cache/torch/hub/checkpoints/efficientnet_b0_rwightman-7f5810bc.pth
100%|██████████| 20.5M/20.5M [00:00<00:00, 170MB/s]


Using custom pretrained model

Epoch 1/10


Training: 100%|██████████| 714/714 [01:39<00:00,  7.18it/s, train_loss=0.0322, lr=0.0005]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.05it/s]


Train Loss: 0.0418, Train AUC: 0.5103
Val Loss: 0.0306, Val AUC: 0.5761
New best AUC: 0.5761 at epoch 1

Epoch 2/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.0257, lr=0.000488]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.19it/s]


Train Loss: 0.0282, Train AUC: 0.6864
Val Loss: 0.0251, Val AUC: 0.8266
New best AUC: 0.8266 at epoch 2

Epoch 3/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.0233, lr=0.000452]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.17it/s]


Train Loss: 0.0238, Train AUC: 0.8330
Val Loss: 0.0218, Val AUC: 0.8895
New best AUC: 0.8895 at epoch 3

Epoch 4/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.27it/s, train_loss=0.0225, lr=0.000397]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.19it/s]


Train Loss: 0.0214, Train AUC: 0.8870
Val Loss: 0.0205, Val AUC: 0.9077
New best AUC: 0.9077 at epoch 4

Epoch 5/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.27it/s, train_loss=0.0184, lr=0.000328]
Validation: 100%|██████████| 179/179 [00:05<00:00, 30.93it/s]


Train Loss: 0.0196, Train AUC: 0.9156
Val Loss: 0.0189, Val AUC: 0.9195
New best AUC: 0.9195 at epoch 5

Epoch 6/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.019, lr=0.000251] 
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.26it/s]


Train Loss: 0.0181, Train AUC: 0.9337
Val Loss: 0.0179, Val AUC: 0.9261
New best AUC: 0.9261 at epoch 6

Epoch 7/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.0164, lr=0.000173]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.15it/s]


Train Loss: 0.0169, Train AUC: 0.9470
Val Loss: 0.0174, Val AUC: 0.9290
New best AUC: 0.9290 at epoch 7

Epoch 8/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.016, lr=0.000104] 
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.16it/s]


Train Loss: 0.0159, Train AUC: 0.9548
Val Loss: 0.0170, Val AUC: 0.9327
New best AUC: 0.9327 at epoch 8

Epoch 9/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.0136, lr=4.87e-5]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.20it/s]


Train Loss: 0.0153, Train AUC: 0.9596
Val Loss: 0.0168, Val AUC: 0.9330
New best AUC: 0.9330 at epoch 9

Epoch 10/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.27it/s, train_loss=0.0154, lr=1.32e-5]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.20it/s]


Train Loss: 0.0150, Train AUC: 0.9633
Val Loss: 0.0167, Val AUC: 0.9334
New best AUC: 0.9334 at epoch 10

Best AUC for fold 0: 0.9334 at epoch 10

Training set: 22851 samples
Validation set: 5713 samples
Found 22851 matching spectrograms for train dataset out of 22851 samples
Found 5713 matching spectrograms for valid dataset out of 5713 samples
Using custom pretrained model

Epoch 1/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.0301, lr=0.0005]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.12it/s]


Train Loss: 0.0419, Train AUC: 0.5092
Val Loss: 0.0305, Val AUC: 0.6112
New best AUC: 0.6112 at epoch 1

Epoch 2/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.27it/s, train_loss=0.0252, lr=0.000488]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.00it/s]


Train Loss: 0.0285, Train AUC: 0.6739
Val Loss: 0.0251, Val AUC: 0.8375
New best AUC: 0.8375 at epoch 2

Epoch 3/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.27it/s, train_loss=0.022, lr=0.000452] 
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.13it/s]


Train Loss: 0.0242, Train AUC: 0.8299
Val Loss: 0.0219, Val AUC: 0.8940
New best AUC: 0.8940 at epoch 3

Epoch 4/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.27it/s, train_loss=0.0212, lr=0.000397]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.16it/s]


Train Loss: 0.0217, Train AUC: 0.8880
Val Loss: 0.0200, Val AUC: 0.9136
New best AUC: 0.9136 at epoch 4

Epoch 5/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.0205, lr=0.000328]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.13it/s]


Train Loss: 0.0198, Train AUC: 0.9127
Val Loss: 0.0188, Val AUC: 0.9235
New best AUC: 0.9235 at epoch 5

Epoch 6/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.27it/s, train_loss=0.0175, lr=0.000251]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.08it/s]


Train Loss: 0.0183, Train AUC: 0.9300
Val Loss: 0.0177, Val AUC: 0.9336
New best AUC: 0.9336 at epoch 6

Epoch 7/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.27it/s, train_loss=0.0165, lr=0.000173]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.10it/s]


Train Loss: 0.0170, Train AUC: 0.9465
Val Loss: 0.0172, Val AUC: 0.9368
New best AUC: 0.9368 at epoch 7

Epoch 8/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.27it/s, train_loss=0.0158, lr=0.000104]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.22it/s]


Train Loss: 0.0160, Train AUC: 0.9560
Val Loss: 0.0166, Val AUC: 0.9395
New best AUC: 0.9395 at epoch 8

Epoch 9/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.27it/s, train_loss=0.0156, lr=4.87e-5]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.11it/s]


Train Loss: 0.0154, Train AUC: 0.9611
Val Loss: 0.0164, Val AUC: 0.9398
New best AUC: 0.9398 at epoch 9

Epoch 10/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.27it/s, train_loss=0.0156, lr=1.32e-5]
Validation: 100%|██████████| 179/179 [00:05<00:00, 30.71it/s]


Train Loss: 0.0150, Train AUC: 0.9636
Val Loss: 0.0164, Val AUC: 0.9402
New best AUC: 0.9402 at epoch 10

Best AUC for fold 1: 0.9402 at epoch 10

Training set: 22851 samples
Validation set: 5713 samples
Found 22851 matching spectrograms for train dataset out of 22851 samples
Found 5713 matching spectrograms for valid dataset out of 5713 samples
Using custom pretrained model

Epoch 1/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.27it/s, train_loss=0.0299, lr=0.0005]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.13it/s]


Train Loss: 0.0416, Train AUC: 0.4976
Val Loss: 0.0303, Val AUC: 0.6095
New best AUC: 0.6095 at epoch 1

Epoch 2/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.0263, lr=0.000488]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.12it/s]


Train Loss: 0.0278, Train AUC: 0.7018
Val Loss: 0.0247, Val AUC: 0.8276
New best AUC: 0.8276 at epoch 2

Epoch 3/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.0226, lr=0.000452]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.16it/s]


Train Loss: 0.0236, Train AUC: 0.8307
Val Loss: 0.0216, Val AUC: 0.8953
New best AUC: 0.8953 at epoch 3

Epoch 4/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.0212, lr=0.000397]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.02it/s]


Train Loss: 0.0211, Train AUC: 0.8893
Val Loss: 0.0199, Val AUC: 0.9152
New best AUC: 0.9152 at epoch 4

Epoch 5/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.0188, lr=0.000328]
Validation: 100%|██████████| 179/179 [00:05<00:00, 30.82it/s]


Train Loss: 0.0193, Train AUC: 0.9203
Val Loss: 0.0184, Val AUC: 0.9294
New best AUC: 0.9294 at epoch 5

Epoch 6/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.0166, lr=0.000251]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.09it/s]


Train Loss: 0.0177, Train AUC: 0.9379
Val Loss: 0.0174, Val AUC: 0.9367
New best AUC: 0.9367 at epoch 6

Epoch 7/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.015, lr=0.000173] 
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.06it/s]


Train Loss: 0.0164, Train AUC: 0.9512
Val Loss: 0.0169, Val AUC: 0.9399
New best AUC: 0.9399 at epoch 7

Epoch 8/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.25it/s, train_loss=0.0154, lr=0.000104]
Validation: 100%|██████████| 179/179 [00:05<00:00, 30.84it/s]


Train Loss: 0.0154, Train AUC: 0.9590
Val Loss: 0.0165, Val AUC: 0.9431
New best AUC: 0.9431 at epoch 8

Epoch 9/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.014, lr=4.87e-5] 
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.09it/s]


Train Loss: 0.0148, Train AUC: 0.9635
Val Loss: 0.0163, Val AUC: 0.9434
New best AUC: 0.9434 at epoch 9

Epoch 10/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.0136, lr=1.32e-5]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.18it/s]


Train Loss: 0.0144, Train AUC: 0.9668
Val Loss: 0.0162, Val AUC: 0.9441
New best AUC: 0.9441 at epoch 10

Best AUC for fold 2: 0.9441 at epoch 10

Training set: 22851 samples
Validation set: 5713 samples
Found 22851 matching spectrograms for train dataset out of 22851 samples
Found 5713 matching spectrograms for valid dataset out of 5713 samples
Using custom pretrained model

Epoch 1/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.25it/s, train_loss=0.0302, lr=0.0005]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.09it/s]


Train Loss: 0.0419, Train AUC: 0.5084
Val Loss: 0.0303, Val AUC: 0.6384
New best AUC: 0.6384 at epoch 1

Epoch 2/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.0256, lr=0.000488]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.11it/s]


Train Loss: 0.0274, Train AUC: 0.7145
Val Loss: 0.0246, Val AUC: 0.8518
New best AUC: 0.8518 at epoch 2

Epoch 3/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.27it/s, train_loss=0.0223, lr=0.000452]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.03it/s]


Train Loss: 0.0234, Train AUC: 0.8476
Val Loss: 0.0221, Val AUC: 0.8915
New best AUC: 0.8915 at epoch 3

Epoch 4/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.27it/s, train_loss=0.0188, lr=0.000397]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.15it/s]


Train Loss: 0.0210, Train AUC: 0.8942
Val Loss: 0.0198, Val AUC: 0.9153
New best AUC: 0.9153 at epoch 4

Epoch 5/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.019, lr=0.000328] 
Validation: 100%|██████████| 179/179 [00:05<00:00, 30.41it/s]


Train Loss: 0.0191, Train AUC: 0.9221
Val Loss: 0.0184, Val AUC: 0.9300
New best AUC: 0.9300 at epoch 5

Epoch 6/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.0178, lr=0.000251]
Validation: 100%|██████████| 179/179 [00:05<00:00, 30.93it/s]


Train Loss: 0.0175, Train AUC: 0.9388
Val Loss: 0.0174, Val AUC: 0.9352
New best AUC: 0.9352 at epoch 6

Epoch 7/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.27it/s, train_loss=0.0169, lr=0.000173]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.14it/s]


Train Loss: 0.0163, Train AUC: 0.9514
Val Loss: 0.0168, Val AUC: 0.9413
New best AUC: 0.9413 at epoch 7

Epoch 8/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.27it/s, train_loss=0.0149, lr=0.000104]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.07it/s]


Train Loss: 0.0153, Train AUC: 0.9594
Val Loss: 0.0164, Val AUC: 0.9434
New best AUC: 0.9434 at epoch 8

Epoch 9/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.27it/s, train_loss=0.0143, lr=4.87e-5]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.11it/s]


Train Loss: 0.0146, Train AUC: 0.9644
Val Loss: 0.0162, Val AUC: 0.9443
New best AUC: 0.9443 at epoch 9

Epoch 10/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.27it/s, train_loss=0.014, lr=1.32e-5] 
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.03it/s]


Train Loss: 0.0143, Train AUC: 0.9667
Val Loss: 0.0161, Val AUC: 0.9444
New best AUC: 0.9444 at epoch 10

Best AUC for fold 3: 0.9444 at epoch 10

Training set: 22852 samples
Validation set: 5712 samples
Found 22852 matching spectrograms for train dataset out of 22852 samples
Found 5712 matching spectrograms for valid dataset out of 5712 samples
Using custom pretrained model

Epoch 1/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.0302, lr=0.0005]
Validation: 100%|██████████| 179/179 [00:05<00:00, 30.92it/s]


Train Loss: 0.0417, Train AUC: 0.5113
Val Loss: 0.0306, Val AUC: 0.5865
New best AUC: 0.5865 at epoch 1

Epoch 2/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.026, lr=0.000488] 
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.17it/s]


Train Loss: 0.0283, Train AUC: 0.6697
Val Loss: 0.0253, Val AUC: 0.8188
New best AUC: 0.8188 at epoch 2

Epoch 3/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.024, lr=0.000452] 
Validation: 100%|██████████| 179/179 [00:05<00:00, 30.92it/s]


Train Loss: 0.0240, Train AUC: 0.8213
Val Loss: 0.0220, Val AUC: 0.8965
New best AUC: 0.8965 at epoch 3

Epoch 4/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.27it/s, train_loss=0.0207, lr=0.000397]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.16it/s]


Train Loss: 0.0214, Train AUC: 0.8849
Val Loss: 0.0203, Val AUC: 0.9153
New best AUC: 0.9153 at epoch 4

Epoch 5/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.019, lr=0.000328] 
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.14it/s]


Train Loss: 0.0194, Train AUC: 0.9181
Val Loss: 0.0188, Val AUC: 0.9284
New best AUC: 0.9284 at epoch 5

Epoch 6/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.27it/s, train_loss=0.016, lr=0.000251] 
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.13it/s]


Train Loss: 0.0180, Train AUC: 0.9348
Val Loss: 0.0178, Val AUC: 0.9349
New best AUC: 0.9349 at epoch 6

Epoch 7/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.0146, lr=0.000173]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.11it/s]


Train Loss: 0.0167, Train AUC: 0.9494
Val Loss: 0.0171, Val AUC: 0.9385
New best AUC: 0.9385 at epoch 7

Epoch 8/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.27it/s, train_loss=0.0166, lr=0.000104]
Validation: 100%|██████████| 179/179 [00:05<00:00, 30.90it/s]


Train Loss: 0.0158, Train AUC: 0.9569
Val Loss: 0.0168, Val AUC: 0.9389
New best AUC: 0.9389 at epoch 8

Epoch 9/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.26it/s, train_loss=0.0157, lr=4.87e-5]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.17it/s]


Train Loss: 0.0151, Train AUC: 0.9634
Val Loss: 0.0166, Val AUC: 0.9397
New best AUC: 0.9397 at epoch 9

Epoch 10/10


Training: 100%|██████████| 714/714 [01:38<00:00,  7.27it/s, train_loss=0.0144, lr=1.32e-5]
Validation: 100%|██████████| 179/179 [00:05<00:00, 31.15it/s]


Train Loss: 0.0147, Train AUC: 0.9654
Val Loss: 0.0165, Val AUC: 0.9390

Best AUC for fold 4: 0.9397 at epoch 9

Cross-Validation Results:
Fold 0: 0.9334
Fold 1: 0.9402
Fold 2: 0.9441
Fold 3: 0.9444
Fold 4: 0.9397
Mean AUC: 0.9404

Training complete!


In [23]:
# model = BirdCLEFModel(cfg).to(cfg.device)
import os
os.chdir(r'/kaggle/working')
from IPython.display import FileLink
FileLink(r'model_fold5.pth')

In [None]:
checkpoint_path = "/kaggle/input/pretraining_module/pytorch/default/1/best_model.pth"  # Change this to your checkpoint path
checkpoint = torch.load(checkpoint_path, map_location="cpu")  # Use "cuda" if using GPU
encoder_state_dict = {k: v for k, v in checkpoint.items() if "encoder" in k}

# Load the encoder weights
# model.load_state_dict(encoder_state_dict, strict=False)
# Load the state_dict into the model
# model.load_state_dict(checkpoint, strict=False)  # Set strict=True if keys must match exactly

In [None]:
from torchvision import transforms, models
encoder = models.efficientnet_b0(pretrained=False)


In [None]:
encoder.state_dict

In [None]:
model.state_dict

In [None]:
model.state_dict