# **Summary**

The goal of this project is to create a **reliable softmax-based bird classification model** that can generate high-quality pseudo labels for rare bird classes, particularly those **not covered by main analyzers** such as GBVC and BirdNET.

---

### Methodology

- The model predicts the **primary bird species present in 5-second audio clips** using a softmax output layer.
- These pseudo labels are intended to improve a subsequent **main multi-label classifier** that handles overlapping and multiple bird species.
- We use a MobileNetV2 backbone with pre-trained ImageNet weights.
- Training data consists primarily of clips known to contain **only one bird species** to simplify learning and improve label reliability.
- Although some training samples had multiple labels (reflecting real-world ambiguity), we applied **label normalization**, rescaling each label vector so that its entries sum to 1. (This is not label smoothing in the usual sense, since no probability mass is spread over absent classes.) This allows the model to learn soft, split confidences while still framing the task as a multi-class problem.
- BirdNET pseudo labels were used as part of the training data to provide soft labels and expand coverage.
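
As a minimal sketch of the label normalization described above (the helper name `normalize_soft_labels` and the 4-class example are illustrative assumptions, not part of the notebook), a multi-label vector can be rescaled so that it is a valid softmax target:

```python
import numpy as np

def normalize_soft_labels(soft_labels):
    """Scale a non-negative label vector so its entries sum to 1."""
    soft_labels = np.asarray(soft_labels, dtype=float)
    total = soft_labels.sum()
    # Leave an all-zero vector untouched to avoid dividing by zero.
    return soft_labels / total if total > 0 else soft_labels

# Hypothetical 4-class clip carrying two labels with confidences 1.0 and 0.5.
target = normalize_soft_labels([1.0, 0.0, 0.5, 0.0])
print(target)  # the two active classes receive 2/3 and 1/3 of the mass
```

The generator later in the notebook performs the same division in place on the label vector before it is passed to the categorical cross-entropy loss.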

---

### Results

- The final model achieved an **overall accuracy of ~75%**.
- For the **non-BirdNET classes**, which are rarer and outside the main label sets, the model also maintained roughly **75.9% accuracy**.
- These metrics suggest the model is reliable enough to generate pseudo labels across many bird species, including those underrepresented in existing large-scale analyzers.

---



# **Imports**

In [2]:
import os
import random
import warnings

import numpy as np
import pandas as pd
import librosa
import librosa.display
import soundfile as sf
import joblib
import ast

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.utils import Sequence, to_categorical
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import CategoricalAccuracy
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input as mobilenet_preprocess_input
from tensorflow.keras.models import load_model
from tensorflow.keras import mixed_precision
import tensorflow.keras.backend as K

from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import precision_score, recall_score, f1_score

from collections import Counter
from google.colab import drive
import zipfile
from tqdm import tqdm  # Import tqdm for progress bars and output writing

drive.mount('/content/drive')
import sys
sys.path.append('/content/drive/MyDrive/Main_Birdclef/scripts')
import birdclef_utils

# Filter out UserWarnings from librosa related to PySoundFile/audioread
warnings.filterwarnings("ignore", message="PySoundFile failed. Trying audioread instead.", category=UserWarning)


IMPLEMENTATION = 'softmax'



Mounted at /content/drive


# **Retrieve Data From Drive**

This step involves unzipping and retrieving the necessary data files for the BirdCLEF project.

- The main folder is extracted from the primary Kaggle download ZIP file named `birdclef-2025.zip`. This contains the raw and structured BirdCLEF dataset.

- Additionally, supplementary and processed data generated from other notebooks, such as `ColabUploads.zip`, are also retrieved and processed. These contain derived files, intermediate results, or additional resources required for the workflow.

The commands below initiate these retrieval and processing steps:



In [5]:
birdclef_utils.retrieve_and_process_birdclef_data()
birdclef_utils.retrieve_and_process_birdclef_data(zip_filename='ColabUploads.zip')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Successfully extracted all files from birdclef-2025.zip to /content/data
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Successfully extracted all files from ColabUploads.zip to /content/data


# **Directories**

In [6]:
main_dir='/content/data/'
main_processed_dir=os.path.join(main_dir,'ColabUploads')
processed_dir=os.path.join(main_processed_dir,'KaggleUploads')

drive_dir='/content/drive/MyDrive'
main_birdclef_dir=os.path.join(drive_dir,'Main_Birdclef')
csv_dir=os.path.join(main_birdclef_dir,'CSVs')
supplemental_files_dir=os.path.join(main_birdclef_dir,'supplemental_files')
models_dir=os.path.join(main_birdclef_dir,'models')


# **Training/Validation/Testing Filenames**

In [7]:
original_train_filenames=np.load(os.path.join(processed_dir,'train_filenames.npy'))
original_test_filenames=np.load(os.path.join(processed_dir,'test_filenames.npy'))
original_val_filenames=np.load(os.path.join(processed_dir,'val_filenames.npy'))

# **Start Times with Confidence From Birdnet (Train Audio)**

In [8]:
df_start_times=pd.read_csv(os.path.join(csv_dir, 'birdnet_train_labels_final.csv'))
df_start_times

Unnamed: 0,filename,start_time,end_time,confidence,scientific_name,primary_label
0,grepot1/iNat1214104.ogg,0.0,3.0,0.662340,Nyctibius grandis,grepot1
1,grepot1/XC135771.ogg,3.0,6.0,0.981190,Nyctibius grandis,grepot1
2,grepot1/XC135771.ogg,6.0,9.0,0.372203,Nyctibius grandis,grepot1
3,grepot1/XC135771.ogg,12.0,15.0,0.979664,Nyctibius grandis,grepot1
4,grepot1/XC135771.ogg,21.0,24.0,0.966239,Nyctibius grandis,grepot1
...,...,...,...,...,...,...
174797,plukit1/XC421627.ogg,3.0,6.0,0.471399,Ictinia plumbea,plukit1
174798,plukit1/XC421627.ogg,6.0,9.0,0.987835,Ictinia plumbea,plukit1
174799,plukit1/XC421627.ogg,9.0,12.0,0.983746,Ictinia plumbea,plukit1
174800,plukit1/XC421627.ogg,12.0,15.0,0.698430,Ictinia plumbea,plukit1


In [9]:
start_time_info_secondary_labels=pd.read_csv(os.path.join(csv_dir, 'birdnet_secondary_label_detections.csv'))

# **Start Times and Confidences From Birdnet (Train Soundscapes)**

In [10]:
labeled_soundscapes=pd.read_csv(os.path.join(csv_dir, 'birdnet_soft_labels.csv'))
labeled_soundscapes


Unnamed: 0,file,start_time,end_time,confidence,scientific_name,primary_label
0,H98_20230430_114000.ogg,21.0,24.0,0.476130,Pheugopedius fasciatoventris,blbwre1
1,H98_20230430_114000.ogg,27.0,30.0,0.171864,Pheugopedius fasciatoventris,blbwre1
2,H98_20230430_114000.ogg,48.0,51.0,0.330452,Pheugopedius fasciatoventris,blbwre1
3,H16_20230503_085500.ogg,6.0,9.0,0.986548,Tapera naevia,strcuc1
4,H16_20230503_085500.ogg,9.0,12.0,0.996310,Tapera naevia,strcuc1
...,...,...,...,...,...,...
27382,H84_20230422_185500.ogg,33.0,36.0,0.114390,Pulsatrix perspicillata,speowl1
27383,H84_20230422_185500.ogg,36.0,39.0,0.136469,Pulsatrix perspicillata,speowl1
27384,H84_20230422_185500.ogg,42.0,45.0,0.169751,Pulsatrix perspicillata,speowl1
27385,H84_20230422_185500.ogg,45.0,48.0,0.107960,Pulsatrix perspicillata,speowl1


# **Filter Soundscapes**
- Include only samples where BirdNET predicts exactly one species

In [11]:
soundscape_label_counts=labeled_soundscapes.groupby(['file','start_time'])['primary_label'].size().reset_index(name='count')
# Merge the count column back to the original dataframe on 'file' and 'start_time'
labeled_soundscapes_with_counts = labeled_soundscapes.merge(
    soundscape_label_counts,
    left_on=['file', 'start_time'],
    right_on=['file', 'start_time']
)

# Filter rows where count is exactly 1
filtered_soundscapes = labeled_soundscapes_with_counts[labeled_soundscapes_with_counts['count'] == 1]
filtered_soundscapes=filtered_soundscapes[filtered_soundscapes['confidence']>0.30]

train_soundscape_fnames=filtered_soundscapes['file'].unique()

# **Load train.csv (speech_cleaned_audio_with_duration.csv) and Saved Label Encoder**

In [12]:
warnings.filterwarnings('ignore')
processed_train_csv_path=os.path.join(processed_dir,'speech_cleaned_audio_with_duration.csv')

df = pd.read_csv(processed_train_csv_path, dtype={'primary_label': 'object'})
print(f'Sucessfully loaded processed train.csv from: {processed_train_csv_path}')
df['secondary_labels'] = df['secondary_labels'].apply(ast.literal_eval)

# Path to the label encoder file
label_encoder_path = os.path.join(supplemental_files_dir,'bird_label_encoder.joblib')

try:
    # Load the label encoder
    label_encoder = joblib.load(label_encoder_path)
    print(f"Successfully loaded label_encoder from: {label_encoder_path}")
except FileNotFoundError:
    print(f"Error: File not found at: {label_encoder_path}. "
          f"Please ensure the dataset 'processing-models' is attached and the path is correct.")
taxonomy_path=os.path.join(main_dir,'taxonomy.csv')
taxonomy=pd.read_csv(taxonomy_path)
print(f"Successfully loaded taxonomy data from: {taxonomy_path}")

Sucessfully loaded processed train.csv from: /content/data/ColabUploads/KaggleUploads/speech_cleaned_audio_with_duration.csv
Successfully loaded label_encoder from: /content/drive/MyDrive/Main_Birdclef/supplemental_files/bird_label_encoder.joblib
Successfully loaded taxonomy data from: /content/data/taxonomy.csv


# **Oversampling**
- Oversample underrepresented labels and flag for augmentation

In [14]:
# Ensure train filenames come from original train filenames based on original split
train=df[df['cleaned_filename'].isin(original_train_filenames)]

# Get the largest number of files associated with a single bird
highest_bird_count=train['primary_label'].value_counts().sort_values(ascending=False).values[0]

birds=train['primary_label'].unique()
train_filenames=[]

# To track original balance, which may be used for weighting during training
train_bird_file_counts={}


# For each bird species, build a list of associated filenames with augmentation flags
for bird in birds:
    data=train[train['primary_label']==bird]
    all_files=list(data['cleaned_filename'].values)
    file_count=0
    original_file_count=0
    for file in all_files:
       # Add all original files (with augment_flag=False)
        train_filenames.append((file,False))
        original_file_count+=1
        file_count+=1
    if file_count>125:
      # If more than 125 files, add 20 augmented copies randomly
      for i in range(20):
        train_filenames.append((random.choice(all_files),True))
        file_count+=1
    while file_count<highest_bird_count:
        # Oversample with augmented files until count matches highest bird count
        train_filenames.append((random.choice(all_files),True))
        file_count+=1
    train_bird_file_counts[bird]=original_file_count

# Shuffle the train filenames list and limit its size to 25600 randomly sampled items
random.shuffle(train_filenames)
train_filenames = random.sample(train_filenames, 25600)
val_filenames=[]

# Prepare validation filenames with augmentation flags set to False
val_filenames=[(f,False) for f in original_val_filenames]



warnings.filterwarnings("ignore", category=FutureWarning)

# --- Map cleaned filenames to original filenames ---
filename_mapping = df.set_index('cleaned_filename')['filename'].to_dict()
train_filenames=[(filename_mapping[f[0]],f[1]) for f in train_filenames]
val_filenames=[(filename_mapping[f[0]],f[1]) for f in val_filenames]

# --- Identify one-bird files and filter train and validation filenames ---

# Create a boolean column indicating if the record is associated with exactly one bird
df['isOneBird']=df['secondary_labels'].apply(lambda x: True if len(x)==0 or x[0]=='' else False)
one_bird_files=df[df['isOneBird']]['filename'].values

# Load labeled soundscapes soft labels data to filter filenames
labeled_soundscapes=pd.read_csv(os.path.join(csv_dir, 'birdnet_soft_labels.csv'))
labeled_soundscape_fnames=labeled_soundscapes['file'].unique()

# Filter train filenames to only include those with one bird
train_filenames=[f for f in train_filenames if f[0] in one_bird_files]+[(f,False) for f in original_test_filenames[:4000] if f in one_bird_files]
train_filenames=train_filenames+[(f,random.choice([True,False,False])) for f in labeled_soundscape_fnames[:3000] if f in train_soundscape_fnames]

# Filter validation filenames to only include those with one bird
val_filenames=[f for f in val_filenames if f[0] in one_bird_files]
print('Length of Train Filenames', len(train_filenames))
print('Length of Validation Filenames', len(val_filenames))

Length of Train Filenames 28680
Length of Validation Filenames 2716


# **Pseudo Labeling and the `RetreiveData` Class**

This section describes the purpose and functionality of the `RetreiveData` class, which prepares data for training a bird classification model with a **softmax output** layer.

The goal is to focus on samples with **only one bird species present**, which simplifies the classification task by excluding multi-bird or ambiguous samples. The class integrates multiple data sources and applies filtering to achieve this.

---

## Overview of Data Sources and Their Roles

- **`birdnet_train_labels_final.csv` (`self.start_time_info`)**  
  Contains BirdNET model predictions specifically for the **train_audio** dataset. These predictions correspond to BirdCLEF-assigned bird labels and include confidence scores and start times.

- **`birdnet_soft_labels.csv` (`self.labeled_soundscapes`)**  
  Holds BirdNET model predictions for **soundscape audio** recordings. This dataset provides soft labels (probabilistic confidence values) for bird calls detected in longer and more complex audio segments.

- **`birdnet_secondary_label_detections.csv` (`self.start_time_info_secondary_labels`)**  
  Contains BirdNET confidence scores for files that have secondary bird labels. These confidences correspond *only* to the secondary labels (filtered out).


- **`non_birdnet_start_time_and_soft_label_info.csv` (`self.start_time_info_two`)**  
  This file covers labels that are **not part of the BirdNET species classes**. These predictions were generated by a completely separate ResNet-based model trained from scratch, which used MFCC and mel spectrogram features. This dataset provides confidence scores (soft labels) and start time information for non-BirdNET species or background sounds.

- **Non-BirdNET species labels (`non_birdnet_labels.npy`)**  
  A NumPy array listing species labels outside of the BirdNET class set. These are integrated in the data retrieval step to handle labels that BirdNET cannot predict.

---

## Key Methods and Processing Logic

### Filtering for single-bird samples

To focus on samples with only one bird species, the method `get_one_label_soundscapes` filters the soundscape predictions, returning only those entries where exactly one bird call was detected for a given file and start time, and where the confidence is above a threshold of 0.30. This helps create cleaner, less ambiguous training targets for the softmax model.
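
The filtering rule above can be sketched on a toy table. This is an illustrative variant, not the notebook's implementation: it uses `groupby(...).transform('size')` instead of a merge, and the function name `keep_single_label_windows` and the data are made up.

```python
import pandas as pd

def keep_single_label_windows(detections, min_confidence=0.30):
    """Keep rows whose (file, start_time) window has exactly one detection
    and whose confidence exceeds min_confidence."""
    # Number of detections sharing each (file, start_time) window, per row.
    per_window = detections.groupby(['file', 'start_time'])['primary_label'].transform('size')
    mask = (per_window == 1) & (detections['confidence'] > min_confidence)
    return detections[mask]

# Made-up detections for illustration only.
toy = pd.DataFrame({
    'file': ['a.ogg', 'a.ogg', 'a.ogg', 'b.ogg'],
    'start_time': [0.0, 0.0, 3.0, 0.0],
    'primary_label': ['sp1', 'sp2', 'sp1', 'sp3'],
    'confidence': [0.9, 0.8, 0.6, 0.2],
})
kept = keep_single_label_windows(toy)
print(kept)
```

In this toy table the window at `a.ogg`, 0.0 s is dropped because two species were detected, and the `b.ogg` row is dropped because its confidence is below 0.30, leaving only the `a.ogg` window at 3.0 s.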

### Data Aggregation and Thresholding

The class concatenates all label sources into a unified dataframe (`self.data`), including train audio labels, soundscape labels, secondary detections, and non-BirdCLEF labels.

When retrieving data for a given file, the method `get_file_and_start_time`:

1. Determines if the file corresponds to a BirdCLEF training bird or a soundscape.
2. Samples one relevant entry from the aggregated data for that filename.
3. Identifies the start time and retrieves all labels and confidence scores for the matching filename-start time window.
4. Applies species-specific confidence thresholds using the `apply_threshold` method to convert raw confidences into scaled or hard binary labels:
   - For **train_audio** files (i.e., those with assigned BirdCLEF labels), *all labels* present for that file are converted to **hard labels** (confidence set to 1.0) since we are confident those birds are present.
   - For **soundscape** files, confidence values are scaled or thresholded according to species-specific confidence thresholds. These thresholds are computed as the **lower quartile of BirdNET confidence scores** for each bird species in the train_audio dataset—representing a conservative estimate of the minimum confidence level observed for samples where the bird is known (or highly likely) to be present.
     - Specifically, the thresholds are calculated by grouping BirdNET confidences from the train_audio files by `primary_label` and taking the 25th percentile (lower quartile) of the confidence values for each species.
     - This approach helps to set adaptive, per-species thresholds that reflect real confidence distributions for known positive samples, thereby improving the reliability of pseudo-labeling on soundscape data.

5. Returns the filename, primary labels, processed soft labels, start time, and file kind.
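
The threshold logic in step 4 can be sketched as follows. This is a hedged illustration under stated assumptions: the notebook loads precomputed thresholds from `birdnet_confidence_thresholds.npy`, so the quantile computation here only mirrors the description above, and the confidences and the helper name `scale_confidence` are made up.

```python
import pandas as pd

# Made-up BirdNET confidences on train_audio clips, for illustration only.
train_conf = pd.DataFrame({
    'primary_label': ['sp1'] * 5 + ['sp2'] * 5,
    'confidence': [0.2, 0.4, 0.6, 0.8, 1.0, 0.1, 0.2, 0.3, 0.4, 0.5],
})

# 25th percentile (lower quartile) of confidence per species -> {label: threshold}.
thresholds = train_conf.groupby('primary_label')['confidence'].quantile(0.25).to_dict()

def scale_confidence(label, conf, thresholds):
    """Return 1.0 above the species threshold, otherwise conf / threshold."""
    threshold = thresholds.get(label)
    if threshold is None or threshold <= 0:
        return conf  # no usable threshold: keep the raw confidence
    return 1.0 if conf > threshold else conf / threshold

print(thresholds)
print(scale_confidence('sp1', 0.9, thresholds))  # above threshold, becomes a hard label
print(scale_confidence('sp1', 0.2, thresholds))  # below threshold, scaled down
```

A soundscape detection at or below its species threshold therefore keeps a fractional weight rather than being discarded, while anything above the threshold is treated as a confident positive.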

---

This curated and filtered label data serves as pseudo-labels for training a **softmax output** bird classification model, emphasizing single-bird audio segments from both training and soundscape datasets while also handling species that BirdNET does not cover.



In [16]:
class RetreiveData:
  def __init__(self,labels_df,implementation='softmax'):
    self.labels_df=labels_df
    self.label_encoder = joblib.load(os.path.join(supplemental_files_dir,'bird_label_encoder.joblib'))
    self.birdnet_thresholds=np.load(os.path.join(supplemental_files_dir,'birdnet_confidence_thresholds.npy'),allow_pickle=True).item()
    self.start_time_info=pd.read_csv(os.path.join(csv_dir, 'birdnet_train_labels_final.csv'))
    self.start_time_info_secondary_labels=pd.read_csv(os.path.join(csv_dir, 'birdnet_secondary_label_detections.csv'))
    self.labeled_soundscapes=pd.read_csv(os.path.join(csv_dir, 'birdnet_soft_labels.csv'))
    if implementation=='softmax':
      self.labeled_soundscapes=self.get_one_label_soundscapes(self.labeled_soundscapes)

    self.labeled_soundscapes=self.labeled_soundscapes.rename(columns={'file':'filename'})
    self.list_of_labeled_soundscapes=list(self.labeled_soundscapes['filename'].values)
    self.list_of_train_filenames=list(self.start_time_info['filename'].values)
    self.start_time_info_two=pd.read_csv(os.path.join(csv_dir, 'non_birdnet_start_time_and_soft_label_info.csv'))
    self.data=pd.concat([self.start_time_info,self.labeled_soundscapes,self.start_time_info_two,self.start_time_info_secondary_labels])
    self.non_birdnet_labels=np.load(os.path.join(supplemental_files_dir,'non_birdnet_labels.npy'),allow_pickle=True)
  def get_one_label_soundscapes(self,labeled_soundscapes):
    soundscape_label_counts=labeled_soundscapes.groupby(['file','start_time'])['primary_label'].size().reset_index(name='count')
    # Merge the count column back to the original dataframe on 'file' and 'start_time'
    labeled_soundscapes_with_counts = labeled_soundscapes.merge(
        soundscape_label_counts,
        left_on=['file', 'start_time'],
        right_on=['file', 'start_time']
      )
    # Filter rows where count is exactly 1
    filtered_soundscapes = labeled_soundscapes_with_counts[labeled_soundscapes_with_counts['count'] == 1]
    filtered_soundscapes=filtered_soundscapes[filtered_soundscapes['confidence']>0.30]
    return filtered_soundscapes
  def apply_threshold(self, row, thresholds):
      """
      Scale or hard-label BirdNET confidence values
      based on per-species thresholds.

      Args:
          row (pd.Series): Must have 'primary_label' and 'confidence' columns.
          thresholds (dict): Mapping {label: threshold_value}.

      Returns:
          float: Scaled confidence or hard label (1.0).
      """
      label = row['primary_label']
      conf = row['confidence']


      # Get species-specific threshold, fallback None if missing
      threshold = thresholds.get(label, None)

      if threshold is None or threshold <= 0:
          # No valid threshold — return original conf
          return conf

      # Above threshold → hard-label
      if conf > threshold:
          return 1.0

      # Otherwise → scale relative to threshold
      return conf / threshold
  def get_file_and_start_time(self,filename):
    file_info=filename.split('/')[0]

    if file_info in self.label_encoder.classes_:
      file_kind='train_birds'
    else:
      file_kind='soundscapes'
    primary_labels=[]
    soft_labels=[]
    data=self.data[self.data['filename']==filename]
    try:
      data=data.sample(1)
    except:
      try:
        if file_info in self.non_birdnet_labels:
          return filename,[file_info],[1],None,file_kind
        else:
          return filename,[file_info],[1],"skip",file_kind
      except:
        return None
    start_time=data['start_time'].values[0]
    try:
      filename=data['filename'].values[0]
    except:
      filename=data['file'].values[0]

    # Make sure both filename and time window are matched
    window_mask = (
        (self.data['filename'] == filename) &
        (self.data['start_time'] == start_time)
    )
    data = self.data[window_mask]

    # Group by label and sum the confidences within the window
    agg = data.groupby('primary_label')['confidence'].sum().reset_index()
    agg['confidence'] = agg.apply(lambda row: self.apply_threshold(row, self.birdnet_thresholds), axis=1)

    primary_labels = agg['primary_label'].tolist()
    soft_labels = agg['confidence'].tolist()
    if file_kind=='train_birds':
      soft_labels=[1 for _ in soft_labels]

    return filename,primary_labels,soft_labels,start_time,file_kind
random.shuffle(train_filenames)
data=RetreiveData(df.set_index('filename'))
for i in range(16):
  fname,labels,soft_labels,start_time,file_kind=data.get_file_and_start_time(train_filenames[i][0])
  print(f'Filename: {fname}, Start Time: {start_time}, Label Names: {labels}, Label Values: {soft_labels}, File Kind: {file_kind}')
  print(type(labels))


Filename: H30_20230517_112500.ogg, Start Time: 33.0, Label Names: ['yehcar1'], Label Values: [1.0], File Kind: soundscapes
<class 'list'>
Filename: H24_20230517_162500.ogg, Start Time: 42.0, Label Names: ['grekis'], Label Values: [1.0], File Kind: soundscapes
<class 'list'>
Filename: 566513/iNat885527.ogg, Start Time: 1.0, Label Names: ['566513'], Label Values: [1], File Kind: train_birds
<class 'list'>
Filename: wbwwre1/XC532519.ogg, Start Time: skip, Label Names: ['wbwwre1'], Label Values: [1], File Kind: train_birds
<class 'list'>
Filename: yelori1/XC57482.ogg, Start Time: skip, Label Names: ['yelori1'], Label Values: [1], File Kind: train_birds
<class 'list'>
Filename: amekes/iNat318311.ogg, Start Time: skip, Label Names: ['amekes'], Label Values: [1], File Kind: train_birds
<class 'list'>
Filename: tropar/XC84606.ogg, Start Time: 12.0, Label Names: ['tropar'], Label Values: [1], File Kind: train_birds
<class 'list'>
Filename: gycwor1/iNat842159.ogg, Start Time: 27.0, Label Names: 

# **Generator: Was used to prepare data for both Categorical and Binary Cross Entropy**
- **Data Retrieval**: Gets start_times and labels for filenames
- **Audio Augmentation**: Performs audio augmentation including pitch/time shifts, volume shifts, compression, reverb, and time-stretching.
- **Audio Normalization**: Uses peak normalization on the audio.
- **Mixing**: For the BCE implementation, files from train_audio are mixed with files from train_soundscapes to create multi-labeled data.
- **Feature Generation**: Extracts Mel Spectrograms with 320 time bins and 128 frequency bins.
- **Spectrogram Augmentation**: Performs SpecAug on the mel spectrograms, randomly zeroing out randomly selected time and frequency bins.
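
To make the feature shape concrete, here is a small sketch of the frame arithmetic, assuming the `librosa.feature.melspectrogram` defaults (`hop_length=512`, `n_mels=128`, `center=True`). The helper `pad_or_truncate_frames` is a hypothetical stand-in for the generator's padding step, and the target of 320 frames is taken from the bullet above; the generator exposes it as `target_time_length_spectrogram`.

```python
import numpy as np

SR = 32000            # sample rate used by the generator
CHUNK_SECONDS = 5     # clip length in seconds
HOP_LENGTH = 512      # librosa.feature.melspectrogram default
N_MELS = 128          # librosa.feature.melspectrogram default

# With center=True (the librosa default), frames = 1 + n_samples // hop_length.
n_frames = 1 + (SR * CHUNK_SECONDS) // HOP_LENGTH

def pad_or_truncate_frames(spec, target_frames):
    """Zero-pad or crop a (time, freq) array along the time axis."""
    if spec.shape[0] < target_frames:
        pad = np.zeros((target_frames - spec.shape[0], spec.shape[1]), dtype=spec.dtype)
        return np.concatenate([spec, pad], axis=0)
    return spec[:target_frames]

spec = np.random.rand(n_frames, N_MELS)   # stand-in for a scaled mel spectrogram
fixed = pad_or_truncate_frames(spec, 320)
print(n_frames, fixed.shape)
```

A 5-second clip at 32 kHz yields 313 frames under these defaults, so reaching a fixed 320-frame input requires a few frames of zero padding, while a smaller target would crop the end of the clip.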


In [17]:
class AudioFeatureGeneratorMixedPadded(tf.keras.utils.Sequence):
    def __init__(self, filenames, labels_df, audio_dir='train_audio', soundscapes_dir='train_soundscapes',
                 sr=32000, chunk_duration=5, batch_size=32, shuffle=True,
                 target_time_length_spectrogram=250, # Removed target_time_length_mfcc
                  num_classes=None, kind='train',return_info=False,imp_type='softmax'):
        self.return_info=return_info
        self.filenames = filenames
        self.labels_df = labels_df.set_index('filename')
        self.audio_dir = audio_dir
        self.soundscapes_dir = soundscapes_dir
        self.sr = sr
        self.chunk_duration = chunk_duration
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.label_encoder = joblib.load(os.path.join(supplemental_files_dir,'bird_label_encoder.joblib'))
        self.target_time_length_spectrogram = target_time_length_spectrogram
        self.start_time_label_data=RetreiveData(self.labels_df)
        self.start_time = None  # start times are looked up per sample in __getitem__; the constructor takes no start_time argument
        self.kind = kind
        self.implementation_type=imp_type
        self.num_classes = num_classes if num_classes is not None else len(self.label_encoder.classes_)

        self.all_classes = self.label_encoder.classes_
        self.on_epoch_end()

    def __len__(self):
        return int(np.ceil(len(self.filenames) / self.batch_size))


    def _normalize(self, audio):
        """Normalizes the audio waveform to the range [-1, 1] based on its peak."""
        peak = np.abs(audio).max()
        if peak > 0:
            return audio / peak
        return audio


    def spec_augment(self, mel_spectrogram, time_mask_param=8, num_time_masks=3, freq_mask_param=4, num_freq_masks=3):
        """
        Applies time and frequency masking to the Mel spectrogram.
        It's designed to work with 3-channel Mel spectrograms and maintain that shape.

        Args:
            mel_spectrogram: np.ndarray, shape (time, freq, 3) (or whatever shape your _extract_features produces)
            time_mask_param: int, maximum width of the time mask.
            num_time_masks: int, number of time masks to apply.
            freq_mask_param: int, maximum width of the frequency mask.
            num_freq_masks: int, number of frequency masks to apply.

        Returns:
            np.ndarray: augmented mel_spectrogram (same shape as input: time, freq, 3).
        """
        # Create a copy to avoid modifying the original array in place
        augmented_mel = mel_spectrogram.copy()

        # Get the dimensions of the mel spectrogram
        num_time_steps_mel = augmented_mel.shape[0]  # Time dimension
        num_freq_bins_mel = augmented_mel.shape[1]  # Frequency dimension
        # The channel dimension (augmented_mel.shape[2]) will be implicitly handled

        # --- Time Masking ---
        # Applied across all frequency bins AND all 3 channels
        for _ in range(num_time_masks):
            t = np.random.randint(0, time_mask_param + 1)
            if t > 0 and num_time_steps_mel > t:
                t0 = np.random.randint(0, num_time_steps_mel - t + 1)
                # Set a rectangular region to zero across all frequency bins and channels
                augmented_mel[t0:t0 + t, :, :] = 0

        # --- Frequency Masking ---
        # Applied across all time steps AND all 3 channels
        if freq_mask_param is not None and num_freq_masks > 0:
            for _ in range(num_freq_masks):
                f_mel = np.random.randint(0, freq_mask_param + 1)
                if f_mel > 0 and num_freq_bins_mel > f_mel:
                    f0_mel = np.random.randint(0, num_freq_bins_mel - f_mel + 1)
                    # Set a rectangular region to zero across all time steps and channels
                    augmented_mel[:, f0_mel:f0_mel + f_mel, :] = 0

        # The function now returns only the augmented mel spectrogram.
        # Its shape will be identical to the input shape (e.g., (time, freq, 3)).
        return augmented_mel

    def is_speech_in_file(self,filename):
        speech_time_info=self.speech_time_info[filename]
        if len(speech_time_info)==0:
            return False
        else:
            return True
    def __getitem__(self, idx):
      file_index = idx * self.batch_size
      if file_index >= len(self.filenames):
          raise StopIteration
      batch_filenames = self.filenames[idx * self.batch_size:(idx + 1) * self.batch_size]
      batch_mel_features = []
      batch_labels = []
      tracked_filenames = []
      tracked_start_times = []
      tracked_labels = []

      for (filename, augment_flag) in batch_filenames:
          filenames_for_sample = []
          try:
              # Retrieve metadata and file type
              fname, primary_labels, soft_labels, start_time, file_type = self.start_time_label_data.get_file_and_start_time(filename)
              if start_time == 'skip':
                  continue
              filenames_for_sample.append(fname)

              # Choose correct directory based on file_type
              if file_type == 'train_birds':
                  audio_path = os.path.join(self.audio_dir, fname)
              else:
                  audio_path = os.path.join(self.soundscapes_dir, fname)

              # If file doesn't exist, skip
              if not os.path.isfile(audio_path):
                  print(f"File not found: {audio_path}")
                  continue

              # If start_time is None, pick a random start time
              if start_time is None:
                  try:
                      bird_audio, sr = librosa.load(audio_path, sr=self.sr)
                      file_duration = librosa.get_duration(y=bird_audio, sr=self.sr)
                      try:
                          start_time = random.choice([i - 5 for i in range(5, int(file_duration))] + [0])
                      except Exception:
                          start_time = 0
                  except Exception as e:
                      print(f"Error loading full audio for {fname}: {e}")
                      continue

              # Apply adjustment
              if start_time > 1:
                  adjustment = random.uniform(-1, 1)
              else:
                  adjustment = random.uniform(0, 1)
              adjusted_start_time = start_time + adjustment

              # Load chunk
              try:
                  bird_audio_chunk, sr = librosa.load(
                      audio_path, sr=self.sr, duration=self.chunk_duration, offset=adjusted_start_time
                  )
              except Exception as e:
                  print(f"Error loading chunk for {fname}: {e}")
                  continue

              # Pad if needed
              if len(bird_audio_chunk) < self.sr * self.chunk_duration:
                  bird_audio_chunk = np.pad(
                      bird_audio_chunk,
                      (0, self.sr * self.chunk_duration - len(bird_audio_chunk)),
                      mode='constant'
                  )

              # No mixing in the softmax path: the second filename slot stays empty
              filenames_for_sample.append('')

              tracked_start_times.append(adjusted_start_time)
              tracked_filenames.append(filenames_for_sample)

              # Build softmax label
              softmax_label = np.zeros(self.num_classes)
              temp_track_label = []
              if primary_labels is not None and soft_labels is not None:
                  for primary_label, soft_label in zip(primary_labels, soft_labels):
                      idx_label = self.label_encoder.transform([primary_label])[0]
                      softmax_label[idx_label] = soft_label
                      temp_track_label.append(primary_label)
              tracked_labels.append(temp_track_label)

              if self.implementation_type == 'softmax':
                  # Normalize softmax label
                  if softmax_label.sum() > 0:
                      softmax_label /= softmax_label.sum()

              # Data augmentation and SpecAugmentation still applied
              if (augment_flag or random.random() < 0.20) and self.kind == 'train':
                  augmented_label = softmax_label.copy()
                  augmented_audio = self._augment_audio(bird_audio_chunk)
                  augmented_audio = self._normalize(augmented_audio)
                  mel_features = self._extract_features(augmented_audio)
                  if np.random.rand() < 0.50:
                      mel_features = self.spec_augment(mel_features)
                  batch_mel_features.append(mel_features)
                  batch_labels.append(augmented_label)
              else:
                  audio_to_process = self._normalize(bird_audio_chunk)
                  mel_features = self._extract_features(audio_to_process)
                  batch_mel_features.append(mel_features)
                  batch_labels.append(softmax_label)

          except Exception as e_audio:
              print(f"Error processing audio file {fname}: {e_audio}")
              batch_mel_features.append(np.zeros((self.target_time_length_spectrogram, 128, 3)))
              batch_labels.append(np.zeros(self.num_classes))

      batch_mel_features = np.array(batch_mel_features)
      batch_labels = np.array(batch_labels)

      if not self.return_info:
          return batch_mel_features, batch_labels
      else:
          return batch_mel_features, batch_labels, tracked_filenames, tracked_start_times

    def _pad_or_truncate_audio(self, data, target_length):
        if data.shape[0] < target_length:
            padding = np.zeros((target_length - data.shape[0],), dtype=data.dtype)
            return np.concatenate((data, padding))
        elif data.shape[0] > target_length:
            return data[:target_length]
        else:
            return data

    def on_epoch_end(self):
        if self.shuffle:
            np.random.shuffle(self.filenames)

    def _extract_features(self, audio):
        mel_spec = librosa.feature.melspectrogram(y=audio, sr=self.sr)
        mel_spec_db = librosa.power_to_db(mel_spec, ref=np.max).T

        min_db = -80.0
        max_db = 0.0

        mel_spec_scaled = (mel_spec_db - min_db) / (max_db - min_db)
        mel_spec_scaled = np.clip(mel_spec_scaled, 0.0, 1.0)

        # Pad or truncate and add a channel dimension
        mel_spec_padded = self._pad_or_truncate(mel_spec_scaled, self.target_time_length_spectrogram)[:, :, np.newaxis]

        # Duplicate the single mel channel across 3 channels for the ImageNet-pretrained MobileNetV2 backbone
        mel_spec_padded_3_channel = np.repeat(mel_spec_padded, 3, axis=-1)

        # Only the 3-channel mel spectrogram is returned (MFCC features are no longer computed)
        return mel_spec_padded_3_channel



    def _pad_or_truncate(self, data, target_length):
        if data.shape[0] < target_length:
            padding = np.zeros((target_length - data.shape[0], data.shape[1]), dtype=data.dtype)
            return np.vstack((data, padding))
        elif data.shape[0] > target_length:
            return data[:target_length]
        else:
            return data

    def time_stretch_and_pad_truncate(self, audio):
        """
        Applies a random time stretch to an audio signal and then pads or
        truncates it back to the chunk length.

        The stretch rate is drawn uniformly from [0.8, 1.2] (below 1 slows the
        audio down, above 1 speeds it up). The target length is
        self.sr * self.chunk_duration samples.

        Args:
            audio (np.ndarray): The input audio signal.

        Returns:
            np.ndarray: The time-stretched and padded/truncated audio signal.
        """
        rate = random.uniform(0.8, 1.2)
        target_length = self.sr * self.chunk_duration
        stretched_audio = librosa.effects.time_stretch(audio, rate=rate)
        current_length = stretched_audio.shape[0]

        if current_length < target_length:
            # Pad with zeros at the end if shorter
            padding = target_length - current_length
            stretched_audio = np.pad(stretched_audio, (0, padding), mode='constant')
        elif current_length > target_length:
            # Truncate from the end if longer
            stretched_audio = stretched_audio[:target_length]

        return stretched_audio


    def _augment_audio(self, audio):
        """
        Applies a random chain of waveform augmentations to the audio.

        Each effect (reverb, soft-clip compression, time shift, hard clipping,
        pitch shift, volume change, time stretch) is applied independently
        with its own probability. Labels are not modified.

        Args:
            audio (np.ndarray): The audio waveform.

        Returns:
            np.ndarray: The augmented audio waveform.
        """
        augmented_audio = audio.copy()
        # Reverb (Simple Delay-Based)
        if random.random() < 0.4:  # Probability of applying reverb
            n_delay_samples = int(0.1 * self.sr)  # 100ms delay
            feedback = random.uniform(0.2, 0.5)
            reverberated_audio = np.zeros_like(augmented_audio)
            for i in range(len(augmented_audio)):
                reverberated_audio[i] = augmented_audio[i]
                if i >= n_delay_samples:
                    reverberated_audio[i] += feedback * reverberated_audio[i - n_delay_samples]
            augmented_audio = reverberated_audio

        # Compression (Simple Soft Clipping)
        if random.random() < 0.2:  # Probability of applying compression
            threshold = random.uniform(0.6, 0.9)
            gain = random.uniform(1.0, 1.5)
            compressed_audio = np.tanh(augmented_audio * gain / threshold) * threshold
            augmented_audio = compressed_audio
        # Time shifting (circular shift via np.roll)
        if random.random() < 0.4:
            shift_seconds = random.uniform(-2.5, 2.5)  # Shift by up to 2.5 seconds in either direction
            shift_samples = int(shift_seconds * self.sr)
            augmented_audio = np.roll(augmented_audio, shift_samples)

        # Mu-law compression (disabled: the probability is 0.0, so this branch never runs)
        if random.random() < 0.0:
            mulaw_c = random.uniform(150, 300)  # Mu-law parameter
            augmented_audio = librosa.mu_compress(augmented_audio, mu=mulaw_c)

        # Hard clipping
        if random.random() < 0.2:
            clip_factor = random.uniform(0.7, 0.95)  # Clipping threshold
            augmented_audio = np.clip(augmented_audio, -clip_factor, clip_factor)

        # Pitch shifting
        if random.random() < 0.1:
            n_steps = random.uniform(-0.5, 0.5)
            augmented_audio = librosa.effects.pitch_shift(augmented_audio, sr=self.sr, n_steps=n_steps)
        # Volume change
        if random.random() < 0.3:
            amplitude_factor = random.uniform(0.5, 1.3)
            augmented_audio = augmented_audio * amplitude_factor

        # Time stretching
        if random.random() < 0.1:
            augmented_audio = self.time_stretch_and_pad_truncate(augmented_audio)

        return augmented_audio

unique_labels = df['primary_label'].unique()
label_encoder = LabelEncoder().fit(unique_labels)

random.shuffle(train_filenames)
# --- Parameters ---
audio_dir = os.path.join(main_dir,'train_audio')
soundscapes_dir = os.path.join(main_dir,'train_soundscapes')
sr = 32000
chunk_duration = 5
batch_size_train = 128
batch_size_val = 32
target_time_length_spectrogram = 320
target_time_length_mfcc = 320
n_mfcc = 40
start_time = 0
num_classes = len(df['primary_label'].unique()) # Get the number of unique bird species
# --- Training Data Generator ---
train_generator = AudioFeatureGeneratorMixedPadded(
    filenames=random.sample(train_filenames,10000),
    labels_df=df[['cleaned_filename', 'primary_label','secondary_labels','filename','duration','isOneBird']],
    audio_dir=audio_dir,
    soundscapes_dir=soundscapes_dir,
    sr=sr,
    chunk_duration=chunk_duration,
    batch_size=batch_size_train,
    shuffle=True,
    target_time_length_spectrogram=target_time_length_spectrogram,
    num_classes=num_classes,
    imp_type=IMPLEMENTATION
)
val_chunk_duration=5
random.shuffle(val_filenames)
val_generator = AudioFeatureGeneratorMixedPadded(
    filenames=val_filenames,
    labels_df=df[['cleaned_filename', 'primary_label','secondary_labels','filename','duration','isOneBird']],
    audio_dir=audio_dir,
    soundscapes_dir=soundscapes_dir,
    sr=sr,
    chunk_duration=chunk_duration,
    batch_size=batch_size_val,
    shuffle=False,             # No shuffling for validation
    target_time_length_spectrogram=target_time_length_spectrogram,
    num_classes=num_classes,
    kind='val',                # Set the 'kind' to 'val'
    imp_type=IMPLEMENTATION
)

print("Train generator size:", len(train_generator))
print("Validation generator size:", len(val_generator))

Train generator size: 79
Validation generator size: 85
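
The reverb step in `_augment_audio` is a feedback comb filter, `y[n] = x[n] + g * y[n - D]`, evaluated one sample at a time in a Python loop. As a rough sketch (not the notebook's implementation), the same recurrence can be computed one delay-length block at a time, because each block depends only on the block before it. The function names below are hypothetical, and the random input stands in for a real audio chunk.

```python
import numpy as np


def comb_reverb_loop(x, delay, feedback):
    """Reference: y[n] = x[n] + feedback * y[n - delay], one sample at a time."""
    y = np.zeros_like(x)
    for i in range(len(x)):
        y[i] = x[i]
        if i >= delay:
            y[i] += feedback * y[i - delay]
    return y


def comb_reverb_blocks(x, delay, feedback):
    """Same recurrence, computed one delay-length block at a time."""
    y = x.copy()
    for start in range(delay, len(x), delay):
        stop = min(start + delay, len(x))
        # The previous block is already final, so it can be added in one shot.
        y[start:stop] += feedback * y[start - delay:stop - delay]
    return y


rng = np.random.default_rng(0)
audio = rng.standard_normal(1000)  # stand-in for a real audio chunk
slow = comb_reverb_loop(audio, delay=100, feedback=0.3)
fast = comb_reverb_blocks(audio, delay=100, feedback=0.3)
print(np.allclose(slow, fast))
```

With the notebook's settings (`sr = 32000`, a 100 ms delay, 5-second chunks) this would need about 50 block updates per clip instead of 160,000 per-sample iterations.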


# **Data Validation: Ensure label and spectrogram shapes match expectations.**

In [None]:
for batch_idx in range(5):
    try:
        features, labels = train_generator[batch_idx]
        print(f"Batch {batch_idx}: features.shape = {features.shape}, labels.shape = {labels.shape}")
        for i in range(min(5, len(labels))):
            print(f"  Label {i} shape: {labels[i].shape}, sum: {labels[i].sum()}, nonzero: {np.count_nonzero(labels[i])}")
    except Exception as e:
        print(f"Error in batch {batch_idx}: {e}")

Batch 0: features.shape = (109, 320, 128, 3), labels.shape = (109, 206)
  Label 0 shape: (206,), sum: 1.0, nonzero: 1
  Label 1 shape: (206,), sum: 1.0, nonzero: 1
  Label 2 shape: (206,), sum: 1.0, nonzero: 1
  Label 3 shape: (206,), sum: 1.0, nonzero: 1
  Label 4 shape: (206,), sum: 1.0, nonzero: 1
Batch 1: features.shape = (115, 320, 128, 3), labels.shape = (115, 206)
  Label 0 shape: (206,), sum: 1.0, nonzero: 1
  Label 1 shape: (206,), sum: 1.0, nonzero: 1
  Label 2 shape: (206,), sum: 1.0, nonzero: 1
  Label 3 shape: (206,), sum: 1.0, nonzero: 1
  Label 4 shape: (206,), sum: 1.0, nonzero: 1
Batch 2: features.shape = (111, 320, 128, 3), labels.shape = (111, 206)
  Label 0 shape: (206,), sum: 1.0, nonzero: 1
  Label 1 shape: (206,), sum: 1.0, nonzero: 1
  Label 2 shape: (206,), sum: 1.0, nonzero: 1
  Label 3 shape: (206,), sum: 1.0, nonzero: 1
  Label 4 shape: (206,), sum: 1.0, nonzero: 1
Batch 3: features.shape = (113, 320, 128, 3), labels.shape = (113, 206)
  Label 0 shape: (206,
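
The `sum: 1.0` values in this output follow from the normalization step in the generator, which divides each label vector by its own sum. A minimal sketch of that step, assuming per-class weights are scattered into a zero vector first; the helper name and the example weights are hypothetical, while the 206 classes and class index 150 come from the outputs in this notebook.

```python
import numpy as np


def build_soft_label(class_indices, weights, num_classes):
    """Scatter per-class weights into a vector and normalize it to sum to 1."""
    label = np.zeros(num_classes, dtype=np.float64)
    for idx, weight in zip(class_indices, weights):
        label[idx] = weight
    total = label.sum()
    if total > 0:  # leave an all-zero label untouched
        label /= total
    return label


# One species: the label stays one-hot.
single = build_soft_label([150], [1.0], num_classes=206)
# Two species with hypothetical weights 1.0 and 0.5: the mass is split 2/3 vs 1/3.
split = build_soft_label([3, 7], [1.0, 0.5], num_classes=206)

print(single.sum(), np.count_nonzero(single))
print(split[3], split[7], split.sum())
```

A sample with several labels therefore still sums to 1 but has more than one nonzero entry, which is what the `nonzero` column in the validation printout would reveal.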

### Optionally look at predictions for the first batch

In [18]:
softmax_analysis=True
if softmax_analysis:
    # Load the model
    model = load_model(os.path.join(models_dir, 'best_model_by_val_loss_softmax.keras'))

    # Get the first batch from the generator
    inputs, labels = next(iter(train_generator))

    # Predict with the model
    softmax_probs = model.predict(inputs)  # shape: (batch_size, num_classes)

    # Decode true labels (label vectors to class names)
    # For each sample, get the indices whose label weight exceeds 0.5
    true_label_indices = [np.where(sample > 0.5)[0] for sample in labels]
    true_label_names = [
        [train_generator.label_encoder.inverse_transform([idx])[0] for idx in indices]
        for indices in true_label_indices
    ]

    # Optionally, get top-N predicted classes per sample
    top_n = 3
    top_pred_indices = np.argsort(-softmax_probs, axis=1)[:, :top_n]
    top_pred_names = [
        [train_generator.label_encoder.inverse_transform([idx])[0] for idx in indices]
        for indices in top_pred_indices
    ]

    # Build a DataFrame
    df_data = []
    for i in range(len(inputs)):
        row = {
            'true_labels': true_label_names[i],
            'top_predicted_labels': top_pred_names[i],
            'top_predicted_probs': softmax_probs[i][top_pred_indices[i]].tolist(),
            'all_pred_probs': softmax_probs[i].tolist()
        }
        df_data.append(row)

    analysis_df = pd.DataFrame(df_data)
analysis_df.sample(20)

4/4 ━━━━━━━━━━━━━━━━━━━━ 4s 674ms/step


Unnamed: 0,true_labels,top_predicted_labels,top_predicted_probs,all_pred_probs
50,[868458],"[868458, 523060, 1564122]","[0.9990290999412537, 0.00046520036994479597, 0...","[1.5225268725771457e-06, 2.261600229758187e-06..."
68,[cotfly1],"[cotfly1, spepar1, saffin]","[0.9623846411705017, 0.010448773391544819, 0.0...","[1.8667620679480024e-05, 8.467578140880505e-07..."
38,[banana],"[banana, babwar, rtlhum]","[0.9026805758476257, 0.026786644011735916, 0.0...","[2.7354853955330327e-05, 5.0946808187291026e-0..."
59,[chfmac1],"[chfmac1, 41663, ywcpar]","[0.9645504355430603, 0.006017839536070824, 0.0...","[1.5760642781970091e-06, 1.5546295628610096e-0..."
63,[strowl1],"[strowl1, grekis, blchaw1]","[0.9883144497871399, 0.0014231001259759068, 0....","[7.136846420507936e-07, 4.2185808979411377e-07..."
101,[bicwre1],"[bicwre1, blcant4, rutpuf1]","[0.5998054146766663, 0.06857417523860931, 0.03...","[2.5635930796852335e-05, 2.496004299246124e-06..."
46,[eardov1],"[eardov1, whtdov, bubcur1]","[0.4790932238101959, 0.11227947473526001, 0.05...","[0.0004608906165231019, 9.307422442361712e-05,..."
7,[rtlhum],"[rtlhum, 65373, amakin1]","[0.851884663105011, 0.035564079880714417, 0.03...","[1.5940764569677413e-05, 0.0001594864879734814..."
5,[butsal1],"[rtlhum, olipic1, yeofly1]","[0.2895408272743225, 0.13998723030090332, 0.12...","[0.0002512112259864807, 0.0001700387365417555,..."
47,[bkmtou1],"[bkmtou1, 22333, strcuc1]","[0.1980391889810562, 0.07418911159038544, 0.06...","[0.000508845376316458, 0.0001421153574483469, ..."


# **Class Weighting**
- Optionally apply class weighting based on original sample sizes
- Note: All training was done with equal weighting for all classes.

In [None]:
variable_threshold=False
total_train_files = len(train_filenames)
num_classes = len(train_generator.label_encoder.classes_)
class_weights_adjusted = {}

for i, bird_class in enumerate(train_generator.label_encoder.classes_):
    positive_count = train_bird_file_counts.get(bird_class, 1e-6)
    weight = total_train_files / (num_classes * positive_count + 1e-6)
    class_weights_adjusted[i] = weight

print("Adjusted Class Weights (Encoded):", dict(list(class_weights_adjusted.items())[:10]))

# Thresholding based on oversampled frequency using encoded labels
weights_adjusted = np.array(list(class_weights_adjusted.values()))
min_weight_adj, max_weight_adj = weights_adjusted.min(), weights_adjusted.max()

min_threshold = 0.34
max_threshold = 0.66

norm_weights_adj = (weights_adjusted - min_weight_adj) / (max_weight_adj - min_weight_adj + 1e-8)

class_thresholds_adjusted_encoded = {}
for i, bird_class in enumerate(train_generator.label_encoder.classes_):
    threshold = max_threshold - norm_weights_adj[i] * (max_threshold - min_threshold)
    if variable_threshold:
        class_thresholds_adjusted_encoded[i] = threshold
    else:
        class_thresholds_adjusted_encoded[i] = 0.5
print("Adjusted Class Thresholds (Encoded):", dict(list(class_thresholds_adjusted_encoded.items())[:10]))

Adjusted Class Weights (Encoded): {0: 139.03397990760203, 1: 69.51699012253158, 2: 139.03397990760203, 3: 46.344660119183395, 4: 46.344660119183395, 5: 69.51699012253158, 6: 23.172330078339538, 7: 139.03397990760203, 8: 34.758495103448425, 9: 46.344660119183395}
Adjusted Class Thresholds (Encoded): {0: 0.5, 1: 0.5, 2: 0.5, 3: 0.5, 4: 0.5, 5: 0.5, 6: 0.5, 7: 0.5, 8: 0.5, 9: 0.5}
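
The weighting above is the usual balanced scheme, `weight_c = N / (K * n_c)`, with `N` training files, `K` classes, and `n_c` files for class `c`; the optional thresholds then map the largest weight (rarest class) to 0.34 and the smallest weight to 0.66. A small worked sketch with hypothetical counts, not the notebook's data:

```python
import numpy as np


def balanced_class_weights(counts, total_files):
    """weight_c = total_files / (num_classes * count_c)."""
    counts = np.asarray(counts, dtype=np.float64)
    return total_files / (len(counts) * counts + 1e-6)


def thresholds_from_weights(weights, min_threshold=0.34, max_threshold=0.66):
    """Map the largest weight to min_threshold and the smallest to max_threshold."""
    span = weights.max() - weights.min()
    norm = (weights - weights.min()) / (span + 1e-8)
    return max_threshold - norm * (max_threshold - min_threshold)


counts = [1, 2, 3, 6]  # hypothetical number of files per class
weights = balanced_class_weights(counts, total_files=12)
thresholds = thresholds_from_weights(weights)

print(weights)
print(thresholds)
```

Here the class with a single file gets the largest weight and the lowest decision threshold, so rare classes are both emphasized in the loss and easier to trigger at prediction time.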


# **Create Tensorflow Dataset**

In [None]:
if IMPLEMENTATION == 'softmax':
    print("Train generator size:", len(train_generator))
    print("Validation generator size:", len(val_generator))

    # Inspect the first batch for debugging
    features, labels = train_generator[0]
    print(f"First batch - Mel shape: {features.shape}, Label shape: {labels.shape}")
    print("Labels (first example in batch):", labels[0])

    # If your generator returns one-hot/softmax labels (shape: (batch_size, num_classes)):
    label_vector = labels[0]  # shape: (num_classes,)
    class_index = np.argmax(label_vector)
    class_name = train_generator.label_encoder.inverse_transform([class_index])[0]
    print("Class index:", class_index, "Class name:", class_name)

    num_classes = len(train_generator.label_encoder.classes_)

    # Output signature for SOFTMAX (one-hot/soft labels)
    output_signature = (
        tf.TensorSpec(shape=(None, train_generator.target_time_length_spectrogram, 128, 3), dtype=tf.float32),  # Features
        tf.TensorSpec(shape=(None, num_classes), dtype=tf.float32)  # Labels: (batch_size, num_classes)
    )

    # Generator functions for tf.data.Dataset
    def train_gen():
        for i in range(len(train_generator)):
            features, labels = train_generator[i]
            yield (
                tf.convert_to_tensor(features, dtype=tf.float32),
                tf.convert_to_tensor(labels, dtype=tf.float32)
            )
    train_dataset = tf.data.Dataset.from_generator(train_gen, output_signature=output_signature)

    def val_gen():
        for i in range(len(val_generator)):
            features, labels = val_generator[i]
            yield (
                tf.convert_to_tensor(features, dtype=tf.float32),
                tf.convert_to_tensor(labels, dtype=tf.float32)
            )
    val_dataset = tf.data.Dataset.from_generator(val_gen, output_signature=output_signature)


Train generator size: 79
Validation generator size: 85
First batch - Mel shape: (109, 320, 128, 3), Label shape: (109, 206)
Labels (first example in batch): [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Class index: 150 Class name: rugdov


# **Model**
Builds a MobileNetV2 backbone model with ImageNet weights, adds a dense layer of 1024 neurons with ReLU activation and dropout, followed by a softmax classification layer for bird species prediction from spectrogram inputs.



In [None]:

load_checkpoint=False
def build_bird_classifier_model(input_shape, num_classes, l2_reg=1e-4):
    """
    Builds a MobileNetV2 model for bird classification (softmax output).

    Args:
        input_shape (tuple): Shape of the input spectrograms (height, width, channels).
                             Expected: (target_time_length_spectrogram, 128, 3)
        num_classes (int): Number of output classes for multi-class classification.
        l2_reg (float): L2 regularization factor.

    Returns:
        tf.keras.Model: Uncompiled MobileNetV2 model (the custom training loop supplies the optimizer and loss).
    """
    base_model = MobileNetV2(
        include_top=False,
        weights='imagenet', # Use pre-trained ImageNet weights
        input_shape=input_shape,
        pooling='avg' # Global average pooling to reduce spatial dimensions
    )

    base_model.trainable = True # Fine-tune the base model

    inputs = tf.keras.Input(shape=input_shape)

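    # 1. Rescale from [0, 1] to [0, 255] (the generator outputs values in [0, 1])
    x = layers.Rescaling(255)(inputs)

    # 2. Apply MobileNetV2-specific preprocessing (scales pixel values to [-1, 1])
    x = mobilenet_preprocess_input(x)

    # Pass inputs through the base model. Note: training=True is hard-coded in this
    # call, so the backbone is expected to stay in training mode (e.g. BatchNorm
    # batch statistics) even when the outer model is called with training=False.
    x = base_model(x, training=True)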

    # Add custom classification head for multi-class (softmax)
    x = layers.Dense(1024, activation='relu', kernel_regularizer=l2(l2_reg))(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(num_classes, activation='softmax', kernel_regularizer=l2(l2_reg))(x) # Softmax for multi-class

    model = models.Model(inputs=inputs, outputs=x, name='MobileNetV2_Bird_Classifier')
    return model

# --- Instantiate the model ---
if IMPLEMENTATION=='softmax':
    # MobileNet (height, width, channels)
    # Generator provides (batch_size, time_steps, mel_bins, 3)
    mobilenet_input_shape = (train_generator.target_time_length_spectrogram, 128, 3)
    num_classes = len(train_generator.label_encoder.classes_)
    first_run=False
    # Path of the best model saved by the training loop
    model_path = os.path.join(models_dir, 'best_model_by_val_loss_softmax.keras')
    if first_run:
      model = build_bird_classifier_model(mobilenet_input_shape, num_classes)
    else:
      try:
          # Use tf.keras.models.load_model to load the complete saved model (architecture and weights)
          model = models.load_model(model_path)
          print(f"Successfully loaded model from {model_path}")
      except Exception as e:
          print(f"Error loading model: {e}")
          # Fall back to building from scratch if loading fails
          print("Building model from scratch as a fallback.")
          model = build_bird_classifier_model(mobilenet_input_shape, num_classes)

    # --- Optimizer ---
    initial_learning_rate =  1e-5

    optimizer = Adam(learning_rate=initial_learning_rate)
    working_dir=drive_dir


    # Setup checkpointing
    checkpoint_dir = os.path.join(models_dir,'training_mobilenet_checkpoints_jun13')
    checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)
    checkpoint_manager = tf.train.CheckpointManager(checkpoint, checkpoint_dir, max_to_keep=5)

    # Try to restore the latest checkpoint
    if checkpoint_manager.latest_checkpoint and load_checkpoint:
        checkpoint.restore(checkpoint_manager.latest_checkpoint)
        print(f"Restored from {checkpoint_manager.latest_checkpoint}")
    else:
        print(f"Model loaded from {model_path}")

    model.summary()

Successfully loaded model from /content/drive/MyDrive/Main_Birdclef/models/best_model_by_val_loss_softmax.keras
Model loaded from /content/drive/MyDrive/Main_Birdclef/models/best_model_by_val_loss_softmax.keras
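
To make the input scaling inside the model concrete: the generator emits spectrogram values in [0, 1], `Rescaling(255)` maps them to [0, 255], and the preprocessing step then maps them to [-1, 1]. The sketch below reproduces that arithmetic in NumPy, assuming `mobilenet_preprocess_input` is `tf.keras.applications.mobilenet_v2.preprocess_input` (which computes `x / 127.5 - 1`); the helper name is hypothetical.

```python
import numpy as np


def generator_to_backbone_range(spec01):
    """Mimic Rescaling(255) followed by the MobileNetV2-style x / 127.5 - 1."""
    pixels = spec01 * 255.0      # [0, 1] -> [0, 255]
    return pixels / 127.5 - 1.0  # [0, 255] -> [-1, 1]


spec = np.array([0.0, 0.25, 0.5, 1.0])  # example scaled mel values
print(generator_to_backbone_range(spec))
```

Under that assumption the two steps together are just `2 * x - 1`, so silence (0.0, i.e. -80 dB) reaches the backbone as -1 and the loudest bin as +1.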


In [None]:
def write_initial_f1_file(filepath="best_val_f1.txt", score=0.00):
    """Writes a text file with the given score on the first line.

    Args:
      filepath: The path to the text file to be created or overwritten.
                Defaults to "best_val_f1.txt" in the current directory.
      score: The score to write. Defaults to 0.0.
    """
    try:
        with open(filepath, 'w') as f:
            f.write(f"{score}\n")
        print(f"Successfully wrote '{score}' to '{filepath}'")
    except Exception as e:
        print(f"An error occurred while writing to '{filepath}': {e}")
write_initial_file=False
if IMPLEMENTATION=='softmax' and write_initial_file:
    working_dir=drive_dir
    target_filepath = os.path.join(supplemental_files_dir, 'best_val_accuracy_softmax_mobilenet.txt')
    print(f"Attempting to write to: {target_filepath}")
    write_initial_f1_file(target_filepath,score=0.6014)
else:
  target_filepath = os.path.join(supplemental_files_dir, 'best_val_accuracy_softmax_mobilenet.txt')
  if os.path.exists(target_filepath):
    with open(target_filepath, 'r') as f:
        first_line = f.readline().strip()
        print(f"Starting with initial best accuracy of: {first_line}")
  else:
      print(f"File does not exist: {target_filepath}")


Starting with initial best accuracy of: 0.762826
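
The accuracy tracked by the custom training loop in this notebook is a strict top-k match: for a sample with k nonzero label entries, the k highest-scoring classes must equal the true label set exactly. A plain NumPy sketch of that rule, using a hypothetical function name and toy data rather than the notebook's Keras metric:

```python
import numpy as np


def strict_topk_accuracy(y_true, y_pred):
    """Fraction of samples whose top-k predictions equal the true label set,
    where k is the number of nonzero entries in that sample's label."""
    correct, total = 0, 0
    for true_row, pred_row in zip(y_true, y_pred):
        true_idx = np.flatnonzero(true_row > 0)
        k = len(true_idx)
        if k == 0:
            continue  # samples without any label are skipped
        topk = np.argsort(pred_row)[::-1][:k]
        correct += int(set(true_idx) == set(topk))
        total += 1
    return correct / total if total else 0.0


y_true = np.array([[1.0, 0.0, 0.0],
                   [0.5, 0.5, 0.0],
                   [0.0, 0.0, 1.0]])
y_pred = np.array([[0.7, 0.2, 0.1],   # top-1 is class 0: match
                   [0.4, 0.1, 0.5],   # top-2 is {2, 0}, truth is {0, 1}: no match
                   [0.2, 0.3, 0.5]])  # top-1 is class 2: match
print(strict_topk_accuracy(y_true, y_pred))
```

For single-label samples this reduces to ordinary top-1 accuracy, which is why the figure is comparable to the accuracy reported in the summary.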


# **Custom Training Loop**

In [None]:
if IMPLEMENTATION=='softmax':
    # Define the training loss function for Stage 1 (Softmax)
    softmax_loss_fn = tf.keras.losses.CategoricalCrossentropy()
    # Define the validation loss function for Stage 1 (Softmax)
    val_softmax_loss_fn = tf.keras.losses.CategoricalCrossentropy()

@tf.function
def train_step_softmax(inputs, labels, model, optimizer, metric_accuracy):
    """
    Training step for Stage 1: softmax classification with soft (normalized) labels.

    Args:
        inputs (tf.Tensor): Batch of input spectrograms.
        labels (tf.Tensor): Batch of label vectors of shape (batch_size, num_classes),
                            each summing to 1 (one-hot or soft).
        model (tf.keras.Model): The MobileNetV2 model with softmax output.
        optimizer (tf.keras.optimizers.Optimizer): The optimizer.
        metric_accuracy (tf.keras.metrics.Metric): An accuracy metric
                                                    (e.g., StrictMultiLabelTopKAccuracy).

    Returns:
        tf.Tensor: The total loss for the batch.
    """
    spectrogram_input = inputs

    with tf.GradientTape() as tape:
        predictions = model(spectrogram_input, training=True)

        main_loss = softmax_loss_fn(labels, predictions)
        main_loss = tf.reduce_mean(main_loss)

        # Add the L2 penalties collected from the regularized layers
        regularization_loss = tf.reduce_sum(model.losses)
        total_loss = main_loss + regularization_loss

    # Compute gradients outside the tape context so the backward pass itself is not recorded
    gradients = tape.gradient(total_loss, model.trainable_variables)

    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    metric_accuracy.update_state(labels, predictions)

    return total_loss


def multilabel_topk_accuracy(y_true, y_pred, k=3):
    """
    y_true: binary indicator array, shape (num_classes,)
    y_pred: array of predicted probabilities, shape (num_classes,)
    """
    topk_indices = np.argsort(y_pred)[::-1][:k]
    true_indices = np.where(y_true > 0)[0]
    hits = sum(1 for idx in true_indices if idx in topk_indices)
    return hits / len(true_indices) if len(true_indices) > 0 else 0.0


class StrictMultiLabelTopKAccuracy(tf.keras.metrics.Metric):
    """
    For each sample, if the set of top-k predictions (where k = number of true labels)
    exactly matches the set of true labels, counts as correct.
    """
    def __init__(self, name='strict_multilabel_topk_accuracy', **kwargs):
        super().__init__(name=name, **kwargs)
        self.count = self.add_weight(name='count', initializer='zeros')
        self.total = self.add_weight(name='total', initializer='zeros')

    def update_state(self, y_true, y_pred, sample_weight=None):
      def batch_strict_topk(y_true_np, y_pred_np):
          correct = 0
          total = 0
          for i in range(y_true_np.shape[0]):
              true_indices = np.where(y_true_np[i] > 0)[0]
              k = len(true_indices)
              if k == 0:
                  continue
              topk_pred_indices = np.argsort(y_pred_np[i])[::-1][:k]
              if set(true_indices) == set(topk_pred_indices):
                  correct += 1
              total += 1
          return np.array([correct, total], dtype=np.float32)

      correct_total = tf.py_function(
          func=batch_strict_topk,
          inp=[y_true, y_pred],
          Tout=tf.float32
      )
      correct_total.set_shape([2])  # tf.py_function drops static shape info; restore it
      self.count.assign_add(correct_total[0])
      self.total.assign_add(correct_total[1])



    def result(self):
        return tf.math.divide_no_nan(self.count, self.total)

    def reset_states(self):
        self.count.assign(0)
        self.total.assign(0)



def train_model_softmax(model, train_dataset, val_dataset, epochs, checkpoint_manager, optimizer,
                        drive_dir,
                        initial_best_val_accuracy=0.0, best_accuracy_filepath='best_val_accuracy_softmax_enet.txt',
                        best_val_loss=float('inf')):
    """
    Training function for Stage 1: softmax classification with soft (normalized) labels.
    Uses tqdm.write for displaying metrics. Models are saved under the global models_dir
    and the best-accuracy log under the global supplemental_files_dir.

    Args:
        model (tf.keras.Model): The MobileNetV2 model with softmax output.
        train_dataset (tf.data.Dataset): Training dataset of (spectrograms, label vectors).
        val_dataset (tf.data.Dataset): Validation dataset of (spectrograms, label vectors).
        epochs (int): Number of training epochs.
        checkpoint_manager (tf.train.CheckpointManager): TensorFlow checkpoint manager.
        optimizer (tf.keras.optimizers.Optimizer): The optimizer.
        drive_dir (str): Base working directory (not referenced in the function body).
        initial_best_val_accuracy (float): Initial best validation accuracy.
        best_accuracy_filepath (str): File name for saving the best validation accuracy.
        best_val_loss (float): Initial best validation loss.

    Returns:
        None
    """
    # Construct the full path for the best accuracy file
    full_best_accuracy_filepath = os.path.join(supplemental_files_dir, best_accuracy_filepath)

    tqdm.write(f'You are starting with a validation loss of {best_val_loss:.4f}')
    best_val_accuracy = initial_best_val_accuracy

    try:
        if os.path.exists(full_best_accuracy_filepath):
            with open(full_best_accuracy_filepath, 'r') as f:
                try:
                    best_val_accuracy_loaded = float(f.readline().strip())
                    if best_val_accuracy_loaded > best_val_accuracy:
                        best_val_accuracy = best_val_accuracy_loaded
                    tqdm.write(f"Loaded initial best validation accuracy from '{full_best_accuracy_filepath}': {best_val_accuracy:.4f}")
                except ValueError:
                    tqdm.write(f"Warning: Could not read a valid float from '{full_best_accuracy_filepath}'. Starting with best accuracy = 0.0")
        else:
            tqdm.write(f"'{full_best_accuracy_filepath}' not found. Starting with initial best accuracy = 0.0")
    except Exception as e:
        tqdm.write(f"Error reading '{full_best_accuracy_filepath}': {e}. Starting with initial best accuracy = 0.0")

    tqdm.write(f"Starting training with initial best validation accuracy: {best_val_accuracy:.4f}")

    for epoch in range(epochs):
        tqdm.write(f"\n--- Epoch {epoch + 1}/{epochs} ---")

        # --- Training Loop ---
        train_accuracy_metric = StrictMultiLabelTopKAccuracy()



        total_train_loss = 0.0
        num_train_batches = 0

        # Wrap train_dataset with tqdm for a progress bar
        train_pbar = tqdm(enumerate(train_dataset), total=tf.data.experimental.cardinality(train_dataset).numpy(), desc=f"Epoch {epoch+1} Training")
        for batch_num, (inputs, labels) in train_pbar:
            loss = train_step_softmax(inputs, labels, model, optimizer, train_accuracy_metric)

            total_train_loss += loss
            num_train_batches += 1

            # Update tqdm progress bar description with current metrics
            train_pbar.set_postfix(loss=f"{loss.numpy():.4f}", acc=f"{train_accuracy_metric.result().numpy():.4f}")
            if batch_num%100==0:
              model.save(os.path.join(models_dir, 'current_model_softmax_mobilenet.keras'))

        # End of epoch training metrics
        avg_train_loss = total_train_loss / num_train_batches
        train_accuracy = train_accuracy_metric.result().numpy()
        tqdm.write(
            f"Training Results - Loss: {avg_train_loss:.4f}, Accuracy: {train_accuracy:.4f}"
        )
        model.save(os.path.join(models_dir, 'current_model_softmax_mobilenet.keras'))
        train_accuracy_metric.reset_states() # Reset for next epoch
        # --- Validation Loop ---
        val_accuracy_metric = StrictMultiLabelTopKAccuracy()
        val_loss_total = 0.0
        num_val_batches = 0

        # Wrap val_dataset with tqdm for a progress bar
        val_pbar = tqdm(val_dataset, total=tf.data.experimental.cardinality(val_dataset).numpy(), desc=f"Epoch {epoch+1} Validation")
        for val_inputs, val_labels in val_pbar:
            val_predictions = model(val_inputs, training=False)

            val_loss = val_softmax_loss_fn(val_labels, val_predictions)
            val_loss_total += val_loss.numpy()
            num_val_batches += 1

            val_accuracy_metric.update_state(val_labels, val_predictions)
            val_pbar.set_postfix(loss=f"{val_loss.numpy():.4f}", acc=f"{val_accuracy_metric.result().numpy():.4f}")

        # End of epoch validation metrics
        avg_val_loss = val_loss_total / num_val_batches
        val_accuracy = val_accuracy_metric.result().numpy()
        tqdm.write(f"Validation Results - Loss: {avg_val_loss:.4f}, Accuracy: {val_accuracy:.4f}")
        val_accuracy_metric.reset_states()


        # --- Checkpoint Saving Logic (based on Validation Loss) ---
        if avg_val_loss < best_val_loss:
            best_val_loss = avg_val_loss
            save_path = checkpoint_manager.save()
            tqdm.write(f"Best Validation Loss Improved to {best_val_loss:.4f}, saving checkpoint to {save_path}")
            try:
                # Save the entire model in Keras format for easy loading
                model.save(os.path.join(models_dir, 'best_model_by_val_loss_softmax.keras'))
                tqdm.write(f"Saved Keras model to {os.path.join(models_dir, 'best_model_by_val_loss_softmax.keras')}")
            except Exception as e:
                tqdm.write(f"Error saving Keras model by validation loss: {e}")
        else:
            tqdm.write(f"Validation Loss did not improve from {best_val_loss:.4f}")

        # --- Accuracy Saving Logic ---
        if val_accuracy > best_val_accuracy:
            best_val_accuracy = val_accuracy
            try:
                with open(full_best_accuracy_filepath, 'w') as f:
                    f.write(f"{best_val_accuracy:.6f}\n")
                tqdm.write(f'Best validation accuracy updated to {best_val_accuracy:.4f} in "{full_best_accuracy_filepath}"')
            except Exception as e:
                tqdm.write(f"Error writing best validation accuracy to '{full_best_accuracy_filepath}': {e}")
        else:
            tqdm.write(f"Validation Accuracy did not improve from {best_val_accuracy:.4f}")

        # Always save the current model at the end of each epoch for recovery/resumption
        try:
            model.save(os.path.join(models_dir, 'current_model_epoch_end_softmax.keras'))
            tqdm.write(f"Saved current model to {os.path.join(models_dir, 'current_model_epoch_end_softmax.keras')}")
        except Exception as e:
            tqdm.write(f'Could not save model at epoch {epoch+1}: {e}')


if IMPLEMENTATION=='softmax':
    # --- 7. Set training parameters ---
    EPOCHS = 1 # Adjust as needed

    print("\nStarting model training...")

    print("\nStarting the main training process...")
    train_model_softmax(
        model=model,
        train_dataset=train_dataset,
        val_dataset=val_dataset,
        epochs=EPOCHS,
        checkpoint_manager=checkpoint_manager,
        optimizer=optimizer,
        drive_dir=drive_dir,
        initial_best_val_accuracy=0.0,
        best_accuracy_filepath='best_val_accuracy_softmax_mobilenet.txt',
        best_val_loss=1.4421
    )



Starting model training...

Starting the main training process...
You are starting with a validation loss of 1.4421
Loaded initial best validation accuracy from '/content/drive/MyDrive/best_val_accuracy_softmax_mobilenet.txt': 0.6846
Starting training with initial best validation accuracy: 0.6846

--- Epoch 1/1 ---


Epoch 1 Training: 79it [32:54, 24.99s/it, acc=0.7736, loss=1.4818]


Training Results - Loss: 1.1235, Accuracy: 0.7736


Epoch 1 Validation: 85it [05:11,  3.67s/it, acc=0.7628, loss=1.1339]


Validation Results - Loss: 1.0378, Accuracy: 0.7628
Best Validation Loss Improved to 1.0378, saving checkpoint to /content/drive/MyDrive/training_efficiency_netcheckpoints_jun13/ckpt-3
Saved Keras model to /content/drive/MyDrive/best_model_by_val_loss_softmax.keras
Best validation accuracy updated to 0.7628 in "/content/drive/MyDrive/best_val_accuracy_softmax_mobilenet.txt"
Saved current model to /content/drive/MyDrive/current_model_epoch_end_softmax.keras


# **Predictions/Accuracy Scores for Non-BirdNET Classes**

In [None]:
softmax_analysis = True
if softmax_analysis:
    eligible_val_filenames = [f[0] for f in val_filenames]
    all_data = data.data[data.data['filename'].isin(eligible_val_filenames)]['filename'].values
    val_data_to_use = [(f, False) for f in all_data]

    validation_generator = AudioFeatureGeneratorMixedPadded(
        filenames=val_data_to_use,
        labels_df=df[['cleaned_filename', 'primary_label', 'secondary_labels', 'filename', 'duration', 'isOneBird']],
        audio_dir=audio_dir,
        soundscapes_dir=soundscapes_dir,
        sr=sr,
        chunk_duration=chunk_duration,
        batch_size=batch_size_val,
        shuffle=False,
        target_time_length_spectrogram=target_time_length_spectrogram,
        num_classes=num_classes,
        kind='val',
        return_info=True  # Important to get filename and start time info
    )

    # Load your trained model
    model = load_model(os.path.join(models_dir, 'best_model_by_val_loss_softmax.keras'))

    # Storage for data records
    records = []

    # Iterate through all batches in the validation generator
    for batch in range(len(validation_generator)):
        inputs, labels, batch_filenames, batch_start_times = validation_generator[batch]

        # Predict with the model for this batch
        softmax_probs = model.predict(inputs)

        # Decode true labels with threshold > 0.5
        true_label_indices = [np.where(sample > 0.5)[0] for sample in labels]
        true_label_names = [
            [validation_generator.label_encoder.inverse_transform([idx]) for idx in indices]
            for indices in true_label_indices
        ]

        # Get top predicted label only (most probable class)
        top_pred_indices = np.argmax(softmax_probs, axis=1)
        top_pred_names = validation_generator.label_encoder.inverse_transform(top_pred_indices)

        # Collect data for each sample in batch
        for i in range(len(inputs)):
            record = {
                'filename': batch_filenames[i][0],        # Filename
                'start_time': batch_start_times[i],       # Start time
                'true_labels': true_label_names[i],       # True labels list
                'predicted_label': top_pred_names[i],     # Top predicted label (string)
                'predicted_prob': softmax_probs[i][top_pred_indices[i]],  # Probability of top predicted label
                'all_pred_probs': softmax_probs[i].tolist()               # Full softmax distribution
            }
            records.append(record)

    # Build a DataFrame summarizing predictions and labels per file
    analysis_df = pd.DataFrame(records)

    print(f"Processed {len(analysis_df)} validation samples")

    display(analysis_df.sample(20))


Length of short train filenames 19948





| | filename | start_time | true_labels | predicted_label | predicted_prob | all_pred_probs |
|---|---|---|---|---|---|---|
| 14977 | littin1/iNat148171.ogg | 11.705503 | [[littin1]] | littin1 | 0.862315 | [3.223069870728068e-05, 6.376443195676984e-08,... |
| 1395 | y00678/XC364320.ogg | 5.713439 | [[y00678]] | y00678 | 0.438688 | [1.6786721971584484e-05, 9.55760151555296e-06,... |
| 5667 | yercac1/XC74696.ogg | 84.980306 | [[yercac1]] | chfmac1 | 0.24036 | [3.561975245247595e-05, 3.328108141431585e-05,... |
| 14932 | littin1/XC656067.ogg | 89.052224 | [[littin1]] | littin1 | 0.998952 | [5.493537855727482e-07, 1.6864617480294442e-09... |
| 17913 | 65448/iNat925724.ogg | 3.44542 | [[65448]] | 65448 | 0.963586 | [7.3970318226201925e-06, 1.220016798697543e-07... |
| 3085 | laufal1/iNat564278.ogg | 98.107055 | [[laufal1]] | laufal1 | 0.991638 | [3.613880608099862e-06, 5.420420166046824e-07,... |
| 4403 | wbwwre1/XC135369.ogg | 42.703525 | [[wbwwre1]] | littin1 | 0.495259 | [8.500062540406361e-05, 3.282758882505732e-07,... |
| 11567 | speowl1/iNat855826.ogg | 48.191337 | [[speowl1]] | speowl1 | 0.994633 | [1.2078160125383874e-06, 4.5724846131633967e-0... |
| 10698 | trokin/XC705426.ogg | 98.85824 | [[trokin]] | trokin | 0.967644 | [1.1072098459408153e-05, 4.961615900356264e-07... |
| 12794 | whbman1/iNat561351.ogg | 3.213149 | [[whbman1]] | blkvul | 0.52713 | [0.00013667362509295344, 7.636456575710326e-05... |


In [None]:
taxonomy = pd.read_csv(os.path.join(main_dir, 'taxonomy.csv'))

# Each 'true_labels' entry is a nested list such as [['littin1']]; keep the first label
analysis_df['true_label'] = analysis_df['true_labels'].apply(lambda x: x[0][0])

# Restrict to classes that BirdNET does not cover; .copy() avoids SettingWithCopyWarning
non_birdnet_labels = np.load(os.path.join(supplemental_files_dir, 'non_birdnet_labels.npy'), allow_pickle=True)
non_birdnet_df = analysis_df[analysis_df['true_label'].isin(non_birdnet_labels)].copy()
non_birdnet_df['correct'] = non_birdnet_df['true_label'] == non_birdnet_df['predicted_label']

overall_accuracy = non_birdnet_df['correct'].mean()
print(f'Overall accuracy for non-birdnet classes: {overall_accuracy:.4f}')
print('\nAccuracy By Class')

# Per-class accuracy, highest first, joined with the taxonomic class name
class_accuracy = non_birdnet_df.groupby('true_label')['correct'].mean().sort_values(ascending=False).reset_index()
class_accuracy = pd.merge(class_accuracy, taxonomy[['primary_label', 'class_name']], left_on='true_label', right_on='primary_label', how='left').rename(columns={'correct': 'accuracy', 'true_label': 'non_birdnet_label'})
class_accuracy

Overall accuracy for non-birdnet classes: 0.7587

Accuracy By Class


| | non_birdnet_label | accuracy | primary_label | class_name |
|---|---|---|---|---|
| 0 | 134933 | 1.0 | 134933 | Amphibia |
| 1 | 1346504 | 1.0 | 1346504 | Insecta |
| 2 | 135045 | 1.0 | 135045 | Amphibia |
| 3 | 21211 | 1.0 | 21211 | Amphibia |
| 4 | 787625 | 1.0 | 787625 | Amphibia |
| 5 | 66893 | 1.0 | 66893 | Amphibia |
| 6 | 50186 | 1.0 | 50186 | Insecta |
| 7 | 555086 | 1.0 | 555086 | Amphibia |
| 8 | 24272 | 1.0 | 24272 | Amphibia |
| 9 | 24292 | 1.0 | 24292 | Amphibia |


# **Non-BirdNET Classes Accuracy Results**

The main purpose of this model is to serve as a **reliable pseudo-label generator** for a later-stage multi-label (sigmoid output) bird classification model.

---

## Overall Performance

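- The model achieves an **overall accuracy of approximately 75.87%** on the non-BirdNET classes, measured as the fraction of validation samples whose top-1 prediction matches the true label.
- At this accuracy, roughly three out of four top-1 predictions for these classes are correct, which makes the model a usable source of pseudo labels for the less common classes that BirdNET does not cover.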
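
The overall figure above is a sample-weighted (micro) average, so frequent classes dominate it. The sketch below contrasts it with a class-weighted (macro) average on toy data. The function name and the toy labels are hypothetical and are not part of the notebook.

```python
from collections import defaultdict


def micro_and_macro_accuracy(true_labels, predicted_labels):
    """Return (micro, macro) accuracy for two parallel lists of labels."""
    per_class = defaultdict(lambda: [0, 0])  # label -> [num_correct, num_total]
    for true, pred in zip(true_labels, predicted_labels):
        per_class[true][0] += int(true == pred)
        per_class[true][1] += 1

    total_correct = sum(correct for correct, _ in per_class.values())
    total = sum(count for _, count in per_class.values())
    micro = total_correct / total  # every sample counts equally
    macro = sum(c / n for c, n in per_class.values()) / len(per_class)  # every class counts equally
    return micro, macro


# Toy data (hypothetical): a frequent class predicted well, a rare class predicted poorly.
true = ["common"] * 8 + ["rare"] * 2
pred = ["common"] * 8 + ["common", "rare"]
micro, macro = micro_and_macro_accuracy(true, pred)
print(f"micro={micro:.2f}, macro={macro:.2f}")
```

If the two numbers differ noticeably on the real validation set, the rare classes are likely performing differently from the frequent ones, which matters for a model whose purpose is to label rare classes.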

---

## Per-Class Accuracy

- Per-class accuracy results show **excellent performance (often near or at 100%) for many classes**, especially those within the Amphibia, Insecta, and Mammalia groups.
- Some classes have lower accuracy, so pseudo labels for those classes should be treated with more caution. A score for a class with very few validation samples is also a noisy estimate.
- Taken together, these results support using the model as a foundation for producing pseudo labels to improve the subsequent multi-label model training process.
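
Because a 100% score can come from a handful of samples, it may help to report the sample count next to each per-class accuracy. The following is a minimal sketch, assuming a DataFrame with the same `true_label` and `predicted_label` columns as `analysis_df`. The function name, the `min_chunks` cutoff, and the toy data are hypothetical.

```python
import pandas as pd


def per_class_report(df, min_chunks=5):
    """Per-class accuracy plus the number of samples it was estimated from.

    `min_chunks` is a hypothetical cutoff, not a value taken from the notebook.
    """
    df = df.assign(correct=df["true_label"] == df["predicted_label"])
    report = (
        df.groupby("true_label")["correct"]
        .agg(accuracy="mean", n_chunks="size")
        .reset_index()
    )
    # Flag classes whose accuracy rests on too few samples to be trusted
    report["low_support"] = report["n_chunks"] < min_chunks
    return report.sort_values("accuracy", ascending=False, ignore_index=True)


# Toy data (hypothetical labels): class "b" looks perfect but has only two samples.
toy = pd.DataFrame({
    "true_label": ["a", "a", "a", "a", "a", "a", "b", "b"],
    "predicted_label": ["a", "a", "a", "a", "a", "c", "b", "b"],
})
report = per_class_report(toy, min_chunks=5)
print(report)
```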

---

This pseudo-labeling capability is intended to support more accurate and robust multi-species detection in later models by supplying reliable single-species predictions from this softmax-based approach.
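
One possible way to turn the softmax outputs into pseudo labels for the later model is to keep only predictions above a confidence threshold. This is a sketch under stated assumptions: the notebook does not specify how pseudo labels are selected, and the function name and the 0.9 threshold are hypothetical.

```python
import numpy as np


def softmax_to_pseudo_labels(probs, min_confidence=0.9):
    """Turn softmax rows into one-hot pseudo labels, keeping only confident rows.

    Returns (pseudo_labels, keep_mask). `min_confidence` is a hypothetical
    threshold chosen for illustration, not a value from the notebook.
    """
    probs = np.asarray(probs, dtype=np.float32)
    rows = np.arange(len(probs))
    top_idx = probs.argmax(axis=1)          # most probable class per sample
    top_prob = probs[rows, top_idx]         # its probability
    keep_mask = top_prob >= min_confidence  # discard uncertain predictions

    pseudo = np.zeros_like(probs)
    pseudo[rows, top_idx] = 1.0             # one-hot vector for the top class
    return pseudo[keep_mask], keep_mask


# Toy softmax outputs for three samples over three classes (hypothetical values)
probs = [[0.95, 0.03, 0.02], [0.50, 0.30, 0.20], [0.01, 0.01, 0.98]]
labels, mask = softmax_to_pseudo_labels(probs, min_confidence=0.9)
print(mask)
print(labels)
```

A higher threshold trades coverage for label quality. The appropriate value would need to be chosen against the validation results rather than assumed.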
