![](https://wallup.net/wp-content/uploads/2016/01/19116-Eastern_Imperial_Eagle-nature-animals-birds-eagle.jpg)

# BirdCLEF 2022 : Model Building and Training
---

In this notebook we build the neural network model and train it on the BirdCLEF 2022 training data.

**This is the second of three notebooks. It builds on the findings of the first notebook to prepare the dataset and the model, and then trains the model.**

# Installations

In [None]:
!pip install torchsummary

# Libraries

In [None]:
# General purpose libraries for loading and manipulating data
import os
import re
import json
import time
import numpy as np
import pandas as pd
from glob import glob


# Pytorch imports for neural networks and tensor manipulations
import torch
import torchaudio
import torch.nn as nn
from torch.optim import Adam
from torch.nn import CrossEntropyLoss
from torchvision.transforms import Resize
from torchaudio.transforms import MelSpectrogram
from torch.utils.data import Dataset, DataLoader, random_split


# Libraries for visualization
import torchsummary
from termcolor import cprint
import matplotlib.pyplot as plt


# Libraries to hide warnings
import warnings
warnings.filterwarnings("ignore")


# Render matplotlib figures inline in the notebook
%matplotlib inline

# Datapaths

In [None]:
train_base_path = "../input/birdclef-2022/train_metadata.csv"
test_base_path = "../input/birdclef-2022/test.csv"
sample_submission_base_path = "../input/birdclef-2022/sample_submission.csv"
bird_taxonomy_base_path = "../input/birdclef-2022/eBird_Taxonomy_v2021.csv"
labels_base_path = "../input/birdclef-2022/scored_birds.json"
train_dir = "../input/birdclef-2022/train_audio"
test_dir = "../input/birdclef-2022/test_soundscapes"

### Loading train metadata

In [None]:
train_df = pd.read_csv(train_base_path)
train_df.head()

# Processing
Now we process the training metadata, keeping only the columns we need.

In [None]:
imp_features = ["primary_label", "type", "rating", "filename"]
train_df = train_df[imp_features]
train_df.head()

In this scenario we keep only the recordings whose type resembles a proper call, not some specific or unique vocalization, because those would obscure the patterns the model needs to learn.

In [None]:
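def extract_call(data, call = 'call'):
    # re.search expects the pattern first and the string to scan second
    try:
        if re.search(call, data):
            return "True"
        else:
            return "False"
    except TypeError:
        # Non-string entries (e.g. NaN) are treated as "no call"
        return "False"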
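
As a quick illustration of this kind of filter, here is a minimal sketch that checks a few made-up `type` strings for the word `call`. The helper name `mentions_call` and the sample strings are assumptions for illustration; the real column may be formatted differently. Note that `re.search` takes the pattern first and the text to scan second.

```python
import re

def mentions_call(type_text, pattern="call"):
    # Missing or non-string entries cannot contain a call
    if not isinstance(type_text, str):
        return False
    # re.search(pattern, string): pattern first, text second
    return re.search(pattern, type_text) is not None

# Made-up examples of what a `type` entry might look like
samples = ["['call']", "['song']", "['flight call', 'song']", None]
flags = [mentions_call(s) for s in samples]
print(flags)
```

A plain substring pattern like this also matches entries such as `flight call`, so a stricter pattern would be needed if only exact `call` entries should pass.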

In [None]:
print("Length of data before call extraction : {}".format(len(train_df)))
train_df["type"] = train_df["type"].apply(extract_call)
train_df = train_df[train_df["type"] == "True"]
train_df = train_df.drop(columns = "type")  # positional axis arguments are no longer accepted by recent pandas
print("Length of data after call extraction : {}".format(len(train_df)))
train_df.head()

Creating a class encoding dictionary which will help us find the correct class names in future.

In [None]:
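class_dict = dict()

for index, label in enumerate(train_df.primary_label.unique()):
    class_dict[index] = label

# Encode every label in one pass instead of replacing in place inside the loop
label_to_index = {label: index for index, label in class_dict.items()}
train_df["primary_label"] = train_df["primary_label"].map(label_to_index)
print(class_dict)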

Saving the dictionary to a file so that we can reuse it later.

In [None]:
# Use a context manager so the file is flushed and closed properly
with open("class_dict.json", "w") as f:
    json.dump(class_dict, f)
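
One detail worth keeping in mind: JSON object keys are always strings, so the integer keys of `class_dict` come back as strings after a round trip. The sketch below uses a made-up two-class dictionary (the label names are placeholders) to show the effect and one way to restore integer keys on load.

```python
import json

# Hypothetical miniature version of class_dict (labels are placeholders)
demo_dict = {0: "species_a", 1: "species_b"}

# Round trip through JSON text, as json.dump / json.load would do via a file
restored = json.loads(json.dumps(demo_dict))
print(restored)  # the keys are now the strings "0" and "1"

# Convert the keys back to integers before indexing with predicted class ids
restored_int = {int(key): value for key, value in restored.items()}
print(restored_int[1])
```

The number of classes is unaffected, but any later lookup by predicted index needs either string keys or a conversion like the one above.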

Let's check the processed training metadata.

In [None]:
train_df.head()

Similarly, we save the processed metadata so that the custom dataset, and later notebooks, can use it.

In [None]:
train_df.to_csv("training_metadata.csv", index = False)

# Dataset

The first task is to fix the random seed so that the following steps can be reproduced. We also set the audio backend used to load the audio data into tensors.

In [None]:

torch.manual_seed(42)
torchaudio.set_audio_backend("soundfile")

Now it's time to build our custom dataset, which takes the data directory and the processed training metadata and produces trainable samples.

In [None]:
class CLEFDataset(Dataset):
    
    def __init__(self,
                 data_dir,
                 metadata_path,
                 size = 640,
                 transform = None):
        super(CLEFDataset, self).__init__()
        self.data_dir = data_dir
        self.metadata = pd.read_csv(metadata_path)
        self.size = size
        self.transform = transform
        
    def __len__(self):
        return len(self.metadata)
    
    def __getitem__(self, index):
        path = self.metadata.loc[index, "filename"]
        path = os.path.join(self.data_dir, path)
        label = self.metadata.loc[index, "primary_label"]
        mono_audio = self.load_audio(path)
        mono_audio = mono_audio.unsqueeze(dim=0)
        return mono_audio, label
    
    
    def load_audio(self, path):
        audio, _ = torchaudio.load(path)
        if self.transform is not None:
            for aug in self.transform:
                audio = aug(audio)
        return audio[0,:]

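The recordings differ in length, so we also need to bring every sample to the same shape: each clip is converted to a mel spectrogram with 128 mel bands and then resized to 128×128.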

In [None]:
augm = [
    MelSpectrogram(n_mels = 128),
    Resize((128, 128))
]
augm
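
To see why the resize step matters, consider how many time frames a spectrogram has. This is a rough sketch assuming torchaudio's default `MelSpectrogram` settings (`n_fft = 400`, a hop length of `n_fft // 2 = 200`, centered frames) and an assumed 32 kHz sample rate; the helper `num_frames` is illustrative and not part of the notebook.

```python
def num_frames(num_samples, hop_length=200):
    # With centered (padded) frames, the STFT yields one frame per hop plus one
    return num_samples // hop_length + 1

sample_rate = 32_000  # assumed sample rate, for illustration only
for seconds in (5, 20, 60):
    frames = num_frames(seconds * sample_rate)
    print(f"{seconds:>2} s clip -> spectrogram of shape (128, {frames})")
```

Clips of different lengths give spectrograms of different widths, and `Resize((128, 128))` squeezes them all to a common shape so that they can be stacked into batches.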

Now we create the dataset.

In [None]:
metadata_path = "./training_metadata.csv"
dataset = CLEFDataset(train_dir, metadata_path, transform = augm)

In [None]:
len(dataset)

In [None]:
data, label = dataset[10]
print("Audio Shape : {} , label : {}".format(data.shape, label))

The dataset is created correctly.
Now we should split the dataset into training and validation sets.

**Train-Validation Ratio = 4:1**

In [None]:
x1 = int(len(dataset) * 0.8)
x2 = len(dataset) - x1
train_ds, val_ds = random_split(dataset, [x1, x2])

In [None]:
print("Length of Training Dataset : {}".format(len(train_ds)))
print("Length of Validation Dataset : {}".format(len(val_ds)))
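
The split sizes come from simple integer arithmetic: the training share is rounded down and the validation set takes whatever remains, so the two always add up to the full dataset. A small sketch with made-up dataset sizes (the helper `split_sizes` is hypothetical):

```python
def split_sizes(total, train_fraction=0.8):
    train_size = int(total * train_fraction)  # rounds down
    val_size = total - train_size             # the remainder goes to validation
    return train_size, val_size

for total in (1000, 1003):
    train_size, val_size = split_sizes(total)
    print(f"{total} samples -> {train_size} train / {val_size} validation")
```

Computing the second size as a remainder, rather than as `int(total * 0.2)`, avoids losing a sample to rounding.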

Now it's time to create batches of data (named `patch` in the code), which lets the model train on small chunks instead of loading the whole dataset into memory at once. 
Note: Both loaders shuffle their data, so each batch draws a varied mix of samples.

In [None]:
BATCH_SIZE = 64

train_dl = DataLoader(train_ds, batch_size = BATCH_SIZE, shuffle = True)
val_dl = DataLoader(val_ds, batch_size = BATCH_SIZE, shuffle = True)

Let's check the data chunks

In [None]:
for patch, labels in train_dl:
    print(patch.shape, labels.shape)
    break
for patch, labels in val_dl:
    print(patch.shape, labels.shape)
    break
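
The number of batches per epoch follows directly from the dataset size and `BATCH_SIZE`. By default a `DataLoader` keeps the final, smaller batch (`drop_last=False`), so the last batch may hold fewer than 64 samples. A minimal sketch with a made-up dataset size (the helper `batch_layout` is hypothetical and assumes at least one sample):

```python
import math

def batch_layout(num_samples, batch_size=64):
    # Number of batches when the final partial batch is kept
    num_batches = math.ceil(num_samples / batch_size)
    # Size of the final batch (equals batch_size when it divides evenly)
    last_batch = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch

num_batches, last_batch = batch_layout(1000)
print(f"{num_batches} batches, last one holds {last_batch} samples")
```

This is why any per-batch statistic should be divided by the actual batch length rather than by the nominal batch size.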

# Neural Network Model

Here we stack several convolutional (CNN) blocks followed by fully connected (ANN) layers, so that no crucial information in the spectrogram is missed.

In [None]:
# Convolution shape updating function
def conv_shape(shape, kernel_size, stride, padding):
    H, W = shape[0], shape[1]
    H = ((H - kernel_size + 2*padding) // stride) + 1
    W = ((W - kernel_size + 2*padding) // stride) + 1
    return H, W
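
As a worked example of this formula, the sketch below traces a 128×128 input through the layer settings used by the network in this notebook: one 7×7 convolution with stride 2, then two 3×3 convolutions with stride 2, all without padding, starting from 8 channels and doubling twice. The function is repeated here only so that the block runs on its own.

```python
def conv_shape(shape, kernel_size, stride, padding):
    H, W = shape
    H = ((H - kernel_size + 2 * padding) // stride) + 1
    W = ((W - kernel_size + 2 * padding) // stride) + 1
    return H, W

shape = (128, 128)
shape = conv_shape(shape, 7, 2, 0)   # input block: 7x7 kernel, stride 2
print(shape)
for _ in range(2):                   # two further 3x3, stride-2 blocks
    shape = conv_shape(shape, 3, 2, 0)
    print(shape)

channels = 8 * 2 ** 2                # 8 base channels, doubled twice
flat_features = shape[0] * shape[1] * channels
print(flat_features)                 # input size of the first linear layer
```

The spatial size shrinks from 128 to 61, 30 and finally 14, so the flattened feature vector has 14 × 14 × 32 = 6272 entries.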

In [None]:
class Conv(nn.Module):
    
    def __init__(self,
                 in_channels,
                 out_channels,
                 kernel_size,
                 stride=(1,1),
                 padding=(0,0),
                 momentum=0.15):
        super(Conv, self).__init__()
        self.conv_block = nn.Sequential(
            nn.BatchNorm2d(in_channels, momentum = momentum),
            nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding),
            nn.ReLU()
        )
        
    def forward(self, x):
        return self.conv_block(x)


class CLEFNetwork(nn.Module):
    
    def __init__(self,
                 num_classes,
                 in_channels = 1,
                 H = 128,
                 W = 128,
                 num_downs = 3):
        super(CLEFNetwork, self).__init__()
        
        self.num_C = num_classes
        self.num_downs = num_downs
        self.in_channels = in_channels
        self.C = 8
        self.H, self.W = self.calc_HW(H, W)
        self.in_conv_block = Conv(self.in_channels, self.C, 7, (2, 2))
        self.conv_block = nn.ModuleList(
                [
                    Conv(self.C * 2**i, self.C * 2**(i+1), 3, (2, 2))
                    for i in range(self.num_downs-1)
                ]
        )
        self.fc_block = nn.Sequential(
                nn.Linear(self.H * self.W * self.C * 2**(self.num_downs - 1), 1024),
                nn.Linear(1024, 1024),
                nn.Linear(1024, self.num_C)
        )
        
    def calc_HW(self, H, W):
        H, W = conv_shape((H, W), 7, 2, 0)
        for num_down in range(self.num_downs - 1):
            H, W = conv_shape((H, W), 3, 2, 0)
        return H, W
        
        
    def forward(self, x):
        x = self.in_conv_block(x)
        for block in self.conv_block:
            x = block(x)
        x = x.view(x.shape[0], -1)
        x = self.fc_block(x)
        return x

Now we load the class label dictionary, whose length gives the total number of classes.

In [None]:
class_labels_path = "./class_dict.json"

In [None]:
with open(class_labels_path, "r") as f:
    class_labels = json.load(f)
num_classes = len(class_labels.keys())
print("Number of class : {}".format(num_classes))

Let's try a simple forward pass with some random data on the model.

In [None]:
model = CLEFNetwork(num_classes)
rand_data = torch.rand(5, 1, 128, 128)
model(rand_data).shape

Before starting the training, let's check that every layer's parameters are registered with the model and require gradients; otherwise they won't be updated during backpropagation.

In [None]:
for name, param in model.named_parameters():
  print(f"{name} : {param.shape}, requires_grad : {param.requires_grad}")

All looks fine. Let's visualize the model

In [None]:

torchsummary.summary(model, (1, 128, 128), device = "cpu")

We need a function that counts the correct predictions in a batch, which we use to compute accuracy during training.

In [None]:
def accuracy_func(pred, true):
    # Returns the number of correct predictions in the batch (a 0-dim tensor), not a ratio
    pred = torch.argmax(pred, dim = 1)
    acc = (true == pred).sum()
    return acc
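
To make the counting explicit, here is the same idea in plain Python on made-up scores, without tensors. The helper `count_correct` and the numbers are illustrative only: the predicted class is the index of the largest score in each row, and the accuracy is the number of matches divided by the number of samples.

```python
def count_correct(scores, labels):
    correct = 0
    for row, label in zip(scores, labels):
        # argmax: index of the largest score in this row
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += int(predicted == label)
    return correct

# Three samples, two classes (made-up values)
scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
labels = [1, 0, 0]

correct = count_correct(scores, labels)
print(correct, correct / len(labels))
```

Keeping the count and the division separate is what allows the training loop to sum counts over all batches and divide once by the dataset length.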

# Model training and saving best models

The first task in this phase is to set the hyperparameters.

We also check whether a CUDA **GPU** is available, since it speeds up model training considerably; otherwise training falls back to the CPU.

In [None]:
EPOCHS = 10
optim = Adam(model.parameters(), lr = 1e-4)
criterion = CrossEntropyLoss()
device = "cuda:0" if torch.cuda.is_available() else "cpu"
device

### Training:

Now comes the most important part of the whole task.

The training loop saves a checkpoint whenever the validation loss or the validation accuracy improves, and stops early after 5 consecutive epochs without any improvement.

In [None]:
train_init = time.time()
cprint("Started training...", "blue")
best_loss = np.inf
best_acc = 0.0
if device == "cuda:0":
    print("Model Loaded on GPU...")
    model = model.cuda()
update = 0
TL, VL, TA, VA = [], [], [], []
for epoch in range(EPOCHS):
    print(f"Epoch {epoch + 1} :")
    epoch_init = time.time()
    train_loss = val_loss = 0.0
    train_acc = val_acc = 0
    tot_val_data_point = 0
    model.train()
    for train_index, (patch, labels) in enumerate(train_dl):
        optim.zero_grad()
        if device == "cuda:0":
            dev_patch = patch.cuda()
            dev_labels = labels.cuda()
        else:
            dev_patch = patch
            dev_labels = labels
        output = model(dev_patch)
        acc = accuracy_func(output, dev_labels)
        train_acc += acc
        loss = criterion(output, dev_labels)
        train_loss += loss.item()
        TL.append(loss.item())
        TA.append(acc / dev_patch.shape[0])
        if train_index % 10 == 9:
            print(f"      [Step {train_index + 1}] Loss : {'%.6f'%loss.item()}")
        loss.backward()
        optim.step()
        
    model.eval()
    with torch.no_grad():
        for val_index, (patch, labels) in enumerate(val_dl):
            if device == "cuda:0":
                dev_patch = patch.cuda()
                dev_labels = labels.cuda()
            else:
              dev_patch = patch
              dev_labels = labels
            output = model(dev_patch)
            acc = accuracy_func(output, dev_labels)
            val_acc += acc
            loss = criterion(output, dev_labels)
            val_loss += loss.item()
            VL.append(loss.item())
            VA.append(acc / dev_patch.shape[0])
    TRAIN_ACC = train_acc / len(train_ds)
    VAL_ACC = val_acc / len(val_ds)
    print(f"   Train Loss : {'%.6f'%train_loss} | Train accuracy : {'%.6f'%TRAIN_ACC}")
    print(f"   Validation Loss : {'%.6f'%val_loss} | Validation Accuracy : {'%.6f'%VAL_ACC}")
    updation_flag = False
    if val_loss < best_loss:
      update = 0
      updation_flag = True
      best_loss = val_loss
      cprint("Loss Updation : Positive", "green")
      torch.save({
          "model" : model.state_dict(),
          "optim" : optim.state_dict(),
          "epoch" : epoch + 1
      }, "best_loss_model.pt")
    if VAL_ACC > best_acc:
      update = 0
      updation_flag = True
      best_acc = VAL_ACC
      cprint("Accuracy Updation : Positive", "green")
      torch.save({
          "model" : model.state_dict(),
          "optim" : optim.state_dict(),
          "epoch" : epoch + 1
      }, "best_accuracy_model.pt")
    if updation_flag == False:
        cprint("Model Updation : Negative\n", "red")
        update += 1
    print(f"   Execution Time : {'%.3f'%(time.time() - epoch_init)} seconds\n")
    if update >= 5:
      cprint("Model Stopped due to continuous model learning degradation\n", "red")
      break
cprint("Training finished...", "blue")
cprint(f"Execution Time : {'%.3f'%(time.time() - train_init)} seconds", "blue")
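
The stopping rule in the loop above can be isolated into a few lines. The sketch below replays a made-up history of per-epoch `(val_loss, val_acc)` pairs; the function name `stopping_epoch` and the numbers are hypothetical, and only the patience of 5 is taken from the loop.

```python
def stopping_epoch(history, patience=5):
    """Return the 1-based epoch at which training would stop, or None."""
    best_loss = float("inf")
    best_acc = 0.0
    stale = 0  # consecutive epochs without any improvement
    for epoch, (val_loss, val_acc) in enumerate(history, start=1):
        improved = False
        if val_loss < best_loss:
            best_loss = val_loss
            improved = True
        if val_acc > best_acc:
            best_acc = val_acc
            improved = True
        stale = 0 if improved else stale + 1
        if stale >= patience:
            return epoch
    return None

# Two improving epochs followed by five epochs that are worse on both metrics
history = [(1.0, 0.50), (0.9, 0.55)] + [(1.1, 0.50)] * 5
print(stopping_epoch(history))
```

An improvement in either metric resets the counter, so training only stops when neither the loss nor the accuracy has improved for 5 epochs in a row.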

# Visualization: 

Now that the training has been completed, we can plot the loss and accuracy curves to see the model performances.

In [None]:
plt.figure(figsize = (20, 6))
plt.plot(TL, label = "training loss")
plt.plot(VL, label = "validation loss")
plt.xlabel("Steps")
plt.ylabel("Loss")
plt.title("Loss Curves", size = 20)
plt.legend()
plt.show()

The accuracies are stored as tensors (on the GPU when one is used), so we have to convert them to ordinary floats for plotting.

In [None]:
for index in range(len(TA)):
    TA[index] = float(TA[index].cpu().detach())
for index in range(len(VA)):
    VA[index] = float(VA[index].cpu().detach())

In [None]:
plt.figure(figsize = (20, 6))
plt.plot(TA, label = "training accuracy score")
plt.plot(VA, label = "validation accuracy score")
plt.xlabel("Steps")
plt.ylabel("Accuracy")
plt.title("Accuracy Curves", size = 20)
plt.legend()
plt.show()

# Thanks for visiting :)

# Do UPVOTE if you like the notebook :)
## Also follow me on [kaggle](https://www.kaggle.com/sagnik1511) , [GitHub](https://github.com/sagnik1511) and on [LinkedIn](https://www.linkedin.com/in/sagnik1511)