**Aim of this notebook**

    The aim of this notebook is to implement a basic but powerful classification model: a ResNet50 backbone with a modified top.

    This notebook covers only the training procedure; I will upload the inference pipeline soon.

**IF YOU LIKE THIS NOTEBOOK THEN, PLEASE UPVOTE!**

**Install Dependencies**

In [None]:
!pip install -q noisereduce

**Imports**

In [None]:
import os
import json
import tqdm
import librosa
import librosa.display
import numpy as np
import pandas as pd
import seaborn as sns
from PIL import Image
import plotly.express as px
import IPython.display as ipd
import matplotlib.pyplot as plt

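# Use the full option names; the short forms are ambiguous in recent pandas versions
pd.set_option('display.max_rows', 250)
pd.set_option('display.max_columns', 100)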

**Configs**

In [None]:
seed = 42
os.environ['PYTHONHASHSEED'] = str(seed)
np.random.seed(seed)

DURATION = 15
SPEC_SHAPE = (48, 128)
SAMPLE_RATE = 32000
TEST_DURATION = 5
FMIN = 500
FMAX = 12500

**Load the files**

In [None]:
main_dir = '../input/birdclef-2022'
train_audio_dir = main_dir+'/train_audio'
test_audio_dir = main_dir+'/test_soundscapes'
train = pd.read_csv(main_dir+'/train_metadata.csv')
train['time_dt'] = pd.to_datetime(train['time'], errors='coerce')
train['time_dt'] = train['time_dt'].dt.round('30min')
train['time_H_M'] = train['time_dt'].dt.strftime('%H:%M')
train['secondary_label_len'] = train.secondary_labels.apply(lambda x:len(x.split(','))) 
test = pd.read_csv(main_dir+'/test.csv') 
submission = pd.read_csv(main_dir+'/sample_submission.csv')
taxonomy = pd.read_csv(main_dir+'/eBird_Taxonomy_v2021.csv')
with open(main_dir+'/scored_birds.json', 'r') as f:
    scored_birds = json.load(f)

In [None]:
train.head(5)

In [None]:
test.head(5)

In [None]:
print("Number of rows in train data: {}\nNumber of columns in train data: {}".format(train.shape[0],train.shape[1]))
print("Number of rows in test data: {}\nNumber of columns in test data: {}".format(test.shape[0],test.shape[1]))

In [None]:
submission.head(5)

In [None]:
taxonomy.head(5)

In [None]:
print("There are {} unique classes, but we will be evaluated on only {} of them".format(len(train.primary_label.unique()), len(scored_birds)))

Let's look at the scored classes 

In [None]:
print(scored_birds)

    Though we will be evaluated on 21 classes, there is not much data for them: just 1266 entries (based on primary_label). So one idea could be to train on all the classes but use only those 21 classes for validation.
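
    A minimal sketch of that idea, assuming we hold out a random fraction of the scored-class rows for validation and train on everything else. The function name `scored_validation_split`, the 20% fraction and the toy labels are assumptions for illustration, not part of this notebook's pipeline.

```python
import numpy as np

def scored_validation_split(labels, scored, val_fraction=0.2, seed=42):
    """Return (train_idx, val_idx) where validation rows come only from scored classes."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    # Positions of all rows whose label is one of the scored classes
    scored_idx = np.flatnonzero(np.isin(labels, scored))
    n_val = int(round(val_fraction * len(scored_idx)))
    # Hold out a random subset of the scored rows for validation
    val_idx = rng.choice(scored_idx, size=n_val, replace=False)
    # Every other row (scored or not) stays in the training set
    train_idx = np.setdiff1d(np.arange(len(labels)), val_idx)
    return train_idx, np.sort(val_idx)

# Toy example: classes "a" and "b" are scored, "c" is not
toy_labels = ["a"] * 10 + ["b"] * 10 + ["c"] * 10
train_idx, val_idx = scored_validation_split(toy_labels, ["a", "b"])
print(len(train_idx), len(val_idx))
```

    With the real data this would be called as `scored_validation_split(train.primary_label.values, scored_birds)`.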

**EDA**

**Distributions**

In [None]:
fig, ax = plt.subplots(figsize=(24, 8))
sns.countplot(data=train, x='primary_label', ax=ax, order=train['primary_label'].value_counts().index)
plt.xticks(rotation=90);

In [None]:
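fig, ax = plt.subplots(figsize=(16, 8))
# All 48 half-hour slots of a day, in chronological order
x_labels = pd.date_range(start='00:00', periods=48, freq='30min')
x_labels = list(x_labels.strftime('%H:%M'))
# Pass the slots as `order` so that every bar sits under its own label
sns.countplot(data=train, x='time_H_M', ax=ax, order=x_labels)
plt.xticks(rotation=90);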

In [None]:
fig = px.scatter_geo(
    train,
    lat="latitude",
    lon="longitude",
    color="common_name",
    width=1_000,
    height=500,
    title="BirdCLEF 2022 Training Data",
)
fig.show()

**Play a Few Samples**

In [None]:
ipd.Audio('../input/birdclef-2022/train_audio/bcnher/XC115512.ogg')

In [None]:
ipd.Audio('../input/birdclef-2022/train_audio/barpet/XC189894.ogg')

In [None]:
ipd.Audio('../input/birdclef-2022/train_audio/akekee/XC210201.ogg')

In [None]:
ipd.Audio('../input/birdclef-2022/train_audio/afrsil1/XC125458.ogg')

In [None]:
ipd.Audio('../input/birdclef-2022/train_audio/apapan/XC139974.ogg')

**Spectrograms**

In [None]:
import torch
import torchaudio
import noisereduce as nr
from math import ceil

def create_spectrogram(
    fname: str,
    reduce_noise: bool = False,
    frame_size: int = 5,
    frame_step: int = 2,
    channel: int = 0,
    device = "cpu",
) -> list:
    waveform, sample_rate = torchaudio.load(fname)
    
    transform = torchaudio.transforms.Spectrogram(n_fft=1800, win_length=512).to(device)
    if reduce_noise:
        waveform = torch.tensor(nr.reduce_noise(
            y=waveform,
            sr=sample_rate,
            win_length=transform.win_length,
            use_tqdm=False,
            n_jobs=2,
        ))
    step = int(frame_step * sample_rate)
    size = int(frame_size * sample_rate)
    spectrograms = []
    for i in range(ceil((waveform.size()[-1] - size) / step)):
        begin = i * step
        frame = waveform[channel][begin:begin + size]
        if len(frame) < size:
            if i == 0:
                rep = round(float(size) / len(frame))
                frame = frame.repeat(int(rep))
            elif len(frame) < (size * 0.33):
                continue
            else:
                frame = waveform[channel][-size:]
        sg = transform(frame.to(device))
        # Move the result back to the CPU first so this also works with device="cuda"
        spectrograms.append(np.nan_to_num(torch.log(sg).cpu().numpy()))
        # spectrograms.append(np.nan_to_num(sg.numpy()))
    return spectrograms


path_audio = os.path.join(train_audio_dir, train["filename"][0])
print(path_audio)
sgs = create_spectrogram(path_audio, reduce_noise=True)


# squeeze=False keeps `axarr` two-dimensional even when there is only one frame
fig, axarr = plt.subplots(ncols=len(sgs), figsize=(4 * len(sgs), 4), squeeze=False)
for i, sg in enumerate(sgs):
    im = axarr[0, i].imshow(sg, vmin=-50, vmax=10)
plt.colorbar(im)
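
The framing loop in `create_spectrogram` slides a 5-second window over the waveform in 2-second steps. The sketch below reproduces only the start-position arithmetic, so it is easy to see which parts of a clip get covered. The helper name `frame_starts` and the 32 kHz sample rate are assumptions for illustration (the real function uses whatever rate `torchaudio.load` returns).

```python
from math import ceil

def frame_starts(n_samples, sample_rate, frame_size=5, frame_step=2):
    """Start offsets (in samples) of the frames, using the same formula as create_spectrogram."""
    size = int(frame_size * sample_rate)
    step = int(frame_step * sample_rate)
    n_frames = ceil((n_samples - size) / step)
    # range() of a non-positive number is empty, so short clips produce no frames
    return [i * step for i in range(n_frames)]

sr = 32000  # assumed sample rate
starts = frame_starts(13 * sr, sr)
print([s / sr for s in starts])  # [0.0, 2.0, 4.0, 6.0]
```

For a 13-second clip the frames start at 0, 2, 4 and 6 seconds, so the last two seconds are not covered. A clip of 5 seconds or less yields no frames at all with this formula.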

**Now let's get back to the modeling**

**Define the OneHotEncoder in order to convert all the classes beforehand**

In [None]:
from sklearn.preprocessing import OneHotEncoder

ohe = OneHotEncoder()
ohe.fit(train.primary_label.unique().reshape(-1, 1))

t_values = np.argmax(np.array(ohe.transform(train['primary_label'].values.reshape(-1, 1)).toarray()), 1)
train['target'] = t_values
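
Taking the `argmax` of a one-hot row simply recovers the position of the label in the encoder's sorted category list. As a minimal sketch of the same mapping without the encoder, `np.unique` with `return_inverse=True` also sorts the labels and returns the integer index of each one. The four labels below are a toy sample, not the real training column.

```python
import numpy as np

# Toy sample of primary labels (the real column has many more rows)
labels = np.array(["barpet", "akekee", "apapan", "akekee"])

# classes: sorted unique labels; target: index of each label within `classes`
classes, target = np.unique(labels, return_inverse=True)
print(classes.tolist())  # ['akekee', 'apapan', 'barpet']
print(target.tolist())   # [2, 0, 1, 0]
```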

**Generating The Train Data**

The main idea behind generating the data is this:

    The test predictions only require 5 seconds of audio, but the length of the audio in the training set varies, so here we split each recording into 5-second intervals.
    For example, if the audio is 13 seconds long, we divide it into
      1) the 0-5 second chunk
      2) the 5-10 second chunk
    and drop the remaining 3 seconds.
    The disadvantages of this technique are:
      i) We lose some information by removing the last portion of the audio (I think we could pad it in order to use that portion).
      ii) We assume that the bird's call is present in every sub-sample, which may or may not be the case.
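
The splitting described above can be sketched as follows. The `pad_last` option illustrates the zero-padding idea for the final partial chunk; it is an assumption and is not what `get_data` below does (that code drops the partial chunk). The function name `split_into_chunks` and the low toy sample rate are also made up for the example.

```python
import numpy as np

def split_into_chunks(audio, sample_rate, chunk_seconds=5, pad_last=False):
    """Split a 1-D signal into fixed-length chunks, optionally zero-padding the last one."""
    chunk_len = chunk_seconds * sample_rate
    chunks = []
    for start in range(0, len(audio), chunk_len):
        chunk = audio[start:start + chunk_len]
        if len(chunk) < chunk_len:
            if not pad_last:
                break  # drop the incomplete tail
            # Pad the tail with zeros up to the full chunk length
            chunk = np.pad(chunk, (0, chunk_len - len(chunk)))
        chunks.append(chunk)
    return chunks

toy_sr = 100                      # toy sample rate to keep the example small
audio = np.ones(13 * toy_sr)      # a 13-second "recording"
print(len(split_into_chunks(audio, toy_sr)))                 # 2
print(len(split_into_chunks(audio, toy_sr, pad_last=True)))  # 3
```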

In [None]:
train.head(3)

In [None]:
train.primary_label.value_counts().describe()

In [None]:
from sklearn.model_selection import train_test_split

subset = train[train.primary_label != 'maupar']
x_train, x_val, y_train, y_val = train_test_split(
    subset, subset['primary_label'], stratify=subset['primary_label'],
    test_size=0.2, random_state=seed)
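
A stratified split needs at least two samples of every class, otherwise `train_test_split` raises a `ValueError`. That is one plausible reason for leaving `maupar` out above, though the notebook does not say so. A minimal sketch for finding such classes before splitting is shown here; the helper name and the toy label counts are assumptions, not the real dataset statistics.

```python
from collections import Counter

def classes_too_rare_to_stratify(labels, min_count=2):
    """Return the labels that occur fewer than `min_count` times."""
    counts = Counter(labels)
    return sorted(label for label, n in counts.items() if n < min_count)

# Toy labels, not the real class counts
toy = ["akekee", "akekee", "apapan", "apapan", "apapan", "maupar"]
print(classes_too_rare_to_stratify(toy))  # ['maupar']
```

With the real data this could be called as `classes_too_rare_to_stratify(train.primary_label)`.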

In [None]:
'''
Here I generate the data for every entry of every class, but if required we can
keep back some images per class to use as validation.
Ex. Use 80% of the images from each class for training and
    keep 20% of the images from each class for validation.

'''
def get_data(df):
    X = []
    Y = []
    for ul in tqdm.tqdm(df.primary_label.unique()):
        records = df[df.primary_label==ul]
        for r in records[['filename','primary_label','secondary_labels']].values:
            file = r[0]
            pl = r[1]
            sl = r[2]
            y = ohe.transform(np.array([pl]).reshape(-1, 1)).todense()
            arr, sr = librosa.load(os.path.join(train_audio_dir, file), sr=SAMPLE_RATE, duration=DURATION)
            chunks = []
            for c_ in range(0, len(arr), (TEST_DURATION*SAMPLE_RATE)):
                chunk = arr[c_:c_ + TEST_DURATION * SAMPLE_RATE]
                if len(chunk) < int(TEST_DURATION * SAMPLE_RATE):
                    break
                chunks.append(chunk)
            y_arr = []
            mel_chunks = []
            for c_ in chunks:
                hop_length = int(TEST_DURATION * SAMPLE_RATE / (SPEC_SHAPE[1] - 1))
                #Extract Mel Spec
                mel_spec = librosa.feature.melspectrogram(y=c_,sr=SAMPLE_RATE,n_fft=1024, hop_length=hop_length, 
                                                      n_mels=SPEC_SHAPE[0], fmin=FMIN, fmax=FMAX)

                mel_spec = librosa.power_to_db(mel_spec, ref=np.max) 
                # Normalize
                mel_spec = (mel_spec - mel_spec.min())/(mel_spec.max() - mel_spec.min())
                mel_chunks.append(np.asarray(Image.fromarray(mel_spec * 255.0).convert("RGB")))
                y_arr.append(y)
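            # Take the number of classes from the fitted encoder instead of hard-coding 152
            y_arr = np.array(y_arr).reshape(-1, len(ohe.categories_[0]))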
            mel_chunks = np.array(mel_chunks)
            X.extend(mel_chunks)
            Y.extend(y_arr)

    X = np.array(X)
    Y = np.array(Y) 
    print(X.shape,Y.shape)
    return X, Y
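
The `hop_length` in `get_data` is chosen so that a 5-second chunk produces exactly `SPEC_SHAPE[1]` spectrogram columns. A quick check of the arithmetic, assuming librosa's default centered framing, where the number of frames is `1 + n_samples // hop_length`:

```python
TEST_DURATION = 5
SAMPLE_RATE = 32000
SPEC_SHAPE = (48, 128)

n_samples = TEST_DURATION * SAMPLE_RATE            # 160000 samples per chunk
hop_length = int(n_samples / (SPEC_SHAPE[1] - 1))  # int(160000 / 127)
n_frames = 1 + n_samples // hop_length             # frame count with centered framing
print(hop_length, n_frames)  # 1259 128
```

So each chunk becomes a 48 x 128 mel-spectrogram, which matches the model input shape used later.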

In [None]:
train_X, train_Y = get_data(x_train)
val_X, val_Y = get_data(x_val)
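
One thing to watch in `get_data`: the min-max scaling divides by `max - min`, which is zero when a chunk is constant (for example a silent one) and then fills the image with NaNs. A minimal sketch of a guarded version follows. The helper name `min_max_scale` and the `eps` threshold are assumptions, and the notebook's own code does not include this guard.

```python
import numpy as np

def min_max_scale(spec, eps=1e-6):
    """Scale an array to [0, 1]; return all zeros if it is (nearly) constant."""
    spec = np.asarray(spec, dtype=np.float32)
    value_range = spec.max() - spec.min()
    if value_range < eps:
        # Constant input: avoid dividing by zero
        return np.zeros_like(spec)
    return (spec - spec.min()) / value_range

scaled = min_max_scale([[-80.0, -40.0], [-20.0, 0.0]])
print(scaled.min(), scaled.max())  # 0.0 1.0
```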

**Define and Plot The Model**

    1) We are using ResNet50 as the backbone with frozen weights; you can try training them if you like.

In [None]:
import tensorflow as tf
import tensorflow_addons as tfa

tf.random.set_seed(seed)

ipt = tf.keras.layers.Input((48, 128, 3))
bb = tf.keras.applications.resnet.ResNet50(include_top=False, weights='imagenet')
bb.trainable = False
x = bb(ipt)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dense(1024, activation='relu')(x)
x = tf.keras.layers.Dropout(0.6)(x)
x = tf.keras.layers.Dense(512, activation='relu')(x)
x = tf.keras.layers.Dropout(0.6)(x)
x = tf.keras.layers.Dense(256, activation='relu')(x)
x = tf.keras.layers.Dropout(0.6)(x)
opt = tf.keras.layers.Dense(152, activation='softmax')(x)

model = tf.keras.models.Model(ipt, opt)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy', tfa.metrics.F1Score(num_classes=len(train.primary_label.unique()))])
model.summary()

**For now I am validating on a single hold-out split, but later on I will change it to a better approach and will also add K-Fold, so stay tuned for that!**

**We are using ModelCheckpoint and EarlyStopping as Callbacks**
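
To make the effect of `patience=5` concrete, here is a minimal pure-Python sketch of the stopping rule that early stopping on `val_loss` follows: training stops once the loss has failed to improve for `patience` consecutive epochs. It is a simplified model of the behaviour (no `min_delta`, no weight restoring), and the function name and loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the 0-based epoch at which training would stop, or None if it never does."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Improvement: remember it and reset the counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Best loss is at epoch 2; epochs 3 to 7 do not improve, so training stops at epoch 7
losses = [1.0, 0.9, 0.8, 0.85, 0.9, 0.95, 1.0, 1.1, 0.7]
print(early_stopping_epoch(losses))  # 7
```

Note that the later improvement to 0.7 is never seen, which is why `ModelCheckpoint` with `save_best_only=True` is used alongside it to keep the best weights found so far.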

In [None]:
callbacks = [tf.keras.callbacks.EarlyStopping(monitor='val_loss', 
                                              verbose=1,
                                              patience=5),
             tf.keras.callbacks.ModelCheckpoint(filepath='best_model.h5', 
                                                monitor='val_loss',
                                                verbose=0,
                                                save_best_only=True)]

# Pass the callbacks defined above; an empty list would disable both of them
model.fit(train_X, train_Y, batch_size = 32, epochs=200, validation_data = (val_X, val_Y),
         callbacks=callbacks)

**Conclusion:**

    For now the model is doing pretty badly, but I will keep updating its weights on my local machine, and if the result is fair enough I will publish it!
    
    
       