# Intro

Welcome to the [BirdCLEF 2021 - Birdcall Identification](https://www.kaggle.com/c/birdclef-2021/overview) competition.

![](https://storage.googleapis.com/kaggle-competitions/kaggle/25954/logos/header.png)

We first give a short introduction to get you started. The next step is a short analysis before we define a model with Keras.

For a tutorial on handling audio data, we recommend [this notebook](https://www.kaggle.com/drcapa/recognizesongapp-fromscratch-tutorial).

<span style="color: royalblue;">Please vote the notebook up if it helps you. Feel free to leave a comment above the notebook. Thank you. </span>

# Libraries

In [None]:
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
import soundfile as sf
import librosa
import librosa.display
import IPython.display as display

from sklearn.model_selection import train_test_split

from keras.utils import Sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv1D, MaxPool1D, BatchNormalization
from keras.optimizers import RMSprop,Adam
from keras.applications import VGG19, VGG16, ResNet50

import warnings
warnings.filterwarnings("ignore")

# Path

In [None]:
path = '/kaggle/input/birdclef-2021/'
os.listdir(path)

# Functions
We define some helper functions.

In [None]:
def read_ogg_file(path, file):
    """ Read an ogg audio file and return a NumPy array and the samplerate"""
    
    data, samplerate = sf.read(path+file)
    return data, samplerate


def plot_audio_file(data, samplerate):
    """ Plot the audio data"""
    
    fig = plt.figure(figsize=(8, 4))
    x = range(len(data))
    y = data
    # Plot once and label the line so that the legend has an entry
    plt.plot(x, y, color='red', label='audio signal')
    plt.legend(loc='upper center')
    plt.grid()
    
    
def plot_spectrogram(data, samplerate):
    """ Plot spectrogram with mel scaling """
    
    sr = samplerate
    spectrogram = librosa.feature.melspectrogram(y=data, sr=sr)
    log_spectrogram = librosa.power_to_db(spectrogram, ref=np.max)
    librosa.display.specshow(log_spectrogram, sr=sr, x_axis='time', y_axis='mel')

# Load Data

In [None]:
train_labels = pd.read_csv(path+'train_soundscape_labels.csv')
train_meta = pd.read_csv(path+'train_metadata.csv')
test_data = pd.read_csv(path+'test.csv')
samp_subm = pd.read_csv(path+'sample_submission.csv')

# Overview

In [None]:
print('Number train label samples:', len(train_labels))
print('Number train meta samples:', len(train_meta))
print('Number train short folder:', len(os.listdir(path+'train_short_audio')))
print('Number train audios:', len(os.listdir(path+'train_soundscapes')))
print('Number test samples:', len(test_data))

In [None]:
os.listdir(path+'train_short_audio/caltow')[:2]

In [None]:
train_labels.head()

In [None]:
train_meta.head()

# A Sample File
We focus on the sample in the first row of the train meta data.

In [None]:
row = 0
train_meta.iloc[row]

We extract two features: the primary label, which is the name of the folder where the audio file is stored, and the filename:

In [None]:
label = train_meta.loc[row, 'primary_label']
filename = train_meta.loc[row, 'filename']

# Check if the file is in the folder
filename in os.listdir(path+'train_short_audio/'+label)

Load the data and samplerate:

In [None]:
data, samplerate = sf.read(path+'train_short_audio/'+label+'/'+filename)
print(data[:8])
print(samplerate)

In [None]:
plot_audio_file(data, samplerate)

Plot [spectrogram](https://en.wikipedia.org/wiki/Spectrogram) with mel scaling:

In [None]:
plot_spectrogram(data, samplerate)

Display the audio of the file:

In [None]:
display.Audio(path+'train_short_audio/'+label+'/'+filename)

# Exploratory Data Analysis
Our challenge is to identify which birds are calling in **long** recordings.

There are 20 long audio files in the folder train_soundscapes. And there are also 20 unique audio ids: 

In [None]:
train_labels['audio_id'].unique()

Each audio file is labeled in 120 windows with a length of 5 seconds each.

In [None]:
train_labels.groupby(by=['audio_id']).count()['birds'][:4]

So we have to split each long audio file into 120 short segments.
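The split into 5-second windows can be sketched with a plain NumPy reshape. This is a minimal sketch on a synthetic signal, not the loading code of this notebook: the helper name `split_into_windows` and the sample rate of 32,000 Hz are assumptions for illustration, and the real sample rate should be taken from `sf.read`.

```python
import numpy as np


def split_into_windows(signal, samplerate, window_seconds=5):
    """Split a 1-D signal into consecutive, non-overlapping windows.

    Hypothetical helper: trailing samples that do not fill a whole
    window are dropped.
    """
    window_size = samplerate * window_seconds
    num_windows = len(signal) // window_size
    return signal[:num_windows * window_size].reshape(num_windows, window_size)


# Synthetic stand-in for a recording: 20 seconds at an assumed 32,000 Hz
samplerate = 32000
signal = np.arange(20 * samplerate, dtype=np.float32)

windows = split_into_windows(signal, samplerate)
print(windows.shape)  # (4, 160000)
```

Under the same assumption, a recording of 10 minutes would give 120 rows, one per labeled window.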

## Focus On Labels
The target label birds is a space-delimited list of any bird songs present in the 5 second window, so we have to encode the labels. To illustrate this, we look at an example with 3 different birds:

In [None]:
print('original label:', train_labels.loc[458, 'birds'])
print('split into list:', train_labels.loc[458, 'birds'].split(' '))

We extract all labels of the train data:

In [None]:
labels = []
for row in train_labels.index:
    labels.extend(train_labels.loc[row, 'birds'].split(' '))
# Sort so that the label order is the same in every run
labels = sorted(set(labels))

print('Number of unique bird labels:', len(labels))

We encode the labels and write them into a data frame:

In [None]:
df_labels_train = pd.DataFrame(index=train_labels.index, columns=labels)
for row in train_labels.index:
    birds = train_labels.loc[row, 'birds'].split(' ')
    for bird in birds:
        df_labels_train.loc[row, bird] = 1
df_labels_train.fillna(0, inplace=True)

# We set a dummy value for the target label in the test data because we will need it for the Data Generator
test_data['birds'] = 'nocall'

df_labels_test = pd.DataFrame(index=test_data.index, columns=labels)
for row in test_data.index:
    birds = test_data.loc[row, 'birds'].split(' ')
    for bird in birds:
        df_labels_test.loc[row, bird] = 1
df_labels_test.fillna(0, inplace=True)
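As an alternative to the explicit loops, pandas can build the same kind of multi-hot table directly from a space-delimited column with `Series.str.get_dummies`. A minimal sketch on a toy frame follows; the label values are made up for illustration and are not taken from the competition data.

```python
import pandas as pd

# Toy stand-in for the `birds` column (made-up labels)
toy = pd.DataFrame({'birds': ['nocall', 'aaa bbb', 'bbb', 'aaa bbb ccc']})

# One column per distinct label, 1 where the label occurs in the row
encoded = toy['birds'].str.get_dummies(sep=' ')
print(encoded)

# Number of rows in which each label occurs
print(encoded.sum())
```

The resulting columns are ordered alphabetically, so the column order does not depend on the order in which the labels first appear.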

We can use this representation of the labels for further analysis, for instance for the distribution of the bird labels. We show the 10 most frequent labels:

In [None]:
df_labels_train.sum().sort_values(ascending=False)[:10]

Finally we merge the labels with the original data:

In [None]:
train_labels = pd.concat([train_labels, df_labels_train], axis=1)
test_data = pd.concat([test_data, df_labels_test], axis=1)

## Focus On Example

We focus on an example. The first audio file is named:

In [None]:
file = os.listdir(path+'train_soundscapes')[0]
file

We load the data and samplerate:

In [None]:
data, samplerate = read_ogg_file(path+'train_soundscapes/', file)

The NumPy array has a length of 19,200,000. Divided by the 120 windows per file, every sample consists of 160,000 values. These 160,000 values describe 5 seconds of the audio file.

We split the file name into the audio_id and site:

In [None]:
audio_id = file.split('_')[0]
site = file.split('_')[1]
print('audio_id:', audio_id, ', site:', site)

We focus on the samples whose label birds is not equal to nocall. There are 4 such samples:

In [None]:
train_labels[(train_labels['audio_id']==int(audio_id)) & (train_labels['site']==site) & (train_labels['birds']!='nocall')]

We want to extract the first example with the id 1771. We can hear this bird from 455 seconds to 460 seconds.

In [None]:
sub_data = data[int(455/5)*160000:int(460/5)*160000]
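The slice above generalizes to any labeled window once the end second is known. A small sketch follows; `window_bounds` is a hypothetical helper, and its defaults mirror the values used in this notebook (5 seconds and 160,000 values per window).

```python
def window_bounds(seconds_end, window_seconds=5, samples_per_window=160000):
    """Return (start, stop) sample indices for the window ending at `seconds_end`."""
    stop = (seconds_end // window_seconds) * samples_per_window
    start = stop - samples_per_window
    return start, stop


start, stop = window_bounds(460)
print(start, stop)  # 14560000 14720000
```

With these bounds, `data[start:stop]` selects the same 160,000 values as the explicit slice.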

Plot the audio array:

In [None]:
plt.figure(figsize=(14, 5))
# waveshow replaces waveplot in recent librosa versions; older versions only provide waveplot
librosa.display.waveshow(sub_data, sr=samplerate)
plt.grid()
plt.show()

Listen to the bird:

In [None]:
display.Audio(sub_data, rate=samplerate)

# Parameters
Based on the EDA we define some parameters:

In [None]:
data_lenght = 160000
audio_lenght = 5
num_labels = len(labels)

The Data Generator that we define in the next step needs an additional parameter:

In [None]:
batch_size = 16

# Train, Val And Test Data

In [None]:
list_IDs_train, list_IDs_val = train_test_split(list(train_labels.index), test_size=0.33, random_state=2021)
list_IDs_test = list(samp_subm.index)

# Audio Data Generator
We use a Data Generator to load the data on demand.

In [None]:
class DataGenerator(Sequence):
    def __init__(self, path, list_IDs, data, batch_size):
        self.path = path
        self.list_IDs = list_IDs
        self.data = data
        self.batch_size = batch_size
        self.indexes = np.arange(len(self.list_IDs))
        
    def __len__(self):
        len_ = int(len(self.list_IDs)/self.batch_size)
        if len_*self.batch_size < len(self.list_IDs):
            len_ += 1
        return len_
    
    def __getitem__(self, index):
        indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
        list_IDs_temp = [self.list_IDs[k] for k in indexes]
        X, y = self.__data_generation(list_IDs_temp)
        X = X.reshape((self.batch_size, 100, 1600//2))
        return X, y
    
    def __data_generation(self, list_IDs_temp):
        # Rows beyond len(list_IDs_temp) stay zero, which pads the last batch
        X = np.zeros((self.batch_size, data_lenght//2))
        y = np.zeros((self.batch_size, num_labels))
        for i, ID in enumerate(list_IDs_temp):
            # Soundscape file names start with '<audio_id>_<site>'
            prefix = str(self.data.loc[ID, 'audio_id'])+'_'+self.data.loc[ID, 'site']
            file_list = [s for s in os.listdir(self.path) if prefix in s]
            if len(file_list) == 0:
                # Dummy for missing test audio files
                audio_file_fft = np.zeros((data_lenght//2))
            else:
                file = file_list[0]
                audio_file, audio_sr = read_ogg_file(self.path, file)
                # Cut out the 5 second window that ends at 'seconds'
                seconds = self.data.loc[ID, 'seconds']
                start = int((seconds-5)/audio_lenght)*data_lenght
                stop = int(seconds/audio_lenght)*data_lenght
                audio_file = audio_file[start:stop]
                # Magnitude of the first half of the FFT
                audio_file_fft = np.abs(np.fft.fft(audio_file)[: len(audio_file)//2])
                # Scale data to zero mean and unit variance
                audio_file_fft = (audio_file_fft-audio_file_fft.mean())/audio_file_fft.std()
            X[i, ] = audio_file_fft
            # The encoded label columns start at column index 5
            y[i, ] = self.data.loc[ID, self.data.columns[5:]].values
        return X, y
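The feature extraction inside the generator can be tried in isolation. The sketch below applies the same steps (magnitude of the first half of the FFT, then standardization) to a synthetic sine wave. The tone frequency, the sample rate of 32,000 Hz, and the helper name `fft_features` are assumptions for illustration.

```python
import numpy as np


def fft_features(segment):
    """Magnitude of the first half of the FFT, scaled to zero mean and unit variance."""
    magnitude = np.abs(np.fft.fft(segment)[:len(segment) // 2])
    return (magnitude - magnitude.mean()) / magnitude.std()


samplerate = 32000                       # assumed sample rate
t = np.arange(5 * samplerate) / samplerate
segment = np.sin(2 * np.pi * 1000 * t)   # synthetic 1 kHz tone, 5 seconds

features = fft_features(segment)
print(features.shape)              # (80000,)
print(int(np.argmax(features)))    # 5000
```

Each bin spans `samplerate / len(segment)` Hz, which is 0.2 Hz for these assumed values, so bin 5000 corresponds to the 1 kHz tone.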

Create the Data Generators for the train, validation and test data:

In [None]:
train_generator = DataGenerator(path+'train_soundscapes/', list_IDs_train, train_labels, batch_size)
val_generator = DataGenerator(path+'train_soundscapes/', list_IDs_val, train_labels, batch_size)
test_generator = DataGenerator(path+'test_soundscapes/', list_IDs_test, test_data, batch_size)
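The batching logic of the generator can be checked without touching any audio. A minimal sketch follows, assuming a made-up list of 35 sample IDs (the real lists come from `train_test_split`); `num_batches` and `batch_ids` are hypothetical helpers that mirror `__len__` and the index slicing in `__getitem__`.

```python
import math


def num_batches(num_samples, batch_size):
    """Number of batches needed to cover every sample (the last one may be partial)."""
    return math.ceil(num_samples / batch_size)


def batch_ids(ids, index, batch_size):
    """IDs that belong to batch number `index`."""
    return ids[index * batch_size:(index + 1) * batch_size]


ids = list(range(35))   # made-up IDs
batch_size = 16

print(num_batches(len(ids), batch_size))    # 3
print(len(batch_ids(ids, 2, batch_size)))   # 3
```

Since the generator always allocates `batch_size` rows, the unused rows of such a partial last batch remain zero.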

# Define Model

In [None]:
epochs = 2
lernrate = 2e-3

In [None]:
model = Sequential()
model.add(Conv1D(64, input_shape=(100, 1600//2,), kernel_size=5, strides=4, activation='relu'))
model.add(BatchNormalization())
model.add(MaxPool1D(pool_size=(4)))
model.add(Conv1D(64, kernel_size=3, activation='relu'))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(num_labels, activation='sigmoid'))
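The layer output shapes can be worked out by hand before looking at the model summary. The sketch below applies the standard output-length formula for unpadded 1D convolution and pooling, `floor((length - kernel) / stride) + 1`, to the layers above. `out_length` is a hypothetical helper, and the pooling stride is assumed to equal the pool size, which is the Keras default.

```python
def out_length(length, kernel, stride=1):
    """Output length of an unpadded 1D convolution or pooling layer."""
    return (length - kernel) // stride + 1


steps = 100                                            # input shape is (100, 800)
conv1_steps = out_length(steps, kernel=5, stride=4)    # Conv1D(64, kernel_size=5, strides=4)
pool_steps = out_length(conv1_steps, kernel=4, stride=4)  # MaxPool1D(pool_size=4)
conv2_steps = out_length(pool_steps, kernel=3)         # Conv1D(64, kernel_size=3)
flat_units = conv2_steps * 64                          # Flatten over 64 filters

print(conv1_steps, pool_steps, conv2_steps, flat_units)  # 24 6 4 256
```

The flattened vector therefore has 256 entries before it enters the first Dense layer.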

In [None]:
model.compile(optimizer = Adam(learning_rate=lernrate),
              loss='binary_crossentropy',
              metrics=['binary_accuracy'])
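To make the loss and the metric concrete, here is a small NumPy sketch of binary cross-entropy and thresholded binary accuracy on one made-up multi-hot row. It is written from the textbook definitions, not taken from the Keras source, and the numbers are illustrative only.

```python
import numpy as np

y_true = np.array([1.0, 0.0, 0.0, 0.0])   # made-up multi-hot target
y_pred = np.array([0.9, 0.1, 0.2, 0.6])   # made-up sigmoid outputs

# Binary cross-entropy, averaged over the labels
bce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Binary accuracy with a threshold of 0.5: three of the four labels are correct
accuracy = np.mean((y_pred > 0.5) == (y_true == 1))

print(round(float(bce), 4), float(accuracy))
```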

In [None]:
model.summary()

In [None]:
# fit accepts generators directly; fit_generator is deprecated in TensorFlow 2
history = model.fit(train_generator, validation_data=val_generator, epochs = epochs, workers=4)

# Analyse Training

In [None]:
fig, axs = plt.subplots(1, 2, figsize=(16, 4))
fig.subplots_adjust(hspace = .2, wspace=.2)
axs = axs.ravel()
loss = history.history['loss']
loss_val = history.history['val_loss']
epochs = range(1, len(loss)+1)
axs[0].plot(epochs, loss, 'bo', label='loss_train')
axs[0].plot(epochs, loss_val, 'ro', label='loss_val')
axs[0].set_title('Value of the loss function')
axs[0].set_xlabel('epochs')
axs[0].set_ylabel('value of the loss function')
axs[0].legend()
axs[0].grid()
acc = history.history['binary_accuracy']
acc_val = history.history['val_binary_accuracy']
axs[1].plot(epochs, acc, 'bo', label='accuracy_train')
axs[1].plot(epochs, acc_val, 'ro', label='accuracy_val')
axs[1].set_title('Accuracy')
axs[1].set_xlabel('Epochs')
axs[1].set_ylabel('Value of accuracy')
axs[1].legend()
axs[1].grid()
plt.show()

# Predict Test Data

In [None]:
# predict accepts generators directly; predict_generator is deprecated in TensorFlow 2
y_pred = model.predict(test_generator, verbose=1)

Set all values greater than 0.5 to 1 and all other values to 0:

In [None]:
y_test = np.where(y_pred > 0.5, 1, 0)

Generate target label string:

In [None]:
for row in samp_subm.index:
    string = ''
    for col in range(len(y_test[row])):
        if y_test[row][col] == 1:
            if string == '':
                string += labels[col]
            else:
                string += ' ' + labels[col]
    if string == '':
        string = 'nocall'
    samp_subm.loc[row, 'birds'] = string
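The same decoding can be written more compactly with a list comprehension and `str.join`. A sketch on made-up predictions follows; the label names and probabilities are invented for illustration, and `decode_row` is a hypothetical helper.

```python
import numpy as np

toy_labels = ['aaa', 'bbb', 'nocall']           # made-up label names
toy_pred = np.array([[0.9, 0.7, 0.1],
                     [0.2, 0.1, 0.3],
                     [0.1, 0.8, 0.0]])          # made-up probabilities


def decode_row(probabilities, label_names, threshold=0.5):
    """Join the names of all labels above the threshold, or return 'nocall'."""
    names = [name for name, p in zip(label_names, probabilities) if p > threshold]
    return ' '.join(names) if names else 'nocall'


decoded = [decode_row(row, toy_labels) for row in toy_pred]
print(decoded)  # ['aaa bbb', 'nocall', 'bbb']
```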

# Export

In [None]:
output = samp_subm
output.to_csv('submission.csv', index=False)

In [None]:
output