# Possibly everything you need to know about the contest

![Header](https://storage.googleapis.com/kaggle-competitions/kaggle/21669/logos/header.png)

This notebook is meant to help understand what needs to be done with the dataset provided in the contest. I have tried to cover a few aspects and explain them in simple language. The topics covered are listed in the index below:

## Content

1. [Evaluation metrics](https://www.kaggle.com/thanatoz/everything-you-need-to-know-about-the-contest?scriptVersionId=47211475#Evaluation-metrics)
    * [Explanation](https://www.kaggle.com/thanatoz/everything-you-need-to-know-about-the-contest?scriptVersionId=47211475#Explaination)
2. [Looking into the dataset](https://www.kaggle.com/thanatoz/everything-you-need-to-know-about-the-contest?scriptVersionId=47211475#Looking-into-the-dataset)
    * [The .csv files](https://www.kaggle.com/thanatoz/everything-you-need-to-know-about-the-contest?scriptVersionId=47211475#The-.csv-files)
    * [Train test folders](https://www.kaggle.com/thanatoz/everything-you-need-to-know-about-the-contest?scriptVersionId=47211475#Train-test-folders)
    * [What is FLAC?](https://www.kaggle.com/thanatoz/everything-you-need-to-know-about-the-contest?scriptVersionId=47211475#What-is-FLAC?)
    * [Visualizing the Audio file](https://www.kaggle.com/thanatoz/everything-you-need-to-know-about-the-contest?scriptVersionId=47211475#Visualizing-the-Audio-file)
    * [Loading the FLAC files](https://www.kaggle.com/thanatoz/everything-you-need-to-know-about-the-contest?scriptVersionId=47211475#Loading-the-flac-files)
    * [Plotting the Audio files](https://www.kaggle.com/thanatoz/everything-you-need-to-know-about-the-contest?scriptVersionId=47211475#Plotting-the-Audio-files)
3. [References](https://www.kaggle.com/thanatoz/everything-you-need-to-know-about-the-contest?scriptVersionId=47211475#References)
    
I'll keep adding more to this notebook as ideas come to me. Until then, feel free to suggest in the comment section what else could be added.

## Evaluation metrics

[Label ranking average precision (LRAP)](https://scikit-learn.org/stable/modules/model_evaluation.html#label-ranking-average-precision) is a ranking metric for multilabel problems. Instead of thresholding the predictions to compute precision and recall, it measures how well the true labels of each sample are ranked by the predicted scores. Its value lies in the range $0\lt x\le1$, with 1 being the best score. To simplify, it answers the question: for each true label of a sample, what fraction of the labels ranked at or above it were true labels?

Formally, given a binary indicator matrix of the ground truth labels $ y \in \left\{0, 1\right\}^{n_\text{samples} \times n_\text{labels}} $ and the score associated with each label $\hat{f} \in \mathbb{R}^{n_\text{samples} \times n_\text{labels}}$ the average precision is defined as
$$ LRAP(y, \hat{f}) = \frac{1}{n_{\text{samples}}}
  \sum_{i=0}^{n_{\text{samples}} - 1} \frac{1}{||y_i||_0}
  \sum_{j:y_{ij} = 1} \frac{|{L}_{ij}|}{\text{rank}_{ij}} $$
  
where ${L}_{ij} = \left\{k: y_{ik} = 1, \hat{f}_{ik} \geq \hat{f}_{ij} \right\}$, $\text{rank}_{ij} = \left|\left\{k: \hat{f}_{ik} \geq \hat{f}_{ij} \right\}\right|$, $|\cdot|$ computes the cardinality of the set (i.e., the number of elements in the set), and $||\cdot||_0$ is the $\ell_0$ “norm” (which computes the number of nonzero elements in a vector).

  

In [None]:
import numpy as np
from sklearn.metrics import label_ranking_average_precision_score

y_true = np.array([[1, 0, 0], [0, 0, 1]])
y_score = np.array([[0.75, 0.5, 1], [1, 0.2, 0.1]])
label_ranking_average_precision_score(y_true, y_score)

#### Explanation

To make the above calculation concrete, we can break it down sample by sample. Each sample here has exactly one true label, so $|{L}_{ij}| = 1$ and $||y_i||_0 = 1$:
1. In the first sample, y_true is `[1, 0, 0]` and y_score is `[0.75, 0.5, 1]`, which means that the true class A is given the $2^{nd}$ rank. This makes the value of $ {\sum_{j:y_{ij} = 1}} \frac{|{L}_{ij}|}{\text{rank}_{ij}}$ equal to 1/2 = 0.5.

2. In the second sample, y_true is `[0, 0, 1]` and y_score is `[1, 0.2, 0.1]`, which means that the true class C is given the $3^{rd}$ rank. This makes the value of $ {\sum_{j:y_{ij} = 1}} \frac{|{L}_{ij}|}{\text{rank}_{ij}}$ equal to 1/3 ≈ 0.3333.

Now we average over all the samples (2 in our case), so the final result is $\frac{1}{2} \times (0.5+0.3333) \approx 0.4167$
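
The formula and the walkthrough above can also be checked with a small from-scratch implementation. This is only a sketch: the helper name `lrap_from_scratch` is made up for illustration, and it assumes every sample has at least one true label.

```python
import numpy as np

def lrap_from_scratch(y_true, y_score):
    """Compute LRAP directly from its definition (illustrative helper)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    per_sample = []
    for truth, scores in zip(y_true, y_score):
        relevant = np.flatnonzero(truth)  # indices j where y_ij = 1
        if relevant.size == 0:
            # Assumption of this sketch: every sample has a true label
            raise ValueError("each sample needs at least one true label")
        ratios = []
        for j in relevant:
            rank = np.sum(scores >= scores[j])            # rank_ij
            hits = np.sum(scores[relevant] >= scores[j])  # |L_ij|
            ratios.append(hits / rank)
        per_sample.append(np.mean(ratios))  # (1 / ||y_i||_0) * inner sum
    return float(np.mean(per_sample))

y_true = np.array([[1, 0, 0], [0, 0, 1]])
y_score = np.array([[0.75, 0.5, 1], [1, 0.2, 0.1]])
lrap_value = lrap_from_scratch(y_true, y_score)
print(round(lrap_value, 4))  # 0.4167
```

The result matches the hand calculation, and a ranking that puts every true label first would score 1.0.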

## Looking into the dataset

In [None]:
import pandas as pd
import matplotlib
from matplotlib import pyplot as plt
import sklearn
import seaborn as sns
import plotly.figure_factory as ff

matplotlib.rcParams['figure.figsize'] = (12.0, 6.0)

### The .csv files

The dataset contains 2 csv files:
> 1. **train_tp.csv** - training data of true positive species labels, with corresponding time localization
> 2. **train_fp.csv** - training data of false positive species labels, with corresponding time localization

Where the information of every column is as follows:
> 1. **recording_id** - The id of file containing the recording
> 2. **species_id** - The id of the species
> 3. **songtype_id** - The id of the song type
> 4. **t_min, t_max** - The start and end time of the annotated signal
> 5. **f_min, f_max** - The lower and upper frequency of the annotated signal


In [None]:
tp_data=pd.read_csv('/kaggle/input/rfcx-species-audio-detection/train_tp.csv')
fp_data=pd.read_csv('/kaggle/input/rfcx-species-audio-detection/train_fp.csv')

In [None]:
tp_data.tail()

In [None]:
fp_data.tail()

In [None]:
def plot_counts(feature:str, dataframe:pd.DataFrame, kind:str="bar"):
    dataframe[feature].value_counts().plot(kind=kind)
    
plot_counts('species_id', tp_data)

In [None]:
plot_counts("songtype_id", tp_data)

### Train test folders

In the train and test folders of the dataset, we are given FLAC files, with the training labels in the corresponding csv files. The number of files in each folder is:

In [None]:
import os

data_dir = "../input/rfcx-species-audio-detection"
n_train = len(os.listdir(f"{data_dir}/train/"))
n_test = len(os.listdir(f"{data_dir}/test/"))
print(f"Number of files in the train folder: {n_train}, and in the test folder: {n_test}")

### What is FLAC?

Simply put, FLAC (Free Lossless Audio Codec) is a lossless encoding format for storing audio. FLAC ‘quality’ is determined by the ‘sample rate’ and the ‘bit depth’. Common sample rates include 22,050 Hz, 32,000 Hz, 44,100 Hz, 48,000 Hz, 88,200 Hz, and 96,000 Hz, going up to 192,000 Hz, and the bit depth generally ranges from 8 to 32 bits.
![flac vs mp3](https://www.off-the-beat.com/wp-content/uploads/2020/02/FLAC-vs-MP3-.jpg)
If you want to read more about FLAC vs MP3 encoding, here is a wonderful blog that you can refer to: https://www.off-the-beat.com/flac-vs-mp3/
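
To get a feel for what sample rate and bit depth mean in practice, note that the uncompressed (PCM) data rate of a recording is sample rate × bit depth × number of channels. The sketch below uses a made-up helper name and CD-style settings purely as an example; an actual FLAC file would typically be smaller, since the codec compresses the signal without losing information.

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels=1):
    """Uncompressed data rate in kilobits per second (illustrative helper)."""
    return sample_rate_hz * bit_depth * channels / 1000

cd_quality = pcm_bitrate_kbps(44_100, 16, channels=2)  # stereo, 16-bit, 44.1 kHz
mono_48k = pcm_bitrate_kbps(48_000, 16)                # mono, 16-bit, 48 kHz
print(cd_quality)  # 1411.2
print(mono_48k)    # 768.0
```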

### Visualizing the Audio file

Referring to Allen Downey's talk on [Basics of sound processing](https://www.youtube.com/watch?v=0ALKGR0I5MA) at SciPy 2015 and [his presentation](https://docs.google.com/presentation/d/1zzgNu_HbKL2iPkHS8-qhtDV20QfWt9lC3ZwPVZo8Rw0/pub?start=false&loop=false&delayms=3000&slide=id.g5a7a9806e_0_14), we can understand audio files as waves, as follows:

![Amplitudes](https://miro.medium.com/max/1400/1*akRbhl8739UEDuKHkOUR1Q.png)

### Loading the flac files

FLAC files, just like MP3 files, can be easily loaded using the _librosa_ library. _Librosa_ covers most of the audio processing tasks that you are likely to need. Let's start by importing the library and loading a file.

In [None]:
import librosa
from librosa import display

In [None]:
audio_path = f'../input/rfcx-species-audio-detection/train/{tp_data["recording_id"].values[0]}.flac'
x, sr = librosa.load(audio_path)

Here _x_ is the audio data from the file and _sr_ is the sampling rate (the number of samples per second, measured in Hz or kHz). The function returns the signal as a one-dimensional numpy array of amplitude values, resampled by default to 22,050 Hz (about 22 kHz).

In [None]:
print(x.shape, sr)
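
Because the signal is just an array with `sr` values per second, a time in seconds maps to an array index by multiplying by `sr`. The sketch below uses a synthetic signal and made-up `t_min`/`t_max` values (mirroring the column names in the csv files, and assuming they are given in seconds) to show how an annotated time window could be cut out of a recording. `crop_segment` is a hypothetical helper, not part of librosa.

```python
import numpy as np

def crop_segment(signal, sr, t_min, t_max):
    """Return the samples between t_min and t_max (both in seconds)."""
    start = int(round(t_min * sr))
    end = int(round(t_max * sr))
    return signal[start:end]

sr = 22050                             # librosa's default sampling rate
signal = np.zeros(10 * sr)             # stand-in for a 10-second recording
duration = len(signal) / sr            # length in seconds
segment = crop_segment(signal, sr, t_min=2.0, t_max=3.5)  # made-up annotation
print(duration, len(segment) / sr)     # 10.0 1.5
```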

We can also resample the file to a different sampling rate while loading it, as follows (passing `sr=None` instead keeps the file's native rate):

In [None]:
librosa.load(audio_path, sr=44100)

In [None]:
import IPython.display as ipyd

In [None]:
ipyd.Audio(audio_path)

### Plotting the Audio files

Librosa gives us a lot of options to work with. Let's start by plotting the waveform, which shows the amplitude envelope of the recording over time.

In [None]:
plt.figure(figsize=(14, 5))
# waveshow replaces waveplot, which was removed in librosa 0.10
display.waveshow(x, sr=sr)

**Fourier transform**

The "Fast Fourier Transform" (FFT) is an efficient algorithm for computing the discrete Fourier transform, and an important tool in the science of audio and acoustics measurement. It decomposes a signal into its individual spectral components and thereby provides frequency information about the signal. FFTs are also used for fault analysis, quality control, and condition monitoring of machines or systems.

In [None]:
n_fft = 2048
D = np.abs(librosa.stft(x[:n_fft], n_fft=n_fft, hop_length=n_fft+1))
plt.plot(D);
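
To see what the FFT recovers, it helps to apply it to a signal whose content is known. The sketch below uses plain numpy on a synthetic 440 Hz sine (chosen only for illustration, not data from the contest) and reads the dominant frequency off the magnitude spectrum.

```python
import numpy as np

sr = 22050                                    # samples per second
t = np.arange(sr) / sr                        # one second of timestamps
tone = np.sin(2 * np.pi * 440 * t)            # synthetic 440 Hz sine wave

spectrum = np.abs(np.fft.rfft(tone))          # magnitude of each frequency bin
freqs = np.fft.rfftfreq(len(tone), d=1 / sr)  # bin centre frequencies in Hz
peak_hz = freqs[np.argmax(spectrum)]          # frequency with the most energy
print(round(float(peak_hz), 1))  # 440.0
```

With one second of audio the bins are 1 Hz apart, so the peak lands on the tone's frequency. Real recordings contain many components, which is why the plots in this section show a whole spectrum rather than a single spike.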

In [None]:
hop_length = 512
D = np.abs(librosa.stft(x, n_fft=n_fft,  hop_length=hop_length))
display.specshow(D, sr=sr, x_axis='time', y_axis='linear')
plt.colorbar()

1. **Spectrogram**

A spectrogram is a visual representation of the spectrum of frequencies of a sound or other signal as it varies with time. Spectrograms are sometimes called sonographs, voiceprints, or voicegrams. When the data is represented in a 3D plot, it may be called a waterfall. When stored as a 2-dimensional array, the first axis is frequency while the second axis is time. We can display a spectrogram using `librosa.display.specshow`.

In [None]:
X = librosa.stft(x)
Xdb = librosa.amplitude_to_db(abs(X))
plt.figure(figsize=(14, 5))
display.specshow(Xdb, sr=sr, x_axis='time', y_axis='hz')
plt.colorbar()

The vertical axis shows frequencies (from 0 Hz up to roughly 11 kHz, which is half of the 22,050 Hz sampling rate), and the horizontal axis shows the time of the clip. Since most of the action is taking place at the bottom of the spectrum, we can convert the frequency axis to a logarithmic one.

In [None]:
plt.figure(figsize=(14, 5))
display.specshow(Xdb, sr=sr, x_axis='time', y_axis='log')
plt.colorbar()
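
To make the frequency-by-time layout of a spectrogram concrete, here is a minimal numpy-only short-time Fourier transform. It is a simplified sketch, not librosa's implementation: unlike `librosa.stft` with its defaults, it does not pad or centre the frames, and `simple_stft_magnitude` is a made-up helper name. The test signal is synthetic.

```python
import numpy as np

def simple_stft_magnitude(signal, n_fft=2048, hop_length=512):
    """Magnitude spectrogram with shape (1 + n_fft // 2, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    frames = np.stack([
        signal[i * hop_length : i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # rfft runs along each frame; transpose so rows are frequency bins
    return np.abs(np.fft.rfft(frames, axis=1)).T

sr = 22050
t = np.arange(sr) / sr
# Synthetic test signal: 440 Hz for the first half second, 880 Hz afterwards
signal = np.where(t < 0.5, np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 880 * t))

S = simple_stft_magnitude(signal)
freqs = np.fft.rfftfreq(2048, d=1 / sr)
first_peak = freqs[np.argmax(S[:, 0])]   # dominant frequency in the first frame
last_peak = freqs[np.argmax(S[:, -1])]   # dominant frequency in the last frame
print(S.shape)  # (1025, 40)
```

Each column of `S` is the spectrum of one short window, so `first_peak` sits near 440 Hz and `last_peak` near 880 Hz, to within the bin spacing of about 10.8 Hz.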

2. **Feature extraction**

Every audio signal consists of many features. However, we must extract the characteristics that are relevant to the problem we are trying to solve. The process of extracting features to use them for analysis is called feature extraction. Let us study a few of these features in detail.

**Zero Crossing Rate**

The zero crossing rate is the rate of sign-changes along a signal, i.e., the rate at which the signal changes from positive to negative or back. Let us calculate the zero crossing rate for our example audio clip.

In [None]:
# Zooming in on a short slice of the amplitude envelope
n0 = 5000
n1 = 5100
plt.figure(figsize=(14, 5))
plt.plot(x[n0:n1])
plt.grid()

In [None]:
zero_crossings = librosa.zero_crossings(x[n0:n1], pad=False)
print(sum(zero_crossings))
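
The same idea can be sketched in plain numpy by comparing the sign of each sample with the sign of the next one. This is an illustrative helper on a synthetic wave, not librosa's implementation.

```python
import numpy as np

def count_zero_crossings(signal):
    """Count sign changes between consecutive samples."""
    negative = np.signbit(signal)
    return int(np.count_nonzero(negative[1:] != negative[:-1]))

sr = 1000
t = np.arange(sr) / sr
# 5 Hz sine with a small phase offset so that no sample is exactly zero
wave = np.sin(2 * np.pi * 5 * t + 0.3)
n_crossings = count_zero_crossings(wave)
print(n_crossings)  # 10
```

A 5 Hz sine crosses zero twice per cycle, so ten crossings in one second is what we would expect. Dividing the count by the number of samples gives a rate.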

**Spectral Centroid**

It indicates where the “centre of mass” of a sound is located and is calculated as the weighted mean of the frequencies present in the sound. Consider two songs, one from the blues genre and the other from metal: the one with more high-frequency energy will generally have the higher spectral centroid.

In [None]:
# Recent librosa versions require the signal to be passed by keyword (y=...)
spectral_centroids = librosa.feature.spectral_centroid(y=x, sr=sr)[0]
spectral_centroids.shape

In [None]:
from sklearn.preprocessing import minmax_scale

# Computing the time variable for visualization
frames = range(len(spectral_centroids))
t = librosa.frames_to_time(frames, sr=sr)
# Normalising the spectral centroid for visualisation
def normalize(x, axis=0):
    return minmax_scale(x, axis=axis)
# Plotting the spectral centroid along the waveform
# (waveshow replaces waveplot, which was removed in librosa 0.10)
librosa.display.waveshow(x, sr=sr, alpha=0.4)
plt.plot(t, normalize(spectral_centroids), color='r')
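
The "weighted mean of the frequencies" can be written out directly. The sketch below computes a single centroid for a whole synthetic signal with numpy. This is a simplification: librosa computes one value per short frame, and the helper name `spectral_centroid_hz` is made up for illustration.

```python
import numpy as np

def spectral_centroid_hz(signal, sr):
    """Magnitude-weighted mean frequency of the whole signal, in Hz."""
    magnitudes = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1 / sr)
    return float(np.sum(freqs * magnitudes) / np.sum(magnitudes))

sr = 22050
t = np.arange(sr) / sr
low_tone = np.sin(2 * np.pi * 440 * t)
high_tone = np.sin(2 * np.pi * 4000 * t)
mixture = low_tone + high_tone           # equal-strength 440 Hz and 4000 Hz tones

centroid_low = spectral_centroid_hz(low_tone, sr)
centroid_high = spectral_centroid_hz(high_tone, sr)
centroid_mix = spectral_centroid_hz(mixture, sr)
print(round(centroid_low), round(centroid_high), round(centroid_mix))  # 440 4000 2220
```

A pure tone has its centroid at its own frequency, while the equal mixture lands halfway between the two tones, which is the "centre of mass" behaviour described above.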

**Mel-Frequency Cepstral Coefficients**

The Mel-frequency cepstral coefficients (MFCCs) of a signal are a small set of features (usually about 10–20) that concisely describe the overall shape of its spectral envelope. They are built on the mel scale, which approximates how humans perceive pitch, and are widely used to model the characteristics of the human voice.

In [None]:
# Recent librosa versions require the signal to be passed by keyword (y=...)
mfccs = librosa.feature.mfcc(y=x, sr=sr)
print(mfccs.shape)

In [None]:
# Displaying the MFCCs
display.specshow(mfccs, sr=sr, x_axis='time')
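
The "mel" in MFCC refers to a perceptual frequency scale. As a rough illustration, the sketch below implements one common conversion from Hz to mel, the HTK-style formula $m = 2595 \log_{10}(1 + f/700)$. This is an assumption chosen for simplicity: librosa's mel filter bank may use a different variant by default, and `hz_to_mel_htk` is a made-up helper name.

```python
import numpy as np

def hz_to_mel_htk(freq_hz):
    """Convert Hz to mel using the HTK-style formula."""
    return 2595.0 * np.log10(1.0 + np.asarray(freq_hz, dtype=float) / 700.0)

mel_1k = hz_to_mel_htk(1000)
# The same 1000 Hz step covers fewer mels at high frequencies than at low ones
step_low = hz_to_mel_htk(2000) - hz_to_mel_htk(1000)
step_high = hz_to_mel_htk(8000) - hz_to_mel_htk(7000)
print(round(float(mel_1k)))   # 1000
print(step_low > step_high)   # True
```

The scale is compressed at high frequencies, so MFCCs devote more resolution to the lower part of the spectrum.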

## References

1. [Explore the rainforest soundscape](https://www.kaggle.com/gpreda/explore-the-rainforest-soundscape)
2. [EDA and Audio Processing with Python](https://www.kaggle.com/parulpandey/eda-and-audio-processing-with-python/?scriptVersionId=37933197#data)
3. [Allen Downey SciPy2015 presentation](https://docs.google.com/presentation/d/1zzgNu_HbKL2iPkHS8-qhtDV20QfWt9lC3ZwPVZo8Rw0/pub?start=false&loop=false&delayms=3000&slide=id.p)