### Rainforest, species and songs
Unsurprisingly, species conservation and any effort towards preserving our environment are dear to my heart, so I could not pass up this competition.

One thing I want to remind the reader of is the purpose we are building models for in this competition. To save a long argument, I have embedded the WWF Living Planet Report 2020. Go ahead and <a href='https://livingplanet.panda.org/en-us/?utm_campaign=living-planet&utm_medium=media&utm_source=report'>read</a> it if you are as much of a tree-hugger as I am.

In [None]:
%%html
<iframe src="https://livingplanet.panda.org/en-us/?utm_campaign=living-planet&utm_medium=media&utm_source=report" width="1000" height="500" zoom=0.6></iframe>

Now let's jump into the data we have.

In [None]:
import os
import librosa
from librosa import display as ld
import soundfile as sf
import scipy.signal as signal
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from IPython.core.interactiveshell import InteractiveShell
from IPython import display
InteractiveShell.ast_node_interactivity = "all"

%matplotlib inline
plt.rcParams['figure.figsize'] = (12, 5);
sns.set_style('whitegrid')

In [None]:
data_dir = '/kaggle/input/rfcx-species-audio-detection/'
train_tp = pd.read_csv(os.path.join(data_dir, 'train_tp.csv'))
train_fp = pd.read_csv(os.path.join(data_dir, 'train_fp.csv'))
sub_df = pd.read_csv(os.path.join(data_dir, 'sample_submission.csv'))

In [None]:
train_tp.head(3)
train_fp.head(3)
sub_df.head(3)

In [None]:
train_tp.shape, train_fp.shape, sub_df.shape
train_tp.recording_id.nunique(), train_fp.recording_id.nunique(), sub_df.recording_id.nunique()

In [None]:
sorted(train_tp.species_id.unique())[0], sorted(train_tp.species_id.unique())[-1]

In [None]:
sorted(train_tp.songtype_id.unique()), sorted(train_fp.songtype_id.unique())

In [None]:
fig, ax = plt.subplots(1, 2)
train_tp.species_id.value_counts().plot(kind='bar', ax=ax[0]);
train_fp.species_id.value_counts().plot(kind='bar', ax=ax[1]);

Looks like species 23 and 17 are either too common or too easily mislabeled as such... or both. Overall, I expected the class imbalance to be far worse.
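To put a number on "how bad is the imbalance", here is a minimal sketch that divides the count of the most frequent class by the count of the rarest one. The helper name `imbalance_ratio` and the toy labels are my own assumptions for illustration, not something from the competition data.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Ratio of the most frequent class count to the least frequent one.

    A value of 1.0 means perfectly balanced classes.
    """
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# Toy labels (made up): species 23 appears 6 times, species 0 and 1 appear 3 times
toy_labels = [23] * 6 + [17] * 5 + [0] * 3 + [1] * 3
ratio = imbalance_ratio(toy_labels)
print(ratio)
```

In the notebook the same helper should accept a column directly, e.g. `imbalance_ratio(train_tp.species_id)` or `imbalance_ratio(train_fp.species_id)`, since a pandas Series iterates over its values.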

In [None]:
groups = train_tp.groupby('species_id')
fig, ax = plt.subplots(2, 2)

# Plot the annotated frequency bounds for the first four species (ids 0-3)
for val in range(4):
    i, j = divmod(val, 2)
    group = groups.get_group(val)
    _ = ax[i][j].plot(group.f_min, label='f_min');
    _ = ax[i][j].plot(group.f_max, label='f_max');
    _ = ax[i][j].set_title(f'Species {val}');
    _ = ax[i][j].legend();
plt.tight_layout();

In [None]:
groups = train_fp.groupby('species_id')
fig, ax = plt.subplots(2, 2)

# Same plot for the false-positive annotations of species 0-3
for val in range(4):
    i, j = divmod(val, 2)
    group = groups.get_group(val)
    _ = ax[i][j].plot(group.f_min, label='f_min');
    _ = ax[i][j].plot(group.f_max, label='f_max');
    _ = ax[i][j].set_title(f'Species {val}');
    _ = ax[i][j].legend();
plt.tight_layout();

Okay, there are not many exciting patterns here except one: the frequencies stay in roughly the same range for each species. Even the variations between TP and FP for species 0 are close to each other.
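The "same range per species" impression can also be checked numerically rather than by eye. Below is a small sketch that summarizes the annotated band per species; the helper name `frequency_bands` and the toy frame are assumptions made for illustration, only the column names `species_id`, `f_min` and `f_max` come from the data above.

```python
import pandas as pd

def frequency_bands(df):
    """Per species: lowest f_min, highest f_max and number of annotations."""
    return df.groupby('species_id').agg(
        f_min_low=('f_min', 'min'),
        f_max_high=('f_max', 'max'),
        n=('f_min', 'size'),
    )

# Toy annotations (made-up numbers) with the same column names as train_tp
toy = pd.DataFrame({
    'species_id': [0, 0, 1, 1],
    'f_min': [100.0, 120.0, 2000.0, 2100.0],
    'f_max': [500.0, 480.0, 4000.0, 4200.0],
})
bands = frequency_bands(toy)
print(bands)
```

Running `frequency_bands(train_tp)` and `frequency_bands(train_fp)` side by side would show whether the TP and FP bands really overlap for each species.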

In [None]:
sample = train_tp.sample(1)
song, rate = sf.read(data_dir+'train/'+sample.recording_id.tolist()[0]+'.flac')
display.Audio(song, rate=rate)

In [None]:
plt.plot(song)

In [None]:
start = int(sample.t_min.tolist()[0]*rate)
finish = int(sample.t_max.tolist()[0]*rate)

In [None]:
_ = plt.plot(song[start:finish]);
display.Audio(song[start:finish], rate=rate)
sample.species_id.values[0]

Well, it is hard to tell what species this is supposed to be. It sounds like some sort of cricket was the target? Feel free to let me know in the comments what you think.

In [None]:
plt.plot(song, label='original wave');
# +2.5 and +3 terms are purely for visualization purposes
plt.plot(np.sin(song)+2.5, label='sine wave');
plt.plot(np.cos(song)+3, label='cosine wave');
plt.legend();

In [None]:
plt.plot(np.abs(np.fft.rfft(song)), label='Discrete Fourier transform (magnitude)');  # rfft is complex; plot its magnitude
plt.legend();

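Remember that we are dealing with waves and frequencies here. Even though the STFT transforms the audio signal, it still captures its "sine-cosine-ish" nature: each coefficient measures how strongly a sinusoid of a given frequency is present.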
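To make the "sine-cosine-ish" point concrete, here is a minimal sketch on synthetic waves (not on the competition audio). The window length `n` and the cycle count `k` are arbitrary choices of mine. It shows that the cosine part of a signal lands in the real part of a DFT coefficient, while the sine part lands in the imaginary part.

```python
import numpy as np

n = 1024  # number of samples in the window (arbitrary choice)
k = 50    # whole number of cycles in the window, so the energy falls exactly in bin k
t = np.arange(n)

cos_wave = np.cos(2 * np.pi * k * t / n)
sin_wave = np.sin(2 * np.pi * k * t / n)

# Coefficient at bin k: real part correlates with cosine, imaginary part with sine
cos_coef = np.fft.rfft(cos_wave)[k]
sin_coef = np.fft.rfft(sin_wave)[k]

print(np.round(cos_coef, 6))  # purely real, about n / 2
print(np.round(sin_coef, 6))  # purely imaginary, about -1j * n / 2
```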

I have no intention of undermining the importance of the spectrogram, but let's look at the "pure" complex numbers in polar coordinates. Perhaps the picture is not as sexy, but what if there is something in it?

In [None]:
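D = librosa.stft(song)[:, ::50]  # keep every 50th frame so the scatter stays light
plt.polar(np.angle(D).ravel(), np.abs(D).ravel(), '.', markersize=1);  # plt.polar takes (theta, r)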

In contrast, a spectrogram looks like this. I bet the raw complex STFT carries much richer information about any audio, but processing it, let alone training a model on it, is far from practical any time soon. Spectrograms do quite a good job of capturing the gist of a signal.
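What exactly does a spectrogram throw away? A rough sketch, assuming a naive hand-rolled STFT (the helper `frame_dft` is my own simplification, not librosa's implementation, and the 1 kHz tone is synthetic): the complex STFT splits into magnitude and phase, the spectrogram keeps only the magnitude, and the complex values can be rebuilt only when the phase is kept as well.

```python
import numpy as np

def frame_dft(x, n_fft=256, hop=128):
    """Naive STFT: Hann-windowed frames, one rfft per frame, no padding."""
    window = np.hanning(n_fft)
    starts = range(0, len(x) - n_fft + 1, hop)
    frames = np.stack([x[s:s + n_fft] * window for s in starts])
    return np.fft.rfft(frames, axis=1).T  # shape: (n_fft // 2 + 1, n_frames)

rate = 8000                       # assumed sampling rate for the toy signal
t = np.arange(rate) / rate        # one second of samples
x = np.sin(2 * np.pi * 1000 * t)  # synthetic 1 kHz tone

D = frame_dft(x)
magnitude, phase = np.abs(D), np.angle(D)  # a spectrogram keeps only `magnitude`
rebuilt = magnitude * np.exp(1j * phase)   # exact only because phase is kept too

peak_bin = magnitude.mean(axis=1).argmax()
print(D.shape, peak_bin * rate / 256)      # peak frequency in Hz
```

The magnitude alone still locates the tone at the right frequency, which is the "gist" a spectrogram captures; the phase is the extra information it leaves behind.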

In [None]:
s = librosa.amplitude_to_db(abs(librosa.stft(song)))
ld.specshow(s, sr=rate, x_axis='time', y_axis='hz');

In [None]:
mfcc = librosa.feature.mfcc(y=song, sr=rate, n_mfcc=12)  # pass the audio by keyword; newer librosa versions require it
ld.specshow(mfcc, x_axis='time');

In [None]:
s.shape