# Fastaudio starter kit

This notebook walks through the basic steps needed to compete in the Rainforest Connection Species Audio Detection competition using fastaudio.

[Fastaudio](https://github.com/fastaudio/fastaudio) is a community-contributed module for building audio machine learning applications on top of fastai 2.

Let's start by installing fastaudio, which also updates the PyTorch version

In [None]:
!pip install --upgrade fastaudio

After installing fastaudio, restart the environment before importing the library

![image.png](https://i.imgur.com/xlAOnbW.png)

# Imports and initial data exploration

In [None]:
import pandas as pd
from fastaudio.all import *
from fastai.vision.all import *

In [None]:
path = Path("../input/rfcx-species-audio-detection")
path.ls()

In [None]:
train_path = path / 'train'
test_path = path / 'test'

Let's open a training file and visualize/hear it

In [None]:
train_files = get_audio_files(train_path)
audio = AudioTensor.create(train_files[0])
audio.show();

# Processing dataframes

We will drop all the columns that are not recording_id or species_id.

In [None]:
df_train_tp = pd.read_csv(path / 'train_tp.csv')
df_train_tp["recording_id"] = df_train_tp["recording_id"].map(lambda x: "train/"+x)
df_train_tp.head()

In [None]:
df_train_tp = df_train_tp.drop(['t_min', 't_max', 'f_min', 'f_max', 'songtype_id'], axis=1)
df_train_tp['species_id'] = df_train_tp['species_id'].astype(str)
df_train_tp.head()

There are multiple rows with the same recording_id but different species_id. We will group them and concatenate the species_id values, separated by commas `,`

In [None]:
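# https://stackoverflow.com/questions/27298178/concatenate-strings-from-several-rows-using-pandas-groupby
df_train_tp['species_id'] = df_train_tp.groupby('recording_id')['species_id'].transform(",".join)
# transform keeps every original row, so keep one row per recording to avoid duplicated samples
df_train_tp = df_train_tp.drop_duplicates('recording_id').reset_index(drop=True)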
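
To see what the grouping does, here is a minimal sketch on a made-up dataframe (the recording ids and species ids are invented for illustration). It also shows that `transform` keeps one row per original annotation, so a recording with two annotations still appears twice unless duplicates are dropped.

```python
import pandas as pd

# Toy annotations: recording "a" has two species, recording "b" has one.
toy = pd.DataFrame({
    "recording_id": ["train/a", "train/a", "train/b"],
    "species_id": ["3", "17", "5"],
})

# transform() broadcasts the joined string back to every row of its group.
toy["species_id"] = toy.groupby("recording_id")["species_id"].transform(",".join)
print(toy)

# One row per recording is usually what a multi-label dataloader wants.
toy_unique = toy.drop_duplicates("recording_id").reset_index(drop=True)
print(toy_unique)
```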

# Building the dataloaders

First we will build [datablocks](https://docs.fast.ai/tutorial.datablock.html), which are a general way to specify how to load our data. Then, using these blocks, the train and validation dataloaders will be created.

In [None]:
# AudioToSpec is a Transform from fastaudio that runs on the GPU
# (notice that it's passed as a batch_tfms).
# Also, there are multiple AudioConfigs ready to use, with parameters that can be easily adjusted.

audio_to_spec = AudioToSpec.from_cfg(AudioConfig.BasicMelSpectrogram(n_fft=512))

# Adding some data augmentation
data_augmentation = [AddNoise(color=NoiseColor.White, noise_level=0.1), SignalShifter(max_pct=0.3)]

blocks = DataBlock(blocks=(AudioBlock, MultiCategoryBlock),
                  get_x = ColReader('recording_id', pref=str(path.resolve())+"/", suff='.flac'),
                  get_y = ColReader('species_id', label_delim=','),
                  item_tfms = data_augmentation,
                  batch_tfms = audio_to_spec,
                  splitter=RandomSplitter(valid_pct=0.2, seed=42)
                  )

In [None]:
# Creating the dataloaders
dls = blocks.dataloaders(df_train_tp, bs=24)
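
As a rough mental model of the targets a multi-category block produces, each comma-separated label string becomes a 0/1 vector over the vocabulary. The plain-Python sketch below is not fastai's actual implementation; `build_vocab` and `encode_labels` are hypothetical helpers, and the labels are made up.

```python
def build_vocab(label_strings, delim=","):
    """Collect the distinct labels, sorted as strings (so '17' comes before '3')."""
    return sorted({lab for s in label_strings for lab in s.split(delim)})


def encode_labels(label_string, vocab, delim=","):
    """Return a multi-hot list with a 1 for every vocab entry present in the string."""
    present = set(label_string.split(delim))
    return [1 if lab in present else 0 for lab in vocab]


labels = ["3,17", "5", "3"]
vocab = build_vocab(labels)
targets = [encode_labels(s, vocab) for s in labels]

print(vocab)
print(targets)
```

In the notebook itself, `dls.vocab` shows the real vocabulary and its order.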

Now let's visualize one batch of data

In [None]:
dls.show_batch(ncols=3, nrows=2, figsize=(20, 10))

# Learner and training

The model is based on the torchvision resnet18. The only modification is that the input has 1 channel, because our spectrograms are single-channel.

Here you can use all the standard computer vision tricks. In fact, `cnn_learner` comes from the fastai.vision module.

In [None]:
learner = cnn_learner(dls, resnet18, n_in=1)
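
One common way to adapt pretrained 3-channel first-layer weights to a 1-channel input is to sum the kernels over the channel axis. Whether fastai does exactly this for `n_in=1` is an assumption here, not something this notebook states. The NumPy sketch below uses random weights as a stand-in for a pretrained layer and shows why the trick is reasonable: the summed kernel responds to a single-channel patch exactly as the original kernel responds to the same patch copied into three channels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained first-layer kernel: (out_channels, in_channels, kh, kw).
w_rgb = rng.normal(size=(64, 3, 7, 7))

# Collapse the three input channels into one by summing over the channel axis.
w_gray = w_rgb.sum(axis=1, keepdims=True)  # shape (64, 1, 7, 7)

# A single-channel 7x7 patch, e.g. a small piece of a spectrogram.
patch = rng.normal(size=(7, 7))
patch_rgb = np.stack([patch, patch, patch])  # same patch copied to 3 channels

# Response of every filter at one location
# (a convolution is this dot product slid over the image).
resp_rgb = np.einsum("ochw,chw->o", w_rgb, patch_rgb)
resp_gray = np.einsum("ochw,chw->o", w_gray, patch[None])

print(np.allclose(resp_rgb, resp_gray))
```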

In [None]:
learner.lr_find()

Here you have a lot of room to experiment. As this is a starter kit, only the baseline `.fine_tune(...)` is used, but it's recommended to train for some epochs (with `.fit_one_cycle(...)`), unfreeze the model, and continue training.

In [None]:
learner.fine_tune(10, base_lr=5e-2)
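
The two-stage schedule mentioned above could look roughly like the sketch below. `train_in_two_stages` is a hypothetical helper, and every epoch count and learning rate in it is an untuned placeholder; it only relies on the fastai `Learner` methods `fit_one_cycle` and `unfreeze`.

```python
def train_in_two_stages(learner, frozen_epochs=3, unfrozen_epochs=7,
                        head_lr=1e-2, body_lrs=slice(1e-5, 1e-3)):
    """Train the head first, then unfreeze and train the whole model.

    All epoch counts and learning rates are placeholder values, not tuned.
    """
    # Stage 1: the pretrained body stays frozen, only the new head is trained.
    learner.fit_one_cycle(frozen_epochs, head_lr)
    # Stage 2: unfreeze everything and use discriminative learning rates
    # (smaller for the early layers, larger for the head).
    learner.unfreeze()
    learner.fit_one_cycle(unfrozen_epochs, body_lrs)
    return learner
```

In this notebook it would be called as `train_in_two_stages(learner)` in place of the `fine_tune` cell, ideally with learning rates picked from the `lr_find` plot.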

In [None]:
learner.recorder.plot_loss()

# Creating submission file

In [None]:
submission_df = pd.read_csv(path / 'sample_submission.csv')
submission_df["recording_id"] = submission_df["recording_id"].map(lambda x: "test/"+x)
submission_df

In [None]:
# Easily create the test dataloader and get the predictions
test_dl = dls.test_dl(submission_df)
preds = learner.get_preds(dl=test_dl)

In [None]:
preds[0].shape

In [None]:
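# Copy the predictions into the submission dataframe. dls.vocab is sorted as strings
# ('0', '1', '10', ...), so reorder numerically first (assumes columns run s0, s1, s2, ...).
order = sorted(range(len(dls.vocab)), key=lambda i: int(dls.vocab[i]))
submission_df[submission_df.columns[1:]] = preds[0][:, order].numpy()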
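
One thing worth checking at this step is the column order. Because `species_id` was cast to `str` earlier, the label vocabulary is ordered as strings rather than as numbers, which may not match the order of the submission columns; inspecting `dls.vocab` confirms the actual order. The sketch below uses a made-up vocabulary of 12 ids and fake predictions to show the mismatch and one way to undo it.

```python
import numpy as np

# Hypothetical vocabulary: 12 species ids stored as strings and sorted as strings.
vocab = sorted(str(i) for i in range(12))
print(vocab)

# Fake predictions where column j simply holds the species id it belongs to.
preds = np.array([[float(v) for v in vocab]])

# Positions of the vocab entries when ordered by their integer value.
order = sorted(range(len(vocab)), key=lambda i: int(vocab[i]))
preds_numeric = preds[:, order]
print(preds_numeric)
```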

In [None]:
# Strip the "test/" prefix from the ids; after that, it's ready to submit
submission_df["recording_id"] = submission_df["recording_id"].map(lambda x: x.split("/")[1])
submission_df
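
Before writing the file, a few cheap sanity checks can catch common mistakes such as a leftover folder prefix or missing values. The helper below is a hypothetical sketch, not part of fastai or of the competition tooling, and the toy dataframe is invented for illustration.

```python
import pandas as pd


def check_submission(df, id_col="recording_id"):
    """Return a list of problems found in a submission-like dataframe."""
    problems = []
    if df[id_col].str.contains("/").any():
        problems.append("ids still contain a folder prefix")
    if df[id_col].duplicated().any():
        problems.append("duplicate ids")
    scores = df.drop(columns=[id_col])
    if scores.isna().any().any():
        problems.append("missing predictions")
    return problems


# Toy example: the second id still has its prefix and one score is missing.
toy_submission = pd.DataFrame({
    "recording_id": ["abc", "test/def"],
    "s0": [0.1, 0.9],
    "s1": [0.5, float("nan")],
})
print(check_submission(toy_submission))
```

An empty list from `check_submission(submission_df)` means none of these particular problems were found.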

In [None]:
submission_df.to_csv('submission.csv', index=False)