# Simple training tutorial

This tutorial covers the basics of the library and shows how it can simplify an audio processing pipeline.

This page is generated from the corresponding Jupyter notebook, which can be found in [this folder](https://github.com/fastaudio/fastaudio/tree/master/docs).

To install the library, uncomment and run this cell:

In [None]:
# !pip install git+https://github.com/fastaudio/fastaudio.git

**COLAB USERS: Before you continue and import the lib, go to the `Runtime` menu and select `Restart Runtime`.**

In [None]:
from fastai.vision.all import *
from fastaudio.core.all import *
from fastaudio.augment.all import *

# ESC-50: Dataset for Environmental Sound Classification

In [None]:
# The first run downloads the dataset, which is ~650 MB
path = untar_data(URLs.ESC50, dest="ESC50")

The audio files are inside a subfolder `audio/`

In [None]:
(path/"audio").ls()

A second folder, `meta/`, holds metadata about the files and their labels.

In [None]:
(path/"meta").ls()

Open the metadata file:

In [None]:
df = pd.read_csv(path/"meta"/"esc50.csv")
df.head()

## Datablock and Basic End to End Training

In [None]:
# Helper function to split the data
def CrossValidationSplitter(col='fold', fold=1):
    "Split `items` (supposed to be a dataframe) by fold in `col`"
    def _inner(o):
        assert isinstance(o, pd.DataFrame), "CrossValidationSplitter only works when your items are a pandas DataFrame"
        col_values = o.iloc[:,col] if isinstance(col, int) else o[col]
        valid_idx = (col_values == fold).values.astype('bool')
        return IndexSplitter(mask2idxs(valid_idx))(o)
    return _inner
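
To make the split concrete, here is a minimal plain-Python sketch of the same idea, independent of fastai and pandas. The toy `rows` list and the `split_by_fold` helper are hypothetical stand-ins for the metadata dataframe and for `CrossValidationSplitter`; only the `fold` column name comes from the metadata file.

```python
# Toy stand-in for the metadata table: each row only needs a fold number.
# The file names below are made up for illustration.
rows = [
    {"filename": "a.wav", "fold": 1},
    {"filename": "b.wav", "fold": 2},
    {"filename": "c.wav", "fold": 1},
    {"filename": "d.wav", "fold": 3},
]

def split_by_fold(rows, fold=1, col="fold"):
    "Return (train_idxs, valid_idxs): rows whose `col` equals `fold` are held out."
    valid = [i for i, r in enumerate(rows) if r[col] == fold]
    train = [i for i, r in enumerate(rows) if r[col] != fold]
    return train, valid

train_idxs, valid_idxs = split_by_fold(rows, fold=1)
print(train_idxs, valid_idxs)  # [1, 3] [0, 2]
```

Changing `fold` holds out a different subset, so repeating the training once per fold value would give a cross-validation estimate.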

Creating the Audio to Spectrogram transform from a predefined config.

In [None]:
cfg = AudioConfig.BasicMelSpectrogram(n_fft=512)
a2s = AudioToSpec.from_cfg(cfg)
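
As a rough guide to what `n_fft=512` implies, the sketch below works out the size of the underlying STFT for one clip. It assumes a 5-second clip sampled at 44.1 kHz (the ESC-50 format), a hop length of `n_fft // 2`, and centered frames. The hop length and centering are assumptions borrowed from common STFT defaults, not values read from `AudioConfig.BasicMelSpectrogram`, and `stft_shape` is a hypothetical helper.

```python
def stft_shape(n_samples, n_fft, hop_length=None):
    "Shape (freq_bins, frames) of a centered STFT; hop defaults to n_fft // 2 (assumed)."
    hop_length = hop_length or n_fft // 2
    freq_bins = n_fft // 2 + 1             # one-sided spectrum of a real signal
    frames = 1 + n_samples // hop_length   # centered framing pads both ends
    return freq_bins, frames

n_samples = 5 * 44100  # 5 s at 44.1 kHz
freq_bins, frames = stft_shape(n_samples, n_fft=512)
print(freq_bins, frames)  # 257 862
```

A mel filterbank then maps the linear frequency bins onto a smaller number of mel bands, so the final spectrogram has one row per mel band rather than `freq_bins` rows.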

Creating the Datablock

In [None]:
auds = DataBlock(blocks=(AudioBlock, CategoryBlock),
                 get_x=ColReader("filename", pref=path/"audio"),
                 get_y=ColReader("category"),
                 splitter=CrossValidationSplitter(fold=1),
                 batch_tfms=[a2s])

In [None]:
dbunch = auds.dataloaders(df, bs=64)

Visualizing one batch of data. Notice that the title of each spectrogram is its label.

In [None]:
dbunch.show_batch(figsize=(10, 5))

# Learner and Training

When creating the learner, we need to pass a special `cnn_config` to indicate that our input spectrograms have only one channel. Other than that, it's the usual vision learner.

In [None]:
learn = cnn_learner(dbunch,
                    resnet18,
                    config=cnn_config(n_in=1),  # <- the only audio-specific modification
                    loss_func=CrossEntropyLossFlat(),
                    metrics=[accuracy])
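
A single-channel input can still reuse filters trained on RGB images because of simple arithmetic: if all three input channels carried the same values, a 3-channel filter would respond exactly like a 1-channel filter whose weights are the sum over channels. The plain-Python sketch below checks that identity on toy numbers. It illustrates the idea only; whether `cnn_config(n_in=1)` adapts the first layer this way is an assumption, and the weights and the `respond` helper are hypothetical.

```python
# Toy first-layer filter: 3 input channels, each with a flattened 2x2 kernel.
weights_rgb = [
    [0.5, -1.0, 0.25, 2.0],   # red
    [1.5, 0.0, -0.75, 1.0],   # green
    [-1.0, 2.0, 0.5, -0.5],   # blue
]
patch = [1.0, 2.0, 3.0, 4.0]  # one single-channel spectrogram patch

def respond(kernel, values):
    "Dot product of one kernel with one patch (a single convolution output)."
    return sum(w * v for w, v in zip(kernel, values))

# Option 1: copy the patch into all three channels and apply the RGB filter.
three_channel = sum(respond(k, patch) for k in weights_rgb)

# Option 2: collapse the filter to one channel by summing over input channels.
weights_mono = [sum(ws) for ws in zip(*weights_rgb)]
one_channel = respond(weights_mono, patch)

print(three_channel, one_channel)  # 13.0 13.0
```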

In [None]:
%reload_ext autoreload
%autoreload 1

In [None]:
from fastaudio.ci import skip_if_ci

@skip_if_ci
def run_learner():
    # epochs are a bit longer due to the chosen melspectrogram settings
    learn.fine_tune(10)

# Going by the decorator's name, training is skipped in CI and runs everywhere else
run_learner()
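
For readers curious how a decorator like `skip_if_ci` can work, here is a minimal sketch. It assumes the CI system sets an environment variable named `CI` (a common convention); the `skip_if_ci_sketch` name and the exact check are hypothetical and not taken from `fastaudio.ci`.

```python
import os
from functools import wraps

def skip_if_ci_sketch(func):
    "Return a wrapper that does nothing when the (assumed) `CI` env var is set."
    @wraps(func)
    def wrapper(*args, **kwargs):
        if os.environ.get("CI"):
            return None  # skip the expensive call on CI machines
        return func(*args, **kwargs)
    return wrapper

@skip_if_ci_sketch
def expensive_step():
    return "trained"

os.environ["CI"] = "true"
skipped = expensive_step()   # the body never runs, so this is None
os.environ.pop("CI")
ran = expensive_step()       # runs normally once the variable is gone
print(skipped, ran)  # None trained
```

Wrapping the training call this way keeps the notebook executable end to end in automated checks without paying for a full training run.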