# Speech2Text - Audio Preprocessing

## Introduction
This notebook demonstrates how we clean our datasets. Cleaning a dataset here means transforming it so that metadata such as individual sample length or the mean SNR of a sample do not end up strongly affecting our network. The details of each operation are listed below.
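
To make the two metadata quantities above concrete, here is a minimal sketch of how they could be measured for one mono sample. The function name `sample_stats`, the frame length, and the "quietest 10% of frames are noise" heuristic are all assumptions made for illustration; this is not necessarily how the SNR is estimated elsewhere in this project.

```python
import numpy as np

def sample_stats(samples, rate, frame_len=400, noise_fraction=0.1):
    """Return (duration in seconds, rough SNR estimate in dB) for a mono signal.

    Hypothetical helper: the SNR is a crude energy-based heuristic that treats
    the quietest frames as noise and the remaining frames as signal.
    """
    samples = np.asarray(samples, dtype=np.float64)
    duration = len(samples) / rate

    n_frames = len(samples) // frame_len
    if n_frames < 2:
        # Too short to separate quiet frames from loud ones.
        return duration, float('nan')

    # Split into non-overlapping frames and sort them by mean power.
    frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)
    power = np.sort((frames ** 2).mean(axis=1))

    n_noise = max(1, int(n_frames * noise_fraction))
    noise_power = power[:n_noise].mean()
    signal_power = power[n_noise:].mean()

    eps = 1e-12  # avoids log(0) and division by zero on digital silence
    snr_db = 10.0 * np.log10((signal_power + eps) / (noise_power + eps))
    return duration, float(snr_db)
```

Collecting these two numbers over the whole dataset gives a quick picture of how widely sample length and noise level vary before any cleaning is applied.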

In [None]:
%reload_ext autoreload 
%autoreload 2

## Pipeline
There are various parts to the pipeline, starting from the most obvious one, which is to wrap and load data as a single object. This is achieved via the [AN4](data/an4/an4.py) definition. It provides a composition of iterable objects that we can use for some _initial hand-wavy_ analysis without too much pain.

The full pipeline follows this chain:

`Load Data -> Denoise Audio -> [Cache Denoised Audio]`
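
The chain above can be sketched roughly as follows. This is only an illustration of the load, denoise, and cache steps: `load_denoised` and `remove_dc_offset` are hypothetical names, the cache format (`.npz`) is an assumption, and `remove_dc_offset` is a trivial stand-in for whatever denoiser the pipeline actually uses.

```python
import os
import numpy as np
from scipy.io import wavfile

def remove_dc_offset(samples):
    """Stand-in for the real denoiser (assumption): centres the waveform on zero."""
    samples = samples.astype(np.float32)
    return samples - samples.mean()

def load_denoised(wav_path, denoise_fn=remove_dc_offset, cache_dir=None):
    """Load -> denoise -> optionally cache. Returns (rate, denoised samples)."""
    cache_path = None
    if cache_dir is not None:
        os.makedirs(cache_dir, exist_ok=True)
        # Note: keyed by base name only, so same-named files in different
        # directories would collide in this simple sketch.
        stem = os.path.splitext(os.path.basename(wav_path))[0]
        cache_path = os.path.join(cache_dir, stem + '.npz')
        if os.path.exists(cache_path):
            # Cache hit: skip loading and denoising entirely.
            cached = np.load(cache_path)
            return int(cached['rate']), cached['samples']

    rate, samples = wavfile.read(wav_path)
    denoised = denoise_fn(samples)

    if cache_path is not None:
        np.savez(cache_path, rate=rate, samples=denoised)
    return rate, denoised
```

The caching step is optional (hence the brackets in the chain): passing `cache_dir=None` recomputes the denoised audio on every call.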

The code cell below reads all of the audio files and transforms them into spectrograms created using `scipy`.

I used ffmpeg to batch convert the raw files to wav files with proper headers, which are then loaded using `scipy.io.wavfile`.
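
For reference, a batch conversion of this kind could be scripted roughly as below. The exact command I ran is not recorded here, so everything in this sketch is an assumption: the `.raw` extension, and the input format of 16 kHz, mono, signed 16-bit big-endian PCM (adjust `rate`, `channels`, and `sample_fmt` to match your files). The helper names are hypothetical.

```python
import subprocess
from pathlib import Path

def ffmpeg_raw_to_wav_cmd(raw_path, wav_path, rate=16000, channels=1, sample_fmt='s16be'):
    """Build the ffmpeg command that wraps headerless PCM in a WAV container."""
    return [
        'ffmpeg', '-y',           # overwrite existing output files
        '-f', sample_fmt,         # raw input format (assumed, see lead-in)
        '-ar', str(rate),         # input sample rate in Hz (assumed)
        '-ac', str(channels),     # input channel count (assumed)
        '-i', str(raw_path),
        str(wav_path),
    ]

def convert_tree(root, dry_run=True):
    """Convert every *.raw file under root; returns the commands it built."""
    commands = []
    for raw_path in sorted(Path(root).rglob('*.raw')):
        cmd = ffmpeg_raw_to_wav_cmd(raw_path, raw_path.with_suffix('.wav'))
        commands.append(cmd)
        if not dry_run:
            # Requires ffmpeg on PATH; raises CalledProcessError on failure.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `convert_tree('data/an4', dry_run=True)` only lists the commands, which is a cheap way to check the paths and format flags before converting anything.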

Below is the directory structure:
```
notebook
    |_data
        |_an4
            |_etc
                |... (metadata files here)
            |_wav
                |_an4_cltsk
                    |_...
                    |_...
                |_an4test_cltsk
                    |_...
                    |_...
            an4.py
            __init__.py
        __init__.py
```

The next cell generates the spectrograms and saves them locally, next to the audio files.

In [None]:
from data.an4.an4 import AN4
import pprint
import os
import numpy as np
from scipy import signal
from scipy.io import wavfile
from matplotlib import pyplot as plt

an4data = AN4(debug=False, conversion=False)

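for filename in an4data.trainset.data:
    # Read the WAV file: sample rate in Hz and the sample array.
    rate, data = wavfile.read(filename)
    # Frequency bins, segment times, and the magnitude spectrogram.
    f, t, s = signal.spectrogram(data, rate, mode='magnitude')
    # Use a fresh figure per file so plots do not accumulate across iterations.
    plt.figure()
    plt.pcolormesh(t, f, s)
    plt.ylabel('Frequency [Hz]')
    plt.xlabel('Time [sec]')
    # Swap only the file extension so the PNG lands next to its WAV file.
    plt.savefig(os.path.splitext(filename)[0] + '.png')
    plt.close()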
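
To see what `signal.spectrogram` returns in the loop above without needing the dataset, the sketch below runs it on a synthetic 1 kHz tone. The tone, the 16 kHz rate, and the variable names are made up for illustration; the call itself uses the same arguments as the cell above.

```python
import numpy as np
from scipy import signal

rate = 16000                                   # assumed sample rate in Hz
time = np.arange(rate) / rate                  # one second of sample times
tone = np.sin(2 * np.pi * 1000 * time)         # synthetic 1 kHz sine wave

# f: frequency bins (Hz), t: segment centre times (s),
# s: magnitudes with shape (len(f), len(t)).
f, t, s = signal.spectrogram(tone, rate, mode='magnitude')

# The strongest frequency bin, averaged over time, should sit at the tone.
peak_hz = f[np.argmax(s.mean(axis=1))]
print(f.shape, t.shape, s.shape, peak_hz)
```

The rows of `s` index frequency and the columns index time, which is why the plotting call is `plt.pcolormesh(t, f, s)` with `t` on the x-axis and `f` on the y-axis.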