In [None]:
import os

import pandas as pd  # used below as pd; imported explicitly instead of relying on the star imports

from data_processing import *
from util import *

from warnings import simplefilter
# ignore all future warnings
simplefilter(action='ignore', category=FutureWarning)

# Feature extraction from audio files

The GTZAN Dataset provides its original data as .wav audio files. The Kaggle distribution also includes csv files that hold features already extracted from those audio files. The script here regenerates such a csv file from the input .wav audio files.

**Note: the GTZAN Dataset downloaded from Kaggle contains a corrupted audio file (jazz.00054.wav).**
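One way to keep the corrupted file out of a run is to filter it by name while collecting the input paths. The sketch below is an assumption about how that could be done, not something data_processing.py is known to do; `list_wav_files` and `CORRUPTED_FILES` are hypothetical names introduced here.

```python
import os

# Assumption: the bad file is skipped by name. The notebook itself does not show this step.
CORRUPTED_FILES = {"jazz.00054.wav"}


def list_wav_files(root, skip=CORRUPTED_FILES):
    """Return sorted paths of .wav files under root, leaving out known-bad file names."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(".wav") and filename not in skip:
                paths.append(os.path.join(dirpath, filename))
    return sorted(paths)
```

Filtering by name keeps the decision visible in one place; wrapping the loading call in a try/except would be an alternative if more files turn out to be unreadable.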

Feature extraction from the audio files has two parts:

## 1. Splitting up the audio files into 3-second blocks

The function **extract_audio_chunks** from data_processing.py splits an audio file into smaller blocks. The main purpose is to create more data samples for model training: splitting a 30-second audio file into 3-second blocks increases the number of samples tenfold. The length of each chunk is set by the seconds_per_chunk parameter of extract_audio_chunks (the call in this notebook passes 3000, which reads as milliseconds, the unit pydub works in). The function requires the pydub library. The splitting itself is done by calling **make_chunks** inside extract_audio_chunks; **make_chunks** returns a list of the small audio chunks cut from the main audio file.

These audio chunks are then exported as .wav files and stored in a temporary folder for the feature extraction in step 2.
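The chunk arithmetic can be sketched without any audio library. The example below assumes that chunks are cut at fixed millisecond offsets and that a shorter remainder is kept as a final chunk; `chunk_bounds` is a hypothetical helper for illustration, not part of pydub or data_processing.py.

```python
import math


def chunk_bounds(total_ms, chunk_ms):
    """Return (start_ms, end_ms) pairs covering [0, total_ms) in steps of chunk_ms."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    count = math.ceil(total_ms / chunk_ms)
    return [(i * chunk_ms, min((i + 1) * chunk_ms, total_ms)) for i in range(count)]


bounds = chunk_bounds(30_000, 3_000)
print(len(bounds), bounds[0], bounds[-1])  # 10 (0, 3000) (27000, 30000)

# A file that runs slightly over 30 s yields an extra, very short final chunk.
print(chunk_bounds(30_013, 3_000)[-1])     # (30000, 30013)
```

Under that assumption, a file that is not an exact multiple of the chunk length produces one more sample than expected, and that sample is much shorter than the others, so it may be worth checking chunk lengths before extracting features.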

## 2. Extracting the features from each of the 3-second blocks

The **extract_features** function from data_processing.py extracts the features from each audio file. The function requires the **librosa** library.

The features extracted are:
* file name: extracted from the given file name
* length: number of samples, calculated as librosa.get_duration * librosa.get_samplerate
* chroma_stft
* rms
* spectral_centroid
* spectral_bandwidth
* spectral_rolloff
* zero_crossing_rate
* harmonic
* tempo
* mfcc (20 instances)
* label

The mean and variance are calculated for the following features:
* chroma_stft
* rms
* spectral_centroid
* spectral_bandwidth
* spectral_rolloff
* zero_crossing_rate
* harmonic
* tempo (only the mean is calculated for this feature)
* mfcc (20 instances)
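The idea behind these summaries is that each feature is first computed frame by frame and the resulting series is then reduced to two numbers. The sketch below illustrates this with simplified, pure-Python versions of rms and zero_crossing_rate on a synthetic tone. It is not librosa's implementation (framing, padding and windowing differ), and the frame length, hop length, sample rate and the use of population variance are assumptions made for the example.

```python
import math
from statistics import fmean, pvariance


def frames(signal, frame_length, hop_length):
    """Yield consecutive frames; a trailing partial frame is dropped."""
    for start in range(0, len(signal) - frame_length + 1, hop_length):
        yield signal[start:start + frame_length]


def rms(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(fmean(x * x for x in frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum((a < 0) != (b < 0) for a, b in zip(frame, frame[1:]))
    return crossings / len(frame)


def summarize(name, values):
    """Reduce a per-frame series to its mean and (population) variance."""
    values = list(values)
    return {f"{name}_mean": fmean(values), f"{name}_var": pvariance(values)}


sr = 22050  # assumed sample rate for the synthetic signal
signal = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]  # 1 s, 440 Hz
frame_list = list(frames(signal, frame_length=2048, hop_length=512))

row = {}
row.update(summarize("rms", (rms(f) for f in frame_list)))
row.update(summarize("zero_crossing_rate", (zero_crossing_rate(f) for f in frame_list)))
print(row)
```

For a steady tone the variances come out close to zero; real music varies from frame to frame, which is why the variance carries information alongside the mean.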

The extracted features are stored as a pandas DataFrame. The features for each audio file should have the shape (1, 58):
* file name
* length
* chroma_stft_mean
* chroma_stft_var
* rms_mean
* rms_var
* spectral_centroid_mean
* spectral_centroid_var
* spectral_bandwidth_mean
* spectral_bandwidth_var
* rolloff_mean
* rolloff_var
* zero_crossing_rate_mean
* zero_crossing_rate_var
* harmonic_mean
* harmonic_var
* tempo
* mfcc (20 instances, each with mean and variance)
* label
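The count of 58 can be checked by building the column list explicitly. The names below follow the list above where it gives them; `filename` and the `mfcc{i}_mean` / `mfcc{i}_var` pattern are assumptions for illustration, since the exact column names are not spelled out here.

```python
# Features that contribute both a mean and a variance column.
SUMMARIZED = [
    "chroma_stft",
    "rms",
    "spectral_centroid",
    "spectral_bandwidth",
    "rolloff",
    "zero_crossing_rate",
    "harmonic",
]
N_MFCC = 20


def expected_columns():
    """Build the expected column names in the order listed above."""
    columns = ["filename", "length"]  # "filename" is an assumed spelling
    for name in SUMMARIZED:
        columns += [f"{name}_mean", f"{name}_var"]
    columns.append("tempo")  # mean only
    for i in range(1, N_MFCC + 1):
        columns += [f"mfcc{i}_mean", f"mfcc{i}_var"]  # assumed naming pattern
    columns.append("label")
    return columns


columns = expected_columns()
print(len(columns))  # 58
```

The breakdown is 2 (file name, length) + 14 (seven features with mean and variance) + 1 (tempo) + 40 (MFCC) + 1 (label) = 58.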

Finally, the extracted features are saved as a .csv file to be used for model training.

In [None]:
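dataset_path = './data/'
temp_dir = os.path.abspath("temp")
file_count = 0
feature_frames = []  # one single-row feature DataFrame per audio chunk


for dirpath, dirnames, filenames in os.walk(dataset_path):
    for filename in filenames:
        if filename.endswith('.wav'):
            filepath = os.path.join(dirpath, filename)
            file_count += 1
            print("Splitting {}".format(filepath))
            # Split the file into 3000 ms (3-second) chunks stored in temp_dir
            audio_chunks = extract_audio_chunks(filepath, 3000, temp_dir)
            for audio_chunk in audio_chunks:
                # Assumes extract_features returns a one-row DataFrame, as described above
                feature_frames.append(extract_features(audio_chunk))

# DataFrame.append was removed in pandas 2.0; concatenate all rows once instead
data = pd.concat(feature_frames, ignore_index=True) if feature_frames else pd.DataFrame()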

print("Total audio files processed: {}".format(file_count))
clear_folder(temp_dir)

data.to_csv("audio_features_3_sec.csv", header=True, index=False)