Author: Matthew Kinsley

Class: CIS663

Date: 2021-02-25


# Overview

This file will build on the code and teaching by Valerio Velardo in which he provide a bare bones and simple example code for MFCC feature extraction using librosa.

His github post can be found at.
https://github.com/musikalkemist/Deep-Learning-Audio-Application-From-Design-to-Deployment/tree/master/3-%20Implementing%20a%20Speech%20Recognition%20System%20in%20TensorFlow%202

We have adapted his code to work with our dataset.

# Global Imports
The following contains the global imports for the notebook.

In [35]:
import os
import json
import wave
import shutil
import librosa
import collections
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt

from pathlib import Path
from sklearn.model_selection import train_test_split

# Global Constants
The following code contains the constants that are used throughout the file.

In [2]:
DATASET_PATH = "../../data/wav48"
WORK_PATH = '../../data/working'    # Working folder to use for blocks.
JSON_PATH = "."
EXP_SUBJECTS = 10                   # The number of subjects to process so we don't 
                                    # test on all which would take a long time.

# Shared Methods
Teh follwing code block defines shared methods that can be used throughout this notebook.

In [3]:
# ------------------------------------------------------------------------------------------------
# Description:
#    This function finds the next file in the folder that hasn't been processed
# yet.
#
# Inputs:
#    data_path - The path where files are stored
#    idx - The index of the suspected next file
#
# Output:
#    is_good - True if a file was found, otherwise False
#    file_path - The full file name and path string.
#    idx - The suspected idx of the next file
# ------------------------------------------------------------------------------------------------
def get_next_file_name(data_path, idx):
    is_good = False
    file_path = data_path + '_' + str(idx).zfill(3) + '.wav'

    # Give it 3 attempts to find a good file because some file ids are missing
    x = 0
    while not Path(file_path).exists() and x < 3:
        idx = idx+1
        file_path = data_path + '_' + str(idx).zfill(3) + '.wav'
        x = x+1

    # If we found a file make sure is_good is true
    if Path(file_path).exists():
        is_good = True

    return is_good, file_path, idx+1

In [4]:
# ------------------------------------------------------------------------------------------------
# Description:
#    This function loads the original waveforms and then blocks them into the  
# specified number of blocks eccach of a specified length.
#
# Paramaters:
#    data_path - The path where files are stored
#    working_path - The path were the working files are created.
#    blocks - The number of files to create for each subject.
#    len - The number of seconds each files should be.
#    max_subj - Specifies the maximum number of subjects to process
#
# Outputs:
#    DataFrame containing the subject ID and file name for each file created.
# ------------------------------------------------------------------------------------------------
def prep_wave_files(data_path, working_path, len_sec, max_subj=-1):
    # Define named tuple
    wparams = collections.namedtuple('WParams', 'nchannels sampwidth, framerate, nframes, comptype, compname')

    # Delete and files in the working path that may be left over
    try:
        shutil.rmtree(working_path);
    except:
        print('No working path')
    os.mkdir(working_path)
    
    # Get a listing of all dir entries in the datapath    
    entries = os.scandir(data_path)

    # Each directory represents a subject.  We need to load those files an build
    # the necessary amount of 60 second files.  
    subj = 0
    for entry in entries:
        if subj < max_subj or max_subj==-1:
            subj = subj+1
            subject_id = entry.name
            path = entry.path

            out_path = working_path + '/' + subject_id
            os.mkdir(out_path)

            print('    Preparing Subject ', subject_id)

            j = 1
            bcnt = 0
            audiof = []
            done = False
            while not done:
                out_file_name = out_path + '/' + subject_id + '_' + str(bcnt).zfill(3) + '.wav'
                is_good, in_file_name, j = get_next_file_name(path + '/' + subject_id, j)

                # If good then read in the wave file
                if is_good:
                    # read in the file
                    wf = wave.open(in_file_name, 'rb')

                    # Downsample it
                    p = wf.getparams()
                    data = wf.readframes(wf.getnframes())

                    # Append it to the working stream
                    audiof.append([p, data])
                    sr = p.framerate
                    wf.close()

                    # Check to see if we have exceeded 60 seconds yet
                    length = 0
                    for i in range(0,len(audiof)):
                        length = length + len(audiof[i][1])

                    if length > len_sec*sr*2:
                        # Calculate important parameters for writing the file
                        l_block = len(audiof)-1
                        l_block_overflow = (length-len_sec*sr*2)
                        l_block_len = len(audiof[l_block][1])-l_block_overflow

                        # Write the block
                        output = wave.open(out_file_name, 'wb')
                        output.setparams(audiof[0][0])
                        for i in range(0, len(audiof)-1):
                            output.writeframes(audiof[i][1])

                        output.writeframes(audiof[l_block][1][0:l_block_len])
                        output.close()

                        # Increment the block count
                        # Save the remainder
                        audiof = []
                        bcnt = bcnt + 1
                else:
                    done = True

            print('    Subject ', subject_id, ' Complete')

In [5]:
# ------------------------------------------------------------------------------------------------
# Description:
#    This method orginally produced by Valerio Velardo uses librosa to extract MFCC from all the 
# audo files in a given folder and specified parameters.  All extracted features are exprted to a
# json file.  It has been modified to work with our dataset.
#
# Note: This method requires that all audio files for each individual are stored in the same folder
# It uses the folder name to determine that a file belongs to an individual.  This aligns with the
# structure of the dataset and can be run without chagnes.
#
# Parameters:
#    dataset_path - The path to the root folder to search for audio files.
#    jason_path - Full path to the json file to fill with the extacted features.
#    num_mfcc - The number of coefficients to extract
#    n_fft - Interval to use to apply the FFT.  Measured in $ of samples.
#    hop_length - Slidding window for FFT measured in # of samples.
#    max_subj - Specifies the maximum number of subjects to process
# ------------------------------------------------------------------------------------------------
def preprocess_dataset(dataset_path, json_path, min_samples=22050, num_mfcc=13, n_fft=2048, hop_length=512,
                       max_subj=-1):
    # dictionary where we'll store mapping, labels, MFCCs and filenames
    data = {
        "mapping": [],
        "labels": [],
        "MFCCs": [],
        "files": []
    }

    # Get a listing of all dir entries in the datapath    
    entries = os.scandir(dataset_path)

    subj = 0
    for entry in entries:
        # make sure we don't process too many subjects
        if subj < max_subj or max_subj==-1:
            subj = subj+1
            subject_id = entry.name
            path = entry.path
            print("    Processing: '{}'".format(subject_id))

            j = 1
            bcnt = 0
            done = False
            while not done:
                is_good, in_file_name, j = get_next_file_name(path + '/' + subject_id, j)

                # If good then read in the wave file
                if is_good:
                    # save label (i.e., sub-folder name) in the mapping
                    data["mapping"].append(subject_id)

                    # Original code stopped processing when it hit a non audio file.  Adding the 
                    # exception handler allows the file to be logged and the processing to continue.
                    try:
                        # load audio file and slice it to ensure length consistency among different files
                        signal, sample_rate = librosa.load(in_file_name)

                        # drop audio files with less than pre-decided number of samples
                        if len(signal) >= min_samples:

                            # ensure consistency of the length of the signal
                            signal = signal[:min_samples]

                            # extract MFCCs
                            MFCCs = librosa.feature.mfcc(signal, sample_rate, n_mfcc=num_mfcc, n_fft=n_fft,
                                                        hop_length=hop_length)

                            # store data for analysed track
                            data["MFCCs"].append(MFCCs.T.tolist())
                            data["labels"].append(subj-1)
                            data["files"].append(in_file_name)
                    except:
                        print('    Can not load: {}: {}'.format(in_file_name, subj-1))
                else:
                    done = True
        
    # save data in json file
    print('    Saving json file')
    with open(json_path, "w") as fp:
        json.dump(data, fp, indent=4)

In [6]:
# ------------------------------------------------------------------------------------------------
# Description:
#    This method loads the input and target data from the specified json file.  
# Originally created by Valerio Velardo.
#
# Parameters:
#    data_path - Path to the json file containing the data
#
# Outputs: 
#    X - input data
#    y - target data
# ------------------------------------------------------------------------------------------------
def load_data(data_path):
    with open(data_path, "r") as fp:
        data = json.load(fp)

    X = np.array(data["MFCCs"])
    y = np.array(data["labels"])
    print("Training sets loaded!")
    return X, y


In [15]:
# ------------------------------------------------------------------------------------------------
# Description:
#    This creates the training and validation splits that will be used for this generic training
# pass.  Adapted from a method created by Valerio Velardo.
#
# Parameters:
#    data_path - Path to the json file containing data
#    validation_size - Describes what percentage to use for the validation set size.
#
# Outputs: 
#    x_train - The training inputs
#    y_train  - The training targets
#    x_val - The validation inputs
#    y_val - The validation targets
# ------------------------------------------------------------------------------------------------
def prepare_dataset(data_path, validation_size=0.2):
    # load dataset
    X, y = load_data(data_path)

    # create train, validation, test split
    X_train, X_validation, y_train, y_validation = train_test_split(X, y, test_size=validation_size)

    # add an axis to nd array
    X_train = X_train[..., np.newaxis]
    X_validation = X_validation[..., np.newaxis]

    return X_train, y_train, X_validation, y_validation

In [24]:
# ------------------------------------------------------------------------------------------------
# Description:
#    This method orginally produced by Valerio Velardo ubuilds a simple convolutional model in 
# keras that can be used to train for speaker identification from the dataset stored in a json 
# file.
#
# Parameters:
#    input_shape - Is a touple representing the shape of a training sample.
#    output_shape - Is the shape of the output layer
#    loss - A string representing which loss funciton to use
#    learning_rate - A fload specifying the learning rate.
#
# Outputs:
#    model - The tensorflow model
# ------------------------------------------------------------------------------------------------
def build_model(input_shape, output_shape, loss="sparse_categorical_crossentropy", learning_rate=0.0001):
    # build network architecture using convolutional layers
    model = tf.keras.models.Sequential()

    # 1st conv layer
    model.add(tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=input_shape,
                                     kernel_regularizer=tf.keras.regularizers.l2(0.001)))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.MaxPooling2D((3, 3), strides=(2,2), padding='same'))

    # 2nd conv layer
    model.add(tf.keras.layers.Conv2D(16, (3, 3), activation='relu',
                                     kernel_regularizer=tf.keras.regularizers.l2(0.001)))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.MaxPooling2D((3, 3), strides=(2,2), padding='same'))

    # 3rd conv layer
    model.add(tf.keras.layers.Conv2D(32, (2, 2), activation='relu',
                                     kernel_regularizer=tf.keras.regularizers.l2(0.001)))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.MaxPooling2D((2, 2), strides=(2,2), padding='same'))

    # flatten output and feed into dense layer
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(64, activation='relu'))
    tf.keras.layers.Dropout(0.3)

    # softmax output layer
    model.add(tf.keras.layers.Dense(output_shape, activation='softmax'))

    optimiser = tf.optimizers.Adam(learning_rate=learning_rate)

    # compile model
    model.compile(optimizer=optimiser,
                  loss=loss,
                  metrics=["accuracy"])

    return model

In [9]:
# ------------------------------------------------------------------------------------------------
# Description:
#    This method orginally produced by Valerio Velardo trains a model and returns the training
# history.
#
# Parameters:
#    model - The model to train
#    epochs - The number of epochs
#    batch_size - The size of each batch
#    patience - The number of epochs before early stop is allowed
#    x_train - The trianing inputs
#    y_train - The training targets
#    x_validation - The validation inputs
#    y_validation - The validation targets
#
# Outputs:
#    history the training history
# ------------------------------------------------------------------------------------------------
def train(model, epochs, batch_size, patience, X_train, y_train, X_validation, y_validation):
    # Setup the earlystop callback
    earlystop_callback = tf.keras.callbacks.EarlyStopping(monitor="accuracy", min_delta=0.001, patience=patience)

    # train model
    history = model.fit(X_train,
                        y_train,
                        epochs=epochs,
                        batch_size=batch_size,
                        validation_data=(X_validation, y_validation),
                        callbacks=[earlystop_callback])
    return history

# Extract Features
The following code extracts a series of inputs that can be used to build models.  To do this we will extract the files in a couple of different ways.  We will then mix them up all the way.  Once that is done we will test each of those inputs by running them through a simple model with minimumal training iterations to see which set of data creates the best generalized model.  

For this section the inputs will be as follows.
- Length of speach
    - Raw Files
    - 15 seconds
    - 30 seconds
    - 45 seconds
    - 60 seconds


- num_mfccs
    - 13
    - 20
    - 30


- n_fft
    - 1024
    - 2048
    - 4096


- hop_length
    - 256
    - 512
    - 1024
    
    
The first step is to extract all audo files into various jason files.

In [10]:
# List of constants to use to vary the extraction
audio_length = [0, 30, 60]
num_mfccs = [13, 20, 30]
n_fft = [1024, 2048, 4096]
hop_length = [256, 512, 1024]

# List of all create json files
json_files = []

# Extract the files to various json files
for al in audio_length:
    for nm in num_mfccs:
        for nf in n_fft:
            for hl in hop_length:
                json_file_name = 'data_{0}_{1}_{2}_{3}.json'.format(al, nm, nf, hl)
                json_files.append(json_file_name)
                
                print('building data for', json_file_name)
                dsPath = DATASET_PATH
                
                # Merge files to the specified length as needed
                if al != 0:
                    prep_wave_files(dsPath, WORK_PATH, al, max_subj=EXP_SUBJECTS)
                    dsPath = WORK_PATH
                    
                # Extract the features into the json file
                preprocess_dataset(dsPath, json_file_name, num_mfcc=nm, n_fft=nf, hop_length=hl, 
                                   max_subj=EXP_SUBJECTS)

building data for data_0_13_1024_256.json
    Processing: 'p226'
    Processing: 'p302'
    Processing: 'p284'
    Processing: 'p239'
    Processing: 'p335'
    Processing: 'p263'
    Processing: 'p283'
    Processing: 'p282'
    Processing: 'p276'
    Processing: 'p362'
    Saving json file
building data for data_0_13_1024_512.json
    Processing: 'p226'
    Processing: 'p302'
    Processing: 'p284'
    Processing: 'p239'
    Processing: 'p335'
    Processing: 'p263'
    Processing: 'p283'
    Processing: 'p282'
    Processing: 'p276'
    Processing: 'p362'
    Saving json file
building data for data_0_13_1024_1024.json
    Processing: 'p226'
    Processing: 'p302'
    Processing: 'p284'
    Processing: 'p239'
    Processing: 'p335'
    Processing: 'p263'
    Processing: 'p283'
    Processing: 'p282'
    Processing: 'p276'
    Processing: 'p362'
    Saving json file
building data for data_0_13_2048_256.json
    Processing: 'p226'
    Processing: 'p302'
    Processing: 'p284'
    Proce

    Subject  p335  Complete
    Preparing Subject  p263
    Subject  p263  Complete
    Preparing Subject  p283
    Subject  p283  Complete
    Preparing Subject  p282
    Subject  p282  Complete
    Preparing Subject  p276
    Subject  p276  Complete
    Preparing Subject  p362
    Subject  p362  Complete
    Processing: 'p226'
    Processing: 'p302'
    Processing: 'p284'
    Processing: 'p239'
    Processing: 'p335'
    Processing: 'p263'
    Processing: 'p283'
    Processing: 'p282'
    Processing: 'p276'
    Processing: 'p362'
    Saving json file
building data for data_30_13_1024_512.json
    Preparing Subject  p226
    Subject  p226  Complete
    Preparing Subject  p302
    Subject  p302  Complete
    Preparing Subject  p284
    Subject  p284  Complete
    Preparing Subject  p239
    Subject  p239  Complete
    Preparing Subject  p335
    Subject  p335  Complete
    Preparing Subject  p263
    Subject  p263  Complete
    Preparing Subject  p283
    Subject  p283  Complete
    Pr

    Processing: 'p362'
    Saving json file
building data for data_30_20_1024_512.json
    Preparing Subject  p226
    Subject  p226  Complete
    Preparing Subject  p302
    Subject  p302  Complete
    Preparing Subject  p284
    Subject  p284  Complete
    Preparing Subject  p239
    Subject  p239  Complete
    Preparing Subject  p335
    Subject  p335  Complete
    Preparing Subject  p263
    Subject  p263  Complete
    Preparing Subject  p283
    Subject  p283  Complete
    Preparing Subject  p282
    Subject  p282  Complete
    Preparing Subject  p276
    Subject  p276  Complete
    Preparing Subject  p362
    Subject  p362  Complete
    Processing: 'p226'
    Processing: 'p302'
    Processing: 'p284'
    Processing: 'p239'
    Processing: 'p335'
    Processing: 'p263'
    Processing: 'p283'
    Processing: 'p282'
    Processing: 'p276'
    Processing: 'p362'
    Saving json file
building data for data_30_20_1024_1024.json
    Preparing Subject  p226
    Subject  p226  Complete
  

    Subject  p282  Complete
    Preparing Subject  p276
    Subject  p276  Complete
    Preparing Subject  p362
    Subject  p362  Complete
    Processing: 'p226'
    Processing: 'p302'
    Processing: 'p284'
    Processing: 'p239'
    Processing: 'p335'
    Processing: 'p263'
    Processing: 'p283'
    Processing: 'p282'
    Processing: 'p276'
    Processing: 'p362'
    Saving json file
building data for data_30_30_1024_1024.json
    Preparing Subject  p226
    Subject  p226  Complete
    Preparing Subject  p302
    Subject  p302  Complete
    Preparing Subject  p284
    Subject  p284  Complete
    Preparing Subject  p239
    Subject  p239  Complete
    Preparing Subject  p335
    Subject  p335  Complete
    Preparing Subject  p263
    Subject  p263  Complete
    Preparing Subject  p283
    Subject  p283  Complete
    Preparing Subject  p282
    Subject  p282  Complete
    Preparing Subject  p276
    Subject  p276  Complete
    Preparing Subject  p362
    Subject  p362  Complete
    P

    Subject  p284  Complete
    Preparing Subject  p239
    Subject  p239  Complete
    Preparing Subject  p335
    Subject  p335  Complete
    Preparing Subject  p263
    Subject  p263  Complete
    Preparing Subject  p283
    Subject  p283  Complete
    Preparing Subject  p282
    Subject  p282  Complete
    Preparing Subject  p276
    Subject  p276  Complete
    Preparing Subject  p362
    Subject  p362  Complete
    Processing: 'p226'
    Processing: 'p302'
    Processing: 'p284'
    Processing: 'p239'
    Processing: 'p335'
    Processing: 'p263'
    Processing: 'p283'
    Processing: 'p282'
    Processing: 'p276'
    Processing: 'p362'
    Saving json file
building data for data_60_13_2048_256.json
    Preparing Subject  p226
    Subject  p226  Complete
    Preparing Subject  p302
    Subject  p302  Complete
    Preparing Subject  p284
    Subject  p284  Complete
    Preparing Subject  p239
    Subject  p239  Complete
    Preparing Subject  p335
    Subject  p335  Complete
    Pr

    Processing: 'p335'
    Processing: 'p263'
    Processing: 'p283'
    Processing: 'p282'
    Processing: 'p276'
    Processing: 'p362'
    Saving json file
building data for data_60_20_2048_256.json
    Preparing Subject  p226
    Subject  p226  Complete
    Preparing Subject  p302
    Subject  p302  Complete
    Preparing Subject  p284
    Subject  p284  Complete
    Preparing Subject  p239
    Subject  p239  Complete
    Preparing Subject  p335
    Subject  p335  Complete
    Preparing Subject  p263
    Subject  p263  Complete
    Preparing Subject  p283
    Subject  p283  Complete
    Preparing Subject  p282
    Subject  p282  Complete
    Preparing Subject  p276
    Subject  p276  Complete
    Preparing Subject  p362
    Subject  p362  Complete
    Processing: 'p226'
    Processing: 'p302'
    Processing: 'p284'
    Processing: 'p239'
    Processing: 'p335'
    Processing: 'p263'
    Processing: 'p283'
    Processing: 'p282'
    Processing: 'p276'
    Processing: 'p362'
    Savi

    Subject  p283  Complete
    Preparing Subject  p282
    Subject  p282  Complete
    Preparing Subject  p276
    Subject  p276  Complete
    Preparing Subject  p362
    Subject  p362  Complete
    Processing: 'p226'
    Processing: 'p302'
    Processing: 'p284'
    Processing: 'p239'
    Processing: 'p335'
    Processing: 'p263'
    Processing: 'p283'
    Processing: 'p282'
    Processing: 'p276'
    Processing: 'p362'
    Saving json file
building data for data_60_30_2048_512.json
    Preparing Subject  p226
    Subject  p226  Complete
    Preparing Subject  p302
    Subject  p302  Complete
    Preparing Subject  p284
    Subject  p284  Complete
    Preparing Subject  p239
    Subject  p239  Complete
    Preparing Subject  p335
    Subject  p335  Complete
    Preparing Subject  p263
    Subject  p263  Complete
    Preparing Subject  p283
    Subject  p283  Complete
    Preparing Subject  p282
    Subject  p282  Complete
    Preparing Subject  p276
    Subject  p276  Complete
    Pr

NameError: name 'jason_files' is not defined

Next we run each of these input files through the basic model with the same parameters and extract the performance metrics.

In [25]:
results = []

for fName in json_files:
    print('Processing', fName)
    
    # generate train, validation and test sets
    x_train, y_train, x_validation, y_validation = prepare_dataset(fName)

    # extract the shapes
    input_shape = (x_train.shape[1], x_train.shape[2], 1)
    output_count = np.unique(y_train).shape[0]
    
    print('input shape:', x_train.shape[1], x_train.shape[2])
    print('output shape:', output_count)
    
    # create network
    model = build_model(input_shape, output_count, learning_rate=0.0001)

    # train the network we will great each set of inputs based  on theor performance on a fixed number of epoch
    # and batch size to get a feel for which has the best descriptive power.
    history = train(model, 10, 32, 5, x_train, y_train, x_validation, y_validation)

    results.append({'filename':fName, 'history':history, 'model':model})

Processing data_0_13_1024_256.json
Training sets loaded!
input shape: 87 13
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_0_13_1024_512.json
Training sets loaded!
input shape: 44 13
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_0_13_1024_1024.json
Training sets loaded!
input shape: 22 13
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_0_13_2048_256.json
Training sets loaded!
input shape: 87 13
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_0_13_2048_512.json
Training sets loaded!
input shape: 44 13
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Ep

Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_0_13_4096_256.json
Training sets loaded!
input shape: 87 13
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_0_13_4096_512.json
Training sets loaded!
input shape: 44 13
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_0_13_4096_1024.json
Training sets loaded!
input shape: 22 13
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_0_20_1024_256.json
Training sets loaded!
input shape: 87 20
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_0_20_1024_512.json
Training sets loaded!
input shape: 44 20
output shape: 10
Epoch 1/10
Epoch 2/10
E

Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_0_20_1024_1024.json
Training sets loaded!
input shape: 22 20
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_0_20_2048_256.json
Training sets loaded!
input shape: 87 20
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_0_20_2048_512.json
Training sets loaded!
input shape: 44 20
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_0_20_2048_1024.json
Training sets loaded!
input shape: 22 20
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_0_20_4096_256.json
Training sets loaded!
input shape: 87 20
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10


Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_0_20_4096_1024.json
Training sets loaded!
input shape: 22 20
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_0_30_1024_256.json
Training sets loaded!
input shape: 87 30
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_0_30_1024_512.json
Training sets loaded!
input shape: 44 30
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_0_30_1024_1024.json
Training sets loaded!
input shape: 22 30
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_0_30_2048_256.json
Training sets loaded!
input shape: 87 30
output

Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_0_30_2048_512.json
Training sets loaded!
input shape: 44 30
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_0_30_2048_1024.json
Training sets loaded!
input shape: 22 30
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_0_30_4096_256.json
Training sets loaded!
input shape: 87 30
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_0_30_4096_512.json
Training sets loaded!
input shape: 44 30
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_0_30_4096_1024.json
Training sets loaded!
input shape: 22 30
output shape: 10
Epoch 1/10
Epoch 2/10


Epoch 9/10
Epoch 10/10
Processing data_30_13_1024_256.json
Training sets loaded!
input shape: 87 13
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_30_13_1024_512.json
Training sets loaded!
input shape: 44 13
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_30_13_1024_1024.json
Training sets loaded!
input shape: 22 13
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_30_13_2048_256.json
Training sets loaded!
input shape: 87 13
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_30_13_2048_512.json
Training sets loaded!
input shape: 44 13
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6

Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_30_13_4096_256.json
Training sets loaded!
input shape: 87 13
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_30_13_4096_512.json
Training sets loaded!
input shape: 44 13
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_30_13_4096_1024.json
Training sets loaded!
input shape: 22 13
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_30_20_1024_256.json
Training sets loaded!
input shape: 87 20
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_30_20_1024_512.json
Training sets loaded!
input shape: 44 20
output shape: 10
Epoch 1

Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_30_20_1024_1024.json
Training sets loaded!
input shape: 22 20
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_30_20_2048_256.json
Training sets loaded!
input shape: 87 20
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_30_20_2048_512.json
Training sets loaded!
input shape: 44 20
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_30_20_2048_1024.json
Training sets loaded!
input shape: 22 20
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_30_20_4096_256.json
Training sets loaded!
input shape: 87 20
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 

Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_30_20_4096_1024.json
Training sets loaded!
input shape: 22 20
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_30_30_1024_256.json
Training sets loaded!
input shape: 87 30
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_30_30_1024_512.json
Training sets loaded!
input shape: 44 30
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_30_30_1024_1024.json
Training sets loaded!
input shape: 22 30
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_30_30_2048_256.json
Training sets loaded!
input shape: 87 30
output shape

Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_30_30_2048_512.json
Training sets loaded!
input shape: 44 30
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_30_30_2048_1024.json
Training sets loaded!
input shape: 22 30
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_30_30_4096_256.json
Training sets loaded!
input shape: 87 30
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_30_30_4096_512.json
Training sets loaded!
input shape: 44 30
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_30_30_4096_1024.json
Training sets loaded!
input shape: 22 30
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 

Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_60_13_1024_512.json
Training sets loaded!
input shape: 44 13
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_60_13_1024_1024.json
Training sets loaded!
input shape: 22 13
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_60_13_2048_256.json
Training sets loaded!
input shape: 87 13
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_60_13_2048_512.json
Training sets loaded!
input shape: 44 13
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_60_13_2048_1024.json
Training sets loaded!
input shape: 22 13
o

Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_60_13_4096_256.json
Training sets loaded!
input shape: 87 13
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_60_13_4096_512.json
Training sets loaded!
input shape: 44 13
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_60_13_4096_1024.json
Training sets loaded!
input shape: 22 13
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_60_20_1024_256.json
Training sets loaded!
input shape: 87 20
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_60_20_1024_512.json
Training sets loaded!
input shape: 44 20
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_60_20_2048_256.json
Training sets loaded!
input shape: 87 20
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_60_20_2048_512.json
Training sets loaded!
input shape: 44 20
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_60_20_2048_1024.json
Training sets loaded!
input shape: 22 20
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_60_20_4096_256.json
Training sets loaded!
input shape: 87 20
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_60_20_4096_512.json
Training sets loaded!
input shap

Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_60_20_4096_1024.json
Training sets loaded!
input shape: 22 20
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_60_30_1024_256.json
Training sets loaded!
input shape: 87 30
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_60_30_1024_512.json
Training sets loaded!
input shape: 44 30
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_60_30_1024_1024.json
Training sets loaded!
input shape: 22 30
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_60_30_2048_256.json
Training sets loaded!
input shape: 87 30
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 

Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_60_30_2048_1024.json
Training sets loaded!
input shape: 22 30
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_60_30_4096_256.json
Training sets loaded!
input shape: 87 30
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_60_30_4096_512.json
Training sets loaded!
input shape: 44 30
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Processing data_60_30_4096_1024.json
Training sets loaded!
input shape: 22 30
output shape: 10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


Now that we have our estimates we will build a table sort them and print out the top 5 performers.

In [42]:
resultsdf = pd.DataFrame({}, columns=['File','Validation Accuracy'])
for row in results:
    data = {'File': row['filename'], 
            'Validation Accuracy': max(row['history'].history['val_accuracy'])}
    resultsdf = resultsdf.append(data, ignore_index=True)

resultsdf = resultsdf.sort_values(['Validation Accuracy'], ascending=False)
resultsdf.head()

Unnamed: 0,File,Validation Accuracy
18,data_0_30_1024_256.json,0.865796
24,data_0_30_4096_256.json,0.861045
22,data_0_30_2048_512.json,0.846793
21,data_0_30_2048_256.json,0.844418
15,data_0_20_4096_256.json,0.839667


Now that we know the best performer for the generic model will will extract the entire dataset using that performer.

In [50]:
json_file_name = 'full_data.json'

fname = resultsdf['File'].at[0]
opt = fname.split('_')
opt2 = opt[4].split('.')

a_length = int(opt[1])
mfccs = int(opt[2])
fft = int(opt[3])
h_length = int(opt2[0])

print('Audio Length:', a_length)
print('Num MFCCS:', mfccs)
print('N FFT:', fft)
print('Hop Length:', h_length)

print('building full dataset', json_file_name)
dsPath = DATASET_PATH

# Merge files to the specified length as needed
if a_length != 0:
    prep_wave_files(dsPath, WORK_PATH, a_length, max_subj=-1)
    dsPath = WORK_PATH

# Extract the features into the json file
preprocess_dataset(dsPath, json_file_name, num_mfcc=mfccs, n_fft=fft, hop_length=h_length, max_subj=-1)

Audio Length: 0
Num MFCCS: 13
N FFT: 1024
Hop Length: 256
building full dataset full_data.json
    Processing: 'p226'
    Processing: 'p302'
    Processing: 'p284'
    Processing: 'p239'
    Processing: 'p335'
    Processing: 'p263'
    Processing: 'p283'
    Processing: 'p282'
    Processing: 'p276'
    Processing: 'p362'
    Processing: 'p341'
    Processing: 'p323'
    Processing: 'p292'
    Processing: 'p238'
    Processing: 'p237'
    Processing: 'p234'
    Processing: 'p293'
    Processing: 'p271'
    Processing: 'p260'
    Processing: 'p255'
    Processing: 'p273'
    Processing: 'p265'
    Processing: 'p313'
    Processing: 'p232'
    Processing: 'p262'
    Processing: 'p314'
    Processing: 'p253'
    Processing: 'p316'
    Processing: 'p361'
    Processing: 'p254'
    Processing: 'p363'
    Processing: 'p288'
    Processing: 'p287'
    Processing: 'p305'
    Processing: 'p264'
    Processing: 'p307'
    Processing: 'p347'
    Processing: 'p241'
    Processing: 'p270'
    Proc