### Changes from run-1

#### Test data
* Previously we used cross-validation to select the best model, and then tested that model on data (opus 131) that we had held out from the beginning.
* The scores achieved when predicting on opus 131 were our final results.
* This time we train and validate on all data and report the final cross-validation scores as our results.

#### Cross-validation
* Previously we simply took all sequences in the train/validate set, shuffled them, and trained/validated with an 80/20 split.
* This time we instead shuffle the opuses before generating the sequences and use leave-one-opus-out cross-validation, with the idea that this gives a more accurate estimate of how patterns generalize across opuses.
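
To illustrate the difference between the two splitting schemes, here is a minimal sketch on made-up data. The opus numbers and sequence counts are placeholders, not our real data. It shows that a shuffled 80/20 split puts sequences from the same opus on both sides, while holding out a whole opus does not.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy data: 30 sequences, 10 from each of three opuses
opus = np.repeat([18, 59, 74], 10)

# Run-1 style: shuffle all sequences, then split 80/20
order = rng.permutation(len(opus))
train_idx, valid_idx = order[:24], order[24:]
shared = set(opus[train_idx]) & set(opus[valid_idx])
print("Opuses on both sides of the shuffled split:", len(shared))

# This run: hold out one whole opus per fold
leaks = 0
for held_out in np.unique(opus):
    train_opuses = set(opus[opus != held_out])
    leaks += int(held_out in train_opuses)
print("Folds where the held-out opus also appears in training:", leaks)
```

With the shuffled split, validation scores partly measure how well the model recognizes an opus it has already seen, which is why we expect the per-opus scheme to be the stricter estimate.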

#### Input
* Previously we grouped similar chords together and grouped chords that appeared rarely (fewer than 10 times) under a single label.
* The idea was to remove outliers, reduce the output space, and improve generalization.
* Since it was pointed out that making the number of output classes depend on the input data was a bad idea, we now use grouping rules that are independent of the data (?)
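
The sketch below shows why the count-based grouping from run-1 ties the number of output classes to the data. The function `group_rare`, the label `'other'`, and the chord labels are illustrative assumptions, not the code we actually used.

```python
import pandas as pd

def group_rare(chords, min_count=10, other='other'):
    """Replace labels that occur fewer than min_count times with a single label."""
    counts = chords.value_counts()
    rare = counts[counts < min_count].index
    return chords.where(~chords.isin(rare), other)

# Two hypothetical datasets with the same chord vocabulary but different counts
small = pd.Series(['I'] * 12 + ['V'] * 10 + ['bII'] * 3 + ['N6'] * 2)
large = pd.Series(['I'] * 12 + ['V'] * 10 + ['bII'] * 11 + ['N6'] * 12)

n_small = group_rare(small).nunique()
n_large = group_rare(large).nunique()
print(n_small, n_large)
```

The same vocabulary yields a different number of classes depending on how often each chord happens to occur, so the size of the softmax layer would change with the training set.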

#### Model
* Previously we had a bidirectional LSTM layer in the model architecture because it increased performance. To be able to compare the results with a simple N-gram model, we removed that layer in this iteration.
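
For reference, an N-gram model of the kind we compare against can be sketched in a few lines. This is a generic illustration under our own assumptions (the function names and the chord progression are made up), not the baseline implementation itself. Like a unidirectional LSTM, it only looks at the chords that precede the one being predicted.

```python
from collections import Counter, defaultdict

def fit_ngram(sequence, n=2):
    """Count which chord follows each context of n-1 chords."""
    counts = defaultdict(Counter)
    for i in range(len(sequence) - n + 1):
        context = tuple(sequence[i:i + n - 1])
        counts[context][sequence[i + n - 1]] += 1
    return counts

def predict_next(counts, context):
    """Return the most frequent continuation, or None for an unseen context."""
    followers = counts.get(tuple(context))
    return followers.most_common(1)[0][0] if followers else None

# Hypothetical chord progression
progression = ['I', 'IV', 'V', 'I', 'IV', 'V', 'I', 'vi']
bigram = fit_ngram(progression, n=2)
print(predict_next(bigram, ['IV']))
```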

#### Hyperparameters
* Given the increased number of outliers and the removal of the bidirectional layer, we expect generalization accuracy to decrease.
* To remedy that, we took the current model and iterated through different values for a regularization parameter, which we had not explored previously.
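
The regularization parameter is the strength of an L2 penalty on the layer kernels. As a rough sketch of what that strength controls, the penalty added to the loss for one weight matrix is the strength times the sum of squared weights. The weight values below are arbitrary placeholders.

```python
import numpy as np

def l2_penalty(weights, strength):
    """L2 penalty: strength times the sum of squared weights."""
    return strength * np.sum(np.square(weights))

# Arbitrary example weights, not taken from the trained model
weights = np.array([[0.5, -1.0], [2.0, 0.0]])
for strength in [0, 0.001, 0.1]:
    print(strength, l2_penalty(weights, strength))
```

A strength of 0 disables the penalty, and larger values push the weights towards zero more strongly.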


In [1]:
#Imports
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow as tf
import seaborn as sns
sns.set()

from tensorflow.keras import *
from tensorflow.keras.models import *
from tensorflow.keras.layers import *
from tensorflow.keras.callbacks import *

from chord_functions import *

from sklearn.metrics import *
from sklearn.model_selection import KFold

from collections import defaultdict



# Setup

In [2]:
# fix random seed for reproducibility
seed = 1
np.random.seed(seed)

#Load all data
data = pd.read_csv('data/820chords.csv')

#Remove redundant attributes. Keep op to split into opuses
data = data[['chord', 'op']]

#Use dummy variable representation for the chords
data = pd.get_dummies(data)
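
As a small illustration of the dummy-variable step, here is what `pd.get_dummies` does to a frame shaped like ours. The rows are made up, and we assume that `chord` holds strings and `op` holds numbers, as in the cell above.

```python
import pandas as pd

# Hypothetical rows standing in for data/820chords.csv
toy = pd.DataFrame({'chord': ['I', 'V', 'I'], 'op': [18, 18, 59]})

# Only the string column is expanded; the numeric 'op' column is kept as-is
encoded = pd.get_dummies(toy)
print(encoded.columns.tolist())
```

Because `op` survives the encoding, it can still be used to split the data by opus before it is dropped.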

# Model

In [3]:
def lstm(lstm_x, lstm_y, optimizer, loss, metrics, regstrength):
    model = Sequential()
    
    model.add(LSTM(256, return_sequences=True, input_shape=(lstm_x.shape[1], lstm_x.shape[2]),
                   kernel_regularizer=regularizers.l2(regstrength)))
    
    model.add(Dropout(0.5))

    model.add(LSTM(64, return_sequences=False,
                   kernel_regularizer=regularizers.l2(regstrength)))
    
    model.add(Dropout(0.3))
    
    model.add(Dense(lstm_y.shape[1], activation='softmax',
                    kernel_regularizer=regularizers.l2(regstrength)))

    model.compile(loss=loss,
                  optimizer=optimizer,
                  metrics=metrics)
    return model

# Train/Test

### Select parameters for the learning process

In [4]:
optimizer = 'Adam'
loss = 'categorical_crossentropy'
metrics = ['accuracy']
epochs = 10
verbose = 2
seq_length = 10

#Save the weights after every epoch (save_best_only=False); set save_best_only=True to keep only improvements
checkpoint = ModelCheckpoint(
    'weights.{epoch:02d}-{val_acc:.4f}.hdf5',
    monitor='val_acc', 
    verbose=0,        
    save_best_only=False
)
# Stop the learning process if validation accuracy has not improved for 5 epochs (patience=5)
earlystop = EarlyStopping(monitor='val_acc', min_delta=0, patience=5, verbose=1)

#callbacks_list = [checkpoint, earlystop]   
callbacks_list = [earlystop]  
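
To make the effect of `patience` concrete, the following is a simplified model of patience-based stopping. It is our own sketch of the idea, not the Keras implementation, and the function name `epochs_run` is made up.

```python
def epochs_run(val_scores, patience=5, min_delta=0):
    """Return how many epochs run before a patience-based stop (simplified)."""
    best = float('-inf')
    wait = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score - min_delta > best:
            # Improvement: remember the score and reset the counter
            best, wait = score, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_scores)

# A validation accuracy that never changes after the first epoch
flat = [0.1406] * 10
print(epochs_run(flat))
```

Under this model, a score that stays flat from the first epoch stops training after epoch 6, since epochs 2 to 6 are five epochs without improvement.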

### Cross validate

In [None]:
#Define the range of regularization strengths to check
regstrength = [0, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5]
#Reduced range used for this run; overrides the full range above
regstrength = [0, 0.1]
print("Start!")

#Create container for results
RESULTS = pd.DataFrame()

for strength in regstrength:
    print("\nChecking regstrength {}".format(strength))
    cv_results = pd.DataFrame()
    
    #for opus in data['op'].unique():
    for opus in [74, 59, 95, 18]:
        print("\nValidating on opus {}".format(opus))

        #Split into training and validation
        valid = data[data['op'] == opus]
        train = data[data['op'] != opus]

        #Drop the opus attribute since it's no longer needed
        valid = valid.drop(columns='op')
        train = train.drop(columns='op')

        #Generate sequences from the data
        valid_in, valid_out = generate_sequences(valid, valid, seq_length)
        train_in, train_out = generate_sequences(train, train, seq_length)

        #Create model
        model = lstm(train_in, train_out, optimizer, loss, metrics, strength)

        #Train on the folds
        model.fit(train_in,
                  train_out,
                  epochs = epochs,
                  verbose = verbose,
                  validation_data = (valid_in, valid_out),
                  callbacks = callbacks_list)

        #Save the history object for the model, appending validation opus and regstrength
        history = pd.DataFrame(model.history.history)
        history.index.name = 'epoch'
        history['opus'] = opus
        history['reg'] = strength
        #pd.concat replaces DataFrame.append, which was removed in pandas 2.0
        cv_results = pd.concat([cv_results, history])
    
    RESULTS = pd.concat([RESULTS, cv_results])

print("Done!")

Start!

Checking regstrength 0

Validating on opus 74
Train on 26558 samples, validate on 1515 samples
Epoch 1/10
 - 170s - loss: 4.1009 - acc: 0.1174 - val_loss: 3.8080 - val_acc: 0.1406
Epoch 2/10
 - 170s - loss: 3.9862 - acc: 0.1252 - val_loss: 3.7991 - val_acc: 0.1406
Epoch 3/10
 - 170s - loss: 3.9790 - acc: 0.1249 - val_loss: 3.8172 - val_acc: 0.1406
Epoch 4/10
 - 172s - loss: 3.9798 - acc: 0.1272 - val_loss: 3.8267 - val_acc: 0.1406
Epoch 5/10
 - 167s - loss: 3.9767 - acc: 0.1260 - val_loss: 3.8090 - val_acc: 0.1406
Epoch 6/10
 - 168s - loss: 4.0793 - acc: 0.1307 - val_loss: 3.7756 - val_acc: 0.1406
Epoch 00006: early stopping

Validating on opus 59
Train on 22514 samples, validate on 5559 samples
Epoch 1/10
 - 149s - loss: 4.1048 - acc: 0.1257 - val_loss: 3.9729 - val_acc: 0.1049
Epoch 2/10
 - 147s - loss: 3.9795 - acc: 0.1307 - val_loss: 4.0072 - val_acc: 0.1049
Epoch 3/10
 - 148s - loss: 3.9699 - acc: 0.1328 - val_loss: 4.0204 - val_acc: 0.1049
Epoch 4/10
 - 147s - loss: 3.967

# Results

In [None]:
#Keep an independent copy of the results and write it to disk
BACKUP = RESULTS.copy()
BACKUP.to_csv('./results/BACKUP.csv')


#OBS: RESULTS is indexed by epoch; 'reg' and 'opus' are ordinary columns, so we group on those

AVERAGES = pd.DataFrame()

#For each level of regularization
for regularization, cvscores in RESULTS.groupby('reg'):
    average = pd.DataFrame()
    
    #Iterate through all folds and extract the highest validation score for each fold
    for opus, fold in cvscores.groupby('opus'):
        #Keep a single row per fold (the first epoch that reaches the maximum) so ties do not skew the mean
        best = fold.loc[[fold['val_acc'].idxmax()]]
        average = pd.concat([average, best])
    
    #Make a pretty dataframe of the mean
    average = average.drop(columns=['opus', 'reg']).describe().loc[['mean']]
    average['reg'] = regularization
    average = average.set_index('reg')
    
    #Take the mean scores for this regularization value and store them in AVERAGES for comparisons
    AVERAGES = pd.concat([AVERAGES, average])

BEST = AVERAGES[AVERAGES['val_acc'] == AVERAGES['val_acc'].max()]

print("Full table of cross validated scores for each regularization value")
display(AVERAGES)

print("Best score")
display(BEST)
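
The aggregation above (best validation accuracy per fold, then the mean over folds for each regularization value) can also be written as two chained `groupby` calls. The sketch below uses made-up scores in the same column layout as `RESULTS`. The numbers are placeholders, not our results.

```python
import pandas as pd

# Hypothetical per-epoch scores, two epochs per (reg, opus) fold
toy = pd.DataFrame({
    'reg':     [0, 0, 0, 0, 0.1, 0.1, 0.1, 0.1],
    'opus':    [74, 74, 59, 59, 74, 74, 59, 59],
    'val_acc': [0.10, 0.14, 0.10, 0.12, 0.20, 0.18, 0.16, 0.16],
})

# Best epoch per fold, then the mean of those maxima per regularization value
best_per_fold = toy.groupby(['reg', 'opus'])['val_acc'].max()
mean_per_reg = best_per_fold.groupby(level='reg').mean()
best_reg = mean_per_reg.idxmax()
print(mean_per_reg)
print("Best regularization value:", best_reg)
```

This version only summarizes `val_acc`, whereas the loop above also averages the other logged metrics at the selected epochs.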