### To play around
You can skip straight to the Results section and load either `results/dropout_WEIGHED_BACKUP.csv` or `results/dropout_BACKUP.csv` to reuse the cross-validation results; there is no need to run the cross-validation again.

### Changes from run-1

#### Test data
* Previously we used cross-validation to select the best model, and tested that model on data (opus 131) that we had kept apart from the beginning.
* The scores achieved when predicting on opus 131 were our final results.
* This time we train/validate on all data and take the final cross-validation scores as our results.

#### Crossvalidation
* Previously we simply took all sequences in the train/validate set, shuffled them and trained/validated with an 80/20 split.
* This time we instead split the data by opus before generating the sequences, holding out one opus at a time, with the idea that this could hint at how patterns generalize across opuses. So we basically have leave-one-opus-out cross-validation.
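
The leave-one-opus-out idea can be sketched in plain Python. This is only an illustration: the helper name `leave_one_opus_out` and the toy chord labels are assumptions, not part of the notebook, which does the same split on a pandas DataFrame.

```python
def leave_one_opus_out(rows):
    """Yield (held_out_opus, train_rows, valid_rows) once per opus.

    `rows` is a list of (chord, opus) pairs in score order.
    """
    # Collect the distinct opuses in first-seen order
    opuses = []
    for _, opus in rows:
        if opus not in opuses:
            opuses.append(opus)

    # Each opus is used exactly once as the validation fold
    for held_out in opuses:
        train = [r for r in rows if r[1] != held_out]
        valid = [r for r in rows if r[1] == held_out]
        yield held_out, train, valid


# Toy data: the chord labels are made up for illustration
rows = [("I", 127), ("V", 127), ("I", 130), ("IV", 130), ("V", 135)]
folds = list(leave_one_opus_out(rows))
for held_out, train, valid in folds:
    print(held_out, len(train), len(valid))
```

Because the split happens before any sequences are generated, no sequence from the held-out opus can end up in the training set.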

#### Input
* Previously we grouped similar chords together and grouped chords that appeared rarely (fewer than 10 times) under a single label.
* The idea was to remove outliers, reduce the output space and improve generalization.
* Since it was pointed out that making the number of output classes depend on the input data was a bad idea, we now use grouping rules that are independent of the data and have ~800 classes instead of ~100.
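
To see why the earlier frequency threshold made the output space depend on the data, consider this minimal sketch. The function `group_rare` and the label `"RARE"` are hypothetical names; only the threshold of 10 comes from the description above.

```python
from collections import Counter


def group_rare(chords, min_count=10, rare_label="RARE"):
    """Replace every chord seen fewer than `min_count` times by one label."""
    counts = Counter(chords)
    return [c if counts[c] >= min_count else rare_label for c in chords]


# Two toy datasets that differ by a single chord occurrence
small = ["I"] * 12 + ["V"] * 9
large = small + ["V"]  # one more "V" lifts it to the threshold

classes_small = set(group_rare(small))
classes_large = set(group_rare(large))
print(classes_small, classes_large)
```

Adding a single chord changes the set of output classes, which is exactly the dependence that data-independent grouping rules avoid.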

#### Model
* Previously the model architecture included a bidirectional LSTM layer because it improved performance. To make the results comparable to a simple N-gram model, we removed that layer in this iteration.

#### Hyperparameters
* Given the increased number of outliers and the removal of the bidirectional layer, we expect generalization accuracy to decrease.
* To remedy that, we took the current model and iterated through different values of a regularization parameter (the dropout rate), which we didn't explore previously.
* However, the best scores were obtained with a regularization strength of 0.

#### Weighing
* Current version: multiply the metrics by the number of chords in the opus being validated on. (Still to be double-checked.)
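
One thing worth checking here: multiplying each metric by the chord count gives a weighted sum, so a plain mean over folds afterwards is not yet a weighted average. If a chord-weighted average is the goal (an assumption about the intent), the products also need to be divided by the total number of chords. A minimal sketch with made-up numbers:

```python
def weighted_mean(scores, weights):
    """Average `scores` per opus, weighting each opus by `weights[opus]`."""
    total = sum(weights[opus] for opus in scores)
    return sum(scores[opus] * weights[opus] for opus in scores) / total


# Made-up validation accuracies and chord counts, for illustration only
val_acc = {127: 0.20, 135: 0.10}
n_chords = {127: 3000, 135: 1000}

unweighted = sum(val_acc.values()) / len(val_acc)
weighted = weighted_mean(val_acc, n_chords)
print(round(unweighted, 4), round(weighted, 4))
```

With these numbers the larger opus pulls the weighted mean above the unweighted one, and the result stays on the same 0 to 1 scale as the original accuracies.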


In [5]:
#Imports
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow as tf
import seaborn as sns
sns.set()
sns.set_style("whitegrid")

from tensorflow.keras import *
from tensorflow.keras.models import *
from tensorflow.keras.layers import *
from tensorflow.keras.callbacks import *

from chord_functions import *

# Setup

In [2]:
# fix random seed for reproducibility
seed = 1
np.random.seed(seed)

#Load all data
data = pd.read_csv('data/820chords.csv')

#Remove redundant attributes. Keep op to split into opuses
data = data[['chord', 'op']]

#Use dummy variable representation for the chords
data = pd.get_dummies(data)

# Model

In [3]:
def lstm(lstm_x, lstm_y, optimizer, loss, metrics, strength):
    model = Sequential()
    
    model.add(LSTM(256, return_sequences=True, input_shape=(lstm_x.shape[1], lstm_x.shape[2])))
    
    model.add(Dropout(strength))

    model.add(LSTM(64, return_sequences=False))
    
    model.add(Dropout(strength))
    
    model.add(Dense(lstm_y.shape[1], activation='softmax'))

    model.compile(loss=loss,
                  optimizer=optimizer,
                  metrics=metrics)
    return model

# Train/Test

### Select parameters for the learning process

In [4]:
optimizer = 'Adam'
loss = 'categorical_crossentropy'
metrics = ['accuracy']
epochs = 30
verbose = 2
seq_length = 10

#Save the weights after every epoch (set save_best_only=True to keep only improvements in validation accuracy)
checkpoint = ModelCheckpoint(
    'weights.{epoch:02d}-{val_acc:.4f}.hdf5',
    monitor='val_acc',
    verbose=0,
    save_best_only=False
)
#Stop the learning process if validation accuracy hasn't improved for 5 epochs
earlystop = EarlyStopping(monitor='val_acc', min_delta=0, patience=5, verbose=1)

#Callbacks are currently disabled; uncomment the first line to enable them
#callbacks_list = [checkpoint, earlystop]
callbacks_list = []

### Cross validate
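
The cell below relies on `generate_sequences` from `chord_functions`, which is not shown in this notebook. As a rough sketch of what such a helper is assumed to do (the name `sliding_windows` and its exact behaviour are assumptions, not the real implementation), each window of `seq_length` chords becomes one input and the chord that follows it becomes the target:

```python
def sliding_windows(items, seq_length):
    """Return (inputs, targets): each input is `seq_length` consecutive
    items and its target is the item that immediately follows."""
    inputs, targets = [], []
    for start in range(len(items) - seq_length):
        inputs.append(items[start:start + seq_length])
        targets.append(items[start + seq_length])
    return inputs, targets


# Toy chord progression, made up for illustration
chords = ["I", "IV", "V", "I", "vi"]
x, y = sliding_windows(chords, seq_length=3)
print(x)
print(y)
```

Under this assumption a progression of `n` chords yields `n - seq_length` training samples.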

In [None]:
#define the range of regularization strengths to check
dropstrength = [0, 0.1, 0.2, 0.3]
print("Start!")

#Create container for results
RESULTS = pd.DataFrame()

for strength in dropstrength:
    print("\nChecking dropout strength {}".format(strength))
    cv_results = pd.DataFrame()
    
    #Cross validate on each opus
    for opus in data['op'].unique():
        print("\nValidating on opus {}".format(opus))

        #Split into training and validation
        valid = data[data['op'] == opus]
        train = data[data['op'] != opus]

        #Drop the opus attribute since it's no longer needed
        valid = valid.drop(columns='op')
        train = train.drop(columns='op')

        #Generate sequences from the data
        valid_in, valid_out = generate_sequences(valid, valid, seq_length)
        train_in, train_out = generate_sequences(train, train, seq_length)

        #Create model
        model = lstm(train_in, train_out, optimizer, loss, metrics, strength)

        #Train on the folds
        model.fit(train_in,
                  train_out,
                  epochs = epochs,
                  verbose = verbose,
                  validation_data = (valid_in, valid_out),
                  callbacks = callbacks_list)

        #Save the history object for the model, appending test opus and regstrength
        history = pd.DataFrame(model.history.history)
        history.index.name = 'epoch'
        history['opus'] = opus
        history['reg'] = strength
        cv_results = pd.concat([cv_results, history])

    RESULTS = pd.concat([RESULTS, cv_results])

print("Done!")
RESULTS.to_csv('./results/dropout_BACKUP.csv')

Start!

Checking dropout strength 0

Validating on opus 127
Train on 25863 samples, validate on 2210 samples
Epoch 1/30
 - 158s - loss: 4.0402 - acc: 0.1247 - val_loss: 4.1691 - val_acc: 0.1253
Epoch 2/30
 - 153s - loss: 3.9363 - acc: 0.1284 - val_loss: 4.1654 - val_acc: 0.1253
Epoch 3/30
 - 153s - loss: 3.9358 - acc: 0.1281 - val_loss: 4.1949 - val_acc: 0.1253
Epoch 4/30
 - 154s - loss: 3.9349 - acc: 0.1281 - val_loss: 4.1828 - val_acc: 0.1253
Epoch 5/30
 - 149s - loss: 3.9367 - acc: 0.1279 - val_loss: 4.1958 - val_acc: 0.1253
Epoch 6/30
 - 149s - loss: 3.9348 - acc: 0.1281 - val_loss: 4.1942 - val_acc: 0.1253
Epoch 7/30
 - 149s - loss: 3.9348 - acc: 0.1271 - val_loss: 4.2205 - val_acc: 0.1253
Epoch 8/30
 - 149s - loss: 3.9346 - acc: 0.1288 - val_loss: 4.2039 - val_acc: 0.1253
Epoch 9/30
 - 149s - loss: 3.9355 - acc: 0.1271 - val_loss: 4.2164 - val_acc: 0.1253
Epoch 10/30
 - 149s - loss: 3.9350 - acc: 0.1268 - val_loss: 4.1953 - val_acc: 0.1253
Epoch 11/30
 - 149s - loss: 3.9339 - acc

Epoch 3/30
 - 147s - loss: 3.8982 - acc: 0.1290 - val_loss: 4.6484 - val_acc: 0.1139
Epoch 4/30
 - 147s - loss: 3.8987 - acc: 0.1286 - val_loss: 4.7050 - val_acc: 0.1139
Epoch 5/30
 - 147s - loss: 3.8985 - acc: 0.1271 - val_loss: 4.7045 - val_acc: 0.1139
Epoch 6/30
 - 147s - loss: 3.8985 - acc: 0.1284 - val_loss: 4.7314 - val_acc: 0.1139
Epoch 7/30
 - 148s - loss: 3.8983 - acc: 0.1287 - val_loss: 4.7425 - val_acc: 0.1139
Epoch 8/30
 - 148s - loss: 3.8983 - acc: 0.1295 - val_loss: 4.7349 - val_acc: 0.1139
Epoch 9/30
 - 148s - loss: 3.8989 - acc: 0.1276 - val_loss: 4.7300 - val_acc: 0.1139
Epoch 10/30
 - 148s - loss: 3.9827 - acc: 0.1290 - val_loss: 4.8461 - val_acc: 0.1139
Epoch 11/30
 - 147s - loss: 3.9066 - acc: 0.1289 - val_loss: 4.7081 - val_acc: 0.1139
Epoch 12/30
 - 147s - loss: 3.8672 - acc: 0.1310 - val_loss: 4.6639 - val_acc: 0.1320
Epoch 13/30
 - 147s - loss: 3.8372 - acc: 0.1391 - val_loss: 4.6617 - val_acc: 0.1312
Epoch 14/30
 - 147s - loss: 3.8238 - acc: 0.1445 - val_loss: 

 - 136s - loss: 3.9449 - acc: 0.1346 - val_loss: 4.0727 - val_acc: 0.1049
Epoch 7/30
 - 136s - loss: 3.9413 - acc: 0.1338 - val_loss: 4.0493 - val_acc: 0.1049
Epoch 8/30
 - 136s - loss: 3.9049 - acc: 0.1368 - val_loss: 3.9852 - val_acc: 0.1207
Epoch 9/30
 - 136s - loss: 3.8575 - acc: 0.1454 - val_loss: 3.8933 - val_acc: 0.1473
Epoch 10/30
 - 136s - loss: 3.7681 - acc: 0.1585 - val_loss: 3.8337 - val_acc: 0.1585
Epoch 11/30
 - 136s - loss: 3.6877 - acc: 0.1675 - val_loss: 3.7633 - val_acc: 0.1666
Epoch 12/30
 - 136s - loss: 3.5758 - acc: 0.1997 - val_loss: 3.7206 - val_acc: 0.1993
Epoch 13/30
 - 136s - loss: 3.4709 - acc: 0.2217 - val_loss: 3.6689 - val_acc: 0.2067
Epoch 14/30
 - 136s - loss: 3.3759 - acc: 0.2384 - val_loss: 3.6691 - val_acc: 0.2033
Epoch 15/30
 - 136s - loss: 3.2837 - acc: 0.2539 - val_loss: 3.6864 - val_acc: 0.2132
Epoch 16/30
 - 136s - loss: 3.1873 - acc: 0.2695 - val_loss: 3.7126 - val_acc: 0.2089
Epoch 17/30
 - 136s - loss: 3.0823 - acc: 0.2860 - val_loss: 3.7438 -

Epoch 9/30
 - 151s - loss: 3.9415 - acc: 0.1246 - val_loss: 4.1975 - val_acc: 0.1253
Epoch 10/30
 - 151s - loss: 3.9429 - acc: 0.1280 - val_loss: 4.2144 - val_acc: 0.1253
Epoch 11/30
 - 151s - loss: 3.9340 - acc: 0.1276 - val_loss: 4.2155 - val_acc: 0.1253
Epoch 12/30
 - 151s - loss: 3.9354 - acc: 0.1277 - val_loss: 4.2071 - val_acc: 0.1253
Epoch 13/30
 - 151s - loss: 3.9367 - acc: 0.1282 - val_loss: 4.1859 - val_acc: 0.1253
Epoch 14/30
 - 151s - loss: 3.9350 - acc: 0.1272 - val_loss: 4.1900 - val_acc: 0.1253
Epoch 15/30
 - 151s - loss: 3.9359 - acc: 0.1285 - val_loss: 4.2060 - val_acc: 0.1253
Epoch 16/30
 - 151s - loss: 3.9369 - acc: 0.1274 - val_loss: 4.2287 - val_acc: 0.1253
Epoch 17/30
 - 151s - loss: 3.9213 - acc: 0.1278 - val_loss: 4.1865 - val_acc: 0.1253
Epoch 18/30
 - 150s - loss: 3.9148 - acc: 0.1278 - val_loss: 4.2099 - val_acc: 0.0946
Epoch 19/30
 - 151s - loss: 3.9158 - acc: 0.1253 - val_loss: 4.1953 - val_acc: 0.1253
Epoch 20/30
 - 151s - loss: 3.9122 - acc: 0.1263 - val_

 - 150s - loss: 3.9015 - acc: 0.1294 - val_loss: 4.7310 - val_acc: 0.1139
Epoch 13/30
 - 150s - loss: 3.9016 - acc: 0.1282 - val_loss: 4.7226 - val_acc: 0.1139
Epoch 14/30
 - 150s - loss: 3.9016 - acc: 0.1294 - val_loss: 4.7251 - val_acc: 0.1139
Epoch 15/30
 - 150s - loss: 3.9016 - acc: 0.1282 - val_loss: 4.7303 - val_acc: 0.1139
Epoch 16/30
 - 150s - loss: 3.9018 - acc: 0.1278 - val_loss: 4.7416 - val_acc: 0.1139
Epoch 17/30
 - 150s - loss: 3.9015 - acc: 0.1290 - val_loss: 4.7240 - val_acc: 0.1139
Epoch 18/30
 - 150s - loss: 3.9008 - acc: 0.1284 - val_loss: 4.7233 - val_acc: 0.1139
Epoch 19/30
 - 150s - loss: 3.8999 - acc: 0.1283 - val_loss: 4.7338 - val_acc: 0.1139
Epoch 20/30
 - 150s - loss: 3.9013 - acc: 0.1286 - val_loss: 4.7177 - val_acc: 0.1139
Epoch 21/30
 - 150s - loss: 3.9005 - acc: 0.1285 - val_loss: 4.7248 - val_acc: 0.1139
Epoch 22/30
 - 150s - loss: 3.9016 - acc: 0.1274 - val_loss: 4.7363 - val_acc: 0.1139
Epoch 23/30
 - 150s - loss: 3.9012 - acc: 0.1279 - val_loss: 4.734

Epoch 16/30
 - 138s - loss: 3.9355 - acc: 0.1352 - val_loss: 4.0271 - val_acc: 0.1049
Epoch 17/30
 - 138s - loss: 3.9334 - acc: 0.1352 - val_loss: 4.0369 - val_acc: 0.1049
Epoch 18/30
 - 138s - loss: 3.9307 - acc: 0.1352 - val_loss: 4.0381 - val_acc: 0.1049
Epoch 19/30
 - 138s - loss: 3.9310 - acc: 0.1352 - val_loss: 4.0425 - val_acc: 0.1049
Epoch 20/30
 - 138s - loss: 3.9298 - acc: 0.1352 - val_loss: 4.0393 - val_acc: 0.1049
Epoch 21/30
 - 138s - loss: 3.9328 - acc: 0.1352 - val_loss: 4.0392 - val_acc: 0.1049
Epoch 22/30
 - 138s - loss: 3.9873 - acc: 0.1334 - val_loss: 4.1077 - val_acc: 0.1049
Epoch 23/30
 - 138s - loss: 3.9944 - acc: 0.1330 - val_loss: 4.0606 - val_acc: 0.1049
Epoch 24/30
 - 138s - loss: 3.9646 - acc: 0.1344 - val_loss: 4.0600 - val_acc: 0.1049
Epoch 25/30
 - 138s - loss: 3.9533 - acc: 0.1351 - val_loss: 4.0594 - val_acc: 0.1049
Epoch 26/30
 - 138s - loss: 3.9399 - acc: 0.1338 - val_loss: 4.0591 - val_acc: 0.1049
Epoch 27/30
 - 138s - loss: 3.9136 - acc: 0.1350 - val

Epoch 19/30
 - 152s - loss: 2.5009 - acc: 0.3856 - val_loss: 4.5158 - val_acc: 0.1516
Epoch 20/30
 - 152s - loss: 2.3948 - acc: 0.4026 - val_loss: 4.5850 - val_acc: 0.1480
Epoch 21/30
 - 152s - loss: 2.2965 - acc: 0.4264 - val_loss: 4.6885 - val_acc: 0.1466
Epoch 22/30
 - 152s - loss: 2.1927 - acc: 0.4470 - val_loss: 4.8099 - val_acc: 0.1443
Epoch 23/30
 - 152s - loss: 2.0937 - acc: 0.4669 - val_loss: 4.9035 - val_acc: 0.1403
Epoch 24/30
 - 152s - loss: 2.0000 - acc: 0.4847 - val_loss: 4.9786 - val_acc: 0.1353
Epoch 25/30
 - 152s - loss: 1.8923 - acc: 0.5154 - val_loss: 5.1313 - val_acc: 0.1290
Epoch 26/30
 - 151s - loss: 1.8095 - acc: 0.5310 - val_loss: 5.2979 - val_acc: 0.1299
Epoch 27/30
 - 151s - loss: 1.7131 - acc: 0.5533 - val_loss: 5.3970 - val_acc: 0.1330
Epoch 28/30
 - 151s - loss: 1.6269 - acc: 0.5697 - val_loss: 5.5621 - val_acc: 0.1317
Epoch 29/30
 - 151s - loss: 1.5439 - acc: 0.5929 - val_loss: 5.6677 - val_acc: 0.1262
Epoch 30/30
 - 151s - loss: 1.4591 - acc: 0.6142 - val

 - 150s - loss: 3.3535 - acc: 0.2311 - val_loss: 4.4083 - val_acc: 0.1646
Epoch 23/30
 - 173s - loss: 3.3135 - acc: 0.2369 - val_loss: 4.4455 - val_acc: 0.1630
Epoch 24/30
 - 154s - loss: 3.2651 - acc: 0.2423 - val_loss: 4.4770 - val_acc: 0.1621
Epoch 25/30
 - 153s - loss: 3.2251 - acc: 0.2497 - val_loss: 4.4725 - val_acc: 0.1559
Epoch 26/30
 - 150s - loss: 3.1785 - acc: 0.2541 - val_loss: 4.5125 - val_acc: 0.1522
Epoch 27/30
 - 150s - loss: 3.1367 - acc: 0.2599 - val_loss: 4.5142 - val_acc: 0.1584
Epoch 28/30
 - 150s - loss: 3.0833 - acc: 0.2696 - val_loss: 4.5752 - val_acc: 0.1568
Epoch 29/30
 - 150s - loss: 3.0345 - acc: 0.2748 - val_loss: 4.6531 - val_acc: 0.1584
Epoch 30/30
 - 150s - loss: 2.9817 - acc: 0.2831 - val_loss: 4.6821 - val_acc: 0.1473

Validating on opus 135
Train on 26570 samples, validate on 1503 samples
Epoch 1/30
 - 166s - loss: 4.0632 - acc: 0.1222 - val_loss: 4.2681 - val_acc: 0.0798
Epoch 2/30
 - 153s - loss: 3.9595 - acc: 0.1231 - val_loss: 4.3132 - val_acc: 0.

Epoch 26/30
 - 138s - loss: 3.5609 - acc: 0.1893 - val_loss: 3.7362 - val_acc: 0.1885
Epoch 27/30
 - 138s - loss: 3.5122 - acc: 0.2012 - val_loss: 3.7377 - val_acc: 0.1903
Epoch 28/30
 - 138s - loss: 3.4653 - acc: 0.2060 - val_loss: 3.7191 - val_acc: 0.1945
Epoch 29/30
 - 138s - loss: 3.4221 - acc: 0.2163 - val_loss: 3.7381 - val_acc: 0.1964
Epoch 30/30
 - 138s - loss: 3.3800 - acc: 0.2225 - val_loss: 3.7452 - val_acc: 0.2002

Validating on opus 74
Train on 26558 samples, validate on 1515 samples
Epoch 1/30
 - 169s - loss: 4.0792 - acc: 0.1205 - val_loss: 3.7972 - val_acc: 0.1406
Epoch 2/30
 - 156s - loss: 3.9804 - acc: 0.1252 - val_loss: 3.8069 - val_acc: 0.1406
Epoch 3/30
 - 156s - loss: 3.9762 - acc: 0.1250 - val_loss: 3.8165 - val_acc: 0.1406
Epoch 4/30
 - 156s - loss: 3.9709 - acc: 0.1272 - val_loss: 3.8309 - val_acc: 0.0950
Epoch 5/30
 - 156s - loss: 3.9712 - acc: 0.1255 - val_loss: 3.8385 - val_acc: 0.1406
Epoch 6/30
 - 156s - loss: 3.9705 - acc: 0.1262 - val_loss: 3.8367 - val_

Epoch 29/30
 - 153s - loss: 3.2574 - acc: 0.2502 - val_loss: 4.0410 - val_acc: 0.1606
Epoch 30/30
 - 153s - loss: 3.2087 - acc: 0.2572 - val_loss: 4.0363 - val_acc: 0.1643

Validating on opus 130
Train on 25609 samples, validate on 2464 samples
Epoch 1/30
 - 165s - loss: 4.1204 - acc: 0.1159 - val_loss: 3.7083 - val_acc: 0.1530
Epoch 2/30
 - 151s - loss: 4.0070 - acc: 0.1223 - val_loss: 3.7318 - val_acc: 0.1530
Epoch 3/30
 - 151s - loss: 3.9833 - acc: 0.1243 - val_loss: 3.7231 - val_acc: 0.1425
Epoch 4/30
 - 151s - loss: 3.9054 - acc: 0.1428 - val_loss: 3.6172 - val_acc: 0.1534
Epoch 5/30
 - 151s - loss: 3.8032 - acc: 0.1574 - val_loss: 3.6050 - val_acc: 0.1587
Epoch 6/30
 - 151s - loss: 3.7447 - acc: 0.1622 - val_loss: 3.6162 - val_acc: 0.1603
Epoch 7/30
 - 151s - loss: 3.6838 - acc: 0.1715 - val_loss: 3.5531 - val_acc: 0.1794
Epoch 8/30
 - 151s - loss: 3.6223 - acc: 0.1848 - val_loss: 3.5254 - val_acc: 0.1806
Epoch 9/30
 - 151s - loss: 3.5607 - acc: 0.1962 - val_loss: 3.5139 - val_ac

Epoch 2/30
 - 163s - loss: 3.9733 - acc: 0.1215 - val_loss: 4.2786 - val_acc: 0.1564
Epoch 3/30
 - 160s - loss: 3.9549 - acc: 0.1253 - val_loss: 4.2456 - val_acc: 0.1610
Epoch 4/30
 - 166s - loss: 3.9138 - acc: 0.1349 - val_loss: 4.2149 - val_acc: 0.1437
Epoch 5/30
 - 156s - loss: 3.8329 - acc: 0.1494 - val_loss: 4.1757 - val_acc: 0.1583
Epoch 6/30
 - 156s - loss: 3.7602 - acc: 0.1568 - val_loss: 4.1497 - val_acc: 0.1557
Epoch 7/30
 - 156s - loss: 3.7084 - acc: 0.1613 - val_loss: 4.1294 - val_acc: 0.1597
Epoch 8/30
 - 156s - loss: 3.6466 - acc: 0.1735 - val_loss: 4.1093 - val_acc: 0.1710
Epoch 9/30
 - 156s - loss: 3.5789 - acc: 0.1913 - val_loss: 4.1144 - val_acc: 0.1850
Epoch 10/30
 - 156s - loss: 3.5192 - acc: 0.2042 - val_loss: 4.0850 - val_acc: 0.1843
Epoch 11/30
 - 156s - loss: 3.4641 - acc: 0.2161 - val_loss: 4.0917 - val_acc: 0.1896
Epoch 12/30
 - 156s - loss: 3.3984 - acc: 0.2253 - val_loss: 4.1229 - val_acc: 0.1783
Epoch 13/30
 - 156s - loss: 3.3367 - acc: 0.2378 - val_loss: 4

 - 153s - loss: 3.9802 - acc: 0.1270 - val_loss: 3.8245 - val_acc: 0.1406
Epoch 6/30
 - 154s - loss: 3.9765 - acc: 0.1260 - val_loss: 3.8337 - val_acc: 0.1393
Epoch 7/30
 - 153s - loss: 3.9722 - acc: 0.1278 - val_loss: 3.8060 - val_acc: 0.1406
Epoch 8/30
 - 154s - loss: 3.9020 - acc: 0.1409 - val_loss: 3.6937 - val_acc: 0.1584
Epoch 9/30
 - 154s - loss: 3.7977 - acc: 0.1568 - val_loss: 3.6314 - val_acc: 0.1624
Epoch 10/30
 - 154s - loss: 3.7291 - acc: 0.1610 - val_loss: 3.6060 - val_acc: 0.1696
Epoch 11/30
 - 154s - loss: 3.6538 - acc: 0.1727 - val_loss: 3.5799 - val_acc: 0.1881
Epoch 12/30
 - 154s - loss: 3.5812 - acc: 0.1932 - val_loss: 3.5680 - val_acc: 0.2132
Epoch 13/30
 - 154s - loss: 3.5093 - acc: 0.2084 - val_loss: 3.5359 - val_acc: 0.2343
Epoch 14/30
 - 154s - loss: 3.4452 - acc: 0.2174 - val_loss: 3.5409 - val_acc: 0.2343
Epoch 15/30
 - 154s - loss: 3.3822 - acc: 0.2264 - val_loss: 3.5421 - val_acc: 0.2343
Epoch 16/30
 - 154s - loss: 3.3147 - acc: 0.2387 - val_loss: 3.5807 - 

### Weigh results

In [None]:
RESULTS = pd.read_csv('results/dropout_BACKUP.csv')

#Count the chords in each opus
weights = data.groupby('op').size()

#Multiply the metrics of every row by the weight of the opus it was validated on
metric_cols = ['val_loss', 'val_acc', 'loss', 'acc']
RESULTS[metric_cols] = RESULTS[metric_cols].mul(RESULTS['opus'].map(weights), axis=0)

print("Done!")
RESULTS.to_csv('./results/dropout_WEIGHED_BACKUP.csv', index=False)

# Results

### Select which results to use

In [None]:
#Select to load the weighed or the unweighed results (leave exactly one line uncommented)
#RESULTS = pd.read_csv('results/dropout_WEIGHED_BACKUP.csv')
RESULTS = pd.read_csv('results/dropout_BACKUP.csv')
    
#Reindex
RESULTS = RESULTS.set_index(['reg','opus'])

### For each regularization value, calculate the average cross-validated score and output it

In [None]:
averages = []

#For each level of regularization
for regularization, cvscores in RESULTS.groupby(level=0):
    best_scores = []

    #Iterate through all folds and extract the highest validation scores
    for opus, fold in cvscores.groupby(level=1):

        #Retrieve the best epoch; if several epochs tie, keep only the first
        #so that every opus contributes exactly one row to the mean
        best = fold.iloc[[fold['val_acc'].values.argmax()]]
        best_scores.append(best)

    #Make a pretty dataframe of the mean
    average = pd.concat(best_scores).describe().loc[['mean']]
    average = average.rename(index={'mean': regularization})

    #Store the mean scores for this regularization value for comparisons
    averages.append(average)

AVERAGES = pd.concat(averages)

BEST = AVERAGES[AVERAGES['val_acc'] == AVERAGES['val_acc'].max()]

print("Full table of cross validated scores for each regularization value")
display(AVERAGES)

print("Best score")
display(BEST)

# Graphs

### Chords in each opus

In [None]:
#Count the rows (chords) per opus
sizes = data.groupby('op').size().to_frame('count')
#sizes = sizes.sort_values(by = 'count', ascending=False)
sizes = pd.concat([sizes, sizes.describe().loc[['mean']]])

fig, ax = plt.subplots(figsize=(15,10))
sns.barplot(data=sizes.T)
ax.set_title("Number of chords in each opus")
ax.set_ylabel("Chords")
ax.set_xlabel("Opus")
plt.show()

fig.savefig("./figs/chords_per_opus.png")

### Validation accuracy for each opus

In [None]:
#Use the data from the reg_value that produced the best results
best_reg = BEST.index.values[0]
df = RESULTS.loc[best_reg]

#For each of the nine folds/opuses...
best_rows = []
for opus, fold in df.groupby(level=0):

    #Drop potential duplicate values
    fold = fold.drop_duplicates(subset='val_acc')

    #Retrieve the scores when the val_acc was highest
    score = fold[fold['val_acc'] == fold['val_acc'].max()]

    #Store
    best_rows.append(score)
scores = pd.concat(best_rows)

#Sort scores by valacc
#scores = scores.sort_values(by='val_acc', ascending=False)

#Append the average score to the individual scores
avg = BEST.rename(index={best_reg: 'mean'})

scores = pd.concat([scores, avg])

#Plot them
fig, ax = plt.subplots(figsize=(15,10))
sns.barplot(data=scores[['val_acc']].T)
#ax.set_title("Weighed accuracy on each opus")
ax.set_title("Accuracy on each opus")
ax.set_ylabel("Accuracy (%)")
ax.set_xlabel("Opus")
ax.set_yticklabels(np.around(ax.get_yticks()* 100,2))
#fig.savefig("./figs/Weighed_NNacc_V.png")
fig.savefig("./figs/NNacc_V.png")

fig, ax = plt.subplots(figsize=(10,15))
sns.barplot(data=scores[['val_acc']].T, orient='horizontal')
#ax.set_title("Weighed accuracy on each opus")
ax.set_title("Accuracy on each opus")
ax.set_xlabel("Accuracy (%)")
ax.set_ylabel("Opus")
ax.set_xticklabels(np.around(ax.get_xticks()* 100,2))
#fig.savefig("./figs/Weighed_NNacc_H.png")
fig.savefig("./figs/NNacc_H.png")

plt.show()

In [19]:
#Print the saved training log; the context manager closes the file afterwards
with open("savefile.txt", "r") as f:
    for line in f:
        print(line)

START

Checking dropout strength 0



Validating on opus 127

Train on 25863 samples, validate on 2210 samples

Epoch 1/30

 - 158s - loss: 4.0402 - acc: 0.1247 - val_loss: 4.1691 - val_acc: 0.1253

Epoch 2/30

 - 153s - loss: 3.9363 - acc: 0.1284 - val_loss: 4.1654 - val_acc: 0.1253

Epoch 3/30

 - 153s - loss: 3.9358 - acc: 0.1281 - val_loss: 4.1949 - val_acc: 0.1253

Epoch 4/30

 - 154s - loss: 3.9349 - acc: 0.1281 - val_loss: 4.1828 - val_acc: 0.1253

Epoch 5/30

 - 149s - loss: 3.9367 - acc: 0.1279 - val_loss: 4.1958 - val_acc: 0.1253

Epoch 6/30

 - 149s - loss: 3.9348 - acc: 0.1281 - val_loss: 4.1942 - val_acc: 0.1253

Epoch 7/30

 - 149s - loss: 3.9348 - acc: 0.1271 - val_loss: 4.2205 - val_acc: 0.1253

Epoch 8/30

 - 149s - loss: 3.9346 - acc: 0.1288 - val_loss: 4.2039 - val_acc: 0.1253

Epoch 9/30

 - 149s - loss: 3.9355 - acc: 0.1271 - val_loss: 4.2164 - val_acc: 0.1253

Epoch 10/30

 - 149s - loss: 3.9350 - acc: 0.1268 - val_loss: 4.1953 - val_acc: 0.1253

Epoch 11/30

 - 1

In [48]:
import re

rows = []
with open("savefile.txt", "r") as f:
    for line in f:
        D  = re.search(r'Checking dropout strength (\S+)', line, re.IGNORECASE)
        O  = re.search(r'Validating on opus (\d+)', line, re.IGNORECASE)
        E  = re.search(r'Epoch (\d\d?)/30', line, re.IGNORECASE)
        L  = re.search(r' - loss: (\d+\.\d*) ', line, re.IGNORECASE)
        A  = re.search(r' - acc: (\d+\.\d*) ', line, re.IGNORECASE)
        VL = re.search(r' - val_loss: (\d+\.\d*) ', line, re.IGNORECASE)
        VA = re.search(r' - val_acc: (\d+\.\d*)', line, re.IGNORECASE)

        #Header lines update the current dropout strength, opus and epoch
        if D:
            dropout = float(D.group(1))
        if O:
            opus = int(O.group(1))
        if E:
            epoch = int(E.group(1))

        #A complete metrics line closes one epoch: store it with the current context
        #(truncated lines that lack one of the four metrics are skipped)
        if L and A and VL and VA:
            rows.append({'dropout': dropout,
                         'opus': opus,
                         'epoch': epoch,
                         'val_loss': float(VL.group(1)),
                         'val_acc': float(VA.group(1)),
                         'loss': float(L.group(1)),
                         'acc': float(A.group(1))})

#Build the dataframe once instead of appending row by row
RESULTS = pd.DataFrame(rows)
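
The parsing step above can be checked on a few lines of the log without touching `savefile.txt`. The following is a self-contained sketch: the function name `parse_log` and the combined `METRICS` pattern are assumptions made for this example, and the sample lines are copied from the training output earlier in the notebook.

```python
import re

# One pattern for the whole metrics line, so partial lines never match
METRICS = re.compile(
    r" - loss: (\d+\.\d+) - acc: (\d+\.\d+)"
    r" - val_loss: (\d+\.\d+) - val_acc: (\d+\.\d+)"
)

sample = """Checking dropout strength 0
Validating on opus 127
Epoch 1/30
 - 158s - loss: 4.0402 - acc: 0.1247 - val_loss: 4.1691 - val_acc: 0.1253
Epoch 2/30
 - 153s - loss: 3.9363 - acc: 0.1284 - val_loss: 4.1654 - val_acc: 0.1253
"""


def parse_log(text):
    """Turn Keras-style log text into one dict per completed epoch."""
    rows, dropout, opus, epoch = [], None, None, None
    for line in text.splitlines():
        m = re.search(r"Checking dropout strength (\S+)", line)
        if m:
            dropout = float(m.group(1))
        m = re.search(r"Validating on opus (\d+)", line)
        if m:
            opus = int(m.group(1))
        m = re.search(r"Epoch (\d+)/\d+", line)
        if m:
            epoch = int(m.group(1))
        m = METRICS.search(line)
        if m:
            loss, acc, val_loss, val_acc = map(float, m.groups())
            rows.append({"dropout": dropout, "opus": opus, "epoch": epoch,
                         "loss": loss, "acc": acc,
                         "val_loss": val_loss, "val_acc": val_acc})
    return rows


parsed = parse_log(sample)
for row in parsed:
    print(row)
```

Each parsed row carries the opus and epoch taken from the most recent header lines, which is the pairing the results table needs.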