# Run the classifier

This notebook uses functions from [classifier.py](classifier.py) to load the data, build the model, and run the classifier.

In [None]:
import pandas as pd
import numpy as np

import classifier

In [None]:
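# Number of time steps in each input sequence
SEQ_LEN = 180
# Fractions of the data used for the training, validation, and test sets
SPLIT = (0.6, 0.2, 0.2)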

## Load data

Read the data from the directory that contains our preprocessed data. Some preprocessing still remains, because the length of the input sequences is a parameter that we want to be able to vary.

In [None]:
data_path = 'dataset/displacements/'

inputs_by_player = classifier.players_inputs(data_path)

inputs_by_player['1'].shape

In [None]:
inputs_by_player.keys()

## Organize data into sequences and split data into training, validation, and test sets

In [None]:
(train_x, train_y), (valid_x, valid_y), (test_x, test_y) = classifier.prepare_data(inputs_by_player, SEQ_LEN, SPLIT)
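
The body of `classifier.prepare_data` is not shown in this notebook. As a rough sketch of what "organizing into sequences and splitting" could look like, the example below cuts each player's samples into non-overlapping windows, attaches one-hot labels (an assumption, based on the `categorical_crossentropy` loss used later), and splits by the three fractions. The helper names `make_windows` and `split_windows`, the toy data, and the shuffling step are all hypothetical and may differ from the real implementation.

```python
import numpy as np

def make_windows(samples, seq_len):
    """Cut a (n_samples, width) array into non-overlapping (seq_len, width) windows."""
    n_windows = len(samples) // seq_len           # the incomplete tail is dropped
    trimmed = samples[: n_windows * seq_len]
    return trimmed.reshape(n_windows, seq_len, samples.shape[1])

def split_windows(x, y, split, seed=0):
    """Shuffle the windows, then split them into train/valid/test by fraction."""
    order = np.random.default_rng(seed).permutation(len(x))
    x, y = x[order], y[order]
    n_train = int(split[0] * len(x))
    n_valid = int(split[1] * len(x))
    train = (x[:n_train], y[:n_train])
    valid = (x[n_train:n_train + n_valid], y[n_train:n_train + n_valid])
    test = (x[n_train + n_valid:], y[n_train + n_valid:])   # whatever remains
    return train, valid, test

# Toy stand-in for inputs_by_player: two hypothetical players, 3 channels each.
toy_inputs = {'a': np.zeros((400, 3)), 'b': np.ones((550, 3))}
players = sorted(toy_inputs)

xs, ys = [], []
for index, player in enumerate(players):
    windows = make_windows(toy_inputs[player], seq_len=180)
    one_hot = np.zeros((len(windows), len(players)))
    one_hot[:, index] = 1.0                       # same label for every window of a player
    xs.append(windows)
    ys.append(one_hot)

x_all, y_all = np.concatenate(xs), np.concatenate(ys)
train, valid, test = split_windows(x_all, y_all, (0.6, 0.2, 0.2))
print(x_all.shape, y_all.shape)
```

Whatever the real function does internally, the shapes it returns should follow the same pattern: `(n_windows, SEQ_LEN, width)` for the inputs and `(n_windows, n_players)` for the labels.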

## Create the model

Here is the constructor code for the best version of the model that I tested. Some of the variables that I experimented with are now hardcoded, so the code that is actually in [classifier.py](classifier.py) is not guaranteed to match this listing.

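```python
# Imports are assumed here; the original listing omits them.
from tensorflow.keras.layers import Concatenate, Conv1D, Dense, Dropout, ELU, Input, LSTM
from tensorflow.keras.models import Model

def createClassifier(width=3, seq_len=180):
    """
    Return a compiled classifier for inputs of shape (seq_len, width).
    Defaults to a width of 3 and a sequence length of 180.
    """
    input_layer = Input(shape=(seq_len, width))
    # Two parallel convolutions over the same input, with different kernel sizes and strides
    conv1 = Conv1D(filters=32, kernel_size=7, strides=2, activation=ELU())(input_layer)
    conv2 = Conv1D(filters=32, kernel_size=3, strides=1, activation=ELU())(input_layer)

    # Join the two branches along the time axis
    catted = Concatenate(axis=1)([conv1, conv2])
    elu1 = ELU(32)(catted)
    conv3 = Conv1D(filters=32, kernel_size=2, strides=1, activation=ELU())(elu1)
    conv4 = Conv1D(filters=32, kernel_size=2, strides=1, activation=ELU())(conv3)
    drop1 = Dropout(0.2)(conv4)

    # Two stacked LSTM layers; only the final output of the second one is kept
    lstm1 = LSTM(32, return_sequences=True)(drop1)
    lstm2 = LSTM(32)(lstm1)
    drop2 = Dropout(0.2)(lstm2)

    # One softmax output per player; players_set is defined at module level in classifier.py
    output = Dense(len(players_set), activation='softmax')(drop2)

    model = Model(inputs=input_layer, outputs=output)
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
```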
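
To see how the sequence length changes through the convolutional part of the model, the sketch below works out the number of time steps after each layer. It assumes the Keras default of `padding='valid'` for `Conv1D`, which the listing does not override, and that `Concatenate(axis=1)` joins the two branches along the time axis. The helper `conv1d_out_len` is a name introduced here for illustration only.

```python
def conv1d_out_len(in_len, kernel_size, stride=1):
    """Output length of a 1D convolution with no padding ('valid')."""
    return (in_len - kernel_size) // stride + 1

seq_len = 180
len_conv1 = conv1d_out_len(seq_len, kernel_size=7, stride=2)   # first branch
len_conv2 = conv1d_out_len(seq_len, kernel_size=3, stride=1)   # second branch
len_cat = len_conv1 + len_conv2                                # branches joined in time
len_conv3 = conv1d_out_len(len_cat, kernel_size=2)
len_conv4 = conv1d_out_len(len_conv3, kernel_size=2)

print(len_conv1, len_conv2, len_cat, len_conv3, len_conv4)
# prints: 87 178 265 264 263
```

Under these assumptions the first LSTM receives 263 time steps of 32 features each when `seq_len` is 180.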

In [None]:
model = classifier.createClassifier(width=3, seq_len=SEQ_LEN)

## Train the model

We now fit the model to the data that we prepared earlier. It is also possible to load saved weights instead of training, so if you already have appropriate weights, this step is optional.

In [None]:
# You do not need to run this if you already have model weights saved somewhere
history = model.fit(
    train_x, train_y, epochs=60, verbose=1, batch_size=64, validation_data=(valid_x, valid_y)
)

In [None]:
# OPTIONAL: run something like this instead of training if you have saved weights previously
# model.load_weights('models/default')

## Plot learning curve

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.plot('val_accuracy', data=history.history)
plt.plot('accuracy', data=history.history)
plt.ylabel('accuracy')
plt.xlabel('epoch')

plt.legend()

# plt.savefig('results/learning_curve')
plt.show()

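## Example prediction

Run the model on the first `SEQ_LEN` rows of a single recording and map the predicted class index back to a player.
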
In [None]:
example_df = pd.read_csv('dataset/displacements/0_1_20210718T014445.csv', index_col=None)
predict = model(np.array([example_df.iloc[range(SEQ_LEN), :]], dtype=np.float32), training=False)

classifier.i_to_p[np.argmax(predict)]

In [None]:
np.argmax(predict)
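
The definition of `classifier.i_to_p` is not shown in this notebook. A minimal sketch, assuming it is a dictionary from class index to player label built from `players_set`, might look like the following. The player labels, the `p_to_i` name, and the probabilities are made up for illustration.

```python
# Hypothetical player labels; the real ones come from classifier.players_set.
players_set = {'1', '2', '3'}

# Sort the labels so that the index assignment is the same on every run.
p_to_i = {player: index for index, player in enumerate(sorted(players_set))}
i_to_p = {index: player for player, index in p_to_i.items()}

# A made-up softmax output for one sequence.
probabilities = [0.1, 0.7, 0.2]
best_index = max(range(len(probabilities)), key=probabilities.__getitem__)

print(i_to_p[best_index])
# prints: 2
```

Whatever the real mapping is, it has to be the same one that was used to build the training labels, otherwise the predicted index points at the wrong player.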

In [None]:
# FIXME
# It would be best to use model store/load rather than checkpoints
# https://www.tensorflow.org/guide/keras/serialization_and_saving
# model.save_weights('models/test')

## Test the model

Run the model on the test data and record performance metrics, namely top-N accuracy. More detailed testing can be found in [test_model.ipynb](test_model.ipynb).

### Get test outputs

In [None]:
# Predicted class probabilities for every test sequence
test_h = model.predict(test_x)

test_h.shape

### Compare test outputs to labels

In [None]:
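# Rank of the true label in each prediction (0 = highest predicted probability)
ranks = []
n_players = len(classifier.players_set)

for i in range(test_h.shape[0]):
    rankings = np.argsort(test_h[i])  # class indices, ascending by score
    rank = (n_players - 1) - np.where(rankings == np.argmax(test_y[i]))[0][0]
    ranks.append(rank)

# Cumulative number of samples whose true label is within the top n predictions
topn_occurrences = []
running = 0
for i in range(n_players):
    running += ranks.count(i)
    topn_occurrences.append(running)

# Top-n accuracy for n = 1 up to the number of players
topn_acc = [t / topn_occurrences[-1] for t in topn_occurrences]

print(topn_acc)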
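
The loop above can also be written without explicit Python loops. The sketch below is one possible vectorized version, run on made-up toy data so the result can be checked by hand. The function name `top_n_accuracy` is introduced here for illustration; it assumes one-hot labels and no exact ties between the true class and another class (ties are counted in favor of the true class here, which may differ from the `argsort` ordering).

```python
import numpy as np

def top_n_accuracy(probabilities, one_hot_labels):
    """Return a list whose entry n-1 is the top-n accuracy."""
    n_samples, n_classes = probabilities.shape
    true_class = np.argmax(one_hot_labels, axis=1)
    true_score = probabilities[np.arange(n_samples), true_class]
    # Rank 0 means no other class scored higher than the true class.
    ranks = np.sum(probabilities > true_score[:, None], axis=1)
    counts = np.bincount(ranks, minlength=n_classes)
    return (np.cumsum(counts) / n_samples).tolist()

# Toy predictions for 4 samples and 3 classes.
toy_h = np.array([[0.70, 0.20, 0.10],    # true class ranked 1st
                  [0.10, 0.30, 0.60],    # true class ranked 2nd
                  [0.20, 0.50, 0.30],    # true class ranked 3rd
                  [0.25, 0.35, 0.40]])   # true class ranked 1st
toy_y = np.array([[1, 0, 0],
                  [0, 1, 0],
                  [1, 0, 0],
                  [0, 0, 1]])

print(top_n_accuracy(toy_h, toy_y))
# prints: [0.5, 0.75, 1.0]
```

The last entry is always 1.0, because the true label is necessarily among the top N when N equals the number of classes.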