# Results - Classification on simulated events
This notebook is the primary source of plots and tables for the classification part of the thesis,
with the goal of keeping every table and figure as standardized as possible. (And who has the time to update 90 tables one by one, anyway?)

## Questions
* Descriptive statistics
    - Should descriptive statistics of the simulated data be included?\
    If so, how much? And should it be included for each fold in the k-fold cross-validation?
* Classification results
    - Breakdown of results based on event type (single, double, close double)?
    This is reasonable to include in order to confirm the assumption that close doubles are the
    most difficult event type to classify correctly in simulated data.
    The random state is included, so it should be simple to reproduce the indices.


## TODO
* Reproduce the validation indices for each fold from the random seed in the config
* Run this for pixelmod data
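
The first TODO item could look roughly like the sketch below. It assumes the folds were created by shuffling all sample indices once with the configured seed and cutting the result into equal parts. The notebook does not show the splitter that was actually used during training, so `kfold_validation_indices` and its arguments are hypothetical, and the indices only match if the training code followed the same procedure.

```python
import numpy as np


def kfold_validation_indices(n_samples, n_folds=5, seed=0):
    """Return one sorted array of validation indices per fold.

    Assumption: a single seeded shuffle followed by an even split.
    This must mirror the splitter used in training to be useful.
    """
    rng = np.random.RandomState(seed)
    shuffled = rng.permutation(n_samples)
    return [np.sort(fold) for fold in np.array_split(shuffled, n_folds)]


# The same seed gives the same folds on every call
folds_a = kfold_validation_indices(20, n_folds=5, seed=42)
folds_b = kfold_validation_indices(20, n_folds=5, seed=42)
```

With the indices recovered, the per-event-type breakdown from the Questions section reduces to selecting the event labels at each fold's validation indices.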

## Handy links
* [matplotlib-plots to latex](https://timodenk.com/blog/exporting-matplotlib-plots-to-latex/)
* [Robert's thesis df output](https://github.com/ATTPC/VAE-event-classification/blob/master/src/make_classification_table.py)

In [1]:
%matplotlib inline
%load_ext autoreload
%autoreload 2
from master_scripts.data_functions import get_git_root
from master_scripts.analysis_functions import load_experiment, experiment_metrics_to_df
import json
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

THESIS_PATH = "../../../master_thesis/"

# Pre-processed simulated data - no additional modifications
These are the basic metrics for all the models trained on simulated data.
The basic pre-processing includes formatting and min-max normalization.
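
As a reminder of what min-max normalization does, here is a minimal sketch. It assumes a single global minimum and maximum over the whole image array. Whether the actual pre-processing normalizes globally or per image is not stated in this notebook, and `min_max_normalize` is a hypothetical helper, not a function from `master_scripts`.

```python
import numpy as np


def min_max_normalize(images):
    """Scale values linearly to [0, 1] using the global min and max."""
    images = np.asarray(images, dtype=np.float64)
    lo, hi = images.min(), images.max()
    if hi == lo:
        # Constant input: avoid division by zero
        return np.zeros_like(images)
    return (images - lo) / (hi - lo)


example = np.array([[2.0, 4.0], [6.0, 10.0]])
normalized = min_max_normalize(example)
```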

## Logistic regression

In [2]:
# Load logistic regression experiment
log_ex_id = "003e1b62336e"
log_ex = load_experiment(log_ex_id)
#log_model = tf.keras.models.load_model(repo_root + "models/" + log_ex_id + ".h5")

log_metrics = experiment_metrics_to_df(log_ex)
display(log_metrics)
log_means = log_metrics.agg(["mean", "std"])#.applymap('{:.3f}'.format)
log_means = log_means.rename(index={'mean': 'log_mean', 'std': 'log_std'})
display(log_means)

Unnamed: 0,accuracy_score,f1_score,matthews_corrcoef,roc_auc_score,TN,FP,FN,TP
fold_0,0.740821,0.720636,0.486738,0.832966,154484,35527,62961,127028
fold_1,0.721626,0.736618,0.446164,0.832152,126294,63718,42064,147924
fold_2,0.720718,0.737242,0.444982,0.832358,124988,65024,41103,148885
fold_3,0.709929,0.739569,0.431198,0.833482,113262,76750,33477,156511
fold_4,0.71865,0.737511,0.4419,0.83187,122891,67121,39792,150196


Unnamed: 0,accuracy_score,f1_score,matthews_corrcoef,roc_auc_score,TN,FP,FN,TP
log_mean,0.722349,0.734316,0.450196,0.832565,128383.8,61628.0,43879.4,146108.8
log_std,0.011317,0.007727,0.021263,0.000652,15459.292099,15459.714179,11180.656255,11180.229591
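
As a sanity check on the table above, the scalar metrics can be recomputed from the confusion matrix counts. The sketch below assumes the standard definitions of accuracy, F1 and the Matthews correlation coefficient. `metrics_from_confusion` is a hypothetical helper, and the counts are copied from `fold_0` of the logistic regression output.

```python
import math


def metrics_from_confusion(tn, fp, fn, tp):
    """Recompute accuracy, F1 and MCC from confusion matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    f1 = 2 * tp / (2 * tp + fp + fn)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Counts from fold_0 of the logistic regression experiment
fold_0 = metrics_from_confusion(tn=154484, fp=35527, fn=62961, tp=127028)
```

The recomputed values agree with the `accuracy_score`, `f1_score` and `matthews_corrcoef` columns for `fold_0` to the precision shown.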


## Small dense network

In [3]:
# Load small dense network experiment
dense_ex_id = "c19117f62bd8"
dense_ex = load_experiment(dense_ex_id)
#log_model = tf.keras.models.load_model(repo_root + "models/" + dense_ex_id + ".h5")

dense_metrics = experiment_metrics_to_df(dense_ex)
display(dense_metrics)
dense_means = dense_metrics.agg(["mean", "std"])#.applymap('{:.3f}'.format)
dense_means = dense_means.rename(index={'mean': 'dense_mean', 'std': 'dense_std'})
display(dense_means)

Unnamed: 0,accuracy_score,f1_score,matthews_corrcoef,roc_auc_score,TN,FP,FN,TP
fold_0,0.927039,0.921922,0.861499,0.962638,188591,1420,26305,163684
fold_1,0.908492,0.908957,0.817028,0.959432,171644,18368,16405,173583
fold_2,0.909421,0.910222,0.818975,0.960006,171095,18917,15503,174485
fold_3,0.879271,0.885307,0.76279,0.919938,157063,32949,12928,177060
fold_4,0.904984,0.906353,0.810318,0.953581,169170,20842,15264,174724


Unnamed: 0,accuracy_score,f1_score,matthews_corrcoef,roc_auc_score,TN,FP,FN,TP
dense_mean,0.905842,0.906552,0.814122,0.951119,171512.6,18499.2,17281.0,172707.2
dense_std,0.017146,0.013294,0.035093,0.017742,11248.801238,11249.180801,5205.240965,5204.807556


## Small CNN

In [4]:
# Load small CNN experiment
cnn_ex_id = "b56e64ac3b1c"
cnn_ex = load_experiment(cnn_ex_id)
#log_model = tf.keras.models.load_model(repo_root + "models/" + cnn_ex_id + ".h5")

cnn_metrics = experiment_metrics_to_df(cnn_ex)
display(cnn_metrics)
cnn_means = cnn_metrics.agg(["mean", "std"])#.applymap('{:.3f}'.format)
cnn_means = cnn_means.rename(index={'mean': 'cnn_mean', 'std': 'cnn_std'})
display(cnn_means)

Unnamed: 0,accuracy_score,f1_score,matthews_corrcoef,roc_auc_score,TN,FP,FN,TP
fold_0,0.950571,0.948138,0.905124,0.982672,189521,490,18293,171696
fold_1,0.960053,0.958639,0.922254,0.985671,188904,1108,14072,175916
fold_2,0.963263,0.962188,0.928021,0.986537,188422,1590,12370,177618
fold_3,0.961945,0.960966,0.925047,0.987761,187532,2480,11981,178007
fold_4,0.965118,0.96411,0.931703,0.988156,188713,1299,11956,178032


Unnamed: 0,accuracy_score,f1_score,matthews_corrcoef,roc_auc_score,TN,FP,FN,TP
cnn_mean,0.96019,0.958808,0.92243,0.98616,188618.4,1393.4,13734.4,176253.8
cnn_std,0.005687,0.006286,0.010291,0.002185,728.704535,729.014266,2692.281245,2691.857946


## Pretrained - VGG
As an additional baseline for performance, we include a state-of-the-art VGG network
pretrained on the ImageNet database.

Due to the size of our detector images (16x16) compared with the size the VGG network is
designed for, we cannot use all layers in the VGG network. This stems from the use of max-pooling,
which halves the image size (16x16 becomes 8x8, and so on) each time the input is passed through such a
layer. At some point our input is too small to pass through to the rest of the network.
We therefore cut the network at the point where this becomes an issue.
Alternatively, one could possibly keep the depth but remove max-pooling layers.
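
The shrinking can be made concrete with a small calculation. VGG16 contains five 2x2 max-pooling stages with stride 2, so the sketch below tracks the spatial size of a 16x16 input through them. `sizes_after_pooling` is a hypothetical helper, and the exact layer at which the network was cut for the experiment is not stated here, so this only illustrates why a cut is needed.

```python
def sizes_after_pooling(input_size, n_pools, pool=2):
    """Spatial size after each successive pooling stage (floor division)."""
    sizes = [input_size]
    for _ in range(n_pools):
        sizes.append(sizes[-1] // pool)
    return sizes


# VGG16 has five max-pooling stages
sizes = sizes_after_pooling(16, 5)
# The first pooling stage that would leave no pixels at all
first_invalid_pool = sizes.index(0)
```

For a 16x16 input the size reaches 1x1 after the fourth pooling stage, and the fifth would leave nothing to pass on.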

In [5]:
# Load pretrained VGG experiment
pretrained_ex_id = "c96f61c743c2"
pretrained_ex = load_experiment(pretrained_ex_id)
#log_model = tf.keras.models.load_model(repo_root + "models/" + pretrained_ex_id + ".h5")

pretrained_metrics = experiment_metrics_to_df(pretrained_ex)
display(pretrained_metrics)
pretrained_means = pretrained_metrics.agg(["mean", "std"])#.applymap('{:.3f}'.format)
pretrained_means = pretrained_means.rename(index={'mean': 'pretrained_mean', 'std': 'pretrained_std'})
display(pretrained_means)

Unnamed: 0,accuracy_score,f1_score,matthews_corrcoef,roc_auc_score,TN,FP,FN,TP
fold_0,0.884926,0.870529,0.789605,0.933739,189264,747,42981,147008
fold_1,0.907403,0.902052,0.8197,0.949066,182786,7226,27961,162027
fold_2,0.905205,0.902705,0.811477,0.951145,176872,13140,22882,167106
fold_3,0.882782,0.884662,0.765974,0.944825,164630,25382,19161,170827
fold_4,0.912111,0.909292,0.82581,0.956346,179204,10808,22590,167398


Unnamed: 0,accuracy_score,f1_score,matthews_corrcoef,roc_auc_score,TN,FP,FN,TP
pretrained_mean,0.898485,0.893848,0.802513,0.947024,178551.2,11460.6,27115.0,162873.2
pretrained_std,0.013609,0.015909,0.024598,0.008505,9079.417448,9079.712429,9408.848256,9408.426686


## Combine the metrics into one table
We use the standard deviation in the folds as an error measure, and report the mean classification f1_score.
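
It is worth being explicit about which standard deviation this is. The sketch below recomputes it for the logistic regression F1-scores, with the values copied from the fold table above, using both the sample (`ddof=1`) and population (`ddof=0`) definitions. That the reported numbers correspond to `ddof=1` is an inference from this comparison, not something stated in the experiment code.

```python
import numpy as np

# F1-scores for fold_0..fold_4 of the logistic regression experiment
log_f1 = np.array([0.720636, 0.736618, 0.737242, 0.739569, 0.737511])

sample_std = log_f1.std(ddof=1)      # divides by n - 1
population_std = log_f1.std(ddof=0)  # divides by n
```

The sample standard deviation reproduces the `log_std` value of 0.007727 in the table, while the population version is noticeably smaller with only five folds.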

In [6]:
all_means = pd.DataFrame(
    [
        log_means.iloc[0][['f1_score', 'TN', 'FP', 'FN', 'TP']],
        dense_means.iloc[0][['f1_score', 'TN', 'FP', 'FN', 'TP']],
        cnn_means.iloc[0][['f1_score', 'TN', 'FP', 'FN', 'TP']],
        pretrained_means.iloc[0][['f1_score', 'TN', 'FP', 'FN', 'TP']]
    ]
).rename(
    index={
        'log_mean': 'Logistic',
        'dense_mean': 'Dense',
        'cnn_mean': 'Convolutional',
        'pretrained_mean': 'Pretrained VGG16',
    }
)
display(all_means)
all_std = pd.DataFrame(
    [
        log_means.iloc[1][['f1_score', 'TN', 'FP', 'FN', 'TP']],
        dense_means.iloc[1][['f1_score', 'TN', 'FP', 'FN', 'TP']],
        cnn_means.iloc[1][['f1_score', 'TN', 'FP', 'FN', 'TP']],
        pretrained_means.iloc[1][['f1_score', 'TN', 'FP', 'FN', 'TP']],
    ]
).rename(
    index={
        'log_std': 'Logistic',
        'dense_std': 'Dense',
        'cnn_std': 'Convolutional',
        'pretrained_std': 'Pretrained VGG16',
    }
)
display(all_std)

Unnamed: 0,f1_score,TN,FP,FN,TP
Logistic,0.734316,128383.8,61628.0,43879.4,146108.8
Dense,0.906552,171512.6,18499.2,17281.0,172707.2
Convolutional,0.958808,188618.4,1393.4,13734.4,176253.8
Pretrained VGG16,0.893848,178551.2,11460.6,27115.0,162873.2


Unnamed: 0,f1_score,TN,FP,FN,TP
Logistic,0.007727,15459.292099,15459.714179,11180.656255,11180.229591
Dense,0.013294,11248.801238,11249.180801,5205.240965,5204.807556
Convolutional,0.006286,728.704535,729.014266,2692.281245,2691.857946
Pretrained VGG16,0.015909,9079.417448,9079.712429,9408.848256,9408.426686


### Output combined frame to latex

In [15]:
rows = all_means.index
f1_str_array = np.zeros((1, all_means.shape[0]), dtype=object)
for i in range(all_means.shape[0]):
    # Mean on top, fold standard deviation underneath (siunitx \num)
    f1_str_array[0, i] = r"$\underset{{\num{{+- {:.3e} }}  }}{{\num{{ {:.3g} }} }}$".format(
        all_std["f1_score"].iloc[i], all_means["f1_score"].iloc[i])

f1_df = pd.DataFrame(f1_str_array, columns=rows)

section_path = "chapters/results/figures/"
fname = THESIS_PATH + section_path + "classification_simulated_f1.tex"
caption = """
Mean F1-scores for classification of simulated data using multiple models. 
Error estimates are the standard deviation in results from k-fold cross-validation 
with $K=5$ folds.
"""
label = "tab:classification-simulated-f1"
with open(fname, "w") as fp:
    # None disables truncation of long cells; -1 is deprecated since pandas 1.0
    pd.set_option('display.max_colwidth', None)
    f1_df.to_latex(fp, escape=False, caption=caption, label=label, index=False)


In [21]:
columns=["TN", "FP", "FN", "TP"]
rows = all_means.index
confmat_str_array = np.zeros((all_means.shape[0], 4), dtype=object)
for i in range(confmat_str_array.shape[0]):
    for j in range(confmat_str_array.shape[1]):
        confmat_str_array[i, j] = r"$\underset{{\num{{+- {:.3e} }}  }}{{\num{{ {:.3g} }} }}$".format(
            all_std[columns].iloc[i, j], all_means[columns].iloc[i, j])
        
confmat_df = pd.DataFrame(confmat_str_array, columns=columns, index=rows)

section_path = "chapters/results/figures/"
fname = THESIS_PATH + section_path + "classification_simulated_confmat.tex"
caption = """
Mean confusion matrix values for classification of simulated data using multiple models. 
Error estimates are the standard deviation in results from k-fold cross-validation with $K=5$ folds.
"""
label = "tab:classification-simulated-confmat"
with open(fname, "w") as fp:
    # None disables truncation of long cells; -1 is deprecated since pandas 1.0
    pd.set_option('display.max_colwidth', None)
    confmat_df.to_latex(fp, escape=False, caption=caption, label=label)


# Pre-processed simulated data - Pixel modified
The basic pre-processing includes formatting and min-max normalization.
Additionally, the top and bottom rows of pixels are set to 0, and
one pixel inside the detector is permanently set to 0 (which index again?).
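
A minimal sketch of such a modification is given below. The position of the permanently dead pixel is not recorded in this notebook, so the `dead_pixel` default is a placeholder and not the real index. `apply_pixel_mask` is likewise a hypothetical helper rather than the function used to produce the data.

```python
import numpy as np


def apply_pixel_mask(images, dead_pixel=(7, 7)):
    """Zero the top and bottom pixel rows and one interior pixel.

    images: array of shape (n_images, height, width).
    dead_pixel: (row, col) placeholder; the real index is unknown here.
    """
    masked = np.array(images, dtype=np.float64, copy=True)
    masked[:, 0, :] = 0   # top row
    masked[:, -1, :] = 0  # bottom row
    row, col = dead_pixel
    masked[:, row, col] = 0
    return masked


original = np.ones((3, 16, 16))
modified = apply_pixel_mask(original)
```

The function works on a copy, so the unmodified images remain available for the comparison with the previous section.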

## Logistic regression

In [2]:
# Load logistic regression experiment
log_ex_id = "003e1b62336e"
log_ex = load_experiment(log_ex_id)
#log_model = tf.keras.models.load_model(repo_root + "models/" + log_ex_id + ".h5")

log_metrics = experiment_metrics_to_df(log_ex)
display(log_metrics)
log_means = log_metrics.agg(["mean", "std"])#.applymap('{:.3f}'.format)
log_means = log_means.rename(index={'mean': 'log_mean', 'std': 'log_std'})
display(log_means)

Unnamed: 0,accuracy_score,f1_score,matthews_corrcoef,roc_auc_score,TN,FP,FN,TP
fold_0,0.740821,0.720636,0.486738,0.832966,154484,35527,62961,127028
fold_1,0.721626,0.736618,0.446164,0.832152,126294,63718,42064,147924
fold_2,0.720718,0.737242,0.444982,0.832358,124988,65024,41103,148885
fold_3,0.709929,0.739569,0.431198,0.833482,113262,76750,33477,156511
fold_4,0.71865,0.737511,0.4419,0.83187,122891,67121,39792,150196


Unnamed: 0,accuracy_score,f1_score,matthews_corrcoef,roc_auc_score,TN,FP,FN,TP
log_mean,0.722349,0.734316,0.450196,0.832565,128383.8,61628.0,43879.4,146108.8
log_std,0.011317,0.007727,0.021263,0.000652,15459.292099,15459.714179,11180.656255,11180.229591


## Small dense network

In [3]:
# Load small dense network experiment
dense_ex_id = "c19117f62bd8"
dense_ex = load_experiment(dense_ex_id)
#log_model = tf.keras.models.load_model(repo_root + "models/" + dense_ex_id + ".h5")

dense_metrics = experiment_metrics_to_df(dense_ex)
display(dense_metrics)
dense_means = dense_metrics.agg(["mean", "std"])#.applymap('{:.3f}'.format)
dense_means = dense_means.rename(index={'mean': 'dense_mean', 'std': 'dense_std'})
display(dense_means)

Unnamed: 0,accuracy_score,f1_score,matthews_corrcoef,roc_auc_score,TN,FP,FN,TP
fold_0,0.927039,0.921922,0.861499,0.962638,188591,1420,26305,163684
fold_1,0.908492,0.908957,0.817028,0.959432,171644,18368,16405,173583
fold_2,0.909421,0.910222,0.818975,0.960006,171095,18917,15503,174485
fold_3,0.879271,0.885307,0.76279,0.919938,157063,32949,12928,177060
fold_4,0.904984,0.906353,0.810318,0.953581,169170,20842,15264,174724


Unnamed: 0,accuracy_score,f1_score,matthews_corrcoef,roc_auc_score,TN,FP,FN,TP
dense_mean,0.905842,0.906552,0.814122,0.951119,171512.6,18499.2,17281.0,172707.2
dense_std,0.017146,0.013294,0.035093,0.017742,11248.801238,11249.180801,5205.240965,5204.807556


## Small CNN

In [4]:
# Load small CNN experiment
cnn_ex_id = "b56e64ac3b1c"
cnn_ex = load_experiment(cnn_ex_id)
#log_model = tf.keras.models.load_model(repo_root + "models/" + cnn_ex_id + ".h5")

cnn_metrics = experiment_metrics_to_df(cnn_ex)
display(cnn_metrics)
cnn_means = cnn_metrics.agg(["mean", "std"])#.applymap('{:.3f}'.format)
cnn_means = cnn_means.rename(index={'mean': 'cnn_mean', 'std': 'cnn_std'})
display(cnn_means)

Unnamed: 0,accuracy_score,f1_score,matthews_corrcoef,roc_auc_score,TN,FP,FN,TP
fold_0,0.950571,0.948138,0.905124,0.982672,189521,490,18293,171696
fold_1,0.960053,0.958639,0.922254,0.985671,188904,1108,14072,175916
fold_2,0.963263,0.962188,0.928021,0.986537,188422,1590,12370,177618
fold_3,0.961945,0.960966,0.925047,0.987761,187532,2480,11981,178007
fold_4,0.965118,0.96411,0.931703,0.988156,188713,1299,11956,178032


Unnamed: 0,accuracy_score,f1_score,matthews_corrcoef,roc_auc_score,TN,FP,FN,TP
cnn_mean,0.96019,0.958808,0.92243,0.98616,188618.4,1393.4,13734.4,176253.8
cnn_std,0.005687,0.006286,0.010291,0.002185,728.704535,729.014266,2692.281245,2691.857946


## Pretrained - VGG
As an additional baseline for performance, we include a state-of-the-art VGG network
pretrained on the ImageNet database.

Due to the size of our detector images (16x16) compared with the size the VGG network is
designed for, we cannot use all layers in the VGG network. This stems from the use of max-pooling,
which halves the image size (16x16 becomes 8x8, and so on) each time the input is passed through such a
layer. At some point our input is too small to pass through to the rest of the network.
We therefore cut the network at the point where this becomes an issue.
Alternatively, one could possibly keep the depth but remove max-pooling layers.

In [5]:
# Load pretrained VGG experiment
pretrained_ex_id = "c96f61c743c2"
pretrained_ex = load_experiment(pretrained_ex_id)
#log_model = tf.keras.models.load_model(repo_root + "models/" + pretrained_ex_id + ".h5")

pretrained_metrics = experiment_metrics_to_df(pretrained_ex)
display(pretrained_metrics)
pretrained_means = pretrained_metrics.agg(["mean", "std"])#.applymap('{:.3f}'.format)
pretrained_means = pretrained_means.rename(index={'mean': 'pretrained_mean', 'std': 'pretrained_std'})
display(pretrained_means)

Unnamed: 0,accuracy_score,f1_score,matthews_corrcoef,roc_auc_score,TN,FP,FN,TP
fold_0,0.884926,0.870529,0.789605,0.933739,189264,747,42981,147008
fold_1,0.907403,0.902052,0.8197,0.949066,182786,7226,27961,162027
fold_2,0.905205,0.902705,0.811477,0.951145,176872,13140,22882,167106
fold_3,0.882782,0.884662,0.765974,0.944825,164630,25382,19161,170827
fold_4,0.912111,0.909292,0.82581,0.956346,179204,10808,22590,167398


Unnamed: 0,accuracy_score,f1_score,matthews_corrcoef,roc_auc_score,TN,FP,FN,TP
pretrained_mean,0.898485,0.893848,0.802513,0.947024,178551.2,11460.6,27115.0,162873.2
pretrained_std,0.013609,0.015909,0.024598,0.008505,9079.417448,9079.712429,9408.848256,9408.426686


## Combine the metrics into one table
We use the standard deviation in the folds as an error measure, and report the mean classification f1_score.

In [6]:
all_means = pd.DataFrame(
    [
        log_means.iloc[0][['f1_score', 'TN', 'FP', 'FN', 'TP']],
        dense_means.iloc[0][['f1_score', 'TN', 'FP', 'FN', 'TP']],
        cnn_means.iloc[0][['f1_score', 'TN', 'FP', 'FN', 'TP']],
        pretrained_means.iloc[0][['f1_score', 'TN', 'FP', 'FN', 'TP']]
    ]
).rename(
    index={
        'log_mean': 'Logistic',
        'dense_mean': 'Dense',
        'cnn_mean': 'Convolutional',
        'pretrained_mean': 'Pretrained VGG16',
    }
)
display(all_means)
all_std = pd.DataFrame(
    [
        log_means.iloc[1][['f1_score', 'TN', 'FP', 'FN', 'TP']],
        dense_means.iloc[1][['f1_score', 'TN', 'FP', 'FN', 'TP']],
        cnn_means.iloc[1][['f1_score', 'TN', 'FP', 'FN', 'TP']],
        pretrained_means.iloc[1][['f1_score', 'TN', 'FP', 'FN', 'TP']],
    ]
).rename(
    index={
        'log_std': 'Logistic',
        'dense_std': 'Dense',
        'cnn_std': 'Convolutional',
        'pretrained_std': 'Pretrained VGG16',
    }
)
display(all_std)

Unnamed: 0,f1_score,TN,FP,FN,TP
Logistic,0.734316,128383.8,61628.0,43879.4,146108.8
Dense,0.906552,171512.6,18499.2,17281.0,172707.2
Convolutional,0.958808,188618.4,1393.4,13734.4,176253.8
Pretrained VGG16,0.893848,178551.2,11460.6,27115.0,162873.2


Unnamed: 0,f1_score,TN,FP,FN,TP
Logistic,0.007727,15459.292099,15459.714179,11180.656255,11180.229591
Dense,0.013294,11248.801238,11249.180801,5205.240965,5204.807556
Convolutional,0.006286,728.704535,729.014266,2692.281245,2691.857946
Pretrained VGG16,0.015909,9079.417448,9079.712429,9408.848256,9408.426686


### Output combined frame to latex

In [15]:
rows = all_means.index
f1_str_array = np.zeros((1, all_means.shape[0]), dtype=object)
for i in range(all_means.shape[0]):
    # Mean on top, fold standard deviation underneath (siunitx \num)
    f1_str_array[0, i] = r"$\underset{{\num{{+- {:.3e} }}  }}{{\num{{ {:.3g} }} }}$".format(
        all_std["f1_score"].iloc[i], all_means["f1_score"].iloc[i])

f1_df = pd.DataFrame(f1_str_array, columns=rows)

section_path = "chapters/results/figures/"
fname = THESIS_PATH + section_path + "classification_simulated_f1.tex"
caption = """
Mean F1-scores for classification of simulated data using multiple models. 
Error estimates are the standard deviation in results from k-fold cross-validation 
with $K=5$ folds.
"""
label = "tab:classification-simulated-f1"
with open(fname, "w") as fp:
    # None disables truncation of long cells; -1 is deprecated since pandas 1.0
    pd.set_option('display.max_colwidth', None)
    f1_df.to_latex(fp, escape=False, caption=caption, label=label, index=False)


In [21]:
columns=["TN", "FP", "FN", "TP"]
rows = all_means.index
confmat_str_array = np.zeros((all_means.shape[0], 4), dtype=object)
for i in range(confmat_str_array.shape[0]):
    for j in range(confmat_str_array.shape[1]):
        confmat_str_array[i, j] = r"$\underset{{\num{{+- {:.3e} }}  }}{{\num{{ {:.3g} }} }}$".format(
            all_std[columns].iloc[i, j], all_means[columns].iloc[i, j])
        
confmat_df = pd.DataFrame(confmat_str_array, columns=columns, index=rows)

section_path = "chapters/results/figures/"
fname = THESIS_PATH + section_path + "classification_simulated_confmat.tex"
caption = """
Mean confusion matrix values for classification of simulated data using multiple models. 
Error estimates are the standard deviation in results from k-fold cross-validation with $K=5$ folds.
"""
label = "tab:classification-simulated-confmat"
with open(fname, "w") as fp:
    # None disables truncation of long cells; -1 is deprecated since pandas 1.0
    pd.set_option('display.max_colwidth', None)
    confmat_df.to_latex(fp, escape=False, caption=caption, label=label)
