**Results - Regression of simulated events**

This notebook is the primary source of plots and tables for the regression part of the thesis, 
with the goal of keeping every table and figure as standardized as possible (and who has the time to update
90 tables one by one anyway?).

**Questions:**
* Descriptive statistics
    - Should descriptive statistics of the simulated data be included?\
    If so, how much? And should it be included for each fold in the k-fold cross-validation?
* Classification results
    - Break down the results by event type (single, double, close double)?
    This seems reasonable to include, as it would confirm the assumption that close doubles are the
    most difficult event type to classify correctly in simulated data.
    The random state is included, so it should be simple to reproduce the indices.


**TODO**
* Implement reproducing the validation indices for each fold based on the random seed from config
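
A minimal sketch of how the fold indices could be regenerated, assuming the experiments split the data with scikit-learn's `KFold(n_splits=5, shuffle=True, random_state=seed)`. The splitter class, `n_splits=5` and the helper name `reproduce_fold_indices` are assumptions, not taken from the training code. Because the shuffle is fully determined by `random_state`, splitting the same number of samples with the same seed gives the same folds.

```python
import numpy as np
from sklearn.model_selection import KFold


def reproduce_fold_indices(n_samples, seed, n_splits=5):
    """Hypothetical helper: regenerate (train_idx, val_idx) for each fold.

    Assumes the original split used KFold with shuffle=True and the seed from
    the experiment config; a different splitter would give different indices.
    """
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # KFold only needs the number of samples, so a dummy array is enough
    dummy = np.zeros((n_samples, 1))
    return [(train_idx, val_idx) for train_idx, val_idx in kfold.split(dummy)]


folds_a = reproduce_fold_indices(100, seed=42)
folds_b = reproduce_fold_indices(100, seed=42)
```

The validation indices of all folds together cover every sample exactly once, so they can be used directly to break the fold results down by event type.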

**Handy links**
* [matplotlib-plots to latex](https://timodenk.com/blog/exporting-matplotlib-plots-to-latex/)
* [Robert's thesis df output](https://github.com/ATTPC/VAE-event-classification/blob/master/src/make_classification_table.py)

In [1]:
%matplotlib inline
%load_ext autoreload
%autoreload 2
from master_scripts.data_functions import get_git_root, normalize_image_data, event_indices, normalize_position_data
from master_scripts.analysis_functions import load_experiment, experiment_metrics_to_df
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
import tensorflow as tf
import json
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

THESIS_PATH = "../../../master_thesis/"

In [2]:
# Load test set and normalize
repo_root = get_git_root()
test_images = np.load(repo_root + "data/simulated/test/" + "images_test.npy")
test_images = normalize_image_data(test_images)
test_positions = np.load(repo_root + "data/simulated/test/" + "positions_test.npy") 
test_energies = np.load(repo_root + "data/simulated/test/" + "energies_test.npy") 
test_labels = np.load(repo_root + "data/simulated/test/" + "labels_test.npy") 

# Set up indices for position and energy data
# s = single, d = double, c = close double
s_idx, d_idx, c_idx = event_indices(test_positions)

In [3]:
def regression_metrics(model, x_val, y_val, name):
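    """ Calculates regression metrics (R2, MSE, RMSE, MAE) for a model.
    
    :param model: trained Keras model
    :param x_val: normalized detector images, shaped as the model expects
    :param y_val: target values
    :param name: row label of the returned DataFrame
    :return: one-row DataFrame with one column per metric
    """

    y_pred = model.predict(x_val)

    metrics = {}
    metrics['r2_score'] = r2_score(y_val, y_pred)
    metrics['mse'] = mean_squared_error(y_val, y_pred)
    # RMSE per output column, averaged over columns. This matches the old
    # mean_squared_error(..., squared=False), deprecated in scikit-learn 1.4 and since removed.
    metrics['rmse'] = np.mean(np.sqrt(mean_squared_error(y_val, y_pred, multioutput='raw_values')))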
    metrics['mae'] = mean_absolute_error(y_val, y_pred)
    
    df = pd.DataFrame.from_dict(data={name: metrics}, orient='index')
    return df
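
The position targets passed to `regression_metrics` have two columns (x and y), and scikit-learn's regression metrics default to `multioutput='uniform_average'`: each metric is computed per column and then averaged. The sketch below uses small made-up arrays (not thesis data) to illustrate this, including that the multi-output RMSE in scikit-learn is the mean of the per-column RMSEs, which is not the same as the square root of the averaged MSE.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up two-column targets (think x and y), not data from the thesis
y_true = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
# First column predicted exactly, second column offset by a constant 0.5
y_pred = np.array([[0.0, 1.5], [1.0, 2.5], [2.0, 3.5], [3.0, 4.5]])

r2_per_column = r2_score(y_true, y_pred, multioutput="raw_values")  # [1.0, 0.8]
r2_averaged = r2_score(y_true, y_pred)  # uniform average: 0.9

mse_per_column = mean_squared_error(y_true, y_pred, multioutput="raw_values")  # [0.0, 0.25]
mse = mean_squared_error(y_true, y_pred)  # 0.125

# RMSE the way scikit-learn averages it: per-column RMSE, then the mean
rmse_mean_of_columns = np.mean(np.sqrt(mse_per_column))  # 0.25
# A different quantity: square root of the averaged MSE
rmse_of_mean = np.sqrt(mse)  # about 0.354
```

For single-output targets such as the single-event energies the two RMSE definitions coincide.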

# Pre-processed simulated data - no additional modifications
These are the basic metrics for all the models trained on simulated data.
The basic pre-processing includes formatting and min-max normalization.
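
Min-max normalization rescales values linearly so that the smallest value maps to 0 and the largest to 1, x' = (x - min) / (max - min). The sketch below assumes a single global minimum and maximum over the whole array; whether `normalize_image_data` normalizes globally or per image is not shown in this notebook, so that choice and the helper name `min_max_normalize` are assumptions.

```python
import numpy as np


def min_max_normalize(x):
    """Hypothetical sketch: global min-max scaling of an array to [0, 1]."""
    x = np.asarray(x, dtype=np.float64)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # Constant input: avoid division by zero and return all zeros
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)


images = np.array([[[2.0, 4.0], [6.0, 10.0]]])  # one made-up 2x2 "image"
scaled = min_max_normalize(images)
```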

## Single events

### Positions

#### Linear Regression

In [4]:
# Load linear regression experiment
#lin_ex_id = "225ca879103d"
lin_ex_id = "73956fa2f1ae" # latest exp
lin_ex = load_experiment(lin_ex_id)

# Load model and predict
lin_model = tf.keras.models.load_model(repo_root + "models/" + lin_ex_id + ".h5", compile=False)
lin_test = regression_metrics(lin_model, test_images.reshape(test_images.shape[0], 256)[s_idx], normalize_position_data(test_positions[s_idx])[:,:2], "lin_test")
del lin_model #No longer needed, clear memory just in case.

lin_metrics = experiment_metrics_to_df(lin_ex)
#display(lin_metrics)
lin_means = lin_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
lin_means = lin_means.rename(index={'mean': 'lin_mean', 'std': 'lin_std'})
#display(lin_means)

#### Small dense network

In [5]:
# Load dense regression experiment
#dense_ex_id = "a3716bc3648a"
dense_ex_id = "3724b7186087" # latest exp
dense_ex = load_experiment(dense_ex_id)

# Load model and predict
dense_model = tf.keras.models.load_model(repo_root + "models/" + dense_ex_id + ".h5", compile=False)
dense_test = regression_metrics(dense_model, test_images.reshape(test_images.shape[0], 256)[s_idx], normalize_position_data(test_positions[s_idx])[:,:2], "dense_test")
del dense_model

dense_metrics = experiment_metrics_to_df(dense_ex)
#display(dense_metrics)
dense_means = dense_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
dense_means = dense_means.rename(index={'mean': 'dense_mean', 'std': 'dense_std'})
#display(dense_means)

#### Small CNN

In [6]:
# Load CNN regression experiment
#cnn_ex_id = "1cac590bf1fe"
cnn_ex_id = "ef38aded9bd0" # Latest
cnn_ex = load_experiment(cnn_ex_id)
# Load model and predict
cnn_model = tf.keras.models.load_model(repo_root + "models/" + cnn_ex_id + ".h5", compile=False)
cnn_test = regression_metrics(cnn_model, test_images[s_idx], normalize_position_data(test_positions[s_idx])[:,:2], "cnn_test")
del cnn_model

cnn_metrics = experiment_metrics_to_df(cnn_ex)
#display(cnn_metrics)
cnn_means = cnn_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
cnn_means = cnn_means.rename(index={'mean': 'cnn_mean', 'std': 'cnn_std'})
#display(cnn_means)

#### Pretrained - VGG
As an additional performance baseline, we include a pretrained state-of-the-art network
(VGG) trained on the ImageNet database.

Our detector images (16x16) are much smaller than the 224x224 inputs the VGG network was
designed for, so we cannot use all layers of the network. The limitation comes from max-pooling:
each max-pooling layer halves the height and width of its input (16x16 becomes 8x8, then 4x4, and so on),
so at some point our input is too small to pass through the rest of the network.
We therefore cut the network at the point where this becomes an issue.
Alternatively, one could possibly keep the depth but remove max-pooling layers.
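
To make the argument concrete: VGG16 uses five max-pooling layers with a 2x2 window and stride 2, one at the end of each convolutional block. The pure-Python sketch below tracks the spatial size through those stages, assuming "valid" padding; the function `pooled_sizes` is only for illustration. A 16x16 image is down to 1x1 after four pooling layers, so a fifth 2x2 window no longer fits.

```python
def pooled_sizes(size, n_pools, pool=2, stride=2):
    """Spatial size after each max-pooling stage ('valid' padding assumed).

    Returns the input size followed by the size after each pooling layer;
    0 means the input was too small for the pooling window.
    """
    sizes = [size]
    for _ in range(n_pools):
        size = (size - pool) // stride + 1 if size >= pool else 0
        sizes.append(size)
    return sizes


# VGG16 has five 2x2, stride-2 max-pooling layers
print(pooled_sizes(224, 5))  # ImageNet input: [224, 112, 56, 28, 14, 7]
print(pooled_sizes(16, 5))   # detector image: [16, 8, 4, 2, 1, 0]
```

In Keras the failing case surfaces as an error when the model is built rather than as a size of 0.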

In [7]:
# Load pretrained regression experiment
#pretrained_ex_id = "d53a2353251f"
pretrained_ex_id = "d17f871649ca" # latest
pretrained_ex = load_experiment(pretrained_ex_id)
# Load model and predict
pretrained_model = tf.keras.models.load_model(repo_root + "models/" + pretrained_ex_id + ".h5", compile=False)
pretrained_test = regression_metrics(pretrained_model, np.concatenate((test_images, test_images, test_images), axis=-1)[s_idx], normalize_position_data(test_positions[s_idx])[:,:2], "pretrained_test")
del pretrained_model

pretrained_metrics = experiment_metrics_to_df(pretrained_ex)
#display(pretrained_metrics)
pretrained_means = pretrained_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
pretrained_means = pretrained_means.rename(index={'mean': 'pretrained_mean', 'std': 'pretrained_std'})
#display(pretrained_means)
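
VGG was trained on three-channel RGB images, while the detector images have a single channel, so the cell above copies the grayscale channel three times with `np.concatenate` along the last axis. A minimal NumPy sketch of the same idea, assuming channels-last arrays of shape `(n, 16, 16, 1)` and using random stand-in data: `np.repeat` gives an identical result, and selecting the events before replicating avoids building a three-channel copy of the whole test set.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((10, 16, 16, 1))  # made-up stand-in for the detector images
idx = np.array([0, 3, 7])             # made-up stand-in for s_idx / d_idx

# As in the notebook: replicate all images to three channels, then select events
rgb_concat = np.concatenate((images, images, images), axis=-1)[idx]
# Equivalent, but only the selected events are copied
rgb_repeat = np.repeat(images[idx], 3, axis=-1)
```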

#### Custom model

In [8]:
# Load custom regression experiment
#custom_ex_id = "f29da7bbd96f"
custom_ex_id = "b932a0bc5e13" # latest
custom_ex = load_experiment(custom_ex_id)
# Load model and predict
custom_model = tf.keras.models.load_model(repo_root + "models/" + custom_ex_id + ".h5", compile=False)
custom_test = regression_metrics(custom_model, test_images[s_idx], normalize_position_data(test_positions[s_idx])[:,:2], "custom_test")
del custom_model

custom_metrics = experiment_metrics_to_df(custom_ex)
#display(custom_metrics)
custom_means = custom_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
custom_means = custom_means.rename(index={'mean': 'custom_mean', 'std': 'custom_std'})
#display(custom_means)

#### Output
We use the standard deviation of each metric across the cross-validation folds as the error estimate, and report the R2 score on the test set.

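The fold means and standard deviations above come from `DataFrame.agg`. The sketch below uses made-up fold scores (not results from this notebook) to show what the aggregation returns. Note that pandas' `std` is the sample standard deviation (`ddof=1`), whereas `np.std` on its own defaults to `ddof=0`. As far as we know, pandas currently maps `np.std` passed to `agg` onto its own `std` (recent versions warn that this may change), so `agg([np.mean, np.std])` should behave like `agg(['mean', 'std'])`; passing the strings avoids relying on that mapping.

```python
import numpy as np
import pandas as pd

# Made-up scores for K=5 folds, for illustration only
folds = pd.DataFrame({
    "r2_score": [0.90, 0.92, 0.94, 0.96, 0.98],
    "mae": [0.05, 0.04, 0.05, 0.04, 0.05],
})

summary = folds.agg(["mean", "std"])  # rows 'mean' and 'std', one column per metric
sample_std = summary.loc["std", "r2_score"]            # ddof=1
population_std = np.std(folds["r2_score"].to_numpy())  # ddof=0
```

With only five folds the difference is noticeable (here roughly 0.0316 versus 0.0283), so it is worth stating which one the tables use.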
In [9]:
all_means_single_pos = pd.DataFrame(
    [
        lin_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        dense_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        custom_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_mean': 'Linear',
        'dense_mean': 'Dense',
        'cnn_mean': 'CNN',
        'pretrained_mean': 'Pretrained',
        'custom_mean': 'Custom',
    }
)

all_std_single_pos = pd.DataFrame(
    [
        lin_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        dense_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        custom_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_std': 'Linear',
        'dense_std': 'Dense',
        'cnn_std': 'CNN',
        'pretrained_std': 'Pretrained',
        'custom_std': 'Custom',
    }
)
all_test_single_pos = pd.DataFrame(
    [
        lin_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        dense_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        custom_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_test': 'Linear',
        'dense_test': 'Dense',
        'cnn_test': 'CNN',
        'pretrained_test': 'Pretrained',
        'custom_test': 'Custom',
    }
)
display(all_test_single_pos)
display(all_std_single_pos)

| Model | r2_score | mse | rmse | mae |
|---|---|---|---|---|
| Linear | 0.800295 | 0.01401 | 0.118362 | 0.088637 |
| Dense | 0.985798 | 0.000996 | 0.03156 | 0.020754 |
| CNN | 0.996934 | 0.000215 | 0.014665 | 0.008883 |
| Pretrained | 0.826069 | 0.012203 | 0.110468 | 0.063813 |
| Custom | -6.1e-05 | 0.070156 | 0.264871 | 0.229253 |


| Model | r2_score | mse | rmse | mae |
|---|---|---|---|---|
| Linear | 0.002871 | 0.000178 | 0.000753 | 0.001552 |
| Dense | 0.001101 | 7.8e-05 | 0.001303 | 0.001062 |
| CNN | 0.005085 | 0.000357 | 0.007576 | 0.003637 |
| Pretrained | 0.069452 | 0.004871 | 0.03679 | 0.022758 |
| Custom | 0.44693 | 0.031314 | 0.114368 | 0.100544 |

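The three summary tables in the cell above are built by listing each model's row by hand, and the same block is repeated for every target. A possible refactor is sketched below, assuming each per-model frame has the layout produced above (`*_test` with one row, `*_means` with a mean row followed by a std row). The helper name `collect_rows` is hypothetical and the frames in the example hold made-up values.

```python
import pandas as pd

METRICS = ["r2_score", "mse", "rmse", "mae"]


def collect_rows(frames, row):
    """Hypothetical helper: take positional row `row` from each model's frame.

    frames: dict mapping the display name (e.g. 'Linear') to a DataFrame with
    one column per metric; row: 0 for mean/test values, 1 for the std row.
    """
    table = pd.DataFrame([frame.iloc[row][METRICS] for frame in frames.values()])
    table.index = list(frames)  # display names replace 'lin_mean', 'cnn_std', ...
    return table[METRICS]


# Made-up frames standing in for lin_means and cnn_means
lin = pd.DataFrame([[0.5, 0.2, 0.45, 0.3], [0.01, 0.001, 0.002, 0.003]],
                   index=["lin_mean", "lin_std"], columns=METRICS)
cnn = pd.DataFrame([[0.9, 0.05, 0.22, 0.1], [0.02, 0.002, 0.004, 0.005]],
                   index=["cnn_mean", "cnn_std"], columns=METRICS)

means = collect_rows({"Linear": lin, "CNN": cnn}, row=0)
stds = collect_rows({"Linear": lin, "CNN": cnn}, row=1)
```

With the notebook's own frames this would read, for example, `collect_rows({"Linear": lin_means, "Dense": dense_means, "CNN": cnn_means, "Pretrained": pretrained_means, "Custom": custom_means}, row=1)` for the standard deviations.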

In [10]:
rows = all_test_single_pos.index
r2_str_array_single_pos = np.zeros((1, all_test_single_pos.shape[0]), dtype=object)
for i in range(all_test_single_pos.shape[0]):
    # R2 on the test set, with the fold standard deviation set underneath
    r2_str_array_single_pos[0, i] = r"$\underset{{\num{{+- {:.3e} }}  }}{{\num{{ {:.3g} }} }}$".format(
        all_std_single_pos["r2_score"].iloc[i], all_test_single_pos["r2_score"].iloc[i])
        
r2_df_single_pos = pd.DataFrame(r2_str_array_single_pos, columns=rows)

section_path = "chapters/results/figures/"
fname = THESIS_PATH + section_path + "regression_simulated_single_position_r2.tex"
caption = """
R2-scores on the test set for regression of positions of origin, on single events in simulated data, using multiple models. 
Error estimates are the standard deviation of the R2-scores from k-fold cross-validation 
with $K=5$ folds.
"""
label = "tab:regression-simulated-single-position-r2"
with open(fname, "w") as fp:
    # Never truncate the long LaTeX cell strings (None replaces the deprecated -1)
    pd.set_option('display.max_colwidth', None)
    r2_df_single_pos.to_latex(fp, escape=False, caption=caption, label=label, index=False)
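
Each table cell places the fold standard deviation below the R2 score using `\underset` and siunitx's `\num`. Since this cell is repeated for every target, the formatting could be pulled into one place; a sketch under that assumption is below, with the hypothetical helpers `r2_cell` and `r2_row` reusing the notebook's format string. The example uses the linear model's single-event position values from the output above. One detail worth knowing: `.3g` drops trailing zeros, so 0.800295 prints as 0.8 rather than 0.800; `.3f` would keep a fixed number of decimals if that is preferred.

```python
import pandas as pd

# Same format string as in the notebook cells that write the R2 tables
R2_CELL = r"$\underset{{\num{{+- {:.3e} }}  }}{{\num{{ {:.3g} }} }}$"


def r2_cell(r2, std):
    """Format one LaTeX cell: the R2 score with its fold standard deviation underneath."""
    return R2_CELL.format(std, r2)


def r2_row(test_df, std_df):
    """Hypothetical helper: one-row DataFrame of formatted cells, one column per model.

    Cells are matched by model name (the index), so the frames may be in any order.
    """
    cells = [r2_cell(test_df.loc[model, "r2_score"], std_df.loc[model, "r2_score"])
             for model in test_df.index]
    return pd.DataFrame([cells], columns=list(test_df.index))


cell = r2_cell(0.800295, 0.002871)
# cell == r"$\underset{\num{+- 2.871e-03 }  }{\num{ 0.8 } }$"
```

Each table cell would then reduce to `r2_row(all_test_single_pos, all_std_single_pos).to_latex(fp, escape=False, caption=caption, label=label, index=False)`.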


### Energy

#### Linear regression

In [11]:
# Load linear regression experiment
#lin_ex_id = "87e8f4558d97"
lin_ex_id = "87e8f4558d97" # latest
lin_ex = load_experiment(lin_ex_id)

# Load model and predict
lin_model = tf.keras.models.load_model(repo_root + "models/" + lin_ex_id + ".h5", compile=False)
lin_test = regression_metrics(lin_model, test_images.reshape(test_images.shape[0], 256)[s_idx], test_energies[s_idx,0], "lin_test")
del lin_model #No longer needed, clear memory just in case.

lin_metrics = experiment_metrics_to_df(lin_ex)
#display(lin_metrics)
lin_means = lin_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
lin_means = lin_means.rename(index={'mean': 'lin_mean', 'std': 'lin_std'})
#display(lin_means)
print(lin_ex['experiment_name'])

generate_results_energies_single_linreg


#### Small dense network

In [12]:
# Load dense regression experiment
#dense_ex_id = "4cab676db128"
dense_ex_id = "38606c5c1fde" # latest
dense_ex = load_experiment(dense_ex_id)
# Load model and predict
dense_model = tf.keras.models.load_model(repo_root + "models/" + dense_ex_id + ".h5", compile=False)
dense_test = regression_metrics(dense_model, test_images.reshape(test_images.shape[0], 256)[s_idx], test_energies[s_idx,0], "dense_test")
del dense_model

dense_metrics = experiment_metrics_to_df(dense_ex)
#display(dense_metrics)
dense_means = dense_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
dense_means = dense_means.rename(index={'mean': 'dense_mean', 'std': 'dense_std'})
#display(dense_means)

#### Small CNN

In [13]:
# Load cnn regression experiment 
#cnn_ex_id = "3a91fd0e74b5"
cnn_ex_id = "de90ad6d063e" # latest
cnn_ex = load_experiment(cnn_ex_id)
# Load model and predict
cnn_model = tf.keras.models.load_model(repo_root + "models/" + cnn_ex_id + ".h5", compile=False)
cnn_test = regression_metrics(cnn_model, test_images[s_idx], test_energies[s_idx,0], "cnn_test")
del cnn_model

cnn_metrics = experiment_metrics_to_df(cnn_ex)
#display(cnn_metrics)
cnn_means = cnn_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
cnn_means = cnn_means.rename(index={'mean': 'cnn_mean', 'std': 'cnn_std'})
#display(cnn_means)

#### Pretrained - VGG16 

In [14]:
# Load pretrained regression experiment 
#pretrained_ex_id = "ea8d88850f6e"
pretrained_ex_id = "44315681e795" # latest
pretrained_ex = load_experiment(pretrained_ex_id)
# Load model and predict
pretrained_model = tf.keras.models.load_model(repo_root + "models/" + pretrained_ex_id + ".h5", compile=False)
pretrained_test = regression_metrics(pretrained_model, np.concatenate((test_images, test_images, test_images), axis=-1)[s_idx], test_energies[s_idx,0], "pretrained_test")
del pretrained_model

pretrained_metrics = experiment_metrics_to_df(pretrained_ex)
#display(pretrained_metrics)
pretrained_means = pretrained_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
pretrained_means = pretrained_means.rename(index={'mean': 'pretrained_mean', 'std': 'pretrained_std'})
#display(pretrained_means)

#### Custom model

In [15]:
# Load custom regression experiment 
#custom_ex_id = "3d45e6694b1d"
custom_ex_id = "a56eeebc097c" # latest
custom_ex = load_experiment(custom_ex_id)
# Load model and predict
custom_model = tf.keras.models.load_model(repo_root + "models/" + custom_ex_id + ".h5", compile=False)
custom_test = regression_metrics(custom_model, test_images[s_idx], test_energies[s_idx,0], "custom_test")
del custom_model

custom_metrics = experiment_metrics_to_df(custom_ex)
#display(custom_metrics)
custom_means = custom_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
custom_means = custom_means.rename(index={'mean': 'custom_mean', 'std': 'custom_std'})
#display(custom_means)

#### Output

In [16]:
all_means_single_energy = pd.DataFrame(
    [
        lin_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        dense_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        custom_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_mean': 'Linear',
        'dense_mean': 'Dense',
        'cnn_mean': 'CNN',
        'pretrained_mean': 'Pretrained',
        'custom_mean': 'Custom',
    }
)

all_std_single_energy = pd.DataFrame(
    [
        lin_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        dense_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        custom_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_std': 'Linear',
        'dense_std': 'Dense',
        'cnn_std': 'CNN',
        'pretrained_std': 'Pretrained',
        'custom_std': 'Custom',
    }
)
all_test_single_energy = pd.DataFrame(
    [
        lin_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        dense_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        custom_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_test': 'Linear',
        'dense_test': 'Dense',
        'cnn_test': 'CNN',
        'pretrained_test': 'Pretrained',
        'custom_test': 'Custom',
    }
)
display(all_test_single_energy)
display(all_std_single_energy)

| Model | r2_score | mse | rmse | mae |
|---|---|---|---|---|
| Linear | 0.935729 | 0.005379 | 0.073342 | 0.055065 |
| Dense | 0.936297 | 0.005331 | 0.073017 | 0.054859 |
| CNN | 0.942395 | 0.004821 | 0.069434 | 0.050359 |
| Pretrained | 0.931946 | 0.005696 | 0.075469 | 0.057446 |
| Custom | -3e-06 | 0.083692 | 0.289296 | 0.250707 |


| Model | r2_score | mse | rmse | mae |
|---|---|---|---|---|
| Linear | 0.036908 | 0.00307 | 0.019433 | 0.020329 |
| Dense | 0.035781 | 0.002987 | 0.019819 | 0.02147 |
| CNN | 0.034403 | 0.00287 | 0.018563 | 0.019811 |
| Pretrained | 0.037338 | 0.003118 | 0.018636 | 0.019823 |
| Custom | 0.410132 | 0.034138 | 0.091832 | 0.079855 |


In [17]:
rows = all_test_single_energy.index
r2_str_array_single_energy = np.zeros((1, all_test_single_energy.shape[0]), dtype=object)
for i in range(all_test_single_energy.shape[0]):
    # R2 on the test set, with the fold standard deviation set underneath
    r2_str_array_single_energy[0, i] = r"$\underset{{\num{{+- {:.3e} }}  }}{{\num{{ {:.3g} }} }}$".format(
        all_std_single_energy["r2_score"].iloc[i], all_test_single_energy["r2_score"].iloc[i])
        
r2_df_single_energy = pd.DataFrame(r2_str_array_single_energy, columns=rows)

section_path = "chapters/results/figures/"
fname = THESIS_PATH + section_path + "regression_simulated_single_energy_r2.tex"
caption = """
R2-scores on the test set for regression of energy values, on single events in simulated data, using multiple models. 
Error estimates are the standard deviation of the R2-scores from k-fold cross-validation 
with $K=5$ folds.
"""
label = "tab:regression-simulated-single-energy-r2"
with open(fname, "w") as fp:
    # Never truncate the long LaTeX cell strings (None replaces the deprecated -1)
    pd.set_option('display.max_colwidth', None)
    r2_df_single_energy.to_latex(fp, escape=False, caption=caption, label=label, index=False)


## Double events

### Positions

#### Linear Regression

In [18]:
# Load linear regression experiment
#lin_ex_id = "7b74b3cfc586"
lin_ex_id = "7b74b3cfc586" # latest
lin_ex = load_experiment(lin_ex_id)

# Load model and predict
lin_model = tf.keras.models.load_model(repo_root + "models/" + lin_ex_id + ".h5", compile=False)
lin_test = regression_metrics(lin_model, test_images.reshape(test_images.shape[0], 256)[d_idx], normalize_position_data(test_positions[d_idx]), "lin_test")
del lin_model #No longer needed, clear memory just in case.

lin_metrics = experiment_metrics_to_df(lin_ex)
#display(lin_metrics)
lin_means = lin_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
lin_means = lin_means.rename(index={'mean': 'lin_mean', 'std': 'lin_std'})
#display(lin_means)

#### Small dense network

In [19]:
# Load dense regression experiment
#dense_ex_id = "ef55911e49d1"
dense_ex_id = "c922275131fe" # latest
dense_ex = load_experiment(dense_ex_id)
# Load model and predict
dense_model = tf.keras.models.load_model(repo_root + "models/" + dense_ex_id + ".h5", compile=False)
dense_test = regression_metrics(dense_model, test_images.reshape(test_images.shape[0], 256)[d_idx], normalize_position_data(test_positions[d_idx]), "dense_test")
del dense_model

dense_metrics = experiment_metrics_to_df(dense_ex)
#display(dense_metrics)
dense_means = dense_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
dense_means = dense_means.rename(index={'mean': 'dense_mean', 'std': 'dense_std'})
#display(dense_means)

#### Small CNN

In [20]:
# Load CNN regression experiment
#cnn_ex_id = "cc2654aea019"
cnn_ex_id = "130620a09c56" # latest
cnn_ex = load_experiment(cnn_ex_id)
# Load model and predict
cnn_model = tf.keras.models.load_model(repo_root + "models/" + cnn_ex_id + ".h5", compile=False)
cnn_test = regression_metrics(cnn_model, test_images[d_idx], normalize_position_data(test_positions[d_idx]), "cnn_test")
del cnn_model

cnn_metrics = experiment_metrics_to_df(cnn_ex)
#display(cnn_metrics)
cnn_means = cnn_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
cnn_means = cnn_means.rename(index={'mean': 'cnn_mean', 'std': 'cnn_std'})
#display(cnn_means)

#### Pretrained - VGG
As for single events, VGG is included as an additional pretrained baseline, and the network is cut
before its max-pooling layers shrink the 16x16 detector images too far (see the single-event position section above).

In [21]:
# Load pretrained regression experiment
#pretrained_ex_id = "3c0d1b7bd0ac"
pretrained_ex_id = "80f9eeddec6d" # latest
pretrained_ex = load_experiment(pretrained_ex_id)
# Load model and predict
pretrained_model = tf.keras.models.load_model(repo_root + "models/" + pretrained_ex_id + ".h5", compile=False)
pretrained_test = regression_metrics(pretrained_model, np.concatenate((test_images, test_images, test_images), axis=-1)[d_idx], normalize_position_data(test_positions[d_idx]), "pretrained_test")
del pretrained_model

pretrained_metrics = experiment_metrics_to_df(pretrained_ex)
#display(pretrained_metrics)
pretrained_means = pretrained_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
pretrained_means = pretrained_means.rename(index={'mean': 'pretrained_mean', 'std': 'pretrained_std'})
#display(pretrained_means)

#### Custom model

In [22]:
# Load custom regression experiment
#custom_ex_id = "468fefa67787"
custom_ex_id = "f28df84924ea" # latest
custom_ex = load_experiment(custom_ex_id)
# Load model and predict
custom_model = tf.keras.models.load_model(repo_root + "models/" + custom_ex_id + ".h5", compile=False)
custom_test = regression_metrics(custom_model, test_images[d_idx], normalize_position_data(test_positions[d_idx]), "custom_test")
del custom_model

custom_metrics = experiment_metrics_to_df(custom_ex)
#display(custom_metrics)
custom_means = custom_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
custom_means = custom_means.rename(index={'mean': 'custom_mean', 'std': 'custom_std'})
#display(custom_means)

#### Output
We use the standard deviation of each metric across the cross-validation folds as the error estimate, and report the R2 score on the test set.

In [23]:
all_means_double_pos = pd.DataFrame(
    [
        lin_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        dense_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        custom_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_mean': 'Linear',
        'dense_mean': 'Dense',
        'cnn_mean': 'CNN',
        'pretrained_mean': 'Pretrained',
        'custom_mean': 'Custom',
    }
)

all_std_double_pos = pd.DataFrame(
    [
        lin_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        dense_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        custom_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_std': 'Linear',
        'dense_std': 'Dense',
        'cnn_std': 'CNN',
        'pretrained_std': 'Pretrained',
        'custom_std': 'Custom',
    }
)
all_test_double_pos = pd.DataFrame(
    [
        lin_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        dense_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        custom_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_test': 'Linear',
        'dense_test': 'Dense',
        'cnn_test': 'CNN',
        'pretrained_test': 'Pretrained',
        'custom_test': 'Custom',
    }
)
display(all_test_double_pos)
display(all_std_double_pos)

| Model | r2_score | mse | rmse | mae |
|---|---|---|---|---|
| Linear | 0.364356 | 0.044569 | 0.211113 | 0.168144 |
| Dense | 0.434857 | 0.039625 | 0.199061 | 0.159962 |
| CNN | 0.470012 | 0.03716 | 0.192771 | 0.157599 |
| Pretrained | 0.239753 | 0.053304 | 0.230877 | 0.192363 |
| Custom | -0.000341 | 0.07014 | 0.264839 | 0.22931 |


| Model | r2_score | mse | rmse | mae |
|---|---|---|---|---|
| Linear | 0.005796 | 0.000432 | 0.001021 | 0.001438 |
| Dense | 0.013889 | 0.000966 | 0.00247 | 0.001453 |
| CNN | 0.001945 | 0.000141 | 0.000367 | 0.00058 |
| Pretrained | 0.002085 | 0.000174 | 0.000459 | 0.000582 |
| Custom | 0.271061 | 0.018968 | 0.041886 | 0.041471 |


In [24]:
rows = all_test_double_pos.index
r2_str_array_double_pos = np.zeros((1, all_test_double_pos.shape[0]), dtype=object)
for i in range(all_test_double_pos.shape[0]):
    # R2 on the test set, with the fold standard deviation set underneath
    r2_str_array_double_pos[0, i] = r"$\underset{{\num{{+- {:.3e} }}  }}{{\num{{ {:.3g} }} }}$".format(
        all_std_double_pos["r2_score"].iloc[i], all_test_double_pos["r2_score"].iloc[i])
        
r2_df_double_pos = pd.DataFrame(r2_str_array_double_pos, columns=rows)

section_path = "chapters/results/figures/"
fname = THESIS_PATH + section_path + "regression_simulated_double_position_r2.tex"
caption = """
R2-scores on the test set for regression of positions of origin, on double events in simulated data, using multiple models. 
Error estimates are the standard deviation of the R2-scores from k-fold cross-validation 
with $K=5$ folds.
"""
label = "tab:regression-simulated-double-position-r2"
with open(fname, "w") as fp:
    # Never truncate the long LaTeX cell strings (None replaces the deprecated -1)
    pd.set_option('display.max_colwidth', None)
    r2_df_double_pos.to_latex(fp, escape=False, caption=caption, label=label, index=False)


### Energy

#### Linear regression

In [25]:
# Load linear regression experiment 
#lin_ex_id = "6e600e08e8af"
lin_ex_id = "619e37880a0b" # latest
lin_ex = load_experiment(lin_ex_id)

# Load model and predict
lin_model = tf.keras.models.load_model(repo_root + "models/" + lin_ex_id + ".h5", compile=False)
lin_test = regression_metrics(lin_model, test_images.reshape(test_images.shape[0], 256)[d_idx], test_energies[d_idx], "lin_test")
del lin_model #No longer needed, clear memory just in case.

lin_metrics = experiment_metrics_to_df(lin_ex)
#display(lin_metrics)
lin_means = lin_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
lin_means = lin_means.rename(index={'mean': 'lin_mean', 'std': 'lin_std'})
#display(lin_means)

#### Small dense network

In [26]:
# Load dense regression experiment 
#dense_ex_id = "96cd3707d131"
dense_ex_id = "e6fb16ca2fcf" # latest
dense_ex = load_experiment(dense_ex_id)
# Load model and predict
dense_model = tf.keras.models.load_model(repo_root + "models/" + dense_ex_id + ".h5", compile=False)
dense_test = regression_metrics(dense_model, test_images.reshape(test_images.shape[0], 256)[d_idx], test_energies[d_idx], "dense_test")
del dense_model

dense_metrics = experiment_metrics_to_df(dense_ex)
#display(dense_metrics)
dense_means = dense_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
dense_means = dense_means.rename(index={'mean': 'dense_mean', 'std': 'dense_std'})
#display(dense_means)

#### Small CNN

In [27]:
# Load cnn regression experiment 
#cnn_ex_id = "f41605cb58b4"
cnn_ex_id = "74cd811a44dc" # latest
cnn_ex = load_experiment(cnn_ex_id)
# Load model and predict
cnn_model = tf.keras.models.load_model(repo_root + "models/" + cnn_ex_id + ".h5", compile=False)
cnn_test = regression_metrics(cnn_model, test_images[d_idx], test_energies[d_idx], "cnn_test")
del cnn_model

cnn_metrics = experiment_metrics_to_df(cnn_ex)
#display(cnn_metrics)
cnn_means = cnn_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
cnn_means = cnn_means.rename(index={'mean': 'cnn_mean', 'std': 'cnn_std'})
#display(cnn_means)

#### Pretrained - VGG16 

In [28]:
# Load pretrained regression experiment 
#pretrained_ex_id = "9f33b3fc7fff"
pretrained_ex_id = "984b053b562c" # latest
pretrained_ex = load_experiment(pretrained_ex_id)
# Load model and predict
pretrained_model = tf.keras.models.load_model(repo_root + "models/" + pretrained_ex_id + ".h5", compile=False)
pretrained_test = regression_metrics(pretrained_model, np.concatenate((test_images, test_images, test_images), axis=-1)[d_idx], test_energies[d_idx], "pretrained_test")
del pretrained_model

pretrained_metrics = experiment_metrics_to_df(pretrained_ex)
#display(pretrained_metrics)
pretrained_means = pretrained_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
pretrained_means = pretrained_means.rename(index={'mean': 'pretrained_mean', 'std': 'pretrained_std'})
#display(pretrained_means)

#### Custom model

In [29]:
# Load custom regression experiment 
#custom_ex_id = "6bab88fbd66f"
custom_ex_id = "8dac4fb88e4c" # latest
custom_ex = load_experiment(custom_ex_id)
# Load model and predict
custom_model = tf.keras.models.load_model(repo_root + "models/" + custom_ex_id + ".h5", compile=False)
custom_test = regression_metrics(custom_model, test_images[d_idx], test_energies[d_idx], "custom_test")
del custom_model

custom_metrics = experiment_metrics_to_df(custom_ex)
#display(custom_metrics)
custom_means = custom_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
custom_means = custom_means.rename(index={'mean': 'custom_mean', 'std': 'custom_std'})
#display(custom_means)

#### Output

In [30]:
all_means_double_energy = pd.DataFrame(
    [
        lin_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        dense_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        custom_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_mean': 'Linear',
        'dense_mean': 'Dense',
        'cnn_mean': 'CNN',
        'pretrained_mean': 'Pretrained',
        'custom_mean': 'Custom',
    }
)

all_std_double_energy = pd.DataFrame(
    [
        lin_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        dense_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        custom_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_std': 'Linear',
        'dense_std': 'Dense',
        'cnn_std': 'CNN',
        'pretrained_std': 'Pretrained',
        'custom_std': 'Custom',
    }
)
all_test_double_energy = pd.DataFrame(
    [
        lin_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        dense_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        custom_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_test': 'Linear',
        'dense_test': 'Dense',
        'cnn_test': 'CNN',
        'pretrained_test': 'Pretrained',
        'custom_test': 'Custom',
    }
)
display(all_test_double_energy)
display(all_std_double_energy)

| Model | r2_score | mse | rmse | mae |
|---|---|---|---|---|
| Linear | 0.489732 | 0.04256 | 0.206301 | 0.168099 |
| Dense | 0.489818 | 0.042553 | 0.206284 | 0.168101 |
| CNN | 0.487877 | 0.042715 | 0.206676 | 0.16849 |
| Pretrained | 0.490902 | 0.042462 | 0.206064 | 0.168003 |
| Custom | -0.000157 | 0.083421 | 0.288827 | 0.250207 |


| Model | r2_score | mse | rmse | mae |
|---|---|---|---|---|
| Linear | 0.033465 | 0.002792 | 0.006378 | 0.005136 |
| Dense | 0.030867 | 0.002578 | 0.005907 | 0.004767 |
| CNN | 0.042569 | 0.003534 | 0.007993 | 0.006444 |
| Pretrained | 0.02398 | 0.00201 | 0.004602 | 0.003707 |
| Custom | 0.209123 | 0.01745 | 0.034939 | 0.034835 |


In [31]:
rows = all_test_double_energy.index
r2_str_array_double_energy = np.zeros((1, all_test_double_energy.shape[0]), dtype=object)
for i in range(all_test_double_energy.shape[0]):
    r2_str_array_double_energy[0, i] = r"$\underset{{\num{{+- {:.3e} }}  }}{{\num{{ {:.3g} }} }}$".format(
        all_std_double_energy["r2_score"][i], all_test_double_energy["r2_score"][i])
        
r2_df_double_energy = pd.DataFrame(r2_str_array_double_energy, columns=rows)

section_path = "chapters/results/figures/"
fname = THESIS_PATH + section_path + "regression_simulated_double_energy_r2.tex"
caption = """
$R^2$-scores on the test set for regression of energy values on double events in simulated data,
using multiple models. Error estimates are the standard deviation in results from k-fold
cross-validation with $K=5$ folds.
"""
label = "tab:regression-simulated-double-energy-r2"
with open(fname, "w") as fp:
    pd.set_option('display.max_colwidth', None)  # no truncation of long cell strings (-1 is deprecated)
    r2_df_double_energy.to_latex(fp, escape=False, caption=caption, label=label, index=False)

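Every LaTeX table in this notebook builds its cells with the same format string. A small helper would keep that formatting in one place; the sketch below is a suggestion, and `format_r2_cell` and `r2_latex_row` are hypothetical names. The template is the one used in the cell above (the fold standard deviation set under the value with siunitx `\num`).

```python
def format_r2_cell(value: float, std: float) -> str:
    """Return the LaTeX cell used in the regression tables: the value with three
    significant digits, and the fold standard deviation set underneath it."""
    # Same template as the notebook cells; doubled braces are literal braces.
    template = r"$\underset{{\num{{+- {:.3e} }}  }}{{\num{{ {:.3g} }} }}$"
    return template.format(std, value)


def r2_latex_row(test_scores, std_scores):
    """Format one table row from two equally ordered sequences of scores."""
    if len(test_scores) != len(std_scores):
        raise ValueError("test and std sequences must have the same length")
    return [format_r2_cell(v, s) for v, s in zip(test_scores, std_scores)]
```

In the notebook this could replace the loop, e.g. `pd.DataFrame([r2_latex_row(all_test_double_energy["r2_score"].tolist(), all_std_double_energy["r2_score"].tolist())], columns=all_test_double_energy.index)`. Passing plain lists also sidesteps positional `Series[i]` lookups on a string index.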

# Pre-processed simulated data - Pixel modified
These are the basic metrics for all models trained on the pixel-modified simulated data.
The basic pre-processing includes formatting and min-max normalization.
Additionally, the top and bottom rows of pixels have been set to 0, and one pixel inside
the detector is permanently 0 (which index again?).
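Min-max normalization rescales the pixel intensities into [0, 1]. The sketch below assumes a single global minimum and maximum over the whole array; the actual `normalize_image_data` in `master_scripts` is not shown here and may differ (for example by scaling each image separately). A global scale keeps the relative intensities between events, which presumably matters for energy regression.

```python
import numpy as np


def minmax_normalize(images: np.ndarray) -> np.ndarray:
    """Scale an image array to [0, 1] using one global min and max (assumption)."""
    images = images.astype(np.float64)
    lo, hi = images.min(), images.max()
    if hi == lo:
        # Constant input: avoid division by zero and return all zeros.
        return np.zeros_like(images)
    return (images - lo) / (hi - lo)


example = np.array([[[2.0, 4.0], [6.0, 10.0]]])  # one 2x2 "image"
print(minmax_normalize(example))
```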

## Single events

### Positions

#### Linear Regression

In [32]:
# Load linear regression experiment
lin_ex_id = "d65ec088580a"
lin_ex = load_experiment(lin_ex_id)

# Load model and predict
lin_model = tf.keras.models.load_model(repo_root + "models/" + lin_ex_id + ".h5", compile=False)
lin_test = regression_metrics(lin_model, test_images.reshape(test_images.shape[0], 256)[s_idx], normalize_position_data(test_positions[s_idx])[:,:2], "lin_test")
del lin_model #No longer needed, clear memory just in case.

lin_metrics = experiment_metrics_to_df(lin_ex)
#display(lin_metrics)
lin_means = lin_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
lin_means = lin_means.rename(index={'mean': 'lin_mean', 'std': 'lin_std'})
#display(lin_means)

#### Small dense network

In [33]:
# Load dense network regression experiment
dense_ex_id = "2218dcb0de80"
dense_ex = load_experiment(dense_ex_id)
# Load model and predict
dense_model = tf.keras.models.load_model(repo_root + "models/" + dense_ex_id + ".h5", compile=False)
dense_test = regression_metrics(dense_model, test_images.reshape(test_images.shape[0], 256)[s_idx], normalize_position_data(test_positions[s_idx])[:,:2], "dense_test")
del dense_model

dense_metrics = experiment_metrics_to_df(dense_ex)
#display(dense_metrics)
dense_means = dense_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
dense_means = dense_means.rename(index={'mean': 'dense_mean', 'std': 'dense_std'})
#display(dense_means)

#### Small CNN

In [34]:
# Load CNN regression experiment
cnn_ex_id = "3a70de184f3c"
cnn_ex = load_experiment(cnn_ex_id)
# Load model and predict
cnn_model = tf.keras.models.load_model(repo_root + "models/" + cnn_ex_id + ".h5", compile=False)
cnn_test = regression_metrics(cnn_model, test_images[s_idx], normalize_position_data(test_positions[s_idx])[:,:2], "cnn_test")
del cnn_model

cnn_metrics = experiment_metrics_to_df(cnn_ex)
#display(cnn_metrics)
cnn_means = cnn_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
cnn_means = cnn_means.rename(index={'mean': 'cnn_mean', 'std': 'cnn_std'})
#display(cnn_means)

#### Pretrained - VGG
As an additional performance baseline, we include a state-of-the-art network pretrained on the
ImageNet database.

Our detector images (16x16) are much smaller than the 224x224 inputs the VGG network was designed for,
so we cannot use all of its layers. The limiting factor is max-pooling: each pooling layer halves the
height and width of its input (16x16 becomes 8x8, then 4x4, and so on), so after a few such layers
the input is too small to pass through the rest of the network.
We therefore cut the network at the point where this becomes an issue.
Alternatively, one could possibly keep the full depth but remove the max-pooling layers.
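The size argument can be checked with a short calculation. VGG16 consists of five convolutional blocks, each ending in a 2x2 max-pooling layer with stride 2, while its 3x3 convolutions are padded so they preserve the spatial size. The sketch below (a hypothetical helper, assuming "valid" pooling) tracks the side length of a square input through those pooling layers.

```python
from typing import List


def pooled_sizes(input_size: int, n_pools: int = 5, pool: int = 2, stride: int = 2) -> List[int]:
    """Side length of a square feature map after each 'valid' max-pooling layer.

    Stops early when the feature map is smaller than the pooling window,
    i.e. when the next pooling layer can no longer be applied.
    """
    sizes = [input_size]
    for _ in range(n_pools):
        current = sizes[-1]
        if current < pool:
            break
        sizes.append((current - pool) // stride + 1)
    return sizes


print(pooled_sizes(224))  # input size VGG16 was trained on
print(pooled_sizes(16))   # our detector images
```

For 224x224 inputs this gives `[224, 112, 56, 28, 14, 7]`, whereas a 16x16 image is already 1x1 after four pooling layers (`[16, 8, 4, 2, 1]`), so the fifth pooling layer cannot be applied.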

In [35]:
# Load pretrained (VGG) regression experiment
pretrained_ex_id = "b5223ba6beaa"
pretrained_ex = load_experiment(pretrained_ex_id)
# Load model and predict
pretrained_model = tf.keras.models.load_model(repo_root + "models/" + pretrained_ex_id + ".h5", compile=False)
pretrained_test = regression_metrics(pretrained_model, np.concatenate((test_images, test_images, test_images), axis=-1)[s_idx], normalize_position_data(test_positions[s_idx])[:,:2], "pretrained_test")
del pretrained_model

pretrained_metrics = experiment_metrics_to_df(pretrained_ex)
#display(pretrained_metrics)
pretrained_means = pretrained_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
pretrained_means = pretrained_means.rename(index={'mean': 'pretrained_mean', 'std': 'pretrained_std'})
#display(pretrained_means)

#### Custom model

In [36]:
# Load custom regression experiment
custom_ex_id = "379bca43b134"
custom_ex = load_experiment(custom_ex_id)
# Load model and predict
custom_model = tf.keras.models.load_model(repo_root + "models/" + custom_ex_id + ".h5", compile=False)
custom_test = regression_metrics(custom_model, test_images[s_idx], normalize_position_data(test_positions[s_idx])[:,:2], "custom_test")
del custom_model

custom_metrics = experiment_metrics_to_df(custom_ex)
#display(custom_metrics)
custom_means = custom_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
custom_means = custom_means.rename(index={'mean': 'custom_mean', 'std': 'custom_std'})
#display(custom_means)

#### Output
We use the standard deviation of each metric across the cross-validation folds as the error measure, and report the $R^2$-score on the test set.
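The error estimate is the spread of each metric over the $K=5$ folds. When `agg([np.mean, np.std])` is applied to a DataFrame, pandas dispatches `np.std` to its own `std` method, i.e. the sample standard deviation with `ddof=1` rather than NumPy's default `ddof=0`; recent pandas versions emit a FutureWarning about this and suggest passing the string `'std'`. The stdlib sketch below reproduces the computation for a single metric; `fold_summary` is a hypothetical helper and the fold scores are made up.

```python
import statistics


def fold_summary(fold_scores):
    """Mean and sample standard deviation (ddof=1) of a metric across folds,
    matching pandas' DataFrame.agg(['mean', 'std'])."""
    if len(fold_scores) < 2:
        raise ValueError("need at least two folds for a sample standard deviation")
    return statistics.mean(fold_scores), statistics.stdev(fold_scores)


# Illustrative R2-scores for five folds (not real results).
r2_folds = [0.70, 0.72, 0.74, 0.76, 0.78]
mean_r2, std_r2 = fold_summary(r2_folds)
print(f"R2 = {mean_r2:.3f} +- {std_r2:.3f}")
```

In the notebook, `lin_metrics.agg(['mean', 'std'])` yields the same two rows (labelled `mean` and `std`) without relying on that dispatch.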

In [37]:
all_means_single_pos_pmod = pd.DataFrame(
    [
        lin_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        dense_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        custom_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_mean': 'Linear',
        'dense_mean': 'Dense',
        'cnn_mean': 'CNN',
        'pretrained_mean': 'Pretrained',
        'custom_mean': 'Custom',
    }
)

all_std_single_pos_pmod = pd.DataFrame(
    [
        lin_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        dense_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        custom_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_std': 'Linear',
        'dense_std': 'Dense',
        'cnn_std': 'CNN',
        'pretrained_std': 'Pretrained',
        'custom_std': 'Custom',
    }
)
all_test_single_pos_pmod = pd.DataFrame(
    [
        lin_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        dense_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        custom_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_test': 'Linear',
        'dense_test': 'Dense',
        'cnn_test': 'CNN',
        'pretrained_test': 'Pretrained',
        'custom_test': 'Custom',
    }
)
display(all_test_single_pos_pmod)
display(all_std_single_pos_pmod)

| Model (test set) | r2_score | mse | rmse | mae |
|---|---|---|---|---|
| Linear | 0.776153 | 0.015702 | 0.125308 | 0.094354 |
| Dense | 0.987185 | 0.000899 | 0.02998 | 0.019358 |
| CNN | 0.987682 | 0.000863 | 0.029382 | 0.018213 |
| Pretrained | 0.87259 | 0.00894 | 0.094553 | 0.061185 |
| Custom | 0.997204 | 0.000196 | 0.014002 | 0.006549 |


| Model (std over folds) | r2_score | mse | rmse | mae |
|---|---|---|---|---|
| Linear | 0.002737 | 0.0002 | 0.000845 | 0.001267 |
| Dense | 0.00068 | 4.7e-05 | 0.000827 | 0.000669 |
| CNN | 0.001015 | 7.1e-05 | 0.001954 | 0.001231 |
| Pretrained | 0.017231 | 0.001193 | 0.006636 | 0.007076 |
| Custom | 0.000211 | 1.5e-05 | 0.00093 | 0.000557 |


In [38]:
rows = all_test_single_pos_pmod.index
r2_str_array_single_pos_pmod = np.zeros((1, all_test_single_pos_pmod.shape[0]), dtype=object)
for i in range(all_test_single_pos_pmod.shape[0]):
    r2_str_array_single_pos_pmod[0, i] = r"$\underset{{\num{{+- {:.3e} }}  }}{{\num{{ {:.3g} }} }}$".format(
        all_std_single_pos_pmod["r2_score"].iloc[i], all_test_single_pos_pmod["r2_score"].iloc[i])
        
r2_df_single_pos_pmod = pd.DataFrame(r2_str_array_single_pos_pmod, columns=rows)

section_path = "chapters/results/figures/"
fname = THESIS_PATH + section_path + "regression_simulated_single_position_pixelmod_r2.tex"
caption = """
$R^2$-scores on the test set for regression of positions of origin on single events, using multiple
models trained on simulated data with specific pixels set to zero.
Error estimates are the standard deviation in results from k-fold cross-validation
with $K=5$ folds.
"""
label = "tab:regression-simulated-single-position-pixelmod-r2"
with open(fname, "w") as fp:
    pd.set_option('display.max_colwidth', None)  # no truncation of long cell strings (-1 is deprecated)
    r2_df_single_pos_pmod.to_latex(fp, escape=False, caption=caption, label=label, index=False)


### Energy

#### Linear regression

In [39]:
# Load linear regression experiment
lin_ex_id = "7dfe302a7c09"
lin_ex = load_experiment(lin_ex_id)

# Load model and predict
lin_model = tf.keras.models.load_model(repo_root + "models/" + lin_ex_id + ".h5", compile=False)
lin_test = regression_metrics(lin_model, test_images.reshape(test_images.shape[0], 256)[s_idx], test_energies[s_idx,0], "lin_test")
del lin_model #No longer needed, clear memory just in case.

lin_metrics = experiment_metrics_to_df(lin_ex)
#display(lin_metrics)
lin_means = lin_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
lin_means = lin_means.rename(index={'mean': 'lin_mean', 'std': 'lin_std'})
#display(lin_means)

#### Small dense network

In [40]:
# Load dense regression experiment
dense_ex_id = "2dbd6c697bc5"
dense_ex = load_experiment(dense_ex_id)
# Load model and predict
dense_model = tf.keras.models.load_model(repo_root + "models/" + dense_ex_id + ".h5", compile=False)
dense_test = regression_metrics(dense_model, test_images.reshape(test_images.shape[0], 256)[s_idx], test_energies[s_idx,0], "dense_test")
del dense_model

dense_metrics = experiment_metrics_to_df(dense_ex)
#display(dense_metrics)
dense_means = dense_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
dense_means = dense_means.rename(index={'mean': 'dense_mean', 'std': 'dense_std'})
#display(dense_means)

#### Small CNN
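This model is very sensitive to the pixel modifications: on the unmodified test data it scores an $R^2$ of -0.127648 (see the output below),
but it performs similarly to the other models if the same pixel modification is applied to the test data.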
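The pixel modification can be applied to a copy of the test images before evaluation. The sketch below assumes the modification is exactly the one in the commented-out lines of the next cell (rows 0 and 15 and the pixel at index `[3, 13]` set to zero); `pixel_modify` is a hypothetical helper name.

```python
import numpy as np


def pixel_modify(images: np.ndarray) -> np.ndarray:
    """Return a copy of (N, 16, 16[, C]) images with the modified pixels set to zero.

    The pixel indices follow the commented-out code in the notebook (assumption).
    """
    modified = images.copy()  # leave the original test set untouched
    modified[:, 0, :] = 0     # top row of pixels
    modified[:, 15, :] = 0    # bottom row of pixels
    modified[:, 3, 13] = 0    # single pixel inside the detector that is always zero
    return modified


images = np.ones((2, 16, 16, 1))
modified = pixel_modify(images)
print(images.sum(), modified.sum())
```

In the cell below this would be used as `regression_metrics(cnn_model, pixel_modify(test_images[s_idx]), test_energies[s_idx, 0], "cnn_test")`, selecting the single events first so only that subset is copied.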

In [41]:
# Load cnn regression experiment
cnn_ex_id = "fb0685871cf3"
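cnn_ex = load_experiment(cnn_ex_id)
# Load model and predict
# To evaluate on pixel-modified test data instead, uncomment the lines below
# and pass tmp_images[s_idx] to regression_metrics:
#tmp_images = test_images.copy()
#tmp_images[:, 3, 13] = 0  # single pixel inside the detector
#tmp_images[:, 0, :] = 0   # top row
#tmp_images[:, 15, :] = 0  # bottom row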
cnn_model = tf.keras.models.load_model(repo_root + "models/" + cnn_ex_id + ".h5", compile=False)
cnn_test = regression_metrics(cnn_model, test_images[s_idx], test_energies[s_idx,0], "cnn_test")
del cnn_model

cnn_metrics = experiment_metrics_to_df(cnn_ex)
#display(cnn_metrics)
cnn_means = cnn_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
cnn_means = cnn_means.rename(index={'mean': 'cnn_mean', 'std': 'cnn_std'})
#display(cnn_means)
#display(cnn_test)

#### Pretrained - VGG16 

In [42]:
# Load pretrained (VGG16) regression experiment
pretrained_ex_id = "8aa9f731b693"
pretrained_ex = load_experiment(pretrained_ex_id)
# Load model and predict
pretrained_model = tf.keras.models.load_model(repo_root + "models/" + pretrained_ex_id + ".h5", compile=False)
pretrained_test = regression_metrics(pretrained_model, np.concatenate((test_images, test_images, test_images), axis=-1)[s_idx], test_energies[s_idx,0], "pretrained_test")
del pretrained_model

pretrained_metrics = experiment_metrics_to_df(pretrained_ex)
#display(pretrained_metrics)
pretrained_means = pretrained_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
pretrained_means = pretrained_means.rename(index={'mean': 'pretrained_mean', 'std': 'pretrained_std'})
#display(pretrained_means)

#### Custom model

In [43]:
# Load custom regression experiment
custom_ex_id = "02c59a04c095"
custom_ex = load_experiment(custom_ex_id)
# Load model and predict
custom_model = tf.keras.models.load_model(repo_root + "models/" + custom_ex_id + ".h5", compile=False)
custom_test = regression_metrics(custom_model, test_images[s_idx], test_energies[s_idx,0], "custom_test")
del custom_model

custom_metrics = experiment_metrics_to_df(custom_ex)
#display(custom_metrics)
custom_means = custom_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
custom_means = custom_means.rename(index={'mean': 'custom_mean', 'std': 'custom_std'})
#display(custom_means)

#### Output

In [44]:
all_means_single_energy_pmod = pd.DataFrame(
    [
        lin_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        dense_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        custom_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_mean': 'Linear',
        'dense_mean': 'Dense',
        'cnn_mean': 'CNN',
        'pretrained_mean': 'Pretrained',
        'custom_mean': 'Custom',
    }
)

all_std_single_energy_pmod = pd.DataFrame(
    [
        lin_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        dense_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        custom_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_std': 'Linear',
        'dense_std': 'Dense',
        'cnn_std': 'CNN',
        'pretrained_std': 'Pretrained',
        'custom_std': 'Custom',
    }
)

all_test_single_energy_pmod = pd.DataFrame(
    [
        lin_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        dense_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        custom_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_test': 'Linear',
        'dense_test': 'Dense',
        'cnn_test': 'CNN',
        'pretrained_test': 'Pretrained',
        'custom_test': 'Custom',
    }
)
display(all_test_single_energy_pmod)
display(all_std_single_energy_pmod)

| Model (test set) | r2_score | mse | rmse | mae |
|---|---|---|---|---|
| Linear | 0.738587 | 0.021878 | 0.147912 | 0.122523 |
| Dense | 0.754008 | 0.020587 | 0.143483 | 0.120516 |
| CNN | -0.127648 | 0.094375 | 0.307205 | 0.205946 |
| Pretrained | 0.72836 | 0.022734 | 0.150778 | 0.125227 |
| Custom | 0.733101 | 0.022337 | 0.149457 | 0.123951 |


| Model (std over folds) | r2_score | mse | rmse | mae |
|---|---|---|---|---|
| Linear | 0.024595 | 0.002042 | 0.016237 | 0.01877 |
| Dense | 0.022607 | 0.001877 | 0.016135 | 0.019852 |
| CNN | 0.024045 | 0.001997 | 0.017148 | 0.021458 |
| Pretrained | 0.014177 | 0.001176 | 0.008603 | 0.0113 |
| Custom | 0.028649 | 0.002379 | 0.0195 | 0.023297 |


In [45]:
rows = all_test_single_energy_pmod.index
r2_str_array_single_energy_pmod = np.zeros((1, all_test_single_energy_pmod.shape[0]), dtype=object)
for i in range(all_test_single_energy_pmod.shape[0]):
    r2_str_array_single_energy_pmod[0, i] = r"$\underset{{\num{{+- {:.3e} }}  }}{{\num{{ {:.3g} }} }}$".format(
        all_std_single_energy_pmod["r2_score"].iloc[i], all_test_single_energy_pmod["r2_score"].iloc[i])
        
r2_df_single_energy_pmod = pd.DataFrame(r2_str_array_single_energy_pmod, columns=rows)

section_path = "chapters/results/figures/"
fname = THESIS_PATH + section_path + "regression_simulated_single_energy_pixelmod_r2.tex"
caption = """
$R^2$-scores on the test set for regression of energy values on single events, using multiple
models trained on simulated data with specific pixels set to zero.
Error estimates are the standard deviation in results from k-fold cross-validation
with $K=5$ folds.
"""
label = "tab:regression-simulated-single-energy-pixelmod-r2"
with open(fname, "w") as fp:
    pd.set_option('display.max_colwidth', None)  # no truncation of long cell strings (-1 is deprecated)
    r2_df_single_energy_pmod.to_latex(fp, escape=False, caption=caption, label=label, index=False)


## Double events

### Positions

#### Linear Regression

In [46]:
# Load linear regression experiment
lin_ex_id = "2c62e711e234"
lin_ex = load_experiment(lin_ex_id)

# Load model and predict
lin_model = tf.keras.models.load_model(repo_root + "models/" + lin_ex_id + ".h5", compile=False)
lin_test = regression_metrics(lin_model, test_images.reshape(test_images.shape[0], 256)[d_idx], normalize_position_data(test_positions[d_idx]), "lin_test")
del lin_model #No longer needed, clear memory just in case.

lin_metrics = experiment_metrics_to_df(lin_ex)
#display(lin_metrics)
lin_means = lin_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
lin_means = lin_means.rename(index={'mean': 'lin_mean', 'std': 'lin_std'})
#display(lin_means)

#### Small dense network

In [47]:
# Load dense network regression experiment
dense_ex_id = "4cea43be5aa4"
dense_ex = load_experiment(dense_ex_id)
# Load model and predict
dense_model = tf.keras.models.load_model(repo_root + "models/" + dense_ex_id + ".h5", compile=False)
dense_test = regression_metrics(dense_model, test_images.reshape(test_images.shape[0], 256)[d_idx], normalize_position_data(test_positions[d_idx]), "dense_test")
del dense_model

dense_metrics = experiment_metrics_to_df(dense_ex)
#display(dense_metrics)
dense_means = dense_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
dense_means = dense_means.rename(index={'mean': 'dense_mean', 'std': 'dense_std'})
#display(dense_means)

#### Small CNN

In [48]:
# Load CNN regression experiment
cnn_ex_id = "7960fa803199"
cnn_ex = load_experiment(cnn_ex_id)
# Load model and predict
cnn_model = tf.keras.models.load_model(repo_root + "models/" + cnn_ex_id + ".h5", compile=False)
cnn_test = regression_metrics(cnn_model, test_images[d_idx], normalize_position_data(test_positions[d_idx]), "cnn_test")
del cnn_model

cnn_metrics = experiment_metrics_to_df(cnn_ex)
#display(cnn_metrics)
cnn_means = cnn_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
cnn_means = cnn_means.rename(index={'mean': 'cnn_mean', 'std': 'cnn_std'})
#display(cnn_means)

#### Pretrained - VGG
As an additional performance baseline, we include a state-of-the-art network pretrained on the
ImageNet database.

Our detector images (16x16) are much smaller than the 224x224 inputs the VGG network was designed for,
so we cannot use all of its layers. The limiting factor is max-pooling: each pooling layer halves the
height and width of its input (16x16 becomes 8x8, then 4x4, and so on), so after a few such layers
the input is too small to pass through the rest of the network.
We therefore cut the network at the point where this becomes an issue.
Alternatively, one could possibly keep the full depth but remove the max-pooling layers.
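The pretrained network expects three input channels, so the cells in this notebook stack the single-channel detector image three times with `np.concatenate`. The sketch below shows that `np.repeat` along the last axis gives the same array, and that selecting the events first (e.g. `test_images[d_idx]`) before replicating avoids building a three-channel copy of the full test set. It assumes channels-last arrays of shape (N, 16, 16, 1); the random data and indices are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((10, 16, 16, 1))  # stand-in for test_images (channels last)
idx = np.array([1, 4, 7])             # stand-in for d_idx

# Pattern used in the notebook: replicate the full array, then select events.
via_concat = np.concatenate((images, images, images), axis=-1)[idx]

# Equivalent: select events first, then repeat the channel axis three times.
via_repeat = np.repeat(images[idx], 3, axis=-1)

print(via_repeat.shape, np.array_equal(via_concat, via_repeat))
```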

In [49]:
# Load pretrained (VGG) regression experiment
pretrained_ex_id = "4f70fd9e6d8a"
pretrained_ex = load_experiment(pretrained_ex_id)
# Load model and predict
pretrained_model = tf.keras.models.load_model(repo_root + "models/" + pretrained_ex_id + ".h5", compile=False)
pretrained_test = regression_metrics(pretrained_model, np.concatenate((test_images, test_images, test_images), axis=-1)[d_idx], normalize_position_data(test_positions[d_idx]), "pretrained_test")
del pretrained_model

pretrained_metrics = experiment_metrics_to_df(pretrained_ex)
#display(pretrained_metrics)
pretrained_means = pretrained_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
pretrained_means = pretrained_means.rename(index={'mean': 'pretrained_mean', 'std': 'pretrained_std'})
#display(pretrained_means)

#### Custom model

In [50]:
# Load custom regression experiment
custom_ex_id = "98ea91d193ba"
custom_ex = load_experiment(custom_ex_id)
# Load model and predict
custom_model = tf.keras.models.load_model(repo_root + "models/" + custom_ex_id + ".h5", compile=False)
custom_test = regression_metrics(custom_model, test_images[d_idx], normalize_position_data(test_positions[d_idx]), "custom_test")
del custom_model

custom_metrics = experiment_metrics_to_df(custom_ex)
#display(custom_metrics)
custom_means = custom_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
custom_means = custom_means.rename(index={'mean': 'custom_mean', 'std': 'custom_std'})
#display(custom_means)

#### Output
We use the standard deviation of each metric across the cross-validation folds as the error measure, and report the $R^2$-score on the test set.
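The tables report the coefficient of determination, $R^2 = 1 - \sum_i (y_i - \hat y_i)^2 / \sum_i (y_i - \bar y)^2$. A model that always predicts the mean of the targets scores 0, and anything worse scores below 0. Values such as the custom model's -0.000157 on double-event energies therefore mean it does about as well as predicting the mean, and the CNN's -0.127648 on single-event energies (pixel-modified section) means it does worse. The sketch below checks this on made-up numbers; `r2` is a hypothetical helper.

```python
import numpy as np


def r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination for a single target, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


y_true = np.array([1.0, 2.0, 3.0, 4.0])
perfect = y_true.copy()
constant = np.full_like(y_true, y_true.mean())  # always predict the mean
reversed_pred = y_true[::-1]                    # worse than predicting the mean

print(r2(y_true, perfect), r2(y_true, constant), r2(y_true, reversed_pred))
```

For a single target this matches `sklearn.metrics.r2_score`; for several targets, `r2_score` by default averages the per-target scores (`multioutput='uniform_average'`).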

In [51]:
all_means_double_pos_pmod = pd.DataFrame(
    [
        lin_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        dense_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        custom_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_mean': 'Linear',
        'dense_mean': 'Dense',
        'cnn_mean': 'CNN',
        'pretrained_mean': 'Pretrained',
        'custom_mean': 'Custom',
    }
)

all_std_double_pos_pmod = pd.DataFrame(
    [
        lin_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        dense_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        custom_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_std': 'Linear',
        'dense_std': 'Dense',
        'cnn_std': 'CNN',
        'pretrained_std': 'Pretrained',
        'custom_std': 'Custom',
    }
)
all_test_double_pos_pmod = pd.DataFrame(
    [
        lin_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        dense_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        custom_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_test': 'Linear',
        'dense_test': 'Dense',
        'cnn_test': 'CNN',
        'pretrained_test': 'Pretrained',
        'custom_test': 'Custom',
    }
)
display(all_test_double_pos_pmod)
display(all_std_double_pos_pmod)

| Model (test set) | r2_score | mse | rmse | mae |
|---|---|---|---|---|
| Linear | 0.365183 | 0.044511 | 0.210976 | 0.169567 |
| Dense | 0.465533 | 0.037475 | 0.193583 | 0.157434 |
| CNN | 0.363243 | 0.044646 | 0.211296 | 0.167319 |
| Pretrained | 0.342786 | 0.046082 | 0.214667 | 0.170133 |
| Custom | 0.488469 | 0.035866 | 0.189384 | 0.154176 |


| Model (std over folds) | r2_score | mse | rmse | mae |
|---|---|---|---|---|
| Linear | 0.00068 | 6.1e-05 | 0.000145 | 0.000445 |
| Dense | 0.000866 | 8.7e-05 | 0.000225 | 0.000225 |
| CNN | 0.001841 | 0.000137 | 0.000357 | 0.000637 |
| Pretrained | 0.013968 | 0.000985 | 0.002309 | 0.002105 |
| Custom | 0.000269 | 3.1e-05 | 8.2e-05 | 0.000142 |


In [52]:
rows = all_test_double_pos_pmod.index
r2_str_array_double_pos_pmod = np.zeros((1, all_test_double_pos_pmod.shape[0]), dtype=object)
for i in range(all_test_double_pos_pmod.shape[0]):
    r2_str_array_double_pos_pmod[0, i] = r"$\underset{{\num{{+- {:.3e} }}  }}{{\num{{ {:.3g} }} }}$".format(
        all_std_double_pos_pmod["r2_score"].iloc[i], all_test_double_pos_pmod["r2_score"].iloc[i])
        
r2_df_double_pos_pmod = pd.DataFrame(r2_str_array_double_pos_pmod, columns=rows)

section_path = "chapters/results/figures/"
fname = THESIS_PATH + section_path + "regression_simulated_double_position_pixelmod_r2.tex"
caption = """
$R^2$-scores on the test set for regression of positions of origin on double events, using multiple
models trained on simulated data with specific pixels set to zero.
Error estimates are the standard deviation in results from k-fold cross-validation
with $K=5$ folds.
"""
label = "tab:regression-simulated-double-position-pixelmod-r2"
with open(fname, "w") as fp:
    pd.set_option('display.max_colwidth', None)  # no truncation of long cell strings (-1 is deprecated)
    r2_df_double_pos_pmod.to_latex(fp, escape=False, caption=caption, label=label, index=False)


### Energy

#### Linear regression

In [53]:
# Load linear regression experiment
lin_ex_id = "fcc62faf0d97"
lin_ex = load_experiment(lin_ex_id)
# Load model and predict
lin_model = tf.keras.models.load_model(repo_root + "models/" + lin_ex_id + ".h5", compile=False)
lin_test = regression_metrics(lin_model, test_images.reshape(test_images.shape[0], 256)[d_idx], test_energies[d_idx], "lin_test")
del lin_model #No longer needed, clear memory just in case.

lin_metrics = experiment_metrics_to_df(lin_ex)
#display(lin_metrics)
lin_means = lin_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
lin_means = lin_means.rename(index={'mean': 'lin_mean', 'std': 'lin_std'})
#display(lin_means)

#### Small dense network

In [54]:
# Load dense regression experiment
dense_ex_id = "0c1eb0cbcceb"
dense_ex = load_experiment(dense_ex_id)
# Load model and predict
dense_model = tf.keras.models.load_model(repo_root + "models/" + dense_ex_id + ".h5", compile=False)
dense_test = regression_metrics(dense_model, test_images.reshape(test_images.shape[0], 256)[d_idx], test_energies[d_idx], "dense_test")
del dense_model

dense_metrics = experiment_metrics_to_df(dense_ex)
#display(dense_metrics)
dense_means = dense_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
dense_means = dense_means.rename(index={'mean': 'dense_mean', 'std': 'dense_std'})
#display(dense_means)

#### Small CNN

In [55]:
# Load cnn regression experiment
cnn_ex_id = "85a088b1c550"
cnn_ex = load_experiment(cnn_ex_id)
# Load model and predict
cnn_model = tf.keras.models.load_model(repo_root + "models/" + cnn_ex_id + ".h5", compile=False)
cnn_test = regression_metrics(cnn_model, test_images[d_idx], test_energies[d_idx], "cnn_test")
del cnn_model

cnn_metrics = experiment_metrics_to_df(cnn_ex)
#display(cnn_metrics)
cnn_means = cnn_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
cnn_means = cnn_means.rename(index={'mean': 'cnn_mean', 'std': 'cnn_std'})
#display(cnn_means)

#### Pretrained - VGG16 

In [56]:
# Load pretrained (VGG16) regression experiment
pretrained_ex_id = "e9484282c396"
pretrained_ex = load_experiment(pretrained_ex_id)
# Load model and predict
pretrained_model = tf.keras.models.load_model(repo_root + "models/" + pretrained_ex_id + ".h5", compile=False)
pretrained_test = regression_metrics(pretrained_model, np.concatenate((test_images, test_images, test_images), axis=-1)[d_idx], test_energies[d_idx], "pretrained_test")
del pretrained_model

pretrained_metrics = experiment_metrics_to_df(pretrained_ex)
#display(pretrained_metrics)
pretrained_means = pretrained_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
pretrained_means = pretrained_means.rename(index={'mean': 'pretrained_mean', 'std': 'pretrained_std'})
#display(pretrained_means)

#### Custom model

In [57]:
# Load custom regression experiment
custom_ex_id = "a7714c38fd74"
custom_ex = load_experiment(custom_ex_id)
# Load model and predict
custom_model = tf.keras.models.load_model(repo_root + "models/" + custom_ex_id + ".h5", compile=False)
custom_test = regression_metrics(custom_model, test_images[d_idx], test_energies[d_idx], "custom_test")
del custom_model

custom_metrics = experiment_metrics_to_df(custom_ex)
#display(custom_metrics)
custom_means = custom_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
custom_means = custom_means.rename(index={'mean': 'custom_mean', 'std': 'custom_std'})
#display(custom_means)

#### Output

In [58]:
all_means_double_energy_pmod = pd.DataFrame(
    [
        lin_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        dense_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        custom_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_mean': 'Linear',
        'dense_mean': 'Dense',
        'cnn_mean': 'CNN',
        'pretrained_mean': 'Pretrained',
        'custom_mean': 'Custom',
    }
)

all_std_double_energy_pmod = pd.DataFrame(
    [
        lin_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        dense_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        custom_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_std': 'Linear',
        'dense_std': 'Dense',
        'cnn_std': 'CNN',
        'pretrained_std': 'Pretrained',
        'custom_std': 'Custom',
    }
)
all_test_double_energy_pmod = pd.DataFrame(
    [
        lin_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        dense_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        custom_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_test': 'Linear',
        'dense_test': 'Dense',
        'cnn_test': 'CNN',
        'pretrained_test': 'Pretrained',
        'custom_test': 'Custom',
    }
)
display(all_test_double_energy_pmod)
display(all_std_double_energy_pmod)

| Model (test set) | r2_score | mse | rmse | mae |
|---|---|---|---|---|
| Linear | 0.486663 | 0.042817 | 0.206922 | 0.168505 |
| Dense | 0.489836 | 0.042551 | 0.20628 | 0.168059 |
| CNN | 0.282451 | 0.059847 | 0.244637 | 0.192272 |
| Pretrained | 0.455371 | 0.045429 | 0.213141 | 0.173201 |
| Custom | 0.46642 | 0.044504 | 0.210959 | 0.171638 |


| Model (std over folds) | r2_score | mse | rmse | mae |
|---|---|---|---|---|
| Linear | 0.003155 | 0.00026 | 0.000627 | 0.000566 |
| Dense | 0.002571 | 0.000218 | 0.000528 | 0.000481 |
| CNN | 0.00312 | 0.000271 | 0.000654 | 0.00062 |
| Pretrained | 0.010319 | 0.000869 | 0.002023 | 0.001074 |
| Custom | 0.002746 | 0.000244 | 0.000591 | 0.000525 |


In [59]:
rows = all_test_double_energy_pmod.index
r2_str_array_double_energy_pmod = np.zeros((1, all_test_double_energy_pmod.shape[0]), dtype=object)
for i in range(all_test_double_energy_pmod.shape[0]):
    r2_str_array_double_energy_pmod[0, i] = r"$\underset{{\num{{+- {:.3e} }}  }}{{\num{{ {:.3g} }} }}$".format(
        all_std_double_energy_pmod["r2_score"].iloc[i], all_test_double_energy_pmod["r2_score"].iloc[i])
        
r2_df_double_energy_pmod = pd.DataFrame(r2_str_array_double_energy_pmod, columns=rows)

section_path = "chapters/results/figures/"
fname = THESIS_PATH + section_path + "regression_simulated_double_energy_pixelmod_r2.tex"
caption = """
$R^2$-scores on the test set for regression of energy values on double events, using multiple
models trained on simulated data with specific pixels set to zero. Error estimates are the
standard deviation in results from k-fold cross-validation with $K=5$ folds.
"""
label = "tab:regression-simulated-double-energy-pixelmod-r2"
with open(fname, "w") as fp:
    pd.set_option('display.max_colwidth', None)  # no truncation of long cell strings (-1 is deprecated)
    r2_df_double_energy_pmod.to_latex(fp, escape=False, caption=caption, label=label, index=False)


# Pre-processed simulated data - Pixel modified and imbalanced
These are the basic metrics for all models trained on this version of the simulated data.
The basic pre-processing includes formatting and min-max normalization.
Additionally, the top and bottom rows of pixels have been set to 0, and one pixel inside
the detector is permanently 0 (which index again?).

This dataset has also been deliberately imbalanced to mimic experimental data,
where double events are expected to be rare.
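One way to produce such an imbalance is to keep every single event and only a random subset of the doubles, chosen so that doubles make up a target fraction $f$ of the result: with $n_s$ singles, keep $n_d = f\,n_s / (1 - f)$ doubles. This is a sketch, not necessarily how this dataset was built; the label encoding (0 = single, 1 = double) and the function name `imbalance_indices` are assumptions.

```python
import numpy as np


def imbalance_indices(labels: np.ndarray, double_fraction: float, seed: int = 0) -> np.ndarray:
    """Indices keeping all singles (label 0) and a random subset of doubles (label 1)
    so that doubles make up roughly `double_fraction` of the returned events."""
    if not 0.0 <= double_fraction < 1.0:
        raise ValueError("double_fraction must be in [0, 1)")
    rng = np.random.default_rng(seed)
    singles = np.flatnonzero(labels == 0)
    doubles = np.flatnonzero(labels == 1)
    n_keep = int(round(double_fraction * len(singles) / (1.0 - double_fraction)))
    n_keep = min(n_keep, len(doubles))  # cannot keep more doubles than exist
    kept_doubles = rng.choice(doubles, size=n_keep, replace=False)
    return np.sort(np.concatenate((singles, kept_doubles)))


labels = np.array([0] * 100 + [1] * 100)
idx = imbalance_indices(labels, double_fraction=0.2)
print(len(idx), labels[idx].mean())
```

Fixing the seed keeps the subsample reproducible, in the same spirit as storing the random state in the experiment config.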

## Single events

### Positions

#### Linear Regression

In [60]:
# Load linear regression experiment 
#lin_ex_id = "78f01912d908"
lin_ex_id = "c3f6c00a17f9" # latest
lin_ex = load_experiment(lin_ex_id)

# Load model and predict
lin_model = tf.keras.models.load_model(repo_root + "models/" + lin_ex_id + ".h5", compile=False)
lin_test = regression_metrics(lin_model, test_images.reshape(test_images.shape[0], 256)[s_idx], normalize_position_data(test_positions[s_idx])[:,:2], "lin_test")
del lin_model #No longer needed, clear memory just in case.

lin_metrics = experiment_metrics_to_df(lin_ex)
#display(lin_metrics)
lin_means = lin_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
lin_means = lin_means.rename(index={'mean': 'lin_mean', 'std': 'lin_std'})
#display(lin_means)

#### Small dense network

In [61]:
# Load dense network regression experiment
#dense_ex_id = "af61fe608db1"
dense_ex_id = "c816138d187d" # latest
dense_ex = load_experiment(dense_ex_id)
# Load model and predict
dense_model = tf.keras.models.load_model(repo_root + "models/" + dense_ex_id + ".h5", compile=False)
dense_test = regression_metrics(dense_model, test_images.reshape(test_images.shape[0], 256)[s_idx], normalize_position_data(test_positions[s_idx])[:,:2], "dense_test")
del dense_model

dense_metrics = experiment_metrics_to_df(dense_ex)
#display(dense_metrics)
dense_means = dense_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
dense_means = dense_means.rename(index={'mean': 'dense_mean', 'std': 'dense_std'})
#display(dense_means)
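Each model cell collapses the per-fold metrics into a mean row and a standard-deviation row. The toy example below uses made-up fold values (not results) to show what that aggregation produces. Passing the strings `"mean"` and `"std"` gives the same result as `np.mean` and `np.std` in `agg` and avoids the warning newer pandas versions emit for NumPy callables. Note that pandas' `std` is the sample standard deviation (`ddof=1`), not NumPy's default `ddof=0`.

```python
import numpy as np
import pandas as pd

# Made-up per-fold validation metrics for K=5 folds (illustration only).
fold_metrics = pd.DataFrame(
    {
        "r2_score": [0.78, 0.77, 0.79, 0.78, 0.76],
        "mse": [0.015, 0.016, 0.014, 0.015, 0.017],
    }
)

summary = fold_metrics.agg(["mean", "std"])
summary = summary.rename(index={"mean": "lin_mean", "std": "lin_std"})
print(summary)

# pandas' std is the sample standard deviation (ddof=1).
print(np.isclose(summary.loc["lin_std", "r2_score"], np.std(fold_metrics["r2_score"], ddof=1)))  # True
```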

#### Small CNN

In [62]:
# Load CNN regression experiment
#cnn_ex_id = "e2f24a47f2f3"
cnn_ex_id = "d106586c7886" # latest 
cnn_ex = load_experiment(cnn_ex_id)
# Load model and predict
cnn_model = tf.keras.models.load_model(repo_root + "models/" + cnn_ex_id + ".h5", compile=False)
cnn_test = regression_metrics(cnn_model, test_images[s_idx], normalize_position_data(test_positions[s_idx])[:,:2], "cnn_test")
del cnn_model

cnn_metrics = experiment_metrics_to_df(cnn_ex)
#display(cnn_metrics)
cnn_means = cnn_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
cnn_means = cnn_means.rename(index={'mean': 'cnn_mean', 'std': 'cnn_std'})
#display(cnn_means)

#### Pretrained - VGG
As an additional performance baseline, we include a VGG network pretrained on the ImageNet database.

Our detector images (16x16) are much smaller than the inputs the VGG network is designed for,
so we cannot use all of its layers. Each max-pooling layer halves both spatial dimensions
(16x16 becomes 8x8 after the first one), so after a few such layers the feature map is too small
to pass through the rest of the network. We therefore cut the network at the point where this becomes an issue.
Alternatively, one could keep the full depth but remove some of the max-pooling layers.
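The shrinking feature map can be tracked with simple arithmetic. VGG16 has five convolutional blocks, each ending in a 2x2 max-pooling layer with stride 2, while its 3x3 convolutions use 'same' padding and leave the spatial size unchanged. The sketch below loads no weights; it only shows how a 16x16 input shrinks block by block. The cut point actually used in the experiments is not stated here.

```python
def pooled_size(size, pool=2, stride=2):
    """Spatial size after a 2x2, stride-2 max-pooling layer with 'valid' padding."""
    return (size - pool) // stride + 1 if size >= pool else 0


def vgg16_pooled_sizes(input_size, n_blocks=5):
    """Feature-map side length after each of VGG16's pooling layers."""
    sizes = []
    size = input_size
    for _ in range(n_blocks):
        size = pooled_size(size)
        sizes.append(size)
    return sizes


sizes = vgg16_pooled_sizes(16)
print(sizes)  # [8, 4, 2, 1, 0]
usable_blocks = sum(1 for s in sizes if s > 0)
print(usable_blocks)  # 4
```

With a 16x16 input, the fifth pooling layer would receive a 1x1 map and produce nothing, so at most the first four blocks can be kept without removing pooling layers.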

In [63]:
# Load pretrained (VGG) regression experiment
#pretrained_ex_id = "a7340b9e74ad"
pretrained_ex_id = "87634e735151" # latest
pretrained_ex = load_experiment(pretrained_ex_id)
# Load model and predict
pretrained_model = tf.keras.models.load_model(repo_root + "models/" + pretrained_ex_id + ".h5", compile=False)
pretrained_test = regression_metrics(pretrained_model, np.concatenate((test_images, test_images, test_images), axis=-1)[s_idx], normalize_position_data(test_positions[s_idx])[:,:2], "pretrained_test")
del pretrained_model

pretrained_metrics = experiment_metrics_to_df(pretrained_ex)
#display(pretrained_metrics)
pretrained_means = pretrained_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
pretrained_means = pretrained_means.rename(index={'mean': 'pretrained_mean', 'std': 'pretrained_std'})
#display(pretrained_means)
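The pretrained network expects three-channel (RGB) input, so the single-channel detector images are stacked three times along the last axis. `np.repeat` gives the same array as the `np.concatenate` call in the cell above. The sketch assumes a trailing channel axis of size 1 and uses toy data.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((5, 16, 16, 1))  # toy batch: N x H x W x 1

via_concat = np.concatenate((images, images, images), axis=-1)
via_repeat = np.repeat(images, 3, axis=-1)

print(via_concat.shape)  # (5, 16, 16, 3)
print(np.array_equal(via_concat, via_repeat))  # True
```

Indexing before replicating, e.g. `np.repeat(test_images[s_idx], 3, axis=-1)`, avoids building a three-channel copy of the whole test set when only a subset is evaluated.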

#### Custom model

In [64]:
# Load custom regression experiment
#custom_ex_id = "33fa607a199b"
custom_ex_id = "886416b261cc" # latest
custom_ex = load_experiment(custom_ex_id)
# Load model and predict
custom_model = tf.keras.models.load_model(repo_root + "models/" + custom_ex_id + ".h5", compile=False)
custom_test = regression_metrics(custom_model, test_images[s_idx], normalize_position_data(test_positions[s_idx])[:,:2], "custom_test")
del custom_model

custom_metrics = experiment_metrics_to_df(custom_ex)
#display(custom_metrics)
custom_means = custom_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
custom_means = custom_means.rename(index={'mean': 'custom_mean', 'std': 'custom_std'})
#display(custom_means)

#### Output
We use the standard deviation across the cross-validation folds as the error estimate, and report the test-set R2-score, MSE, RMSE and MAE.
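The cell below builds three tables (fold means, fold standard deviations and test-set metrics) by listing the five models by hand. A more compact equivalent, sketched here with toy one-row frames standing in for `lin_test`, `dense_test` and so on (values are made up), is to collect the frames in a dict keyed by display name and build the table in one step.

```python
import pandas as pd

metric_cols = ["r2_score", "mse", "rmse", "mae"]

# Toy one-row frames standing in for lin_test, dense_test, ... (values are made up).
test_frames = {
    "Linear": pd.DataFrame([[0.5, 0.04, 0.2, 0.15]], columns=metric_cols, index=["lin_test"]),
    "Dense": pd.DataFrame([[0.9, 0.01, 0.1, 0.08]], columns=metric_cols, index=["dense_test"]),
}

# Take the first row of each frame, label it with the display name, one row per model.
all_test = pd.DataFrame({name: df.iloc[0][metric_cols] for name, df in test_frames.items()}).T
print(all_test)
```

The same pattern would cover the mean and standard-deviation tables by taking `iloc[0]` and `iloc[1]` of the `*_means` frames.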

In [65]:
all_means_single_pos_imbalanced = pd.DataFrame(
    [
        lin_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        dense_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        custom_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_mean': 'Linear',
        'dense_mean': 'Dense',
        'cnn_mean': 'CNN',
        'pretrained_mean': 'Pretrained',
        'custom_mean': 'Custom',
    }
)

all_std_single_pos_imbalanced = pd.DataFrame(
    [
        lin_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        dense_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        custom_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_std': 'Linear',
        'dense_std': 'Dense',
        'cnn_std': 'CNN',
        'pretrained_std': 'Pretrained',
        'custom_std': 'Custom',
    }
)
all_test_single_pos_imbalanced = pd.DataFrame(
    [
        lin_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        dense_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        custom_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_test': 'Linear',
        'dense_test': 'Dense',
        'cnn_test': 'CNN',
        'pretrained_test': 'Pretrained',
        'custom_test': 'Custom',
    }
)
display(all_test_single_pos_imbalanced)
display(all_std_single_pos_imbalanced)

| Model | r2_score | mse | rmse | mae |
|---|---|---|---|---|
| Linear | 0.77614 | 0.015703 | 0.125311 | 0.094552 |
| Dense | 0.980138 | 0.001393 | 0.037322 | 0.024524 |
| CNN | 0.939942 | 0.004206 | 0.06485 | 0.03718 |
| Pretrained | 0.474523 | 0.036874 | 0.192025 | 0.155803 |
| Custom | -7.1e-05 | 0.070157 | 0.264872 | 0.229254 |


| Model | r2_score | mse | rmse | mae |
|---|---|---|---|---|
| Linear | 0.002725 | 0.0002 | 0.000846 | 0.001292 |
| Dense | 0.00155 | 0.00011 | 0.001765 | 0.001485 |
| CNN | 0.005303 | 0.000371 | 0.007445 | 0.003795 |
| Pretrained | 0.269542 | 0.018912 | 0.083555 | 0.074367 |
| Custom | 0.446443 | 0.03128 | 0.112901 | 0.098692 |
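A test-set R2-score of about zero, as for the Custom model here (-7.1e-05), has a concrete meaning: R2 compares the model's squared error with that of always predicting the mean of the targets, so a constant prediction equal to the mean scores exactly 0, and anything worse is negative. The toy check below illustrates this with scikit-learn's `r2_score`; whether the Custom model actually collapsed to near-constant predictions would have to be checked on its outputs.

```python
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
y_true = rng.random((100, 2))  # toy 2D targets, e.g. normalized (x, y)

mean_pred = np.tile(y_true.mean(axis=0), (len(y_true), 1))  # constant = column means
print(r2_score(y_true, mean_pred))  # 0.0, up to floating-point rounding

offset_pred = mean_pred + 0.1  # constant, but offset from the mean
print(r2_score(y_true, offset_pred) < 0)  # True
```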


In [66]:
rows = all_test_single_pos_imbalanced.index
r2_str_array_single_pos_imbalanced = np.zeros((1, all_test_single_pos_imbalanced.shape[0]), dtype=object)
for i in range(all_test_single_pos_imbalanced.shape[0]):
    r2_str_array_single_pos_imbalanced[0, i] = r"$\underset{{\num{{+- {:.3e} }}  }}{{\num{{ {:.3g} }} }}$".format(
        all_std_single_pos_imbalanced["r2_score"].iloc[i], all_test_single_pos_imbalanced["r2_score"].iloc[i])
        
r2_df_single_pos_imbalanced = pd.DataFrame(r2_str_array_single_pos_imbalanced, columns=rows)

section_path = "chapters/results/figures/"
# Separate file and label for the imbalanced variant, so the pixel-modified table is not overwritten
fname = THESIS_PATH + section_path + "regression_simulated_single_position_imbalanced_r2.tex"
caption = """
Test-set R2-scores for regression of positions of origin on single events in simulated data with specific pixels
set to zero and an imbalanced dataset, using multiple models.
Error estimates are the standard deviation in results from k-fold cross-validation
with $K=5$ folds.
"""
label = "tab:regression-simulated-single-position-imbalanced-r2"
with open(fname, "w") as fp:
    pd.set_option('display.max_colwidth', None)  # None = no truncation (negative values are deprecated since pandas 1.0)
    r2_df_single_pos_imbalanced.to_latex(fp, escape=False, caption=caption, label=label, index=False)
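Each table cell places the fold standard deviation underneath the test-set R2-score using `\underset` and siunitx's `\num`, which typesets `+-` as a plus-minus sign; the doubled braces are literal braces in `str.format`. Wrapping the template in a small helper (the name `r2_cell` is hypothetical) makes the generated string easy to inspect, here with the Linear values from the tables above.

```python
R2_CELL = r"$\underset{{\num{{+- {:.3e} }}  }}{{\num{{ {:.3g} }} }}$"


def r2_cell(std, score):
    """Format a test-set score with its fold standard deviation underneath."""
    return R2_CELL.format(std, score)


cell = r2_cell(0.002725, 0.77614)
print(cell)  # $\underset{\num{+- 2.725e-03 }  }{\num{ 0.776 } }$
```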


### Energy

#### Linear regression

In [67]:
# Load linear regression experiment 
#lin_ex_id = "9f256a4990c0"
lin_ex_id = "7525a985e913" # latest
lin_ex = load_experiment(lin_ex_id)

# Load model and predict
lin_model = tf.keras.models.load_model(repo_root + "models/" + lin_ex_id + ".h5", compile=False)
lin_test = regression_metrics(lin_model, test_images.reshape(test_images.shape[0], 256)[s_idx], test_energies[s_idx,0], "lin_test")
del lin_model #No longer needed, clear memory just in case.

lin_metrics = experiment_metrics_to_df(lin_ex)
#display(lin_metrics)
lin_means = lin_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
lin_means = lin_means.rename(index={'mean': 'lin_mean', 'std': 'lin_std'})
#display(lin_means)

#### Small dense network

In [68]:
# Load dense regression experiment 
#dense_ex_id = "29b1f98a4879"
dense_ex_id = "a1dec017891f" # latest
dense_ex = load_experiment(dense_ex_id)
# Load model and predict
dense_model = tf.keras.models.load_model(repo_root + "models/" + dense_ex_id + ".h5", compile=False)
dense_test = regression_metrics(dense_model, test_images.reshape(test_images.shape[0], 256)[s_idx], test_energies[s_idx,0], "dense_test")
del dense_model

dense_metrics = experiment_metrics_to_df(dense_ex)
#display(dense_metrics)
dense_means = dense_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
dense_means = dense_means.rename(index={'mean': 'dense_mean', 'std': 'dense_std'})
#display(dense_means)

#### Small CNN

In [69]:
# Load cnn regression experiment 
#cnn_ex_id = "8422f85d6ff6"
cnn_ex_id = "afd8b647848a" # latest
cnn_ex = load_experiment(cnn_ex_id)
# Load model and predict
cnn_model = tf.keras.models.load_model(repo_root + "models/" + cnn_ex_id + ".h5", compile=False)
cnn_test = regression_metrics(cnn_model, test_images[s_idx], test_energies[s_idx,0], "cnn_test")
del cnn_model

cnn_metrics = experiment_metrics_to_df(cnn_ex)
#display(cnn_metrics)
cnn_means = cnn_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
cnn_means = cnn_means.rename(index={'mean': 'cnn_mean', 'std': 'cnn_std'})
#display(cnn_means)

#### Pretrained - VGG16 

In [70]:
# Load pretrained (VGG16) regression experiment
#pretrained_ex_id = "73de75db91e4"
pretrained_ex_id = "4b042c01821a"
pretrained_ex = load_experiment(pretrained_ex_id)
# Load model and predict
pretrained_model = tf.keras.models.load_model(repo_root + "models/" + pretrained_ex_id + ".h5", compile=False)
pretrained_test = regression_metrics(pretrained_model, np.concatenate((test_images, test_images, test_images), axis=-1)[s_idx], test_energies[s_idx,0], "pretrained_test")
del pretrained_model

pretrained_metrics = experiment_metrics_to_df(pretrained_ex)
#display(pretrained_metrics)
pretrained_means = pretrained_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
pretrained_means = pretrained_means.rename(index={'mean': 'pretrained_mean', 'std': 'pretrained_std'})
#display(pretrained_means)

#### Custom model

In [71]:
# Load custom regression experiment 
#custom_ex_id = "0071c04bef42"
custom_ex_id = "f42c20ae6b04"
custom_ex = load_experiment(custom_ex_id)
# Load model and predict
custom_model = tf.keras.models.load_model(repo_root + "models/" + custom_ex_id + ".h5", compile=False)
custom_test = regression_metrics(custom_model, test_images[s_idx], test_energies[s_idx,0], "custom_test")
del custom_model

custom_metrics = experiment_metrics_to_df(custom_ex)
#display(custom_metrics)
custom_means = custom_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
custom_means = custom_means.rename(index={'mean': 'custom_mean', 'std': 'custom_std'})
#display(custom_means)

#### Output

In [72]:
all_means_single_energy_imbalanced = pd.DataFrame(
    [
        lin_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        dense_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        custom_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_mean': 'Linear',
        'dense_mean': 'Dense',
        'cnn_mean': 'CNN',
        'pretrained_mean': 'Pretrained',
        'custom_mean': 'Custom',
    }
)

all_std_single_energy_imbalanced = pd.DataFrame(
    [
        lin_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        dense_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        custom_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_std': 'Linear',
        'dense_std': 'Dense',
        'cnn_std': 'CNN',
        'pretrained_std': 'Pretrained',
        'custom_std': 'Custom',
    }
)
all_test_single_energy_imbalanced = pd.DataFrame(
    [
        lin_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        dense_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        custom_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_test': 'Linear',
        'dense_test': 'Dense',
        'cnn_test': 'CNN',
        'pretrained_test': 'Pretrained',
        'custom_test': 'Custom',
    }
)
display(all_test_single_energy_imbalanced)
display(all_std_single_energy_imbalanced)

| Model | r2_score | mse | rmse | mae |
|---|---|---|---|---|
| Linear | 0.738587 | 0.021878 | 0.147912 | 0.122523 |
| Dense | 0.75323 | 0.020653 | 0.14371 | 0.119631 |
| CNN | 0.581018 | 0.035065 | 0.187257 | 0.140524 |
| Pretrained | 0.770128 | 0.019238 | 0.138703 | 0.114183 |
| Custom | 0.778352 | 0.01855 | 0.136199 | 0.113486 |


| Model | r2_score | mse | rmse | mae |
|---|---|---|---|---|
| Linear | 0.024595 | 0.002042 | 0.016237 | 0.018769 |
| Dense | 0.024217 | 0.002011 | 0.016792 | 0.019988 |
| CNN | 0.027098 | 0.00225 | 0.018568 | 0.022902 |
| Pretrained | 0.019986 | 0.001659 | 0.013743 | 0.017439 |
| Custom | 0.432689 | 0.03603 | 0.108255 | 0.100271 |


In [73]:
rows = all_test_single_energy_imbalanced.index
r2_str_array_single_energy_imbalanced = np.zeros((1, all_test_single_energy_imbalanced.shape[0]), dtype=object)
for i in range(all_test_single_energy_imbalanced.shape[0]):
    r2_str_array_single_energy_imbalanced[0, i] = r"$\underset{{\num{{+- {:.3e} }}  }}{{\num{{ {:.3g} }} }}$".format(
        all_std_single_energy_imbalanced["r2_score"].iloc[i], all_test_single_energy_imbalanced["r2_score"].iloc[i])
        
r2_df_single_energy_imbalanced = pd.DataFrame(r2_str_array_single_energy_imbalanced, columns=rows)

section_path = "chapters/results/figures/"
# Separate file and label for the imbalanced variant, so the pixel-modified table is not overwritten
fname = THESIS_PATH + section_path + "regression_simulated_single_energy_imbalanced_r2.tex"
caption = """
Test-set R2-scores for regression of energy values on single events in simulated data with specific pixels
set to zero and an imbalanced dataset, using multiple models.
Error estimates are the standard deviation in results from k-fold cross-validation
with $K=5$ folds.
"""
label = "tab:regression-simulated-single-energy-imbalanced-r2"
with open(fname, "w") as fp:
    pd.set_option('display.max_colwidth', None)  # None = no truncation (negative values are deprecated since pandas 1.0)
    r2_df_single_energy_imbalanced.to_latex(fp, escape=False, caption=caption, label=label, index=False)


## Double events

### Positions

#### Linear Regression

In [74]:
# Load linear regression experiment 
#lin_ex_id = "e3f840121ced"
lin_ex_id = "0bd0f7580b57" # latest
lin_ex = load_experiment(lin_ex_id)

# Load model and predict
lin_model = tf.keras.models.load_model(repo_root + "models/" + lin_ex_id + ".h5", compile=False)
lin_test = regression_metrics(lin_model, test_images.reshape(test_images.shape[0], 256)[d_idx], normalize_position_data(test_positions[d_idx]), "lin_test")
del lin_model #No longer needed, clear memory just in case.

lin_metrics = experiment_metrics_to_df(lin_ex)
#display(lin_metrics)
lin_means = lin_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
lin_means = lin_means.rename(index={'mean': 'lin_mean', 'std': 'lin_std'})
#display(lin_means)
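For double events the targets are all four normalized position columns, presumably the (x, y) pairs of the two events, whereas the single-event cells above use only the first two columns. `r2_score` then averages the per-column scores uniformly. Passing `multioutput="raw_values"` returns one score per column, which could show whether one event's coordinates are harder to predict than the other's. The data below is a toy construction, not results.

```python
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(2)
y_true = rng.random((200, 4))  # toy targets: (x1, y1, x2, y2)

# Toy predictions: first event predicted perfectly, second event replaced by its mean.
y_pred = y_true.copy()
y_pred[:, 2:] = y_true[:, 2:].mean(axis=0)

per_column = r2_score(y_true, y_pred, multioutput="raw_values")
print(per_column)  # approximately [1, 1, 0, 0]
print(r2_score(y_true, y_pred))  # approximately 0.5, the uniform average
```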

#### Small dense network

In [75]:
# Load dense regression experiment
#dense_ex_id = "44de4c962f6c"
dense_ex_id = "d3649a6b4759" # latest
dense_ex = load_experiment(dense_ex_id)
# Load model and predict
dense_model = tf.keras.models.load_model(repo_root + "models/" + dense_ex_id + ".h5", compile=False)
dense_test = regression_metrics(dense_model, test_images.reshape(test_images.shape[0], 256)[d_idx], normalize_position_data(test_positions[d_idx]), "dense_test")
del dense_model

dense_metrics = experiment_metrics_to_df(dense_ex)
#display(dense_metrics)
dense_means = dense_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
dense_means = dense_means.rename(index={'mean': 'dense_mean', 'std': 'dense_std'})
#display(dense_means)

#### Small CNN

In [76]:
# Load CNN regression experiment
#cnn_ex_id = "7cb4c91d34d3"
cnn_ex_id = "c0ca76a469aa" # latest
cnn_ex = load_experiment(cnn_ex_id)
# Load model and predict
cnn_model = tf.keras.models.load_model(repo_root + "models/" + cnn_ex_id + ".h5", compile=False)
cnn_test = regression_metrics(cnn_model, test_images[d_idx], normalize_position_data(test_positions[d_idx]), "cnn_test")
del cnn_model

cnn_metrics = experiment_metrics_to_df(cnn_ex)
#display(cnn_metrics)
cnn_means = cnn_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
cnn_means = cnn_means.rename(index={'mean': 'cnn_mean', 'std': 'cnn_std'})
#display(cnn_means)

#### Pretrained - VGG

In [77]:
# Load pretrained (VGG) regression experiment
#pretrained_ex_id = "5230ffcd7119"
pretrained_ex_id = "36fb02863acf" # latest
pretrained_ex = load_experiment(pretrained_ex_id)
# Load model and predict
pretrained_model = tf.keras.models.load_model(repo_root + "models/" + pretrained_ex_id + ".h5", compile=False)
pretrained_test = regression_metrics(pretrained_model, np.concatenate((test_images, test_images, test_images), axis=-1)[d_idx], normalize_position_data(test_positions[d_idx]), "pretrained_test")
del pretrained_model

pretrained_metrics = experiment_metrics_to_df(pretrained_ex)
#display(pretrained_metrics)
pretrained_means = pretrained_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
pretrained_means = pretrained_means.rename(index={'mean': 'pretrained_mean', 'std': 'pretrained_std'})
#display(pretrained_means)

#### Custom model

In [78]:
# Load custom regression experiment 
#custom_ex_id = "1a1fd5dff9ae"
custom_ex_id = "c8850bc2abe6" # latest
custom_ex = load_experiment(custom_ex_id)
# Load model and predict
custom_model = tf.keras.models.load_model(repo_root + "models/" + custom_ex_id + ".h5", compile=False)
custom_test = regression_metrics(custom_model, test_images[d_idx], normalize_position_data(test_positions[d_idx]), "custom_test")
del custom_model

custom_metrics = experiment_metrics_to_df(custom_ex)
#display(custom_metrics)
custom_means = custom_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
custom_means = custom_means.rename(index={'mean': 'custom_mean', 'std': 'custom_std'})
#display(custom_means)

#### Output
We use the standard deviation across the cross-validation folds as the error estimate, and report the test-set R2-score, MSE, RMSE and MAE.

In [79]:
all_means_double_pos_imbalanced = pd.DataFrame(
    [
        lin_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        dense_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        custom_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_mean': 'Linear',
        'dense_mean': 'Dense',
        'cnn_mean': 'CNN',
        'pretrained_mean': 'Pretrained',
        'custom_mean': 'Custom',
    }
)

all_std_double_pos_imbalanced = pd.DataFrame(
    [
        lin_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        dense_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        custom_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_std': 'Linear',
        'dense_std': 'Dense',
        'cnn_std': 'CNN',
        'pretrained_std': 'Pretrained',
        'custom_std': 'Custom',
    }
)

all_test_double_pos_imbalanced = pd.DataFrame(
    [
        lin_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        dense_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        custom_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_test': 'Linear',
        'dense_test': 'Dense',
        'cnn_test': 'CNN',
        'pretrained_test': 'Pretrained',
        'custom_test': 'Custom',
    }
)
display(all_test_double_pos_imbalanced)
display(all_std_double_pos_imbalanced)

| Model | r2_score | mse | rmse | mae |
|---|---|---|---|---|
| Linear | 0.357468 | 0.045052 | 0.212254 | 0.173011 |
| Dense | 0.422413 | 0.040498 | 0.201242 | 0.16163 |
| CNN | 0.44938 | 0.038607 | 0.196487 | 0.158102 |
| Pretrained | 0.437315 | 0.039453 | 0.198628 | 0.160055 |
| Custom | -0.008008 | 0.070677 | 0.265851 | 0.229919 |


| Model | r2_score | mse | rmse | mae |
|---|---|---|---|---|
| Linear | 0.007794 | 0.000734 | 0.001732 | 0.001058 |
| Dense | 0.007345 | 0.000671 | 0.001667 | 0.000753 |
| CNN | 0.004092 | 0.000466 | 0.001186 | 0.001217 |
| Pretrained | 0.010311 | 0.000781 | 0.001942 | 0.001563 |
| Custom | 0.26484 | 0.018356 | 0.040269 | 0.040213 |


In [80]:
rows = all_test_double_pos_imbalanced.index
r2_str_array_double_pos_imbalanced = np.zeros((1, all_test_double_pos_imbalanced.shape[0]), dtype=object)
for i in range(all_test_double_pos_imbalanced.shape[0]):
    r2_str_array_double_pos_imbalanced[0, i] = r"$\underset{{\num{{+- {:.3e} }}  }}{{\num{{ {:.3g} }} }}$".format(
        all_std_double_pos_imbalanced["r2_score"].iloc[i], all_test_double_pos_imbalanced["r2_score"].iloc[i])
        
r2_df_double_pos_imbalanced = pd.DataFrame(r2_str_array_double_pos_imbalanced, columns=rows)

section_path = "chapters/results/figures/"
# Separate file and label for the imbalanced variant, so the pixel-modified table is not overwritten
fname = THESIS_PATH + section_path + "regression_simulated_double_position_imbalanced_r2.tex"
caption = """
Test-set R2-scores for regression of positions of origin on double events in simulated data with specific pixels
set to zero and an imbalanced dataset, using multiple models.
Error estimates are the standard deviation in results from k-fold cross-validation
with $K=5$ folds.
"""
label = "tab:regression-simulated-double-position-imbalanced-r2"
with open(fname, "w") as fp:
    pd.set_option('display.max_colwidth', None)  # None = no truncation (negative values are deprecated since pandas 1.0)
    r2_df_double_pos_imbalanced.to_latex(fp, escape=False, caption=caption, label=label, index=False)


### Energy

#### Linear regression

In [81]:
# Load linear regression experiment 
#lin_ex_id = "fa1bac5bbad7"
lin_ex_id = "d63ccc7d71d8"
lin_ex = load_experiment(lin_ex_id)

# Load model and predict
lin_model = tf.keras.models.load_model(repo_root + "models/" + lin_ex_id + ".h5", compile=False)
lin_test = regression_metrics(lin_model, test_images.reshape(test_images.shape[0], 256)[d_idx], test_energies[d_idx], "lin_test")
del lin_model #No longer needed, clear memory just in case.

lin_metrics = experiment_metrics_to_df(lin_ex)
#display(lin_metrics)
lin_means = lin_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
lin_means = lin_means.rename(index={'mean': 'lin_mean', 'std': 'lin_std'})
#display(lin_means)

#### Small dense network

In [82]:
# Load dense regression experiment
#dense_ex_id = "a603f7b7d717"
dense_ex_id = "fe6122de2d69"
dense_ex = load_experiment(dense_ex_id)
# Load model and predict
dense_model = tf.keras.models.load_model(repo_root + "models/" + dense_ex_id + ".h5", compile=False)
dense_test = regression_metrics(dense_model, test_images.reshape(test_images.shape[0], 256)[d_idx], test_energies[d_idx], "dense_test")
del dense_model

dense_metrics = experiment_metrics_to_df(dense_ex)
#display(dense_metrics)
dense_means = dense_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
dense_means = dense_means.rename(index={'mean': 'dense_mean', 'std': 'dense_std'})
#display(dense_means)

#### Small CNN

In [83]:
# Load cnn regression experiment
#cnn_ex_id = "aae44d283ef0"
cnn_ex_id = "b85f508d2aef"
cnn_ex = load_experiment(cnn_ex_id)
# Load model and predict
cnn_model = tf.keras.models.load_model(repo_root + "models/" + cnn_ex_id + ".h5", compile=False)
cnn_test = regression_metrics(cnn_model, test_images[d_idx], test_energies[d_idx], "cnn_test")
del cnn_model

cnn_metrics = experiment_metrics_to_df(cnn_ex)
#display(cnn_metrics)
cnn_means = cnn_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
cnn_means = cnn_means.rename(index={'mean': 'cnn_mean', 'std': 'cnn_std'})
#display(cnn_means)

#### Pretrained - VGG16 

In [84]:
# Load pretrained (VGG16) regression experiment
#pretrained_ex_id = "4f5d0b4bd0ef"
pretrained_ex_id = "640c96c9730b"
pretrained_ex = load_experiment(pretrained_ex_id)
# Load model and predict
pretrained_model = tf.keras.models.load_model(repo_root + "models/" + pretrained_ex_id + ".h5", compile=False)
pretrained_test = regression_metrics(pretrained_model, np.concatenate((test_images, test_images, test_images), axis=-1)[d_idx], test_energies[d_idx], "pretrained_test")
del pretrained_model

pretrained_metrics = experiment_metrics_to_df(pretrained_ex)
#display(pretrained_metrics)
pretrained_means = pretrained_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
pretrained_means = pretrained_means.rename(index={'mean': 'pretrained_mean', 'std': 'pretrained_std'})
#display(pretrained_means)

#### Custom model

In [85]:
# Load custom regression experiment 
#custom_ex_id = "c227bd3fd86a"
custom_ex_id = "8ab907032835"
custom_ex = load_experiment(custom_ex_id)
# Load model and predict
custom_model = tf.keras.models.load_model(repo_root + "models/" + custom_ex_id + ".h5", compile=False)
custom_test = regression_metrics(custom_model, test_images[d_idx], test_energies[d_idx], "custom_test")
del custom_model

custom_metrics = experiment_metrics_to_df(custom_ex)
#display(custom_metrics)
custom_means = custom_metrics.agg([np.mean, np.std])#.applymap('{:.3f}'.format)
custom_means = custom_means.rename(index={'mean': 'custom_mean', 'std': 'custom_std'})
#display(custom_means)

#### Output

In [86]:
all_means_double_energy_imbalanced = pd.DataFrame(
    [
        lin_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        dense_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        custom_means.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_mean': 'Linear',
        'dense_mean': 'Dense',
        'cnn_mean': 'CNN',
        'pretrained_mean': 'Pretrained',
        'custom_mean': 'Custom',
    }
)

all_std_double_energy_imbalanced = pd.DataFrame(
    [
        lin_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        dense_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
        custom_means.iloc[1][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_std': 'Linear',
        'dense_std': 'Dense',
        'cnn_std': 'CNN',
        'pretrained_std': 'Pretrained',
        'custom_std': 'Custom',
    }
)

all_test_double_energy_imbalanced = pd.DataFrame(
    [
        lin_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        dense_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        cnn_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        pretrained_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
        custom_test.iloc[0][['r2_score', 'mse', 'rmse', 'mae']],
    ]
).rename(
    index={
        'lin_test': 'Linear',
        'dense_test': 'Dense',
        'cnn_test': 'CNN',
        'pretrained_test': 'Pretrained',
        'custom_test': 'Custom',
    }
)
display(all_test_double_energy_imbalanced)
display(all_std_double_energy_imbalanced)

| Model | r2_score | mse | rmse | mae |
|---|---|---|---|---|
| Linear | 0.410889 | 0.049145 | 0.221687 | 0.180724 |
| Dense | 0.431818 | 0.047397 | 0.217708 | 0.177637 |
| CNN | 0.460801 | 0.044975 | 0.212073 | 0.172855 |
| Pretrained | 0.440374 | 0.046678 | 0.216052 | 0.176819 |
| Custom | 0.41165 | 0.049083 | 0.221548 | 0.180421 |


| Model | r2_score | mse | rmse | mae |
|---|---|---|---|---|
| Linear | 0.046128 | 0.003816 | 0.008775 | 0.007044 |
| Dense | 0.045632 | 0.003773 | 0.00866 | 0.006951 |
| CNN | 0.053899 | 0.004457 | 0.010137 | 0.008097 |
| Pretrained | 0.026815 | 0.002227 | 0.005239 | 0.003738 |
| Custom | 0.209144 | 0.017365 | 0.034881 | 0.03484 |


In [87]:
rows = all_test_double_energy_imbalanced.index
r2_str_array_double_energy_imbalanced = np.zeros((1, all_test_double_energy_imbalanced.shape[0]), dtype=object)
for i in range(all_test_double_energy_imbalanced.shape[0]):
    r2_str_array_double_energy_imbalanced[0, i] = r"$\underset{{\num{{+- {:.3e} }}  }}{{\num{{ {:.3g} }} }}$".format(
        all_std_double_energy_imbalanced["r2_score"].iloc[i], all_test_double_energy_imbalanced["r2_score"].iloc[i])
        
r2_df_double_energy_imbalanced = pd.DataFrame(r2_str_array_double_energy_imbalanced, columns=rows)

section_path = "chapters/results/figures/"
# Separate file and label for the imbalanced variant, so the pixel-modified table is not overwritten
fname = THESIS_PATH + section_path + "regression_simulated_double_energy_imbalanced_r2.tex"
caption = """
Test-set R2-scores for regression of energy values on double events in simulated data with specific pixels
set to zero and an imbalanced dataset, using multiple models. Error estimates are the standard deviation in results from k-fold
cross-validation with $K=5$ folds.
"""
label = "tab:regression-simulated-double-energy-imbalanced-r2"
with open(fname, "w") as fp:
    pd.set_option('display.max_colwidth', None)  # None = no truncation (negative values are deprecated since pandas 1.0)
    r2_df_double_energy_imbalanced.to_latex(fp, escape=False, caption=caption, label=label, index=False)


# Combined tables

In [88]:
df_pos = pd.concat(
    [
        r2_df_single_pos.rename({0:"Single (a)"}),
        r2_df_single_pos_pmod.rename({0:"Single (b)"}),
        r2_df_single_pos_imbalanced.rename({0:"Single (c)"}),
        r2_df_double_pos.rename({0:"Double (a)"}),
        r2_df_double_pos_pmod.rename({0:"Double (b)"}),
        r2_df_double_pos_imbalanced.rename({0:"Double (c)"}),
    ],
)
#display(df_pos)

df_energy = pd.concat(
    [
        r2_df_single_energy.rename({0:"Single (a)"}),
        r2_df_single_energy_pmod.rename({0:"Single (b)"}),
        r2_df_single_energy_imbalanced.rename({0:"Single (c)"}),
        r2_df_double_energy.rename({0:"Double (a)"}),
        r2_df_double_energy_pmod.rename({0:"Double (b)"}),
        r2_df_double_energy_imbalanced.rename({0:"Double (c)"}),
    ],
)
#display(df_energy)
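Each `r2_df_*` frame has a single row (index label 0) with one column per model, so renaming index 0 and concatenating stacks them into one table with a row per event type and dataset variant. `DataFrame.rename` with a plain dict renames index labels by default. A toy version with made-up cell strings:

```python
import pandas as pd

models = ["Linear", "Dense"]
single_a = pd.DataFrame([["0.70", "0.90"]], columns=models)  # one row, index 0
single_b = pd.DataFrame([["0.65", "0.88"]], columns=models)

combined = pd.concat(
    [
        single_a.rename({0: "Single (a)"}),  # rename index label 0
        single_b.rename({0: "Single (b)"}),
    ]
)
print(combined)
```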

In [89]:
# Output position df
section_path = "chapters/results/figures/"
fname = THESIS_PATH + section_path + "regression_simulated_all_positions_r2.tex"
caption = """
Test-set R2-scores for regression of positions of origin on simulated data, with models trained on data with:
a) no modifications, b) specific pixels set to zero to mimic experimental data, and c) an imbalanced dataset
in addition to the modifications in b), to further mimic experimental data. Error estimates are the standard deviation
of the validation results in k-fold cross-validation with $K=5$ folds.
"""
label = "tab:regression-simulated-all-positions-r2"
with open(fname, "w") as fp:
    pd.set_option('display.max_colwidth', None)  # None = no truncation (negative values are deprecated since pandas 1.0)
    df_pos.to_latex(fp, escape=False, caption=caption, label=label, index=True)

In [90]:
# Output energy df
section_path = "chapters/results/figures/"
fname = THESIS_PATH + section_path + "regression_simulated_all_energies_r2.tex"
caption = """
Test-set R2-scores for regression of energies on simulated data, with models trained on data with:
a) no modifications, b) specific pixels set to zero to mimic experimental data, and c) an imbalanced dataset
in addition to the modifications in b), to further mimic experimental data. Error estimates are the standard deviation
of the validation results in k-fold cross-validation with $K=5$ folds.
"""
label = "tab:regression-simulated-all-energies-r2"
with open(fname, "w") as fp:
    pd.set_option('display.max_colwidth', None)  # None = no truncation (negative values are deprecated since pandas 1.0)
    df_energy.to_latex(fp, escape=False, caption=caption, label=label, index=True)