# Variational Autoencoder for classifying electron events
This notebook follows the TensorFlow tutorial [Convolutional VAE](https://www.tensorflow.org/tutorials/generative/cvae), but applies it to our image data.
For simplicity, most of the text is taken from the tutorial.

We start by generating single-electron events to see whether the approach works at all.

In [1]:
# Imports
import numpy as np
import tensorflow as tf
import os
import time
import glob
import PIL
import imageio
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split
import sys
import matplotlib.pyplot as plt
from master_data_functions.functions import *
from master_models.pretrained import pretrained_model
from IPython import display
%load_ext autoreload
%autoreload 2


## Data import

In [23]:

# File import
# Sample filenames are:
# CeBr10kSingle_1.txt -> single events
# CeBr10kSingle_2.txt -> single events
# CeBr10k_1.txt -> mixed single and double events 
# CeBr10.txt -> small file of 10 samples
# CeBr2Mil_Mix.txt -> 2 million mixed samples of simulated events

# Set from_file = True to parse the raw .txt file; otherwise load the
# preprocessed 200k-event arrays from .npy files
from_file = False
if from_file:

    folder = "simulated"
    filename = "CeBr2Mil_Mix.txt"
    num_samples = 2e5
    #folder = "sample"
    #filename = "CeBr10k_1.txt"
    #num_samples = 1e3

    data = import_data(folder=folder, filename=filename, num_samples=num_samples)
    images = data[filename]["images"]
    energies = data[filename]["energies"]
    positions = data[filename]["positions"]
    labels = to_categorical(data[filename]["labels"])
else:
    images = load_feature_representation("images_noscale_200k.npy")
    energies = load_feature_representation("energies_noscale_200k.npy")
    positions = load_feature_representation("positions_noscale_200k.npy")
    labels = load_feature_representation("labels_noscale_200k.npy")

n_classes = labels.shape[1]
print("Number of classes: {}".format(n_classes))
print("Images shape: {}".format(images.shape))
print("Energies shape: {}".format(energies.shape))
print("Positions shape: {}".format(positions.shape))
print("Labels shape: {}".format(labels.shape))


Number of classes: 2
Images shape: (200000, 16, 16, 1)
Energies shape: (200000, 2)
Positions shape: (200000, 4)
Labels shape: (200000, 2)


## Training and test setup

In [28]:
MODEL_PATH = "../../data/output/models/"
FIGURE_PATH = "../../"

# Indices to use for training and test
x_idx = np.arange(images.shape[0])

# Split the indices into training and test sets
train_idx, test_idx = train_test_split(x_idx, test_size=0.2)

test_positions = positions[test_idx]
train_positions = positions[train_idx]
test_energies = energies[test_idx]
train_energies = energies[train_idx]

# Event-type indices, relative distances, energy differences and relative energies for the test set
single_indices, double_indices, close_indices = event_indices(test_positions)
rel_distance_test = relative_distance(test_positions)
energy_diff_test = energy_difference(test_energies)
rel_energy_test = relative_energy(test_energies)

#y_true = y_test.argmax(axis=-1)
#tmp_predicted = loaded_model.predict(x_test)
#y_pred = tmp_predicted.argmax(axis=-1)




## VAE setup
### Network architecture
For the inference network, we use two convolutional layers followed by a fully-connected layer. In the generative network, we mirror this architecture by using a fully-connected layer followed by three convolution transpose layers (a.k.a. deconvolutional layers in some contexts). Note that it is common practice to avoid batch normalization when training VAEs, since the additional stochasticity from using mini-batches may aggravate instability on top of the stochasticity from sampling.
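The tutorial's layer sizes are chosen for 28x28 MNIST images, while our images are 16x16. As a quick check of how the architecture carries over, the sketch below traces spatial sizes in plain Python. It assumes the tutorial's settings (3x3 kernels, stride-2 encoder convolutions with Keras's default `'valid'` padding, and decoder transposed convolutions with `'same'` padding); the helper names are hypothetical.

```python
# Sketch: trace spatial sizes through the tutorial's conv layout for 16x16 inputs.
# Assumption: same kernel sizes, strides and padding as the TensorFlow CVAE tutorial.

def conv2d_out(n, kernel=3, stride=2, padding="valid"):
    """Output side length of a Keras-style Conv2D layer."""
    if padding == "valid":
        return (n - kernel) // stride + 1
    return -(-n // stride)  # 'same' padding: ceil(n / stride)


def conv2d_transpose_out(n, stride=2):
    """Output side length of a Conv2DTranspose layer with padding='same'."""
    return n * stride


def trace_encoder(side, n_layers=2):
    """Side lengths after each encoder Conv2D layer."""
    sizes = []
    for _ in range(n_layers):
        side = conv2d_out(side)
        sizes.append(side)
    return sizes


def trace_decoder(start_side, strides=(2, 2, 1)):
    """Side lengths after each of the three decoder Conv2DTranspose layers."""
    sizes = []
    side = start_side
    for s in strides:
        side = conv2d_transpose_out(side, stride=s)
        sizes.append(side)
    return sizes


# Encoder on 16x16 images: 16 -> 7 -> 3, so Flatten yields 3*3*64 = 576 features
print(trace_encoder(16))
# The decoder must end at 16x16, so it has to start from a 4x4 grid
print(trace_decoder(4))
```

Under these assumptions the decoder's Dense/Reshape step would start from a 4x4 grid (for example `Dense(4*4*32)` followed by `Reshape((4, 4, 32))`) instead of the tutorial's 7x7 grid.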

### Inference network
This defines an approximate posterior distribution $q(z|x)$, which takes an observation as input and outputs a set of parameters for the conditional distribution of the latent representation. In this example, we simply model this distribution as a diagonal Gaussian, so the inference network outputs the mean and log-variance parameters of a factorized Gaussian (the log-variance is used instead of the variance for numerical stability).
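In the tutorial, the encoder's last Dense layer has `2 * latent_dim` outputs, which are split in half into the mean and the log-variance (`tf.split(..., num_or_size_splits=2, axis=1)`). The NumPy sketch below mirrors that split; `latent_dim` and the random "encoder output" are placeholders, not values from this notebook.

```python
import numpy as np

# Placeholder sizes and a random stand-in for the encoder's Dense output
latent_dim = 2
batch_size = 4
rng = np.random.default_rng(0)
encoder_output = rng.normal(size=(batch_size, 2 * latent_dim))

# First half of each row -> mean, second half -> log-variance
mean, logvar = np.split(encoder_output, 2, axis=1)

# exp() of the log-variance is always positive, so the network output
# needs no constraint to give a valid standard deviation
std = np.exp(0.5 * logvar)
print(mean.shape, logvar.shape, bool((std > 0).all()))
```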

### Generative network
This defines the generative model, which takes a latent encoding as input and outputs the parameters for a conditional distribution of the observation, i.e. $p(x|z)$. Additionally, we use a unit Gaussian prior $p(z)$ for the latent variable.
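As in the tutorial, the decoder outputs logits that parameterize a factorized Bernoulli distribution $p(x|z)$, one per pixel. The sketch below computes $\log p(x|z)$ from logits with the numerically stable form of sigmoid cross-entropy. A Bernoulli likelihood treats each pixel as a value in [0, 1]; the images here are loaded unscaled (`images_noscale_200k.npy`), so the sketch assumes they are rescaled to [0, 1] beforehand, which is an assumption and not a step the notebook shows. The function names are hypothetical.

```python
import numpy as np


def bernoulli_probs(logits):
    """Per-pixel Bernoulli parameters: sigmoid of the decoder logits."""
    return 1.0 / (1.0 + np.exp(-logits))


def bernoulli_log_likelihood(x, logits):
    """log p(x|z) summed over all non-batch axes.

    Uses the stable sigmoid cross-entropy max(l, 0) - l*x + log(1 + exp(-|l|)),
    which equals -[x log sigmoid(l) + (1 - x) log(1 - sigmoid(l))].
    """
    cross_ent = np.maximum(logits, 0) - logits * x + np.log1p(np.exp(-np.abs(logits)))
    axes = tuple(range(1, x.ndim))
    return -cross_ent.sum(axis=axes)


# Zero logits mean p = 0.5 for every pixel of a 16x16x1 image
x = np.zeros((1, 16, 16, 1))
logits = np.zeros((1, 16, 16, 1))
print(bernoulli_log_likelihood(x, logits))  # 256 * log(0.5), about -177.45
```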

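### Reparameterization trick
During optimization, we can sample from $q(z|x)$ by first sampling from a unit Gaussian, and then multiplying by the standard deviation and adding the mean: $z = \mu + \sigma \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$. Because the randomness is isolated in $\epsilon$, $z$ is a differentiable function of $\mu$ and $\sigma$, so gradients can pass through the sample to the inference network parameters.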
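A minimal NumPy sketch of the trick, matching the formula above: the noise `eps` is drawn independently of the parameters, and `mean` and `logvar` enter only through a deterministic shift and scale. The function name and the example values are illustrative.

```python
import numpy as np


def reparameterize(mean, logvar, rng):
    """Draw z ~ N(mean, exp(logvar)) as z = mean + std * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(size=mean.shape)
    return mean + np.exp(0.5 * logvar) * eps


rng = np.random.default_rng(42)
n = 20000
mean = np.tile([1.0, -2.0], (n, 1))
logvar = np.tile(np.log([0.25, 4.0]), (n, 1))  # standard deviations 0.5 and 2.0
samples = reparameterize(mean, logvar, rng)

# Empirical moments should be close to mean [1, -2] and std [0.5, 2]
print(samples.mean(axis=0), samples.std(axis=0))
```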

### Define the loss function and the optimizer
VAEs train by maximizing the evidence lower bound (ELBO) on the marginal log-likelihood:

$$\log p(x) \geq \text{ELBO} = \mathbb{E}_{q(z|x)}\left[\log\frac{p(x,z)}{q(z|x)}\right].$$

In practice, we optimize the single-sample Monte Carlo estimate of this expectation:

$$\log p(x|z) + \log p(z) - \log q(z|x)$$

where $z$ is sampled from $q(z|x)$.
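The single-sample estimate can be sketched in NumPy as below, following the structure of the tutorial's loss: a diagonal-Gaussian log-density for $\log p(z)$ (unit Gaussian) and $\log q(z|x)$, and a Bernoulli log-likelihood for $\log p(x|z)$. The toy linear `decode` and the random inputs are hypothetical placeholders for the trained decoder and real (rescaled) images.

```python
import numpy as np

LOG2PI = np.log(2.0 * np.pi)


def log_normal_pdf(sample, mean, logvar, axis=1):
    """Log density of a diagonal Gaussian, summed over `axis`."""
    return np.sum(-0.5 * ((sample - mean) ** 2 * np.exp(-logvar) + logvar + LOG2PI), axis=axis)


def bernoulli_log_likelihood(x, logits):
    """log p(x|z) for per-pixel Bernoulli logits, summed over non-batch axes."""
    ce = np.maximum(logits, 0) - logits * x + np.log1p(np.exp(-np.abs(logits)))
    return -ce.sum(axis=tuple(range(1, x.ndim)))


def single_sample_elbo(x, mean, logvar, decode, rng):
    """log p(x|z) + log p(z) - log q(z|x) with one z ~ q(z|x) per example."""
    z = mean + np.exp(0.5 * logvar) * rng.standard_normal(mean.shape)
    logpx_z = bernoulli_log_likelihood(x, decode(z))
    logpz = log_normal_pdf(z, 0.0, 0.0)         # unit Gaussian prior
    logqz_x = log_normal_pdf(z, mean, logvar)   # approximate posterior
    return logpx_z + logpz - logqz_x


rng = np.random.default_rng(0)
latent_dim, batch = 2, 8
W = rng.normal(scale=0.1, size=(latent_dim, 16 * 16))  # hypothetical toy decoder weights


def decode(z):
    return (z @ W).reshape(-1, 16, 16, 1)


x = rng.uniform(size=(batch, 16, 16, 1))                # stand-in images in [0, 1]
mean = rng.normal(size=(batch, latent_dim))
logvar = rng.normal(scale=0.1, size=(batch, latent_dim))

# Training minimizes the negative ELBO averaged over the batch
loss = -np.mean(single_sample_elbo(x, mean, logvar, decode, rng))
print(loss)
```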

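Note: in expectation, $\log p(z) - \log q(z|x)$ equals $-\mathrm{KL}\big(q(z|x)\,\|\,p(z)\big)$, and since both distributions are Gaussian this KL term could also be computed analytically. Here we incorporate all three terms in the Monte Carlo estimator for simplicity.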
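Writing $p(x,z) = p(x|z)\,p(z)$ splits the ELBO into a reconstruction term and a KL term:

$$\text{ELBO} = \mathbb{E}_{q(z|x)}\left[\log p(x|z)\right] - \mathrm{KL}\big(q(z|x)\,\|\,p(z)\big),$$

and for a diagonal Gaussian $q(z|x)$ with mean $\mu$ and variance $\sigma^2$ against the unit Gaussian prior,

$$\mathrm{KL}\big(q(z|x)\,\|\,p(z)\big) = -\frac{1}{2}\sum_j \left(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\right).$$

The NumPy sketch below compares this closed form with a Monte Carlo estimate of $\mathbb{E}_{q}[\log q(z|x) - \log p(z)]$, the quantity the three-term estimator handles by sampling. The function names and test values are illustrative.

```python
import numpy as np


def analytic_kl(mean, logvar):
    """KL(N(mean, exp(logvar)) || N(0, I)) per example, summed over latent dims."""
    return -0.5 * np.sum(1.0 + logvar - mean**2 - np.exp(logvar), axis=1)


def mc_kl(mean, logvar, rng, n_samples=100_000):
    """Monte Carlo estimate of E_q[log q(z|x) - log p(z)] for one example."""
    std = np.exp(0.5 * logvar)
    z = mean + std * rng.standard_normal((n_samples, mean.size))
    log_q = -0.5 * (((z - mean) / std) ** 2 + logvar + np.log(2 * np.pi))
    log_p = -0.5 * (z**2 + np.log(2 * np.pi))
    return np.mean(np.sum(log_q - log_p, axis=1))


rng = np.random.default_rng(1)
mean = np.array([0.5, -1.0])
logvar = np.log(np.array([0.5, 2.0]))

# Closed form is exactly 0.875 here; the sampled estimate should be close to it
print(analytic_kl(mean[None], logvar[None])[0], mc_kl(mean, logvar, rng))
```

The closed form removes the sampling noise from the KL part of the loss; the sampled version has the advantage of working for any $q$ and $p$ whose log-densities can be evaluated.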

## Training
* We start by iterating over the dataset
* During each iteration, we pass the image to the encoder to obtain a set of mean and log-variance parameters of the approximate posterior $q(z|x)$
* We then apply the reparameterization trick to sample from $q(z|x)$
* Finally, we pass the reparameterized samples to the decoder to obtain the logits of the generative distribution $p(x|z)$
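The four steps above can be sketched as the forward pass of one epoch in NumPy. The linear `encode`/`decode` maps and the uniform stand-in data are hypothetical placeholders for the convolutional networks and the electron images, and the parameter update is deliberately omitted; in the TensorFlow tutorial the gradient of this loss is taken with `tf.GradientTape` and applied by an optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, side, batch_size = 2, 16, 32

# Hypothetical toy stand-ins for the convolutional encoder and decoder
W_enc = rng.normal(scale=0.01, size=(side * side, 2 * latent_dim))
W_dec = rng.normal(scale=0.01, size=(latent_dim, side * side))


def encode(x):
    """Return mean and log-variance of q(z|x)."""
    out = x.reshape(len(x), -1) @ W_enc
    return np.split(out, 2, axis=1)


def decode(z):
    """Return logits of p(x|z) with the image shape."""
    return (z @ W_dec).reshape(-1, side, side, 1)


def batches(images, batch_size, rng):
    """Shuffle once per epoch and yield mini-batches."""
    order = rng.permutation(len(images))
    for start in range(0, len(images), batch_size):
        yield images[order[start:start + batch_size]]


def log_normal_pdf(sample, mean, logvar):
    return np.sum(-0.5 * ((sample - mean) ** 2 * np.exp(-logvar) + logvar + np.log(2 * np.pi)), axis=1)


def batch_loss(x):
    mean, logvar = encode(x)                                            # step 2: q(z|x) parameters
    z = mean + np.exp(0.5 * logvar) * rng.standard_normal(mean.shape)   # step 3: reparameterize
    logits = decode(z)                                                  # step 4: logits of p(x|z)
    ce = np.maximum(logits, 0) - logits * x + np.log1p(np.exp(-np.abs(logits)))
    logpx_z = -ce.sum(axis=(1, 2, 3))
    elbo = logpx_z + log_normal_pdf(z, 0.0, 0.0) - log_normal_pdf(z, mean, logvar)
    return -elbo.mean()


train_images = rng.uniform(size=(100, side, side, 1))  # stand-in data in [0, 1]
# Step 1: iterate over the dataset in mini-batches (100 images -> 4 batches)
losses = [batch_loss(x) for x in batches(train_images, batch_size, rng)]
print(len(losses), losses[0])
```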

## Generate Images
* After training, it is time to generate some images
* We start by sampling a set of latent vectors from the unit Gaussian prior distribution $p(z)$
* The generator will then convert the latent sample $z$ to logits of the observation, giving a distribution $p(x|z)$

Here we plot the probabilities of the Bernoulli distributions, i.e. the sigmoid of the decoder logits, rather than binary samples drawn from them.
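A NumPy sketch of the generation steps: draw latent vectors from the unit Gaussian prior, decode them to logits, and take the sigmoid to get per-pixel Bernoulli probabilities for plotting. The random `W_dec` is a hypothetical stand-in for a trained decoder; with the real model, the decoder network would take its place.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


rng = np.random.default_rng(0)
latent_dim, side, num_examples = 2, 16, 16
W_dec = rng.normal(size=(latent_dim, side * side))  # hypothetical stand-in for a trained decoder

z = rng.standard_normal((num_examples, latent_dim))  # z ~ p(z) = N(0, I)
logits = (z @ W_dec).reshape(num_examples, side, side, 1)
probs = sigmoid(logits)                              # Bernoulli parameter for each pixel
print(probs.shape, probs.min() >= 0.0, probs.max() <= 1.0)
```

The tutorial draws one fixed set of latent vectors before training and reuses it at every epoch, so the generated images can be compared from epoch to epoch.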

## Display an image using the epoch number