### Intro to Autoencoders

An autoencoder (AE) is a neural network that
1. has an intermediate layer (the latent layer) that is significantly smaller than the input layer, and
2. has a loss function that scores the difference between the output and the input, so the output layer must be as wide as the input layer.

Schematically, it looks like:

<img src="https://www.compthree.com/images/blog/ae/ae.png" alt="drawing" width="500"/>
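The two defining properties above can be sketched in a few lines of NumPy. This is only an illustrative, untrained one-hidden-layer network: the layer sizes, the `tanh` activation, and the names `encode`, `decode`, and `reconstruction_loss` are assumptions made for the example, not part of any particular architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

n_input, n_latent = 64, 4   # hypothetical sizes; the latent layer is much smaller

# Randomly initialised (untrained) weights for a one-hidden-layer autoencoder.
W_enc = rng.normal(scale=0.1, size=(n_latent, n_input))
W_dec = rng.normal(scale=0.1, size=(n_input, n_latent))


def encode(x):
    """Map an input vector to the latent space."""
    return np.tanh(W_enc @ x)


def decode(z):
    """Map a latent vector back to input space (same width as the input)."""
    return W_dec @ z


def reconstruction_loss(x):
    """Mean squared difference between the output and the input."""
    return float(np.mean((decode(encode(x)) - x) ** 2))


x = rng.normal(size=n_input)
z = encode(x)
x_hat = decode(z)
```

Training would adjust `W_enc` and `W_dec` together to minimise `reconstruction_loss` over a data set.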




The entire network is trained as one unit. The idea is that the latent space is a lower-dimensional space in which the data can be accurately represented. The figure above is designed for _vector_ inputs, but there are also networks in which the first layers are convolutional layers and the last layers are _de_convolutional layers, as in the image below:

![img](https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2018/10/4-Figure1-1.png)

Let's denote the original data point (an image or a vector) as $\vec{x}$ and the lower-dimensional encoded data as $\vec{z}$. One question to ask is: can we interpolate between points in $z$-space and still get meaningful representations in $x$-space? That is, does decoding $\alpha \vec{z_1} + (1-\alpha)\vec{z_2}$ give us something "like" $\alpha \vec{x_1} + (1-\alpha)\vec{x_2}$? Often no, because $\alpha \vec{x_1} + (1-\alpha)\vec{x_2}$ is itself usually meaningless. For images, it corresponds to simply fading between the pixel values of the two images, which rarely produces a sensible image.

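However, pixel-fading is exactly how interpolation between EEMs should behave because, to a first approximation, the EEM of a mixture is the weighted sum of the EEMs of its components. I propose that interpolating in encoded-EEM space is possible, and that the decoded values are meaningful.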
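One way to test this proposal is to measure how far decoding a blend of latent vectors lands from the blend of the original inputs. The sketch below assumes a hypothetical helper `interpolation_gap` and uses a linear projection as a stand-in for a trained encoder/decoder pair. For that stand-in the gap is zero by construction, so it only shows how the check would be wired up, not that a real autoencoder passes it.

```python
import numpy as np


def interpolation_gap(encode, decode, x1, x2, alpha):
    """Distance between decoding a latent blend and blending the inputs directly."""
    z_blend = alpha * encode(x1) + (1 - alpha) * encode(x2)
    x_blend = alpha * x1 + (1 - alpha) * x2
    return float(np.linalg.norm(decode(z_blend) - x_blend))


# Stand-in linear "autoencoder": projection onto an orthonormal basis Q.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(50, 3)))   # 50-dim inputs, 3-dim latent space


def encode(x):
    return Q.T @ x


def decode(z):
    return Q @ z


# Two inputs that lie in the subspace the stand-in can represent exactly.
x1 = Q @ np.array([1.0, 0.0, 2.0])
x2 = Q @ np.array([0.0, 3.0, 1.0])

gap = interpolation_gap(encode, decode, x1, x2, alpha=0.3)
```

With a trained, non-linear `encode`/`decode` pair substituted in, a small `gap` over many pairs of EEMs would support the proposal.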

#### Heuristic lower bound on the number of dimensions to encode to

For simplicity, let's assume that every "peak" in an EEM can be represented by a 2D Gaussian distribution with a diagonal covariance matrix. This diagonal-only assumption is justified by Kasha's rule (the shape of the emission spectrum does not depend on the excitation wavelength). Each peak is therefore described by 4 parameters: $\mu_1, \mu_2, \sigma_1, \sigma_2$, and a perfect encoding of an EEM needs only $4 \times$ (number of peaks) dimensions.
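As a rough illustration of this heuristic, the sketch below builds a synthetic EEM from Gaussian peaks with diagonal covariance. The wavelength grids, the peak parameters, and the function names are all made up for the example, and each peak's amplitude is fixed at 1 to match the four-parameter count above.

```python
import numpy as np


def gaussian_peak(ex, em, mu_ex, mu_em, sigma_ex, sigma_em):
    """2D Gaussian with diagonal covariance: an outer product of two 1D Gaussians."""
    g_ex = np.exp(-0.5 * ((ex - mu_ex) / sigma_ex) ** 2)
    g_em = np.exp(-0.5 * ((em - mu_em) / sigma_em) ** 2)
    return np.outer(g_ex, g_em)


def synthetic_eem(ex, em, peaks):
    """Sum of peaks; each peak is a (mu_ex, mu_em, sigma_ex, sigma_em) tuple."""
    return sum(gaussian_peak(ex, em, *p) for p in peaks)


def min_latent_dim(n_peaks, params_per_peak=4):
    """Heuristic lower bound on the latent dimension."""
    return params_per_peak * n_peaks


ex = np.arange(240, 451, 5)    # excitation wavelengths in nm (illustrative grid)
em = np.arange(300, 601, 5)    # emission wavelengths in nm (illustrative grid)
peaks = [(280, 350, 10, 25), (370, 450, 15, 30)]   # made-up peak parameters

eem = synthetic_eem(ex, em, peaks)
```

Because the covariance is diagonal, each peak is a rank-one matrix: the same emission profile scaled along the excitation axis.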


### Composition analysis

Suppose we have built an EEM-autoencoder, with the first half (the encoder) denoted as $E$ and the back half (the decoder) denoted as $D$. Given a sample's EEM, $\mathbf{y}$, we can encode it to $\vec{y} = E(\mathbf{y})$. Along with this, suppose we have already measured the EEMs of many internal standards: tryptophan, tyrosine, chlorophyll, riboflavin, etc. These are encoded as $\vec{z_1}, \vec{z_2}, ...$. Assuming that addition in the encoded space is meaningful, we can set up a linear equation as:

$$ \vec{y} = \sum_{i=1}^N \alpha_i \vec{z_i} + \vec{\epsilon} $$


Solving this equation for the $\alpha_i$ with a minimization routine, ideally with a sparse prior and non-negativity constraints, gives us an estimate of what the original sample contains.
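One possible minimization routine is projected gradient descent on a least-squares objective with an L1 penalty. The sketch below is an assumption about how this could be done, not a prescribed method: `sparse_nonneg_fit`, the penalty weight `lam`, and the synthetic standards in `Z` are all hypothetical.

```python
import numpy as np


def sparse_nonneg_fit(Z, y, lam=1e-3, n_iter=5000):
    """Minimise 0.5*||y - Z @ a||^2 + lam*sum(a) subject to a >= 0.

    Z has one encoded standard per column; y is the encoded sample.
    """
    step = 1.0 / np.linalg.norm(Z, 2) ** 2     # 1 / Lipschitz constant of the gradient
    a = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        grad = Z.T @ (Z @ a - y) + lam         # the L1 penalty is linear when a >= 0
        a = np.maximum(a - step * grad, 0.0)   # project onto the non-negative orthant
    return a


rng = np.random.default_rng(0)
Z = rng.normal(size=(16, 5))                   # 5 hypothetical standards, 16-dim latent space
alpha_true = np.array([0.0, 2.0, 0.0, 0.5, 0.0])
y = Z @ alpha_true + 0.01 * rng.normal(size=16)

alpha_hat = sparse_nonneg_fit(Z, y)
```

If only the non-negativity constraint is needed, `scipy.optimize.nnls` is an off-the-shelf alternative.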

The nice parts of this approach are that i) the minimization is much easier to solve, since we are working with much lower-dimensional objects than EEMs; ii) using second-order calibration on one of the components, relative molarities can be estimated; and iii) since $E$ and $D$ are non-linear, more complicated phenomena such as the inner filter effect (IFE) can be learned.

#### Residual analysis

After estimating the $\hat{\alpha}_i$ in the above equation, we will have a residual vector, $\vec{R} = \vec{y} - \sum_{i=1}^N \hat{\alpha}_i \vec{z_i}$. What does $D(\vec{R})$ look like? Is it meaningful? Could it tell us whether we are missing an internal standard?
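The latent-space part of this question can be explored with synthetic data. In the sketch below, the library of standards is random, one standard is deliberately left out, and plain least squares stands in for the constrained fit, so the numbers are purely illustrative; `latent_residual` is a hypothetical helper name.

```python
import numpy as np


def latent_residual(y, Z, alpha_hat):
    """R = y - sum_i alpha_hat_i * z_i, with the z_i stored as columns of Z."""
    return y - Z @ alpha_hat


rng = np.random.default_rng(1)
Z_all = rng.normal(size=(16, 4))           # 4 hypothetical encoded standards
alpha_true = np.array([1.0, 0.0, 0.7, 0.4])
y = Z_all @ alpha_true                     # sample containing standards 0, 2 and 3

# Pretend standard 3 was never measured, so it is absent from the library.
Z_known = Z_all[:, :3]
alpha_hat, *_ = np.linalg.lstsq(Z_known, y, rcond=None)   # plain least squares, for brevity
R = latent_residual(y, Z_known, alpha_hat)

# Fraction of the sample's norm left unexplained by the known standards.
unexplained = np.linalg.norm(R) / np.linalg.norm(y)
```

A large `unexplained` fraction hints at a missing standard. Passing `R` through a trained decoder $D$ is the step the questions above are about; it is not shown here because it depends on the trained model.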