# Autoencoders

* K-Means Clustering
* Principal Component Analysis (dimensionality reduction)
* Autoencoders (feature learning)
* Density estimation 

# Density Estimation

# Maximum Likelihood
* Data: $p_{data}(x)$
* Parameters: $\theta$
* Model: $p_\theta(x)$
* Samples: $x \sim p_{data}(x)$
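
Maximum likelihood picks the parameters that make the observed samples most probable under the model, i.e. $\theta^* = \arg\max_\theta \frac{1}{n}\sum_{i=1}^n \log p_\theta(x^{(i)})$. A minimal sketch of this idea, assuming (purely for illustration) that $p_{data}$ is a 1-D Gaussian and that the model $p_\theta$ is also Gaussian with $\theta = (\mu, \sigma)$; the specific numbers and the helper name `avg_log_likelihood` are not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption for illustration: p_data is a Gaussian with mean 2.0 and std 0.5.
x = rng.normal(loc=2.0, scale=0.5, size=10_000)  # samples x ~ p_data(x)

def avg_log_likelihood(x, mu, sigma):
    """Average log p_theta(x) for a Gaussian model with theta = (mu, sigma)."""
    return np.mean(
        -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
    )

# For a Gaussian model the maximum likelihood estimate has a closed form:
# the sample mean and the (1/n) sample standard deviation.
mu_hat = x.mean()
sigma_hat = x.std()

ll_mle = avg_log_likelihood(x, mu_hat, sigma_hat)  # likelihood at the MLE
ll_other = avg_log_likelihood(x, 0.0, 1.0)         # likelihood at an arbitrary theta
print(mu_hat, sigma_hat, ll_mle, ll_other)
```

The fitted parameters should land close to the values used to generate the data, and any other choice of $\theta$ gives a lower average log-likelihood on the same samples.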

![](https://github.com/jordanott/DeepLearning/blob/master/Figures/data_distributions.png?raw=true)

# Latent Variables
* **Latent:** hidden or concealed

# Latent Variable Example
* Your **health** is a latent variable
* There isn’t a single measurement of “health”; it is a rather abstract concept
* Instead, we measure physical properties of our bodies:
    * Blood pressure
    * Cholesterol level
    * Weight
    * Blood sugar
    * Temperature
    
* These **measurements/observations** give us clues about a person's health

# Latent Variable Models

model:  
$p_\theta(x, z) = p_\theta(x|z)p_\theta(z)$

* joint $p_\theta(x, z)$
* conditional likelihood $p_\theta(x|z)$
* prior $p_\theta(z)$

marginalization:  
$p_\theta(x) = \int p_\theta(x,z)dz$
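
The factorization and the marginalization integral can be sketched numerically. The example below assumes a toy 1-D model, $z \sim \mathcal{N}(0, 1)$ and $x \mid z \sim \mathcal{N}(az + b, s^2)$, chosen only because its marginal $\mathcal{N}(b, a^2 + s^2)$ is known in closed form; the parameters `a`, `b`, `s` and the helper `gaussian_pdf` are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, s = 2.0, 1.0, 0.5  # hypothetical model parameters theta

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

# Sample from the joint p(x, z) = p(x|z) p(z): first the prior, then the conditional.
z = rng.standard_normal(200_000)                  # z ~ p(z)
x = a * z + b + s * rng.standard_normal(z.shape)  # x ~ p(x|z)

# Monte Carlo marginalization: p(x0) = E_{z ~ p(z)}[p(x0|z)],
# approximated by averaging the conditional likelihood over prior samples.
x0 = 1.5
p_mc = gaussian_pdf(x0, a * z + b, s).mean()

# Closed-form marginal for this particular toy model, used as a check.
p_exact = gaussian_pdf(x0, b, np.sqrt(a**2 + s**2))
print(p_mc, p_exact)
```

For models where $p_\theta(x|z)$ is a neural network, no closed form exists and this kind of averaging becomes expensive in high dimensions, which is one motivation for approximate inference.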


# Variational Inference

# ELBO

# Encoder network 
* $z = g_\phi(x)$
* Translates the original high-dimensional input, $x$, into a low-dimensional latent code, $z$
* The input size is larger than the output size


# Decoder network
* $x' = f_\theta(g_\phi(x)) = f_\theta(z)$
* Recovers the data from the code
* Typically built from progressively larger layers, with the final layer matching the input size
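
A rough sketch of the encoder and decoder as small fully connected networks, to make the shapes concrete. The layer sizes (784, 64, 2), the `tanh` activation, and the helper `init_layer` are assumptions for illustration, not choices stated in the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, latent_dim = 784, 64, 2  # assumed sizes

def init_layer(n_in, n_out):
    """Small random weights and zero biases for one dense layer."""
    return rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out)

# Encoder parameters (phi): layers shrink toward the latent code.
W1, b1 = init_layer(input_dim, hidden_dim)
W2, b2 = init_layer(hidden_dim, latent_dim)
# Decoder parameters (theta): layers grow back to the input size.
W3, b3 = init_layer(latent_dim, hidden_dim)
W4, b4 = init_layer(hidden_dim, input_dim)

def encoder(x):
    """g_phi: maps an input x to the latent code z."""
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

def decoder(z):
    """f_theta: maps a latent code z back to a reconstruction x'."""
    h = np.tanh(z @ W3 + b3)
    return h @ W4 + b4

x = rng.random((5, input_dim))  # a batch of 5 fake inputs
z = encoder(x)
x_rec = decoder(z)
print(z.shape, x_rec.shape)
```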

# Architecture 
<img src="https://lilianweng.github.io/lil-log/assets/images/autoencoder-architecture.png" width="650">

# Training
* $(\theta, \phi)$ are learned together
* $\mathbf{x} \approx f_\theta(g_\phi(\mathbf{x}))$

\begin{equation}
    L_\text{AE}(\theta, \phi) = \frac{1}{n}\sum_{i=1}^n (\mathbf{x}^{(i)} - f_\theta(g_\phi(\mathbf{x}^{(i)})))^2
\end{equation}
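
The joint training of $(\theta, \phi)$ can be sketched with plain gradient descent. This example assumes a linear autoencoder without biases, synthetic data lying near a 2-D subspace, and reads the squared term in $L_\text{AE}$ as a squared Euclidean norm; the learning rate, step count, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, input_dim, latent_dim = 500, 10, 2

# Synthetic data lying close to a 2-D subspace of a 10-D space (assumption).
A = rng.normal(size=(latent_dim, input_dim)) / np.sqrt(input_dim)
X = rng.normal(size=(n, latent_dim)) @ A + 0.05 * rng.normal(size=(n, input_dim))

W_enc = 0.1 * rng.normal(size=(input_dim, latent_dim))  # phi (linear encoder)
W_dec = 0.1 * rng.normal(size=(latent_dim, input_dim))  # theta (linear decoder)

def loss_and_grads(X, W_enc, W_dec):
    """Reconstruction loss L_AE and its gradients w.r.t. both weight matrices."""
    Z = X @ W_enc        # z = g_phi(x)
    X_rec = Z @ W_dec    # x' = f_theta(z)
    diff = X_rec - X
    loss = np.sum(diff**2) / len(X)
    d_rec = 2.0 * diff / len(X)         # dL/dx'
    grad_dec = Z.T @ d_rec              # dL/dW_dec
    grad_enc = X.T @ (d_rec @ W_dec.T)  # dL/dW_enc via the chain rule
    return loss, grad_enc, grad_dec

lr = 0.03
losses = []
for step in range(3000):
    loss, g_enc, g_dec = loss_and_grads(X, W_enc, W_dec)
    losses.append(loss)
    # Encoder and decoder are updated together from the same loss.
    W_enc -= lr * g_enc
    W_dec -= lr * g_dec

print(losses[0], losses[-1])
```

In practice a deep learning framework would compute these gradients automatically; the manual version is shown only to make the shared objective explicit.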

# Example
* Latent dimension is 2

<img src="https://github.com/jordanott/DeepLearning/blob/master/Figures/ae.png?raw=true" width="650">

# Visualize Latent Space
![](https://www.researchgate.net/profile/Ehsan_Hosseini_Asl2/publication/275960143/figure/fig3/AS:392026551013379@1470477821195/Visualization-of-MNIST-handwritten-digits-196-higher-representation-of-digits-computed.png)

# Generate Samples
<img src="https://blog.keras.io/img/ae/vae_digits_manifold.png" width="400">

[Video](https://gertjanvandenburg.com/figures/autoencoder/latent_circle.mp4)

# Denoising Autoencoder
* A plain autoencoder risks overfitting by learning the identity function
* The risk is greatest when the network has more parameters than there are data points

* Partially corrupt the input by adding noise
* $\mathcal{M}_\mathcal{D}$ adds noise to the original input
* $\tilde{\mathbf{x}} \sim \mathcal{M}_\mathcal{D}(\tilde{\mathbf{x}} \vert \mathbf{x})$
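
Two commonly used choices for the corruption process $\mathcal{M}_\mathcal{D}$ are additive Gaussian noise and masking noise (randomly zeroing inputs). A minimal sketch of both; the function names and noise levels are assumptions, since the slides do not specify which corruption is used:

```python
import numpy as np

def add_gaussian_noise(x, std, rng):
    """Corrupt x by adding zero-mean Gaussian noise with the given std."""
    return x + std * rng.standard_normal(x.shape)

def mask_inputs(x, drop_prob, rng):
    """Corrupt x by setting each entry to zero with probability drop_prob."""
    keep = rng.random(x.shape) >= drop_prob
    return x * keep

rng = np.random.default_rng(0)
x = np.ones((4, 6))                            # a tiny batch of clean inputs
x_gauss = add_gaussian_noise(x, std=0.1, rng=rng)
x_masked = mask_inputs(x, drop_prob=0.3, rng=rng)
print(x_gauss)
print(x_masked)
```

Both functions return a new array and leave the clean input untouched, which matters because the clean $\mathbf{x}$ is still needed as the reconstruction target.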

# Training

\begin{equation}
\begin{aligned}
\tilde{\mathbf{x}}^{(i)} &\sim \mathcal{M}_\mathcal{D}(\tilde{\mathbf{x}}^{(i)} \vert \mathbf{x}^{(i)})\\
L_\text{DAE}(\theta, \phi) &= \frac{1}{n} \sum_{i=1}^n (\mathbf{x}^{(i)} - f_\theta(g_\phi(\tilde{\mathbf{x}}^{(i)})))^2
\end{aligned}
\end{equation}
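
The key detail in $L_\text{DAE}$ is that the network sees the corrupted input but is scored against the clean one. The sketch below, with hypothetical names and an assumed Gaussian corruption, shows why the identity mapping is no longer a zero-loss solution:

```python
import numpy as np

def dae_loss(x, encoder, decoder, corrupt):
    """Denoising loss: reconstruct from corrupted input, compare to clean input."""
    x_tilde = corrupt(x)                   # x~ sampled from M_D(x~ | x)
    x_rec = decoder(encoder(x_tilde))      # f_theta(g_phi(x~))
    return np.mean(np.sum((x - x_rec) ** 2, axis=1))

rng = np.random.default_rng(0)
x = rng.random((8, 5))

identity = lambda v: v  # stand-in for a network that has learned to copy its input
gaussian_corrupt = lambda v: v + 0.3 * rng.standard_normal(v.shape)

# Without corruption, copying the input gives a perfect (and useless) score.
loss_plain = dae_loss(x, identity, identity, corrupt=identity)
# With corruption, copying the input leaves the noise in and is penalized.
loss_denoise = dae_loss(x, identity, identity, corrupt=gaussian_corrupt)
print(loss_plain, loss_denoise)
```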

# Denoising AE Architecture
![](https://lilianweng.github.io/lil-log/assets/images/denoising-autoencoder-architecture.png)

# MNIST Results
<img src="https://cdn-images-1.medium.com/max/1600/1*hfzos8xmCGjrgpTW78PFLg@2x.png" width="500">

# Variational Autoencoder

# Reparameterization Trick
<img src="https://lilianweng.github.io/lil-log/assets/images/reparameterization-trick.png" width="500">
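
The trick in the figure rewrites a sample $z \sim \mathcal{N}(\mu, \sigma^2)$ as $z = \mu + \sigma \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, so the randomness is isolated in $\epsilon$ and gradients can flow through $\mu$ and $\sigma$. A minimal sketch, assuming the encoder outputs a mean and a log-variance (a common convention, not something the slides state); `reparameterize` is a hypothetical helper:

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, where eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)  # noise independent of the parameters
    sigma = np.exp(0.5 * log_var)        # convert log-variance to std
    return mu + sigma * eps

rng = np.random.default_rng(0)
# Pretend encoder outputs: mean 1.0 and variance 0.25 (std 0.5) for every sample.
mu = np.full((100_000, 2), 1.0)
log_var = np.full((100_000, 2), np.log(0.25))

z = reparameterize(mu, log_var, rng)
print(z.mean(), z.std())
```

The empirical mean and standard deviation of `z` should be close to the requested 1.0 and 0.5.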

# VAE Architecture
![](https://lilianweng.github.io/lil-log/assets/images/vae-gaussian.png)

# References
* [AE in Keras](https://blog.keras.io/building-autoencoders-in-keras.html)
* [From AE to Beta VAE](https://lilianweng.github.io/lil-log/2018/08/12/from-autoencoder-to-beta-vae.html)