In [1]:
import importlib
from pathlib import Path
import shutil

from torch.utils.data import DataLoader, random_split
from torch import cuda, tensor, nn, optim
from torch.backends import mps
from torchvision import transforms

import matplotlib.pyplot as plt
from skimage import io
import numpy as np

import datasets.catdog_loader as mnist_loader
from ptmodels import vae_pytorch as vp

# Use a CUDA GPU or an Apple-silicon GPU (MPS) to train PyTorch networks if one is available.
if cuda.is_available():
    pt_device = 'cuda'

elif mps.is_available():
    pt_device = 'mps'

else: 
    pt_device = 'cpu'

print(f'Using {pt_device}.')

Using mps.


# Project 2: Autoencoders and Variational Autoencoders.
In this project, we will first implement a couple of autoencoders trained on MNIST-compatible data sets for dimensionality reduction. Afterwards, we'll create a generative model using a variational autoencoder.

## Project 2.1: Autoencoders for Dimensionality Reduction.
High-dimensional data frequently has more dimensions than are needed to perform regression, classification, or clustering.  More formally, most data contains a lot of covariance, and that covariance reduces the intrinsic dimensionality of the data set.  Think of image data: a $128 \times 128$ pixel image can be thought of as a vector $\mathbf{x}$ that resides in a $16384$-dimensional vector space. That does not mean there are $16384$ unique features.  Intuitively, we know there are fewer features in the data, and those features are captured by correlations between pixels.  In other words, we could find a mapping from the starting representation $\mathbf{x}$ to a reduced-dimension latent representation $\mathbf{z}\in \mathbb{R}^m$, where hopefully $m \ll 16384$.  We can then use the more approachable latent representation $\mathbf{z}\in \mathcal{Z}$ to analyze the starting dataset $\mathbf{x}\in \mathcal{X}$.
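To make the idea of intrinsic dimensionality concrete, here is a small NumPy sketch. It flattens a $128 \times 128$ image into a vector, then builds a toy data set (an assumption for illustration, not the project data) in which every 64-pixel sample is a linear mix of only 3 hidden factors. The variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# A 128 x 128 grayscale image flattens to a 16384-dimensional vector.
image = rng.random((128, 128))
x = image.reshape(-1)
print(x.shape)

# Toy data (illustrative assumption): 200 "images" of 8 x 8 = 64 pixels,
# each generated from only 3 hidden factors.
n_samples, n_pixels, n_factors = 200, 64, 3
factors = rng.normal(size=(n_samples, n_factors))
mixing = rng.normal(size=(n_factors, n_pixels))
data = factors @ mixing  # every pixel is a linear mix of the 3 factors

# Each sample has 64 coordinates, but the data only spans 3 directions.
intrinsic_dim = np.linalg.matrix_rank(data)
print(data.shape, intrinsic_dim)
```

In this toy case the correlations are exactly linear, so the matrix rank recovers the number of factors. Real images have nonlinear structure, which is one reason to reach for an autoencoder rather than a purely linear method.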

An autoencoder does this by finding three things: a latent representation $\mathbf{z}\in \mathcal{Z}$, an encoding function $E_{\phi}(\mathbf{x})=\mathbf{z}$ parameterized by $\phi$, and a decoding function $D_{\theta}(\mathbf{z})\approx\mathbf{x}$ parameterized by $\theta$ that approximately reconstructs the input. Here, we will simultaneously train two dense, multi-layer perceptrons to estimate the functions $E_{\phi}$ and $D_{\theta}$, recovering the latent space $\mathcal{Z}$ in the process.
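As a rough sketch of what such a pair of networks might look like in PyTorch, consider the following. The class name `TinyAutoencoder`, the layer sizes, and the sigmoid output are all assumptions made for illustration (784 inputs corresponds to flattened $28 \times 28$ MNIST-style images). This is not the implementation in `ptmodels`.

```python
import torch
from torch import nn


class TinyAutoencoder(nn.Module):
    """Hypothetical minimal autoencoder, for illustration only."""

    def __init__(self, n_inputs=784, n_hidden=128, n_latent=2):
        super().__init__()
        # E_phi: maps an input vector x to a latent vector z.
        self.encoder = nn.Sequential(
            nn.Linear(n_inputs, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, n_latent),
        )
        # D_theta: maps a latent vector z back to input space.
        self.decoder = nn.Sequential(
            nn.Linear(n_latent, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, n_inputs),
            nn.Sigmoid(),  # assumes pixel values scaled to [0, 1]
        )

    def forward(self, x):
        z = self.encoder(x)
        x_hat = self.decoder(z)
        return x_hat, z


torch.manual_seed(0)
model = TinyAutoencoder()
x = torch.rand(16, 784)  # a fake batch of 16 flattened images
x_hat, z = model(x)
print(tuple(z.shape), tuple(x_hat.shape))
```

Both networks live in one module so that a single optimizer can update $\phi$ and $\theta$ together, which is what training them simultaneously amounts to in practice.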

Training the perceptron networks requires a loss function.  Although the data we are training on is labeled, we will not be using the labels.  Instead, we are going to perform unsupervised learning.  Specifically, we are going to optimize by minimizing the 'distance' between the starting vector $\mathbf{x}$ and its predicted decoding $D_{\theta}(E_{\phi}(\mathbf{x}))$: 

$L(\mathcal{X}|\phi, \theta)=\frac{1}{N}\sum^N_{i=1}L_2(\mathbf{x}_i, D_{\theta}(E_{\phi}(\mathbf{x}_i)))$

where $N$ is the size of the training data sample $\mathcal{X}$ and $L_2(\mathbf{x}, \mathbf{x}^{\prime})=|| \mathbf{x} - \mathbf{x}^{\prime} ||^2$ is the L2 loss (the squared Euclidean distance, up to some multiplicative constant). To train, we will minimize $L(\mathcal{X}|\phi, \theta)$ with respect to the parameters $\phi$ and $\theta$.
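The loss above can be sketched in a few lines of PyTorch. The helper `reconstruction_loss`, the single-layer encoder and decoder, the random batch, and the optimizer settings are all placeholder assumptions chosen to keep the example short. The point is only to show the batch-averaged squared L2 distance being minimized over both parameter sets at once.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)


def reconstruction_loss(x, x_hat):
    # Squared L2 distance per sample, averaged over the N samples in the batch.
    return ((x - x_hat) ** 2).sum(dim=1).mean()


# Placeholder single-layer encoder and decoder (illustrative assumption).
encoder = nn.Linear(784, 2)
decoder = nn.Linear(2, 784)
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = optim.Adam(params, lr=1e-2)

x = torch.rand(32, 784)  # fake batch standing in for flattened images

# The same quantity expressed with the built-in MSE loss, summed then divided by N.
x_hat0 = decoder(encoder(x)).detach()
builtin = nn.MSELoss(reduction="sum")(x_hat0, x) / x.shape[0]
matches_mse = torch.allclose(reconstruction_loss(x, x_hat0), builtin, rtol=1e-4)

losses = []
for _ in range(50):
    optimizer.zero_grad()
    loss = reconstruction_loss(x, decoder(encoder(x)))
    loss.backward()   # gradients with respect to both phi and theta
    optimizer.step()
    losses.append(loss.item())

print(matches_mse, losses[0], losses[-1])
```

Using `nn.MSELoss()` with its default `reduction="mean"` would additionally divide by the number of pixels. That only rescales the loss by a constant, so the minimizing parameters are unchanged.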

Enough math.  Let's start setting up the model to train!