Generating new images:

Basically, this means **we want the network to save space by mapping similar images to similar embeddings**

Say we are done training and temporarily set the decoder aside

For each image, we have the embedding

--> Now, given 2 images from the training data, we can pass them through the encoder, which gives us 2 embeddings

Now I can interpolate between 2 embeddings

e.g. $e_3 = \alpha e_1 + (1 - \alpha) e_2$, with $\alpha$ between 0 and 1

Now pass this e3 through the decoder, and it will generate an image X3 that's not from the data (a blend that looks more like image 1 when alpha is close to 1, and more like image 2 when alpha is close to 0)
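
The interpolation step can be sketched in PyTorch as follows. This is a minimal sketch, not a definitive implementation: `interpolate_embeddings` is a hypothetical helper, and `e1` and `e2` are stand-in tensors of an assumed size 64 where the real code would use `encoder(x1)` and `encoder(x2)`.

```python
import torch


def interpolate_embeddings(e1, e2, alpha):
    """Linear interpolation: alpha * e1 + (1 - alpha) * e2."""
    return alpha * e1 + (1.0 - alpha) * e2


# Stand-ins for two encoder outputs (assumed embedding size: 64).
e1 = torch.zeros(64)
e2 = torch.ones(64)

# Sweep alpha from 1 (pure e1) down to 0 (pure e2).
alphas = [1.0, 0.75, 0.5, 0.25, 0.0]
path = [interpolate_embeddings(e1, e2, a) for a in alphas]

# Each element of `path` would then be passed through the decoder
# to produce one image along the transition from image 1 to image 2.
```

Sweeping alpha over several values, rather than picking one, makes it easy to see whether the latent space changes smoothly between the two images.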

WHAT IF we create a random embedding and pass it through the decoder?
- Lucky: the random embedding will be close to the embedding of a meaningful image (low probability)
- Usual: we get something nonsensical

--> How can we have a **true generative model**: pass in random numbers and generate a meaningful image?

***
## Variational AutoEncoders (VAE)

Variant of autoencoders -> Used to generate new images 

Differences:
- Models so far are deterministic: once trained, every time we pass in the same input, we get the same output
- **Probabilistic**: the same input can give different outputs at different times
- **Generative**: can generate new instances that look like they were sampled from the training set

They impose a distribution constraint on the latent space to have a smooth space.

VAEs do not learn a single static embedding per input; they learn a **distribution over embeddings**. We assume that this is a normal (Gaussian) distribution

The only things required to define such a distribution are $\mu$ (its mean) and $\sigma$ (its standard deviation)

--> Only 2 quantities need to be learned. Therefore, the encoder will output **2 vectors, mu and sigma** (both learned).

We then use these mu and sigma to sample the embedding

<img src="images/Screenshot 2023-10-25 at 10.18.29 AM.png">
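
One way an encoder could output the two vectors is sketched below. The `GaussianHead` name, the feature size 128, and the latent size 16 are all assumptions for illustration, and using softplus to keep sigma positive is one possible choice (many implementations predict the log-variance instead).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GaussianHead(nn.Module):
    """Hypothetical final encoder stage: features -> (mu, sigma)."""

    def __init__(self, in_features=128, latent_dim=16):
        super().__init__()
        self.mu = nn.Linear(in_features, latent_dim)
        self.raw_sigma = nn.Linear(in_features, latent_dim)

    def forward(self, h):
        mu = self.mu(h)
        # softplus maps any real number to a positive one, so sigma > 0
        # (an assumption of this sketch, not something the notes prescribe).
        sigma = F.softplus(self.raw_sigma(h))
        return mu, sigma


head = GaussianHead()
h = torch.randn(4, 128)   # stand-in for features of a batch of 4 images
mu, sigma = head(h)       # each has shape (4, 16)
```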



NOTE: Sampling is a non-differentiable operation. Once we sample, there's no gradient. So how do we train the network?

--> Use the **reparametrization trick**: move the randomness into a separate noise term, so the gradient can still flow through mu and sigma

$$z = \mu + \sigma \odot \epsilon$$

$$\epsilon \sim \mathcal{N}(0, I)$$
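
A minimal sketch of the trick in PyTorch is shown below. The function name `reparameterize` and the tensor shapes are assumptions. The point is that `eps` is the only random quantity, so `z` stays differentiable with respect to `mu` and `sigma`.

```python
import torch


def reparameterize(mu, sigma):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = torch.randn_like(sigma)  # the noise carries all the randomness
    return mu + sigma * eps


# Stand-ins for the encoder outputs (assumed shape: batch of 4, latent size 16).
mu = torch.zeros(4, 16, requires_grad=True)
sigma = torch.ones(4, 16, requires_grad=True)

z = reparameterize(mu, sigma)
z.sum().backward()

# Gradients reach both parameters: dz/dmu = 1 and dz/dsigma = eps.
```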

***
## Convolutional Autoencoders

Remember: convolution is much better fit for images

- **Encoder** part: we already know (similar to Convolutional Classifier)
- **Decoder** part: once we get to the embedding, we want to go in the **reverse direction**

### Transposed Convolution

Increasing resolution rather than decreasing it

$$o = (i - 1) \cdot s + (k - 1) - 2p + op + 1$$

- o: output dimension
- i: input dimension
- s: stride (applied on the output side: every input pixel is used, and neighbouring input pixels land s positions apart in the output)
- k: kernel size
- p: input padding
- op: output padding

<img src="images/Screenshot 2023-10-25 at 10.44.26 AM.png" height=30% width=30%>
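
The formula can be checked with a few lines of plain Python. The helper name `transposed_conv_output_size` is hypothetical, and the sketch assumes a dilation of 1, as the formula does. The three calls use the layer settings of the decoder defined later in these notes.

```python
def transposed_conv_output_size(i, k, s=1, p=0, op=0):
    """o = (i - 1) * s + (k - 1) - 2p + op + 1, assuming dilation 1."""
    return (i - 1) * s + (k - 1) - 2 * p + op + 1


# Kernel 7, stride 1, no padding: 1 -> 7
first = transposed_conv_output_size(1, k=7)

# Kernel 3, stride 2, padding 1, output padding 1: 7 -> 14
second = transposed_conv_output_size(first, k=3, s=2, p=1, op=1)

# Same settings again: 14 -> 28
third = transposed_conv_output_size(second, k=3, s=2, p=1, op=1)
```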

### Padding: drop output border

### Output padding

- A strided convolution may map several different input sizes to the same output size
- So, output padding is needed for the transposed convolution to identify exactly which output size we want


TECHNICALLY, if we pass the input -> conv -> transpose conv, we get the input size again (the same shape, not necessarily the same values)
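
The ambiguity that output padding resolves can be seen in a small sketch. The sizes 27 and 28 and the single-channel layers are assumptions chosen for illustration: both inputs shrink to 14x14 under the same strided convolution, and `output_padding` selects which size the transposed convolution returns.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)

x27 = torch.randn(1, 1, 27, 27)
x28 = torch.randn(1, 1, 28, 28)
y27 = conv(x27)  # 14x14
y28 = conv(x28)  # also 14x14: two input sizes, one output size

# Without output padding the transposed convolution returns 27x27;
# with output_padding=1 it returns 28x28.
up_no_pad = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, padding=1)
up_pad = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, padding=1,
                            output_padding=1)

out27 = up_no_pad(y28)
out28 = up_pad(y28)
```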

In [None]:
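import torch.nn as nn


class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        # Encoder: each stride-2 convolution halves the resolution
        # (e.g. 28x28 -> 14x14 -> 7x7); the 7x7 kernel then maps 7x7 -> 1x1
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, 7)
        )
        # Decoder: mirrors the encoder with transposed convolutions
        self.decoder = nn.Sequential(           # A function that packs multiple layers together
            nn.ConvTranspose2d(64, 32, 7),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid()                        # Squash pixel values into [0, 1]
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

    def embed(self, x):
        # Image -> embedding
        return self.encoder(x)

    def decode(self, e):
        # Embedding -> image (must call the decoder module, not decode itself)
        return self.decoder(e)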

***
## Pre-training with Autoencoders

Recall: we can do transfer learning with CNNs

We can do the exact same thing with autoencoders.

Once we are done training, we can **drop the decoder** and attach a classifier to classify your data :)

--> Only need a very small amount of supervised (labeled) data
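
A minimal sketch of this setup is given below, under several assumptions: the encoder reuses the convolutional architecture from these notes (here freshly initialised, where real code would take the trained `encoder`), inputs are 1x28x28, there are 10 classes, and the encoder is frozen so only the new layer is trained. Fine-tuning the encoder as well is an equally valid choice.

```python
import torch
import torch.nn as nn

# Stand-in for the trained encoder (same layers as the autoencoder's encoder).
encoder = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 7),
)

# Freeze the pre-trained weights (an assumption of this sketch).
for param in encoder.parameters():
    param.requires_grad = False

num_classes = 10  # assumed number of labels
classifier = nn.Sequential(
    encoder,
    nn.Flatten(),                # (N, 64, 1, 1) -> (N, 64) for 28x28 inputs
    nn.Linear(64, num_classes),  # the only trainable layer
)

# Only the parameters of the new layer are handed to the optimizer.
trainable = [p for p in classifier.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)

x = torch.randn(8, 1, 28, 28)    # stand-in for a small labeled batch
logits = classifier(x)           # shape (8, 10)
```

With the encoder frozen, only 650 parameters (64 x 10 weights plus 10 biases) are trained, which is why a small labeled set can be enough.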