# Flow Models

Fitting density models when $x$ is a real-valued (continuous) variable. A softmax over a finite set of values is not an option.

**Main idea**

We pick a simple distribution $p_Z$ and learn a flow (an invertible, differentiable function) $f_\theta(x)$ that maps samples from the underlying distribution of $x$ to $z \sim p_Z$. Inverting the flow then lets us sample $x$.

<img src="https://www.evernote.com/l/AgtDIkTXkYpOupUncfMSwTsTf-lqMyWaxHQB/image.png" alt="drawing" width="500"/>

## Density models

Def of PDF is function $p$ such that

$$P(x \in [a, b]) = \int_a^b p(x)dx$$

### How to fit density model?

Maximum likelihood

$$ \max_{\theta} \sum_i \log p_{\theta}(x^{(i)}) $$

### Example: Gaussian Mixture

$$p_{\theta}(x) = \sum_{i=1}^k \pi_i \mathcal{N}(x; \mu_i, \sigma_i^2) $$
where $$\sum_i \pi_i = 1$$

Then  

$$\theta = (\pi_1, \dots, \pi_k, \mu_1, \dots, \mu_k, \sigma_1, \dots, \sigma_k)$$
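As a rough illustration of the mixture density and the maximum-likelihood objective, here is a minimal NumPy sketch. The function name `gmm_log_prob`, the parameter values, and the data points are all made up for the example.

```python
import numpy as np

def gmm_log_prob(x, pi, mu, sigma):
    """Log-density of a 1-D Gaussian mixture at each point in x."""
    x = np.asarray(x, dtype=float)[:, None]            # shape (n, 1)
    # Log-density of every component at every point, shape (n, k).
    log_comp = (-0.5 * ((x - mu) / sigma) ** 2
                - np.log(sigma) - 0.5 * np.log(2.0 * np.pi))
    # log sum_i pi_i N(x; mu_i, sigma_i^2), computed with a stable log-sum-exp.
    a = np.log(pi) + log_comp
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]

# Hypothetical parameters theta for k = 2 components.
pi = np.array([0.3, 0.7])
mu = np.array([-2.0, 1.0])
sigma = np.array([0.5, 1.5])

# Sanity check: the density should integrate to roughly 1 (Riemann sum).
grid = np.linspace(-15.0, 15.0, 20001)
total_mass = np.sum(np.exp(gmm_log_prob(grid, pi, mu, sigma))) * (grid[1] - grid[0])

# Maximum-likelihood objective sum_i log p_theta(x^(i)) on some made-up data.
data = np.array([-2.1, 0.4, 1.3, 2.0])
log_likelihood = gmm_log_prob(data, pi, mu, sigma).sum()
```

Fitting the model means adjusting `pi`, `mu` and `sigma` to increase `log_likelihood`, while keeping `pi` on the simplex.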

<img src="https://www.evernote.com/l/AgsZdCVhKFhBEIf6mrBWsH2cpeHmSOYMzKYB/image.png" alt="drawing" width="500"/>

#### Highlights

* Does not work well for high-dimensional data

E.g. when modelling images, realistic images are only generated very close to the means of the fitted Gaussians. Small perturbations of an image result in non-realistic looking images:

<img src="https://www.evernote.com/l/Agvbod8yTRhGk4fdvQ512A0G_g5ZEjfY52oB/image.png" alt="drawing" width="500"/>

## How to fit a general density model?

Intuition:

<img src="https://www.evernote.com/l/Agvz7oLkjAtFWZMnndxjqzDpfFShHl0WDKEB/image.png" alt="drawing" width="500"/>

* How do we ensure that it is normalized?
$$\int_{-\infty}^{\infty} p_{\theta}(x) dx = 1 $$

Softmax does not work here.

* How to sample?
* Latent representation?

## Flows: Main idea

<img src="https://www.evernote.com/l/AgsOgtDAdhpKn6Hp32wRet3e5yQ5T9LPGOMB/image.png" alt="drawing" width="500"/>

Generally $$z \sim p_Z(z)$$
Normalizing flow $$ z \sim \mathcal{N}(0, 1) $$

## Train

Still use MLE

$$\max_\theta \sum_i \log p_\theta(x^{(i)})$$

### Use change of vars

$$z = f_\theta(x)$$

A transformation of a probability density needs to preserve probability mass, i.e.:

$$p_\theta(x)dx = p_Z(z)dz$$

Then
$$p_\theta(x) = p_Z(f_\theta(x)) \left| \frac{\partial f_\theta(x)}{\partial x}\right| $$

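Where $f_\theta$ is steep, a small interval in $x$ is stretched over a large interval in $z$, so the density at the image point $z = f_\theta(x)$ is lower than the density at $x$ by the factor $\left|\partial f_\theta / \partial x\right|$. $f_\theta$ needs to be differentiable and invertible (in one dimension, strictly monotonic).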
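The change-of-variables formula can be checked numerically. The sketch below assumes a hypothetical flow $f(x) = x + x^3$ (strictly increasing, so invertible) and a standard normal $p_Z$. Neither choice comes from the notes. It verifies that the resulting $p_\theta(x)$ still integrates to about 1.

```python
import numpy as np

def f(x):
    # Hypothetical flow: strictly increasing, hence invertible.
    return x + x ** 3

def df_dx(x):
    # Derivative of the flow, always >= 1.
    return 1.0 + 3.0 * x ** 2

def p_z(z):
    # Standard normal base density.
    return np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)

def p_x(x):
    # Change of variables: p(x) = p_Z(f(x)) * |f'(x)|.
    return p_z(f(x)) * np.abs(df_dx(x))

# Riemann-sum check that p_x is a normalized density.
grid = np.linspace(-5.0, 5.0, 200001)
mass = np.sum(p_x(grid)) * (grid[1] - grid[0])
```

Dropping the `|f'(x)|` factor would leave `mass` well below 1, which is one way to see why the Jacobian term is needed.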


### This transforms training into

$$\max_\theta\sum_i\log p_\theta(x^{(i)}) = \max_\theta\sum_i\left(\log p_Z(f_\theta(x^{(i)})) + \log\left|\frac{\partial f_\theta}{\partial x}(x^{(i)})\right|\right) $$

This assumes we have an expression for $p_Z$ that we can evaluate.
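To make the objective concrete, here is a minimal sketch of maximum-likelihood training for the simplest possible flow. It assumes an affine flow $z = (x - m)/s$ with $\theta = (m, \log s)$, a standard normal $p_Z$, synthetic data, and plain gradient ascent with hand-derived gradients. A real flow would use a more expressive $f_\theta$ and automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=5000)   # synthetic data (assumption)

# Hypothetical flow: z = f_theta(x) = (x - m) / s, with theta = (m, log_s).
m, log_s = 0.0, 0.0

def mean_log_likelihood(x, m, log_s):
    z = (x - m) / np.exp(log_s)
    log_p_z = -0.5 * z ** 2 - 0.5 * np.log(2.0 * np.pi)   # log p_Z(f_theta(x))
    log_det = -log_s                                       # log |df/dx| = log(1/s)
    return np.mean(log_p_z + log_det)

initial = mean_log_likelihood(x, m, log_s)

lr = 0.1
for _ in range(2000):
    s = np.exp(log_s)
    z = (x - m) / s
    # Gradients of the mean log-likelihood with respect to m and log_s.
    grad_m = np.mean(z) / s
    grad_log_s = np.mean(z ** 2) - 1.0
    m += lr * grad_m
    log_s += lr * grad_log_s

final = mean_log_likelihood(x, m, log_s)
```

For this particular flow the optimum is known in closed form (`m` is the sample mean and `exp(log_s)` is the sample standard deviation), which makes it easy to confirm that the loop converges.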

## Flows: Sampling

Sample 

$$z \sim p_Z(z) \Rightarrow x = f_\theta^{-1}(z)$$
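A minimal sampling sketch, assuming the hypothetical flow $f(x) = x + x^3$ and a standard normal $p_Z$. This flow has no convenient closed-form inverse, so the example inverts it by bisection, which works for any strictly increasing $f$ on a bracketing interval.

```python
import numpy as np

def f(x):
    # Hypothetical strictly increasing flow.
    return x + x ** 3

def f_inverse(z, lo=-10.0, hi=10.0, iters=60):
    """Invert f elementwise by bisection (assumes f(lo) < z < f(hi))."""
    lo = np.full_like(z, lo)
    hi = np.full_like(z, hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        too_small = f(mid) < z          # root lies to the right of mid
        lo = np.where(too_small, mid, lo)
        hi = np.where(too_small, hi, mid)
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
z = rng.standard_normal(1000)   # z ~ p_Z
x = f_inverse(z)                # x = f^{-1}(z)
```

Flows used in practice are usually designed so that the inverse is cheap to compute. Bisection is only a fallback for scalar monotonic flows.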

## Example

<img src="https://www.evernote.com/l/AgsY0_E34QhKZrmtRrFwwlEc_3Nd897mdegB/image.png" alt="drawing" width="500"/>

## How to pick dist Z?

Usually uniform or gaussian

## What to use to learn $f_\theta$?

* CDFs of mixtures of Gaussians or logistics (a mixture CDF is monotonic and differentiable)

### Neural nets 

Composition of flows is a flow. So making every layer a flow guarantees that the computed function is a flow:

* Sigmoid (output in $(0, 1)$; works when $z$ is distributed on $(0, 1)$)
* Tanh (output in $(-1, 1)$; works for fitting a uniform on $(-1, 1)$)
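Composing flows is convenient because the log-derivative terms simply add up by the chain rule. The sketch below stacks a hypothetical affine layer and a sigmoid layer (the layer functions and the `compose` helper are made up for the example) and compares the accumulated log-derivative against a finite-difference estimate.

```python
import numpy as np

def affine(x, a=2.0, b=-1.0):
    # Returns the output and log|derivative| of y = a * x + b.
    return a * x + b, np.full_like(x, np.log(abs(a)))

def sigmoid(x):
    # Returns the output and log|derivative| = log(y * (1 - y)).
    y = 1.0 / (1.0 + np.exp(-x))
    return y, np.log(y) + np.log1p(-y)

def compose(x, layers):
    """Apply the layers in order, summing their log|derivative| terms."""
    log_det = np.zeros_like(x)
    for layer in layers:
        x, layer_log_det = layer(x)
        log_det += layer_log_det
    return x, log_det

x = np.linspace(-2.0, 2.0, 9)
u, log_det = compose(x, [affine, sigmoid])

# Finite-difference estimate of log|du/dx| for comparison.
eps = 1e-6
u_plus, _ = compose(x + eps, [affine, sigmoid])
u_minus, _ = compose(x - eps, [affine, sigmoid])
numeric = np.log(np.abs((u_plus - u_minus) / (2.0 * eps)))
```

Because the final layer is a sigmoid, every output `u` lies in $(0, 1)$, matching a base distribution supported on that interval.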




## How General?

**VERY**

The CDF of any continuous distribution with strictly positive density is a flow from that distribution to the uniform distribution on $(0, 1)$.

If $x \rightarrow u$ is a flow and $z\rightarrow u$ is a flow, then we can invert the second one and compose them to get a flow $x\rightarrow u \rightarrow z$
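The $x \rightarrow u \rightarrow z$ construction can be tried on a known distribution. In this sketch, $x$ is assumed to be Exponential(1) (an arbitrary choice for illustration). Its CDF maps $x$ to a uniform $u$, and the inverse CDF of a standard normal, from Python's `statistics.NormalDist`, maps $u$ to $z$.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=20000)   # x ~ Exponential(1), an assumption

# x -> u: the exponential CDF maps x to a uniform variable on (0, 1).
u = 1.0 - np.exp(-x)
# Guard against values that round to exactly 0 or 1, where inv_cdf is undefined.
u = np.clip(u, 1e-12, 1.0 - 1e-12)

# u -> z: the inverse of the standard normal CDF.
std_normal = NormalDist()
z = np.array([std_normal.inv_cdf(p) for p in u])
```

If the construction is right, `u` should look uniform and `z` should have mean close to 0 and standard deviation close to 1.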

# Autoregressive Flows

Sampling:

$$ x_1 \sim p_\theta(x_1)\quad x_1 = f_\theta^{-1}(z_1) $$
$$ x_2 \sim p_\theta(x_2 | x_1)\quad x_2 = f_\theta^{-1}(z_2; x_1) $$
$$ x_3 \sim p_\theta(x_3 | x_1, x_2)\quad x_3 = f_\theta^{-1}(z_3; x_1, x_2) $$
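The sequential sampling procedure can be sketched as follows. The affine form $x_i = \mu_i + \sigma_i z_i$ and the toy `conditioner` are assumptions made for the example. In practice the conditioner would be a neural network that outputs the flow parameters for $x_i$ from $x_1, \dots, x_{i-1}$.

```python
import numpy as np

def conditioner(x_prev):
    # Hypothetical stand-in for a neural net: maps x_1..x_{i-1}
    # to the parameters (mu_i, log_sigma_i) of the i-th flow.
    s = np.sum(x_prev)
    return 0.5 * s, 0.1 * np.tanh(s)

def sample(z):
    """x_i = f^{-1}(z_i; x_1..x_{i-1}), computed one dimension at a time."""
    x = np.zeros_like(z)
    for i in range(len(z)):
        mu, log_sigma = conditioner(x[:i])
        x[i] = mu + np.exp(log_sigma) * z[i]
    return x

def forward(x):
    """z_i = f(x_i; x_1..x_{i-1}) and the total log|det dz/dx|."""
    z = np.zeros_like(x)
    log_det = 0.0
    for i in range(len(x)):
        mu, log_sigma = conditioner(x[:i])
        z[i] = (x[i] - mu) / np.exp(log_sigma)
        log_det -= log_sigma    # dz_i/dx_i = 1 / sigma_i
    return z, log_det

rng = np.random.default_rng(0)
z = rng.standard_normal(3)
x = sample(z)
z_back, log_det = forward(x)
```

Sampling is inherently sequential here because each $x_i$ depends on the previously sampled values, while in `forward` all of $x$ is known up front.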


# [AF and IAF](https://youtu.be/JBb5sSC0JoY?list=PLwRJQ4m4UJjPiJP3691u-qWwPGVKzSlNP&t=4126)

<img src="https://www.evernote.com/l/AgszxkHnRPxJjqj7shevq50HpzJsIO_RJeUB/image.png" alt="drawing" width="500"/>

Why do we need 1M layers?


# Change many variables at once

$$ p(x) = p(z) \frac{\text{vol}(dz)}{\text{vol}(dx)} = p(z)\left|\det \frac{dz}{dx} \right| $$

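The determinant's absolute value represents the local change in volume, i.e. the ratio $\text{vol}(dz)/\text{vol}(dx)$ by which the flow stretches or shrinks a small region around $x$.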
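For a quick check of the multivariate formula, the sketch below assumes a hypothetical linear flow $z = Ax$ with a standard normal $p_Z$. In that case $x$ is Gaussian with covariance $(A^\top A)^{-1}$, so the flow density can be compared against the textbook Gaussian log-density.

```python
import numpy as np

# Hypothetical invertible linear flow z = A x.
A = np.array([[2.0, 0.5],
              [0.0, 1.5]])

def log_p_z(z):
    # Standard normal log-density in d dimensions.
    d = z.shape[-1]
    return -0.5 * np.sum(z ** 2, axis=-1) - 0.5 * d * np.log(2.0 * np.pi)

def log_p_x(x):
    # log p(x) = log p_Z(A x) + log|det A|.
    z = x @ A.T
    _, log_abs_det = np.linalg.slogdet(A)
    return log_p_z(z) + log_abs_det

def log_gaussian(x, cov):
    # Reference: zero-mean Gaussian log-density with covariance cov.
    d = x.shape[-1]
    _, log_det_cov = np.linalg.slogdet(cov)
    quad = np.sum((x @ np.linalg.inv(cov)) * x, axis=-1)
    return -0.5 * (quad + log_det_cov + d * np.log(2.0 * np.pi))

cov = np.linalg.inv(A.T @ A)
x = np.array([[0.3, -1.2],
              [1.0, 0.5]])
flow_values = log_p_x(x)
reference_values = log_gaussian(x, cov)
```

`np.linalg.slogdet` returns the sign and the log of the absolute determinant separately, which avoids overflow or underflow when the determinant itself is very large or very small.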