## In deep generative models, model design means defining objective functions
- All deep generative models explicitly define the objective function to optimize
    - Autoregressive models and flow models: Kullback-Leibler divergence (log likelihood)
    - VAE: evidence lower bound
    - GAN: Jensen-Shannon divergence (GANs also update the objective function itself, i.e., adversarial learning)
- Regularization terms for inference or for the representation of random variables are incorporated into the objective function
<img src='../tutorial_figs/vae_loss_EN.png'>
   
    - In deep generative models, model design means defining objective functions
    - Unlike traditional generative models, deep generative models do not perform inference by sampling
- A framework that receives probability distributions and defines the objective functions
    - Loss API  
<img src='../tutorial_figs/pixyz_API.png'>

## Receive probability distributions and define the objective function
- Loss API document: https://pixyz.readthedocs.io/en/latest/losses.html#

We take probability distributions defined with the Distribution API and use them to define the objective function.  
Defining an objective function requires the following elements:  
1. Calculate the likelihood
1. Calculate the distance between probability distributions
1. Calculate the expected value
1. Calculation considering the data distribution (mean, sum)  

In the VAE loss, each element corresponds as follows:
<img src='../tutorial_figs/vae_loss_API.png'>
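Since the figure is an image, here is a text version of the same correspondence, written for the VAE objective that is built step by step later in this tutorial (the term labels are our reading of the four elements listed above, not a transcription of the figure):

$$
\underbrace{\frac{1}{N} \sum_{i=1}^{N}}_{\text{4. mean over data}}
\Bigl[
\underbrace{\mathbb{E}_{q\left(z | x^{(i)}\right)}\Bigl[\underbrace{\log p\left(x^{(i)} | z\right)}_{\text{1. likelihood}}\Bigr]}_{\text{3. expected value}}
- \underbrace{\mathrm{KL}\left(q\left(z | x^{(i)}\right) \| p(z)\right)}_{\text{2. distance between distributions}}
\Bigr]
$$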

### Evaluate the loss
The Loss API needs input variables (`input_var`). The value of a loss is not calculated until the input variables are fed into it.
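```python
import torch
from pixyz.distributions import Normal
from pixyz.losses import LogProb

# define a probability distribution with the Distribution API
p = Normal(var=['x'], loc=torch.tensor(0.), scale=torch.tensor(1.), features_shape=[5])
# define the objective function by passing the distribution to a Loss API class
loss = LogProb(p)
# the loss value is computed only when the input variable x is fed to .eval()
input_data = torch.randn(10, 5)
loss_value = loss.eval({'x': input_data})
```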

In [1]:
from __future__ import print_function
import torch
from torch import nn
from torch.nn import functional as F
import numpy as np

torch.manual_seed(1)

<torch._C.Generator at 0x7f35a0113c50>

In [2]:
# Pixyz module
from pixyz.distributions import Normal
from pixyz.utils import print_latex

### Calculate likelihood
Given observations $x_1, \ldots, x_N$, we calculate their likelihood under the probability distribution $p$ that we assume $x$ follows.  
Here, we assume $x$ follows a Gaussian distribution with mean 0 and variance 1.  
$p(x) = \cal N(\mu=0, \sigma^2=1)$

In [3]:
# define probability distribution p
x_dim = 5
p_nor_x = Normal(var=['x'], loc=torch.tensor(0.), scale=torch.tensor(1.), features_shape=[x_dim])
print(p_nor_x)
print_latex(p_nor_x)

Distribution:
  p(x)
Network architecture:
  Normal(
    name=p, distribution_name=Normal,
    var=['x'], cond_var=[], input_var=[], features_shape=torch.Size([5])
    (loc): torch.Size([1, 5])
    (scale): torch.Size([1, 5])
  )



In [4]:
# observe x
observed_x_num = 100
observed_x = torch.randn(observed_x_num, x_dim)
print(observed_x.shape)

torch.Size([100, 5])


The log likelihood is calculated as follows:  
$L=\sum_{i=1}^{100} \log p\left(x_{i}\right)$  
We can easily calculate the log likelihood with `LogProb()`.  
To define the log likelihood, we pass the probability distribution defined with Pixyz distributions as the argument of `LogProb()`.  
The value is calculated when the observed data are fed into `LogProb.eval()`.  
Pixyz document: https://docs.pixyz.io/en/latest/losses.html#probability-density-function
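For a Gaussian with mean 0 and variance 1 in each of the $D$ dimensions, the per-observation log likelihood has a closed form, $\log p(x_i) = -\frac{D}{2}\log 2\pi - \frac{1}{2}\lVert x_i \rVert^2$. The plain-Python sketch below (no Pixyz; the helper `std_normal_log_prob` is our own) computes the per-observation values and their sum. This is what we expect `LogProb(p_nor_x).eval(...)` and its `.sum()` to return for this distribution, assuming Pixyz sums the log density over the feature dimension (consistent with the output having one value per observation). The data are a hypothetical stand-in drawn with Python's `random`, so the numbers will differ from the torch outputs in this notebook.

```python
import math
import random


def std_normal_log_prob(x):
    """log N(x | 0, I) for one observation x (a list of D floats)."""
    d = len(x)
    return -0.5 * d * math.log(2 * math.pi) - 0.5 * sum(v * v for v in x)


# hypothetical stand-in for observed_x: 100 observations with x_dim = 5
rng = random.Random(1)
observed_x = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(100)]

# log p(x_1), ..., log p(x_100)
per_observation = [std_normal_log_prob(x) for x in observed_x]
# L = sum_i log p(x_i)
log_likelihood = sum(per_observation)

print(len(per_observation))  # 100
print(log_likelihood)
```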

In [5]:
from pixyz.losses import LogProb
# set the probability distribution to LogProb()'s arg
log_likelihood_x = LogProb(p_nor_x)
print_latex(log_likelihood_x)


In [6]:
# The log likelihood of each observation is calculated
print(log_likelihood_x.eval({'x': observed_x}))
# observed_x_num = 100
print('observed_x_num: ', len(log_likelihood_x.eval({'x': observed_x})))

tensor([ -7.5539,  -6.8545,  -6.4024,  -5.8851,  -6.1517,  -8.3702,  -6.7028,
         -5.0395,  -7.4346,  -7.1497,  -5.7594,  -7.3006, -11.9857,  -5.8238,
         -6.7561,  -5.7640,  -6.2382,  -4.9060,  -6.1076,  -8.2535,  -7.8250,
         -7.1956,  -7.6949,  -5.2324, -11.5860,  -8.1068,  -7.1763,  -8.3332,
        -11.4631,  -6.6297,  -6.1200, -12.2358,  -5.3402,  -7.1465,  -7.5106,
         -7.0829,  -6.6300,  -6.1832,  -7.2049, -10.8676,  -6.8674,  -5.8339,
         -9.1939,  -7.5965,  -8.7743,  -7.3492,  -5.2578, -10.3097,  -6.5646,
         -4.8807,  -5.9738,  -6.2394, -10.3945,  -9.1760,  -9.2957,  -5.5627,
         -7.1047,  -6.4066,  -6.8100,  -6.0878,  -6.8835,  -7.9132,  -5.0738,
         -8.8378,  -6.2286,  -5.8401,  -5.9691,  -5.6857,  -7.6903,  -6.4982,
         -7.1259,  -8.7953, -10.5572,  -5.9161,  -7.0649,  -6.1292,  -6.0871,
         -7.2513,  -7.2517,  -7.1378,  -6.4228,  -5.5728,  -5.6155,  -5.1962,
         -8.3940,  -7.8178,  -9.8129,  -6.1119,  -5.0492,  -8.98

The output of `log_likelihood_x.eval({'x': observed_x})` contains  
$\log p(x_{1})$, $\log p(x_{2})$, $\ldots$, $\log p(x_{100})$  

With 0-based indexing, `log_likelihood_x.eval({'x': observed_x})[i]` $= \log p(x_{i+1})$.

Next, we calculate  
$L=\sum_{i=1}^{100} \log p\left(x_{i}\right)$

In [7]:
# sum
print('log likelihood result:', log_likelihood_x.eval({'x': observed_x}).sum())

log likelihood result: tensor(-715.5875)


As shown above, we can easily calculate the log likelihood with `LogProb()` from `pixyz.losses`.  
The same calculation can be performed with the `log_prob()` method of the defined distribution, as `p.log_prob().eval()`.  

In [8]:
print('LogProb()')
print(LogProb(p_nor_x).eval({'x': observed_x}).sum())
print('.log_prob()')
print(p_nor_x.log_prob().eval({'x': observed_x}).sum())

LogProb()
tensor(-715.5875)
.log_prob()
tensor(-715.5875)


For more Loss API classes related to probability density functions:  
https://docs.pixyz.io/en/latest/losses.html#probability-density-function

### Calculate the distance between probability distributions
In training generative models, we look for a model $p_{\theta}(x)$ that is close to the true distribution (data distribution) $p_{data}(x)$.  
To find an appropriate parameter $\theta$, we measure the distance between the distributions.

In VAE models we calculate the Kullback-Leibler divergence, and in GAN models we calculate the Jensen-Shannon divergence.  
We can easily calculate the distance between distributions with the Loss API.  
Pixyz document:  
https://docs.pixyz.io/en/latest/losses.html#statistical-distance  
https://pixyz.readthedocs.io/en/latest/losses.html#adversarial-statistical-distance

Here, we calculate the Kullback-Leibler divergence between a Gaussian distribution with mean 0 and variance 1 and a Gaussian distribution with mean 5 and standard deviation 0.1 (variance 0.01; the `scale` argument of `Normal` is the standard deviation).  
$p(x) = \cal N(\mu=0, \sigma^2=1)$  
$q(x) = \cal N(\mu=5, \sigma^2=0.01)$  
$\mathrm{KL}(q(x) \| p(x))$

In [9]:
# define probability distribution
x_dim = 10
# p 
p_nor_x = Normal(var=['x'], loc=torch.tensor(0.), scale=torch.tensor(1.), features_shape=[x_dim])
print_latex(p_nor_x)


In [10]:
# q
q_nor_x = Normal(var=['x'], loc=torch.tensor(5.), scale=torch.tensor(0.1), features_shape=[x_dim], name='q')
print_latex(q_nor_x)


To calculate the Kullback-Leibler divergence, we use `KullbackLeibler` from `pixyz.losses`.  
We pass the probability distributions defined with Pixyz distributions as the arguments of `KullbackLeibler()`; `KullbackLeibler(q, p)` corresponds to $\mathrm{KL}(q \| p)$.  
The value is calculated by the `.eval()` method.  
Pixyz document: https://docs.pixyz.io/en/latest/losses.html#kullbackleibler  

In [11]:
from pixyz.losses import KullbackLeibler

kl_q_p = KullbackLeibler(q_nor_x, p_nor_x)
print_latex(kl_q_p)


In [12]:
# calculate the value
kl_q_p.eval()

tensor([143.0759])
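Because both distributions are Gaussian with diagonal covariance, the KL divergence has a closed form, summed over the $D$ dimensions: $\mathrm{KL}(q \| p) = \sum_{d=1}^{D}\left[\log\frac{\sigma_p}{\sigma_q} + \frac{\sigma_q^2 + (\mu_q - \mu_p)^2}{2\sigma_p^2} - \frac{1}{2}\right]$. The plain-Python check below (the helper `kl_normal` is our own, not part of Pixyz) plugs in $\mu_q=5$, $\sigma_q=0.1$, $\mu_p=0$, $\sigma_p=1$ and $D=10$. It reproduces the value 143.0759 printed above, which also confirms that `scale` is treated as the standard deviation.

```python
import math


def kl_normal(mu_q, sigma_q, mu_p, sigma_p):
    """KL(N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2)) for a single dimension."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2)
            - 0.5)


x_dim = 10
# q(x) = N(5, 0.1^2), p(x) = N(0, 1); the dimensions are independent, so the KL is a sum
kl_q_p = x_dim * kl_normal(5.0, 0.1, 0.0, 1.0)
print(round(kl_q_p, 4))  # 143.0759
```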

For more Loss API classes related to statistical distances:  
https://docs.pixyz.io/en/latest/losses.html#statistical-distance  
https://docs.pixyz.io/en/latest/losses.html#adversarial-statistical-distance  

### Calculate the expected value
The expected value is the average of all values of a random variable, weighted by their probabilities.  
In Pixyz, we take expected values over variables that cannot be fed as `input_var`, such as latent variables (because they do not appear explicitly in the observations).  
We can easily calculate the expected value over random variables with the Loss API.  
Pixyz document:  
https://docs.pixyz.io/en/latest/losses.html#expected-value

Here, we consider the following two probability distributions  
$q(z|x) = \cal N(\mu=x, \sigma^2=1)$  
$p(x|z) = \cal N(\mu=z, \sigma^2=1)$  
and calculate the following expected value  
$\mathbb{E}_{q(z|x)} \left[\log p(x|z) \right]$

In [13]:
# define probability distributions
from pixyz.distributions import Normal

q_nor_z__x = Normal(loc="x", scale=torch.tensor(1.), var=["z"], cond_var=["x"],
                    features_shape=[10], name='q')  # q(z|x)
p_nor_x__z = Normal(loc="z", scale=torch.tensor(1.), var=["x"], cond_var=["z"],
                    features_shape=[10]) # p(x|z)

In [14]:
# Calculate the log likelihood of p(x|z)
from pixyz.losses import LogProb

p_log_likelihood = LogProb(p_nor_x__z)
print_latex(p_log_likelihood)


To calculate expected values, we use `Expectation` from `pixyz.losses`.  
`Expectation()` takes two arguments, `p` and `f`.  
We pass the function whose expected value we want as `f`, and the probability distribution that the random variable of `f` follows as `p`.  
The value is calculated by the `.eval()` method.  
Pixyz document: https://docs.pixyz.io/en/latest/losses.html#expected-value

In [15]:
from pixyz.losses import Expectation as E

E_q_logprob_p = E(q_nor_z__x, LogProb(p_nor_x__z))
print_latex(E_q_logprob_p)


In [16]:
sample_x = torch.randn(2, 10)
E_q_logprob_p.eval({'x': sample_x})

tensor([-10.7006, -11.9861])
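For this toy pair the expectation can also be worked out by hand. With $z \sim \mathcal N(x, 1)$ in each of the $D=10$ dimensions, $\mathbb{E}_{q(z|x)}[(x-z)^2] = 1$ per dimension, so $\mathbb{E}_{q(z|x)}[\log p(x|z)] = -\frac{D}{2}(\log 2\pi + 1) \approx -14.19$ for every $x$. The two values above differ from this, which suggests that `.eval()` returns a sample-based estimate here (we assume only a small number of samples of $z$ are drawn by default; see the `Expectation` documentation for the exact sampling behavior). The plain-Python Monte Carlo sketch below (helper names are our own, not Pixyz functions) shows a one-sample estimate and an average over many samples of $z$, which approaches the closed-form value.

```python
import math
import random


def log_normal(x, mu, sigma=1.0):
    """Log density of N(mu, sigma^2) at a scalar x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2


def mc_expectation(x, num_samples, rng):
    """Monte Carlo estimate of E_{q(z|x)}[log p(x|z)] with q(z|x)=N(x,1), p(x|z)=N(z,1)."""
    total = 0.0
    for _ in range(num_samples):
        z = [rng.gauss(xi, 1.0) for xi in x]  # z ~ q(z|x)
        total += sum(log_normal(xi, zi) for xi, zi in zip(x, z))  # log p(x|z), summed over dims
    return total / num_samples


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(10)]  # one hypothetical observation, D = 10

exact = -0.5 * len(x) * (math.log(2 * math.pi) + 1)  # about -14.19, independent of x
one_sample = mc_expectation(x, 1, rng)         # noisy single-sample estimate
many_samples = mc_expectation(x, 20000, rng)   # close to the exact value
print(exact, one_sample, many_samples)
```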

For more details about the Expectation API:  
https://docs.pixyz.io/en/latest/losses.html#expected-value

### Calculation considering the data distribution (mean, sum)
In theory, we need to take the expected value over $x$ under the data distribution, but since the data distribution is not actually given, we instead compute averages or sums over the batch of observed $x$.  
We can easily calculate averages and sums with the Loss API.  
Here, we calculate the log likelihood of the training data `observed_x` and take its mean.  
$p(x) = \cal N(\mu=0, \sigma^2=1)$  
$\frac{1}{N} \sum_{i=1}^N\left[\log p\left(x^{(i)}\right)\right]$

In [17]:
# observe x
observed_x_num = 100
x_dim = 5
observed_x = torch.randn(observed_x_num, x_dim)
print(observed_x.shape)

torch.Size([100, 5])


In [18]:
# define probability distribution
p_nor_x = Normal(var=['x'], loc=torch.tensor(0.), scale=torch.tensor(1.), features_shape=[x_dim])
print(p_nor_x)
print_latex(p_nor_x)

Distribution:
  p(x)
Network architecture:
  Normal(
    name=p, distribution_name=Normal,
    var=['x'], cond_var=[], input_var=[], features_shape=torch.Size([5])
    (loc): torch.Size([1, 5])
    (scale): torch.Size([1, 5])
  )



We can calculate the mean or the sum with `Loss.mean()` or `Loss.sum()`, respectively.

In [19]:
from pixyz.losses import LogProb
# calculate mean
mean_log_likelihood_x = LogProb(p_nor_x).mean() # .mean()
print_latex(mean_log_likelihood_x)


In [20]:
mean_log_likelihood_x.eval({'x': observed_x})

tensor(-7.1973)
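This batch mean stands in for the expectation over the data distribution. When the data really come from $p(x)$ itself, as in this toy example, that expectation has the closed form $\mathbb{E}_{p(x)}[\log p(x)] = -\frac{D}{2}(\log 2\pi + 1) \approx -7.09$ for $D=5$, and the batch mean above is close to it. The plain-Python sketch below (helper names are our own, not Pixyz functions) shows the batch mean approaching this value as the number of observations $N$ grows, which is why averaging over observed batches is a reasonable substitute for the true expectation.

```python
import math
import random


def std_normal_log_prob(x):
    """log N(x | 0, I) for one observation x (a list of D floats)."""
    return -0.5 * len(x) * math.log(2 * math.pi) - 0.5 * sum(v * v for v in x)


def batch_mean_log_prob(num_obs, dim, rng):
    """(1/N) * sum_i log p(x^(i)) for N observations drawn from p(x) = N(0, I)."""
    batch = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_obs)]
    return sum(std_normal_log_prob(x) for x in batch) / num_obs


x_dim = 5
expected = -0.5 * x_dim * (math.log(2 * math.pi) + 1)  # about -7.09
rng = random.Random(0)
for n in (100, 1000, 50000):
    print(n, batch_mean_log_prob(n, x_dim, rng), expected)
```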

### Combine losses
We can perform arithmetic operations between losses.  
As an example, we define the following loss by combining several losses.  
$\frac{1}{N} \sum_{i=1}^{N}\left[\mathbb{E}_{q\left(z | x^{(i)}\right)}\left[\log p\left(x^{(i)} | z\right)\right]-\mathrm{KL}\left(q\left(z | x^{(i)}\right) \| p(z)\right)\right]$

In [21]:
# define probability distributions
from pixyz.distributions import Normal

# p(x|z)
p_nor_x__z = Normal(loc="z", scale=torch.tensor(1.), var=["x"], cond_var=["z"],
                    features_shape=[10])

# p(z)
p_nor_z = Normal(loc=torch.tensor(0.), scale=torch.tensor(1.), var=["z"],
                 features_shape=[10])

# q(z|x)
q_nor_z__x = Normal(loc="x", scale=torch.tensor(1.), var=["z"], cond_var=["x"],
                    features_shape=[10], name='q')

In [22]:
# define Loss
from pixyz.losses import LogProb
from pixyz.losses import Expectation as E
from pixyz.losses import KullbackLeibler

# Log likelihood
logprob_p_x__z = LogProb(p_nor_x__z)  # input_var: x, z

# Expectation
E_q_z__x_logprob_p__z = E(q_nor_z__x, logprob_p_x__z)  # input_var: x (z is not needed because it is sampled from q(z|x) inside the expectation)

# KL divergence
KL_q_z__x_p_z = KullbackLeibler(q_nor_z__x, p_nor_z)

# Subtraction between losses
total_loss = E_q_z__x_logprob_p__z - KL_q_z__x_p_z  # input_var: x (E_q_z__x_logprob_p__z needs x as input_var)

# mean of loss
total_loss = total_loss.mean()

# check the loss
print_latex(total_loss)


In [23]:
# calculate the loss value
# observe x
observed_x_num = 100
x_dim = 10
observed_x = torch.randn(observed_x_num, x_dim)

# calculate the loss given observed x
total_loss.eval({'x': observed_x})

tensor(-18.9965)

As shown above, we can define losses flexibly with arithmetic operations between Pixyz Loss API objects.  
The Pixyz Loss API lets us translate formulas into code easily and intuitively.
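For this toy model the combined objective also has a closed form, which is a useful sanity check on the value above. The expectation term equals $-\frac{D}{2}(\log 2\pi + 1)$ because $\mathbb{E}_{q(z|x)}[(x-z)^2] = 1$ per dimension, and since $q(z|x) = \mathcal N(x, 1)$ and $p(z) = \mathcal N(0, 1)$ in each dimension, $\mathrm{KL}(q(z|x) \| p(z)) = \frac{1}{2}\lVert x \rVert^2$. With $x \sim \mathcal N(0, I)$ and $D=10$, the population value is therefore $-5(\log 2\pi + 1) - 5 \approx -19.19$, in line with the -18.9965 printed above (which also contains batch and sampling noise). The plain-Python sketch below (helper names are our own, not Pixyz functions) evaluates this closed form on a hypothetical batch of 100 observations.

```python
import math
import random


def toy_objective(x):
    """E_{q(z|x)}[log p(x|z)] - KL(q(z|x) || p(z)) for q(z|x)=N(x,1), p(x|z)=N(z,1), p(z)=N(0,1)."""
    d = len(x)
    expected_log_lik = -0.5 * d * (math.log(2 * math.pi) + 1)
    kl = 0.5 * sum(v * v for v in x)  # per dimension: KL(N(x_d, 1) || N(0, 1)) = x_d^2 / 2
    return expected_log_lik - kl


x_dim = 10
rng = random.Random(0)
observed_x = [[rng.gauss(0.0, 1.0) for _ in range(x_dim)] for _ in range(100)]

# (1/N) * sum_i [...], i.e. the .mean() of the combined loss
batch_value = sum(toy_objective(x) for x in observed_x) / len(observed_x)
population_value = -0.5 * x_dim * (math.log(2 * math.pi) + 1) - 0.5 * x_dim  # about -19.19
print(batch_value, population_value)
```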

### Loss API (ELBO)
The Pixyz Loss API also provides an `ELBO` loss class.  

Evidence lower bound (ELBO): https://docs.pixyz.io/en/latest/losses.html#lower-bound
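The combined loss in the previous section is an instance of this bound. For a joint distribution $p(x, z) = p(x|z)p(z)$ and an approximate posterior $q(z|x)$, Jensen's inequality gives the bound below, and its right-hand side is exactly the per-observation term of the combined loss. The exact arguments of the `ELBO` class are described in the linked documentation.

$$
\log p(x) \ge \mathbb{E}_{q(z|x)}\left[\log \frac{p(x, z)}{q(z|x)}\right]
= \mathbb{E}_{q(z|x)}\left[\log p(x|z)\right] - \mathrm{KL}\left(q(z|x) \| p(z)\right)
$$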

### Next Tutorial
ModelAPITutorial.ipynb