# Exercise - Estimating Mean and Standard Deviation of Normal Distribution with Pyro

## Table of Contents
* [Introduction](#Introduction)
* [Requirements](#Requirements) 
  * [Knowledge](#Knowledge)
  * [Modules](#Python-Modules)
* [Data](#Data)
* [Working with Pyro](#Working-with-Pyro)
  * [The Model](#The-Model)
  * [The Guide](#The-Guide)
  * [Stochastic Variational Inference - SVI](#Stochastic-Variational-Inference---SVI)
* [Exercise - Estimate Precision and Mean](#Exercise---Estimate-Precision-and-Mean)
* [Licenses](#Licenses)

## Introduction

"Pyro is a universal probabilistic programming language (PPL) written in Python and supported by PyTorch on the backend. Pyro enables flexible and expressive deep probabilistic modeling, unifying the best of modern deep learning and Bayesian modeling." ([https://pyro.ai/](https://pyro.ai/)).

In this exercise you will use Pyro to estimate the parameters of a normal distribution.


In order to detect errors in your own code, execute the notebook cells containing `assert` or `assert_almost_equal`.

## Requirements

### Knowledge

#### Theory

All *Pyro* exercises are intended as part of the course [Bayesian Learning](https://dev.deep-teaching.org/courses/bayesian-learning). Therefore, work through the course up to and including the chapter [Probabilistic Programming](https://dev.deep-teaching.org/courses/bayesian-learning#probabilistic-programming).


#### Pyro

* The official Tutorial:
    * https://pyro.ai/examples/intro_part_i.html
    * https://pyro.ai/examples/intro_part_ii.html
    * https://pyro.ai/examples/svi_part_i.html

### Python Modules

In [None]:
import numpy as np

import scipy.stats
from scipy.stats import norm

from matplotlib import pyplot as plt
from IPython.core.pylabtools import figsize

%matplotlib inline

In [None]:
import torch
from torch.distributions import constraints

import pyro
import pyro.infer
import pyro.optim as optim
import pyro.distributions as dist

## Data

Our observed data comes from a normal distribution:

Data:
$$
 X \sim \mathcal N(\mu, \frac{1}{\tau})
$$


Probability Density Function:
$$
p(X \mid \mu, \tau) = \sqrt{\frac{\tau}{2\pi}} \exp\left( -\frac{\tau (X-\mu)^2 }{2} \right)
$$

with  
- $\mu$: mean
- $\sigma^2$: variance
- $\tau =\frac{1}{\sigma^2}$ : precision
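
As a quick check of the density above, the following sketch evaluates it once in the precision form and once in the more familiar standard-deviation form. The function names `normal_pdf_precision` and `normal_pdf_sigma` are illustrative and not part of the notebook.

```python
import math

def normal_pdf_precision(x, mu, tau):
    """Normal density parameterized by mean mu and precision tau = 1 / sigma**2."""
    return math.sqrt(tau / (2.0 * math.pi)) * math.exp(-0.5 * tau * (x - mu) ** 2)

def normal_pdf_sigma(x, mu, sigma):
    """Normal density parameterized by mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

sigma = 2.0
tau = 1.0 / sigma ** 2   # precision 0.25 corresponds to sigma = 2

# Both parameterizations describe the same density.
for x in (7.0, 10.0, 12.5):
    assert math.isclose(normal_pdf_precision(x, 10.0, tau),
                        normal_pdf_sigma(x, 10.0, sigma))
```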

In [None]:
dtype=torch.float32

In [None]:
torch.manual_seed(101);
np.random.seed(10)

In [None]:
# generate observed data
N = 10
mu_ = 10.
sigma_=2.
X = np.random.normal(mu_, sigma_, N)
X = np.array(X, dtype=np.float32)

In [None]:
X

In [None]:
x = np.arange(3,18,0.01)
p_x = scipy.stats.norm.pdf(x, loc=mu_, scale=sigma_)
plt.plot(x, p_x, label="true distribution")
plt.plot(X, np.zeros_like(X), "ro", label="observed data")
plt.title("")
plt.xlabel("x")
plt.ylabel("p(x)")
plt.legend();

## Working with Pyro

### The Model

We build the following model with Pyro:

- We use the generated data $X \sim \mathcal N(\mu, \sigma^2)$ as observed data.
- We use a Uniform prior for the mean $\mu$:
    * $\mu \sim \text{Uniform}(-25,25)$
- We use a constant $\tau=1/4$ for the precision.
    * Note: This has to be a `torch.tensor` object
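
Because the precision is fixed and the prior on $\mu$ is flat, this model has a closed-form posterior that is handy as a reference for the variational result: inside the prior's support, $p(\mu \mid X) \propto \exp\left(-N\tau(\mu - \bar{x})^2/2\right)$, i.e. a Gaussian with mean $\bar{x}$ and standard deviation $1/\sqrt{N\tau}$. The sketch below assumes that the sample mean lies far enough inside $(-25, 25)$ for the truncation by the prior to be negligible. The helper name and the data values are illustrative stand-ins, not the notebook's `X`.

```python
import math
import statistics

def reference_posterior_mu(data, tau):
    """Posterior of mu for known precision tau and a flat prior.

    Assumption: the truncation to the prior's support is negligible.
    Returns (posterior mean, posterior standard deviation).
    """
    n = len(data)
    return statistics.fmean(data), 1.0 / math.sqrt(n * tau)

# Illustrative stand-in data, not the notebook's X.
data = [12.7, 11.4, 6.9, 10.0, 11.2, 8.6, 10.5, 10.2, 10.0, 9.7]
post_mean, post_std = reference_posterior_mu(data, tau=0.25)
print(f"posterior mean = {post_mean:.3f}, posterior std = {post_std:.3f}")
```

With $N = 10$ and $\tau = 1/4$ the posterior standard deviation is $1/\sqrt{2.5} \approx 0.632$. After convergence, `guide_mu_mean` and `guide_mu_scale` should be close to the values this formula gives for the actual data.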

In [None]:
def model(X):
    # Prior
    mu = pyro.sample("mu", dist.Uniform(torch.tensor(-25.), torch.tensor(+25.))) 
    tau = torch.tensor(1/4)

    # Observation
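    # pyro.plate marks the observations in X as conditionally independent given mu
    # (they are evaluated as one vectorized batch, not in an explicit Python loop)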
    with pyro.plate("data_loop", size=len(X)):
        sample = pyro.sample("gaussian_data", dist.Normal(mu, 1/torch.sqrt(tau)), obs=X)
    
    return sample

### The Guide

Next we implement the "guide", which we will later use in conjunction with our model for stochastic variational inference (`pyro.infer.SVI()`).

As the variational distribution for $\mu$ we also use a Gaussian:
$$
\mu \sim \mathcal N(mean_{\mu}, scale_{\mu}^2)
$$

In [None]:
### same arguments for guide and model !!!
def guide(X):
    mean_loc = torch.randn((1)) 
    # note that we initialize the scale to be pretty narrow
    mean_scale = 0.01 * torch.tensor(0.01)
    mu_loc = pyro.param("guide_mu_mean", mean_loc)
    mu_scale = pyro.param("guide_mu_scale", mean_scale, constraint=constraints.positive)
    # note the same name "mu" here as in our model
    mu = pyro.sample("mu", dist.Normal(mu_loc, mu_scale)) 
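
The `constraint=constraints.positive` argument keeps `guide_mu_scale` strictly positive during optimization. A common way to implement such a constraint, assumed here purely for illustration, is to optimize an unconstrained real number and map it through the exponential function. The pure-Python sketch below only demonstrates that idea and does not call Pyro; the names `to_constrained` and `to_unconstrained` are hypothetical.

```python
import math

def to_constrained(u):
    """Map an unconstrained real number to a strictly positive value."""
    return math.exp(u)

def to_unconstrained(scale):
    """Inverse map: positive value back to the real line."""
    return math.log(scale)

# Start from the narrow initial scale used in the guide (0.01 * 0.01).
u = to_unconstrained(1e-4)

# Arbitrary additive updates on u (stand-ins for optimizer steps)
# can never push the constrained scale to zero or below.
for step in (-3.0, 5.0, -20.0):
    u += step
    assert to_constrained(u) > 0.0
```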


### Stochastic Variational Inference - SVI

Now we optimize the variational parameters, i.e. find values for $mean_{\mu}, scale_{\mu}$
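
The objective behind SVI is the evidence lower bound (ELBO), $\mathbb{E}_{q}[\log p(X, \mu) - \log q(\mu)]$. `Trace_ELBO` estimates it from samples of the guide, and `svi.step` returns the negative of that estimate as the loss. For the model and guide above, the expectation can also be written in closed form, which the following pure-Python sketch compares with a Monte Carlo estimate. It assumes that $q$ puts essentially no mass outside the prior's support $(-25, 25)$. The function names and the data values are illustrative stand-ins.

```python
import math
import random

LOG_PRIOR = -math.log(50.0)  # log density of Uniform(-25, 25) inside its support

def log_joint(data, mu, tau):
    """log p(X, mu) for known precision tau and mu inside (-25, 25)."""
    log_lik = sum(0.5 * math.log(tau / (2.0 * math.pi)) - 0.5 * tau * (x - mu) ** 2
                  for x in data)
    return log_lik + LOG_PRIOR

def log_q(mu, m, s):
    """log density of the variational Normal(m, s**2) at mu."""
    return -0.5 * math.log(2.0 * math.pi * s * s) - 0.5 * ((mu - m) / s) ** 2

def elbo_closed_form(data, tau, m, s):
    """E_q[log p(X, mu)] plus the entropy of q, with q = Normal(m, s**2)."""
    expected_log_lik = sum(
        0.5 * math.log(tau / (2.0 * math.pi)) - 0.5 * tau * ((x - m) ** 2 + s * s)
        for x in data)
    entropy = 0.5 * math.log(2.0 * math.pi * math.e * s * s)
    return expected_log_lik + LOG_PRIOR + entropy

def elbo_monte_carlo(data, tau, m, s, num_samples, rng):
    """Average of log p(X, mu) - log q(mu) over samples mu ~ q."""
    total = 0.0
    for _ in range(num_samples):
        mu = rng.gauss(m, s)
        total += log_joint(data, mu, tau) - log_q(mu, m, s)
    return total / num_samples

data = [12.7, 11.4, 6.9, 10.0, 11.2, 8.6, 10.5, 10.2, 10.0, 9.7]  # stand-in data
tau = 0.25
exact = elbo_closed_form(data, tau, m=9.0, s=0.5)
estimate = elbo_monte_carlo(data, tau, m=9.0, s=0.5, num_samples=20000,
                            rng=random.Random(0))
print(f"closed form: {exact:.3f}, Monte Carlo: {estimate:.3f}")
```

The Monte Carlo estimate fluctuates around the closed-form value, which is why the loss values returned by `svi.step` are noisy.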

In [None]:
pyro.clear_param_store()

adam_params = {"lr": 0.003, "betas": (0.95, 0.999)}
optimizer = optim.Adam(adam_params)

svi = pyro.infer.SVI(model=model,
                     guide=guide,
                     optim=optimizer,
                     loss=pyro.infer.Trace_ELBO())

In [None]:
### to keep track of our loss history
losses = []

### convert observed data to a torch tensor object
X_ = torch.tensor(X, dtype=dtype)

### training / inference
for t in range(10000):
    ### svi.step takes the same arguments as our model(X) and guide(X) functions
    loss = svi.step(X_)
    losses.append(loss)
    ### for monitoring
    if t%100==0:
        print (t, "\t", loss)

In [None]:
### Let us plot the costs / iteration curve

plt.xlabel("# iteration")
plt.ylabel("MC estimate of the negative ELBO (loss)")
plt.plot(range(len(losses)), losses)
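
Since each loss value is a Monte Carlo estimate based on very few samples from the guide, the raw curve is noisy. One simple way to judge convergence, shown here as a sketch with a hypothetical helper `moving_average` and made-up loss values, is to smooth the history with a sliding window before plotting it.

```python
def moving_average(values, window):
    """Mean of each run of `window` consecutive values (fewer outputs than inputs)."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        # Slide the window by one: add the new value, drop the oldest one.
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages

# Made-up noisy loss history, standing in for the `losses` list above.
example_losses = [30.0, 26.0, 28.0, 24.0, 25.0, 23.0, 24.0, 22.0]
smoothed = moving_average(example_losses, window=4)
print(smoothed)
```

Applied to the notebook's `losses` list with a window of a few hundred iterations, the smoothed curve should make the overall trend easier to see than the raw values.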

In [None]:
# Adjust the strings according to your names for
# the parameters "mu_mean", etc...
mu_mean_param = pyro.param("guide_mu_mean")
mu_scale_param = pyro.param("guide_mu_scale")
mu_mean_param, mu_scale_param

In [None]:
plt.figure(figsize=(12,4))

mu_mean = mu_mean_param.detach().numpy()
mu_scale = mu_scale_param.detach().numpy()

x = np.arange(5,15,0.01)
# guide_mu_scale is already a standard deviation, which is what scipy's `scale` expects
p_mu = scipy.stats.norm.pdf(x, loc=mu_mean, scale=mu_scale)
ax = plt.subplot(121)
ax.plot(x, p_mu)
ax.set_xlabel("$\\mu$")
ax.set_ylabel("q($\\mu$)")
ax.set_title("Mean: q($\\mu$)")
print("true mu: ", mu_)

## Exercise - Estimate Precision and Mean

**Task:**

Extend the model and the guide so that the precision $\tau$ is inferred as well:
- Use a Uniform prior for the precision: $\tau \sim \text{Uniform}(0.01, 2)$
- Use a Gamma distribution as variational distribution for $\tau$: $\text{Gamma}(a, b)$
- Find the parameters $a$ (`guide_tau_concentration`), $b$ (`guide_tau_rate`) (and $mean_{\mu}, scale_{\mu}$) via optimization.
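
For the Gamma distribution it helps to be explicit about the parameterization: with concentration $a$ and rate $b$, the density is $\frac{b^a}{\Gamma(a)}\tau^{a-1}e^{-b\tau}$, with mean $a/b$ and variance $a/b^2$. Pyro's `dist.Gamma(concentration, rate)` uses the rate, whereas `scipy.stats.gamma` expects `scale = 1/rate`, as in the plotting cell at the end. The sketch below checks the mean numerically in plain Python. The function names and the values of `a` and `b` are arbitrary illustrations, not expected results of the exercise.

```python
import math

def gamma_pdf(t, concentration, rate):
    """Gamma density in the (concentration, rate) parameterization, for t > 0."""
    log_pdf = (concentration * math.log(rate) - math.lgamma(concentration)
               + (concentration - 1.0) * math.log(t) - rate * t)
    return math.exp(log_pdf)

def numeric_mean(concentration, rate, upper=5.0, steps=50_000):
    """Midpoint-rule approximation of the mean of the Gamma density on (0, upper)."""
    h = upper / steps
    return sum((i + 0.5) * h * gamma_pdf((i + 0.5) * h, concentration, rate) * h
               for i in range(steps))

a, b = 3.0, 12.0                      # arbitrary illustrative values
print(a / b, numeric_mean(a, b))      # analytic mean a/b vs. numeric approximation
```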


If your extensions are correct, executing the cells at the end should plot figures similar to these:

<img src="https://gitlab.com/deep.TEACHING/educational-materials/raw/master/media/klaus/exercise-mean-field-approximation-simple-gaussian-plot.png" width="768" alt="internet connection needed">

In [None]:
def model_with_tau(X):
    
    ######################
    ### Your Code here ###
    ######################

    
    return

In [None]:
def guide_with_tau(X):
    
    ######################
    ### Your Code here ###
    ######################

    
    return

In [None]:
### Initialize the pyro.infer.SVI object

######################
### Your Code here ###
######################

In [None]:
### Training

######################
### Your Code here ###
######################

In [None]:
# Adjust the strings according to your names for
# the parameters "mu_mean", etc...
mu_mean_param = pyro.param("guide_mu_mean")
mu_scale_param = pyro.param("guide_mu_scale")
mu_mean_param, mu_scale_param

In [None]:
# Adjust the strings according to your names for
# the parameters "tau_concentration", etc...
tau_concentration_param = pyro.param("guide_tau_concentration")
tau_rate_param = pyro.param("guide_tau_rate")
tau_concentration_param, tau_rate_param

In [None]:
plt.figure(figsize=(12,4))

mu_mean = mu_mean_param.detach().numpy()
mu_scale = mu_scale_param.detach().numpy()

x = np.arange(5,15,0.01)
# guide_mu_scale is already a standard deviation, which is what scipy's `scale` expects
p_mu = scipy.stats.norm.pdf(x, loc=mu_mean, scale=mu_scale)
ax = plt.subplot(121)
ax.plot(x, p_mu)
ax.set_xlabel("$\\mu$")
ax.set_ylabel("q($\\mu$)")
ax.set_title("Mean: q($\\mu$)")
print("true mu: ", mu_)

tau_concentration = tau_concentration_param.detach().numpy()
tau_rate = tau_rate_param.detach().numpy()

x = np.arange(0,1,0.01)
p_tau = scipy.stats.gamma.pdf(x, a=tau_concentration, scale=1/tau_rate)
ax = plt.subplot(122)
ax.plot(x, p_tau)
ax.set_xlabel("$\\tau$")
ax.set_ylabel("q($\\tau$)")
ax.set_title("Precision: q($\\tau$)")
print("true tau: ", 1/sigma_**2)
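
As a rough sanity check for the fitted $q(\tau)$, one can compare its mean $a/b$ with the plug-in estimate $1/s^2$, where $s^2$ is the sample variance of the data. With only ten observations this estimate is itself noisy, so only approximate agreement should be expected. The data values below are illustrative stand-ins, not the notebook's `X`.

```python
import statistics

data = [12.7, 11.4, 6.9, 10.0, 11.2, 8.6, 10.5, 10.2, 10.0, 9.7]  # stand-in data

sample_mean = statistics.fmean(data)
sample_var = statistics.variance(data)   # unbiased estimate, divides by N - 1
tau_plugin = 1.0 / sample_var

print(f"sample mean = {sample_mean:.3f}, plug-in precision = {tau_plugin:.3f}")
```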


## Licenses

### Notebook License (CC-BY-SA 4.0)

*The following license applies to the complete notebook, including code cells. It does however not apply to any referenced external media (e.g., images).*

Exercise - Pyro Simple Gaussian <br/>
by Christian Herta<br/>
is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/).<br/>
Based on a work at https://gitlab.com/deep.TEACHING.


### Code License (MIT)

*The following license only applies to code cells of the notebook.*

Copyright 2019 Christian Herta

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.