# Working with Time-Series Data in a Consistent Bayesian Framework
---

Copyright 2017 Michael Pilosov

Demonstration available at https://www.youtube.com/watch?v=rUIVcl64NXw

### Import Libraries
_(should be 2.7 and 3.x compatible)_

In [None]:
# Mathematics and Plotting
import numpy as np
from matplotlib import pyplot as plt
import scipy.stats as sstats
from scipy.stats import gaussian_kde as gkde
%matplotlib inline
plt.rcParams.update({'font.size': 14})
plt.rcParams['figure.figsize'] = 5, 5

# Interactivity
from ipywidgets import *

---
## Defining the Parameter-to-Observables (PtO) and Quantity of Interest (QoI) maps

---
Consider the Initial Value Problem (IVP)

$$
\begin{cases}
    \dot{u}(t) = -u(t), & t>0 \\
    u(0) = \lambda, & 
\end{cases}
$$

with solution $u(t;\lambda) = \lambda \,e^{-t}$.

Let $0<t_0<t_1<\ldots<t_K$ denote the observation times. 
Given a fixed initial condition (i.e., parameter) $\lambda$, let $y_k$ denote the (noisy) observation of the state variable $u(t_k;\lambda)$ for $k=0,1,\ldots, K$. 

We make the standard assumption of an additive error model with independent identically distributed noise, i.e., for each $k=0,1,\ldots,K$ and fixed value of $\lambda$, we assume that the Parameter-to-Observables (PtO) maps are given by

$$
O_k(\lambda) := u(t_k;\lambda) + \epsilon_k, \quad \epsilon_k \sim N(0,\sigma_k^2). 
$$

Assume that there is a true value of $\lambda$, which we denote by $\lambda_0$, for which the observations $y_k:=O_k(\lambda_0)$ are given for $k=0,1,\ldots,K$.
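
As a minimal sketch of this error model, the snippet below simulates one set of noisy observations. The helper names (`u`, `simulate_observations`), the observation times, and the noise level are illustrative assumptions, and `sigma` is treated as a standard deviation.

```python
import numpy as np

def u(t, lam):
    """Exact solution of the IVP: u(t; lambda) = lambda * exp(-t)."""
    return lam * np.exp(-np.asarray(t, dtype=float))

def simulate_observations(lam_0, t, sigma, rng):
    """Return y_k = u(t_k; lam_0) + eps_k, with eps_k drawn i.i.d. from a
    zero-mean normal distribution with standard deviation sigma."""
    t = np.asarray(t, dtype=float)
    noise = rng.standard_normal(t.shape) * sigma
    return u(t, lam_0) + noise

rng = np.random.RandomState(0)   # seeded for reproducibility
t = np.linspace(0.1, 0.4, 4)     # hypothetical observation times t_0, ..., t_3
y = simulate_observations(lam_0=1.0, t=t, sigma=0.1, rng=rng)
```

Setting `sigma=0` recovers the noise-free model predictions, which is a convenient sanity check.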

Then, for any other value of $\lambda$ in the IVP above, we define the Quantity of Interest (QoI) as the **Weighted Sum of Squared Errors (a weighted squared 2-norm) between the observations and the model predictions**, i.e., we define the QoI map as

$$
    \boxed{Q(\lambda) := \sum_{k=0}^{K} \frac{(u(t_k;\lambda) - y_k) ^ 2}{\sigma_k^2}}
$$
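
The QoI map can be evaluated for many parameter samples at once with broadcasting. This is a sketch only: the function name `qoi` and the toy inputs are assumptions made for illustration, and the notebook's own `Q_fun` evaluates one sample at a time instead.

```python
import numpy as np

def qoi(lam, t, y, sigma):
    """Weighted sum of squared errors Q(lambda) for one or many lambda values.

    lam   : scalar or array of N parameter samples
    t     : array of K+1 observation times
    y     : array of K+1 observations
    sigma : scalar or array of K+1 noise standard deviations
    """
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    t = np.asarray(t, dtype=float)
    predictions = lam[:, np.newaxis] * np.exp(-t)   # shape (N, K+1)
    residuals = (predictions - y) / sigma
    return np.sum(residuals ** 2, axis=1)           # shape (N,)

t = np.array([0.0, 1.0])
y = np.exp(-t)                        # noise-free observations for lambda_0 = 1
Q = qoi([1.0, 2.0], t, y, sigma=0.5)  # Q[0] is zero because lambda = lambda_0
```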

We let $\mathcal{D} := Q(\Lambda)$ denote the space of all possible values of this weighted sum of squared errors. 


---
## Formulating the Inverse Problem:
---
### Prior Information/Assumptions

* We assume that the true value $\lambda_0$ belongs to the parameter space defined by $\Lambda:= [0, 2]$.


* Prior to the data $\{y_k\}_{k=0}^K$ being available, any value of the parameter $\lambda$ in $\Lambda$ is assumed to be equally likely. In other words, we take $\pi^{prior}_\Lambda(\lambda)$ to be a uniform density.


### The Observed Density

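* For the true value $\lambda_0$, we have $u(t_k;\lambda_0)-y_k = -\epsilon_k$ for each $k$, so $Q(\lambda_0) = \sum_{k=0}^{K} (\epsilon_k/\sigma_k)^2$ is a sum of $K+1$ squared independent standard normal variables. Thus, the observed density on $\mathcal{D}$, denoted by $\pi^{obs}_{\mathcal{D}}(d)$, is given by a $\chi^2_{K+1}$ distribution.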
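
The $\chi^2_{K+1}$ claim can be checked empirically by simulating many independent data sets at $\lambda_0$ and comparing the resulting values of $Q(\lambda_0)$ with the reference distribution. The values of `K`, `sigma`, `lam_0`, and `n_trials` below are arbitrary choices for this sketch.

```python
import numpy as np
import scipy.stats as sstats

rng = np.random.RandomState(0)
K = 3                                   # K + 1 = 4 observations
t = 0.1 + 0.1 * np.arange(K + 1)        # hypothetical observation times
sigma, lam_0 = 0.1, 1.0
n_trials = 20000

# Each row is one independent set of noisy observations at lam_0.
y = lam_0 * np.exp(-t) + sigma * rng.standard_normal((n_trials, K + 1))
Q0 = np.sum(((lam_0 * np.exp(-t) - y) / sigma) ** 2, axis=1)

sample_mean = Q0.mean()   # a chi-squared variable with K+1 d.o.f. has mean K+1
sample_var = Q0.var()     # and variance 2(K+1)
ks = sstats.kstest(Q0, sstats.chi2(K + 1).cdf)   # distance to the chi2 CDF
```

A small Kolmogorov-Smirnov statistic (`ks.statistic`) indicates that the simulated values are consistent with the $\chi^2_{K+1}$ distribution.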

### The Posterior Density

* Let $\pi^{O(prior)}_{\mathcal{D}}(d)$ denote the push-forward of the prior density onto $\mathcal{D}$. Then, the posterior density on $\Lambda$ is given by

$$
    \pi^{post}_\Lambda(\lambda) := \pi^{prior}_\Lambda(\lambda)\frac{\pi^{obs}_{\mathcal{D}}(Q(\lambda))}{\pi^{O(prior)}_{\mathcal{D}}(Q(\lambda))}
$$
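
One way to turn this formula into posterior samples is accept/reject on the ratio $r(\lambda) = \pi^{obs}_{\mathcal{D}}(Q(\lambda)) / \pi^{O(prior)}_{\mathcal{D}}(Q(\lambda))$. The sketch below assumes a Gaussian KDE for the push-forward and uses illustrative sample sizes and noise levels; the `sandbox` function in this notebook only plots the normalized ratio and does not perform the accept/reject step.

```python
import numpy as np
import scipy.stats as sstats
from scipy.stats import gaussian_kde

rng = np.random.RandomState(0)
t = np.linspace(0.1, 0.4, 4)            # hypothetical observation times
sigma, lam_0 = 0.1, 1.0
y = lam_0 * np.exp(-t) + sigma * rng.standard_normal(t.size)

# Prior samples on Lambda = [0, 2] and their images under the QoI map.
lam = rng.uniform(0.0, 2.0, size=2000)
Q = np.sum(((lam[:, np.newaxis] * np.exp(-t) - y) / sigma) ** 2, axis=1)

pf_dens = gaussian_kde(Q)               # KDE estimate of the push-forward
obs_dens = sstats.chi2(t.size)          # observed density, K + 1 d.o.f.
r = obs_dens.pdf(Q) / pf_dens.evaluate(Q)

# Accept each prior sample with probability r / max(r).
accept = rng.uniform(size=r.size) < r / r.max()
lam_post = lam[accept]
```

The accepted values in `lam_post` can then be passed to a histogram or a second KDE to visualize the posterior.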

---
## The numerical implementation and practical considerations
---
Here, we provide only a few brief remarks on the implementation.
For a step-by-step walkthrough, please see the CBayes_TS.ipynb file.
Below you will find an all-in-one version. 

***Some useful remarks go here.***

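* In the `sandbox` function below, the observation times are `num_observations` equally spaced points $t_k = t_0 + k\,\Delta_t$, built from the arguments `t_0` and `Delta_t`.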



---

### Define some functions for the sandbox

In [None]:
def sandbox(num_samples = int(1E4), lam_bound = [3,6], lam_0=3.5, 
            t_0 = 0.1, Delta_t = 0.1, num_observations = 4, sd=1):
#            T=[0.1,1], uncertainty = 0.05, sd = 1):
    # NOTE this version only uses constant variances for the sake
    # of interactivity.
    sigma = sd*np.ones(num_observations)
    
    if num_observations == 1:
        print('K=0 specified, This is a single observation at t = %f.'%t_0)
        
    t = np.linspace(t_0, t_0 + Delta_t*(num_observations-1), num_observations)
    
    def Q_fun(lam,obs_data):
        predictions = lam*np.exp(-t)
        residuals = predictions - obs_data
        QoI = np.sum( (residuals/sigma)**2 )
        return QoI
    
    # Sample the Parameter Space
    a, b = lam_bound
    lam = np.random.uniform(a, b, size = (int(num_samples), 1) ) # uniform prior samples on [a, b]
    
    # Create observations
    obs_data = lam_0 * np.exp(-t) + np.random.randn(int(num_observations))*sigma
    
    # Map to Data Space
    D = np.zeros(int(num_samples))
    for i in range(int(num_samples)):
        D[i] = Q_fun(lam[i,:], obs_data)
    
#     print('dimensions :  lambda = ' + str(lam.shape) + '   D = ' + str(D.shape) )
    # Perform KDE to estimate the pushforward
    pf_dens = gkde(D) # compute KDE estimate of it
    # Specify Observed Measure - chi-squared density with K+1 = num_observations d.o.f.
    
    #obs_dens = sstats.uniform(0,uncertainty) # 1D only
    obs_dens = sstats.chi2(int(num_observations))
    
    # Solve the problem
    r = obs_dens.pdf(D) / pf_dens.evaluate(D) # vector of ratios evaluated at all the Q(lambda)'s
    M = np.max(r)

    eta_r = r/M # ratio normalized to [0, 1]
    
    print('\tEntropy is %1.4e'%sstats.entropy( obs_dens.pdf(D), pf_dens.evaluate(D) ))
    
    res = 50
    max_x = D.max()
    # Plot stuff
    plt.rcParams['figure.figsize'] = (18, 6)
    plt.figure()
    plt.subplot(1, 3, 1)
    x = np.linspace(-0.25, max_x, res)
    plt.plot(x, pf_dens.evaluate(x))
    plt.title('Pushforward of Prior')
    plt.xlabel('Q(lambda)')
    
    plt.subplot(1, 3, 2)
    xx = np.linspace(0, max_x, res)
    plt.plot(xx, obs_dens.pdf(xx))
    plt.title('Observed Density')
    plt.xlabel('Q(lambda)')

    plt.subplot(1, 3, 3)
    plt.scatter(lam, eta_r)
    # plt.plot(lam_accept, gkde(lam_accept))
    plt.scatter(lam_0, 0.05)
    plt.title('Posterior Distribution') #\nof Uniform Observed Density \nwith bound = %1.2e'%uncertainty)
    plt.xlabel('Lambda')
#     plt.title('$\eta_r$')
    # # OPTIONAL:
    # pr = 0.2 # percentage view-window around true parameter.
#     plt.xlim(lam0*np.array([1-pr,1+pr]))
    plt.xlim([a,b])
    plt.show()
    
#     return eta_r

---

# All-in-One Sandbox!
_Run the cells below to start experimenting_

In [None]:
interact_manual(sandbox, 
        num_samples = IntSlider(value=2500, 
            min=int(5E2), max=int(5E4), step=500, description='Samp. $N$ ='), 
        lam_bound = FloatRangeSlider(value=[3.0, 6.0], 
            min=2.0, max = 7.0, step=0.5, description=r'Param $\Lambda \in$'),
        lam_0 = FloatSlider(value=3.5, 
            min=2.0, max=7.0, step=0.1, description=r'IC: $\lambda_0$ ='), 
        t_0 = FloatSlider(value=0.5, 
            min=0.1, max=2, step=0.1, description='$t_0$ ='),
        Delta_t = FloatSlider(value=0.1, 
            min=0.05, max=0.5, step=0.05, description=r'$\Delta_t$ ='),
        num_observations = IntSlider(value=50, 
            min=1, max=100, description='Num. of Obs. ='), 
        sd = FloatSlider(value=0.1, 
            min=0.05, max=0.25, step=0.01, description=r'Constant $\sigma$:'));
plt.show()

In [None]:
num_samples = widgets.IntSlider(value=1000, continuous_update=False, orientation='vertical',
    min=int(5E2), max=int(5E4), step=500, description='$N$ :')

lam_bound = widgets.FloatRangeSlider(value=[0.0, 2.0], continuous_update=False, orientation='horizontal',
    min=-5.0, max = 5.0, step=0.25, description=r'Param: $\Lambda \in$')

lam0 = widgets.FloatSlider(value=1.0, continuous_update=False, orientation='horizontal',
    min=0.25, max=1.75, step=0.05, description=r'IC: $\lambda_0$')

def update_lam0_range(*args): # update ref lambda if lambda bound changes
    lam0.min = lam_bound.value[0]
    lam0.max = lam_bound.value[1]
lam_bound.observe(update_lam0_range, 'value')

dof = widgets.IntSlider(value=0, continuous_update=False, orientation='horizontal',
    min=0, max=50, description='d.o.f: $K$ =')

T = widgets.FloatRangeSlider( value=[0.5, 1], min=0.1, max=7.5, step=0.1, continuous_update=False,
    description=r'$t\in [T_0, T]$ :', orientation='horizontal',
    readout=True, readout_format='.1f')

uncertainty = widgets.FloatSlider(value=0.01, continuous_update=False, orientation='vertical',
    min=0.005, max=0.25, step=0.005, 
    description=r'$\epsilon$ :', readout_format='.3f')

sd = widgets.FloatSlider(value=1, continuous_update=False, orientation='vertical',
    min=0.15, max=1.85, step=0.05, description=r'$\sigma$ :')

lbl = widgets.Label("UQ Sandbox", disabled=False)
u1 = widgets.VBox([lbl, lam_bound, lam0, dof, T])
u2 = widgets.HBox([num_samples, uncertainty, sd])
# u3 = widgets.HBox([uncertainty, sd])
ui = widgets.HBox([u1, u2])
u1.layout.justify_content = 'center'
ui.layout.justify_content = 'center'


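def sandbox_ui(num_samples, lam_bound, lam0, dof, T, uncertainty, sd):
    # Adapter between these widgets and the `sandbox` signature, which takes
    # (t_0, Delta_t, num_observations) rather than (T, dof).
    # `uncertainty` is only used by the commented-out uniform observed density
    # in `sandbox`, so it is ignored here.
    num_observations = dof + 1  # K = dof gives K+1 observations on [T_0, T]
    Delta_t = (T[1] - T[0]) / dof if dof > 0 else 0.0
    sandbox(num_samples=num_samples, lam_bound=lam_bound, lam_0=lam0,
            t_0=T[0], Delta_t=Delta_t, num_observations=num_observations, sd=sd)

out = interactive_output(sandbox_ui, {'num_samples': num_samples,
                        'lam_bound': lam_bound,
                        'lam0': lam0,
                        'dof': dof,
                        'T': T,
                        'uncertainty': uncertainty,
                         'sd': sd} )
display(ui, out)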

---

### Suggestions

- Increase $N$ and watch the Pushforward of the Prior change/converge.
- If you increase the standard deviation $\sigma$, we suggest also widening the bounds on the parameter space $\Lambda$ in order to avoid violating the predictability assumption.
- Notice the relationship between the bound on the interval we are inverting for the Mean Squared Error and the support of the posterior.
- The same happens as you increase $\sigma$.
- Change the initial condition $\lambda_0$ and watch the posterior distribution follow the slider.



- Fix the number of observations to 1 and change the interval over which the observation is being made (with $K=0$, the observation occurs only at $T_0$). Notice the diminishing returns as you wait to make your measurement. 
- Fix some interval and change the number of observations made during this time period.
- Fix a number of observations (several) and fix $T_0$ while changing $T$ to observe another example of diminishing returns.

### Observations

- Entropy barely changes as $\lambda_0$ moves around. It increases slightly near the boundary of $\Lambda$ (likely because the predictability assumption is violated).
- For a wide time measurement window, entropy increases with the number of observations $K$ (d.o.f.).
- Widening $\Lambda$ decreases entropy and enlarges $\mathcal{D}$, the support of $P_\mathcal{D}$.
- If you narrow the window, the entropy decreases.
- As the window slides earlier in time, the entropy decreases.
- Higher MSE threshold ($\epsilon$, support of observed density) means higher entropy.
- Higher variance means higher entropy. We might run a suite of $\sigma$s MADS-style to study the robustness of a design. 
    - perhaps if we try to minimize entropy (maximize information gain), we look for designs that are less sensitive to the choice of $\sigma$s, which would **correspond to an experimental design that is robust to measurement uncertainty.**
- Increasing the number of samples $N$ increases entropy quite a bit. We would like to find a way to control for this. _Is it even right to be using `scipy.stats.entropy`?_
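
On the last question: when `scipy.stats.entropy` is called with two arrays, it normalizes each array to sum to one and returns $\sum_i p_i \log(p_i/q_i)$, i.e., a discrete Kullback-Leibler divergence between the two normalized vectors, not a Monte Carlo estimate of an integral. One possible alternative, assuming the quantity of interest is the KL divergence of the posterior from the prior, follows from $\pi^{post}_\Lambda = \pi^{prior}_\Lambda \, r$ with $r = \pi^{obs}_{\mathcal{D}}(Q(\lambda)) / \pi^{O(prior)}_{\mathcal{D}}(Q(\lambda))$, which gives $\mathrm{KL}(\pi^{post}_\Lambda \,\|\, \pi^{prior}_\Lambda) = \mathbb{E}_{prior}[r \log r]$. The function name `kl_post_prior` is hypothetical and this sketch has not been compared against the notebook's results.

```python
import numpy as np

def kl_post_prior(r):
    """Monte Carlo estimate of KL(posterior || prior) = E_prior[r log r],
    where r holds the density ratio evaluated at samples from the prior."""
    r = np.asarray(r, dtype=float)
    safe_r = np.where(r > 0, r, 1.0)    # r * log(r) -> 0 as r -> 0
    return np.mean(r * np.log(safe_r))

# Toy check: a posterior that is uniform on half of the prior's support has
# ratio 2 on that half and 0 elsewhere.
kl_half = kl_post_prior([2.0, 0.0])
```

Because this estimate is a sample mean, it should stabilize rather than grow as $N$ increases, provided the KDE of the push-forward is itself accurate.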

In [None]:
display(ui, out)

In [None]:
np.exp(-1)*2