# analytical approach

- analytically tractable problem: Gaussian prior, Gaussian proposal, _linear-Gaussian likelihood_
- analytically tractable MDN: linear-affine network
- analytically tractable gradients and closed-form solution for MDN parameters for given dataset


TO DO:
- check gradients again (and again and again...)
- analytical division for actual CDELFI posterior estimates (vs. 'proposal-posterior' estimates at the moment)


prior: 

$p(\theta) = \mathcal{N}(\theta \ | \ 0, \eta^2)$

proposal prior: 

$\tilde{p}(\theta) = \mathcal{N}(\theta \ | \ \nu, \xi^2)$

simulator: 

$p(x \ | \ \theta) =  \mathcal{N}(x \ | \ \theta, \sigma^2)$

analytic posteriors: 

$p(\theta \ | \ x) = \mathcal{N}(\theta \ | \frac{\eta^2}{\eta^2 + \sigma^2} x, \eta^2 - \frac{\eta^4}{\eta^2 + \sigma^2})$ 

$\tilde{p}(\theta \ | \ x) = \mathcal{N}(\theta \ | \frac{\xi^2}{\xi^2 + \sigma^2} x + \frac{\sigma^2}{\xi^2 + \sigma^2} \nu, \xi^2 - \frac{\xi^4}{\xi^2 + \sigma^2})$
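The two analytic posteriors above share one conjugate update and differ only in which Gaussian plays the role of the prior. A minimal sketch of that update in plain Python follows; the function name `gaussian_posterior` and the proposal values `nu = 0.0`, `ksi2 = 0.5` are illustrative assumptions, not part of the notebook's `util` module.

```python
def gaussian_posterior(x, prior_mean, prior_var, noise_var):
    """Posterior N(mean, var) over theta for x | theta ~ N(theta, noise_var)."""
    gain = prior_var / (prior_var + noise_var)
    mean = gain * x + (1.0 - gain) * prior_mean
    var = prior_var - prior_var**2 / (prior_var + noise_var)
    return mean, var


eta2, sig2 = 1.0, 1.0 / 9.0   # prior and likelihood variances
nu, ksi2 = 0.0, 0.5           # proposal mean and variance (illustrative values)
x0 = 0.8

# true posterior: prior N(0, eta2)
post_mean, post_var = gaussian_posterior(x0, 0.0, eta2, sig2)
# proposal-posterior: proposal prior N(nu, ksi2) in place of the prior
pp_mean, pp_var = gaussian_posterior(x0, nu, ksi2, sig2)

print(post_mean, post_var)
print(pp_mean, pp_var)
```

With `prior_mean = 0` the mean reduces to $\frac{\eta^2}{\eta^2 + \sigma^2} x$, and with `prior_mean = nu` it gives the two-term mean of the proposal-posterior.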

Data:

$(x_n, \theta_n) \sim \tilde{p}(\theta) p(x \ | \ \theta) = \mathcal{N}( (x_n, \theta_n) \ | \ (\nu, \nu), 
\begin{pmatrix}
\xi^{2} + \sigma^{2} &  \xi^{2}  \\
\xi^{2} & \xi^{2}  \\
\end{pmatrix})$

Loss: 

$ \mathcal{L}(\phi) = \sum_n \frac{{p}(\theta_n)}{\tilde{p}(\theta_n)} K_\epsilon(x_n | x_0) \ \log q_\phi(\theta_n | x_n)$

Model: 

$ q_\phi(\theta_n | x_n) = \mathcal{N}(\theta_n \ | \ \mu_\phi(x_n), \sigma^2_\phi(x_n))$

$ (\mu_\phi(x), \sigma^2_\phi(x)) = MDN_\phi(x) = \begin{pmatrix} \beta \\ 0 \end{pmatrix} x + \begin{pmatrix} \alpha \\ \gamma^2 \end{pmatrix}$

Gradients: 

$\mathcal{N}_n := \mathcal{N}(x_n, \theta_n \ | \ \mu_y, \Sigma_y)$

$\Sigma_y = 
\begin{pmatrix}
\epsilon^2  &  0  \\
0 & \left( \eta^{-2} - \xi^{-2} \right)^{-1}  \\
\end{pmatrix}$


$\mu_y = \begin{pmatrix} x_0  \\ \frac{\eta^2}{\eta^2 - \xi^2}\nu \end{pmatrix}$

$\frac{\partial{}\mathcal{L}}{\partial{}\alpha} = -2 \sum_n \mathcal{N}_n \frac{\theta_n - \mu_\phi(x_n)}{\sigma^2_\phi(x_n)}$

$\frac{\partial{}\mathcal{L}}{\partial{}\beta} = -2 \sum_n \mathcal{N}_n \frac{\theta_n - \mu_\phi(x_n)}{\sigma^2_\phi(x_n)} x_n$

$\frac{\partial{}\mathcal{L}}{\partial{}\gamma^2} 
= \sum_n \mathcal{N}_n \left( \frac{1}{\sigma^2_\phi(x_n)} 
- \frac{\left(\theta_n - \mu_\phi(x_n) \right)^2}{\sigma^4_\phi(x_n)} \right) 
= \frac{1}{\gamma^2} \sum_n \mathcal{N}_n \left( 1 
- \frac{\left(\theta_n - \mu_\phi(x_n) \right)^2}{\gamma^2} \right) $

Optima: 

$\hat{\alpha} = 
\frac{\sum_n \mathcal{N}_n \left(\theta_n - \frac{\sum_m \mathcal{N}_m \theta_m x_m}{\sum_m \mathcal{N}_m x_m^2} x_n \right)}{\sum_n \mathcal{N}_n - \frac{\left( \sum_n \mathcal{N}_n x_n \right)^2}{\sum_n \mathcal{N}_n x_n^2}}$

$\hat{\beta} = 
\frac{\sum_n \mathcal{N}_n \theta_n x_n - \hat{\alpha} \sum_n \mathcal{N}_n x_n}{\sum_n \mathcal{N}_n x_n^2}$

$\hat{\gamma}^2 = 
\frac{\sum_n \mathcal{N}_n \left( \theta_n - \hat{\alpha} - \hat{\beta} x_n \right)^2}{\sum_n \mathcal{N}_n}$
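
The optima above are the solution of a weighted least-squares problem with weights $\mathcal{N}_n$. A minimal sketch in plain Python, assuming the weights are already given as a list; the function name `weighted_linear_gaussian_fit` is hypothetical and this is not the implementation behind `util.alpha`, `util.beta` or `util.gamma2`.

```python
def weighted_linear_gaussian_fit(thetas, xs, weights):
    """Closed-form (alpha, beta, gamma2) for the weighted linear-Gaussian fit."""
    sw = sum(weights)
    swx = sum(w * x for w, x in zip(weights, xs))
    swxx = sum(w * x * x for w, x in zip(weights, xs))
    swt = sum(w * t for w, t in zip(weights, thetas))
    swtx = sum(w * t * x for w, t, x in zip(weights, thetas, xs))

    # same expressions as the formulas for alpha-hat, beta-hat and gamma-hat^2
    alpha = (swt - swtx / swxx * swx) / (sw - swx**2 / swxx)
    beta = (swtx - alpha * swx) / swxx
    gamma2 = sum(w * (t - alpha - beta * x) ** 2
                 for w, t, x in zip(weights, thetas, xs)) / sw
    return alpha, beta, gamma2


# noise-free toy data on the line theta = 0.3 + 0.7 * x
xs = [-1.0, 0.0, 0.5, 2.0]
thetas = [0.3 + 0.7 * x for x in xs]
weights = [0.1, 0.4, 0.3, 0.2]

alpha_hat, beta_hat, gamma2_hat = weighted_linear_gaussian_fit(thetas, xs, weights)
print(alpha_hat, beta_hat, gamma2_hat)
```

On noise-free data the fit recovers the line and a residual variance of (numerically) zero, which is a quick sanity check for the formulas.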

# define problem setup

- proposals narrower than the prior seem to bias the MLE estimates (non-SVI fits), especially the posterior variances
- to a certain degree, this can be mitigated by switching to Student's t proposals (with few degrees of freedom)
- apart from the bias, the estimates also show increased variance, even with Student's t proposals (probably an effective sample size (ESS) effect?)


- generally speaking, the real challenge for SNPE arises when $\sigma^2 \ll \eta^2$, i.e. when the likelihood (and hence the posterior) is much narrower than the prior
- in a multi-round SNPE fit, the proposal will tend to be much narrower than the prior after several rounds
- the 'relevant' (relative) proposal width thus is the one that corresponds to the posterior variance


- for a simple setup that can be largely 'fixed' with Student's t proposals, choose $\sigma^2 = \eta^2$.

- intermediate setup:  $\sigma^2 = \eta^2 / 9$ (i.e. posterior variance is $10\%$ of prior variance)

- to watch Rome burn, try  $\sigma^2 = \eta^2 / 99$ (i.e. posterior variance is $1\%$ of prior variance)
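
How hard a setup is can be read off the ratio of posterior to prior variance, which for this model is $\sigma^2 / (\sigma^2 + \eta^2)$. A small sketch evaluating it for the likelihood variances `1`, `1/9` and `1/99` used in the code cells of this notebook; the helper name `posterior_variance_fraction` is an assumption for illustration.

```python
def posterior_variance_fraction(sig2, eta2):
    """Posterior variance divided by prior variance for the 1D Gaussian model."""
    post_var = eta2 - eta2**2 / (eta2 + sig2)
    return post_var / eta2


eta2 = 1.0
for sig2 in (1.0, 1.0 / 9.0, 1.0 / 99.0):
    print(sig2, posterior_variance_fraction(sig2, eta2))
```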

In [None]:
%%capture 
import util
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

import delfi.distribution as dd
import delfi.generator as dg
import delfi.inference as infer
import delfi.summarystats as ds
from delfi.simulator.Gauss import Gauss


## problem setup ##

n_params = 1

assert n_params == 1 # cannot be overstressed: everything in this notebook goes downhill sharply otherwise

sig2 = 1.0/9. # likelihood variance
eta2 = 1.0     # prior variance
eps2 = 1e20    # calibration kernel width (everything above a certain threshold will be treated as 'uniform')

# pick observed summary statistics
x0 = 0.8 * np.ones(1) #_,obs = g.gen(1) 

## simulation setup ##

n_fits = 200  # number of MLE fits (i.e. dataset draws); each is a single-round fit with a pre-specified proposal
N      = 300  # number of simulations per dataset

# set proposal priors (one per experiment)


ksi2s = np.exp(np.linspace(-5, -0.01, 10)) * eta2  # proposal variance
nus = np.zeros_like(ksi2s) #eta2/(eta2+sig2)*x0[0]* np.ones(len(ksi2s))              # proposal mean

n_bins = 50 # number of bins for plotting


res = {'normal'  : np.zeros((len(ksi2s), n_fits,5)),
       't_df10'  : np.zeros((len(ksi2s), n_fits,5)),
       't_df3'   : np.zeros((len(ksi2s), n_fits,5)),
       'cdelfi'  : np.zeros((len(ksi2s), n_fits,5)),
       'unif_w1' : np.zeros((len(ksi2s), n_fits,5)),
       'unif_w6' : np.zeros((len(ksi2s), n_fits,5)),
       'sig2' : sig2,
       'eta2' : eta2,
       'eps2' : eps2,
       'ksi2s' : ksi2s,
       'nus' : nus,
       'x0' : x0,
      }

# SNPE (Gaussian proposals) 
## multiple fits, with different proposal widths
- n_fits many runs, every run with different seed, i.e. different data set $\{(x_n, \theta_n)\}_{n=1}^N$
- collect statistics on mean and std. of estimated posterior mean and variance (bias?)
- outer loop over different proposal distribution variances (can in principle also move means)
  
Keep in mind that for large enough $N$, in theory SNPE should return ground-truth posteriors irrespective of the proposal prior (cf. Kaan's proof in JakobsNotes.pdf)
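
For finite $N$, one way to see why narrow proposals hurt is the effective sample size (ESS) of the importance weights $p(\theta_n)/\tilde{p}(\theta_n)$. The sketch below is an illustration only: it uses the standard ESS estimate $(\sum_n w_n)^2 / \sum_n w_n^2$, ignores the calibration kernel, and the function names are hypothetical rather than taken from `util`.

```python
import math
import random


def log_normal_pdf(theta, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (theta - mean) ** 2 / var


def effective_sample_size(n, eta2, nu, ksi2, seed=0):
    """ESS of weights p/p~ for prior N(0, eta2) and Gaussian proposal N(nu, ksi2)."""
    rng = random.Random(seed)
    thetas = [rng.gauss(nu, math.sqrt(ksi2)) for _ in range(n)]
    log_w = [log_normal_pdf(t, 0.0, eta2) - log_normal_pdf(t, nu, ksi2)
             for t in thetas]
    shift = max(log_w)                      # ESS is invariant to rescaling the weights
    w = [math.exp(lw - shift) for lw in log_w]
    return sum(w) ** 2 / sum(wi * wi for wi in w)


n = 300
ess_same = effective_sample_size(n, eta2=1.0, nu=0.0, ksi2=1.0)    # proposal == prior
ess_narrow = effective_sample_size(n, eta2=1.0, nu=0.0, ksi2=0.1)  # narrow proposal
print(ess_same, ess_narrow)
```

When the proposal equals the prior, all weights are equal and the ESS is $N$; a proposal much narrower than the prior concentrates the weight on a few tail samples.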

In [None]:
proposal_form = 'studentT'
df = 3

out_snpe  = res['t_df' + str(df)]
ksi2, nu = 0.5, 0.
util.test_setting(out_snpe, n_params, N, sig2, eta2, eps2, x0, [ksi2], [nu], proposal_form, track_rp=True, df=df);

# first-order analytic approximation to bias for Student's t proposals 
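
The non-central Gaussian moments that the cell below hard-codes up to order 8 can also be generated with the standard recursion $E[X^m] = \mu \, E[X^{m-1}] + (m-1) \, \sigma^2 \, E[X^{m-2}]$. A minimal sketch, useful as a cross-check of the hard-coded polynomials or for higher orders; the function name `gaussian_raw_moments` is an assumption.

```python
def gaussian_raw_moments(max_order, mu, sig2):
    """Return [E[X^0], ..., E[X^max_order]] for X ~ N(mu, sig2)."""
    moments = [1.0, mu]
    for m in range(2, max_order + 1):
        moments.append(mu * moments[m - 1] + (m - 1) * sig2 * moments[m - 2])
    return moments[: max_order + 1]


mu, sig2 = 1.0, 2.0
moments = gaussian_raw_moments(8, mu, sig2)

# compare against the closed-form fourth moment
closed_form_m4 = mu**4 + 6 * mu**2 * sig2 + 3 * sig2**2
print(moments[4], closed_form_m4)
```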

In [None]:
import scipy as sp
import scipy.special  # make sure the `special` submodule is actually loaded
gammaln = sp.special.gammaln

def gmom(m, mu, sig2):    
    # first few non-central moments of Gaussian distribution 
    # (generally involves confluent hypergeometric function/Hermite polynomials to solve)
    
    if m==0:
        return 1.
    if m ==1:
        return mu
    if m==2:
        return mu**2 + sig2
    if m==3:
        return mu*(mu**2 + 3*sig2)
    if m==4:
        return mu**4 +  6*mu**2*sig2 +   3      *sig2**2
    if m==5:
        return mu**5 + 10*mu**3*sig2 +  15*mu   *sig2**2
    if m==6:
        return mu**6 + 15*mu**4*sig2 +  45*(mu*sig2)**2  +15*sig2**3
    if m==7:
        return mu**7 + 21*mu**5*sig2 + 105*mu**3*sig2**2+105*mu*sig2**3
    if m==8:
        return mu**8 + 28*mu**6*sig2 + 210*mu**4*sig2**2+420*mu**2*sig2**3+105*sig2**4
 
    return None

def comp_firstorder_approx_bias(eta2, ksi2, nu, mu, df, N):

    k,l = 2,1 # l = order for moments of weights, here we need E[w(x)^l * f(x)], l=1 for Ep[w*f]
    c = np.sqrt(2*np.pi*eta2/(k))*\
                np.exp( l*(gammaln(df/2)-gammaln(df/2+0.5)) 
               +l/2*np.log(ksi2)-k/2*np.log(eta2)
               +(l-k)/2*np.log(np.pi)-k/2*np.log(2)+l/2*np.log(df))
    m_, s2_,mu_ = -nu, eta2/k, mu-nu
    M = int(l*(df+1)/2)
    bf = [sp.special.binom(M,m)/(df*ksi2)**m \
          *(gmom(2*m+2,m_,s2_)-2*mu_*gmom(2*m+1,m_,s2_)+mu_**2*gmom(2*m,m_,s2_)) \
          for m in range(M+1)]

    Ep_f = nu**2 + eta2
    
    Ep_wf = c * np.sum(bf)

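    bias = 1/N * (Ep_f - Ep_wf)  # first-order bias; computed for reference, not returned
    
    return Ep_wf  # NOTE: returns E_p[w*f], the quantity the Monte Carlo check below averages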

N = 500
sig2 = 1/9.
eta2 = 1.0
ksi2, nu = 0.1, 1.0
df = 3
mu = -2.5


bias = comp_firstorder_approx_bias(eta2, ksi2, nu, mu, df, N)

print(bias)

# numerically verify approx. analytic bias against Monte Carlo

In [None]:
from delfi.utils.progress import no_tqdm, progressbar
n_fits = 1000
res = np.zeros(n_fits)

# compute target solution
m = Gauss(dim=n_params, noise_cov=sig2)
p = dd.Gaussian(m=0. * np.ones(n_params), 
                S=eta2 * np.eye(n_params))
post   = dd.Gaussian(m = np.ones(n_params) * eta2/(eta2+sig2)*x0, 
                     S=eta2 - eta2**2 / (eta2 + sig2) * np.eye(n_params)) 
# compute proposal-posterior
postpr = dd.Gaussian(m = np.ones(n_params) * (ksi2/(ksi2+sig2)*x0 + sig2/(ksi2+sig2)*nu), 
                     S=ksi2 - ksi2**2 / (ksi2 + sig2) * np.eye(n_params))


# set up importance weight computation
eta2p = 1/(1/eta2 - 1/ksi2)
Sig_y = np.array([[eps2,0], [0,eta2p]])    
mu_y = np.array([[np.ravel(x0)[0]], [eta2/(eta2-ksi2)*nu]])  # scalar entries: avoids a ragged array when x0 has shape (1,)

s = ds.Identity()
pbar = progressbar(total=n_fits)
desc = 'repeated fits'
pbar.set_description(desc)
with pbar:
    for idx_seed in range(n_fits):

        #print( str(idx_seed) + '/' + str(n_fits) )

        # excessive fixating of random seeds
        seed = 42 + idx_seed
        ppr = dd.StudentsT(m=nu * np.ones(n_params), 
                           S=ksi2 * np.eye(n_params), # * (df-2.)/df,
                           dof=df,
                           seed=seed)    
        p = dd.Gaussian(m=0 * np.ones(n_params), 
                        S=eta2 * np.eye(n_params), # * (df-2.)/df,
                        seed=seed)    
        
        
        m = Gauss(dim=n_params, noise_cov=sig2, seed=seed)
        g = dg.Default(model=m, prior=p, summary=s)


        # gen data
        data = g.gen(N, verbose=False)
        params, stats = data[0].reshape(-1), data[1].reshape(-1)

        # 'fit' MDN
        normals = util.get_weights('studentT', eta2, ksi2, eps2, x0, nu, stats, params, df=df) 
        Ep_f = ((params-mu)**2).mean() #np.var(params)
        Ep_wf = np.mean( normals*(params-mu)**2)
        
        res[idx_seed] = Ep_wf #(1/N * (Ep_f - Ep_wf))
        pbar.update(1)

res.mean(), res.std()

# full scenario ($\mu = f(x)$)

In [None]:
import scipy as sp
import scipy.special  # make sure the `special` submodule is actually loaded
gammaln = sp.special.gammaln

def gmom(m, mu, sig2):    
    # first few non-central moments of Gaussian distribution 
    # (generally involves confluent hypergeometric function/Hermite polynomials to solve)
    
    if m==0:
        return 1.
    if m ==1:
        return mu
    if m==2:
        return mu**2 + sig2
    if m==3:
        return mu*(mu**2 + 3*sig2)
    if m==4:
        return mu**4 +  6*mu**2*sig2 +   3      *sig2**2
    if m==5:
        return mu**5 + 10*mu**3*sig2 +  15*mu   *sig2**2
    if m==6:
        return mu**6 + 15*mu**4*sig2 +  45*(mu*sig2)**2  +15*sig2**3
    if m==7:
        return mu**7 + 21*mu**5*sig2 + 105*mu**3*sig2**2+105*mu*sig2**3
    if m==8:
        return mu**8 + 28*mu**6*sig2 + 210*mu**4*sig2**2+420*mu**2*sig2**3+105*sig2**4
 
    return None

def comp_firstorder_approx_bias_full(eta2, ksi2, nu, sig2, df, N, alpha, beta):

    
    k,l = 2,1 # l = order for moments of weights, here we need E[w(x)^l * f(x)], l=1 for Ep[w*f]
    
    # ratio of prior densities normalizers
    c = np.sqrt(2*np.pi*eta2/(k))*\
                np.exp( l*(gammaln(df/2)-gammaln(df/2+0.5)) 
               +l/2*np.log(ksi2)-k/2*np.log(eta2)
               +(l-k)/2*np.log(np.pi)-k/2*np.log(2)+l/2*np.log(df))

    # Gaussian moments
    m_, s2_,mu_ = -nu, eta2/k, -alpha/(beta-1)-nu
    
    ll_offset = (beta/(beta-1))**2 * sig2

    # order of binomial polynomial 
    M = int(l*(df+1)/2)
    bf = [(beta-1)**2 * sp.special.binom(M,m)/(df*ksi2)**m \
          *(gmom(2*m+2,m_,s2_)-2*mu_*gmom(2*m+1,m_,s2_)+(mu_**2+ll_offset)*gmom(2*m,m_,s2_)) \
          for m in range(M+1)]

    Ep_f = (beta-1)**2 * eta2 + beta**2 * sig2  + alpha**2
    
    Ep_wf = c * np.sum(bf)

    bias = 1/N * (Ep_f - Ep_wf)
    
    return bias

N = 500
sig2 = 1/9.
eta2 = 1.0
ksi2, nu = 0.1, 1.0
df = 5
alpha = 0.
beta = 0.5

bias = comp_firstorder_approx_bias_full(eta2, ksi2, nu, sig2, df, N, alpha, beta)

print(bias)


In [None]:
from delfi.utils.progress import no_tqdm, progressbar
n_fits = 1000
res = np.zeros(n_fits)

# compute target solution
m = Gauss(dim=n_params, noise_cov=sig2)
p = dd.Gaussian(m=0. * np.ones(n_params), 
                S=eta2 * np.eye(n_params))
post   = dd.Gaussian(m = np.ones(n_params) * eta2/(eta2+sig2)*x0, 
                     S=eta2 - eta2**2 / (eta2 + sig2) * np.eye(n_params)) 
# compute proposal-posterior
postpr = dd.Gaussian(m = np.ones(n_params) * (ksi2/(ksi2+sig2)*x0 + sig2/(ksi2+sig2)*nu), 
                     S=ksi2 - ksi2**2 / (ksi2 + sig2) * np.eye(n_params))


# set up importance weight computation
eta2p = 1/(1/eta2 - 1/ksi2)
Sig_y = np.array([[eps2,0], [0,eta2p]])    
mu_y = np.array([[np.ravel(x0)[0]], [eta2/(eta2-ksi2)*nu]])  # scalar entries: avoids a ragged array when x0 has shape (1,)

s = ds.Identity()
pbar = progressbar(total=n_fits)
desc = 'repeated fits'
pbar.set_description(desc)
with pbar:
    for idx_seed in range(n_fits):

        #print( str(idx_seed) + '/' + str(n_fits) )

        # excessive fixating of random seeds
        seed = 42 + idx_seed
        ppr = dd.StudentsT(m=nu * np.ones(n_params), 
                           S=ksi2 * np.eye(n_params), # * (df-2.)/df,
                           dof=df,
                           seed=seed)    
        p = dd.Gaussian(m=0 * np.ones(n_params), 
                        S=eta2 * np.eye(n_params), # * (df-2.)/df,
                        seed=seed)    
        
        m = Gauss(dim=n_params, noise_cov=sig2, seed=seed)
        g = dg.Default(model=m, prior=p, summary=s)

        # gen data
        data = g.gen(N, verbose=False)
        params, stats = data[0].reshape(-1), data[1].reshape(-1)

        # 'fit' MDN
        normals = util.get_weights('studentT', eta2, ksi2, eps2, x0, nu, stats, params, df=df) 
        Ep_f = ((params-alpha-beta*stats)**2).mean() #np.var(params)
        Ep_wf = np.mean( normals*(params-alpha-beta*stats)**2)
        
        res[idx_seed] = (1/N * (Ep_f - Ep_wf))
        pbar.update(1)

res.mean(), res.std()

In [None]:
import seaborn

eta2 = 1.
x0 = 0.8
dfs = [3,5]
sig2s = [1./99, 1./9, 1.] # likelihood variance
Ns      = [30, 100, 300, 1000, 3000]  # number of simulations per dataset
ksi2s = np.array(np.exp(np.log(10) * np.linspace(-2,-.00001,50))) * eta2  # proposal variance

clrs = np.array([[0,1,0]]) * np.linspace(0.1, 0.9, len(Ns)).reshape(-1,1)
plt.figure(figsize=(12,8))        

for l in range(len(dfs)):
    df = dfs[l]
    biases = np.zeros((len(sig2s), len(ksi2s), len(Ns)))
    for k in range(len(sig2s)):

        sig2 = sig2s[k]
        nu = eta2/(eta2+sig2)*x0

        for i in range(len(Ns)):

            N = Ns[i]

            for j in range(len(ksi2s)):    

                ksi2 = ksi2s[j]

                biases[k,j,i] = comp_firstorder_approx_bias_full(eta2, ksi2, nu, sig2, df, N, alpha, beta)

    ksi2s_MC = np.array([0.01, 0.1, 0.5, 0.999]) * eta2  # proposal variance

    for k in range(len(sig2s)):
        plt.subplot(len(dfs)+1,len(sig2s),k+1+(l+1)*len(sig2s))
        sig2 = sig2s[k]
        gt = eta2 - eta2**2 / (eta2 + sig2s[k])
        for i in range(len(Ns)):
            N = Ns[i]
            plt.semilogx(ksi2s, gt + biases[k,:,i], '-', color=clrs[i])

        for i in range(len(Ns)):
            N = Ns[i]
            tmp = np.load('res_analytic_n_fits' + str(1000) + '_N' + str(N) + '_postVar' + str(int(np.round(1/sig2))) + '.npy', allow_pickle=True)[()]  # dict stored as 0-d object array
            out_snpe = tmp['t_df' + str(int(df))]
            m = out_snpe[:,:,1].mean(axis=1)
            plt.semilogx(ksi2s_MC, m, 'o--', color=clrs[i], markersize=4, alpha=0.3)    


            gt = eta2 - eta2**2 / (eta2 + sig2)
            plt.semilogx(ksi2s_MC, gt*np.ones_like(ksi2s_MC), 'k--', linewidth=2)
            

        plt.axis([0.95*ksi2s[0], 1.05*ksi2s[-1], 0, 1.1*gt ])

    plt.subplot(len(dfs)+1,len(sig2s),1+(l+1)*len(sig2s))
    plt.ylabel('df =' + str(int(dfs[l])))
    
for i in range(len(sig2s)):

    sig2 = sig2s[i]

    clrs = np.array([[0,1,0]]) * np.linspace(0.1, 0.9, len(Ns)).reshape(-1,1)
    clrs2 = np.linspace(0.7, 0.95, len(sig2s)).reshape(-1,1) * np.array([[1.0,0.4,0.4]])

    plt.subplot(len(dfs)+1,len(sig2s),i+1)
    th = np.linspace(-3, 3, 300).reshape(-1,1)
    plt.plot(th,  p.eval(th, log=False), color='k', linewidth=2.0)
    clr = clrs2[i]
    post   = dd.Gaussian(m = np.ones(n_params) * eta2/(eta2+sig2)*x0, 
                         S=eta2 - eta2**2 / (eta2 + sig2) * np.eye(n_params)) 
    l = dd.Gaussian(m=x0 * np.ones(n_params),
                    S=sig2 * np.eye(n_params))    
    plt.plot(th, l.eval(th, log=False), '--', color=clr, alpha=0.3)
    plt.plot(th, post.eval(th, log=False), color=clr)
    plt.grid(False)
    plt.xticks([-3, -2, -1, 0, x0, 2, 3], [-3, -2, -1, 0, 'x0', 2, 3])
    plt.yticks([])
    plt.axis([-3.01, 3.01, 0, 3.0])
    plt.title('posterior var = ' + str(int(np.round(100/(1/sig2+1/eta2)))) + '% prior variance')

    
    for l in range(len(dfs)):

        plt.subplot(len(dfs)+1,len(sig2s),i+1+(l+1)*len(sig2s))
        plt.plot(1/(1/sig2+1/eta2)*np.ones(2), [0, 1.1*(eta2 - eta2**2 / (eta2 + sig2))], color=clrs2[i])

    if i == len(sig2s)//2: 
        plt.xlabel('proposal prior variance / prior variance') 
        
plt.subplot(len(dfs)+1,len(sig2s),1+2*len(sig2s))
plt.legend(['N = ' + str(n) for n in Ns], loc=3)

plt.savefig('bias_linear_approx_studentt_df3_5_withMCE.pdf')
plt.show()

# make summary figure

( loading Monte Carlo fits from disk ! )

In [None]:
import seaborn

eta2 = 1.
x0 = 0.8
dfs = [3,5]
sig2s = [1./99, 1./9, 1.] # likelihood variance
Ns    = np.asarray(np.exp(np.log(10)*np.linspace(1, 4, 50)),dtype=int)  # number of simulations per dataset
Ns_MC = [30, 100, 300, 1000, 3000]
ksi2s = np.array(np.exp(np.log(10) * np.linspace(-2,-.00001,3))) * eta2  # proposal variance

ksi2s_MC = np.array([0.01, 0.1, 0.999]) * eta2  # proposal variance
idx_ksi2_MC = [0,1,3]

clrs = np.array([[0,1,0]]) * np.linspace(0.1, 0.9, len(ksi2s)).reshape(-1,1)
plt.figure(figsize=(12,10))        

for l in range(len(dfs)):
    df = dfs[l]
    biases = np.zeros((len(sig2s), len(ksi2s), len(Ns)))
    for k in range(len(sig2s)):

        sig2 = sig2s[k]
        nu = eta2/(eta2+sig2)*x0

        for i in range(len(Ns)):

            N = Ns[i]

            for j in range(len(ksi2s)):    

                ksi2 = ksi2s[j]

                biases[k,j,i] = comp_firstorder_approx_bias_full(eta2, ksi2, nu, sig2, df, N, alpha, beta)

    for k in range(len(sig2s)):
        plt.subplot(len(dfs)+1,len(sig2s),k+1+(l+1)*len(sig2s))
        sig2 = sig2s[k]
        gt = eta2 - eta2**2 / (eta2 + sig2s[k])
        for i in range(len(ksi2s)):
            
            plt.loglog(Ns, -biases[k,i,:], '-', color=clrs[i], label=r'$\tilde{\sigma}^2=' + str(int(100*ksi2s[i]))+ '$')

        for i in range(len(ksi2s_MC)):
            for j in range(len(Ns_MC)):
                N = Ns_MC[j]
                try:
                    gt = eta2 - eta2**2 / (eta2 + sig2)
                    tmp = np.load('res_analytic_n_fits' + str(1000) + '_N' + str(N) + '_postVar' + str(int(np.round(1/sig2))) + '.npy', allow_pickle=True)[()]  # dict stored as 0-d object array
                    out_snpe = tmp['t_df' + str(int(df))]
                    m = out_snpe[:,:,1].mean(axis=1)
                    ms = 6 if (2-i)==k else 4
                    plt.loglog(N, -(m[idx_ksi2_MC[i]]-gt), 'o', color=clrs[i], markersize=ms, alpha=1.)    
                except (IOError, KeyError):  # skip (N, sig2) combinations without stored Monte Carlo results
                    pass

            #plt.semilogx(N, gt*np.ones_like(ksi2s_MC), 'k--', linewidth=2)

        #plt.axis([0.95*ksi2s[0], 1.05*ksi2s[-1], 0, 1.1*gt ])

    plt.subplot(len(dfs)+1,len(sig2s),1+(l+1)*len(sig2s))
    plt.ylabel('df =' + str(int(dfs[l])))
    
for i in range(len(sig2s)):

    sig2 = sig2s[i]

    clrs = np.array([[0,1,0]]) * np.linspace(0.1, 0.9, len(Ns)).reshape(-1,1)
    clrs2 = np.linspace(0.7, 0.95, len(sig2s)).reshape(-1,1) * np.array([[1.0,0.4,0.4]])

    plt.subplot(len(dfs)+1,len(sig2s),i+1)
    th = np.linspace(-3, 3, 300).reshape(-1,1)
    plt.plot(th,  p.eval(th, log=False), color='k', linewidth=2.0)
    clr = clrs2[i]
    post   = dd.Gaussian(m = np.ones(n_params) * eta2/(eta2+sig2)*x0, 
                         S=eta2 - eta2**2 / (eta2 + sig2) * np.eye(n_params)) 
    l = dd.Gaussian(m=x0 * np.ones(n_params),
                    S=sig2 * np.eye(n_params))    
    plt.plot(th, l.eval(th, log=False), '--', color=clr, alpha=0.3)
    plt.plot(th, post.eval(th, log=False), color=clr)
    plt.grid(False)
    plt.xticks([-3, -2, -1, 0, x0, 2, 3], [-3, -2, -1, 0, 'x0', 2, 3])
    plt.yticks([])
    plt.axis([-3.01, 3.01, 0, 3.0])
    plt.title('posterior var = ' + str(int(np.round(100/(1/sig2+1/eta2)))) + '% prior variance')

    #for l in range(len(dfs)):
    #   plt.subplot(len(dfs)+1,len(sig2s),i+1+(l+1)*len(sig2s))
    #   plt.plot(1/(1/sig2+1/eta2)*np.ones(2), [0, 1.1*(eta2 - eta2**2 / (eta2 + sig2))], color=clrs2[i])
        
plt.subplot(len(dfs)+1,len(sig2s),1+1*len(sig2s))
plt.legend(loc=1)

plt.subplot(len(dfs)+1,len(sig2s),2+2*len(sig2s))
plt.xlabel('N')

#plt.savefig('bias_linear_approx_studentt_df3_5_withMCE.pdf')
plt.show()

# why does the approximation tend to underestimate the variances so much?
- $1/(1+x) \approx 1-x$ only for $|x| \approx 0$.
- in our setup, $x= w(\theta) > 0$, which can be pretty large as seen below
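
The size of the linearisation error is easy to quantify: $1 - x - \frac{1}{1+x} = -\frac{x^2}{1+x}$, so the first-order approximation always undershoots and even turns negative for $x > 1$. A short sketch tabulating this for a few values of $x$ (the values are arbitrary illustrations, not weights from an actual fit):

```python
def first_order_inverse(x):
    """First-order Taylor approximation of 1 / (1 + x) around x = 0."""
    return 1.0 - x


for x in (0.01, 0.1, 1.0, 5.0):
    exact = 1.0 / (1.0 + x)
    approx = first_order_inverse(x)
    # the error equals -x**2 / (1 + x)
    print(x, exact, approx, approx - exact)
```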

In [None]:
proposal_form = 'studentT'
df = 3

eta2, ksi2 = 1., 0.1 # prior and proposal-prior variances

nu,x0  = 0., 0.
params = np.linspace(-5,5,100).reshape(-1)
normals = util.get_weights(proposal_form, eta2, ksi2, eps2, x0, nu, stats, params, df=df) 

plt.plot(params, normals)
plt.plot(params, 5*dd.Gaussian(m=np.zeros(1),S=eta2*np.eye(1)).eval(params.reshape(-1,1),log=False))
plt.legend(['importance weights', 'prior (unnormalized)'])
plt.show()


# SNPE (Student's t proposals, df = 10) 
## multiple fits, with different proposal widths
- same as before, but with Student's t proposals instead of Gaussian proposals (same mean and variance)
- df $=10$ in 1D : roughly Gaussian, but with slightly heavier tails


In [None]:
proposal_form = 'studentT'
df = 10

out_snpe  = res['t_df' + str(df)]

util.test_setting(out_snpe, n_params, N, sig2, eta2, eps2, x0, ksi2s, nus, proposal_form, track_rp=True, df=df);

# SNPE (Student's t proposals, df = 3) 
## multiple fits, with different proposal widths
- same as before, but with Student's t proposals instead of Gaussian proposals (same mean and variance)
- df $=3$ in 1D : much heavier tails than Gaussian (need df>2 for a well-defined Student's t variance)
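
A Student's t distribution with squared scale $s^2$ has variance $s^2 \cdot \mathrm{df}/(\mathrm{df}-2)$, so matching a target variance requires $s^2 = \mathrm{var} \cdot (\mathrm{df}-2)/\mathrm{df}$. The sketch below illustrates this and compares tail densities with a Gaussian of equal variance. It is a standalone illustration in plain Python; it makes no assumption about whether the `S` argument of `dd.StudentsT` is a scale or a covariance, and all function names are hypothetical.

```python
import math


def student_t_scale2_for_variance(var, df):
    """Squared scale of a Student's t distribution with the given variance."""
    if df <= 2:
        raise ValueError("Student's t variance is finite only for df > 2")
    return var * (df - 2.0) / df


def student_t_pdf(theta, loc, scale2, df):
    z2 = (theta - loc) ** 2 / scale2
    log_norm = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
                - 0.5 * math.log(df * math.pi * scale2))
    return math.exp(log_norm - (df + 1) / 2.0 * math.log1p(z2 / df))


def normal_pdf(theta, mean, var):
    return math.exp(-0.5 * (theta - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


ksi2, df = 1.0, 3
scale2 = student_t_scale2_for_variance(ksi2, df)
# density ratio four standard deviations out: > 1 means heavier tails than the Gaussian
tail_ratio = student_t_pdf(4.0, 0.0, scale2, df) / normal_pdf(4.0, 0.0, ksi2)
print(scale2, tail_ratio)
```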


In [None]:
proposal_form = 'studentT'
df = 3

out_snpe  = res['t_df' + str(df)]

util.test_setting(out_snpe, n_params, N, sig2, eta2, eps2, x0, ksi2s, nus, proposal_form, track_rp=True, df=df);

# uniform proposals
- uniform proposals 'cut' the prior tails, directly influencing the target posterior
- the importance sampling on this narrower prior may however be easier to achieve
- can we balance damage to the posterior against better importance sampling and gain something overall ?

In [None]:
proposal_form = 'unif'
sds = 1
marg = np.sqrt(sds) * np.sqrt(12)
out_snpe  = res['unif_w'+str(sds)]

util.test_setting(out_snpe, n_params, N, sig2, eta2, eps2, x0, ksi2s, nus, proposal_form, track_rp=True, marg=marg);

# compare with CDELFI
- simply set track_rp = False to compare MLE solution with mean and variance of proposal-posterior
- however requires Gaussian proposals to directly compare with proposal-posterior
- if the proposal-posterior is good, so should be the prior-corrected result of the analytical division
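
For Gaussian prior, proposal and proposal-posterior, the analytical division $q(\theta | x) \, p(\theta) / \tilde{p}(\theta)$ amounts to adding and subtracting precisions. A minimal 1D sketch under that assumption; the function name `divide_out_proposal` and the proposal values `nu = 0.3`, `ksi2 = 0.5` are illustrative and not taken from delfi.

```python
def divide_out_proposal(q_mean, q_var, prior_mean, prior_var, prop_mean, prop_var):
    """Mean and variance of the Gaussian proportional to q * prior / proposal (1D)."""
    precision = 1.0 / q_var + 1.0 / prior_var - 1.0 / prop_var
    if precision <= 0.0:
        raise ValueError("division does not yield a proper Gaussian")
    mean = (q_mean / q_var + prior_mean / prior_var - prop_mean / prop_var) / precision
    return mean, 1.0 / precision


eta2, sig2, x0 = 1.0, 1.0 / 9.0, 0.8
nu, ksi2 = 0.3, 0.5

# start from the exact proposal-posterior ...
q_var = ksi2 - ksi2**2 / (ksi2 + sig2)
q_mean = ksi2 / (ksi2 + sig2) * x0 + sig2 / (ksi2 + sig2) * nu

# ... and recover the posterior under the actual prior N(0, eta2)
mean, var = divide_out_proposal(q_mean, q_var, 0.0, eta2, nu, ksi2)
print(mean, var)
```

Starting from the exact proposal-posterior, the division returns the analytic posterior mean $\frac{\eta^2}{\eta^2+\sigma^2} x_0$ and variance $\eta^2 - \frac{\eta^4}{\eta^2+\sigma^2}$.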


In [None]:
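proposal_form = 'normal'

out_snpe = res['cdelfi']  # write into the CDELFI slot instead of reusing `out_snpe` from the previous cell

util.test_setting(out_snpe, n_params, N, sig2, eta2, eps2, x0, ksi2s, nus, proposal_form, track_rp=False);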

In [None]:

np.save('res_analytic_n_fits' + str(n_fits) + '_N' + str(N), res)


# numerical checks for gradients


In [None]:
N = 3
track_rp = True
proposal_form = 'normal'
df = None

nu = 0.
ksi2 = 0.5 * eta2

ppr = dd.Gaussian(m=nu * np.ones(n_params), 
                S=ksi2 * np.eye(n_params))
s = ds.Identity()
m = Gauss(noise_cov=sig2)
g = dg.Default(model=m, prior=ppr, summary=s)

post   = dd.Gaussian(m = np.ones(n_params) * eta2/(eta2+sig2)*x0[0], 
                     S=eta2 - eta2**2 / (eta2 + sig2) * np.eye(n_params)) 
    
seed = 42
g.model.seed = seed
g.prior.seed = seed
g.seed = seed

data = g.gen(N, verbose=False)
params, stats = data[0].reshape(-1), data[1].reshape(-1)

normals = util.get_weights(proposal_form, eta2, ksi2, eps2, x0, nu, stats, params, df=df) if track_rp else np.ones(N)/N


# numerically check $\frac{\partial}{\partial{}\alpha}$

- $\frac{\partial}{\partial\alpha}$ is being difficult here: the analytic solution $\hat{\alpha}$ still fails to numerically set the stated partial derivative $\frac{\partial\mathcal{L}}{\partial{}\alpha}(\hat{\alpha})$ to zero ...
- The obtained $\hat{\alpha}$ are nevertheless sensible (correct 'ballpark')
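
An independent way to check a hand-derived gradient is a central finite difference on the loss itself. The sketch below does this for $\alpha$ on toy data. It assumes a per-sample negative log-likelihood $\frac{1}{2}\log\gamma^2 + \frac{(\theta_n - \alpha - \beta x_n)^2}{2\gamma^2}$, whose constant prefactor differs from the gradients stated at the top of this notebook but which has the same zero; all names and data values are hypothetical.

```python
import math


def weighted_nll(alpha, beta, gamma2, thetas, xs, weights):
    total = 0.0
    for w, t, x in zip(weights, thetas, xs):
        resid = t - alpha - beta * x
        total += w * (0.5 * math.log(gamma2) + 0.5 * resid**2 / gamma2)
    return total


def grad_alpha(alpha, beta, gamma2, thetas, xs, weights):
    """Analytic derivative of weighted_nll with respect to alpha."""
    return -sum(w * (t - alpha - beta * x) / gamma2
                for w, t, x in zip(weights, thetas, xs))


def central_difference(f, a, h=1e-6):
    return (f(a + h) - f(a - h)) / (2.0 * h)


xs = [-1.0, 0.2, 0.9]
thetas = [-0.7, 0.4, 0.5]
weights = [0.2, 0.5, 0.3]
alpha, beta, gamma2 = 0.1, 0.6, 0.25

analytic = grad_alpha(alpha, beta, gamma2, thetas, xs, weights)
numeric = central_difference(
    lambda a: weighted_nll(a, beta, gamma2, thetas, xs, weights), alpha)

# for fixed beta, the stationary point in alpha is the weighted mean residual
alpha_star = sum(w * (t - beta * x) for w, t, x in zip(weights, thetas, xs)) / sum(weights)
grad_at_star = grad_alpha(alpha_star, beta, gamma2, thetas, xs, weights)
print(analytic, numeric, grad_at_star)
```

If the analytic and numeric values disagree, the derivative formula is at fault; if they agree but the gradient at the closed-form optimum is not zero, the closed-form solution is the suspect.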

In [None]:
alpha_hat = np.array(util.alpha(params, stats, normals))

gamma2_ = post.std**2
alphas = np.linspace(-0.09, -0.02, 100000)

beta_ = util.beta(params, stats, normals, alpha_hat)
out_hat = -2*(normals.reshape(-1,1) * (params.reshape(-1,1) - beta_ * stats.reshape(-1,1) - alpha_hat)/gamma2_).sum(axis=0)
out = -2*(normals.reshape(-1,1) * (params.reshape(-1,1) - beta_ * stats.reshape(-1,1) - alphas.reshape(1,-1))/gamma2_).sum(axis=0)
plt.plot(alphas, out)
plt.show()


# (numerical solution, analytical solution, derivative evaluated at analytical solution (should be zero-ish) ) = 
alphas[np.argmin(np.abs(out))], alpha_hat, out_hat

# numerically check $\frac{\partial}{\partial{}\beta}$

In [None]:
alpha_ = alpha_hat
gamma2_ = post.std**2
betas = np.linspace(0., 1., 1000)
out = -2*(normals.reshape(-1,1) * (params.reshape(-1,1) - betas.reshape(1,-1) * stats.reshape(-1,1) - alpha_)/gamma2_ * stats.reshape(-1,1)).sum(axis=0)
plt.plot(betas, out)
plt.show()

beta_hat = util.beta(params, stats, normals, ahat=alpha_)
out_hat = -2*(normals.reshape(-1,1) * (params.reshape(-1,1) - beta_hat * stats.reshape(-1,1) - alpha_)/gamma2_ * stats.reshape(-1,1)).sum(axis=0)

# (numerical solution, analytical solution, derivative evaluated at analytical solution (should be zero-ish) ) = 
betas[np.argmin(np.abs(out))], beta_hat, out_hat

# numerically check $\frac{\partial}{\partial{}\gamma^2}$

In [None]:
alpha_ = alpha_hat
beta_ = beta_hat
gamma2s = np.linspace(0.03, 0.04, 1000)

# something is off with the (hard-coded) gradients below. The lines that blank out the numerical solution are kept, commented out, further down.

tmp = (params.reshape(-1,1) - beta_*stats.reshape(-1,1) - alpha_)**2 / gamma2s.reshape(1,-1)
out = 1/gamma2s.reshape(-1,) * (normals.reshape(-1,1) * (1 - tmp)).sum(axis=0)

plt.plot(gamma2s, out)
plt.show()

gamma2_hat = util.gamma2(params, stats, normals, ahat=alpha_, bhat=beta_)
tmp_ = (params.reshape(-1,1) - beta_*stats.reshape(-1,1) - alpha_)**2 / gamma2_hat
out_hat = 1/gamma2_hat * (normals.reshape(-1,1) * (1 - tmp_)).sum(axis=0)

#gamma2s *= np.nan 
#out = np.zeros_like(gamma2s)

# (numerical solution, analytical solution, derivative evaluated at analytical solution (should be zero-ish) ) = 
gamma2s[np.argmin(np.abs(out))], gamma2_hat, out_hat