## regression-adjustment does not capture the posterior shape beyond mean and variance

Regression adjustment fits the mean $\mu(x)$ and variance $\sigma^2(x)$ of the conditional distribution $p(\theta | x)$ as functions of $x$.

Unlike DELFI methods, it does not return $p(\theta | x_0) = \mathcal{N}(\theta | \mu(x_0), \sigma^2(x_0))$ as its posterior estimate.
Rather, it adjusts each sampled parameter $\theta_i$ (simulated with summary statistic $x_i$) through a locally affine transformation:

$\hat{\theta}_i = \mu(x_0) + (\theta_i - \mu(x_i)) \frac{\sigma(x_0)}{\sigma(x_i)}$

The returned posterior is given as the set of adjusted parameters $\{ \hat{\theta}_i \}_i$. 
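
A minimal NumPy sketch of this adjustment. The function name `regression_adjust` and the callables `mu` and `sigma` are illustrative assumptions (they are not part of DELFI or any other library); they stand in for whatever fitted models of the conditional mean and standard deviation are available.

```python
import numpy as np

def regression_adjust(theta, x, x_0, mu, sigma):
    """Shift and rescale each theta_i from its own conditional to the one at x_0.

    mu and sigma are hypothetical callables returning the fitted conditional
    mean and standard deviation for an array of x values.
    """
    theta = np.asarray(theta, dtype=float)
    x = np.asarray(x, dtype=float)
    return mu(x_0) + (theta - mu(x)) * sigma(x_0) / sigma(x)

# toy usage with made-up conditional moments: mean 2x, std 1 + x
mu = lambda x: 2.0 * np.asarray(x, dtype=float)
sigma = lambda x: 1.0 + np.asarray(x, dtype=float)

theta = np.array([0.5, 2.0, 5.0])
x = np.array([0.0, 1.0, 1.0])
theta_adj = regression_adjust(theta, x, x_0=0.0, mu=mu, sigma=sigma)
print(theta_adj)
```

Samples that already have $x_i = x_0$ are left unchanged; all others are re-centred on $\mu(x_0)$ and rescaled to the spread $\sigma(x_0)$.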

### Question: Is the full shape of the adjusted samples always meaningful? 

Below is a simple counter-example with binary summary statistics $x \in \{0, 1\}$ to demonstrate that regression-adjustment posteriors cannot necessarily be interpreted beyond their mean and variance. 

The case of a binary summary statistic is convenient because every drawn $\theta_i$ is either a valid draw from the true posterior ($x_i=x_0$) or not ($x_i\neq x_0$).

prior:

$p(\theta) = \mathcal{N}(0,1)$

likelihood: 

$P(x=1 \ | \ \theta) = 1$   if   $\theta\geq 0$ 

$P(x=1 \ | \ \theta) = 0$   if   $\theta<0$ 

The posterior $p(\theta | x_0)$ is a truncated Gaussian. 
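
For reference, the moments of this truncated Gaussian follow from standard half-normal results. They are written out here for $x_0 = 0$; the case $x_0 = 1$ is the mirror image, with the sign of the mean flipped and the same variance:

$p(\theta \ | \ x_0 = 0) = 2 \, \mathcal{N}(\theta \ | \ 0, 1)$   if   $\theta < 0$, and $0$ otherwise

$\mu(0) = -\sqrt{2/\pi} \approx -0.80$

$\sigma^2(0) = 1 - 2/\pi \approx 0.36$

The sample estimates printed by the code should lie close to these values (a standard deviation of about $0.60$).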

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from delfi import distribution as dd

# define likelihood

def simulator(theta):
    return (theta >= 0) * 1.

# define prior
prior = dd.Gaussian(m = np.zeros(1), 
                     S =np.eye(1))

# draw artificial dataset
th = prior.gen(100000)
x = simulator(th)

# compute conditional probabilities
th_x_0 = th[x==0] # all samples from p(\theta | x=0)
th_x_1 = th[x==1] # all samples from p(\theta | x=1)

# get posterior samples 
x_0 = 0
s_post = th[x==x_0].copy()

# train linear-Gaussian regression model: 
def mu(x):
    if x == 0:
        return th_x_0.mean()
    elif x==1:
        return th_x_1.mean()
    
def sig(x): 
    if x == 0:
        return th_x_0.std()
    elif x==1:
        return th_x_1.std()

# do regression-adjustment    
th_reg_adj_0 = mu(0) + (th_x_0 - mu(0)) * sig(0)/sig(0) # nothing to do 
th_reg_adj_1 = mu(0) + (th_x_1 - mu(1)) * sig(0)/sig(1) 
    
# collect regression-adjusted samples
th_reg_adj = np.zeros_like(th) 
th_reg_adj[x==0] = th_reg_adj_0
th_reg_adj[x==1] = th_reg_adj_1

plt.figure(figsize=(12,12))

# plot raw samples with mean +/- 1 std 
plt.subplot(2,2,1)
x_plot = x+np.random.normal(size=x.size).reshape(x.shape)/50 # add some jitter to x for visualization
plt.plot(x_plot, th, 'ko')
for x_ in [0,1]:
    plt.plot(x_, mu(x_), 'ro')
    plt.plot([x_,x_], sig(x_)*np.array([-1,1])+mu(x_), 'r-')
plt.axis([-1, 2, -5, 5])
plt.xlabel('x (+ jitter for visibility)')
plt.ylabel('theta')
plt.title('conditional distribution p( theta | x )')
plt.xticks([0,1])

# plot real posterior
plt.subplot(2,2,2)
plt.hist(s_post, bins=np.linspace(-4, 2, 61))
plt.title('actual posterior p( theta | x_0 ) for x_0 = 0')
plt.xlabel('theta')
plt.yticks([])

# plot regression-adjusted samples with mean +/- 1 std
plt.subplot(2,2,3)
plt.plot(x_plot, th_reg_adj, 'ko')
for x_ in [0,1]:
    plt.plot(x_, mu(0), 'mo')
    plt.plot([x_,x_], sig(0)*np.array([-1,1])+mu(0), 'm-')
plt.axis([-1, 2, -5, 5])
plt.xlabel('x (+ jitter for visibility)')
plt.ylabel('theta')
plt.title('regression-adjusted samples')
plt.xticks([0,1])

# plot regression-adjusted posterior
plt.subplot(2,2,4)
plt.hist(th_reg_adj, bins=np.linspace(-4, 2, 61))
plt.title('regression-adjusted posterior')
plt.xlabel('theta')
plt.yticks([])
plt.show()


print(' mean and variance of the regression-adjusted posterior are correct by construction: ')

print('true posterior mean: ', s_post.mean())
print('true posterior std.: ', s_post.std())

print('regr.adj. posterior mean: ', th_reg_adj.mean())
print('regr.adj. posterior std.: ', th_reg_adj.std())


# What went wrong?

The real posterior is asymmetric and its support is bounded from above: for $x_0 = 0$, only $\theta < 0$ is possible.

The regression-adjusted posterior is symmetric and has infinite support. 

At heart, regression adjustment assumes that one can transform draws $\theta_i \sim p(\theta | x_i)$ from the conditional distribution for any $x_i$ into valid draws from the real posterior $p(\theta | x_0)$ simply by ensuring that all these distributions have the same mean and variance.

This is true if $p(\theta | x)$ for different $x$ all lie in the same location–scale family. 

Otherwise, it is not true in general. 
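
A small sketch of the favourable case, assuming two made-up Gaussian conditionals (the means and standard deviations below are arbitrary choices for illustration). Because both lie in the same location–scale family, adjusting draws from $p(\theta | x=1)$ should reproduce the distribution $p(\theta | x=0)$, not just its first two moments.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# assumed toy conditionals, both Gaussian (same location-scale family)
mu = {0: -1.0, 1: 2.0}
sig = {0: 0.5, 1: 3.0}

th_x_1 = rng.normal(mu[1], sig[1], size=n)  # draws from p(theta | x=1)
th_x_0 = rng.normal(mu[0], sig[0], size=n)  # reference draws from p(theta | x=0)

# adjust the x=1 draws towards x_0 = 0
th_adj = mu[0] + (th_x_1 - mu[1]) * sig[0] / sig[1]

# compare quantiles of adjusted draws with those of genuine draws
q = [0.05, 0.25, 0.5, 0.75, 0.95]
print('adjusted draws: ', np.quantile(th_adj, q))
print('genuine draws:  ', np.quantile(th_x_0, q))
```

The two sets of quantiles should agree up to Monte Carlo error, which is what fails in the binary counter-example.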

In the above example, $p(\theta | x=0)$ is (proportional to) the negative half of a Gaussian, and $p(\theta | x=1)$ the positive half. 
These two distributions are not within the same location-scale family. 

In the above case with binary $x$, the regression-adjusted samples are drawn from a mixture of $p(\theta | x=0)$ and an 'adjusted' version of $p(\theta | x=1)$ (with mixture weight $P(\theta \geq 0) = \frac{1}{2}$). 
More generally, for continuous $x$, the regression-adjusted samples are drawn from a compound distribution that mixes 'adjusted' $p(\theta |x)$ for all $x$ with $|| x_0 - x || < \epsilon$. 
Again, unless $p(\theta | x)$ for different $x$ all lie in the same location–scale family, things can go wrong. 
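
One way to put numbers on the mismatch in the binary example is to measure how much adjusted mass falls outside the true support and to compare skewness. The sketch below re-creates the example with plain NumPy instead of `delfi` (an assumption made only to keep it self-contained); the helper `skewness` is likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
th = rng.standard_normal(100_000)   # prior draws, N(0, 1)
x = (th >= 0).astype(float)         # binary summary statistic

x_0 = 0
mu = {k: th[x == k].mean() for k in (0, 1)}
sig = {k: th[x == k].std() for k in (0, 1)}

# adjust every sample towards x_0 using its own conditional moments
mu_x = np.where(x == 1, mu[1], mu[0])
sig_x = np.where(x == 1, sig[1], sig[0])
th_adj = mu[x_0] + (th - mu_x) * sig[x_0] / sig_x

def skewness(a):
    # standardized third central moment
    return np.mean((a - a.mean()) ** 3) / a.std() ** 3

s_post = th[x == x_0]               # exact posterior draws
print('fraction of adjusted samples with theta >= 0:', np.mean(th_adj >= 0))
print('skewness of true posterior:   ', skewness(s_post))
print('skewness of adjusted samples: ', skewness(th_adj))
```

The true posterior puts no mass on $\theta \geq 0$ and is clearly left-skewed, whereas a few percent of the adjusted samples should land on $\theta \geq 0$ and their skewness should be close to zero, even though mean and standard deviation match.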

Note that we are talking about *ground-truth* conditionals $p(\theta |x)$ here. 
In general, these are intractable, which makes it very hard to verify whether the location-scale family assumption holds.

Remark: 
The paper "Convergence of Regression Adjusted Approximate Bayesian Computation" (Li \& Fearnhead 2017) proves convergence of the regression-adjusted posterior to the ground truth, but it only talks about the posterior mean $\mu(x_0)$ and variance $\sigma^2(x_0)$. 

# fun with regression adjustment

The whole thing again, but with likelihood  $P(x=1 \ | \ \theta) = 1$   if   $\theta\geq -1.5$  (and $0$ otherwise). The code is the same apart from the threshold, so skip to the figure!

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from delfi import distribution as dd

# define likelihood

def simulator(theta):
    return (theta >= -1.5) * 1.

# define prior
prior = dd.Gaussian(m = np.zeros(1), 
                     S =np.eye(1))

# draw artificial dataset
th = prior.gen(100000)
x = simulator(th)

# compute conditional probabilities
th_x_0 = th[x==0] # all samples from p(\theta | x=0)
th_x_1 = th[x==1] # all samples from p(\theta | x=1)

# get posterior samples 
x_0 = 0
s_post = th[x==x_0].copy()

# train linear-Gaussian regression model: 
def mu(x):
    if x == 0:
        return th_x_0.mean()
    elif x==1:
        return th_x_1.mean()
    
def sig(x): 
    if x == 0:
        return th_x_0.std()
    elif x==1:
        return th_x_1.std()

# do regression-adjustment    
th_reg_adj_0 = mu(0) + (th_x_0 - mu(0)) * sig(0)/sig(0) # nothing to do 
th_reg_adj_1 = mu(0) + (th_x_1 - mu(1)) * sig(0)/sig(1) 
    
# collect regression-adjusted samples
th_reg_adj = np.zeros_like(th) 
th_reg_adj[x==0] = th_reg_adj_0
th_reg_adj[x==1] = th_reg_adj_1

plt.figure(figsize=(12,12))

# plot raw samples with mean +/- 1 std 
plt.subplot(2,2,1)
x_plot = x+np.random.normal(size=x.size).reshape(x.shape)/50 # add some jitter to x for visualization
plt.plot(x_plot, th, 'ko')
for x_ in [0,1]:
    plt.plot(x_, mu(x_), 'ro')
    plt.plot([x_,x_], sig(x_)*np.array([-1,1])+mu(x_), 'r-')
plt.axis([-1, 2, -5, 5])
plt.xlabel('x (+ jitter for visibility)')
plt.ylabel('theta')
plt.title('conditional distribution p( theta | x )')
plt.xticks([0,1])

# plot real posterior
plt.subplot(2,2,2)
plt.hist(s_post, bins=np.linspace(-4, 2, 61))
plt.title('actual posterior p( theta | x_0 ) for x_0 = 0')
plt.xlabel('theta')
plt.yticks([])

# plot regression-adjusted samples with mean +/- 1 std
plt.subplot(2,2,3)
plt.plot(x_plot, th_reg_adj, 'ko')
for x_ in [0,1]:
    plt.plot(x_, mu(0), 'mo')
    plt.plot([x_,x_], sig(0)*np.array([-1,1])+mu(0), 'm-')
plt.axis([-1, 2, -5, 5])
plt.xlabel('x (+ jitter for visibility)')
plt.ylabel('theta')
plt.title('regression-adjusted samples')
plt.xticks([0,1])

# plot regression-adjusted posterior
plt.subplot(2,2,4)
plt.hist(th_reg_adj, bins=np.linspace(-4, 2, 61))
plt.title('regression-adjusted posterior')
plt.xlabel('theta')
plt.yticks([])
plt.show()


print(' mean and variance of the regression-adjusted posterior are correct by construction: ')

print('true posterior mean: ', s_post.mean())
print('true posterior std.: ', s_post.std())

print('regr.adj. posterior mean: ', th_reg_adj.mean())
print('regr.adj. posterior std.: ', th_reg_adj.std())
