## Parameter estimation for a linear operator using Gaussian processes


Assumptions about the linear operator and the data:

$\mathcal{L}_x^\phi u(x) = f(x)$

$u(x) \sim \mathcal{GP}(0, k_{uu}(x,x',\theta))$

$f(x) \sim \mathcal{GP}(0, k_{ff}(x,x',\theta,\phi))$

$y_u = u(X_u) + \epsilon_u; \epsilon_u \sim \mathcal{N}(0, \sigma_u^2I)$

$y_f = f(X_f) + \epsilon_f; \epsilon_f \sim \mathcal{N}(0, \sigma_f^2I)$
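A short simulation makes the observation model for $u$ concrete. This is a minimal sketch, not part of the original workflow. It assumes the squared-exponential $k_{uu}$ used in step 2, illustrative values $\theta = (1, 1)$ and $\sigma_u = 0.05$, and the hypothetical helper names `se_kernel` and `sample_noisy_gp`. It draws one realisation of $u(X_u)$ from the GP prior through a Cholesky factor, then adds i.i.d. Gaussian noise to obtain $y_u$.

```python
import numpy as np

def se_kernel(X1, X2, theta):
    """Squared-exponential kernel with one inverse length-scale per input dimension (assumed form)."""
    d2 = (X1[:, None, :] - X2[None, :, :])**2          # pairwise squared differences, shape (n1, n2, d)
    return np.exp(-(d2 * np.asarray(theta)).sum(axis=-1))

def sample_noisy_gp(X, theta, sigma, rng, jitter=1e-6):
    """Draw u(X) ~ GP(0, k) and return (u, y) with y = u + N(0, sigma^2 I) noise."""
    K = se_kernel(X, X, theta) + jitter*np.eye(len(X))  # jitter keeps the Cholesky factorisation stable
    u = np.linalg.cholesky(K) @ rng.standard_normal(len(X))
    y = u + sigma*rng.standard_normal(len(X))
    return u, y

rng = np.random.default_rng(0)
X_demo = rng.random((10, 2))                            # hypothetical inputs in the unit square
u_demo, y_demo = sample_noisy_gp(X_demo, theta=[1.0, 1.0], sigma=0.05, rng=rng)
```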

Take a simple operator as an example:

$\mathcal{L}_x^\phi u := \phi u + \frac{\partial u}{\partial x_1} + \frac{\partial^2 u}{\partial x_2^2}$

$u(x) = x_1 x_2 - x_2^2$

$f(x) = \mathcal{L}_x^\phi u(x) = \phi x_1 x_2 - \phi x_2^2 + x_2 - 2$
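A quick symbolic check, sketched with SymPy, confirms that this $f$ really is $\mathcal{L}_x^\phi u$. The helper `apply_L` is hypothetical; it applies the operator to any expression in $x_1, x_2$.

```python
import sympy as sp

x1, x2, phi = sp.symbols('x1 x2 phi')

def apply_L(expr):
    """Apply L^phi = phi*(.) + d/dx1 (.) + d^2/dx2^2 (.) to a SymPy expression in x1, x2."""
    return phi*expr + sp.diff(expr, x1) + sp.diff(expr, x2, 2)

u_expr = x1*x2 - x2**2
f_expr = sp.expand(apply_L(u_expr))   # should match phi*x1*x2 - phi*x2**2 + x2 - 2
```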

Problem at hand:

Given $\{X_u, y_u\}$ and $\{X_f, y_f\}$, estimate $\phi$.


#### Step 1: Simulate data


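Use $\phi = 2$ and draw 10 random inputs in the unit square, shared by both data sets ($X_u = X_f$). No noise is added, so $y_u$ and $y_f$ are exact.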


In [1]:
import numpy as np
import sympy as sp
from scipy.optimize import minimize
import matplotlib.pyplot as plt

In [2]:
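x = np.random.rand(10, 2)                        # 10 inputs in [0, 1)^2, used as both X_u and X_f
y_u = np.multiply(x[:,0], x[:,1]) - x[:,1]**2    # u(x) = x1*x2 - x2^2
y_f = 2.0*y_u + x[:,1] - 2                       # f(x) = phi*u + x2 - 2 with phi = 2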
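The cell above generates exact observations, whereas the model assumes $\epsilon_u \sim \mathcal{N}(0, \sigma_u^2 I)$ and $\epsilon_f \sim \mathcal{N}(0, \sigma_f^2 I)$. A minimal sketch of a noisy variant follows, assuming a common noise level $\sigma_u = \sigma_f = \sigma$ with an illustrative $\sigma = 0.01$; `simulate` is a hypothetical helper. It keeps a single input array shared by $y_u$ and $y_f$, because the kernel functions in step 2 take one `x` for both blocks.

```python
import numpy as np

def simulate(n, phi, sigma, rng):
    """Simulate (x, y_u, y_f) for u = x1*x2 - x2^2 and f = L^phi u, sharing one input array."""
    x = rng.random((n, 2))
    u = x[:, 0]*x[:, 1] - x[:, 1]**2
    f = phi*u + x[:, 1] - 2.0                    # phi*u + du/dx1 + d^2u/dx2^2
    y_u = u + sigma*rng.standard_normal(n)       # epsilon_u ~ N(0, sigma^2 I)
    y_f = f + sigma*rng.standard_normal(n)       # epsilon_f ~ N(0, sigma^2 I), same sigma
    return x, y_u, y_f

x_noisy, y_u_noisy, y_f_noisy = simulate(10, phi=2.0, sigma=0.01, rng=np.random.default_rng(42))
```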

In [3]:
x.shape[0]

10

#### Step 2: Evaluate the kernels



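This step uses the form of $\mathcal{L}_x^\phi$ but neither $u(x)$ nor $f(x)$ themselves. The prior covariance of $u$ is a squared-exponential kernel with one inverse length-scale per input dimension: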

$k_{uu}(x_i, x_j; \theta) = \exp\left(-\theta_1(x_{i,1}-x_{j,1})^2 - \theta_2(x_{i,2}-x_{j,2})^2\right)$


In [4]:
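# Symbols for x_i = (xi1, xi2), x_j = (xj1, xj2), the kernel hyperparameters and phi
xi1, xj1, xi2, xj2, theta1, theta2, phi = sp.symbols('xi1 xj1 xi2 xj2 theta1 theta2 phi')
# Squared-exponential kernel with a separate inverse length-scale per input dimension
kuu_sym = sp.exp(-theta1*((xi1 - xj1)**2) - theta2*((xi2 - xj2)**2))
kuu_fn = sp.lambdify((xi1, xj1, xi2, xj2, theta1, theta2), kuu_sym, "numpy")
def kuu(x, theta1, theta2):
    # Gram matrix K[i, j] = k_uu(x_i, x_j) over all pairs of rows of x
    k = np.zeros((x.shape[0], x.shape[0]))
    for i in range(x.shape[0]):
        for j in range(x.shape[0]):
            k[i,j] = kuu_fn(x[i,0], x[j,0], x[i,1], x[j,1], theta1, theta2)
    return k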
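The double loops scale poorly with the number of points. Because `sp.lambdify(..., "numpy")` builds the function from NumPy operations, it can usually be evaluated on broadcast arrays directly. The sketch below assumes that behaviour and compares the result with the element-wise loop; the helper `gram` and the demo inputs are hypothetical.

```python
import numpy as np
import sympy as sp

xi1, xj1, xi2, xj2, theta1, theta2 = sp.symbols('xi1 xj1 xi2 xj2 theta1 theta2')
kuu_sym = sp.exp(-theta1*(xi1 - xj1)**2 - theta2*(xi2 - xj2)**2)
kuu_fn = sp.lambdify((xi1, xj1, xi2, xj2, theta1, theta2), kuu_sym, "numpy")

def gram(fn, x, *params):
    """Evaluate a lambdified kernel on all pairs (x_i, x_j) at once via NumPy broadcasting."""
    a, b = x[:, None, :], x[None, :, :]          # shapes (n, 1, 2) and (1, n, 2)
    return fn(a[..., 0], b[..., 0], a[..., 1], b[..., 1], *params)

x_demo = np.random.default_rng(0).random((10, 2))
K_vec = gram(kuu_fn, x_demo, 1.0, 2.0)

# Reference: the element-by-element double loop used in the cell above
n = len(x_demo)
K_loop = np.array([[kuu_fn(x_demo[i, 0], x_demo[j, 0], x_demo[i, 1], x_demo[j, 1], 1.0, 2.0)
                    for j in range(n)] for i in range(n)])
```

The same pattern should carry over to the other kernels, for example `gram(kff_fn, x, 0.5, 1.5, 2.0)`, since those expressions also depend on the input symbols and evaluate to arrays.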

$$
\begin{aligned}
k_{ff}(x_i,x_j;\theta,\phi)
&= \mathcal{L}_{x_i}^\phi \mathcal{L}_{x_j}^\phi k_{uu}(x_i, x_j; \theta) \\
&= \mathcal{L}_{x_i}^\phi \left( \phi k_{uu} + \frac{\partial}{\partial x_{j,1}}k_{uu} + \frac{\partial^2}{\partial x_{j,2}^2}k_{uu}\right) \\
&= \phi^2 k_{uu} + \phi \frac{\partial}{\partial x_{j,1}}k_{uu} + \phi \frac{\partial^2}{\partial x_{j,2}^2}k_{uu} \\
&\quad + \phi \frac{\partial}{\partial x_{i,1}}k_{uu} + \frac{\partial}{\partial x_{i,1}}\frac{\partial}{\partial x_{j,1}}k_{uu} + \frac{\partial}{\partial x_{i,1}}\frac{\partial^2}{\partial x_{j,2}^2}k_{uu} \\
&\quad + \phi \frac{\partial^2}{\partial x_{i,2}^2}k_{uu} + \frac{\partial^2}{\partial x_{i,2}^2}\frac{\partial}{\partial x_{j,1}}k_{uu} + \frac{\partial^2}{\partial x_{i,2}^2}\frac{\partial^2}{\partial x_{j,2}^2}k_{uu}
\end{aligned}
$$

In [5]:
kff_sym = phi**2*kuu_sym \
            + phi*sp.diff(kuu_sym, xj1) \
            + phi*sp.diff(kuu_sym, xj2, xj2) \
            + phi*sp.diff(kuu_sym, xi1) \
            + sp.diff(kuu_sym, xj1, xi1) \
            + sp.diff(kuu_sym, xj2, xj2, xi1) \
            + phi*sp.diff(kuu_sym, xi2, xi2) \
            + sp.diff(kuu_sym, xj1, xi2, xi2) \
            + sp.diff(kuu_sym, xj2, xj2, xi2, xi2)
kff_fn = sp.lambdify((xi1, xj1, xi2, xj2, theta1, theta2, phi), kff_sym, "numpy")
def kff(x, theta1, theta2, phi):
    k = np.zeros((x.shape[0], x.shape[0]))
    for i in range(x.shape[0]):
        for j in range(x.shape[0]):
            k[i,j] = kff_fn(x[i,0], x[j,0], x[i,1], x[j,1], theta1, theta2, phi)
    return k
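Expanding $\mathcal{L}_{x_i}^\phi \mathcal{L}_{x_j}^\phi k_{uu}$ into nine terms by hand is error-prone. One way to check it, sketched below with a hypothetical helper `L`, is to let SymPy compose the operator directly and compare both expressions numerically at a few random points. The same check confirms that $k_{ff}$ is symmetric under swapping $x_i$ and $x_j$, as a covariance must be.

```python
import numpy as np
import sympy as sp

xi1, xj1, xi2, xj2, theta1, theta2, phi = sp.symbols('xi1 xj1 xi2 xj2 theta1 theta2 phi')
kuu_sym = sp.exp(-theta1*(xi1 - xj1)**2 - theta2*(xi2 - xj2)**2)

def L(expr, a1, a2):
    """L^phi acting on the variables (a1, a2): phi*(.) + d/da1 (.) + d^2/da2^2 (.)."""
    return phi*expr + sp.diff(expr, a1) + sp.diff(expr, a2, 2)

# k_ff = L_{x_i} L_{x_j} k_uu, obtained by composing the operator
kff_composed = L(L(kuu_sym, xj1, xj2), xi1, xi2)

# Hand-expanded nine-term version, as in the cell above
kff_manual = (phi**2*kuu_sym + phi*sp.diff(kuu_sym, xj1) + phi*sp.diff(kuu_sym, xj2, xj2)
              + phi*sp.diff(kuu_sym, xi1) + sp.diff(kuu_sym, xj1, xi1)
              + sp.diff(kuu_sym, xj2, xj2, xi1) + phi*sp.diff(kuu_sym, xi2, xi2)
              + sp.diff(kuu_sym, xj1, xi2, xi2) + sp.diff(kuu_sym, xj2, xj2, xi2, xi2))

args = (xi1, xj1, xi2, xj2, theta1, theta2, phi)
f_comp = sp.lambdify(args, kff_composed, "numpy")
f_man = sp.lambdify(args, kff_manual, "numpy")

# Random input pairs with illustrative values theta1 = 0.7, theta2 = 1.3, phi = 2
rng = np.random.default_rng(1)
vals = [tuple(rng.random(4)) + (0.7, 1.3, 2.0) for _ in range(5)]
```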

$$
\begin{aligned}
k_{fu}(x_i,x_j;\theta,\phi)
&= \mathcal{L}_{x_i}^\phi k_{uu}(x_i, x_j; \theta) \\
&= \phi k_{uu} + \frac{\partial}{\partial x_{i,1}}k_{uu} + \frac{\partial^2}{\partial x_{i,2}^2}k_{uu}
\end{aligned}
$$

In [6]:
kfu_sym = phi*kuu_sym + sp.diff(kuu_sym, xi1) + sp.diff(kuu_sym, xi2, xi2)
kfu_fn = sp.lambdify((xi1, xj1, xi2, xj2, theta1, theta2, phi), kfu_sym, "numpy")
def kfu(x, theta1, theta2, phi):
    k = np.zeros((x.shape[0], x.shape[0]))
    for i in range(x.shape[0]):
        for j in range(x.shape[0]):
            k[i,j] = kfu_fn(x[i,0], x[j,0], x[i,1], x[j,1], theta1, theta2, phi)
    return k

In [7]:
def kuf(x, theta1, theta2, phi):
    # k_uf(X, X) = k_fu(X, X)^T; one x suffices because X_u and X_f coincide here
    return kfu(x, theta1, theta2, phi).T
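`kuf` reuses `kfu` through a transpose. This relies on $k_{uf}(x_i, x_j) = \mathcal{L}_{x_j}^\phi k_{uu}(x_i, x_j) = k_{fu}(x_j, x_i)$, which follows from the symmetry of $k_{uu}$, and on $X_u$ and $X_f$ being the same array. The sketch below checks that identity numerically by building $k_{uf}$ directly from its definition; the parameter values and the names `kfu_check` and `kuf_check` are illustrative.

```python
import numpy as np
import sympy as sp

xi1, xj1, xi2, xj2, theta1, theta2, phi = sp.symbols('xi1 xj1 xi2 xj2 theta1 theta2 phi')
kuu_sym = sp.exp(-theta1*(xi1 - xj1)**2 - theta2*(xi2 - xj2)**2)
kfu_expr = phi*kuu_sym + sp.diff(kuu_sym, xi1) + sp.diff(kuu_sym, xi2, 2)   # L_{x_i} k_uu
kuf_expr = phi*kuu_sym + sp.diff(kuu_sym, xj1) + sp.diff(kuu_sym, xj2, 2)   # L_{x_j} k_uu

args = (xi1, xj1, xi2, xj2, theta1, theta2, phi)
kfu_check = sp.lambdify(args, kfu_expr, "numpy")
kuf_check = sp.lambdify(args, kuf_expr, "numpy")

x_demo = np.random.default_rng(2).random((8, 2))
a, b = x_demo[:, None, :], x_demo[None, :, :]    # broadcast over all pairs (i, j)
K_fu = kfu_check(a[..., 0], b[..., 0], a[..., 1], b[..., 1], 0.5, 1.5, 2.0)
K_uf = kuf_check(a[..., 0], b[..., 0], a[..., 1], b[..., 1], 0.5, 1.5, 2.0)
```

`K_fu` itself is not symmetric (its first-derivative term is antisymmetric in $x_i, x_j$), so the transpose in `kuf` matters.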

#### Step 3: Define the negative log marginal likelihood



$K = \begin{bmatrix}
k_{uu}(X_u, X_u; \theta) + \sigma_u^2I & k_{uf}(X_u, X_f; \theta, \phi) \\
k_{fu}(X_f, X_u; \theta, \phi) & k_{ff}(X_f, X_f; \theta, \phi) + \sigma_f^2I
\end{bmatrix}$

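For simplicity, assume $\sigma_u = \sigma_f$. In the code, the shared noise variance $\sigma_u^2 = \sigma_f^2$ is the fixed argument `s` (a small value, since the simulated data are noise-free), and only $\theta_1$, $\theta_2$ and $\phi$ are optimised.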

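$\mathcal{NLML} = \frac{1}{2} \left[ \log|K| + y^TK^{-1}y + N\log(2\pi) \right]$

where $y = \begin{bmatrix}
y_u \\
y_f
\end{bmatrix}$ and $N$ is the length of $y$. The term $N\log(2\pi)$ does not depend on the parameters, so the code omits it.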

In [8]:
def nlml(params, x, y1, y2, s):
    # params holds log(theta1), log(theta2), log(phi); exponentiating keeps all three positive.
    # s is the shared noise variance sigma_u^2 = sigma_f^2 added to both diagonal blocks.
    t1, t2, ph = np.exp(params)
    n = x.shape[0]
    K = np.block([
        [kuu(x, t1, t2) + s*np.identity(n), kuf(x, t1, t2, ph)],
        [kfu(x, t1, t2, ph), kff(x, t1, t2, ph) + s*np.identity(n)]
    ])
    y = np.concatenate((y1, y2))
    _, logdet = np.linalg.slogdet(K)   # log|K| without over/underflow in det(K)
    # 0.5*(log|K| + y^T K^{-1} y); solve() avoids forming K^{-1} explicitly
    return float(0.5*(logdet + y @ np.linalg.solve(K, y)))
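For reference, the full NLML, including the $N\log(2\pi)$ term, can be computed from a Cholesky factor of $K$. This is the usual numerically stable route, and it also fails loudly (with `LinAlgError`) when $K$ is not numerically positive definite. The sketch below uses a hypothetical `gaussian_nlml` and checks it against SciPy's multivariate normal log-density on an arbitrary positive-definite matrix.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.stats import multivariate_normal

def gaussian_nlml(K, y):
    """0.5*(log|K| + y^T K^{-1} y + N log(2*pi)) via a Cholesky factor of a positive-definite K."""
    c, low = cho_factor(K)
    logdet = 2.0*np.sum(np.log(np.diag(c)))    # log|K| = 2 * sum(log(diag of the factor))
    alpha = cho_solve((c, low), y)             # K^{-1} y without forming the inverse
    return 0.5*(logdet + y @ alpha + len(y)*np.log(2*np.pi))

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6))
K_demo = A @ A.T + 6*np.eye(6)                 # arbitrary well-conditioned covariance for the check
y_demo = rng.standard_normal(6)
value = gaussian_nlml(K_demo, y_demo)
```

Subtracting `0.5*len(y)*np.log(2*np.pi)` from its result should reproduce the value returned by `nlml` for the same $K$ and $y$.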

In [9]:
nlml((1, 1, 0.69), x, y_u, y_f, 1e-6)   # log-parameters: theta1 = theta2 = e, phi = exp(0.69) ≈ 2

-5.782860181060707

#### Step 4: Optimise the hyperparameters


In [10]:
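# NLML as a function of the log-parameters only, with the noise variance fixed at 1e-7
nlml_wp = lambda params: nlml(params, x, y_u, y_f, 1e-7)
# Gradient-free Nelder-Mead from a random starting point in log space
m = minimize(nlml_wp, np.random.rand(3), method="Nelder-Mead")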

In [11]:
m

 final_simplex: (array([[-2.14541894, -0.79255199,  0.75524789],
       [-2.14536881, -0.79252757,  0.75526145],
       [-2.14547886, -0.79248655,  0.75524377],
       [-2.14537636, -0.7925724 ,  0.75527505]]), array([-54.20687619, -54.20687617, -54.20687613, -54.20687611]))
           fun: -54.206876191529446
       message: 'Optimization terminated successfully.'
          nfev: 179
           nit: 98
        status: 0
       success: True
             x: array([-2.14541894, -0.79255199,  0.75524789])

In [12]:
np.exp(m.x)   # back-transform to (theta1, theta2, phi)

array([0.117019  , 0.45268807, 2.12813901])
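The recovered $\phi \approx 2.13$ is close to the true value $\phi = 2$. Because the optimiser starts from a single random point, it may be worth restarting it from several points and keeping the best result, in case the NLML has more than one local minimum. Below is a minimal sketch, demonstrated on a toy objective with two minima; the helper `multistart_minimize` and the objective `toy` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def multistart_minimize(fun, n_params, n_starts=10, scale=2.0, seed=0):
    """Run Nelder-Mead from several random starting points and keep the lowest optimum."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        x0 = scale*(rng.random(n_params) - 0.5)   # starts drawn uniformly from [-scale/2, scale/2)
        res = minimize(fun, x0, method="Nelder-Mead")
        if best is None or res.fun < best.fun:
            best = res
    return best

# Toy objective with a local minimum near p0 = 0.96 and a lower one near p0 = -1.04
toy = lambda p: (p[0]**2 - 1.0)**2 + 0.3*p[0] + (p[1] - 0.5)**2
best = multistart_minimize(toy, n_params=2)
```

In this notebook it would be called as `multistart_minimize(nlml_wp, n_params=3)`, with `np.exp(best.x)` giving $(\theta_1, \theta_2, \phi)$.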