# Optimizing Parameters

**Abstract**: In this session we introduce the optimization of the
hyperparameters of the Gaussian process covariance function.

```python
%pip install gpy
```

## GPy: A Gaussian Process Framework in Python

Gaussian processes are a flexible tool for non-parametric analysis with
uncertainty. The GPy software was started in Sheffield to provide an
easy-to-use interface to GPs, one that allows the user to focus on the
modelling rather than the mathematics.

<img class="" src="https://inverseprobability.com/talks/./slides/diagrams//gp/gpy.png" style="width:70%">

Figure: <i>GPy is a BSD licensed software code base for implementing
Gaussian process models in Python. It is designed for teaching and
modelling. We welcome contributions which can be made through the GitHub
repository <https://github.com/SheffieldML/GPy></i>

GPy is a BSD-licensed code base for implementing Gaussian process
models in Python, so GPs built with it can be combined with a wide
variety of other software libraries.

The software itself is available on
[GitHub](https://github.com/SheffieldML/GPy) and the team welcomes
contributions.

The aim for GPy is to be a probabilistic-style programming language,
i.e., you specify the model rather than the algorithm. As well as a
large range of covariance functions, the software allows for
non-Gaussian likelihoods, multivariate outputs, dimensionality reduction
and approximations for larger data sets.

The documentation for GPy can be found
[here](https://gpy.readthedocs.io/en/latest/).

## Improving the Numerics

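In practice we shouldn’t use the matrix inverse directly to solve the
GP system. A more numerically stable approach is to compute the
*Cholesky decomposition* of the kernel matrix,
$\mathbf{K}= \mathbf{L}\mathbf{L}^\top$ with $\mathbf{L}$ lower
triangular, and then solve triangular systems. The log determinant of
the covariance also follows from the Cholesky factor:
$\log\det{\mathbf{K}} = 2\sum_i \log L_{ii}$.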
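A minimal NumPy/SciPy sketch of this idea (an illustration, not the `mlai` `update_inverse` code) solves $\mathbf{K}\boldsymbol{\alpha} = \mathbf{y}$ and computes $\log\det{\mathbf{K}}$ from a single Cholesky factorization. The `rbf_kernel` helper, the noise variance of 0.01 and the toy targets are assumptions made for the example.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve


def rbf_kernel(x, x2, variance=1.0, lengthscale=1.0):
    """Exponentiated quadratic (RBF) covariance between two sets of 1D inputs."""
    sq_dist = (x[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def solve_and_log_det(K, y):
    """Return K^{-1} y and log det K using one Cholesky factorization K = L L^T."""
    L, lower = cho_factor(K, lower=True)
    alpha = cho_solve((L, lower), y)            # two triangular solves, no explicit inverse
    log_det = 2.0 * np.sum(np.log(np.diag(L)))  # det K = prod(diag(L))^2
    return alpha, log_det


x = np.linspace(-1, 1, 6)
K = rbf_kernel(x, x) + 0.01 * np.eye(len(x))    # noise variance on the diagonal
y = np.sin(3 * x)
alpha, log_det = solve_and_log_det(K, y)
```

Factorizing costs $O(n^3)$ once; each further solve with the factor costs only $O(n^2)$, and no explicit inverse is ever formed.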

```python
import mlai
```

```python
%load -n mlai.update_inverse
```

```python
# Attach the loaded update_inverse function to mlai's GP class.
mlai.GP.update_inverse = update_inverse
```

## Capacity Control

Gaussian processes are sometimes seen as part of a wider family of
methods known as kernel methods. Kernel methods are also based around
covariance functions, but in that field they are known as Mercer
kernels. Mercer kernels have interpretations as inner products in
potentially infinite-dimensional Hilbert spaces. This interpretation
arises because, if we take $\alpha=1$ (where $\alpha$ is the prior
variance of the weights, so that
$\mathbf{K}= \alpha\boldsymbol{ \Phi}\boldsymbol{ \Phi}^\top$), then the
kernel can be expressed as

$$
\mathbf{K}= \boldsymbol{ \Phi}\boldsymbol{ \Phi}^\top
$$

which implies the elements of the kernel are given by

$$
k(\mathbf{ x}, \mathbf{ x}^\prime) = \boldsymbol{ \phi}(\mathbf{ x})^\top \boldsymbol{ \phi}(\mathbf{ x}^\prime).
$$

So the kernel function is an inner product between the basis functions.
Mercer’s theorem tells us that any valid *positive definite function*
can be expressed as such an inner product, with the caveat that the
feature vectors may be *infinite dimensional*. This idea has been used
widely to *kernelize* algorithms that depend on inner products. Kernel
functions are equivalent to covariance functions and are parameterized
in the same way. In the kernel methods community it is generally
accepted that kernel parameter estimation is a difficult problem, and
the usual solution is to choose the parameters by cross validation. This
becomes difficult when many kernel parameters need to be estimated. In
Gaussian process modelling, kernel parameter estimation proceeds (in the
simplest case) by maximum likelihood. This involves taking gradients of
the likelihood with respect to the parameters of the covariance
function.
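To make the inner-product view concrete, the sketch below uses the degree-2 polynomial kernel $k(\mathbf{x}, \mathbf{x}^\prime) = (\mathbf{x}^\top\mathbf{x}^\prime)^2$ in two dimensions, chosen only because its feature map is finite (three features) and known in closed form; an RBF kernel would need an infinite feature map. The helper names `phi` and `k_poly` are illustrative.

```python
import numpy as np


def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel in two dimensions."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])


def k_poly(x, x_prime):
    """Degree-2 homogeneous polynomial kernel, k(x, x') = (x^T x')^2."""
    return float(np.dot(x, x_prime)) ** 2


rng = np.random.default_rng(1)
X = rng.standard_normal((5, 2))

Phi = np.vstack([phi(x) for x in X])                     # 5 x 3 matrix of basis functions
K_features = Phi @ Phi.T                                 # K = Phi Phi^T (alpha = 1)
K_kernel = np.array([[k_poly(a, b) for b in X] for a in X])
```

Both routes give the same $5 \times 5$ matrix, because $(x_1 x_1^\prime + x_2 x_2^\prime)^2$ expands into exactly the three products in `phi`.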

## Gradients of the Likelihood

The easiest conceptual way to obtain the gradients is a two-step
process. The first step takes the gradient of the likelihood with
respect to the covariance matrix; the second step takes the gradient of
the covariance matrix with respect to the parameters of the covariance
function. The two are then combined by the chain rule.
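As a worked form of the two steps, write the objective as the negative log likelihood with its constant dropped, $E(\boldsymbol{\theta}) = \frac{1}{2}\log\det{\mathbf{K}} + \frac{1}{2}\mathbf{y}^\top\mathbf{K}^{-1}\mathbf{y}$, and treat the elements of $\mathbf{K}$ as free variables. Standard matrix-calculus identities give the gradient with respect to the covariance, and the chain rule through $\partial\mathbf{K}/\partial\theta_j$ gives the gradient with respect to each parameter:

$$
\frac{\partial E}{\partial \mathbf{K}} = \frac{1}{2}\left(\mathbf{K}^{-1} - \mathbf{K}^{-1}\mathbf{y}\mathbf{y}^\top\mathbf{K}^{-1}\right)
$$

$$
\frac{\partial E}{\partial \theta_j} = \operatorname{tr}\left(\frac{\partial E}{\partial \mathbf{K}}\frac{\partial \mathbf{K}}{\partial \theta_j}\right) = \frac{1}{2}\operatorname{tr}\left(\mathbf{K}^{-1}\frac{\partial \mathbf{K}}{\partial \theta_j}\right) - \frac{1}{2}\mathbf{y}^\top\mathbf{K}^{-1}\frac{\partial \mathbf{K}}{\partial \theta_j}\mathbf{K}^{-1}\mathbf{y}
$$

Only the factor $\partial\mathbf{K}/\partial\theta_j$ depends on the particular covariance function.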

## Overall Process Scale

In general we won’t be able to find the parameters of the covariance
function through fixed-point equations; we need gradient-based
optimization.
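The overall scale of the covariance is an exception. Assuming $\mathbf{K} = \sigma^2\tilde{\mathbf{K}}$ with $\tilde{\mathbf{K}}$ held fixed, and $E = \frac{1}{2}\log\det{\mathbf{K}} + \frac{1}{2}\mathbf{y}^\top\mathbf{K}^{-1}\mathbf{y}$, setting $\frac{\mathrm{d}E}{\mathrm{d}\sigma^2} = \frac{n}{2\sigma^2} - \frac{\mathbf{y}^\top\tilde{\mathbf{K}}^{-1}\mathbf{y}}{2\sigma^4}$ to zero gives the fixed point $\sigma^2 = \mathbf{y}^\top\tilde{\mathbf{K}}^{-1}\mathbf{y}/n$. The sketch below checks this numerically; the RBF $\tilde{\mathbf{K}}$ (length scale 0.5 plus a small jitter), the sampled targets and the search grid are assumptions for illustration.

```python
import numpy as np


def rbf_unit(x, lengthscale, jitter=1e-6):
    """Unit-variance RBF covariance on 1D inputs, plus jitter for numerical stability."""
    sq_dist = (x[:, None] - x[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / lengthscale**2) + jitter * np.eye(len(x))


def objective(sigma2, K_tilde, y):
    """E = 1/2 log det(sigma2 K~) + 1/2 y^T (sigma2 K~)^{-1} y."""
    K = sigma2 * K_tilde
    _, log_det = np.linalg.slogdet(K)
    return 0.5 * log_det + 0.5 * y @ np.linalg.solve(K, y)


rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 8)
K_tilde = rbf_unit(x, lengthscale=0.5)
y = rng.multivariate_normal(np.zeros(len(x)), 3.0 * K_tilde)

# Closed-form stationary point: n / (2 sigma2) = y^T K~^{-1} y / (2 sigma2^2).
sigma2_hat = y @ np.linalg.solve(K_tilde, y) / len(y)

# Brute-force comparison on a logarithmic grid around the closed-form value.
grid = np.geomspace(0.1, 10.0, 201) * sigma2_hat
best_on_grid = min(objective(s, K_tilde, y) for s in grid)
```

No grid value should beat `sigma2_hat`, since $E$ has a single stationary point in $\sigma^2$, and it is a minimum.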

## Capacity Control and Data Fit

The objective function can be decomposed into two terms: a capacity
control term and a data fit term. The capacity control term is half
the log determinant of the covariance, $\frac{1}{2}\log\det{\mathbf{K}}$.
The data fit term is half the quadratic form
$\mathbf{ y}^{\top}\mathbf{K}^{-1}\mathbf{ y}$, the inner product of
the data with itself weighted by the inverse covariance.
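A minimal sketch, assuming a toy RBF-plus-noise covariance and hand-picked targets, computes the two terms separately from a Cholesky factor (using $\frac{1}{2}\log\det{\mathbf{K}} = \sum_i \log L_{ii}$) and checks that, together with the constant $\frac{n}{2}\log 2\pi$, they reproduce SciPy's Gaussian log density. The name `objective_terms` is illustrative.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.stats import multivariate_normal


def objective_terms(K, y):
    """Return (capacity, data_fit) for a zero-mean Gaussian with covariance K."""
    L, lower = cho_factor(K, lower=True)
    capacity = np.sum(np.log(np.diag(L)))            # 1/2 log det K
    data_fit = 0.5 * y @ cho_solve((L, lower), y)    # 1/2 y^T K^{-1} y
    return capacity, data_fit


x = np.linspace(-1, 1, 6)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2) + 0.01 * np.eye(len(x))
y = np.array([0.3, -0.1, 0.5, 0.9, 0.4, -0.2])
capacity, data_fit = objective_terms(K, y)

# The log density is -(capacity + data_fit) minus the constant n/2 log(2 pi).
log_density = multivariate_normal(mean=np.zeros(len(x)), cov=K).logpdf(y)
```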

```matlab
function rotateObject(rotationMatrix, handle)
% Rotate text and line graphics objects in HANDLE by ROTATIONMATRIX.
for i = 1:numel(handle)
    type = get(handle(i), 'type');
    if strcmp(type, 'text')
        % Text objects: rotate the anchor position.
        xy = get(handle(i), 'position');
        xy(1:2) = rotationMatrix*xy(1:2)';
        set(handle(i), 'position', xy);
    else
        % Line objects: rotate every data point.
        xd = get(handle(i), 'xdata');
        yd = get(handle(i), 'ydata');
        new = rotationMatrix*[xd(:)'; yd(:)'];
        set(handle(i), 'xdata', new(1, :));
        set(handle(i), 'ydata', new(2, :));
    end
end
end
```

## Learning Covariance Parameters

Can we determine covariance parameters from the data?

$$
\mathcal{N}\left(\mathbf{ y}|\mathbf{0},\mathbf{K}\right)=\frac{1}{(2\pi)^\frac{n}{2}{\det{\mathbf{K}}^{\frac{1}{2}}}}{\exp\left(-\frac{\mathbf{ y}^{\top}\mathbf{K}^{-1}\mathbf{ y}}{2}\right)}
$$

$$
\begin{aligned}
    \mathcal{N}\left(\mathbf{ y}|\mathbf{0},\mathbf{K}\right)=\frac{1}{(2\pi)^\frac{n}{2}\color{blue}{\det{\mathbf{K}}^{\frac{1}{2}}}}\color{red}{\exp\left(-\frac{\mathbf{ y}^{\top}\mathbf{K}^{-1}\mathbf{ y}}{2}\right)}
\end{aligned}
$$

$$
\begin{aligned}
    \log \mathcal{N}\left(\mathbf{ y}|\mathbf{0},\mathbf{K}\right)=&\color{blue}{-\frac{1}{2}\log\det{\mathbf{K}}}\color{red}{-\frac{\mathbf{ y}^{\top}\mathbf{K}^{-1}\mathbf{ y}}{2}} \\ &-\frac{n}{2}\log2\pi
\end{aligned}
$$

$$
E(\boldsymbol{ \theta}) = \color{blue}{\frac{1}{2}\log\det{\mathbf{K}}} + \color{red}{\frac{\mathbf{ y}^{\top}\mathbf{K}^{-1}\mathbf{ y}}{2}}
$$
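This objective can be minimized with any gradient-based optimizer. A minimal sketch, assuming a unit-variance RBF covariance with a fixed noise variance of 0.01 and toy sine data, optimizes only the log length scale with SciPy's L-BFGS-B. It uses the standard gradient $\frac{\partial E}{\partial\theta} = \frac{1}{2}\operatorname{tr}\left(\mathbf{K}^{-1}\frac{\partial\mathbf{K}}{\partial\theta}\right) - \frac{1}{2}\mathbf{y}^\top\mathbf{K}^{-1}\frac{\partial\mathbf{K}}{\partial\theta}\mathbf{K}^{-1}\mathbf{y}$ and checks it against finite differences with `scipy.optimize.check_grad`. The function name `objective_and_grad` is illustrative.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.optimize import check_grad, minimize


def objective_and_grad(params, x, y, noise=0.01):
    """E(theta) and dE/d(log lengthscale) for a unit-variance RBF plus fixed noise."""
    ell = np.exp(params[0])                        # optimize log(ell) so that ell > 0
    sq_dist = (x[:, None] - x[None, :]) ** 2
    K_rbf = np.exp(-0.5 * sq_dist / ell**2)
    K = K_rbf + noise * np.eye(len(x))
    L, lower = cho_factor(K, lower=True)
    alpha = cho_solve((L, lower), y)               # K^{-1} y
    K_inv = cho_solve((L, lower), np.eye(len(x)))  # needed for the trace term
    E = np.sum(np.log(np.diag(L))) + 0.5 * y @ alpha
    dK = K_rbf * sq_dist / ell**2                  # dK/d(log ell) for the RBF
    grad = 0.5 * np.trace(K_inv @ dK) - 0.5 * alpha @ dK @ alpha
    return E, np.array([grad])


rng = np.random.default_rng(3)
x = np.linspace(-2, 2, 15)
y = np.sin(2 * x) + 0.1 * rng.standard_normal(len(x))

f = lambda p: objective_and_grad(p, x, y)[0]
g = lambda p: objective_and_grad(p, x, y)[1]
grad_error = check_grad(f, g, np.array([0.0]))     # analytic vs finite differences

result = minimize(objective_and_grad, x0=np.array([0.0]), args=(x, y),
                  jac=True, method="L-BFGS-B")
ell_hat = np.exp(result.x[0])
```

Optimizing the logarithm of the length scale is a common design choice: it keeps the parameter positive without needing bound constraints.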

```matlab
% Draws the covariance ellipse and its eigenvector arrows.
% Assumes the author's plotting helpers (arrow, printLatexPlot) and the
% blackColor, blueColor, directory and textWidth variables are defined.
clf
lambda1 = 3;
lambda2 = 1;
t = linspace(-pi, pi, 200);
R = [sqrt(2)/2 -sqrt(2)/2; sqrt(2)/2 sqrt(2)/2];
xy = R*[lambda1*sin(t); lambda2*cos(t)];
line(xy(1, :), xy(2, :), 'linewidth', 3, 'color', blackColor);
axis off, axis equal
a = arrow([0 lambda1*R(1, 1)], [0 lambda1*R(2, 1)]);
set(a, 'linewidth', 3, 'color', blueColor);
a = arrow([0 lambda2*R(1, 2)], [0 lambda2*R(2, 2)]);
set(a, 'linewidth', 3, 'color', blueColor);
xlim = get(gca, 'xlim');
xspan = xlim(2) - xlim(1);
ylim = get(gca, 'ylim');
yspan = ylim(2) - ylim(1);
text(lambda1*0.5*R(1, 1)-0.05*xspan, lambda1*0.5*R(2, 1)-yspan*0.05, '$\eigenvalue_1$')
text(lambda2*0.5*R(1, 2)-0.05*xspan, lambda2*0.5*R(2, 2)-yspan*0.05, '$\eigenvalue_2$')
fileName = 'gpOptimiseEigen';
printLatexPlot(fileName, directory, 0.45*textWidth)
```

## Capacity Control through the Determinant

The parameters are *inside* the covariance function (matrix).
$$k_{i, j} = k(\mathbf{ x}_i, \mathbf{ x}_j; \boldsymbol{ \theta})$$

$$\mathbf{K}= \mathbf{R}\boldsymbol{ \Lambda}^2 \mathbf{R}^\top$$

```matlab
gpoptimizePlot1
```

<table>
<tr>
<td width="50%">

<img class="negate" src="https://inverseprobability.com/talks/./slides/diagrams//gp/gp-optimize-eigen.png" style="width:100%">

</td>
<td width="50%">

$\boldsymbol{ \Lambda}$ represents distance on axes. $\mathbf{R}$ gives
rotation.

</td>
</tr>
</table>

-   $\boldsymbol{ \Lambda}$ is *diagonal*,
    $\mathbf{R}^\top\mathbf{R}= \mathbf{I}$.
-   Useful representation since
    $\det{\mathbf{K}} = \det{\boldsymbol{ \Lambda}^2} = \det{\boldsymbol{ \Lambda}}^2$,
    because the rotation contributes
    $\det{\mathbf{R}}\det{\mathbf{R}^\top} = 1$.
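The decomposition can be checked numerically with NumPy's symmetric eigensolver `numpy.linalg.eigh`: its eigenvectors give $\mathbf{R}$ and its eigenvalues give the diagonal of $\boldsymbol{\Lambda}^2$. The toy covariance is an assumption for illustration. The last lines confirm that rotating the covariance by an arbitrary orthogonal matrix leaves the determinant, and hence the capacity term, unchanged.

```python
import numpy as np

# Toy covariance: RBF (length scale 0.7) plus noise on five 1D inputs.
x = np.linspace(-1, 1, 5)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.7**2) + 0.1 * np.eye(len(x))

# eigh returns the eigenvalues of K (the diagonal of Lambda^2) and orthonormal eigenvectors (R).
eigenvalues, R = np.linalg.eigh(K)
Lambda = np.diag(np.sqrt(eigenvalues))
K_rebuilt = R @ Lambda @ Lambda @ R.T

# log det K = log det Lambda^2 = 2 * sum(log diag(Lambda)).
log_det_eigen = 2.0 * np.sum(np.log(np.diag(Lambda)))

# Rotating the Gaussian (K -> Q K Q^T with Q orthogonal) leaves the determinant unchanged.
Q, _ = np.linalg.qr(np.random.default_rng(4).standard_normal((5, 5)))
log_det_rotated = np.linalg.slogdet(Q @ K @ Q.T)[1]
```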

```python
import matplotlib.pyplot as plt
import numpy as np
import mlai
import mlai.plot
```

```python
diagrams = './gp/'
```

```python
mlai.plot.covariance_capacity(rotate_angle=np.pi/4, lambda1=0.5, lambda2=0.3, diagrams=diagrams)
```

<img src="https://inverseprobability.com/talks/./slides/diagrams//gp/gp-optimise-determinant009.svg" class="" width="80%" style="vertical-align:middle;">

Figure: <i>The determinant of the covariance is dependent only on the
eigenvalues. It represents the ‘footprint’ of the Gaussian.</i>

```matlab
% Draws the quadratic data fit contours in three frames: axis aligned,
% with a data point, then rotated by R.
% Assumes the author's plotting helpers (arrow, printLatexPlot, printLatexText,
% rotateObject) and the colour, marker, width and directory variables are defined.
clf
includeText = [];
counter = 0;
plotWidth = 0.6*textWidth;
lambda1 = 3;
lambda2 = 1;
t = linspace(-pi, pi, 200);
R = [sqrt(2)/2 -sqrt(2)/2; sqrt(2)/2 sqrt(2)/2];
xy = [lambda1*sin(t); lambda2*cos(t)];
contourHand = line(xy(1, :), xy(2, :), 'color', blackColor);
xy = [lambda1*sin(t); lambda2*cos(t)]*2;
lim = [-1 1]*max([lambda1 lambda2])*2.2;
set(gca, 'xlim', lim, 'ylim', lim)
axis equal

contourHand = [contourHand line(xy(1, :), xy(2, :), 'color', blackColor)];
set(contourHand, 'linewidth', 2, 'color', redColor)
arrowHand = arrow([0 lambda1], [0 0]);
arrowHand = [arrowHand arrow([0 0], [0 lambda2])];
set(arrowHand, 'linewidth', 3, 'color', blackColor);
xlim = get(gca, 'xlim');
xspan = xlim(2) - xlim(1);
ylim = get(gca, 'ylim');
yspan = ylim(2) - ylim(1);
eigLabel = text(lambda1*0.5, -yspan*0.05, '$\eigenvalue_1$', 'horizontalalignment', 'center');
eigLabel = [eigLabel text(-0.05*xspan, lambda2*0.5, '$\eigenvalue_2$', 'horizontalalignment', 'center')];
xlabel('$\dataScalar_1$')
ylabel('$\dataScalar_2$')

box off
xlim = get(gca, 'xlim');
ylim = get(gca, 'ylim');
line([xlim(1) xlim(1)], ylim, 'color', blackColor)
line(xlim, [ylim(1) ylim(1)], 'color', blackColor)

% Frame 0: axis-aligned contours.
fileName = ['gpOptimiseQuadratic' num2str(counter)];
printLatexPlot(fileName, directory, plotWidth);
includeText = [includeText '\only<' num2str(counter) '>{\input{' directory fileName '.svg}}'];
counter = counter + 1;

% Frame 1: add a data point.
y = [1.2 1.4];
dataHand = line(y(1), y(2), 'marker', 'x', 'markersize', markerSize, 'linewidth', markerWidth, 'color', blackColor);
fileName = ['gpOptimiseQuadratic' num2str(counter)];
printLatexPlot(fileName, directory, plotWidth);
includeText = [includeText '\only<' num2str(counter) '>{\input{' directory fileName '.svg}}'];
counter = counter + 1;

% Frame 2: rotate contours, arrows and labels onto the principal axes given by R.
rotateObject(R, arrowHand);
rotateObject(R, contourHand);
rotateObject(R, eigLabel);
fileName = ['gpOptimiseQuadratic' num2str(counter)];
printLatexPlot(fileName, directory, plotWidth);
includeText = [includeText '\only<' num2str(counter) '>{\input{' directory fileName '.svg}}'];
counter = counter + 1;

printLatexText(includeText, 'gpOptimiseQuadraticIncludeText.tex', directory)
```

<img src="https://inverseprobability.com/talks/./slides/diagrams//gp/gp-optimise-quadratic002.svg" class="" width="80%" style="vertical-align:middle;">

Figure: <i>The data fit term of the Gaussian process is a quadratic loss
centered around zero. It has elliptical contours whose principal axes
are given by the eigenvectors of the covariance matrix.</i>
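The elliptical contours follow from the eigendecomposition $\mathbf{K} = \mathbf{R}\boldsymbol{\Lambda}^2\mathbf{R}^\top$: the data fit term is $\mathbf{y}^\top\mathbf{K}^{-1}\mathbf{y} = \sum_i (\mathbf{r}_i^\top\mathbf{y})^2/\lambda_i^2$, where $\mathbf{r}_i$ is the $i$th column of $\mathbf{R}$, so each rotated coordinate is penalized in inverse proportion to the variance along that axis. The check below reuses the values from the MATLAB figure code above (a 45 degree rotation, $\lambda_1 = 3$, $\lambda_2 = 1$ and the point $\mathbf{y} = [1.2, 1.4]$).

```python
import numpy as np

theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # principal axes as columns
lam = np.array([3.0, 1.0])                         # axis lengths, the diagonal of Lambda
K = R @ np.diag(lam**2) @ R.T

y = np.array([1.2, 1.4])
direct = y @ np.linalg.solve(K, y)

# Project y onto the principal axes, then scale each coordinate by 1 / lambda_i.
z = R.T @ y
rotated = np.sum((z / lam) ** 2)
```

The point lies mostly along the long axis ($\lambda_1 = 3$), so it incurs a small data fit penalty even though it is far from the origin.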

## Quadratic Data Fit

## Data Fit Term

```python
import os

import matplotlib.pyplot as plt
import numpy as np

import GPy
import mlai
import mlai.plot
import gp_tutorial
```

```python
np.random.seed(125)
diagrams = './gp'

# Plot colours (RGB) and font size.
black_color = [0., 0., 0.]
red_color = [1., 0., 0.]
blue_color = [0., 0., 1.]
magenta_color = [1., 0., 1.]
fontsize = 18
```

```python
# Axis limits and marker styles for the fit and objective plots.
y_lim = [-2.2, 2.2]
y_ticks = [-2, -1, 0, 1, 2]
x_lim = [-2, 2]
x_ticks = [-2, -1, 0, 1, 2]
err_y_lim = [-12, 20]

linewidth = 3
markersize = 15
markertype = '.'
```

```python
x = np.linspace(-1, 1, 6)[:, np.newaxis]
xtest = np.linspace(x_lim[0], x_lim[1], 200)[:, np.newaxis]

# True data: a sample from an RBF covariance plus white noise.
true_kern = GPy.kern.RBF(1) + GPy.kern.White(1)
true_kern.rbf.lengthscale = 1.0
true_kern.white.variance = 0.01
K = true_kern.K(x)
y = np.random.multivariate_normal(np.zeros((6,)), K, 1).T
```

```python
# Fitted model: same covariance family, with the length scale varied below.
kern = GPy.kern.RBF(1) + GPy.kern.White(1)
kern.rbf.lengthscale = 1.0
kern.white.variance = 0.01

lengthscales = np.asarray([0.01, 0.05, 0.1, 0.25, 0.5, 1, 2, 4, 8, 16, 100])

fig1, ax1 = plt.subplots(figsize=mlai.plot.one_figsize)
fig2, ax2 = plt.subplots(figsize=mlai.plot.one_figsize)
ax2.semilogx(np.nan, np.nan, 'x-', color=black_color)
ax2.set_ylim(err_y_lim)
ax2.set_xlim([0.025, 32])
ax2.grid(True)
ax2.set_xticks([0.01, 0.1, 1, 10, 100])
ax2.set_xticklabels(['$10^{-2}$', '$10^{-1}$', '$10^0$', '$10^1$', '$10^2$'])

err = np.zeros_like(lengthscales)
err_log_det = np.zeros_like(lengthscales)
err_fit = np.zeros_like(lengthscales)

counter = 0
for i, ls in enumerate(lengthscales):
    kern.rbf.lengthscale = ls
    K = kern.K(x)
    # pdinv returns the inverse, its Cholesky factor, the factor's inverse and log det K.
    invK, L, Li, log_det_K = GPy.util.linalg.pdinv(K)
    err_log_det[i] = 0.5*log_det_K                 # capacity term
    err_fit[i] = 0.5*(y.T @ invK @ y).item()       # data fit term
    err[i] = err_log_det[i] + err_fit[i]           # negative log likelihood, constant dropped

    # Posterior mean and variance at the test inputs.
    Kx = kern.K(x, xtest)
    ypred_mean = Kx.T @ invK @ y
    ypred_var = kern.Kdiag(xtest) - np.sum((Kx.T @ invK)*Kx.T, 1)
    ypred_sd = np.sqrt(ypred_var)

    # Left-hand figure: the posterior fit with two standard deviation error bars.
    ax1.clear()
    _ = gp_tutorial.gpplot(xtest.flatten(),
                           ypred_mean.flatten(),
                           ypred_mean.flatten() - 2*ypred_sd.flatten(),
                           ypred_mean.flatten() + 2*ypred_sd.flatten(),
                           ax=ax1)
    x_lim = ax1.get_xlim()
    ax1.set_ylabel('$f(x)$', fontsize=fontsize)
    ax1.set_xlabel('$x$', fontsize=fontsize)
    ax1.plot(x, y, markertype, color=black_color, markersize=markersize, linewidth=linewidth)
    ax1.set_ylim(y_lim)
    ax1.set_xlim(x_lim)
    ax1.set_xticks(x_ticks)
    ax1.plot([x_lim[0], x_lim[0]], y_lim, color=black_color)
    ax1.plot(x_lim, [y_lim[0], y_lim[0]], color=black_color)

    file_name = 'gp-optimise{counter:0>3}.svg'.format(counter=counter)
    mlai.write_figure(os.path.join(diagrams, file_name), figure=fig1, transparent=True)
    counter += 1

    # Right-hand figure: the three objective curves evaluated so far.
    ax2.clear()
    ax2.semilogx(lengthscales[:i+1], err[:i+1], 'x-', color=magenta_color,
                 markersize=markersize, linewidth=linewidth)
    ax2.semilogx(lengthscales[:i+1], err_log_det[:i+1], 'x-', color=blue_color,
                 markersize=markersize, linewidth=linewidth)
    ax2.semilogx(lengthscales[:i+1], err_fit[:i+1], 'x-', color=red_color,
                 markersize=markersize, linewidth=linewidth)
    ax2.set_ylim(err_y_lim)
    ax2.set_xlim([0.025, 32])
    ax2.set_xticks([0.01, 0.1, 1, 10, 100])
    ax2.set_xticklabels(['$10^{-2}$', '$10^{-1}$', '$10^0$', '$10^1$', '$10^2$'])
    ax2.grid(True)
    ax2.set_ylabel('negative log likelihood', fontsize=fontsize)
    ax2.set_xlabel(r'length scale, $\ell$', fontsize=fontsize)
    # Draw the axis lines before saving so that they appear in the figure.
    xlim = ax2.get_xlim()
    ax2.plot([xlim[0], xlim[0]], err_y_lim, color=black_color)
    ax2.plot(xlim, [err_y_lim[0], err_y_lim[0]], color=black_color)

    file_name = 'gp-optimise{counter:0>3}.svg'.format(counter=counter)
    mlai.write_figure(os.path.join(diagrams, file_name), figure=fig2, transparent=True)
    counter += 1
```

<table>
<tr>
<td width="50%">

<img src="https://inverseprobability.com/talks/./slides/diagrams//gp/gp-optimise006.svg" class="" width="100%" style="vertical-align:middle;">

</td>
<td width="50%">

<img src="https://inverseprobability.com/talks/./slides/diagrams//gp/gp-optimise010.svg" class="" width="100%" style="vertical-align:middle;">

</td>
</tr>
</table>
<table>
<tr>
<td width="50%">

<img src="https://inverseprobability.com/talks/./slides/diagrams//gp/gp-optimise016.svg" class="" width="100%" style="vertical-align:middle;">

</td>
<td width="50%">

<img src="https://inverseprobability.com/talks/./slides/diagrams//gp/gp-optimise021.svg" class="" width="100%" style="vertical-align:middle;">

</td>
</tr>
</table>

Figure: <i>Variation in the data fit term, the capacity term and the
negative log likelihood for different lengthscales.</i>

## Gene Expression Example

We now consider an example in gene expression. Gene expression can be
measured through the levels of mRNA in cells, and these levels show
which genes are ‘switched on’. In the example we will use a Gaussian
process to determine whether a given gene is active, or whether we are
merely observing noise.

## Della Gatta Gene Data

-   Gene expression levels in the form of a time series from Della
    Gatta et al. (2008).

```python
import numpy as np
import pods
```

```python
data = pods.datasets.della_gatta_TRP63_gene_expression(data_set='della_gatta', gene_number=937)

x = data['X']
y = data['Y']

# Offset and scale used to standardize the targets before fitting.
offset = y.mean()
scale = np.sqrt(y.var())
```

```python
import matplotlib.pyplot as plt
import mlai.plot
import mlai
```

```python
xlim = (-20, 260)
ylim = (5, 7.5)
yhat = (y - offset)/scale  # standardized targets used for the model fits below

fig, ax = plt.subplots(figsize=mlai.plot.big_wide_figsize)
_ = ax.plot(x, y, 'r.', markersize=10)
ax.set_xlabel('time/min', fontsize=20)
ax.set_ylabel('expression', fontsize=20)
ax.set_xlim(xlim)
ax.set_ylim(ylim)

mlai.write_figure(figure=fig,
                  filename='./datasets/della-gatta-gene.svg',
                  transparent=True,
                  frameon=True)
```

<img src="https://inverseprobability.com/talks/./slides/diagrams//datasets/della-gatta-gene.svg" class="" width="80%" style="vertical-align:middle;">

Figure: <i>Gene expression levels over time for a gene from data
provided by Della Gatta et al. (2008). We would like to understand
whether there is signal in the data, or we are only observing noise.</i>

-   To detect whether a gene is expressed, fit a GP to each gene
    (Kalaitzis and Lawrence, 2011).
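A minimal sketch of the underlying comparison, assuming fixed hyperparameters and a synthetic time course rather than the Della Gatta data: score a profile by the log ratio of its marginal likelihood under an RBF-plus-noise model to that under a noise-only model, so that a large positive score suggests signal. This is only in the spirit of Kalaitzis and Lawrence (2011); the paper's exact procedure is not reproduced here, and `log_ratio` and its default values are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal


def rbf(t, variance, lengthscale):
    """RBF covariance over time points t (minutes)."""
    sq_dist = (t[:, None] - t[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def log_ratio(t, y, noise_var=0.1, signal_var=0.5, lengthscale=50.0):
    """log p(y | RBF signal + noise) - log p(y | noise only), hyperparameters fixed."""
    n = len(t)
    K_full = rbf(t, signal_var, lengthscale) + noise_var * np.eye(n)
    K_noise = noise_var * np.eye(n)
    log_full = multivariate_normal(mean=np.zeros(n), cov=K_full).logpdf(y)
    log_noise = multivariate_normal(mean=np.zeros(n), cov=K_noise).logpdf(y)
    return log_full - log_noise


# Hypothetical time course: 13 samples over 240 minutes.
rng = np.random.default_rng(5)
t = np.linspace(0, 240, 13)
noise = np.sqrt(0.1) * rng.standard_normal((2, len(t)))
y_expressed = np.sin(2 * np.pi * t / 240) + noise[0]   # smooth profile plus noise
y_flat = noise[1]                                        # noise only

score_expressed = log_ratio(t, y_expressed)
score_flat = log_ratio(t, y_flat)
```

Ranking many genes by such a score puts the profiles best explained by a smooth signal at the top.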

<svg viewBox="0 0 200 200" style="width:15%">

<defs> <clipPath id="clip0">

<style>
circle {
  fill: black;
}
</style>

<circle cx="100" cy="100" r="100"/> </clipPath> </defs>

<title>

Freddie Kalaitzis

</title>

<image preserveAspectRatio="xMinYMin slice" width="100%" xlink:href="https://inverseprobability.com/talks/./slides/diagrams//people/freddie-kalaitzis.jpg" clip-path="url(#clip0)"/>

</svg>

<img class="" src="https://inverseprobability.com/talks/./slides/diagrams//health/1471-2105-12-180_1.png" style="width:80%">

Figure: <i>The example is taken from the paper “A Simple Approach to
Ranking Differentially Expressed Gene Expression Time Courses through
Gaussian Process Regression.” Kalaitzis and Lawrence (2011).</i>

<center>

<http://www.biomedcentral.com/1471-2105/12/180>

</center>

Our first objective will be to perform a Gaussian process fit to the
data; we’ll do this using the [GPy
software](https://github.com/SheffieldML/GPy).

```python
import GPy
```

```python
m_full = GPy.models.GPRegression(x, yhat)
m_full.kern.lengthscale = 50
_ = m_full.optimize()  # Optimize parameters of covariance function
```

Initialize the length scale parameter (which here actually represents a
*time scale* of the covariance function) to a reasonable value. The
default is 1, but here we set it to 50 minutes, given that the
observations span zero to 250 minutes.

```python
# Posterior mean and variance on a grid of test times (standardized units).
xt = np.linspace(-20, 260, 200)[:, np.newaxis]
yt_mean, yt_var = m_full.predict(xt)
yt_sd = np.sqrt(yt_var)
```

Now we plot the results using the helper function in `mlai.plot`.

```python
fig, ax = plt.subplots(figsize=mlai.plot.big_wide_figsize)
mlai.plot.model_output(m_full, scale=scale, offset=offset, ax=ax, xlabel='time/min', ylabel='expression', fontsize=20, portion=0.2)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
ax.set_title('log likelihood: {ll:.3}'.format(ll=m_full.log_likelihood()), fontsize=20)
mlai.write_figure(figure=fig,
                  filename='./gp/della-gatta-gene-gp.svg',
                  transparent=True, frameon=True)
```

<img src="https://inverseprobability.com/talks/./slides/diagrams//gp/della-gatta-gene-gp.svg" class="" width="80%" style="vertical-align:middle;">

Figure: <i>Result of the fit of the Gaussian process model with the time
scale parameter initialized to 50 minutes.</i>

Now we try a model initialized with a longer length scale.

```python
m_full2 = GPy.models.GPRegression(x, yhat)
m_full2.kern.lengthscale = 2000
_ = m_full2.optimize()  # Optimize parameters of covariance function
```

```python
fig, ax = plt.subplots(figsize=mlai.plot.big_wide_figsize)
mlai.plot.model_output(m_full2, scale=scale, offset=offset, ax=ax, xlabel='time/min', ylabel='expression', fontsize=20, portion=0.2)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
ax.set_title('log likelihood: {ll:.3}'.format(ll=m_full2.log_likelihood()), fontsize=20)
mlai.write_figure(figure=fig,
                  filename='./gp/della-gatta-gene-gp2.svg',
                  transparent=True, frameon=True)
```

<img src="https://inverseprobability.com/talks/./slides/diagrams//gp/della-gatta-gene-gp2.svg" class="" width="80%" style="vertical-align:middle;">

Figure: <i>Result of the fit of the Gaussian process model with the time
scale parameter initialized to 2000 minutes.</i>

Now we try a model initialized with a lower noise variance and a shorter
time scale.

```python
m_full3 = GPy.models.GPRegression(x, yhat)
m_full3.kern.lengthscale = 20
m_full3.likelihood.variance = 0.001
_ = m_full3.optimize()  # Optimize parameters of covariance function
```

```python
fig, ax = plt.subplots(figsize=mlai.plot.big_wide_figsize)
mlai.plot.model_output(m_full3, scale=scale, offset=offset, ax=ax, xlabel='time/min', ylabel='expression', fontsize=20, portion=0.2)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
ax.set_title('log likelihood: {ll:.3}'.format(ll=m_full3.log_likelihood()), fontsize=20)
mlai.write_figure(figure=fig,
                  filename='./gp/della-gatta-gene-gp3.svg',
                  transparent=True, frameon=True)
```

<img src="https://inverseprobability.com/talks/./slides/diagrams//gp/della-gatta-gene-gp3.svg" class="" width="80%" style="vertical-align:middle;">

Figure: <i>Result of the fit of the Gaussian process model with the
noise variance initialized low (0.001, a standard deviation of about
0.03) and the time scale parameter initialized to 20 minutes.</i>
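Because different initializations can end in different local optima, a common remedy is to restart the optimizer from several initial values and keep the best result. GPy models provide an `optimize_restarts` method for this; the self-contained sketch below instead uses SciPy directly, with an assumed RBF-plus-noise model on synthetic, standardized data. It optimizes the log length scale, log signal variance and log noise variance from short, moderate and very long starting length scales, echoing the initializations tried above. The function name, bounds and data are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.optimize import minimize


def neg_log_marginal(log_params, t, y):
    """E(theta), constant dropped, for an RBF plus Gaussian noise; parameters in log space."""
    ell, sig2, noise = np.exp(log_params)
    sq_dist = (t[:, None] - t[None, :]) ** 2
    K = sig2 * np.exp(-0.5 * sq_dist / ell**2) + (noise + 1e-8) * np.eye(len(t))
    L, lower = cho_factor(K, lower=True)
    return np.sum(np.log(np.diag(L))) + 0.5 * y @ cho_solve((L, lower), y)


rng = np.random.default_rng(6)
t = np.linspace(0, 240, 12)
y = np.sin(t / 40.0) + 0.3 * rng.standard_normal(len(t))
y = (y - y.mean()) / y.std()                      # standardize, as done for yhat above

bounds = [(np.log(1.0), np.log(1e4)), (-5.0, 5.0), (-8.0, 2.0)]
results = []
for ell0 in [20.0, 50.0, 2000.0]:                 # short, moderate and very long starts
    x0 = np.log([ell0, 1.0, 0.1])
    res = minimize(neg_log_marginal, x0, args=(t, y), method="L-BFGS-B", bounds=bounds)
    results.append(res)

best = min(results, key=lambda r: r.fun)          # keep the lowest objective
best_ell, best_sig2, best_noise = np.exp(best.x)
```

Comparing `r.fun` across `results` shows whether the starts converged to the same optimum; when they differ, the lowest objective (highest likelihood) is the usual choice.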

```python
mlai.plot.multiple_optima(diagrams='./gp')
```

<img src="https://inverseprobability.com/talks/./slides/diagrams//gp/multiple-optima000.svg" class="" width="50%" style="vertical-align:middle;">

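Figure: <i>The log likelihood of the gene expression model has multiple
local optima, so the fitted parameters depend on the initialization.</i>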

## Thanks!

For more information on these subjects and more you might want to check
the following resources.

-   twitter: [@lawrennd](https://twitter.com/lawrennd)
-   podcast: [The Talking Machines](http://thetalkingmachines.com)
-   newspaper: [Guardian Profile
    Page](http://www.theguardian.com/profile/neil-lawrence)
-   blog:
    [http://inverseprobability.com](http://inverseprobability.com/blog.html)

## References

Della Gatta, G., Bansal, M., Ambesi-Impiombato, A., Antonini, D.,
Missero, C., di Bernardo, D., 2008. Direct targets of the TRP63
transcription factor revealed by a combination of gene expression
profiling and reverse engineering. Genome Research 18, 939–948.
<https://doi.org/10.1101/gr.073601.107>

Kalaitzis, A.A., Lawrence, N.D., 2011. A simple approach to ranking
differentially expressed gene expression time courses through Gaussian
process regression. BMC Bioinformatics 12, 180.
<https://doi.org/10.1186/1471-2105-12-180>