In [1]:
%matplotlib inline
import itertools
import os
import numpy as np
import gpflow
import numbers
import matplotlib.pyplot as plt
import tensorflow as tf
from gpflow_monitor import *
np.random.seed(0)  # seed before sampling so the data are reproducible
X = np.random.rand(10000, 1) * 10
Y = np.sin(X) + np.random.randn(*X.shape)
Xt = np.random.rand(10000, 1) * 10
Yt = np.sin(Xt) + np.random.randn(*Xt.shape)  # test targets come from the test inputs

# Demo: `gpflow_monitor`
In this notebook we'll demo how to use `gpflow_monitor` for logging the optimisation of a GPflow model. The example covers the most common use cases.

## Creating the GPflow model
We first create the GPflow model as usual.

In [2]:
m = gpflow.models.SVGP(X, Y, gpflow.kernels.RBF(1), gpflow.likelihoods.Gaussian(), Z=np.linspace(0, 10, 5)[:, None],
                       minibatch_size=100)
m.likelihood.variance = 0.01
m.compile()

Instructions for updating:
Use `tf.data.Dataset.from_tensor_slices()`.


## Setting up the optimisation
Next we need to set up the optimisation process. `gpflow_monitor` provides classes that manage the optimisation and perform certain logging tasks. In this example, we want to:
- log certain scalar parameters in TensorBoard
- log the full optimisation objective (log marginal likelihood bound) periodically, even though we optimise with minibatches
- store a backup of the optimisation process periodically
- log performance for a test set periodically

Because of the integration with TensorFlow ways of storing and logging, we will need to perform a few TensorFlow manipulations outside of GPflow as well.

We start by creating the `global_step` variable. This is not strictly required by TensorFlow optimisers, but they do all have support for it. Its purpose is to track how many optimisation steps have occurred. It is useful to keep this in a TensorFlow variable as this allows it to be restored together with all the parameters of the model.

In [3]:
global_step = tf.Variable(0, trainable=False, name="global_step")
m.session.run(global_step.initializer)

Next, we create an instance of `FileWriter`, which will save the TensorBoard logs to a file. This object needs to be shared between all `gpflow_monitor.TensorBoard` objects, if they are to write to the same path.

In [4]:
fw = tf.summary.FileWriter(os.path.join("./results/test/tensorboard/"), m.session.graph)

Now that the TensorFlow side is set up, we can focus on the `gpflow_monitor` part. The optimisation is taken care of by the `ManagedOptimisation` class, which runs the training loop. The `ManagedOptimisation` object also takes care of running `Task`s.

Each `Task` is something that needs to be run periodically during the optimisation. The first two parameters of every task control when it runs. The first is a generator that yields the points (either in iterations or in time) at which the `Task` should run. The second determines how those points are measured: as a number of iterations (`Trigger.ITER`), as the amount of time spent optimising (`Trigger.OPTIMISATION_TIME`), or as wall-clock time (`Trigger.TOTAL_TIME`). The following `Task`s are run once every 100 or 1000 iterations.
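
To make the scheduling concrete, here is a minimal pure-Python sketch of how a loop could consume such a generator when the trigger is an iteration count. The `run_with_schedule` helper is hypothetical and is not part of `gpflow_monitor`; it only illustrates the idea that a task fires whenever the current iteration reaches the next value yielded by the sequence.

```python
import itertools


def run_with_schedule(sequence, n_iterations):
    """Return the iterations at which a task with this schedule would fire.

    Hypothetical helper for illustration only; not the gpflow_monitor implementation.
    """
    fired = []
    next_due = next(sequence)  # first scheduled iteration
    for iteration in range(n_iterations):
        if iteration >= next_due:
            fired.append(iteration)
            # Advance past any scheduled points that have already been reached.
            while next_due <= iteration:
                next_due = next(sequence)
    return fired


# Same style of schedule as in the task list: 0, 100, 200, ...
every_100 = (x * 100 for x in itertools.count())
fired_at = run_with_schedule(every_100, 350)
print(fired_at)
```

Because the schedule is an arbitrary generator, a non-uniform spacing (for example, logging more densely early on) could presumably be expressed the same way, by yielding a different sequence of points.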

In [5]:
opt_method = ManagedOptimisation(m, gpflow.train.AdamOptimizer(0.01), global_step)
opt_method.tasks += [
    PrintTimings((x * 100 for x in itertools.count()), Trigger.ITER),
    ModelTensorBoard((x * 100 for x in itertools.count()), Trigger.ITER, m, fw),
    LmlTensorBoard((x * 1000 for x in itertools.count()), Trigger.ITER, m, fw, verbose=False),
    StoreSession((x * 1000 for x in itertools.count()), Trigger.ITER, m.session, "./results/test/checkpoint")
]

INFO:tensorflow:Summary name full lml is illegal; using full_lml instead.
Restoring session from `./results/test/checkpoint-8000`.
INFO:tensorflow:Restoring parameters from ./results/test/checkpoint-8000


We may also want to perform certain tasks that do not have pre-defined `Task` classes, for example computing the performance on a test set. Here we create such a class by extending `ModelTensorBoard` to log the testing benchmarks in addition to all the scalar parameters.

In [6]:
class TestTensorBoard(ModelTensorBoard):
    def __init__(self, sequence, trigger: Trigger, model, file_writer, Xt, Yt):
        super().__init__(sequence, trigger, model, file_writer)
        self.Xt = Xt
        self.Yt = Yt
        self._full_test_err = tf.placeholder(gpflow.settings.tf_float, shape=())
        self._full_test_nlpp = tf.placeholder(gpflow.settings.tf_float, shape=())

        self.summary = tf.summary.merge([tf.summary.scalar("test_rmse", self._full_test_err),
                                         tf.summary.scalar("test_nlpp", self._full_test_nlpp)])

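    def _event_handler(self, manager):
        # Note: `m` and `global_step` are the notebook-level model and step variable.
        minibatch_size = 100
        # Predict in minibatches; -(-n // k) is ceiling division, so a final partial batch is included.
        n_batches = -(-len(self.Xt) // minibatch_size)
        preds = np.vstack([m.predict_y(self.Xt[mb * minibatch_size:(mb + 1) * minibatch_size, :])[0]
                           for mb in range(n_batches)])
        test_err = np.mean((self.Yt - preds) ** 2.0) ** 0.5  # test RMSE
        # The NLPP is not computed in this demo; 0.0 is logged as a placeholder.
        summary, step = m.session.run([self.summary, global_step],
                                      feed_dict={self._full_test_err: test_err,
                                                 self._full_test_nlpp: 0.0})
        self.file_writer.add_summary(summary, step)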
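
The `_event_handler` above predicts on the test set in minibatches and counts them with a negated floor division, which is a ceiling division: it gives the number of batches needed to cover every test point, including a final partial batch. The sketch below isolates that pattern in plain Python. The `batched_rmse` function and the stand-in predictor are assumptions made for illustration (in place of `m.predict_y`), so that the example runs without GPflow.

```python
import math


def batched_rmse(predict, inputs, targets, batch_size):
    """Compute the RMSE by predicting in minibatches.

    `predict` is a stand-in for a model's batch prediction function.
    """
    n_batches = -(-len(inputs) // batch_size)  # ceiling division
    preds = []
    for b in range(n_batches):
        batch = inputs[b * batch_size:(b + 1) * batch_size]
        preds.extend(predict(batch))
    squared_errors = [(t - p) ** 2 for t, p in zip(targets, preds)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy data: 250 points, so the last of the 3 batches holds only 50 points.
xs = [i / 10 for i in range(250)]
ys = [math.sin(x) for x in xs]


def constant_offset_model(batch):
    """Stand-in predictor that is off by 0.5 everywhere."""
    return [math.sin(x) + 0.5 for x in batch]


rmse = batched_rmse(constant_offset_model, xs, ys, batch_size=100)
print(rmse)
```

With a predictor that is off by a constant 0.5, the RMSE should come out at 0.5 up to floating-point error, which is a quick check that no test points were dropped or duplicated by the batching.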

We then add it to the task list.

In [7]:
opt_method.tasks.append(TestTensorBoard((x * 1000 for x in itertools.count()), Trigger.ITER, m, fw, Xt, Yt))

## Running the optimisation
We finally get to running the optimisation. The second time this is run, the session should be restored from a checkpoint created by `StoreSession`. To confirm this, we print out the first value of every TensorFlow variable, including any variables used by the optimiser. This matters because the optimiser must resume from _exactly_ the state in which it left off. If it does not, models may start diverging after loading.

In [8]:
[u[1] if isinstance(u[1], numbers.Number) else u[1].flatten()[0]  for u in sorted([(v.name, m.session.run(v)) for v in tf.global_variables()], key=lambda x: x[0])]

[61.814632605500641,
 80856.247939576046,
 0.60027241991117597,
 -295.16480320215925,
 482806.80551486911,
 2.3075532340700646,
 20.572823690039474,
 520599.54765632085,
 0.68340141488715167,
 -29.036712765609494,
 18830789.488434911,
 0.5511608304354263,
 -46.192833857302041,
 405246.75582089456,
 0.61505779513667536,
 -4.6592879752311243,
 74518.754570049394,
 0.025340512427143348,
 0.0,
 0.0003338237,
 8000]
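
To see why restoring the optimiser's internal values matters, consider the following self-contained sketch. It uses a hand-written scalar Adam update on a toy quadratic objective (an illustration only, not GPflow's or TensorFlow's implementation) and compares an uninterrupted run with a run restored from a full checkpoint and a run restored from the parameter alone.

```python
import copy
import math


def adam_step(state, grad, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """Apply one scalar Adam update in place; `state` holds x, m, v and step."""
    state["step"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2
    m_hat = state["m"] / (1 - b1 ** state["step"])
    v_hat = state["v"] / (1 - b2 ** state["step"])
    state["x"] -= lr * m_hat / (math.sqrt(v_hat) + eps)


def train(state, n_steps):
    """Minimise f(x) = (x - 3)^2 for n_steps, starting from `state`."""
    for _ in range(n_steps):
        adam_step(state, grad=2.0 * (state["x"] - 3.0))
    return state


fresh = {"x": 0.0, "m": 0.0, "v": 0.0, "step": 0}

# Reference: 200 uninterrupted steps.
uninterrupted = train(copy.deepcopy(fresh), 200)

# Checkpoint after 100 steps, keeping the full optimiser state.
checkpoint = train(copy.deepcopy(fresh), 100)

# Restore everything (parameter, moments and step counter), then continue.
full_restore = train(copy.deepcopy(checkpoint), 100)

# Restore only the parameter; moments and step counter start from zero.
params_only = train({"x": checkpoint["x"], "m": 0.0, "v": 0.0, "step": 0}, 100)

print(uninterrupted["x"], full_restore["x"], params_only["x"])
```

In this toy setting the fully restored run reproduces the uninterrupted one, while the parameter-only restore follows a different trajectory. The step counter plays the same role here as `global_step` does in the notebook: it is part of the state that has to be saved and restored together with the parameters.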

In [9]:
opt_method.minimize(maxiter=8000)

1, 8001:	3.04 optimisation iter/s	3.04 total iter/s	0.00 last iter/sFull lml: -14221.326103 (-1.42e+04)
1000, 9000:	745.32 optimisation iter/s	537.54 total iter/s	1035.15 last iter/sFull lml: -14221.629262 (-1.42e+04)
2000, 10000:	858.09 optimisation iter/s	642.41 total iter/s	1004.25 last iter/sFull lml: -14213.064519 (-1.42e+04)
3000, 11000:	894.36 optimisation iter/s	682.00 total iter/s	894.45 last iter/sFull lml: -14223.946411 (-1.42e+04)
4000, 12000:	912.18 optimisation iter/s	702.84 total iter/s	925.44 last iter/sFull lml: -14212.878131 (-1.42e+04)
5000, 13000:	922.36 optimisation iter/s	714.18 total iter/s	915.71 last iter/sFull lml: -14214.866895 (-1.42e+04)
6000, 14000:	933.23 optimisation iter/s	725.45 total iter/s	949.40 last iter/sFull lml: -14212.362766 (-1.42e+04)
7000, 15000:	939.78 optimisation iter/s	732.63 total iter/s	921.51 last iter/sFull lml: -14207.748967 (-1.42e+04)
8000, 16000:	947.47 optimisation iter/s	740.23 total iter/s	990.15 last iter/sFull lml: -14223.25

Here, we print the optimised variables for comparison on the next run.

In [10]:
[u[1] if isinstance(u[1], numbers.Number) else u[1].flatten()[0]  for u in sorted([(v.name, m.session.run(v)) for v in tf.global_variables()], key=lambda x: x[0])]

[61.814632605500641,
 80856.247939576046,
 0.60027241991117597,
 -295.16480320215925,
 482806.80551486911,
 2.3075532340700646,
 20.572823690039474,
 520599.54765632085,
 0.68340141488715167,
 -29.036712765609494,
 18830789.488434911,
 0.5511608304354263,
 -46.192833857302041,
 405246.75582089456,
 0.61505779513667536,
 -4.6592879752311243,
 74518.754570049394,
 0.025340512427143348,
 0.0,
 0.0003338237,
 8000]