In [None]:
from geostat import GP, NormalizingFeaturizer
import matplotlib.pyplot as pp
import numpy as np

# Overview

In this notebook we will:
  * Use a Gaussian process to generate synthetic data with known geospatial parameters.
  * Use a second Gaussian process to infer the geospatial parameters from the synthetic data.
  * Use the second Gaussian process with fitted geospatial parameters to interpolate values at locations on a mesh.

# Synthesizing data

We will synthesize data at mesh locations in a square centered on the origin.

First define mesh locations:

In [None]:
N = 81
meshx, meshy = np.meshgrid(np.linspace(-1, 1, N), np.linspace(-1, 1, N))
mesh_locs = np.stack([meshx, meshy], axis=-1)

Then draw 2500 random locations in the same square; these are passed to the featurizer below:

In [None]:
locs = np.random.uniform(-1., 1., [2500, 2])

Declare the terms of the spatial trend:

In [None]:
def trend_terms(x, y): return x, y, x*x, x*y, y*y

Create a featurizer that the Gaussian process class `GP` will use to convert locations into trend features:

In [None]:
featurizer = NormalizingFeaturizer(trend_terms, locs)
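
To make "trend features" concrete, here is a minimal NumPy sketch of what a normalizing featurizer might do. It assumes that normalization means standardizing each trend term to zero mean and unit variance over the reference locations; the actual behavior of `NormalizingFeaturizer` may differ. The helper `make_normalizing_featurizer` and the names `ref_locs` and `features` are hypothetical and not part of `geostat`.

```python
import numpy as np

def trend_terms(x, y):
    return x, y, x*x, x*y, y*y

def make_normalizing_featurizer(terms, ref_locs):
    """Hypothetical helper: map locations to standardized trend features."""
    def raw(locs):
        # Evaluate each trend term on the coordinate columns, giving shape [n, k].
        return np.stack(terms(locs[:, 0], locs[:, 1]), axis=-1)

    # Per-feature statistics computed once from the reference locations (assumption).
    ref_features = raw(ref_locs)
    mean, std = ref_features.mean(axis=0), ref_features.std(axis=0)

    def featurize(locs):
        return (raw(locs) - mean) / std
    return featurize

rng = np.random.default_rng(0)
ref_locs = rng.uniform(-1., 1., [2500, 2])
featurize = make_normalizing_featurizer(trend_terms, ref_locs)
features = featurize(ref_locs)  # one row per location, one column per trend term
```

Standardizing puts all trend terms on a comparable scale, so a single prior variance for the trend coefficients treats them evenly.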

Instantiate a `GP` and immediately call `generate` to produce synthetic observations:
  * `parameters` holds the geostatistical parameters for the `GP`.
  * `alpha`, passed in `hyperparameters`, parameterizes the normal distribution prior for trend coefficients.

In [None]:
mesh_obs = GP(featurizer = featurizer,
         covariance_func = 'squared-exp',
         parameters = dict(range=0.33, sill=1., nugget=0.25),
         hyperparameters = dict(alpha=0.2),
         verbose=True).generate(mesh_locs)
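
For intuition about what generation involves, the following is a rough NumPy sketch of drawing one sample from a Gaussian process with a trend. It is not the `geostat` implementation. It assumes the squared-exponential covariance has the form `sill * exp(-d**2 / (2 * range**2))`, that the nugget is added on the diagonal, and that `alpha` is the variance of a zero-mean normal prior on the trend coefficients. The functions `squared_exp_cov` and `generate` and all `demo_*` names are hypothetical.

```python
import numpy as np

def squared_exp_cov(locs_a, locs_b, range_, sill):
    # Pairwise squared distances between two sets of 2-D locations.
    d2 = ((locs_a[:, None, :] - locs_b[None, :, :]) ** 2).sum(axis=-1)
    # Assumed form of the squared-exponential covariance.
    return sill * np.exp(-0.5 * d2 / range_**2)

def generate(locs, features, range_, sill, nugget, alpha, rng):
    n, k = features.shape
    # Trend coefficients drawn from a zero-mean normal prior with variance alpha (assumption).
    beta = rng.normal(0., np.sqrt(alpha), size=k)
    # Observation covariance: spatial term plus nugget on the diagonal.
    cov = squared_exp_cov(locs, locs, range_, sill) + nugget * np.eye(n)
    # Multiply white noise by the Cholesky factor to get correlated noise.
    chol = np.linalg.cholesky(cov)
    return features @ beta + chol @ rng.standard_normal(n)

rng = np.random.default_rng(0)
side = np.linspace(-1, 1, 15)  # a small mesh keeps the dense Cholesky cheap
gx, gy = np.meshgrid(side, side)
demo_locs = np.stack([gx, gy], axis=-1).reshape([-1, 2])
demo_features = demo_locs  # linear trend terms x and y only, for brevity
demo_obs = generate(demo_locs, demo_features,
                    range_=0.33, sill=1., nugget=0.25, alpha=0.2, rng=rng)
```

The sketch uses a 15 by 15 mesh because a dense Cholesky factorization scales cubically with the number of locations.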

When the data is plotted, you can see an overall trend with some localized variations.

In [None]:
vmin, vmax = mesh_obs.min(), mesh_obs.max()
c = pp.pcolormesh(meshx, meshy, mesh_obs, vmin=vmin, vmax=vmax)
pp.colorbar(c)
pp.title('Synthetic data')
pp.show()

From these synthetic data points we'll sample just 200 and use them to try to reconstruct the rest of the data.

In [None]:
sample_indices = np.random.choice(range(N*N), [200], replace=False)
locs = mesh_locs.reshape([-1, 2])[sample_indices, :]
obs = mesh_obs.ravel()[sample_indices]

c = pp.scatter(locs[:, 0], locs[:, 1], c=obs, vmin=vmin, vmax=vmax)
pp.colorbar(c)
pp.title('Synthetic observations')
pp.show()

# Inferring parameters

Now we create a second `GP`. This time we pass in the data (`locs` and `obs`) and call `fit`, which fits the geospatial parameters to the data. Here `parameters` holds the initial geospatial parameters, which differ from those used in the first `GP`; after fitting they should converge to values close to the originals.

In [None]:
gp = GP(featurizer = featurizer,
        covariance_func = 'squared-exp',
        parameters = dict(range=1.0, sill=0.5, nugget=0.5),
        hyperparameters = dict(alpha=np.ptp(obs)**2, reg=0, train_iters=300),
        verbose=True).fit(locs, obs)
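
Fitting can be thought of as searching for parameters under which the observed data are most probable. The sketch below illustrates that idea under strong simplifying assumptions: no trend, the same assumed squared-exponential form `sill * exp(-d**2 / (2 * range**2))`, and a coarse grid search over `range` standing in for whatever optimizer `GP.fit` actually uses. The functions and `demo_*` names are hypothetical.

```python
import numpy as np

def squared_exp_cov(locs_a, locs_b, range_, sill):
    d2 = ((locs_a[:, None, :] - locs_b[None, :, :]) ** 2).sum(axis=-1)
    return sill * np.exp(-0.5 * d2 / range_**2)

def neg_log_likelihood(locs, obs, range_, sill, nugget):
    n = len(obs)
    cov = squared_exp_cov(locs, locs, range_, sill) + nugget * np.eye(n)
    chol = np.linalg.cholesky(cov)
    # Solve chol @ z = obs, so that z @ z equals obs^T cov^{-1} obs.
    z = np.linalg.solve(chol, obs)
    log_det = 2. * np.log(np.diag(chol)).sum()
    return 0.5 * (z @ z + log_det + n * np.log(2. * np.pi))

# Demo data drawn from a zero-mean process with known parameters.
rng = np.random.default_rng(1)
demo_locs = rng.uniform(-1., 1., [200, 2])
true_cov = squared_exp_cov(demo_locs, demo_locs, 0.33, 1.) + 0.25 * np.eye(200)
demo_obs = np.linalg.cholesky(true_cov) @ rng.standard_normal(200)

# Score a few candidate ranges with sill and nugget held at their true values.
candidate_ranges = [0.1, 0.2, 0.33, 0.5, 1.0]
nll = [neg_log_likelihood(demo_locs, demo_obs, r, sill=1., nugget=0.25)
       for r in candidate_ranges]
best_range = candidate_ranges[int(np.argmin(nll))]
```

A real fit would optimize all parameters jointly and account for the trend; the grid search only shows how the likelihood scores candidate values.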

# Generating predictions

Call `gp.predict` to get the prediction mean and variance at the same mesh locations as before:

In [None]:
mean, var = gp.predict(locs, obs, mesh_locs)
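
The mean and variance returned here correspond to the standard Gaussian process conditioning equations. The following NumPy sketch shows them for the simplified case of no trend, again assuming the covariance form `sill * exp(-d**2 / (2 * range**2))` and assuming the reported variance is that of a new noisy observation (so it includes the nugget). It is an illustration, not the `geostat` implementation, and all function and `demo_*` names are hypothetical.

```python
import numpy as np

def squared_exp_cov(locs_a, locs_b, range_, sill):
    d2 = ((locs_a[:, None, :] - locs_b[None, :, :]) ** 2).sum(axis=-1)
    return sill * np.exp(-0.5 * d2 / range_**2)

def predict(locs, obs, new_locs, range_, sill, nugget):
    # Covariance among observations (with nugget) and between new and observed locations.
    k_oo = squared_exp_cov(locs, locs, range_, sill) + nugget * np.eye(len(obs))
    k_no = squared_exp_cov(new_locs, locs, range_, sill)
    # Weights k_no @ inv(k_oo), computed with a linear solve instead of an explicit inverse.
    weights = np.linalg.solve(k_oo, k_no.T).T
    mean = weights @ obs
    # Prior variance of a noisy observation minus the part explained by the data.
    var = sill + nugget - (weights * k_no).sum(axis=-1)
    return mean, var

rng = np.random.default_rng(2)
demo_locs = rng.uniform(-1., 1., [50, 2])
demo_obs = np.sin(3. * demo_locs[:, 0]) + 0.1 * rng.standard_normal(50)

side = np.linspace(-1, 1, 11)
gx, gy = np.meshgrid(side, side)
demo_mesh = np.stack([gx, gy], axis=-1).reshape([-1, 2])

demo_mean, demo_var = predict(demo_locs, demo_obs, demo_mesh,
                              range_=0.33, sill=1., nugget=0.25)
```

The sketch works on a flattened list of mesh locations; `demo_mean.reshape(gx.shape)` restores the grid layout needed for plotting. Under these assumptions the variance never drops below the nugget, which is the part of the signal no amount of nearby data can explain.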

In [None]:
c = pp.pcolormesh(meshx, meshy, mean, vmin=vmin, vmax=vmax)
pp.colorbar(c)
pp.title('Prediction mean')
pp.show()

For comparison, here's the original synthetic data:

In [None]:
c = pp.pcolormesh(meshx, meshy, mesh_obs, vmin=vmin, vmax=vmax)
pp.colorbar(c)
pp.title('Synthetic data')
pp.show()
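
A visual comparison can be complemented with a single number. The sketch below defines a root-mean-square error helper; `rmse` and the `demo_*` arrays are hypothetical names introduced for illustration only.

```python
import numpy as np

def rmse(predicted, actual):
    """Root-mean-square difference between two arrays of the same shape."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

# Tiny worked example: one of four cells is off by 2.
demo_actual = np.array([[0., 1.], [2., 3.]])
demo_predicted = np.array([[0., 1.], [2., 5.]])
demo_error = rmse(demo_predicted, demo_actual)
```

In this notebook the corresponding call would be `rmse(mean, mesh_obs)`. Because the synthetic data were generated with a nonzero nugget, some error is to be expected even from a well-fitted model.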

And here's a plot of prediction variance, which accounts for, among other things, the noise that the model is unable to reconstruct.

In [None]:
c = pp.pcolormesh(meshx, meshy, var, cmap='gist_heat_r')
pp.colorbar(c)
pp.title('Prediction variance')
pp.show()