## Overview

This notebook fits Bayesian Gaussian process regression models to two data sets, `soil` and `meuse`, at several tolerance thresholds, and compares the fit times and the posterior quartiles of the hyperparameters. The data sets are described in

Schabenberger, Oliver and Pierce, Fran. *Contemporary Statistical Models for the Plant and Soil Sciences*. November 2001. 738 pp.

and

Pebesma, Edzer J. and Bivand, Roger S. Classes and methods for spatial data in R. *R News*, 5(2):9–13, November 2005.

## Load Data Set

In [1]:
import numpy as np
import time
import scipy
from bbai.gp import BayesianGaussianProcessRegression, Power1CovarianceFunction
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import rdata
import requests
import tempfile
from collections import defaultdict

In [2]:
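def read_file(url):
    # Download an R data file and convert its contents to Python objects.
    response = requests.get(url)
    response.raise_for_status()  # fail early on a bad download
    with tempfile.NamedTemporaryFile() as tfile:
        tfile.write(response.content)
        tfile.flush()  # ensure the bytes are on disk before parsing by name
        parsed = rdata.parser.parse_file(tfile.name)
    return rdata.conversion.convert(parsed)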

In [3]:
datasets = {}

# Add soil data
from dataset.soil_cn.soil_cn_dataset import X, y
n = len(y)
Z = np.array(X)
X = np.ones((n, 1))
datasets['soil'] = (Z, X, y)

# Add meuse data
url = 'https://github.com/edzer/sp/raw/main/data/meuse.rda'
df = read_file(url)['meuse']
df = df[['x', 'y', 'dist', 'zinc']]
Z = np.array(df.iloc[:, :2]) / 1.0e3
y = np.log(np.array(df.iloc[:, -1]))
dist = df.iloc[:, 2:3]
X = np.hstack((np.ones((len(Z), 1)), np.sqrt(dist)))
datasets['meuse'] = (Z, X, y)
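
Each entry in `datasets` is a tuple `(Z, X, y)`: sample locations, a design matrix for the mean, and the response. The sketch below repeats the meuse transformations on a few made-up rows so the resulting shapes are easy to inspect. The helper name `make_design_matrix` and all numeric values are illustrative assumptions, not part of the notebook or of the real data.

```python
import numpy as np

def make_design_matrix(dist):
    """Stack an intercept column next to sqrt(dist), as in the meuse cell above."""
    dist = np.asarray(dist, dtype=float).reshape(-1, 1)
    return np.hstack((np.ones((len(dist), 1)), np.sqrt(dist)))

# Made-up stand-in values (hypothetical, NOT real meuse measurements).
coords = np.array([[1000.0, 2000.0], [1500.0, 2500.0], [2000.0, 3000.0]])
dist = np.array([0.0, 0.04, 0.25])
zinc = np.array([500.0, 750.0, 1000.0])

Z = coords / 1.0e3            # rescale coordinates by 1000, as in the cell above
X = make_design_matrix(dist)  # columns: intercept, sqrt(dist)
y = np.log(zinc)              # log-transformed response

print(Z.shape, X.shape, y.shape)
```

For the soil data the design matrix is a single column of ones, so the mean there reduces to an intercept only.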

## Fit Model

In [4]:
fit_results = defaultdict(list)
for (name, (Z, X, y)) in datasets.items():
    for tol in [1.0e-2, 1.0e-3, 1.0e-4, 1.0e-5, 1.0e-6]:
        t1 = time.time()
        model = BayesianGaussianProcessRegression(
            kernel=Power1CovarianceFunction(), tolerance=tol)
        model.fit(Z, y, X)
        elapse = time.time() - t1
        fit_results[name].append((model, tol, elapse))
        print(name, tol, elapse)

soil 0.01 1.3517811298370361
soil 0.001 1.1853382587432861
soil 0.0001 2.923412799835205
soil 1e-05 6.77296781539917
soil 1e-06 10.154182195663452
meuse 0.01 0.6730570793151855
meuse 0.001 1.1384694576263428
meuse 0.0001 2.5448386669158936
meuse 1e-05 4.807579040527344
meuse 1e-06 17.95725107192993
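
To make the cost of tightening the tolerance easier to read, the fit times can be expressed relative to the loosest tolerance. The sketch below copies the times printed above into a dictionary; `fit_times` and `slowdowns` are hypothetical names introduced only for this illustration. These are single-run wall-clock measurements, so small differences (such as soil at 0.001 finishing faster than at 0.01) are probably timing noise.

```python
# Fit times in seconds, copied from the output above.
tolerances = [1.0e-2, 1.0e-3, 1.0e-4, 1.0e-5, 1.0e-6]
fit_times = {
    'soil': [1.3517811298370361, 1.1853382587432861, 2.923412799835205,
             6.77296781539917, 10.154182195663452],
    'meuse': [0.6730570793151855, 1.1384694576263428, 2.5448386669158936,
              4.807579040527344, 17.95725107192993],
}

def slowdowns(times):
    """Return each fit time divided by the first (loosest-tolerance) time."""
    base = times[0]
    return [t / base for t in times]

for name, times in fit_times.items():
    for tol, ratio in zip(tolerances, slowdowns(times)):
        print(f'{name:5s} tol={tol:g} slowdown={ratio:.1f}x')
```

In the live notebook the same ratios could be computed from `fit_results` directly, since each entry stores `(model, tol, elapse)`.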


In [5]:
for name, results in fit_results.items():
    for model, tol, elapse in results:
        dists = [
            ('length', model.marginal_length_),
            ('noise_ratio', model.marginal_noise_ratio_),
            ('sigma2_signal', model.marginal_sigma2_signal_),
        ]
        print('****', name, tol, elapse, model.hyperparameter_matrix_.shape[1])
        for dist_name, dist in dists:
            print(dist_name, dist.ppf(0.25), dist.ppf(0.5), dist.ppf(0.75))

**** soil 0.01 1.3517811298370361 249
length 41.998093253345615 63.50132596095646 106.92075333884189
noise_ratio 0.3149327989356085 0.4454634342072269 0.5974938744494842
sigma2_signal 0.19873236963506982 0.24525395532228778 0.3140322430392845
**** soil 0.001 1.1853382587432861 252
length 42.444538163517805 63.14054885630358 105.13180234376262
noise_ratio 0.31537265767185574 0.4446500087580262 0.6011890086055984
sigma2_signal 0.1987245548748764 0.24523423729608484 0.31397788311776165
**** soil 0.0001 2.923412799835205 798
length 42.88314552314327 62.54595619234351 104.5599586217665
noise_ratio 0.3162247107629836 0.44278452724756057 0.6064441029575582
sigma2_signal 0.197972176989152 0.24538997805604332 0.316094663971309
**** soil 1e-05 6.77296781539917 2218
length 42.882764814085505 62.544236520569335 104.56607630933216
noise_ratio 0.31622604100567675 0.44278515712601646 0.6064469957941565
sigma2_signal 0.19794915138184405 0.2452509104807488 0.31673048565073836
**** soil 1e-06 10.1541821