This notebook shows how to use the general interface in HiGP for regression problems.

In this test, we will use the 3D Road dataset at "https://archive.ics.uci.edu/ml/machine-learning-databases/00246/3D_spatial_network.txt". 

For demonstration, we randomly sample 30,000 points for training and 100 points for testing.
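
A minimal sketch of the sampling step, using small illustrative sizes and a seeded NumPy `Generator` (the seed and the sizes are assumptions for this example, not part of the notebook). Drawing all indices in a single call without replacement and then splitting them guarantees that the training and test sets do not overlap.

```python
import numpy as np

# Illustrative sizes; the notebook uses n_train = 30000 and n_test = 100.
n_all, n_train, n_test = 1000, 300, 10

# A seeded generator makes the split reproducible (the seed is an arbitrary choice).
rng = np.random.default_rng(0)

# One draw without replacement, then split: the two index sets cannot overlap.
sample = rng.choice(n_all, size=n_train + n_test, replace=False)
train_idx, test_idx = sample[:n_train], sample[n_train:]

print(len(train_idx), len(test_idx))
print(np.intersect1d(train_idx, test_idx).size)  # number of shared indices
```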

In [1]:
import higp
import numpy as np
import pandas as pd
import math
import matplotlib.pyplot as plt
import torch
%matplotlib inline

Since this is a large (> 20000 training points) and low-dimensional (2D or 3D) data set, HiGP can use the $\mathcal{H}^2$ matrix for faster calculation. The $\mathcal{H}^2$ matrix requires a higher working precision, so we use float64 instead of float32 in this example.
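
As a rough illustration of the precision gap between the two types (this only compares the floating-point formats and does not model anything inside HiGP), the sketch below prints the machine epsilon of each and shows a small perturbation that float64 keeps but float32 rounds away.

```python
import numpy as np

# Machine epsilon: the gap between 1.0 and the next representable number.
eps32 = np.finfo(np.float32).eps
eps64 = np.finfo(np.float64).eps
print("float32 eps:", eps32)
print("float64 eps:", eps64)

# A small perturbation that float64 keeps but float32 rounds away.
delta = 1e-10
lost_in_32 = np.float32(1.0) + np.float32(delta) == np.float32(1.0)
lost_in_64 = np.float64(1.0) + np.float64(delta) == np.float64(1.0)
print(lost_in_32)  # True
print(lost_in_64)  # False
```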

In [2]:
np_dtype = np.float64

Download the dataset, scale the features to $[-1, 1]$, and normalize the labels.

In [3]:
df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00246/3D_spatial_network.txt", sep=',', header=None)
data_array = df.values[:, 1:]
all_x = data_array[:, 1:3]
all_y = data_array[:, -1]

# Scale features and normalize labels
all_x_max = np.max(all_x, 0)
all_x_min = np.min(all_x, 0)
all_x = 2.0 * (all_x - all_x_min[np.newaxis, :]) / (all_x_max[np.newaxis, :] - all_x_min[np.newaxis, :]) - 1.0
all_y = (all_y - np.mean(all_y)) / np.std(all_y)

# Randomly select a subset of the data
n_all = all_x.shape[0]
n_train = 30000
n_test = 100
n_sample = n_train + n_test

sample_array = np.random.choice(n_all, n_sample, replace = False)

train_x = np.ascontiguousarray(all_x[sample_array[:n_train], :].T).astype(np_dtype)
train_y = np.ascontiguousarray(all_y[sample_array[:n_train]]).astype(np_dtype)
test_x = np.ascontiguousarray(all_x[sample_array[n_train:], :].T).astype(np_dtype)
test_y = np.ascontiguousarray(all_y[sample_array[n_train:]]).astype(np_dtype)

Remember to use NumPy's `ascontiguousarray()` function to guarantee that `train_x`, `train_y`, `test_x`, and `test_y` are stored contiguously.
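
To see why the explicit call matters, here is a small sketch on a made-up array: transposing a NumPy array returns a view with swapped strides, which is no longer C-contiguous until it is copied.

```python
import numpy as np

# Rows are data points here, as in all_x (shape: n_points x n_features).
a = np.arange(12, dtype=np.float64).reshape(6, 2)

# Transposing returns a view with swapped strides, not a C-contiguous array.
a_t = a.T
print(a_t.flags['C_CONTIGUOUS'])   # False

# ascontiguousarray copies the data into row-major (C) order.
a_c = np.ascontiguousarray(a_t)
print(a_c.shape, a_c.flags['C_CONTIGUOUS'])  # (2, 6) True
```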

Now let's check the shapes of these four arrays. We can see that each data point is stored in one column in `train_x` and `test_x`.

In [4]:
print("Shape of train_x : ", train_x.shape)
print("Shape of train_y : ", train_y.shape)
print("Shape of test_x : ", test_x.shape)
print("Shape of test_y : ", test_y.shape)

Shape of train_x :  (2, 30000)
Shape of train_y :  (30000,)
Shape of test_x :  (2, 100)
Shape of test_y :  (100,)
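
The preprocessing can be sanity-checked in the same spirit. The sketch below applies the same min-max scaling and label standardization to synthetic data (the random arrays stand in for the real dataset) and confirms the resulting ranges.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-50.0, 120.0, size=(500, 2))   # synthetic features
y = rng.normal(3.0, 7.0, size=500)             # synthetic labels

# Min-max scale each feature column to [-1, 1] (broadcasting over rows).
x_min, x_max = x.min(axis=0), x.max(axis=0)
x_scaled = 2.0 * (x - x_min) / (x_max - x_min) - 1.0

# Standardize labels to zero mean and unit standard deviation.
y_norm = (y - y.mean()) / y.std()

print(x_scaled.min(axis=0), x_scaled.max(axis=0))
print(y_norm.mean(), y_norm.std())
```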


Create a GP regression problem, a model, and a PyTorch Adam optimizer.

By default, HiGP uses the $\mathcal{H}^2$ matrix if possible (`mvtype = 0`). If you want to disable the use of the $\mathcal{H}^2$ matrix, set `mvtype = 1`. For more information about the parameter `mvtype`, please refer to the user manual. 

In [5]:
torch_dtype = torch.float32 if np_dtype == np.float32 else torch.float64
gprproblem = higp.gprproblem.setup(data = train_x, label = train_y, kernel_type = 1, mvtype = 0)
model = higp.GPRModel(gprproblem, dtype = torch_dtype)
optimizer = torch.optim.Adam(model.parameters(), lr = 0.05)
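
For a sense of scale, here is a back-of-the-envelope estimate of what storing an explicit dense $n \times n$ kernel matrix would cost at this training-set size. It only counts matrix entries and says nothing about how HiGP stores or applies the matrix internally; the helper name `dense_kernel_bytes` is made up for this example.

```python
# Bytes needed to store a dense n x n kernel matrix.
def dense_kernel_bytes(n, bytes_per_entry):
    return n * n * bytes_per_entry

n_train = 30000
for name, size in (("float32", 4), ("float64", 8)):
    gib = dense_kernel_bytes(n_train, size) / 2**30
    print(f"{name}: {gib:.2f} GiB")
# float32: 3.35 GiB
# float64: 6.71 GiB
```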

Run 20 steps of Adam.

In [6]:
loss_history, param_history = higp.gpr_torch_minimize(model, optimizer, maxits = 20, print_info = True)

Iteration (max 20), Elapsed time (sec), Loss, Hyperparameters (l, s, f, before nnt)
1, 4.91, 0.73834, 0.050, -0.050, 0.050
2, 9.73, 0.72011, 0.100, -0.100, 0.100
3, 14.51, 0.70171, 0.149, -0.150, 0.150
4, 19.19, 0.68327, 0.198, -0.198, 0.199
5, 23.87, 0.66518, 0.246, -0.236, 0.248
6, 28.55, 0.65058, 0.292, -0.278, 0.297
7, 33.30, 0.63468, 0.338, -0.318, 0.345
8, 37.95, 0.61919, 0.382, -0.351, 0.393
9, 42.80, 0.60648, 0.425, -0.377, 0.441
10, 47.70, 0.59638, 0.466, -0.401, 0.488
11, 52.59, 0.58670, 0.506, -0.427, 0.534
12, 57.16, 0.57644, 0.544, -0.454, 0.580
13, 62.03, 0.56583, 0.580, -0.481, 0.625
14, 66.82, 0.55535, 0.615, -0.507, 0.669
15, 71.64, 0.54492, 0.648, -0.534, 0.713
16, 76.44, 0.53407, 0.679, -0.562, 0.756
17, 81.37, 0.52286, 0.709, -0.593, 0.799
18, 86.08, 0.51017, 0.738, -0.625, 0.840
19, 90.77, 0.49736, 0.765, -0.658, 0.881
20, 95.63, 0.48369, 0.792, -0.692, 0.922
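
In the first rows of the log, each hyperparameter changes by about 0.05 per iteration, which is the learning rate. This is consistent with how Adam behaves in its first steps: the update is normalized by the gradient magnitude, so each coordinate moves by roughly `lr` regardless of gradient scale. Below is a minimal NumPy sketch of Adam with PyTorch's default `betas` and `eps`, applied to a made-up constant gradient and assuming the parameters start from zero. The helper `adam_steps` and the toy gradient are illustrative and are not HiGP code.

```python
import numpy as np

def adam_steps(grad_fn, x0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, n_steps=3):
    """Plain Adam with bias correction; returns the list of iterates."""
    x = np.array(x0, dtype=np.float64)
    m = np.zeros_like(x)   # first-moment estimate
    v = np.zeros_like(x)   # second-moment estimate
    history = []
    for t in range(1, n_steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
        history.append(x.copy())
    return history

# Toy gradient with very different magnitudes and signs per coordinate.
def toy_grad(x):
    return np.array([-300.0, 2.0, -0.01])

history = adam_steps(toy_grad, [0.0, 0.0, 0.0])
for step, x in enumerate(history, start=1):
    print(step, np.round(x, 3))
```

Despite gradients that differ by more than four orders of magnitude, every coordinate moves by about 0.05 per step, in the direction opposite to its gradient.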


Run predictions with the initial parameters and the trained parameters.

In [7]:
Pred0 = higp.gpr_prediction(data_train = train_x,
                            label_train = train_y,
                            data_prediction = test_x,
                            kernel_type = 1,
                            pyparams = np.hstack((0.0, 0.0, 0.0)))

Pred = higp.gpr_prediction(data_train = train_x,
                           label_train = train_y,
                           data_prediction = test_x,
                           kernel_type = 1,
                           pyparams = model.get_params())

Finally, let us check the root mean squared error (RMSE) of the prediction.

In [8]:
rmse0 = np.linalg.norm(Pred0[0] - test_y) / np.sqrt(float(n_test))
rmse = np.linalg.norm(Pred[0] - test_y) / np.sqrt(float(n_test))
print("RMSE (before training): %g, RMSE (after training): %g\n" % (rmse0, rmse))

RMSE (before training): 0.0178134, RMSE (after training): 0.00791506
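
The expression used above, $\lVert \hat{y} - y \rVert_2 / \sqrt{n}$, is the same quantity as the square root of the mean squared error. A small sketch on made-up numbers (the `rmse` helper is illustrative, not part of HiGP):

```python
import numpy as np

def rmse(pred, target):
    """Root mean squared error between two 1-D arrays."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return np.sqrt(np.mean((pred - target) ** 2))

pred = np.array([1.0, 2.0, 3.0, 4.0])
target = np.array([1.0, 2.0, 3.0, 6.0])

# Same value as ||pred - target||_2 / sqrt(n), the form used in the notebook.
print(rmse(pred, target))                                    # 1.0
print(np.linalg.norm(pred - target) / np.sqrt(len(target)))  # 1.0
```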

