# Phase Features with ridge regression

In this tutorial, we demonstrate the use of phase features in distribution regression on the aerosol dataset, compare them with the standard Gaussian kernel (Fourier features approach), and examine a scenario of covariate shift. The parameters for this particular dataset and set of frequencies were chosen beforehand by cross-validation.

For this tutorial, we will make use of the kerpy package from https://github.com/oxmlcs/kerpy.

This is tested on Python 2.7.

In [1020]:
from __future__ import print_function
# Make sure these are on path
import numpy as np
from kerpy.GaussianKernel import GaussianKernel
from kerpy.LinearKernel import LinearKernel
from kerpy.LinearBagKernel import LinearBagKernel
from sklearn import preprocessing

from SymInvBagKernel import SymInvBagKernel
import aux_fct

Specify the path to the data and load the aerosol dataset (MISR1). One can also normalise the data (and the labels), though the results do not seem to be sensitive to this.

In [1021]:
path = '/Users/hochunglaw/Desktop/Fourier-Phase-Neural-Network/data' #Change path 
misr_data_x, misr_data_y = aux_fct.load_data(path, random = True) 
print('Data shape:', misr_data_x.shape)
# Train Set
train_x = misr_data_x[:640]
train_y = misr_data_y[:640]
# Test Set
test_x = misr_data_x[640:]
test_y = misr_data_y[640:]
variance = np.var(np.concatenate(misr_data_x), axis = 0) # For calculating signal to noise ratio
data_dim = misr_data_x.shape[2]
mean_predict_t = np.linalg.norm(test_y - np.mean(train_y))**2/len(test_y) #Mean prediction

Data shape: (800, 100, 16)


Calculate the bandwidth for the phase features kernel. A larger scale gives a model that is more robust to covariate shift, so when tuning one should choose the largest scale that keeps the error within an acceptable range. Here we use the same bandwidth and regularisation parameter for both kernels, to make the comparison fairer. In general, we expect phase features to perform better on lower-dimensional datasets, and they are more sensitive to the choice of frequencies than Fourier features.

In [1022]:
bandwidth_scale = 2.0
bandwidth = np.sqrt(aux_fct.median_sqdist(train_x) / 2) * bandwidth_scale # Calculate bandwidth
print('Bandwidth:', bandwidth)

Bandwidth: 592.091937266
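
The bandwidth above follows the usual median heuristic. Since `aux_fct.median_sqdist` is not shown in this tutorial, the sketch below is only a guess at what such a helper might do: it pools the samples from all bags, optionally subsamples them, and takes the median squared Euclidean distance over distinct pairs. The names `median_sqdist_sketch` and `bandwidth_from_median`, and the subsampling step, are hypothetical.

```python
import numpy as np

def median_sqdist_sketch(bags, max_points=1000, seed=0):
    """Median of squared Euclidean distances between pooled bag samples.

    Hypothetical stand-in for aux_fct.median_sqdist, whose exact behaviour
    (e.g. any subsampling) is not shown in this tutorial.
    """
    pooled = np.concatenate(bags)              # (n_bags * bag_size, dim)
    rng = np.random.RandomState(seed)
    if pooled.shape[0] > max_points:           # subsample to keep this cheap
        idx = rng.choice(pooled.shape[0], max_points, replace=False)
        pooled = pooled[idx]
    sq_norms = np.sum(pooled ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * pooled.dot(pooled.T)
    upper = sq_dists[np.triu_indices_from(sq_dists, k=1)]  # distinct pairs only
    return np.median(np.maximum(upper, 0.0))   # clip tiny negative round-off

def bandwidth_from_median(bags, scale=1.0):
    # Same formula as the cell above: sqrt(median / 2) * scale
    return np.sqrt(median_sqdist_sketch(bags) / 2.0) * scale

# Toy check: one bag holding the points (0, 0) and (3, 4)
toy_bags = np.array([[[0.0, 0.0], [3.0, 4.0]]])
toy_bandwidth = bandwidth_from_median(toy_bags, scale=2.0)
```

With the real data one would call `bandwidth_from_median(train_x, scale=bandwidth_scale)`; the value may differ from the printed one if the actual helper subsamples differently.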


Set up the kernel using the kerpy kernel class structure:

In [1023]:
data_phase_kernel = GaussianKernel(sigma=float(bandwidth))
phase_kernel = SymInvBagKernel(data_phase_kernel) #Defines the phase features kernel, normalising the fourier features
np.random.seed(23)
# Generate the frequencies from a normal distribution using the bandwidth. They are printed here and are identical to those printed for the Gaussian kernel below.
phase_kernel.rff_generate(mdata= 250, dim= data_dim)

[[  1.12649407e-03   4.35964070e-05  -1.31334234e-03 ...,   1.01914805e-04
   -1.76231078e-03  -1.70571800e-03]
 [  7.46060442e-04   1.90659048e-03  -3.10436210e-03 ...,  -2.06598789e-03
    2.39037199e-03   7.73040386e-04]
 [  1.23101802e-03   3.32454237e-03  -9.25173902e-04 ...,  -7.67407436e-05
    1.75798032e-03  -1.58817793e-04]
 ..., 
 [ -9.96534918e-05   1.52940940e-04  -3.30935054e-03 ...,  -3.56494600e-04
   -3.47245176e-03   1.71365775e-03]
 [ -4.21688813e-04  -1.98924139e-03   4.68809121e-04 ...,   1.75583922e-03
    1.72603469e-03   2.85188958e-03]
 [  4.57280433e-04   3.07467721e-03  -5.81815095e-04 ...,   2.64615130e-05
    2.90672289e-04   1.75453416e-04]]


Construct explicit feature maps for bags. 

In [1024]:
train_phase_means = phase_kernel.rff_expand(train_x)
test_phase_means = phase_kernel.rff_expand(test_x)
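
As a rough illustration of what these explicit feature maps compute, the sketch below averages random Fourier features over a bag and then divides each (cos, sin) pair by its modulus, which is the normalisation the comment on `SymInvBagKernel` refers to. The function names, the `eps` guard, the frequency scale and the absence of any overall scaling constant are assumptions made for illustration; kerpy and `SymInvBagKernel` may differ in those details.

```python
import numpy as np

def fourier_mean_embedding(bag, freqs):
    """Average random Fourier features over one bag.

    bag:   (bag_size, dim) samples; freqs: (n_freq, dim) frequencies.
    Returns the real and imaginary parts of the empirical characteristic
    function of the bag at each frequency.
    """
    proj = bag.dot(freqs.T)                      # (bag_size, n_freq)
    return np.cos(proj).mean(axis=0), np.sin(proj).mean(axis=0)

def phase_normalise(re, im, eps=1e-12):
    """Divide each (cos, sin) pair by its modulus, keeping only the phase."""
    modulus = np.sqrt(re ** 2 + im ** 2) + eps   # eps guards against 0/0
    return np.concatenate([re / modulus, im / modulus])

def phase_features(bag, freqs):
    re, im = fourier_mean_embedding(bag, freqs)
    return phase_normalise(re, im)

rng = np.random.RandomState(0)
sigma = 2.0                                          # illustrative bandwidth
freqs = rng.normal(scale=1.0 / sigma, size=(5, 3))   # assumed N(0, 1/sigma^2) draws
bag = rng.normal(size=(100, 3))
feats = phase_features(bag, freqs)                   # length 2 * n_freq
```

Because only the phase is kept, multiplying the averaged Fourier features by any positive constant leaves `phase_normalise` unchanged, which is the property the noise experiment later in the tutorial relies on.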

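Having constructed an explicit feature map for each bag as above, we can now apply any standard regression method to the resulting feature vectors (whose length is set by the number of frequencies rather than by the 16 input dimensions). Here we use ridge regression (with a linear kernel).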

In [1025]:
lmbda = 0.001 #Ridge regularisation parameter
ridge_kernel = LinearKernel()
obj_phase, t_predict, t_phase_mse = ridge_kernel.ridge_regress(train_phase_means, train_y, lmbda, test_phase_means, test_y)
print('Mean RMSE:', np.sqrt(mean_predict_t))
print('Phase RMSE:', np.sqrt(t_phase_mse))

Mean RMSE: 0.162785267464
Phase RMSE: 0.0693909669971
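
For reference, ridge regression with a linear kernel can be written in dual form. The sketch below is a plain NumPy version under the assumption that the regulariser enters as `K + lmbda * I`; the exact scaling used by kerpy's `ridge_regress` is not shown here and may differ, and the function names are hypothetical.

```python
import numpy as np

def linear_ridge_fit(train_feats, train_y, lmbda):
    """Dual ridge coefficients alpha = (K + lmbda * I)^{-1} y, with K = X X^T."""
    K = train_feats.dot(train_feats.T)
    return np.linalg.solve(K + lmbda * np.eye(K.shape[0]), train_y)

def linear_ridge_predict(alpha, train_feats, test_feats):
    """Predict via alpha^T K(train, test)."""
    return alpha.T.dot(train_feats.dot(test_feats.T))

# Toy data with a known linear relationship (not the aerosol data)
rng = np.random.RandomState(1)
X_train, X_test = rng.normal(size=(30, 4)), rng.normal(size=(10, 4))
w_true = np.array([1.0, -2.0, 0.5, 0.0])
y_train = X_train.dot(w_true)

alpha = linear_ridge_fit(X_train, y_train, lmbda=1e-3)
y_pred = linear_ridge_predict(alpha, X_train, X_test)
```

The dual form only needs inner products between feature vectors, so its cost depends on the number of bags; with many bags and short feature vectors the equivalent primal solve would be cheaper.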


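Now we do the same for the Gaussian kernel, using the same set of frequencies. Note that the LinearBagKernel of the data_kernel provides the mean embedding of the Gaussian feature map for each bag, as described in the comment at the top of the cell.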

In [1026]:
#This computes the mean embedding (average) of the explicit feature map of the data_kernel (Gaussian), 
#effectively providing us a gaussian bag kernel. 
data_gauss_kernel = GaussianKernel(sigma=float(bandwidth))
gauss_kernel = LinearBagKernel(data_gauss_kernel)
np.random.seed(23)
gauss_kernel.rff_generate(mdata= 250, dim= data_dim)
train_gauss_means = gauss_kernel.rff_expand(train_x)
test_gauss_means = gauss_kernel.rff_expand(test_x)

lmbda = 0.001 #Ridge regularisation parameter
ridge_kernel = LinearKernel()
obj_gauss, t_predict, t_gauss_mse = ridge_kernel.ridge_regress(train_gauss_means, train_y, lmbda, test_gauss_means, test_y)
print('Mean RMSE:', np.sqrt(mean_predict_t))
print('Fourier (Gaussian) RMSE:', np.sqrt(t_gauss_mse))


[[  1.12649407e-03   4.35964070e-05  -1.31334234e-03 ...,   1.01914805e-04
   -1.76231078e-03  -1.70571800e-03]
 [  7.46060442e-04   1.90659048e-03  -3.10436210e-03 ...,  -2.06598789e-03
    2.39037199e-03   7.73040386e-04]
 [  1.23101802e-03   3.32454237e-03  -9.25173902e-04 ...,  -7.67407436e-05
    1.75798032e-03  -1.58817793e-04]
 ..., 
 [ -9.96534918e-05   1.52940940e-04  -3.30935054e-03 ...,  -3.56494600e-04
   -3.47245176e-03   1.71365775e-03]
 [ -4.21688813e-04  -1.98924139e-03   4.68809121e-04 ...,   1.75583922e-03
    1.72603469e-03   2.85188958e-03]
 [  4.57280433e-04   3.07467721e-03  -5.81815095e-04 ...,   2.64615130e-05
    2.90672289e-04   1.75453416e-04]]
Mean RMSE: 0.162785267464
Fourier (Gaussian) RMSE: 0.0749136861309


We see that the Fourier features (Gaussian kernel on bags) and phase features perform similarly in this case. Now we demonstrate that phase features are invariant to Symmetric Positive Definite (SPD) noise, by artificially creating a scenario of covariate shift, where we add noise __only__ to the test set.
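
To see why an invariance of this kind is plausible, recall that the characteristic function of a sum of independent random vectors factorises. Writing $\varphi_X$ for the characteristic function of the clean data and $\varphi_E$ for that of independent noise (notation introduced here for illustration only), and assuming $\varphi_E(\omega) > 0$:

$$
\varphi_{X+E}(\omega) = \varphi_X(\omega)\,\varphi_E(\omega),
\qquad
\frac{\varphi_{X+E}(\omega)}{\left|\varphi_{X+E}(\omega)\right|}
= \frac{\varphi_X(\omega)\,\varphi_E(\omega)}{\left|\varphi_X(\omega)\right|\,\varphi_E(\omega)}
= \frac{\varphi_X(\omega)}{\left|\varphi_X(\omega)\right|} .
$$

For zero-mean Gaussian noise with covariance $\Sigma$, $\varphi_E(\omega) = \exp\left(-\tfrac{1}{2}\,\omega^\top \Sigma\,\omega\right) > 0$, so the condition holds. With finite bags the characteristic function is only estimated from samples, so in practice the invariance is approximate rather than exact.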

In [1027]:
noise_ratio_t = 3.0 # A very high noise-to-signal ratio.
test_x_noisy = np.zeros((160, 100, 16))
latent_t = noise_ratio_t * variance * np.random.uniform(low = 0.0, high = 1.0, size = (160, 1, 16))
for i in range(160):
    for k in range(16):
        test_x_noisy[i,:,k] = test_x[i,:,k] + np.sqrt(latent_t[i,:,k]) * np.random.normal(size = (100))
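
The same kind of noise can be generated without explicit loops by relying on NumPy broadcasting. The sketch below is an illustrative alternative, not the code used to produce the outputs in this tutorial: the helper name is hypothetical, and because the random numbers are drawn in a different order it will not reproduce the exact noisy values shown.

```python
import numpy as np

def add_bagwise_gaussian_noise(bags, variance, noise_ratio, rng):
    """Add zero-mean Gaussian noise with one random variance per bag and dimension.

    bags: (n_bags, bag_size, dim); variance: (dim,) per-dimension data variance.
    """
    n_bags, _, dim = bags.shape
    # Noise variance is uniform on [0, noise_ratio * variance], shared within a bag
    latent = noise_ratio * variance * rng.uniform(0.0, 1.0, size=(n_bags, 1, dim))
    # (n_bags, 1, dim) broadcasts against (n_bags, bag_size, dim)
    return bags + np.sqrt(latent) * rng.normal(size=bags.shape)

rng = np.random.RandomState(23)
toy_bags = rng.normal(size=(4, 50, 3))
toy_var = np.var(np.concatenate(toy_bags), axis=0)
toy_noisy = add_bagwise_gaussian_noise(toy_bags, toy_var, 3.0, rng)
```

Taking the shapes from the input array also avoids hard-coding the number of bags, the bag size and the dimension.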

Observe that the noisy features are very different from the originals (shown for the first sample in the first bag).

In [1028]:
print('Original: First sample in first bag\n',test_x[0][0])
print('Noisy: First sample in first bag\n', test_x_noisy[0][0])

Original: First sample in first bag
 [ 372.33903211  273.97752857  222.03223966  206.53149968  347.60454266
  265.22960477  237.22911295  193.27649299  421.43038523  274.16365461
  232.19638219  194.04713291   21.39239526   38.21109247  116.34367275
  130.11828423]
Noisy: First sample in first bag
 [ 298.9780435   249.19810836  192.99434325  234.32955854  331.12292451
   71.75804117  253.09135085  340.17637478  332.77734323  279.17942752
  133.95235616  119.59060674   51.07726173   11.37218012  122.29624417
  126.7050332 ]


Obtain the RMSE on the noisy test set, using the models trained on the noise-free training set.

In [1029]:
test_noisy_gauss_means = gauss_kernel.rff_expand(test_x_noisy)
test_noisy_phase_means = phase_kernel.rff_expand(test_x_noisy)

ridge_kernel = LinearKernel()
t_noisy_phase_predict = np.dot(obj_phase.T,ridge_kernel.kernel(train_phase_means,test_noisy_phase_means))
t_noisy_gauss_predict = np.dot(obj_gauss.T,ridge_kernel.kernel(train_gauss_means,test_noisy_gauss_means))

print('Mean RMSE:', np.sqrt(mean_predict_t))
print('\nPHASE')
print('Phase RMSE:', np.sqrt(t_phase_mse))
print('Noisy Phase RMSE:', np.sqrt((np.linalg.norm(t_noisy_phase_predict.T - test_y)**2)/len(test_y)))
print('\nFOURIER')
print('Fourier (Gaussian) RMSE:', np.sqrt(t_gauss_mse))
print('Noisy Fourier (Gaussian) RMSE:', np.sqrt((np.linalg.norm(t_noisy_gauss_predict.T - test_y)**2)/len(test_y)))


Mean RMSE: 0.162785267464

PHASE
Phase RMSE: 0.0693909669971
Noisy Phase RMSE: 0.119631934894

FOURIER
Fourier (Gaussian) RMSE: 0.0749136861309
Noisy Fourier (Gaussian) RMSE: 0.142921231861
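
The inline RMSE expressions in the cell above rely on the predictions and labels having matching shapes; if one were `(n, 1)` and the other `(n,)`, NumPy broadcasting would silently produce an `n x n` difference. A small helper such as the hypothetical `rmse` below flattens both inputs first, and `relative_increase` (also hypothetical) summarises how much each method degrades under noise.

```python
import numpy as np

def rmse(predictions, targets):
    """Root mean squared error after flattening both inputs to 1-D."""
    predictions = np.asarray(predictions, dtype=float).ravel()
    targets = np.asarray(targets, dtype=float).ravel()
    if predictions.shape != targets.shape:
        raise ValueError('predictions and targets must have the same number of elements')
    return np.sqrt(np.mean((predictions - targets) ** 2))

def relative_increase(clean_rmse, noisy_rmse):
    """Fractional increase in RMSE when moving from clean to noisy test data."""
    return (noisy_rmse - clean_rmse) / clean_rmse

# RMSE values copied from the output above
phase_increase = relative_increase(0.0693909669971, 0.119631934894)
gauss_increase = relative_increase(0.0749136861309, 0.142921231861)
```

On the numbers printed above, the phase RMSE grows by roughly 72% and the Fourier RMSE by roughly 91%.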


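We see that the phase features, which are invariant to SPD noise (Gaussian noise in this case), do indeed degrade less under this shift: the noisy RMSE is 0.120 for phase features against 0.143 for Fourier features, with the mean-prediction baseline at 0.163.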