# Learning Phase Features with Neural Networks

In this tutorial, we demonstrate learnt phase features for distribution regression on the aerosol dataset and compare them with learnt Fourier features, both trained with a neural network approach, including in a situation of covariate shift. The parameters used below were chosen beforehand with cross-validation.

In [1]:
from __future__ import print_function

import numpy as np

import aux_fct
import phase_fourier_dr_nn

Load the data, split it into training and test sets, and initialise the frequencies at the right scale.

In [2]:
path = '/Users/Leon1/Desktop/Fourier-Phase-Neural-Network' #Change path 
# Load Dataset into features and labels
misr_data_x, misr_data_y = aux_fct.load_data(path, random = True)
variance = np.var(np.concatenate(misr_data_x), axis = 0) # For calculating signal to noise ratio

x_train = misr_data_x[:640]
y_train = misr_data_y[:640]
x_test = misr_data_x[640:]
y_test = misr_data_y[640:]

bandwidth = np.sqrt(aux_fct.median_sqdist(x_train) / 2) # Calculate bandwidth
init_sd = 1.0/bandwidth
baseline = np.mean(np.square(y_test - np.mean(y_train)))
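The helper `aux_fct.median_sqdist` is not shown in this notebook. As a rough sketch of what a median heuristic of this kind typically computes, the snippet below pools points from several bags, takes the median squared pairwise distance, and converts it into a bandwidth and a frequency scale in the same way as the cell above. The function name `median_sqdist_sketch`, the subsampling size, and the synthetic data are assumptions made for illustration, not the actual implementation.

```python
import numpy as np

def median_sqdist_sketch(bags, max_points=500, seed=0):
    """Median squared Euclidean distance between pooled points (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    pooled = np.concatenate(bags)  # stack all bags into (total_points, d)
    if pooled.shape[0] > max_points:
        # Subsample so the pairwise distance matrix stays small
        idx = rng.choice(pooled.shape[0], size=max_points, replace=False)
        pooled = pooled[idx]
    sq_norms = np.sum(pooled ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * pooled.dot(pooled.T)
    # Keep each distinct pair once and clip tiny negative rounding errors
    upper = sq_dists[np.triu_indices(pooled.shape[0], k=1)]
    return np.median(np.maximum(upper, 0.0))

# Synthetic stand-in for the aerosol bags: 8 bags of 100 points in 16 dimensions
toy_rng = np.random.RandomState(1)
toy_bags = toy_rng.normal(size=(8, 100, 16))
toy_bandwidth = np.sqrt(median_sqdist_sketch(toy_bags) / 2)
toy_init_sd = 1.0 / toy_bandwidth  # standard deviation used to initialise frequencies
```

Initialising the frequencies with standard deviation `1.0 / bandwidth` corresponds to sampling random Fourier frequencies for a Gaussian kernel with that bandwidth.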

Network Parameters:

In [3]:
learning_rate = 0.3
reg_1 = 10.0  # L2 Regularisation for frequencies layer 
reg_2 = 0.01 # L2 Regularisation for output layer
n_freq = 60 # Number of frequencies to use
batch_size = 20
no_epochs = 100
n_cpu = 1 # Number of CPUs available

Run Fourier Neural Network.

In [4]:
fourier_accuracy = phase_fourier_dr_nn.phase_fourier_nn(x_train, y_train, x_test, y_test, n_freq, learning_rate, reg_1, reg_2, batch_size, no_epochs, 'Fourier', init_sd, n_cpu)

('Epoch:', '0005', 'cost=', '554.739651')
('Epoch:', '0010', 'cost=', '12.8426197')
('Epoch:', '0015', 'cost=', '2.66595643')
('Epoch:', '0020', 'cost=', '1.09130356')
('Epoch:', '0025', 'cost=', '0.39939132')
('Epoch:', '0030', 'cost=', '0.153120656')
('Epoch:', '0035', 'cost=', '0.0912785875')
('Epoch:', '0040', 'cost=', '0.0484081809')
('Epoch:', '0045', 'cost=', '0.0256928517')
('Epoch:', '0050', 'cost=', '0.0162742692')
('Epoch:', '0055', 'cost=', '0.0193037217')
('Epoch:', '0060', 'cost=', '0.0169599625')
('Epoch:', '0065', 'cost=', '0.0132669617')
('Epoch:', '0070', 'cost=', '0.0130852022')
('Epoch:', '0075', 'cost=', '0.0124178923')
('Epoch:', '0080', 'cost=', '0.0130633435')
('Epoch:', '0085', 'cost=', '0.013719943')
('Epoch:', '0090', 'cost=', '0.0142125989')
('Epoch:', '0095', 'cost=', '0.015813382')
('Epoch:', '0100', 'cost=', '0.0172238293')
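The internals of `phase_fourier_nn` are not shown here. As a hedged illustration of the 'Fourier' setting, the sketch below assumes that each bag of points is summarised by averaging cosine and sine features over its points for a set of frequencies (the real and imaginary parts of the empirical characteristic function). The function name `fourier_bag_features` and the synthetic inputs are hypothetical.

```python
import numpy as np

def fourier_bag_features(bags, freqs):
    """Average cos/sin features over each bag (illustrative sketch).

    bags:  array of shape (n_bags, n_points, d)
    freqs: array of shape (d, n_freq)
    Returns an array of shape (n_bags, 2 * n_freq).
    """
    proj = np.einsum('bnd,dm->bnm', bags, freqs)  # inner product of every point with every frequency
    real = np.cos(proj).mean(axis=1)              # real part of the empirical characteristic function
    imag = np.sin(proj).mean(axis=1)              # imaginary part
    return np.concatenate([real, imag], axis=1)

rng = np.random.RandomState(0)
toy_bags = rng.normal(size=(5, 100, 16))           # 5 synthetic bags
toy_freqs = rng.normal(scale=0.25, size=(16, 60))  # 60 frequencies, as n_freq above
toy_fourier = fourier_bag_features(toy_bags, toy_freqs)
```

In a network of this kind, `freqs` would be the trainable first layer and a linear output layer would regress the label on the bag features.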


Run Phase Neural Network.

In [5]:
phase_accuracy = phase_fourier_dr_nn.phase_fourier_nn(x_train, y_train, x_test, y_test, n_freq, learning_rate, reg_1, reg_2, batch_size, no_epochs, 'Phase', init_sd, n_cpu)

('Epoch:', '0005', 'cost=', '1455.6421')
('Epoch:', '0010', 'cost=', '128.537295')
('Epoch:', '0015', 'cost=', '18.2254402')
('Epoch:', '0020', 'cost=', '3.61193501')
('Epoch:', '0025', 'cost=', '1.1174812')
('Epoch:', '0030', 'cost=', '0.429399093')
('Epoch:', '0035', 'cost=', '0.225146125')
('Epoch:', '0040', 'cost=', '0.1329212')
('Epoch:', '0045', 'cost=', '0.0996745143')
('Epoch:', '0050', 'cost=', '0.0825198765')
('Epoch:', '0055', 'cost=', '0.0755586256')
('Epoch:', '0060', 'cost=', '0.0704775527')
('Epoch:', '0065', 'cost=', '0.0694013226')
('Epoch:', '0070', 'cost=', '0.0689869891')
('Epoch:', '0075', 'cost=', '0.068671187')
('Epoch:', '0080', 'cost=', '0.0667706945')
('Epoch:', '0085', 'cost=', '0.0672347206')
('Epoch:', '0090', 'cost=', '0.0681778083')
('Epoch:', '0095', 'cost=', '0.0686174606')
('Epoch:', '0100', 'cost=', '0.067340245')
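For comparison, a phase feature keeps only the direction of the empirical characteristic function at each frequency and discards its modulus. The sketch below assumes that definition; `phase_bag_features`, the small constant `eps`, and the synthetic data are illustrative choices rather than the code inside `phase_fourier_nn`.

```python
import numpy as np

def phase_bag_features(bags, freqs, eps=1e-12):
    """Empirical characteristic function divided by its modulus (illustrative sketch).

    bags:  array of shape (n_bags, n_points, d)
    freqs: array of shape (d, n_freq)
    Returns an array of shape (n_bags, 2 * n_freq).
    """
    proj = np.einsum('bnd,dm->bnm', bags, freqs)
    real = np.cos(proj).mean(axis=1)
    imag = np.sin(proj).mean(axis=1)
    modulus = np.sqrt(real ** 2 + imag ** 2) + eps  # eps guards against division by zero
    return np.concatenate([real / modulus, imag / modulus], axis=1)

rng = np.random.RandomState(0)
toy_bags = rng.normal(size=(5, 100, 16))
toy_freqs = rng.normal(scale=0.25, size=(16, 60))
toy_phase = phase_bag_features(toy_bags, toy_freqs)

# Each (real, imaginary) pair now lies on the unit circle
pair_norms = toy_phase[:, :60] ** 2 + toy_phase[:, 60:] ** 2
```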


Results: here the neural network approach performs slightly worse than the ridge regression, likely because the dataset is small and training may be difficult.

In [6]:
print('Mean RMSE', np.sqrt(baseline))
print('Fourier Features NN RMSE', np.sqrt(fourier_accuracy))
print('Phase Features NN RMSE', np.sqrt(phase_accuracy))

Mean RMSE 0.162785267464
Fourier Features NN RMSE 0.0914625
Phase Features NN RMSE 0.0872548
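To put these numbers on a common scale, one option is to express each RMSE as a fraction of the mean-predictor baseline. The snippet below does this with the values printed above; the variable names are introduced here for illustration only.

```python
# RMSE values copied from the output above
baseline_rmse = 0.162785267464
fourier_rmse = 0.0914625
phase_rmse = 0.0872548

# Fraction of the baseline error that remains after fitting each model
fourier_ratio = fourier_rmse / baseline_rmse
phase_ratio = phase_rmse / baseline_rmse

print('Fourier RMSE as a fraction of baseline: %.3f' % fourier_ratio)
print('Phase RMSE as a fraction of baseline: %.3f' % phase_ratio)
```

Both networks leave a little over half of the baseline RMSE, with phase features slightly ahead on the clean test set.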


Now we add noise to the test set __only__, to simulate a scenario of covariate shift.

In [None]:
noise_ratio_t = 1.0 # This is a very high noise-to-signal setting.
# Noise variance per bag and per feature, drawn uniformly in [0, noise_ratio_t * variance]
latent_t = noise_ratio_t * variance * np.random.uniform(low = 0.0, high = 1.0, size = (x_test.shape[0], 1, x_test.shape[2]))
# Add zero-mean Gaussian noise to every point; latent_t broadcasts over the points in each bag
test_x_noisy = x_test + np.sqrt(latent_t) * np.random.normal(size = x_test.shape)
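Because the noise variance for each bag and feature is drawn uniformly between zero and `noise_ratio_t` times the pooled feature variance, the noise-to-signal ratio averaged over bags should be close to `noise_ratio_t / 2`. The sketch below checks this on synthetic data with the same shapes as the test set; the data and the `toy_` names are assumptions for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
n_bags, n_points, n_dims = 160, 100, 16

# Synthetic stand-in for x_test and for the pooled per-feature variance
toy_x = rng.normal(scale=2.0, size=(n_bags, n_points, n_dims))
toy_variance = np.var(np.concatenate(toy_x), axis=0)  # shape (n_dims,)

noise_ratio = 1.0
# One noise variance per bag and feature, shared by all points in the bag
toy_latent = noise_ratio * toy_variance * rng.uniform(size=(n_bags, 1, n_dims))
toy_noise = np.sqrt(toy_latent) * rng.normal(size=toy_x.shape)

# Variance of the injected noise relative to the signal variance, per feature
empirical_ratio = np.var(toy_noise.reshape(-1, n_dims), axis=0) / toy_variance
print('Average noise-to-signal ratio: %.2f' % empirical_ratio.mean())
```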

In [None]:
print('Fourier training')
fourier_noise_accuracy = phase_fourier_dr_nn.phase_fourier_nn(x_train, y_train, test_x_noisy, y_test, n_freq, learning_rate, reg_1, reg_2, batch_size, no_epochs, 'Fourier', init_sd, n_cpu)
print('Phase training')
phase_noise_accuracy = phase_fourier_dr_nn.phase_fourier_nn(x_train, y_train, test_x_noisy, y_test, n_freq, learning_rate, reg_1, reg_2, batch_size, no_epochs, 'Phase', init_sd, n_cpu)

Fourier training
('Epoch:', '0005', 'cost=', '104.804688')
('Epoch:', '0010', 'cost=', '3.09828884')
('Epoch:', '0015', 'cost=', '0.602490444')
('Epoch:', '0020', 'cost=', '0.30521829')
('Epoch:', '0025', 'cost=', '0.207218098')
('Epoch:', '0030', 'cost=', '0.192395864')
('Epoch:', '0035', 'cost=', '0.105905397')
('Epoch:', '0040', 'cost=', '0.0464760157')
('Epoch:', '0045', 'cost=', '0.0375325573')
('Epoch:', '0050', 'cost=', '0.0278830222')
('Epoch:', '0055', 'cost=', '0.0169325576')
('Epoch:', '0060', 'cost=', '0.0150145526')
('Epoch:', '0065', 'cost=', '0.0174051884')
('Epoch:', '0070', 'cost=', '0.0177815692')
('Epoch:', '0075', 'cost=', '0.0171603801')
('Epoch:', '0080', 'cost=', '0.0209043248')
('Epoch:', '0085', 'cost=', '0.0192830533')
('Epoch:', '0090', 'cost=', '0.0207836285')
('Epoch:', '0095', 'cost=', '0.0194874265')
('Epoch:', '0100', 'cost=', '0.0178608833')
Phase training
('Epoch:', '0005', 'cost=', '1078.44071')
('Epoch:', '0010', 'cost=', '95.3693738')
('Epoch:', '00

In [None]:

print('Mean RMSE:', np.sqrt(baseline))
print('Phase Features NN RMSE', np.sqrt(phase_accuracy))
print('Phase Features NN Noisy RMSE', np.sqrt(phase_noise_accuracy))
print('\nFOURIER')
print('Fourier Features NN RMSE', np.sqrt(fourier_accuracy))
print('Fourier Features NN Noisy RMSE', np.sqrt(fourier_noise_accuracy))


Note that the learnt frequencies in the Fourier features also seem to be invariant to noise. This is because the learnt frequencies give rise to Fourier features that have close to unit L2 norm, as shown in https://arxiv.org/abs/1703.07596, which indicates that there might be SPD noise in the original aerosol dataset. Note also that, in comparison with the ridge regression, learning the frequencies seems to be important in a scenario of large covariate shift.
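The invariance argument can be illustrated in one dimension. Adding independent Gaussian noise multiplies the characteristic function by a positive real factor, which shrinks its modulus but leaves its phase unchanged. When the modulus is already close to one, Fourier and phase features nearly coincide. The sketch below uses synthetic exponential samples and a single frequency, both chosen here purely for illustration.

```python
import numpy as np

def char_fn(samples, w):
    """Empirical characteristic function of 1-D samples at frequency w."""
    return np.mean(np.exp(1j * w * samples))

rng = np.random.RandomState(0)
clean = rng.exponential(size=200000)                     # asymmetric, so the phase is non-trivial
noisy = clean + rng.normal(scale=1.0, size=clean.shape)  # symmetric additive noise

w = 1.0
cf_clean = char_fn(clean, w)
cf_noisy = char_fn(noisy, w)

# Phase: the characteristic function divided by its modulus
phase_clean = cf_clean / np.abs(cf_clean)
phase_noisy = cf_noisy / np.abs(cf_noisy)

fourier_gap = np.abs(cf_clean - cf_noisy)     # raw Fourier feature changes under noise
phase_gap = np.abs(phase_clean - phase_noisy) # phase changes only by sampling error
print('Fourier gap:', fourier_gap)
print('Phase gap:', phase_gap)
```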