# Selecting Hyperparameters

Here we select the hyperparameters that give the lowest RMSE in a cross-validation test, using the differentially private (DP) exponential mechanism.

**Note: the results will probably improve with more iterations, better cross-validation settings, etc.**

Uses the Kung dataset, restricted to the females.

In [2]:
import dp4gp_datasets
import dp4gp
import random
import numpy as np
import GPy
import matplotlib.pyplot as plt
import dp4gp_histogram
import pandas as pd
%matplotlib inline

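kung = dp4gp_datasets.load_kung()

# keep only the rows where column 3 is 0 (the females)
X = kung[kung[:,3]==0,2:3]  # inputs, in years
y = kung[kung[:,3]==0,0:1]  # outputs, in cm
sens = 100.0   # width of the window the outputs are clipped to
epsilon = 1.0  # DP privacy parameters
delta = 0.01

# clip y to a window of width sens centred on the midpoint of its range
middley = (np.max(y)+np.min(y))/2
y[y>middley+sens/2] = middley+sens/2
y[y<middley-sens/2] = middley-sens/2

# centre y on the midpoint of the clipped range
#ysub = np.mean(y)
ysub = (max(y)+min(y))/2.0
y = y - ysub

# scale y to unit standard deviation, and scale the sensitivity to match
ys_std = np.std(y)
y = y / ys_std
ac_sens = sens/ys_std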




In [25]:
def runxval(X,y,sens,NDPsamples,nblocks,errorlimit,ys_std,kern):
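    #sens = how much the data points can change (in the output direction)
    #NDPsamples = number of samples from the DP noise
    #nblocks = number of cross-validation folds
    #errorlimit = maximum absolute value the y-f error can be
    #ys_std = standard deviation used to scale y; kern = GPy kernel to evaluate
    
    #awkwardly I've scaled everything by y's standard deviation, so it all needs
    #unscaling by multiplying everything by ys_std
    blocksize = len(X)//nblocks #integer division, as this is used as a range step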
    perm = np.random.permutation(len(X))
    block_training_sensitivities = []
    sse = 0
    N = 0
    for block in range(0,len(X),blocksize):
        if block+blocksize>=len(X): #if last block to test
            blocksize = len(X)-block
        testIndices = perm[block:(block+blocksize)]
        Xtest = X[testIndices,:]
        ytest = y[testIndices]
        Xtrain = np.delete(X,testIndices,0)
        ytrain = np.delete(y,testIndices,0)

        
        model = GPy.models.GPRegression(Xtrain,ytrain,kern,normalizer=None)
        model.Gaussian_noise = 0.3
        
        dpgp = dp4gp.DPGP_cloaking(model,sens,epsilon,delta)
        try:
            samps, mu, samp_cov = dpgp.draw_prediction_samples(Xtest,N=NDPsamples,Nattempts=1,Nits=200)
        except TypeError:
            continue #we're just not going to be able to include this in the error sum
        C = dpgp.get_C(Xtest)
        block_training_sensitivities.append(np.max(np.sum((sens * ys_std * C)**2,0)))
        errors = np.array(samps-ytest)
        #we need to bound the size of the error, so we can bound the effect of a perturbation in a test point
        errors[errors>errorlimit] = errorlimit
        errors[errors<-errorlimit] = -errorlimit
        sse += np.sum((errors*ys_std)**2)
        N += errors.size
    #this is a list of the sensitivities each fold incurs due to the possibility the perturbed point is within their
    #training data. I've multiplied it by the number of DP samples, as the SSE adds this sensitivity that many times.
    #so for example :- if two DP samples are generated for the same data, they'll both contain up to this level of
    #sensitivity.
    block_training_sensitivities = np.array(block_training_sensitivities) * NDPsamples
    
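    #(y+sens-f)^2 - (y-f)^2 = sens^2 + 2*y*sens - 2*sens*f = sens^2 + 2*sens*(y-f)
    #|y-f| is clipped to be at most errorlimit, so the above is bounded by
    #sens^2 + 2*sens*errorlimit
    #NOTE: sens and errorlimit are both passed in scaled units, so a fully unscaled cross term
    #would be 2*(sens*ys_std)*(errorlimit*ys_std); the line below applies ys_std to it only once.
    testsens = (sens*ys_std)**2 + 2*sens*errorlimit*ys_std
    testsens *= NDPsamples #as previously, a perturbation in a test point will influence NDPsamples samples.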
    return N, sse, block_training_sensitivities, testsens

lengthscales = 3**np.arange(0,7)
RMSEsens = []
RMSE = []

SSEs = []
SSEsens = []

for lengthscale in lengthscales:
    print("=========================================================")
    print("=========================================================")
    print("============================%04d=========================" % lengthscale)
    print("=========================================================")
    print("=========================================================")
    kern = GPy.kern.RBF(1.0,lengthscale=lengthscale,variance=1.0)
    N, sse, trainsens, testsens = runxval(X,y,ac_sens,10,10,ac_sens*2,ys_std,kern)
    #the perturbed point is a test point in one fold and a training point in all the others, so
    #in the worst case we drop the smallest training sensitivity and add the test sensitivity
    SSEsen = np.sum(np.sort(trainsens)[1:]) + testsens
    RMSEsens.append(np.sqrt(SSEsen/N))
    RMSE.append(np.sqrt(sse/N))
    SSEs.append(sse)
    SSEsens.append(SSEsen)

(28, 259)
*
. . . Failed to find solution
(28, 259)
*
. . . (3.8846795501345945, 3.2552472614374586, array([[ 1.]]), 1.0, 0.0)
(28, 259)
*
. . .
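
The test-point bound used inside `runxval` (a change of at most `sens**2 + 2*sens*errorlimit` in one clipped squared error) can be checked numerically. The snippet below is only a sketch: the values of `sens` and `errorlimit`, the uniform ranges, and the helper `clipped_sq_error` are all hypothetical, chosen for illustration rather than taken from the run above.

```python
import numpy as np

rng = np.random.default_rng(0)

sens = 1.5        # hypothetical bound on how far one test output y can move
errorlimit = 3.0  # hypothetical clipping threshold applied to the errors y - f

def clipped_sq_error(y, f, limit):
    """Squared error after clipping (y - f) to the interval [-limit, limit]."""
    return np.clip(y - f, -limit, limit) ** 2

# Random outputs, predictions and perturbations of size at most sens.
n = 100000
y = rng.uniform(-5, 5, size=n)
f = rng.uniform(-5, 5, size=n)
shift = rng.uniform(-sens, sens, size=n)

# Change in each clipped squared error when y is perturbed by shift.
change = np.abs(clipped_sq_error(y + shift, f, errorlimit)
                - clipped_sq_error(y, f, errorlimit))

bound = sens**2 + 2 * sens * errorlimit  # the bound used for testsens
worst = change.max()
print("largest observed change: %.3f, bound: %.3f" % (worst, bound))
```

In this simulation the largest observed change should stay below the bound, which is what the exponential mechanism needs from a sensitivity.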

Issue:
We actually want the RMSE (and its sensitivity). How do we find the sensitivity of the RMSE, given that we have the sensitivity of the SSE? The square root is a bit awkward:

$$\text{RMSE} = \sqrt{\frac{\text{SSE}}{N}}$$

So the sensitivity is:

$$\Delta_\text{RMSE} = \sqrt{\frac{\text{SSE}+\Delta_\text{SSE}}{N}} - \sqrt{\frac{\text{SSE}}{N}}$$

which depends on the SSE, which depends on the data. So we need to put a bound on this.

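As the $\text{SSE}$ approaches infinity the difference above approaches zero. To see this, write $a=\text{SSE}/N$ and $d=\Delta_\text{SSE}/N$; then $\sqrt{a+d}-\sqrt{a} = \frac{d}{\sqrt{a+d}+\sqrt{a}}$, which decreases monotonically towards zero as $a$ grows.

Thus the largest sensitivity occurs when the $\text{SSE}$ equals zero. The sensitivity of the RMSE is therefore no greater than $\sqrt{\frac{\Delta_\text{SSE}}{N}}$.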

**This bound is quite burdensome!**
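
A quick numerical check of this argument, as a sketch only: the values of `N` and `delta_sse` below are hypothetical and the helper `rmse_change` is introduced purely for illustration. It evaluates the change in RMSE over a grid of SSE values and compares it with $\sqrt{\Delta_\text{SSE}/N}$.

```python
import numpy as np

N = 287             # hypothetical number of error terms
delta_sse = 5000.0  # hypothetical sensitivity of the SSE

def rmse_change(sse, delta_sse, n):
    """Increase in RMSE when the SSE grows by delta_sse."""
    return np.sqrt((sse + delta_sse) / n) - np.sqrt(sse / n)

sse_grid = np.linspace(0.0, 1e6, 2001)
changes = rmse_change(sse_grid, delta_sse, N)

# The bound is the change evaluated at SSE = 0.
bound = np.sqrt(delta_sse / N)

print("change at SSE=0:   %.4f" % changes[0])
print("change at SSE=1e6: %.4f" % changes[-1])
print("bound:             %.4f" % bound)
```

The change is largest at SSE = 0 and shrinks as the SSE grows, so for realistic SSE values the bound can be far larger than the actual change in RMSE.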

In [42]:
RMSE = np.array(RMSE)
RMSEsens = np.array(RMSEsens)

In [43]:
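#exponential mechanism: the utility is -RMSE, and the sensitivity is the largest bound over all options
exp_sens = np.max(RMSEsens)
eps = 1.0
p = np.exp((eps * -RMSE) / (2*exp_sens))
p = p/np.sum(p) #normalise to get selection probabilities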

#### Experimental: using just the SSE!

In [44]:
SSEs = np.array(SSEs)
SSEsens = np.array(SSEsens)

In [52]:
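#same as above, but with -SSE as the utility and the SSE sensitivity bound
exp_sens = np.max(SSEsens)
eps = 1.0
q = np.exp((eps * -SSEs) / (2*exp_sens))
q = q/np.sum(q) #normalise to get selection probabilities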

### Results

The table below lists each lengthscale with its RMSE and the bound on the RMSE's sensitivity.

The two right-hand columns give the probability of selecting each lengthscale under the RMSE utility function and under the SSE utility function respectively.

Using the SSE seems better: there's a 50% chance we'll pick one of the two best lengthscales (27 or 81 years), compared with about 40% using the RMSE.

The RMSE is worse, I guess, because I had to use a looser bound on the sensitivity in that case.

In [57]:
print("Lengthscale (years)   RMSE    Sens  RMSEProb (%) SSEProb (%)")
np.set_printoptions(precision=3)
np.c_[lengthscales,RMSE,RMSEsens,np.round(p*100),np.round(q*100)]

Lengthscale (years)   RMSE    Sens  RMSEProb (%) SSEProb (%)


array([[   1.   ,   87.168,   16.59 ,    2.   ,    0.   ],
       [   3.   ,   44.506,   13.03 ,    8.   ,    4.   ],
       [   9.   ,   24.404,    9.074,   15.   ,   14.   ],
       [  27.   ,   15.275,    6.88 ,   20.   ,   26.   ],
       [  81.   ,   16.397,    6.588,   20.   ,   24.   ],
       [ 243.   ,   19.715,    6.42 ,   18.   ,   18.   ],
       [ 729.   ,   22.129,    6.448,   16.   ,   14.   ]])
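
The notebook computes selection probabilities but never draws a lengthscale. A minimal sketch of that final step follows, using the RMSE and sensitivity columns copied from the table above. The helper `exponential_mechanism_probs` and the fixed random seed are assumptions made for illustration, not part of `dp4gp`.

```python
import numpy as np

def exponential_mechanism_probs(utilities, sensitivity, epsilon):
    """Selection probabilities proportional to exp(epsilon * u / (2 * sensitivity))."""
    scores = epsilon * np.asarray(utilities, dtype=float) / (2.0 * sensitivity)
    scores -= scores.max()  # shift for numerical stability; cancels when normalising
    weights = np.exp(scores)
    return weights / weights.sum()

# Values copied from the table above.
lengthscales = 3 ** np.arange(0, 7)
rmse = np.array([87.168, 44.506, 24.404, 15.275, 16.397, 19.715, 22.129])
rmse_sens = np.array([16.59, 13.03, 9.074, 6.88, 6.588, 6.42, 6.448])

# Utility is the negative RMSE; the sensitivity is the largest bound over all options.
probs = exponential_mechanism_probs(-rmse, rmse_sens.max(), epsilon=1.0)

# Draw one lengthscale according to those probabilities.
rng = np.random.default_rng(0)
chosen = rng.choice(lengthscales, p=probs)

print(np.round(probs * 100))
print("chosen lengthscale:", chosen)
```

Subtracting the maximum score before exponentiating leaves the probabilities unchanged, but avoids underflow when the utilities are large relative to the sensitivity (as can happen with the SSE).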

Expected RMSE:

In [58]:
np.sum(p*RMSE)

22.926805069902215

Using the SSE to select:

In [59]:
np.sum(q*RMSE)

19.608320788928985

The expected RMSE (22.9 cm) is greater than that of the optimum choice (15.3 cm), but not by too much?
It's even lower if we use the SSE as the utility function (19.6 cm)!

To improve the accuracy it might be worth excluding options with very high sensitivity. I think this is OK as the sensitivity depends only on the inputs. The question is whether this excludes options that might turn out to be the best choices.
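
As a rough illustration of this idea, the sketch below removes the options whose sensitivity bound exceeds a cut-off and reruns the RMSE-based selection on the rest, using the values from the results table. The cut-off `max_allowed_sens` and the helper `selection_probs` are hypothetical, and the approach is only valid under the assumption stated above, that the sensitivities do not depend on the private outputs.

```python
import numpy as np

# Values copied from the results table.
lengthscales = 3 ** np.arange(0, 7)
rmse = np.array([87.168, 44.506, 24.404, 15.275, 16.397, 19.715, 22.129])
rmse_sens = np.array([16.59, 13.03, 9.074, 6.88, 6.588, 6.42, 6.448])

def selection_probs(utilities, sensitivity, epsilon=1.0):
    """Exponential-mechanism probabilities for the given utilities."""
    w = np.exp(epsilon * (utilities - utilities.max()) / (2.0 * sensitivity))
    return w / w.sum()

max_allowed_sens = 10.0  # hypothetical cut-off, chosen only for illustration
keep = rmse_sens <= max_allowed_sens

# All options, with the largest sensitivity over all of them.
p_all = selection_probs(-rmse, rmse_sens.max())
# Only the retained options, with the largest sensitivity among those.
p_kept = selection_probs(-rmse[keep], rmse_sens[keep].max())

expected_all = np.sum(p_all * rmse)
expected_kept = np.sum(p_kept * rmse[keep])

print("kept lengthscales:", lengthscales[keep])
print("expected RMSE, all options:  %.2f" % expected_all)
print("expected RMSE, kept options: %.2f" % expected_kept)
```

With these numbers the two shortest lengthscales are dropped and the expected RMSE falls. In this case the excluded options were also the worst performers, which need not be true in general.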

The accuracy will probably improve a little by running the DP cloaking for longer (to get better convergence), using smaller folds in the cross-validation, etc.