# 02 - Kernel methods and SVMs
**APSTA - Ecole Centrale Nantes**

**Diana Mateus**



PARTICIPANTS: **(Fill in your names)**

In [None]:

import matplotlib.pyplot as plt
import pandas
import numpy as np
import os

## 1. Kernel Ridge Regression


Ridge regression extends ordinary least squares by adding a regularization
term to the loss function. It is defined as
\begin{equation}
\min_{\mathbf{w}} \sum_{i=1}^n (y_i -  \mathbf{x}_i^T \mathbf{w} )^2
+ \lambda \lVert \mathbf{w} \rVert_2^2 ,
\end{equation}
where $\lambda > 0$ controls the amount of regularization. In this exercise we will follow the derivation of Support Vector Machines to extend ridge regression with the ```Kernel Trick```.
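As a point of reference before kernelizing, the linear objective above has the well-known closed-form minimizer $\mathbf{w} = (X^T X + \lambda I)^{-1} X^T \mathbf{y}$. The following is a minimal sketch on synthetic data; the data, the helper name `ridge_fit`, and the two values of $\lambda$ are illustrative assumptions, not part of the exercise.

```python
import numpy as np

# Synthetic data (illustrative only): y = X @ w_true + small noise
rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=n)

def ridge_fit(X, y, lam):
    """Closed-form minimizer of sum_i (y_i - x_i^T w)^2 + lam * ||w||_2^2."""
    n_features = X.shape[1]
    # Solve (X^T X + lam I) w = X^T y rather than forming an explicit inverse
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

w_small = ridge_fit(X, y, lam=1e-3)  # weak regularization, close to least squares
w_large = ridge_fit(X, y, lam=1e3)   # strong regularization shrinks the weights

print(w_small, np.linalg.norm(w_small))
print(w_large, np.linalg.norm(w_large))
```

Increasing $\lambda$ pulls the weight vector toward zero, which is the behavior the kernelized version inherits.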

**a.** Replace $\mathbf{w}$ with $\sum_{i=1}^n \alpha_i \mathbf{x}_i$ in the objective above.



**b.** As in support vector machines, we can use the Kernel trick to make ridge regression
non-linear while avoiding an explicit transformation of the features. Using $k(\mathbf{x}, \mathbf{x}^\prime) = \phi(\mathbf{x})^T\phi(\mathbf{x}^\prime)$, derive the objective function of Kernel Ridge Regression.
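To make the identity $k(\mathbf{x}, \mathbf{x}^\prime) = \phi(\mathbf{x})^T\phi(\mathbf{x}^\prime)$ concrete, here is a small sketch with a kernel whose feature map can be written down explicitly: the homogeneous polynomial kernel of degree 2 on 2-D inputs. This kernel and the helper names `phi` and `poly_kernel` are chosen for illustration only; the exercise itself uses a radial basis function, whose feature map is infinite-dimensional.

```python
import numpy as np

def phi(x):
    """Explicit feature map for 2-D inputs whose inner product equals (x^T x')^2."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2.0) * x1 * x2, x2**2])

def poly_kernel(x, xp):
    """Homogeneous polynomial kernel of degree 2, computed without phi."""
    return float(np.dot(x, xp)) ** 2

x = np.array([1.0, 2.0])
xp = np.array([3.0, -1.0])

explicit = float(phi(x) @ phi(xp))  # inner product in feature space
implicit = poly_kernel(x, xp)       # same value from the input space only

print(explicit, implicit)
```

Both numbers agree up to round-off, which is why any expression written purely in terms of inner products can be evaluated with the kernel alone.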



**c.** Derive the solution for $\alpha$.




**d.** How would you use the result to make a new prediction?



**e.** What are the main similarities and differences between KRR and the classification SVM derived in class?




## 2. Wine quality prediction 

The wine quality dataset comes from the UCI Machine Learning Repository http://archive.ics.uci.edu/ml/index.php, and contains measurements and opinions for different variants of red and white wine. The goal of this part of the exercise is to build a model capable of _predicting the quality of a wine from the measurements_.

To this end, implement your own version of Kernel Ridge Regression and compare it with the built-in ```KernelRidge``` estimator from sklearn.

**a.** Run the ```Load and process``` block below to load the dataset and cache it in the ``wines_backup.csv`` file.


In [None]:
# a) Load and Process 
# Saves the result in a single file in order to avoid downloading many times.
# Shows the first 5 lines of the table

if not os.path.exists("wines_backup.csv"):
    # if the file does not exist, create wines_backup.csv, which combines red and white wines into a single file
    columns = ["facidity", "vacidity", "citric", "sugar", "chlorides", "fsulfur", 
               "tsulfur", "density", "pH", "sulphates", "alcohol", "quality"]
    red = pandas.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv",
                         names=columns, sep=";", skiprows=1)
    white = pandas.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv",
                         names=columns, sep=";", skiprows=1)
    red["color"] = "red"
    white["color"] = "white"
    wines = pandas.concat([white, red])
    wines.to_csv("wines_backup.csv", sep="\t", index=False)
else:
    wines = pandas.read_csv("wines_backup.csv", sep="\t")
    
wines.head(5)

**b. Split the dataset into train (80% of samples) and test (20% of samples)**. Use the built-in sklearn function ```train_test_split```:

```
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=.2,random_state=3)
```

In [None]:
#b) Data split

from sklearn.model_selection import train_test_split

#build the data matrix from a subset of the available variables in the csv file
#(DataFrame.as_matrix was removed from pandas; select the columns and call to_numpy instead)
X = wines[["facidity", "vacidity", "citric", "sugar", "chlorides", "fsulfur", 
               "tsulfur", "density", "pH", "sulphates", "alcohol"]].to_numpy()

#make y the target value we want to predict
y = wines["quality"].to_numpy()

print(X.shape, y.shape)

#Split the dataset

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=.2,random_state=3)

print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
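
The behavior of `train_test_split` can be checked on a toy array before applying it to the wine data. This is a minimal sketch; the arrays `X_toy` and `y_toy` are made up for illustration and only stand in for the real measurements.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for the wine measurements: 10 samples, 2 features.
# Row i is [2*i, 2*i + 1] and its target is i, so alignment is easy to verify.
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=3
)

print(X_tr.shape, X_te.shape)

# Rows stay paired with their targets after shuffling
aligned = np.array_equal(X_tr[:, 0] // 2, y_tr)
print(aligned)
```

Fixing `random_state` makes the shuffle reproducible, so every run evaluates the models on the same held-out samples.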


**c. Implement your own version of Kernel Ridge Regressor**. To this end:
- fit the model parameters using the training data, with a radial basis function as kernel
- make predictions for the test data


In [None]:
#c) Fitting model and comparing to sklearn KernelRidge
import scipy
from scipy.spatial.distance import cdist, pdist, squareform
from numpy.linalg import inv



#Example values for the regularization (lambda) and RBF width (gamma) hyper-parameters
p_lambda = 0.1
p_gamma = 0.01


#RBF kernel for two matrices of points
def gkernel(X1, X2, gamma):
    pairwise_dists = np.square(cdist(X1,X2))
    K = np.exp(-gamma* pairwise_dists)
    return K


# Build the kernel matrix for the training set
K = 
print(K.shape)


# Find the optimal alpha values with the closed form solution from 1.
alpha = 
print(alpha.shape)

# Find the kernel values between the test points and the training points
K = 
print(K.shape)

# Make predictions
y_mine = 
print(y_mine.shape)
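
The four steps of the cell above (training kernel matrix, dual coefficients, test kernel matrix, predictions) can be sketched on a synthetic 1-D problem as follows. The sketch assumes the standard closed form $\boldsymbol{\alpha} = (K + \lambda I)^{-1}\mathbf{y}$ and predictions $\hat{y}(\mathbf{x}) = \sum_i \alpha_i k(\mathbf{x}, \mathbf{x}_i)$; check it against your own derivation from part 1. The data, the helper `rbf_kernel_np`, and the hyper-parameter values are illustrative assumptions.

```python
import numpy as np

def rbf_kernel_np(A, B, gamma):
    """RBF kernel matrix with entries exp(-gamma * ||a_i - b_j||^2)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

# Synthetic 1-D regression problem (illustrative only): noisy samples of sin(x)
rng = np.random.default_rng(0)
X_tr = rng.uniform(-3, 3, size=(80, 1))
y_tr = np.sin(X_tr).ravel() + 0.05 * rng.normal(size=80)
X_te = np.linspace(-2.5, 2.5, 20)[:, None]

lam, gamma = 0.1, 0.5  # hypothetical hyper-parameter values

K_tr = rbf_kernel_np(X_tr, X_tr, gamma)                        # (n_train, n_train)
alpha = np.linalg.solve(K_tr + lam * np.eye(len(X_tr)), y_tr)  # (n_train,)
K_te = rbf_kernel_np(X_te, X_tr, gamma)                        # (n_test, n_train)
y_pred = K_te @ alpha                                          # (n_test,)

max_err = np.max(np.abs(y_pred - np.sin(X_te).ravel()))
print(K_tr.shape, alpha.shape, K_te.shape, y_pred.shape)
print("max abs error vs sin(x):", max_err)
```

Using `np.linalg.solve` instead of an explicit inverse is a design choice for numerical stability; both give the same coefficients in exact arithmetic.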



**d.** Compare your results with those of the built-in ```KernelRidge``` estimator (in terms of the mean squared error) for the same values of the regularization ($\lambda$) and radial-basis function ($\gamma$) hyper-parameters.

In [None]:
# d) Compute the mean squared errors

from sklearn.metrics import mean_squared_error, r2_score
from sklearn.kernel_ridge import KernelRidge

print("Mean squared error MINE: %.2f"% mean_squared_error(y_test, y_mine))
print(y_mine)


# built-in version with KernelRidge
# FILL IN HERE: fit KernelRidge with the same hyper-parameters and store its test predictions in y_kr


print("Mean squared error: %.2f"% mean_squared_error(y_test, y_kr))
print(y_kr)
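
One way to sanity-check a hand-written implementation against the `KernelRidge` estimator imported in the cell above is to run both on the same small synthetic problem. This is a sketch under stated assumptions: the data are made up, the manual solver uses the standard closed form $(K + \lambda I)^{-1}\mathbf{y}$, and sklearn's `alpha` argument is taken to play the role of $\lambda$ while `gamma` matches the $\gamma$ of the RBF kernel.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_squared_error

# Synthetic non-linear regression problem (illustrative only)
rng = np.random.default_rng(1)
X_tr = rng.normal(size=(60, 4))
y_tr = X_tr[:, 0] ** 2 + np.sin(X_tr[:, 1]) + 0.1 * rng.normal(size=60)
X_te = rng.normal(size=(15, 4))
y_te = X_te[:, 0] ** 2 + np.sin(X_te[:, 1])

lam, gamma = 0.1, 0.01  # same example values as in the notebook

# Hand-written kernel ridge regression with an RBF kernel
K_tr = np.exp(-gamma * cdist(X_tr, X_tr, "sqeuclidean"))
alpha = np.linalg.solve(K_tr + lam * np.eye(len(X_tr)), y_tr)
y_manual = np.exp(-gamma * cdist(X_te, X_tr, "sqeuclidean")) @ alpha

# sklearn estimator with matching hyper-parameters
kr = KernelRidge(kernel="rbf", alpha=lam, gamma=gamma)
kr.fit(X_tr, y_tr)
y_sk = kr.predict(X_te)

print("MSE manual :", mean_squared_error(y_te, y_manual))
print("MSE sklearn:", mean_squared_error(y_te, y_sk))
print("max |difference|:", np.max(np.abs(y_manual - y_sk)))
```

If the two prediction vectors differ by more than round-off, the usual suspects are a mismatch in how $\lambda$ or $\gamma$ is passed, or a test kernel matrix built with the arguments in the wrong order.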
