Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Any `assert` statements are provided to check your answers. You may receive partial credit even if these code blocks fail, and you may not receive full credit even if they pass (e.g. if you hack them to pass). Problems without `assert` statements may have multiple correct solutions, and will be manually graded by instructors.

Make sure you fill in any place that says `YOUR CODE HERE`, and delete any line that says `raise NotImplementedError`. Please also fill in your name and official GT ID below.

In [28]:
NAME = "Liyun Ren"
GTID = "lren42" #e.g. gburdell0

# Problem 1: Kernel Regression

In this problem you will work with a well-known example dataset of housing prices in Boston. You will train and test a kernel regression model for this dataset using an RBF kernel.

In [29]:
#import necessary libraries
import numpy as np
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; this notebook requires an older version
import warnings
warnings.filterwarnings("ignore") #ignore warnings that may arise. A bad idea in general, but fine for this exam.

#load dataset
X, y = load_boston(return_X_y=True)

## 1a: Feature Scaling

Apply "standard scaling" to this dataset to produce a rescaled data matrix, `X_rescaled`, in which each feature has mean 0 and standard deviation 1.
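As a minimal illustration of what standard scaling does, the sketch below standardizes a small hypothetical toy matrix (`X_toy` is invented for this example, not part of the dataset) by subtracting each column's mean and dividing by its standard deviation. It uses the population standard deviation (`ddof=0`), which is the convention `StandardScaler` follows, so the manual and scikit-learn results agree.

```python
import numpy as np

# Hypothetical toy data: 4 samples, 2 features on very different scales
X_toy = np.array([[1.0, 100.0],
                  [2.0, 200.0],
                  [3.0, 300.0],
                  [4.0, 400.0]])

# Standard scaling: subtract each column's mean, divide by its population std
mu = X_toy.mean(axis=0)
sigma = X_toy.std(axis=0)          # ddof=0, the same convention StandardScaler uses
X_toy_scaled = (X_toy - mu) / sigma

print(X_toy_scaled.mean(axis=0))   # approximately [0, 0]
print(X_toy_scaled.std(axis=0))    # approximately [1, 1]
```

After scaling, both columns have identical values even though their raw scales differed by a factor of 100, which is exactly why scaling matters before computing distance-based kernels such as the RBF kernel.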

In [30]:
# your code here
X_rescaled = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each feature (column)

In [31]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X)
X_scaled = scaler.transform(X)
print(X_scaled)

[[-0.41978194  0.28482986 -1.2879095  ... -1.45900038  0.44105193
  -1.0755623 ]
 [-0.41733926 -0.48772236 -0.59338101 ... -0.30309415  0.44105193
  -0.49243937]
 [-0.41734159 -0.48772236 -0.59338101 ... -0.30309415  0.39642699
  -1.2087274 ]
 ...
 [-0.41344658 -0.48772236  0.11573841 ...  1.17646583  0.44105193
  -0.98304761]
 [-0.40776407 -0.48772236  0.11573841 ...  1.17646583  0.4032249
  -0.86530163]
 [-0.41500016 -0.48772236  0.11573841 ...  1.17646583  0.44105193
  -0.66905833]]


In [32]:
assert np.isclose(np.linalg.norm(np.mean(X_rescaled, axis=0)), 0)
assert np.isclose(np.std(X_rescaled, axis=0), np.ones(np.size(np.std(X_rescaled, axis=0)))).all()

## 1b: Construct an RBF kernel

Use every 10th data point from the rescaled data matrix to construct a radial basis function kernel matrix with `gamma = 0.1`. Name your kernel matrix `K_10th`. If you were unable to construct the rescaled matrix, you can use the original data matrix instead.

Recall that the formula for an RBF kernel is:

$K_{RBF}(i, j) = \exp\left(-\gamma \lVert \vec{x}_i - \vec{x}_j \rVert^2\right)$

where $\vec{x}_i$ is the $i^{th}$ data point and $\lVert \vec{a} \rVert$ represents the 2-norm of $\vec{a}$.
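The double loop computes one norm per entry. A common vectorized alternative expands the squared distance as $\lVert \vec{a} - \vec{b} \rVert^2 = \lVert \vec{a} \rVert^2 + \lVert \vec{b} \rVert^2 - 2\,\vec{a} \cdot \vec{b}$. The sketch below is an illustration under that identity; `rbf_vectorized` is a hypothetical helper name, and the random matrices are made up purely to check it against the loop version.

```python
import numpy as np

def rbf_vectorized(A, B, gamma=0.1):
    """Hypothetical helper: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2), shape (n_A, n_B)."""
    sq_A = np.sum(A**2, axis=1)[:, None]     # shape (n_A, 1)
    sq_B = np.sum(B**2, axis=1)[None, :]     # shape (1, n_B)
    sq_dists = sq_A + sq_B - 2.0 * A @ B.T   # shape (n_A, n_B)
    sq_dists = np.maximum(sq_dists, 0.0)     # clip tiny negatives caused by round-off
    return np.exp(-gamma * sq_dists)

# Compare against a double loop on hypothetical random data
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))
B = rng.normal(size=(4, 3))
K_loop = np.array([[np.exp(-0.1 * np.linalg.norm(a - b)**2) for b in B] for a in A])
K_vec = rbf_vectorized(A, B, gamma=0.1)
print(np.allclose(K_loop, K_vec))   # True
```

Rows are indexed by the first argument, so `rbf_vectorized(X_train, X_predict)` has the same orientation as `RBF(X_train, X_predict)` in the next cell. scikit-learn also provides `sklearn.metrics.pairwise.rbf_kernel(X, Y, gamma=...)` with the same definition.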

In [35]:
X_10th=X_scaled[::10]

In [None]:
def RBF(X_train, X_predict, gamma=0.1):
    # K[i, j] = exp(-gamma * ||X_train[i] - X_predict[j]||^2); shape (n_train, n_predict)
    K = np.zeros((X_train.shape[0], X_predict.shape[0]))
    for i in range(K.shape[0]):
        for j in range(K.shape[1]):
            K[i, j] = np.exp(-gamma * np.linalg.norm(X_predict[j] - X_train[i])**2)
    return K
K_10th = RBF(X_10th, X_10th)

In [39]:
def rbf(X_10th, x_test = None, gamma = 0.1):
    if x_test is None:
        x_test = X_10th       
    N = len(x_test)
    M = len(X_10th)
    K = np.zeros((N, M))
    for i in range(N):
        for j in range(M):
            K[i, j] = np.exp(-gamma * np.linalg.norm((x_test[i] - X_10th[j]))**2)  
            #   whenever told to use norm, make sure to use np.linalg.norm    
    return K
K_10th = rbf(X_10th, gamma = 0.1)
  

In [40]:
eigvals, eigvecs = np.linalg.eig(K_10th)
assert np.isclose(sum(eigvals[:10]), 36.4834)
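Two properties of an RBF kernel matrix make a useful sanity check: it is symmetric, and every diagonal entry is $\exp(0) = 1$, so its eigenvalues are real and sum to its trace, which equals the number of points. For symmetric matrices `np.linalg.eigvalsh` returns real eigenvalues in ascending order, whereas `np.linalg.eig` does not guarantee any particular ordering. The sketch below checks these properties on a hypothetical small kernel built from random points, not on `K_10th` itself.

```python
import numpy as np

# Hypothetical stand-in for K_10th: an RBF kernel (gamma = 0.1) on random points
rng = np.random.default_rng(1)
P = rng.normal(size=(8, 3))
sq_dists = np.sum((P[:, None, :] - P[None, :, :])**2, axis=-1)
K = np.exp(-0.1 * sq_dists)

# eigvalsh assumes a symmetric matrix and returns real eigenvalues in ascending order
eigvals_sym = np.linalg.eigvalsh(K)

print(np.isclose(eigvals_sym.sum(), np.trace(K)))  # True: sum of eigenvalues = trace
print(np.isclose(np.trace(K), K.shape[0]))          # True: every diagonal entry is exp(0) = 1
print(np.all(eigvals_sym > -1e-10))                 # True: RBF kernels are positive semidefinite
```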

## 1c: Train and test a kernel regression model.

Train a kernel linear regression model using the RBF kernel for every 10th data point. You do not need to add an intercept term. If you were unable to construct the RBF kernel matrix, you may use every 10th point from the original dataset instead. Compute and report the mean absolute error (MAE) on the full dataset after training.

For full credit, you should only use the Python standard library and `numpy` in this problem, but partial credit will be awarded if `scikit-learn` functions are used.
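Because the training kernel matrix $K$ is square, and an RBF kernel matrix on distinct points is positive definite, the weights can be obtained by solving $K\vec{w} = \vec{y}$ directly. The normal equations $K^T K \vec{w} = K^T \vec{y}$ give the same $\vec{w}$ in exact arithmetic but square the condition number. Predictions for any set of points use the cross kernel between those points and the training points. The sketch below runs this on hypothetical synthetic data; `rbf_kernel_matrix` is a hypothetical helper, and the data are invented stand-ins for the rescaled Boston features.

```python
import numpy as np

def rbf_kernel_matrix(A, B, gamma=0.1):
    # Hypothetical helper: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2), via broadcasting
    sq_dists = np.sum((A[:, None, :] - B[None, :, :])**2, axis=-1)
    return np.exp(-gamma * sq_dists)

# Hypothetical synthetic data standing in for the rescaled features and targets
rng = np.random.default_rng(42)
X_all = rng.normal(size=(100, 4))
y_all = X_all @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=100)

X_sub, y_sub = X_all[::10], y_all[::10]     # every 10th point is the training set

K_train = rbf_kernel_matrix(X_sub, X_sub)   # (10, 10)
w = np.linalg.solve(K_train, y_sub)         # solve K w = y, no intercept term

K_pred = rbf_kernel_matrix(X_all, X_sub)    # (100, 10): rows are the points to predict
yhat = K_pred @ w

mae = np.mean(np.abs(y_all - yhat))
print("MAE on all points:", mae)
```

With as many weights as training points, the model reproduces the training targets exactly, so the MAE on the full dataset is driven entirely by how well it generalizes to the other 90% of the points.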

In [None]:
# your code here
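y_10th = y[::10]
A = np.dot(K_10th.T, K_10th)          # normal equations: (K^T K) w = K^T y
b = np.dot(K_10th.T, y_10th)
w = np.linalg.solve(A, b)
K_all = RBF(X_10th, X_rescaled)       # shape (n_train, n_all)
yhat = np.dot(w, K_all)               # prediction for every point in the full dataset
mae_all = np.mean(np.abs(y - yhat))   # named to avoid clashing with the MAE function below
print('MAE = {}'.format(mae_all))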

In [45]:
def MAE(actual, prediction):
    # mean absolute error: (1/N) * sum of |actual - prediction|
    N = len(actual)
    total = 0  # avoid shadowing the built-in sum()
    for a, p in zip(actual, prediction):
        total += abs(a - p)
    mae = total / N
    
    return mae
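The loop above computes $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \lvert y_i - \hat{y}_i \rvert$. With NumPy arrays the same quantity is a one-liner. One caveat: `zip` silently stops at the shorter input, so predicting on fewer points than `actual` contains goes unnoticed. The sketch below (`mae_vectorized` is a hypothetical name) checks shapes explicitly instead.

```python
import numpy as np

def mae_vectorized(actual, prediction):
    """Hypothetical helper: mean absolute error, (1/N) * sum_i |actual_i - prediction_i|."""
    actual = np.asarray(actual, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    if actual.shape != prediction.shape:
        # zip() would silently truncate; fail loudly instead
        raise ValueError(f"shape mismatch: {actual.shape} vs {prediction.shape}")
    return np.mean(np.abs(actual - prediction))

print(mae_vectorized([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))  # 0.5
```

`sklearn.metrics.mean_absolute_error` computes the same value and also rejects inputs of inconsistent length.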

In [49]:
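from sklearn.linear_model import LinearRegression
y_10th = y[::10]
K_train = rbf(X_10th, gamma=0.1)                   # (51, 51) kernel among training points
K_full = rbf(X_10th, X_scaled, gamma=0.1)          # (506, 51) kernel: all points vs. training points
model_rbf = LinearRegression(fit_intercept=False)  # no intercept, as allowed in 1c
model_rbf.fit(K_train, y_10th)                     # the kernel matrix is the feature matrix
yhat_rbf = model_rbf.predict(K_full)               # predict on the full dataset
mae = MAE(y, yhat_rbf)
print(mae)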


## 1d: Hyperparameter optimization for kernel ridge regression

Use scikit-learn's `GridSearchCV` to determine the optimal hyperparameters for kernel ridge regression (KRR) on the Boston housing dataset. You should use the radial basis function kernel and search over the following parameter ranges:

$\alpha \in \{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 1\}$

$\gamma \in \{10^{-6}, 10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}\}$

Report the $r^2$ score of the model with the optimal hyperparameters on a validation set consisting of a randomly selected 30% of the original dataset. This validation set must **not be used at any point in the hyperparameter search**.
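The essential ordering is to split off the validation set first and then run the grid search only on the remaining 70%. The sketch below shows that workflow on hypothetical synthetic data (so it does not depend on the Boston loader); `X_demo`, `y_demo`, and the fixed `random_state` values are assumptions for reproducibility, not part of the assignment.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical synthetic regression data
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = np.sin(X_demo[:, 0]) + 0.5 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

# Hold out 30% first; the grid search never sees this split
X_tr, X_val, y_tr, y_val = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)

param_grid = {
    "alpha": [1e-4, 1e-3, 1e-2, 1e-1, 1],
    "gamma": [1e-6, 1e-5, 1e-4, 1e-3, 1e-2],
}
search = GridSearchCV(KernelRidge(kernel="rbf"), param_grid)  # cross-validates on X_tr only
search.fit(X_tr, y_tr)

print("best params:", search.best_params_)
print("validation R^2:", search.best_estimator_.score(X_val, y_val))
```

By default `GridSearchCV` uses `refit=True`, so `best_estimator_` is retrained on all of `X_tr` with the winning parameters, and its `score` method returns $r^2$ for regressors. The default number of cross-validation folds is 5 in recent scikit-learn versions.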

In [53]:
# your code here
from sklearn.model_selection import GridSearchCV,train_test_split 
from sklearn.kernel_ridge import KernelRidge
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3)
params={'alpha':[1e-4,1e-3,1e-2,1e-1,1],'gamma':[1e-6,1e-5,1e-4,1e-3,1e-2]}
KRR=KernelRidge(kernel='rbf')
GSCV=GridSearchCV(estimator=KRR,param_grid=params)
GSCV.fit(X_train,y_train)
best_KRR=GSCV.best_estimator_
print("Optimal hyperparameters: alpha = {}, gamma={}".format(best_KRR.alpha,best_KRR.gamma))
r2_val=best_KRR.score(X_test,y_test)
print("Validation R^2={:.3f}".format(r2_val))

Optimal hyperparameters: alpha = 0.0001, gamma=1e-05
Validation R^2=0.786


In [65]:
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.kernel_ridge import KernelRidge
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
alphas = [1e-4, 1e-3, 1e-2, 1e-1, 1]
gammas = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2]
parameter_ranges = {'alpha': alphas, 'gamma': gammas}  # keys must be KernelRidge parameter names
KRR = KernelRidge(kernel='rbf')
KRR_search = GridSearchCV(KRR, parameter_ranges)
KRR_search.fit(x_train, y_train)                       # fit the search, not the bare estimator
best_KRR = KRR_search.best_estimator_                  # note the trailing underscore
print(best_KRR.score(x_test, y_test))                  # validation R^2

End of assignment. Any code appearing past this point will not be graded.