# ML with Ridge Regression

In this notebook, we use the helper functions from ridge_regression.py on the cleaned data, to check whether a higher polynomial degree improves the ridge regression model.

In [62]:
# Useful starting lines
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
%load_ext autoreload
%autoreload 2
from IPython import display
# Import everything in the functions folder
from functions.costs import *
from functions.proj1_helpers import *
from functions.split import *
from functions.ridge_regression import *
from functions.helpers import *

from mpl_toolkits.mplot3d import Axes3D



First, we load the cleaned training data and standardize it.

In [130]:
DATA_TRAIN_PATH = 'data/train_cleaned.csv' 
y, tX, ids = load_csv_data(DATA_TRAIN_PATH)
tX, mean_tX, std_tX = standardize(tX)
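Standardizing rescales every feature to zero mean and unit variance, so the ridge penalty treats all features on the same scale. Below is a minimal sketch of what we assume `standardize` does (column-wise statistics). `standardize_sketch` is a hypothetical name, not the helper from functions/. The mean and std computed on the training set are the ones to reuse on any new data.

```python
import numpy as np


def standardize_sketch(x, mean=None, std=None):
    """Column-wise standardization (hypothetical stand-in for `standardize`).

    If `mean`/`std` are given (e.g. statistics of the training set),
    they are reused instead of being recomputed from `x`.
    """
    if mean is None:
        mean = np.mean(x, axis=0)
    if std is None:
        std = np.std(x, axis=0)
    return (x - mean) / std, mean, std


# Standardize a toy training set, then apply its statistics to new rows.
x_train_demo = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
x_new_demo = np.array([[3.0, 50.0]])
x_train_std, mu, sigma = standardize_sketch(x_train_demo)
x_new_std, _, _ = standardize_sketch(x_new_demo, mu, sigma)
```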

We run 5-fold cross-validation over a grid of lambdas and polynomial degrees to find the best pair.

In [None]:
degrees = np.arange(3, 11)  # integer polynomial degrees 3, 4, ..., 10
lambdas = np.logspace(-5, 0, 20)
rmse_te = cross_validation(y, tX, lambdas, degrees, 5, False, 1)  # 5 folds

Start the 5-fold Cross Validation!
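The helper `cross_validation` is not shown here. In the usual k-fold scheme, the rows are shuffled and cut into k folds. Each fold serves once as the validation set while the model is fit on the remaining folds, and the k validation RMSEs are averaged. Below is a minimal sketch of that scheme for a single parameter setting. It assumes a random permutation with a fixed seed, and `build_k_indices`, `cv_rmse` and the generic `fit` argument are hypothetical names.

```python
import numpy as np


def build_k_indices(n, k_fold, seed=1):
    """Shuffle row indices 0..n-1 and cut them into k_fold equal folds
    (the last n % k_fold rows are dropped for simplicity)."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n)
    fold_size = n // k_fold
    return indices[: fold_size * k_fold].reshape(k_fold, fold_size)


def cv_rmse(y, x, k_fold, fit, seed=1):
    """Average validation RMSE over k folds for a `fit(y, x) -> w` function."""
    k_indices = build_k_indices(len(y), k_fold, seed)
    rmses = []
    for k in range(k_fold):
        te = k_indices[k]                                # validation fold
        tr = np.delete(k_indices, k, axis=0).ravel()     # all other folds
        w = fit(y[tr], x[tr])
        err = y[te] - x[te] @ w
        rmses.append(np.sqrt(np.mean(err ** 2)))
    return np.mean(rmses)


def lstsq_fit(y_, x_):
    """Plain least squares, used here only to exercise cv_rmse."""
    return np.linalg.lstsq(x_, y_, rcond=None)[0]


# Noise-free linear data: the cross-validated RMSE should be ~0.
rng = np.random.default_rng(0)
x_demo = np.c_[np.ones(50), rng.normal(size=50)]
y_demo = x_demo @ np.array([0.5, -2.0])
score = cv_rmse(y_demo, x_demo, 5, lstsq_fit)
```

In the notebook, a loop over `degrees` and `lambdas` would call such a function once per pair, with the polynomial expansion applied first, to fill the `rmse_te` grid.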


Plot the cross-validated test RMSE for every (degree, lambda) pair.

In [None]:
plt.matshow(rmse_te)
plt.colorbar(label="test RMSE")

Select the lambda and degree that give the smallest value of rmse_te.

In [None]:
lambda_star, degree_star = find_min(rmse_te, lambdas, degrees)
print("Lambda* = %g" % lambda_star)
print("Degree* = %d" % degree_star)
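Picking the best pair amounts to locating the smallest entry of the `rmse_te` grid. Here is a sketch with `np.argmin` and `np.unravel_index`. It assumes rows of the grid index the degrees and columns index the lambdas (the layout produced by `cross_validation` may be transposed), and `find_min_sketch` is a hypothetical name.

```python
import numpy as np


def find_min_sketch(rmse, lambdas, degrees):
    """Return (lambda, degree) at the smallest entry of `rmse`.

    Assumes rmse[i, j] is the score for degrees[i] and lambdas[j].
    """
    i_deg, i_lam = np.unravel_index(np.argmin(rmse), rmse.shape)
    return lambdas[i_lam], degrees[i_deg]


rmse_demo = np.array([[0.90, 0.80, 0.85],
                      [0.70, 0.75, 0.72]])
lambdas_demo = np.array([1e-3, 1e-2, 1e-1])
degrees_demo = np.array([3, 4])
best_lambda, best_degree = find_min_sketch(rmse_demo, lambdas_demo, degrees_demo)
```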

To check that these parameters give good predictions, we split the data into a training part and a held-out test part (ratio = 0.8).

In [None]:
ratio = 0.8
x_train, y_train, x_test, y_test = split_data(tX, y, ratio)
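A random split like this can be sketched as follows. The sketch assumes `ratio` is the fraction of rows kept for training and that rows are shuffled with a fixed seed. `split_data_sketch` is a hypothetical stand-in for `split_data`, returning its outputs in the same order as the call above.

```python
import numpy as np


def split_data_sketch(x, y, ratio, seed=1):
    """Shuffle rows and keep a `ratio` fraction for training.

    Returns (x_train, y_train, x_test, y_test).
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(np.floor(ratio * len(y)))
    tr, te = idx[:n_train], idx[n_train:]
    return x[tr], y[tr], x[te], y[te]


x_demo = np.arange(20).reshape(10, 2)   # row i is [2i, 2i + 1]
y_demo = np.arange(10)
x_tr, y_tr, x_te, y_te = split_data_sketch(x_demo, y_demo, 0.8)
```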

Now that we have the best degree and lambda, we build the polynomial features and fit ridge regression on the training part to get the weights.
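For reference, ridge regression has a closed-form solution. Under one common convention (the exact scaling of lambda used in ridge_regression.py is an assumption here), the weights minimize the mean squared error plus an L2 penalty on the polynomial feature matrix $\tilde{X}$, and setting the gradient to zero gives a linear system:

```latex
\min_{w}\; \frac{1}{2N}\lVert y - \tilde{X} w \rVert_2^2 + \lambda \lVert w \rVert_2^2
\quad\Longrightarrow\quad
\left(\tilde{X}^\top \tilde{X} + 2N\lambda I\right) w^\star = \tilde{X}^\top y
```

Since $\tilde{X}^\top \tilde{X}$ is positive semi-definite, adding $2N\lambda I$ with $\lambda > 0$ makes the system matrix positive definite and therefore invertible. This keeps the fit stable even when high polynomial degrees produce nearly collinear features.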

In [None]:
# Build the polynomial features for both splits
tX_train = build_poly(x_train, degree_star)
tX_test = build_poly(x_test, degree_star)
print("Polynomials done")

# Ridge Regression
loss, w_star = ridge_regression(y_train, tX_train, lambda_star)
print("Loss = %f"%(loss))
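The two steps in this cell can be sketched with plain NumPy. The sketch makes three assumptions about functions/: `build_poly` appends the powers 1 to `degree` of every feature plus a constant column (no cross terms), `ridge_regression` solves the linear system with the 2Nλ scaling, and its loss is the RMSE. The names ending in `_sketch` are hypothetical.

```python
import numpy as np


def build_poly_sketch(x, degree):
    """Constant column followed by x, x**2, ..., x**degree for every column of x."""
    powers = [x ** d for d in range(1, degree + 1)]
    return np.hstack([np.ones((x.shape[0], 1))] + powers)


def ridge_regression_sketch(y, tx, lambda_):
    """Closed-form ridge weights with the penalty scaled by 2N.

    Returns (loss, w) like the call above; the loss here is the RMSE.
    """
    n, d = tx.shape
    a = tx.T @ tx + 2 * n * lambda_ * np.eye(d)
    w = np.linalg.solve(a, tx.T @ y)          # solve instead of inverting
    rmse = np.sqrt(np.mean((y - tx @ w) ** 2))
    return rmse, w


# Recover y = 1 + 2x - 3x^2 with a tiny penalty.
x_demo = np.linspace(-1, 1, 30).reshape(-1, 1)
y_demo = (1 + 2 * x_demo - 3 * x_demo ** 2).ravel()
tx_demo = build_poly_sketch(x_demo, 2)
loss_demo, w_demo = ridge_regression_sketch(y_demo, tx_demo, 1e-10)
_, w_strong = ridge_regression_sketch(y_demo, tx_demo, 1.0)
```

Solving the system with `np.linalg.solve` rather than forming the inverse explicitly is cheaper and numerically more stable. A larger lambda shrinks the weights toward zero.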

In [None]:
prediction(y_test, tX_test, w_star)
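The helper `prediction` is not shown here. For a two-class task, a natural check is the share of correctly predicted labels. Here is a minimal sketch that assumes labels in {-1, 1} and a decision threshold at 0 (both are assumptions about the helpers in functions/). `accuracy_sketch` is a hypothetical name.

```python
import numpy as np


def accuracy_sketch(y, tx, w):
    """Fraction of correct labels, predicting 1 when tx @ w > 0 and -1 otherwise."""
    y_pred = np.where(tx @ w > 0, 1, -1)
    return np.mean(y_pred == y)


tx_demo = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5], [1.0, -3.0]])
w_demo = np.array([0.0, 1.0])
y_demo = np.array([1, -1, -1, -1])
acc = accuracy_sketch(y_demo, tx_demo, w_demo)   # 3 of 4 correct
```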

## Generate predictions and save output in CSV format for submission

We retrain on all the training data with the selected lambda and degree.

In [124]:
# Optional: override the cross-validated parameters by hand, e.g.
#lambda_star = 0.01
#degree_star = 4
tX_poly = build_poly(tX, degree_star)
loss, w_star = ridge_regression(y, tX_poly, lambda_star)
print("Loss = %f"%(loss))

Loss = 0.784885


In [125]:
DATA_TEST_PATH = 'data/test_cleaned.csv'  # cleaned test set
_, tX_test, ids_test = load_csv_data(DATA_TEST_PATH)
# Reuse the training mean/std so test features are on the same scale as the training features
tX_test = (tX_test - mean_tX) / std_tX
tX_test_poly = build_poly(tX_test, degree_star)

In [126]:
OUTPUT_PATH = 'output/LS_RR_clean.csv'  # submission file
y_pred = predict_labels(w_star, tX_test_poly)
create_csv_submission(ids_test, y_pred, OUTPUT_PATH)
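`create_csv_submission` presumably writes one row per test id with its predicted label. Below is a minimal sketch using the standard `csv` module. The header `Id,Prediction` and the function name `write_submission_sketch` are assumptions, not taken from proj1_helpers.

```python
import csv
import os
import tempfile


def write_submission_sketch(ids, y_pred, path):
    """Write one (Id, Prediction) row per test sample after a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["Id", "Prediction"])
        writer.writeheader()
        for i, p in zip(ids, y_pred):
            writer.writerow({"Id": int(i), "Prediction": int(p)})


# Write a tiny file to a temporary directory and read it back.
path = os.path.join(tempfile.mkdtemp(), "demo_submission.csv")
write_submission_sketch([1, 2], [-1, 1], path)
with open(path) as f:
    lines = f.read().splitlines()
```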