# ML with Ridge Regression

In this notebook, we fit a ridge regression model using the functions in the file functions/ridge_regression.py.

In [1]:
# Useful starting lines
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
%load_ext autoreload
%autoreload 2
from IPython import display
# Import everything in the functions folder
from functions.costs import *
from functions.proj1_helpers import *
from functions.split import *
from functions.ridge_regression import *
from functions.helpers import *

First, we load the training data.

In [2]:
DATA_TRAIN_PATH = 'data/train.csv' 
y, tX, ids = load_csv_data(DATA_TRAIN_PATH)

We hold out 20% of the training data as a local test set, so we can check how well the model predicts on samples it has not seen.

In [3]:
ratio = 0.8
x_train, y_train, x_test, y_test = split_data(tX, y, ratio)
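
The split above can be pictured with a minimal sketch. The helper `split_data_sketch` and its `seed` parameter are hypothetical and are not the code in functions/split.py; the sketch assumes that `ratio` is the fraction of rows kept for training and that rows are shuffled before splitting.

```python
import numpy as np

def split_data_sketch(x, y, ratio, seed=1):
    """Shuffle the rows, then keep the first `ratio` fraction for training."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(np.floor(ratio * len(y)))
    train_idx, test_idx = idx[:n_train], idx[n_train:]
    # Same return order as the split_data call in the notebook
    return x[train_idx], y[train_idx], x[test_idx], y[test_idx]

# Toy data: 10 samples with 2 features each
x_demo = np.arange(20, dtype=float).reshape(10, 2)
y_demo = np.arange(10, dtype=float)
x_tr, y_tr, x_te, y_te = split_data_sketch(x_demo, y_demo, 0.8)
print(x_tr.shape, x_te.shape)
```

Fixing the seed keeps the split reproducible between runs, which makes RMSE values comparable across experiments.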

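We search over polynomial degrees and regularization strengths (lambda) for the ridge regression (RR), recording the train and test RMSE for each pair.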

In [4]:
# Degrees above 8 make the matrix singular, so we stay well below that (1 to 5)
degrees = np.linspace(1, 5, 5) 
lambdas = np.logspace(-5, 0, 30)
rmse_tr, rmse_te = cross_validation(y_train, x_train, y_test, x_test, lambdas, degrees)
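
As a rough illustration of what such a search does, here is a self-contained sketch on synthetic data. All helper names (`build_poly_sketch`, `ridge_sketch`, `rmse`, `grid_search_sketch`) are hypothetical, and the way lambda enters the linear system is an assumption; the actual cross_validation function may differ in both respects.

```python
import numpy as np

def build_poly_sketch(x, degree):
    """Stack a bias column and the element-wise powers x, x^2, ..., x^degree."""
    powers = [x ** d for d in range(1, int(degree) + 1)]
    return np.hstack([np.ones((x.shape[0], 1))] + powers)

def ridge_sketch(y, tx, lambda_):
    """Solve (X^T X + lambda * I) w = X^T y (the lambda scaling is an assumption)."""
    gram = tx.T @ tx + lambda_ * np.eye(tx.shape[1])
    return np.linalg.solve(gram, tx.T @ y)

def rmse(y, tx, w):
    """Root mean squared error of the linear prediction tx @ w."""
    return np.sqrt(np.mean((y - tx @ w) ** 2))

def grid_search_sketch(y_tr, x_tr, y_te, x_te, lambdas, degrees):
    """Return train and test RMSE grids of shape (len(lambdas), len(degrees))."""
    rmse_tr = np.zeros((len(lambdas), len(degrees)))
    rmse_te = np.zeros_like(rmse_tr)
    for j, degree in enumerate(degrees):
        # The expansion depends only on the degree, so build it once per degree
        p_tr = build_poly_sketch(x_tr, degree)
        p_te = build_poly_sketch(x_te, degree)
        for i, lambda_ in enumerate(lambdas):
            w = ridge_sketch(y_tr, p_tr, lambda_)
            rmse_tr[i, j] = rmse(y_tr, p_tr, w)
            rmse_te[i, j] = rmse(y_te, p_te, w)
    return rmse_tr, rmse_te

# Synthetic data with a quadratic term, so degree 2 should fit better than degree 1
rng = np.random.default_rng(0)
x_demo = rng.uniform(-1, 1, size=(200, 2))
y_demo = 1.0 + x_demo[:, 0] - 2.0 * x_demo[:, 1] ** 2 + 0.1 * rng.normal(size=200)
rmse_tr_demo, rmse_te_demo = grid_search_sketch(
    y_demo[:150], x_demo[:150], y_demo[150:], x_demo[150:],
    np.logspace(-5, 0, 6), np.arange(1, 4))
print(rmse_tr_demo.shape)
```

Note that selecting hyperparameters on a single held-out split, as here, is simpler than k-fold cross-validation but gives a noisier estimate of the test error.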

We pick the lambda and degree that give the lowest test RMSE (rmse_te).

In [5]:
lambda_star, degree_star = find_min(rmse_tr, rmse_te, lambdas, degrees)
print("Lambda* = %f"%lambda_star)
print("Degree* = %d"%degree_star)
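
Locating the minimum of a two-dimensional RMSE grid can be sketched as follows. The helper `find_min_sketch` is hypothetical, takes fewer arguments than find_min, and assumes that rows index lambdas and columns index degrees; the real function may use a different layout.

```python
import numpy as np

def find_min_sketch(rmse_te, lambdas, degrees):
    """Return the (lambda, degree) pair at the smallest entry of rmse_te."""
    # argmin works on the flattened array; unravel_index maps it back to (row, col)
    i, j = np.unravel_index(np.argmin(rmse_te), rmse_te.shape)
    return lambdas[i], degrees[j]

# Toy grid: 3 lambdas (rows) x 2 degrees (columns)
lambdas_demo = np.array([0.001, 0.01, 0.1])
degrees_demo = np.array([1, 2])
rmse_te_demo = np.array([[0.9, 0.8],
                         [0.7, 0.5],
                         [0.6, 0.95]])
best_lambda, best_degree = find_min_sketch(rmse_te_demo, lambdas_demo, degrees_demo)
print(best_lambda, best_degree)  # 0.01 2
```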

Now that we have the best degree and the best lambda, we can run the ridge regression and get the best weights.

In [6]:
# Build poly first
tX_train = build_poly(x_train, degree_star)
tX_test = build_poly(x_test, degree_star)
print("Polynomials done")

# Ridge Regression
loss, w_star = ridge_regression(y_train, tX_train, lambda_star)
print("Loss = %f"%(loss))

Polynomials done
Loss = 0.777447
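
For reference, one common way to write the problem solved in the cell above is sketched below, where X stands for the polynomial feature matrix (tX_train) with N rows. The 1/(2N) scaling of the data term is an assumption: ridge_regression.py may scale lambda differently, which changes the meaning of a given lambda value but not the form of the solution.

```latex
\begin{aligned}
\mathcal{L}(w) &= \frac{1}{2N}\,\lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2, \\
\nabla \mathcal{L}(w) &= -\frac{1}{N} X^\top (y - Xw) + 2\lambda w = 0, \\
w^\star &= \left( X^\top X + 2N\lambda I \right)^{-1} X^\top y.
\end{aligned}
```

The added multiple of the identity makes the matrix to invert better conditioned, which is why a positive lambda also helps when high polynomial degrees make the unregularized normal equations nearly singular.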


In [7]:
prediction(y_test, tX_test, w_star) 

Good prediction: 39238/50000 (78.476000%)
Wrong prediction: 10762/50000 (21.524000%)
0.34576
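
The good/wrong counts above can be reproduced in spirit with a short sketch. The helper `accuracy_sketch` is hypothetical; it assumes labels in {-1, 1} and a decision threshold of 0 on the linear score, which may not match the prediction function exactly.

```python
import numpy as np

def accuracy_sketch(y_true, tx, w):
    """Classify by the sign of the linear score and count the matches."""
    y_hat = np.where(tx @ w >= 0, 1, -1)
    n_good = int(np.sum(y_hat == y_true))
    return n_good, len(y_true)

# Toy example: a bias column plus one feature, with weights chosen by hand
tx_demo = np.array([[1.0, 2.0], [1.0, -3.0], [1.0, 0.5], [1.0, -0.2]])
w_demo = np.array([0.0, 1.0])
y_demo = np.array([1, -1, -1, -1])
n_good, n_total = accuracy_sketch(y_demo, tx_demo, w_demo)
print("Good prediction: %d/%d (%f%%)" % (n_good, n_total, 100 * n_good / n_total))
```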


## Generate predictions and save output in CSV format for submission:


We retrain on all the data.

In [8]:
tX_poly = build_poly(tX, degree_star)
loss, w_star = ridge_regression(y, tX_poly, lambda_star)


In [9]:
DATA_TEST_PATH = 'data/test.csv' # TODO: download test data and supply path here 
_, tX_test, ids_test = load_csv_data(DATA_TEST_PATH)

tX_test_poly = build_poly(tX_test, degree_star)

In [10]:
OUTPUT_PATH = 'output/LS_RR.csv' # TODO: fill in desired name of output file for submission
y_pred = predict_labels(w_star, tX_test_poly)
create_csv_submission(ids_test, y_pred, OUTPUT_PATH)

0.281016053133
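
Writing the submission file can be sketched with the standard csv module. The helper `write_submission_sketch` and the column names "Id" and "Prediction" are assumptions for illustration; create_csv_submission may use a different header or format.

```python
import csv
import os
import tempfile

def write_submission_sketch(ids, y_pred, path):
    """Write one row per sample, with an assumed "Id,Prediction" header."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["Id", "Prediction"])
        writer.writeheader()
        for id_, pred in zip(ids, y_pred):
            writer.writerow({"Id": int(id_), "Prediction": int(pred)})

# Write two made-up rows to a temporary file and read them back
demo_path = os.path.join(tempfile.mkdtemp(), "demo_submission.csv")
write_submission_sketch([350000, 350001], [-1, 1], demo_path)
with open(demo_path) as f:
    demo_rows = f.read().splitlines()
print(demo_rows)
```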
