# Practical 1 : Implementation of Linear Regression (Ridge, Lasso)

First part:
- Implement linear regression model 
    - using least squares method
    - implement directly using the NumPy package

Second part:
- regularization
- polynomial basis expansion
- cross validation
- scikit-learn: https://scikit-learn.org/

You will need to use the following:

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import _pickle as cp

For the purpose of testing, we’ll use the winequality dataset. The dataset is available here:
https://archive.ics.uci.edu/ml/datasets/Wine+Quality. To make the dataset easier to import, we’ve converted the data to the NumPy array format and shuffled it so that you can start the practical directly. The converted dataset is available on the course website.

The dataset has two files. We’ll focus on the white wine data, which is the larger dataset. You can load the data from the files as follows:

In [2]:
# load the dataset
# X is a matrix such that each row stores a data record 
# y is a vector of the corresponding labels of the records
#X, y = cp.load(open('/Data/winequality-white.pickle', 'rb'))
X, y = cp.load(open('winequality-white.pickle', 'rb'))
# check the size of the data
print("X is a matrix with shape {}, which has {} records and {} attributes.".format(X.shape, X.shape[0], X.shape[1]))
print("y is a vector with {} values, which stores the corresponding labels of the data records in X".format(y.shape[0]))



In order to get consistent results, all students should use the same 80% of the data as training
data. We’ll use the remaining as test data. To achieve this split run the following:

In [None]:
# The function splits the dataset into the training dataset and the test dataset.
# The parameter split_coeff is a percentage value such that
# the first split_coeff of the dataset goes to the training dataset, 
# and the remaining data goes to the test dataset.
def split_data(X, y, split_coeff):
    N, _ = X.shape # get the number of records (rows)
    train_size = int(split_coeff * N) # use the first split_coeff of the data as the training data
    X_train = X[:train_size] # the first training_size records
    y_train = y[:train_size]
    X_test = X[train_size:] # the last test_size records
    y_test = y[train_size:]
    return X_train, y_train, X_test, y_test

X_train, y_train, X_test, y_test = split_data(X, y, 0.8) # use 80% of the data as training data

# check the size of the split dataset
print("Shape of X_train:", X_train.shape)
print("Shape of y_train:", y_train.shape)
print("Shape of X_test:", X_test.shape)
print("Shape of y_test:", y_test.shape)

We won’t touch the test data except to report the errors of our learned models.

## Understanding What We’re Predicting

Before we get to training a linear model on the data and using it to make predictions, let’s look
at the spread of y values on the training set. The values are integers between 3 and 9 indicating
the quality of the wine.


### **Task 1**
Make a bar chart showing the distribution of y values appearing in the training data.

In [None]:
#@title
# Task 1: 
# the function takes the training labels as the input, and makes the bar chart
def plot_bar_chart_score(y_train):
    ###################################################
    ##### YOUR CODE STARTS HERE #######################
    ###################################################
    score, value = np.unique(y_train, return_counts=True)
    
    print("The scores are" , score)
    print("The values associated are", value)
    
    plt.ylabel('Number of wines')
    plt.xlabel('Score')
    plt.bar(score, value)
    ###################################################
    ##### YOUR CODE ENDS HERE #########################
    ###################################################

plot_bar_chart_score(y_train)

### **Task 2** 
Implement the trivial predictor, which uses the average value of y on the training set as the prediction for every datapoint.

In [None]:
#@title
# Task 2: implement the simplest predictor
# The function computes the average value of y on the training label values
def compute_average(y_train):
    # The code below is just for compilation. 
    # You need to delete it and write your own code.
    ###################################################
    ##### YOUR CODE STARTS HERE #######################
    ###################################################
    meanY= np.mean(y_train)    
    return meanY
    ###################################################
    ##### YOUR CODE ENDS HERE #########################
    ###################################################

y_train_avg = compute_average(y_train)
print("Average of y on the training label values is {}".format(y_train_avg))

# The simplest predictor returns the average value.
def simplest_predictor(X_test, y_train_avg):
  return y_train_avg


### **Task 3**
Report the mean squared error, i.e., the average of the squared residuals, using this simplest of predictors on the training and test data. We should hope that our models beat at least this baseline.

In [None]:
# We will evaluate our simplest predictor here. 
# Implement a function that can report the mean squared error 
# of a predictor on the given test data
# Input: test dataset and predictor
# Output: mean squared error of the predictor on the given test data
def test_data(X_test, y_test, predictor: callable=None):
    # Applies the predictor to each row to compute the predicted values
    y_predicted = np.apply_along_axis(predictor, 1, X_test)

    # TODO: compute the mean squared error of y_predicted
    # The code below is just for compilation. 
    # You need to delete it and write your own code.
    ###################################################
    ##### YOUR CODE STARTS HERE #######################
    ###################################################
    errors=(y_test-y_predicted)
    squared_errors=errors**2
    sum_squared_errors =np.sum(squared_errors)
    mse = 1/(y_test.size)*sum_squared_errors
    ###################################################
    ##### YOUR CODE ENDS HERE #########################
    ###################################################
    
    return mse

# use the above function test_data to evaluate the simplest predictor
# we use the lambda function here to pass the function simplest_predictor to the evaluator.
mse_simplest_predictor_train = test_data(X_train, y_train, lambda x: simplest_predictor(x, y_train_avg))
mse_simplest_predictor_test = test_data(X_test, y_test, lambda x: simplest_predictor(x, y_train_avg))

# Report the result
print('Simplest Predictor')
print('--------------------------------------------------------------------------------\n')
print('MSE (Training) = %.4f' % mse_simplest_predictor_train)
print('MSE (Testing)  = %.4f' % mse_simplest_predictor_test)
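As a sanity check on the baseline, the mean squared error of the mean predictor on the training labels is exactly the variance of those labels. The sketch below illustrates this on a handful of made-up scores; `y_toy`, `y_toy_avg` and `mse_baseline` are hypothetical names used only for this illustration, not part of the practical's skeleton.

```python
import numpy as np

# Toy labels standing in for y_train (made-up values, not the wine data).
y_toy = np.array([5.0, 6.0, 6.0, 7.0, 5.0, 6.0])

# The trivial predictor outputs the training mean for every record.
y_toy_avg = y_toy.mean()

# Vectorized mean squared error of that constant prediction.
mse_baseline = np.mean((y_toy - y_toy_avg) ** 2)

# For the mean predictor, the training MSE equals the (population) variance.
print(mse_baseline, np.var(y_toy))
```

On the test set the two quantities generally differ, because the mean is still the one computed from the training labels.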

## Linear Model Using Least Squares

Let us first fit a linear regression model and then calculate the training and test error. We’ll
actually use the closed form solution of the least squares estimate for the linear model. 
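
For reference, the closed form mentioned above can be written as follows. The notation is an assumption made here for illustration: $X$ is the $N \times (D+1)$ training matrix with a leading column of ones for the bias, $y$ is the vector of $N$ labels, and $X^\top X$ is assumed to be invertible.

$$\hat{w} = \arg\min_{w} \lVert Xw - y \rVert_2^2 = (X^\top X)^{-1} X^\top y$$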


### **Task 4**
Is it strictly necessary to standardize the data for the linear model using the least squares method? Why?

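No. With a bias term, the least squares fit is unaffected by shifting or rescaling individual features: the coefficients change to compensate, and the predictions stay the same. Standardization is strictly necessary only if the model includes interaction terms and/or polynomial terms, where it helps to model curvature reliably and avoids the risk of producing misleading results or missing statistically significant terms. Our data set does not have these terms, therefore standardization is not required.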
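
The invariance of plain least squares to feature scaling can be checked numerically. The sketch below uses synthetic data and a hypothetical helper `fit_predict` (neither is part of the practical); it fits the same model on raw and on standardized features and compares the predictions, which should agree up to floating point error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): 50 records, 3 features on very different scales.
scales = np.array([1.0, 100.0, 0.01])
shifts = np.array([0.0, 5.0, -2.0])
X_raw = rng.normal(size=(50, 3)) * scales + shifts
y_syn = X_raw @ np.array([0.5, -0.02, 30.0]) + rng.normal(scale=0.1, size=50)

def fit_predict(X_fit, y_fit):
    # Add a bias column, solve the least squares problem, return fitted values.
    A = np.hstack([np.ones((X_fit.shape[0], 1)), X_fit])
    w, *_ = np.linalg.lstsq(A, y_fit, rcond=None)
    return A @ w

# Per-feature standardization of the same data.
X_stdzd = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

pred_raw = fit_predict(X_raw, y_syn)
pred_std = fit_predict(X_stdzd, y_syn)

# Compare the two sets of fitted values.
print(np.allclose(pred_raw, pred_std, atol=1e-6))
```

This property does not carry over to penalized models such as Ridge and Lasso, where the penalty depends on the scale of each coefficient.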

### **Task 5**
Standardize the data, i.e., make the data for every feature have mean 0 and variance 1. 

We do the standardization using the training data, and we need to remember the means and
the standard deviations so that they can be applied to the test data as well. Apply the
standardization so that every feature in the training data has mean 0 and variance 1. Apply
the same transformation to the test data. 

In [None]:
# Input: training data
# Output: standardized training data, means and standard deviations
def standardize_data(X):
    # TODO: compute mean, standard deviations and the standardized data
    # The code below is just for compilation. 
    # You need to replace it by your own code.
    ###################################################
    ##### YOUR CODE STARTS HERE #######################
    ###################################################
    mean = np.mean(X, axis=0)  # one mean per feature (column)
    std = np.std(X, axis=0)    # one standard deviation per feature (column)
    X_std = (X - mean) / std
    ###################################################
    ##### YOUR CODE ENDS HERE #########################
    ###################################################
    
    return X_std, mean, std

X_train_std, X_train_mean, X_train_std_div = standardize_data(X_train)
print("X_train_std:", X_train_std.shape)
print("Mean:", X_train_mean)
print("Standard deviation:", X_train_std_div)

In [None]:
# TODO: Standardize the test data using the mean and standard deviation you computed for the training data
###################################################
##### YOUR CODE STARTS HERE #######################
###################################################
X_test_std=(X_test-X_train_mean)/X_train_std_div
print(X_test_std.shape)
###################################################
##### YOUR CODE ENDS HERE #########################
###################################################
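
Task 5 asks for every feature to have mean 0 and variance 1, which means the statistics are computed per column (`axis=0`) rather than over the whole matrix. The sketch below shows this on a tiny made-up matrix; `X_tr`, `X_te`, `mu` and `sigma` are hypothetical names for this illustration only.

```python
import numpy as np

# Tiny made-up training and test matrices with two features.
X_tr = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
X_te = np.array([[2.0, 30.0]])

mu = X_tr.mean(axis=0)     # per-feature means of the training data
sigma = X_tr.std(axis=0)   # per-feature standard deviations of the training data

# Both sets are transformed with the *training* statistics.
X_tr_std = (X_tr - mu) / sigma
X_te_std = (X_te - mu) / sigma

print(X_tr_std.mean(axis=0), X_tr_std.std(axis=0))
print(X_te_std)
```

The test record here happens to equal the training means, so it maps to zeros; in general the standardized test data will not have mean exactly 0 or variance exactly 1.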

### **Task 6**
Implement the linear model predictor, and report the mean squared error using the linear model on the training and test data.

We will do this in several steps. We need to implement the function for computing the parameters based on the training dataset. Note we need to add the bias column to the dataset. 

In [None]:
# the function adds a column of ones to the front of the input matrix
def expand_with_ones(X):
    # TODO: adds a column of ones to the front of the input matrix
    # The code below is just for compilation. 
    # You need to replace it by your own code.
    ###################################################
    ##### YOUR CODE STARTS HERE #######################
    ###################################################
    ones=np.ones((X.shape[0],1))
    X_out = np.append(ones,X,axis=1)
    return X_out
    ###################################################
    ##### YOUR CODE ENDS HERE #########################
    ###################################################

# The function computes the parameters
def least_squares_compute_parameters(X_input, y):
    # add the bias column to the dataset
    X = expand_with_ones(X_input)

    # TODO: compute the parameters based on the expanded X and y
    # The code below is just for compilation. 
    # You need to replace it by your own code.
    ###################################################
    ##### YOUR CODE STARTS HERE #######################
    ###################################################
    XTX=np.dot(X.T,X)
    T = np.linalg.multi_dot([np.linalg.inv(XTX),X.T,y])
    return T
    ###################################################
    ##### YOUR CODE ENDS HERE #########################
    ###################################################

# train the linear model parameters
w = least_squares_compute_parameters(X_train_std, y_train)
print("w:", w.shape)
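
Explicitly inverting `X.T @ X` is fine for a small, well-conditioned problem like this one, but it can be numerically fragile when features are nearly collinear. One possible alternative, sketched below under the assumption that only the fitted parameters are needed, is `np.linalg.lstsq`, which solves the same least squares problem without forming the inverse. The function name `least_squares_lstsq` and the toy data are hypothetical.

```python
import numpy as np

def least_squares_lstsq(X_input, y):
    # Add the bias column, then let NumPy solve min_w ||X_b w - y||^2.
    X_b = np.hstack([np.ones((X_input.shape[0], 1)), X_input])
    w, *_ = np.linalg.lstsq(X_b, y, rcond=None)
    return w

# Toy data generated exactly from y = 1 + 2 * x, so the fit should recover
# an intercept close to 1 and a slope close to 2.
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = 1.0 + 2.0 * X_toy[:, 0]

w_toy = least_squares_lstsq(X_toy, y_toy)
print(w_toy)
```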

We then implement the linear model predictor given the dataset and the parameters. 

In [None]:
# Implement the linear model predictor
# Input: test data and parameters
# Output: predicted values
def linear_model_predictor(X, w):
    # TODO: compute the predicted values based on the test dataset and the parameters
    # The code below is just for compilation. 
    # You need to replace it by your own code.
    ###################################################
    ##### YOUR CODE STARTS HERE #######################
    ###################################################
    y_predicted=np.dot(X,w)
    return y_predicted
    ###################################################
    ##### YOUR CODE ENDS HERE #########################
    ###################################################


We can now evaluate our linear model predictor on the test dataset. 

In [None]:
# use the function test_data to evaluate the linear model predictor
mse_linear_model_predictor = test_data(expand_with_ones(X_test_std), y_test, lambda x: linear_model_predictor(x, w))
print("Mean squared error is {}".format(mse_linear_model_predictor))

## Learning Curves

Let us see if the linear model is overfitting or underfitting. Since the dataset is somewhat large and there are only 11 features, our guess should be that it may either be underfitting or be about right.

Starting with 20 datapoints, we’ll use training datasets of increasing size, in increments of 20 up to about 600 datapoints. For each case train the linear model only using the first n elements of
the training data. Calculate the training error (on the data used) and the test error (on the full test set). Plot the training error and test error as a function of the size of the dataset used for
training.
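
The procedure described above can be sketched as follows. This is a minimal illustration on synthetic data, not the practical's solution: `learning_curve_sketch` is a hypothetical helper that slices the first n training records directly and evaluates every model on the same fixed test set.

```python
import numpy as np

def learning_curve_sketch(X_tr, y_tr, X_te, y_te, sizes):
    train_err, test_err = [], []
    for n in sizes:
        # Use only the first n training records.
        Xn, yn = X_tr[:n], y_tr[:n]
        # Standardize with statistics from those n records only.
        mu, sigma = Xn.mean(axis=0), Xn.std(axis=0)
        A_n = np.hstack([np.ones((n, 1)), (Xn - mu) / sigma])
        A_te = np.hstack([np.ones((X_te.shape[0], 1)), (X_te - mu) / sigma])
        # Fit by least squares and record both errors.
        w, *_ = np.linalg.lstsq(A_n, yn, rcond=None)
        train_err.append(np.mean((A_n @ w - yn) ** 2))
        test_err.append(np.mean((A_te @ w - y_te) ** 2))
    return np.array(train_err), np.array(test_err)

# Synthetic stand-in for the wine data: 1000 records, 11 features.
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(1000, 11))
y_syn = X_syn @ rng.normal(size=11) + rng.normal(scale=0.7, size=1000)

sizes = list(range(20, 601, 20))
tr, te = learning_curve_sketch(X_syn[:800], y_syn[:800],
                               X_syn[800:], y_syn[800:], sizes)
print(len(sizes), tr[:3], te[:3])
```

Slicing by an integer count avoids converting the size to a fraction and back, which can lose a record to floating point rounding.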

### **Task 7** 
Implement a function that evaluates the linear model over the training dataset with the input size.
The function takes a dataset and the split coefficient as inputs, and
1. splits the data to training and test datasets,
2. standardizes the data,
3. trains the linear model, and
4. reports the mse of the linear model predictor on both training and test datasets. 

In [None]:
# Input: dataset and split coefficient
# Output: mse of the linear model predictor on both the training and test datasets
def train_and_test(X, y, split_coeff):
    # TODO: implement the function 
    # The code below is just for compilation. 
    # You need to replace it by your own code.
    ###################################################
    ##### YOUR CODE STARTS HERE #######################
    ###################################################
    # Hints: use the functions you have implemented
    X_train, y_train, X_test, y_test=split_data(X, y, split_coeff)
    X_train_std, X_train_mean, X_train_std_div = standardize_data(X_train)
    X_test_std = (X_test-X_train_mean)/X_train_std_div
    
    w = least_squares_compute_parameters(X_train_std, y_train)
    
    mse_train = test_data(expand_with_ones(X_train_std), y_train, lambda x: linear_model_predictor(x, w))
    mse_test = test_data(expand_with_ones(X_test_std), y_test, lambda x: linear_model_predictor(x, w))
    return mse_train, mse_test
    ###################################################
    ##### YOUR CODE ENDS HERE #########################
    ###################################################

mse_train, mse_test = train_and_test(X, y, 0.8)
print('MSE using Linear Models')
print('-----------------------\n')
print('MSE (Training) = %.4f' % mse_train)
print('MSE (Testing)  = %.4f' % mse_test)


### **Task 8**
Report the learning curves plot. Also, explain whether you think the model is underfitting or not and how much data you need before getting the optimal test error.

In [None]:
mse_train_v = []
mse_test_v = []
K_v=[] #splitting coefficients
training_size =[] #size of the training set

TRAINING_SIZE_MAX = 601
TRAINING_SIZE_MIN = 20

# compute the errors over datasets with different sizes
for train_size in range(TRAINING_SIZE_MIN, TRAINING_SIZE_MAX, 20):
    # TODO: compute the training error and test error on datasets with size train_size
    # and add them to mse_train_v and mse_test_v, respectively 
    K=train_size/X.shape[0] 
    mse_train, mse_test = train_and_test(X, y, K)
    training_size.append(train_size)
    K_v.append(K)
    mse_train_v.append(mse_train)
    mse_test_v.append(mse_test)

# The below code outputs the plot of mse from different training sizes
plt.figure(2)
plt.plot(np.arange(TRAINING_SIZE_MIN, TRAINING_SIZE_MAX, 20), mse_train_v, 'r--', label="Training Error")
plt.plot(np.arange(TRAINING_SIZE_MIN, TRAINING_SIZE_MAX, 20), mse_test_v, 'b-', label="Test Error")
plt.xlabel('Dataset Size')
plt.ylabel('Mean Squared Error')
plt.legend()  # show the labels passed to plt.plot above
plt.show()

val, idx = min((val, idx) for (idx, val) in enumerate(mse_test_v)) #find the optimal test error value and its position
print('The optimal test error is ', val, ' and can be found using a split coefficient of ' , K_v[idx], ' equivalent to a training size of ', training_size[idx], '.' )

#### Solution Task 8:
We did not see any underfitting. We started with very small training sets, and with few training examples the model tends to overfit: for the smallest dataset sizes the test error is far greater than the training error. As we gradually increased the training size, the overfitting was reduced.
The optimal test error is 0.5674813618333658 and is found with a split coefficient of 0.10616578195181707, equivalent to a training size of 520. The model therefore reaches its best test error when it is fed most of the training data. However, the graph shows that the model plateaus at a training set size of around 210. Slightly beyond this size the test error becomes greater than the training error, which is a symptom of overfitting. If the model were used in a real-life application, it would be reasonable to stop adding training data at around 200 data points.

## Polynomial Basis Expansion with Ridge and Lasso

For this part use the following from the scikit-learn package. Read the documentation available here: http://scikit-learn.org/stable/modules/classes.html



You will need to use the following:

In [None]:
# You will need the following libs. 
# Feel free to import other libs. 

# import the preprocessing libs for standardization and basis expansion
from sklearn.preprocessing import StandardScaler, PolynomialFeatures 

# Ridge and Lasso linear model
from sklearn.linear_model import Ridge, Lasso 

Try 5 powers of 10 for lambda from 10^-2 to 10^2 and use degree 2 basis expansion. Fit ridge and lasso using degree 2 polynomial expansion with these values of lambda. You should pick the optimal values for lambda using a validation set. Set the last 20% of the training set for the purpose of validation.
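
For reference, the objectives that scikit-learn documents for these two estimators can be written as follows. The regularization strength that this practical calls lambda is the `alpha` parameter, $N$ denotes the number of training records (notation assumed here), and the intercept is not penalized.

$$\text{Ridge:}\quad \min_{w}\ \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2$$

$$\text{Lasso:}\quad \min_{w}\ \frac{1}{2N} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1$$

Because the Lasso data term is scaled by $1/(2N)$ while the Ridge term is not, the same numerical value of lambda does not give the same amount of regularization in the two models, which is one reason to select it separately for each.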

### **Task 9**
Let's implement the function for expanding the basis of the dataset. 

Hints: use `PolynomialFeatures`

In [None]:
def expand_basis(X, degree):
    # TODO: expand the basis of X for the degree
    # The code below is just for compilation. 
    # You need to replace it by your own code.
    ###################################################
    ##### YOUR CODE STARTS HERE #######################
    ###################################################
    # Hints: use the function PolynomialFeatures
    poly=PolynomialFeatures(degree)
    X=poly.fit_transform(X)
    return X
    ###################################################
    ##### YOUR CODE ENDS HERE #########################
    ###################################################
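
To see what the expansion produces, the sketch below applies `PolynomialFeatures` with degree 2 to a single made-up record with two features. The output columns are the constant term, the original features, and all products of degree 2. The variable names are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One made-up record with features a = 2 and b = 3.
X_small = np.array([[2.0, 3.0]])

poly = PolynomialFeatures(degree=2)
X_small_expanded = poly.fit_transform(X_small)

# Columns are ordered as 1, a, b, a^2, a*b, b^2.
print(X_small_expanded)  # [[1. 2. 3. 4. 6. 9.]]

# With 11 input features, degree 2 yields 78 output columns.
n_cols_11 = PolynomialFeatures(degree=2).fit(np.zeros((1, 11))).n_output_features_
print(n_cols_11)  # 78
```

The leading column of ones has zero variance, so `StandardScaler` maps it to zeros; the intercept is then handled by the estimators themselves.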

### **Task 10**
Prepare the training, test and validation data using the expanded dataset. Expand and standardize the data. 

Hints: you can use `StandardScaler` and `std_scaler` to standardize the data

In [None]:
# TODO: prepare the training, test and validation data using the expanded dataset.
# The code below is just for compilation. 
# You need to replace it by your own code.
def prepare_data(X, y, degree):
    ###################################################
    ##### YOUR CODE STARTS HERE #######################
    ###################################################
    # Hints: follow the steps    
    # You need to prepare four datasets:
    # 1. training data -- X_train, y_train
    # 2. test data -- X_test, y_test
    # 3. validation data -- X_train_v, y_train_v
    # 4. training data (cross validation) -- X_train_n, y_train_n
    
    # You need to expand the basis of the data, and do standardization
    X_expand=expand_basis(X,degree)
    X_train,y_train,X_test,y_test=split_data(X_expand,y,0.8)
    scaler = StandardScaler()
    
    #training_Data
    X_train=scaler.fit_transform(X_train)
    
    # test data
    X_test = scaler.transform(X_test)

    # further split the training data to training and validation data
    # training data
    X_train_n, y_train_n, X_train_v, y_train_v= split_data(X_train,y_train,0.8) 

    # validation data
    

    return X_train, y_train, X_train_n, y_train_n, X_train_v, y_train_v, X_test, y_test
    ###################################################
    ##### YOUR CODE ENDS HERE #########################
    ###################################################

X_train, y_train, X_train_n, y_train_n, X_train_v, y_train_v, X_test, y_test = prepare_data(X, y, 2)# here we expand the dataset with degree 2
print(X_train.shape,X_train_v.shape,X_train_n.shape)
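
An alternative way to organize the same steps, sketched here on synthetic data, is a scikit-learn `Pipeline` that chains the expansion, the scaler and the model. The scaler is then fitted only on the data passed to `fit`, so no statistics from the validation portion are used during training. The step names (`poly`, `scale`, `ridge`), the data and `alpha=1.0` are assumptions made for this illustration.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Synthetic data with a quadratic dependence on the first feature.
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(200, 3))
y_syn = X_syn[:, 0] ** 2 + X_syn[:, 1] + rng.normal(scale=0.1, size=200)

# First 80% for fitting, last 20% for validation.
X_fit, y_fit = X_syn[:160], y_syn[:160]
X_val, y_val = X_syn[160:], y_syn[160:]

model = Pipeline([
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("scale", StandardScaler()),
    ("ridge", Ridge(alpha=1.0)),
])

# fit() learns the scaling from X_fit only; predict() reuses it on X_val.
model.fit(X_fit, y_fit)
val_mse = mean_squared_error(y_val, model.predict(X_val))
print(val_mse)
```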

### **Task 11**
We have prepared the training data and the validation data. We can now choose the hyper parameter lambda for Ridge and Lasso using the validation data. 


In [None]:
from sklearn.metrics import mean_squared_error
# The function takes the training and validation data as inputs, and 
# returns the lambda value that has the minimal mse
# We use is_ridge to indicate the model we consider. 
# is_ridge = True indicates Ridge while is_ridge = False indicates Lasso
def choose_hyper_param(X_train_n, y_train_n, X_train_v, y_train_v, is_ridge: bool):
    mse_arr = []
    lam_arr = []

    # Try lambda values from 10^-2 to 10^2. 
    # Record the mse and the lambda values in mse_arr and lam_arr
    # The code below is just for compilation. 
    # You need to replace it by your own code.
    ###################################################
    ##### YOUR CODE STARTS HERE #######################
    ###################################################
    for pow_lam in range(-2, 3):
        lam = 10 ** pow_lam
        if is_ridge:
            clf=Ridge(lam)
        else:
            clf=Lasso(lam)
        clf.fit(X_train_n,y_train_n)

        y_predict= clf.predict(X_train_v)
        mse_val=mean_squared_error(y_train_v,y_predict)
        mse_arr.append(mse_val) # add the mse when using the hyperparameter lam
        lam_arr.append(lam)
    ###################################################
    ##### YOUR CODE ENDS HERE #########################
    ###################################################

    # get the index of the lambda value that has the minimal mse
    lambda_idx_min = np.argmin(np.array(mse_arr))

    # plot of the lambda values and their mse
    plt.figure()
    plt.semilogx(lam_arr, mse_arr)

    # return the best lambda value
    return lam_arr[lambda_idx_min]

# call the function to choose the lambda for Ridge and Lasso
lam_ridge = choose_hyper_param(X_train_n, y_train_n, X_train_v, y_train_v, True)
lam_lasso = choose_hyper_param(X_train_n, y_train_n, X_train_v, y_train_v, False)

print("Ridge lambda:", lam_ridge)
print("Lasso lambda:", lam_lasso)


### **Task 12**:
Once you’ve obtained the optimal values for lambda for Ridge and Lasso, train these models using these hyperparameters on the full training data. Then report
the training and test error.

In [None]:
# TODO: train the Ridge and Lasso models using their best parameters, and
#       report their mse
###################################################
##### YOUR CODE STARTS HERE #######################
###################################################
# Hints: train these models on the full training data
clf_ridge=Ridge(lam_ridge)
clf_ridge.fit(X_train, y_train)
y_train_ridge_predict=clf_ridge.predict(X_train)
y_test_ridge_predict=clf_ridge.predict(X_test)

clf_lasso=Lasso(lam_lasso,max_iter=10000)
clf_lasso.fit(X_train, y_train)
y_train_lasso_predict=clf_lasso.predict(X_train)
y_test_lasso_predict=clf_lasso.predict(X_test)

mse_ridge_train = mean_squared_error(y_train,y_train_ridge_predict)
mse_ridge_test = mean_squared_error(y_test,y_test_ridge_predict)
mse_lasso_train = mean_squared_error(y_train,y_train_lasso_predict)
mse_lasso_test = mean_squared_error(y_test,y_test_lasso_predict)
###################################################
##### YOUR CODE ENDS HERE #########################
###################################################

# Report the result
print('For Ridge Regression with using degree %d polynomial expansion and lambda = %.4f' % (2, lam_ridge))
print('--------------------------------------------------------------------------------\n')
print('MSE (Training) = %.4f' % mse_ridge_train)
print('MSE (Testing)  = %.4f' % mse_ridge_test)

print('\n\nFor Lasso with using degree %d polynomial expansion and lambda = %.4f' % (2, lam_lasso))
print('---------------------------------------------------------------------\n')
print('MSE (Training) = %.4f' % mse_lasso_train)
print('MSE (Testing)  = %.4f' % mse_lasso_test)

## Larger Degrees



### **Task 13**
Try using higher degree basis expansion. You may want to use k-fold cross validation to determine
the values of hyperparameters rather than just keeping a validation set. 

Hints: Use `KFold` to do this automatically. 

In [None]:
# KFold
from sklearn.model_selection import KFold

# TODO: Try using higher degree basis expansion. Find the degree that gives the minimal mse. 
###################################################
##### YOUR CODE STARTS HERE #######################
###################################################
# Hints: use KFold
def search_hyperparameter_using_kfold(X_train, y_train, kfold_splits, is_ridge:bool):
    mse_arr_lambda = []
    lam_arr = []

    for pow_lam in range(-2, 3):
        lam = 10 ** pow_lam
        kf = KFold(n_splits=kfold_splits)
        mse_kfold_arr=[]
        for train_index, test_index in kf.split(X_train):
            X_train_n, X_train_v = X_train[train_index], X_train[test_index]
            y_train_n, y_train_v = y_train[train_index], y_train[test_index]
            
            if is_ridge:
                clf=Ridge(lam)
            else:
                clf=Lasso(lam,max_iter=10000)  # the keyword is max_iter; iter raises a TypeError
                
            clf.fit(X_train_n,y_train_n)
            y_predict_v= clf.predict(X_train_v)
            mse_fold=mean_squared_error(y_train_v,y_predict_v)
            mse_kfold_arr.append(mse_fold)
            
        #store average mse for each lambda    
        mse_val_lambda=np.mean(mse_kfold_arr)
        mse_arr_lambda.append(mse_val_lambda)
        lam_arr.append(lam)
        
    #pick minimum mse and lambda    
    lambda_idx_min = np.argmin(np.array(mse_arr_lambda))
    
    #store degree mse by selecting best lambda
    best_lambda=lam_arr[lambda_idx_min]
       
    return best_lambda

#higher degree basis expansion
best_lamda_arr=[]
deg_arr=[]
mse_deg_arr=[]
for deg in range(2,6):
    
    #expand
    X_expanded=expand_basis(X,deg)
    #split only for test
    X_train,y_train,X_test,y_test=split_data(X_expanded,y,0.8)
    
    #standardise
    scaler = StandardScaler()
    X_train=scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    
    #hyperparameter search
    kfold_splits=5
    best_lambda=search_hyperparameter_using_kfold(X_train,y_train,kfold_splits,True)
    best_lamda_arr.append(best_lambda)
    
    #train the model on the full training set with the best lambda
    clf=Ridge(best_lambda)
    clf.fit(X_train,y_train)
    y_predict_test= clf.predict(X_test)
    mse_deg=mean_squared_error(y_test,y_predict_test)
    
    mse_deg_arr.append(mse_deg)
    deg_arr.append(deg)

# Degree with minimum mse
degree_idx_min=np.argmin(np.array(mse_deg_arr))    
print("%d is the degree with minimal mse=%.4f"%(deg_arr[degree_idx_min],mse_deg_arr[degree_idx_min]))   

for i in range(4):
    print('\nFor Ridge with using degree %d polynomial expansion and lambda = %.4f, MSE = %.4f '% (deg_arr[i], best_lamda_arr[i],mse_deg_arr[i]))
###################################################
##### YOUR CODE ENDS HERE #########################
###################################################
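
The search over degrees and lambda values can also be expressed with scikit-learn's `GridSearchCV`, which runs the k-fold loop internally. The sketch below is a hedged illustration on synthetic data, not a replacement for the solution above: the step names, the grid and the data are assumptions. Here the degree is chosen by cross-validation on the training data rather than by the test error.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic data whose target depends on an interaction between two features.
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(150, 2))
y_syn = X_syn[:, 0] * X_syn[:, 1] + rng.normal(scale=0.1, size=150)

pipe = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("scale", StandardScaler()),
    ("ridge", Ridge()),
])

# Candidate degrees and the same five powers of 10 for the penalty.
alphas = [10.0 ** p for p in range(-2, 3)]
param_grid = {"poly__degree": [1, 2, 3], "ridge__alpha": alphas}

# scikit-learn maximizes scores, so MSE is passed in negated form.
search = GridSearchCV(pipe, param_grid, cv=KFold(n_splits=5),
                      scoring="neg_mean_squared_error")
search.fit(X_syn, y_syn)

print(search.best_params_, -search.best_score_)
```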

### Project Short Report

Task 1 : The skeleton code was slightly modified because X_train was redundant. The method first counts the number of times each unique score appears, then it draws a bar chart.

Task 2 : The function simply takes y_train as an input and computes the mean with the numpy built-in method.

Task 3 : The MSE was computed using the formula provided in the lecture, the process was performed step by step.

Task 4 : See answer in the given text box.

Task 5 : The method to standardize the data makes use of the numpy methods "mean" and "std". The standardization of the test dataset uses the mean and the standard deviation of the train dataset.

Task 6 : First, a column of ones was added in order to make use of the matrix formula for computing the linear model parameters. Then, the numpy method linalg.multi_dot was used to compute the dot product of two or more arrays in a single function call, while automatically selecting the fastest evaluation order. Finally, the target variable was calculated via a simple numpy dot product.

Task 7 : For this exercise, we make use of the functions previously computed. In sequence, we tell the program to split the data set, to standardize the training and testing X set, to compute the parameters and to return the MSE after having computed the predictions.

Task 8 : See answer in the given text box.

Task 9 : The function uses scikit-learn's `PolynomialFeatures` to expand the columns of the X array into all polynomial terms, including interaction terms, up to the degree given as input.

Task 10 : For this exercise, we followed the steps provided in the skeleton code and made full use of the scikit-learn built in method.

Task 11 : The function selects the hyper parameter lambda that minimizes the MSE. This function was later tested to select hyperparameters for the NBC classifier in practical 2.

Task 12 : The lambdas derived in the previous task were applied here. To remove a warning in the Lasso model, we set max_iter=10000. 

Task 13 : First, a function was written to search for the best hyperparameter using the K-Fold method. This function was applied after basis expansion and standardization of the independent variables in order to find the degree (for basis expansion) that minimizes the MSE. Searching degrees 2 through 5 (`range(2, 6)`), we find that the best degree is 3. This matches the well-known behaviour that basis expansion helps reduce the MSE of a model, but has a negative effect when it is used too extensively.