# JUPYTER NOTEBOOK TIPS

Each rectangular box is called a cell. 
* ctrl+ENTER evaluates the current cell: if it contains Python code, it runs the code; if it contains Markdown, it renders the text.
* alt+ENTER evaluates the current cell and adds a new cell below it.
* If you click to the left of a cell, you'll notice the frame changes color to blue. While the frame is blue, you can delete the cell by pressing 'dd' (that's two "d"s in a row).

# Supervised Learning Model Skeleton

We'll use this skeleton for implementing different supervised learning algorithms.

In [3]:
class Model:
        
    def fit(self):
        
        raise NotImplementedError
    
    def predict(self, test_points):
        raise NotImplementedError
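
As a rough illustration of how the skeleton might be filled in, here is a minimal sketch of a subclass. The class name `MeanBaseline` and the choice to pass the training labels to the constructor are assumptions made for this example only (the skeleton's `fit` takes no arguments); they are not part of the assignment.

```python
class Model:
    """Same skeleton as above, repeated so this sketch runs on its own."""

    def fit(self):
        raise NotImplementedError

    def predict(self, test_points):
        raise NotImplementedError


class MeanBaseline(Model):
    """Hypothetical subclass: always predicts the mean training label."""

    def __init__(self, labels):
        # Assumption: training data is handed to the constructor,
        # because the skeleton's fit() takes no arguments.
        self.labels = labels
        self.mean_ = None

    def fit(self):
        self.mean_ = sum(self.labels) / len(self.labels)

    def predict(self, test_points):
        # One prediction per test point, regardless of its features.
        return [self.mean_ for _ in test_points]


baseline = MeanBaseline([1.0, 2.0, 3.0])
baseline.fit()
predictions = baseline.predict([[0.5], [1.5]])
```

Calling `fit` or `predict` on the bare `Model` raises `NotImplementedError`, which is what forces each algorithm to supply its own versions.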

In [None]:
import numpy as np

def preprocess(data_f, feature_names_f):
    '''
    data_f: where to read the dataset from
    feature_names_f: where to read the feature names from
    Returns:
        features: ndarray
            nxd array containing `float` feature values
        feature_names: ndarray
            1D array containing the `str` name of each feature
        target: ndarray
            1D array containing `float` labels
    '''
    # np.genfromtxt splits on whitespace by default. Be careful with the file delimiter,
    # e.g. for comma-separated files pass the delimiter=',' argument.

    data = np.genfromtxt(data_f)
    features = data[:, :-1]   # every column except the last is a feature
    target = data[:, -1]      # the last column is the label
    feature_names = np.genfromtxt(feature_names_f, dtype='unicode')

    return features, feature_names, target
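
The delimiter caveat in the comment above can be tried out on a small in-memory file. This is only a sketch: the three-row CSV text is made up for illustration, and `io.StringIO` stands in for a real file path.

```python
import io
import numpy as np

# Hypothetical three-row CSV: two feature columns followed by the label.
csv_text = "1.0,2.0,0.5\n3.0,4.0,1.5\n5.0,6.0,2.5\n"

# delimiter=',' tells genfromtxt to split each row on commas.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',')
features = data[:, :-1]   # every column except the last
target = data[:, -1]      # last column
```

Here `features` has shape `(3, 2)` and `target` has shape `(3,)`, matching the layout that `preprocess` expects.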

When data is not abundantly available, we estimate the error by averaging the errors measured on different splits of the dataset (k-fold cross-validation). Every fold of data is used for testing and for training in turns, i.e. assuming we split our data into 3 folds, we'd
* train our model on fold-1+fold-2 and test on fold-3
* train our model on fold-1+fold-3 and test on fold-2
* train our model on fold-2+fold-3 and test on fold-1.

We'd use the average of the errors obtained in the three runs as our error estimate.
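
The three-fold procedure described above can be sketched as follows. The data, the fixed seed, and the stand-in "model" (predicting the mean training label) are all assumptions chosen to keep the example self-contained; any model and error measure could take their place.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed, for repeatability only
y = rng.normal(size=9)                # hypothetical labels for 9 examples
folds = np.array_split(rng.permutation(len(y)), 3)   # 3 disjoint folds

fold_errors = []
for i, val_idx in enumerate(folds):
    # Train on every fold except the i-th, then test on the i-th.
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    prediction = y[train_idx].mean()              # stand-in "model"
    fold_errors.append(np.mean((y[val_idx] - prediction) ** 2))

cv_error = np.mean(fold_errors)       # average over the three runs
```

Each example lands in exactly one test fold, so every data point contributes to the error estimate once.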

Implement the function "kfold" below.


In [11]:
# TODO: Programming Assignment 2
import numpy as np
def kfold(size, k):

    '''
    Args:
        size: int
            Number of examples in the dataset that you want to split into k folds.
        k: int
            Number of desired splits in data. (Assume test set is already separated.)
    Returns:
        fold_dict: dict
            A dictionary with integer keys corresponding to folds. Values are (train_indices, val_indices).

        val_indices: ndarray
            Roughly 1/k of the indices, randomly chosen and set aside as the validation partition.
        train_indices: ndarray
            Remaining 1-(1/k) of the indices.

            e.g. fold_dict = {0: (train_0_indices, val_0_indices),
            1: (train_1_indices, val_1_indices), 2: (train_2_indices, val_2_indices)} for k = 3
    '''
    # Shuffle once so that every validation fold is a random, disjoint subset.
    indices = np.random.permutation(size)
    # np.array_split also handles sizes that are not divisible by k.
    folds = np.array_split(indices, k)

    fold_dict = {}
    for i in range(k):
        val_indices = folds[i]
        # Training indices are all indices that are not in the i-th fold.
        train_indices = np.setdiff1d(indices, val_indices)
        fold_dict[i] = (train_indices, val_indices)
    return fold_dict

Implement "mse" and regularization functions. They will be used in the fit method of linear regression.

In [1]:
#TODO: Programming Assignment 2
import numpy as np
def mse(y_pred, y_true):
    '''
    Args:
        y_pred: ndarray
            1D array containing data with `float` type. Values predicted by our method
        y_true: ndarray
            1D array containing data with `float` type. True y values
    Returns:
        cost: float
            Mean squared error between y_pred and y_true.
    '''
    # np.mean already divides by the number of examples,
    # so no extra division by len(y_pred) is needed.
    cost = np.mean((y_true - y_pred) ** 2)

    return cost
    

In [2]:
#TODO: Programming Assignment 2
def regularization(weights, method):
    '''
    Args:
        weights: ndarray
            1D array with `float` entries
        method: str
            "l1" for the sum of absolute weights, "l2" for the sum of squared weights
    Returns:
        value: float
            A single value. Regularization term that will be used in cost function in fit.
    '''
    value = 0

    if method == "l1":
        value = np.sum(np.abs(weights))
    elif method == "l2":
        value = np.sum(np.square(weights))
    # Any other method leaves the term at 0, i.e. no regularization.
    return value
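
Since "mse" and "regularization" are meant to be used together in the fit method of linear regression, the sketch below shows one way the two terms might be combined into a single cost. The notebook does not show `fit`, so the helper name `regularized_cost` and the weighting factor `lam` are assumptions made for illustration.

```python
import numpy as np

def regularized_cost(y_pred, y_true, weights, method, lam):
    """Hypothetical helper: MSE plus lam times an l1 or l2 penalty."""
    y_pred, y_true, weights = map(np.asarray, (y_pred, y_true, weights))
    data_term = np.mean((y_true - y_pred) ** 2)
    if method == "l1":
        penalty = np.sum(np.abs(weights))
    elif method == "l2":
        penalty = np.sum(weights ** 2)
    else:
        penalty = 0.0   # no penalty for unrecognized methods
    return data_term + lam * penalty

cost_l1 = regularized_cost([1.0, 2.0], [1.0, 4.0], [0.5, -1.5], "l1", 0.1)
cost_l2 = regularized_cost([1.0, 2.0], [1.0, 4.0], [0.5, -1.5], "l2", 0.1)
```

A larger `lam` puts more weight on keeping the weights small relative to fitting the data; setting it to 0 recovers plain MSE.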