# ***Basic Functions Used in Machine Learning Algorithms***

This module contains functions for normalization, prediction, loss functions with and without regularization, the gradients of those loss functions, a normalized root-mean-square error metric, gradient descent, and the pseudo-inverse method.


## Import Statements

In [28]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

## Normalize function 



In [29]:
def Normalize(X): # Output will be a normalized data matrix of the same dimension
    '''
    Normalize all columns of X using mean and standard deviation
    '''
    # X.mean(axis=0) and X.std(axis=0) each return one value per column.
    # Broadcasting subtracts the column mean from every row of X and then
    # divides the result by the column standard deviation.
    # Note: a constant column has zero standard deviation and yields NaN.
    y = (X - X.mean(axis=0)) / X.std(axis=0)
    return y
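
As a quick sanity check, the sketch below applies the same column-wise formula to a small made-up matrix and confirms that each column ends up with mean 0 and standard deviation 1. The name `normalize_columns` and the sample values are illustrative assumptions, not part of the module.

```python
import numpy as np

def normalize_columns(X):
    # Same formula as Normalize: subtract column means, divide by column standard deviations
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Hypothetical 3x2 data matrix; the values are chosen only for illustration
X_demo = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 60.0]])
X_norm = normalize_columns(X_demo)

print(X_norm.mean(axis=0))  # approximately 0 for each column
print(X_norm.std(axis=0))   # approximately 1 for each column
```

If validation data is normalized separately in this way, it uses its own mean and standard deviation rather than those of the training data, which is worth keeping in mind when comparing losses.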

## Prediction Function

Given X and w, compute the predicted output.

In [30]:
def Prediction (X, w): # Output will be a prediction vector y
    '''
    Compute the prediction for an input data matrix X and weight vector w.
    Output y = [X 1]w, where 1 is a column vector of all 1s.
    '''
    # Append a column of 1s to X so that the last entry of w acts as the bias term w0
    X = np.append(X, np.ones((len(X), 1), dtype = int), axis = 1)
    # Dot product of the weight vector with the augmented matrix gives one prediction per row of X
    predict = (w).dot(X.T)
    return predict
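
A small worked example may make the bias convention clearer. This is a minimal sketch with made-up numbers; `predict_with_bias` is a hypothetical stand-in that computes the same `[X 1]w` product.

```python
import numpy as np

def predict_with_bias(X, w):
    # Append a column of ones so the last weight acts as the bias, then compute [X 1] w
    X_aug = np.append(X, np.ones((len(X), 1)), axis=1)
    return X_aug.dot(w)

X_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
w_demo = np.array([0.5, -1.0, 2.0])   # two feature weights followed by the bias

y_demo = predict_with_bias(X_demo, w_demo)
# Row 1: 0.5*1 - 1.0*2 + 2.0 = 0.5
# Row 2: 0.5*3 - 1.0*4 + 2.0 = -0.5
print(y_demo)
```

Note that `w` needs one more entry than `X` has columns, because the bias is stored as the last weight.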

## Loss Functions

Code for the four loss functions, followed by the NRMSE metric:

1. MSE loss measures only the prediction error
2. MAE loss measures only the prediction error
3. L2 loss adds an L2 penalty on the weights to the MSE, and can call MSE loss
4. L1 loss adds an L1 penalty on the weights to the MSE, and can call MSE loss
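
Written out, the four losses as the code in this section computes them are given below. The notation is an assumption introduced here for readability: $y_i$ is the $i$-th prediction, $t_i$ the target, $N$ the number of samples, and $\tilde{w}$ the weight vector without its last (bias) entry. Note that the L2 penalty uses the norm itself, not its square.

```latex
\begin{aligned}
\mathrm{MSE}(w) &= \frac{1}{N}\sum_{i=1}^{N} (y_i - t_i)^2 \\
\mathrm{MAE}(w) &= \frac{1}{N}\sum_{i=1}^{N} \lvert y_i - t_i \rvert \\
L_2(w) &= \mathrm{MSE}(w) + \lambda \,\lVert \tilde{w} \rVert_2 \\
L_1(w) &= \mathrm{MSE}(w) + \lambda \,\lVert \tilde{w} \rVert_1
\end{aligned}
```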

In [31]:
def MSE_Loss (X, t, w, lamda =0):
    '''
    lamda=0 is a default argument, so passing lamda to a function that doesn't need it does not raise an error.
    This allows us to call all loss functions with the same input format.
    
    You are encouraged to read about default arguments online if you're not familiar with them.
    '''
    # Mean squared error:
    # Prediction(X, w) gives the predicted y, from which the target t is subtracted.
    # The differences are squared, summed, and divided by n = len(X).
    mse = (np.sum(np.square(Prediction(X,w) - t))) / len(X)
    return mse

In [32]:
def MAE_Loss (X, t, w, lamda = 0): 
    # Same as MSE, except the absolute value of (y - t) is taken instead of the square,
    # then summed and divided by n
    mae = (np.sum(np.abs(Prediction(X, w) - t))) / len(X)
    return mae

In [33]:
def L2_Loss (X, t, w, lamda): # Output should be a single number based on L2-norm (with sqrt)
    # w[:len(w) - 1] drops the last entry, so the bias is not penalized
    l2 = MSE_Loss(X, t, w, lamda) + (lamda) * (np.linalg.norm(w[:len(w) - 1]))
    return l2

In [34]:
def L1_Loss (X, t, w, lamda):
    l1 = MSE_Loss(X, t, w, lamda) + (lamda) * (np.sum(np.abs(w[:len(w) - 1])))
    return l1

In [35]:
def NRMSE_Metric (X, t, w, lamda=0): #RMSE/std_dev(t)
    nrmse = np.sqrt(MSE_Loss(X, t, w)) / np.std(t)
    return nrmse
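
To see how the five quantities relate, the following sketch evaluates them by hand on a two-sample toy problem. All names and numbers here are illustrative assumptions. The formulas mirror the functions above, with the bias excluded from both penalties.

```python
import numpy as np

def predict(X, w):
    # [X 1] w, with the bias stored as the last weight
    return np.append(X, np.ones((len(X), 1)), axis=1).dot(w)

X_demo = np.array([[1.0], [2.0]])
t_demo = np.array([1.0, 3.0])
w_demo = np.array([1.0, 0.5])      # slope 1.0, bias 0.5
lamda_demo = 0.1

err = predict(X_demo, w_demo) - t_demo                 # [0.5, -0.5]
mse = np.sum(err ** 2) / len(X_demo)                   # 0.25
mae = np.sum(np.abs(err)) / len(X_demo)                # 0.5
l2 = mse + lamda_demo * np.linalg.norm(w_demo[:-1])    # 0.25 + 0.1 * 1.0
l1 = mse + lamda_demo * np.sum(np.abs(w_demo[:-1]))    # 0.25 + 0.1 * 1.0
nrmse = np.sqrt(mse) / np.std(t_demo)                  # 0.5 / 1.0

print(mse, mae, l2, l1, nrmse)
```

With a single non-bias weight the L1 and L2 penalties coincide. They differ once there are two or more feature weights.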

## Gradient function
Each Loss function will have its own gradient function:

1. MSE gradient is only for the error
2. MAE gradient is only for the error
3. L2 gradient is for MSE and L2 regularization, and can call MSE gradient
4. L1 gradient is for MSE and L1 regularization, and can call MSE gradient
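
For reference, the gradients that the functions in this section compute can be written as follows. The symbols are assumptions introduced here: $\bar{X} = [X \; 1]$ is the data matrix with the column of 1s appended, $N$ is the number of samples, and $\tilde{w}$ is $w$ with its last (bias) entry set to 0. The MAE and L1 expressions are subgradients, taking $\operatorname{sign}(0) = 0$.

```latex
\begin{aligned}
\nabla_w \mathrm{MSE} &= \frac{2}{N}\,\bar{X}^{\top}(\bar{X}w - t) \\
\nabla_w \mathrm{MAE} &= \frac{1}{N}\,\bar{X}^{\top}\operatorname{sign}(\bar{X}w - t) \\
\nabla_w L_2 &= \nabla_w \mathrm{MSE} + \lambda\,\frac{\tilde{w}}{\lVert \tilde{w} \rVert_2} \\
\nabla_w L_1 &= \nabla_w \mathrm{MSE} + \lambda\,\operatorname{sign}(\tilde{w})
\end{aligned}
```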

In [36]:
def MSE_Gradient (X, t, w, lamda=0): # Output will have the same size as w
    x = np.append(X, np.ones((len(X), 1), dtype = int), axis = 1)
    mseg = (x.T).dot((Prediction(X, w) - t)) * 2 / len(t)
    return mseg

In [37]:
def MAE_Gradient (X, t, w, lamda=0): # Output will have the same size as w
    x = np.append(X, np.ones((len(X), 1), dtype = int), axis =1)
    maeg = (x.T).dot(np.sign(Prediction(X, w) - t)) / len (X)
    return maeg
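
A common way to gain confidence in a hand-written gradient is to compare it with a central finite-difference estimate. The sketch below does this for the MSE gradient on random data. The helper names (`augment`, `mse_loss`, `mse_gradient`, `numerical_gradient`), the step size, and the data are all assumptions made for this illustration.

```python
import numpy as np

def augment(X):
    # Append the column of ones used for the bias term
    return np.append(X, np.ones((len(X), 1)), axis=1)

def mse_loss(X, t, w):
    return np.sum((augment(X).dot(w) - t) ** 2) / len(X)

def mse_gradient(X, t, w):
    Xa = augment(X)
    return Xa.T.dot(Xa.dot(w) - t) * 2 / len(t)

def numerical_gradient(loss, X, t, w, h=1e-6):
    # Central difference: perturb one weight at a time by +h and -h
    grad = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = h
        grad[j] = (loss(X, t, w + step) - loss(X, t, w - step)) / (2 * h)
    return grad

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
t_demo = rng.normal(size=20)
w_demo = rng.normal(size=4)        # three feature weights plus the bias

analytic = mse_gradient(X_demo, t_demo, w_demo)
numeric = numerical_gradient(mse_loss, X_demo, t_demo, w_demo)
print(np.max(np.abs(analytic - numeric)))  # should be very small
```

The same check is less reliable for the MAE and L1 terms near points where a residual or weight is exactly zero, since the absolute value is not differentiable there.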

In [38]:
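def L2_Gradient (X, t, w, lamda): # Output will have the same size as w
    l2 = MSE_Gradient(X, t, w, lamda) 
    # Work on a copy so the caller's weight vector is not modified
    w_reg = np.array(w, dtype = float)
    w_reg[-1] = 0 # the bias term is not regularized
    norm = np.sqrt(np.sum(np.square(w_reg)))
    # Gradient of lamda * ||w|| is lamda * w / ||w||; use 0 when the norm is 0 to avoid dividing by zero
    g = (lamda / norm) * w_reg if norm > 0 else np.zeros_like(w_reg)
    l2g = l2 + g
    return l2g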

In [39]:
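def L1_Gradient (X, t, w, lamda): # Output will have the same size as w
    l1 = MSE_Gradient(X, t, w, lamda) 
    # Work on a copy so the caller's weight vector is not modified
    w_reg = np.array(w, dtype = float)
    w_reg[-1] = 0 # the bias term is not regularized; np.sign(0) is 0
    g = lamda * np.sign(w_reg)
    l1g = l1 + g
    return l1g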
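
One thing to watch for in gradient functions that zero out the bias entry: assigning into a NumPy array argument modifies the caller's array, because arrays are passed by reference. The sketch below contrasts in-place assignment with working on a copy. The function names are hypothetical and exist only for this illustration.

```python
import numpy as np

def zero_bias_in_place(w):
    w[-1] = 0              # modifies the array the caller passed in
    return w

def zero_bias_on_copy(w):
    w_reg = w.copy()       # the caller's array is left untouched
    w_reg[-1] = 0
    return w_reg

w_a = np.array([1.0, 2.0, 3.0])
zero_bias_in_place(w_a)
print(w_a)                 # the last entry of w_a is now 0

w_b = np.array([1.0, 2.0, 3.0])
masked = zero_bias_on_copy(w_b)
print(w_b)                 # w_b still ends in 3.0
print(masked)              # only the copy has a zeroed last entry
```

Inside an optimization loop, the in-place variant would reset the learned bias on every call, so the copy-based form is the safer choice.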

## Gradient Descent Function


In [40]:
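def Gradient_Descent (X, X_val, t, t_val, w, lamda, max_iter, epsilon, lr, lossfunc, gradfunc): # See output format in 'return' statement
    # Initialize the outputs so they are defined even if max_iter is 0
    w_final = w
    train_loss_final = lossfunc(X, t, w, lamda)
    validation_loss_final = lossfunc(X_val, t_val, w, lamda)
    validation_NRMSE = NRMSE_Metric(X_val, t_val, w)
    for i in range(max_iter):
        # Step against the gradient, scaled by the learning rate lr
        w_final = w - (lr) * gradfunc(X, t, w, lamda)
        w = w_final
        train_loss_final = lossfunc(X, t, w, lamda)
        validation_loss_final = lossfunc(X_val, t_val, w_final, lamda)
        validation_NRMSE = NRMSE_Metric(X_val, t_val, w_final)
        # Stop early once the validation loss falls to epsilon or below
        if validation_loss_final <= epsilon:
            break
    return w_final, train_loss_final, validation_loss_final, validation_NRMSE #It will return variables structured like this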
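
To illustrate how the loop behaves, here is a compact, self-contained sketch that runs the same update rule with plain MSE on synthetic noiseless data. The lowercase helper names, the data, the learning rate, and the stopping threshold are all assumptions chosen for this example. It is a simplified stand-in, not the module's `Gradient_Descent`.

```python
import numpy as np

def augment(X):
    return np.append(X, np.ones((len(X), 1)), axis=1)

def mse_loss(X, t, w):
    return np.sum((augment(X).dot(w) - t) ** 2) / len(X)

def mse_gradient(X, t, w):
    Xa = augment(X)
    return Xa.T.dot(Xa.dot(w) - t) * 2 / len(t)

def gradient_descent(X, X_val, t, t_val, w, max_iter, epsilon, lr):
    # Same loop shape as above, specialized to MSE without regularization
    for _ in range(max_iter):
        w = w - lr * mse_gradient(X, t, w)
        if mse_loss(X_val, t_val, w) <= epsilon:
            break   # validation loss is small enough
    return w, mse_loss(X, t, w), mse_loss(X_val, t_val, w)

# Synthetic noiseless targets: t = 2*x1 - 3*x2 + 1
rng = np.random.default_rng(0)
X_all = rng.normal(size=(200, 2))
t_all = X_all.dot(np.array([2.0, -3.0])) + 1.0
X_tr, X_va = X_all[:150], X_all[150:]
t_tr, t_va = t_all[:150], t_all[150:]

w_fit, train_loss, val_loss = gradient_descent(
    X_tr, X_va, t_tr, t_va, w=np.zeros(3),
    max_iter=2000, epsilon=1e-10, lr=0.1)
print(w_fit)   # should be close to [2, -3, 1]
```

Because the stopping rule compares the validation loss with `epsilon` directly, noisy data whose best achievable loss exceeds `epsilon` would simply run for all `max_iter` iterations.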
    

## Pseudo Inverse Method

This is the closed-form solution with an L2 penalty (ridge regression):

w = (X' X + lambda I)^(-1) X' t.

See, for example: Section 2 of https://web.mit.edu/zoya/www/linearRegression.pdf

In this formula, the column of 1s is assumed to be included in X. The function below appends it before solving.

In [41]:
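def Pseudo_Inverse (X, t, lamda): # Output will be weight vector
    # Append the column of 1s so the last weight is the bias
    x = np.append(X, np.ones((len(X), 1), dtype = int), axis =1)
    xt = (x.T).dot(x)
    # Solve (X'X + lamda I) w = X't for w rather than forming the inverse explicitly.
    # Note that lamda I penalizes every weight, including the bias.
    w = np.linalg.solve(xt + (lamda) * (np.identity(len(xt))), (x.T).dot(t))
    return w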
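
As a rough check of the closed-form solution, the sketch below fits noiseless synthetic data with and without the penalty. The function name `ridge_closed_form` and the data are assumptions for illustration. It uses `np.linalg.solve` on the regularized normal equations, which gives the same weights as multiplying by the explicit inverse.

```python
import numpy as np

def ridge_closed_form(X, t, lamda):
    # w = (X'X + lamda I)^(-1) X't, with the ones column appended to X first
    Xa = np.append(X, np.ones((len(X), 1)), axis=1)
    A = Xa.T.dot(Xa) + lamda * np.identity(Xa.shape[1])
    return np.linalg.solve(A, Xa.T.dot(t))

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(50, 2))
t_demo = X_demo.dot(np.array([2.0, -3.0])) + 1.0   # noiseless targets

w_exact = ridge_closed_form(X_demo, t_demo, lamda=0.0)
w_ridge = ridge_closed_form(X_demo, t_demo, lamda=10.0)

print(w_exact)   # approximately [2, -3, 1]
print(w_ridge)   # smaller in norm than w_exact
```

With `lamda=0` this reduces to ordinary least squares. Increasing `lamda` shrinks the weights toward zero, trading a little bias for lower variance.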