**Author:** \
[**Shiva Omprakash**](https://www.linkedin.com/in/sivashanmugam-omprakash/) \
M.S. in Robotics & Machine Learning, University at Buffalo (SUNY)

# Linear Regression

In [100]:

#Importing required libraries
from scipy.optimize import minimize
import numpy as np
import matplotlib.pyplot as plt
import pickle


**Part 1:** Direct minimization

The *Ordinary Least Squares (OLS)* method estimates the regression parameters by minimizing the squared loss.

$J(\textbf{w}) = \frac{1}{2} (\textbf{y} - \textbf{X}\textbf{w})^{\top}(\textbf{y} - \textbf{X}\textbf{w})$

where, \
$\textbf{X}$ - input data matrix \
$\textbf{y}$ - target vector \
$\textbf{w}$ - weight vector for regression
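
Setting the gradient of $J(\textbf{w})$ to zero gives a closed-form minimizer. The steps below are the standard derivation, added here for reference, and they assume that $\textbf{X}^{\top}\textbf{X}$ is invertible:

$\nabla_{\textbf{w}} J(\textbf{w}) = -\textbf{X}^{\top}(\textbf{y} - \textbf{X}\textbf{w}) = \textbf{0}$

$\textbf{X}^{\top}\textbf{X}\textbf{w} = \textbf{X}^{\top}\textbf{y}$

$\textbf{w} = (\textbf{X}^{\top}\textbf{X})^{-1}\textbf{X}^{\top}\textbf{y}$

The last line is the expression that `trainDirectModel` evaluates.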

In [101]:

#Function to calculate the weights
def trainDirectModel(X, y):
  '''
  Input:
    X: Numpy array - N x d
    y: Numpy array - N x 1
  Output:
    weights: Numpy array - d x 1
  '''

  # Closed-form OLS solution: w = (X^T X)^{-1} X^T y
  xT= np.transpose(X)
  xT_prod= np.matmul(xT, X)         # X^T X, d x d
  inv_prod= np.linalg.inv(xT_prod)  # (X^T X)^{-1}

  y_xT_prod= np.matmul(xT, y)       # X^T y, d x 1

  weights= np.matmul(inv_prod, y_xT_prod)

  return weights
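
Forming an explicit inverse can be numerically fragile when $\textbf{X}^{\top}\textbf{X}$ is close to singular. As a hedged alternative, the sketch below solves the same least-squares problem with `np.linalg.lstsq`. The helper name `train_direct_model_lstsq` and the synthetic data are assumptions made for illustration and are not part of the notebook.

```python
import numpy as np

def train_direct_model_lstsq(X, y):
  # Hypothetical helper, not part of the original notebook.
  # np.linalg.lstsq minimizes ||y - Xw||^2 without forming (X^T X)^{-1} explicitly.
  weights, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
  return weights

# Synthetic, noise-free data (an assumption, for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
w_true = np.array([[2.0], [-1.0], [0.5]])
y_demo = X_demo @ w_true

w_lstsq = train_direct_model_lstsq(X_demo, y_demo)

# Same weights as the explicit normal-equation formula on this well-conditioned data
w_normal = np.linalg.inv(X_demo.T @ X_demo) @ X_demo.T @ y_demo
print(np.allclose(w_lstsq, w_normal))
```

On well-conditioned data the two approaches should agree up to floating-point round-off, so this is a question of robustness rather than of a different model.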


**Root Mean Squared Error (RMSE)** - Measures the difference between the values predicted by a model and the observed values.

$\text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^N{(y_i - \textbf{w}^{\top}\textbf{x}_i)^2}}$

In [102]:

# Function to calculate the Root Mean Squared Error (RMSE)
def testDirectModel(w, X, y):
  '''
  Input:
    w: Numpy array - d x 1
    X: Numpy array - N x d
    y: Numpy array - N x 1
  Output:
    rmse: Float
  '''

  # Residuals for all N samples at once: (N x 1) - (N x d)(d x 1)
  residuals= y - np.matmul(X, w)

  # Square root of the mean squared residual, returned as a plain Python float
  rmse= float(np.sqrt(np.mean(residuals**2)))

  return rmse


In [103]:

#Loading the dataset and checking its dimensions
with open('/content/drive/MyDrive/Colab Notebooks/diabetes.pickle', 'rb') as f:
  X_train, y_train, X_test, y_test= pickle.load(f, encoding= 'latin1')

print('Dimensions of the training data:')
print('X : '+str(X_train.shape))
print('Y : '+str(y_train.shape))
print('\nDimensions of the testing data:')
print('X : '+str(X_test.shape))
print('Y : '+str(y_test.shape))


Dimensions of the training data:
X : (242, 64)
Y : (242, 1)

Dimensions of the testing data:
X : (200, 64)
Y : (200, 1)


In [104]:

#Adding intercept term to the dataset
x1= np.ones((len(X_train), 1))
x2= np.ones((len(X_test), 1))

print('Dimensions of generated intercept terms for training & testing dataset:')
print(x1.shape, x2.shape)

X_train_i= np.concatenate((x1, X_train), axis= 1)
X_test_i= np.concatenate((x2, X_test), axis= 1)

print('\nDimensions of data after adding the intercept term:')
print('X_train : '+str(X_train_i.shape))
print('X_test : '+str(X_test_i.shape))


Dimensions of generated intercept terms for training & testing dataset:
(242, 1) (200, 1)

Dimensions of data after adding the intercept term:
X_train : (242, 65)
X_test : (200, 65)


In [105]:

#Estimating the weights / training the model
w= trainDirectModel(X_train, y_train)
w_i= trainDirectModel(X_train_i, y_train)

print('Resulting dimensions of the calculated weights array:')
print('Without intercept: '+str(w.shape))
print('With intercept: '+str(w_i.shape))


Resulting dimensions of the calculated weights array:
Without intercept: (64, 1)
With intercept: (65, 1)


In [106]:

train_rmse= testDirectModel(w, X_train, y_train)
train_rmse_i= testDirectModel(w_i, X_train_i, y_train)
print('RMSE without intercept on training data: %0.2f'%train_rmse)
print('RMSE with intercept on training data: %0.2f'%train_rmse_i)


RMSE without intercept on training data: 138.20
RMSE with intercept on training data: 46.77


In [107]:

test_rmse= testDirectModel(w, X_test, y_test)
test_rmse_i= testDirectModel(w_i, X_test_i, y_test)
print('RMSE without intercept on test data: %0.2f'%test_rmse)
print('RMSE with intercept on test data: %0.2f'%test_rmse_i)


RMSE without intercept on test data: 326.76
RMSE with intercept on test data: 60.89


**Conclusion:** \
The model trained with an intercept term performs much better than the one trained without it: the RMSE drops from 138.20 to 46.77 on the training data and from 326.76 to 60.89 on the test data.
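
One plausible explanation, which the notebook does not verify, is that a model without an intercept is forced through the origin and cannot absorb a constant offset in the targets. The sketch below illustrates this effect on synthetic data only. The offset of 150, the helper `fit_and_rmse`, and all variable names are assumptions for illustration.

```python
import numpy as np

# Synthetic data: zero-mean features, targets shifted by a constant offset
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
noise = rng.normal(scale=1.0, size=(200, 1))
y_demo = 150.0 + X_demo @ np.array([[3.0], [-2.0]]) + noise

def fit_and_rmse(X, y):
  # Hypothetical helper: least-squares fit followed by RMSE on the same data
  w, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
  return float(np.sqrt(np.mean((y - X @ w) ** 2)))

# Without an intercept column the offset cannot be modelled
rmse_no_intercept = fit_and_rmse(X_demo, y_demo)

# Prepending a column of ones lets the first weight act as the intercept
X_demo_i = np.concatenate((np.ones((len(X_demo), 1)), X_demo), axis=1)
rmse_intercept = fit_and_rmse(X_demo_i, y_demo)

print(rmse_no_intercept, rmse_intercept)
```

In this toy setting the error without the intercept is dominated by the unmodelled offset, while the error with the intercept is close to the noise level.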

**Part 2:** Gradient Descent

In [108]:
def trainGradientModel():
  pass

In [109]:
def testGradientModel():
  pass
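
The two functions above are left as stubs. As a minimal sketch of what batch gradient descent on the loss $J(\textbf{w})$ from Part 1 could look like, the example below repeatedly steps against the gradient $\textbf{X}^{\top}(\textbf{X}\textbf{w} - \textbf{y})$. It uses a hand-written loop rather than `scipy.optimize.minimize`. The function name `train_gradient_sketch`, the step size, the iteration count, and the synthetic data are all assumptions, not the author's implementation.

```python
import numpy as np

def train_gradient_sketch(X, y, learning_rate=0.005, n_iters=500):
  # Hypothetical helper; step size and iteration count are illustrative assumptions.
  w = np.zeros((X.shape[1], 1))
  for _ in range(n_iters):
    # Gradient of J(w) = 1/2 (y - Xw)^T (y - Xw) is X^T (Xw - y)
    grad = X.T @ (X @ w - y)
    w = w - learning_rate * grad
  return w

# Synthetic, noise-free data (an assumption, for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
w_true = np.array([[1.5], [-2.0], [0.7]])
y_demo = X_demo @ w_true

w_gd = train_gradient_sketch(X_demo, y_demo)

# Compare against the closed-form solution of the normal equations
w_closed = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ y_demo)
print(np.allclose(w_gd, w_closed, atol=1e-6))
```

A fixed step size only converges when it is small relative to the largest eigenvalue of $\textbf{X}^{\top}\textbf{X}$, so the value used here would likely need tuning, or the features rescaling, before it is applied to the notebook's data.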