# Preliminary work

- I found one of the datasets used in the paper.
- I chose the one feature that seemed best suited to a polynomial fit in linear regression (essentially what the paper did).
- Below is some preliminary work: standard linear regression, then polynomial linear regression, both using the existing package sklearn.
- I report the MSE for both fits, computed on the same data used for fitting.

In [201]:
import pandas as pd
import openpyxl
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import random
import multiprocessing

In [None]:
dataframe = pd.read_excel('dataset1/dataset1.xlsx')
print(dataframe.shape)
dataframe.describe()

In [175]:
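# Sort by the feature so the fitted curves can later be drawn as lines
X = dataframe['V'].sort_values()
# Standardize the feature (pandas .std() uses the sample standard deviation, ddof=1)
X = (X-X.mean()) / X.std()
# Reorder the target to match the sorted feature, then standardize it (numpy .std() uses ddof=0)
y = dataframe['AT'][X.index].values
y = (y-y.mean()) / y.std()

# sklearn expects a 2-D (N, 1) feature matrix
X = np.reshape(X.values, (-1,1))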

In [None]:
plt.scatter(X, y)
plt.xlabel("Exhaust Vacuum")
plt.ylabel("Ambient Temperature")
plt.show()

In [177]:
lr = LinearRegression()
lr.fit(X, y)
y_hat_sklearn = lr.predict(X)

pr = PolynomialFeatures(degree=3)
X_poly = pr.fit_transform(X)
lr_poly = LinearRegression()
lr_poly.fit(X_poly, y)
y_hat_poly_sklearn = lr_poly.predict(X_poly)


In [None]:
plt.scatter(X, y, color = 'blue')
plt.plot(X, y_hat_sklearn, color = 'firebrick')
plt.plot(X, y_hat_poly_sklearn, color = 'green')
plt.show()

In [None]:
y_hat_sklearn = lr.predict(X)
y_hat_poly_sklearn = lr_poly.predict(X_poly)

print("mean squared error for standard linear:", mean_squared_error(y, y_hat_sklearn))
print("mean squared error for linear polynomial:", mean_squared_error(y, y_hat_poly_sklearn))

# Going from built-in package to implementing it ourselves

- Now, using the same dataset, I will conduct linear regression again, this time with matrix multiplication in numpy.
- I will implement a closed-form algorithm before moving on to a gradient-descent-based algorithm.
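
For reference, the closed-form estimate is the standard least-squares solution of the normal equations. Here $X$ denotes the $(N \times h)$ matrix of polynomial basis functions and $\hat{\beta}$ the vector of $h$ weights; the loss is assumed to be the sum of squared errors:

$$\hat{\beta} = \arg\min_{\beta} \lVert X\beta - y \rVert^2 = (X^T X)^{-1} X^T y$$

When $X^TX$ is singular or badly conditioned, the inverse can be replaced by the Moore-Penrose pseudo-inverse, which is what `np.linalg.pinv` computes.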

In [180]:
def polynomial_basis_function_transformation(X, h):
    '''
    Converts an (N * 1) matrix into an (N * h) matrix, where h is the number of basis functions (1, x, ..., x^(h-1))
    The degree of the polynomial is (h-1)
    '''
    powers = np.arange(h)
    X_poly = np.power(X, powers)  # broadcasts (N, 1) against (h,) to give (N, h)
    return X_poly

def lin_reg_poly_closed_form(X, y, h):
    '''
    Conducts linear regression after transforming the data with polynomial basis functions
    Takes in an (N * 1) matrix and converts it into an (N * h) matrix
    Solves the normal equations on the (N * h) matrix, giving h weights (betas)
    Prints the betas and returns the predictions only
    '''
    X_poly = polynomial_basis_function_transformation(X, h)
    beta_hat_poly = np.linalg.pinv(X_poly.T @ X_poly) @ X_poly.T @ y
    y_hat_poly = X_poly @ beta_hat_poly
    print(beta_hat_poly)
    return y_hat_poly
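
As a sanity check on the closed-form approach, the sketch below builds the polynomial basis by broadcasting and compares the normal-equation solution with NumPy's least-squares solver. It runs on synthetic data rather than the notebook's dataset, and the names `polynomial_design_matrix`, `x_demo`, `y_demo`, and `X_demo` are hypothetical, introduced only for illustration.

```python
import numpy as np

def polynomial_design_matrix(x, h):
    """Map an (N, 1) column to an (N, h) matrix with columns x**0 ... x**(h-1)."""
    return np.power(x, np.arange(h))  # (N, 1) broadcast against (h,) -> (N, h)

# Synthetic data (an assumption for this sketch): a noisy cubic in one feature.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(200, 1))
y_demo = 0.5 - 1.0 * x_demo[:, 0] + 0.3 * x_demo[:, 0] ** 3 + 0.05 * rng.normal(size=200)

X_demo = polynomial_design_matrix(x_demo, 4)

# Normal-equation solution via the pseudo-inverse of X^T X.
beta_normal = np.linalg.pinv(X_demo.T @ X_demo) @ X_demo.T @ y_demo

# Least-squares solver: same minimiser, without forming X^T X explicitly.
beta_lstsq, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

print("coefficients agree:", np.allclose(beta_normal, beta_lstsq))
```

Solving with `np.linalg.lstsq` avoids forming $X^TX$, which squares the condition number of the problem, so it may be the more numerically robust choice for higher polynomial degrees.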


In [None]:
y_hat_poly = lin_reg_poly_closed_form(X, y, 4)

In [None]:
plt.scatter(X, y, color = 'blue')
plt.plot(X, y_hat_sklearn, color = 'firebrick')
plt.plot(X, y_hat_poly, color = 'green')
plt.show()

print("mean squared error for linear polynomial through numpy:", mean_squared_error(y, y_hat_poly))

# Implementing the Batch Gradient Descent algorithm for linear regression

- Above, we implemented the closed-form solution for polynomial linear regression ourselves, moving away from sklearn.
- We now implement an iterative algorithm, which is useful when the closed-form solution is computationally prohibitive, for example when $X^TX$ is $10{,}000 \times 10{,}000$ and inverting it becomes extremely slow.
- We will first implement Batch Gradient Descent and parallelize it, then move on to Stochastic Gradient Descent and parallelize that as well.
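
The update rule behind Batch Gradient Descent can be written out as follows, assuming the loss is the unaveraged sum of squared errors. The symbols $L$ and the iteration index $t$ are introduced here for illustration; $\alpha$ is the learning rate:

$$L(\beta) = \tfrac{1}{2}\lVert X\beta - y \rVert^2, \qquad \nabla_{\beta} L = X^T(X\beta - y)$$

$$\beta^{(t+1)} = \beta^{(t)} - \alpha\, X^T\left(X\beta^{(t)} - y\right)$$

Because this gradient is summed over all $N$ rows rather than averaged, the update is equivalent to gradient descent on the mean squared error (with the same factor of $\tfrac{1}{2}$) using a step size of $\alpha N$. A stable $\alpha$ therefore has to shrink as the dataset grows.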

## Non-parallelized version

In [202]:
def lin_reg_poly_batch_gradient_descent(X, y, h, alpha, n):
    '''
    Conducts linear regression after transforming the data with polynomial basis functions
    Takes in an (N * 1) matrix and converts it into an (N * h) matrix
    Fits h weights (betas) on the (N * h) matrix by iterative batch gradient descent
    alpha is the learning rate and n is the number of iterations
    The MSE is printed at each iteration
    Returns the predictions only
    '''
    X_poly = polynomial_basis_function_transformation(X, h)
    beta_hat_poly = np.random.rand(h)
    for i in range(n):
        y_hat_poly = X_poly @ beta_hat_poly
        # Gradient of 0.5 * ||X beta - y||^2 (summed over samples, not averaged)
        beta_hat_poly = beta_hat_poly - alpha * (X_poly.T @ (y_hat_poly - y))
        print("MSE in iteration", i, ": ", mean_squared_error(y, y_hat_poly))
    # Predict with the final weights; y_hat_poly from the loop predates the last update
    return X_poly @ beta_hat_poly
    

In [None]:
y_hat_poly_bgd = lin_reg_poly_batch_gradient_descent(X, y, 4, 0.00001, 10000)

In [None]:
plt.scatter(X, y, color = 'blue')
plt.plot(X, y_hat_sklearn, color = 'firebrick')
plt.plot(X, y_hat_poly_bgd, color = 'green')
plt.show()

print("mean squared error for linear polynomial through numpy via gradient descent:", mean_squared_error(y, y_hat_poly_bgd))

## Parallelized version
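- We now implement the parallelized version of Batch Gradient Descent.
- Because the batch gradient $X^T(X\beta - y)$ is a sum over data points, it can be computed on separate chunks of rows and then added together, so we can expect Batch Gradient Descent to benefit clearly from parallelization.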
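
One way the chunked gradient could be organised is sketched below. This is only an illustration under stated assumptions, not necessarily the approach taken later in the notebook: it uses a thread pool from `concurrent.futures` instead of `multiprocessing`, runs on synthetic data, and the names `partial_gradient`, `parallel_batch_gradient_descent`, `n_workers`, and the `*_demo` variables are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def partial_gradient(X_chunk, y_chunk, beta):
    """Gradient contribution of one chunk of rows: X_c^T (X_c beta - y_c)."""
    return X_chunk.T @ (X_chunk @ beta - y_chunk)

def parallel_batch_gradient_descent(X_poly, y, alpha, n_iter, n_workers=4, seed=0):
    """Batch gradient descent where each step sums per-chunk gradients."""
    rng = np.random.default_rng(seed)
    beta = rng.random(X_poly.shape[1])
    # Split the rows once, up front; the chunks are reused in every iteration.
    X_chunks = np.array_split(X_poly, n_workers)
    y_chunks = np.array_split(y, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(n_iter):
            futures = [pool.submit(partial_gradient, Xc, yc, beta)
                       for Xc, yc in zip(X_chunks, y_chunks)]
            # The full-batch gradient is the sum of the chunk gradients.
            gradient = sum(f.result() for f in futures)
            beta = beta - alpha * gradient
    return beta

# Synthetic check (an assumption for this sketch): a noisy cubic with 4 basis functions.
rng = np.random.default_rng(1)
x_demo = rng.normal(size=(1000, 1))
X_demo = np.power(x_demo, np.arange(4))
y_demo = X_demo @ np.array([0.2, -1.0, 0.0, 0.3]) + 0.05 * rng.normal(size=1000)

beta_gd = parallel_batch_gradient_descent(X_demo, y_demo, alpha=2e-5, n_iter=5000)
beta_ls, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)
print("largest coefficient gap vs least squares:", np.max(np.abs(beta_gd - beta_ls)))
```

Each step still produces exactly the full-batch gradient, so the iterates match the serial algorithm up to floating-point summation order. Whether this is actually faster depends on the chunk size: for small chunks the scheduling overhead is likely to dominate, and a process pool would additionally pay for copying data between workers.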