<a href="https://colab.research.google.com/github/pareshrchaudhary/experiments/blob/main/fundamentals/linear_regression_object_oriented.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

Mathematical Explanation of Linear Regression

Assumption - The relationship between features
and target is approximately linear, i.e. expected value of the target can be expressed as a weighted sum of the features. This means that the change in the target variable is proportional to the change in the features. Many real-world relationships can be approximated well by a linear model.

Let's consider a simple linear regression model. Given a dataset $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$ where $x^{(i)}$ represents the feature and $y^{(i)}$ represents the target for the $i$th example, we want to learn a linear relationship between the features and the target.

The linear regression model assumes the following relationship between the features and the target:

\begin{equation}
y^{(i)} = b + \mathbf{w}^T \mathbf{x}^{(i)}
\end{equation}

where $b$ is the bias term (intercept), $\mathbf{w}$ is the weight vector (coefficients), and $\mathbf{x}^{(i)}$ is the feature vector of the $i$th example.

Our goal is to learn the parameters $b$ and $\mathbf{w}$ that minimize the mean squared error (MSE) between the predicted values and the actual values:

\begin{equation}
\text{MSE} = \frac{1}{m} \sum_{i=1}^{m} (y^{(i)} - (\mathbf{w}^T \mathbf{x}^{(i)} + b))^2
\end{equation}

To minimize the MSE, we use the gradient descent algorithm. In each iteration of gradient descent, we update the parameters $b$ and $\mathbf{w}$ according to the following update rules:

\begin{align}
b &:= b - \alpha \frac{1}{m} \sum_{i=1}^{m} (p_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \\
\mathbf{w} &:= \mathbf{w} - \alpha \frac{1}{m} \sum_{i=1}^{m} (p_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \mathbf{x}^{(i)}
\end{align}

where $p_{\mathbf{w},b}(\mathbf{x})$ is the predicted value given by the linear regression model and $\alpha$ is the learning rate.

The above update rules are implemented in the linear regression class using NumPy.


# Setup

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
!wget https://raw.githubusercontent.com/pareshrchaudhary/experiments/main/fundamentals/data/home_data.csv

--2024-02-07 16:58:50--  https://raw.githubusercontent.com/pareshrchaudhary/experiments/main/fundamentals/data/home_data.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2381890 (2.3M) [text/plain]
Saving to: ‘home_data.csv’


2024-02-07 16:58:51 (28.8 MB/s) - ‘home_data.csv’ saved [2381890/2381890]



# Train/Test data split

In [9]:
data = pd.read_csv('home_data.csv')
data.describe()

Unnamed: 0,id,price,bedrooms,bathrooms,sqft_living,sqft_lot,floors,waterfront,view,condition,grade,sqft_above,sqft_basement,yr_built,yr_renovated,zipcode,lat,long,sqft_living15,sqft_lot15
count,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0
mean,4580302000.0,540088.1,3.370842,2.114757,2079.899736,15106.97,1.494309,0.007542,0.234303,3.40943,7.656873,1788.390691,291.509045,1971.005136,84.402258,98077.939805,47.560053,-122.213896,1986.552492,12768.455652
std,2876566000.0,367127.2,0.930062,0.770163,918.440897,41420.51,0.539989,0.086517,0.766318,0.650743,1.175459,828.090978,442.575043,29.373411,401.67924,53.505026,0.138564,0.140828,685.391304,27304.179631
min,1000102.0,75000.0,0.0,0.0,290.0,520.0,1.0,0.0,0.0,1.0,1.0,290.0,0.0,1900.0,0.0,98001.0,47.1559,-122.519,399.0,651.0
25%,2123049000.0,321950.0,3.0,1.75,1427.0,5040.0,1.0,0.0,0.0,3.0,7.0,1190.0,0.0,1951.0,0.0,98033.0,47.471,-122.328,1490.0,5100.0
50%,3904930000.0,450000.0,3.0,2.25,1910.0,7618.0,1.5,0.0,0.0,3.0,7.0,1560.0,0.0,1975.0,0.0,98065.0,47.5718,-122.23,1840.0,7620.0
75%,7308900000.0,645000.0,4.0,2.5,2550.0,10688.0,2.0,0.0,0.0,4.0,8.0,2210.0,560.0,1997.0,0.0,98118.0,47.678,-122.125,2360.0,10083.0
max,9900000000.0,7700000.0,33.0,8.0,13540.0,1651359.0,3.5,1.0,4.0,5.0,13.0,9410.0,4820.0,2015.0,2015.0,98199.0,47.7776,-121.315,6210.0,871200.0


In [10]:
# Convert the DataFrame to a NumPy array
data_array = data.to_numpy()

# Shuffle the data
np.random.shuffle(data_array)

# Define the proportion of data to be used for training
train_ratio = 0.8
num_train = int(train_ratio * len(data_array))

# Split the data into training and test sets
train_data = data_array[:num_train]
test_data = data_array[num_train:]

# Split the features and target variable for training set
X_train = np.float64(train_data[:, 3:])
# X_train = (X_train - np.min(X_train, axis=0)) / (np.max(X_train, axis=0) - np.min(X_train, axis=0))
y_train = np.float64(train_data[:, 2])

# Split the features and target variable for test set
X_test = np.float64(test_data[:, 3:])
# X_test = (X_test - np.min(X_test, axis=0)) / (np.max(X_test, axis=0) - np.min(X_test, axis=0))
y_test = np.float64(test_data[:, 2])

print(X_train.shape, y_train.shape, X_test.shape ,y_test.shape)

(17290, 18) (17290,) (4323, 18) (4323,)


# Model

In [5]:
class LinearRegression:
  def __init__(self):
    self.weights = None
    self.bias = None

  def initialize_parameters(self, input_dim):
    self.weights = np.zeros(input_dim)
    self.bias = 0

  def fit(self, X, y, optimizer):
    # Initialize model parameters
    self.initialize_parameters(X.shape[1])

    # Optimize parameters
    optimizer.optimize(self, X, y)

  def predict(self, X):

    # Make predictions
    return np.matmul(X, self.weights) + self.bias

class GradientDescentOptimizer:
  def __init__(self, learning_rate=0.01, iterations=1000, clip_value=1.0):
    self.learning_rate = learning_rate
    self.iterations = iterations
    self.clip_value = clip_value

  def optimize(self, model, X, y):
    for _ in range(self.iterations):
        predictions = model.predict(X)
        errors = ((predictions - y) ** 2 )/ 2*len(predictions)
        gradient_w = np.dot(X.T, -errors) / len(X)
        gradient_b = np.sum(-errors) / len(X)

        # Gradient clipping
        total_gradient_norm = np.sum(gradient_w**2) + gradient_b**2
        if total_gradient_norm > self.clip_value**2:
            multiplier = self.clip_value / np.sqrt(total_gradient_norm)
            gradient_w *= multiplier
            gradient_b *= multiplier

        # Update weights and bias
        model.weights -= self.learning_rate * gradient_w
        model.bias -= self.learning_rate * gradient_b

# Training

In [6]:
# Instantiate the LinearRegression class
model = LinearRegression()

# Instantiate the GradientDescentOptimizer class
optimizer = GradientDescentOptimizer(learning_rate=0.001, iterations=100000, clip_value=1.0)

# Fit the model
model.fit(X_train, y_train, optimizer)

In [7]:
from sklearn.metrics import mean_squared_error

np.sqrt(mean_squared_error(y_test, model.predict(X_test)))

8413380.422980024