<b>Implementing Linear Regression from Scratch with the California Housing Dataset</b>

Objective:
This exercise provides hands-on experience implementing linear regression from scratch on the California housing dataset. You will gain a deeper understanding of how linear regression works, including the cost function and gradient descent optimization.

<b>Steps:</b>

1- Load the California Housing Dataset:

- Use the fetch_california_housing function from scikit-learn to load the dataset.

In [83]:
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

In [84]:
# Load the California housing dataset
housing = fetch_california_housing()
data, target = housing.data, housing.target

In [85]:
# Explore the data: array shapes and the dataset description
print(data.shape)
print(target.shape)
print(housing.DESCR)

(20640, 8)
(20640,)
.. _california_housing_dataset:

California Housing dataset
--------------------------

**Data Set Characteristics:**

    :Number of Instances: 20640

    :Number of Attributes: 8 numeric, predictive attributes and the target

    :Attribute Information:
        - MedInc        median income in block group
        - HouseAge      median house age in block group
        - AveRooms      average number of rooms per household
        - AveBedrms     average number of bedrooms per household
        - Population    block group population
        - AveOccup      average number of household members
        - Latitude      block group latitude
        - Longitude     block group longitude

    :Missing Attribute Values: None

This dataset was obtained from the StatLib repository.
https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html

The target variable is the median house value for California districts,
expressed in hundreds of thousands of dollars ($100,000).

This

2- Data Preprocessing:

- Add a bias term to the input features.
- Split the dataset into training and testing sets.

In [86]:
# Add a bias term to the input features
data_bias = np.c_[np.ones((data.shape[0], 1)), data]

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data_bias, target, test_size=0.2, random_state=42)
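
The two steps above can be tried on a small array to see the resulting shapes. This is a minimal sketch; the toy values and the names `X_toy`, `y_toy`, and `X_toy_bias` are assumptions for illustration and are not part of the housing data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical values, for illustration only)
X_toy = np.arange(10, dtype=float).reshape(5, 2)
y_toy = np.arange(5, dtype=float)

# Prepend a column of ones so the first weight can act as the intercept
X_toy_bias = np.c_[np.ones((X_toy.shape[0], 1)), X_toy]

# Hold out 20% of the rows; a fixed random_state makes the split repeatable
Xtr, Xte, ytr, yte = train_test_split(X_toy_bias, y_toy, test_size=0.2, random_state=42)

print(X_toy_bias.shape)      # (5, 3)
print(Xtr.shape, Xte.shape)  # (4, 3) (1, 3)
```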

3- Standardization:

- Standardize the input features using StandardScaler from scikit-learn.

In [87]:
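# Standardize the input features (fit on the training set only, then apply to the test set)
# Note: X_train already contains the bias column of ones. StandardScaler centers
# that constant column to all zeros, so the model trained below has no effective intercept.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)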
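
How StandardScaler treats a constant column depends on the order of the preprocessing steps, and this can be checked on a small example. The sketch below uses hypothetical toy values and compares the order used in this notebook (bias first, then scaling) with one possible alternative (scaling the features, then adding the bias). It is an illustration only and is not part of the recorded run.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy features (hypothetical values)
X_toy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])

# Order used in this notebook: add the bias column first, then scale everything
X_bias_first = np.c_[np.ones((X_toy.shape[0], 1)), X_toy]
scaled_bias_first = StandardScaler().fit_transform(X_bias_first)
print(scaled_bias_first[:, 0])  # the constant column is centered to zeros

# Alternative order: scale the features, then add the bias column
scaled_features = StandardScaler().fit_transform(X_toy)
X_scaled_then_bias = np.c_[np.ones((scaled_features.shape[0], 1)), scaled_features]
print(X_scaled_then_bias[:, 0])  # the bias column stays at one
```

When the bias column is all zeros, its weight receives a zero gradient at every step, so gradient descent cannot move it away from its initial value.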

4- Linear Regression Implementation:

- Implement a simple linear regression class with methods for fitting the model and making predictions.
- Use mean squared error as the cost function.
- Utilize gradient descent for optimization.

In [88]:
# Linear regression implementation from scratch
class LinearRegression:
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        """
        Initialize the linear regression model.

        Args:
            learning_rate (float): step size for each gradient descent update.
            n_iterations (int): number of gradient descent iterations.
        """
        # Store the hyperparameters
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations


    def fit(self, X, y):
        """
        Train the linear regression model.

        Args:
            X : input features.
            y : target values.
        """
        # Initialize weights with small random values
        self.weights = np.random.randn(X.shape[1]) * 0.01

        # Run gradient descent for n_iterations steps
        for _ in range(self.n_iterations):
            # Predict target values from the current weights
            predicted_y = np.dot(X, self.weights)

            # Errors: predictions minus the true target values
            errors = predicted_y - y

            # Gradient of the half-MSE cost, (1 / 2m) * sum(errors ** 2), where m = X.shape[0].
            # The plain MSE gradient is twice this; the factor of 2 is absorbed into the learning rate.
            gradient = (1 / X.shape[0]) * np.dot(X.T, errors)

            # Gradient descent step: move the weights against the gradient
            self.weights -= self.learning_rate * gradient


    def predict(self, X):
        """
        Generate predictions for X.

        Args:
            X : input features.
        """
        # Generate predictions based on weights
        return np.dot(X, self.weights)
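
The gradient expression used in `fit` can be sanity-checked numerically. The sketch below assumes the cost J(w) = (1 / 2m) * ||Xw - y||^2, whose gradient is (1 / m) * X^T (Xw - y), and compares that formula with a central finite-difference estimate on random toy data. The helper names `cost` and `analytic_gradient` are hypothetical and do not appear in the class above.

```python
import numpy as np

def cost(X, y, w):
    """Half mean squared error: (1 / 2m) * sum((Xw - y)^2)."""
    errors = X @ w - y
    return (errors @ errors) / (2 * X.shape[0])

def analytic_gradient(X, y, w):
    """Gradient of the half-MSE cost with respect to w."""
    return X.T @ (X @ w - y) / X.shape[0]

# Random toy problem (values are arbitrary)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
w = rng.normal(size=3)

# Central finite differences, one coordinate at a time
eps = 1e-6
numeric_gradient = np.zeros_like(w)
for j in range(w.size):
    step = np.zeros_like(w)
    step[j] = eps
    numeric_gradient[j] = (cost(X, y, w + step) - cost(X, y, w - step)) / (2 * eps)

# The two gradients should agree up to floating-point rounding
print(np.allclose(numeric_gradient, analytic_gradient(X, y, w), atol=1e-6))
```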
    

5- Training the Model:

- Instantiate the linear regression model.
- Train the model on the training set using the implemented gradient descent algorithm.

In [89]:
# Instantiate and train the model
model = LinearRegression(learning_rate=0.01, n_iterations=1000)
model.fit(X_train_scaled, y_train)
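
One way to check whether gradient descent has converged is to compare its weights with a closed-form least-squares solution. The sketch below uses synthetic data and a standalone function `gradient_descent` (a hypothetical helper that applies the same update rule as `fit`, starting from zeros), with `np.linalg.lstsq` as the reference. It is not part of the recorded run.

```python
import numpy as np

def gradient_descent(X, y, learning_rate=0.01, n_iterations=1000):
    """Batch gradient descent on the half-MSE cost, starting from zero weights."""
    weights = np.zeros(X.shape[1])
    for _ in range(n_iterations):
        errors = X @ weights - y
        weights -= learning_rate * (X.T @ errors) / X.shape[0]
    return weights

# Synthetic features on a standard scale, plus a bias column of ones
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 3))
X = np.c_[np.ones((features.shape[0], 1)), features]
true_weights = np.array([2.0, 1.0, -0.5, 0.3])
y = X @ true_weights + rng.normal(scale=0.1, size=200)

gd_weights = gradient_descent(X, y, learning_rate=0.1, n_iterations=2000)
ls_weights, *_ = np.linalg.lstsq(X, y, rcond=None)

# The gap should be tiny once gradient descent has converged
print(np.max(np.abs(gd_weights - ls_weights)))
```

If the gap stays large on real data, more iterations or a different learning rate may be needed; strongly correlated features tend to slow convergence.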

6- Prediction and Evaluation:

- Make predictions on the test set.
- Evaluate the model's performance using mean squared error.

In [90]:
# Make predictions on the test set
predictions = model.predict(X_test_scaled)

# Evaluate the model
mse = np.mean((predictions - y_test)**2)
print(f"Mean Squared Error on Test Set: {mse}")

Mean Squared Error on Test Set: 4.867671375151251
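
A raw MSE is easier to interpret next to a trivial baseline, along with RMSE and R^2. The sketch below uses hypothetical toy arrays; the helper name `regression_report` is an assumption and is not part of the notebook.

```python
import numpy as np

def regression_report(y_true, y_pred, y_train_mean):
    """Return MSE, RMSE, baseline MSE (always predicting the training mean), and R^2."""
    mse = np.mean((y_pred - y_true) ** 2)
    baseline_mse = np.mean((y_train_mean - y_true) ** 2)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return {
        "mse": mse,
        "rmse": np.sqrt(mse),
        "baseline_mse": baseline_mse,
        "r2": 1 - ss_res / ss_tot,
    }

# Toy values (hypothetical)
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])
report = regression_report(y_true, y_pred, y_train_mean=2.5)
print(report)
```

A model MSE above the baseline MSE suggests a problem somewhere in the pipeline, for example a missing intercept.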
