<b>Exercise: Implementing Polynomial Regression from Scratch with Diabetes Dataset</b>


<b>Objective:</b>

Implement polynomial regression from scratch using the Diabetes dataset to understand how to extend linear regression for capturing non-linear relationships.

<b>Step 1: Load and Explore the Dataset</b>

Load the Diabetes dataset and explore its features. Familiarize yourself with the dataset structure and the target variable (disease progression one year after baseline).

In [115]:
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the diabetes dataset
diabetes = load_diabetes()
data, target = diabetes.data, diabetes.target
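
The exploration described above can be sketched as follows. This is a minimal example that only inspects shapes, feature names, and a few summary statistics of the target; it repeats the loading step so that it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
data, target = diabetes.data, diabetes.target

# Shapes: one row per patient, one column per feature
print("data shape:", data.shape)
print("target shape:", target.shape)

# Feature names as provided by scikit-learn
print("features:", diabetes.feature_names)

# Basic summary of the target variable (disease progression)
print("target min/mean/max:", target.min(), target.mean(), target.max())
```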

<b>Step 2: Split the Dataset</b>

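Split the dataset into training and testing sets so the model can be trained on one subset and evaluated on another. Then standardize the features, fitting the scaler on the training set only so that no information from the test set leaks into training.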

In [116]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.2, random_state=42)

In [117]:
# Scale the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
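
To make the scaling step concrete, here is a rough NumPy-only sketch of what standardization with training-set statistics does. It uses small synthetic arrays as stand-ins for the real splits (the names `train` and `test` are illustrative), and it assumes the default `StandardScaler` behaviour of subtracting the mean and dividing by the population standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for X_train / X_test (illustrative data only)
train = rng.normal(loc=5.0, scale=2.0, size=(80, 3))
test = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

# Statistics are estimated from the training rows only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training statistics

print(train_scaled.mean(axis=0))  # close to 0 for every column
print(test_scaled.mean(axis=0))   # generally not exactly 0
```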

<b>Step 3: Implement Polynomial Features Function</b>

Implement a function that transforms the input into polynomial features of a given degree. As written, the function uses only the first feature column and returns its powers from 0 to the chosen degree, so the power-0 column of ones serves as the intercept term.

In [118]:
def polynomial_features(X, degree):
    """
    Transform the first input feature into polynomial features of a given degree.
    :X: input features, a 2D array of shape (n_samples, n_features); only column 0 is used
    :degree: polynomial degree
    :return: array of shape (n_samples, degree + 1) holding X[:, 0] raised to the powers 0..degree
    """
    # Create an empty matrix to store the new features
    X_poly = np.empty((len(X), degree + 1))

    # For each power i from 0 to degree, store X[:, 0]**i in column i.
    # Column 0 is all ones and acts as the intercept term.
    for i in range(degree + 1):
        X_poly[:, i] = X[:, 0] ** i
    return X_poly
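
A small worked example may help show what the transformation produces. The sketch below repeats the function so it runs on its own and applies it to a two-row array. It also includes `polynomial_features_all`, a hypothetical variant that is not part of the original notebook, which raises every column to each power (without cross terms) in case more than one feature should be used.

```python
import numpy as np

def polynomial_features(X, degree):
    # Same logic as the notebook's function: powers 0..degree of column 0
    X_poly = np.empty((len(X), degree + 1))
    for i in range(degree + 1):
        X_poly[:, i] = X[:, 0] ** i
    return X_poly

def polynomial_features_all(X, degree):
    # Hypothetical variant: a bias column plus powers 1..degree of every column
    # (no cross terms between different features)
    columns = [np.ones(len(X))]
    for j in range(X.shape[1]):
        for p in range(1, degree + 1):
            columns.append(X[:, j] ** p)
    return np.column_stack(columns)

X_demo = np.array([[2.0, 10.0],
                   [3.0, 20.0]])

print(polynomial_features(X_demo, 2))
# [[1. 2. 4.]
#  [1. 3. 9.]]

print(polynomial_features_all(X_demo, 2).shape)
# (2, 5): bias, x0, x0^2, x1, x1^2
```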

<b>Step 4: Implement Polynomial Regression Class</b>

Create a class for polynomial linear regression with methods for fitting the model and making predictions. Use mean squared error as the cost function and gradient descent for optimization.
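
For reference, the cost function and the gradient descent update can be written out as below. The symbols are chosen here for illustration: $X_{\text{poly}}$ is the $m \times (d+1)$ matrix of polynomial features, $\theta$ is the parameter vector, $y$ is the vector of targets, and $\alpha$ is the learning rate.

```latex
\begin{align}
J(\theta) &= \frac{1}{m} \left\lVert X_{\text{poly}}\,\theta - y \right\rVert^{2} \\
\nabla_{\theta} J(\theta) &= \frac{2}{m}\, X_{\text{poly}}^{\top} \left( X_{\text{poly}}\,\theta - y \right) \\
\theta &\leftarrow \theta - \alpha \, \nabla_{\theta} J(\theta)
\end{align}
```

The factor $2/m$ comes from differentiating the squared error averaged over the $m$ training samples.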

In [119]:
class PolynomialRegression:
    def __init__(self, degree, learning_rate=0.001, n_iterations=100):
        # Store the hyperparameters; theta is created when fit() is called
        self.degree = degree
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.theta = None
    
    def polynomial_features(self, X):
        # Same transformation as the standalone polynomial_features function:
        # column i holds X[:, 0]**i for i = 0..degree (only the first feature is used)
        X_poly = np.empty((len(X), self.degree + 1))
        for i in range(self.degree + 1):
            X_poly[:, i] = X[:, 0] ** i
        return X_poly
    
    def fit(self, X, y):
        """
        Train the polynomial regression model.
        :X: input features
        :y: target values
        """
        # Transform the input features into polynomial features
        X_poly = self.polynomial_features(X)
        # Initialize model parameters to zero, one per polynomial term
        self.theta = np.zeros(X_poly.shape[1])

        for _ in range(self.n_iterations):
            # Calculate the error using the difference between predicted target values and actual target values
            error = np.dot(X_poly, self.theta) - y

            # Calculate the gradient with respect to the parameter theta
            gradient = (2 / X_poly.shape[0]) * np.dot(X_poly.T, error)

            # Update theta by subtracting the product of the learning rate and the gradient
            self.theta -= self.learning_rate * gradient
    
    def predict(self, X):
        """
        Generate predictions for X.
        :X: input features
        :return: predictions
        """
        # Transform the input features into polynomial features
        X_poly = self.polynomial_features(X)
        # Generate predictions based on theta
        return np.dot(X_poly, self.theta)
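
One way to sanity-check a gradient descent implementation like this is to compare its result with the closed-form least-squares solution on data where the true relationship is known. The sketch below does that on synthetic data. The `gradient_descent` helper, the data, and the learning rate and iteration count are all illustrative assumptions rather than part of the original notebook, but the update rule mirrors the one in `fit`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic one-feature data with a known quadratic relationship (illustrative only)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.1, size=200)

# Design matrix with columns x^0, x^1, x^2
X_poly = np.column_stack([x**i for i in range(3)])

def gradient_descent(X, y, learning_rate, n_iterations):
    # Same update rule as the fit method:
    # theta -= learning_rate * (2 / m) * X^T (X theta - y)
    theta = np.zeros(X.shape[1])
    for _ in range(n_iterations):
        error = X @ theta - y
        theta -= learning_rate * (2 / X.shape[0]) * (X.T @ error)
    return theta

theta_gd = gradient_descent(X_poly, y, learning_rate=0.05, n_iterations=5000)

# Closed-form least-squares solution for comparison
theta_ls, *_ = np.linalg.lstsq(X_poly, y, rcond=None)

print("gradient descent:", theta_gd)
print("least squares:   ", theta_ls)
```

If the two parameter vectors differ noticeably, the usual suspects are too few iterations or a learning rate that is too small (slow convergence) or too large (divergence).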
        

<b>Step 5: Train and Evaluate the Model</b>

Instantiate the <i>'PolynomialRegression'</i> class, fit the model to the training set, and evaluate its performance on the test set.

In [130]:
# Instantiate and train the polynomial regression model
model = PolynomialRegression(degree=2, learning_rate=0.001, n_iterations=100)
model.fit(X_train, y_train)

# Make predictions on the test set
predictions = model.predict(X_test)

# Evaluate the model (calculate and print mean squared error)
mse = np.mean((predictions - y_test) ** 2)
print(f"Mean Squared Error on Test Set: {mse}")

Mean Squared Error on Test Set: 14197.452402381148
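
An MSE value is easier to interpret next to a reference point. A common one is a constant predictor that always outputs the training-set mean. The sketch below defines `baseline_mse`, a hypothetical helper that is not part of the original notebook, and runs it on a tiny made-up example.

```python
import numpy as np

def baseline_mse(y_train, y_test):
    # Hypothetical helper: MSE of always predicting the training-set mean
    y_train = np.asarray(y_train, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    return np.mean((y_test - y_train.mean()) ** 2)

# Tiny illustrative example (not the Diabetes data)
print(baseline_mse([1, 2, 3], [2, 4]))  # 2.0
```

In the notebook, calling `baseline_mse(y_train, y_test)` gives a number to compare against `mse`. If the model's test MSE is not clearly below this baseline, one plausible cause is that gradient descent has not converged: theta starts at zero, and 100 iterations at a learning rate of 0.001 may move it only part of the way. Trying more iterations or a larger learning rate would be a reasonable next experiment.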
