<b>Exercise: Implementing Polynomial Regression from Scratch with the Diabetes Dataset</b>


<b>Objective:</b>

Implement polynomial regression from scratch on the Diabetes dataset to see how linear regression can be extended to capture non-linear relationships.

<b>Step 1: Load and Explore the Dataset</b>

Load the Diabetes dataset bundled with scikit-learn and explore its structure: 442 samples, 10 numeric features (age, sex, BMI, blood pressure, and six blood serum measurements), and a target variable measuring disease progression one year after baseline.

In [4]:
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the diabetes dataset
diabetes = load_diabetes()
data, target = diabetes.data, diabetes.target
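The loading cell above does not yet inspect the data. A minimal exploration sketch, using only attributes that `load_diabetes` returns, could look like this (the variable names `feature_means` and `feature_stds` are introduced here for illustration):

```python
import numpy as np
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
data, target = diabetes.data, diabetes.target

# Basic structure: number of samples and features
print("data shape:  ", data.shape)
print("target shape:", target.shape)
print("features:    ", diabetes.feature_names)

# Summary statistics for the target (disease progression)
print(f"target min/mean/max: {target.min():.1f} / {target.mean():.1f} / {target.max():.1f}")

# Per-feature mean and standard deviation
feature_means = data.mean(axis=0)
feature_stds = data.std(axis=0)
for name, m, s in zip(diabetes.feature_names, feature_means, feature_stds):
    print(f"{name:>4}: mean={m:+.4f}, std={s:.4f}")
```

Printing the per-feature statistics is a quick way to see what scale the features arrive on before any further preprocessing.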

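<b>Step 2: Split and Scale the Dataset</b>

Split the dataset into training and testing sets so the model is trained on one subset and evaluated on another. Then standardize the features, fitting the scaler on the training set only so that no information from the test set leaks into training.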

In [5]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.2, random_state=42)

In [6]:
# Standardize features: fit on the training set only, then apply the same transform to the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
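To make the fit/transform distinction concrete, here is a NumPy-only sketch of what standardization does. The arrays `X_train_demo` and `X_test_demo` are hypothetical stand-ins generated at random, not the diabetes data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for X_train / X_test
X_train_demo = rng.normal(loc=5.0, scale=2.0, size=(80, 3))
X_test_demo = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

# "fit": learn per-feature mean and standard deviation from the training data only
mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# "transform": apply the same training statistics to both sets
X_train_scaled = (X_train_demo - mean) / std
X_test_scaled = (X_test_demo - mean) / std

print("train mean:", X_train_scaled.mean(axis=0).round(6))
print("train std: ", X_train_scaled.std(axis=0).round(6))
print("test mean: ", X_test_scaled.mean(axis=0).round(3))
```

The scaled training set has zero mean and unit standard deviation by construction. The test set is only approximately standardized, which is expected because its statistics were never used.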

<b>Step 3: Implement Polynomial Features Function</b>

Implement a function that expands the input features into polynomial features up to a given degree. Besides raising each feature to higher powers, the expansion includes products of different features (interaction terms) and a leading column of ones for the intercept.

In [7]:
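from itertools import combinations_with_replacement

def polynomial_features(X, degree):
    """
    Generate all polynomial and interaction terms of X up to the given degree.

    Returns an array whose first column is the bias term (all ones), followed
    by every product of `deg` input features for deg = 1..degree.
    """
    n_samples, n_features = X.shape
    # Start with the bias term (intercept)
    columns = [np.ones(n_samples)]

    for deg in range(1, degree + 1):
        # Each index tuple picks `deg` features (repeats allowed), e.g. (0, 0) -> x0 ** 2
        for items in combinations_with_replacement(range(n_features), deg):
            columns.append(np.prod(X[:, list(items)], axis=1))

    # Stack once at the end instead of calling np.hstack inside the loop
    return np.column_stack(columns)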
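A small worked example helps show what the expansion produces. The sketch below uses a compact stand-in named `expand` (a hypothetical helper that follows the same enumeration order as the function above) so that it runs on its own:

```python
import numpy as np
from itertools import combinations_with_replacement
from math import comb

def expand(X, degree):
    """Compact stand-in for polynomial_features, used only for this illustration."""
    cols = [np.ones(X.shape[0])]
    for deg in range(1, degree + 1):
        for idx in combinations_with_replacement(range(X.shape[1]), deg):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)

X_demo = np.array([[2.0, 3.0]])      # one sample with features a=2, b=3
row = expand(X_demo, degree=2)[0]    # columns: 1, a, b, a*a, a*b, b*b
print(row)

# The number of output columns for n features and degree d is C(n + d, d)
n_cols_demo = comb(2 + 2, 2)         # 6 columns for the demo above
n_cols_diabetes = comb(10 + 2, 2)    # 66 columns for 10 features at degree 2
print(n_cols_demo, n_cols_diabetes)
```

For the 10 diabetes features at degree 2 the expansion yields 66 columns, and the count grows quickly with the degree, which is worth keeping in mind when choosing it.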


<b>Step 4: Implement Polynomial Regression Class</b>

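Create a class for polynomial regression with methods for fitting the model and making predictions. The model is still linear in its weights (only the features are non-linear), so it can be trained like ordinary linear regression: use mean squared error as the cost function and gradient descent for optimization.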
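For reference, the cost function and update rule can be written out as follows. The symbols are notation assumed here rather than taken from the exercise: $\Phi$ is the matrix of polynomial features, $\mathbf{w}$ the weight vector, $\mathbf{y}$ the targets, $n$ the number of samples, and $\alpha$ the learning rate.

```latex
\begin{align*}
J(\mathbf{w}) &= \frac{1}{n} \lVert \Phi \mathbf{w} - \mathbf{y} \rVert^2
               = \frac{1}{n} \sum_{i=1}^{n} \left( \boldsymbol{\phi}_i^\top \mathbf{w} - y_i \right)^2 \\
\nabla_{\mathbf{w}} J(\mathbf{w}) &= \frac{2}{n} \, \Phi^\top \left( \Phi \mathbf{w} - \mathbf{y} \right) \\
\mathbf{w} &\leftarrow \mathbf{w} - \alpha \, \nabla_{\mathbf{w}} J(\mathbf{w})
\end{align*}
```

The factor $2/n$ comes from differentiating the squared error and is the same factor that appears in the gradient line of the `fit` method.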

In [8]:
class PolynomialRegression:
    def __init__(self, degree, learning_rate=0.001, n_iterations=1000):
        self.degree = degree
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None

    def polynomial_features(self, X):
        return polynomial_features(X, self.degree)

    def fit(self, X, y):
        X_poly = self.polynomial_features(X)
        n_samples, n_features = X_poly.shape
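        # Initialize weights to zero (one per polynomial feature, including the bias)
        self.weights = np.zeros(n_features)

        # Batch gradient descent on the mean squared error
        for _ in range(self.n_iterations):
            y_pred = X_poly.dot(self.weights)
            errors = y_pred - y
            # Gradient of (1 / n_samples) * sum(errors ** 2) with respect to the weights
            gradient = (2 / n_samples) * X_poly.T.dot(errors)
            self.weights -= self.learning_rate * gradient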

    def predict(self, X):
        X_poly = self.polynomial_features(X)
        return X_poly.dot(self.weights)
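The `fit` method does not report whether gradient descent has converged. One way to check, sketched here on synthetic data rather than the diabetes set, is to record the loss at each step and compare the final error with the closed-form least-squares solution. The function `gradient_descent` and all `*_demo` names are hypothetical and exist only for this illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200
# Hypothetical design matrix: a bias column plus two standardized features
X_demo = np.column_stack([np.ones(n_samples), rng.normal(size=(n_samples, 2))])
true_w = np.array([150.0, 20.0, -10.0])
y_demo = X_demo @ true_w + rng.normal(scale=5.0, size=n_samples)

def gradient_descent(X, y, learning_rate, n_iterations):
    """Same update rule as PolynomialRegression.fit, but also records the MSE."""
    w = np.zeros(X.shape[1])
    history = []
    for _ in range(n_iterations):
        errors = X @ w - y
        history.append(np.mean(errors ** 2))
        w -= learning_rate * (2 / len(y)) * (X.T @ errors)
    return w, history

# Closed-form least-squares solution as a reference point
w_ref, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)
mse_ref = np.mean((X_demo @ w_ref - y_demo) ** 2)

results = {}
for n_iter in (100, 1000, 10000):
    w, history = gradient_descent(X_demo, y_demo, learning_rate=0.001, n_iterations=n_iter)
    results[n_iter] = np.mean((X_demo @ w - y_demo) ** 2)
    print(f"{n_iter:>6} iterations: MSE = {results[n_iter]:10.2f} (least squares: {mse_ref:.2f})")
```

On this toy problem, a learning rate of 0.001 needs far more than 100 iterations to approach the least-squares error. The same diagnostic could be applied to the class above by storing the per-iteration loss in an attribute.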


<b>Step 5: Train and Evaluate the Model</b>

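Instantiate the <i>PolynomialRegression</i> class, fit the model to the training set, and evaluate its performance on the test set using mean squared error. With a learning rate of 0.001 and only 100 iterations, gradient descent may stop well before it converges, so the reported error can likely be reduced by training longer.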

In [13]:
# Instantiate and train the polynomial regression model
model = PolynomialRegression(degree=2, learning_rate=0.001, n_iterations=100)
model.fit(X_train, y_train)

# Make predictions on the test set
predictions = model.predict(X_test)

# Evaluate the model (calculate and print mean squared error)
mse = np.mean((predictions - y_test) ** 2)
print(f"Mean Squared Error on Test Set: {mse}")

Mean Squared Error on Test Set: 6660.786220859313
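A raw MSE is hard to judge without a reference point. One common sanity check is to compare it with a baseline that always predicts the mean of the training targets. The helper `compare_to_baseline` below is a hypothetical sketch, shown on toy numbers rather than the diabetes results:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between two 1-D arrays."""
    return float(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2))

def compare_to_baseline(y_train, y_test, y_pred):
    """Return the model MSE, the mean-predictor MSE, and their ratio."""
    baseline_pred = np.full(len(y_test), np.mean(y_train))
    model_mse = mse(y_test, y_pred)
    baseline_mse = mse(y_test, baseline_pred)
    return model_mse, baseline_mse, model_mse / baseline_mse

# Hypothetical toy values, not taken from the notebook
y_train_demo = np.array([100.0, 150.0, 200.0])
y_test_demo = np.array([120.0, 180.0])
pred_demo = np.array([130.0, 170.0])

model_mse, baseline_mse, ratio = compare_to_baseline(y_train_demo, y_test_demo, pred_demo)
print(f"model MSE: {model_mse:.1f}, baseline MSE: {baseline_mse:.1f}, ratio: {ratio:.3f}")
```

In the notebook this would be called as `compare_to_baseline(y_train, y_test, predictions)`. A ratio below 1 means the model beats the mean predictor, while a ratio at or above 1 suggests the model has not yet learned anything useful, for example because training stopped too early.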
