# Exercise: Implementing Polynomial Regression from Scratch with Diabetes Dataset


## Objective:

Implement polynomial regression from scratch using the Diabetes dataset to understand how to extend linear regression for capturing non-linear relationships.

### Step 1: Load and Explore the Dataset

Load the Diabetes dataset and explore its features. Familiarize yourself with the dataset structure and the target variable (disease progression one year after baseline).

In [1]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the diabetes dataset
diabetes = load_diabetes()
df = pd.DataFrame(data=diabetes.data, columns=diabetes.feature_names)
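The cell above builds the feature table but does not inspect it. A minimal exploration sketch, assuming the same `load_diabetes` loader, could look like the following; the names `features` and `target` are illustrative and not part of the exercise.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# Reload the dataset so this sketch runs on its own
data = load_diabetes()
features = pd.DataFrame(data.data, columns=data.feature_names)
target = pd.Series(data.target, name="progression")  # illustrative name

print(features.shape)            # (number of samples, number of features)
print(list(features.columns))    # feature names
print(features.describe().T[["mean", "std", "min", "max"]])  # per-feature summary
print(target.describe())         # distribution of the target variable
```

The per-feature summary shows that the features returned by `load_diabetes` are already mean-centered and on a common scale, which is worth keeping in mind when reading the scaling step later.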

### Step 2: Split the Dataset

Split the dataset into training and testing sets. This will allow us to train the model on one subset and evaluate its performance on another.

In [2]:
# Split the features and the target into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(df, diabetes.target, test_size=0.2, random_state=42)

In [3]:
# Standardize the features: fit the scaler on the training set only, then reuse it on the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
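To make the scaling step concrete, here is a small NumPy sketch of the same idea: the mean and standard deviation come from the training data only and are then reused for the test data. The helper `standardize` and the synthetic arrays are assumptions made for illustration.

```python
import numpy as np

def standardize(train, test):
    """Scale both arrays with the training set's mean and standard deviation."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)      # population standard deviation (ddof=0)
    std[std == 0] = 1.0          # leave constant columns unscaled
    return (train - mean) / std, (test - mean) / std

rng = np.random.default_rng(0)
train = rng.normal(5.0, 2.0, size=(80, 3))   # synthetic data for illustration
test = rng.normal(5.0, 2.0, size=(20, 3))
train_s, test_s = standardize(train, test)

# The training columns now have mean 0 and standard deviation 1;
# the test columns are close to that, but not exactly.
print(train_s.mean(axis=0).round(6), train_s.std(axis=0).round(6))
print(test_s.mean(axis=0).round(3), test_s.std(axis=0).round(3))
```

Fitting on the training set alone keeps information about the test set out of the preprocessing.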

### Step 3: Implement Polynomial Features Function

Implement a function to transform the input features into polynomial features of a given degree. This function will take the original features and create new features by raising them to different powers.

In [4]:
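def polynomial_features(X, degree=2):
    # Copy into a DataFrame so the caller's data is not modified in place;
    # this also accepts the NumPy arrays returned by StandardScaler.
    X = pd.DataFrame(X).copy()
    numerical_columns = X.select_dtypes(include=[np.number]).columns

    # Append each numeric column raised to the powers 2..degree (no cross terms)
    for col in numerical_columns:
        for d in range(2, degree + 1):
            new_col_name = f"{col}^{d}"
            X[new_col_name] = X[col] ** d
    return X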
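A tiny worked example can make the transformation easier to check by hand. The sketch below is a NumPy-only variant of the same idea (per-feature powers, no interaction terms); the name `expand_powers` is hypothetical and is not used elsewhere in the exercise.

```python
import numpy as np

def expand_powers(X, degree=2):
    """Return [X, X**2, ..., X**degree] stacked column-wise (no cross terms)."""
    X = np.asarray(X, dtype=float)
    return np.hstack([X ** d for d in range(1, degree + 1)])

X_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
expanded = expand_powers(X_demo, degree=3)

# Columns are ordered x1, x2, x1^2, x2^2, x1^3, x2^3:
# row 0 -> 1, 2, 1, 4, 1, 8
# row 1 -> 3, 4, 9, 16, 27, 64
print(expanded)
```

Note that scikit-learn's `PolynomialFeatures` also generates interaction terms such as `x1 * x2`, plus a bias column by default, so its output has more columns than this per-feature expansion.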

### Step 4: Implement Polynomial Regression Class 

Create a class for polynomial linear regression with methods for fitting the model and making predictions. Use mean squared error as the cost function and gradient descent for optimization.
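For reference, the cost function and the gradient descent updates can be written out as follows. The notation is an assumption of this sketch rather than the author's: $\Phi$ is the $n \times p$ matrix of polynomial features, $w$ the weight vector, $b$ the bias, and $\alpha$ the learning rate.

```latex
\begin{align}
\hat{y} &= \Phi w + b \\
J(w, b) &= \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2 \\
\frac{\partial J}{\partial w} &= \frac{2}{n} \, \Phi^{\top} \left( \hat{y} - y \right),
\qquad
\frac{\partial J}{\partial b} = \frac{2}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right) \\
w &\leftarrow w - \alpha \, \frac{\partial J}{\partial w},
\qquad
b \leftarrow b - \alpha \, \frac{\partial J}{\partial b}
\end{align}
```

Each iteration recomputes $\hat{y}$ with the current parameters, evaluates both gradients on the full training set, and takes one step of size $\alpha$.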

In [5]:
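class PolynomialRegression:
    def __init__(self, degree, learning_rate=0.001, n_iterations=1000):
        self.degree = degree
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations

    def polynomial_features(self, X):
        # Use the Step 3 helper with this model's degree and return the result
        return polynomial_features(X, degree=self.degree)

    def fit(self, X, y):
        # TODO: expand X, then minimize the MSE with gradient descent
        raise NotImplementedError

    def predict(self, X):
        # TODO: expand X and return the predictions (called in Step 5)
        raise NotImplementedError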


### Step 5: Train and Evaluate the Model

Instantiate the `PolynomialRegression` class, fit the model to the training set, and evaluate its performance on the test set.
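Before training on the diabetes data, it can help to check a gradient descent implementation on data whose answer is known. The following is a minimal sketch of one possible implementation, not the reference solution for this exercise: the class name `PolyRegressionSketch`, its `_expand` helper, and the hyperparameters are all assumptions, and it uses batch gradient descent on per-feature powers only.

```python
import numpy as np

class PolyRegressionSketch:
    """Illustrative polynomial regression trained with batch gradient descent."""

    def __init__(self, degree=2, learning_rate=0.01, n_iterations=1000):
        self.degree = degree
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None
        self.bias = 0.0

    def _expand(self, X):
        # Per-feature powers only: [X, X**2, ..., X**degree]
        X = np.asarray(X, dtype=float)
        return np.hstack([X ** d for d in range(1, self.degree + 1)])

    def fit(self, X, y):
        phi = self._expand(X)
        y = np.asarray(y, dtype=float)
        n_samples, n_features = phi.shape
        self.weights = np.zeros(n_features)
        self.bias = 0.0
        for _ in range(self.n_iterations):
            error = phi @ self.weights + self.bias - y
            # Gradients of the mean squared error with respect to weights and bias
            grad_w = (2.0 / n_samples) * (phi.T @ error)
            grad_b = (2.0 / n_samples) * error.sum()
            self.weights -= self.learning_rate * grad_w
            self.bias -= self.learning_rate * grad_b
        return self

    def predict(self, X):
        return self._expand(X) @ self.weights + self.bias


# Synthetic, noise-free check: y = 1 + 2x + 3x^2
x = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
y = 1.0 + 2.0 * x[:, 0] + 3.0 * x[:, 0] ** 2
sketch = PolyRegressionSketch(degree=2, learning_rate=0.1, n_iterations=5000).fit(x, y)
print(sketch.weights, sketch.bias)   # should approach [2, 3] and 1
```

If the recovered coefficients are far from the known ones, the usual suspects are a learning rate that is too large (the loss diverges) or too few iterations (the loss is still decreasing).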

In [6]:
# Instantiate and train the polynomial regression model
model = PolynomialRegression(degree=2, learning_rate=0.001, n_iterations=100)
model.fit(X_train, y_train)

# Make predictions on the test set
predictions = model.predict(X_test)

# Evaluate the model (calculate and print mean squared error)
mse = np.mean((predictions - y_test) ** 2)
print(f"Mean Squared Error on Test Set: {mse}")

Mean Squared Error on Test Set: 6660.786220859313
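An MSE value is expressed in squared target units, so it is hard to judge on its own. One option, sketched below under the assumption that a comparison against a constant mean prediction is acceptable, is to report RMSE and $R^2$ alongside it. The helper `regression_report` and the demo arrays are hypothetical.

```python
import numpy as np

def regression_report(y_true, y_pred):
    """Return MSE, RMSE, and R^2 for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_pred - y_true) ** 2)
    # MSE obtained by always predicting the mean of y_true
    baseline_mse = np.mean((y_true - y_true.mean()) ** 2)
    return {"mse": mse, "rmse": np.sqrt(mse), "r2": 1.0 - mse / baseline_mse}

# Hypothetical values, only to show the call
y_true_demo = np.array([100.0, 150.0, 200.0, 250.0])
y_pred_demo = np.array([110.0, 140.0, 210.0, 240.0])
report = regression_report(y_true_demo, y_pred_demo)
print(report)
```

With the exercise's variables, the call would be `regression_report(y_test, predictions)`. An $R^2$ near or below zero indicates the model does no better than predicting the mean, which can happen when gradient descent has not run long enough to converge.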
