# Exercise: Implementing Polynomial Regression from Scratch with Diabetes Dataset


## Objective:

Implement polynomial regression from scratch on the Diabetes dataset to see how linear regression can be extended to capture non-linear relationships.

### Step 1: Load and Explore the Dataset

Load the Diabetes dataset from scikit-learn and explore its features. The dataset contains 442 samples with 10 numeric baseline features each; the target variable is a quantitative measure of disease progression one year after baseline.

In [2]:
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the diabetes dataset
diabetes = load_diabetes()
data, target = diabetes.data, diabetes.target
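
The cell above loads the data but does not inspect it. A minimal exploration sketch follows; it uses only the `data`, `target`, and `feature_names` attributes of the object returned by `load_diabetes`, and the choice of which statistics to print is just an illustration.

```python
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
data, target = diabetes.data, diabetes.target

# Basic structure: number of samples and number of features
print("Data shape:", data.shape)
print("Target shape:", target.shape)
print("Feature names:", diabetes.feature_names)

# Simple summary statistics of the target variable
print("Target min / mean / max:", target.min(), target.mean(), target.max())
```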


### Step 2: Split the Dataset

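Split the dataset into training and testing sets so that the model is trained on one subset and evaluated on another. Then standardize the features, fitting the scaler on the training set only so that no test-set statistics leak into training.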

In [3]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.2, random_state=42)

In [4]:
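# Standardize features: fit the scaler on the training set only,
# then apply the same transformation to the test set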
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
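
To make the scaling step concrete, here is a rough NumPy-only sketch of what standardization does. The arrays `train` and `test` are synthetic stand-ins, not the Diabetes data, and the sketch only mirrors the basic behavior of `StandardScaler` (per-column mean and population standard deviation taken from the training set).

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # synthetic stand-in for X_train
test = rng.normal(loc=5.0, scale=2.0, size=(20, 3))    # synthetic stand-in for X_test

# Statistics are computed from the training set only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training statistics

print(train_scaled.mean(axis=0))  # approximately 0 for every column
print(train_scaled.std(axis=0))   # approximately 1 for every column
print(test_scaled.mean(axis=0))   # close to 0, but generally not exactly 0
```

The test set is transformed with the training mean and standard deviation, which is why its scaled columns are only approximately centered.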

### Step 3: Implement Polynomial Features Function

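Implement a function that transforms the input features into polynomial features of a given degree. The function prepends a column of ones for the bias term and then raises each original feature to every power from 1 up to `degree`. It does not generate interaction (cross) terms between different features.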

In [5]:
def polynomial_features(X, degree=2):
    # Get the number of samples and features
    n_samples, n_features = X.shape
    
    # Initialize a list to store the polynomial features
    features = [np.ones(n_samples)]  # Start with a column of ones for the bias term
    
    # Loop over each degree from 1 to the specified degree
    for d in range(1, degree + 1):
        for feature_index in range(n_features):
            # Raise each feature to the power of d and add to the list
            features.append(X[:, feature_index] ** d)
    
    # Concatenate all features into a single array
    return np.column_stack(features)
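
As a quick sanity check, the function can be applied to a tiny hand-made array. The block below repeats the function so that it runs on its own; `X_small` is an illustrative input, not part of the exercise data.

```python
import numpy as np

def polynomial_features(X, degree=2):
    # Same logic as the function above, repeated so this block is self-contained
    n_samples, n_features = X.shape
    features = [np.ones(n_samples)]  # bias column
    for d in range(1, degree + 1):
        for feature_index in range(n_features):
            features.append(X[:, feature_index] ** d)
    return np.column_stack(features)

X_small = np.array([[1.0, 2.0],
                    [3.0, 4.0]])
expanded = polynomial_features(X_small, degree=2)

# Column order: 1, x0, x1, x0^2, x1^2
# Expected rows: [1, 1, 2, 1, 4] and [1, 3, 4, 9, 16]
print(expanded)
print(expanded.shape)  # (2, 5): 1 + n_features * degree columns
```

Note that there is no `x0 * x1` column, so the expanded matrix grows linearly with the degree rather than combinatorially.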

### Step 4: Implement Polynomial Regression Class 

Create a class for polynomial regression with methods for fitting the model and making predictions. Use mean squared error as the cost function and batch gradient descent for optimization.

In [6]:
class PolynomialRegression:
    def __init__(self, degree, learning_rate=0.001, n_iterations=1000):
        self.degree = degree
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None

    def polynomial_features(self, X):
        return polynomial_features(X, self.degree)

    def fit(self, X, y):
        # Use the class method to generate polynomial features
        X = self.polynomial_features(X)

        # Initialize weights randomly (unseeded, so results may vary slightly between runs)
        self.weights = np.random.randn(X.shape[1])

        # Perform gradient descent
        for _ in range(self.n_iterations):
            # Calculate predictions
            y_pred = np.dot(X, self.weights)
            # Calculate error
            error = y_pred - y
            # Update weights
            self.weights -= self.learning_rate * np.dot(X.T, error) / X.shape[0]

    def predict(self, X):
        # Generate polynomial features for the input data
        X = self.polynomial_features(X)
        
        # Calculate predictions
        return np.dot(X, self.weights)
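
The update rule in `fit` can be read off from the cost function. The notation here is introduced only for this sketch: $\Phi$ is the expanded feature matrix returned by `polynomial_features`, $n$ is the number of training samples, and $\alpha$ is the learning rate. The factor $\tfrac{1}{2}$ in the cost is an assumed convention, chosen so that the gradient matches the code exactly.

$$
J(\mathbf{w}) = \frac{1}{2n} \lVert \Phi \mathbf{w} - \mathbf{y} \rVert^2,
\qquad
\nabla_{\mathbf{w}} J(\mathbf{w}) = \frac{1}{n} \Phi^\top (\Phi \mathbf{w} - \mathbf{y}),
\qquad
\mathbf{w} \leftarrow \mathbf{w} - \alpha \, \nabla_{\mathbf{w}} J(\mathbf{w})
$$

In the code, `error` corresponds to $\Phi \mathbf{w} - \mathbf{y}$ and `np.dot(X.T, error) / X.shape[0]` to the gradient. Without the factor $\tfrac{1}{2}$, the gradient would carry an extra factor of 2, which is equivalent to doubling the learning rate.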

### Step 5: Train and Evaluate the Model

Instantiate the `PolynomialRegression` class, fit the model to the training set, and evaluate its performance on the test set.

In [7]:
# Instantiate and train the polynomial regression model
model = PolynomialRegression(degree=2, learning_rate=0.1, n_iterations=5000)
model.fit(X_train, y_train)

# Make predictions on the test set
predictions = model.predict(X_test)

# Evaluate the model (calculate and print mean squared error)
mse = np.mean((predictions - y_test) ** 2)
print(f"Mean Squared Error on Test Set: {mse}")
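
A raw MSE value is hard to interpret on its own. One possible way to put it in context is to compare it with a constant mean-prediction baseline and to compute the coefficient of determination ($R^2$). The helper names `mean_squared_error_np` and `r2_score_np` and the toy arrays below are hypothetical and not part of the original exercise.

```python
import numpy as np

def mean_squared_error_np(y_true, y_pred):
    # Average of the squared residuals
    return np.mean((y_pred - y_true) ** 2)

def r2_score_np(y_true, y_pred):
    # 1 - (residual sum of squares / total sum of squares)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy values for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 8.0])

# Baseline that always predicts the mean of the targets
baseline = np.full_like(y_true, y_true.mean())

print(mean_squared_error_np(y_true, y_pred))    # 0.375
print(mean_squared_error_np(y_true, baseline))  # 5.0
print(r2_score_np(y_true, y_pred))              # approximately 0.925
```

In the notebook, the same helpers could be called with `y_test` and `predictions`; a mean baseline would then typically use `y_train.mean()` so that no test information is used.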