# Loading Libraries and diabetes dataset

In [2]:
from sklearn.datasets import load_diabetes
import pandas as pd


diabetes = load_diabetes(as_frame=True, scaled=False)


## Splitting the data into training, validation, and test sets.
## The data is split into a 70% training set, a 15% validation set, and a 15% test set.

In [3]:
from sklearn.model_selection import train_test_split

X_train, X_temp, y_train, y_temp = train_test_split(diabetes.data, diabetes.target, test_size=0.3, random_state=42)
X_valid, X_test, y_valid, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)


## Trainnig and evaluating the multivariate linear model:

In [4]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error, mean_absolute_percentage_error

linear_model = LinearRegression()
linear_model.fit(X_train, y_train)


linear_predictions = linear_model.predict(X_valid)

r2 = r2_score(y_valid, linear_predictions)
mae = mean_absolute_error(y_valid, linear_predictions)
mape = mean_absolute_percentage_error(y_valid, linear_predictions)

print(f"Linear Model - R-squared: {r2:.2f}, MAE: {mae:.2f}, MAPE: {mape:.2f}")


Linear Model - R-squared: 0.51, MAE: 38.22, MAPE: 0.35


In [5]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# polynomial regression model with degree 2
polymodel_D2 = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
polymodel_D2.fit(X_train, y_train)

polypredictions_D2 = polymodel_D2.predict(X_valid)


r2_poly = r2_score(y_valid, polypredictions_D2)
mae_poly = mean_absolute_error(y_valid, polypredictions_D2)
mape_poly = mean_absolute_percentage_error(y_valid, polypredictions_D2)

print(f"Polynomial Model (Degree 2) - R-squared: {r2_poly:.2f}, MAE: {mae_poly:.2f}, MAPE: {mape_poly:.2f}")


Polynomial Model (Degree 2) - R-squared: 0.30, MAE: 43.35, MAPE: 0.37


## 3


In [6]:

poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X_train)
poly_model = LinearRegression()
poly_model.fit(X_poly, y_train)

X_test_poly = poly.transform(X_test)
poly_predictions = poly_model.predict(X_test_poly)

r2_poly = r2_score(y_test, poly_predictions)
mae_poly = mean_absolute_error(y_test, poly_predictions)
mape_poly = mean_absolute_percentage_error(y_test, poly_predictions)

print(f"Polynomial Regression (Degree 2) on BMI Feature - R-squared: {r2_poly:.2f}, MAE: {mae_poly:.2f}, MAPE: {mape_poly:.2f}")


Polynomial Regression (Degree 2) on BMI Feature - R-squared: 0.40, MAE: 46.25, MAPE: 0.41


## 4

In [7]:

poly = PolynomialFeatures(degree=2, include_bias=False)
X_train_poly = poly.fit_transform(X_train)
poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)

X_test_poly = poly.transform(X_test)
poly_predictions = poly_model.predict(X_test_poly)

r2_poly = r2_score(y_test, poly_predictions)
mae_poly = mean_absolute_error(y_test, poly_predictions)
mape_poly = mean_absolute_percentage_error(y_test, poly_predictions)

print(f"Multivariate Polynomial Regression (Degree 2) - R-squared: {r2_poly:.2f}, MAE: {mae_poly:.2f}, MAPE: {mape_poly:.2f}")


Multivariate Polynomial Regression (Degree 2) - R-squared: 0.45, MAE: 45.95, MAPE: 0.39


## 5

For Multivariate Linear Regression R² Measures how well the linear model fits the data. Usually Higher is better.
MAE and MAPE indicate prediction accuracy. for them Lower values are better.

Polynomial Regression (Degree 2) on 'bmi' alone R² Measures the fit of the 'bmi' feature. Higher is better.
MAE and MAPE iIndicate prediction accuracy. Lower values are better.

Multivariate Polynomial Regression (Degree 2) on all Variables R² measures how well the polynomial model fits all variables. Higher is better.
MAE and MAPE indicate prediction accuracy. Lower values are better.
In general, higher R² and lower MAE or MAPE are desirable. The choice of model depends on the problem and the trade-off between complexity and predictive accuracy.

In [9]:
print("Multivariate Linear Regression for Degree 2:")
print(f"R-squared: {r2_poly:.2f}, MAE: {mae_poly:.2f}, MAPE: {mape_poly:.2f}\n")

print("Polynomial Regression (Degree 2) on 'bmi' Alone:")
print(f"R-squared (bmi): {r2_poly:.2f}, MAE (bmi): {mae_poly:.2f}, MAPE (bmi): {mape_poly:.2f}\n")

print("Multivariate Polynomial Regression (Degree 2) on All Variables:")
print(f"R-squared: {r2_poly:.2f}, MAE: {mae_poly:.2f}, MAPE: {mape_poly:.2f}")


Multivariate Linear Regression for Degree 2:
R-squared: 0.45, MAE: 45.95, MAPE: 0.39

Polynomial Regression (Degree 2) on 'bmi' Alone:
R-squared (bmi): 0.45, MAE (bmi): 45.95, MAPE (bmi): 0.39

Multivariate Polynomial Regression (Degree 2) on All Variables:
R-squared: 0.45, MAE: 45.95, MAPE: 0.39


## 6 i

Multivariate Linear Regression
The number of parameters is equal to the number of features in the dataset plus one for the intercept term. The features are the characteristics or columns in the diabetes dataset.
Parameters = Number of features + 1 (for the intercept term)
That means each feature has its own weight, and there's one extra

Polynomial Regression on BMI:
In the polynomial regression for BMI, there are three parameters: one for BMI, one for BMI squared, and one for the intercept term.
Parameters = 3 (original 'bmi', 'bmi^2', and an intercept)
That means we have 'bmi', 'bmi^2', and an intercept in the model.

Multivariate Polynomial Regression:
Similar to multivariate linear regression, the number of parameters depends on the number of features, but here it's larger because of the extra polynomial terms introduced by the polynomial regression.
Parameters = So Many. Basically depends on the number of original features, interactions, and quadratic terms, plus an intercept.
This model has more parameters because it considers interactions and quadratic terms for all original features in addition to an intercept, making it more complex.






## 6 ii
It deployments depends on various factors, including the specific problem, available data, and the trade-off between model complexity and interpretability. 

I would choose Multivariate Linear Regression for simplicity and ease of interpretation when relationships between variables are mostly linear, Polynomial Regression (Degree 2) on 'bmi' Alone if we expect a nonlinear, quadratic relationship with the 'bmi' feature and, Multivariate Polynomial Regression (Degree 2) on All Variables when we suspect complex, nonlinear interactions between multiple features.
