### Practical Lab 4 - Multivariate Linear and Polynomial Regression, and Evaluation using R-Squared, MAPE and MAE

* Load the dataset
* Split the data into train, validation, and test sets
* Standardize features

In [2]:
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the dataset
diabetes = load_diabetes(as_frame=True, scaled=True)

# Split the data into train, validation, and test sets
X_train, X_temp, y_train, y_temp = train_test_split(diabetes.data, diabetes.target, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)
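
The scaler is fitted on the training set only and then reused for the validation and test sets, so no information from the held-out data leaks into preprocessing. A minimal single-column sketch of that idea, using only the standard library (the values and the helper names `fit_scaler` and `apply_scaler` are hypothetical, not part of scikit-learn):

```python
from statistics import fmean, pstdev

def fit_scaler(column):
    """Estimate mean and population standard deviation from training values only."""
    return fmean(column), pstdev(column)

def apply_scaler(column, mean, std):
    """Standardize a column with statistics that were estimated elsewhere."""
    return [(x - mean) / std for x in column]

# Hypothetical values for one feature
train = [1.0, 2.0, 3.0, 4.0]
val = [2.5, 5.0]

mean, std = fit_scaler(train)                  # learned from the training data
train_scaled = apply_scaler(train, mean, std)  # mean 0, std 1 by construction
val_scaled = apply_scaler(val, mean, std)      # reuses the training statistics
```

The scaled validation column will generally not have mean 0 and standard deviation 1, which is expected: it is expressed in units of the training distribution.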

### Multivariate Linear Regression:
* Fit the Multivariate linear regression model
* Make predictions on the validation set
* Evaluate the model

In [3]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error, mean_absolute_percentage_error

# Fit the multivariate linear regression model
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)

# Make predictions on the validation set
y_val_pred_linear = linear_model.predict(X_val)

# Evaluate the model
r2_linear = r2_score(y_val, y_val_pred_linear)
mae_linear = mean_absolute_error(y_val, y_val_pred_linear)
mape_linear = mean_absolute_percentage_error(y_val, y_val_pred_linear) * 100


### Polynomial Regression on BMI (2nd Degree):
* Create polynomial features
* Fit the polynomial regression model
* Make predictions on the validation set
* Evaluate the model

In [4]:
from sklearn.preprocessing import PolynomialFeatures

# Extract the BMI feature
X_train_bmi = X_train[:, 2:3]  # Index 2 corresponds to the BMI feature
X_val_bmi = X_val[:, 2:3]

# Create polynomial features (degree=2)
poly = PolynomialFeatures(degree=2, include_bias=False)
X_train_bmi_poly = poly.fit_transform(X_train_bmi)
X_val_bmi_poly = poly.transform(X_val_bmi)

# Fit the polynomial regression model
poly_model_bmi = LinearRegression()
poly_model_bmi.fit(X_train_bmi_poly, y_train)

# Make predictions on the validation set
y_val_pred_poly_bmi = poly_model_bmi.predict(X_val_bmi_poly)

# Evaluate the model
r2_poly_bmi = r2_score(y_val, y_val_pred_poly_bmi)
mae_poly_bmi = mean_absolute_error(y_val, y_val_pred_poly_bmi)
mape_poly_bmi = mean_absolute_percentage_error(y_val, y_val_pred_poly_bmi) * 100

### Multivariate Polynomial Regression (2nd Degree) on All Variables:
* Create polynomial features for all variables
* Fit the multivariate polynomial regression model
* Make predictions on the validation set
* Evaluate the model

In [5]:
# Create polynomial features (degree=2) for all variables
poly = PolynomialFeatures(degree=2, include_bias=False)
X_train_poly = poly.fit_transform(X_train)
X_val_poly = poly.transform(X_val)

# Fit the multivariate polynomial regression model
poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)

# Make predictions on the validation set
y_val_pred_poly = poly_model.predict(X_val_poly)

# Evaluate the model
r2_poly = r2_score(y_val, y_val_pred_poly)
mae_poly = mean_absolute_error(y_val, y_val_pred_poly)
mape_poly = mean_absolute_percentage_error(y_val, y_val_pred_poly) * 100
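
To see which columns a degree-2 expansion produces, the terms can be listed by hand. This is a sketch for a hypothetical three-feature subset, not the full ten features used above, and `degree2_terms` is an illustrative helper rather than a scikit-learn function. It lists every linear term, then every square and pairwise product, with no bias column:

```python
from itertools import combinations_with_replacement

def degree2_terms(names):
    """List all monomials of degree 1 and 2 over the given feature names."""
    terms = []
    for degree in (1, 2):
        for combo in combinations_with_replacement(names, degree):
            terms.append("*".join(combo))
    return terms

# Hypothetical subset of features, for illustration only
terms = degree2_terms(["age", "bmi", "bp"])
print(terms)
print(len(terms))
```

Three input features already expand to nine columns; the growth is quadratic in the number of features.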


### Comparing the Three Models:

In [6]:
# Print model comparison
print("Multivariate Linear Regression:")
print(f"R-squared: {r2_linear:.3f}, MAE: {mae_linear:.3f}, MAPE: {mape_linear:.3f}%")

print("\nPolynomial Regression on BMI (2nd Degree):")
print(f"R-squared: {r2_poly_bmi:.3f}, MAE: {mae_poly_bmi:.3f}, MAPE: {mape_poly_bmi:.3f}%")

print("\nMultivariate Polynomial Regression (2nd Degree) on All Variables:")
print(f"R-squared: {r2_poly:.3f}, MAE: {mae_poly:.3f}, MAPE: {mape_poly:.3f}%")


Multivariate Linear Regression:
R-squared: 0.581, MAE: 38.221, MAPE: 34.802%

Polynomial Regression on BMI (2nd Degree):
R-squared: 0.362, MAE: 48.909, MAPE: 44.269%

Multivariate Polynomial Regression (2nd Degree) on All Variables:
R-squared: 0.418, MAE: 47.277, MAPE: 44.361%
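
The comparison can also be done programmatically. The sketch below copies the validation metrics printed above into a dictionary (the dictionary layout and key names are assumptions made for illustration) and picks the best model per metric: highest R-squared, lowest MAE, lowest MAPE.

```python
# Validation metrics copied from the output above
results = {
    "Multivariate Linear": {"r2": 0.581, "mae": 38.221, "mape": 34.802},
    "Polynomial on BMI (deg 2)": {"r2": 0.362, "mae": 48.909, "mape": 44.269},
    "Multivariate Polynomial (deg 2)": {"r2": 0.418, "mae": 47.277, "mape": 44.361},
}

# R-squared is better when higher; MAE and MAPE are better when lower
best_by_r2 = max(results, key=lambda name: results[name]["r2"])
best_by_mae = min(results, key=lambda name: results[name]["mae"])
best_by_mape = min(results, key=lambda name: results[name]["mape"])

print(best_by_r2, best_by_mae, best_by_mape, sep="\n")
```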


### Number of Parameters for Each Model:
* Multivariate Linear Regression: one coefficient per feature plus one intercept. The diabetes dataset has 10 features, so the model has 10 + 1 = 11 parameters.
* Polynomial Regression on BMI (2nd Degree): one coefficient for BMI, one for BMI², plus one intercept, so 2 + 1 = 3 parameters.
* Multivariate Polynomial Regression (2nd Degree) on All Variables: the degree-2 expansion of 10 features produces 10 linear terms, 10 squared terms, and 45 pairwise interaction terms, for 65 features in total (`include_bias=False`). Adding the intercept gives 65 + 1 = 66 parameters. The count is not simply the degree times the number of features, because the interaction terms must be included.
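
The parameter counts can be checked with a short calculation. The sketch assumes the 10 features of the diabetes dataset and the `include_bias=False` setting used in this lab; the helper names are illustrative. The number of monomials of total degree at most d in n variables is C(n + d, d), and dropping the constant term removes one.

```python
from math import comb

def n_poly_features(n_features, degree):
    """Monomials of total degree 1..degree in n_features variables (no bias column)."""
    return comb(n_features + degree, degree) - 1

def n_parameters(n_features, degree=1):
    """Coefficients of a linear model on the expanded features, plus the intercept."""
    return n_poly_features(n_features, degree) + 1

linear_params = n_parameters(10)        # all features, degree 1
bmi_poly_params = n_parameters(1, 2)    # BMI only, degree 2
full_poly_params = n_parameters(10, 2)  # all features, degree 2

print(linear_params, bmi_poly_params, full_poly_params)  # 11 3 66
```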

### Choosing a Model for Deployment:
I choose <b>Multivariate Linear Regression</b> because it scores best on all three validation metrics:
* R-squared (R²): A higher R-squared value indicates that the model explains more variance in the target variable. Based on this criterion, the Multivariate Linear Regression model has the best R-squared value (0.581), suggesting that it explains the most variance among the three models.

* Mean Absolute Error (MAE): A lower MAE indicates lower prediction error on average. The Multivariate Linear Regression model has the lowest MAE (38.221), indicating the smallest average prediction error among the models.

* Mean Absolute Percentage Error (MAPE): MAPE represents the average percentage error in predictions. The Multivariate Linear Regression model also has the lowest MAPE (34.802%), meaning that, on average, its predictions are closest to the actual values in percentage terms.
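
For reference, the three metrics can be written out directly. This is a plain-Python sketch of the standard definitions on hypothetical values, not the scikit-learn implementation; in particular, this `mape` assumes no true value is zero.

```python
def r2(y_true, y_pred):
    """1 minus the ratio of residual to total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def mae(y_true, y_pred):
    """Average absolute difference between truth and prediction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Average absolute error relative to the true value, in percent."""
    return 100 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical targets and predictions
y_true = [100.0, 200.0, 300.0]
y_pred = [110.0, 190.0, 330.0]

print(f"R-squared: {r2(y_true, y_pred):.3f}")  # 0.945
print(f"MAE: {mae(y_true, y_pred):.3f}")       # 16.667
print(f"MAPE: {mape(y_true, y_pred):.3f}%")    # 8.333%
```

MAE is in the units of the target, while MAPE is relative, so small true values weigh heavily in MAPE.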

Overall, the Multivariate Linear Regression model performs best of the three on the validation set: it has the highest R-squared, the lowest MAE, and the lowest MAPE. The degree-2 model on all variables scores worse despite having far more parameters, which is consistent with overfitting the training set.
