# Model Evaluation with K-Fold Cross-Validation

In this notebook, we evaluate four linear regression models with K-Fold Cross-Validation on the Diabetes dataset:

1. **Linear Regression** (ordinary least squares)
2. **Lasso Regression** (L1 penalty)
3. **Ridge Regression** (L2 penalty)
4. **Elastic Net** (combined L1 and L2 penalty)

### Objectives:
1. **Load the Dataset**: We use the Diabetes dataset from `scikit-learn`, which has 442 samples, 10 baseline features, and a quantitative measure of disease progression one year after baseline as the target.
2. **Train and Evaluate the Models**: We fit each model to the full dataset and measure its R² on that same data.
3. **Cross-Validation**: We run 10-Fold Cross-Validation to estimate how well each model generalizes, and compare that estimate with the training score.

### Steps:
1. **Import Libraries**: Import the libraries needed for data manipulation, modeling, and evaluation.
2. **Load and Prepare Data**: Load the dataset and extract the feature matrix `X` and the target vector `y`.
3. **Train the Models**: Fit each model to the entire dataset.
4. **Evaluate the Models**: Compute R² for the training predictions and for the pooled out-of-fold predictions from 10-fold cross-validation.
5. **Print Results**: Display both R² scores for each model; a training score well above the cross-validated score would suggest overfitting.
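
Since every comparison in this notebook rests on R², it may help to see what the score measures. The sketch below computes R² as one minus the ratio of the residual sum of squares to the total sum of squares and checks it against `r2_score`. The helper `r2_by_hand` and the toy arrays are illustrative assumptions, not part of the notebook's pipeline.

```python
import numpy as np
from sklearn.metrics import r2_score


def r2_by_hand(y_true, y_pred):
    """Illustrative helper: R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Residual sum of squares: error left after using the predictions
    ss_res = np.sum((y_true - y_pred) ** 2)
    # Total sum of squares: error of always predicting the mean of y_true
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Toy values chosen only for illustration
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

r2_manual = r2_by_hand(y_true, y_pred)
r2_sklearn = r2_score(y_true, y_pred)

# Predicting the mean everywhere gives R^2 = 0 by construction
r2_mean_baseline = r2_by_hand(y_true, np.full_like(y_true, y_true.mean()))

print(f"manual: {r2_manual:.4f}, sklearn: {r2_sklearn:.4f}")
```

An R² near 0 therefore means the model does little better than predicting the mean target, and a negative value means it does worse.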

In [38]:
# Import necessary libraries
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet
from sklearn.metrics import r2_score


In [39]:
# Load the diabetes dataset
from sklearn.datasets import load_diabetes
dataset = load_diabetes()
X = dataset.data
y = dataset.target

In [40]:
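def evaluate_model(name, model):
    """
    Fit a model on the full dataset, then evaluate it with 10-fold cross-validation.
    Prints the R^2 score on the training data and the R^2 score of the pooled
    out-of-fold predictions.

    Note: this function reads the module-level arrays X and y defined above.

    Parameters:
    - name: str, name of the model (used only for printing)
    - model: scikit-learn regressor instance to evaluate (it is refit in place)
    """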
    # Fit the model on the entire dataset
    model.fit(X, y)
    
    # Make predictions on the entire dataset
    p = model.predict(X)
    
    # Calculate the R^2 score for the training data
    r2_train = r2_score(y, p)

    # Initialize K-Fold cross-validation with 10 folds
    kf = KFold(n_splits=10, shuffle=True, random_state=111)
    
    # Array to store cross-validated predictions
    # (float dtype so predictions are not truncated if y is integer-valued)
    p_cv = np.zeros_like(y, dtype=float)

    # Perform K-Fold cross-validation
    for train_index, test_index in kf.split(X):
        # Split the data into training and test sets for this fold
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        
        # Refit the same model instance on this fold's training data
        # (this overwrites the earlier full-data fit)
        model.fit(X_train, y_train)
        
        # Predict on the test data and store the predictions
        p_cv[test_index] = model.predict(X_test)

    # Calculate the R^2 score for the cross-validated predictions
    r2_cv = r2_score(y, p_cv)
    
    # Print the results
    print(f'Method:: {name}')  # Model name
    print(f'Training R^2:: {r2_train:.2f}')  # R^2 score on the training data
    print(f'10-Fold CV R^2:: {r2_cv:.2f}\n')  # R^2 score from 10-fold cross-validation
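
The manual loop in `evaluate_model` can also be sketched with scikit-learn's `cross_val_predict`, which clones the estimator for each fold and returns the out-of-fold predictions in the original row order. The snippet below is a minimal sketch, assuming the same `KFold` settings as above and using `LinearRegression` as the example model; the variable names are illustrative. It checks that both routes produce the same predictions.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_predict

X, y = load_diabetes(return_X_y=True)

# Same splitter settings as in evaluate_model
kf = KFold(n_splits=10, shuffle=True, random_state=111)

# Route 1: let scikit-learn collect the out-of-fold predictions
p_cv = cross_val_predict(LinearRegression(), X, y, cv=kf)
r2_cv_pooled = r2_score(y, p_cv)

# Route 2: the explicit loop, with a fresh model per fold
p_manual = np.zeros_like(y, dtype=float)
for train_index, test_index in kf.split(X):
    fold_model = LinearRegression().fit(X[train_index], y[train_index])
    p_manual[test_index] = fold_model.predict(X[test_index])

same_predictions = np.allclose(p_cv, p_manual)
print(f"pooled 10-fold CV R^2: {r2_cv_pooled:.2f}, routes agree: {same_predictions}")
```

Both routes score the pooled predictions once. This is not the same quantity as the mean of per-fold R² values that `cross_val_score` would report, so the two numbers can differ slightly.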


In [41]:
# Define models to evaluate
models = [
    ('linear regression', LinearRegression()),
    ('lasso regression', Lasso(alpha=0.5)),
    ('ridge regression', Ridge(alpha=1.0)),
    ('elastic net', ElasticNet(alpha=0.5, l1_ratio=0.5))
]

# Evaluate each model
for name, model in models:
    evaluate_model(name, model)

Method:: linear regression
Training R^2:: 0.52
10-Fold CV R^2:: 0.49

Method:: lasso regression
Training R^2:: 0.46
10-Fold CV R^2:: 0.44

Method:: ridge regression
Training R^2:: 0.45
10-Fold CV R^2:: 0.43

Method:: elastic net
Training R^2:: 0.02
10-Fold CV R^2:: 0.02
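
An R² of 0.02 for the elastic net on both training and cross-validation means its predictions barely move away from the mean target. A likely explanation is heavy coefficient shrinkage: `load_diabetes` returns feature columns that are mean-centered and scaled to unit sum of squares, so the feature values are tiny and useful coefficients must be large, which the penalty at `alpha=0.5` discourages. The sketch below is one way to check this, comparing coefficient norms against ordinary least squares; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import ElasticNet, LinearRegression

X, y = load_diabetes(return_X_y=True)

# With the default scaling, each column's sum of squares is about 1
col_sum_sq = (X ** 2).sum(axis=0)

# Fit the unpenalized model and the elastic net used in this notebook
ols = LinearRegression().fit(X, y)
enet = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y)

# Compare the overall size of the fitted coefficient vectors
ols_norm = np.linalg.norm(ols.coef_)
enet_norm = np.linalg.norm(enet.coef_)

print(f"column sums of squares: min={col_sum_sq.min():.3f}, max={col_sum_sq.max():.3f}")
print(f"OLS coefficient norm:         {ols_norm:.1f}")
print(f"Elastic net coefficient norm: {enet_norm:.1f}")
```

Note that `alpha` is not directly comparable across these estimators: scikit-learn's `Lasso` and `ElasticNet` divide the squared-error term by `2 * n_samples`, whereas `Ridge` does not, so the same numeric `alpha` implies very different penalty strengths. Trying a much smaller `alpha` for the elastic net, or standardizing the features differently, would be a reasonable next experiment.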

