**K-fold cross-validation** is a technique for evaluating machine learning models by training and testing them on different subsets of the data. Because every observation is used for both training and validation, it gives a more reliable estimate of how the model performs on unseen data than a single train-test split.

### Steps:

1. **Split the Data**: Divide the data into *K* folds (subsets) of roughly equal size; the sizes differ by at most one sample when the number of samples is not divisible by *K*.
2. **Train & Validate in K Rounds**:
   - In each round, use one fold as the validation set and the remaining *K-1* folds as the training set.
   - Repeat this process *K* times so that each fold is used as the validation set once.
3. **Average the Results**: After *K* rounds, average the performance scores (like accuracy or F1-score) from each fold to get a reliable estimate of model performance.
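
The three steps above can be sketched as an explicit loop. This is a minimal illustration, not the notebook's own workflow: the data comes from `make_regression` as a synthetic stand-in, and the names `X_demo`, `y_demo`, `fold_scores`, and `mean_score` are made up for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold

# Synthetic stand-in data (an assumption; not the housing data used later)
X_demo, y_demo = make_regression(
    n_samples=100, n_features=5, noise=10.0, random_state=0
)

# Step 1: split the data into K = 5 folds
kfold = KFold(n_splits=5)
fold_scores = []

# Step 2: K rounds, each holding out a different fold for validation
for train_idx, val_idx in kfold.split(X_demo):
    model = LinearRegression()
    model.fit(X_demo[train_idx], y_demo[train_idx])
    predictions = model.predict(X_demo[val_idx])
    fold_scores.append(r2_score(y_demo[val_idx], predictions))

# Step 3: average the per-fold scores
mean_score = np.mean(fold_scores)
print(fold_scores, mean_score)
```

A fresh `LinearRegression` is created in every round so that no fitted state carries over from one fold to the next.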

### Diagram Explanation

Imagine *K = 5* for simplicity. In each round, a different fold (marked `Val` below) is used as the validation set while the remaining four folds serve as training data.

```
Round 1: [ Val ] [ Train ] [ Train ] [ Train ] [ Train ]
Round 2: [ Train ] [ Val ] [ Train ] [ Train ] [ Train ]
Round 3: [ Train ] [ Train ] [ Val ] [ Train ] [ Train ]
Round 4: [ Train ] [ Train ] [ Train ] [ Val ] [ Train ]
Round 5: [ Train ] [ Train ] [ Train ] [ Train ] [ Val ]
```

This ensures that every data point appears in the validation set exactly once and in the training set *K-1* times, so the model is evaluated on the entire dataset across different splits.
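
The claim that each point is validated exactly once can be checked on a toy index array. The sketch below uses plain NumPy with 10 hypothetical sample indices and *K = 5*; `validation_counts` is an illustrative name, not part of any library.

```python
import numpy as np

n_samples, k = 10, 5
indices = np.arange(n_samples)

# Split the indices into K consecutive folds of (roughly) equal size
folds = np.array_split(indices, k)

# Count how many times each sample lands in a validation fold
validation_counts = np.zeros(n_samples, dtype=int)
for round_number, val_idx in enumerate(folds, start=1):
    # Everything not in the validation fold is training data this round
    train_idx = np.setdiff1d(indices, val_idx)
    validation_counts[val_idx] += 1
    print(f"Round {round_number}: val={val_idx.tolist()} train={train_idx.tolist()}")

print(validation_counts)
```

Every entry of `validation_counts` ends up equal to 1, matching the diagram: the folds partition the data without overlap.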

In [1]:
import numpy as np
import pandas as pd
from sklearn.metrics import r2_score
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression, ElasticNet
from sklearn.model_selection import train_test_split, KFold, cross_val_score

In [2]:
housing_df = pd.read_csv(r'C:\Users\DAI.STUDENTSDC\Desktop\Machine Learning\Data Sets\Boston.csv')

In [5]:
# Features: every column except the target
X = housing_df.drop(['medv'], axis=1)
# Target: the 'medv' column
y = housing_df['medv']

In [7]:
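linear_model = LinearRegression()

# cv=5 runs 5-fold cross-validation. For a regressor, the folds are
# consecutive and unshuffled, and the default score is R^2.
score_result = cross_val_score(linear_model, X, y, cv=5)
# Mean R^2 across the 5 folds
score_result.mean()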

0.35327592439587757
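
With an integer `cv`, `cross_val_score` uses consecutive, unshuffled folds for a regressor. If the rows of a dataset are ordered in some systematic way, the folds may not resemble one another, which can depress the average score. One common variation is to pass a shuffled `KFold` object instead. The sketch below shows that call on synthetic data from `make_regression`, since the CSV is not available here; `X_demo`, `y_demo`, and the `random_state` values are assumptions for illustration, and the numbers it prints say nothing about the housing result above.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (an assumption; not the housing data)
X_demo, y_demo = make_regression(
    n_samples=200, n_features=8, noise=15.0, random_state=42
)

# Shuffle the rows before splitting; random_state makes the split repeatable
shuffled_folds = KFold(n_splits=5, shuffle=True, random_state=42)

# Pass the splitter object as cv and name the metric explicitly
shuffled_scores = cross_val_score(
    LinearRegression(), X_demo, y_demo, cv=shuffled_folds, scoring="r2"
)

# Report the spread as well as the mean of the per-fold R^2 values
print(shuffled_scores.mean(), shuffled_scores.std())
```

Looking at the standard deviation alongside the mean gives a sense of how much the estimate varies from fold to fold.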