## 1. K-Fold Cross-Validation: The Workhorse of Model Evaluation

The dataset is divided into **k folds of (approximately) equal size**. The model is trained and evaluated **k times**, each time using a different fold as the validation set.

For each of the **k iterations**:

1. One fold is kept aside as the **testing (validation) set**
2. The remaining **k−1 folds** are combined to form the **training set**
3. The model is trained on the training set
4. The model is evaluated on the held-out fold
5. The performance metric (Accuracy, MSE, R², etc.) is recorded

- Each fold is used **exactly once** as the validation set
- Final performance = **Average of all k recorded scores**
- This reduces the dependence of the estimate on any single random train-test split
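
The procedure above can be sketched in plain NumPy to show how the folds rotate. This is a minimal illustration rather than scikit-learn's implementation; the helper name `k_fold_indices` and the toy sizes (10 samples, 3 folds) are assumptions made for the example.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k roughly equal folds."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    folds = np.array_split(shuffled, k)  # fold sizes differ by at most one
    for i in range(k):
        val_idx = folds[i]
        # Every fold except the i-th one goes into the training set
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

splits = list(k_fold_indices(n_samples=10, k=3))
for fold, (train_idx, val_idx) in enumerate(splits, start=1):
    print(f"Fold {fold}: train size={len(train_idx)}, validation size={len(val_idx)}")
```

Each sample index lands in exactly one validation fold, and no index appears in both the training and validation set of the same iteration.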

### Choosing the Value of `k` in K-Fold Cross-Validation
The selection of `k` involves a **trade-off between reliability and computation cost**.

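#### 🔺 Higher Value of `k`
- Lower-bias performance estimate, because each model is trained on almost all of the data
- Model is trained **k times** → higher computation cost
- Training sets are larger and overlap heavily, so the fitted models are very similar
- Validation folds are smaller, so per-fold scores are noisier and the estimate can have higher variance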

#### 🔻 Lower Value of `k`
- Faster computation
- Less reliable performance estimate
- Smaller and more distinct training sets
- Can bias the estimate pessimistically, because each model sees less data than a model trained on the full dataset

#### Commonly Used Values
- **k = 5** or **k = 10**
  - Good balance between accuracy and computation time
  - Widely used in practice
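
The trade-off can be made concrete with simple arithmetic on the number of fits and the split sizes. The dataset size of 100 below is a hypothetical value chosen only for illustration.

```python
n_samples = 100  # hypothetical dataset size, chosen only for illustration

rows = []
for k in (2, 5, 10, n_samples):  # k = n_samples is the leave-one-out extreme
    val_size = n_samples // k          # samples held out per fold
    train_size = n_samples - val_size  # samples used for each fit
    rows.append((k, train_size, val_size))
    print(f"k={k:>3}: {k:>3} fits, train on {train_size} samples, validate on {val_size}")
```

Moving from `k = 5` to `k = 10` doubles the number of fits while growing each training set only from 80% to 90% of the data, which is one way to see why values beyond 10 are rarely worth the cost.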

## 2. Stratified K-Fold Cross-Validation

Stratified K-Fold Cross-Validation is a variation of standard K-Fold Cross-Validation designed specifically for **classification problems**, especially when the dataset is **imbalanced**.  
Its main objective is to ensure that **each fold preserves the class distribution** of the original dataset: every fold contains approximately the same proportion of samples from each target class as the full dataset.

### How Stratified K-Fold Works

1. **Stratification**
   - The dataset is first grouped based on the target variable (class labels).

2. **Proportional Distribution**
   - Samples from each class are distributed evenly across all `k` folds.
   - Example:
     - Original dataset: 80% Class A, 20% Class B  
     - Each fold: approximately 80% Class A and 20% Class B

3. **Iterative Evaluation**
   - The process runs for `k` iterations:
     - One fold is used as the validation (test) set.
     - The remaining `k−1` folds are combined to form the training set.
   - Both training and validation sets maintain the original class proportions.

4. **Performance Averaging**
   - Performance metrics (accuracy, precision, recall, F1-score, etc.) are recorded for each iteration.
   - Final performance is computed as the **average across all folds**.
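
The 80/20 example above can be checked directly with scikit-learn's `StratifiedKFold`. The labels and the placeholder feature matrix below are made up for the illustration; only the split itself is of interest.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical labels: 80 samples of class 0 ("A") and 20 of class 1 ("B")
y_demo = np.array([0] * 80 + [1] * 20)
X_demo = np.zeros((len(y_demo), 1))  # features are irrelevant to the split itself

skf_demo = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
val_counts = []
for fold, (train_idx, val_idx) in enumerate(skf_demo.split(X_demo, y_demo), start=1):
    counts = np.bincount(y_demo[val_idx], minlength=2)  # samples per class in this fold
    val_counts.append(counts)
    share = counts[1] / counts.sum()
    print(f"Fold {fold}: class 0 = {counts[0]}, class 1 = {counts[1]}, minority share = {share:.2f}")
```

Because both class counts divide evenly by 5 here, every validation fold holds the same 80/20 mix; with counts that do not divide evenly, the proportions are only approximately preserved.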

### Why Stratified K-Fold Is Important
- **Fair and Reliable Evaluation**
  - Each fold represents the overall dataset distribution accurately.
- **Improved Learning for Minority Classes**
  - With stratification, minority-class samples appear in every training and validation split, provided each class has at least `k` samples.
- **Reduced Variance in Performance Scores**
  - Leads to more stable and consistent evaluation results across folds.

### When to Use Stratified K-Fold
- Classification problems
- Imbalanced datasets
- When reliable evaluation of all classes is required

## 3. Leave-One-Out Cross-Validation (LOOCV): The Extreme Case

Leave-One-Out Cross-Validation (LOOCV) is a special case of K-Fold Cross-Validation where the number of folds `k` is equal to the number of data points `n` in the dataset.  
In each iteration, exactly **one data point** is used as the test set, and the remaining `n−1` data points are used for training.

For a dataset with `n` data points, LOOCV proceeds as follows:

1. Select one data point as the testing set.
2. Use the remaining `n−1` data points as the training set.
3. Train the model on the training set.
4. Evaluate the model on the single held-out data point and record the performance metric.
5. Repeat this process for all `n` data points.

After all iterations:
- A total of `n` performance scores are obtained.
- The final performance estimate is the **average of these `n` scores**.
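
A minimal sketch of these steps using scikit-learn's `LeaveOneOut` splitter is shown below. The small synthetic dataset (20 samples, 3 features) is an assumption made for the example. MSE is used as the metric because each test set holds a single point, for which R² is not defined.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Small synthetic dataset, assumed only for this illustration
X_small, y_small = make_regression(n_samples=20, n_features=3, noise=5.0, random_state=0)

loo = LeaveOneOut()  # one split per data point, so n splits in total
neg_mse = cross_val_score(LinearRegression(), X_small, y_small,
                          cv=loo, scoring='neg_mean_squared_error')
squared_errors = -neg_mse  # one squared error per held-out point

print(f'Number of fits: {len(squared_errors)}')
print(f'LOOCV MSE estimate: {squared_errors.mean():.2f}')
```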

### Why LOOCV is Important
**Very Low Bias**  
Since almost the entire dataset (`n−1` samples) is used for training in each iteration, the trained model closely resembles a model trained on the full dataset. This leads to a very low-bias performance estimate.

**Deterministic Evaluation**  
LOOCV has only one possible set of splits, so no random shuffling is involved: it produces the same result every time it is run on the same dataset, provided the model's own training procedure is deterministic.

**Effective for Small Datasets**  
When the dataset is very small, LOOCV allows maximum use of available data for training in each iteration.

### When to Consider LOOCV

LOOCV is generally used in limited scenarios due to its high computational cost:

- Very small datasets where training `n` models is feasible
- Theoretical analysis of model behavior and stability
- Certain linear models where LOOCV computation can be optimized
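
One well-known instance of the last point is ordinary least squares, where each leave-one-out residual equals the ordinary residual divided by `1 − h_ii`, with `h_ii` the i-th diagonal entry of the hat matrix. The sketch below checks this identity against explicit refitting on made-up data. The variable names and data-generating coefficients are assumptions for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 3
X_raw = rng.normal(size=(n, p))
y_lin = X_raw @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=n)

A = np.column_stack([np.ones(n), X_raw])  # design matrix with an intercept column

# Shortcut: a single fit on all data, then rescale residuals by the leverage
beta, *_ = np.linalg.lstsq(A, y_lin, rcond=None)
residuals = y_lin - A @ beta
leverage = np.diag(A @ np.linalg.pinv(A))  # diagonal of the hat matrix
loo_shortcut = residuals / (1.0 - leverage)

# Explicit LOOCV: refit n times, leaving out one point each time
loo_explicit = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta_i, *_ = np.linalg.lstsq(A[keep], y_lin[keep], rcond=None)
    loo_explicit[i] = y_lin[i] - A[i] @ beta_i

print('Shortcut matches explicit LOOCV:', np.allclose(loo_shortcut, loo_explicit))
print(f'LOOCV MSE: {np.mean(loo_shortcut ** 2):.4f}')
```

The shortcut needs one fit instead of `n`, which is why LOOCV is practical for this model family even on larger datasets.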

### Comparison with K-Fold Cross-Validation

LOOCV can be viewed as K-Fold Cross-Validation with `k = n`, but with important trade-offs:

- **Bias:** Lower bias than K-Fold (for `k < n`)
- **Variance:** Often higher variance than K-Fold
- **Computation:** Significantly more computationally expensive than K-Fold (e.g., `k = 5` or `k = 10`)

**Practical Recommendation**: In most real-world applications, **K-Fold Cross-Validation with `k = 5` or `k = 10`** is preferred over LOOCV due to its better balance between computational efficiency, bias, and variance.

## K-Fold Cross-Validation for a Regression Model

In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.datasets import make_regression

In [5]:
# Generate a synthetic regression dataset
np.random.seed(42)
X, y = make_regression(n_samples=100, n_features=10, noise=10, random_state=42)
x_df = pd.DataFrame(X, columns=[f'feature_{i}' for i in range(X.shape[1])])
y_df = pd.Series(y,name='target')

In [11]:
# Initialize K-Fold
kf = KFold(n_splits=5,shuffle=True,random_state=42)

# Perform K-Fold Cross-Validation manually
mse_scores=[]
print('Performing K-Fold Cross-Validation...')

for fold,(train_index,test_index) in enumerate(kf.split(x_df)):
    print(f'--- Fold {fold+1} ---')
    
    # Split data into training and testing sets for this fold
    X_train, X_test = x_df.iloc[train_index], x_df.iloc[test_index]
    y_train, y_test = y_df.iloc[train_index], y_df.iloc[test_index]
    
    # Initialize and train the model
    model = LinearRegression()
    model.fit(X_train, y_train)
    
    # Make predictions on the test set
    y_pred = model.predict(X_test)
    
    # Calculate Mean Squared Error for this fold
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f'MSE for Fold {fold+1}: {mse:.2f}')

# Calculate the average MSE across all folds
mean_mse = np.mean(mse_scores)
std_mse = np.std(mse_scores)

print('--- Cross-Validation Results ---')
print(f'Average MSE: {mean_mse:.2f}')
print(f'Standard Deviation of MSE: {std_mse:.2f}')

Performing K-Fold Cross-Validation...
--- Fold 1 ---
MSE for Fold 1: 102.66
--- Fold 2 ---
MSE for Fold 2: 110.75
--- Fold 3 ---
MSE for Fold 3: 100.54
--- Fold 4 ---
MSE for Fold 4: 106.44
--- Fold 5 ---
MSE for Fold 5: 146.99
--- Cross-Validation Results ---
Average MSE: 113.48
Standard Deviation of MSE: 17.11


In [13]:
# Run the same K-Fold evaluation with the built-in cross_val_score helper
from sklearn.model_selection import cross_val_score
model = LinearRegression()

# scoring='neg_mean_squared_error' because scikit-learn scorers follow a higher-is-better
# convention, so MSE is returned negated; we flip the sign below to recover positive MSE
scores = cross_val_score(model, x_df, y_df, cv=kf, scoring='neg_mean_squared_error')

print('--- Using cross_val_score ---')
print(f'Scores for each fold: {np.negative(scores)}') # Negate to get positive MSE

mean_cv_mse = np.mean(np.negative(scores))
std_cv_mse = np.std(np.negative(scores))

print(f'Average MSE (cross_val_score): {mean_cv_mse:.2f}')
print(f'Standard Deviation of MSE (cross_val_score): {std_cv_mse:.2f}')

--- Using cross_val_score ---
Scores for each fold: [102.65673458 110.74775381 100.54424578 106.43890505 146.98911372]
Average MSE (cross_val_score): 113.48
Standard Deviation of MSE (cross_val_score): 17.11


## Stratified K-Fold for a Classification Task

In [14]:
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.datasets import make_classification

In [15]:
np.random.seed(42)
X_clf, y_clf = make_classification(n_samples=100, n_features=20, n_informative=10, n_redundant=5,
                                   n_classes=2, n_clusters_per_class=2, weights=[0.9, 0.1], flip_y=0.05, random_state=42)
X_clf_df = pd.DataFrame(X_clf)
y_clf_df = pd.Series(y_clf)

print(f'Class distribution: {pd.Series(y_clf).value_counts()}')

Class distribution: 0    85
1    15
Name: count, dtype: int64


In [17]:
# Initialize Stratified K Fold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# Perform Stratified K-Fold Cross-Validation using cross_val_score
model_clf = LogisticRegression(solver='liblinear', random_state=42) # liblinear is good for small datasets

# Evaluate using accuracy
accuracy_scores = cross_val_score(model_clf, X_clf_df, y_clf_df, cv=skf, scoring='accuracy')

# Evaluate using F1-score
# 'f1_weighted' averages per-class F1 weighted by class support, so the majority class
# dominates it; 'f1_macro' weights every class equally and is more sensitive to the minority class
f1_scores = cross_val_score(model_clf, X_clf_df, y_clf_df, cv=skf, scoring='f1_weighted')

print('--- Stratified K-Fold Results (Classification) ---')
print(f'Accuracy scores for each fold: {accuracy_scores}')
print(f'Average Accuracy: {np.mean(accuracy_scores):.3f} +/- {np.std(accuracy_scores):.3f}')

print(f'F1-scores (weighted) for each fold: {f1_scores}')
print(f'Average F1-score (weighted): {np.mean(f1_scores):.3f} +/- {np.std(f1_scores):.3f}')

--- Stratified K-Fold Results (Classification) ---
Accuracy scores for each fold: [0.6  0.75 0.7  0.85 0.9 ]
Average Accuracy: 0.760 +/- 0.107
F1-scores (weighted) for each fold: [0.6375     0.76406926 0.7        0.78108108 0.87777778]
Average F1-score (weighted): 0.752 +/- 0.081
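
The cell above calls `cross_val_score` once per metric, so the model is refit on every fold twice. As a possible alternative, `cross_validate` accepts a list of scorers and evaluates them all in one pass. The sketch below regenerates the same synthetic data so that it runs on its own. The extra `f1_macro` metric and the variable names are additions for illustration, not part of the original analysis.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Same synthetic data settings as the classification cell above
X_cv, y_cv = make_classification(n_samples=100, n_features=20, n_informative=10, n_redundant=5,
                                 n_classes=2, n_clusters_per_class=2, weights=[0.9, 0.1],
                                 flip_y=0.05, random_state=42)

cv_splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
metrics = ['accuracy', 'f1_weighted', 'f1_macro']

# One fit per fold; every metric is computed on the same fitted model
results = cross_validate(LogisticRegression(solver='liblinear', random_state=42),
                         X_cv, y_cv, cv=cv_splitter, scoring=metrics)

for name in metrics:
    fold_scores = results[f'test_{name}']
    print(f'{name}: {np.mean(fold_scores):.3f} +/- {np.std(fold_scores):.3f}')
```

Comparing `f1_weighted` with `f1_macro` gives a rough sense of how much the majority class is carrying the weighted score.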
