Cross Validation
-----------------

Cross-validation is a model evaluation technique in machine learning that helps you understand how well your model will perform on unseen (new) data.

Instead of evaluating on one fixed train-test split, cross-validation tests the model multiple times on different splits, giving a more reliable performance estimate.

✅ Why do we need Cross Validation?
------------------------------------
With only one train-test split (e.g., 80/20), the score depends on which samples happen to land in the test set, so the result may be unlucky:

If the test data happens to be easy → accuracy looks too high

If the test data happens to be hard → accuracy looks too low

Cross-validation reduces this randomness by averaging the score over several different splits.

✅ What is Cross Validation?
-----------------------------
Cross-validation means:

1. Split the dataset into K roughly equal parts (called folds).
2. Train the model on K-1 folds.
3. Test it on the remaining fold.
4. Repeat K times so that each fold serves as the test set exactly once.
5. Average the K test scores.

This gives a more stable and fair estimate of performance than a single split.
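The steps above can be sketched from scratch in plain Python. This is a minimal illustration, not scikit-learn's implementation: the function names, the unshuffled contiguous folds, and the toy "always predict the majority training label" model are all assumptions chosen to keep the example self-contained.

```python
from collections import Counter


def k_fold_splits(n_samples, k):
    """Yield (train_idx, test_idx) for each of k contiguous folds.

    Fold sizes differ by at most one (the first n_samples % k folds get one
    extra sample). No shuffling is done in this sketch.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        stop = start + base + (1 if fold < extra else 0)
        test_idx = list(range(start, stop))
        train_idx = list(range(start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


def majority_class_accuracy(y, train_idx, test_idx):
    """Hypothetical toy model: always predict the most common training label."""
    majority = Counter(y[i] for i in train_idx).most_common(1)[0][0]
    correct = sum(1 for i in test_idx if y[i] == majority)
    return correct / len(test_idx)


def cross_val_mean(y, k):
    """Train on k-1 folds, test on the remaining fold, repeat k times, average."""
    scores = [majority_class_accuracy(y, tr, te) for tr, te in k_fold_splits(len(y), k)]
    return scores, sum(scores) / len(scores)


if __name__ == "__main__":
    labels = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]  # made-up labels for illustration
    scores, mean = cross_val_mean(labels, k=5)
    print("fold scores:", scores)
    print("mean:", mean)
```

With these made-up labels the individual fold scores range from 0.5 to 1.0 while their mean is 0.7, which is exactly the split-to-split variation that a single train-test split would hide.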

🎯 Most Common Type: K-Fold Cross Validation
--------------------------------------------
Example with K = 5:

| Fold | Training on | Testing on |
| ---- | ----------- | ---------- |
| 1    | F2 F3 F4 F5 | F1         |
| 2    | F1 F3 F4 F5 | F2         |
| 3    | F1 F2 F4 F5 | F3         |
| 4    | F1 F2 F3 F5 | F4         |
| 5    | F1 F2 F3 F4 | F5         |

Final score = average of 5 test scores.
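The rotation in the table generalizes to any K. A small sketch (the function name `fold_schedule` is hypothetical) that builds one row per round, using the same F1..Fk fold labels:

```python
def fold_schedule(k):
    """Return one (round, training folds, test fold) row per round of K-fold CV.

    Each fold is the test fold exactly once; the remaining k-1 folds form
    the training set for that round.
    """
    folds = [f"F{i}" for i in range(1, k + 1)]
    return [
        (round_no, [f for f in folds if f != test_fold], test_fold)
        for round_no, test_fold in enumerate(folds, start=1)
    ]


if __name__ == "__main__":
    for round_no, train, test in fold_schedule(5):
        print(round_no, " ".join(train), "->", test)
```

Each row corresponds to one model fit, so K-fold costs K fits; the final score is the mean of the K test scores those fits produce.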

🧠 Simple Explanation
---------------------

Cross-validation = training and testing your model multiple times on different splits to detect overfitting and get a more reliable accuracy estimate.

📌 Types of Cross Validation
------------------------------
1️⃣ K-Fold Cross Validation

The most common type.

2️⃣ Stratified K-Fold

Keeps the class proportions the same in each fold (the usual choice for classification, especially with imbalanced classes).
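Stratification can be sketched by dealing each class's samples round-robin across the folds. This is a simplified illustration under that assumption, not the exact allocation rule of scikit-learn's `StratifiedKFold`; the function names are hypothetical and labels are assumed to be sortable.

```python
from collections import Counter, defaultdict


def stratified_fold_ids(y, k):
    """Return a fold id (0..k-1) for every sample, keeping class ratios similar.

    Indices of each class are dealt round-robin across the k folds,
    continuing where the previous class stopped so that overall fold sizes
    also stay balanced.
    """
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    fold_of = [0] * len(y)
    offset = 0
    for label in sorted(by_class):
        for j, i in enumerate(by_class[label]):
            fold_of[i] = (offset + j) % k
        offset += len(by_class[label])
    return fold_of


def class_counts_per_fold(y, fold_of, k):
    """Count how many samples of each class ended up in each fold."""
    return [Counter(label for label, f in zip(y, fold_of) if f == fold) for fold in range(k)]


if __name__ == "__main__":
    y = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0]  # made-up labels: 8 of class 0, 4 of class 1
    for fold, counts in enumerate(class_counts_per_fold(y, stratified_fold_ids(y, 4), 4)):
        print(fold, dict(counts))
```

Here every fold gets two class-0 samples and one class-1 sample, matching the overall 2:1 ratio. Plain contiguous K-fold on data sorted by label could instead produce test folds that contain only one class.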

3️⃣ Leave-One-Out (LOOCV)

Uses a single sample as the test set each time, so K equals the number of samples (very slow on large datasets).
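Leave-one-out is K-fold with K equal to the number of samples. A minimal sketch (the function name is hypothetical) makes the cost visible: one model fit per sample, so 10,000 samples means 10,000 fits instead of 5 for 5-fold.

```python
def leave_one_out_splits(n_samples):
    """Yield (train_idx, test_idx): every sample is the test set exactly once."""
    for i in range(n_samples):
        train_idx = [j for j in range(n_samples) if j != i]
        yield train_idx, [i]


if __name__ == "__main__":
    for train_idx, test_idx in leave_one_out_splits(4):
        print("train", train_idx, "test", test_idx)
```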

4️⃣ Time Series Cross Validation

Used when the data has a time order: rows cannot be shuffled, and the model is always trained on past data and tested on the data that follows.
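A common scheme is the expanding window: each split trains on everything up to a point and tests on the next block. The sketch below is roughly modeled on the default behaviour of scikit-learn's `TimeSeriesSplit` (equal-sized test blocks, growing training window); the function name is hypothetical and it assumes rows are already in time order.

```python
def time_series_splits(n_samples, n_splits):
    """Yield (train_idx, test_idx) with all training indices before the test block.

    Test blocks have size n_samples // (n_splits + 1) and sit at the end of
    the series; any remainder goes to the first training window.
    """
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("too many splits for this many samples")
    first_test_start = n_samples - n_splits * test_size
    for s in range(n_splits):
        start = first_test_start + s * test_size
        yield list(range(start)), list(range(start, start + test_size))


if __name__ == "__main__":
    for train_idx, test_idx in time_series_splits(10, 3):
        print("train", train_idx, "test", test_idx)
```

For 10 samples and 3 splits this trains on rows 0-3, 0-5, and 0-7 and tests on rows 4-5, 6-7, and 8-9, so the model never sees the future.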

CODE
----

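from sklearn.datasets import load_iris

from sklearn.linear_model import LogisticRegression

from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)  # example data (assumption): replace with your own features X and labels y

model = LogisticRegression(max_iter=1000)  # max_iter raised from the default of 100 to give the solver room to converge

scores = cross_val_score(model, X, y, cv=5)  # 5 folds; for classifiers an integer cv uses stratified folds

print("Cross-validation scores:", scores)  # one accuracy value per fold

print("Average accuracy:", scores.mean())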


Summary
--------

| Term             | Meaning                                              |
| ---------------- | ---------------------------------------------------- |
| Train-test split | Tests once, on a single split                        |
| Cross validation | Tests multiple times on different splits             |
| Benefit          | Detects overfitting, more reliable accuracy estimate |
