
# ⭐ **What is Cross Validation?**

**Cross Validation (CV)** is a technique for estimating how well a machine learning model will perform on **new, unseen data**.
It helps us **detect overfitting** and gives a **more reliable** performance estimate than a single train/test split.

---

# 🎯 **Why We Use Cross Validation**

✔ Gives a more **reliable** estimate of model performance
✔ Helps **detect overfitting**
✔ Works well when the dataset is **small**, because every row is used for both training and testing
✔ Helps us choose the **best model**
✔ Helps us choose the **best hyperparameters**
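
As a rough illustration of the last point, the sketch below uses scikit-learn's `GridSearchCV` to pick a hyperparameter by 5-fold cross validation. The model (`KNeighborsClassifier`), the iris dataset, and the candidate values for `n_neighbors` are assumptions chosen only for the demo.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Candidate values are arbitrary, picked only for illustration.
param_grid = {"n_neighbors": [1, 3, 5, 7, 9]}

# Each candidate is scored with 5-fold CV; the best mean score wins.
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("Best hyperparameters:", search.best_params_)
print("Best mean CV accuracy:", search.best_score_)
```

The same idea extends to comparing different models: score each one with the same folds and compare the mean scores.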

---

# 🔄 **How Cross Validation Works (Simple Explanation)**

Instead of doing a simple **train/test split**, cross validation **splits the dataset into K parts** (called *folds*).

Example: **5-Fold CV**

1. Split the data into 5 roughly equal parts
2. Use **4 parts** for training
3. Use **1 part** for testing
4. Repeat this **5 times**, using a different part for testing each time
5. Calculate the **average accuracy** across the 5 runs

This gives a **more stable and trustworthy estimate** than a single split. 🔥📊
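
The five steps can be written out by hand to see what happens inside. This is a minimal sketch, assuming the iris dataset and a `GaussianNB` model; the random seed and variable names are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Step 1: shuffle the row indices once, then cut them into 5 roughly equal folds.
rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
indices = rng.permutation(len(X))
folds = np.array_split(indices, 5)

fold_scores = []
for i, test_idx in enumerate(folds):
    # Steps 2-3: the other 4 folds form the training set, fold i is the test set.
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    model = GaussianNB()
    model.fit(X[train_idx], y[train_idx])
    # score() returns accuracy for classifiers.
    fold_scores.append(model.score(X[test_idx], y[test_idx]))

# Step 5: average the per-fold accuracies.
print("Accuracy per fold:", fold_scores)
print("Average accuracy:", np.mean(fold_scores))
```

In practice `cross_val_score` does this loop for you; the manual version is only meant to make the steps visible.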

---

# ⭐ **Types of Cross Validation**

### 1️⃣ **K-Fold Cross Validation**

The most commonly used type.
The dataset is divided into *K* folds (typically 5 or 10).
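
To see which rows land in each fold, here is a small sketch using scikit-learn's `KFold` on 10 toy rows. The toy array `X_demo` and the `random_state` value are assumptions for the demo.

```python
import numpy as np
from sklearn.model_selection import KFold

X_demo = np.arange(10).reshape(-1, 1)  # 10 toy rows, purely illustrative

# shuffle=True mixes the rows before cutting them into folds.
kf = KFold(n_splits=5, shuffle=True, random_state=42)
splits = list(kf.split(X_demo))

for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"Fold {fold}: train={train_idx}, test={test_idx}")
```

Every row appears in exactly one test fold, and never in the training and test set of the same fold.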

---

### 2️⃣ **Stratified K-Fold**

Used for **classification** problems.
Keeps the **same class ratio** in every fold.
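
A quick way to check the class ratio is to count the labels in each test fold. The sketch below uses made-up imbalanced labels (80 of class 0, 20 of class 1) with `StratifiedKFold`; the data is an assumption for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy imbalanced labels: 80 samples of class 0, 20 of class 1 (made up).
y_demo = np.array([0] * 80 + [1] * 20)
X_demo = np.zeros((len(y_demo), 1))  # features are irrelevant for splitting

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

fold_class_counts = []
for train_idx, test_idx in skf.split(X_demo, y_demo):
    # Count how many samples of each class fall into this test fold.
    counts = np.bincount(y_demo[test_idx], minlength=2)
    fold_class_counts.append(counts.tolist())
    print("Test fold class counts [class 0, class 1]:", counts)
```

Each test fold keeps the original 80/20 ratio (16 rows of class 0 and 4 of class 1), which plain K-Fold does not guarantee.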

---

### 3️⃣ **Leave-One-Out CV (LOOCV)**

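Each row is used as the test set exactly once, so the model is trained *n* times for *n* rows.
Uses almost all the data for training in every run, but is **very slow** when the dataset is large.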
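
The cost is easy to see by counting the model fits. This is a minimal sketch, assuming the iris dataset (150 rows) and a `GaussianNB` model, using scikit-learn's `LeaveOneOut` splitter.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

loo = LeaveOneOut()
n_fits = loo.get_n_splits(X)  # one fit per row

# Each score is 0.0 or 1.0 because every test set holds a single row.
scores = cross_val_score(GaussianNB(), X, y, cv=loo)

print("Model fits required:", n_fits)
print("LOOCV accuracy:", scores.mean())
```

With 150 rows the model is trained 150 times, which is why LOOCV becomes impractical for large datasets or slow models.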

---

### 4️⃣ **Time Series Cross Validation**

Used for **time series data**.
Always trains on **past data** and tests on **future data**.
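
The "past before future" rule can be checked with scikit-learn's `TimeSeriesSplit`. The sketch below uses 12 toy time steps; the array `X_ts` and the number of splits are assumptions for the demo.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X_ts = np.arange(12).reshape(-1, 1)  # 12 toy time steps in chronological order

tscv = TimeSeriesSplit(n_splits=3)
ts_splits = list(tscv.split(X_ts))

for fold, (train_idx, test_idx) in enumerate(ts_splits, start=1):
    # Training indices always come before the test indices.
    print(f"Fold {fold}: train={train_idx}, test={test_idx}")
```

The training window grows with each fold, and the rows are never shuffled, so the model is never tested on data older than what it was trained on.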

---

# 🧪 **Simple Python Example**

```python
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)  # higher iteration cap so the solver can converge

scores = cross_val_score(model, X, y, cv=5)
print(scores)
print("Average Accuracy:", scores.mean())
```



In [25]:
# import libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

In [26]:
%%time
# load iris data
iris = load_iris()

# define Gaussian Naive Bayes model
nb = GaussianNB()

# perform k-fold cross-validation with k=5
score = cross_val_score(nb, iris.data, iris.target, cv=5, scoring='accuracy')

# print the score for each fold and the mean score
print("Cross-validation scores for each fold:", score)
print("Mean cross-validation score:", score.mean())


Cross-validation scores for each fold: [0.93333333 0.96666667 0.93333333 0.93333333 1.        ]
Mean cross-validation score: 0.9533333333333334
CPU times: user 18.4 ms, sys: 2.42 ms, total: 20.8 ms
Wall time: 25.9 ms


# **Cross Validation with the `tips` Dataset**

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split



In [28]:
tips = sns.load_dataset('tips')
tips.head()

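|   | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |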


In [29]:
X = tips[['total_bill', 'tip', 'size' ]]
y = tips['sex']


In [30]:
nb = GaussianNB()