# Day 15 – Cross Validation and Overfitting in Machine Learning

## Introduction
After training a Machine Learning model, it is important to check whether it also performs well on unseen data, not only on the data it was trained on.
Two important concepts related to this are Overfitting (along with its opposite, Underfitting) and Cross Validation.

Understanding these concepts is critical for building reliable ML systems.

---

## What is Overfitting
Overfitting happens when a model learns the training data too closely,
memorizing its noise and random details instead of the general pattern.

An overfitted model:
- Performs very well on training data
- Performs poorly on test or new data

This means the model fails to generalize.
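A minimal sketch of overfitting, assuming scikit-learn and NumPy and a made-up noisy sine dataset (the data, the `true_fn` helper and the chosen degrees are illustrative assumptions, not part of the original). Polynomials of increasing degree are fitted to 10 training points and scored on separate test points. The degree-9 model has as many parameters as there are training points, so it passes through every one of them and its training error is essentially zero; its test error is typically much higher than that of the degree-3 model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

def true_fn(x):
    # Hypothetical "real" relationship behind the data
    return np.sin(2 * np.pi * x)

# Small, noisy training set: 10 evenly spaced points (assumed toy data)
x_train = np.linspace(0, 1, 10)
y_train = true_fn(x_train) + rng.normal(0, 0.2, x_train.size)

# "Unseen" data drawn from the same process
x_test = rng.uniform(0, 1, 100)
y_test = true_fn(x_test) + rng.normal(0, 0.2, x_test.size)

train_mse, test_mse = {}, {}
for degree in (1, 3, 9):
    # Polynomial features followed by ordinary least squares
    model = make_pipeline(PolynomialFeatures(degree, include_bias=False), LinearRegression())
    model.fit(x_train.reshape(-1, 1), y_train)
    train_mse[degree] = mean_squared_error(y_train, model.predict(x_train.reshape(-1, 1)))
    test_mse[degree] = mean_squared_error(y_test, model.predict(x_test.reshape(-1, 1)))
    print(f"degree={degree}: train MSE={train_mse[degree]:.4f}, test MSE={test_mse[degree]:.4f}")
```

Exact numbers depend on the random seed; the pattern to look for is very low training error paired with much higher test error.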

---

## What is Underfitting
Underfitting happens when a model is too simple and fails to learn patterns from data.

An underfitted model:
- Performs poorly on training data
- Performs poorly on test data

The model does not capture the relationship between input and output.
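A small sketch of underfitting, assuming scikit-learn and a hypothetical dataset where y = x² on a symmetric range. A straight line cannot represent a parabola, so even its training R² is essentially 0, while a degree-2 polynomial model fits the same data almost perfectly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical toy data following y = x^2 on a symmetric range
x = np.linspace(-3, 3, 21).reshape(-1, 1)
y = x[:, 0] ** 2

# A straight line is too simple for this relationship (underfitting)
linear = LinearRegression().fit(x, y)
r2_linear = linear.score(x, y)

# A degree-2 polynomial model has enough capacity for the same data
quadratic = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False), LinearRegression()
).fit(x, y)
r2_quadratic = quadratic.score(x, y)

print(f"linear    R^2 on training data: {r2_linear:.3f}")
print(f"quadratic R^2 on training data: {r2_quadratic:.3f}")
```

The linear model scores poorly even on the data it was trained on, which is the signature of underfitting described above.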

---

## What is Cross Validation
Cross Validation is a technique for evaluating model performance more reliably than a single train-test split.

Instead of splitting the data only once into training and testing sets,
cross validation divides the dataset into multiple parts and repeats training and testing several times, holding out a different part each time.

The most common method is K-Fold Cross Validation.
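To see why a single split can be misleading, this sketch (synthetic data, assumed for illustration) scores the same linear model on five different random train-test splits. The test R² changes from split to split even though the data and the model are identical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic noisy linear data (assumed for illustration)
rng = np.random.default_rng(7)
X = rng.uniform(0, 10, size=(40, 1))
y = 3 * X[:, 0] + 5 + rng.normal(0, 4, size=40)

split_scores = []
for seed in range(5):
    # Only the random_state of the split changes between iterations
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=seed)
    model = LinearRegression().fit(X_tr, y_tr)
    split_scores.append(model.score(X_te, y_te))  # R^2 on the held-out rows

print("test R^2 for 5 different splits:", np.round(split_scores, 3))
print("spread (max - min):", max(split_scores) - min(split_scores))
```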

---

## K-Fold Cross Validation
In K-Fold Cross Validation:
- The dataset is divided into K (roughly) equal parts called folds.
- The model is trained K times.
- Each time, one fold is used for testing and the rest for training.
- The final performance is the average of all results.

Because every sample is used for testing exactly once, this gives a more reliable estimate of model performance than a single split.
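The K-Fold steps can be sketched by hand with NumPy on synthetic data (an assumption for illustration) and then checked against scikit-learn's `cross_val_score`. With `KFold` and no shuffling, scikit-learn also uses contiguous folds, so both approaches should produce the same per-fold scores.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data (assumed for illustration)
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(50, 1))
y = 3 * X[:, 0] + 5 + rng.normal(0, 2, size=50)

k = 5
indices = np.arange(len(X))
folds = np.array_split(indices, k)  # K roughly equal, contiguous folds

scores = []
for i, test_idx in enumerate(folds):
    # All folds except the i-th are used for training
    train_idx = np.concatenate([fold for j, fold in enumerate(folds) if j != i])
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))  # R^2 on the held-out fold

print("manual fold scores:", np.round(scores, 3))
print("mean R^2:", np.mean(scores))

# Same procedure through scikit-learn
sk_scores = cross_val_score(LinearRegression(), X, y, cv=KFold(n_splits=k))
print("cross_val_score:   ", np.round(sk_scores, 3))
```

The reported performance is the mean of the K fold scores; the spread between folds is also worth looking at, since it shows how sensitive the model is to which data it sees.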

---

## Why Cross Validation is Important
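- Reduces the dependence on one particular (possibly lucky or unlucky) train-test split
- Provides a more stable performance measurement
- Helps detect overfitting, which shows up as a large gap between the training score and the cross-validated score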
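One way to use cross validation to spot overfitting is to compare a model's score on its own training data with its cross-validated score. In this sketch (synthetic data and model choices are assumptions), an unrestricted `DecisionTreeRegressor` memorizes the training set and reaches a training R² of 1.0, but scores lower under 5-fold cross validation; the linear model is expected to show a much smaller gap.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy linear data (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(60, 1))
y = 2 * X[:, 0] + rng.normal(0, 3, size=60)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in [("linear", LinearRegression()),
                    ("deep tree", DecisionTreeRegressor(random_state=0))]:
    train_r2 = model.fit(X, y).score(X, y)             # score on the data it was fitted on
    cv_r2 = cross_val_score(model, X, y, cv=cv).mean()  # cross_val_score refits clones per fold
    results[name] = (train_r2, cv_r2)
    print(f"{name:>9}: train R^2={train_r2:.3f}, CV R^2={cv_r2:.3f}, gap={train_r2 - cv_r2:.3f}")
```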

---

## Conclusion
Understanding overfitting and cross validation is essential for building robust Machine Learning models.
Proper validation gives a realistic estimate of how the model will perform on real-world data.


### 1. Importing Required Libraries

```python
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LinearRegression
```



### 2. Creating the Dataset

```python
# Toy dataset: Salary increases by 5000 for every extra year of Experience
data = {
    "Experience": [1, 2, 3, 4, 5, 6, 7, 8],
    "Salary": [30000, 35000, 40000, 45000, 50000, 55000, 60000, 65000]
}

df = pd.DataFrame(data)
```


### 3. Splitting Features and Target

```python
# Features must be 2-D (double brackets give a DataFrame); the target is 1-D (a Series)
X = df[["Experience"]]
y = df["Salary"]
```
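A possible next step for this notebook, sketched under the assumption that it goes on to use the already-imported `train_test_split` and `cross_val_score` (the split size, `random_state` and `cv=4` are illustrative choices, not taken from the original). It compares a single hold-out split with 4-fold cross validation on the same data.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Same toy dataset as above
df = pd.DataFrame({
    "Experience": [1, 2, 3, 4, 5, 6, 7, 8],
    "Salary": [30000, 35000, 40000, 45000, 50000, 55000, 60000, 65000],
})
X = df[["Experience"]]
y = df["Salary"]

# Single hold-out split: 6 rows for training, 2 for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
model = LinearRegression().fit(X_train, y_train)
holdout_r2 = model.score(X_test, y_test)

# 4-fold cross validation on the full dataset (2 rows per test fold)
cv_scores = cross_val_score(LinearRegression(), X, y, cv=4)

print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("hold-out R^2:", holdout_r2)
print("fold R^2 scores:", cv_scores)
print("mean CV R^2:", cv_scores.mean())
```

Because this toy data is exactly linear (Salary = 25000 + 5000 * Experience), every score comes out at essentially 1.0. On real, noisy data the fold scores would be lower and would vary, and with only 2 rows per test fold each individual R² would be quite unstable.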
