# **Bias vs Variance in Machine Learning** 🎯

Bias and variance are two sources of **error** in machine learning models. Understanding the trade-off between them is **key to building good models**.  

---

## **🔹 What is Bias?**
- **Bias is the error due to overly simplistic assumptions in the learning algorithm.**  
- High bias means the model is too **simple** and **does not capture** the underlying patterns in data.  
- A model with **high bias** is **underfitting** the data.

### **Example of High Bias**
- Predicting housing prices using **only the number of bedrooms** while ignoring other features (like location, size, etc.).
- A linear model trying to fit a **non-linear** dataset.

### **Effects of High Bias:**
❌ **Underfitting** – The model is too simple.  
❌ **Poor accuracy on both training & test data**.  
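
The "linear model on non-linear data" case can be checked numerically. This is a minimal sketch that assumes scikit-learn and a synthetic quadratic dataset. The data-generating function, the noise level of 0.3 and the split are illustrative choices, not a real dataset. Both training and test MSE stay far above the noise variance, which is the signature of underfitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic non-linear data (an assumption for illustration): y = x^2 + noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# A straight line cannot bend to follow a parabola, so it underfits
linear = LinearRegression().fit(X_train, y_train)
train_mse = mean_squared_error(y_train, linear.predict(X_train))
test_mse = mean_squared_error(y_test, linear.predict(X_test))

print(f"Linear model  train MSE: {train_mse:.2f}  test MSE: {test_mse:.2f}")
print(f"Noise variance (irreducible error): {0.3 ** 2:.2f}")
```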

---

## **🔹 What is Variance?**
- **Variance is the error due to sensitivity to small variations in the training data.**  
- High variance means the model is too **complex** and **captures noise** along with the actual pattern.  
- A model with **high variance** is **overfitting** the data.

### **Example of High Variance**
- A decision tree that **memorizes** every training sample but fails on new test data.
- A deep neural network trained on a small dataset.

### **Effects of High Variance:**
❌ **Overfitting** – The model performs well on training data but fails on unseen data.  
❌ **Low test accuracy but high training accuracy**.  
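
The "decision tree that memorizes every training sample" example can be sketched as follows. It assumes scikit-learn and a noisy synthetic `sin` dataset chosen only for illustration. With no depth limit the tree keeps splitting until each leaf holds a single training point. Training error drops to zero, but the test error does not.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Small, noisy synthetic dataset (an illustrative assumption)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(60, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.5, size=60)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# No max_depth: the tree grows until every leaf is pure, i.e. it memorizes the noise
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

train_mse = mean_squared_error(y_train, tree.predict(X_train))
test_mse = mean_squared_error(y_test, tree.predict(X_test))
print(f"Unpruned tree  train MSE: {train_mse:.3f}  test MSE: {test_mse:.3f}")
```

Setting `max_depth` or `min_samples_leaf` on `DecisionTreeRegressor` is one way to reduce this variance.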

---

## **🔹 Bias-Variance Tradeoff**
- **We need to balance bias and variance** to achieve good model performance.  
- **Goal:** Find a model that generalizes well to new, unseen data.  

| **Bias** (Underfitting) | **Variance** (Overfitting) |
|-----------------|------------------|
| Too simple | Too complex |
| Ignores patterns | Captures noise |
| Poor training & test performance | High training accuracy, low test accuracy |
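
For squared-error loss the trade-off has a standard exact form. Assume the data come from $y = f(x) + \varepsilon$, where the noise $\varepsilon$ has zero mean and variance $\sigma^2$ and is independent of the training set $D$. Write $\hat f$ for the model fitted on $D$. Then the expected error at a point $x$ splits into three parts:

$$
\mathbb{E}_{D,\varepsilon}\big[(y - \hat f(x))^2\big]
= \underbrace{\big(\mathbb{E}_D[\hat f(x)] - f(x)\big)^2}_{\text{Bias}^2}
+ \underbrace{\mathbb{E}_D\big[\big(\hat f(x) - \mathbb{E}_D[\hat f(x)]\big)^2\big]}_{\text{Variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

Making the model more flexible usually shrinks the bias term and grows the variance term. No model can remove the $\sigma^2$ term.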

---

## **🔹 Bias-Variance Tradeoff Visualization**
Imagine throwing darts at a dartboard 🎯:

1. **High Bias, Low Variance**  
   - Darts land far from the target **(systematic error)** but close to each other.
   - Model is **too simple** → Underfitting.

2. **Low Bias, High Variance**  
   - Darts are widely scattered **(large spread)**, even though on average they are centered on the target.
   - Model is **too complex** → Overfitting.

3. **Low Bias, Low Variance (Ideal)**  
   - Darts land near the target.
   - Model captures patterns **without overfitting**.
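
The dartboard picture can be simulated directly. Each "dart" is the prediction, at one fixed point, of a model trained on a fresh noisy training set. The spread of the darts estimates the variance, and the distance of their average from the target estimates the bias. This is a minimal sketch under stated assumptions. The true function `sin`, the 30 fixed input positions, the noise level 0.3, the point `x0 = 1.5` and the degrees 1 and 12 are all arbitrary illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

X = np.linspace(-3, 3, 30).reshape(-1, 1)  # fixed input positions
f_true = np.sin(X[:, 0])                    # assumed true function
x0 = np.array([[1.5]])                      # the point we "aim" at
target = np.sin(1.5)                        # bull's-eye: the true value at x0

def throw_darts(degree, n_datasets=200, noise=0.3):
    """Fit one model per noisy training set; each prediction at x0 is one dart."""
    darts = np.empty(n_datasets)
    for i in range(n_datasets):
        y = f_true + rng.normal(scale=noise, size=len(f_true))
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        darts[i] = model.fit(X, y).predict(x0)[0]
    return darts

results = {}
for degree in (1, 12):
    darts = throw_darts(degree)
    bias_sq = (darts.mean() - target) ** 2  # distance of the darts' center from the target
    variance = darts.var()                   # how widely the darts are spread
    results[degree] = (bias_sq, variance)
    print(f"degree {degree:2d}: bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")
```

The straight line should land its darts tightly but off target (high bias, low variance). The degree-12 polynomial should center them on the target with a wider spread (low bias, higher variance).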

---

## **🔹 Example with Python**
We compare **underfitting, overfitting, and the optimal model** using polynomial regression.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Generate synthetic data: quadratic relationship plus Gaussian noise
np.random.seed(42)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 3 * X.squeeze() ** 2 + 2 * X.squeeze() + np.random.randn(100) * 10

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Different polynomial degrees (underfitting, best fit, overfitting)
degrees = [1, 2, 10]

plt.figure(figsize=(15, 5))
for i, d in enumerate(degrees):
    model = make_pipeline(PolynomialFeatures(d), LinearRegression())
    model.fit(X_train, y_train)

    # Compare training and test error to diagnose under- or overfitting
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))

    plt.subplot(1, 3, i + 1)
    plt.scatter(X_train, y_train, color="gray", label="Training Data")
    plt.scatter(X_test, y_test, color="red", label="Test Data")
    plt.plot(X, model.predict(X), color="blue", label=f"Degree {d} Fit")
    plt.xlabel("X")
    plt.ylabel("y")
    plt.legend()
    plt.title(f"Degree {d}: train MSE {train_mse:.1f}, test MSE {test_mse:.1f}")

plt.tight_layout()
plt.show()
```

- **Degree 1 (High Bias) → Underfitting**  
- **Degree 2 (Balanced) → Best Fit**  
- **Degree 10 (High Variance) → Overfitting**  

---

## **🔹 How to Reduce Bias and Variance?**
| **Problem** | **Solution** |
|------------|-------------|
| High Bias (Underfitting) | Use more complex models (e.g., neural networks, deeper trees) |
| High Bias (Underfitting) | Add more relevant features |
| High Variance (Overfitting) | Use regularization (L1, L2) |
| High Variance (Overfitting) | Use cross-validation (K-fold) to detect overfitting and tune model complexity or regularization strength |
| High Variance (Overfitting) | Reduce model complexity |
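
Several of these remedies can be combined in one comparison. The sketch below uses K-fold cross-validation to estimate test error for polynomial models of different complexity, plus one L2-regularized (Ridge) variant of the most complex model. It assumes scikit-learn and a synthetic noisy `sin` dataset. The degrees, `alpha=1.0` and the number of folds are illustrative choices, not tuned values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic dataset (an illustrative assumption)
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=40)

cv = KFold(n_splits=5, shuffle=True, random_state=0)

candidates = {
    "degree 1": make_pipeline(PolynomialFeatures(1), LinearRegression()),
    "degree 5": make_pipeline(PolynomialFeatures(5), LinearRegression()),
    "degree 15": make_pipeline(PolynomialFeatures(15), LinearRegression()),
    # L2 regularization on standardized features shrinks the degree-15 coefficients
    "degree 15 + Ridge": make_pipeline(
        PolynomialFeatures(15), StandardScaler(), Ridge(alpha=1.0)
    ),
}

cv_mse = {}
for name, model in candidates.items():
    # scikit-learn scorers are "higher is better", so MSE is reported as a negative number
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    cv_mse[name] = -scores.mean()
    print(f"{name:18s} CV MSE = {cv_mse[name]:.3f}")

print("lowest cross-validated error:", min(cv_mse, key=cv_mse.get))
```

Cross-validation does not reduce variance by itself. It estimates test error without touching a held-out test set, and that estimate is what lets you pick the degree or the regularization strength.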

---

## **🔹 Summary**
| **Bias** (Underfitting) | **Variance** (Overfitting) |
|-----------------|------------------|
| Too simple model | Too complex model |
| Ignores important features | Captures noise in data |
| Poor accuracy on training & test | Good training accuracy, poor test accuracy |
| Example: Linear regression on non-linear data | Example: Deep tree memorizing data |

👉 **Find the right balance for best generalization!**  



![image.png](attachment:image.png)

In the image above, the blue dots are the training data and the orange dots are the test data.

![image-2.png](attachment:image-2.png)

Here are two models trained on different datasets. Their errors differ a lot because the model has high variance: its fit changes substantially depending on which training set it sees.
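
The same effect can be reproduced by training each model type twice, on two independent training sets drawn from the same process, and measuring how much the two fitted models disagree. This is a minimal sketch that assumes scikit-learn and a synthetic noisy `sin` dataset. A plain linear regression stands in for a low-variance model and an unpruned decision tree for a high-variance one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)

def make_dataset(n=200):
    """Draw a fresh training set from the same (assumed) process: sin(x) + noise."""
    X = rng.uniform(0, 10, size=(n, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.5, size=n)
    return X, y

X_a, y_a = make_dataset()
X_b, y_b = make_dataset()
X_grid = np.linspace(0, 10, 200).reshape(-1, 1)

models = {
    "linear regression (high bias)": LinearRegression,
    "unpruned tree (high variance)": DecisionTreeRegressor,
}

gaps = {}
for name, Model in models.items():
    pred_a = Model().fit(X_a, y_a).predict(X_grid)
    pred_b = Model().fit(X_b, y_b).predict(X_grid)
    # Average disagreement between the two fitted models over the input range
    gaps[name] = np.mean(np.abs(pred_a - pred_b))
    print(f"{name}: mean |model A - model B| = {gaps[name]:.3f}")
```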

![image-3.png](attachment:image-3.png)

![image-4.png](attachment:image-4.png)

# Bull's-Eye Diagram
![image-2.png](attachment:image-2.png)

Here, the smallest circle is the true value and the diamond dots are the predicted values. The diagram shows how accurate your model is:

- If the predictions are clustered close to each other, the model has **low variance**.
- If the predictions are close to the smallest circle, the model has **low bias**.
- If the predictions are far from the smallest circle, the model has **high bias**.

---

![image.png](attachment:image.png)
If the predictions are spread far apart from each other, the model has **high variance**.
