# Overfitting and Underfitting

## Linear Regression Example

![under_over](overfitting_underfitting_lin_reg_eg.png)

### Overfitting vs. Underfitting

- **Underfitting (High Bias)**  
  - The algorithm is unable to fit the training data well.  
  - A clear pattern exists in the training data, but the model cannot capture it.  
  - Example: Linear regression applied when the true relationship is polynomial.  
  - Cause: Too few features, or the model is too simple.  
  - Interpretation: The algorithm has a strong preconception (a **bias**) that the relationship is linear, even though the data suggests otherwise.

- **Overfitting (High Variance)**  
  - The algorithm fits the training data *too well*, possibly even reaching zero training cost.  
  - The curve is too complex, wiggling to pass through every point, including the noise.  
  - This results in poor generalization to new examples.  
  - Cause: Too many features, or an overly flexible model.  
  - Another term: **High Variance**  
    - The model adapts too strongly to training data variations.  
    - Small changes in the training data lead to very different final predictions.
    
- **Good Generalization (middle)**  
  - The learning algorithm balances bias and variance.  
  - It makes accurate predictions on both training and unseen data.  
  - Example: The middle graph shows a well-generalized model.
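
A minimal sketch of the three regimes, assuming synthetic data drawn from a quadratic trend plus Gaussian noise. The data, the helper names `make_data` and `mse`, and the degrees 1, 2 and 9 are illustrative choices, not taken from the figure. It fits polynomials with NumPy's `Polynomial.fit` to 10 training points and compares the error on those points with the error on 200 fresh points:

```python
import numpy as np
from numpy.polynomial import Polynomial

def make_data(n, rng):
    """Hypothetical data: quadratic trend plus Gaussian noise."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=n)
    return x, y

def mse(p, x, y):
    """Mean squared error of polynomial p on the points (x, y)."""
    return float(np.mean((p(x) - y) ** 2))

rng = np.random.default_rng(0)
x_train, y_train = make_data(10, rng)
x_test, y_test = make_data(200, rng)  # unseen data, same distribution

results = {}
for degree, label in [(1, "underfit / high bias"),
                      (2, "good fit"),
                      (9, "overfit / high variance")]:
    p = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(p, x_train, y_train), mse(p, x_test, y_test))
    train_err, test_err = results[degree]
    print(f"degree {degree} ({label}): train MSE={train_err:.4g}, test MSE={test_err:.4g}")
```

With 10 training points, the degree-9 polynomial has as many coefficients as there are points, so it interpolates them (training error near zero) while its error on new data is far larger than the degree-2 fit. The straight line has a high error on both sets.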

---

### Key Point  
The aim of Machine Learning is to find a model that:  
- **Does not underfit** (low bias)  
- **Does not overfit** (low variance)  
- **Generalizes well** to new data

---

## Logistic Regression / Classification Example

![log_under_over](overfitting_underfitting_logistic_reg_eg.png)


# Addressing Overfitting and Underfitting


## Addressing Overfitting

There are several strategies to reduce overfitting and improve generalisation:

1. **Collect More Training Data**  
   - More diverse and representative data helps the model learn general patterns.

---

2. **Use Fewer Features**

   - **Feature Selection**  
     - Select only the most relevant features (possibly using intuition or statistical methods).  
     - This is equivalent to setting the weights of the discarded features to zero.  
     - **Disadvantage:** You might discard features that are actually useful.

   - **Use Fewer Polynomial Features**  
     - Reducing the degree of polynomial features prevents the model from becoming too complex and wiggly.  
     - This lowers variance and improves generalisation.
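
A small sketch of both ideas, using hypothetical helper names. `polynomial_features` builds every monomial of the raw features up to a given total degree, which shows how quickly the feature count grows, and therefore how much lowering the degree removes. `select_top_k` is one simple statistical selection rule, assumed here for illustration: keep the k features with the largest absolute correlation with the target.

```python
from itertools import combinations_with_replacement

import numpy as np

def polynomial_features(X, degree):
    """Map X of shape (m, n) to all monomials of total degree 1..degree (no constant column)."""
    columns = []
    for d in range(1, degree + 1):
        # Each combo lists feature indices with repetition, e.g. (0, 0, 1) -> x1^2 * x2
        for combo in combinations_with_replacement(range(X.shape[1]), d):
            columns.append(np.prod(X[:, list(combo)], axis=1))
    return np.column_stack(columns)

def select_top_k(X, y, k):
    """Hypothetical rule: indices of the k features most correlated (in absolute value) with y."""
    scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    return sorted(np.argsort(scores)[-k:].tolist())

# Two raw features give 2, 5 and 27 polynomial features at degrees 1, 2 and 6
X2 = np.array([[1.0, 2.0], [3.0, 4.0]])
for degree in (1, 2, 6):
    print(f"degree {degree}: {polynomial_features(X2, degree).shape[1]} features")

# Synthetic data where only features 0 and 2 matter
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=200)
print(select_top_k(X, y, k=2))  # prints [0, 2]
```

A model trained only on the selected columns behaves like the full model with the weights of the other three features fixed at zero.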

---

3. **Regularisation**  

   - Instead of removing features entirely, **regularisation** reduces the impact of some features.  
   - It encourages the algorithm to **shrink parameter values** (the *weights* $w_{j}$) towards zero, without forcing them to be exactly zero as feature selection does.  
   - This means we **keep all features** but limit the effect of some.  
   - In practice, we usually regularise only the weights $w_{j}$, not the bias $b$; regularising $b$ typically makes little difference.


![regularisation](regularisation.png)
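
A minimal sketch of regularised linear regression, assuming the common L2 penalty $\frac{\lambda}{2m}\sum_{j=1}^{n} w_j^2$ added to the squared-error cost $\frac{1}{2m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^2$ and trained with batch gradient descent. The data, the learning rate `alpha`, `lam` (for $\lambda$), and the helper names are illustrative assumptions. The penalty and its gradient involve only `w`; `b` is updated without any penalty:

```python
import numpy as np

def regularised_cost(X, y, w, b, lam):
    """Squared-error cost plus an L2 penalty on w only (b is not regularised)."""
    m = X.shape[0]
    residuals = X @ w + b - y
    return float(residuals @ residuals / (2 * m) + lam / (2 * m) * (w @ w))

def fit(X, y, lam, alpha=0.1, iters=5000):
    """Batch gradient descent on regularised_cost."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(iters):
        err = X @ w + b - y
        grad_w = X.T @ err / m + (lam / m) * w  # penalty term pulls every w_j towards zero
        grad_b = err.mean()                     # no penalty term for b
        w -= alpha * grad_w
        b -= alpha * grad_b
    return w, b

# Hypothetical data: 8 features, only the first two actually matter
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=50)

w_plain, b_plain = fit(X, y, lam=0.0)
w_reg, b_reg = fit(X, y, lam=10.0)
print("||w|| without regularisation:", np.linalg.norm(w_plain))
print("||w|| with lam=10:           ", np.linalg.norm(w_reg))
```

With `lam=10.0` the learned weights have a smaller norm than with `lam=0.0`. All eight features are kept, but their weights are pulled towards zero instead of being removed.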