## Evaluate your model


It is difficult to visualise the fit of a model when there are more than two features.

Therefore, we use a train-test split to evaluate the performance of our model.

1. Fit the model on the training set.
2. Apply the fitted model to the test set and compute the sum of squared errors on it.

We might see that $J_{train}(\vec w, b)$ is much lower than $J_{test}(\vec w, b)$. This is a sign of overfitting: the model fits the training data well but fails to generalise.
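
The two steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the synthetic data, the 80/20 split, the helper names `fit_linear` and `squared_error_cost`, and the $\frac{1}{2m}$ scaling of the cost are all assumptions made for the example.

```python
import numpy as np

def squared_error_cost(X, y, w, b):
    """J(w, b) = 1/(2m) * sum((X @ w + b - y)^2); the 1/(2m) scaling is an assumption."""
    residuals = X @ w + b - y
    return float(np.sum(residuals ** 2) / (2 * len(y)))

def fit_linear(X, y):
    """Least-squares fit of y ~ X @ w + b (hypothetical helper)."""
    A = np.column_stack([X, np.ones(len(X))])      # extra column of ones for b
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta[:-1], float(theta[-1])

# Synthetic data with three features, so the fit cannot simply be plotted.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0 + rng.normal(scale=0.3, size=200)

# Shuffle once, then use 80% for training and 20% for testing.
idx = rng.permutation(len(X))
n_train = int(0.8 * len(X))
train_idx, test_idx = idx[:n_train], idx[n_train:]

# Step 1: fit on the training set only.
w, b = fit_linear(X[train_idx], y[train_idx])

# Step 2: compute the cost on both sets with the same fitted parameters.
j_train = squared_error_cost(X[train_idx], y[train_idx], w, b)
j_test = squared_error_cost(X[test_idx], y[test_idx], w, b)
print(f"J_train = {j_train:.4f}, J_test = {j_test:.4f}")
```

A large gap between the two printed values would be the warning sign described above; for a simple linear model on linear data the two are usually close.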


In a **classification** setting, we evaluate the model by counting the number of misclassified predictions.

$\text{count}( \hat y \neq y)$
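
As a small sketch of this count, the function names and the labels below are made up for illustration. Dividing the count by the number of examples gives an error rate that can be compared across sets of different sizes.

```python
def count_misclassified(y_true, y_pred):
    """Number of positions where the prediction differs from the label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(1 for t, p in zip(y_true, y_pred) if t != p)

def misclassification_rate(y_true, y_pred):
    """Misclassified count divided by the number of examples."""
    return count_misclassified(y_true, y_pred) / len(y_true)

# Hypothetical labels and predictions for eight examples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(count_misclassified(y_true, y_pred))     # 2
print(misclassification_rate(y_true, y_pred))  # 0.25
```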

## Model Selection

We might choose one of these models:

1. $f_{\vec w, b}(x) = W_1x + b$
2. $f_{\vec w, b}(x) = W_1x + W_2x^2 + b$
3. $f_{\vec w, b}(x) = W_1x + W_2x^2 + W_3x^3 + b$
4. ........

We can choose a linear model as in $(1)$, a quadratic model as in $(2)$, or a higher-order polynomial as in $(3)$, and we could keep going to even higher orders.


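**One way of choosing a model might be** to try each model (perhaps up to a $10^{th}$-degree polynomial) and see which gives the lowest cost on the test set.

The issue with this is similar to overfitting on the training set: the degree itself has now been fitted to the test set, so the test cost of the chosen model is likely to be lower than its cost on genuinely new data. This gives an overly optimistic view of how well the model generalises.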

**How to decide which degree of polynomial to use?**

> Instead of splitting the data into just *train and test sets*, we will also use a **validation set**.



# Overfitting and Higher-Order Polynomials

## **Model Complexity**

- A **linear model** (e.g., $f(x) = W_1x + b$) is simple and captures linear relationships. It may not capture the complexity of a dataset with nonlinear patterns.
- A **quadratic model** (e.g., $f(x) = W_1x + W_2x^2 + b$) or higher-order polynomials can capture more complex patterns.
- As the degree of the polynomial increases, the model becomes more flexible and can fit the training data more closely.

## **Overfitting**

- When we fit a very high-order polynomial (e.g., $10^{th}$-degree), the model might capture not only the true underlying pattern in the data but also the noise.
- This leads to **overfitting**, where the model performs exceptionally well on the training set but poorly on unseen data (test set). Overfitting gives a false sense of accuracy since the model essentially "memorizes" the training data.

## **Symptoms of Overfitting**

- The training error (or cost) is very low, but the test error is high.
- The model might display erratic or extreme fluctuations, especially in regions of the input space where there is little or no data.

## **Why Overfitting Happens in Higher-Order Polynomials**

When you use higher-order polynomials:

1. The model has more parameters ($W_1, W_2, \dots, W_k$), which increases its capacity to fit the data.
2. The polynomial terms (e.g., $x^3, x^4, \dots, x^{10}$) create complex curves that can pass through almost all points in the training set, regardless of whether those points represent meaningful trends or random noise.

While this reduces the cost on the training set, it does not generalize well to new data because the high-degree polynomial captures noise rather than the true signal.
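
A small sketch can make this concrete. Assuming a hypothetical data source (a straight line plus Gaussian noise) and only 8 training points, a $7^{th}$-degree polynomial has 8 coefficients and can pass through every training point, noise included. The names `cost` and `sample` and the $\frac{1}{2m}$ cost scaling are assumptions of this example.

```python
import numpy as np

def cost(coeffs, x, y):
    """Squared-error cost J = 1/(2m) * sum((f(x) - y)^2) for a polynomial f."""
    residuals = np.polyval(coeffs, x) - y
    return float(np.sum(residuals ** 2) / (2 * len(y)))

rng = np.random.default_rng(1)

def sample(x):
    """Hypothetical data source: a straight line plus Gaussian noise."""
    return 2.0 * x + rng.normal(scale=0.5, size=len(x))

# Eight training points, and 200 fresh points from the same source.
x_train = np.linspace(-1.0, 1.0, 8)
y_train = sample(x_train)
x_new = rng.uniform(-1.0, 1.0, size=200)
y_new = sample(x_new)

line = np.polyfit(x_train, y_train, deg=1)
wiggly = np.polyfit(x_train, y_train, deg=7)  # 8 coefficients for 8 points

for name, coeffs in [("degree 1", line), ("degree 7", wiggly)]:
    print(f"{name}: train cost = {cost(coeffs, x_train, y_train):.2e}, "
          f"new-data cost = {cost(coeffs, x_new, y_new):.2e}")
```

The degree-7 training cost is zero up to floating-point error because the curve interpolates the noisy points, yet that tells us nothing about how it behaves on the fresh points.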

## **How to Choose a Model Without Overfitting**

To mitigate overfitting and select the best model:

### **1. Use a Validation Set**

- Split the dataset into three parts: training, validation, and test sets.
- Train the model on the training set and evaluate its performance on the validation set.
- Choose the degree of the polynomial that minimizes the validation error, not just the training error.
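
The procedure above might look like the following sketch. The cubic synthetic data, the 60/20/20 split, and the names `cost`, `fits`, and `best_degree` are assumptions chosen for illustration; the test set is touched only once, after the degree has been picked.

```python
import numpy as np

def cost(coeffs, x, y):
    """Squared-error cost J = 1/(2m) * sum((f(x) - y)^2) for a polynomial f."""
    residuals = np.polyval(coeffs, x) - y
    return float(np.sum(residuals ** 2) / (2 * len(y)))

# Synthetic data: a cubic trend plus noise (an assumption for this example).
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=300)
y = x ** 3 - 2.0 * x + rng.normal(scale=0.5, size=300)

# Shuffle once and split 60% / 20% / 20%.
idx = rng.permutation(300)
train, val, test = idx[:180], idx[180:240], idx[240:]

fits, train_costs, val_costs = {}, {}, {}
for degree in range(1, 11):
    # Parameters are fitted on the training set only.
    fits[degree] = np.polyfit(x[train], y[train], deg=degree)
    train_costs[degree] = cost(fits[degree], x[train], y[train])
    val_costs[degree] = cost(fits[degree], x[val], y[val])

# The degree is chosen on the validation set, not the training or test set.
best_degree = min(val_costs, key=val_costs.get)

# The test set is used once, to report the chosen model's generalisation cost.
test_cost = cost(fits[best_degree], x[test], y[test])
print(f"chosen degree = {best_degree}, test cost = {test_cost:.4f}")
```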

### **2. Regularization**

- Apply techniques like $L_2$ regularization (Ridge Regression) or $L_1$ regularization (Lasso Regression) to penalize large coefficients of higher-order terms. This helps prevent overfitting by discouraging overly complex models.
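
As a hedged sketch of $L_2$ regularization, the code below fits an $8^{th}$-degree polynomial with the closed-form ridge solution. It assumes the objective $\sum_i (y_i - \vec w \cdot \vec x_i - b)^2 + \lambda \lVert \vec w \rVert^2$ with $b$ left unpenalized; the helper names and the data are hypothetical.

```python
import numpy as np

def polynomial_features(x, degree):
    """Columns x, x^2, ..., x^degree (no constant column; b is handled separately)."""
    return np.column_stack([x ** p for p in range(1, degree + 1)])

def ridge_fit(X, y, lam):
    """Closed-form ridge regression; centring the data keeps b out of the penalty."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, float(b)

# Small noisy dataset (an assumption), fitted with a flexible degree-8 model.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=30)
y = np.sin(3.0 * x) + rng.normal(scale=0.2, size=30)
X = polynomial_features(x, degree=8)

lambdas = [1e-4, 1e-2, 1.0, 100.0]
norms = []
for lam in lambdas:
    w, b = ridge_fit(X, y, lam)
    norms.append(float(np.linalg.norm(w)))
    print(f"lambda = {lam:g}: ||w|| = {norms[-1]:.3f}")
```

The coefficient norm shrinks as $\lambda$ grows, which is the sense in which the penalty discourages overly complex fits. The value of $\lambda$ is itself a hyperparameter and would be chosen on the validation set.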

### **3. Cross-Validation**

- Use k-fold cross-validation to get a more reliable estimate of model performance. This reduces the risk of choosing a model that happens to perform well on a specific validation split due to randomness.
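
A minimal sketch of k-fold cross-validation for choosing the polynomial degree follows. The function `k_fold_cost`, the choice of $k = 5$, and the quadratic synthetic data are assumptions for illustration.

```python
import numpy as np

def k_fold_cost(x, y, degree, k=5, seed=0):
    """Average held-out squared-error cost of a degree-`degree` polynomial over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    costs = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the i-th, then evaluate on the i-th.
        rest = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coeffs = np.polyfit(x[rest], y[rest], deg=degree)
        residuals = np.polyval(coeffs, x[held_out]) - y[held_out]
        costs.append(np.sum(residuals ** 2) / (2 * len(held_out)))
    return float(np.mean(costs))

# Synthetic data: a quadratic trend plus noise (an assumption for this example).
rng = np.random.default_rng(42)
x = rng.uniform(-3.0, 3.0, size=100)
y = 0.5 * x ** 2 + rng.normal(scale=0.5, size=100)

cv_costs = {degree: k_fold_cost(x, y, degree) for degree in range(1, 7)}
best_degree = min(cv_costs, key=cv_costs.get)
for degree, c in cv_costs.items():
    print(f"degree {degree}: mean held-out cost = {c:.4f}")
print(f"chosen degree = {best_degree}")
```

Because every point is held out exactly once, the averaged cost depends less on one lucky or unlucky split than a single validation set does.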

### **4. Bias-Variance Tradeoff**

Understand the tradeoff:

- **Low-degree polynomial**: High bias, low variance (underfitting).
- **High-degree polynomial**: Low bias, high variance (overfitting).

Aim for a model that balances bias and variance to achieve the lowest error on unseen data. Use the validation error to find that balance and keep the test set for the final estimate.

## **Practical Example**

Suppose we try polynomial models up to the $10^{th}$-degree:

1. For $1^{st}$-degree (linear): High training and validation errors (underfitting).
2. For $10^{th}$-degree: Very low training error but high validation error (overfitting).
3. For $3^{rd}$-degree: Training and validation errors are both reasonably low, indicating a good fit.

The $3^{rd}$-degree polynomial might be the best choice because it balances complexity and generalization.

---

By experimenting and evaluating models using these principles, you can choose a model that generalizes well without falling into the trap of overfitting.


# Train, Validation, and Test Sets in Machine Learning

In machine learning, splitting data into different sets ensures that the model is trained, tuned, and evaluated properly. The three main sets are: **Training Set**, **Validation Set**, and **Test Set**. Each serves a specific purpose:

---

## 1. Training Set
   **Purpose**:  
   - The training set is used to train the model. It teaches the model to learn patterns and relationships in the data.

   **Example**:  
   - Suppose you are building a model to predict house prices. The training set might include features such as house size, location, and number of bedrooms, along with the corresponding prices.

   **Key Point**:  
   - The model learns by minimizing error on this data using optimization algorithms like gradient descent.

---

## 2. Validation Set
   **Purpose**:  
   - The validation set is used to tune the model's hyperparameters and assess the model’s performance during training.

   **Example**:  
   - After training on the training set, you evaluate the model on the validation set for each candidate learning rate or regularization strength and keep the value that performs best.

   **Key Point**:  
   - It helps prevent overfitting by showing how well the model generalizes to unseen data during training. 
   - Hyperparameters such as the number of hidden layers or regularization strength are tuned using the validation set.

---

## 3. Test Set
   **Purpose**:  
   - The test set is used to evaluate the model's final performance after all training and tuning are complete.

   **Example**:  
   - Once you have finalized the model and hyperparameters, you assess the model's performance on the test set, which it has never seen before.

   **Key Point**:  
   - It provides an unbiased estimate of the model's ability to generalize to new, unseen data.

---

## Analogy: Preparing for an Exam

- **Training Set**: The notes and exercises you use to study and practice.
- **Validation Set**: Mock exams you take to test your preparation and adjust your study strategy.
- **Test Set**: The actual exam where your final understanding is assessed.

---

## Example Workflow

Suppose you have 10,000 data points:

1. **Training set**: 70% (7,000 points) used to train the model.
2. **Validation set**: 20% (2,000 points) used to tune hyperparameters.
3. **Test set**: 10% (1,000 points) used to assess the model's final performance.
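
The 70/20/10 split above can be sketched with shuffled indices. The function name `split_indices` and the fixed seed are assumptions of this example.

```python
import numpy as np

def split_indices(n, train_frac=0.7, val_frac=0.2, seed=0):
    """Shuffle the indices 0..n-1 and cut them into train, validation, and test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(round(train_frac * n))
    n_val = int(round(val_frac * n))
    # Whatever remains after the train and validation parts becomes the test set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(10_000)
print(len(train_idx), len(val_idx), len(test_idx))  # 7000 2000 1000
```

Shuffling before cutting matters when the data is stored in some order (for example by date or by label), since otherwise the three sets could come from different distributions.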

---

## Why All Three Are Necessary

1. **Without a Test Set**: You cannot objectively measure the performance of the model.
2. **Without a Validation Set**: Hyperparameters end up being tuned on the training or test set, so the model may overfit and the test error is no longer an unbiased estimate.
3. **Without a Training Set**: The model cannot learn from the data in the first place.

This division ensures the model generalizes well to unseen data and avoids common issues like overfitting or underfitting.