# Key Terminologies in Model Training

When training a model in practice (e.g., with Python and scikit-learn), you will encounter the following terms:

---

## 1. Dataset Splitting

### 1.1 Training and Test Data

- **Training data**: Used to train the model.  
- **Test data**: Used to evaluate the model on unseen data.  

**Example:**  
- Total dataset: 1000 records  
- Split ratio: 70% training, 30% testing  
- Training set: 700 records  
- Test set: 300 records  

**Purpose:**  
- Ensures the model **generalizes well** to new data.  
- The model should **never see the test data during training**.
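A minimal sketch of the 70/30 split above using scikit-learn's `train_test_split`. It assumes a synthetic dataset of 1000 records; the feature values and `random_state=42` are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic dataset (assumption): 1000 records, 3 features, one numeric target
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=1000)

# 70% training, 30% testing; random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(len(X_train), len(X_test))  # 700 300
```

The test arrays are set aside here and should only be used for the final evaluation.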

---

### 1.2 Train and Validation Split (Within Training Set)

- Further split the training data into:
  1. **Train subset** – used to fit the model.  
  2. **Validation subset** – used for **hyperparameter tuning**.  

- **Goal:** Improve model performance before final evaluation on the test set.
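One way to sketch this workflow with scikit-learn, assuming synthetic data and using `Ridge` regression with its regularization strength `alpha` as the hyperparameter being tuned. The 80/20 inner split and the candidate `alpha` values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic dataset (assumption): 1000 records, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=1000)

# Hold out the test set first (30%), then carve a validation subset (20%) from the rest
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.2, random_state=42
)

# Pick the alpha with the best validation R^2 (Ridge.score returns R^2)
best_alpha, best_score = None, -np.inf
for alpha in [0.01, 0.1, 1.0, 10.0]:
    score = Ridge(alpha=alpha).fit(X_train, y_train).score(X_val, y_val)
    if score > best_score:
        best_alpha, best_score = alpha, score

# Refit on the whole training set with the chosen alpha; use the test set only once
final_model = Ridge(alpha=best_alpha).fit(X_train_full, y_train_full)
test_score = final_model.score(X_test, y_test)
print(best_alpha, round(test_score, 3))
```

This yields 560 train, 140 validation and 300 test records. Refitting on the full training set after tuning is a common choice, not the only one.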

---

## 2. Overfitting vs. Underfitting

### 2.1 Generalized Model

- High accuracy on **both training and test sets**.  
- Example: Training accuracy = 90%, Test accuracy = 85%  
- **Bias**: Low  
- **Variance**: Low  
- **Ideal scenario**

---

### 2.2 Overfitting

- Model performs **very well on training data** but **poorly on test data**.  
- Example: Training accuracy = 90%, Test accuracy = 50%  
- **Reason:** The model memorizes the training data (including its noise) and fails to generalize.  
- **Bias**: Low  
- **Variance**: High  

---

### 2.3 Underfitting

- Model performs **poorly on both training and test data**.  
- Example: Training accuracy = 50%, Test accuracy = 45%  
- **Reason:** The model is too simple to capture the underlying pattern, or it is not trained enough.  
- **Bias**: High  
- **Variance**: Low  

---

## 3. Bias and Variance

- **Bias**: Error due to overly simplistic assumptions in the model.  
  - High bias → model underfits.  
- **Variance**: Error due to model being too sensitive to training data.  
  - High variance → model overfits.  

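**Bias-Variance Tradeoff:**  
- Reducing bias (e.g., with a more complex model) tends to increase variance, and vice versa.  
- The goal is to find the balance that **minimizes the total error** and yields a generalized model.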
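The tradeoff can be made concrete with a small simulation. This is a sketch under stated assumptions: the true relationship is taken to be $\sin(2\pi x)$ plus Gaussian noise, the training inputs are fixed, and only the noise is redrawn across 200 training sets. Polynomial degree stands in for model complexity. Squared bias measures how far the *average* prediction is from the truth; variance measures how much predictions change from one training set to the next.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_f(x):
    # Assumed "true" relationship for this simulation
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0, 1, 30)   # fixed training inputs
x_eval = np.linspace(0, 1, 100)   # points where bias and variance are measured
noise_sd, n_trials = 0.3, 200

def bias_variance(degree):
    """Refit a polynomial of the given degree on many noisy training sets."""
    preds = np.empty((n_trials, x_eval.size))
    for t in range(n_trials):
        y_train = true_f(x_train) + rng.normal(scale=noise_sd, size=x_train.size)
        preds[t] = Polynomial.fit(x_train, y_train, degree)(x_eval)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - true_f(x_eval)) ** 2)  # squared bias, averaged over x
    variance = np.mean(preds.var(axis=0))                  # spread across training sets
    return bias_sq, variance

for degree in (1, 3, 9):
    b, v = bias_variance(degree)
    print(f"degree={degree}: bias^2={b:.4f}, variance={v:.4f}")
```

With these settings the degree-1 fit should show the largest squared bias and the degree-9 fit the largest variance; the exact numbers depend on the random seed.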

---

## 4. Visualization (Intuition)

Picture the same set of training data points fitted by three different models:

1. **Generalized model**: the best-fit line passes near the points and follows their overall trend → low bias, low variance.
2. **Overfitting**: the best-fit curve twists to pass through every training point, noise included → low bias, high variance.
3. **Underfitting**: the best-fit line is too simple to follow the trend in the points → high bias, low variance.

---

## 5. Practical Notes

- **Validation set**: Used to tune hyperparameters (e.g., learning rate, regularization).  
- **Test set**: Used only at the end to check **generalization performance**.  
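- **Evaluation metrics**: For regression models, compare R² and Adjusted R² between the training and test sets. A large gap (high on training, low on test) suggests overfitting; low values on both suggest underfitting.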
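A minimal NumPy sketch of the two metrics; the function names `r2` and `adjusted_r2` are our own, not a library API (scikit-learn's `sklearn.metrics.r2_score` computes the same R²). R² is $1 - SS_{res}/SS_{tot}$, and Adjusted R² penalizes the number of predictors $p$ given $n$ samples: $1 - (1 - R^2)\frac{n-1}{n-p-1}$.

```python
import numpy as np

def r2(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (coefficient of determination)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

def adjusted_r2(r2_value, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - p - 1), with p = number of predictors."""
    return 1.0 - (1.0 - r2_value) * (n_samples - 1) / (n_samples - n_features - 1)

# Small illustrative example (made-up values)
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
score = r2(y_true, y_pred)
print(round(score, 4))                                          # 0.9486
print(round(adjusted_r2(score, n_samples=4, n_features=1), 4))  # 0.9229
```

In practice you would compute these on the training set and on the test set and compare the two values.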

---

## Summary Table

| Scenario          | Train Accuracy | Test Accuracy | Bias | Variance |
|-------------------|----------------|---------------|------|----------|
| Generalized Model | High           | High          | Low  | Low      |
| Overfitting       | High           | Low           | Low  | High     |
| Underfitting      | Low            | Low           | High | Low      |
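The pattern in the table can be reproduced with a small NumPy experiment. This is a sketch under assumptions: synthetic sine-shaped data, R² as the score (rather than accuracy, since this is regression), and polynomial degree as the knob for model complexity. Degree 1 should underfit, degree 3 should generalize reasonably, and degree 15 fitted to only 20 points should overfit; exact numbers depend on the random seed.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

def make_data(n):
    # Assumed ground truth: a sine curve plus Gaussian noise
    x = rng.uniform(0, 1, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

def r2(y_true, y_pred):
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

results = {}
for degree in (1, 3, 15):
    model = Polynomial.fit(x_train, y_train, degree)  # least-squares polynomial fit
    results[degree] = (r2(y_train, model(x_train)), r2(y_test, model(x_test)))
    train_r2, test_r2 = results[degree]
    print(f"degree={degree:2d}  train R^2={train_r2:.3f}  test R^2={test_r2:.3f}")
```

Training R² can only rise as the degree grows, because each higher-degree model contains the lower-degree ones; the test R² is what reveals overfitting.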


# Linear Regression using Ordinary Least Squares (OLS)

We want to fit a **simple linear regression model**:

$$
\hat{y}_i = \beta_0 + \beta_1 x_i
$$

where:  
- $\hat{y}_i$ = predicted value of $y_i$  
- $x_i$ = independent variable  
- $\beta_0$ = intercept  
- $\beta_1$ = slope (coefficient)  

---

## Step 1: Define the OLS objective

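OLS chooses $\beta_0$ and $\beta_1$ to **minimize the squared errors** between the actual and predicted values. Below, the objective is written as the mean squared error, i.e. the sum of squared errors (SSE) divided by $n$; the constant factor $\frac{1}{n}$ does not change where the minimum lies: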

$$
S(\beta_0, \beta_1) = \frac{1}{n} \sum_{i=1}^n \left(y_i - (\beta_0 + \beta_1 x_i)\right)^2
$$
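The objective translates directly to NumPy. A minimal sketch; the function name `ols_objective` and the five-point dataset are illustrative assumptions.

```python
import numpy as np

def ols_objective(beta0, beta1, x, y):
    """S(beta0, beta1): mean of the squared residuals y_i - (beta0 + beta1 * x_i)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - (beta0 + beta1 * x)
    return np.mean(residuals ** 2)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(ols_objective(2.2, 0.6, x, y))  # ≈ 0.48 (the OLS line for this data)
print(ols_objective(0.0, 1.0, x, y))  # 1.8 (an arbitrary, worse line)
```

Any other choice of $(\beta_0, \beta_1)$ gives a value of $S$ at least as large as the OLS line's.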

---

## Step 2: Take partial derivatives

To minimize $S(\beta_0, \beta_1)$, take the partial derivatives with respect to $\beta_0$ and $\beta_1$ and set both equal to zero.

### Derivative w.r.t. $\beta_0$:

$$
\frac{\partial S}{\partial \beta_0} = -\frac{2}{n} \sum_{i=1}^n \left( y_i - \beta_0 - \beta_1 x_i \right) = 0
$$

Simplify:

$$
\sum_{i=1}^n \left( y_i - \beta_0 - \beta_1 x_i \right) = 0
$$

$$
n\beta_0 + \beta_1 \sum_{i=1}^n x_i = \sum_{i=1}^n y_i
$$

Dividing both sides by $n$ and solving for $\beta_0$:

$$
\beta_0 = \bar{y} - \beta_1 \bar{x} \quad \text{where} \quad \bar{y} = \frac{1}{n}\sum y_i, \quad \bar{x} = \frac{1}{n}\sum x_i
$$

---

### Derivative w.r.t. $\beta_1$:

$$
\frac{\partial S}{\partial \beta_1} = -\frac{2}{n} \sum_{i=1}^n x_i \left( y_i - \beta_0 - \beta_1 x_i \right) = 0
$$

Simplify:

$$
\sum_{i=1}^n x_i (y_i - \beta_0 - \beta_1 x_i) = 0
$$

$$
\sum_{i=1}^n x_i y_i - \beta_0 \sum_{i=1}^n x_i - \beta_1 \sum_{i=1}^n x_i^2 = 0
$$

Substitute $\beta_0 = \bar{y} - \beta_1 \bar{x}$:

$$
\sum_{i=1}^n x_i y_i - (\bar{y} - \beta_1 \bar{x}) \sum_{i=1}^n x_i - \beta_1 \sum_{i=1}^n x_i^2 = 0
$$

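Expand and use $\sum_{i=1}^n x_i = n\bar{x}$:

$$
\sum_{i=1}^n x_i y_i - n\bar{x}\bar{y} = \beta_1 \left( \sum_{i=1}^n x_i^2 - n\bar{x}^2 \right)
$$

Since $\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^n x_i y_i - n\bar{x}\bar{y}$ and $\sum_{i=1}^n (x_i - \bar{x})^2 = \sum_{i=1}^n x_i^2 - n\bar{x}^2$, this is equivalent to:

$$
\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = \beta_1 \sum_{i=1}^n (x_i - \bar{x})^2
$$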

---

## Step 3: Solve for $\beta_1$ and $\beta_0$

$$
\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}
$$

$$
\beta_0 = \bar{y} - \beta_1 \bar{x}
$$
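A minimal NumPy sketch of Step 3; the function name `ols_fit` and the synthetic data are illustrative assumptions. It also checks the two conditions set to zero in Step 2 and compares the result with `np.polyfit`, which solves the same least-squares problem. Note that $\beta_1$ is undefined if every $x_i$ is identical, because the denominator is then zero.

```python
import numpy as np

def ols_fit(x, y):
    """Closed-form simple linear regression. Returns (beta0, beta1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    sxx = np.sum((x - x_bar) ** 2)
    if sxx == 0:
        raise ValueError("beta1 is undefined when all x values are equal")
    beta1 = np.sum((x - x_bar) * (y - y_bar)) / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Synthetic data (assumption): true intercept 1.5, true slope 0.8, unit noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 1.5 + 0.8 * x + rng.normal(scale=1.0, size=50)

beta0, beta1 = ols_fit(x, y)

# Step 2 conditions: residuals sum to zero, and sum(x_i * residual_i) is zero
residuals = y - (beta0 + beta1 * x)
print(np.isclose(residuals.sum(), 0), np.isclose(np.sum(x * residuals), 0))

# Cross-check with NumPy's least-squares polynomial fit (highest degree first)
slope, intercept = np.polyfit(x, y, deg=1)
print(np.allclose([beta0, beta1], [intercept, slope]))
```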

---

## Step 4: Summary

1. Compute the mean of $x$ and $y$: $\bar{x}, \bar{y}$  
2. Compute the slope (coefficient):

$$
\beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
$$

3. Compute the intercept:

$$
\beta_0 = \bar{y} - \beta_1 \bar{x}
$$

4. The predicted line:

$$
\hat{y} = \beta_0 + \beta_1 x
$$
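As a worked example, take the small illustrative dataset $x = (1, 2, 3, 4, 5)$, $y = (2, 4, 5, 4, 5)$ and follow the steps above:

$$
\bar{x} = \frac{1+2+3+4+5}{5} = 3, \qquad \bar{y} = \frac{2+4+5+4+5}{5} = 4
$$

$$
\beta_1 = \frac{(-2)(-2) + (-1)(0) + (0)(1) + (1)(0) + (2)(1)}{(-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2} = \frac{6}{10} = 0.6
$$

$$
\beta_0 = 4 - 0.6 \times 3 = 2.2, \qquad \hat{y} = 2.2 + 0.6x
$$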

---

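✅ This closed-form solution can be implemented directly in code (e.g., with **NumPy**) and agrees, up to floating-point error, with the coefficients from scikit-learn's `LinearRegression`. Gradient descent on the same objective converges to approximately the same values, provided the learning rate is small enough and it runs for enough iterations.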
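To see the relationship with gradient descent, here is a sketch that runs plain gradient descent on the same objective, using the partial derivatives from Step 2, and compares it with the closed form. The dataset, learning rate, iteration count, and zero starting point are assumptions chosen for this small example, not general recommendations.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)
n = len(x)

# Closed-form OLS solution
beta1_cf = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_cf = y.mean() - beta1_cf * x.mean()

# Gradient descent on S(beta0, beta1) using the Step 2 partial derivatives
beta0, beta1 = 0.0, 0.0
learning_rate = 0.05  # assumed; too large a value makes the updates diverge
for _ in range(5000):
    residuals = y - (beta0 + beta1 * x)
    grad0 = -2.0 / n * np.sum(residuals)
    grad1 = -2.0 / n * np.sum(x * residuals)
    beta0 -= learning_rate * grad0
    beta1 -= learning_rate * grad1

print(beta0_cf, beta1_cf)  # ≈ 2.2, 0.6
print(beta0, beta1)        # should agree with the closed form to many decimal places
```

The closed form needs no tuning; gradient descent becomes attractive mainly when the model or dataset is too large for a direct solution.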
