

**Error and Its Types**

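For observation $i$, the error (also called the residual) is the difference between the actual value $y_i$ and the predicted value $\hat{y}_i$:

$$
\text{ERROR}_i = y_i - \hat{y}_i
$$

Several metrics aggregate these errors over all $n$ observations: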

---

**1. Sum of Squared Error (SSE)**

$$
\text{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

---

**2. Mean Squared Error (MSE)**

$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

---

**3. Mean Absolute Error (MAE)**

$$
\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
$$

---

**4. Root Mean Squared Error (RMSE)**

$$
\text{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 } = \sqrt{\text{MSE}}
$$
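The four metrics above can be sketched in a few lines of NumPy. The arrays `y_true` and `y_pred` are made-up toy values chosen only so the arithmetic is easy to check by hand; they are not data from this notebook.

```python
import numpy as np

# Hypothetical toy values, for illustration only.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 11.0])

residuals = y_true - y_pred            # [1, 0, -1, -2]

sse = np.sum(residuals ** 2)           # 1 + 0 + 1 + 4 = 6.0
mse = np.mean(residuals ** 2)          # 6 / 4 = 1.5
mae = np.mean(np.abs(residuals))       # (1 + 0 + 1 + 2) / 4 = 1.0
rmse = np.sqrt(mse)                    # sqrt(1.5), about 1.2247

print(f"SSE={sse}, MSE={mse}, MAE={mae}, RMSE={rmse:.4f}")
```

Note that RMSE and MAE are in the same units as $y$, while SSE and MSE are in squared units.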

### Method 1: Ordinary Least Squares Method (OLS)

**OLS: The Quickest Way to Coefficients**

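We first use the **Ordinary Least Squares (OLS)** method to calculate $\beta_0$ and $\beta_1$ for the line $\hat{y} = \beta_0 + \beta_1 x$. OLS chooses the coefficients that minimize the sum of squared errors, and for a single predictor they have closed-form formulas: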

$$
\beta_1 = \frac{ \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) }{ \sum_{i=1}^{n} (x_i - \bar{x})^2 }
$$

$$
\beta_0 = \bar{y} - \beta_1 \cdot \bar{x}
$$

Where:

- $x_i$: Experience  
- $y_i$: Bonus  
- $\bar{x}$: Average experience  
- $\bar{y}$: Average bonus  
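As a minimal sketch, the two OLS formulas translate directly into NumPy. The function name `ols_fit` and the `experience` / `bonus` arrays are assumptions made for illustration; the toy data lies exactly on the line $y = 1 + 2x$ so the result can be verified by hand.

```python
import numpy as np

def ols_fit(x, y):
    """Return (beta_0, beta_1) for simple linear regression via the closed-form OLS formulas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: sum of co-deviations divided by sum of squared deviations of x.
    beta_1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: forces the fitted line through the point (x_bar, y_bar).
    beta_0 = y_bar - beta_1 * x_bar
    return beta_0, beta_1

# Hypothetical toy data (not from the notebook): bonus = 1 + 2 * experience.
experience = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
bonus = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

beta_0, beta_1 = ols_fit(experience, bonus)
print(beta_0, beta_1)
```

Here $\bar{x} = 3$ and $\bar{y} = 7$, so the slope is $20 / 10 = 2$ and the intercept is $7 - 2 \cdot 3 = 1$.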

OLS is like taking a direct route: a single closed-form calculation gives the coefficients.  
Let’s switch gears and explore **Gradient Descent**, where we start from a guess and **iteratively refine** $\beta_0$ and $\beta_1$.


### Method 2: Gradient Descent
![gradientDescent.png](attachment:23259d79-ab80-48de-89d2-73bacff5176a.png)


#### **Step 1: Start with Initial Guesses**

We start by guessing the coefficients:

$$
\beta_0 = 0,\quad \beta_1 = 0
$$

With both coefficients at zero, the model predicts a bonus of zero for every employee.

---

#### **Step 2: Define the Loss Function**

The loss function tells us how far off our predictions are.  
We use the **Mean Squared Error (MSE)**:

$$
J(\beta_0, \beta_1) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

Where:

- $y_i$: Actual bonus  
- $\hat{y}_i = \beta_0 + \beta_1 \cdot x_i$: Predicted bonus  
- $n$: Number of employees
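The loss function can be sketched as a small Python function of the two coefficients. The name `mse_loss` and the toy arrays are hypothetical, chosen so that the loss at the initial guess $(0, 0)$ and at the exact fit $(1, 2)$ is easy to compute by hand.

```python
import numpy as np

def mse_loss(beta_0, beta_1, x, y):
    """Mean squared error J(beta_0, beta_1) of the line beta_0 + beta_1 * x."""
    y_hat = beta_0 + beta_1 * x
    return np.mean((y - y_hat) ** 2)

# Hypothetical toy data (not from the notebook): y = 1 + 2 * x.
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])

loss_at_zero = mse_loss(0.0, 0.0, x, y)  # (9 + 25 + 49) / 3 = 83 / 3
loss_at_fit = mse_loss(1.0, 2.0, x, y)   # every prediction is exact, so 0.0
print(loss_at_zero, loss_at_fit)
```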

---

#### **Step 3: Compute the Gradients**

To minimize the loss, we calculate the gradients: the partial derivatives of $J$ with respect to $\beta_0$ and $\beta_1$, which tell us how the loss changes as each coefficient changes:

$$
\frac{\partial J}{\partial \beta_0} = -\frac{2}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)
$$

$$
\frac{\partial J}{\partial \beta_1} = -\frac{2}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i) \cdot x_i
$$

The gradients act like a compass: they point in the direction of steepest increase of the loss, so stepping in the opposite direction takes us down the hill.
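A minimal sketch of the two gradient formulas, assuming the same linear model $\hat{y}_i = \beta_0 + \beta_1 x_i$. The function name `mse_gradients` and the toy data are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def mse_gradients(beta_0, beta_1, x, y):
    """Partial derivatives of the MSE loss with respect to beta_0 and beta_1."""
    residuals = y - (beta_0 + beta_1 * x)
    grad_b0 = -2.0 * np.mean(residuals)        # -(2/n) * sum(y_i - y_hat_i)
    grad_b1 = -2.0 * np.mean(residuals * x)    # -(2/n) * sum((y_i - y_hat_i) * x_i)
    return grad_b0, grad_b1

# Hypothetical toy data (not from the notebook): y = 1 + 2 * x.
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])

g0, g1 = mse_gradients(0.0, 0.0, x, y)         # gradients at the initial guess
g0_fit, g1_fit = mse_gradients(1.0, 2.0, x, y) # gradients at the exact fit
print(g0, g1, g0_fit, g1_fit)
```

At $(0, 0)$ the residuals equal $y$ itself, giving gradients of $-10$ and $-68/3$. Both are negative, so increasing either coefficient lowers the loss. At the exact fit both gradients are zero, which is the condition for a minimum.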

---

#### **Step 4: Update the Coefficients**

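Using the gradients, we adjust $\beta_0$ and $\beta_1$ by taking a small step against the gradient. Both coefficients are updated with the same rule ($j = 0, 1$), and Steps 3 and 4 are repeated until the loss stops decreasing:

$$
\beta_j \leftarrow \beta_j - \alpha \cdot \frac{\partial J}{\partial \beta_j}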
$$

Where:

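- $\alpha$: The **learning rate**, controlling the size of each step. If it is too large, the updates can overshoot and diverge; if it is too small, convergence is slow.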
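Putting Steps 1 through 4 together gives the following sketch of a gradient descent loop. The function name `gradient_descent`, the defaults `alpha=0.05` and `n_iters=5000`, and the toy data are all assumptions chosen for illustration; a suitable learning rate depends on the scale of the actual data.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.05, n_iters=5000):
    """Fit y ~ beta_0 + beta_1 * x by minimizing MSE with batch gradient descent."""
    beta_0, beta_1 = 0.0, 0.0                      # Step 1: initial guesses
    for _ in range(n_iters):
        residuals = y - (beta_0 + beta_1 * x)      # errors under the current coefficients
        grad_b0 = -2.0 * np.mean(residuals)        # Step 3: gradient w.r.t. beta_0
        grad_b1 = -2.0 * np.mean(residuals * x)    #         gradient w.r.t. beta_1
        beta_0 -= alpha * grad_b0                  # Step 4: step against the gradient
        beta_1 -= alpha * grad_b1
    return beta_0, beta_1

# Hypothetical toy data (not from the notebook): bonus = 1 + 2 * experience.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

b0, b1 = gradient_descent(x, y)
print(b0, b1)  # should approach the OLS solution (1, 2) for this data
```

Both gradients are computed from the same residuals before either coefficient is changed, so the two updates are simultaneous. A fixed iteration count keeps the sketch short; checking whether the loss has stopped decreasing is a common alternative stopping rule.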
