### **Deriving the gradient descent updates for linear regression with a squared-error cost function**

### 1. Linear Regression Model Equation

The linear regression model is given by:
$$ \hat{y}^{(i)} = w \cdot x^{(i)} + b $$
where:

- $ \hat{y}^{(i)} $ is the predicted output for the $ i $-th training example.
- $ x^{(i)} $ is the input feature for the $ i $-th training example.
- $ w $ is the weight (or slope).
- $ b $ is the bias (or intercept).
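For illustration, the model equation can be written as a short Python function. This is a minimal sketch that assumes a single input feature stored in a plain Python list. The function name `predict` is hypothetical and does not come from any library.

```python
def predict(x, w, b):
    """Return y_hat = w * x + b for every input in x (single-feature model)."""
    return [w * xi + b for xi in x]


# Hypothetical parameters: w = 2 (slope), b = 1 (intercept)
print(predict([0.0, 1.0, 2.0], w=2.0, b=1.0))  # [1.0, 3.0, 5.0]
```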

### 2. Cost Function

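The cost function, half the mean squared error, measures how well the model fits the data. The factor $ \frac{1}{2} $ is a convention that cancels the 2 produced when the square is differentiated:
$$ J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})^2 $$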
where:

- $ m $ is the number of training examples.
- $ y^{(i)} $ is the actual output for the $ i $-th training example.
- $ \hat{y}^{(i)} = w \cdot x^{(i)} + b $ is the predicted output for the $ i $-th training example.
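The cost can be computed directly from its definition. The sketch below assumes plain Python lists and a small hypothetical dataset whose points lie exactly on $ y = 2x $; `compute_cost` is an illustrative name, not a library function.

```python
def compute_cost(x, y, w, b):
    """J(w, b) = (1 / (2m)) * sum((w * x_i + b - y_i) ** 2)."""
    m = len(x)
    total = 0.0
    for xi, yi in zip(x, y):
        error = w * xi + b - yi  # y_hat_i - y_i
        total += error ** 2
    return total / (2 * m)


# Hypothetical toy data lying exactly on y = 2x
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
print(compute_cost(x, y, w=1.0, b=0.0))  # 14 / 6, about 2.333
print(compute_cost(x, y, w=2.0, b=0.0))  # 0.0, a perfect fit
```

With $ w = 1 $ the errors are $ -1, -2, -3 $, so the squared errors sum to 14 and $ J = 14 / (2 \cdot 3) $.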

### 3. Calculation of Gradient Descent Equation

#### Step 1: Compute the Partial Derivatives

We need to compute the partial derivatives of the cost function with respect to $ w $ and $ b $.

**Partial derivative with respect to $ w $:**
$$ \frac{\partial J}{\partial w} = \frac{\partial}{\partial w} \left[ \frac{1}{2m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})^2 \right] $$

Using the chain rule, the factor 2 from differentiating the square cancels the 2 in $ \frac{1}{2m} $:
$$ \frac{\partial J}{\partial w} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)}) \cdot \frac{\partial \hat{y}^{(i)}}{\partial w} $$
Since $ \hat{y}^{(i)} = w \cdot x^{(i)} + b $, we have:
$$ \frac{\partial \hat{y}^{(i)}}{\partial w} = x^{(i)} $$
Thus:
$$ \frac{\partial J}{\partial w} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)}) \cdot x^{(i)} $$
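As a hypothetical worked example, take $ m = 3 $ points $ x = (1, 2, 3) $ and $ y = (2, 4, 6) $, which lie exactly on $ y = 2x $, and evaluate the gradient at $ w = 1 $, $ b = 0 $. The predictions are $ \hat{y} = (1, 2, 3) $, so:
$$ \frac{\partial J}{\partial w} = \frac{1}{3} \left[ (1 - 2) \cdot 1 + (2 - 4) \cdot 2 + (3 - 6) \cdot 3 \right] = \frac{1}{3} (-1 - 4 - 9) = -\frac{14}{3} $$

The negative sign says that increasing $ w $ lowers the cost at this point, which is consistent with the data having been generated by $ w = 2 $.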

**Partial derivative with respect to $ b $:**
$$ \frac{\partial J}{\partial b} = \frac{\partial}{\partial b} \left[ \frac{1}{2m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})^2 \right] $$

Using the chain rule, the factor 2 from differentiating the square again cancels the 2 in $ \frac{1}{2m} $:
$$ \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)}) \cdot \frac{\partial \hat{y}^{(i)}}{\partial b} $$
Since $ \hat{y}^{(i)} = w \cdot x^{(i)} + b $, we have:
$$ \frac{\partial \hat{y}^{(i)}}{\partial b} = 1 $$
Thus:
$$ \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)}) $$
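Both partial derivatives share the error term $ \hat{y}^{(i)} - y^{(i)} $, so they can be accumulated in a single pass over the data. The sketch below assumes plain Python lists; `compute_gradient` is a hypothetical helper name, and the toy data is illustrative only.

```python
def compute_gradient(x, y, w, b):
    """Return (dJ/dw, dJ/db) for the single-feature model y_hat = w * x + b."""
    m = len(x)
    dj_dw = 0.0
    dj_db = 0.0
    for xi, yi in zip(x, y):
        error = w * xi + b - yi  # y_hat_i - y_i, shared by both derivatives
        dj_dw += error * xi      # summand of dJ/dw
        dj_db += error           # summand of dJ/db
    return dj_dw / m, dj_db / m


# Hypothetical toy data lying exactly on y = 2x
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
print(compute_gradient(x, y, w=1.0, b=0.0))  # (-4.666..., -2.0)
```

At $ w = 1 $, $ b = 0 $ the errors are $ -1, -2, -3 $, giving $ \frac{\partial J}{\partial w} = -14/3 $ and $ \frac{\partial J}{\partial b} = -6/3 = -2 $.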

#### Step 2: Update the Parameters

The parameters $ w $ and $ b $ are updated using the following equations:
$$ w := w - \alpha \cdot \frac{\partial J}{\partial w} $$
$$ b := b - \alpha \cdot \frac{\partial J}{\partial b} $$
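where $ \alpha $ is the learning rate, a hyperparameter that determines the step size at each iteration. Both gradients are computed from the current values of $ w $ and $ b $ before either parameter changes, so the two parameters are updated simultaneously.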

Substituting the partial derivatives:
$$ w := w - \alpha \cdot \frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)}) \cdot x^{(i)} $$
$$ b := b - \alpha \cdot \frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)}) $$
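Putting the update rules in a loop gives batch gradient descent. This is a minimal sketch under several assumptions not stated above: the parameters start at zero, the learning rate is $ \alpha = 0.1 $, and the loop simply runs a fixed number of iterations rather than checking for convergence. The function name `gradient_descent` is hypothetical.

```python
def gradient_descent(x, y, w, b, alpha, num_iters):
    """Batch gradient descent for y_hat = w * x + b with a fixed iteration count."""
    m = len(x)
    for _ in range(num_iters):
        # Errors use the current (w, b), so both gradients see the same parameters
        errors = [w * xi + b - yi for xi, yi in zip(x, y)]
        dj_dw = sum(e * xi for e, xi in zip(errors, x)) / m
        dj_db = sum(errors) / m
        # Simultaneous update: neither gradient depends on the new values
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
    return w, b


# Hypothetical toy data lying exactly on y = 2x
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
w_fit, b_fit = gradient_descent(x, y, w=0.0, b=0.0, alpha=0.1, num_iters=2000)
print(w_fit, b_fit)  # close to w = 2, b = 0
```

Computing both gradients before assigning either parameter is what keeps the update simultaneous; updating $ w $ first and then reusing it when computing $ \frac{\partial J}{\partial b} $ would be a different algorithm.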

### Summary

To summarize, the gradient descent algorithm for linear regression involves:

1. Computing the partial derivatives of the cost function with respect to the parameters $ w $ and $ b $.
2. Updating the parameters $ w $ and $ b $ iteratively using the computed gradients and a chosen learning rate $ \alpha $.


---

### **Breakdown of the calculation**

#### Expression:

$$\frac{\partial}{\partial w} \left[ \frac{1}{2m} \sum_{i=1}^{m} (w \cdot x^{(i)} + b - y^{(i)})^2 \right] $$

#### Step-by-Step Breakdown:

1. **Outer Function:**

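   - The outer function is a sum of squared terms multiplied by the constant $ \frac{1}{2m} $.
   - The constant factors out, and the derivative of a sum is the sum of the derivatives, so each term can be differentiated with respect to $ w $ separately.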

2. **Inner Function:**

   - The inner function is $ (w \cdot x^{(i)} + b - y^{(i)})^2 $.
   - When differentiating this with respect to $ w $, we use the chain rule.

3. **Chain Rule Application:**

   - Differentiate the square term:
     $$\frac{\partial}{\partial w} (w \cdot x^{(i)} + b - y^{(i)})^2 = 2(w \cdot x^{(i)} + b - y^{(i)}) \cdot \frac{\partial}{\partial w} (w \cdot x^{(i)} + b - y^{(i)}) $$
   - The derivative of $ w \cdot x^{(i)} + b - y^{(i)} $ with respect to $ w $ is $ x^{(i)} $, because $ b $ and $ y^{(i)} $ do not depend on $ w $.

4. **Combining Results:**

   - Putting it all together:
     $$\frac{\partial}{\partial w} \left[ \frac{1}{2m} \sum_{i=1}^{m} (w \cdot x^{(i)} + b - y^{(i)})^2 \right] = \frac{1}{2m} \sum_{i=1}^{m} 2(w \cdot x^{(i)} + b - y^{(i)}) \cdot x^{(i)} $$
   - Simplifying (the factor 2 cancels the 2 in $ 2m $):
     $$= \frac{1}{m} \sum_{i=1}^{m} (w \cdot x^{(i)} + b - y^{(i)}) \cdot x^{(i)} $$
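One way to sanity-check this derivation is to compare the simplified formula against a central finite difference of the cost. The sketch below assumes a small hypothetical dataset and step size `h = 1e-6`; all function names are illustrative. Because $ J $ is quadratic in $ w $, the central difference equals the true derivative up to floating-point rounding, so the two values should agree closely.

```python
def cost(x, y, w, b):
    """J(w, b) = (1 / (2m)) * sum((w * x_i + b - y_i) ** 2)."""
    m = len(x)
    return sum((w * xi + b - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)


def dj_dw_derived(x, y, w, b):
    """Simplified result: (1 / m) * sum((w * x_i + b - y_i) * x_i)."""
    m = len(x)
    return sum((w * xi + b - yi) * xi for xi, yi in zip(x, y)) / m


def dj_dw_numeric(x, y, w, b, h=1e-6):
    """Central finite difference: (J(w + h, b) - J(w - h, b)) / (2h)."""
    return (cost(x, y, w + h, b) - cost(x, y, w - h, b)) / (2 * h)


# Hypothetical toy data; b is held fixed at 0.5 while w varies
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
for w in [-1.0, 0.0, 1.0, 2.5]:
    print(w, dj_dw_derived(x, y, w, 0.5), dj_dw_numeric(x, y, w, 0.5))
```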
