We are looking at how to compute the derivatives of the **cost function** for linear regression. The cost function is:

$$
J(m, c) = \sum_{i=1}^n \left( y_i - (mx_i + c) \right)^2
$$


### What are we trying to do?

We are trying to **minimize** the total error between our predicted line $\hat{y}_i = mx_i + c$ and the actual data $y_i$. The total error is the **sum of squared differences** (residuals).

To do this, we want to find the values of $m$ (slope) and $c$ (intercept) that make this error as small as possible. In math, we do this by computing the **derivatives**, which tell us the direction in which the function is increasing or decreasing.



### Derivative Intuition

Imagine you are walking on a hilly surface (the graph of the function). The **derivative** tells you the slope of the hill at your feet. If the slope is steeply upwards, go the other way! If it's steeply downwards, keep going: every step downhill lowers the value of the function, which is exactly what minimizing means.


### Let's write the cost function again:

$$
J(m, c) = \sum_{i=1}^n \left( y_i - (mx_i + c) \right)^2
$$

Let's simplify this inner expression a bit. For each data point:

$$
\text{error}_i = y_i - (mx_i + c)
$$

$$
\text{error}_i^2 = \left( y_i - (mx_i + c) \right)^2
$$

So our goal is to **adjust** $m$ and $c$ to reduce the total squared error.
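
To make the cost concrete, here is a minimal Python sketch of $J(m, c)$. The function name `cost` and the three data points are assumptions made up for illustration; the points lie exactly on $y = 2x + 1$, so a perfect fit should give zero error.

```python
def cost(m, c, xs, ys):
    """Sum of squared residuals J(m, c) for the line y = m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

# Made-up example data lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

print(cost(0.0, 0.0, xs, ys))  # errors 3, 5, 7 -> 9 + 25 + 49 = 83.0
print(cost(2.0, 1.0, xs, ys))  # perfect fit -> 0.0
```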

---

## Derivative with respect to **m** (slope)

We're going to ask: "How does the error change if we nudge the slope $m$ a little bit?"

Let's derive $\frac{\partial J}{\partial m}$:

$$
\frac{\partial J}{\partial m} = \sum_{i=1}^n 2 \cdot \left( y_i - (mx_i + c) \right) \cdot (-x_i)
$$

Why this form?

* $2 \cdot \text{error}_i$: Comes from the power rule applied to the square (recall: $\frac{d}{du}[u^2] = 2u$)
* $-x_i$: Comes from the chain rule; the inside is $y_i - mx_i - c$, and its derivative with respect to $m$ is $-x_i$

So, putting it all together:

$$
\frac{\partial J}{\partial m} = -2 \sum_{i=1}^n x_i \cdot \left( y_i - (mx_i + c) \right)
$$
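
One way to sanity-check this formula is to compare it with a numerical slope: nudge $m$ slightly in both directions and see how much $J$ changes. The sketch below does that in plain Python; the names `cost` and `dJ_dm`, the step size `h`, and the data are all illustrative assumptions.

```python
def cost(m, c, xs, ys):
    """Sum of squared residuals J(m, c)."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

def dJ_dm(m, c, xs, ys):
    """Analytic partial derivative of J with respect to the slope m."""
    return -2 * sum(x * (y - (m * x + c)) for x, y in zip(xs, ys))

# Made-up example data lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

m, c, h = 0.5, 0.0, 1e-6

# Central difference: (J(m + h) - J(m - h)) / (2h)
numeric = (cost(m + h, c, xs, ys) - cost(m - h, c, xs, ys)) / (2 * h)
analytic = dJ_dm(m, c, xs, ys)

print(analytic)  # -54.0
print(numeric)   # should agree with the analytic value to several decimals
```

If the two numbers disagree noticeably, the usual culprit is a sign error or a missing $x_i$ factor in the analytic version.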

---

## Derivative with respect to **c** (intercept)

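Same logic, but now we ask: "What if we nudge the intercept $c$?" The inside $y_i - (mx_i + c)$ has derivative $-1$ with respect to $c$, so the chain-rule factor is $-1$ instead of $-x_i$: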

$$
\frac{\partial J}{\partial c} = \sum_{i=1}^n 2 \cdot \left( y_i - (mx_i + c) \right) \cdot (-1)
$$

$$
\frac{\partial J}{\partial c} = -2 \sum_{i=1}^n \left( y_i - (mx_i + c) \right)
$$
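
The "nudge" question can be tried out directly: for a small change in $c$, the change in $J$ should be close to the derivative times the size of the nudge. The following is a rough sketch under assumed names (`cost`, `dJ_dc`, `nudge`) and made-up data.

```python
def cost(m, c, xs, ys):
    """Sum of squared residuals J(m, c)."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

def dJ_dc(m, c, xs, ys):
    """Analytic partial derivative of J with respect to the intercept c."""
    return -2 * sum(y - (m * x + c) for x, y in zip(xs, ys))

# Made-up example data lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

m, c = 0.0, 0.0
nudge = 0.001  # small change applied to the intercept

slope_c = dJ_dc(m, c, xs, ys)  # -2 * (3 + 5 + 7) = -30.0
predicted_change = slope_c * nudge
actual_change = cost(m, c + nudge, xs, ys) - cost(m, c, xs, ys)

print(predicted_change)  # about -0.03
print(actual_change)     # close to the prediction; the gap shrinks with the nudge
```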

---

### Final Derivatives

$$
\frac{\partial J}{\partial m} = -2 \sum_{i=1}^n x_i \cdot \left( y_i - (mx_i + c) \right)
$$

$$
\frac{\partial J}{\partial c} = -2 \sum_{i=1}^n \left( y_i - (mx_i + c) \right)
$$
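
As a quick worked example, assume three made-up points $(1, 3)$, $(2, 5)$, $(3, 7)$ (illustrative data only) and a starting guess of $m = 0$, $c = 0$. The errors $y_i - (mx_i + c)$ are then $3$, $5$, and $7$, so the formulas give:

$$
\frac{\partial J}{\partial m} = -2 \left( 1 \cdot 3 + 2 \cdot 5 + 3 \cdot 7 \right) = -2 \cdot 34 = -68
$$

$$
\frac{\partial J}{\partial c} = -2 \left( 3 + 5 + 7 \right) = -2 \cdot 15 = -30
$$

Both derivatives are negative, which means that increasing $m$ and $c$ from this starting point lowers the error.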

---

### What do you do with these?

You use them in **gradient descent**:

* Repeatedly update:

  $$
  m := m - \alpha \cdot \frac{\partial J}{\partial m}
  $$

  $$
  c := c - \alpha \cdot \frac{\partial J}{\partial c}
  $$
* Where $\alpha$ is the **learning rate** (a small number like 0.01)
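
The update rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function names `gradients` and `gradient_descent`, the starting point $(0, 0)$, the step count, and the data are all assumptions chosen for the example.

```python
def gradients(m, c, xs, ys):
    """Return (dJ/dm, dJ/dc) for the sum-of-squared-errors cost."""
    errors = [y - (m * x + c) for x, y in zip(xs, ys)]
    dJ_dm = -2 * sum(x * e for x, e in zip(xs, errors))
    dJ_dc = -2 * sum(errors)
    return dJ_dm, dJ_dc

def gradient_descent(xs, ys, alpha=0.01, steps=5000):
    """Repeatedly step m and c against their gradients."""
    m, c = 0.0, 0.0  # assumed starting guess
    for _ in range(steps):
        dJ_dm, dJ_dc = gradients(m, c, xs, ys)
        m -= alpha * dJ_dm
        c -= alpha * dJ_dc
    return m, c

# Made-up example data lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

m, c = gradient_descent(xs, ys)
print(m, c)  # should end up very close to m = 2, c = 1
```

Because $J$ is a sum rather than an average, the gradients grow with the number of data points, so a learning rate that works for three points may be too large for a bigger dataset. If the values of $m$ and $c$ blow up instead of settling, try a smaller $\alpha$.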

---

Would you like a visual or Python code to compute this using example data?
