### A statistics example

Suppose we run an experiment that produces data of the form $$
y = m x
$$. That is, we measure $x$ and $y$, and we know there is a linear relationship between them, but we don't know the value of $m$.

We are given some data points, but those data points are noisy. They don't fit perfectly on the line.

![](noisy_linear_data.png)

To determine the appropriate value of $m$, we can transform the original problem into an optimization problem.

Let's introduce a method called **Least Squares**. Let $S(m)$ be the *sum of squared distances* between $y_i$ and $m x_i$ for a given $m$: $$
S(m) = \sum_{i} (y_i - m x_i)^2
$$. We then need to find the $m$ that minimizes $S(m)$.
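
To get a feel for $S(m)$, here is a minimal Python sketch that evaluates it for a few candidate slopes. The data points `xs` and `ys` are made up for illustration (roughly scattered around $y = 2x$), and the function name `sum_of_squares` is just a label chosen here.

```python
def sum_of_squares(m, xs, ys):
    """Return S(m) = sum_i (y_i - m * x_i)^2 for a candidate slope m."""
    return sum((y - m * x) ** 2 for x, y in zip(xs, ys))


# Hypothetical noisy measurements, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# Try a few slopes and see which one gives the smallest S(m).
for m in (1.0, 1.5, 2.0, 2.5):
    print(f"m = {m}: S(m) = {sum_of_squares(m, xs, ys):.4f}")
```

Slopes close to 2 should give a much smaller $S(m)$ than slopes far from it, which is exactly what "best fit" means here.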

Computing the derivative of $S$ with respect to $m$ gives $$
\begin{equation}
\begin{aligned}
\frac{d S}{d m} &= \sum_{i} - 2 x_i (y_i - m x_i) \\
&= -2 \sum_{i} x_i y_i + 2 m \sum_{i} x_i^2
\end{aligned}
\end{equation}
$$. The next step is to find the critical points, where
$$
-2 \sum_{i} x_i y_i + 2 m \sum_{i} x_i^2 = 0
$$

$$
2 \sum_{i} x_i y_i = 2 m \sum_{i} x_i^2 
$$

$$
m = \frac{\sum_{i} x_i y_i}{\sum_{i} x_i^2}
$$ So is that a local minimum or a local maximum? Let's compute the second-order derivative. $$
\begin{equation}
\begin{aligned}
\frac{d^2 S}{d m^2} &= 2 \sum_{i} x_i^2 > 0
\end{aligned}
\end{equation}
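$$. The inequality holds as long as at least one $x_i$ is nonzero. Since $S$ is a quadratic in $m$ with a positive leading coefficient, the critical point is the minimum, and in fact the global one.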
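
The formula $m = \sum_i x_i y_i / \sum_i x_i^2$ can be checked numerically. The sketch below uses made-up data (roughly scattered around $y = 2x$), and the names `least_squares_slope` and `sum_of_squares` are chosen here for illustration. It computes the closed-form slope and confirms that nudging $m$ in either direction does not decrease $S(m)$.

```python
def sum_of_squares(m, xs, ys):
    """Return S(m) = sum_i (y_i - m * x_i)^2."""
    return sum((y - m * x) ** 2 for x, y in zip(xs, ys))


def least_squares_slope(xs, ys):
    """Return the m that minimizes S(m): sum(x_i * y_i) / sum(x_i^2)."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    if sum_xx == 0:
        # Every x_i is zero, so S(m) does not depend on m at all.
        raise ValueError("at least one x value must be nonzero")
    return sum_xy / sum_xx


# Hypothetical noisy measurements, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

m_best = least_squares_slope(xs, ys)
print("best slope:", m_best)

# Moving away from the critical point in either direction should not reduce S.
for delta in (-0.01, 0.01):
    assert sum_of_squares(m_best, xs, ys) <= sum_of_squares(m_best + delta, xs, ys)
```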

### A more general form of Linear Regression: $y = m x + b$

To extend the least squares method from a linear relationship to an affine relationship $y = m x + b$, we can pose the problem as finding the minimum of $$
S(m, b) = \sum_{i} (y_i - (m x_i + b))^2
$$. This is interesting because we don't yet know how to find the derivative of a function with more than one input.
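
Even without multivariable derivatives, we can still explore $S(m, b)$ numerically. The sketch below is a crude brute-force grid search, not the calculus-based method: it evaluates $S(m, b)$ on a grid of candidate pairs and keeps the smallest. The data (roughly scattered around $y = 2x + 1$), the grid ranges, the step size of 0.05, and the function names are all assumptions made for illustration.

```python
def sum_of_squares_affine(m, b, xs, ys):
    """Return S(m, b) = sum_i (y_i - (m * x_i + b))^2."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


def grid_search_affine(xs, ys, m_values, b_values):
    """Return (S, m, b) for the grid pair (m, b) with the smallest S(m, b)."""
    return min(
        (sum_of_squares_affine(m, b, xs, ys), m, b)
        for m in m_values
        for b in b_values
    )


# Hypothetical noisy measurements, roughly following y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

# Candidate slopes in [0, 4] and intercepts in [-2, 2], in steps of 0.05.
m_values = [i * 0.05 for i in range(81)]
b_values = [-2.0 + j * 0.05 for j in range(81)]

best_s, best_m, best_b = grid_search_affine(xs, ys, m_values, b_values)
print(f"m ~ {best_m:.2f}, b ~ {best_b:.2f}, S ~ {best_s:.4f}")
```

The answer is only as precise as the grid spacing, and the cost grows quickly with the number of parameters, which is one reason the derivative-based approach is worth learning.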

We will work out the multivariable case in a different course. As a spoiler, it will be useful for many applications that involve optimizing multivariable functions:
- Game Theory
- Linear Programming
- Machine Learning