# Chapter 3: Walking the Gradient
## Our Algorithm Doesn't Cut It

- Most problems are too complicated to be modeled by a simple line with two parameters.
- Adding more parameters to our train() function would kill its performance.
- In each iteration, the train() algorithm tweaks either w or b, but this can go wrong: tweaking w might increase the loss caused by b.
    - To avoid this problem, we should tweak w and b at the same time. But with many parameters (we could have hundreds of thousands), there would be too many combinations of tweaks for this algorithm to try.
- Also, a small lr gives us high precision but takes too long to train. We need both speed and precision.
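
To get a feel for why trying every combination does not scale, here is a rough count. It assumes (this is not stated in the notes) that each parameter is nudged either up or down by lr in every iteration, so n parameters give 2^n combinations. The helper name `count_combinations` is made up for this illustration.

```python
def count_combinations(num_params: int) -> int:
    """Ways to nudge every parameter up or down by lr at the same time.

    Assumption: two choices (up or down) per parameter.
    """
    return 2 ** num_params


# The count doubles with every parameter we add.
for n in (2, 10, 20):
    print(f"{n} parameters: {count_combinations(n)} combinations per iteration")
```

With only w and b there are 4 combinations to check per iteration, but 20 parameters already give more than a million.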

## Gradient Descent

We want a better train() algorithm. The point of the function is to find the parameters that give us the lowest loss.

Imagine the loss curve (page 34). We want to find the w that gives us the lowest point of the curve. To code this, we need to find the slope (gradient) of the curve at our current position.

We need to find "the derivative of the loss with respect to the weight", or ∂L/∂w.

If the hiker is to the left of the minimum, the derivative is negative, because the loss decreases as w increases.

If the hiker is to the right of the minimum, the derivative is positive, because the loss increases as w increases.

At the bottom of the curve, the curve is level and the derivative is zero.
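
The three cases can be checked numerically. This is a minimal sketch, not the book's code: it uses the mean squared error as the loss, a tiny made-up dataset where y = 2x (so the minimum is at w = 2 with b = 0), and a finite-difference approximation of the slope. The names `loss` and `slope_at` are chosen for this example.

```python
def loss(X, Y, w, b):
    """Mean squared error of the line w * x + b over the examples."""
    m = len(X)
    return sum(((w * x + b) - y) ** 2 for x, y in zip(X, Y)) / m


def slope_at(X, Y, w, b, h=1e-4):
    """Approximate dL/dw with a central finite difference of width h."""
    return (loss(X, Y, w + h, b) - loss(X, Y, w - h, b)) / (2 * h)


# Made-up data following y = 2x, so the loss is lowest at w = 2, b = 0.
X = [1.0, 2.0, 3.0]
Y = [2.0, 4.0, 6.0]

left = slope_at(X, Y, w=1.0, b=0.0)    # left of the minimum
bottom = slope_at(X, Y, w=2.0, b=0.0)  # at the minimum
right = slope_at(X, Y, w=3.0, b=0.0)   # right of the minimum

print(f"left: {left:.3f}, bottom: {bottom:.3f}, right: {right:.3f}")
```

The slope comes out negative on the left, positive on the right, and (up to rounding) zero at the bottom.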

You have to walk in the direction opposite to the derivative to approach the minimum. So, if the derivative is negative, you have to walk in the positive direction, increasing w.

Also, the size of the step should be proportional to the derivative: if the derivative is a big number (either positive or negative), the curve is steep and the basecamp is far away, so you can take big steps. As you approach the minimum, the derivative becomes smaller, and so do your steps.
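
Both rules (walk against the derivative, with a step proportional to it) fit in a single update: w -= lr * derivative. The sketch below is one possible version, not necessarily the book's train(): it assumes the mean squared error loss, keeps b fixed at 0 to match the single-parameter curve, and uses made-up data where y = 2x. The function `gradient_w` and the lr and iteration values are illustrative choices.

```python
def gradient_w(X, Y, w, b):
    """Derivative of the mean squared error with respect to w."""
    m = len(X)
    return 2 * sum(x * ((w * x + b) - y) for x, y in zip(X, Y)) / m


def train(X, Y, iterations, lr):
    """Gradient descent on w only; b stays at 0 in this sketch."""
    w = 0.0
    b = 0.0
    for _ in range(iterations):
        # Step against the derivative; a steep slope means a bigger step.
        w -= lr * gradient_w(X, Y, w, b)
    return w


# Made-up data following y = 2x, so the best w is 2.
X = [1.0, 2.0, 3.0]
Y = [2.0, 4.0, 6.0]

w = train(X, Y, iterations=100, lr=0.05)
print(f"w = {w:.6f}")
```

Because the derivative shrinks near the minimum, the steps shrink too, and w settles close to 2 without any manual tuning of the step size along the way.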

### A Sprinkle of Math

\begin{equation*}
L = \frac{1}{m} \sum_{i=1}^m ((wx_i + b) - y_i)^2
\end{equation*}
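
As a worked step (added here, not copied from the notes), differentiating this loss with respect to w while treating b as a constant gives the slope that gradient descent needs. The chain rule brings down the exponent 2 and multiplies each term by x_i, the derivative of wx_i + b with respect to w:

\begin{equation*}
\frac{\partial L}{\partial w} = \frac{2}{m} \sum_{i=1}^m x_i ((wx_i + b) - y_i)
\end{equation*}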



## What You Just Learned

## Hands On: Basecamp Overshooting