<a href="https://colab.research.google.com/github/rahiakela/grokking-deep-learning/blob/4-gradient-descent/gradient_descent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to neural learning: gradient descent

## Predict, compare, and learn

Earlier, we introduced the paradigm “predict, compare, learn,” and we dove
deep into the first step: **predict**. Now we cover the next two steps of the paradigm: **compare** and **learn**.

### Compare

**Comparing gives a measurement of how much a prediction
“missed” by.**

Once you’ve made a prediction, the next step is to evaluate how well you did. This may
seem like a simple concept, but you’ll find that coming up with a good way to measure
error is one of the most important and complicated subjects of deep learning.

You’ll also learn that error is never negative. Consider the analogy of an
archer hitting a target: whether the shot is too low by an inch or too high by an inch, the
error is still just 1 inch.

In the neural network compare step, you need to consider these
kinds of properties when measuring error.

Here we evaluate only one simple way of measuring error: mean
squared error. It’s but one of many ways to evaluate the accuracy of a neural network.
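
As a minimal sketch of the idea (the function name `mean_squared_error` and the sample numbers are illustrative assumptions, not part of the original notebook), mean squared error can be computed by squaring each miss and averaging the results. It also shows the archer property: a miss of one unit counts the same whether it is too low or too high.

```python
def mean_squared_error(preds, goals):
    """Average of the squared differences between predictions and goals."""
    if len(preds) != len(goals):
        raise ValueError("preds and goals must have the same length")
    return sum((p - g) ** 2 for p, g in zip(preds, goals)) / len(preds)


# Archer analogy: one unit too low and one unit too high are the same size of miss.
too_low = mean_squared_error([9.0], [10.0])
too_high = mean_squared_error([11.0], [10.0])
print(too_low, too_high)  # 1.0 1.0
```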



### Learn
**Learning tells each weight how it can change to reduce the error.**

Learning is all about error attribution, or the art of figuring out how each weight played its part in creating error. It’s the blame game of deep learning.

So we’ll spend time looking at the most popular version of the deep learning blame game: **gradient descent**.

At the end of the day, it results in computing a number for each weight. That number
tells you whether that weight should be higher or lower, and by how much, in order to reduce the error. Then
you’ll move the weight according to that number, and you’ll be finished.
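
To make "a number for each weight" concrete, here is a rough sketch of a single update for a one-weight network. It assumes the squared error used in this notebook, takes its derivative with respect to the weight as the number, and uses a hypothetical step size `alpha`. The names `input_value`, `weight_delta`, and `alpha` are introduced here for illustration only.

```python
knob_weight = 0.5
input_value = 0.5   # hypothetical name, chosen to avoid shadowing the built-in input()
goal_pred = 0.8
alpha = 0.1         # assumed step size; not a value from the notebook

pred = input_value * knob_weight
error_before = (pred - goal_pred) ** 2

# Derivative of (input_value * weight - goal_pred) ** 2 with respect to the weight.
# A negative value means that raising the weight lowers the error.
weight_delta = 2 * (pred - goal_pred) * input_value

# Move the weight a small step against the derivative.
knob_weight = knob_weight - alpha * weight_delta

error_after = (input_value * knob_weight - goal_pred) ** 2
print(error_before, error_after)
```

With these numbers the weight moves up from 0.5 and the error after the step is smaller than before it.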

## Compare: Does your network make good predictions?
**Let’s measure the error and find out!**

<img src='https://github.com/rahiakela/img-repo/blob/master/measure-error-1.JPG?raw=1' width='800'/>

In [2]:
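knob_weight = 0.5   # the single weight of the network
input = 0.5         # one input datapoint (note: shadows Python's built-in input)
goal_pred = 0.8     # the value the network should predict

pred = input * knob_weight   # predict

error = (pred - goal_pred) ** 2   # compare: squared difference, never negative
print(error)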

0.30250000000000005


## Why measure error?
**Measuring error simplifies the problem.**

The goal of training a neural network is to make correct predictions.
In the most pragmatic terms, you want the
network to take input that you can easily calculate (today’s stock price) and predict things that
are hard to calculate (tomorrow’s stock price). That’s what makes a neural network useful.

It turns out that changing knob_weight to make the network correctly predict
goal_pred is slightly more complicated than changing knob_weight to make
error == 0. Both describe the same target, but there’s something more concise about looking at the problem as driving a single number to zero.
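
One way to see this is to try a handful of weights and keep the one with the smallest error. The sketch below is only an illustration: the candidate list and the helper `squared_error` are assumptions made for this example, and a brute-force search like this is not how the weight is actually learned.

```python
input_value = 0.5   # hypothetical name for the notebook's input
goal_pred = 0.8


def squared_error(knob_weight):
    """Squared error of the one-weight network for a given weight."""
    pred = input_value * knob_weight
    return (pred - goal_pred) ** 2


candidates = [0.5, 0.9, 1.3, 1.6, 1.9]
errors = {w: squared_error(w) for w in candidates}

# The best weight is the one whose error is closest to zero.
best_weight = min(errors, key=errors.get)
print(best_weight, errors[best_weight])
```

Among these candidates the smallest error belongs to 1.6, the weight for which the prediction equals goal_pred.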

In [11]:
knob_weight = 0.9   # a weight closer to the ideal one, so the error below is smaller
input = 0.5
goal_pred = 0.8

pred = input * knob_weight

error = (pred - goal_pred) ** 2
print(error)

0.12250000000000003


**Different ways of measuring error prioritize error differently.**

When you square the error, values with a magnitude less than 1 get smaller, whereas values with a magnitude greater than 1 get bigger. Squaring changes what I call pure error (pred - goal_pred) so that bigger errors become very big and smaller errors quickly become irrelevant.

By measuring error this way, you can prioritize big errors over smaller ones. When you have
somewhat large pure errors (say, 10), you’ll tell yourself that you have very large error ($10^2 = 100$). In contrast, when you have small pure errors (say, 0.01), you’ll tell yourself that you
have very small error ($0.01^2 = 0.0001$). See what I mean about prioritizing? It’s just modifying
what you consider to be error so that you amplify big ones and largely ignore small ones.

In contrast, if you took the absolute value instead of squaring the error, you wouldn’t have this
type of prioritization. The error would just be the positive version of the pure error—which
would be fine, but different.
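
The difference between the two measures can be seen side by side in a small sketch. The sample pure errors below are made up for illustration.

```python
# Hypothetical pure errors (pred - goal_pred), chosen only to show the effect.
pure_errors = [10, -10, 0.5, 0.01]

rows = []
for pure_error in pure_errors:
    squared = pure_error ** 2     # amplifies large misses, shrinks small ones
    absolute = abs(pure_error)    # keeps the size of the miss unchanged
    rows.append((pure_error, squared, absolute))
    print(f"pure={pure_error:>6}  squared={squared:<10.4f} absolute={absolute}")
```

Both measures turn -10 into a positive number, but only squaring stretches the gap between the large and the small misses.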

**Why do you want only positive error?**

Eventually, you’ll be working with millions of input -> goal_pred pairs, and you’ll
still want to make accurate predictions. So, you’ll try to take the average error down to 0.

This presents a problem if the error can be positive and negative.

Imagine you were trying to get the neural network to correctly predict two datapoints, that is, two input -> goal_pred pairs. If the first had an error of 1,000 and the second had an error of –1,000, then the average error would be zero! You’d fool yourself into thinking you predicted perfectly, when you missed by 1,000 each time! That would be really bad.

Thus, you want the error of each prediction to always be positive so they don’t accidentally cancel each other out when you average them.
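
The cancellation is easy to reproduce with the two errors from the example. This is a small sketch, and the variable names are illustrative.

```python
# The two pure errors from the example: one miss of +1,000 and one of -1,000.
pure_errors = [1000.0, -1000.0]

# Averaging signed errors lets them cancel out.
mean_pure_error = sum(pure_errors) / len(pure_errors)

# Squaring first makes every term non-negative, so nothing cancels.
mean_squared_error = sum(e ** 2 for e in pure_errors) / len(pure_errors)

print(mean_pure_error)      # 0.0
print(mean_squared_error)   # 1000000.0
```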
