
# Introduction to neural learning: gradient descent

## Predict, compare, and learn

Earlier, we introduced the “predict, compare, learn” paradigm and dove
deep into the first step: **predict**. Now we cover the next two steps of the paradigm: **compare** and **learn**.

### Compare

**Comparing gives a measurement of how much a prediction
“missed” by.**

Once you’ve made a prediction, the next step is to evaluate how well you did. This may
seem like a simple concept, but you’ll find that coming up with a good way to measure
error is one of the most important and complicated subjects of deep learning.

You’ll also learn that error is always positive! We’ll consider the analogy of an
archer hitting a target: whether the shot is too low by an inch or too high by an inch, the
error is still just 1 inch. 

In the neural network compare step, you need to consider these
kinds of properties when measuring error.

Here we evaluate only one simple way of measuring error: mean
squared error. It’s but one of many ways to evaluate the accuracy of a neural network.
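As a minimal sketch of what "mean squared error" means over more than one prediction, the snippet below averages the squared differences between predictions and goals. The function name `mean_squared_error` and the example values are assumptions made for illustration; they do not come from the notebook.

```python
def mean_squared_error(predictions, goals):
    """Average of the squared differences between predictions and goals."""
    if len(predictions) != len(goals):
        raise ValueError("predictions and goals must have the same length")
    squared_errors = [(p - g) ** 2 for p, g in zip(predictions, goals)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical values, chosen only for illustration.
example_predictions = [0.25, 0.9, 1.0]
example_goals = [0.8, 0.8, 1.0]

print(mean_squared_error(example_predictions, example_goals))
```

With a single prediction, this reduces to the `(pred - goal_pred) ** 2` expression used in the cells that follow.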



### Learn
**Learning tells each weight how it can change to reduce the error.**

Learning is all about error attribution, or the art of figuring out how each weight played its part in creating error. It’s the blame game of deep learning.

So we’ll spend time looking at the most popular version of the deep learning blame game: **gradient descent**.

At the end of the day, it results in computing a number for each weight. That number
represents how much higher or lower that weight should be in order to reduce the error. Then
you’ll move the weight according to that number, and you’ll be finished.

## Compare: Does your network make good predictions?
**Let’s measure the error and find out!**

<img src='https://github.com/rahiakela/img-repo/blob/master/measure-error-1.JPG?raw=1' width='800'/>

In [0]:
knob_weight = 0.5
input = 0.5
goal_pred = 0.8

pred = input * knob_weight

error = (pred - goal_pred) ** 2
print(error)

0.30250000000000005


## Why measure error?
**Measuring error simplifies the problem.**

The goal of training a neural network is to make correct predictions. That’s what you want.
And in the most pragmatic world you want the
network to take input that you can easily calculate (today’s stock price) and predict things that
are hard to calculate (tomorrow’s stock price). That’s what makes a neural network useful.

It turns out that changing knob_weight to make the network correctly predict
goal_prediction is slightly more complicated than changing knob_weight to make
error == 0. There’s something more concise about looking at the problem this way.

In [0]:
knob_weight = 0.9
input = 0.5
goal_pred = 0.8

pred = input * knob_weight

error = (pred - goal_pred) ** 2
print(error)

0.12250000000000003


**Different ways of measuring error prioritize error differently.**

By squaring the error, numbers that are less than 1 get smaller, whereas numbers that are greater than 1 get bigger. You’re going to change what I call pure error (pred - goal_pred) so that bigger errors become very big and smaller errors quickly become irrelevant.

By measuring error this way, you can prioritize big errors over smaller ones. When you have
somewhat large pure errors (say, 10), you’ll tell yourself that you have very large error ($10^2 = 100$); in contrast, when you have small pure errors (say, 0.01), you’ll tell yourself that you
have very small error ($0.01^2 = 0.0001$). See what I mean about prioritizing? It’s just modifying
what you consider to be error so that you amplify big ones and largely ignore small ones.

In contrast, if you took the absolute value instead of squaring the error, you wouldn’t have this
type of prioritization. The error would just be the positive version of the pure error—which
would be fine, but different.
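To make the contrast concrete, here is a small sketch that applies both measures to the same pure errors. The list of pure errors is hypothetical and chosen only to span a few scales.

```python
# Hypothetical pure errors (pred - goal_pred), spanning several scales.
pure_errors = [10, 1, 0.5, 0.01, -10]

squared_errors = [e ** 2 for e in pure_errors]   # amplifies big misses, shrinks small ones
absolute_errors = [abs(e) for e in pure_errors]  # keeps the size, drops the sign

for pure, squared, absolute in zip(pure_errors, squared_errors, absolute_errors):
    print(f"pure: {pure:>6}  squared: {squared:>9.4f}  absolute: {absolute:>6}")
```

Under squaring, a miss of 10 counts a million times more than a miss of 0.01; under the absolute value it counts only a thousand times more.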

**Why do you want only positive error?**

Eventually, you’ll be working with millions of input -> goal_prediction pairs, and you’ll
still want to make accurate predictions. So, you’ll try to take the average error down to 0.

This presents a problem if the error can be positive and negative.

Imagine if you were
trying to get the neural network to correctly predict two datapoints—two input ->
goal_prediction pairs. 

If the first had an error of 1,000 and the second had an error of
–1,000, then the average error would be zero! 

You’d fool yourself into thinking you predicted
perfectly, when you missed by 1,000 each time! That would be really bad. 

Thus, you want the
error of each prediction to always be positive so they don’t accidentally cancel each other out
when you average them.
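The cancellation described above can be checked in a few lines. This is only a sketch using the two misses from the example (1,000 and -1,000); the variable names are illustrative.

```python
pure_errors = [1000, -1000]  # the two misses from the example above

mean_pure_error = sum(pure_errors) / len(pure_errors)
mean_squared_error = sum(e ** 2 for e in pure_errors) / len(pure_errors)

print(mean_pure_error)     # 0.0: the signed misses cancel each other out
print(mean_squared_error)  # 1000000.0: squaring keeps both misses visible
```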

## What’s the simplest form of neural learning?

**Learning using the hot and cold method.**

At the end of the day, learning is really about one thing: adjusting knob_weight either up
or down so the error is reduced. If you keep doing this and the error goes to 0, you’re done
learning! How do you know whether to turn the knob up or down? Well, you try both up and
down and see which one reduces the error! Whichever one reduces the error is used to update
knob_weight. It’s simple but effective. After you do this over and over again, eventually
error == 0, which means the neural network is predicting with perfect accuracy.

### Hot and cold learning

Hot and cold learning means wiggling the weights to see which direction reduces the error
the most, moving the weights in that direction, and repeating until the error gets to 0.
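One way to sketch a single hot and cold update is shown below. The helper name `hot_cold_step` and the parameter name `input_value` (used instead of `input` to avoid shadowing Python's built-in) are assumptions for illustration. Unlike the loop later in this notebook, this variant leaves the weight unchanged when neither direction lowers the error, which is a design choice of the sketch rather than part of the original method.

```python
def hot_cold_step(weight, input_value, goal, step):
    """Probe the weight one step up and one step down; keep whichever lowers the error."""

    def error_at(w):
        return (input_value * w - goal) ** 2

    current = error_at(weight)
    up = error_at(weight + step)
    down = error_at(weight - step)

    if up < current and up <= down:
        return weight + step
    if down < current:
        return weight - step
    return weight  # neither direction helps: stay put (assumption of this sketch)


# Same step size in both directions, so the comparison is fair.
print(hot_cold_step(weight=0.5, input_value=0.5, goal=0.8, step=0.001))
```

Calling the function repeatedly with its own output reproduces the "wiggle, move, repeat" loop described above.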

### An empty network

<img src='https://github.com/rahiakela/img-repo/blob/master/hot-and-cold-learning-1.JPG?raw=1' width='800'/>


In [0]:
weight = 0.1
lr = 0.01

def neural_network(input, weight):
  prediction = input * weight
  return prediction

### PREDICT: Making a prediction and evaluating error

<img src='https://github.com/rahiakela/img-repo/blob/master/hot-and-cold-learning-2.JPG?raw=1' width='800'/>

In [2]:
number_of_toes = [8.5]
win_or_lose_binary = [1] # (won!!!)

input = number_of_toes[0]
true = win_or_lose_binary[0]

pred = neural_network(input, weight)

error = (pred - true) ** 2
print(error)

0.022499999999999975


### COMPARE: Making a prediction with a higher weight and evaluating error

<img src='https://github.com/rahiakela/img-repo/blob/master/hot-and-cold-learning-3.JPG?raw=1' width='800'/>

In [3]:
lr = 0.1  # step used for the "up" probe (note: 10x larger than the 0.01 step used for the "down" probe)

pred_up = neural_network(input, weight + lr)

err_up = (pred_up - true) ** 2
print(err_up)

0.49000000000000027


In [4]:
lr = 0.01  # step used for the "down" probe

pred_down = neural_network(input, weight - lr)

err_down = (pred_down - true) ** 2
print(err_down)

0.05522499999999994


### COMPARE + LEARN: Comparing the errors and setting the new weight

<img src='https://github.com/rahiakela/img-repo/blob/master/hot-and-cold-learning-4.JPG?raw=1' width='800'/>

In [0]:
if (error > err_down or error > err_up):  # move only if some direction beats the current error
  if (err_down < err_up):
    weight -= lr  # down is better
  if (err_up < err_down):
    weight += lr  # up is better

This reveals what learning in neural networks really is: a search problem. You’re searching
for the best possible configuration of weights so the network’s error falls to 0 (and predicts
perfectly). As with all other forms of search, you might not find exactly what you’re looking
for, and even if you do, it may take some time.

## Hot and cold learning
**This is perhaps the simplest form of learning.**

#### Complete Code

In [8]:
weight = 0.5
input = 0.5

goal_prediction = 0.8

step_amount = 0.001  # How much to move the weights each iteration

# Repeat learning many times so the error can keep getting smaller.
for iteration in range(1101):
  prediction = input * weight
  error = (prediction - goal_prediction) ** 2

  print(f'Error: {str(error)}\t\tPrediction: {str(prediction)}')

  up_prediction = input * (weight + step_amount)   # try up!
  up_error = (goal_prediction - up_prediction) ** 2

  down_prediction = input * (weight - step_amount)  # try down!
  down_error = (goal_prediction - down_prediction) ** 2

  if (down_error < up_error):
    weight = weight - step_amount    # If down is better, go down!
  if (down_error > up_error):
    weight = weight + step_amount    # If up is better, go up!

Error: 0.30250000000000005		Prediction: 0.25
Error: 0.3019502500000001		Prediction: 0.2505
Error: 0.30140100000000003		Prediction: 0.251
Error: 0.30085225		Prediction: 0.2515
Error: 0.30030400000000007		Prediction: 0.252
Error: 0.2997562500000001		Prediction: 0.2525
Error: 0.29920900000000006		Prediction: 0.253
Error: 0.29866224999999996		Prediction: 0.2535
Error: 0.29811600000000005		Prediction: 0.254
Error: 0.2975702500000001		Prediction: 0.2545
Error: 0.29702500000000004		Prediction: 0.255
Error: 0.29648025		Prediction: 0.2555
Error: 0.29593600000000003		Prediction: 0.256
Error: 0.2953922500000001		Prediction: 0.2565
Error: 0.294849		Prediction: 0.257
Error: 0.29430625		Prediction: 0.2575
Error: 0.293764		Prediction: 0.258
Error: 0.2932222500000001		Prediction: 0.2585
Error: 0.292681		Prediction: 0.259
Error: 0.29214025		Prediction: 0.2595
Error: 0.2916		Prediction: 0.26
Error: 0.2910602500000001		Prediction: 0.2605
Error: 0.29052100000000003		Prediction: 0.261
... (output truncated)

### Characteristics of hot and cold learning
**It’s simple.**

Hot and cold learning is simple. After making a prediction, you predict two more times, once with a
slightly higher weight and again with a slightly lower weight. You then move weight depending on
which direction gave a smaller error. Repeating this enough times drives error toward 0.

#### Problem 1: It’s inefficient.


You have to predict multiple times to make a single knob_weight update. This seems very inefficient.
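The cost can be made visible by counting predictions. The sketch below wraps the prediction in a hypothetical helper, `counted_prediction`, that increments a counter on every call; the helper and the choice of 10 updates are assumptions for illustration.

```python
def counted_prediction(input_value, weight, counter):
    """Make a prediction and record that one more forward pass was used."""
    counter[0] += 1
    return input_value * weight


counter = [0]  # single-element list so the helper can update it in place
weight, input_value, goal_prediction, step_amount = 0.5, 0.5, 0.8, 0.001
updates = 10

for _ in range(updates):
    error = (counted_prediction(input_value, weight, counter) - goal_prediction) ** 2
    up_error = (counted_prediction(input_value, weight + step_amount, counter) - goal_prediction) ** 2
    down_error = (counted_prediction(input_value, weight - step_amount, counter) - goal_prediction) ** 2

    if down_error < up_error:
        weight -= step_amount
    if down_error > up_error:
        weight += step_amount

predictions_per_update = counter[0] / updates
print(predictions_per_update)  # 3.0
```

With one weight this is cheap, but each prediction in a larger network is a full forward pass.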

#### Problem 2: Sometimes it’s impossible to predict the exact goal prediction.

With a set step_amount, unless the perfect weight is exactly n*step_amount away, the network
will eventually overshoot by some number less than step_amount. When it does, it will then
start alternating back and forth between each side of goal_prediction. Set step_amount to 0.2
to see this in action: the prediction ends up bouncing between 0.75 and 0.85. If you set
step_amount to 10, you’ll really break it: the prediction jumps between 0.25 and 5.25 and
never comes remotely close to 0.8! When I try both settings, I see the following output.

In [9]:
weight = 0.5
input = 0.5

goal_prediction = 0.8

step_amount = 0.2  # How much to move the weights each iteration

# Repeat learning many times so the error can keep getting smaller.
for iteration in range(1101):
  prediction = input * weight
  error = (prediction - goal_prediction) ** 2

  print(f'Error: {str(error)}\t\tPrediction: {str(prediction)}')

  up_prediction = input * (weight + step_amount)   # try up!
  up_error = (goal_prediction - up_prediction) ** 2

  down_prediction = input * (weight - step_amount)  # try down!
  down_error = (goal_prediction - down_prediction) ** 2

  if (down_error < up_error):
    weight = weight - step_amount    # If down is better, go down!
  if (down_error > up_error):
    weight = weight + step_amount    # If up is better, go up!

Error: 0.30250000000000005		Prediction: 0.25
Error: 0.20250000000000007		Prediction: 0.35
Error: 0.12250000000000007		Prediction: 0.44999999999999996
Error: 0.06250000000000006		Prediction: 0.5499999999999999
Error: 0.02250000000000004		Prediction: 0.6499999999999999
Error: 0.0025000000000000157		Prediction: 0.7499999999999999
Error: 0.0024999999999999823		Prediction: 0.8499999999999999
Error: 0.0025000000000000157		Prediction: 0.7499999999999999
Error: 0.0024999999999999823		Prediction: 0.8499999999999999
Error: 0.0025000000000000157		Prediction: 0.7499999999999999
Error: 0.0024999999999999823		Prediction: 0.8499999999999999
Error: 0.0025000000000000157		Prediction: 0.7499999999999999
Error: 0.0024999999999999823		Prediction: 0.8499999999999999
Error: 0.0025000000000000157		Prediction: 0.7499999999999999
Error: 0.0024999999999999823		Prediction: 0.8499999999999999
Error: 0.0025000000000000157		Prediction: 0.7499999999999999
Error: 0.0024999999999999823		Prediction: 0.8499999999999999


In [10]:
weight = 0.5
input = 0.5

goal_prediction = 0.8

step_amount = 10  # How much to move the weights each iteration

# Repeat learning many times so the error can keep getting smaller.
for iteration in range(1101):
  prediction = input * weight
  error = (prediction - goal_prediction) ** 2

  print(f'Error: {str(error)}\t\tPrediction: {str(prediction)}')

  up_prediction = input * (weight + step_amount)   # try up!
  up_error = (goal_prediction - up_prediction) ** 2

  down_prediction = input * (weight - step_amount)  # try down!
  down_error = (goal_prediction - down_prediction) ** 2

  if (down_error < up_error):
    weight = weight - step_amount    # If down is better, go down!
  if (down_error > up_error):
    weight = weight + step_amount    # If up is better, go up!

Error: 0.30250000000000005		Prediction: 0.25
Error: 19.802500000000002		Prediction: 5.25
Error: 0.30250000000000005		Prediction: 0.25
Error: 19.802500000000002		Prediction: 5.25
Error: 0.30250000000000005		Prediction: 0.25
Error: 19.802500000000002		Prediction: 5.25
Error: 0.30250000000000005		Prediction: 0.25
Error: 19.802500000000002		Prediction: 5.25
Error: 0.30250000000000005		Prediction: 0.25
Error: 19.802500000000002		Prediction: 5.25
Error: 0.30250000000000005		Prediction: 0.25
Error: 19.802500000000002		Prediction: 5.25
Error: 0.30250000000000005		Prediction: 0.25
Error: 19.802500000000002		Prediction: 5.25
Error: 0.30250000000000005		Prediction: 0.25
Error: 19.802500000000002		Prediction: 5.25
Error: 0.30250000000000005		Prediction: 0.25
Error: 19.802500000000002		Prediction: 5.25
Error: 0.30250000000000005		Prediction: 0.25
Error: 19.802500000000002		Prediction: 5.25
Error: 0.30250000000000005		Prediction: 0.25
Error: 19.802500000000002		Prediction: 5.25
... (output truncated)

The real problem is that even though you know the correct direction to move weight, you don’t know
the correct amount. Instead, you pick a fixed one at random (step_amount). Furthermore, this amount
has nothing to do with error. Whether error is big or tiny, step_amount is the same. 

So, hot and cold
learning is kind of a bummer. It’s inefficient because you predict three times for each weight update, and
step_amount is arbitrary, which can prevent you from learning the correct weight value.
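A rough way to see when a fixed step can land exactly on the goal is to count how many steps separate the starting weight from the perfect weight. The helper `steps_to_goal` below is hypothetical; it assumes the single-input network used throughout this notebook, where the perfect weight is goal_prediction / input.

```python
def steps_to_goal(start_weight, input_value, goal_prediction, step_amount):
    """Number of fixed-size steps between the starting weight and the perfect weight."""
    perfect_weight = goal_prediction / input_value
    return (perfect_weight - start_weight) / step_amount


# The three step sizes tried above, with the same starting values.
for step_amount in (0.001, 0.2, 10):
    n = steps_to_goal(0.5, 0.5, 0.8, step_amount)
    print(f"step_amount={step_amount}: about {n:.3f} steps to the perfect weight")
```

The results are roughly 1100, 5.5, and 0.11 steps. Only the first is a whole number, which is consistent with the 0.001 run converging within its 1101 iterations while the 0.2 and 10 runs oscillate around the goal.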

## Calculating both direction and amount from error