In [1]:
knob_weight = 0.5
input = 2
goal_prediction = .8

step_amount = 0.001

display_step = 1

## This is a form of gradient descent

We update the "knob_weight" using both the "direction" and the amount given by the derivative.

The key to learning is adjusting the weights in the correct *direction* and by the correct *amount* so that the **error** reduces to 0.

In [2]:
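for iteration in range(50):
    # predict
    prediction = input * knob_weight

    # squared error: how much we missed
    error = (goal_prediction - prediction) ** 2

    # slope of the error with respect to knob_weight (up to a constant factor of 2)
    derivative = (prediction - goal_prediction) * input

    # the full derivative is subtracted with no scaling, so with input = 2 each step overshoots
    knob_weight = knob_weight - derivative

    if iteration % display_step == 0:
        print("Iteration: " + str(iteration) + " Error: " + str(error) + " Prediction: " + str(prediction))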

  

Iteration: 0 Error: 0.04 Prediction: 1.0
Iteration: 1 Error: 0.36 Prediction: 0.2
Iteration: 2 Error: 3.24 Prediction: 2.6
Iteration: 3 Error: 29.16 Prediction: -4.6
Iteration: 4 Error: 262.44 Prediction: 17.0
Iteration: 5 Error: 2361.96 Prediction: -47.8
Iteration: 6 Error: 21257.64 Prediction: 146.6
Iteration: 7 Error: 191318.76 Prediction: -436.6
Iteration: 8 Error: 1721868.84 Prediction: 1313.0
Iteration: 9 Error: 15496819.56 Prediction: -3935.8
Iteration: 10 Error: 139471376.04 Prediction: 11810.6
Iteration: 11 Error: 1255242384.36 Prediction: -35428.6
Iteration: 12 Error: 11297181459.2 Prediction: 106289.0
Iteration: 13 Error: 1.01674633133e+11 Prediction: -318863.8
Iteration: 14 Error: 9.15071698198e+11 Prediction: 956594.6
Iteration: 15 Error: 8.23564528379e+12 Prediction: -2869780.6
Iteration: 16 Error: 7.41208075541e+13 Prediction: 8609345.0
Iteration: 17 Error: 6.67087267987e+14 Prediction: -25828031.8
Iteration: 18 Error: 6.00378541188e+15 Prediction: 77484098.6
Iteration: 


## The key concept: 

## "Learning is adjusting our weight to reduce the error to zero"

First, our neural network is really just a bunch of **weights** and
an **error** function. Our goal is to move the weights (in this case just one) so that our error goes
to zero. How do we do this? It turns out we can take the derivative of one variable with respect to
another in any function. This tells us how one changes when we change the other. We can
then use it to move each weight in the **direction** that leads us to the lowest error. How do we
know this direction? It is always opposite to the sign of our derivative. (p. 41)

We are trying to figure out the direction and the
amount to change our weight so that our error goes down. A derivative gives us the relationship
between any two variables in a function. We use the derivative to determine the relationship
between any weight and the error. We then move our weight in the opposite direction of the
derivative to find the lowest error.
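
A minimal sketch of that idea, using the same numbers as this notebook (input 2, goal 0.8, starting weight 0.5). The helper names `squared_error`, `analytic_slope` and `numeric_slope` are made up for illustration. It compares the exact slope of the squared error with a finite-difference estimate, then checks that a small step against the slope lowers the error. Note that the exact slope is `2 * (prediction - goal) * input`; the `derivative` line in the cells here drops the factor of 2, which changes the size of the step but not its direction.

```python
def squared_error(weight, x=2, goal=0.8):
    """Squared error of the one-weight network for a given weight."""
    return (goal - x * weight) ** 2


def analytic_slope(weight, x=2, goal=0.8):
    """Exact derivative of squared_error with respect to weight."""
    return 2 * (x * weight - goal) * x


def numeric_slope(weight, eps=1e-6):
    """Central finite-difference estimate of the same derivative."""
    return (squared_error(weight + eps) - squared_error(weight - eps)) / (2 * eps)


weight = 0.5
slope = analytic_slope(weight)      # positive here: error rises as weight rises
stepped = weight - 0.01 * slope     # small step in the opposite direction

print("analytic slope:", slope, "numeric slope:", numeric_slope(weight))
print("error before:", squared_error(weight), "error after:", squared_error(stepped))
```

The step size 0.01 is an arbitrary small value chosen only so that the step cannot overshoot.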

## Key structure:
* inputs
* weights
* prediction
* error
* derivative -- the measure of the effect of the weight on the error
* update the weight 
* loop (predict again) 

## Derivatives


We can compute the slope (i.e. the derivative) of the error curve for any value
of knob_weight. We can then use this slope to figure out which direction reduces
our error. Even better, the steepness gives us at least some idea of how far away we
are.
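
As a rough illustration, the sketch below evaluates the slope at a few hand-picked weights, assuming the same input (2) and goal (0.8) as the cells in this notebook. The list of weights and the name `best_weight` are illustrative choices. The sign of the slope flips as the weight crosses the value that hits the goal, and its magnitude grows with the distance from that value.

```python
x, goal = 2, 0.8          # same values as the cells in this notebook
best_weight = goal / x    # the weight at which prediction == goal

rows = []
for weight in [0.0, 0.2, 0.4, 0.6, 0.8]:
    prediction = x * weight
    slope = (prediction - goal) * x     # same formula as `derivative` in the cells
    distance = abs(weight - best_weight)
    rows.append((weight, slope))
    print("weight:", weight, "slope:", round(slope, 4), "distance:", round(distance, 4))
```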

What is the difference between the error and the derivative of our error with respect to
knob_weight? **Well, the error is just a measure of how much we missed. The derivative defines the
relationship between each weight and how much we missed.** In other words, it tells us how much
changing a weight contributes to the error.

## Divergence 

It's pretty easy to break this: just put 2 in as an input. Small changes to the knob then cause huge changes in the prediction, which leads to instability.

If we have a big input, then the prediction is very sensitive to changes in
the weight (since prediction = input * knob_weight).
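
One way to see this from the update rule itself is the short derivation below. The symbols are shorthand introduced here: $x$ for input, $w$ for knob_weight, $g$ for goal_prediction, and $p = xw$ for the prediction.

$$
\begin{aligned}
w' &= w - (p - g)\,x \\
p' &= x\,w' = p - x^2\,(p - g) \\
p' - g &= (1 - x^2)\,(p - g)
\end{aligned}
$$

With $x = 2$ the factor is $1 - 4 = -3$, so the miss flips sign and triples on every iteration, and the squared error grows by a factor of 9. That matches the printed output: 0.04, 0.36, 3.24, 29.16, and so on.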

* This is interesting because it also comes up in the traffic signs thing, and it is a reason why you would normalize.

The solution is to multiply the weight update by
a fraction to make it smaller. In most cases, this involves multiplying our weight update by
a single real-valued number between 0 and 1, known as alpha.
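
A small sketch of how the choice of alpha plays out for this particular update rule, assuming the same input (2), goal (0.8) and starting weight (0.5). The function name `final_error` and the list of alphas are illustrative choices. For this rule the miss is multiplied by `1 - alpha * input**2` on every step, so it shrinks only when that factor has absolute value below 1, which for input 2 means alpha below 0.5.

```python
def final_error(alpha, x=2, goal=0.8, weight=0.5, steps=20):
    """Run the update loop from this notebook and return the last squared error."""
    error = None
    for _ in range(steps):
        prediction = x * weight
        error = (goal - prediction) ** 2
        derivative = (prediction - goal) * x
        weight = weight - alpha * derivative
    return error


x = 2
for alpha in [0.1, 0.25, 0.4, 0.6, 1.0]:
    factor = 1 - alpha * x ** 2     # per-step multiplier on (prediction - goal)
    print("alpha:", alpha, "factor:", round(factor, 2), "final error:", final_error(alpha))
```

Setting alpha to 1.0 reproduces the diverging run with no scaling, and 0.1 is the value used in the next cell.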

In [10]:
# adding alpha to scale the derivative so the update doesn't overshoot
knob_weight = 0.5
input = 2
goal_prediction = .8
step_amount = 0.001
display_step = 1
alpha = .1

for iteration in range(20):
    prediction = input * knob_weight
    error = (goal_prediction-prediction) ** 2 
    derivative = (prediction - goal_prediction) * input
    knob_weight = knob_weight - alpha * derivative
    
    if iteration % display_step == 0:
        print("Iteration: " + str(iteration) + " Error: " + str(round(error,4)) + 
              " Prediction: " + str(round(prediction,4)))



Iteration: 0 Error: 0.04 Prediction: 1.0
Iteration: 1 Error: 0.0144 Prediction: 0.92
Iteration: 2 Error: 0.0052 Prediction: 0.872
Iteration: 3 Error: 0.0019 Prediction: 0.8432
Iteration: 4 Error: 0.0007 Prediction: 0.8259
Iteration: 5 Error: 0.0002 Prediction: 0.8156
Iteration: 6 Error: 0.0001 Prediction: 0.8093
Iteration: 7 Error: 0.0 Prediction: 0.8056
Iteration: 8 Error: 0.0 Prediction: 0.8034
Iteration: 9 Error: 0.0 Prediction: 0.802
Iteration: 10 Error: 0.0 Prediction: 0.8012
Iteration: 11 Error: 0.0 Prediction: 0.8007
Iteration: 12 Error: 0.0 Prediction: 0.8004
Iteration: 13 Error: 0.0 Prediction: 0.8003
Iteration: 14 Error: 0.0 Prediction: 0.8002
Iteration: 15 Error: 0.0 Prediction: 0.8001
Iteration: 16 Error: 0.0 Prediction: 0.8001
Iteration: 17 Error: 0.0 Prediction: 0.8
Iteration: 18 Error: 0.0 Prediction: 0.8
Iteration: 19 Error: 0.0 Prediction: 0.8
