
# Learning multiple weights at a time: generalizing gradient descent

## Gradient descent learning with multiple inputs
**Gradient descent also works with multiple inputs**

This chapter shows how the same techniques can be used to update a
network that contains multiple weights. Let’s start by jumping in the deep end, shall we?

<img src='https://github.com/rahiakela/img-repo/blob/master/gradient-descent-multiple-inputs-1.JPG?raw=1' width='800'/>


In [0]:
def w_sum(a, b):
  assert(len(a) == len(b))

  output = 0
  for i in range(len(a)):
    output += (a[i] * b[i])
  return output

weights = [0.1, 0.2, -0.1]

def neural_network(input, weights):
  pred = w_sum(input, weights)
  return pred

<img src='https://github.com/rahiakela/img-repo/blob/master/gradient-descent-multiple-inputs-2.JPG?raw=1' width='800'/>

In [2]:
toes = [8.5 , 9.5, 9.9, 9.0]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2 , 1.3, 0.5, 1.0]

win_or_lose_binary = [1, 1, 0, 1]
true = win_or_lose_binary[0]

input = [toes[0], wlrec[0], nfans[0]]

pred = neural_network(input, weights)
print(f'Prediction: {str(pred)}')

error = (pred - true) ** 2
print(f'Error: {str(error)}')

delta = pred - true
print(f'Delta: {str(delta)}')

Prediction: 0.8600000000000001
Error: 0.01959999999999997
Delta: -0.1399999999999999


<img src='https://github.com/rahiakela/img-repo/blob/master/gradient-descent-multiple-inputs-3.JPG?raw=1' width='800'/>

In [3]:
def ele_mul(number, vector):
  output = [0, 0, 0]

  assert(len(output) == len(vector))

  for i in range(len(vector)):
    output[i] = number * vector[i]

  return output

input = [toes[0], wlrec[0], nfans[0]]

pred = neural_network(input, weights)
print(f'Prediction: {str(pred)}')

error = (pred - true) ** 2
print(f'Error: {str(error)}')

delta = pred - true
print(f'Delta: {str(delta)}')

weight_deltas = ele_mul(delta, input)
for wd in weight_deltas:
  print(f'Weight Delta: {str(wd)}')

Prediction: 0.8600000000000001
Error: 0.01959999999999997
Delta: -0.1399999999999999
Weight Delta: -1.189999999999999
Weight Delta: -0.09099999999999994
Weight Delta: -0.16799999999999987


There’s nothing new in this diagram. Each weight_delta is calculated by taking its output
delta and multiplying it by its input. In this case, because the three weights share the same
output node, they also share that node’s delta. But the weights have different weight deltas
owing to their different input values. Notice further that you can reuse the ele_mul function
from before, because you’re multiplying each value in input by the same value, delta.
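
The rule "weight_delta = delta * input" can be checked numerically. The sketch below is not part of the original notebook: it assumes the same first training example and starting weights, and compares delta * input with a finite-difference estimate of how error changes when one weight is nudged. The helper names error_for and numeric_slope and the step size h are illustrative choices. Note that the exact derivative of (pred - true) ** 2 carries an extra factor of 2, which the notebook code leaves out; a constant factor like this only rescales the size of each update step.

```python
# Same first training example and starting weights as in the cells above.
inputs = [8.5, 0.65, 1.2]
weights = [0.1, 0.2, -0.1]
true = 1

def error_for(w):
    # Squared error of the weighted-sum prediction for the weights w.
    pred = sum(x * wi for x, wi in zip(inputs, w))
    return (pred - true) ** 2

def numeric_slope(w, i, h=1e-6):
    # Central finite difference of the error with respect to weight i.
    up = list(w)
    up[i] += h
    down = list(w)
    down[i] -= h
    return (error_for(up) - error_for(down)) / (2 * h)

delta = sum(x * wi for x, wi in zip(inputs, weights)) - true
weight_deltas = [delta * x for x in inputs]
numeric = [numeric_slope(weights, i) for i in range(len(weights))]

for wd, ns in zip(weight_deltas, numeric):
    # Halving the numeric slope removes the factor of 2 from the derivative.
    print(f'weight_delta: {wd:.4f}   numeric slope / 2: {ns / 2:.4f}')
```

If the two columns agree, the hand-computed weight deltas point in the same direction as the true slope of the error.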

<img src='https://github.com/rahiakela/img-repo/blob/master/gradient-descent-multiple-inputs-4.JPG?raw=1' width='800'/>

In [4]:
input = [toes[0], wlrec[0], nfans[0]]

pred = neural_network(input, weights)

error = (pred - true) ** 2

delta = pred - true

weight_deltas = ele_mul(delta, input)

alpha = 0.01

for i in range(len(weights)):
  weights[i] -= alpha * weight_deltas[i]
print(f'Weights: {str(weights)}')
print(f'Weight Deltas: {str(weight_deltas)}')  

Weights: [0.1119, 0.20091, -0.09832]
Weight Deltas: [-1.189999999999999, -0.09099999999999994, -0.16799999999999987]


## Let’s watch several steps of learning

In [5]:
def neural_network(input, weights):
  output = 0
  for i in range(len(input)):
    output += (input[i] * weights[i])
  return output

def ele_mul(scalar, vector):
  output = [0, 0, 0]
  for i in range(len(output)):
    output[i] = vector[i] * scalar
  return output

toes = [8.5 , 9.5, 9.9, 9.0]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2 , 1.3, 0.5, 1.0]     

win_or_lose_binary = [1, 1, 0, 1]
true = win_or_lose_binary[0]

alpha = 0.01
weights = [0.1, 0.2, -.1]
input = [toes[0], wlrec[0], nfans[0]]

for iter in range(3):
  pred = neural_network(input, weights)

  error = (pred - true) ** 2
  delta = pred - true

  weight_deltas = ele_mul(delta, input)

  # Values are rounded for display only; the update uses full precision.
  print(f'Iteration: {iter + 1}')
  print(f'Prediction: {round(pred, 4)}')
  print(f'Error: {round(error, 4)}')
  print(f'Delta: {round(delta, 4)}')
  print(f'Weights: {[round(w, 4) for w in weights]}')
  print('Weight_Deltas:')
  print([round(wd, 4) for wd in weight_deltas])
  print()

  # Update inside the loop so the next iteration starts from the new weights.
  for i in range(len(weights)):
    weights[i] -= alpha * weight_deltas[i]

Iteration: 1
Prediction: 0.86
Error: 0.0196
Delta: -0.14
Weights: [0.1, 0.2, -0.1]
Weight_Deltas:
[-1.19, -0.091, -0.168]

Iteration: 2
Prediction: 0.9638
Error: 0.0013
Delta: -0.0362
Weights: [0.1119, 0.2009, -0.0983]
Weight_Deltas:
[-0.3081, -0.0236, -0.0435]

Iteration: 3
Prediction: 0.9906
Error: 0.0001
Delta: -0.0094
Weights: [0.115, 0.2011, -0.0979]
Weight_Deltas:
[-0.0797, -0.0061, -0.0113]



<img src='https://github.com/rahiakela/img-repo/blob/master/grokking-deep-learning/iteration-1.JPG?raw=1' width='800'/>

We can make three individual error/weight curves, one for each weight. As before, the slopes
of these curves (the dotted lines) are given by the weight_delta values. Notice that the curve
for **a** is steeper than the others. Why is it steeper for **a** if all three weights share the
same output delta and error measure? Because **a** has an input value that’s significantly
higher than the others and thus a larger derivative.
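
A small sketch of that point, assuming the first training example and the shared delta of -0.14 from the first iteration (rounded here for readability): dividing each weight_delta by its input gives back the same delta, so the only thing that makes one slope steeper than another is the size of the input. The variable name steepest is an illustrative choice.

```python
inputs = [8.5, 0.65, 1.2]   # inputs for weights a, b, c
delta = -0.14               # shared output delta (pred - true), rounded

weight_deltas = [delta * x for x in inputs]

for name, x, wd in zip('abc', inputs, weight_deltas):
    # weight_delta / input recovers the shared delta for every weight.
    print(f'{name}: input={x}, weight_delta={wd:.4f}, weight_delta/input={wd / x:.4f}')

# Index of the weight with the steepest slope (largest absolute weight_delta).
steepest = max(range(len(inputs)), key=lambda i: abs(weight_deltas[i]))
print('steepest:', 'abc'[steepest])
```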

<img src='https://github.com/rahiakela/img-repo/blob/master/grokking-deep-learning/iteration-2.JPG?raw=1' width='800'/>

<img src='https://github.com/rahiakela/img-repo/blob/master/grokking-deep-learning/iteration-3.JPG?raw=1' width='800'/>

Most of the learning (weight changing) was performed on the weight with the largest input,
**a**, because the input changes the slope significantly. This isn’t necessarily advantageous
in all settings. A subfield called normalization helps encourage learning across all weights
despite dataset characteristics such as this. This significant difference in slope forced me
to set alpha lower than I wanted (0.01 instead of 0.1). Try setting alpha to 0.1: do you see
how **a** causes it to diverge?
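
Here is one way to run that experiment, as a sketch rather than the notebook’s own code. The train function and its steps parameter are hypothetical helpers; the data and starting weights are the ones used above. For this single training example, each update multiplies delta by (1 - alpha * sum of squared inputs). With alpha = 0.01 that factor is about 0.26, so the error shrinks. With alpha = 0.1 it is about -6.4, so every step overshoots and the error grows. The squared input for **a** (8.5 * 8.5) accounts for almost all of that sum.

```python
def train(alpha, steps=3):
    # Repeat gradient descent on the first training example, recording the error.
    inputs = [8.5, 0.65, 1.2]
    weights = [0.1, 0.2, -0.1]
    true = 1
    errors = []
    for _ in range(steps):
        pred = sum(x * w for x, w in zip(inputs, weights))
        delta = pred - true
        errors.append(delta ** 2)
        # Gradient descent step on every weight.
        weights = [w - alpha * delta * x for w, x in zip(weights, inputs)]
    return errors

errors_small = train(alpha=0.01)
errors_large = train(alpha=0.1)
print('alpha=0.01:', errors_small)
print('alpha=0.1: ', errors_large)

# Factor by which delta is multiplied on each step.
sum_sq = sum(x * x for x in [8.5, 0.65, 1.2])
print('factor at alpha=0.01:', 1 - 0.01 * sum_sq)
print('factor at alpha=0.1: ', 1 - 0.1 * sum_sq)
```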

## Freezing one weight: What does it do?

This experiment is a bit advanced in terms of theory, but I think it’s a great exercise to
understand how the weights affect each other. You’re going to train again, except weight **a**
won’t ever be adjusted. You’ll try to learn the training example using only weights **b** and **c**
(weights[1] and weights[2]).

In [6]:
def neural_network(input, weights):
  output = 0
  for i in range(len(input)):
    output += (input[i] * weights[i])
  return output

def ele_mul(scalar, vector):
  output = [0, 0, 0]
  for i in range(len(output)):
    output[i] = vector[i] * scalar
  return output

toes = [8.5 , 9.5, 9.9, 9.0]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2 , 1.3, 0.5, 1.0]     

win_or_lose_binary = [1, 1, 0, 1]
true = win_or_lose_binary[0]

alpha = 0.3
weights = [0.1, 0.2, -.1]
input = [toes[0], wlrec[0], nfans[0]]

for iter in range(3):
  pred = neural_network(input, weights)

  error = (pred - true) ** 2
  delta = pred - true

  weight_deltas = ele_mul(delta, input)
  weight_deltas[0] = 0  # freeze weight a: its update is always zero

  # Values are rounded for display only; the update uses full precision.
  print(f'Iteration: {iter + 1}')
  print(f'Prediction: {round(pred, 4)}')
  print(f'Error: {round(error, 4)}')
  print(f'Delta: {round(delta, 4)}')
  print(f'Weights: {[round(w, 4) for w in weights]}')
  print('Weight_Deltas:')
  print([round(wd, 4) for wd in weight_deltas])
  print()

  # Update inside the loop so the next iteration starts from the new weights.
  for i in range(len(weights)):
    weights[i] -= alpha * weight_deltas[i]

Iteration: 1
Prediction: 0.86
Error: 0.0196
Delta: -0.14
Weights: [0.1, 0.2, -0.1]
Weight_Deltas:
[0, -0.091, -0.168]

Iteration: 2
Prediction: 0.9382
Error: 0.0038
Delta: -0.0618
Weights: [0.1, 0.2273, -0.0496]
Weight_Deltas:
[0, -0.0402, -0.0741]

Iteration: 3
Prediction: 0.9727
Error: 0.0007
Delta: -0.0273
Weights: [0.1, 0.2393, -0.0274]
Weight_Deltas:
[0, -0.0177, -0.0327]



<img src='https://github.com/rahiakela/img-repo/blob/master/grokking-deep-learning/freezing-weight-1.JPG?raw=1' width='800'/>

<img src='https://github.com/rahiakela/img-repo/blob/master/grokking-deep-learning/freezing-weight-2.JPG?raw=1' width='800'/>

Error is determined by the training data. Any network can have any weight value, but
the value of error for any particular weight configuration is determined entirely by the data.
You’ve already seen (on several occasions) how the steepness of the U shape is affected by
the input data. What you’re really trying to do with the neural network is find the
lowest point on this big error plane, where the lowest point corresponds to the lowest error.
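
One way to make this concrete is to treat error as a plain function of the weights. The sketch below is an illustration, not the notebook’s code: error_at is a hypothetical helper, and the 50-step loop is an arbitrary choice. It evaluates the error at the starting weights, then keeps **a** frozen at 0.1 and trains only **b** and **c** with alpha = 0.3 as in the experiment above. For this single training example the error still falls essentially to zero, because **b** and **c** alone can move the prediction to the target.

```python
inputs = [8.5, 0.65, 1.2]
true = 1

def error_at(weights):
    # The error for a weight configuration depends only on the data.
    pred = sum(x * w for x, w in zip(inputs, weights))
    return (pred - true) ** 2

start_error = error_at([0.1, 0.2, -0.1])
print('error at starting weights:', start_error)

# Freeze a at 0.1 and update only b and c.
weights = [0.1, 0.2, -0.1]
alpha = 0.3
for _ in range(50):
    delta = sum(x * w for x, w in zip(inputs, weights)) - true
    weights[1] -= alpha * delta * inputs[1]
    weights[2] -= alpha * delta * inputs[2]

final_error = error_at(weights)
print('weights after training b and c:', weights)
print('final error:', final_error)
```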



## Gradient descent learning with multiple outputs