# Training Neural Networks

Let's see your answer to the previous question implemented in code.

The following code block has an example implementation of $\partial C_k / \partial w^{(1)}$. It is up to you to implement $\partial C_k / \partial b^{(1)}$.

Don't worry if you don't know exactly how the code works. It's more important that you get a feel for what is going on.
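
For reference, the chain-rule products that the code below evaluates can be written out as follows. This is only a sketch that mirrors the variable names `dCda`, `dadz`, `dzdw` and `dzdb` used in the code; the superscript notation is assumed from the rest of this section.

$$
\frac{\partial C_k}{\partial w^{(1)}}
= \frac{\partial C_k}{\partial a^{(1)}}\,
  \frac{\partial a^{(1)}}{\partial z^{(1)}}\,
  \frac{\partial z^{(1)}}{\partial w^{(1)}},
\qquad
\frac{\partial C_k}{\partial b^{(1)}}
= \frac{\partial C_k}{\partial a^{(1)}}\,
  \frac{\partial a^{(1)}}{\partial z^{(1)}}\,
  \frac{\partial z^{(1)}}{\partial b^{(1)}}
$$

The first two factors are shared by both derivatives; only the last factor differs.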

We will introduce the following derivative in the code,

$$\frac{d}{dz}\tanh(z) = \frac{1}{\cosh^2 z}.$$

Complete the function `dCdb` below. The line defining `dzdb`, towards the bottom, should hold the expression you calculated in the previous question.


In [1]:

import numpy as np

# First define our sigma function.
sigma = np.tanh

# Next define the feed-forward equation.
def a1 (w1, b1, a0) :
  z = w1 * a0 + b1
  return sigma(z)

# The individual cost function is the square of the difference between
# the network output and the training data output.
def C (w1, b1, x, y) :
  return (a1(w1, b1, x) - y)**2

# This function returns the derivative of the cost function with
# respect to the weight.
def dCdw (w1, b1, x, y) :
  z = w1 * x + b1
  dCda = 2 * (a1(w1, b1, x) - y) # Derivative of the cost with respect to the activation
  dadz = 1/np.cosh(z)**2 # Derivative of the activation with respect to the weighted sum z
  dzdw = x # Derivative of the weighted sum z with respect to the weight
  return dCda * dadz * dzdw # Return the chain rule product.

# This function returns the derivative of the cost function with
# respect to the bias.
# It is very similar to the previous function.
# You should complete this function.
def dCdb (w1, b1, x, y) :
  z = w1 * x + b1
  dCda = 2 * (a1(w1, b1, x) - y)
  dadz = 1/np.cosh(z)**2
  # Change the next line to give the derivative of
  # the weighted sum, z, with respect to the bias, b.
  dzdb = 1
  return dCda * dadz * dzdb

# Test your code before submission:
# Let's start with an unfit weight and bias.
w1 = 2.3
b1 = -1.2
# We can test on a single data point pair of x and y.
x = 0
y = 1
# Output how the cost would change
# in proportion to a small change in the bias
print( dCdb(w1, b1, x, y) )

-1.1186026425530913
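
One way to gain confidence in derivative code like `dCdw` and `dCdb` is to compare it against a finite-difference estimate. The sketch below restates the functions from the cell above so that it runs on its own. The helper `central_difference`, the step size `h`, and the non-zero test input `x = 0.5` are illustrative assumptions, not part of the original exercise.

```python
import numpy as np

sigma = np.tanh

def a1(w1, b1, a0):
    # Feed-forward equation for a single neuron.
    return sigma(w1 * a0 + b1)

def C(w1, b1, x, y):
    # Squared difference between network output and training output.
    return (a1(w1, b1, x) - y) ** 2

def dCdw(w1, b1, x, y):
    z = w1 * x + b1
    return 2 * (a1(w1, b1, x) - y) * (1 / np.cosh(z) ** 2) * x

def dCdb(w1, b1, x, y):
    z = w1 * x + b1
    return 2 * (a1(w1, b1, x) - y) * (1 / np.cosh(z) ** 2) * 1

def central_difference(f, p, h=1e-6):
    # Hypothetical helper: symmetric finite-difference estimate of f'(p).
    return (f(p + h) - f(p - h)) / (2 * h)

w1, b1 = 2.3, -1.2
# x is chosen non-zero here (an assumption) so that dC/dw is not trivially 0.
x, y = 0.5, 1.0

numeric_dCdw = central_difference(lambda w: C(w, b1, x, y), w1)
numeric_dCdb = central_difference(lambda b: C(w1, b, x, y), b1)

print("dC/dw analytic:", dCdw(w1, b1, x, y), "numeric:", numeric_dCdw)
print("dC/db analytic:", dCdb(w1, b1, x, y), "numeric:", numeric_dCdb)
```

If the analytic and numeric values disagree beyond a small tolerance, the most likely culprit is one of the three chain-rule factors.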


Recall that when we add more neurons to the network, our quantities are upgraded to vectors or matrices.


$$\mathbf{a}^{(1)} = \sigma(\mathbf{z}^{(1)}),$$

$$\mathbf{z}^{(1)} = \mathbf{W}^{(1)}\mathbf{a}^{(0)} + \mathbf{b}^{(1)}$$

The individual cost functions remain scalars. Instead of becoming vectors, the components are summed over each output neuron.

$$C_k = \sum_i \left(a^{(1)}_i - y_i\right)^2$$

Note here that $i$ labels the output neuron and is summed over, whereas $k$ labels the training example.

The training data becomes a vector too,

$x \to \mathbf{x}$ and has the same number of elements as input neurons.

$y \to \mathbf{y}$ and has the same number of elements as output neurons.

This allows us to write the cost function in vector form using the modulus squared,

$$C_k = \left|\mathbf{a}^{(1)} - \mathbf{y}\right|^2.$$

Use the code block below to play with calculating the cost function for this network.



In [3]:
import numpy as np

# Define the activation function.
sigma = np.tanh

# Let's use a random initial weight and bias.
W = np.array([[-0.94529712, -0.2667356 , -0.91219181],
              [ 2.05529992,  1.21797092,  0.22914497]])
b = np.array([ 0.61273249,  1.6422662 ])

# define our feed forward function
def a1 (a0) :
  # Notice the next line is almost the same as previously,
  # except we are using matrix multiplication rather than scalar multiplication
  # hence the '@' operator, and not the '*' operator.
  z = W @ a0 + b
  # Everything else is the same though,
  return sigma(z)

# Next, if a training example is,
x = np.array([0.7, 0.6, 0.2])
y = np.array([0.9, 0.6])

# Then the cost function is,
d = a1(x) - y # Vector difference between observed and expected activation
C = d @ d # Absolute value squared of the difference.
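
As a cross-check, the dot product `d @ d` should agree with the component-wise sum over output neurons and with the squared vector norm. The sketch below repeats the definitions from the cell above so that it runs on its own; the names `cost_dot`, `cost_sum` and `cost_norm` are introduced here purely for illustration.

```python
import numpy as np

sigma = np.tanh

# Same weights, biases and training example as in the cell above.
W = np.array([[-0.94529712, -0.2667356 , -0.91219181],
              [ 2.05529992,  1.21797092,  0.22914497]])
b = np.array([ 0.61273249,  1.6422662 ])

def a1(a0):
    # Matrix-vector feed-forward step.
    return sigma(W @ a0 + b)

x = np.array([0.7, 0.6, 0.2])  # one element per input neuron
y = np.array([0.9, 0.6])       # one element per output neuron

d = a1(x) - y

cost_dot = d @ d                    # vector form used above
cost_sum = np.sum(d ** 2)           # explicit sum over output neurons i
cost_norm = np.linalg.norm(d) ** 2  # modulus squared

print(W.shape, x.shape, y.shape)  # (2, 3) (3,) (2,)
print(cost_dot, cost_sum, cost_norm)
```

The three values should match up to floating-point rounding, and the shapes show that `W` maps three input neurons to two output neurons.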