# Gradient Descent: The Code

Previously, we saw that a single weight update can be calculated as:

$$\Delta w_i = \eta \, \delta \, x_i$$

with the error term δ as

$$\delta = (y - \hat{y}) \, f'(h) = (y - \hat{y}) \, f'\left(\sum_i w_i x_i\right)$$

Remember, in the above equation $(y - \hat{y})$ is the output error, and $f'(h)$ refers to the derivative of the activation function, $f(h)$. We'll call that derivative the output gradient.

Now I'll write this out in code for the case of only one output unit. We'll also be using the sigmoid as the activation function f(h).

import numpy as np

### Defining the sigmoid function for activations
def sigmoid(x):
    return 1/(1+np.exp(-x))

### Derivative of the sigmoid function
def sigmoid_prime(x):
    return sigmoid(x) * (1 - sigmoid(x))

### Input data
x = np.array([0.1, 0.3])
### Target
y = 0.2
### Input to output weights
weights = np.array([-0.8, 0.5])

### The learning rate, eta in the weight step equation
learnrate = 0.5

### the linear combination performed by the node (h in f(h) and f'(h))
h = x[0]*weights[0] + x[1]*weights[1]
### or h = np.dot(x, weights)

### The neural network output (y-hat)
nn_output = sigmoid(h)

### output error (y - y-hat)
error = y - nn_output

### output gradient (f'(h))
output_grad = sigmoid_prime(h)

### error term (lowercase delta)
error_term = error * output_grad

### Gradient descent step 
del_w = [ learnrate * error_term * x[0],
          learnrate * error_term * x[1]]
### or del_w = learnrate * error_term * x
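The code above computes a single weight step but never applies it. As a rough sketch of what happens when the step is added to the weights and repeated, the snippet below reruns the same calculation in plain Python (standard library only, no NumPy). The helper name `gradient_step` and the choice of 100 iterations are assumptions made for illustration; they are not part of the original exercise.

```python
import math

def sigmoid(h):
    """Logistic activation 1 / (1 + e^(-h))."""
    return 1 / (1 + math.exp(-h))

def gradient_step(x, y, weights, learnrate):
    """Hypothetical helper: one gradient descent step for a single sigmoid unit.

    Returns the updated weights and the output error (y - y_hat)
    measured before the update.
    """
    h = sum(w_i * x_i for w_i, x_i in zip(weights, x))
    y_hat = sigmoid(h)
    error = y - y_hat
    # f'(h) for the sigmoid can be written as y_hat * (1 - y_hat)
    error_term = error * y_hat * (1 - y_hat)
    new_weights = [w_i + learnrate * error_term * x_i
                   for w_i, x_i in zip(weights, x)]
    return new_weights, error

# Same data, target, weights and learning rate as above
x = [0.1, 0.3]
y = 0.2
weights = [-0.8, 0.5]
learnrate = 0.5

n_steps = 100  # assumed value, chosen only for the demonstration
errors = []
for _ in range(n_steps):
    weights, error = gradient_step(x, y, weights, learnrate)
    errors.append(error)

print("first error:", errors[0])
print("last error: ", errors[-1])
print("weights after {} steps: {}".format(n_steps, weights))
```

With these small inputs each step moves the output only slightly, so the error shrinks gradually rather than vanishing after a handful of updates.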

# Me Testing 

In [3]:
import numpy as np

# Defining the sigmoid function for activations
def sigmoid(x):
    return 1/(1+np.exp(-x))

# Derivative of the sigmoid function
def sigmoid_prime(x):
    return sigmoid(x) * (1 - sigmoid(x))

# Input data
x = np.array([0.1, 0.3])

# Target
y = 0.2
# Input to output weights
weights = np.array([-0.8, 0.5])

# The learning rate, eta in the weight step equation
learnrate = 0.5

# the linear combination performed by the node (h in f(h) and f'(h))
h = x[0]*weights[0] + x[1]*weights[1]
# or h = np.dot(x, weights)

# The neural network output (y-hat)
nn_output = sigmoid(h)

# output error (y - y-hat)
error = y - nn_output

# output gradient (f'(h))
output_grad = sigmoid_prime(h)

# error term (lowercase delta)
error_term = error * output_grad

# Gradient descent step 
del_w = [ learnrate * error_term * x[0],
          learnrate * error_term * x[1]]
# or del_w = learnrate * error_term * x

# Me Testing 


weight_test = -0.2 = w1

inputs_test = 7 = x1

weight_grades = -1 = w2

inputs_grades = 4 = x2

__The learning rate, eta in the weight step equation__
learnrate = 0.5 = η

__True target value__
y = 0.2

(0.5 × ((0.2 - (1/(1+2.71828183^(-0.2×7+-1×4)))) × ((1/(1+2.71828183^(-0.2×7+-1×4))) × (1 - (1/(1+2.71828183^(-0.2×7+-1×4)))))) × 7) = -0.01246251922

(-0.2×7+-1×4) = -5.4 = __Linear combination output = ∑ wi × xi__

(0.5 × ((0.2 - (1/(1+2.71828183^-5.4))) × ((1/(1+2.71828183^-5.4)) × (1 - (1/(1+2.71828183^-5.4))))) × 7)

(1/(1+2.71828183^-5.4)) = 0.9955037268390589 = __Sigmoid function output  f(h)__

(0.5 × ((0.2 - 0.9955037268390589) × (0.9955037268390589 × (1 - 0.9955037268390589))) × 7)

(0.9955037268390589 × (1 - 0.9955037268390589)) = 0.0044760566886033444 = __Derivative of the sigmoid function = f′(h)__

(0.5 × ((0.2 - 0.9955037268390589) × 0.0044760566886033444) × 7) 

(0.2 - 0.9955037268390589) = -0.7955037268390588 = __Output error = E__

(0.5 × (-0.7955037268390588 × 0.0044760566886033444) × 7) 

(-0.7955037268390588 × 0.0044760566886033444) = -0.003560719777326857 = __Error term (lowercase delta) = δ__

((0.5 × -0.003560719777326857 × 7), (0.5 × -0.003560719777326857 × 4)) = -0.012462519220644, -0.0071214395546537138 = __Gradient descent step = ηδxi = Δwi__

((-0.2 + 0.5 × -0.003560719777326857 × 7), (-1 + 0.5 × -0.003560719777326857 × 4)) = -0.21246251922064402, -1.0071214395546537 = __Updated weights = wi + ηδxi = wi + Δwi__
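One thing to watch in the hand calculation above: it raises e to the power h = -5.4, whereas the sigmoid used in the code at the top of these notes is 1/(1 + e^(-h)), which would raise e to +5.4. The sketch below redoes the same steps with that definition, using only the standard library. The variable names mirror the ones used in these notes; treat it as a cross-check under that assumption, not as the intended answer to the exercise.

```python
import math

def sigmoid(h):
    """Standard logistic function 1 / (1 + e^(-h))."""
    return 1 / (1 + math.exp(-h))

weights = [-0.2, -1.0]   # weight_test, weight_grades
inputs = [7, 4]          # inputs_test, inputs_grades
learnrate = 0.5
y = 0.2

# Linear combination h = sum(w_i * x_i), about -5.4
h = sum(w_i * x_i for w_i, x_i in zip(weights, inputs))

# Network output y-hat = f(h); close to 0 because h is strongly negative
nn_output = sigmoid(h)

# Output error (y - y-hat)
error = y - nn_output

# Output gradient f'(h) = f(h) * (1 - f(h))
output_grad = nn_output * (1 - nn_output)

# Error term (lowercase delta)
error_term = error * output_grad

# Weight step and updated weights
del_w = [learnrate * error_term * x_i for x_i in inputs]
new_weights = [w_i + dw_i for w_i, dw_i in zip(weights, del_w)]

print("h =", h)
print("nn_output =", nn_output)
print("error =", error)
print("output_grad =", output_grad)
print("error_term =", error_term)
print("del_w =", del_w)
print("new_weights =", new_weights)
```

The output gradient is unchanged, because f(h)(1 - f(h)) is symmetric in h. The output, however, now sits below the target of 0.2, so the error is positive and both weights are nudged upward instead of downward.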

In [75]:
import numpy as np

weight_test = -0.2
weight_grades = -1
bias = -1.0
inputs_test = 7
inputs_grades = 4

# Linear combination
lc = weight_test*inputs_test + weight_grades*inputs_grades
print("Linear combination output [{}] and Linear combination with bias [{}]".format(lc, lc+bias))

# Heaviside step function
hstep_f = 0
if bias + lc < 0:
    hstep_f = 0
elif bias + lc >= 0:
    hstep_f = 1 
print("Heaviside step function output [{}]".format(hstep_f))
np.exp(lc)

# Sigmoid is 1 / (1 + exp(-h)). The lines below use exp(lc), not exp(-lc),
# so they evaluate sigmoid(-lc), i.e. sigmoid(5.4), rather than sigmoid(lc).
print("Exponential number is [{}]".format(np.exp([1,1,-3])))
print("Exponential of Linear combination [{}]".format(np.exp(lc)))
print("Exponential of Linear combination [{}]".format(1+np.exp(lc)))
print("Exponential of Linear combination [{}]".format(1/(1+np.exp(lc))))
sigmoid_fun = 1/(1+np.exp(lc))
print("Sigmoid function output [{}]".format(sigmoid_fun))

# Target
y = 0.2

# output error (y - y-hat)
error = y - sigmoid_fun
print("Output error (y - y-hat) [{}]".format(error))

# Derivative of the sigmoid function
# output gradient (f'(h))
sigmoid_prime = sigmoid_fun * (1 - sigmoid_fun)
print("Derivative of the sigmoid function [{}]".format(sigmoid_prime))

# error term (lowercase delta)
error_term = error * sigmoid_prime
print("Error term (lowercase delta) [{}]".format(error_term))

# The learning rate, eta in the weight step equation
learnrate = 0.5

# Gradient descent step alone (the weight changes, not yet added to the weights)
del_w2 = [ learnrate * error_term * inputs_test,
        learnrate * error_term * inputs_grades]
print("Gradient descent step with out weight_test and weight_grades {}".format(del_w2))

# Updated weights: w_i + learnrate * error_term * x_i
# (despite the name, del_w here holds the new weights, not the weight change)
del_w = [ weight_test + learnrate * error_term * inputs_test,
          weight_grades + learnrate * error_term * inputs_grades]
print("Gradient descent step with weight_test and weight_grades {}".format(del_w))

Linear combination output [-5.4] and Linear combination with bias [-6.4]
Heaviside step function output [0]
Exponential number is [[ 2.71828183  2.71828183  0.04978707]]
Exponential of Linear combination [0.004516580942612666]
Exponential of Linear combination [1.0045165809426126]
Exponential of Linear combination [0.9955037268390589]
Sigmoid function output [0.9955037268390589]
Output error (y - y-hat) [-0.7955037268390588]
Derivative of the sigmoid function [0.0044760566886033444]
Error term (lowercase delta) [-0.003560719777326857]
Gradient descent step with out weight_test and weight_grades [-0.012462519220644, -0.0071214395546537138]
Gradient descent step with weight_test and weight_grades [-0.21246251922064402, -1.0071214395546537]


# Gradient Descent: The Code test

In [32]:
import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1/(1+np.exp(-x))

def sigmoid_prime(x):
    """
    Derivative of the sigmoid function
    """
    return sigmoid(x) * (1 - sigmoid(x))

learnrate = 0.5
x = np.array([1, 2, 3, 4])
y = np.array(0.5)

# Initial weights
w = np.array([0.5, -0.5, 0.3, 0.1])

### Calculate one gradient descent step for each weight
### Note: Some steps have been consolidated, so there are
###       fewer variable names than in the above sample code

# TODO: Calculate the node's linear combination of inputs and weights
#for i in range(len(x)):
    #h = w[i]*x[i]

h = w[0]*x[0] + w[1]*x[1] + w[2]*x[2] + w[3]*x[3]
print(h)
print(np.dot(x, w))

#for xhere, where in zip(x, w):
   #print(xhere,where)
    #h = h + w[0]*x[0]

# TODO: Calculate output of neural network
nn_output = sigmoid(h)

# TODO: Calculate error of neural network
error = y - nn_output

# output gradient (f'(h))
output_grad = sigmoid_prime(h)

# TODO: Calculate the error term
#       Remember, this requires the output gradient, which we haven't
#       specifically added a variable for.
error_term = error * output_grad

# TODO: Calculate change in weights
del_w = [ learnrate * error_term * x ]

print('Neural Network output:')
print(nn_output)
print('Amount of Error:')
print(error)
print('Change in Weights:')
print(del_w)

0.8
0.8
Neural Network output:
0.689974481128
Amount of Error:
-0.189974481128
Change in Weights:
[array([-0.02031869, -0.04063738, -0.06095608, -0.08127477])]
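A quick way to sanity-check the printed weight changes is to compare them against a numerical gradient. The sketch below assumes the update rule comes from the squared error E = ½(y − ŷ)², which these notes do not state explicitly, and approximates −η ∂E/∂w_i with central differences. The function name `squared_error` and the step size `eps` are illustrative choices, not part of the exercise.

```python
import math

def sigmoid(h):
    """Logistic activation 1 / (1 + e^(-h))."""
    return 1 / (1 + math.exp(-h))

def squared_error(w, x, y):
    """Assumed loss: E = 0.5 * (y - y_hat)^2 for one sigmoid output unit."""
    h = sum(w_i * x_i for w_i, x_i in zip(w, x))
    return 0.5 * (y - sigmoid(h)) ** 2

learnrate = 0.5
x = [1, 2, 3, 4]
y = 0.5
w = [0.5, -0.5, 0.3, 0.1]

# Analytic weight change: learnrate * error_term * x_i
h = sum(w_i * x_i for w_i, x_i in zip(w, x))
y_hat = sigmoid(h)
error_term = (y - y_hat) * y_hat * (1 - y_hat)
analytic = [learnrate * error_term * x_i for x_i in x]

# Numerical weight change: -learnrate * dE/dw_i via central differences
eps = 1e-6  # assumed finite-difference step
numeric = []
for i in range(len(w)):
    w_plus = list(w)
    w_minus = list(w)
    w_plus[i] += eps
    w_minus[i] -= eps
    dE_dw = (squared_error(w_plus, x, y) - squared_error(w_minus, x, y)) / (2 * eps)
    numeric.append(-learnrate * dE_dw)

for a, n in zip(analytic, numeric):
    print("analytic {:+.8f}   numeric {:+.8f}".format(a, n))
```

If the two columns agree to several decimal places, the error term and the weight step were most likely coded correctly.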
