<a href="https://colab.research.google.com/github/jpscard/backpropagation_tarefa_3/blob/main/backpropagation_JP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Example code

In [1]:
''' Code to study MLP training with backpropagation by tracking the training steps.
The neural network is a fully-connected multilayer perceptron with 1 hidden layer
and 2 inputs, 2 neurons in the hidden layer and 2 output neurons.
Hence, the weight matrices (also called kernels by Keras) have dimensions:
W1 for layer 1 has dimension 2 x 2 and W2 of layer 2 has dimension 2 x 2.
Recall that when implementing the equations, we use the transposes. For instance,
the net output of layer 1 is W1^T x, where x is a column vector of dimension 2 x 1.
All neurons have a bias weight.

History:
I created this TF v2 version as follows:
1) Obtained a TensorFlow v1 version from
 https://github.com/keras-team/keras/issues/956 and saved it as
 backpropagation_tf1_example.py. This code was not compatible with TF v2. 
2) Then I used
tf_upgrade_v2 --infile backpropagation_tf1_example.py --outfile backpropagation_tf2_example.py
to create a TF v2 version.
3) According to https://stackoverflow.com/questions/66221788/tf-gradients-is-not-supported-when-eager-execution-is-enabled-use-tf-gradientta
I added the line:
tf.compat.v1.disable_eager_execution()
and the code executed properly.
4) I made several modifications to initialize the weights with known values
and help tracking the training procedure.

Aldebaro, UFPA, Jan 2022.
'''

from keras.models import Sequential
from keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from keras import backend as k
from keras import losses
import numpy as np
import tensorflow as tf
from sklearn.metrics import mean_squared_error

#Initialize all weights to match the values adopted in the example at
#https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/ 
#Initialise kernel (weights matrix) to required value
def kernel_init_layer1(shape,dtype='float'):
    # In this simple example, assume that the shape is (2, 2) 
    # and create a fixed array with this dimension for layer 1
    kernel = np.array([[0.15, 0.20],[0.25, 0.30]])
    return kernel 

#Initialise kernel (weights matrix) to required value
def kernel_init_layer2(shape,dtype='float'):
    # In this simple example, assume that the shape is (2, 2)
    # and create a fixed array with this dimension for layer 2
    kernel = np.array([[0.40, 0.45], [0.50, 0.55]])
    return kernel 

#Initialise kernel (bias vector) to required value
def kernel_init_bias1(shape,dtype='float'):
    # In this simple example, assume that the shape is (2,)
    # and create a fixed array with this dimension for the 
    # bias vector of layer 1
    kernel = np.array([0.35, 0.35]) #both neurons with the same value
    return kernel 

#Initialise kernel (bias vector) to required value
def kernel_init_bias2(shape,dtype='float'):
    # In this simple example, assume that the shape is (2,)
    # and create a fixed array with this dimension for the 
    # bias vector of layer 2
    kernel = np.array([0.60, 0.60]) #both neurons with the same value
    return kernel 

# from https://stackoverflow.com/questions/66221788/tf-gradients-is-not-supported-when-eager-execution-is-enabled-use-tf-gradientta
# k.gradients needs graph mode, but TF 2 runs eagerly by default, so disable eager execution:
tf.compat.v1.disable_eager_execution()

#Define the neural network model with dense layers. Syntax:
#https://keras.io/api/layers/core_layers/dense/
#The sigmoid activation function in Keras is the standard logistic function 1/(1+exp(-x)).
#https://keras.io/api/layers/activations/
model = Sequential()
model.add(Dense(2, input_dim=2, use_bias=True,  bias_initializer=kernel_init_bias1,
        kernel_initializer=kernel_init_layer1, activation='sigmoid'))
model.add(Dense(2, use_bias=True,  bias_initializer=kernel_init_bias2, 
        kernel_initializer=kernel_init_layer2, activation='sigmoid'))
model.summary() # display the architecture

# We are not going to use a Keras optimizer. Define a learning rate:
learning_rate = 1  #you can change to 0.5 or any other reasonable value

#now compile the model, specifying the loss and the performance metrics that
#should be computed during training
model.compile(loss='mse', metrics=['accuracy'])

# Begin TensorFlow
sess = tf.compat.v1.InteractiveSession()
sess.run(tf.compat.v1.initialize_all_variables())

model.optimizer.lr = learning_rate

print("===START TRAINING THE NETWORK:===")
steps = 3 # steps of gradient descent
for s in range(steps):
    print('Learning rate = ', k.get_value(model.optimizer.lr))
    print(" ############ Step = " + str(s) + " ############")
    print("1) ===BEFORE using any GRADIENT:===")
    #define input and target vectors, and also inform it to the loss function object
    #in this code we will keep the same inputs and targets along the steps, but the inputs and
    #targets can change when we go, for instance, through the training set
    inputs = np.array([[0.05, 0.10]]) #define the input for the current iteration (step)
    outputs = model.predict(inputs) #forward pass
    #define the target vector for the current iteration (step)
    targets = np.array([[0.01, 0.99]]) #notice that a sigmoid can output within range [0, 1]
    mse = mean_squared_error(targets, outputs) #calculate MSE
    #initialize loss object to be later incorporated to the gradients object that
    #enables the calculation of the symbolic gradients
    loss = losses.mean_squared_error(targets, model.output)
    #  ===== Obtain symbolic gradient to calculate numerical gradients =====
    gradients = k.gradients(loss, model.trainable_weights) #inform loss and weights
    if False: #enable with True in case you want to see the objects
        print("List of tensors representing the symbolic gradients:")
        for i in range(len(gradients)):
            print('symbolic gradient[',i,']=',gradients[i])
    print('Network input [x1, x2]:\n', inputs)
    print('Network output [out_o1, out_o2]:\n', outputs)
    print("targets:\n", targets)
    #show weights at the beginning of iteration s
    print('weights at the beginning of this step')
    for i in range(len(model.trainable_weights)):
        #note that model.trainable_weights[i] is an object of
        # <class 'tensorflow.python.ops.resource_variable_ops.ResourceVariable
        #therefore (see e.g. https://stackoverflow.com/questions/33679382/how-do-i-get-the-current-value-of-a-variable )
        #you need to obtain its value via a TF session:
        print('weights[',i,']=',sess.run(model.trainable_weights[i]))
    print("MSE with the initial weights:", mse)

    print("2) ===After applying GRADIENT:===")
    print("------------- Results for step (iteration) =", s)
    # ===== Calculate numerical gradient from symbolic gradients =====
    # evaluated_gradients is a list, show its contents:
    evaluated_gradients = sess.run(gradients, feed_dict={model.input: inputs})
    print('Gradients g to be used in new_weights = current_weights - learning_rate*g')
    for i in range(len(evaluated_gradients)):
        print('gradients[',i,']=',evaluated_gradients[i])

    # Apply ("step down") the gradient for each layer, subtracting the gradients
    # from current weights scaled by the learning rate:
    for i in range(len(model.trainable_weights)):
        sess.run(tf.compat.v1.assign_sub(model.trainable_weights[i], learning_rate*evaluated_gradients[i]))

    #show weights after gradient propagation of iteration s
    print('weights after gradient propagation in this step')
    for i in range(len(model.trainable_weights)):
        print('weights[',i,']=',sess.run(model.trainable_weights[i]))

    # print the MSE with new weights:
    outputs = model.predict(inputs)
    mse = mean_squared_error(targets, outputs)
    print("MSE with the new weights:", mse)

#Collect and show final results
final_outputs = model.predict(inputs)
final_mse = mean_squared_error(targets, final_outputs)

print("\n ===AFTER executing all GRADIENT descent steps===")
#show weights at the end of training
for i in range(len(model.trainable_weights)):
    #note that model.trainable_weights[i] is an object of
    # <class 'tensorflow.python.ops.resource_variable_ops.ResourceVariable
    #therefore (see e.g. https://stackoverflow.com/questions/33679382/how-do-i-get-the-current-value-of-a-variable )
    #you need to obtain its value via a TF session:
    print('final weights[',i,']=',sess.run(model.trainable_weights[i]))
print("outputs:\n", final_outputs)
print("targets:\n", targets)
print("Final MSE = ", final_mse)

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 2)                 6         
                                                                 
 dense_1 (Dense)             (None, 2)                 6         
                                                                 
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________


Instructions for updating:
Use `tf.global_variables_initializer` instead.

===START TRAINING THE NETWORK:===
Learning rate =  1
 ############ Step = 0 ############
1) ===BEFORE using any GRADIENT:===
Network input [x1, x2]:
 [[0.05 0.1 ]]
Network output [out_o1, out_o2]:
 [[0.7569319 0.7677179]]
targets:
 [[0.01 0.99]]
weights at the beginning of this step
weights[ 0 ]= [[0.15 0.2 ]
 [0.25 0.3 ]]
weights[ 1 ]= [0.35 0.35]
weights[ 2 ]= [[0.4  0.45]
 [0.5  0.55]]
weights[ 3 ]= [0.6 0.6]
MSE with the initial weights: 0.3036582988081465
2) ===After applying GRADIENT:===
------------- Results for step (iteration) = 0
Gradients g to be used in new_weights = current_weights - learning_rate*g
gradients[ 0 ]= [[0.00044758 0.00056464]
 [0.00089517 0.00112929]]
gradients[ 1 ]= [0.00895169 0.01129289]
gradients[ 2 ]= [[ 0.08169585 -0.02356439]
 [ 0.08194415 -0.02363601]]
gradients[ 3 ]= [ 0.137425   -0.03963893]
weights after gradient propagation in this step
weights[ 0 ]= [[0.14955242 0.19943535]
 [0.24910483 0.2988707 ]]
weights[ 1 ]= [0.3410483 0.3387071]
weights[ 2 
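
The step-0 values printed above can be cross-checked without TensorFlow. The following is a minimal NumPy sketch, not the notebook's own method. It assumes the Keras convention `output = sigmoid(x @ kernel + bias)` and an MSE loss averaged over the two outputs. The helper names `sigmoid`, `forward` and `gradients` are introduced here for illustration only.

```python
import numpy as np

def sigmoid(z):
    # standard logistic function 1/(1+exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    # hidden activations and network outputs for a batch of row vectors
    h = sigmoid(x @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    return h, out

def gradients(x, t, W1, b1, W2, b2):
    # backpropagation for MSE = mean over outputs of (out - t)^2
    h, out = forward(x, W1, b1, W2, b2)
    d_out = 2.0 * (out - t) / out.shape[1]
    delta_o = d_out * out * (1.0 - out)          # output-layer deltas
    delta_h = (delta_o @ W2.T) * h * (1.0 - h)   # hidden-layer deltas
    # same ordering as model.trainable_weights: W1, b1, W2, b2
    return [x.T @ delta_h, delta_h.sum(axis=0), h.T @ delta_o, delta_o.sum(axis=0)]

# initial values taken from the example cell above
x = np.array([[0.05, 0.10]])
t = np.array([[0.01, 0.99]])
W1 = np.array([[0.15, 0.20], [0.25, 0.30]])
b1 = np.array([0.35, 0.35])
W2 = np.array([[0.40, 0.45], [0.50, 0.55]])
b2 = np.array([0.60, 0.60])

_, out = forward(x, W1, b1, W2, b2)
mse = np.mean((t - out) ** 2)
grads = gradients(x, t, W1, b1, W2, b2)

print("outputs:", out)
print("MSE:", mse)
for i, g in enumerate(grads):
    print("gradients[", i, "]=", g)
```

Under these assumptions the results should agree with the TensorFlow output up to float32 rounding.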

# 1. Application code for sample_1 (1.2, 0, 1, -1.3)

## 1.1 For learning rate = 1

In [5]:
from keras.models import Sequential
from keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from keras import backend as k
from keras import losses
import numpy as np
import tensorflow as tf
from sklearn.metrics import mean_squared_error

#Initialize all weights to the fixed values used in this exercise (they differ from those in the example at
#https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/ )
#Initialise kernel (weights matrix) to required value
def kernel_init_layer1(shape,dtype='float'):
    # In this simple example, assume that the shape is (2, 2) 
    # and create a fixed array with this dimension for layer 1
    kernel = np.array([[0.1, 0.3],[-0.4, 0.2]])
    return kernel 

#Initialise kernel (weights matrix) to required value
def kernel_init_layer2(shape,dtype='float'):
    # In this simple example, assume that the shape is (2, 2)
    # and create a fixed array with this dimension for layer 2
    kernel = np.array([[0.4, 0.2], [-0.4, 0.7]])
    return kernel 

#Initialise kernel (bias vector) to required value
def kernel_init_bias1(shape,dtype='float'):
    # In this simple example, assume that the shape is (2,)
    # and create a fixed array with this dimension for the
    # bias vector of layer 1
    kernel = np.array([-0.4, 0.8]) #one bias value per neuron
    return kernel 

#Initialise kernel (bias vector) to required value
def kernel_init_bias2(shape,dtype='float'):
    # In this simple example, assume that the shape is (2,)
    # and create a fixed array with this dimension for the
    # bias vector of layer 2
    kernel = np.array([0.8, 0.2]) #one bias value per neuron
    return kernel 

# from https://stackoverflow.com/questions/66221788/tf-gradients-is-not-supported-when-eager-execution-is-enabled-use-tf-gradientta
# k.gradients needs graph mode, but TF 2 runs eagerly by default, so disable eager execution:
tf.compat.v1.disable_eager_execution()

#Define the neural network model with dense layers. Syntax:
#https://keras.io/api/layers/core_layers/dense/
#The sigmoid activation function in Keras is the standard logistic function 1/(1+exp(-x)).
#https://keras.io/api/layers/activations/
model = Sequential()
model.add(Dense(2, input_dim=2, use_bias=True,  bias_initializer=kernel_init_bias1,
        kernel_initializer=kernel_init_layer1, activation='sigmoid'))
model.add(Dense(2, use_bias=True,  bias_initializer=kernel_init_bias2, 
        kernel_initializer=kernel_init_layer2, activation='sigmoid'))
model.summary() # display the architecture

# We are not going to use a Keras optimizer. Define a learning rate:
learning_rate = 1  #you can change to 0.5 or any other reasonable value

#now compile the model, specifying the loss and the performance metrics that
#should be computed during training
model.compile(loss='mse', metrics=['accuracy'])

# Begin TensorFlow
sess = tf.compat.v1.InteractiveSession()
sess.run(tf.compat.v1.initialize_all_variables())

model.optimizer.lr = learning_rate

print("===START TRAINING THE NETWORK:===")
steps = 3 # steps of gradient descent
for s in range(steps):
    print('Learning rate = ', k.get_value(model.optimizer.lr))
    print(" ############ Step = " + str(s) + " ############")
    print("1) ===BEFORE using any GRADIENT:===")
    #define input and target vectors, and also inform it to the loss function object
    #in this code we will keep the same inputs and targets along the steps, but the inputs and
    #targets can change when we go, for instance, through the training set
    inputs = np.array([[1.2,0]]) #define the input for the current iteration (step)
    outputs = model.predict(inputs) #forward pass
    #define the target vector for the current iteration (step)
    targets = np.array([[1, -1.3]]) #a sigmoid outputs values in (0, 1), so the target -1.3 lies outside the reachable range
    mse = mean_squared_error(targets, outputs) #calculate MSE
    #initialize loss object to be later incorporated to the gradients object that
    #enables the calculation of the symbolic gradients
    loss = losses.mean_squared_error(targets, model.output)
    #  ===== Obtain symbolic gradient to calculate numerical gradients =====
    gradients = k.gradients(loss, model.trainable_weights) #inform loss and weights
    if False: #enable with True in case you want to see the objects
        print("List of tensors representing the symbolic gradients:")
        for i in range(len(gradients)):
            print('symbolic gradient[',i,']=',gradients[i])
    print('Network input [x1, x2]:\n', inputs)
    print('Network output [out_o1, out_o2]:\n', outputs)
    print("targets:\n", targets)
    #show weights at the beginning of iteration s
    print('weights at the beginning of this step')
    for i in range(len(model.trainable_weights)):
        #note that model.trainable_weights[i] is an object of
        # <class 'tensorflow.python.ops.resource_variable_ops.ResourceVariable
        #therefore (see e.g. https://stackoverflow.com/questions/33679382/how-do-i-get-the-current-value-of-a-variable )
        #you need to obtain its value via a TF session:
        print('weights[',i,']=',sess.run(model.trainable_weights[i]))
    print("MSE with the initial weights:", mse)

    print("2) ===After applying GRADIENT:===")
    print("------------- Results for step (iteration) =", s)
    # ===== Calculate numerical gradient from symbolic gradients =====
    # evaluated_gradients is a list, show its contents:
    evaluated_gradients = sess.run(gradients, feed_dict={model.input: inputs})
    print('Gradients g to be used in new_weights = current_weights - learning_rate*g')
    for i in range(len(evaluated_gradients)):
        print('gradients[',i,']=',evaluated_gradients[i])

    # Apply ("step down") the gradient for each layer, subtracting the gradients
    # from current weights scaled by the learning rate:
    for i in range(len(model.trainable_weights)):
        sess.run(tf.compat.v1.assign_sub(model.trainable_weights[i], learning_rate*evaluated_gradients[i]))

    #show weights after gradient propagation of iteration s
    print('weights after gradient propagation in this step')
    for i in range(len(model.trainable_weights)):
        print('weights[',i,']=',sess.run(model.trainable_weights[i]))

    # print the MSE with new weights:
    outputs = model.predict(inputs)
    mse = mean_squared_error(targets, outputs)
    print("MSE with the new weights:", mse)

#Collect and show final results
final_outputs = model.predict(inputs)
final_mse = mean_squared_error(targets, final_outputs)

print("\n ===AFTER executing all GRADIENT descent steps===")
#show weights at the end of training
for i in range(len(model.trainable_weights)):
    #note that model.trainable_weights[i] is an object of
    # <class 'tensorflow.python.ops.resource_variable_ops.ResourceVariable
    #therefore (see e.g. https://stackoverflow.com/questions/33679382/how-do-i-get-the-current-value-of-a-variable )
    #you need to obtain its value via a TF session:
    print('final weights[',i,']=',sess.run(model.trainable_weights[i]))
print("outputs:\n", final_outputs)
print("targets:\n", targets)
print("Final MSE = ", final_mse)

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_8 (Dense)             (None, 2)                 6         
                                                                 
 dense_9 (Dense)             (None, 2)                 6         
                                                                 
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________
===START TRAINING THE NETWORK:===
Learning rate =  1
 ############ Step = 0 ############
1) ===BEFORE using any GRADIENT:===


Network input [x1, x2]:
 [[1.2 0. ]]
Network output [out_o1, out_o2]:
 [[0.6609764 0.6940291]]
targets:
 [[ 1.  -1.3]]
weights at the beginning of this step
weights[ 0 ]= [[ 0.1  0.3]
 [-0.4  0.2]]
weights[ 1 ]= [-0.4  0.8]
weights[ 2 ]= [[ 0.4  0.2]
 [-0.4  0.7]]
weights[ 3 ]= [0.8 0.2]
MSE with the initial weights: 2.0455445087618966
2) ===After applying GRADIENT:===
------------- Results for step (iteration) = 0
Gradients g to be used in new_weights = current_weights - learning_rate*g
gradients[ 0 ]= [[0.01597462 0.07125631]
 [0.         0.        ]]
gradients[ 1 ]= [0.01331218 0.05938026]
gradients[ 2 ]= [[-0.03270185  0.18227024]
 [-0.05783894  0.3223768 ]]
gradients[ 3 ]= [-0.07597064  0.42343745]
weights after gradient propagation in this step
weights[ 0 ]= [[ 0.08402538  0.2287437 ]
 [-0.4         0.2       ]]
weights[ 1 ]= [-0.4133122  0.7406198]
weights[ 2 ]= [[ 0.43270186  0.01772976]
 [-0.34216106  0.3776232 ]]
weights[ 3 ]= [ 0.87597066 -0.22343744]
MSE with the new weight
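
The zero second row of `gradients[0]` above follows from the input. Assuming the convention `output = sigmoid(x @ kernel + bias)`, the gradient of the loss with respect to the first-layer kernel is the outer product of the input vector and the hidden-layer deltas, and for a single sample those deltas equal the first-layer bias gradient `gradients[1]`. Since x2 = 0, the weights leaving the second input receive no update. A small NumPy sketch of this check, using the printed values (the variable names are illustrative):

```python
import numpy as np

x = np.array([1.2, 0.0])                       # network input [x1, x2]
delta_h = np.array([0.01331218, 0.05938026])   # printed gradients[1] (bias gradient of layer 1)

# dL/dW1[i, j] = x[i] * delta_h[j]
grad_W1 = np.outer(x, delta_h)
print(grad_W1)
```

The second row stays at zero for as long as x2 is zero, which is why `weights[0]` keeps its second row `[-0.4, 0.2]` after the update.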

## 1.2 For learning rate = 0.5

In [6]:
from keras.models import Sequential
from keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from keras import backend as k
from keras import losses
import numpy as np
import tensorflow as tf
from sklearn.metrics import mean_squared_error

#Initialize all weights to the fixed values used in this exercise (they differ from those in the example at
#https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/ )
#Initialise kernel (weights matrix) to required value
def kernel_init_layer1(shape,dtype='float'):
    # In this simple example, assume that the shape is (2, 2) 
    # and create a fixed array with this dimension for layer 1
    kernel = np.array([[0.1, 0.3],[-0.4, 0.2]])
    return kernel 

#Initialise kernel (weights matrix) to required value
def kernel_init_layer2(shape,dtype='float'):
    # In this simple example, assume that the shape is (2, 2)
    # and create a fixed array with this dimension for layer 2
    kernel = np.array([[0.4, 0.2], [-0.4, 0.7]])
    return kernel 

#Initialise kernel (bias vector) to required value
def kernel_init_bias1(shape,dtype='float'):
    # In this simple example, assume that the shape is (2,)
    # and create a fixed array with this dimension for the
    # bias vector of layer 1
    kernel = np.array([-0.4, 0.8]) #one bias value per neuron
    return kernel 

#Initialise kernel (bias vector) to required value
def kernel_init_bias2(shape,dtype='float'):
    # In this simple example, assume that the shape is (2,)
    # and create a fixed array with this dimension for the
    # bias vector of layer 2
    kernel = np.array([0.8, 0.2]) #one bias value per neuron
    return kernel 

# from https://stackoverflow.com/questions/66221788/tf-gradients-is-not-supported-when-eager-execution-is-enabled-use-tf-gradientta
# k.gradients needs graph mode, but TF 2 runs eagerly by default, so disable eager execution:
tf.compat.v1.disable_eager_execution()

#Define the neural network model with dense layers. Syntax:
#https://keras.io/api/layers/core_layers/dense/
#The sigmoid activation function in Keras is the standard logistic function 1/(1+exp(-x)).
#https://keras.io/api/layers/activations/
model = Sequential()
model.add(Dense(2, input_dim=2, use_bias=True,  bias_initializer=kernel_init_bias1,
        kernel_initializer=kernel_init_layer1, activation='sigmoid'))
model.add(Dense(2, use_bias=True,  bias_initializer=kernel_init_bias2, 
        kernel_initializer=kernel_init_layer2, activation='sigmoid'))
model.summary() # display the architecture

# We are not going to use a Keras optimizer. Define a learning rate:
learning_rate = 0.5  #you can change to 1 or any other reasonable value

#now compile the model, specifying the loss and the performance metrics that
#should be computed during training
model.compile(loss='mse', metrics=['accuracy'])

# Begin TensorFlow
sess = tf.compat.v1.InteractiveSession()
sess.run(tf.compat.v1.initialize_all_variables())

model.optimizer.lr = learning_rate

print("===START TRAINING THE NETWORK:===")
steps = 3 # steps of gradient descent
for s in range(steps):
    print('Learning rate = ', k.get_value(model.optimizer.lr))
    print(" ############ Step = " + str(s) + " ############")
    print("1) ===BEFORE using any GRADIENT:===")
    #define input and target vectors, and also inform it to the loss function object
    #in this code we will keep the same inputs and targets along the steps, but the inputs and
    #targets can change when we go, for instance, through the training set
    inputs = np.array([[1.2,0]]) #define the input for the current iteration (step)
    outputs = model.predict(inputs) #forward pass
    #define the target vector for the current iteration (step)
    targets = np.array([[1, -1.3]]) #a sigmoid outputs values in (0, 1), so the target -1.3 lies outside the reachable range
    mse = mean_squared_error(targets, outputs) #calculate MSE
    #initialize loss object to be later incorporated to the gradients object that
    #enables the calculation of the symbolic gradients
    loss = losses.mean_squared_error(targets, model.output)
    #  ===== Obtain symbolic gradient to calculate numerical gradients =====
    gradients = k.gradients(loss, model.trainable_weights) #inform loss and weights
    if False: #enable with True in case you want to see the objects
        print("List of tensors representing the symbolic gradients:")
        for i in range(len(gradients)):
            print('symbolic gradient[',i,']=',gradients[i])
    print('Network input [x1, x2]:\n', inputs)
    print('Network output [out_o1, out_o2]:\n', outputs)
    print("targets:\n", targets)
    #show weights at the beginning of iteration s
    print('weights at the beginning of this step')
    for i in range(len(model.trainable_weights)):
        #note that model.trainable_weights[i] is an object of
        # <class 'tensorflow.python.ops.resource_variable_ops.ResourceVariable
        #therefore (see e.g. https://stackoverflow.com/questions/33679382/how-do-i-get-the-current-value-of-a-variable )
        #you need to obtain its value via a TF session:
        print('weights[',i,']=',sess.run(model.trainable_weights[i]))
    print("MSE with the initial weights:", mse)

    print("2) ===After applying GRADIENT:===")
    print("------------- Results for step (iteration) =", s)
    # ===== Calculate numerical gradient from symbolic gradients =====
    # evaluated_gradients is a list, show its contents:
    evaluated_gradients = sess.run(gradients, feed_dict={model.input: inputs})
    print('Gradients g to be used in new_weights = current_weights - learning_rate*g')
    for i in range(len(evaluated_gradients)):
        print('gradients[',i,']=',evaluated_gradients[i])

    # Apply ("step down") the gradient for each layer, subtracting the gradients
    # from current weights scaled by the learning rate:
    for i in range(len(model.trainable_weights)):
        sess.run(tf.compat.v1.assign_sub(model.trainable_weights[i], learning_rate*evaluated_gradients[i]))

    #show weights after gradient propagation of iteration s
    print('weights after gradient propagation in this step')
    for i in range(len(model.trainable_weights)):
        print('weights[',i,']=',sess.run(model.trainable_weights[i]))

    # print the MSE with new weights:
    outputs = model.predict(inputs)
    mse = mean_squared_error(targets, outputs)
    print("MSE with the new weights:", mse)

#Collect and show final results
final_outputs = model.predict(inputs)
final_mse = mean_squared_error(targets, final_outputs)

print("\n ===AFTER executing all GRADIENT descent steps===")
#show weights at the end of training
for i in range(len(model.trainable_weights)):
    #note that model.trainable_weights[i] is an object of
    # <class 'tensorflow.python.ops.resource_variable_ops.ResourceVariable
    #therefore (see e.g. https://stackoverflow.com/questions/33679382/how-do-i-get-the-current-value-of-a-variable )
    #you need to obtain its value via a TF session:
    print('final weights[',i,']=',sess.run(model.trainable_weights[i]))
print("outputs:\n", final_outputs)
print("targets:\n", targets)
print("Final MSE = ", final_mse)

Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_10 (Dense)            (None, 2)                 6         
                                                                 
 dense_11 (Dense)            (None, 2)                 6         
                                                                 
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________
===START TRAINING THE NETWORK:===
Learning rate =  0.5
 ############ Step = 0 ############
1) ===BEFORE using any GRADIENT:===


Network input [x1, x2]:
 [[1.2 0. ]]
Network output [out_o1, out_o2]:
 [[0.6609764 0.6940291]]
targets:
 [[ 1.  -1.3]]
weights at the beginning of this step
weights[ 0 ]= [[ 0.1  0.3]
 [-0.4  0.2]]
weights[ 1 ]= [-0.4  0.8]
weights[ 2 ]= [[ 0.4  0.2]
 [-0.4  0.7]]
weights[ 3 ]= [0.8 0.2]
MSE with the initial weights: 2.0455445087618966
2) ===After applying GRADIENT:===
------------- Results for step (iteration) = 0
Gradients g to be used in new_weights = current_weights - learning_rate*g
gradients[ 0 ]= [[0.01597462 0.07125631]
 [0.         0.        ]]
gradients[ 1 ]= [0.01331218 0.05938026]
gradients[ 2 ]= [[-0.03270185  0.18227024]
 [-0.05783894  0.3223768 ]]
gradients[ 3 ]= [-0.07597064  0.42343745]
weights after gradient propagation in this step
weights[ 0 ]= [[ 0.09201269  0.26437187]
 [-0.4         0.2       ]]
weights[ 1 ]= [-0.4066561   0.77030987]
weights[ 2 ]= [[ 0.41635093  0.10886488]
 [-0.37108055  0.53881156]]
weights[ 3 ]= [ 0.83798534 -0.01171872]
MSE with the new weig
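
Both runs for sample_1 start from the same weights and compute the same step-0 gradients, so the only difference is the factor applied in `new_weights = current_weights - learning_rate*g`. The sketch below replays that update for the second layer with the printed gradients. The helper `sgd_step` is a hypothetical name, not part of the notebook.

```python
import numpy as np

def sgd_step(weights, grads, learning_rate):
    # plain gradient descent: w <- w - learning_rate * g, for each parameter array
    return [w - learning_rate * g for w, g in zip(weights, grads)]

# layer-2 initial values and printed step-0 gradients for sample_1
W2 = np.array([[0.4, 0.2], [-0.4, 0.7]])
b2 = np.array([0.8, 0.2])
g_W2 = np.array([[-0.03270185, 0.18227024], [-0.05783894, 0.3223768]])
g_b2 = np.array([-0.07597064, 0.42343745])

for lr in (1.0, 0.5):
    new_W2, new_b2 = sgd_step([W2, b2], [g_W2, g_b2], lr)
    print("learning rate =", lr)
    print("weights[ 2 ]=", new_W2)
    print("weights[ 3 ]=", new_b2)
```

Halving the learning rate halves the displacement of every weight in this first step. Later steps differ by more than a factor of two, because the gradients are then evaluated at different weights.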

# 2. Application code for sample_2 (-1.2, -1, -1, 0.9)

## 2.1 For learning rate = 1

In [3]:
from keras.models import Sequential
from keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from keras import backend as k
from keras import losses
import numpy as np
import tensorflow as tf
from sklearn.metrics import mean_squared_error

#Initialize all weights to the fixed values used in this exercise (they differ from those in the example at
#https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/ )
#Initialise kernel (weights matrix) to required value
def kernel_init_layer1(shape,dtype='float'):
    # In this simple example, assume that the shape is (2, 2) 
    # and create a fixed array with this dimension for layer 1
    kernel = np.array([[0.1, 0.3],[-0.4, 0.2]])
    return kernel 

#Initialise kernel (weights matrix) to required value
def kernel_init_layer2(shape,dtype='float'):
    # In this simple example, assume that the shape is (2, 2)
    # and create a fixed array with this dimension for layer 2
    kernel = np.array([[0.4, 0.2], [-0.4, 0.7]])
    return kernel 

#Initialise kernel (bias vector) to required value
def kernel_init_bias1(shape,dtype='float'):
    # In this simple example, assume that the shape is (2,)
    # and create a fixed array with this dimension for the
    # bias vector of layer 1
    kernel = np.array([-0.4, 0.8]) #one bias value per neuron
    return kernel 

#Initialise kernel (bias vector) to required value
def kernel_init_bias2(shape,dtype='float'):
    # In this simple example, assume that the shape is (2,)
    # and create a fixed array with this dimension for the
    # bias vector of layer 2
    kernel = np.array([0.8, 0.2]) #one bias value per neuron
    return kernel 

# from https://stackoverflow.com/questions/66221788/tf-gradients-is-not-supported-when-eager-execution-is-enabled-use-tf-gradientta
# k.gradients needs graph mode, but TF 2 runs eagerly by default, so disable eager execution:
tf.compat.v1.disable_eager_execution()

#Define the neural network model with dense layers. Syntax:
#https://keras.io/api/layers/core_layers/dense/
#The sigmoid activation function in Keras is the standard logistic function 1/(1+exp(-x)).
#https://keras.io/api/layers/activations/
model = Sequential()
model.add(Dense(2, input_dim=2, use_bias=True,  bias_initializer=kernel_init_bias1,
        kernel_initializer=kernel_init_layer1, activation='sigmoid'))
model.add(Dense(2, use_bias=True,  bias_initializer=kernel_init_bias2, 
        kernel_initializer=kernel_init_layer2, activation='sigmoid'))
model.summary() # display the architecture

# We are not going to use a Keras optimizer. Define a learning rate:
learning_rate = 1  #you can change to 0.5 or any other reasonable value

#now compile the model, specifying the loss and the performance metrics that
#should be computed during training
model.compile(loss='mse', metrics=['accuracy'])

# Begin TensorFlow
sess = tf.compat.v1.InteractiveSession()
sess.run(tf.compat.v1.initialize_all_variables())

model.optimizer.lr = learning_rate

print("===START TRAINING THE NETWORK:===")
steps = 3 # steps of gradient descent
for s in range(steps):
    print('Learning rate = ', k.get_value(model.optimizer.lr))
    print(" ############ Step = " + str(s) + " ############")
    print("1) ===BEFORE using any GRADIENT:===")
    #define input and target vectors, and also inform it to the loss function object
    #in this code we will keep the same inputs and targets along the steps, but the inputs and
    #targets can change when we go, for instance, through the training set
    inputs = np.array([[-1.2,-1]]) #define the input for the current iteration (step)
    outputs = model.predict(inputs) #forward pass
    #define the target vector for the current iteration (step)
    targets = np.array([[-1, 0.9]]) #a sigmoid outputs values in (0, 1), so the target -1 lies outside the reachable range
    mse = mean_squared_error(targets, outputs) #calculate MSE
    #initialize loss object to be later incorporated to the gradients object that
    #enables the calculation of the symbolic gradients
    loss = losses.mean_squared_error(targets, model.output)
    #  ===== Obtain symbolic gradient to calculate numerical gradients =====
    gradients = k.gradients(loss, model.trainable_weights) #inform loss and weights
    if False: #enable with True in case you want to see the objects
        print("List of tensors representing the symbolic gradients:")
        for i in range(len(gradients)):
            print('symbolic gradient[',i,']=',gradients[i])
    print('Network input [x1, x2]:\n', inputs)
    print('Network output [out_o1, out_o2]:\n', outputs)
    print("targets:\n", targets)
    #show weights at the beginning of iteration s
    print('weights at the beginning of this step')
    for i in range(len(model.trainable_weights)):
        #note that model.trainable_weights[i] is an object of
        # <class 'tensorflow.python.ops.resource_variable_ops.ResourceVariable
        #therefore (see e.g. https://stackoverflow.com/questions/33679382/how-do-i-get-the-current-value-of-a-variable )
        #you need to obtain its value via a TF session:
        print('weights[',i,']=',sess.run(model.trainable_weights[i]))
    print("MSE with the initial weights:", mse)

    print("2) ===After applying GRADIENT:===")
    print("------------- Results for step (iteration) =", s)
    # ===== Calculate numerical gradient from symbolic gradients =====
    # evaluated_gradients is a list, show its contents:
    evaluated_gradients = sess.run(gradients, feed_dict={model.input: inputs})
    print('Gradients g to be used in new_weights = current_weights - learning_rate*g')
    for i in range(len(evaluated_gradients)):
        print('gradients[',i,']=',evaluated_gradients[i])

    # Apply ("step down") the gradient for each layer, subtracting the gradients
    # from current weights scaled by the learning rate:
    for i in range(len(model.trainable_weights)):
        sess.run(tf.compat.v1.assign_sub(model.trainable_weights[i], learning_rate*evaluated_gradients[i]))

    #show weights after gradient propagation of iteration s
    print('weights after gradient propagation in this step')
    for i in range(len(model.trainable_weights)):
        print('weights[',i,']=',sess.run(model.trainable_weights[i]))

    # print the MSE with new weights:
    outputs = model.predict(inputs)
    mse = mean_squared_error(targets, outputs)
    print("MSE with the new weights:", mse)

#Collect and show final results
final_outputs = model.predict(inputs)
final_mse = mean_squared_error(targets, final_outputs)

print("\n ===AFTER executing all GRADIENT descent steps===")
#show weights at the end of training
for i in range(len(model.trainable_weights)):
    #note that model.trainable_weights[i] is an object of
    # <class 'tensorflow.python.ops.resource_variable_ops.ResourceVariable
    #therefore (see e.g. https://stackoverflow.com/questions/33679382/how-do-i-get-the-current-value-of-a-variable )
    #you need to obtain its value via a TF session:
    print('final weights[',i,']=',sess.run(model.trainable_weights[i]))
print("outputs:\n", final_outputs)
print("targets:\n", targets)
print("Final MSE = ", final_mse)

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_4 (Dense)             (None, 2)                 6         
                                                                 
 dense_5 (Dense)             (None, 2)                 6         
                                                                 
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________
===START TRAINING THE NETWORK:===
Learning rate =  1
 ############ Step = 0 ############
1) ===BEFORE using any GRADIENT:===
Network input [x1, x2]:
 [[-1.2 -1. ]]
Network output [out_o1, out_o2]:
 [[0.6822494  0.66503346]]
targets:
 [[-1.   0.9]]
weights at the beginning of this step
weights[ 0 ]= [[ 0.1  0.3]
 [-0.4  0.2]]
weights[ 1 ]= [-0.4  0.8]
weights[ 2 ]= [[ 0.4  0.2]
 [-0.4  0.7]]
weights[ 3 ]= [0.8 0.2]
MSE with the initial weights: 1.442586204593552
2) ===After applying GRADIENT:===
------------- Results for step (iteration) = 0
Gradients g to be used in new_weights = current_weights - learning_rate*g
gradients[ 0 ]= [[-0.04047599  0.05397328]
 [-0.03372999  0.04497773]]
gradients[ 1 ]= [ 0.03372999 -0.04497773]
gradients[ 2 ]= [[ 0.17141584 -0.02460266]
 [ 0.20412011 -0.02929657]]
gradients[ 3 ]= [ 0.36468667 -0.05234207]
weights after gradient propagation in this step
weights[ 0 ]= [[ 0.14047599  0.24602672]
 [-0.36627     0.15502226]]
weights[ 1 ]= [-0.43373     0.84497774]
weights[ 2 ]= [[ 0.22858417  0.22460265]
 [-0.60412014  0.72929657]]
weights[ 3 ]= [0.43531334 0.25234208]
MSE with the new weights: 1.214150265512243
Learning rate =  1
 ############ Step = 1 ############
1) ===BEFORE using any GRADIENT:===
Network input [x1, x2]:
 [[-1.2 -1. ]]
Network output [out_o1, out_o2]:
 [[0.5437048 0.6872184]]
targets:
 

## 2.2 Learning rate = 0.5, sample 2 (-1.2, -1, -1, 0.9)

In [7]:
from keras.models import Sequential
from keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from keras import backend as k
from keras import losses
import numpy as np
import tensorflow as tf
from sklearn.metrics import mean_squared_error

#Initialize all weights to match the values adopted in the example at
#https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/ 
#Initialise kernel (weights matrix) to required value
def kernel_init_layer1(shape,dtype='float'):
    # In this simple example, assume that the shape is (2, 2) 
    # and create a fixed array with this dimension for layer 1
    kernel = np.array([[0.1, 0.3],[-0.4, 0.2]])
    return kernel 

#Initialise kernel (weights matrix) to required value
def kernel_init_layer2(shape,dtype='float'):
    # In this simple example, assume that the shape is (2, 2)
    # and create a fixed array with this dimension for layer 2
    kernel = np.array([[0.4, 0.2], [-0.4, 0.7]])
    return kernel 

#Initialise kernel (bias vector) to required value
def kernel_init_bias1(shape,dtype='float'):
    # In this simple example, assume that the shape is (2,)
    # and create a fixed array with this dimension for the 
    # bias vector of layer 1
    kernel = np.array([-0.4, 0.8]) #one bias value per hidden neuron
    return kernel 

#Initialise kernel (bias vector) to required value
def kernel_init_bias2(shape,dtype='float'):
    # In this simple example, assume that the shape is (2,)
    # and create a fixed array with this dimension for the 
    # bias vector of layer 2
    kernel = np.array([0.8, 0.2]) #one bias value per output neuron
    return kernel 

# from https://stackoverflow.com/questions/66221788/tf-gradients-is-not-supported-when-eager-execution-is-enabled-use-tf-gradientta
# TF 2 enables eager execution by default, but k.gradients needs graph mode, so disable it:
tf.compat.v1.disable_eager_execution()

#Define the neural network model with dense layers. Syntax:
#https://keras.io/api/layers/core_layers/dense/
#The sigmoid activation function in Keras is the standard logistic function 1/(1+exp(-x)).
#https://keras.io/api/layers/activations/
model = Sequential()
model.add(Dense(2, input_dim=2, use_bias=True,  bias_initializer=kernel_init_bias1,
        kernel_initializer=kernel_init_layer1, activation='sigmoid'))
model.add(Dense(2, use_bias=True,  bias_initializer=kernel_init_bias2, 
        kernel_initializer=kernel_init_layer2, activation='sigmoid'))
model.summary() # display the architecture

# We are not going to use a Keras optimizer. Define a learning rate:
learning_rate = 0.5  #you can change to 1 or any other reasonable value

#now compile the model, specifying the loss and the performance metrics that
#should be computed during training
model.compile(loss='mse', metrics=['accuracy'])

# Begin TensorFlow
sess = tf.compat.v1.InteractiveSession()
sess.run(tf.compat.v1.initialize_all_variables())

model.optimizer.lr = learning_rate

print("===START TRAINING THE NETWORK:===")
steps = 3 # steps of gradient descent
for s in range(steps):
    print('Learning rate = ', k.get_value(model.optimizer.lr))
    print(" ############ Step = " + str(s) + " ############")
    print("1) ===BEFORE using any GRADIENT:===")
    #define input and target vectors, and also inform it to the loss function object
    #in this code we will keep the same inputs and targets along the steps, but the inputs and
    #targets can change when we go, for instance, through the training set
    inputs = np.array([[-1.2,-1]]) #define the input for the current iteration (step)
    outputs = model.predict(inputs) #forward pass
    #define the target vector for the current iteration (step)
    targets = np.array([[-1, 0.9]]) #note: a sigmoid outputs values in (0, 1), so the target -1 can never be reached
    mse = mean_squared_error(targets, outputs) #calculate MSE
    #initialize loss object to be later incorporated to the gradients object that
    #enables the calculation of the symbolic gradients
    loss = losses.mean_squared_error(targets, model.output)
    #  ===== Obtain symbolic gradient to calculate numerical gradients =====
    gradients = k.gradients(loss, model.trainable_weights) #inform loss and weights
    if False: #enable with True in case you want to see the objects
        print("List of tensors representing the symbolic gradients:")
        for i in range(len(gradients)):
            print('symbolic gradient[',i,']=',gradients[i])
    print('Network input [x1, x2]:\n', inputs)
    print('Network output [out_o1, out_o2]:\n', outputs)
    print("targets:\n", targets)
    #show weights at the beginning of iteration s
    print('weights at the beginning of this step')
    for i in range(len(model.trainable_weights)):
        #note that model.trainable_weights[i] is an object of
        # <class 'tensorflow.python.ops.resource_variable_ops.ResourceVariable
        #therefore (see e.g. https://stackoverflow.com/questions/33679382/how-do-i-get-the-current-value-of-a-variable )
        #you need to obtain its value via a TF session:
        print('weights[',i,']=',sess.run(model.trainable_weights[i]))
    print("MSE with the initial weights:", mse)

    print("2) ===After applying GRADIENT:===")
    print("------------- Results for step (iteration) =", s)
    # ===== Calculate numerical gradient from symbolic gradients =====
    # evaluated_gradients is a list, show its contents:
    evaluated_gradients = sess.run(gradients, feed_dict={model.input: inputs})
    print('Gradients g to be used in new_weights = current_weights - learning_rate*g')
    for i in range(len(evaluated_gradients)):
        print('gradients[',i,']=',evaluated_gradients[i])

    # Apply ("step down") the gradient for each layer, subtracting the gradients
    # from current weights scaled by the learning rate:
    for i in range(len(model.trainable_weights)):
        sess.run(tf.compat.v1.assign_sub(model.trainable_weights[i], learning_rate*evaluated_gradients[i]))

    #show weights after gradient propagation of iteration s
    print('weights after gradient propagation in this step')
    for i in range(len(model.trainable_weights)):
        print('weights[',i,']=',sess.run(model.trainable_weights[i]))

    # print the MSE with new weights:
    outputs = model.predict(inputs)
    mse = mean_squared_error(targets, outputs)
    print("MSE with the new weights:", mse)

#Collect and show final results
final_outputs = model.predict(inputs)
final_mse = mean_squared_error(targets, final_outputs)

print("\n ===AFTER executing all GRADIENT descent steps===")
#show weights at the end of training
for i in range(len(model.trainable_weights)):
    #note that model.trainable_weights[i] is an object of
    # <class 'tensorflow.python.ops.resource_variable_ops.ResourceVariable
    #therefore (see e.g. https://stackoverflow.com/questions/33679382/how-do-i-get-the-current-value-of-a-variable )
    #you need to obtain its value via a TF session:
    print('final weights[',i,']=',sess.run(model.trainable_weights[i]))
print("outputs:\n", final_outputs)
print("targets:\n", targets)
print("Final MSE = ", final_mse)

Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_12 (Dense)            (None, 2)                 6         
                                                                 
 dense_13 (Dense)            (None, 2)                 6         
                                                                 
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________
===START TRAINING THE NETWORK:===
Learning rate =  0.5
 ############ Step = 0 ############
1) ===BEFORE using any GRADIENT:===
Network input [x1, x2]:
 [[-1.2 -1. ]]
Network output [out_o1, out_o2]:
 [[0.6822494  0.66503346]]
targets:
 [[-1.   0.9]]
weights at the beginning of this step
weights[ 0 ]= [[ 0.1  0.3]
 [-0.4  0.2]]
weights[ 1 ]= [-0.4  0.8]
weights[ 2 ]= [[ 0.4  0.2]
 [-0.4  0.7]]
weights[ 3 ]= [0.8 0.2]
MSE with the initial weights: 1.442586204593552
2) ===After applying GRADIENT:===
------------- Results for step (iteration) = 0
Gradients g to be used in new_weights = current_weights - learning_rate*g
gradients[ 0 ]= [[-0.04047599  0.05397328]
 [-0.03372999  0.04497773]]
gradients[ 1 ]= [ 0.03372999 -0.04497773]
gradients[ 2 ]= [[ 0.17141584 -0.02460266]
 [ 0.20412011 -0.02929657]]
gradients[ 3 ]= [ 0.36468667 -0.05234207]
weights after gradient propagation in this step
weights[ 0 ]= [[ 0.120238    0.27301338]
 [-0.38313502  0.17751114]]
weights[ 1 ]= [-0.416865   0.8224889]
weights[ 2 ]= [[ 0.31429207  0.21230133]
 [-0.50206006  0.71464825]]
weights[ 3 ]= [0.6176567  0.22617105]
MSE with the new
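
The step-0 numbers printed above can be cross-checked by hand. The following is a minimal sketch, not part of the original notebook: it assumes the Keras `Dense` convention `y = x W + b`, uses only the standard library so it can run outside the TensorFlow session, and the helper names `sigmoid` and `dense` are illustrative. It recomputes the forward pass, the MSE gradients, and one update with learning rate 0.5 for sample 2.

```python
import math

def sigmoid(z):
    """Standard logistic function 1/(1+exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def dense(x, W, b):
    """Keras Dense convention: y_j = sum_i x_i * W[i][j] + b[j]."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]

# Initial parameters and sample 2, copied from the cell above
W1 = [[0.1, 0.3], [-0.4, 0.2]]
b1 = [-0.4, 0.8]
W2 = [[0.4, 0.2], [-0.4, 0.7]]
b2 = [0.8, 0.2]
x = [-1.2, -1.0]
t = [-1.0, 0.9]
lr = 0.5

# Forward pass
h = [sigmoid(z) for z in dense(x, W1, b1)]
out = [sigmoid(z) for z in dense(h, W2, b2)]
mse = sum((o - ti) ** 2 for o, ti in zip(out, t)) / len(t)

# Backward pass. With two outputs, d(mse)/d(out_j) = 2*(out_j - t_j)/2 = out_j - t_j
delta2 = [(o - ti) * o * (1 - o) for o, ti in zip(out, t)]   # gradient w.r.t. b2
grad_W2 = [[hi * d for d in delta2] for hi in h]             # gradient w.r.t. W2
back = [sum(W2[i][j] * delta2[j] for j in range(2)) for i in range(2)]
delta1 = [back[i] * h[i] * (1 - h[i]) for i in range(2)]     # gradient w.r.t. b1
grad_W1 = [[xi * d for d in delta1] for xi in x]             # gradient w.r.t. W1

# One gradient-descent step on the output layer
new_W2 = [[W2[i][j] - lr * grad_W2[i][j] for j in range(2)] for i in range(2)]
new_b2 = [b2[j] - lr * delta2[j] for j in range(2)]

print("outputs:", out)
print("MSE:", mse)
print("gradients[0]:", grad_W1)
print("gradients[1]:", delta1)
print("gradients[2]:", grad_W2)
print("gradients[3]:", delta2)
print("updated W2:", new_W2)
print("updated b2:", new_b2)
```

The printed values should agree with the TensorFlow output above up to float32 rounding, since Keras stores the weights in single precision while this sketch uses Python floats.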

# 3. Application code for sample 3 (1.2, 0, 0, 0.9)

## 3.1 Learning rate = 1

In [4]:
from keras.models import Sequential
from keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from keras import backend as k
from keras import losses
import numpy as np
import tensorflow as tf
from sklearn.metrics import mean_squared_error

#Initialize all weights to match the values adopted in the example at
#https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/ 
#Initialise kernel (weights matrix) to required value
def kernel_init_layer1(shape,dtype='float'):
    # In this simple example, assume that the shape is (2, 2) 
    # and create a fixed array with this dimension for layer 1
    kernel = np.array([[0.1, 0.3],[-0.4, 0.2]])
    return kernel 

#Initialise kernel (weights matrix) to required value
def kernel_init_layer2(shape,dtype='float'):
    # In this simple example, assume that the shape is (2, 2)
    # and create a fixed array with this dimension for layer 2
    kernel = np.array([[0.4, 0.2], [-0.4, 0.7]])
    return kernel 

#Initialise kernel (bias vector) to required value
def kernel_init_bias1(shape,dtype='float'):
    # In this simple example, assume that the shape is (2,)
    # and create a fixed array with this dimension for the 
    # bias vector of layer 1
    kernel = np.array([-0.4, 0.8]) #one bias value per hidden neuron
    return kernel 

#Initialise kernel (bias vector) to required value
def kernel_init_bias2(shape,dtype='float'):
    # In this simple example, assume that the shape is (2,)
    # and create a fixed array with this dimension for the 
    # bias vector of layer 2
    kernel = np.array([0.8, 0.2]) #one bias value per output neuron
    return kernel 

# from https://stackoverflow.com/questions/66221788/tf-gradients-is-not-supported-when-eager-execution-is-enabled-use-tf-gradientta
# TF 2 enables eager execution by default, but k.gradients needs graph mode, so disable it:
tf.compat.v1.disable_eager_execution()

#Define the neural network model with dense layers. Syntax:
#https://keras.io/api/layers/core_layers/dense/
#The sigmoid activation function in Keras is the standard logistic function 1/(1+exp(-x)).
#https://keras.io/api/layers/activations/
model = Sequential()
model.add(Dense(2, input_dim=2, use_bias=True,  bias_initializer=kernel_init_bias1,
        kernel_initializer=kernel_init_layer1, activation='sigmoid'))
model.add(Dense(2, use_bias=True,  bias_initializer=kernel_init_bias2, 
        kernel_initializer=kernel_init_layer2, activation='sigmoid'))
model.summary() # display the architecture

# We are not going to use a Keras optimizer. Define a learning rate:
learning_rate = 1  #you can change to 0.5 or any other reasonable value

#now compile the model, specifying the loss and the performance metrics that
#should be computed during training
model.compile(loss='mse', metrics=['accuracy'])

# Begin TensorFlow
sess = tf.compat.v1.InteractiveSession()
sess.run(tf.compat.v1.initialize_all_variables())

model.optimizer.lr = learning_rate

print("===START TRAINING THE NETWORK:===")
steps = 3 # steps of gradient descent
for s in range(steps):
    print('Learning rate = ', k.get_value(model.optimizer.lr))
    print(" ############ Step = " + str(s) + " ############")
    print("1) ===BEFORE using any GRADIENT:===")
    #define input and target vectors, and also inform it to the loss function object
    #in this code we will keep the same inputs and targets along the steps, but the inputs and
    #targets can change when we go, for instance, through the training set
    inputs = np.array([[1.2,0]]) #define the input for the current iteration (step)
    outputs = model.predict(inputs) #forward pass
    #define the target vector for the current iteration (step)
    targets = np.array([[0, 0.9]]) #note: a sigmoid outputs values in the open interval (0, 1)
    mse = mean_squared_error(targets, outputs) #calculate MSE
    #initialize loss object to be later incorporated to the gradients object that
    #enables the calculation of the symbolic gradients
    loss = losses.mean_squared_error(targets, model.output)
    #  ===== Obtain symbolic gradient to calculate numerical gradients =====
    gradients = k.gradients(loss, model.trainable_weights) #inform loss and weights
    if False: #enable with True in case you want to see the objects
        print("List of tensors representing the symbolic gradients:")
        for i in range(len(gradients)):
            print('symbolic gradient[',i,']=',gradients[i])
    print('Network input [x1, x2]:\n', inputs)
    print('Network output [out_o1, out_o2]:\n', outputs)
    print("targets:\n", targets)
    #show weights at the beginning of iteration s
    print('weights at the beginning of this step')
    for i in range(len(model.trainable_weights)):
        #note that model.trainable_weights[i] is an object of
        # <class 'tensorflow.python.ops.resource_variable_ops.ResourceVariable
        #therefore (see e.g. https://stackoverflow.com/questions/33679382/how-do-i-get-the-current-value-of-a-variable )
        #you need to obtain its value via a TF session:
        print('weights[',i,']=',sess.run(model.trainable_weights[i]))
    print("MSE with the initial weights:", mse)

    print("2) ===After applying GRADIENT:===")
    print("------------- Results for step (iteration) =", s)
    # ===== Calculate numerical gradient from symbolic gradients =====
    # evaluated_gradients is a list, show its contents:
    evaluated_gradients = sess.run(gradients, feed_dict={model.input: inputs})
    print('Gradients g to be used in new_weights = current_weights - learning_rate*g')
    for i in range(len(evaluated_gradients)):
        print('gradients[',i,']=',evaluated_gradients[i])

    # Apply ("step down") the gradient for each layer, subtracting the gradients
    # from current weights scaled by the learning rate:
    for i in range(len(model.trainable_weights)):
        sess.run(tf.compat.v1.assign_sub(model.trainable_weights[i], learning_rate*evaluated_gradients[i]))

    #show weights after gradient propagation of iteration s
    print('weights after gradient propagation in this step')
    for i in range(len(model.trainable_weights)):
        print('weights[',i,']=',sess.run(model.trainable_weights[i]))

    # print the MSE with new weights:
    outputs = model.predict(inputs)
    mse = mean_squared_error(targets, outputs)
    print("MSE with the new weights:", mse)

#Collect and show final results
final_outputs = model.predict(inputs)
final_mse = mean_squared_error(targets, final_outputs)

print("\n ===AFTER executing all GRADIENT descent steps===")
#show weights at the end of training
for i in range(len(model.trainable_weights)):
    #note that model.trainable_weights[i] is an object of
    # <class 'tensorflow.python.ops.resource_variable_ops.ResourceVariable
    #therefore (see e.g. https://stackoverflow.com/questions/33679382/how-do-i-get-the-current-value-of-a-variable )
    #you need to obtain its value via a TF session:
    print('final weights[',i,']=',sess.run(model.trainable_weights[i]))
print("outputs:\n", final_outputs)
print("targets:\n", targets)
print("Final MSE = ", final_mse)

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_6 (Dense)             (None, 2)                 6         
                                                                 
 dense_7 (Dense)             (None, 2)                 6         
                                                                 
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________
===START TRAINING THE NETWORK:===
Learning rate =  1
 ############ Step = 0 ############
1) ===BEFORE using any GRADIENT:===
Network input [x1, x2]:
 [[1.2 0. ]]
Network output [out_o1, out_o2]:
 [[0.6609764 0.6940291]]
targets:
 [[0.  0.9]]
weights at the beginning of this step
weights[ 0 ]= [[ 0.1  0.3]
 [-0.4  0.2]]
weights[ 1 ]= [-0.4  0.8]
weights[ 2 ]= [[ 0.4  0.2]
 [-0.4  0.7]]
weights[ 3 ]= [0.8 0.2]
MSE with the initial weights: 0.23965691453887417
2) ===After applying GRADIENT:===
------------- Results for step (iteration) = 0
Gradients g to be used in new_weights = current_weights - learning_rate*g
gradients[ 0 ]= [[ 0.01485651 -0.01959436]
 [ 0.         -0.        ]]
gradients[ 1 ]= [ 0.01238043 -0.01632863]
gradients[ 2 ]= [[ 0.06375708 -0.01882739]
 [ 0.11276554 -0.03329954]]
gradients[ 3 ]= [ 0.14811596 -0.04373848]
weights after gradient propagation in this step
weights[ 0 ]= [[ 0.08514349  0.31959438]
 [-0.4         0.2       ]]
weights[ 1 ]= [-0.41238043  0.81632864]
weights[ 2 ]= [[ 0.3362429  0.2188274]
 [-0.5127655  0.7332995]]
weights[ 3 ]= [0.6518841  0.24373847]
MSE with the new weigh
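
The second row of `gradients[ 0 ]` is exactly zero for this sample, and the second row of `weights[ 0 ]` is left unchanged by the update. A short sketch of why, assuming the usual dense-layer relation that the gradient with respect to `W1[i][j]` is `x_i * delta1_j`, where `delta1` is the layer-1 bias gradient (`gradients[ 1 ]` in the output above). The variable names are illustrative and the numbers are copied from the printed step-0 results.

```python
# Input of sample 3: the second component is zero
x = [1.2, 0.0]

# Layer-1 bias gradient, copied from "gradients[ 1 ]" in the output above
delta1 = [0.01238043, -0.01632863]

# Gradient of the loss w.r.t. the layer-1 kernel: outer product of x and delta1
grad_W1 = [[xi * d for d in delta1] for xi in x]

print("row for x1:", grad_W1[0])
print("row for x2:", grad_W1[1])  # every entry is x2 * delta1_j = 0
```

Because `x2 = 0`, the weights that connect `x2` to the hidden layer receive no update from this sample, whatever the learning rate.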

## 3.2 Learning rate = 0.5

In [8]:
from keras.models import Sequential
from keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from keras import backend as k
from keras import losses
import numpy as np
import tensorflow as tf
from sklearn.metrics import mean_squared_error

#Initialize all weights to match the values adopted in the example at
#https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/ 
#Initialise kernel (weights matrix) to required value
def kernel_init_layer1(shape,dtype='float'):
    # In this simple example, assume that the shape is (2, 2) 
    # and create a fixed array with this dimension for layer 1
    kernel = np.array([[0.1, 0.3],[-0.4, 0.2]])
    return kernel 

#Initialise kernel (weights matrix) to required value
def kernel_init_layer2(shape,dtype='float'):
    # In this simple example, assume that the shape is (2, 2)
    # and create a fixed array with this dimension for layer 2
    kernel = np.array([[0.4, 0.2], [-0.4, 0.7]])
    return kernel 

#Initialise kernel (bias vector) to required value
def kernel_init_bias1(shape,dtype='float'):
    # In this simple example, assume that the shape is (2,)
    # and create a fixed array with this dimension for the 
    # bias vector of layer 1
    kernel = np.array([-0.4, 0.8]) #one bias value per hidden neuron
    return kernel 

#Initialise kernel (bias vector) to required value
def kernel_init_bias2(shape,dtype='float'):
    # In this simple example, assume that the shape is (2,)
    # and create a fixed array with this dimension for the 
    # bias vector of layer 2
    kernel = np.array([0.8, 0.2]) #one bias value per output neuron
    return kernel 

# from https://stackoverflow.com/questions/66221788/tf-gradients-is-not-supported-when-eager-execution-is-enabled-use-tf-gradientta
# TF 2 enables eager execution by default, but k.gradients needs graph mode, so disable it:
tf.compat.v1.disable_eager_execution()

#Define the neural network model with dense layers. Syntax:
#https://keras.io/api/layers/core_layers/dense/
#The sigmoid activation function in Keras is the standard logistic function 1/(1+exp(-x)).
#https://keras.io/api/layers/activations/
model = Sequential()
model.add(Dense(2, input_dim=2, use_bias=True,  bias_initializer=kernel_init_bias1,
        kernel_initializer=kernel_init_layer1, activation='sigmoid'))
model.add(Dense(2, use_bias=True,  bias_initializer=kernel_init_bias2, 
        kernel_initializer=kernel_init_layer2, activation='sigmoid'))
model.summary() # display the architecture

# We are not going to use a Keras optimizer. Define a learning rate:
learning_rate = 0.5  #you can change to 1 or any other reasonable value

#now compile the model, specifying the loss and the performance metrics that
#should be computed during training
model.compile(loss='mse', metrics=['accuracy'])

# Begin TensorFlow
sess = tf.compat.v1.InteractiveSession()
sess.run(tf.compat.v1.initialize_all_variables())

model.optimizer.lr = learning_rate

print("===START TRAINING THE NETWORK:===")
steps = 3 # steps of gradient descent
for s in range(steps):
    print('Learning rate = ', k.get_value(model.optimizer.lr))
    print(" ############ Step = " + str(s) + " ############")
    print("1) ===BEFORE using any GRADIENT:===")
    #define input and target vectors, and also inform it to the loss function object
    #in this code we will keep the same inputs and targets along the steps, but the inputs and
    #targets can change when we go, for instance, through the training set
    inputs = np.array([[1.2,0]]) #define the input for the current iteration (step)
    outputs = model.predict(inputs) #forward pass
    #define the target vector for the current iteration (step)
    targets = np.array([[0, 0.9]]) #note: a sigmoid outputs values in the open interval (0, 1)
    mse = mean_squared_error(targets, outputs) #calculate MSE
    #initialize loss object to be later incorporated to the gradients object that
    #enables the calculation of the symbolic gradients
    loss = losses.mean_squared_error(targets, model.output)
    #  ===== Obtain symbolic gradient to calculate numerical gradients =====
    gradients = k.gradients(loss, model.trainable_weights) #inform loss and weights
    if False: #enable with True in case you want to see the objects
        print("List of tensors representing the symbolic gradients:")
        for i in range(len(gradients)):
            print('symbolic gradient[',i,']=',gradients[i])
    print('Network input [x1, x2]:\n', inputs)
    print('Network output [out_o1, out_o2]:\n', outputs)
    print("targets:\n", targets)
    #show weights at the beginning of iteration s
    print('weights at the beginning of this step')
    for i in range(len(model.trainable_weights)):
        #note that model.trainable_weights[i] is an object of
        # <class 'tensorflow.python.ops.resource_variable_ops.ResourceVariable
        #therefore (see e.g. https://stackoverflow.com/questions/33679382/how-do-i-get-the-current-value-of-a-variable )
        #you need to obtain its value via a TF session:
        print('weights[',i,']=',sess.run(model.trainable_weights[i]))
    print("MSE with the initial weights:", mse)

    print("2) ===After applying GRADIENT:===")
    print("------------- Results for step (iteration) =", s)
    # ===== Calculate numerical gradient from symbolic gradients =====
    # evaluated_gradients is a list, show its contents:
    evaluated_gradients = sess.run(gradients, feed_dict={model.input: inputs})
    print('Gradients g to be used in new_weights = current_weights - learning_rate*g')
    for i in range(len(evaluated_gradients)):
        print('gradients[',i,']=',evaluated_gradients[i])

    # Apply ("step down") the gradient for each layer, subtracting the gradients
    # from current weights scaled by the learning rate:
    for i in range(len(model.trainable_weights)):
        sess.run(tf.compat.v1.assign_sub(model.trainable_weights[i], learning_rate*evaluated_gradients[i]))

    #show weights after gradient propagation of iteration s
    print('weights after gradient propagation in this step')
    for i in range(len(model.trainable_weights)):
        print('weights[',i,']=',sess.run(model.trainable_weights[i]))

    # print the MSE with new weights:
    outputs = model.predict(inputs)
    mse = mean_squared_error(targets, outputs)
    print("MSE with the new weights:", mse)

#Collect and show final results
final_outputs = model.predict(inputs)
final_mse = mean_squared_error(targets, final_outputs)

print("\n ===AFTER executing all GRADIENT descent steps===")
#show weights at the end of training
for i in range(len(model.trainable_weights)):
    #note that model.trainable_weights[i] is an object of
    # <class 'tensorflow.python.ops.resource_variable_ops.ResourceVariable
    #therefore (see e.g. https://stackoverflow.com/questions/33679382/how-do-i-get-the-current-value-of-a-variable )
    #you need to obtain its value via a TF session:
    print('final weights[',i,']=',sess.run(model.trainable_weights[i]))
print("outputs:\n", final_outputs)
print("targets:\n", targets)
print("Final MSE = ", final_mse)

Model: "sequential_7"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_14 (Dense)            (None, 2)                 6         
                                                                 
 dense_15 (Dense)            (None, 2)                 6         
                                                                 
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________
===START TRAINING THE NETWORK:===
Learning rate =  0.5
 ############ Step = 0 ############
1) ===BEFORE using any GRADIENT:===
Network input [x1, x2]:
 [[1.2 0. ]]
Network output [out_o1, out_o2]:
 [[0.6609764 0.6940291]]
targets:
 [[0.  0.9]]
weights at the beginning of this step
weights[ 0 ]= [[ 0.1  0.3]
 [-0.4  0.2]]
weights[ 1 ]= [-0.4  0.8]
weights[ 2 ]= [[ 0.4  0.2]
 [-0.4  0.7]]
weights[ 3 ]= [0.8 0.2]
MSE with the initial weights: 0.23965691453887417
2) ===After applying GRADIENT:===
------------- Results for step (iteration) = 0
Gradients g to be used in new_weights = current_weights - learning_rate*g
gradients[ 0 ]= [[ 0.01485651 -0.01959436]
 [ 0.         -0.        ]]
gradients[ 1 ]= [ 0.01238043 -0.01632863]
gradients[ 2 ]= [[ 0.06375708 -0.01882739]
 [ 0.11276554 -0.03329954]]
gradients[ 3 ]= [ 0.14811596 -0.04373848]
weights after gradient propagation in this step
weights[ 0 ]= [[ 0.09257174  0.3097972 ]
 [-0.4         0.2       ]]
weights[ 1 ]= [-0.40619022  0.8081643 ]
weights[ 2 ]= [[ 0.36812147  0.2