<a href="https://colab.research.google.com/github/hsarfraz/Tiny-Machine-Learning/blob/main/0_5_weights_and_bias_in_neural_network.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

In this notebook I revisit the single-layer neural network that I created in notebook 0.3.

I will now add a second layer to this neural network (making it a two-layer network) and give the first layer an additional neuron.

In [1]:
### IMPORTING LIBRARIES ###
import sys
import numpy as np
import tensorflow as tf
from tensorflow import keras

### MAKING SURE THAT USER HAS TENSORFLOW 2 AND PYTHON 3 ###

# This script requires TensorFlow 2 and Python 3.
if tf.__version__.split('.')[0] != '2':
    raise Exception((f"The script is developed and tested for tensorflow 2. "
                     f"Current version: {tf.__version__}"))

if sys.version_info.major < 3:
    raise Exception((f"The script is developed and tested for Python 3. "
                     f"Current version: {sys.version_info.major}"))

# Defining Key Concepts

Before I write the code for the single-layer and two-layer neural networks, I am going to define some essential concepts and terms that will help in understanding how a neural net works.

## Neurons

Neurons are the basic units of an artificial neural network (ANN), also called a simulated neural network (SNN). Each neuron is connected to some or all of the neurons in the next layer. When a value passes from one neuron to the next, it is multiplied by the weight of that connection, and the receiving neuron adds its bias.

## Weights

Weights control the strength of the signal (the connection strength) between two neurons. In other words, a weight decides how much influence an input has on the neuron's output.

## Bias

A bias is a constant associated with a single neuron. It is often drawn as an additional input that is fixed at 1, while the bias value itself is learned during training just like the weights. Biases feed into the hidden and output layers but are not influenced by any layers behind them (they have no connections to the neurons in the previous layers). Their purpose is to shift the neuron's output so that the neuron can still produce a non-zero output when all of its inputs are zero.

Biases are added to each individual neuron. I have included an illustration that shows how biases are added to each neuron in a 3-layered neural network:

![illustration of how biases are added to each neuron in a neural net](images/0.5_bias_in_each_layer.jpg)

## Linear Transformation

Every neuron performs a linear transformation of its input using weights and biases. For a single input, the linear transformation is the equation of a straight line in slope-intercept form:
$$
y= (weight*x) +bias
$$
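
The equation above can be sketched as a small Python function. This is only an illustration: `neuron_output` is a hypothetical helper name, and the weight and bias values are made up rather than taken from a trained model. It also shows the point made in the Bias section: when the input is zero, the output is just the bias.

```python
def neuron_output(inputs, weights, bias):
    """Linear transformation of one neuron: weighted sum of the inputs plus the bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

# Hypothetical values, chosen only for illustration.
weight, bias = 2.0, -1.0

print(neuron_output([3.0], [weight], bias))  # 2.0 * 3.0 + (-1.0) = 5.0
print(neuron_output([0.0], [weight], bias))  # the input is zero, so only the bias remains: -1.0
```

The function takes lists so that the same sketch also covers a neuron with several inputs, where each input has its own weight.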

A linear transformation should not be the only operation performed in each neuron. The composition of two linear functions is itself a linear function, so a stack of purely linear layers behaves exactly like a single linear layer, no matter how many layers are added. A neural network that uses only linear transformations, without anything else (such as activation functions), can therefore only learn straight-line relationships and will not be able to learn complex tasks.
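
To make this concrete, here is a short derivation for a network shaped like the two-layer one trained later in this notebook: one input $x$, two neurons in the first layer, one neuron in the second layer, and no activation functions. The symbols $w_1, w_2, b_1, b_2$ (first layer) and $v_1, v_2, c$ (second layer) are my own labels, introduced only for this sketch:

$$
y = v_1 (w_1 x + b_1) + v_2 (w_2 x + b_2) + c = (v_1 w_1 + v_2 w_2) x + (v_1 b_1 + v_2 b_2 + c)
$$

The right-hand side is again a straight line with one slope and one intercept, so the extra layer adds parameters without adding anything a single neuron could not already represent.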

## Activation Functions

**NOTE:** The neural networks created in this notebook do not have activation functions, which means that each neuron only performs the linear transformation. I will be utilizing activation functions in future notebooks but wanted to define the concept here.

Activation functions are an additional step in each layer and run after the neuron's linear transformation of the outputs from the previous layer. An activation function decides whether a neuron should be activated ("fired"). In other words, it decides whether the neuron's signal is important enough to be passed on to the next layer of the neural network.

There are many types of activation functions, including:

*  Binary Step Activation Functions
*  Linear Activation Functions
*  Sigmoid Activation Functions
*  ReLU Activation Functions
*  Softmax Activation Functions

The image below illustrates how activation functions work. As you can see, the primary role of the activation function is to transform the summed weighted input from the neurons in the previous layer into an output value that can be fed into the next hidden layer or be used as the neural network's final output.

![illustration of activation functions](images/0.5_activation_function.jpg)
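
As a rough sketch of the functions listed above, here are plain-Python versions that use only the standard library. They are written for illustration and are not the implementations Keras uses. The threshold of 0 for the binary step is a common convention that I am assuming here.

```python
import math

def binary_step(z):
    # Assumption: the neuron "fires" (outputs 1) when the input is at least 0.
    return 1.0 if z >= 0 else 0.0

def linear(z):
    # Passes the value through unchanged.
    return z

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    # Keeps positive values and replaces negative values with 0.
    return max(0.0, z)

def softmax(zs):
    # Turns a list of scores into probabilities that sum to 1.
    # Subtracting the maximum first avoids overflow in math.exp.
    largest = max(zs)
    exps = [math.exp(z - largest) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

z = -2.0  # a made-up summed weighted input
for fn in (binary_step, linear, sigmoid, relu):
    print(fn.__name__, fn(z))
print("softmax", softmax([1.0, 2.0, 3.0]))
```

Unlike the other four, softmax works on the outputs of a whole layer at once, which is why it takes a list.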


# Retraining the single layer network

I am re-training the original single-layer network that was created in notebook 0.3 and will display the model's prediction when x=10. I will also display the learned weights of the single-layer network.

## Detailed Explanation of the Code  

I have included a detailed explanation about the code used to build the neural net in [notebook 0.3](https://github.com/hsarfraz/Tiny-Machine-Learning/blob/main/0_3_Neural_Network_Basics.ipynb).

In [2]:
### defining my first layer ###
# `.Dense` defines the layer type. A dense layer means that the neurons will get their source of input data from all the other neurons in the previous layer of the network
my_layer = keras.layers.Dense(units=1, #`units=1` defines how many neurons will be included in the layer
                              input_shape=[1] #`input_shape=[1]` has to be defined in the first layer. The input shape is 1 since the neural net is trained on single x's to predict single y's
                              )

### defining the model ###
# `.Sequential` defines the neural network. A sequential neural network operates from input to output, passing through each layer one after the other
model_1 = tf.keras.Sequential([my_layer])

### compiling the model ###
# Model compilation is performed after defining the neural net model and before model training starts. Model training cannot occur without model compilation
model_1.compile(optimizer='sgd', #`optimizer='sgd'` is the stochastic gradient descent (SGD) optimizer, which uses the gradient to descend the loss curve towards its minimum.
                loss='mean_squared_error' #`loss='mean_squared_error'` is a loss function from TensorFlow that computes the squared difference between the true and predicted outputs and takes the mean of those values.
                )

### defining the exact inputs and outputs
xs = np.array([-1.0,  0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)

### model fitting ###
# Model fitting is the process of training the neural network to figure out the relationship between X and Y
#`epochs=500` defines the number of training rounds (passes over the data) for the neural network/model.
model_1.fit(xs, ys, epochs=500, verbose=0)

<keras.src.callbacks.History at 0x7d6605923460>
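
To show roughly what `fit` does with the `'sgd'` optimizer and the mean squared error loss, here is a hand-written gradient descent loop in plain Python. It is a sketch under several assumptions, not a reproduction of Keras internals: I use a learning rate of 0.01 (to my knowledge the Keras default for `'sgd'`), I treat all six points as one batch per epoch (the data set is smaller than the default batch size), and I start the weight and bias at zero, whereas Keras starts the weight at a random value. `train_linear` is a hypothetical helper name.

```python
xs = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys = [-3.0, -1.0, 1.0, 3.0, 5.0, 7.0]

def train_linear(xs, ys, learning_rate=0.01, epochs=500):
    """Fit y = weight * x + bias by gradient descent on the mean squared error."""
    weight, bias = 0.0, 0.0  # assumption: start from zero
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every training point with the current parameters.
        errors = [(weight * x + bias) - y for x, y in zip(xs, ys)]
        # Gradients of mean(error^2) with respect to the weight and the bias.
        grad_weight = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_bias = 2.0 / n * sum(errors)
        # Take a small step against the gradient.
        weight -= learning_rate * grad_weight
        bias -= learning_rate * grad_bias
    return weight, bias

weight, bias = train_linear(xs, ys)
print(f"weight={weight:.4f}, bias={bias:.4f}")
print(f"prediction at x=10: {weight * 10.0 + bias:.4f}")
```

The training data follows y = 2x - 1 exactly, so the loop should end up close to a weight of 2 and a bias of -1 without quite reaching them in 500 steps.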

## Printing the model prediction at x=10 and the model weights

In [3]:

print(f"The model predicts that when x=10 the y output is: {model_1.predict([10.0])[0][0]}")
print()
print('Printing the weights and biases of the only layer in the model')
print(my_layer.get_weights())

The model predicts that when x=10 the y output is: 18.98309898376465

Printing the weights and biases of the only layer in the model
[array([[1.9975504]], dtype=float32), array([-0.9924054], dtype=float32)]
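
As a quick sanity check, the prediction can be reproduced by hand from the printed weight and bias. The two numbers below are copied from the output above; a fresh training run will give slightly different values.

```python
# Values copied from the printed output of my_layer.get_weights() above.
learned_weight = 1.9975504
learned_bias = -0.9924054

# The single neuron computes y = (weight * x) + bias.
manual_prediction = learned_weight * 10.0 + learned_bias
print(manual_prediction)  # approximately 18.9831, in line with model_1.predict
```

The training data follows y = 2x - 1, so the exact answer at x=10 would be 19. The model gets close because the learned weight is near 2 and the learned bias is near -1.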


## Printing the model summary

The model summary lists the name and type of each layer, its output shape, and its number of parameters, followed by the total parameter count for the network.

In [4]:
print(model_1.summary())

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 1)                 2         
                                                                 
Total params: 2 (8.00 Byte)
Trainable params: 2 (8.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
None


# Training the 2-layer network and seeing its prediction and weights.

In [5]:
my_layer_1 = keras.layers.Dense(units=2, input_shape=[1]) #the first layer
my_layer_2 = keras.layers.Dense(units=1) #the second layer
model_2 = tf.keras.Sequential([my_layer_1, my_layer_2]) #defining the model
model_2.compile(optimizer='sgd', loss='mean_squared_error') #compiling the model

xs = np.array([-1.0,  0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)

model_2.fit(xs, ys, epochs=500, verbose=0) #model fitting

<keras.src.callbacks.History at 0x7d6606097130>

## Printing the model prediction at x=10 and the model weights of each layer

In [6]:
print(f"The model predicts that when x=10 the y output is: {model_2.predict([10.0])[0][0]}")
print()
print('Printing the weights and biases in the first layer')
print(my_layer_1.get_weights())
print()
print('Printing the weights and biases in the second layer')
print(my_layer_2.get_weights())

The model predicts that when x=10 the y output is: 18.999996185302734

Printing the weights and biases in the first layer
[array([[-0.29593694, -1.430146  ]], dtype=float32), array([0.08875442, 0.5752448 ], dtype=float32)]

Printing the weights and biases in the second layer
[array([[-0.5179822],
       [-1.2912734]], dtype=float32), array([-0.21122728], dtype=float32)]


## Printing the model summary

In [7]:
print(model_2.summary())

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_1 (Dense)             (None, 2)                 4         
                                                                 
 dense_2 (Dense)             (None, 1)                 3         
                                                                 
Total params: 7 (28.00 Byte)
Trainable params: 7 (28.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
None
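
The `Param #` column in the two summaries can be checked by hand: each neuron in a dense layer has one weight per input plus one bias. Below is a minimal sketch of that count, where `dense_param_count` is a hypothetical helper name of my own.

```python
def dense_param_count(n_inputs, n_units):
    """Each unit has one weight per input, plus one bias."""
    return n_inputs * n_units + n_units

# Single-layer model: 1 input feeding 1 neuron.
print(dense_param_count(1, 1))  # 2

# Two-layer model: 1 input feeding 2 neurons, then 2 values feeding 1 neuron.
first_layer = dense_param_count(1, 2)   # 4
second_layer = dense_param_count(2, 1)  # 3
print(first_layer, second_layer, first_layer + second_layer)  # 4 3 7
```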


# Manually computing the output for the 2-layer network

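To better understand how the output is calculated in a two-layer network, I am manually calculating the output. The output neuron uses the formula below, where $x_1$ and $x_2$ are the outputs of the two first-layer neurons (**Note:** the weights $w_1$, $w_2$ and the bias $b$ are all from layer 2):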

$$
y = (w_1 * x_1) + (w_2 * x_2) + b
$$

In [8]:
value_to_predict = 10.0

layer1_w1 = (my_layer_1.get_weights()[0][0][0])
layer1_w2 = (my_layer_1.get_weights()[0][0][1])
layer1_b1 = (my_layer_1.get_weights()[1][0])
layer1_b2 = (my_layer_1.get_weights()[1][1])


layer2_w1 = (my_layer_2.get_weights()[0][0])
layer2_w2 = (my_layer_2.get_weights()[0][1])
layer2_b = (my_layer_2.get_weights()[1][0])

neuron1_output = (layer1_w1 * value_to_predict) + layer1_b1
neuron2_output = (layer1_w2 * value_to_predict) + layer1_b2

neuron3_output = (layer2_w1 * neuron1_output) + (layer2_w2 * neuron2_output) + layer2_b

print(neuron1_output)
print(neuron2_output)
print(neuron3_output[0])

-2.8706149980425835
-13.72621500492096
18.999996
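
Because neither layer has an activation function, the whole two-layer network can be folded into a single straight line. The sketch below does that with the weights and biases copied from the printed output of the two-layer model above. The variable names are my own, and a fresh training run will produce different individual weights.

```python
# Weights and biases copied from the printed output of the two-layer model above.
layer1_weights = [-0.29593694, -1.430146]
layer1_biases = [0.08875442, 0.5752448]
layer2_weights = [-0.5179822, -1.2912734]
layer2_bias = -0.21122728

# Substituting the first-layer equations into the second layer gives one slope and one intercept.
effective_slope = sum(v * w for v, w in zip(layer2_weights, layer1_weights))
effective_intercept = sum(v * b for v, b in zip(layer2_weights, layer1_biases)) + layer2_bias

print(effective_slope)                               # close to 2
print(effective_intercept)                           # close to -1
print(effective_slope * 10.0 + effective_intercept)  # close to 19
```

Even though the seven individual parameters look nothing like 2 and -1, together they describe almost exactly the same line, y = 2x - 1, that the single-layer network learned.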
