<a href="https://colab.research.google.com/github/pavanraja753/Advanced-Topics-in-Artificial-Intelligence/blob/main/Assignment_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Annotate code

Consider a fully connected network constructed using the `__init__` method given below.

Summary of Feed Forward computation:

- $z^{l} = w^{l}a^{l-1} + b^{l}$
- $a^{l} = σ(z^{l})$

The example assumes a sigmoid nonlinearity, $σ(z) = \frac{1}{1 + e^{-z}}$.




In [None]:
import numpy as np

class Network(object):
    def __init__(self, sizes):
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x) 
                        for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, a):
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a)+b)
        return a
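
The `feedforward` method calls `sigmoid`, which is not defined in the cell above, and `backprop` further down also relies on `sigmoid_prime`. A minimal sketch of both helpers, assuming the standard logistic function (the docstrings and the intermediate name `s` are illustrative, not taken from the original notebook):

```python
import numpy as np


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    """Derivative of the logistic function: sigmoid(z) * (1 - sigmoid(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)
```

Both functions accept scalars or NumPy arrays, so they can be applied directly to the column vectors `z` produced in each layer.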



- Comment every code line in `backprop` with the analytical expression that line evaluates in computing $∇C$

- Schematically apply `backprop` on a network constructed as `net = Network([784, 30, 10])`
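
One way to apply `backprop` schematically to `Network([784, 30, 10])` is to trace the array shapes through one forward and one backward pass. The sketch below does this outside the class, under stated assumptions: the input `x` and the one-hot target `y` are hypothetical random data, the quadratic cost is used, and the names `delta_out`, `delta_hidden`, and `sp` are introduced only for illustration.

```python
import numpy as np

sizes = [784, 30, 10]
rng = np.random.default_rng(0)

# Same shapes that Network.__init__ would create for these sizes.
biases = [rng.standard_normal((n_out, 1)) for n_out in sizes[1:]]
weights = [rng.standard_normal((n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sp(z):
    # Derivative of the sigmoid.
    s = sigmoid(z)
    return s * (1.0 - s)


x = rng.standard_normal((784, 1))   # hypothetical input column vector
y = np.zeros((10, 1))               # hypothetical one-hot target
y[3] = 1.0

# Forward pass: store every z and every activation.
activations, zs = [x], []
for b, w in zip(biases, weights):
    zs.append(w @ activations[-1] + b)
    activations.append(sigmoid(zs[-1]))

# Backward pass, output layer (Equations 1, 3, 4).
delta_out = (activations[-1] - y) * sp(zs[-1])      # shape (10, 1)
nabla_b_out = delta_out                              # shape (10, 1)
nabla_w_out = delta_out @ activations[-2].T          # shape (10, 30)

# Backward pass, hidden layer (Equations 2, 3, 4).
delta_hidden = (weights[-1].T @ delta_out) * sp(zs[-2])   # shape (30, 1)
nabla_b_hidden = delta_hidden                              # shape (30, 1)
nabla_w_hidden = delta_hidden @ activations[-3].T          # shape (30, 784)

for name, arr in [("nabla_w_hidden", nabla_w_hidden),
                  ("nabla_b_hidden", nabla_b_hidden),
                  ("nabla_w_out", nabla_w_out),
                  ("nabla_b_out", nabla_b_out)]:
    print(name, arr.shape)
```

Each gradient has the same shape as the parameter it belongs to, which is why the outer product in Equation 4 places $\delta^{l}$ on the left and $(a^{l-1})^{T}$ on the right.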


# Back Propagation Equations

Summary: The equations of backpropagation

\begin{align}
\delta^L = ∇_{a}C \odot σ ^{'}(z^{L}) \tag{1}
\end{align}

\begin{align}
\delta^l = ((w^{l+1})^{T}\delta^{l+1}) \odot σ ^{'}(z^{l}) \tag{2}
\end{align}

Using these error terms, we compute the gradients with respect to the weights and
biases in each layer:


\begin{align}
\frac{∂C}{∂b^l} = \delta^{l} \tag{3}
\end{align}

\begin{align}
\frac{∂C}{∂w^l} = \delta^{l} (a^{l-1})^{T} \tag{4}
\end{align}

In [1]:
def backprop(self, x, y):
    """Return a tuple "(nabla_b, nabla_w)" representing the
    gradient for the cost function C_x.  "nabla_b" and
    "nabla_w" are layer-by-layer lists of numpy arrays, similar
    to "self.biases" and "self.weights"."""
    # Gradient containers, one array per layer, initialized to zero.
    # nabla_b[i] will hold dC/db for layer i and has the same shape as that bias vector.
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    # nabla_w[i] will hold dC/dw for layer i and has the same shape as that weight matrix.
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    # feedforward
    # The activation of the input layer is the input x itself.
    activation = x
    # List of all activations a^l, layer by layer. Equation 4 needs a^{l-1}.
    activations = [x]
    # List of all weighted inputs z^l, layer by layer. Equations 1 and 2 need sigma'(z^l).
    zs = []
    # Loop over the layers in order, feeding each layer's output into the next.
    for b, w in zip(self.biases, self.weights):
        # z^l = w^l a^{l-1} + b^l, where a^{l-1} is the previous layer's activation.
        z = np.dot(w, activation)+b
        # Store z^l so that sigma'(z^l) can be evaluated in the backward pass.
        zs.append(z)
        # a^l = sigma(z^l), with sigma(z) = 1 / (1 + exp(-z)).
        activation = sigmoid(z)
        # Store a^l; Equation 4 uses it for the gradient of the next layer's weights.
        activations.append(activation)
    # backward pass
    # Equation 1: delta^L = grad_a C * sigma'(z^L) (elementwise product).
    # For the quadratic cost C_x = ||y - a^L||^2 / 2, grad_a C = a^L - y,
    # as derived in the section following this code.
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    # Equation 3 at the output layer: dC/db^L = delta^L.
    nabla_b[-1] = delta
    # Equation 4 at the output layer: dC/dw^L = delta^L (a^{L-1})^T.
    nabla_w[-1] = np.dot(delta, activations[-2].transpose())
    # Propagate the error backwards. l counts layers from the end:
    # l = 2 is the second-to-last layer, l = 3 the third-to-last, and so on.
    for l in range(2, self.num_layers):
        # Weighted input z^l of the current layer, stored during the forward pass.
        z = zs[-l]
        # sigma'(z^l), the second factor on the right-hand side of Equation 2.
        sp = sigmoid_prime(z)
        # Equation 2: delta^l = ((w^{l+1})^T delta^{l+1}) * sigma'(z^l) (elementwise product).
        # On the right-hand side, delta still holds delta^{l+1} from the previous iteration.
        delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
        # Equation 3 at the current layer: dC/db^l = delta^l.
        nabla_b[-l] = delta
        # Equation 4 at the current layer: dC/dw^l = delta^l (a^{l-1})^T.
        nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
    return (nabla_b, nabla_w)
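
A common sanity check for the gradients returned by `backprop` is to compare them with central finite differences of the cost. The sketch below is self-contained and makes several assumptions that are not part of the original notebook: `backprop` is rewritten as a plain function taking `weights` and `biases` explicitly, the helpers `forward_cost` and `numerical_grad_w` are hypothetical, and the small layer sizes `[4, 3, 2]`, the random data, and `eps` are arbitrary choices.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def forward_cost(weights, biases, x, y):
    """Quadratic cost C_x = ||y - a^L||^2 / 2 for one example."""
    a = x
    for b, w in zip(biases, weights):
        a = sigmoid(w @ a + b)
    return 0.5 * float(np.sum((a - y) ** 2))


def backprop(weights, biases, x, y):
    """Function-style copy of the backprop method (Equations 1 to 4)."""
    nabla_b = [np.zeros(b.shape) for b in biases]
    nabla_w = [np.zeros(w.shape) for w in weights]
    activation, activations, zs = x, [x], []
    for b, w in zip(biases, weights):
        z = w @ activation + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    nabla_b[-1] = delta
    nabla_w[-1] = delta @ activations[-2].T
    # len(weights) + 1 equals num_layers in the Network class.
    for l in range(2, len(weights) + 1):
        delta = (weights[-l + 1].T @ delta) * sigmoid_prime(zs[-l])
        nabla_b[-l] = delta
        nabla_w[-l] = delta @ activations[-l - 1].T
    return nabla_b, nabla_w


def numerical_grad_w(weights, biases, x, y, layer, eps=1e-6):
    """Central-difference estimate of dC/dw for one layer."""
    grad = np.zeros_like(weights[layer])
    for idx in np.ndindex(*weights[layer].shape):
        old = weights[layer][idx]
        weights[layer][idx] = old + eps
        c_plus = forward_cost(weights, biases, x, y)
        weights[layer][idx] = old - eps
        c_minus = forward_cost(weights, biases, x, y)
        weights[layer][idx] = old  # restore the original weight
        grad[idx] = (c_plus - c_minus) / (2.0 * eps)
    return grad


rng = np.random.default_rng(0)
sizes = [4, 3, 2]
biases = [rng.standard_normal((n, 1)) for n in sizes[1:]]
weights = [rng.standard_normal((n, m)) for m, n in zip(sizes[:-1], sizes[1:])]
x = rng.standard_normal((4, 1))
y = rng.standard_normal((2, 1))

nabla_b, nabla_w = backprop(weights, biases, x, y)
max_err = max(
    np.max(np.abs(nw - numerical_grad_w(weights, biases, x, y, i)))
    for i, nw in enumerate(nabla_w)
)
print("largest absolute difference:", max_err)
```

If the backward pass is implemented correctly, the reported difference should be tiny compared with the gradient entries themselves. A large difference usually points to an indexing or transpose mistake in Equations 2 or 4.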


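- The cost for a single training example is $C_x = \frac{1}{2} ||y-a^{L}||^2$

- Differentiating with respect to the output activations gives the vector $\frac{∂C}{∂{a^{L}}} = -(y-a^{L}) = a^{L}-y$, with components $\frac{∂C}{∂{a^{L}_j}} = a^{L}_j-y_j$

- In the notation of Equation 1, $∇_{a}C = \frac{∂C}{∂{a^{L}}} = a^{L}-y$

This relation is used directly in Equation 1 of the backpropagation summary; it is the `(activations[-1] - y)` factor in `backprop`.
- This formula changes with the choice of cost function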
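
To illustrate how the output error depends on the cost, the sketch below compares the quadratic cost used here with the cross-entropy cost, which is introduced purely as an example and does not appear in the code above. For a sigmoid output layer with cross-entropy cost, $∇_{a}C = \frac{a^{L}-y}{a^{L}(1-a^{L})}$, so the $σ^{'}(z^{L})$ factor cancels and $\delta^{L} = a^{L}-y$. The function names `quadratic_delta` and `cross_entropy_delta` and the sample values are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def quadratic_delta(z, a, y):
    """Output error for C = ||y - a||^2 / 2: (a - y) * sigma'(z)."""
    return (a - y) * sigmoid_prime(z)


def cross_entropy_delta(z, a, y):
    """Output error for cross-entropy cost with sigmoid outputs: a - y.

    z is unused because sigma'(z) cancels; it is kept so both functions
    share the same signature.
    """
    return a - y


z = np.array([[0.0], [2.0]])    # hypothetical weighted inputs
a = sigmoid(z)                  # output activations
y = np.array([[1.0], [0.0]])    # hypothetical target

print(quadratic_delta(z, a, y))
print(cross_entropy_delta(z, a, y))
```

Only the line that computes the output-layer `delta` in `backprop` would need to change; Equations 2 to 4 stay the same for either cost.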