## 1. Explain the concept of forward propagation in a neural network.
ans:
Forward propagation passes the input data through a neural network's layers in sequence to produce an output prediction. It runs during both training and inference.

During forward propagation, the input data is fed into the network, and the computations flow forward through the layers. Each neuron in a layer receives inputs from the neurons in the previous layer, applies a set of weights and biases to those inputs, and passes the result through an activation function. This process continues until the data reaches the output layer, where the final prediction or output of the network is generated.

`Forward propagation can be mathematically represented as a series of matrix multiplications and activation function applications`. Each layer transforms its input into a new representation, and the final layer turns that representation into an output for classification, regression, or whatever task the network is designed to solve. Forward propagation itself does not change the weights; learning happens when the weights are updated after backpropagation.
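
One common way to write this layer by layer is sketched below. The layer index l, the weight matrices W^{(l)}, and the bias vectors b^{(l)} are notation assumed here for illustration; the text above does not fix a notation.

```latex
a^{(0)} = x, \qquad
z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}, \qquad
a^{(l)} = \sigma\left(z^{(l)}\right), \qquad l = 1, \dots, L, \qquad
\hat{y} = a^{(L)}
```

Here x is the input, σ is the activation function of each layer, and ŷ is the network's prediction.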

During training, forward propagation is followed by the calculation of the loss, which measures the difference between the network's predicted output and the expected output. This loss is then used to adjust the network's weights and biases during the subsequent backpropagation step, which helps the network learn and improve its performance over time.
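
As a small illustration of the loss calculation, the sketch below computes the mean squared error for the targets and the first-iteration predictions printed in the training output of question 5. The helper name `mse_loss` is introduced here for illustration only.

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return np.mean(np.square(y_true - y_pred))

# Targets and iteration-0 predictions copied from the training output in question 5
y_true = np.array([[0.92], [0.86], [0.89]])
y_pred = np.array([[0.37660359], [0.38619524], [0.37145716]])

loss = mse_loss(y_true, y_pred)
print(round(loss, 4))  # 0.2629
```

This agrees with the loss reported for iteration 0 in that output, up to the rounding of the printed predictions.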

Forward propagation involves the following steps:

- Input Layer: 
  - The input data is fed into the input layer of the neural network.
- Hidden Layers: 
  - The input data is processed through one or more hidden layers. Each neuron in a hidden layer receives inputs from the previous layer, applies an activation function to the weighted sum of these inputs, and passes the result to the next layer.
- Output Layer: 
  - The processed data moves through the output layer, where the final output of the network is generated. The output layer typically applies an activation function suitable for the task, such as softmax for classification or linear activation for regression.
- Prediction: 
  - The final output of the network is the prediction or classification result for the input data.

In summary, forward propagation enables the flow of data through the network, producing predictions or outputs based on the given inputs.
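
The steps above can be sketched in NumPy as follows. This is a minimal illustration, not a definitive implementation: the layer sizes, the ReLU hidden activation, the softmax output, and the names `forward`, `W1`, `b1`, `W2`, `b2` are all assumptions chosen for the example.

```python
import numpy as np

def relu(z):
    # hidden-layer activation (assumed for this sketch)
    return np.maximum(0.0, z)

def softmax(z):
    # subtract the row-wise max before exponentiating for numerical stability
    shifted = z - np.max(z, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

def forward(X, W1, b1, W2, b2):
    z1 = X @ W1 + b1      # input layer -> hidden layer (weighted sum plus bias)
    a1 = relu(z1)         # hidden-layer activation
    z2 = a1 @ W2 + b2     # hidden layer -> output layer
    return softmax(z2)    # class probabilities (the prediction)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))        # 4 samples, 3 features (hypothetical data)
W1 = rng.normal(size=(3, 5))       # 3 inputs -> 5 hidden units
b1 = np.zeros(5)
W2 = rng.normal(size=(5, 2))       # 5 hidden units -> 2 classes
b2 = np.zeros(2)

probs = forward(X, W1, b1, W2, b2)
print(probs.shape)                 # (4, 2)
```

Each row of `probs` is a probability distribution over the two classes, so its entries are non-negative and sum to 1.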

## 2. What is the purpose of the activation function in forward propagation?
ans:

Activation functions are used during forward propagation to introduce non-linearity into the output of each neuron or unit in a layer. Without them, a stack of layers would collapse into a single linear transformation, no matter how many layers it had. The activation function is applied to the weighted sum of the inputs plus the bias, known as the pre-activation value, and the result becomes the output of that neuron.

Mathematically, let's denote the weighted sum of inputs and biases as z for a particular neuron in a layer. The activation function, denoted as σ(z), is applied element-wise to the value of z, producing the output of the neuron, which is commonly denoted as a. Therefore, we have:

a = σ(z)

The choice of activation function depends on the nature of the problem and the desired properties of the network. Here are some commonly used activation functions:

1. Step function: A simple activation function that outputs a binary value based on a threshold. If the input is above the threshold, it outputs one; otherwise, it outputs zero. The step function is rarely used in practice because its derivative is zero everywhere except at the threshold, where it is undefined, so gradient-based training gets no useful signal from it.

2. Sigmoid function: The sigmoid function squashes the input into a range between 0 and 1, which makes it suitable for binary classification problems. It has a smooth, S-shaped curve and is given by the formula:

   σ(z) = 1 / (1 + exp(-z))

3. Hyperbolic tangent (tanh) function: The tanh function is similar to the sigmoid function but squashes the input into a range between -1 and 1. It is useful when the output range needs to be symmetric around zero:

   σ(z) = (exp(z) - exp(-z)) / (exp(z) + exp(-z))

4. Rectified Linear Unit (ReLU): The ReLU activation function is widely used in deep learning networks. It returns the input if it is positive, otherwise, it outputs zero. Mathematically, ReLU is defined as:

   σ(z) = max(0, z)

5. Leaky ReLU: Leaky ReLU is a modified version of the ReLU function that allows small negative values when the input is less than zero. It addresses the "dying ReLU" problem, where certain neurons may become inactive during training. Leaky ReLU is defined as:

   σ(z) = max(αz, z) (where α is a small positive constant)

These are just a few of the activation functions commonly used in neural networks.
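
The formulas above translate almost directly into NumPy. The following is a minimal sketch; the function names and the default values `threshold=0.0` and `alpha=0.01` are assumptions made for illustration.

```python
import numpy as np

def step(z, threshold=0.0):
    # 1 where the input is above the threshold, 0 otherwise
    return np.where(z > threshold, 1.0, 0.0)

def sigmoid(z):
    # squashes the input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # squashes the input into (-1, 1)
    return np.tanh(z)

def relu(z):
    # passes positive inputs through, outputs 0 otherwise
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # like ReLU, but scales negative inputs by alpha instead of zeroing them
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, 0.0, 2.0])
print(step(z))        # [0. 0. 1.]
print(relu(z))        # [0. 0. 2.]
print(leaky_relu(z))  # [-0.02  0.    2.  ]
print(sigmoid(0.0))   # 0.5
```

For 0 < α < 1, `leaky_relu` gives the same values as the max(αz, z) form shown above.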

## 3. Describe the steps involved in the backward propagation (backpropagation) algorithm.
ans:

`Backward Propagation, also known as BackPropagation`, in a neural network is `used to calculate the gradients of the loss function with respect to the network's parameters (weights and biases)`. It is an essential step in the training process of a neural network, allowing it to learn and improve its performance.

During forward propagation, the input data flows through the network, and the output prediction is generated. `In the subsequent step of backward propagation, the gradients are calculated by propagating the error backward from the output layer to the input layer`. The gradients represent the sensitivity of the loss function with respect to each parameter, indicating how changing a particular parameter would affect the overall loss.

The process of backward propagation involves the following steps:

1. Loss calculation: 
   - First, the loss function is calculated by comparing the network's output with the expected output. The choice of loss function depends on the specific problem, such as mean squared error for regression or cross-entropy loss for classification.

2. Gradient calculation: 
   - Starting from the output layer, the gradients of the loss function with respect to the parameters are calculated layer by layer, moving backward through the network. This is done using the chain rule of calculus, which lets each layer's gradients be computed from the gradients already obtained for the layer after it.

3. Weight and bias updates: 
   - Once the gradients are obtained, they are used to update the weights and biases of the network. This update step is typically performed using an optimization algorithm such as gradient descent, where the parameters are adjusted in the opposite direction of the gradients to minimize the loss function.

By iteratively performing forward propagation and backward propagation, the neural network gradually learns to adjust its weights and biases to minimize the loss and improve its performance on the given task. The gradients obtained from backward propagation guide the optimization process, allowing the network to update its parameters in a way that reduces the error and improves its ability to make accurate predictions.
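
The loop of loss calculation, gradient calculation, and parameter update can be sketched on the simplest possible case: a single linear layer trained with mean squared error, where the chain rule has only one step. The synthetic data, the learning rate `lr`, and all variable names are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))             # hypothetical inputs
true_w = np.array([[2.0], [-1.0]])       # weights used to generate the targets
y = X @ true_w + 0.5                     # targets from a known linear rule

w = np.zeros((2, 1))                     # parameters to learn
b = 0.0
lr = 0.1                                 # learning rate (assumed value)
losses = []

for _ in range(200):
    # forward propagation
    pred = X @ w + b
    err = pred - y

    # 1. loss calculation (mean squared error)
    losses.append(np.mean(err ** 2))

    # 2. gradient calculation: d(loss)/dw and d(loss)/db
    grad_w = 2.0 * X.T @ err / len(X)
    grad_b = 2.0 * np.mean(err)

    # 3. update: move against the gradient to reduce the loss
    w -= lr * grad_w
    b -= lr * grad_b

print(losses[0] > losses[-1])            # True: the loss went down
```

In a multi-layer network, step 2 is where backpropagation does its work, reusing the gradient of each later layer to compute the gradient of the layer before it.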

In summary, backward propagation plays a critical role in training a neural network by calculating the gradients of the loss function with respect to the parameters. It enables the network to learn from the training data and adjust its parameters to minimize the error, leading to improved performance on the task at hand.

## 4. What is the purpose of the chain rule in backpropagation?
ans:

The chain rule is a fundamental rule in calculus that allows us to calculate the derivative of a composition of functions. In the context of neural networks and backward propagation, the chain rule is essential for efficiently computing the gradients of the loss function with respect to the network's parameters (weights and biases) at each layer.

The chain rule states that for a composition of two functions, f(u) and g(x) with u = g(x), the derivative of f(g(x)) with respect to x is the derivative of f with respect to u multiplied by the derivative of g with respect to x.

Mathematically, if we have y = f(u) with u = g(x), then the chain rule can be stated as:

dy/dx = (df/du) * (dg/dx)
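
As a quick numerical check of this rule, the sketch below uses the hypothetical choice f(u) = sin(u) and g(x) = x², and compares the chain-rule derivative with a central finite-difference estimate. The functions and the evaluation point are assumptions picked only for illustration.

```python
import math

def g(x):
    return x ** 2

def f(u):
    return math.sin(u)

def dy_dx(x):
    u = g(x)
    df_du = math.cos(u)   # derivative of sin(u) with respect to u
    dg_dx = 2 * x         # derivative of x**2 with respect to x
    return df_du * dg_dx  # chain rule: dy/dx = (df/du) * (dg/dx)

x = 1.5
h = 1e-6
numeric = (f(g(x + h)) - f(g(x - h))) / (2 * h)  # central difference

print(abs(dy_dx(x) - numeric) < 1e-6)  # True
```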

Applying the chain rule in the context of backward propagation in a neural network, we can break down the derivative calculation step by step:

1. Start with the loss function L and the output of a neuron (denoted as a) in a particular layer.
2. Calculate the derivative of the loss function with respect to a, denoted as ∂L/∂a. This measures how much the loss function changes with respect to the output of the neuron.
3. Calculate the derivative of the activation function (denoted as σ) with respect to the weighted sum (denoted as z) of the neuron. Denote this derivative as ∂a/∂z.
4. Compute the derivative of the weighted sum (z) with respect to the parameters (weights w and biases b) of the neuron. Denote these derivatives as ∂z/∂w and ∂z/∂b. For z = w·x + b, they are ∂z/∂w = x and ∂z/∂b = 1.
5. Apply the chain rule to calculate the gradients of the loss with respect to the weights and biases:

   ∂L/∂w = ∂L/∂a * ∂a/∂z * ∂z/∂w
   ∂L/∂b = ∂L/∂a * ∂a/∂z * ∂z/∂b

By applying the chain rule repeatedly for each layer during backward propagation, the gradients of the loss function with respect to the weights and biases at each layer can be efficiently calculated. These gradients are then used to update the parameters of the network in the optimization step, allowing the network to learn and improve its performance.
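
For a single sigmoid neuron, the five steps can be sketched as follows and checked against finite differences. The squared-error loss L = 0.5 · (a − y)², the input values, and all names are assumptions made for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_fn(w, b, x, y):
    # assumed loss for this sketch: L = 0.5 * (a - y)^2 with a = sigmoid(w.x + b)
    a = sigmoid(np.dot(w, x) + b)
    return 0.5 * (a - y) ** 2

x = np.array([0.5, -1.0])   # hypothetical input
y = 1.0                     # hypothetical target
w = np.array([0.3, 0.8])
b = 0.1

# forward pass
z = np.dot(w, x) + b
a = sigmoid(z)

# chain-rule factors
dL_da = a - y               # derivative of 0.5 * (a - y)^2 with respect to a
da_dz = a * (1.0 - a)       # derivative of the sigmoid with respect to z
dz_dw = x                   # derivative of w.x + b with respect to w
dz_db = 1.0                 # derivative of w.x + b with respect to b

dL_dw = dL_da * da_dz * dz_dw
dL_db = dL_da * da_dz * dz_db

# finite-difference check of the analytic gradients
eps = 1e-6
num_dw = np.zeros_like(w)
for i in range(len(w)):
    delta = np.zeros_like(w)
    delta[i] = eps
    num_dw[i] = (loss_fn(w + delta, b, x, y) - loss_fn(w - delta, b, x, y)) / (2 * eps)
num_db = (loss_fn(w, b + eps, x, y) - loss_fn(w, b - eps, x, y)) / (2 * eps)

print(np.allclose(dL_dw, num_dw, atol=1e-7))  # True
```

A finite-difference check like this is a common way to test a hand-written backward pass before trusting it in training.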

The chain rule is a fundamental tool that enables the efficient calculation of gradients in complex networks with multiple layers and non-linear activation functions. It forms the basis for the successful implementation of backward propagation, which is essential for training deep neural networks.

## 5. Implement the forward propagation process for a simple neural network with one hidden layer using NumPy.

In [1]:
import numpy as np

# X = (hours studying, hours sleeping), y = score on test
x_all = np.array(([2, 9], [1, 5], [3, 6], [5, 10]), dtype=float) # input data (last row is held out for prediction)
y = np.array(([92], [86], [89]), dtype=float) # output (targets for the first three rows only)

# scale units
x_all = x_all/np.amax(x_all, axis=0) # scaling input data
y = y/100 # scaling output data (max test score is 100)

# split data
X = np.split(x_all, [3])[0] # training data
x_predicted = np.split(x_all, [3])[1] # testing data

class neural_network(object):
  def __init__(self):
  #parameters
    self.inputSize = 2
    self.outputSize = 1
    self.hiddenSize = 3

  #weights
    self.W1 = np.random.randn(self.inputSize, self.hiddenSize) # (2x3) weight matrix from input to hidden layer
    self.W2 = np.random.randn(self.hiddenSize, self.outputSize) # (3x1) weight matrix from hidden to output layer

  def forward(self, X):
    #forward propagation through our network
    self.z = np.dot(X, self.W1) # dot product of X (input) and first set of 2x3 weights
    self.z2 = self.sigmoid(self.z) # activation function
    self.z3 = np.dot(self.z2, self.W2) # dot product of hidden layer (z2) and second set of 3x1 weights
    o = self.sigmoid(self.z3) # final activation function
    return o

  def sigmoid(self, s):
    # activation function
    return 1/(1+np.exp(-s))

  def sigmoidPrime(self, s):
    #derivative of sigmoid, written in terms of the sigmoid output: s must already be sigmoid(z)
    return s * (1 - s)

  def backward(self, X, y, o):
    # backward propagate through the network
    self.o_error = y - o # error in output
    self.o_delta = self.o_error*self.sigmoidPrime(o) # applying derivative of sigmoid to error

    self.z2_error = self.o_delta.dot(self.W2.T) # z2 error: how much our hidden layer weights contributed to output error
    self.z2_delta = self.z2_error*self.sigmoidPrime(self.z2) # applying derivative of sigmoid to z2 error

    self.W1 += X.T.dot(self.z2_delta) # adjusting first set (input --> hidden) weights
    self.W2 += self.z2.T.dot(self.o_delta) # adjusting second set (hidden --> output) weights

  def train(self, X, y):
    o = self.forward(X)
    self.backward(X, y, o)

  def saveWeights(self):
    np.savetxt("w1.txt", self.W1, fmt="%s")
    np.savetxt("w2.txt", self.W2, fmt="%s")

  def predict(self):
    print("Predicted data based on trained weights: ")
    print("Input (scaled): \n" + str(x_predicted))
    print("Output: \n" + str(self.forward(x_predicted)))

nn = neural_network()
for i in range(1000): # trains the nn 1,000 times
  print("# " + str(i) + "\n")
  print("Input (scaled): \n" + str(X))
  print("Actual Output: \n" + str(y))
  print("Predicted Output: \n" + str(nn.forward(X)))
  print("Loss: \n" + str(np.mean(np.square(y - nn.forward(X))))) # mean squared error
  print("\n")
  nn.train(X, y)

nn.saveWeights()
nn.predict()

# 0

Input (scaled): 
[[0.4 0.9]
 [0.2 0.5]
 [0.6 0.6]]
Actual Output: 
[[0.92]
 [0.86]
 [0.89]]
Predicted Output: 
[[0.37660359]
 [0.38619524]
 [0.37145716]]
Loss: 
0.262885763875226


# 1

Input (scaled): 
[[0.4 0.9]
 [0.2 0.5]
 [0.6 0.6]]
Actual Output: 
[[0.92]
 [0.86]
 [0.89]]
Predicted Output: 
[[0.49565923]
 [0.49131673]
 [0.48467816]]
Loss: 
0.1600927434619697


# 2

Input (scaled): 
[[0.4 0.9]
 [0.2 0.5]
 [0.6 0.6]]
Actual Output: 
[[0.92]
 [0.86]
 [0.89]]
Predicted Output: 
[[0.59403403]
 [0.57863829]
 [0.57918871]]
Loss: 
0.09400729688191782


# 3

Input (scaled): 
[[0.4 0.9]
 [0.2 0.5]
 [0.6 0.6]]
Actual Output: 
[[0.92]
 [0.86]
 [0.89]]
Predicted Output: 
[[0.66306284]
 [0.6409112 ]
 [0.64637886]]
Loss: 
0.05778928895972654


# 4

Input (scaled): 
[[0.4 0.9]
 [0.2 0.5]
 [0.6 0.6]]
Actual Output: 
[[0.92]
 [0.86]
 [0.89]]
Predicted Output: 
[[0.71006768]
 [0.68409192]
 [0.69268446]]
Loss: 
0.037982883646824644


# 5

Input (scaled): 
[[0.4 0.9]
 [0.2 0.5]
 [0.6 0.6]]
Actual

Derivative of the sigmoid function (`sigmoidPrime`): σ'(z) = σ(z) · (1 − σ(z))

![image.png](attachment:image.png)