In this walkthrough, we'll pass a batch of inputs through an input layer and a hidden layer, applying the Rectified Linear Unit (ReLU) activation function to the output of each layer.
1. **Input Layer**:
   - The input layer holds the input features fed to the neurons. A single input is typically represented as a vector in which each element corresponds to one input feature.
   - In this example, a batch of three inputs with three features each is stored as a 3×3 matrix:
     $$ \text{Input Layer} = \begin{bmatrix}
2 & 1 & 0 \\
1 & 1 & 1 \\
3 & 0 & 1 \\
\end{bmatrix}
 $$

In [44]:
import torch

# Define a fixed 3x3 input tensor (a batch of three inputs with three features each)
x = torch.tensor([[2, 1, 0],[1, 1, 1],[3, 0, 1]])



2. **Weights**:
   - Weights are the parameters of the neuron that are learned during the training process. Each weight corresponds to a connection between an input feature and the neuron.
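   - The weights of a single neuron form a vector; stacking one row per neuron gives the weight matrix of a layer.
   - In this example we use two weight matrices: $w_1$ (4×3) for the first layer and $w_2$ (2×4) for the hidden layer, which takes the first layer's activated output as its input.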
     $$
\text{Weights}\ w_1 = \begin{bmatrix}
-1 & 1 & 0 \\
1 & 0 & -1 \\
0 & -1 & 1 \\
-1 & 1 & 0 \\
\end{bmatrix}
$$ 
$$
\text{Weights}\ w_2 = \begin{bmatrix}
-1 & 1 & 0 & 1 \\
0 & -1 & 1 & 0 \\
\end{bmatrix}
$$

In [45]:
# Weights
w1 = torch.tensor([[-1, 1, 0],
                  [1, 0, -1],
                  [0, -1, 1],
                  [-1, 1, 0]])

w2 = torch.tensor([[-1, 1, 0, 1],
                   [0, -1, 1, 0]])

w1

tensor([[-1,  1,  0],
        [ 1,  0, -1],
        [ 0, -1,  1],
        [-1,  1,  0]])

In [46]:
w2

tensor([[-1,  1,  0,  1],
        [ 0, -1,  1,  0]])
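
The shapes of the two weight matrices determine which matrix products are defined. As a minimal sketch, the helper below checks the rule that inner dimensions must agree and traces the shapes through both layers. `matmul_shape` is a hypothetical name used only for illustration (it is not a torch function), and the sketch assumes the weights multiply the 3×3 input from the left.

```python
def matmul_shape(a_shape, b_shape):
    """Shape of A @ B for 2-D operands; hypothetical helper for illustration."""
    m, k = a_shape
    k2, n = b_shape
    if k != k2:
        raise ValueError(f"inner dimensions differ: {k} vs {k2}")
    return (m, n)

x_shape = (3, 3)    # input batch
w1_shape = (4, 3)   # first-layer weights: 4 neurons, 3 inputs each
w2_shape = (2, 4)   # hidden-layer weights: 2 neurons, 4 inputs each

layer1_shape = matmul_shape(w1_shape, x_shape)        # (4, 3)
hidden_shape = matmul_shape(w2_shape, layer1_shape)   # (2, 3)
print(layer1_shape, hidden_shape)
```

The four rows produced by the first layer are exactly the four inputs that each row of `w2` expects, which is why `w2` has four columns.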

3. **Bias**:
   - The bias term is another parameter of the neuron. It's added to the weighted sum of inputs to shift the activation function.
   - Each neuron has one scalar bias, so the biases of a layer form a vector with one entry per neuron.
   - In this example we use two bias vectors: $b_1$ for the first layer and $b_2$ for the hidden layer.
     $$
     b_1 = \begin{bmatrix} -5 & 0 & 1 & 2 \end{bmatrix}
     $$
     $$
     b_2 = \begin{bmatrix} -2 & 3 \end{bmatrix}
     $$

In [47]:
# Bias
b1 = torch.tensor([-5, 0 ,1, 2])
b2 = torch.tensor([-2, 3])

b1 = b1.unsqueeze(1)
b1

tensor([[-5],
        [ 0],
        [ 1],
        [ 2]])

In [48]:
b2 = b2.unsqueeze(1)
b2

tensor([[-2],
        [ 3]])
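
The `unsqueeze(1)` calls turn each bias from a flat vector into a column, so that one bias value lines up with one row (one neuron) of the layer output. The plain-Python sketch below mimics that row-wise addition without torch. `add_column_bias` is a hypothetical helper, and the all-zero matrix is a stand-in chosen only to make the pattern visible.

```python
def add_column_bias(matrix, bias_column):
    """Add bias_column[i][0] to every entry in row i of matrix."""
    return [[value + bias_row[0] for value in row]
            for row, bias_row in zip(matrix, bias_column)]

b1_column = [[-5], [0], [1], [2]]        # same values as b1 after unsqueeze(1)
zeros = [[0, 0, 0] for _ in range(4)]    # stand-in for a 4x3 layer output

biased = add_column_bias(zeros, b1_column)
for row in biased:
    print(row)
```

Each row receives its own bias, repeated across all three columns. Without the extra dimension, a length-4 vector would be matched against the last dimension of a 4×3 matrix (length 3), and the shapes would not be compatible under the usual broadcasting rules.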

4. **Linear Transformation**:
   - This represents the result of the linear transformation of the input by the neuron, including the weighted sum of inputs and the bias term.
   - It's computed by multiplying the weight matrix by the input matrix and then adding the bias, written as a column vector so that it is added to every column of the product:

     $$
     \text{Layer 1 Output} = w_1 \cdot X + b_1
     $$

     $$
     \text{\ \ \ \ \ \ } = \begin{bmatrix}
-1 & 1 & 0 \\
1 & 0 & -1 \\
0 & -1 & 1 \\
-1 & 1 & 0 \\
\end{bmatrix}
     \begin{bmatrix}
2 & 1 & 0 \\
1 & 1 & 1 \\
3 & 0 & 1 \\
\end{bmatrix} + \begin{bmatrix} -5 \\ 0 \\ 1 \\ 2 \end{bmatrix}
    $$
    $$
    = \begin{bmatrix}
-6 & -5 & -4 \\
-1 & 1 & -1 \\
3 & 0 & 1 \\
1 & 2 & 3 \\
\end{bmatrix}
    $$
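   - After ReLU is applied, the Layer 1 output serves as the input to the hidden layer:
     $$
     \text{Hidden Layer Output} = w_2 \cdot \text{ReLU}(\text{Layer 1 Output}) + b_2
     $$
     $$
     = \begin{bmatrix} -1 & 1 & 0 & 1 \\ 0 & -1 & 1 & 0 \end{bmatrix} \text{ReLU}\left(\begin{bmatrix}
-6 & -5 & -4 \\
-1 & 1 & -1 \\
3 & 0 & 1 \\
1 & 2 & 3 \\
\end{bmatrix}\right) + \begin{bmatrix} -2 \\ 3 \end{bmatrix}
     $$
     $$
     = \begin{bmatrix} -1 & 1 & 0 & 1 \\ 0 & -1 & 1 & 0 \end{bmatrix} \begin{bmatrix}
0 & 0 & 0 \\
0 & 1 & 0 \\
3 & 0 & 1 \\
1 & 2 & 3 \\
\end{bmatrix} + \begin{bmatrix} -2 \\ 3 \end{bmatrix}
     $$
     $$
     = \begin{bmatrix}
-1 & 1 & 1 \\
6 & 2 & 4 \\
\end{bmatrix}
     $$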

In [49]:
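# Linear transformation followed by ReLU, for both layers
import torch.nn as nn

# Layer 1: w1 is (4, 3), x is (3, 3); b1 is (4, 1) and broadcasts across columns
input_layer_output = torch.matmul(w1, x) + b1
relud_output_1 = nn.functional.relu(input_layer_output)

print("Output of first layer : ",input_layer_output)

# Hidden layer: w2 is (2, 4) and takes the activated Layer 1 output as input
hidden_layer_output = torch.matmul(w2, relud_output_1) + b2
relud_output_2 = nn.functional.relu(hidden_layer_output)

print("Output of Hidden Layer : ",hidden_layer_output)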

Output of first layer :  tensor([[-6, -5, -4],
        [-1,  1, -1],
        [ 3,  0,  1],
        [ 1,  2,  3]])
Output of Hidden Layer :  tensor([[-1,  1,  1],
        [ 6,  2,  4]])
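
To make the arithmetic behind the two `torch.matmul` calls explicit, the same forward pass can be sketched in plain Python with nested lists. The helpers `matmul`, `add_column_bias`, and `relu` are hypothetical names written for this illustration; they are not part of torch and are only meant for small matrices like these.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a_ik * b_kj for a_ik, b_kj in zip(a_row, b_col))
             for b_col in zip(*b)]
            for a_row in a]

def add_column_bias(matrix, bias):
    """Add bias[i] to every entry in row i (one bias per neuron)."""
    return [[value + b for value in row] for row, b in zip(matrix, bias)]

def relu(matrix):
    """Replace negative entries with zero, element-wise."""
    return [[max(0, value) for value in row] for row in matrix]

x = [[2, 1, 0], [1, 1, 1], [3, 0, 1]]
w1 = [[-1, 1, 0], [1, 0, -1], [0, -1, 1], [-1, 1, 0]]
w2 = [[-1, 1, 0, 1], [0, -1, 1, 0]]
b1 = [-5, 0, 1, 2]
b2 = [-2, 3]

layer1 = add_column_bias(matmul(w1, x), b1)
hidden = add_column_bias(matmul(w2, relu(layer1)), b2)

print(layer1)  # [[-6, -5, -4], [-1, 1, -1], [3, 0, 1], [1, 2, 3]]
print(hidden)  # [[-1, 1, 1], [6, 2, 4]]
```

Both results agree with the tensors printed by the torch cell, which confirms that the input matrix is used without a transpose.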


5. **ReLU Activation**:
   - ReLU (Rectified Linear Unit) is an activation function commonly used in neural networks to introduce non-linearity.
   - It is applied element-wise, here to the hidden layer output from Step 4, replacing every negative value with zero:
     $$ \text{ReLU}\left(\begin{bmatrix} -1 & 1 & 1 \\ 6 & 2 & 4 \end{bmatrix}\right) = \begin{bmatrix} 0 & 1 & 1 \\ 6 & 2 & 4 \end{bmatrix} $$

In [56]:
# RelU
relud_output_2 = nn.functional.relu(hidden_layer_output)
print(relud_output_2)

tensor([[0, 1, 1],
        [6, 2, 4]])
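
For a single value, ReLU reduces to `max(0, z)`. The short sketch below applies that scalar rule to each entry of the hidden layer output using plain Python lists. The scalar `relu` function here is a hypothetical stand-in for illustration, not the torch implementation.

```python
def relu(z):
    """Scalar ReLU: max(0, z)."""
    return max(0, z)

hidden_layer_output = [[-1, 1, 1], [6, 2, 4]]
activated = [[relu(z) for z in row] for row in hidden_layer_output]
print(activated)  # [[0, 1, 1], [6, 2, 4]]

# Applying ReLU a second time changes nothing, since no negatives remain.
assert [[relu(z) for z in row] for row in activated] == activated
```

Only the single negative entry is changed; every non-negative entry passes through unchanged.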
