In this description, we'll explore the operation of four neurons followed by the Rectified Linear Unit (ReLU) activation function.
1. **Input Layer**:
   - The input layer holds the input features fed to the neurons. It's typically represented as a row vector in which each element corresponds to one input feature.
   - In this example there are four input features, so the input is a row vector with four elements:
     $$ \text{Input Layer} = \begin{bmatrix} 1 & 2 & 3 & 4 \end{bmatrix} $$

In [17]:
import torch
import torch.nn as nn

# Define input tensor (1 sample, 4 features)
x = torch.tensor([[1.0, 2.0, 3.0, 4.0]])
x

tensor([[1., 2., 3., 4.]])

2. **Weights**:
   - Weights are the parameters of the neuron that are learned during the training process. Each weight corresponds to a connection between an input feature and the neuron.
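   - Because each of the four neurons receives four inputs, the weights here form a 4×4 matrix rather than a single vector. The output is later computed as the input row vector times this matrix, so each column holds the weights of one neuron: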
     $$
\text{Weights} = \begin{bmatrix}
-1 & 1 & 0 & 0 \\
0 & -1 & 1 & 0 \\
0 & 0 & -1 & 1 \\
1 & 0 & 0 & -1 \\
\end{bmatrix}
$$

In [13]:
# Weights
weights = torch.tensor([[-1.0, 1.0, 0.0, 0.0],
                        [0.0, -1.0, 1.0, 0.0],
                        [0.0, 0.0, -1.0, 1.0],
                        [1.0, 0.0, 0.0, -1.0]])

weights

tensor([[-1.,  1.,  0.,  0.],
        [ 0., -1.,  1.,  0.],
        [ 0.,  0., -1.,  1.],
        [ 1.,  0.,  0., -1.]])
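
To make the layout of the weight matrix concrete, the sketch below picks out the weights of a single neuron and compares its weighted sum with the full matrix product. It assumes the row-vector convention used in this walkthrough (`x @ weights`); the names `neuron_0_weights`, `neuron_0_sum`, and `all_sums` are introduced here purely for illustration.

```python
import torch

# Same values as the input and weight matrix in this walkthrough.
x = torch.tensor([[1.0, 2.0, 3.0, 4.0]])
weights = torch.tensor([[-1.0, 1.0, 0.0, 0.0],
                        [0.0, -1.0, 1.0, 0.0],
                        [0.0, 0.0, -1.0, 1.0],
                        [1.0, 0.0, 0.0, -1.0]])

# With x @ weights, column j holds the four incoming weights of neuron j.
neuron_0_weights = weights[:, 0]            # tensor([-1., 0., 0., 1.])

# Weighted sum for neuron 0 alone: -1*1 + 0*2 + 0*3 + 1*4 = 3
neuron_0_sum = torch.dot(x[0], neuron_0_weights)

# The matrix product computes all four weighted sums at once.
all_sums = x @ weights                      # tensor([[ 3., -1., -1., -1.]])
print(neuron_0_sum, all_sums)
```

The first entry of `all_sums` matches `neuron_0_sum`, which is one way to check which axis of the matrix belongs to which neuron.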

3. **Bias**:
   - The bias term is another learned parameter of the neuron. It's added to the weighted sum of inputs, shifting the value that is passed to the activation function.
   - Each neuron has its own scalar bias, so the four biases together form a vector:
     $$
     \text{b} = \begin{bmatrix} -5 & 0 & 1 & 2 \end{bmatrix}
     $$

In [14]:
# Bias
bias = torch.tensor([-5.0, 0.0, 1.0, 2.0])
bias

tensor([-5.,  0.,  1.,  2.])
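
The bias vector has one entry per neuron, and PyTorch broadcasts it across every sample in a batch. The sketch below illustrates this under an assumption: the second row of `weighted_sums` is a made-up sample added only to show the broadcasting, and the names `weighted_sums` and `shifted` are hypothetical.

```python
import torch

bias = torch.tensor([-5.0, 0.0, 1.0, 2.0])   # shape (4,): one bias per neuron

# Weighted sums for a batch of two samples. The first row is x @ weights from
# this walkthrough; the second row is an illustrative all-zero sample.
weighted_sums = torch.tensor([[3.0, -1.0, -1.0, -1.0],
                              [0.0, 0.0, 0.0, 0.0]])   # shape (2, 4)

# Broadcasting adds the same bias vector to every row.
shifted = weighted_sums + bias
print(shifted)
# tensor([[-2., -1.,  0.,  1.],
#         [-5.,  0.,  1.,  2.]])
```

For the all-zero sample the result is the bias itself, which shows that the bias sets each neuron's output when the weighted sum contributes nothing.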

4. **Linear Transformation**:
   - This step is the linear transformation applied by the neurons: the weighted sum of the inputs plus the bias term.
   - It's computed by multiplying the input row vector $X$ by the weight matrix $W$ and then adding the bias vector $B$:

     $$
     \text{Linear Output} = X \cdot W + B
     $$

     $$
     = \begin{bmatrix} 1 & 2 & 3 & 4 \end{bmatrix}
     \begin{bmatrix}
        -1 & 1 & 0 & 0 \\
        0 & -1 & 1 & 0 \\
        0 & 0 & -1 & 1 \\
        1 & 0 & 0 & -1 \\
     \end{bmatrix}
     + \begin{bmatrix} -5 & 0 & 1 & 2 \end{bmatrix}
     $$

     $$
     = \begin{bmatrix} 3 & -1 & -1 & -1 \end{bmatrix} + \begin{bmatrix} -5 & 0 & 1 & 2 \end{bmatrix}
     = \begin{bmatrix} -2 & -1 & 0 & 1 \end{bmatrix}
     $$

In [15]:
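# x has shape (1, 4) and weights has shape (4, 4), so the result has shape (1, 4):
# one value per neuron. The bias (shape (4,)) is broadcast across the row.
linear_output = torch.matmul(x, weights) + bias
# Transpose only to display the four neuron outputs as a column.
linear_output.T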

tensor([[-2.],
        [-1.],
        [ 0.],
        [ 1.]])
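
The same computation can be reproduced with PyTorch's built-in `nn.Linear` layer, sketched below. `nn.Linear` stores its weight with shape `(out_features, in_features)` and computes `x @ weight.T + bias`, so the matrix from this walkthrough has to be transposed before it is copied in. The names `layer`, `manual`, and `from_layer` are introduced here for illustration only.

```python
import torch
import torch.nn as nn

x = torch.tensor([[1.0, 2.0, 3.0, 4.0]])
weights = torch.tensor([[-1.0, 1.0, 0.0, 0.0],
                        [0.0, -1.0, 1.0, 0.0],
                        [0.0, 0.0, -1.0, 1.0],
                        [1.0, 0.0, 0.0, -1.0]])
bias = torch.tensor([-5.0, 0.0, 1.0, 2.0])

# A layer with 4 inputs and 4 neurons; overwrite its random initial parameters.
layer = nn.Linear(in_features=4, out_features=4)
with torch.no_grad():
    layer.weight.copy_(weights.T)   # nn.Linear expects (out_features, in_features)
    layer.bias.copy_(bias)

manual = x @ weights + bias         # the computation used in this walkthrough
with torch.no_grad():
    from_layer = layer(x)           # same computation via the built-in layer

print(manual)
print(torch.allclose(manual, from_layer))
```

Both paths should agree on `[-2, -1, 0, 1]`. The transpose is the detail most likely to cause a silent mismatch when moving between hand-written matrices and `nn.Linear`.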

5. **ReLU Activation**:
   - ReLU (Rectified Linear Unit) is an activation function commonly used in neural networks to introduce non-linearity.
   - It applies an element-wise operation to the output of the linear transformation (Step 4), replacing every negative value with zero and leaving non-negative values unchanged:
      $$ \text{ReLU}([-2,-1,0,1]) = [0,0,0,1] $$

In [18]:
# ReLU activation
relu_output = nn.functional.relu(linear_output.T)

# Output
print("ReLU output:", relu_output)

ReLU output: tensor([[0.],
        [0.],
        [0.],
        [1.]])
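
ReLU is the element-wise function max(0, z), so it can also be written without the dedicated helper. The sketch below compares three equivalent formulations on the linear output from this walkthrough; the variable names are illustrative.

```python
import torch
import torch.nn.functional as F

# Linear output from Step 4, kept as a row vector.
z = torch.tensor([[-2.0, -1.0, 0.0, 1.0]])

relu_builtin = F.relu(z)                              # dedicated ReLU function
relu_manual = torch.maximum(z, torch.zeros_like(z))   # element-wise max(0, z)
relu_clamp = torch.clamp(z, min=0.0)                  # clip values below zero

print(relu_builtin)   # tensor([[0., 0., 0., 1.]])
print(torch.equal(relu_builtin, relu_manual), torch.equal(relu_builtin, relu_clamp))
```

Only the last neuron has a positive pre-activation, so it is the only one that passes a non-zero value forward.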
