# Neural Network Introduction

A neural network is a computational model inspired by the way the human brain works. It is designed to recognize patterns and learn from data. The network learns to map input data (like the pixels in an image) to an output (like the label "cat" or "dog") by adjusting its internal parameters.

### Real-Life Analogy
Think of a neural network like a child learning to differentiate between apples and oranges. The more fruits the child sees, the better they get at recognizing the differences.

## Neural Network Architecture

1. **Neuron** - The basic unit that receives inputs, processes them, and produces an output. Each neuron holds a number (its activation); with the sigmoid activation function this number lies between 0 and 1.
2. **Input Layer** - Receives the initial data.
3. **Hidden Layer(s)** - Perform the intermediate computations between input and output.
4. **Output Layer** - Produces the final prediction.
5. **Weights**: Numbers that determine the strength or importance of each connection between neurons. ***Number of weight matrices = Number of hidden layers + 1*** (the extra one connects the last hidden layer to the output layer)
6. **Biases**: Learnable values added to the weighted sum of inputs, which shift the input to the activation function.
7. **Activation Function**: A non-linear function that determines how strongly a neuron is activated, i.e. its output.

### Example
**Predicting House Prices**
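As a minimal sketch of the components above applied to house-price prediction, here is one forward pass through a network with 2 input neurons (size and number of bedrooms), 3 hidden neurons using ReLU, f(x) = max(0, x), and 1 output neuron. All features, weights, and biases are hypothetical, hand-picked numbers rather than trained values. The output neuron has no activation, which is a common choice for regression because a price is not limited to a fixed range.

```python
import numpy as np

# Hypothetical features for one house: [size in 1000s of sq ft, number of bedrooms]
x = np.array([[1.5, 3.0]])                # shape (1, 2): one sample, two input neurons

# Hypothetical, hand-picked parameters (a trained network would learn these)
W1 = np.array([[0.8, -0.2, 0.5],
               [0.1,  0.4, 0.3]])         # (2, 3): input layer -> 3 hidden neurons
b1 = np.array([[0.0, 0.1, -0.2]])         # (1, 3): one bias per hidden neuron
W2 = np.array([[120.0], [40.0], [60.0]])  # (3, 1): hidden layer -> 1 output neuron
b2 = np.array([[50.0]])                   # (1, 1): output bias

def relu(z):
    return np.maximum(0, z)

hidden = relu(x @ W1 + b1)   # hidden-layer activations
price = hidden @ W2 + b2     # linear output for regression (price in $1000s)
print(price)
```

With one hidden layer there are two weight matrices (`W1`, `W2`), matching the "hidden layers + 1" count.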

## Forward Propagation (Pass)
Forward propagation is how data flows through the network to make a prediction.

### Steps
1. Multiply each input by its weight and sum the results.
2. Add the bias.
3. Apply the activation function.
4. Pass the result to the next layer.

### Formula
***Output = Activation(Weight * Input + Bias)***
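The steps and formula above can be sketched in NumPy for a single layer. The function name `layer_forward` and the weights are illustrative assumptions; chaining a second call shows step 4, feeding one layer's output into the next.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def layer_forward(inputs, weights, bias, activation=sigmoid):
    """Output = Activation(Weight * Input + Bias) for one layer."""
    z = inputs @ weights + bias   # steps 1-2: weighted sum of inputs plus bias
    return activation(z)          # step 3: apply the activation function

x = np.array([[1.0, 2.0]])        # one sample with two features
W = np.array([[0.5], [-0.25]])    # hypothetical weights: 2 inputs -> 1 neuron
b = np.array([[0.0]])

out = layer_forward(x, W, b)      # z = 0.5*1 - 0.25*2 + 0 = 0, so sigmoid(0) = 0.5
# Step 4: the output becomes the input of the next (hypothetical) layer
out2 = layer_forward(out, np.array([[2.0]]), np.array([[-1.0]]))
print(out, out2)
```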

## Training the Network - (Backpropagation + Gradient Descent)

Training means analyzing the output of the **Forward Pass**, calculating the error, and then ***adjusting the weights and biases to reduce that error***. This cycle is repeated many times.

### Steps
1. **Forward pass**: Predict output.
2. **Loss function**: Measure how wrong the prediction is.
3. **Backward pass (Backpropagation)**: Calculate how much each weight contributed to the error.
4. **Gradient Descent**: Update the weights and biases in the opposite direction of the error gradient.
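The four steps can be seen in miniature on a hypothetical one-weight model `y_pred = w * x` trained with MSE, where the target relationship is y = 2x. With a single weight, the backward pass reduces to one derivative, dLoss/dw = mean(2 * (y_pred - y) * x). The data, starting weight, learning rate, and step count are illustrative choices.

```python
import numpy as np

# Toy data (hypothetical): the target relationship is y = 2 * x
x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x

w = 0.0               # single weight, no bias, no activation
learning_rate = 0.05

for step in range(100):
    y_pred = w * x                          # 1. forward pass
    loss = np.mean((y_pred - y) ** 2)       # 2. loss function (MSE)
    grad_w = np.mean(2 * (y_pred - y) * x)  # 3. backward pass: dLoss/dw
    w -= learning_rate * grad_w             # 4. gradient descent: step against the gradient

print(round(w, 4))
```

The weight moves from 0 toward 2 because each update subtracts a multiple of the gradient; a positive gradient means increasing `w` would increase the loss, so `w` is decreased, and vice versa.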

### Analogy - Shooting Game

1. Forward Pass: You aim and shoot at the target
2. Loss Function: Measure how far you missed the target by (assuming you missed it)
3. Backward Pass: Work out how much each aiming angle contributed to the miss
4. Gradient Descent: Update the angles (Weights & Biases) in the opposite direction of the error
5. Repeat the cycle from Step-1

## Activation Function

An activation function ***introduces non-linearity into the output of a neuron***. Without it, no matter how many layers you stack, the entire neural network would behave like a single linear function. As a result, it could not learn complex patterns such as those in images and speech, or even XOR logic.
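The collapse of stacked linear layers is easy to check numerically. In this sketch (shapes and random seed chosen arbitrarily), two layers with no activation function are rewritten as one layer with weight `W1 @ W2` and bias `b1 @ W2 + b2`, since (xW1 + b1)W2 + b2 = x(W1W2) + (b1W2 + b2):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                                # 5 samples, 3 features
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=(1, 4))  # layer 1, no activation
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=(1, 2))  # layer 2, no activation

two_layers = (x @ W1 + b1) @ W2 + b2

# The same mapping expressed as ONE linear layer
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))  # True
```

Inserting a non-linear function such as ReLU between the two layers breaks this identity, which is what lets deeper networks represent more than a single linear map.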

#### Why is Non-Linearity Important?
Linear equations can only solve simple problems like drawing a straight line to separate data. But most real-world problems are non-linear, such as:

1. Classifying handwritten digits
2. Detecting emotions from voice
3. Predicting stock market trends

Activation functions help the network bend the decision boundary to fit such complex patterns.

#### Example:
Imagine you're building a spam classifier: <br>
Without an activation function, every email is classified as spam or not spam by a single linear rule (a weighted sum of its features), which is likely to be inaccurate.<br>
With activation functions, the network can learn complex rules like:

"If the word 'FREE' appears multiple times and it's from an unknown sender and has a link, then it's spam."

**Common Functions for Activation:**
1. ***ReLU (Rectified Linear Unit)***: f(x) = max(0, x)
2. ***Sigmoid***: f(x) = 1 / (1 + exp(-x)) → outputs between 0 and 1, so it is often used for probabilities
3. ***Tanh***: f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)) → outputs between -1 and 1
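The three functions can be written directly with NumPy (`np.tanh` computes the tanh formula). Evaluating them on a few sample inputs shows their output ranges:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)   # (exp(x) - exp(-x)) / (exp(x) + exp(-x))

z = np.array([-2.0, 0.0, 2.0])
print(relu(z))     # [0. 0. 2.]
print(sigmoid(z))  # values in (0, 1); sigmoid(0) = 0.5
print(tanh(z))     # values in (-1, 1); tanh(0) = 0
```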

## Loss Functions

A loss function quantifies how far the network's prediction is from the actual value.

### Common Loss Functions
1. ***Mean Squared Error (MSE)*** for regression
2. ***Cross-Entropy*** for classification
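Both losses fit in a few lines of NumPy. This sketch uses the binary form of cross-entropy (one sigmoid output, labels 0 or 1), which matches the XOR example below; the multi-class form over a softmax output is not shown. Clipping by `eps` is a common safeguard against `log(0)`, and the value 1e-12 is an arbitrary choice.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean of squared differences: used for regression
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Clip predictions so log(0) never occurs
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0])
y_pred = np.array([0.9, 0.2])
print(mse(y_true, y_pred))                   # (0.1**2 + 0.2**2) / 2
print(binary_cross_entropy(y_true, y_pred))  # -(log(0.9) + log(0.8)) / 2
```

Cross-entropy penalizes confident wrong predictions much more heavily than MSE does, which is one reason it is preferred for classification.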

## Sample Code Example
**Input**  - Input array where each row has two values (0 or 1) <br>
**Output** - XOR of the two values (1 if exactly one input is 1, otherwise 0)

```python
import numpy as np

# Step 1: Define activation and its derivative
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # x must already be a sigmoid output a = sigmoid(z); then d/dz sigmoid(z) = a * (1 - a)
    return x * (1 - x)
```
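Because the training loop calls `sigmoid_derivative` on activations (`A1`, `A2`) rather than on `Z1`, `Z2`, it relies on the identity sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)). A quick central finite-difference check (the step size `h` and the grid of `z` values are arbitrary choices) confirms it:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(a):
    # Expects a = sigmoid(z), not z itself
    return a * (1 - a)

z = np.linspace(-4, 4, 9)
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)  # central finite difference
analytic = sigmoid_derivative(sigmoid(z))
print(np.allclose(numeric, analytic))  # True
```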

```python
# Step 2: Input and Output Data
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])         # Inputs

y = np.array([[0],
              [1],
              [1],
              [0]])             # XOR Output (to see learning capability)

print("Input Values...........................")
print(X)
print("\n\nOutput Values...........................")
print(y)
```

```
Input Values...........................
[[0 0]
 [0 1]
 [1 0]
 [1 1]]


Output Values...........................
[[0]
 [1]
 [1]
 [0]]
```


```python
# Step 3: Initialize Weights and Biases
np.random.seed(1)
input_layer_size = 2  # 2 Features
hidden_layer_size = 4 # 4 neurons in the hidden layer
output_layer_size = 1
```

```python
# W1, W2: weight matrices, initialized randomly

# W1 connects the input layer to the hidden layer: W1[i, j] is how strongly input neuron i affects hidden neuron j
W1 = np.random.rand(input_layer_size, hidden_layer_size)     # shape (2, 4), values drawn uniformly from [0, 1)

# W2 connects the hidden layer to the output layer: W2[j, k] is how strongly hidden neuron j affects output neuron k
W2 = np.random.rand(hidden_layer_size, output_layer_size)    # shape (4, 1), values drawn uniformly from [0, 1)
```


### Why Random Initialization?
If all weights started as zeros (or any identical value), every hidden neuron would compute the same output and receive the same gradient update, so every neuron would learn the same thing.
The network could not break this symmetry or learn meaningful patterns.
Random values give each neuron slightly different starting behavior, allowing the network to learn a rich variety of features.
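To see the symmetry problem concretely, this sketch uses a hypothetical constant initialization (every weight set to 0.5; all zeros behaves the same way) on the XOR data and applies the same sigmoid and backpropagation updates as the training loop in this notebook. After 1,000 steps all four hidden neurons still have identical incoming weights, so the hidden layer behaves like a single neuron.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Hypothetical constant initialization: every weight starts at the same value
W1 = np.full((2, 4), 0.5)
W2 = np.full((4, 1), 0.5)
B1 = np.zeros((1, 4))
B2 = np.zeros((1, 1))

for _ in range(1000):  # same update rule as the notebook's training loop, learning rate 0.1
    A1 = sigmoid(X @ W1 + B1)
    A2 = sigmoid(A1 @ W2 + B2)
    dZ2 = (A2 - y) * A2 * (1 - A2)
    dZ1 = (dZ2 @ W2.T) * A1 * (1 - A1)
    W2 -= 0.1 * (A1.T @ dZ2)
    B2 -= 0.1 * dZ2.sum(axis=0, keepdims=True)
    W1 -= 0.1 * (X.T @ dZ1)
    B1 -= 0.1 * dZ1.sum(axis=0, keepdims=True)

# Every column of W1 (one per hidden neuron) is still identical: symmetry never breaks
print(np.allclose(W1, W1[:, [0]]))  # True
```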

```python
# Biases: zeros are fine here because the random weights already break symmetry
B1 = np.zeros((1, hidden_layer_size))  # (1x4) hidden-layer bias vector: creates a 1x4 matrix of zeros
B2 = np.zeros((1, output_layer_size))  # (1x1) output-layer bias vector: creates a 1x1 matrix of zeros
```

```python
# Step 4: Train the network
learning_rate = 0.1
epochs = 10000

for epoch in range(epochs):
    # Forward pass - Using sigmoid activation
    Z1 = np.dot(X, W1) + B1   # Linear combination at hidden layer
    A1 = sigmoid(Z1)          # Activation from hidden layer
    Z2 = np.dot(A1, W2) + B2  # Linear combination at output layer
    A2 = sigmoid(Z2)          # Final output activation

    # Loss (Mean Squared Error)
    loss = np.mean((y - A2) ** 2)

    # Backward pass - Chain rule and gradient computation
    # (A2 - y) is the MSE gradient up to a constant factor, which the learning rate absorbs
    dA2 = (A2 - y)
    dZ2 = dA2 * sigmoid_derivative(A2)   # derivative is evaluated on the activation A2, not Z2
    dW2 = np.dot(A1.T, dZ2)
    dB2 = np.sum(dZ2, axis=0, keepdims=True)

    dA1 = np.dot(dZ2, W2.T)
    dZ1 = dA1 * sigmoid_derivative(A1)
    dW1 = np.dot(X.T, dZ1)
    dB1 = np.sum(dZ1, axis=0, keepdims=True)

    # Update weights and biases - Using learning_rate * gradients
    W1 -= learning_rate * dW1
    B1 -= learning_rate * dB1
    W2 -= learning_rate * dW2
    B2 -= learning_rate * dB2

    # Print loss every 1000 iterations
    if epoch % 1000 == 0:
        print(f"Epoch {epoch} - Loss: {loss:.4f}")
```

```
Epoch 0 - Loss: 0.0059
Epoch 1000 - Loss: 0.0043
Epoch 2000 - Loss: 0.0034
Epoch 3000 - Loss: 0.0028
Epoch 4000 - Loss: 0.0023
Epoch 5000 - Loss: 0.0020
Epoch 6000 - Loss: 0.0018
Epoch 7000 - Loss: 0.0016
Epoch 8000 - Loss: 0.0014
Epoch 9000 - Loss: 0.0013
```
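The backward pass sets `dA2 = (A2 - y)`, which is the exact gradient of 0.5 * sum((A2 - y)**2); the gradient of the printed mean-squared-error loss differs only by the constant factor 2/N (0.5 for the N = 4 XOR samples). A standard way to confirm backpropagation formulas is a finite-difference gradient check: nudge each weight by a small `h`, measure the change in the loss, and compare with the analytic gradient. This sketch re-creates the 2-4-1 network so it runs on its own; the seed, `h`, the tolerance, and checking only `W1` are arbitrary choices.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
W1, B1 = rng.random((2, 4)), np.zeros((1, 4))
W2, B2 = rng.random((4, 1)), np.zeros((1, 1))

def loss_fn(W1_candidate):
    # 0.5 * sum of squared errors: its gradient w.r.t. A2 is exactly (A2 - y)
    A2 = sigmoid(sigmoid(X @ W1_candidate + B1) @ W2 + B2)
    return 0.5 * np.sum((A2 - y) ** 2)

# Analytic gradient, using the same backward-pass formulas as the training loop
A1 = sigmoid(X @ W1 + B1)
A2 = sigmoid(A1 @ W2 + B2)
dZ2 = (A2 - y) * A2 * (1 - A2)
dZ1 = (dZ2 @ W2.T) * A1 * (1 - A1)
dW1 = X.T @ dZ1

# Numerical gradient by central differences, one weight at a time
h = 1e-6
num_dW1 = np.zeros_like(W1)
for i in range(W1.shape[0]):
    for j in range(W1.shape[1]):
        W_plus, W_minus = W1.copy(), W1.copy()
        W_plus[i, j] += h
        W_minus[i, j] -= h
        num_dW1[i, j] = (loss_fn(W_plus) - loss_fn(W_minus)) / (2 * h)

print(np.allclose(dW1, num_dW1, atol=1e-7))  # True
```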


```python
# Step 5: Final Prediction
print("\nFinal Output after Training:")
print(np.round(A2, 2))
```


```
Final Output after Training:
[[0.04]
 [0.97]
 [0.97]
 [0.03]]
```


### Final Output Interpretation (XOR)
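The values are sigmoid outputs between 0 and 1, which can be read as the probability that the XOR output is 1. With a threshold of 0.5, predictions above 0.5 are class 1 and predictions below 0.5 are class 0.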

| Input   | Expected | Predicted               |
| ------- | -------- | ----------------------- |
| \[0, 0] | 0        | **0.04** (close to 0 ✅) |
| \[0, 1] | 1        | **0.97** (close to 1 ✅) |
| \[1, 0] | 1        | **0.97** (close to 1 ✅) |
| \[1, 1] | 0        | **0.03** (close to 0 ✅) |
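Applying the 0.5 threshold to the rounded outputs from the final prediction reproduces the table's verdict. This sketch hard-codes those printed values so it runs on its own; in the notebook you would threshold `A2` directly.

```python
import numpy as np

# Rounded sigmoid outputs printed after training
A2 = np.array([[0.04], [0.97], [0.97], [0.03]])
y = np.array([[0], [1], [1], [0]])

predicted_class = (A2 > 0.5).astype(int)   # threshold probabilities at 0.5
accuracy = np.mean(predicted_class == y)   # fraction of the 4 XOR cases predicted correctly
print(predicted_class.ravel(), accuracy)
```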