# Artificial Neural Network

## 🧠 Goal: Build a Neural Network to Solve XOR (a Binary Classification Problem)

We'll use:

* One input layer (2 features)
* One hidden layer (2 neurons, sigmoid activation; ReLU is also defined in Step 1 as an alternative)
* One output layer (1 neuron, sigmoid activation)
* Binary cross-entropy loss
* Gradient descent to update weights, applied after every training sample
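
For reference, the loss and update rule named in the list can be written out as follows. The symbols are notation assumed here rather than taken from the code: $\hat{y}$ is the network's sigmoid output, $y \in \{0, 1\}$ is the target, $\eta$ is the learning rate, and $\theta$ stands for any single weight or bias.

$$
L(y, \hat{y}) = -\bigl[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\bigr],
\qquad
\theta \leftarrow \theta - \eta \, \frac{\partial L}{\partial \theta}
$$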

---

## 📊 Sample Dataset (Binary Classification)



In [1]:
# XOR dataset: 2 inputs → 1 output
X = [
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
]

y = [0, 1, 1, 0]  # XOR output: only true when inputs differ
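
As a quick sanity check, the labels can be compared against Python's bitwise XOR operator `^`. This is only a small sketch; `X` and `y` are redefined so the snippet runs on its own, and `labels_from_xor` is a name introduced here for illustration.

```python
# Redefine the dataset so this snippet is self-contained.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# Each label should equal the bitwise XOR of its two inputs.
labels_from_xor = [a ^ b for a, b in X]
print(labels_from_xor == y)  # prints True
```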


## 🔢 Step-by-Step ANN Structure

We’ll create:

* Input layer: 2 features
* Hidden layer: 2 neurons
* Output layer: 1 neuron
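
Written as equations, the structure above computes the following for an input $(x_0, x_1)$, assuming the sigmoid hidden activation that the training code uses. The index convention follows how `w1[j][k]` is used in the training loop (hidden neuron $j$, input $k$); the symbol $\sigma$ for the sigmoid and the superscripts for layers are notation assumed here.

$$
\begin{aligned}
z^{(1)}_j &= \sum_{k=0}^{1} w^{(1)}_{jk}\, x_k + b^{(1)}_j, \qquad a^{(1)}_j = \sigma\bigl(z^{(1)}_j\bigr), \quad j = 0, 1 \\
z^{(2)} &= \sum_{j=0}^{1} w^{(2)}_j\, a^{(1)}_j + b^{(2)}, \qquad \hat{y} = \sigma\bigl(z^{(2)}\bigr)
\end{aligned}
$$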

---

## ✅ Step 1: Define Activation Functions

In [2]:
import math
import random

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1 - sx)

def relu(x):
    return max(0, x)

def relu_derivative(x):
    return 1 if x > 0 else 0
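
One practical caveat: `1 / (1 + math.exp(-x))` raises `OverflowError` for very negative `x`, because `math.exp` of a large positive number overflows. The sketch below shows one possible guard and checks the derivative formula against a central finite difference. The names `stable_sigmoid`, `stable_sigmoid_derivative`, and `central_difference` are hypothetical helpers, not part of the notebook.

```python
import math

def stable_sigmoid(x):
    """Sigmoid that avoids overflow in math.exp for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # x < 0, so e is at most 1 and cannot overflow
    return e / (1.0 + e)

def stable_sigmoid_derivative(x):
    """Derivative of the sigmoid: s * (1 - s)."""
    s = stable_sigmoid(x)
    return s * (1.0 - s)

def central_difference(f, x, h=1e-5):
    """Numerical derivative estimate: (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# The analytic and numerical derivatives should agree closely.
for x in (-2.0, 0.0, 3.0):
    analytic = stable_sigmoid_derivative(x)
    numeric = central_difference(stable_sigmoid, x)
    print(f"x={x:+.1f}  analytic={analytic:.6f}  numeric={numeric:.6f}")

# The plain 1 / (1 + math.exp(-x)) form would raise OverflowError here.
print(stable_sigmoid(-1000.0))
```

For the small weights in this tutorial the plain version is fine; the guard matters mainly once pre-activations grow very large in magnitude.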


## ✅ Step 2: Initialize Weights and Biases

In [3]:
# Initialize random weights for input → hidden layer (2x2)
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
b1 = [random.uniform(-1, 1) for _ in range(2)]

# Initialize weights for hidden → output layer (2x1)
w2 = [random.uniform(-1, 1) for _ in range(2)]
b2 = random.uniform(-1, 1)

learning_rate = 0.1
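
Because the weights are random, every run of the cell above starts from a different point. If reproducible runs are wanted, one option is to draw the weights from a seeded generator. This is a sketch under that assumption; `init_params` and its `seed` parameter are hypothetical and not part of the notebook.

```python
import random

def init_params(seed=None, n_inputs=2, n_hidden=2):
    """Return (w1, b1, w2, b2) drawn uniformly from [-1, 1]."""
    rng = random.Random(seed)  # private generator; the global random state is untouched
    w1 = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = rng.uniform(-1, 1)
    return w1, b1, w2, b2

# The same seed always yields the same starting weights.
print(init_params(seed=0) == init_params(seed=0))  # prints True
```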


## ✅ Step 3: Training Loop (Forward + Backward Pass)

In [4]:
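# NOTE: 10 epochs is enough to exercise the code but far too few to learn XOR;
# this setup typically needs thousands of epochs before the loss drops much.
for epoch in range(10):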
    total_loss = 0

    for i in range(len(X)):
        # ---------- Forward Pass ----------
        x0, x1 = X[i]
        target = y[i]

        # Input to hidden layer
        z1 = [x0 * w1[0][0] + x1 * w1[0][1] + b1[0],
              x0 * w1[1][0] + x1 * w1[1][1] + b1[1]]
        a1 = [sigmoid(z1[0]), sigmoid(z1[1])]

        # Hidden to output layer
        z2 = a1[0] * w2[0] + a1[1] * w2[1] + b2
        output = sigmoid(z2)

        # ---------- Loss ----------
        loss = -(target * math.log(output + 1e-8) + (1 - target) * math.log(1 - output + 1e-8))
        total_loss += loss

        # ---------- Backward Pass ----------
        # With a sigmoid output and binary cross-entropy loss, the sigmoid
        # derivative cancels and dL/dz2 reduces to (output - target).
        d_z2 = output - target

        # Gradients for w2 and b2
        d_w2 = [d_z2 * a for a in a1]
        d_b2 = d_z2

        # Backprop to hidden layer
        d_hidden = [d_z2 * w2[j] * sigmoid_derivative(z1[j]) for j in range(2)]

        # Gradients for w1 and b1
        d_w1 = [[d_hidden[j] * x for x in [x0, x1]] for j in range(2)]
        d_b1 = d_hidden

        # ---------- Update Weights ----------
        for j in range(2):
            for k in range(2):
                w1[j][k] -= learning_rate * d_w1[j][k]
            b1[j] -= learning_rate * d_b1[j]

        for j in range(2):
            w2[j] -= learning_rate * d_w2[j]
        b2 -= learning_rate * d_b2

    # Print every 1000 epochs (with range(10) above, only epoch 0 is printed)
    if epoch % 1000 == 0:
        print(f"Epoch {epoch}, Loss = {total_loss:.4f}")


Epoch 0, Loss = 3.1577
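
The same loop can be packaged as a function, which makes it easier to try longer runs. The block below is a sketch, not the notebook's own code: `forward`, `train_xor`, the `seed` parameter, and the 5000-epoch default are all assumptions made for illustration. It uses the output-layer simplification dL/dz2 = output - target, which holds for a sigmoid output combined with binary cross-entropy.

```python
import math
import random

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def forward(x, w1, b1, w2, b2):
    """Return (hidden activations, output probability) for one sample."""
    a1 = [sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(2)]
    output = sigmoid(a1[0] * w2[0] + a1[1] * w2[1] + b2)
    return a1, output

def train_xor(epochs=5000, learning_rate=0.1, seed=0):
    """Train the 2-2-1 network with per-sample updates; return params and loss history."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
    b1 = [rng.uniform(-1, 1) for _ in range(2)]
    w2 = [rng.uniform(-1, 1) for _ in range(2)]
    b2 = rng.uniform(-1, 1)
    history = []
    for _ in range(epochs):
        total_loss = 0.0
        for x, target in zip(X, y):
            a1, output = forward(x, w1, b1, w2, b2)
            total_loss -= (target * math.log(output + 1e-8)
                           + (1 - target) * math.log(1 - output + 1e-8))
            # For sigmoid + binary cross-entropy, dL/dz2 is (output - target).
            d_z2 = output - target
            # Hidden deltas use w2 before it is updated; sigmoid'(z1) = a1 * (1 - a1).
            d_hidden = [d_z2 * w2[j] * a1[j] * (1 - a1[j]) for j in range(2)]
            for j in range(2):
                w2[j] -= learning_rate * d_z2 * a1[j]
                for k in range(2):
                    w1[j][k] -= learning_rate * d_hidden[j] * x[k]
                b1[j] -= learning_rate * d_hidden[j]
            b2 -= learning_rate * d_z2
        history.append(total_loss)
    return (w1, b1, w2, b2), history

params, history = train_xor()
print(f"first epoch loss: {history[0]:.4f}, last epoch loss: {history[-1]:.4f}")
```

Convergence is not guaranteed for every seed: with only two hidden neurons the network can stall on a plateau, so it is worth inspecting the loss history rather than assuming success.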


## ✅ Step 4: Test the Network

In [5]:
print("\n--- Testing ---")
for i in range(len(X)):
    x0, x1 = X[i]

    z1 = [x0 * w1[0][0] + x1 * w1[0][1] + b1[0],
          x0 * w1[1][0] + x1 * w1[1][1] + b1[1]]
    a1 = [sigmoid(z1[0]), sigmoid(z1[1])]

    z2 = a1[0] * w2[0] + a1[1] * w2[1] + b2
    output = sigmoid(z2)

    print(f"Input: {x0, x1} => Output: {output:.4f}")



--- Testing ---
Input: (0, 0) => Output: 0.3834
Input: (0, 1) => Output: 0.3577
Input: (1, 0) => Output: 0.3562
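Input: (1, 1) => Output: 0.3283

All four outputs fall between 0.33 and 0.38, so the network has not learned XOR in this run: it gives nearly the same answer for every input. That is expected after only 10 epochs; the outputs for (0, 1) and (1, 0) typically move toward 1 only after much longer training.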
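
To turn these probabilities into class labels, a common convention is to threshold at 0.5. The helper `to_label` and the 0.5 cut-off are assumptions for illustration, not part of the notebook; the probabilities are copied from the test output above.

```python
# Outputs copied from the recorded test run.
recorded_outputs = [0.3834, 0.3577, 0.3562, 0.3283]
y = [0, 1, 1, 0]

def to_label(probability, threshold=0.5):
    """Map a sigmoid output to a class label."""
    return 1 if probability >= threshold else 0

predicted = [to_label(p) for p in recorded_outputs]
accuracy = sum(p == t for p, t in zip(predicted, y)) / len(y)
print(predicted, accuracy)  # prints [0, 0, 0, 0] 0.5
```

An accuracy of 0.5 on XOR is what a constant prediction achieves, which confirms that the recorded run has not yet separated the classes.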


## 🔍 Explanation of Steps

| Step                 | Explanation                                                                                                  |
| -------------------- | ------------------------------------------------------------------------------------------------------------ |
| **Weight Init**      | Weights start at random values so the two hidden neurons do not learn identical features (symmetry breaking) |
| **Forward Pass**     | Each layer computes a weighted sum plus bias and passes it through the sigmoid                               |
| **Loss Function**    | Binary cross-entropy measures how far the predicted probability is from the 0/1 target                       |
| **Backward Pass**    | The chain rule gives the gradient of the loss with respect to every weight and bias                          |
| **Gradient Descent** | Each weight is reduced by its gradient times the learning rate, once per training sample                     |
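
A standard way to verify a backward pass is a gradient check: compare the chain-rule gradients with central finite differences of the loss. The sketch below does this for the 2-2-1 network, using $\partial L / \partial z^{(2)} = \hat{y} - y$ for a sigmoid output with binary cross-entropy. The flat parameter layout `theta`, the function names, and the test values are all assumptions chosen for illustration.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def forward(theta, x):
    """theta = [w1_00, w1_01, w1_10, w1_11, b1_0, b1_1, w2_0, w2_1, b2]."""
    a1 = [sigmoid(theta[2 * j] * x[0] + theta[2 * j + 1] * x[1] + theta[4 + j])
          for j in range(2)]
    output = sigmoid(theta[6] * a1[0] + theta[7] * a1[1] + theta[8])
    return a1, output

def bce(theta, x, target):
    """Binary cross-entropy for one sample."""
    _, output = forward(theta, x)
    return -(target * math.log(output) + (1 - target) * math.log(1 - output))

def analytic_gradient(theta, x, target):
    """Chain-rule gradient, in the same order as theta."""
    a1, output = forward(theta, x)
    d_z2 = output - target  # dL/dz2 for sigmoid + binary cross-entropy
    d_hidden = [d_z2 * theta[6 + j] * a1[j] * (1 - a1[j]) for j in range(2)]
    return [
        d_hidden[0] * x[0], d_hidden[0] * x[1],  # w1, hidden neuron 0
        d_hidden[1] * x[0], d_hidden[1] * x[1],  # w1, hidden neuron 1
        d_hidden[0], d_hidden[1],                # b1
        d_z2 * a1[0], d_z2 * a1[1],              # w2
        d_z2,                                    # b2
    ]

def numeric_gradient(theta, x, target, h=1e-6):
    """Central-difference estimate of the gradient, one parameter at a time."""
    grads = []
    for i in range(len(theta)):
        plus = theta[:i] + [theta[i] + h] + theta[i + 1:]
        minus = theta[:i] + [theta[i] - h] + theta[i + 1:]
        grads.append((bce(plus, x, target) - bce(minus, x, target)) / (2 * h))
    return grads

theta = [0.5, -0.3, 0.8, 0.1, 0.2, -0.4, 0.7, -0.6, 0.05]  # arbitrary test values
x, target = [1, 0], 1
max_gap = max(abs(a - n) for a, n in zip(analytic_gradient(theta, x, target),
                                         numeric_gradient(theta, x, target)))
print(f"largest analytic/numeric gap: {max_gap:.2e}")
```

If the analytic gradient were wrong, for example by multiplying the output delta by an extra sigmoid derivative, the gap would be orders of magnitude larger than floating-point noise.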

---

## 🧠 Why This Works

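* With a non-linear hidden layer and enough training epochs, the network can **learn non-linear decision boundaries** such as XOR, which no single linear unit can separate
* **Sigmoid** squashes the output into the range (0, 1), so it can be read as a probability
* **Backpropagation** computes how much each weight affects the final error, and gradient descent uses that to update the weights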
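
To see why a hidden layer is needed at all, the sketch below brute-forces a single linear threshold unit over a coarse grid of weights and biases and reports the best it can do on XOR. The function `count_correct` and the grid range are assumptions made for illustration.

```python
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

def count_correct(w0, w1, b):
    """Number of XOR samples a linear threshold unit classifies correctly."""
    return sum(int(w0 * x0 + w1 * x1 + b > 0) == t for (x0, x1), t in zip(X, y))

# Try every (w0, w1, b) on a grid from -2 to 2 in steps of 0.25.
grid = [i / 4 for i in range(-8, 9)]
best = max(count_correct(w0, w1, b) for w0 in grid for w1 in grid for b in grid)
print(best)  # prints 3: no single linear boundary gets all four XOR points right
```

The grid search is only a demonstration; the result holds for all real weights, since classifying (0, 1) and (1, 0) as positive while keeping (0, 0) negative forces (1, 1) to be positive as well.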

