# **Perceptron**

## 📌 1. Technical Introduction

### 🧭 Where It Fits:

* The **Perceptron** is the simplest form of a **neural network** — it’s a **single-layer, single-neuron model**.
* It belongs to **supervised learning**.
* It was originally designed for **binary classification** problems (e.g., yes/no, 0/1, cat/not-cat).

### 🛠 How It Works Conceptually:

* Takes multiple inputs, applies weights to them, adds a bias, and passes the result through an **activation function**.
* If the result is at or above a threshold, it outputs 1; otherwise, it outputs 0. With the bias folded into the sum, that threshold is simply 0.

---

## 🧸 2. Simplified Explanation

Imagine a **voting machine**:

* Each input (like “is the road clear?”, “is speed safe?”) **votes** with a certain **weight**.
* The machine **adds up the votes**, and if the total is above a limit, it says “Go” (1); otherwise, “Stop” (0).

That’s what a Perceptron does — **a smart yes/no decider**.

---

## 📕 3. Definition

> A **Perceptron** is a linear binary classifier that computes a weighted sum of input features, adds a bias, and passes the result through an activation function (usually a step function) to produce a binary output (0 or 1).

---

## 🧮 Perceptron: How It Works (Simple Math)

Given:

* Inputs: $x_1, x_2, \dots, x_n$
* Weights: $w_1, w_2, \dots, w_n$
* Bias: $b$

The perceptron calculates:

$$
z = w_1x_1 + w_2x_2 + \dots + w_nx_n + b
$$

Then it applies an **activation function**. In a basic perceptron, it’s a **step function**:

$$
\text{output} =
\begin{cases}
1 & \text{if } z \geq 0 \\
0 & \text{otherwise}
\end{cases}
$$
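
As a minimal sketch, the two formulas above fit in a few lines of NumPy. The function name `perceptron_output` and the sample weights are illustrative choices for this sketch, not part of any library.

```python
import numpy as np

def perceptron_output(x, w, b):
    """Return 1 if the weighted sum plus bias is >= 0, else 0."""
    z = np.dot(w, x) + b   # z = w_1*x_1 + ... + w_n*x_n + b
    return 1 if z >= 0 else 0

# Illustrative values (assumed for this sketch)
w = np.array([0.5, -1.0, 2.0])
b = -1.0

print(perceptron_output(np.array([1, 0, 1]), w, b))  # z = 0.5 + 2.0 - 1.0 = 1.5  -> 1
print(perceptron_output(np.array([1, 1, 0]), w, b))  # z = 0.5 - 1.0 - 1.0 = -1.5 -> 0
```

Note that $z = 0$ counts as a "yes" here, matching the $z \geq 0$ case in the step function.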

---

## 🧪 Example:

Say a car should go if:

* Road is clear ($x_1 = 1$)
* Speed is safe ($x_2 = 1$)

You assign:

* $w_1 = 0.6, w_2 = 0.4, b = -0.8$

$$
z = (0.6 \cdot 1) + (0.4 \cdot 1) - 0.8 = 0.2 \Rightarrow \text{output} = 1 \ (\text{Go})
$$
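
For comparison, here is the same arithmetic for the other three input combinations, using the same weights and bias:

$$
\begin{aligned}
(x_1, x_2) = (0, 0):&\quad z = 0 + 0 - 0.8 = -0.8 \Rightarrow \text{output} = 0 \ (\text{Stop}) \\
(x_1, x_2) = (0, 1):&\quad z = 0 + 0.4 - 0.8 = -0.4 \Rightarrow \text{output} = 0 \ (\text{Stop}) \\
(x_1, x_2) = (1, 0):&\quad z = 0.6 + 0 - 0.8 = -0.2 \Rightarrow \text{output} = 0 \ (\text{Stop})
\end{aligned}
$$

Only the case where both conditions hold reaches $z \geq 0$, so with these particular weights the perceptron behaves like a logical AND.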

---

## ✅ What Perceptron Can and Cannot Do

### What It Can Do:

* Learn simple binary classification (e.g., yes/no)
* Solve **linearly separable** problems (like AND, OR logic)
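
To make the AND and OR cases concrete, here is a small sketch with hand-picked parameters. The values are assumptions chosen for illustration; many other weights and biases would work equally well.

```python
import numpy as np

def step(z):
    return 1 if z >= 0 else 0

def perceptron(x, w, b):
    return step(np.dot(w, x) + b)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
w = np.array([1.0, 1.0])

# Hand-picked biases (assumed for illustration): the weights stay the same,
# only the threshold moves.
and_outputs = [perceptron(np.array(x), w, b=-1.5) for x in inputs]
or_outputs = [perceptron(np.array(x), w, b=-0.5) for x in inputs]

print("AND:", and_outputs)  # [0, 0, 0, 1]
print("OR: ", or_outputs)   # [0, 1, 1, 1]
```

Both functions share the same weights; shifting the bias is enough to move the decision line from "at least one input on" to "both inputs on".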

### What It Can’t Do:

* **Cannot solve XOR** or any other problem that is not linearly separable
* Can’t learn curved decision boundaries; for that, we need **multi-layer networks**

---

## **Model Example: Learn the OR Function**


### 🔢 OR Truth Table:

| x1 | x2 | Output |
| -- | -- | ------ |
| 0  | 0  | 0      |
| 0  | 1  | 1      |
| 1  | 0  | 1      |
| 1  | 1  | 1      |

---

### 🧠 Perceptron Formula

$$
\text{output} =
\begin{cases}
1 & \text{if } (w \cdot x + b) \geq 0 \\
0 & \text{otherwise}
\end{cases}
$$

We’ll use:

* **Step function** as activation
* **Manual training with epochs**
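
The manual training follows the classic perceptron learning rule. Written out, with $\eta$ for the learning rate, $t$ for the target, and $\hat{y}$ for the prediction (symbols chosen here for illustration), each training example triggers this update:

$$
\begin{aligned}
w_i &\leftarrow w_i + \eta \,(t - \hat{y})\, x_i \\
b &\leftarrow b + \eta \,(t - \hat{y})
\end{aligned}
$$

When the prediction is correct, $t - \hat{y} = 0$ and nothing changes. When the perceptron outputs 0 but should output 1, the weights of the active inputs and the bias go up; in the opposite case they go down.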

---

```python
import numpy as np

# Step activation function
def step_function(x):
    return 1 if x >= 0 else 0

# Perceptron class
class Perceptron:
    def __init__(self, input_size, learning_rate=0.1):
        self.weights = np.zeros(input_size)
        self.bias = 0
        self.lr = learning_rate

    def predict(self, x):
        z = np.dot(self.weights, x) + self.bias
        return step_function(z)

    def train(self, X, y, epochs=10):
        for epoch in range(epochs):
            print(f"Epoch {epoch+1}")
            for xi, target in zip(X, y):
                prediction = self.predict(xi)
                error = target - prediction
                # Weight and bias update rule
                self.weights += self.lr * error * xi
                self.bias += self.lr * error
                print(f"Input: {xi}, Target: {target}, Prediction: {prediction}, Error: {error}")
            print(f"Weights: {self.weights}, Bias: {self.bias}\n")

# Training data for OR
X = np.array([[0,0], [0,1], [1,0], [1,1]])
y = np.array([0, 1, 1, 1])

# Create and train perceptron
p = Perceptron(input_size=2)
p.train(X, y)

# Test
print("Final predictions:")
for xi in X:
    print(f"{xi} => {p.predict(xi)}")
```


```
Epoch 1
Input: [0 0], Target: 0, Prediction: 1, Error: -1
Input: [0 1], Target: 1, Prediction: 0, Error: 1
Input: [1 0], Target: 1, Prediction: 1, Error: 0
Input: [1 1], Target: 1, Prediction: 1, Error: 0
Weights: [0.  0.1], Bias: 0.0

Epoch 2
Input: [0 0], Target: 0, Prediction: 1, Error: -1
Input: [0 1], Target: 1, Prediction: 1, Error: 0
Input: [1 0], Target: 1, Prediction: 0, Error: 1
Input: [1 1], Target: 1, Prediction: 1, Error: 0
Weights: [0.1 0.1], Bias: 0.0

Epoch 3
Input: [0 0], Target: 0, Prediction: 1, Error: -1
Input: [0 1], Target: 1, Prediction: 1, Error: 0
Input: [1 0], Target: 1, Prediction: 1, Error: 0
Input: [1 1], Target: 1, Prediction: 1, Error: 0
Weights: [0.1 0.1], Bias: -0.1

Epoch 4
Input: [0 0], Target: 0, Prediction: 0, Error: 0
Input: [0 1], Target: 1, Prediction: 1, Error: 0
Input: [1 0], Target: 1, Prediction: 1, Error: 0
Input: [1 1], Target: 1, Prediction: 1, Error: 0
Weights: [0.1 0.1], Bias: -0.1

Epoch 5
Input: [0 0], Target: 0, Prediction: 0, Error: 0
[output truncated]
```

## ❌ Why XOR Does Not Work with a Perceptron



Let's look at the XOR truth table:

| x1 | x2 | XOR Output |
| -- | -- | ---------- |
| 0  | 0  | 0          |
| 0  | 1  | 1          |
| 1  | 0  | 1          |
| 1  | 1  | 0          |

Now **plot these points**, and you’ll see:

* You cannot **draw a straight line** that separates the 1s from the 0s.
* That means the data is **not linearly separable**.

---

### 🧠 What Does a Perceptron Do?

A **single-layer perceptron** can only classify data that is **linearly separable** — meaning:

> It can only draw **a straight decision boundary** between two classes.

Since XOR needs a **non-linear boundary**, the simple perceptron fails.
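
One way to see this without a plot is to write down what the weights would have to satisfy. Assuming the step rule used earlier (output 1 when $w_1x_1 + w_2x_2 + b \geq 0$), the four rows of the XOR table require:

$$
\begin{aligned}
(0, 0) \to 0:&\quad b < 0 \\
(0, 1) \to 1:&\quad w_2 + b \geq 0 \\
(1, 0) \to 1:&\quad w_1 + b \geq 0 \\
(1, 1) \to 0:&\quad w_1 + w_2 + b < 0
\end{aligned}
$$

Adding the second and third conditions gives $w_1 + w_2 + 2b \geq 0$, so $w_1 + w_2 + b \geq -b > 0$ because $b < 0$. That contradicts the fourth condition, so no choice of weights and bias can satisfy all four rows.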

---

## 🛠 Example: What Happens If You Try XOR?

If you use the same perceptron code as before and train it on XOR:

```python
X = np.array([[0,0], [0,1], [1,0], [1,1]])
y = np.array([0, 1, 1, 0])
```

🔁 No matter how many epochs you run:

* The perceptron **won’t learn XOR correctly**.
* It will just keep adjusting weights without reaching zero error.
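
To observe this directly, the sketch below re-implements the same update rule in a compact function (the name `train_perceptron` and the mistake counter are additions for this illustration) and records how many examples are misclassified in each epoch.

```python
import numpy as np

def train_perceptron(X, y, epochs=100, lr=0.1):
    """Perceptron learning rule; returns weights, bias, and mistakes per epoch."""
    w = np.zeros(X.shape[1])
    b = 0.0
    mistakes_per_epoch = []
    for _ in range(epochs):
        mistakes = 0
        for xi, target in zip(X, y):
            prediction = 1 if np.dot(w, xi) + b >= 0 else 0
            error = target - prediction
            w += lr * error * xi      # no change when error == 0
            b += lr * error
            mistakes += int(error != 0)
        mistakes_per_epoch.append(mistakes)
    return w, b, mistakes_per_epoch

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])

w, b, mistakes = train_perceptron(X, y_xor)
print("Mistakes in the last 5 epochs:", mistakes[-5:])
print("Fewest mistakes in any epoch:", min(mistakes))
```

Because no weights classify all four XOR rows correctly, every epoch contains at least one mistake, so the count never drops to zero however large `epochs` is.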

---

## ✅ How to Solve XOR?

You need a **Multi-Layer Perceptron (MLP)**, a feedforward **neural network** with:

* **Input layer**
* **Hidden layer** (this is key!)
* **Output layer**

The **hidden layer, combined with a non-linear activation function**, lets the model bend its decision boundary and learn patterns like XOR.

---

## 🧠 Summary

| Model                   | Can Learn XOR? | Why Not / Why Yes?                      |
| ----------------------- | -------------- | --------------------------------------- |
| Single-Layer Perceptron | ❌ No           | Can only model linear boundaries        |
| Multi-Layer Perceptron  | ✅ Yes          | Hidden layers add the needed complexity |

---


# **Multi-Layer Perceptron**

## 🧠 What Is a Multi-Layer Perceptron?

A **Multi-Layer Perceptron (MLP)** is a **neural network with at least one hidden layer** between input and output.

It can learn **non-linear patterns** like XOR, which a single-layer perceptron cannot.

---

## 🏗️ Basic Structure of an MLP

```
Input Layer → Hidden Layer(s) → Output Layer
```

Each layer consists of **neurons**, and each neuron:

1. Takes weighted input
2. Adds bias
3. Applies activation function (like ReLU, Sigmoid)

---

### 🧾 Example MLP Architecture (XOR problem)

```
Input:  x1, x2
       ↓   ↓
    [Hidden Layer]
       ↓   ↓
   Output: y
```

Even a **2-2-1** network can solve XOR:

* 2 input neurons
* 1 hidden layer with 2 neurons
* 1 output neuron
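
To show that a 2-2-1 network is enough, here is a sketch with hand-picked (not learned) weights and a step activation. The specific numbers are assumptions chosen for illustration: one hidden neuron acts like OR, the other like AND, and the output neuron fires when OR is on and AND is off.

```python
import numpy as np

def step(z):
    # Element-wise step: 1 where z >= 0, else 0
    return (z >= 0).astype(int)

# Hidden layer: column 0 behaves like OR, column 1 behaves like AND
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])

# Output neuron: +1 for the OR neuron, -1 for the AND neuron
W2 = np.array([1.0, -1.0])
b2 = -0.5

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
hidden = step(X @ W1 + b1)
output = step(hidden @ W2 + b2)
print(output)  # [0 1 1 0]
```

In practice the weights are learned rather than set by hand, but this confirms that the architecture itself can represent XOR.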

---

## ✍️ How Does an MLP Learn?

### Step-by-step:

1. **Forward Propagation**: Pass input through all layers to get prediction
2. **Loss Calculation**: Compare prediction with actual output
3. **Backpropagation**: Calculate the gradient of the loss with respect to every weight and bias, then update them with gradient descent
4. Repeat over many **epochs**

This is called **training** the network.
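
As a sketch of what these steps look like in equations, assume one common setup: a single hidden layer, sigmoid activation $\sigma$ in both layers, binary cross-entropy loss, and gradients summed over the batch. The symbols ($X$ for the input matrix, $W_1, b_1, W_2, b_2$ for the parameters, $\eta$ for the learning rate) are notation chosen here.

$$
\begin{aligned}
z_1 &= X W_1 + b_1, & a_1 &= \sigma(z_1) \\
z_2 &= a_1 W_2 + b_2, & a_2 &= \sigma(z_2) \\
\delta_2 &= a_2 - y, & \delta_1 &= (\delta_2 W_2^\top) \odot \sigma'(z_1) \\
\nabla_{W_2} &= a_1^\top \delta_2, & \nabla_{W_1} &= X^\top \delta_1 \\
W_k &\leftarrow W_k - \eta \, \nabla_{W_k}, & k &\in \{1, 2\}
\end{aligned}
$$

The first two rows are forward propagation, the third row carries the error backward from the output to the hidden layer, and the last two rows turn those errors into weight updates. The bias gradients are the column sums of $\delta_2$ and $\delta_1$.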

---

## 📐 Activation Functions in MLP

Hidden layers must use **non-linear activation functions** like:

* **ReLU** (faster, preferred in modern networks)
* **Sigmoid** (commonly used at the output layer for binary classification)
* **Tanh**

These allow the network to **approximate complex curves**, not just lines.
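
The sketch below defines the activations and illustrates why the non-linearity matters: without it, two stacked layers compute exactly what a single layer with weights `W1 @ W2` would. The random shapes and the use of `np.random.default_rng` are assumptions for this illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# np.tanh is NumPy's built-in hyperbolic tangent

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
W1 = rng.normal(size=(2, 3))
W2 = rng.normal(size=(3, 1))

# Without an activation, two layers equal one layer with weights W1 @ W2
two_linear_layers = (X @ W1) @ W2
one_linear_layer = X @ (W1 @ W2)
print(np.allclose(two_linear_layers, one_linear_layer))  # True

# With a non-linearity in between, the shortcut no longer holds in general
with_relu = relu(X @ W1) @ W2
print(np.allclose(with_relu, one_linear_layer))
```

The first check holds for any weights because matrix multiplication is associative, which is why a stack of purely linear layers can never draw anything but a straight boundary.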

---

## ✅ What Makes MLP Powerful?

* Can learn **non-linear decision boundaries**
* Can model **complex functions**
* Is a core building block of many **deep learning architectures**

---

```python
import numpy as np

# === Step 1: Define the dataset ===
# Input: 2 features; Output: XOR truth table
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])
y = np.array([[0],
              [1],
              [1],
              [0]])

# === Step 2: Define activation function and its derivative ===
def sigmoid(x):
    return 1 / (1 + np.exp(-x))  # squashes values between 0 and 1

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)  # derivative of sigmoid used for backpropagation

# === Step 3: Initialize weights and biases ===
np.random.seed(0)  # ensure reproducibility

input_size = 2     # two input features (x1, x2)
hidden_size = 2    # two neurons in hidden layer (enough to represent XOR)
output_size = 1    # one output neuron (binary output)

# Randomly initialize weights for both layers
W1 = np.random.randn(input_size, hidden_size)  # shape: (2,2)
b1 = np.zeros((1, hidden_size))                # bias for hidden layer

W2 = np.random.randn(hidden_size, output_size) # shape: (2,1)
b2 = np.zeros((1, output_size))                # bias for output layer

# === Step 4: Define loss function ===
def binary_cross_entropy(y_true, y_pred):
    # Added small epsilon to prevent log(0)
    return -np.mean(y_true * np.log(y_pred + 1e-8) + (1 - y_true) * np.log(1 - y_pred + 1e-8))

# === Step 5: Training loop ===
epochs = 10000
learning_rate = 0.1

for epoch in range(epochs):
    # === Forward Pass ===
    z1 = np.dot(X, W1) + b1        # Weighted sum (pre-activation) for hidden layer
    a1 = sigmoid(z1)               # Non-linear activation for hidden layer

    z2 = np.dot(a1, W2) + b2       # Weighted sum (pre-activation) for output layer
    a2 = sigmoid(z2)               # Final prediction after sigmoid

    # === Compute Loss ===
    loss = binary_cross_entropy(y, a2)

    # === Backpropagation ===
    # Gradient of the loss with respect to z2
    # (sigmoid + cross-entropy simplifies to a2 - y)
    error_output = a2 - y

    # Gradients for W2 and b2 (summed over the four examples)
    dW2 = np.dot(a1.T, error_output)
    db2 = np.sum(error_output, axis=0, keepdims=True)

    # Backprop to hidden layer
    error_hidden = np.dot(error_output, W2.T) * sigmoid_derivative(z1)
    dW1 = np.dot(X.T, error_hidden)
    db1 = np.sum(error_hidden, axis=0, keepdims=True)

    # === Update weights and biases ===
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1

    # Print loss occasionally
    if epoch % 1000 == 0:
        print(f"Epoch {epoch}, Loss: {loss:.4f}")

# === Step 6: Final Predictions ===
print("\nFinal predictions after training:")
for i in range(4):
    z1 = np.dot(X[i], W1) + b1
    a1 = sigmoid(z1)
    z2 = np.dot(a1, W2) + b2
    a2 = sigmoid(z2)
    print(f"Input: {X[i]}, Predicted: {a2[0][0]:.4f}, Actual: {y[i][0]}")
```


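```
Epoch 0, Loss: 0.7339
Epoch 1000, Loss: 0.3714
Epoch 2000, Loss: 0.3563
Epoch 3000, Loss: 0.3525
Epoch 4000, Loss: 0.3507
Epoch 5000, Loss: 0.3498
Epoch 6000, Loss: 0.3492
Epoch 7000, Loss: 0.3488
Epoch 8000, Loss: 0.3485
Epoch 9000, Loss: 0.3482

Final predictions after training:
Input: [0 0], Predicted: 0.0014, Actual: 0
Input: [0 1], Predicted: 0.4995, Actual: 1
Input: [1 0], Predicted: 0.9982, Actual: 1
Input: [1 1], Predicted: 0.5009, Actual: 0
```

Note that this particular run did not fully learn XOR: the predictions for `[0 1]` and `[1 1]` sit near 0.5 and the loss levels off around 0.348. A 2-2-1 network can represent XOR, but gradient descent from this random starting point appears to have stalled before finding such a solution.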
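
Training outcomes like the one above depend on the random starting weights. As a hedged sketch, the code below wraps the same forward and backward pass in a hypothetical helper, `train_xor`, and tries several seeds, keeping the run with the lowest loss. It uses `np.random.default_rng` instead of `np.random.seed`, so its numbers will not match the run above, and there is no guarantee that any given seed converges.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train_xor(seed, hidden_size=2, epochs=5000, lr=0.1):
    """Train a small sigmoid MLP on XOR; return final loss and predictions."""
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    rng = np.random.default_rng(seed)
    W1 = rng.normal(size=(2, hidden_size))
    b1 = np.zeros((1, hidden_size))
    W2 = rng.normal(size=(hidden_size, 1))
    b2 = np.zeros((1, 1))

    for _ in range(epochs):
        # Forward pass
        a1 = sigmoid(X @ W1 + b1)
        a2 = sigmoid(a1 @ W2 + b2)
        # Backward pass (sigmoid output + cross-entropy loss)
        d2 = a2 - y
        d1 = (d2 @ W2.T) * a1 * (1 - a1)
        # Gradient descent step
        W2 -= lr * (a1.T @ d2)
        b2 -= lr * d2.sum(axis=0, keepdims=True)
        W1 -= lr * (X.T @ d1)
        b1 -= lr * d1.sum(axis=0, keepdims=True)

    a2 = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    loss = -np.mean(y * np.log(a2 + 1e-8) + (1 - y) * np.log(1 - a2 + 1e-8))
    return loss, a2

# Try a handful of seeds and keep the run with the lowest final loss
results = [(seed, *train_xor(seed)) for seed in range(5)]
best_seed, best_loss, best_pred = min(results, key=lambda r: r[1])
print(f"Best seed: {best_seed}, loss: {best_loss:.4f}")
print(best_pred.round(3).ravel())
```

Other adjustments worth trying are a larger `hidden_size` or more epochs; both change only the arguments passed to `train_xor`.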
