# 📘 Lesson 4 — The Perceptron: First Neural Network

---

### 🎯 Why this lesson matters
So far:
- **Linear regression** → predicted continuous values.
- But real problems often need **classification** (spam vs not spam, tumor vs healthy).  

👉 The **Perceptron** (Frank Rosenblatt, 1958) was one of the first algorithms that could *learn* binary classification from data, using a model inspired by neurons.  
It marks the beginning of **neural networks**.

We’ll learn:
- What the perceptron is.
- Why it works for classification.
- How training happens with gradient descent.


In [None]:
# Setup
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt
%matplotlib inline
torch.manual_seed(42)


## 1) Biological Inspiration — WHY a "neuron"?

- A biological neuron receives many inputs.
- If the inputs are strong enough, it "fires" (outputs 1); otherwise it stays silent (outputs 0).
- The perceptron mimics this:
  $$
  y = \text{step}(w \cdot x + b)
  $$

👉 WHY important?
This gave birth to the idea that **learning = adjusting weights** on inputs to change decisions.


In [None]:
# Step activation function (hard threshold): 1 if x >= 0, else 0
def step_function(x):
    # torch.as_tensor accepts Python numbers or tensors; the comparison is element-wise
    return (torch.as_tensor(x) >= 0).float()

print("step(-2) =", step_function(-2))
print("step(3)  =", step_function(3))


## 2) Perceptron Model — the math

Equation:
$$
y = \text{step}(w \cdot x + b)
$$

- **w** = weights, one per input feature (how important each feature is); $w \cdot x = \sum_i w_i x_i$.
- **b** = bias (shifts the threshold).
- **step** = activation: outputs 1 if its input is $\ge 0$, otherwise 0.
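As a quick worked example with made-up numbers (chosen only for illustration), take $w = (0.5, -1)$, $b = 0.2$ and the input $x = (2, 1)$:

$$
\begin{aligned}
z &= w \cdot x + b = 0.5 \cdot 2 + (-1) \cdot 1 + 0.2 = 0.2 \\
y &= \text{step}(0.2) = 1 \quad (\text{since } 0.2 \ge 0)
\end{aligned}
$$

With the same weights, the input $x = (1, 1)$ gives $z = 0.5 - 1 + 0.2 = -0.3 < 0$, so $y = 0$: the weights and bias alone decide which inputs land on the "1" side.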

👉 WHY?
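Thresholding the linear score $w \cdot x + b$ turns a linear model (as in regression) into a **binary classifier**: inputs on one side of the boundary $w \cdot x + b = 0$ get 1, inputs on the other side get 0.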


In [None]:
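class PerceptronManual:
    """Classic perceptron: weighted sum followed by a hard threshold."""
    def __init__(self, n_inputs):
        # No requires_grad: the step has zero gradient almost everywhere,
        # so autograd could not train these parameters anyway.
        self.w = torch.randn(n_inputs)
        self.b = torch.randn(1)

    def forward(self, x):
        z = torch.dot(self.w, x) + self.b  # weighted sum plus bias
        return step_function(z)             # 1 if z >= 0, else 0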


## 3) Toy Example — AND Logic Gate

Simple logic gates are a classic first test for the perceptron.

Truth table (AND):
- (0,0) → 0
- (0,1) → 0
- (1,0) → 0
- (1,1) → 1

👉 WHY?
If the perceptron can learn a logic gate such as AND, it demonstrates that it can separate inputs into two categories with a single straight line.
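AND is linearly separable, so suitable weights exist even before any training. As an illustration, the hand-picked values $w = (1, 1)$, $b = -1.5$ (our own choice, not learned) put only $(1, 1)$ on the positive side: $1 + 1 - 1.5 = 0.5 \ge 0$, while every other input gives $z \le -0.5$. A minimal self-contained check with a step perceptron (the names `step`, `w`, `b`, `X_and` are ours):

```python
import torch

def step(z):
    # Hard threshold: 1.0 where z >= 0, else 0.0 (element-wise on tensors)
    return (z >= 0).float()

# Hand-picked (not learned) weights and bias for AND
w = torch.tensor([1.0, 1.0])
b = torch.tensor(-1.5)

X_and = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
z = X_and @ w + b          # one weighted sum per row: [-1.5, -0.5, -0.5, 0.5]
y_hat = step(z)

print("z     :", z.tolist())
print("output:", y_hat.tolist())   # [0.0, 0.0, 0.0, 1.0], the AND truth table
```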


In [None]:
# Dataset: AND gate
X = torch.tensor([[0.,0.],[0.,1.],[1.,0.],[1.,1.]])
y = torch.tensor([[0.],[0.],[0.],[1.]])


## 4) Training Perceptron — WHY gradient descent?

Rosenblatt's original perceptron was trained with its own error-driven update rule (the perceptron learning rule), not with gradients.  
Today, we use **gradient descent** (as in regression), replacing the step with a **sigmoid**.  
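For comparison, here is a minimal sketch of the classic perceptron learning rule in its common textbook form: predict with the step function, then nudge the weights by `lr * (target - prediction) * x`. Weights change only on mistakes, and no gradient is involved. The function name `train_perceptron_rule`, the zero initialization, `lr=1.0` and the epoch cap are our assumptions. With zero initialization the learning rate only rescales `w` and `b`, not the decisions, and `lr=1.0` keeps the arithmetic exact on these 0/1 inputs.

```python
import torch

def step(z):
    # Hard threshold: 1.0 where z >= 0, else 0.0
    return (z >= 0).float()

def train_perceptron_rule(X, y, lr=1.0, max_epochs=100):
    """Perceptron learning rule (sketch). X: (n, d) inputs, y: (n,) labels in {0, 1}."""
    w = torch.zeros(X.shape[1])
    b = torch.tensor(0.0)
    for _ in range(max_epochs):
        errors = 0
        for x_i, y_i in zip(X, y):
            y_hat = step(torch.dot(w, x_i) + b)
            delta = y_i - y_hat              # +1, 0 or -1
            if delta != 0:
                w = w + lr * delta * x_i     # shift the boundary toward the misclassified point
                b = b + lr * delta
                errors += 1
        if errors == 0:                      # a full pass without mistakes: stop
            break
    return w, b

X_and = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y_and = torch.tensor([0., 0., 0., 1.])
w, b = train_perceptron_rule(X_and, y_and)
print("w =", w.tolist(), "b =", b.item())
print("predictions:", step(X_and @ w + b).tolist())
```

For linearly separable data like AND, this rule is guaranteed to stop after finitely many mistakes; on XOR it would keep cycling until `max_epochs` runs out.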

👉 WHY sigmoid?
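- Step has zero gradient everywhere except at 0 (where it is undefined) → gradient descent gets no signal to update the weights.
- Sigmoid is a smooth approximation of the step with a nonzero gradient everywhere → allows training.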
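Writing the sigmoid and its derivative out makes the contrast concrete. For a single example with target $y$, score $z = w \cdot x + b$, prediction $p = \sigma(z)$ and the binary cross-entropy loss $L$ (what `nn.BCELoss` computes per example), the chain rule gives:

$$
\begin{aligned}
\sigma(z) &= \frac{1}{1 + e^{-z}}, \qquad \sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr) > 0 \\
L &= -\bigl[\,y \log p + (1 - y)\log(1 - p)\,\bigr] \\
\frac{\partial L}{\partial z} &= \frac{p - y}{p(1 - p)} \cdot p(1 - p) = p - y \\
\frac{\partial L}{\partial w} &= (p - y)\,x, \qquad \frac{\partial L}{\partial b} = p - y
\end{aligned}
$$

Because $0 < p < 1$, the gradient is never exactly zero. The resulting update $w \leftarrow w - \eta\,(p - y)\,x$ (with learning rate $\eta$) has the same shape as the classic perceptron rule $w \leftarrow w + \eta\,(y - \hat{y})\,x$, with the hard 0/1 prediction $\hat{y}$ replaced by the probability $p$.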


In [None]:
class PerceptronNN(nn.Module):
    def __init__(self, n_inputs):
        super().__init__()
        self.linear = nn.Linear(n_inputs, 1)
        self.activation = nn.Sigmoid()

    def forward(self, x):
        return self.activation(self.linear(x))


## 5) Training on AND Gate


In [None]:
model = PerceptronNN(2)
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(1000):
    y_pred = model(X)
    loss = criterion(y_pred, y)
    
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    losses.append(loss.item())
plt.figure(figsize=(8, 4))
plt.plot(losses)
plt.title("Training Loss (AND gate)")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.show()


## 6) Predictions — did it learn?

👉 WHY check predictions?  
A decreasing loss is not enough: we must confirm that the model's decisions (probabilities rounded at 0.5) match the targets.


In [None]:
with torch.no_grad():
    print("Predictions:", model(X).round().view(-1).tolist())
    print("Targets:    ", y.view(-1).tolist())


## 7) Visualizing Decision Boundary

👉 WHY?  
A plot shows how the perceptron splits the input space into a region predicted 0 and a region predicted 1.
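Where exactly is that boundary? The model outputs $\sigma(w_1 x_1 + w_2 x_2 + b)$, and $\sigma(z) = 0.5$ exactly when $z = 0$, so the 0.5 contour is the straight line $w_1 x_1 + w_2 x_2 + b = 0$, i.e. $x_2 = -(w_1 x_1 + b)/w_2$ when $w_2 \neq 0$. A small sketch that reads this line off an `nn.Linear` layer; `boundary_x2` is a hypothetical helper, and the layer is filled with hand-picked AND weights $w = (1, 1)$, $b = -1.5$ as an assumption for illustration:

```python
import torch
import torch.nn as nn

def boundary_x2(linear, x1):
    """Hypothetical helper: x2 on the line w1*x1 + w2*x2 + b = 0 (assumes w2 != 0)."""
    w1, w2 = linear.weight[0].tolist()
    b = linear.bias.item()
    return -(w1 * x1 + b) / w2

# A 2-input linear layer with hand-picked weights (assumption, for illustration)
linear = nn.Linear(2, 1)
with torch.no_grad():
    linear.weight.copy_(torch.tensor([[1.0, 1.0]]))
    linear.bias.fill_(-1.5)

for x1 in [0.0, 0.5, 1.0]:
    x2 = boundary_x2(linear, x1)
    p = torch.sigmoid(linear(torch.tensor([x1, x2])))
    print(f"x1={x1:.1f}  x2={x2:.2f}  sigmoid={p.item():.3f}")  # sigmoid is 0.5 on the line
```

Passing the trained layer instead, e.g. `boundary_x2(model.linear, x1)`, should trace the same line that separates the two colors in the contour plot.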


In [None]:
import numpy as np

def plot_decision_boundary(model, X, y):
    # Grid covering the four input points with a margin
    x_min, x_max = -0.5, 1.5
    y_min, y_max = -0.5, 1.5
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                         np.linspace(y_min, y_max, 100))
    grid = torch.tensor(np.c_[xx.ravel(), yy.ravel()], dtype=torch.float32)
    with torch.no_grad():
        # Predicted probability of class 1 at every grid point
        Z = model(grid).reshape(xx.shape).numpy()
    # Two filled regions: p below 0.5 (class 0) and p at or above 0.5 (class 1)
    plt.contourf(xx, yy, Z, levels=[0, 0.5, 1], cmap="coolwarm", alpha=0.6)
    plt.scatter(X[:, 0].numpy(), X[:, 1].numpy(), c=y.view(-1).numpy(),
                cmap="coolwarm", edgecolor="k")
    plt.title("Decision boundary")
    plt.show()

plot_decision_boundary(model, X, y)


## 8) Limitations of Perceptron

- Works only for **linearly separable problems** (e.g. AND, OR).
- Fails for XOR, whose two classes cannot be separated by a single straight line.  
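A short argument shows why no single line works. Suppose weights $w_1, w_2$ and bias $b$ classified XOR correctly with $\text{step}(w_1 x_1 + w_2 x_2 + b)$. The four rows of the truth table would require:

$$
\begin{aligned}
(0,0) \mapsto 0 &: \quad b < 0 \\
(0,1) \mapsto 1 &: \quad w_2 + b \ge 0 \\
(1,0) \mapsto 1 &: \quad w_1 + b \ge 0 \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 + b < 0
\end{aligned}
$$

Adding the two middle inequalities gives $w_1 + w_2 + 2b \ge 0$, i.e. $w_1 + w_2 + b \ge -b > 0$ (because $b < 0$), which contradicts the last line. So no choice of $w_1, w_2, b$ works, and the same argument goes through for the sigmoid model thresholded at 0.5, because $\sigma(z) \ge 0.5$ exactly when $z \ge 0$.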

👉 WHY important?  
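This limitation, formalized in Minsky & Papert's 1969 book *Perceptrons* (which showed that a single-layer perceptron cannot solve XOR), contributed to a sharp decline in neural-network research, often called the first **AI winter**.  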
Solution: add **multiple layers** → Multilayer Perceptron (next lesson).


In [None]:
# XOR dataset
X_xor = torch.tensor([[0.,0.],[0.,1.],[1.,0.],[1.,1.]])
y_xor = torch.tensor([[0.],[1.],[1.],[0.]])

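# Train the same single-layer model on XOR.
# No straight line separates these labels, so at least one prediction must be wrong.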
model_xor = PerceptronNN(2)
optimizer = optim.SGD(model_xor.parameters(), lr=0.1)
criterion = nn.BCELoss()

for epoch in range(1000):
    y_pred = model_xor(X_xor)
    loss = criterion(y_pred, y_xor)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

with torch.no_grad():
    preds = model_xor(X_xor).round().view(-1).tolist()

print("Predictions for XOR:", preds)
print("Targets:            ", y_xor.view(-1).tolist())


## ✅ Summary — Why Perceptron Matters

- One of the first trainable neural network models (1958).
- Introduced idea: weights + bias + activation.  
- Shows how a linear decision boundary separates two classes.  
- Limitation: cannot solve XOR → motivated **multilayer networks**.  

🚀 Next Lesson: **Multilayer Perceptron (MLP)** — how adding layers solves XOR and builds modern deep learning.
