# 2. The First Perceptron

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/maleehahassan/NNBuildingBlocksTeachingPt1/blob/main/content/02_first_perceptron.ipynb)

## Learning Objectives

By the end of this section, you will understand:
- The historical context and motivation behind the perceptron
- How the perceptron works mathematically
- The relationship between biological neurons and artificial neurons
- Limitations and capabilities of the perceptron
- How to implement a simple perceptron

## Historical Context

### The Birth of Artificial Neural Networks (1943-1958)

- **1943**: McCulloch and Pitts proposed one of the first mathematical models of a neuron, treating it as a simple logic unit
- **1949**: Donald Hebb introduced Hebbian learning (often summarized as "neurons that fire together, wire together")
- **1958**: Frank Rosenblatt introduced the **Perceptron** at the Cornell Aeronautical Laboratory

The perceptron was revolutionary because it was one of the first algorithms that could **learn** to classify patterns from labeled examples instead of relying on hand-set weights!

In [1]:
# Let's start by visualizing a biological neuron vs artificial neuron
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Circle

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Biological Neuron (simplified)
ax1.set_xlim(0, 10)
ax1.set_ylim(0, 6)
ax1.set_title('Biological Neuron', fontsize=14, fontweight='bold')

# Cell body
circle = Circle((2, 3), 0.8, color='lightblue', alpha=0.7)
ax1.add_patch(circle)
ax1.text(2, 3, 'Cell\nBody', ha='center', va='center', fontsize=10)

# Dendrites (inputs)
for i, y in enumerate([1.5, 2.5, 3.5, 4.5]):
    ax1.arrow(0.2, y, 1.0, 0, head_width=0.1, head_length=0.1, fc='green', ec='green')
    ax1.text(0.1, y, f'Input {i+1}', ha='right', va='center', fontsize=9)

# Axon (output)
ax1.arrow(2.8, 3, 6.5, 0, head_width=0.15, head_length=0.2, fc='red', ec='red', linewidth=2)
ax1.text(9.5, 3, 'Output', ha='left', va='center', fontsize=10)

ax1.set_xticks([])
ax1.set_yticks([])
ax1.set_aspect('equal')

# Artificial Neuron (Perceptron)
ax2.set_xlim(0, 10)
ax2.set_ylim(0, 6)
ax2.set_title('Artificial Neuron (Perceptron)', fontsize=14, fontweight='bold')

# Processing unit
circle = Circle((5, 3), 1, color='lightcoral', alpha=0.7)
ax2.add_patch(circle)
ax2.text(5, 3, 'Σ', ha='center', va='center', fontsize=20, fontweight='bold')

# Inputs with weights
inputs = ['x₁', 'x₂', 'x₃']
weights = ['w₁', 'w₂', 'w₃']
for i, (inp, w) in enumerate(zip(inputs, weights)):
    y = 2 + i * 0.7
    ax2.arrow(1, y, 2.8, 3-y, head_width=0.1, head_length=0.1, fc='blue', ec='blue')
    ax2.text(0.8, y, inp, ha='right', va='center', fontsize=12)
    ax2.text(2.5, y+0.2, w, ha='center', va='center', fontsize=10, color='blue')

# Bias
ax2.arrow(5, 1, 0, 1.8, head_width=0.1, head_length=0.1, fc='purple', ec='purple')
ax2.text(5, 0.8, 'bias', ha='center', va='center', fontsize=10, color='purple')

# Output
ax2.arrow(6, 3, 2.5, 0, head_width=0.15, head_length=0.2, fc='red', ec='red', linewidth=2)
ax2.text(8.8, 3, 'y', ha='left', va='center', fontsize=12)

ax2.set_xticks([])
ax2.set_yticks([])
ax2.set_aspect('equal')

plt.tight_layout()
plt.show()

print("Key Similarities:")
print("• Both receive multiple inputs")
print("• Both process and integrate information")
print("• Both produce a single output")
print("• Both can learn and adapt")


## How the Perceptron Works

### The Mathematical Model

A perceptron takes multiple inputs and produces a binary output (0 or 1). Here's how:

1. **Inputs**: $x_1, x_2, ..., x_n$ (features)
2. **Weights**: $w_1, w_2, ..., w_n$ (learned parameters)
3. **Bias**: $b$ (threshold adjustment)
4. **Weighted Sum**: $z = w_1x_1 + w_2x_2 + ... + w_nx_n + b$
5. **Activation**: $y = \begin{cases} 1 & \text{if } z \geq 0 \\ 0 & \text{if } z < 0 \end{cases}$

### In Vector Form:
- $z = \mathbf{w}^T\mathbf{x} + b$
- $y = \text{step}(z)$
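
The vector form also makes it easy to evaluate several inputs at once. The following is a minimal sketch, assuming NumPy; the three-input weights, the bias, and the names `step`, `X`, `w`, and `b` are illustrative choices for this example, not values taken from the lesson.

```python
import numpy as np

def step(z):
    """Step activation with the convention step(0) = 1, as in the definition above."""
    return (np.asarray(z) >= 0).astype(int)

# Illustrative (hand-picked) parameters for a perceptron with three inputs
w = np.array([0.4, -0.2, 0.1])
b = -0.1

# Each row of X is one input vector x
X = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
])

z = X @ w + b   # weighted sum w^T x + b for every row at once
y = step(z)     # binary output for every row

print("z =", z)
print("y =", y)  # y = [1 0 1]
```

Stacking inputs as rows of a matrix turns the per-example dot product into a single matrix-vector product, which is the same pattern later layers of a neural network rely on.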

In [None]:
# Let's implement a simple perceptron step by step
def step_function(z):
    """Step activation function"""
    return 1 if z >= 0 else 0

def perceptron_output(inputs, weights, bias):
    """Calculate perceptron output"""
    # Calculate weighted sum
    z = np.dot(weights, inputs) + bias
    
    # Apply step function
    output = step_function(z)
    
    return z, output

# Example: Simple AND gate with hand-picked weights (nothing is learned yet)
print("=== Perceptron as AND Gate ===")
print("Target: output = 1 only when BOTH inputs are 1")
print()

# Weights and bias for AND gate
weights = np.array([0.5, 0.5])  # Equal importance to both inputs
bias = -0.7  # Negative bias: a single active input (0.5) is not enough to reach z >= 0

# Test all possible inputs
test_inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
expected_outputs = [0, 0, 0, 1]  # AND gate truth table

print("Input 1 | Input 2 | Weighted Sum | Output | Expected")
print("-" * 55)

for i, inputs in enumerate(test_inputs):
    z, output = perceptron_output(inputs, weights, bias)
    expected = expected_outputs[i]
    status = "✓" if output == expected else "✗"
    print(f"   {inputs[0]}    |    {inputs[1]}    |    {z:6.1f}    |   {output}    |    {expected}     {status}")

print("\nThe perceptron successfully implements an AND gate!")

In [None]:
# Visualize how the perceptron makes decisions
def plot_perceptron_decision(weights, bias, title="Perceptron Decision Boundary"):
    plt.figure(figsize=(10, 8))
    
    # Create a grid of points
    x1 = np.linspace(-2, 2, 100)
    x2 = np.linspace(-2, 2, 100)
    X1, X2 = np.meshgrid(x1, x2)
    
    # Calculate decision boundary: w1*x1 + w2*x2 + b = 0
    # Solving for x2: x2 = -(w1*x1 + b) / w2
    if weights[1] != 0:
        boundary_x2 = -(weights[0] * x1 + bias) / weights[1]
        plt.plot(x1, boundary_x2, 'k-', linewidth=3, label='Decision Boundary')
    
    # Calculate outputs for all grid points
    Z = np.zeros_like(X1)
    for i in range(X1.shape[0]):
        for j in range(X1.shape[1]):
            z_val = weights[0] * X1[i,j] + weights[1] * X2[i,j] + bias
            Z[i,j] = step_function(z_val)
    
    # Plot decision regions
    plt.contourf(X1, X2, Z, levels=[0, 0.5, 1], colors=['lightcoral', 'lightblue'], alpha=0.6)
    
    # Plot the test points
    test_points = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    colors = ['red', 'red', 'red', 'blue']  # AND gate: only [1,1] is blue (output=1)
    
    for i, (point, color) in enumerate(zip(test_points, colors)):
        plt.scatter(point[0], point[1], c=color, s=200, edgecolors='black', linewidth=2)
        plt.annotate(f'({point[0]},{point[1]})', (point[0], point[1]), 
                    xytext=(10, 10), textcoords='offset points', fontsize=12)
    
    plt.xlim(-0.5, 1.5)
    plt.ylim(-0.5, 1.5)
    plt.xlabel('Input 1 (x₁)', fontsize=12)
    plt.ylabel('Input 2 (x₂)', fontsize=12)
    plt.title(title, fontsize=14)
    plt.grid(True, alpha=0.3)
    plt.legend()
    
    # Add region labels
    plt.text(0.2, 0.2, 'Output = 0\n(Red Region)', fontsize=11, 
             bbox=dict(boxstyle="round,pad=0.3", facecolor="lightcoral", alpha=0.8))
    plt.text(1.2, 1.2, 'Output = 1\n(Blue Region)', fontsize=11,
             bbox=dict(boxstyle="round,pad=0.3", facecolor="lightblue", alpha=0.8))
    
    plt.show()

# Visualize the AND gate perceptron
plot_perceptron_decision(weights, bias, "Perceptron Decision Boundary: AND Gate")

print(f"Decision boundary equation: {weights[0]:.1f}x₁ + {weights[1]:.1f}x₂ + {bias:.1f} = 0")
print("Points above/right of the line output 1, points below/left output 0")

## Perceptron Learning Algorithm

The beauty of the perceptron is that it can **learn** suitable weights from labeled examples, instead of having them set by hand!

### Learning Rule:
For each training example:
1. Make a prediction: $\hat{y} = \text{step}(\mathbf{w}^T\mathbf{x} + b)$
2. Calculate error: $e = y - \hat{y}$ (where $y$ is the true label)
3. Update weights: $\mathbf{w} = \mathbf{w} + \eta \cdot e \cdot \mathbf{x}$
4. Update bias: $b = b + \eta \cdot e$

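Where $\eta$ (eta) is the **learning rate**. Because $y$ and $\hat{y}$ are both 0 or 1, the error $e$ is $-1$, $0$, or $+1$, so the weights and bias change only when the prediction is wrong.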
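
To make the rule concrete, here is a single update worked through by hand. This is only a sketch: the starting weights, the learning rate, and the chosen training example are assumptions made for illustration.

```python
import numpy as np

eta = 0.1                     # learning rate (illustrative value)
w = np.array([0.0, 0.0])      # illustrative starting weights
b = 0.0                       # illustrative starting bias

x = np.array([0.0, 1.0])      # one training input
y_true = 0                    # its label under AND: 0 AND 1 = 0

# 1. Make a prediction
z = w @ x + b                 # 0.0
y_hat = 1 if z >= 0 else 0    # step(0) = 1, so the prediction is wrong

# 2. Calculate the error
e = y_true - y_hat            # 0 - 1 = -1

# 3. and 4. Update weights and bias
w_new = w + eta * e * x       # only the weight of the active input moves: [0, -0.1]
b_new = b + eta * e           # -0.1

# Re-check the same example with the updated parameters
z_new = w_new @ x + b_new     # -0.2
y_hat_new = 1 if z_new >= 0 else 0

print("error:", e)
print("updated weights:", w_new, "updated bias:", b_new)
print("new prediction:", y_hat_new)  # 0, which now matches the label
```

Note that the weight attached to the inactive input ($x_1 = 0$) does not change: the update is proportional to the input, so only inputs that contributed to the mistake are adjusted.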

In [None]:
# Implement the perceptron learning algorithm
class Perceptron:
    def __init__(self, learning_rate=0.1, max_epochs=100):
        self.learning_rate = learning_rate
        self.max_epochs = max_epochs
        self.weights = None
        self.bias = None
        self.errors = []
    
    def fit(self, X, y):
        """Train the perceptron"""
        # Initialize weights and bias
        n_features = X.shape[1]
        self.weights = np.random.normal(0, 0.01, n_features)
        self.bias = 0.0
        self.errors = []  # reset the history so repeated fit() calls start fresh
        
        # Training loop
        for epoch in range(self.max_epochs):
            total_error = 0
            
            for i in range(len(X)):
                # Forward pass
                z = np.dot(X[i], self.weights) + self.bias
                prediction = 1 if z >= 0 else 0
                
                # Calculate error
                error = y[i] - prediction
                
                # Update weights and bias
                if error != 0:  # Only update if there's an error
                    self.weights += self.learning_rate * error * X[i]
                    self.bias += self.learning_rate * error
                
                total_error += abs(error)
            
            self.errors.append(total_error)
            
            # Stop if perfect classification
            if total_error == 0:
                print(f"Perfect classification achieved in {epoch + 1} epochs!")
                break
    
    def predict(self, X):
        """Make predictions"""
        z = np.dot(X, self.weights) + self.bias
        return (z >= 0).astype(int)

# Train a perceptron to learn the AND gate
X_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_train = np.array([0, 0, 0, 1])  # AND gate outputs

# Create and train perceptron
perceptron = Perceptron(learning_rate=0.1, max_epochs=100)
print("Training perceptron to learn AND gate...")
perceptron.fit(X_train, y_train)

# Test the trained perceptron
predictions = perceptron.predict(X_train)

print("\n=== Results ===")
print("Input | Expected | Predicted | Correct?")
print("-" * 38)
for i in range(len(X_train)):
    correct = "✓" if predictions[i] == y_train[i] else "✗"
    print(f"{X_train[i]} |    {y_train[i]}     |     {predictions[i]}     |   {correct}")

print(f"\nLearned weights: {perceptron.weights}")
print(f"Learned bias: {perceptron.bias:.3f}")

In [None]:
# Visualize the learning process

# Plot 1: Learning curve
plt.figure(figsize=(8, 5))
plt.plot(perceptron.errors, 'b-', linewidth=2, marker='o')
plt.xlabel('Epoch')
plt.ylabel('Total Error')
plt.title('Perceptron Learning Curve')
plt.grid(True, alpha=0.3)
plt.show()

# Plot 2: Final decision boundary
# plot_perceptron_decision creates and shows its own figure, so it is called
# on its own rather than inside a subplot of the figure above.
plot_perceptron_decision(perceptron.weights, perceptron.bias, "Learned Decision Boundary")

print("The perceptron successfully learned to separate the classes!")

## The XOR Problem: Perceptron's Limitation

In 1969, Marvin Minsky and Seymour Papert published "Perceptrons", which analyzed what a single-layer perceptron can and cannot compute. The best-known limitation: a single perceptron cannot solve the XOR problem.

### XOR (Exclusive OR) Truth Table:
- (0, 0) → 0
- (0, 1) → 1  
- (1, 0) → 1
- (1, 1) → 0

The problem is that XOR is **not linearly separable** - you cannot draw a single straight line to separate the classes.
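
The reason can be stated in a few lines. With the step rule above, XOR would require $b < 0$ for input $(0, 0)$, $w_2 + b \geq 0$ and $w_1 + b \geq 0$ for the two mixed inputs, and $w_1 + w_2 + b < 0$ for $(1, 1)$. Adding the two middle conditions gives $w_1 + w_2 + 2b \geq 0$, so $w_1 + w_2 + b \geq -b > 0$, which contradicts the last condition.

As a complementary numerical check (not a proof), the sketch below searches a coarse grid of weights and biases and counts how many settings reproduce each truth table. The grid range and resolution, and the helper name `count_solutions`, are arbitrary choices for this illustration.

```python
import numpy as np

# The four binary inputs and two target truth tables
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])
y_xor = np.array([0, 1, 1, 0])

# Candidate values for w1, w2 and b (arbitrary coarse grid from -2 to 2)
grid = np.linspace(-2, 2, 41)
w1, w2, b = [a.ravel() for a in np.meshgrid(grid, grid, grid, indexing="ij")]

# Z[k, i] is the weighted sum of candidate k on input i
Z = np.outer(w1, X[:, 0]) + np.outer(w2, X[:, 1]) + b[:, None]
preds = (Z >= 0).astype(int)

def count_solutions(y):
    """Number of grid candidates whose outputs match y on all four inputs."""
    return int(np.all(preds == y, axis=1).sum())

n_and = count_solutions(y_and)
n_xor = count_solutions(y_xor)

print(f"Candidates tested: {len(b)}")
print(f"Candidates that implement AND: {n_and}")
print(f"Candidates that implement XOR: {n_xor}")  # 0
```

Many settings implement AND, but none implement XOR, which agrees with the algebraic argument.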

In [None]:
# Demonstrate the XOR problem
plt.figure(figsize=(12, 5))

# XOR data
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])  # XOR outputs

# Plot 1: XOR problem visualization
plt.subplot(1, 2, 1)
colors = ['red', 'blue', 'blue', 'red']
for i, (point, color, label) in enumerate(zip(X_xor, colors, y_xor)):
    plt.scatter(point[0], point[1], c=color, s=200, edgecolors='black', linewidth=2)
    plt.annotate(f'({point[0]},{point[1]})→{label}', (point[0], point[1]), 
                xytext=(10, 10), textcoords='offset points', fontsize=12)

plt.xlim(-0.5, 1.5)
plt.ylim(-0.5, 1.5)
plt.xlabel('Input 1')
plt.ylabel('Input 2')
plt.title('XOR Problem: Not Linearly Separable')
plt.grid(True, alpha=0.3)

# Try to show that no single line can separate the classes
plt.text(0.5, 1.3, 'No single straight line\ncan separate red from blue!', 
         ha='center', fontsize=12, 
         bbox=dict(boxstyle="round,pad=0.3", facecolor="yellow", alpha=0.7))

# Plot 2: Try training perceptron on XOR (it will fail)
plt.subplot(1, 2, 2)
perceptron_xor = Perceptron(learning_rate=0.1, max_epochs=50)
perceptron_xor.fit(X_xor, y_xor)

plt.plot(perceptron_xor.errors, 'r-', linewidth=2, marker='o')
plt.xlabel('Epoch')
plt.ylabel('Total Error')
plt.title('Perceptron Cannot Learn XOR')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("The perceptron fails to learn XOR because XOR is not linearly separable.")
print("This limitation is often cited as one factor behind the 'AI Winter' of the 1970s-1980s.")
print("\nSolution: Multi-layer networks (coming in future lessons!)")

## Applications of the Perceptron

Despite its limitations, the perceptron is still useful for:

### 1. Linearly Separable Problems
- Simple binary classification
- Email spam detection (linear features)
- Sentiment analysis (simple cases)

### 2. Building Blocks
- Foundation for multi-layer networks
- Understanding neural network basics
- Educational purposes

### 3. Real-World Examples
- Early optical character recognition
- Simple pattern recognition
- Feature selection in preprocessing

In [None]:
# Example: Using the perceptron on a larger (synthetic) classification problem
# Let's create a simple dataset where the perceptron should work well

np.random.seed(42)

# Generate two well-separated Gaussian clusters (very likely linearly separable)
n_samples = 100
X1 = np.random.randn(n_samples//2, 2) + np.array([2, 2])
X2 = np.random.randn(n_samples//2, 2) + np.array([-2, -2])

X = np.vstack([X1, X2])
y = np.hstack([np.ones(n_samples//2), np.zeros(n_samples//2)])

# Train perceptron
perceptron_real = Perceptron(learning_rate=0.01, max_epochs=100)
perceptron_real.fit(X, y)

# Visualize results
plt.figure(figsize=(10, 8))

# Plot data points
colors = ['red', 'blue']
for i in range(2):
    mask = y == i
    plt.scatter(X[mask, 0], X[mask, 1], c=colors[i], alpha=0.7, s=50, 
                label=f'Class {i}')

# Plot decision boundary
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1

xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                     np.arange(y_min, y_max, 0.1))

mesh_points = np.c_[xx.ravel(), yy.ravel()]
Z = perceptron_real.predict(mesh_points)
Z = Z.reshape(xx.shape)

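# Explicit levels give one filled region per class and a single boundary line
plt.contourf(xx, yy, Z, levels=[-0.5, 0.5, 1.5], alpha=0.3, colors=['lightcoral', 'lightblue'])
plt.contour(xx, yy, Z, levels=[0.5], colors='black', linewidths=2, linestyles='--')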

plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Perceptron on Linearly Separable Data')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# Calculate accuracy
predictions = perceptron_real.predict(X)
accuracy = np.mean(predictions == y)
print(f"Accuracy: {accuracy:.1%}")
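if accuracy == 1.0:
    print("The perceptron separates this dataset perfectly!")
else:
    print("Some points are misclassified: the clusters may overlap, or training stopped before converging.")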

## Key Takeaways

### Strengths of the Perceptron:
1. **Simple and intuitive** - easy to understand and implement
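2. **Guaranteed convergence** - the learning rule finds a separating solution in a finite number of updates if the data are linearly separable
3. **Fast training** - each epoch is linear in the number of examples
4. **Interpretable** - weights indicate how strongly each feature pushes the decision (when features are on comparable scales)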
5. **Foundation** - basis for more complex neural networks

### Limitations:
1. **Linear separation only** - a single perceptron cannot solve XOR and similar problems
2. **Binary output** - only 0 or 1
3. **Step function** - not differentiable at 0 and flat everywhere else, so gradient-based learning methods cannot be applied directly
4. **No probabilistic output** - just hard classifications

### Historical Impact:
- **1958-1969**: Great excitement about perceptrons
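- **1969**: Minsky & Papert analyzed these limitations, which contributed to reduced interest in neural networks (the "AI Winter")
- **1980s**: Multi-layer networks trained with backpropagation could learn XOR and other problems that are not linearly separable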
- **Today**: Perceptrons still used as building blocks
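
To illustrate the building-block idea, the sketch below wires three perceptron units into two layers that compute XOR. The weights are hand-picked for this illustration (an OR unit and a NAND unit feeding an AND unit), not learned, and the variable names are assumptions of this example.

```python
import numpy as np

def step(z):
    """Step activation with the convention step(0) = 1."""
    return (np.asarray(z) >= 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hidden layer: two perceptrons with hand-picked (not learned) parameters
W_hidden = np.array([
    [1.0, 1.0],     # unit 1 computes OR:   x1 + x2 - 0.5 >= 0
    [-1.0, -1.0],   # unit 2 computes NAND: -x1 - x2 + 1.5 >= 0
])
b_hidden = np.array([-0.5, 1.5])

# Output layer: one perceptron computing AND of the two hidden units
w_out = np.array([1.0, 1.0])
b_out = -1.5

H = step(X @ W_hidden.T + b_hidden)   # hidden activations, shape (4, 2)
y = step(H @ w_out + b_out)           # final outputs, shape (4,)

for x, h, out in zip(X, H, y):
    print(f"input {x} -> hidden {h} -> output {out}")
# Outputs are 0, 1, 1, 0: the XOR truth table
```

Each unit on its own still draws a single straight line; the hidden layer re-represents the inputs so that the output unit sees a linearly separable problem.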

## Discussion Questions

1. Why was the XOR problem so devastating to AI research in the 1970s?
2. Can you think of real-world problems that are linearly separable?
3. How might we modify the perceptron to handle non-linear problems?

---

**Next**: We'll explore **Activation Functions** - the key to making neural networks more powerful and flexible!