# Module 4 - Exercise 1: The Perceptron

<a href="https://colab.research.google.com/github/jumpingsphinx/jumpingsphinx.github.io/blob/main/notebooks/module4-neural-networks/exercise1-perceptron.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Learning Objectives

By the end of this exercise, you will be able to:

- Implement a single perceptron from scratch
- Understand activation functions (sigmoid, tanh, ReLU)
- Train with the perceptron learning algorithm
- Visualize decision boundaries
- Recognize linear separability limitations
- Understand the XOR problem

## Prerequisites

- Completion of Modules 1-3
- Understanding of linear algebra and optimization
- Familiarity with classification concepts

## Setup

Run this cell first to import required libraries:

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification, load_iris, make_moons, make_circles
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron as SklearnPerceptron
from sklearn.metrics import accuracy_score

# Set random seed for reproducibility
np.random.seed(42)

print("NumPy version:", np.__version__)
print("Setup complete!")

---

## Part 1: Activation Functions

### Background

Activation functions introduce non-linearity into neural networks. A perceptron computes its output by applying an activation function $f$ to a weighted sum of its inputs:

$$\hat{y} = f(\mathbf{w}^T \mathbf{x} + b)$$

Common activation functions:
- **Step**: $f(z) = 1$ if $z \geq 0$ else $0$
- **Sigmoid**: $\sigma(z) = \frac{1}{1 + e^{-z}}$
- **Tanh**: $\tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}$
- **ReLU**: $\text{ReLU}(z) = \max(0, z)$
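
As a quick numeric illustration of the formula above, the sketch below evaluates one perceptron output by hand and passes the same $z$ through each activation. The weights, bias, and input are made-up values chosen only for illustration; they do not come from the exercise.

```python
import numpy as np

# Hypothetical values, chosen only for illustration
w = np.array([0.5, -0.25])   # weights
x = np.array([2.0, 1.0])     # one input sample
b = 0.1                      # bias

# Linear combination: 0.5*2 + (-0.25)*1 + 0.1
z = np.dot(w, x) + b

# Apply each activation from the list above to the same z
outputs = {
    "step": 1 if z >= 0 else 0,
    "sigmoid": 1 / (1 + np.exp(-z)),
    "tanh": np.tanh(z),
    "relu": max(0.0, z),
}

print(f"z = {z:.2f}")
for name, value in outputs.items():
    print(f"{name}: {value:.4f}")
```

Note how the step function discards the magnitude of $z$, while sigmoid and tanh squash it into a bounded range and ReLU passes positive values through unchanged.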

### Exercise 1.1: Implement Activation Functions

**Task:** Complete the activation functions below.

In [None]:
def step_function(z):
    """
    Step activation function.
    
    Returns 1 if z >= 0, else 0
    """
    # np.where works elementwise, so this handles scalars and arrays alike
    return np.where(np.asarray(z) >= 0, 1, 0)
def sigmoid(z):
    """
    Sigmoid activation function.
    
    σ(z) = 1 / (1 + e^(-z))
    """
    return 1 / (1 + np.exp(-z))
def tanh(z):
    """
    Hyperbolic tangent activation function.
    """
    return np.tanh(z)
def relu(z):
    """
    ReLU (Rectified Linear Unit) activation function.
    
    ReLU(z) = max(0, z)
    """
    return np.maximum(0, z)
# Test activations
z_test = np.array([-2, -1, 0, 1, 2])
print("Input:", z_test)
print("Step:", step_function(z_test))
print("Sigmoid:", sigmoid(z_test))
print("Tanh:", tanh(z_test))
print("ReLU:", relu(z_test))

### Exercise 1.2: Visualize Activation Functions

**Task:** Create a visualization comparing all activation functions.

In [None]:
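# Create input range
z = np.linspace(-5, 5, 200)

# 2x2 grid: one panel per activation function
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Evaluate the step function point by point so the plot works
# whether step_function accepts scalars only or whole arrays
step_values = [int(step_function(v)) for v in z]
axes[0, 0].plot(z, step_values, 'b-', linewidth=2)
axes[0, 0].set_title('Step Function', fontsize=14, fontweight='bold')
axes[0, 0].set_xlabel('z')
axes[0, 0].set_ylabel('f(z)')
axes[0, 0].grid(True, alpha=0.3)
axes[0, 0].axhline(y=0, color='k', linewidth=0.5)
axes[0, 0].axvline(x=0, color='k', linewidth=0.5)

axes[0, 1].plot(z, sigmoid(z), 'r-', linewidth=2)
axes[0, 1].set_title('Sigmoid Function', fontsize=14, fontweight='bold')
axes[0, 1].set_xlabel('z')
axes[0, 1].set_ylabel('σ(z)')
axes[0, 1].grid(True, alpha=0.3)
axes[0, 1].axhline(y=0, color='k', linewidth=0.5)
axes[0, 1].axvline(x=0, color='k', linewidth=0.5)

axes[1, 0].plot(z, tanh(z), 'g-', linewidth=2)
axes[1, 0].set_title('Tanh Function', fontsize=14, fontweight='bold')
axes[1, 0].set_xlabel('z')
axes[1, 0].set_ylabel('tanh(z)')
axes[1, 0].grid(True, alpha=0.3)
axes[1, 0].axhline(y=0, color='k', linewidth=0.5)
axes[1, 0].axvline(x=0, color='k', linewidth=0.5)

axes[1, 1].plot(z, relu(z), 'm-', linewidth=2)
axes[1, 1].set_title('ReLU Function', fontsize=14, fontweight='bold')
axes[1, 1].set_xlabel('z')
axes[1, 1].set_ylabel('ReLU(z)')
axes[1, 1].grid(True, alpha=0.3)
axes[1, 1].axhline(y=0, color='k', linewidth=0.5)
axes[1, 1].axvline(x=0, color='k', linewidth=0.5)

plt.tight_layout()
plt.show()
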


---

## Part 2: Implement Perceptron Class

### Background

A perceptron computes:
1. Linear combination: $z = \mathbf{w}^T \mathbf{x} + b$
2. Apply activation: $\hat{y} = f(z)$

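Training uses the perceptron learning rule, where $\eta$ is the learning rate, $y$ is the true label, and $y - \hat{y}$ is the prediction error: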
- $\mathbf{w} \leftarrow \mathbf{w} + \eta (y - \hat{y}) \mathbf{x}$
- $b \leftarrow b + \eta (y - \hat{y})$
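
The update rule above can be sketched as a sample-by-sample training loop. This is a minimal illustration of the classic rule with a step activation, not the class you will complete in Exercise 2.1; the helper name `perceptron_update` and the variables `X_gate` and `y_gate` are hypothetical, and the learning rate is set to 1.0 only so that the arithmetic stays exact.

```python
import numpy as np

def perceptron_update(w, b, x, y, eta=1.0):
    """Apply the perceptron learning rule once, for a single sample."""
    y_hat = 1 if np.dot(w, x) + b >= 0 else 0   # step activation
    error = y - y_hat                           # -1, 0, or +1
    w = w + eta * error * x                     # w <- w + eta * (y - y_hat) * x
    b = b + eta * error                         # b <- b + eta * (y - y_hat)
    return w, b, error

# AND gate as a small linearly separable example
X_gate = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_gate = np.array([0, 0, 0, 1])

w = np.zeros(2)
b = 0.0
for epoch in range(20):
    n_errors = 0
    for x_i, y_i in zip(X_gate, y_gate):
        w, b, error = perceptron_update(w, b, x_i, y_i)
        n_errors += int(error != 0)
    if n_errors == 0:   # a full pass without mistakes: training is done
        print(f"Converged after {epoch + 1} epochs")
        break

print("weights:", w, "bias:", b)
```

Weights only change when a sample is misclassified: a false negative pulls the boundary toward the sample, and a false positive pushes it away.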

### Exercise 2.1: Complete the Perceptron Class

**Task:** Implement the missing methods in the Perceptron class.

In [None]:
class Perceptron:
    """
    Perceptron classifier.
    
    Parameters:
    -----------
    n_inputs : int
        Number of input features
    activation : str
        Activation function ('step', 'sigmoid', 'tanh', 'relu')
    learning_rate : float
        Learning rate for weight updates
    """
    
    def __init__(self, n_inputs, activation='sigmoid', learning_rate=0.01):
        # Initialize weights with small random values
        self.weights = np.random.randn(n_inputs) * 0.01
        self.bias = 0.0
        self.learning_rate = learning_rate
        
        # Set activation function
        self.activation_name = activation
        if activation == 'step':
            self.activation = step_function
        elif activation == 'sigmoid':
            self.activation = sigmoid
        elif activation == 'tanh':
            self.activation = tanh
        elif activation == 'relu':
            self.activation = relu
        else:
            raise ValueError(f"Unknown activation: {activation}")
    
    def predict(self, X):
        """
        Make predictions for input data.
        
        Parameters:
        -----------
        X : array-like, shape (n_samples, n_features)
            Input data
        
        Returns:
        --------
        predictions : array, shape (n_samples,)
            Predicted values
        """
        # 1. Compute z = X @ weights + bias
        z = np.dot(X, self.weights) + self.bias
        
        # 2. Apply the activation function
        return self.activation(z)
    
    def fit(self, X, y, epochs=100, verbose=True):
        """
        Train the perceptron.
        
        Parameters:
        -----------
        X : array-like, shape (n_samples, n_features)
            Training data
        y : array-like, shape (n_samples,)
            Target values (0 or 1)
        epochs : int
            Number of training epochs
        verbose : bool
            Print progress
        """
        for epoch in range(epochs):
            # Make predictions
            predictions = self.predict(X)
            
            # Compute errors
            errors = y - predictions
            
            # Batch weight update: w = w + learning_rate * X^T @ errors
            # (for 1-D errors, np.dot(errors, X) equals X.T @ errors)
            update = self.learning_rate * np.dot(errors, X)
            self.weights += update
            
            # Bias update: b = b + learning_rate * sum(errors)
            self.bias += self.learning_rate * np.sum(errors)
            
            # Report progress periodically (the 10-epoch interval is a choice)
            if verbose and (epoch + 1) % 10 == 0:
                accuracy = self.score(X, y)
                print(f"Epoch {epoch+1}/{epochs}, Accuracy: {accuracy:.4f}")
    
    def score(self, X, y):
        """
        Calculate accuracy.
        
        Parameters:
        -----------
        X : features
        y : true labels
        
        Returns:
        --------
        accuracy : float
        """
        predictions = self.predict(X)
        binary_preds = (predictions > 0.5).astype(int)
        return np.mean(binary_preds == y)
print("Perceptron class implemented!")

---

## Part 3: Logic Gates - Linearly Separable Problems

### Exercise 3.1: Learn the AND Gate

**Task:** Train a perceptron to learn the AND logic gate.

In [None]:
# AND gate truth table
X_and = np.array([[0, 0],
                  [0, 1],
                  [1, 0],
                  [1, 1]])
y_and = np.array([0, 0, 0, 1])

print("AND Gate Truth Table:")
print("x1  x2  | AND")
print("-" * 15)
for i in range(len(X_and)):
    print(f"{X_and[i, 0]}   {X_and[i, 1]}   | {y_and[i]}")

# Your code here: Create and train perceptron
perceptron_and = Perceptron(n_inputs=2, activation='sigmoid', learning_rate=0.1)
perceptron_and.fit(X_and, y_and, epochs=100)

# Test predictions
print("\nFinal Predictions:")
predictions = perceptron_and.predict(X_and)
for i in range(len(X_and)):
    print(f"Input: {X_and[i]}, True: {y_and[i]}, Predicted: {predictions[i]:.4f}, Class: {int(predictions[i] > 0.5)}")

accuracy = perceptron_and.score(X_and, y_and)
print(f"\nFinal Accuracy: {accuracy:.2%}")

assert accuracy == 1.0, "Perceptron should learn AND gate perfectly"
print("✓ AND gate learned successfully!")

### Exercise 3.2: Learn the OR Gate

**Task:** Train a perceptron to learn the OR logic gate.

In [None]:
# Your code here: Create OR gate dataset
X_or = np.array([[0, 0],
                 [0, 1],
                 [1, 0],
                 [1, 1]])
y_or = # Your code here

# Train perceptron
# Your code here

# Test and evaluate
# Your code here

### Exercise 3.3: Visualize Decision Boundaries

**Task:** Create a function to visualize the perceptron's decision boundary.

In [None]:
def plot_decision_boundary(perceptron, X, y, title="Decision Boundary"):
    """
    Plot decision boundary for 2D data.
    
    Parameters:
    -----------
    perceptron : Perceptron
        Trained perceptron
    X : array, shape (n_samples, 2)
        Input data
    y : array, shape (n_samples,)
        Labels
    title : str
        Plot title
    """
    # Create mesh
    x_min, x_max = -0.5, 1.5
    y_min, y_max = -0.5, 1.5
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                         np.linspace(y_min, y_max, 200))
    
    # Your code here: Make predictions on mesh
    Z = perceptron.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    
    # Plot
    plt.figure(figsize=(10, 7))
    plt.contourf(xx, yy, Z, levels=20, cmap='RdYlBu', alpha=0.8)
    plt.colorbar(label='Prediction')
    
    # Plot data points
    plt.scatter(X[y==0, 0], X[y==0, 1], c='blue', s=100,
                edgecolors='k', label='Class 0', marker='o')
    plt.scatter(X[y==1, 0], X[y==1, 1], c='red', s=100,
                edgecolors='k', label='Class 1', marker='s')
    
    # Plot decision boundary
    plt.contour(xx, yy, Z, levels=[0.5], colors='black', linewidths=2)
    
    plt.xlabel('$x_1$', fontsize=12)
    plt.ylabel('$x_2$', fontsize=12)
    plt.title(title, fontsize=14, fontweight='bold')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.xlim(x_min, x_max)
    plt.ylim(y_min, y_max)
    plt.show()

# Visualize AND gate
plot_decision_boundary(perceptron_and, X_and, y_and, "Perceptron Decision Boundary: AND Gate")

---

## Part 4: The XOR Problem - Non-Linearly Separable

### Exercise 4.1: Attempt to Learn XOR

**Task:** Try to train a perceptron on the XOR problem and observe the failure.

In [None]:
# XOR gate truth table
X_xor = np.array([[0, 0],
                  [0, 1],
                  [1, 0],
                  [1, 1]])
y_xor = np.array([0, 1, 1, 0])  # XOR output

print("XOR Gate Truth Table:")
print("x1  x2  | XOR")
print("-" * 15)
for i in range(len(X_xor)):
    print(f"{X_xor[i, 0]}   {X_xor[i, 1]}   | {y_xor[i]}")

# Visualize XOR problem first
plt.figure(figsize=(8, 6))
plt.scatter(X_xor[y_xor==0, 0], X_xor[y_xor==0, 1], c='blue', s=200,
            edgecolors='k', label='Class 0', marker='o')
plt.scatter(X_xor[y_xor==1, 0], X_xor[y_xor==1, 1], c='red', s=200,
            edgecolors='k', label='Class 1', marker='s')
plt.xlabel('$x_1$', fontsize=12)
plt.ylabel('$x_2$', fontsize=12)
plt.title('XOR Problem: Not Linearly Separable!', fontsize=14, fontweight='bold')
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)
plt.xlim(-0.5, 1.5)
plt.ylim(-0.5, 1.5)
plt.show()

print("\nCan you draw a single straight line to separate blue from red?")
print("NO! This is why a single perceptron fails on XOR.\n")

# Train perceptron on XOR (it will fail!)
perceptron_xor = Perceptron(n_inputs=2, activation='sigmoid', learning_rate=0.1)
perceptron_xor.fit(X_xor, y_xor, epochs=200)

# Evaluate
accuracy_xor = perceptron_xor.score(X_xor, y_xor)
print(f"\nFinal Accuracy on XOR: {accuracy_xor:.2%}")
print("Notice: The perceptron cannot achieve 100% accuracy!")

# Visualize the failed attempt
plot_decision_boundary(perceptron_xor, X_xor, y_xor, "Perceptron Fails on XOR")
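
To see that the failure above comes from the problem itself and not from the training settings, one option is a brute-force search over many candidate linear boundaries. The sketch below is an illustration under assumptions: the helper `best_linear_accuracy` and the search grid are hypothetical choices, and a grid search only samples the space of boundaries, so it demonstrates the limit without proving it.

```python
import itertools
import numpy as np

X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])
y_and = np.array([0, 0, 0, 1])   # same inputs, AND labels, for comparison

def best_linear_accuracy(X, y, grid):
    """Best accuracy of step(w1*x1 + w2*x2 + b) over every (w1, w2, b) in the grid."""
    best = 0.0
    for w1, w2, b in itertools.product(grid, repeat=3):
        preds = (X @ np.array([w1, w2]) + b >= 0).astype(int)
        best = max(best, np.mean(preds == y))
    return best

# 17 values per parameter gives 17**3 = 4913 candidate boundaries
grid = np.linspace(-2, 2, 17)

best_and = best_linear_accuracy(X_xor, y_and, grid)
best_xor = best_linear_accuracy(X_xor, y_xor, grid)
print(f"Best linear accuracy on AND: {best_and:.2%}")
print(f"Best linear accuracy on XOR: {best_xor:.2%}")
```

Some boundary in the grid classifies all four AND points correctly, but no boundary gets more than three of the four XOR points right.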

---

## Part 5: Real Dataset - Iris Classification

### Exercise 5.1: Binary Classification on Iris

**Task:** Apply perceptron to classify two species of Iris flowers.

In [None]:
# Load Iris dataset (use only 2 classes for binary classification)
iris = load_iris()
mask = iris.target != 2  # Remove third class
X_iris = iris.data[mask, :2]  # Use only first 2 features
y_iris = iris.target[mask]

print("Iris Dataset (Binary Classification):")
print(f"Samples: {len(X_iris)}")
print(f"Features: {iris.feature_names[:2]}")
print(f"Classes: Setosa (0) vs Versicolor (1)\n")

# Split into train/test
X_train, X_test, y_train, y_test = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=42
)

# Feature scaling is important!
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Your code here: Train perceptron
perceptron_iris = 

# Evaluate
train_acc = perceptron_iris.score(X_train_scaled, y_train)
test_acc = perceptron_iris.score(X_test_scaled, y_test)

print(f"Training Accuracy: {train_acc:.2%}")
print(f"Test Accuracy: {test_acc:.2%}")

# Compare with sklearn
sklearn_perceptron = SklearnPerceptron(max_iter=100, eta0=0.1, random_state=42)
sklearn_perceptron.fit(X_train_scaled, y_train)
sklearn_acc = sklearn_perceptron.score(X_test_scaled, y_test)
print(f"\nSklearn Perceptron Accuracy: {sklearn_acc:.2%}")

### Exercise 5.2: Visualize Iris Decision Boundary

**Task:** Modify the plotting function to work with the Iris dataset.

In [None]:
# Your code here: Create visualization showing decision boundary on Iris data
# Hint: You'll need to transform the mesh grid using the scaler

def plot_iris_boundary(perceptron, X, y, scaler, title):
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                         np.linspace(y_min, y_max, 200))
    
    # Scale the mesh grid
    mesh_points = np.c_[xx.ravel(), yy.ravel()]
    mesh_scaled = scaler.transform(mesh_points)
    Z = perceptron.predict(mesh_scaled)
    Z = Z.reshape(xx.shape)
    
    # Plot
    plt.figure(figsize=(10, 7))
    plt.contourf(xx, yy, Z, levels=20, cmap='RdYlBu', alpha=0.6)
    plt.colorbar(label='Prediction')
    
    # Plot data
    plt.scatter(X[y==0, 0], X[y==0, 1], c='blue', s=60,
                edgecolors='k', label='Setosa', alpha=0.7)
    plt.scatter(X[y==1, 0], X[y==1, 1], c='red', s=60,
                edgecolors='k', label='Versicolor', alpha=0.7)
    
    plt.contour(xx, yy, Z, levels=[0.5], colors='black', linewidths=2.5)
    
    plt.xlabel(iris.feature_names[0], fontsize=12)
    plt.ylabel(iris.feature_names[1], fontsize=12)
    plt.title(title, fontsize=14, fontweight='bold')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.show()

plot_iris_boundary(perceptron_iris, X_train, y_train, scaler, 
                   "Perceptron: Iris Classification (Setosa vs Versicolor)")

---

## Part 6: Limitations of the Perceptron

### Exercise 6.1: Non-Linearly Separable Datasets

**Task:** Test the perceptron on datasets that are not linearly separable.

In [None]:
# Create non-linearly separable datasets
X_moons, y_moons = make_moons(n_samples=200, noise=0.2, random_state=42)
X_circles, y_circles = make_circles(n_samples=200, noise=0.1, factor=0.5, random_state=42)

# Visualize datasets
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

axes[0].scatter(X_moons[y_moons==0, 0], X_moons[y_moons==0, 1], c='blue', alpha=0.7)
axes[0].scatter(X_moons[y_moons==1, 0], X_moons[y_moons==1, 1], c='red', alpha=0.7)
axes[0].set_title('Moons Dataset (Non-linearly Separable)', fontsize=14)
axes[0].grid(True, alpha=0.3)

axes[1].scatter(X_circles[y_circles==0, 0], X_circles[y_circles==0, 1], c='blue', alpha=0.7)
axes[1].scatter(X_circles[y_circles==1, 0], X_circles[y_circles==1, 1], c='red', alpha=0.7)
axes[1].set_title('Circles Dataset (Non-linearly Separable)', fontsize=14)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Your code here: Train perceptron on both datasets and report accuracy
# A linear boundary cannot fully separate either dataset, so expect imperfect accuracy on both

# Moons
perceptron_moons = 
accuracy_moons = 

# Circles
perceptron_circles = 
accuracy_circles = 

print(f"Moons Accuracy: {accuracy_moons:.2%}")
print(f"Circles Accuracy: {accuracy_circles:.2%}")
print("\nNote: Single perceptrons struggle with non-linearly separable data!")
print("Solution: Multi-layer neural networks (next lesson)")

---

## Challenge Problems (Optional)

### Challenge 1: Implement Derivatives

Implement the derivatives of activation functions (needed for backpropagation).

In [None]:
def sigmoid_derivative(z):
    """
    Derivative of sigmoid: σ'(z) = σ(z)(1 - σ(z))
    """
    # Your code here
    s = sigmoid(z)
    return 

def tanh_derivative(z):
    """
    Derivative of tanh: tanh'(z) = 1 - tanh²(z)
    """
    # Your code here
    return 

def relu_derivative(z):
    """
    Derivative of ReLU: 1 if z > 0, else 0
    """
    # Your code here
    return 

# Test
z_test = np.linspace(-3, 3, 100)
plt.figure(figsize=(12, 4))

plt.subplot(1, 3, 1)
plt.plot(z_test, sigmoid_derivative(z_test), linewidth=2)
plt.title("Sigmoid Derivative")
plt.grid(True, alpha=0.3)

plt.subplot(1, 3, 2)
plt.plot(z_test, tanh_derivative(z_test), linewidth=2)
plt.title("Tanh Derivative")
plt.grid(True, alpha=0.3)

plt.subplot(1, 3, 3)
plt.plot(z_test, relu_derivative(z_test), linewidth=2)
plt.title("ReLU Derivative")
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
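
One way to sanity-check a derivative implementation is to compare it against a finite-difference estimate. The sketch below is a minimal illustration: `numerical_derivative` and `z_check` are hypothetical names, and the step size `eps=1e-5` is an assumed default, not a value prescribed by the exercise.

```python
import numpy as np

def numerical_derivative(f, z, eps=1e-5):
    """Central-difference estimate of f'(z), evaluated elementwise."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)

z_check = np.linspace(-3, 3, 7)

# Compare the estimate with the closed form from the tanh_derivative docstring
approx = numerical_derivative(np.tanh, z_check)
exact = 1 - np.tanh(z_check) ** 2

print("Max abs difference:", np.max(np.abs(approx - exact)))
```

The same check can be applied to your `sigmoid_derivative` and `relu_derivative`. For ReLU, avoid test points at exactly $z = 0$, where the function has a kink and the finite-difference estimate does not match either one-sided slope.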

### Challenge 2: Multi-Class Perceptron

Extend the perceptron to handle more than 2 classes using one-vs-rest strategy.

In [None]:
class MultiClassPerceptron:
    """
    Multi-class perceptron using one-vs-rest strategy.
    """
    
    def __init__(self, n_classes):
        self.n_classes = n_classes
        self.perceptrons = []
    
    def fit(self, X, y, epochs=100):
        # Your code here
        # Train one perceptron for each class
        pass
    
    def predict(self, X):
        # Your code here
        # Choose class with highest confidence
        pass

print("Challenge: Implement multi-class perceptron!")

### Challenge 3: Implement Pocket Algorithm

The Pocket algorithm improves the perceptron by keeping track of the best weights.

In [None]:
class PocketPerceptron(Perceptron):
    """
    Pocket Perceptron: keeps best weights seen during training.
    Useful for non-separable data.
    """
    
    def fit(self, X, y, epochs=100, verbose=True):
        # Your code here
        # Track best weights and their accuracy
        # Update pocket when current weights are better
        pass

print("Challenge: Implement pocket algorithm!")

---

## Reflection Questions

1. **Why does the perceptron fail on XOR?**
   - Think about linear separability

2. **When would you use sigmoid vs ReLU activation?**
   - Consider output interpretation and gradient flow

3. **Why is feature scaling important for perceptrons?**
   - How do different feature scales affect the dot product?

4. **What does the perceptron learning rule do geometrically?**
   - How does it adjust the decision boundary?

5. **How is the perceptron related to logistic regression?**
   - Compare the update rules and loss functions

---

## Summary

In this exercise, you learned:

- How to implement a perceptron from scratch
- Different activation functions and their properties
- The perceptron learning algorithm
- How to visualize decision boundaries
- The key limitation of a single perceptron: it cannot learn patterns that are not linearly separable
- Why we need multi-layer networks (next lesson!)

**Key Takeaways:**

- Perceptron = linear classifier with activation function
- Can learn AND and OR, but not XOR
- Decision boundary is a hyperplane
- Foundation for understanding neural networks

**Next Steps:**

- Complete Exercise 2 on Feedforward Networks
- Review [Lesson 1: The Perceptron](https://jumpingsphinx.github.io/module4-neural-networks/01-perceptron/)
- Experiment with different activation functions and learning rates

---

**Need help?** Check the solution notebook or open an issue on [GitHub](https://github.com/jumpingsphinx/jumpingsphinx.github.io/issues).