# Lab 3 – Module 2: Activation Functions in Detail

**Time:** ~5 minutes

---

In Module 1 you saw how activation functions bend space. Now let’s zoom in on **how they behave with different inputs** — especially very large ones — and why that matters for a model that needs to *learn*.

## 1. Setup

Run this cell to load three activation functions: **Sigmoid**, **ReLU**, and **Step**.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import FloatSlider, interact

def sigmoid(x):
    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))

def relu(x):
    return np.maximum(0, x)

def step(x):
    return (x > 0).astype(float)

print('Activation functions loaded!')

## 2. Test All Three on the Same Input

Drag the slider to feed the **same number** into Sigmoid, ReLU, and Step.  
Pay special attention to what happens at **very large positive values** (like 100) and **very large negative values** (like –100).

Questions to think about while exploring:
- Which function’s output keeps growing when the input gets bigger?
- Which function’s output flattens out no matter how big the input gets?
- Which function gives you the least information about the input?

In [None]:
def compare_activations(input_value):
    x = np.array([input_value])
    results = {
        'Sigmoid': sigmoid(x)[0],
        'ReLU':    relu(x)[0],
        'Step':    step(x)[0],
    }

    fig, (ax_graph, ax_bar) = plt.subplots(1, 2, figsize=(14, 5), dpi=100,
                                            gridspec_kw={'width_ratios': [2, 1]})

    # --- Left: function curves with current input marked ---
    t = np.linspace(-6, 6, 300)
    for func, name, color in [(sigmoid, 'Sigmoid', '#7b2d8e'),
                               (relu,    'ReLU',    '#2d8e4e'),
                               (step,    'Step',    '#c0392b')]:
        ax_graph.plot(t, func(t), color=color, lw=2.5, label=name)
    # Marker for current input (clamped to visible range on x-axis)
    x_vis = np.clip(input_value, -6, 6)
    for func, color in [(sigmoid, '#7b2d8e'), (relu, '#2d8e4e'), (step, '#c0392b')]:
        ax_graph.plot(x_vis, func(np.array([input_value]))[0], 'o', color=color,
                      markersize=10, markeredgecolor='k', zorder=5)
    ax_graph.axhline(0, color='k', lw=0.5); ax_graph.axvline(0, color='k', lw=0.5)
    ax_graph.set_xlabel('Input', fontsize=12); ax_graph.set_ylabel('Output', fontsize=12)
    ax_graph.set_title(f'Activation curves  (input = {input_value:.1f})', fontsize=13, fontweight='bold')
    ax_graph.legend(fontsize=11); ax_graph.grid(True, alpha=0.3)
    ax_graph.set_xlim(-6, 6); ax_graph.set_ylim(-0.5, 6.5)

    # --- Right: bar chart of outputs ---
    names  = list(results.keys())
    values = list(results.values())
    colors = ['#7b2d8e', '#2d8e4e', '#c0392b']
    bars = ax_bar.bar(names, values, color=colors, alpha=0.8, edgecolor='k', lw=1.5)
    for bar, val in zip(bars, values):
        ax_bar.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.15,
                    f'{val:.3f}', ha='center', va='bottom', fontsize=11, fontweight='bold')
    ax_bar.set_ylabel('Output', fontsize=12)
    ax_bar.set_title('Output values', fontsize=13, fontweight='bold')
    ax_bar.set_ylim(-0.3, max(6, max(values) + 1))
    ax_bar.grid(True, alpha=0.3, axis='y')

    plt.tight_layout(); plt.show()

    # Contextual hint
    if input_value > 5:
        print('Notice: Sigmoid is stuck near 1.000 while ReLU keeps climbing.')
        print('That "stuck" behavior is called SATURATION.')
    elif input_value < -5:
        print('Notice: Sigmoid is stuck near 0.000, ReLU is exactly 0, Step is exactly 0.')
    elif abs(input_value) < 1:
        print('Near zero: Sigmoid sits around 0.5, ReLU is 0 or a small positive number, and Step is exactly 0 or 1.')

interact(
    compare_activations,
    input_value=FloatSlider(min=-10, max=10, step=0.5, value=0.0,
                            description='Input:', continuous_update=False)
);

## 3. Saturation — When Learning Gets Stuck

**Saturation** means the output barely changes even as the input keeps growing — like squeezing a sponge that’s already dry.

Look at the Sigmoid curve at the extremes: whether the input is 10 or 500, the output rounds to 1.000. The function has *flattened out*.
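
To put numbers on that flattening, here is a minimal sketch that evaluates Sigmoid at a few increasingly large inputs. It repeats the `sigmoid` definition from the Setup cell so it can run on its own; the particular input values are just illustrative choices.

```python
import numpy as np

def sigmoid(x):
    # Same definition as in the Setup cell, repeated so this cell is self-contained.
    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))

# Illustrative inputs (an assumption for this sketch): each one is larger than the last.
inputs = np.array([1.0, 5.0, 10.0, 100.0, 500.0])
outputs = sigmoid(inputs)

for x, y in zip(inputs, outputs):
    print(f'input = {x:6.1f}   sigmoid = {y:.6f}')
```

The outputs should climb quickly at first and then crowd up against 1, so the gap between consecutive outputs shrinks toward nothing.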

Why does this matter?

When a model is **learning**, it adjusts its numbers a little at a time and checks whether the output improved. If the output barely budges no matter what you change, the model has no signal to follow — it’s stuck.

**ReLU avoids this** for positive inputs: the output keeps growing proportionally, so the model always gets useful feedback about which direction to adjust.
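
One rough way to see the difference is a "nudge test": move the input by a small amount and measure how much the output moves. The sketch below does this for Sigmoid and ReLU. The helper `nudge_response` and the nudge size of 0.5 (chosen to match the slider step) are assumptions made for illustration, not part of the lab code.

```python
import numpy as np

def sigmoid(x):
    # Same definition as in the Setup cell.
    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))

def relu(x):
    # Same definition as in the Setup cell.
    return np.maximum(0, x)

def nudge_response(func, x, nudge=0.5):
    """Hypothetical helper: output change when the input moves from x to x + nudge."""
    before = func(np.array([x]))[0]
    after = func(np.array([x + nudge]))[0]
    return float(after - before)

for x in [0.0, 5.0, 20.0]:
    print(f'input = {x:5.1f}   '
          f'Sigmoid change = {nudge_response(sigmoid, x):.6f}   '
          f'ReLU change = {nudge_response(relu, x):.6f}')
```

For these positive inputs, ReLU's change should stay equal to the nudge itself, while Sigmoid's change shrinks toward zero as the input grows. That shrinking response is the missing "signal" described above.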

## 4. The Step Function — Simple but Rigid

Step is the simplest activation: input zero or negative → **0**, input positive → **1**. Like a light switch.

That sounds appealing — simple is usually good. But there’s a serious problem:

- If the input goes from 0.01 to 0.02, the output stays exactly 1. No change.
- If the input goes from –0.01 to –0.02, the output stays exactly 0. No change.
- The *only* place the output changes is at exactly 0, and there it jumps all at once.

A learning system needs **gradual feedback** — small input changes should produce small output changes so the model knows it’s heading in the right direction. Step doesn’t provide that; it’s either 0 or 1 with nothing in between.
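
The three cases in the list above can be checked directly. This is a small sketch using the same `step` definition as the Setup cell; the input pairs are illustrative, and the last pair crosses zero to show the all-at-once jump.

```python
import numpy as np

def step(x):
    # Same definition as in the Setup cell: 1 for positive inputs, 0 otherwise.
    return (x > 0).astype(float)

# (before, after) input pairs: two small moves on one side of zero, one move across it.
pairs = [(0.01, 0.02), (-0.01, -0.02), (-0.01, 0.01)]

changes = []
for before, after in pairs:
    out_before = float(step(np.array([before]))[0])
    out_after = float(step(np.array([after]))[0])
    changes.append(out_after - out_before)
    print(f'{before:+.2f} -> {after:+.2f}   '
          f'Step output: {out_before:.0f} -> {out_after:.0f}   '
          f'change = {out_after - out_before:.0f}')
```

The first two pairs should show a change of 0, and only the pair that crosses zero shows a change, which is the full jump from 0 to 1.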

That’s why modern neural networks use activations like Sigmoid or ReLU instead: their outputs change gradually as the input changes, with no sudden jumps.

## Answer‑Sheet Questions (Q7 – Q9)

**Q7.** Test a very large positive input (like 100) on Sigmoid and then on ReLU. What does each one output? Which one keeps changing, and which one flattens out?

**Q8.** When a function “saturates,” its output barely changes even as the input keeps growing — like squeezing a sponge that’s already dry. Why would that be a problem for a model that’s trying to learn and adjust itself?

**Q9.** The Step function is the simplest of all — just on or off, like a light switch. If simple is usually good, why isn’t Step the obvious choice for a learning system? What does it lose by being so rigid?

---

**Next:** Continue to **Module 3** to build a *perceptron*, the basic building block of neural networks.