![](https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/se_02.png)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CLDiego/uom_fse_dl_workshop/blob/main/SE02_CA_Artificial_neural_networks.ipynb)

## Workshop Overview
***
In this workshop, we demystify the "black box" of neural networks by building from the ground up: starting with linear models (which you already understand from regression) and progressing to non-linear neural architectures. You'll see exactly how neurons combine to solve problems that simple linear models cannot.

**Prerequisites**: Linear regression concepts, basic PyTorch (SE01)

**Learning Objectives**:
- Understand a neuron as a generalization of linear regression
- Manually implement and visualize decision boundaries
- Recognize when linear models fail (e.g., non-separable data)
- Build multi-layer perceptrons (MLPs) to solve non-linear problems
- Visualize probability fields and decision boundaries interactively

In [1]:
import sys
import subprocess

if "google.colab" in sys.modules:
    print("Running in Google Colab: downloading utils...")
    subprocess.run([
        "wget",
        "-q",
        "--show-progress",
        "https://raw.githubusercontent.com/CLDiego/uom_fse_dl_workshop/main/colab_utils.txt",
        "-O",
        "colab_utils.txt",
    ], check=True)
    subprocess.run([
        "wget",
        "-q",
        "--show-progress",
        "-x",
        "-nH",
        "--cut-dirs=3",
        "-i",
        "colab_utils.txt",
    ], check=True)
else:
    print("Running locally: skipping Colab utils download.")

Running in Google Colab: downloading utils...


In [2]:
from pathlib import Path
import sys

# Setup paths for helper utilities
helper_utils = Path(Path.cwd().parent)
if str(helper_utils) not in sys.path:
    sys.path.append(str(helper_utils))

# Core scientific computing libraries
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim

# Visualization
from utils.plotting import plot_2d_classification, plot_training_loss, plot_model_comparison

# Machine learning utilities
from sklearn.datasets import make_classification, make_circles, make_moons

# Progress tracking
from tqdm import tqdm


print("=" * 60)
print("PyTorch Setup Information")
print("=" * 60)
print(f"PyTorch Version: {torch.__version__}")
print(f"NumPy Version: {np.__version__}")
print(f"Pandas Version: {pd.__version__}")
print("-" * 60)

if torch.cuda.is_available():
    print(f"✓ GPU Available: {torch.cuda.get_device_name(0)}")
    print(f"  GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print("✗ No GPU detected - using CPU")
    print("  (For GPU: Runtime > Change runtime type > Hardware accelerator > GPU)")

print("=" * 60)

Faculty of Science and Engineering 🔬
The University of Manchester
Invoking utils version: 1.5.0+8043cfd
PyTorch Setup Information
PyTorch Version: 2.10.0+cpu
NumPy Version: 2.0.2
Pandas Version: 2.2.2
------------------------------------------------------------
✗ No GPU detected - using CPU
  (For GPU: Runtime > Change runtime type > Hardware accelerator > GPU)


<!-- Font styling (Share Tech Mono) is handled automatically in visualization utilities -->

# <img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/write.svg" width="30"/> 1. The Anatomy of a Neuron
***

## From Regression to Neural Networks

If you've worked with linear regression before, you already understand the foundation of neural networks. A single neuron is mathematically equivalent to linear regression with an activation function applied to the output.

> <img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/docs.svg" width="20"/> **Definition**: A **neuron** (or perceptron) is a computational unit that:
> 1. Takes multiple inputs $\mathbf{x} = [x_1, x_2, ..., x_n]$
> 2. Computes a weighted sum plus bias: $z = \mathbf{w}^T \mathbf{x} + b$
> 3. Applies an activation function: $a = \sigma(z)$

The basic structure of a neuron can be seen below:

<div align="center">
  <img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/non_linenar_neuron.png" width="80%">
</div>
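
The three steps in the definition above can be sketched in a few lines of plain Python. This is only an illustration: the function name `neuron` and the input, weight, and bias values are made up for the example and do not come from the workshop utilities.

```python
import math

def neuron(x, w, b):
    """Single neuron: weighted sum plus bias, followed by a sigmoid."""
    # Step 2: weighted sum plus bias, z = w^T x + b
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    # Step 3: activation, a = sigma(z)
    a = 1.0 / (1.0 + math.exp(-z))
    return z, a

# Hypothetical inputs and parameters, chosen only for illustration
x = [1.0, 2.0]
w = [0.5, -0.25]
b = 0.0

z, a = neuron(x, w, b)
print(f"z = {z:.2f}, a = {a:.2f}")  # z = 0.00, a = 0.50
```

Here the two weighted inputs cancel, so $z = 0$ and the sigmoid returns exactly 0.5, the point of maximum uncertainty.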

### The Scientific Context

Many physical systems start with linear assumptions:
- **Ohm's Law**: $V = IR$ (voltage is linear in current)
- **Newton's Second Law**: $F = ma$ (force is linear in acceleration)
- **Hooke's Law**: $F = kx$ (spring force is linear in displacement)

These linear models work well within certain regimes, but real-world systems often exhibit non-linearities. Neural networks give us a principled way to model these non-linear relationships while maintaining mathematical transparency.

## 1.1 A Single Neuron as Linear Regression

Let's start with the simplest case: a neuron performing **binary classification** on linearly separable data. This is conceptually identical to logistic regression.

### Mathematical Formulation

For a 2D input $\mathbf{x} = [x_1, x_2]$:

$$z = w_1 x_1 + w_2 x_2 + b$$

$$\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}$$

where:
- $\mathbf{w} = [w_1, w_2]$ are the **weights** (determine the decision boundary orientation)
- $b$ is the **bias** (shifts the decision boundary)
- $\sigma$ is the **sigmoid activation function** (maps $z$ to a probability in the open interval $(0, 1)$)
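
To make the role of the sigmoid concrete, the short sketch below evaluates it at a few arbitrary values of $z$. It is meant as an illustration rather than part of the exercises. It shows that the output stays between 0 and 1, and that the usual 0.5 probability threshold corresponds to $z = 0$.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Arbitrary sample points on both sides of zero
for z in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"z = {z:+.1f} -> sigma(z) = {sigmoid(z):.4f}")

# sigma(0) = 0.5, so "y_hat > 0.5" is the same test as "z > 0"
assert sigmoid(0.0) == 0.5

# Symmetry: sigma(-z) = 1 - sigma(z)
assert abs(sigmoid(-1.0) - (1.0 - sigmoid(1.0))) < 1e-12
```

Because the threshold sits at $z = 0$, the predicted class depends only on the sign of $w_1 x_1 + w_2 x_2 + b$.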



## 1.2 Generating Linearly Separable Data
***

Let's create a synthetic dataset where two classes can be perfectly separated by a straight line. This is analogous to having two distinct groups in an experiment (e.g., control vs. treatment, healthy vs. diseased) where measurements clearly distinguish them.

<img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/code.svg" width="20"/> **Exercise 1**: Generate and visualize a linearly separable dataset.

In [3]:
# Generate linearly separable data using sklearn
np.random.seed(42)
X_linear, y_linear = make_classification(
    n_samples=100,           # Number of data points
    n_features=2,            # 2D for visualization
    n_redundant=0,           # No redundant features
    n_informative=2,         # Both features are informative
    n_clusters_per_class=1,  # Single cluster per class
    class_sep=1.0,           # Clear separation between classes
    flip_y=0.0,              # No label noise
    random_state=42
)

# Convert to PyTorch tensors (float32 features; labels reshaped to a column)
X_linear_tensor = torch.tensor(X_linear, dtype=torch.float32)
y_linear_tensor = torch.tensor(y_linear, dtype=torch.float32).unsqueeze(1)

print("=" * 60)
print("Linear Dataset Generated")
print("=" * 60)
print(f"Shape: {X_linear.shape} (samples × features)")
print(f"Class distribution: {np.bincount(y_linear)}")
print(f"Class 0: {np.sum(y_linear == 0)} samples")
print(f"Class 1: {np.sum(y_linear == 1)} samples")
print("=" * 60)

Linear Dataset Generated
Shape: (100, 2) (samples × features)
Class distribution: [50 50]
Class 0: 50 samples
Class 1: 50 samples


In [4]:
# Visualize the data
plot_2d_classification(
    X_linear, y_linear,
    title="Linearly Separable Data: Two Distinct Classes".upper(),
    show_boundary=False
)

## 1.3 Manual Implementation: Understanding the Forward Pass
***

Before using PyTorch's built-in layers, let's manually implement a neuron to understand exactly what's happening under the hood. This transparency is crucial for debugging and understanding more complex architectures later.

### The Forward Pass (Matrix Formulation)

For a batch of $N$ samples with $D$ features:

$$\mathbf{Z} = \mathbf{X} \mathbf{W}^T + \mathbf{b}$$

$$\mathbf{A} = \sigma(\mathbf{Z})$$

where:
- $\mathbf{X} \in \mathbb{R}^{N \times D}$ is the input matrix (each row is a sample)
- $\mathbf{W} \in \mathbb{R}^{1 \times D}$ is the weight vector, stored as a row so that $\mathbf{W}^T$ is a column
- $\mathbf{b} \in \mathbb{R}$ is the bias scalar, broadcast across all $N$ rows
- $\mathbf{Z} \in \mathbb{R}^{N \times 1}$ is the linear output (logits)
- $\mathbf{A} \in \mathbb{R}^{N \times 1}$ is the activated output (probabilities)
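
Before implementing this in PyTorch, it can help to check the shapes with a small NumPy sketch. The example assumes the weights are stored as a row vector of shape `(1, D)`, so that the transpose in the formula is a column. The seed, sizes, and bias value are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
N, D = 4, 2                      # 4 samples, 2 features (illustrative sizes)

X = rng.normal(size=(N, D))      # input matrix, one sample per row
W = rng.normal(size=(1, D))      # weights as a row vector (assumption)
b = 0.1                          # scalar bias, broadcast over all rows

Z = X @ W.T + b                  # (N, D) @ (D, 1) -> (N, 1) logits
A = 1.0 / (1.0 + np.exp(-Z))     # element-wise sigmoid -> probabilities

print(X.shape, W.T.shape, Z.shape, A.shape)  # (4, 2) (2, 1) (4, 1) (4, 1)
```

Each row of `Z` is the weighted sum for one sample, which is why the batch formulation gives the same numbers as applying the single-neuron formula sample by sample.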

<img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/reminder.svg" width="20"/> **Physical Analogy**: Think of $\mathbf{Z}$ as an "energy" or "potential" that gets converted to a "probability" through the sigmoid activation function, similar to a Boltzmann distribution in statistical mechanics.

<img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/code.svg" width="20"/> **Exercise 2**: Implement the forward pass manually using PyTorch operations.

In [8]:
# Exercise 2: Initialize weights and bias randomly

###################
# TODO: COMPLETE THE CODE BELOW
# Initialize random weights and bias for a 2D input neuron

# Initialize weights and bias randomly
torch.manual_seed(42)
weights_manual = torch.randn((2,), requires_grad=True)  # Shape: (2,) for 2D input
bias_manual = torch.randn((1,), requires_grad=True)     # Shape: (1,) scalar bias

print("Initial Parameters:")
print(f"  Weights: {weights_manual.detach().numpy()}")
print(f"  Bias: {bias_manual.item():.4f}")


Initial Parameters:
  Weights: [0.33669037 0.1288094 ]
  Bias: 0.2345


In [9]:
# Exercise 2 (continued): Manual forward pass implementation

###################
# TODO: COMPLETE THE CODE BELOW
# Implement the forward pass: Z = X @ W^T + b, then apply sigmoid

def forward_pass(X, weights, bias):
    """
    Compute forward pass manually.

    Steps:
    1. Z = X @ W^T + b  (linear combination)
    2. A = sigmoid(Z)   (activation)

    Args:
        X: Input data (N, 2)
        weights: Weight vector (2,)
        bias: Bias scalar (1,)

    Returns:
        z: Linear output (logits)
        a: Activated output (probabilities)
    """
    # Linear transformation: matrix multiplication + bias
    # Hint: Use torch.matmul() and unsqueeze(1) to make weights column vector
    z_manual = torch.matmul(X, weights.unsqueeze(1)) + bias

    # Sigmoid activation: σ(z) = 1 / (1 + exp(-z))
    # Hint: Use torch.exp() for exponential function
    a_manual = 1 / (1 + torch.exp(-z_manual))

    return z_manual, a_manual


In [19]:
# Exercise 2 (continued): Compute predictions using the forward pass

###################
# TODO: COMPLETE THE CODE BELOW
# Use the forward_pass function to compute predictions

# Compute predictions using the forward pass function
z, predictions_manual = forward_pass(X_linear_tensor, weights_manual, bias_manual)

print(f"\nOutput logits (z) - first 5: {z[:5].squeeze().detach().numpy()}")
print(f"Output probabilities - first 5: {predictions_manual[:5].squeeze().detach().numpy()}")
print(f"Predicted classes - first 5: {(predictions_manual[:5] > 0.5).int().squeeze().numpy()}")



Output logits (z) - first 5: [0.5504914  0.29181495 1.414939   0.25330436 0.2594523 ]
Output probabilities - first 5: [0.63424957 0.5724404  0.8045438  0.56298965 0.56450164]
Predicted classes - first 5: [1 1 1 1 1]


In [13]:
# Visualize initial decision boundary
plot_2d_classification(
    X_linear, y_linear,
    weights=weights_manual,
    bias=bias_manual,
    title="Initial Random Decision Boundary".upper(),
    show_boundary=True
)

<img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/reminder.svg" width="20"/> **Interpretation**: The decision boundary is a straight line in 2D (hyperplane in higher dimensions) defined by $w_1 x_1 + w_2 x_2 + b = 0$.
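
As a hedged illustration of the boundary equation, the sketch below solves $w_1 x_1 + w_2 x_2 + b = 0$ for $x_2$, using the rounded initial parameters printed earlier in the notebook. The helper name `boundary_x2` is made up for this example, and the rearrangement assumes $w_2 \neq 0$.

```python
# Initial parameters as printed earlier in this notebook (rounded)
w1, w2, b = 0.3367, 0.1288, 0.2345

def boundary_x2(x1):
    """Return x2 on the line w1*x1 + w2*x2 + b = 0 (requires w2 != 0)."""
    return -(w1 * x1 + b) / w2

# Points on the boundary should give a logit of (numerically) zero
for x1 in (-1.0, 0.0, 1.0):
    x2 = boundary_x2(x1)
    z = w1 * x1 + w2 * x2 + b
    print(f"x1 = {x1:+.1f} -> x2 = {x2:+.3f}, z = {z:.1e}")

# Slope-intercept form of the same line: x2 = slope * x1 + intercept
slope = -w1 / w2
intercept = -b / w2
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Points on one side of this line give $z > 0$ (predicted class 1) and points on the other side give $z < 0$ (predicted class 0).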

## 1.4 Learning via Gradient Descent: Sensitivity Analysis
***

The initial decision boundary is random and performs poorly. We improve it with **gradient descent**, an iterative optimization algorithm.

### The Learning Algorithm

> <img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/docs.svg" width="20"/> **Definition**: **Gradient descent** iteratively updates parameters in the direction that most reduces the loss function:

$$\mathbf{w}_{new} = \mathbf{w}_{old} - \eta \frac{\partial L}{\partial \mathbf{w}}$$

$$b_{new} = b_{old} - \eta \frac{\partial L}{\partial b}$$

where:
- $\eta$ is the **learning rate** (step size)
- $\frac{\partial L}{\partial \mathbf{w}}$ is the **gradient** of loss with respect to weights
- $L$ is the **loss function** measuring prediction error
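
The update rule is easiest to see on a toy problem. The sketch below applies it to a one-parameter quadratic loss $L(w) = (w - 3)^2$, which is not the loss used in this workshop but has an obvious minimum at $w = 3$. The starting point, learning rate, and step count are arbitrary.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def grad(w):
    """Analytic derivative dL/dw of the toy loss."""
    return 2.0 * (w - 3.0)

w = 0.0      # arbitrary starting point
eta = 0.1    # learning rate (step size)

for step in range(50):
    w = w - eta * grad(w)   # w_new = w_old - eta * dL/dw

print(f"w after 50 steps: {w:.4f}, loss: {loss(w):.2e}")
```

Each step shrinks the distance to the minimum by a constant factor (here $1 - 2\eta = 0.8$), so `w` approaches 3. A learning rate that is too large would make that factor exceed 1 in magnitude and the iteration would diverge.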

### Loss Function: Binary Cross-Entropy

For binary classification, we use Binary Cross-Entropy (BCE) loss:

$$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$

This measures the "distance" between predicted probabilities $\hat{y}$ and true labels $y$.
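
A small worked example may help build intuition for the BCE formula. The labels and predicted probabilities below are made up. The function `bce` is a plain-Python sketch of the equation above, not the PyTorch implementation.

```python
import math

def bce(y_true, y_pred):
    """Binary cross-entropy averaged over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

y_true = [1, 0, 1, 0]                    # hypothetical labels

confident_right = [0.9, 0.1, 0.9, 0.1]   # confident and correct
uncertain       = [0.5, 0.5, 0.5, 0.5]   # no information
confident_wrong = [0.1, 0.9, 0.1, 0.9]   # confident and wrong

print(f"{bce(y_true, confident_right):.4f}")  # 0.1054
print(f"{bce(y_true, uncertain):.4f}")        # 0.6931
print(f"{bce(y_true, confident_wrong):.4f}")  # 2.3026
```

An uninformative prediction of 0.5 gives a loss of $\ln 2 \approx 0.693$, which is a useful reference point when reading training logs. Confident mistakes are penalized far more heavily than confident correct predictions are rewarded.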

<img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/reminder.svg" width="20"/> **Scientific Interpretation**: Gradient descent is a form of "sensitivity analysis": each gradient component answers the question "how much does the loss change if I nudge this parameter?" This is fundamental to inverse problems in physics (e.g., seismic inversion, tomography).
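
The "nudge a parameter" idea can be checked numerically. The sketch below, on a tiny made-up dataset, compares a finite-difference estimate of the gradient with the closed-form gradient of BCE for a sigmoid neuron, $\frac{\partial L}{\partial \mathbf{w}} = \frac{1}{N}\mathbf{X}^T(\hat{\mathbf{y}} - \mathbf{y})$. All names and values are illustrative assumptions.

```python
import numpy as np

def bce_loss(w, b, X, y):
    """BCE loss of a single sigmoid neuron."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Tiny made-up dataset and parameters
X = np.array([[0.5, 1.0], [-1.0, 0.3], [1.5, -0.7], [-0.2, -1.2]])
y = np.array([1.0, 0.0, 1.0, 0.0])
w = np.array([0.2, -0.1])
b = 0.05

# Closed-form gradient for sigmoid + BCE: X^T (p - y) / N
p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
grad_analytic = X.T @ (p - y) / len(y)

# Sensitivity by nudging each weight: central finite differences
h = 1e-6
grad_numeric = np.zeros_like(w)
for j in range(len(w)):
    step = np.zeros_like(w)
    step[j] = h
    grad_numeric[j] = (bce_loss(w + step, b, X, y)
                       - bce_loss(w - step, b, X, y)) / (2 * h)

print("analytic:", grad_analytic)
print("numeric: ", grad_numeric)
```

The two estimates should agree to several decimal places. Automatic differentiation, as used by `loss.backward()`, computes the analytic version without requiring one extra loss evaluation per parameter.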

<img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/code.svg" width="20"/> **Exercise 3**: Train the neuron using gradient descent and visualize the learning process.

In [28]:
# Exercise 3: Train the neuron using gradient descent

###################
# TODO: COMPLETE THE CODE BELOW
# Implement the training loop with gradient descent

# Reset parameters
torch.manual_seed(42)
weights_manual = torch.randn(2, requires_grad=True)
bias_manual = torch.randn(1, requires_grad=True)

# Training hyperparameters
learning_rate = 0.01  # Typical range: 0.01 to 0.5 for this problem
epochs = 10000        # Number of training iterations

# Track training progress
losses = []

print("Training the neuron...")
print("=" * 60)

for epoch in tqdm(range(epochs), desc="Training"):
    # Forward pass
    z, predictions = forward_pass(X_linear_tensor, weights_manual, bias_manual)

    # Compute Binary Cross-Entropy loss
    # BCE = -[y*log(p) + (1-y)*log(1-p)]
    epsilon = 1e-7  # Small constant to avoid log(0)
    loss = -torch.mean(
        y_linear_tensor * torch.log(predictions + epsilon) +
        (1 - y_linear_tensor) * torch.log(1 - predictions + epsilon)
    )

    # Backward pass: compute gradients automatically
    loss.backward()

    # Update parameters using gradient descent: param = param - lr * gradient
    with torch.no_grad():
        weights_manual -= learning_rate * weights_manual.grad
        bias_manual -= learning_rate * bias_manual.grad

        # Zero gradients for next iteration
        weights_manual.grad.zero_()
        bias_manual.grad.zero_()

    losses.append(loss.item())

    if (epoch + 1) % 20 == 0:
        print(f"Epoch {epoch+1:3d} | Loss: {loss.item():.4f}")

print("=" * 60)
print("Training complete!")
print(f"Final Loss: {losses[-1]:.4f}")
print(f"Final Weights: {weights_manual.detach().numpy()}")
print(f"Final Bias: {bias_manual.item():.4f}")


Training the neuron...


Epoch  20 | Loss: 0.6288
Epoch  40 | Loss: 0.5887
Epoch  60 | Loss: 0.5534
Epoch  80 | Loss: 0.5222
Epoch 100 | Loss: 0.4947
Epoch 120 | Loss: 0.4703
Epoch 140 | Loss: 0.4486
Epoch 160 | Loss: 0.4292
Epoch 180 | Loss: 0.4117
Epoch 200 | Loss: 0.3960
Epoch 220 | Loss: 0.3817
Epoch 240 | Loss: 0.3687
Epoch 260 | Loss: 0.3568
Epoch 280 | Loss: 0.3459
Epoch 300 | Loss: 0.3359
Epoch 320 | Loss: 0.3266
Epoch 340 | Loss: 0.3180
Epoch 360 | Loss: 0.3100
Epoch 380 | Loss: 0.3025
Epoch 400 | Loss: 0.2955
Epoch 420 | Loss: 0.2889


Epoch 440 | Loss: 0.2827


Epoch 460 | Loss: 0.2769
Epoch 480 | Loss: 0.2714
Epoch 500 | Loss: 0.2662
Epoch 520 | Loss: 0.2612
Epoch 540 | Loss: 0.2565
Epoch 560 | Loss: 0.2520
Epoch 580 | Loss: 0.2477
Epoch 600 | Loss: 0.2436
Epoch 620 | Loss: 0.2397
Epoch 640 | Loss: 0.2359
Epoch 660 | Loss: 0.2323
Epoch 680 | Loss: 0.2288
Epoch 700 | Loss: 0.2255
Epoch 720 | Loss: 0.2223
Epoch 740 | Loss: 0.2192
Epoch 760 | Loss: 0.2162
Epoch 780 | Loss: 0.2133
Epoch 800 | Loss: 0.2105


Epoch 820 | Loss: 0.2078
Epoch 840 | Loss: 0.2052
Epoch 860 | Loss: 0.2026
Epoch 880 | Loss: 0.2002
Epoch 900 | Loss: 0.1978
Epoch 920 | Loss: 0.1955
Epoch 940 | Loss: 0.1932
Epoch 960 | Loss: 0.1910
Epoch 980 | Loss: 0.1889
Epoch 1000 | Loss: 0.1868
Epoch 1020 | Loss: 0.1847


Epoch 1040 | Loss: 0.1828
Epoch 1060 | Loss: 0.1808
Epoch 1080 | Loss: 0.1789
Epoch 1100 | Loss: 0.1771
Epoch 1120 | Loss: 0.1753
Epoch 1140 | Loss: 0.1735
Epoch 1160 | Loss: 0.1718
Epoch 1180 | Loss: 0.1702
Epoch 1200 | Loss: 0.1685
Epoch 1220 | Loss: 0.1669
Epoch 1240 | Loss: 0.1653
Epoch 1260 | Loss: 0.1638
Epoch 1280 | Loss: 0.1623


Epoch 1300 | Loss: 0.1608
Epoch 1320 | Loss: 0.1593
Epoch 1340 | Loss: 0.1579
Epoch 1360 | Loss: 0.1565
Epoch 1380 | Loss: 0.1552
Epoch 1400 | Loss: 0.1538
Epoch 1420 | Loss: 0.1525


Epoch 1440 | Loss: 0.1512
Epoch 1460 | Loss: 0.1499
Epoch 1480 | Loss: 0.1487
Epoch 1500 | Loss: 0.1474
Epoch 1520 | Loss: 0.1462
Epoch 1540 | Loss: 0.1450
Epoch 1560 | Loss: 0.1439
Epoch 1580 | Loss: 0.1427
Epoch 1600 | Loss: 0.1416
Epoch 1620 | Loss: 0.1405


Epoch 1640 | Loss: 0.1394
Epoch 1660 | Loss: 0.1383
Epoch 1680 | Loss: 0.1373
Epoch 1700 | Loss: 0.1362
Epoch 1720 | Loss: 0.1352
Epoch 1740 | Loss: 0.1342
Epoch 1760 | Loss: 0.1332
Epoch 1780 | Loss: 0.1322
Epoch 1800 | Loss: 0.1312
Epoch 1820 | Loss: 0.1303
Epoch 1840 | Loss: 0.1293
Epoch 1860 | Loss: 0.1284
Epoch 1880 | Loss: 0.1275


Epoch 1900 | Loss: 0.1266
Epoch 1920 | Loss: 0.1257
Epoch 1940 | Loss: 0.1248
Epoch 1960 | Loss: 0.1240
Epoch 1980 | Loss: 0.1231
Epoch 2000 | Loss: 0.1223
Epoch 2020 | Loss: 0.1215
Epoch 2040 | Loss: 0.1206
Epoch 2060 | Loss: 0.1198


Epoch 2080 | Loss: 0.1190
Epoch 2100 | Loss: 0.1183
Epoch 2120 | Loss: 0.1175
Epoch 2140 | Loss: 0.1167
Epoch 2160 | Loss: 0.1160
Epoch 2180 | Loss: 0.1152
Epoch 2200 | Loss: 0.1145
Epoch 2220 | Loss: 0.1137


Epoch 2240 | Loss: 0.1130
Epoch 2260 | Loss: 0.1123
Epoch 2280 | Loss: 0.1116
Epoch 2300 | Loss: 0.1109
Epoch 2320 | Loss: 0.1102
Epoch 2340 | Loss: 0.1095
Epoch 2360 | Loss: 0.1089
Epoch 2380 | Loss: 0.1082
Epoch 2400 | Loss: 0.1075
Epoch 2420 | Loss: 0.1069
Epoch 2440 | Loss: 0.1063


Epoch 2460 | Loss: 0.1056
Epoch 2480 | Loss: 0.1050
Epoch 2500 | Loss: 0.1044
Epoch 2520 | Loss: 0.1038
Epoch 2540 | Loss: 0.1032
Epoch 2560 | Loss: 0.1026
Epoch 2580 | Loss: 0.1020
Epoch 2600 | Loss: 0.1014
Epoch 2620 | Loss: 0.1008
Epoch 2640 | Loss: 0.1002
Epoch 2660 | Loss: 0.0997
Epoch 2680 | Loss: 0.0991


Epoch 2700 | Loss: 0.0985
Epoch 2720 | Loss: 0.0980
Epoch 2740 | Loss: 0.0975
Epoch 2760 | Loss: 0.0969
Epoch 2780 | Loss: 0.0964
Epoch 2800 | Loss: 0.0959
Epoch 2820 | Loss: 0.0953
Epoch 2840 | Loss: 0.0948
Epoch 2860 | Loss: 0.0943


Epoch 2880 | Loss: 0.0938
Epoch 2900 | Loss: 0.0933
Epoch 2920 | Loss: 0.0928
Epoch 2940 | Loss: 0.0923
Epoch 2960 | Loss: 0.0918
Epoch 2980 | Loss: 0.0913
Epoch 3000 | Loss: 0.0909
Epoch 3020 | Loss: 0.0904
Epoch 3040 | Loss: 0.0899
Epoch 3060 | Loss: 0.0894
Epoch 3080 | Loss: 0.0890
Epoch 3100 | Loss: 0.0885


Epoch 3120 | Loss: 0.0881
Epoch 3140 | Loss: 0.0876
Epoch 3160 | Loss: 0.0872
Epoch 3180 | Loss: 0.0868
Epoch 3200 | Loss: 0.0863
Epoch 3220 | Loss: 0.0859
Epoch 3240 | Loss: 0.0855
Epoch 3260 | Loss: 0.0850


Epoch 3280 | Loss: 0.0846
Epoch 3300 | Loss: 0.0842
Epoch 3320 | Loss: 0.0838
Epoch 3340 | Loss: 0.0834
Epoch 3360 | Loss: 0.0830
Epoch 3380 | Loss: 0.0826
Epoch 3400 | Loss: 0.0822
Epoch 3420 | Loss: 0.0818
Epoch 3440 | Loss: 0.0814
Epoch 3460 | Loss: 0.0810
Epoch 3480 | Loss: 0.0806
Epoch 3500 | Loss: 0.0803
Epoch 3520 | Loss: 0.0799
Epoch 3540 | Loss: 0.0795
Epoch 3560 | Loss: 0.0791


Epoch 3580 | Loss: 0.0788
Epoch 3600 | Loss: 0.0784
Epoch 3620 | Loss: 0.0780
Epoch 3640 | Loss: 0.0777
Epoch 3660 | Loss: 0.0773
Epoch 3680 | Loss: 0.0770
Epoch 3700 | Loss: 0.0766
Epoch 3720 | Loss: 0.0763
Epoch 3740 | Loss: 0.0759
Epoch 3760 | Loss: 0.0756
Epoch 3780 | Loss: 0.0753
Epoch 3800 | Loss: 0.0749
Epoch 3820 | Loss: 0.0746
Epoch 3840 | Loss: 0.0743
Epoch 3860 | Loss: 0.0739
Epoch 3880 | Loss: 0.0736
Epoch 3900 | Loss: 0.0733
Epoch 3920 | Loss: 0.0730
Epoch 3940 | Loss: 0.0727
Epoch 3960 | Loss: 0.0723
Epoch 3980 | Loss: 0.0720
Epoch 4000 | Loss: 0.0717


Epoch 4020 | Loss: 0.0714
Epoch 4040 | Loss: 0.0711
Epoch 4060 | Loss: 0.0708
Epoch 4080 | Loss: 0.0705
Epoch 4100 | Loss: 0.0702
Epoch 4120 | Loss: 0.0699
Epoch 4140 | Loss: 0.0696
Epoch 4160 | Loss: 0.0693
Epoch 4180 | Loss: 0.0690
Epoch 4200 | Loss: 0.0688
Epoch 4220 | Loss: 0.0685
Epoch 4240 | Loss: 0.0682
Epoch 4260 | Loss: 0.0679
Epoch 4280 | Loss: 0.0676
Epoch 4300 | Loss: 0.0674
Epoch 4320 | Loss: 0.0671
Epoch 4340 | Loss: 0.0668
Epoch 4360 | Loss: 0.0666
Epoch 4380 | Loss: 0.0663
Epoch 4400 | Loss: 0.0660
Epoch 4420 | Loss: 0.0658
Epoch 4440 | Loss: 0.0655
Epoch 4460 | Loss: 0.0652


Epoch 4480 | Loss: 0.0650
Epoch 4500 | Loss: 0.0647
Epoch 4520 | Loss: 0.0645
Epoch 4540 | Loss: 0.0642
Epoch 4560 | Loss: 0.0640
Epoch 4580 | Loss: 0.0637
Epoch 4600 | Loss: 0.0635
Epoch 4620 | Loss: 0.0632
Epoch 4640 | Loss: 0.0630
Epoch 4660 | Loss: 0.0627
Epoch 4680 | Loss: 0.0625
Epoch 4700 | Loss: 0.0623
Epoch 4720 | Loss: 0.0620
Epoch 4740 | Loss: 0.0618
Epoch 4760 | Loss: 0.0616
Epoch 4780 | Loss: 0.0613
Epoch 4800 | Loss: 0.0611
Epoch 4820 | Loss: 0.0609
Epoch 4840 | Loss: 0.0606
Epoch 4860 | Loss: 0.0604
Epoch 4880 | Loss: 0.0602


Epoch 4900 | Loss: 0.0600
Epoch 4920 | Loss: 0.0597
Epoch 4940 | Loss: 0.0595
Epoch 4960 | Loss: 0.0593
Epoch 4980 | Loss: 0.0591
Epoch 5000 | Loss: 0.0589
Epoch 5020 | Loss: 0.0587
Epoch 5040 | Loss: 0.0585
Epoch 5060 | Loss: 0.0582
Epoch 5080 | Loss: 0.0580
Epoch 5100 | Loss: 0.0578
Epoch 5120 | Loss: 0.0576
Epoch 5140 | Loss: 0.0574
Epoch 5160 | Loss: 0.0572
Epoch 5180 | Loss: 0.0570
Epoch 5200 | Loss: 0.0568
Epoch 5220 | Loss: 0.0566
Epoch 5240 | Loss: 0.0564
Epoch 5260 | Loss: 0.0562


Epoch 5280 | Loss: 0.0560
Epoch 5300 | Loss: 0.0558
Epoch 5320 | Loss: 0.0556
Epoch 5340 | Loss: 0.0554
Epoch 5360 | Loss: 0.0552
Epoch 5380 | Loss: 0.0551
Epoch 5400 | Loss: 0.0549
Epoch 5420 | Loss: 0.0547
Epoch 5440 | Loss: 0.0545
Epoch 5460 | Loss: 0.0543
Epoch 5480 | Loss: 0.0541
Epoch 5500 | Loss: 0.0539
Epoch 5520 | Loss: 0.0538
Epoch 5540 | Loss: 0.0536
Epoch 5560 | Loss: 0.0534
Epoch 5580 | Loss: 0.0532
Epoch 5600 | Loss: 0.0530
Epoch 5620 | Loss: 0.0529
Epoch 5640 | Loss: 0.0527
Epoch 5660 | Loss: 0.0525


Epoch 5680 | Loss: 0.0523
Epoch 5700 | Loss: 0.0522
Epoch 5720 | Loss: 0.0520
Epoch 5740 | Loss: 0.0518
Epoch 5760 | Loss: 0.0517
Epoch 5780 | Loss: 0.0515
Epoch 5800 | Loss: 0.0513
Epoch 5820 | Loss: 0.0512
Epoch 5840 | Loss: 0.0510
Epoch 5860 | Loss: 0.0508
Epoch 5880 | Loss: 0.0507


Epoch 5900 | Loss: 0.0505
Epoch 5920 | Loss: 0.0504
Epoch 5940 | Loss: 0.0502
Epoch 5960 | Loss: 0.0500
Epoch 5980 | Loss: 0.0499
Epoch 6000 | Loss: 0.0497
Epoch 6020 | Loss: 0.0496
Epoch 6040 | Loss: 0.0494
Epoch 6060 | Loss: 0.0493
Epoch 6080 | Loss: 0.0491
Epoch 6100 | Loss: 0.0489
Epoch 6120 | Loss: 0.0488
Epoch 6140 | Loss: 0.0486


Epoch 6160 | Loss: 0.0485
Epoch 6180 | Loss: 0.0483
Epoch 6200 | Loss: 0.0482
Epoch 6220 | Loss: 0.0481
Epoch 6240 | Loss: 0.0479
Epoch 6260 | Loss: 0.0478
Epoch 6280 | Loss: 0.0476
Epoch 6300 | Loss: 0.0475
Epoch 6320 | Loss: 0.0473
Epoch 6340 | Loss: 0.0472
Epoch 6360 | Loss: 0.0470
Epoch 6380 | Loss: 0.0469
Epoch 6400 | Loss: 0.0468
Epoch 6420 | Loss: 0.0466
Epoch 6440 | Loss: 0.0465


Epoch 6460 | Loss: 0.0463
Epoch 6480 | Loss: 0.0462
Epoch 6500 | Loss: 0.0461
Epoch 6520 | Loss: 0.0459
Epoch 6540 | Loss: 0.0458
Epoch 6560 | Loss: 0.0457
Epoch 6580 | Loss: 0.0455
Epoch 6600 | Loss: 0.0454
Epoch 6620 | Loss: 0.0453
Epoch 6640 | Loss: 0.0451
Epoch 6660 | Loss: 0.0450
Epoch 6680 | Loss: 0.0449
Epoch 6700 | Loss: 0.0448
Epoch 6720 | Loss: 0.0446
Epoch 6740 | Loss: 0.0445


Epoch 6760 | Loss: 0.0444
Epoch 6780 | Loss: 0.0442
Epoch 6800 | Loss: 0.0441
Epoch 6820 | Loss: 0.0440
Epoch 6840 | Loss: 0.0439
Epoch 6860 | Loss: 0.0437
Epoch 6880 | Loss: 0.0436
Epoch 6900 | Loss: 0.0435
Epoch 6920 | Loss: 0.0434
Epoch 6940 | Loss: 0.0433
Epoch 6960 | Loss: 0.0431
Epoch 6980 | Loss: 0.0430
Epoch 7000 | Loss: 0.0429
Epoch 7020 | Loss: 0.0428
Epoch 7040 | Loss: 0.0427
Epoch 7060 | Loss: 0.0425


Epoch 7080 | Loss: 0.0424
Epoch 7100 | Loss: 0.0423
Epoch 7120 | Loss: 0.0422
Epoch 7140 | Loss: 0.0421
Epoch 7160 | Loss: 0.0420
Epoch 7180 | Loss: 0.0419
Epoch 7200 | Loss: 0.0417
Epoch 7220 | Loss: 0.0416
Epoch 7240 | Loss: 0.0415
Epoch 7260 | Loss: 0.0414
Epoch 7280 | Loss: 0.0413
Epoch 7300 | Loss: 0.0412
Epoch 7320 | Loss: 0.0411
Epoch 7340 | Loss: 0.0410
Epoch 7360 | Loss: 0.0409
Epoch 7380 | Loss: 0.0407


Epoch 7400 | Loss: 0.0406
Epoch 7420 | Loss: 0.0405
Epoch 7440 | Loss: 0.0404
Epoch 7460 | Loss: 0.0403
Epoch 7480 | Loss: 0.0402
Epoch 7500 | Loss: 0.0401
Epoch 7520 | Loss: 0.0400
Epoch 7540 | Loss: 0.0399
Epoch 7560 | Loss: 0.0398
Epoch 7580 | Loss: 0.0397
Epoch 7600 | Loss: 0.0396
Epoch 7620 | Loss: 0.0395
Epoch 7640 | Loss: 0.0394
Epoch 7660 | Loss: 0.0393
Epoch 7680 | Loss: 0.0392


Epoch 7700 | Loss: 0.0391
Epoch 7720 | Loss: 0.0390
Epoch 7740 | Loss: 0.0389
Epoch 7760 | Loss: 0.0388
Epoch 7780 | Loss: 0.0387
Epoch 7800 | Loss: 0.0386
Epoch 7820 | Loss: 0.0385
Epoch 7840 | Loss: 0.0384
Epoch 7860 | Loss: 0.0383
Epoch 7880 | Loss: 0.0382
Epoch 7900 | Loss: 0.0381
Epoch 7920 | Loss: 0.0380


Epoch 7940 | Loss: 0.0379
Epoch 7960 | Loss: 0.0378
Epoch 7980 | Loss: 0.0377
Epoch 8000 | Loss: 0.0376
Epoch 8020 | Loss: 0.0376
Epoch 8040 | Loss: 0.0375
Epoch 8060 | Loss: 0.0374
Epoch 8080 | Loss: 0.0373
Epoch 8100 | Loss: 0.0372
Epoch 8120 | Loss: 0.0371
Epoch 8140 | Loss: 0.0370


Epoch 8160 | Loss: 0.0369
Epoch 8180 | Loss: 0.0368
Epoch 8200 | Loss: 0.0367
Epoch 8220 | Loss: 0.0366
Epoch 8240 | Loss: 0.0366
Epoch 8260 | Loss: 0.0365
Epoch 8280 | Loss: 0.0364
Epoch 8300 | Loss: 0.0363
Epoch 8320 | Loss: 0.0362
Epoch 8340 | Loss: 0.0361
Epoch 8360 | Loss: 0.0360
Epoch 8380 | Loss: 0.0360


Epoch 8400 | Loss: 0.0359
Epoch 8420 | Loss: 0.0358
Epoch 8440 | Loss: 0.0357
Epoch 8460 | Loss: 0.0356
Epoch 8480 | Loss: 0.0355
Epoch 8500 | Loss: 0.0355
Epoch 8520 | Loss: 0.0354
Epoch 8540 | Loss: 0.0353
Epoch 8560 | Loss: 0.0352
Epoch 8580 | Loss: 0.0351
Epoch 8600 | Loss: 0.0350
Epoch 8620 | Loss: 0.0350
Epoch 8640 | Loss: 0.0349
Epoch 8660 | Loss: 0.0348


Epoch 8680 | Loss: 0.0347
Epoch 8700 | Loss: 0.0346
Epoch 8720 | Loss: 0.0346
Epoch 8740 | Loss: 0.0345
Epoch 8760 | Loss: 0.0344
Epoch 8780 | Loss: 0.0343
Epoch 8800 | Loss: 0.0342
Epoch 8820 | Loss: 0.0342
Epoch 8840 | Loss: 0.0341
Epoch 8860 | Loss: 0.0340
Epoch 8880 | Loss: 0.0339


Epoch 8900 | Loss: 0.0339
Epoch 8920 | Loss: 0.0338
Epoch 8940 | Loss: 0.0337
Epoch 8960 | Loss: 0.0336
Epoch 8980 | Loss: 0.0336
Epoch 9000 | Loss: 0.0335
Epoch 9020 | Loss: 0.0334
Epoch 9040 | Loss: 0.0333
Epoch 9060 | Loss: 0.0333
Epoch 9080 | Loss: 0.0332
Epoch 9100 | Loss: 0.0331
Epoch 9120 | Loss: 0.0330
Epoch 9140 | Loss: 0.0330
Epoch 9160 | Loss: 0.0329
Epoch 9180 | Loss: 0.0328
Epoch 9200 | Loss: 0.0328
Epoch 9220 | Loss: 0.0327


Epoch 9240 | Loss: 0.0326
Epoch 9260 | Loss: 0.0325
Epoch 9280 | Loss: 0.0325
Epoch 9300 | Loss: 0.0324
Epoch 9320 | Loss: 0.0323
Epoch 9340 | Loss: 0.0323
Epoch 9360 | Loss: 0.0322
Epoch 9380 | Loss: 0.0321
Epoch 9400 | Loss: 0.0321
Epoch 9420 | Loss: 0.0320
Epoch 9440 | Loss: 0.0319
Epoch 9460 | Loss: 0.0319
Epoch 9480 | Loss: 0.0318
Epoch 9500 | Loss: 0.0317
Epoch 9520 | Loss: 0.0317
Epoch 9540 | Loss: 0.0316
Epoch 9560 | Loss: 0.0315
Epoch 9580 | Loss: 0.0315
Epoch 9600 | Loss: 0.0314
Epoch 9620 | Loss: 0.0313


Epoch 9640 | Loss: 0.0313
Epoch 9660 | Loss: 0.0312
Epoch 9680 | Loss: 0.0311
Epoch 9700 | Loss: 0.0311
Epoch 9720 | Loss: 0.0310
Epoch 9740 | Loss: 0.0309
Epoch 9760 | Loss: 0.0309
Epoch 9780 | Loss: 0.0308
Epoch 9800 | Loss: 0.0307
Epoch 9820 | Loss: 0.0307
Epoch 9840 | Loss: 0.0306
Epoch 9860 | Loss: 0.0306
Epoch 9880 | Loss: 0.0305
Epoch 9900 | Loss: 0.0304
Epoch 9920 | Loss: 0.0304
Epoch 9940 | Loss: 0.0303
Epoch 9960 | Loss: 0.0302
Epoch 9980 | Loss: 0.0302
Epoch 10000 | Loss: 0.0301
Training complete!
Final Loss: 0.0301
Final Weights: [-2.1533275  4.016128 ]
Final Bias: 2.1071





In [29]:
# Calculate accuracy
# Perform a final forward pass to get predictions
final_z, final_predictions = forward_pass(X_linear_tensor, weights_manual, bias_manual)

# Convert probabilities to binary class predictions
predicted_classes = (final_predictions > 0.5).int().squeeze()

# Accuracy = (number of correct predictions) / (total predictions)
accuracy = (predicted_classes == y_linear_tensor.squeeze()).float().mean()

print(f"Training Accuracy: {accuracy.item()*100:.1f}%")

Training Accuracy: 100.0%


In [30]:
# Visualize training progress with loss curve
fig_loss = plot_training_loss(
    losses,
    title="Training Loss Over Time".upper(),
    width=600,
    height=400
)
fig_loss.show()

# Visualize learned decision boundary
fig_boundary = plot_2d_classification(
    X_linear, y_linear,
    weights=weights_manual,
    bias=bias_manual,
    title="Learned Decision Boundary".upper(),
    show_boundary=True
)
fig_boundary.show()

print("\n✓ The neuron successfully learned to separate the two classes!")


✓ The neuron successfully learned to separate the two classes!


## 1.5 Transition to PyTorch: The Standard Recipe
***

Now that we understand what's happening under the hood, let's use PyTorch's built-in `nn.Linear` layer. This is the standard approach in research and production.

> <img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/docs.svg" width="20"/> **Definition**: `torch.nn.Linear(in_features, out_features)` implements the transformation $y = xW^T + b$ where $W$ and $b$ are learnable parameters.

### Benefits of PyTorch Modules

1. **Automatic parameter management**: Weights and biases are created automatically
2. **GPU compatibility**: Seamlessly move to GPU with `.to('cuda')`
3. **Integration with optimizers**: Works with `torch.optim` for advanced optimization
4. **Standard interface**: Follows the `nn.Module` pattern used everywhere in research

<img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/code.svg" width="20"/> **Exercise 4**: Implement the same neuron using `nn.Linear` and verify results match.

In [31]:
# Exercise 4: Implement the same neuron using PyTorch's nn.Linear

###################
# TODO: COMPLETE THE CODE BELOW
# Use PyTorch's built-in nn.Linear layer and train it

# Create a linear model using PyTorch's nn.Linear
torch.manual_seed(42)
linear_model = nn.Linear(in_features=2, out_features=1)

# Define loss function and optimizer
criterion = nn.BCELoss()  # Binary Cross-Entropy loss
optimizer = optim.SGD(linear_model.parameters(), lr=0.01)

# Track losses
losses_lib = []

print("Training with nn.Linear...")
print("=" * 60)

for epoch in tqdm(range(epochs), desc="Training"):
    # Forward pass
    logits = linear_model(X_linear_tensor)
    predictions_lib = torch.sigmoid(logits)  # Apply sigmoid to get probabilities

    # Compute loss
    loss = criterion(predictions_lib, y_linear_tensor)

    # Backward pass
    optimizer.zero_grad()      # Clear previous gradients
    loss.backward()           # Compute gradients
    optimizer.step()      # Update parameters

    losses_lib.append(loss.item())

    if (epoch + 1) % 20 == 0:
        print(f"Epoch {epoch+1:3d} | Loss: {loss.item():.4f}")

print("=" * 60)
print("Training complete!")
print(f"Final Loss: {losses_lib[-1]:.4f}")


Training with nn.Linear...


Training:   2%|▏         | 227/10000 [00:00<00:04, 2262.35it/s]

Epoch  20 | Loss: 0.4802
Epoch  40 | Loss: 0.4608
Epoch  60 | Loss: 0.4434
Epoch  80 | Loss: 0.4277
Epoch 100 | Loss: 0.4134
Epoch 120 | Loss: 0.4004
Epoch 140 | Loss: 0.3884
Epoch 160 | Loss: 0.3775
Epoch 180 | Loss: 0.3674
Epoch 200 | Loss: 0.3580
Epoch 220 | Loss: 0.3493
Epoch 240 | Loss: 0.3411
Epoch 260 | Loss: 0.3335
Epoch 280 | Loss: 0.3263
Epoch 300 | Loss: 0.3196
Epoch 320 | Loss: 0.3132
Epoch 340 | Loss: 0.3072
Epoch 360 | Loss: 0.3015
Epoch 380 | Loss: 0.2960
Epoch 400 | Loss: 0.2908
Epoch 420 | Loss: 0.2859
Epoch 440 | Loss: 0.2812
Epoch 460 | Loss: 0.2766


Training:   5%|▍         | 474/10000 [00:00<00:04, 2380.17it/s]

Epoch 480 | Loss: 0.2723
Epoch 500 | Loss: 0.2681
Epoch 520 | Loss: 0.2641


Training:  10%|▉         | 963/10000 [00:00<00:03, 2406.45it/s]

Epoch 540 | Loss: 0.2603
Epoch 560 | Loss: 0.2565
Epoch 580 | Loss: 0.2529
Epoch 600 | Loss: 0.2495
Epoch 620 | Loss: 0.2461
Epoch 640 | Loss: 0.2429
Epoch 660 | Loss: 0.2397
Epoch 680 | Loss: 0.2367
Epoch 700 | Loss: 0.2337
Epoch 720 | Loss: 0.2308
Epoch 740 | Loss: 0.2280
Epoch 760 | Loss: 0.2253
Epoch 780 | Loss: 0.2227
Epoch 800 | Loss: 0.2201
Epoch 820 | Loss: 0.2176
Epoch 840 | Loss: 0.2151
Epoch 860 | Loss: 0.2127
Epoch 880 | Loss: 0.2104
Epoch 900 | Loss: 0.2081
Epoch 920 | Loss: 0.2059
Epoch 940 | Loss: 0.2038
Epoch 960 | Loss: 0.2016
Epoch 980 | Loss: 0.1996
Epoch 1000 | Loss: 0.1975
Epoch 1020 | Loss: 0.1956


Training:  15%|█▍        | 1463/10000 [00:00<00:03, 2467.06it/s]

Epoch 1040 | Loss: 0.1936
Epoch 1060 | Loss: 0.1917
Epoch 1080 | Loss: 0.1899
Epoch 1100 | Loss: 0.1880
Epoch 1120 | Loss: 0.1862
Epoch 1140 | Loss: 0.1845
Epoch 1160 | Loss: 0.1828
Epoch 1180 | Loss: 0.1811
Epoch 1200 | Loss: 0.1794
Epoch 1220 | Loss: 0.1778
Epoch 1240 | Loss: 0.1762
Epoch 1260 | Loss: 0.1746
Epoch 1280 | Loss: 0.1731
Epoch 1300 | Loss: 0.1716
Epoch 1320 | Loss: 0.1701
Epoch 1340 | Loss: 0.1686
Epoch 1360 | Loss: 0.1672
Epoch 1380 | Loss: 0.1658
Epoch 1400 | Loss: 0.1644
Epoch 1420 | Loss: 0.1630
Epoch 1440 | Loss: 0.1617
Epoch 1460 | Loss: 0.1604
Epoch 1480 | Loss: 0.1590
Epoch 1500 | Loss: 0.1578
Epoch 1520 | Loss: 0.1565
Epoch 1540 | Loss: 0.1553


Training:  20%|█▉        | 1954/10000 [00:00<00:03, 2331.86it/s]

Epoch 1560 | Loss: 0.1540
Epoch 1580 | Loss: 0.1528
Epoch 1600 | Loss: 0.1516
Epoch 1620 | Loss: 0.1505
Epoch 1640 | Loss: 0.1493
Epoch 1660 | Loss: 0.1482
Epoch 1680 | Loss: 0.1471
Epoch 1700 | Loss: 0.1459
Epoch 1720 | Loss: 0.1449
Epoch 1740 | Loss: 0.1438
Epoch 1760 | Loss: 0.1427
Epoch 1780 | Loss: 0.1417
Epoch 1800 | Loss: 0.1406
Epoch 1820 | Loss: 0.1396
Epoch 1840 | Loss: 0.1386
Epoch 1860 | Loss: 0.1376
Epoch 1880 | Loss: 0.1367
Epoch 1900 | Loss: 0.1357
Epoch 1920 | Loss: 0.1347
Epoch 1940 | Loss: 0.1338
Epoch 1960 | Loss: 0.1329
Epoch 1980 | Loss: 0.1320
Epoch 2000 | Loss: 0.1311


Training:  24%|██▍       | 2431/10000 [00:01<00:03, 2347.19it/s]

Epoch 2020 | Loss: 0.1302
Epoch 2040 | Loss: 0.1293
Epoch 2060 | Loss: 0.1284
Epoch 2080 | Loss: 0.1275
Epoch 2100 | Loss: 0.1267
Epoch 2120 | Loss: 0.1259
Epoch 2140 | Loss: 0.1250
Epoch 2160 | Loss: 0.1242
Epoch 2180 | Loss: 0.1234
Epoch 2200 | Loss: 0.1226
Epoch 2220 | Loss: 0.1218
Epoch 2240 | Loss: 0.1210
Epoch 2260 | Loss: 0.1203
Epoch 2280 | Loss: 0.1195
Epoch 2300 | Loss: 0.1187
Epoch 2320 | Loss: 0.1180
Epoch 2340 | Loss: 0.1173
Epoch 2360 | Loss: 0.1165
Epoch 2380 | Loss: 0.1158
Epoch 2400 | Loss: 0.1151
Epoch 2420 | Loss: 0.1144
Epoch 2440 | Loss: 0.1137
Epoch 2460 | Loss: 0.1130
Epoch 2480 | Loss: 0.1123


Training:  29%|██▉       | 2925/10000 [00:01<00:02, 2414.65it/s]

Epoch 2500 | Loss: 0.1116
Epoch 2520 | Loss: 0.1110
Epoch 2540 | Loss: 0.1103
Epoch 2560 | Loss: 0.1096
Epoch 2580 | Loss: 0.1090
Epoch 2600 | Loss: 0.1084
Epoch 2620 | Loss: 0.1077
Epoch 2640 | Loss: 0.1071
Epoch 2660 | Loss: 0.1065
Epoch 2680 | Loss: 0.1059
Epoch 2700 | Loss: 0.1053
Epoch 2720 | Loss: 0.1047
Epoch 2740 | Loss: 0.1041
Epoch 2760 | Loss: 0.1035
Epoch 2780 | Loss: 0.1029
Epoch 2800 | Loss: 0.1023
Epoch 2820 | Loss: 0.1017
Epoch 2840 | Loss: 0.1012
Epoch 2860 | Loss: 0.1006
Epoch 2880 | Loss: 0.1001
Epoch 2900 | Loss: 0.0995
Epoch 2920 | Loss: 0.0990
Epoch 2940 | Loss: 0.0984
Epoch 2960 | Loss: 0.0979
Epoch 2980 | Loss: 0.0974


Training:  34%|███▍      | 3432/10000 [00:01<00:02, 2450.05it/s]

Epoch 3000 | Loss: 0.0968
Epoch 3020 | Loss: 0.0963
Epoch 3040 | Loss: 0.0958
Epoch 3060 | Loss: 0.0953
Epoch 3080 | Loss: 0.0948
Epoch 3100 | Loss: 0.0943
Epoch 3120 | Loss: 0.0938
Epoch 3140 | Loss: 0.0933
Epoch 3160 | Loss: 0.0928
Epoch 3180 | Loss: 0.0923
Epoch 3200 | Loss: 0.0919
Epoch 3220 | Loss: 0.0914
Epoch 3240 | Loss: 0.0909
Epoch 3260 | Loss: 0.0905
Epoch 3280 | Loss: 0.0900
Epoch 3300 | Loss: 0.0895
Epoch 3320 | Loss: 0.0891
Epoch 3340 | Loss: 0.0886
Epoch 3360 | Loss: 0.0882
Epoch 3380 | Loss: 0.0878
Epoch 3400 | Loss: 0.0873
Epoch 3420 | Loss: 0.0869
Epoch 3440 | Loss: 0.0865
Epoch 3460 | Loss: 0.0860
Epoch 3480 | Loss: 0.0856
Epoch 3500 | Loss: 0.0852
Epoch 3520 | Loss: 0.0848


Training:  39%|███▉      | 3933/10000 [00:01<00:02, 2479.33it/s]

Epoch 3540 | Loss: 0.0844
Epoch 3560 | Loss: 0.0840
Epoch 3580 | Loss: 0.0836
Epoch 3600 | Loss: 0.0832
Epoch 3620 | Loss: 0.0828
Epoch 3640 | Loss: 0.0824
Epoch 3660 | Loss: 0.0820
Epoch 3680 | Loss: 0.0816
Epoch 3700 | Loss: 0.0812
Epoch 3720 | Loss: 0.0808
Epoch 3740 | Loss: 0.0805
Epoch 3760 | Loss: 0.0801
Epoch 3780 | Loss: 0.0797
Epoch 3800 | Loss: 0.0794
Epoch 3820 | Loss: 0.0790
Epoch 3840 | Loss: 0.0786
Epoch 3860 | Loss: 0.0783
Epoch 3880 | Loss: 0.0779
Epoch 3900 | Loss: 0.0776
Epoch 3920 | Loss: 0.0772
Epoch 3940 | Loss: 0.0769
Epoch 3960 | Loss: 0.0765
Epoch 3980 | Loss: 0.0762
Epoch 4000 | Loss: 0.0759
Epoch 4020 | Loss: 0.0755
Epoch 4040 | Loss: 0.0752


Training:  42%|████▏     | 4182/10000 [00:01<00:02, 2466.64it/s]

Epoch 4060 | Loss: 0.0749
Epoch 4080 | Loss: 0.0745
Epoch 4100 | Loss: 0.0742
Epoch 4120 | Loss: 0.0739
Epoch 4140 | Loss: 0.0736
Epoch 4160 | Loss: 0.0732
Epoch 4180 | Loss: 0.0729
Epoch 4200 | Loss: 0.0726
Epoch 4220 | Loss: 0.0723
Epoch 4240 | Loss: 0.0720
Epoch 4260 | Loss: 0.0717
Epoch 4280 | Loss: 0.0714
Epoch 4300 | Loss: 0.0711
Epoch 4320 | Loss: 0.0708
Epoch 4340 | Loss: 0.0705
Epoch 4360 | Loss: 0.0702
Epoch 4380 | Loss: 0.0699
Epoch 4400 | Loss: 0.0696
Epoch 4420 | Loss: 0.0693


Training:  44%|████▍     | 4429/10000 [00:01<00:02, 2345.37it/s]

Epoch 4440 | Loss: 0.0690
Epoch 4460 | Loss: 0.0688
Epoch 4480 | Loss: 0.0685
Epoch 4500 | Loss: 0.0682
Epoch 4520 | Loss: 0.0679


Training:  49%|████▉     | 4928/10000 [00:02<00:02, 2421.30it/s]

Epoch 4540 | Loss: 0.0676
Epoch 4560 | Loss: 0.0674
Epoch 4580 | Loss: 0.0671
Epoch 4600 | Loss: 0.0668
Epoch 4620 | Loss: 0.0666
Epoch 4640 | Loss: 0.0663
Epoch 4660 | Loss: 0.0660
Epoch 4680 | Loss: 0.0658
Epoch 4700 | Loss: 0.0655
Epoch 4720 | Loss: 0.0652
Epoch 4740 | Loss: 0.0650
Epoch 4760 | Loss: 0.0647
Epoch 4780 | Loss: 0.0645
Epoch 4800 | Loss: 0.0642
Epoch 4820 | Loss: 0.0640
Epoch 4840 | Loss: 0.0637
Epoch 4860 | Loss: 0.0635
Epoch 4880 | Loss: 0.0632
Epoch 4900 | Loss: 0.0630
Epoch 4920 | Loss: 0.0628
Epoch 4940 | Loss: 0.0625
Epoch 4960 | Loss: 0.0623
Epoch 4980 | Loss: 0.0621
Epoch 5000 | Loss: 0.0618
Epoch 5020 | Loss: 0.0616


Training:  54%|█████▍    | 5418/10000 [00:02<00:01, 2433.25it/s]

Epoch 5040 | Loss: 0.0614
Epoch 5060 | Loss: 0.0611
Epoch 5080 | Loss: 0.0609
Epoch 5100 | Loss: 0.0607
Epoch 5120 | Loss: 0.0604
Epoch 5140 | Loss: 0.0602
Epoch 5160 | Loss: 0.0600
Epoch 5180 | Loss: 0.0598
Epoch 5200 | Loss: 0.0596
Epoch 5220 | Loss: 0.0593
Epoch 5240 | Loss: 0.0591
Epoch 5260 | Loss: 0.0589
Epoch 5280 | Loss: 0.0587
Epoch 5300 | Loss: 0.0585
Epoch 5320 | Loss: 0.0583
Epoch 5340 | Loss: 0.0581
Epoch 5360 | Loss: 0.0579
Epoch 5380 | Loss: 0.0577
Epoch 5400 | Loss: 0.0575
Epoch 5420 | Loss: 0.0573
Epoch 5440 | Loss: 0.0571
Epoch 5460 | Loss: 0.0569
Epoch 5480 | Loss: 0.0567
Epoch 5500 | Loss: 0.0565
Epoch 5520 | Loss: 0.0563


Training:  59%|█████▉    | 5912/10000 [00:02<00:01, 2391.48it/s]

Epoch 5540 | Loss: 0.0561
Epoch 5560 | Loss: 0.0559
Epoch 5580 | Loss: 0.0557
Epoch 5600 | Loss: 0.0555
Epoch 5620 | Loss: 0.0553
Epoch 5640 | Loss: 0.0551
Epoch 5660 | Loss: 0.0549
Epoch 5680 | Loss: 0.0547
Epoch 5700 | Loss: 0.0545
Epoch 5720 | Loss: 0.0544
Epoch 5740 | Loss: 0.0542
Epoch 5760 | Loss: 0.0540
Epoch 5780 | Loss: 0.0538
Epoch 5800 | Loss: 0.0536
Epoch 5820 | Loss: 0.0534
Epoch 5840 | Loss: 0.0533
Epoch 5860 | Loss: 0.0531
Epoch 5880 | Loss: 0.0529
Epoch 5900 | Loss: 0.0527
Epoch 5920 | Loss: 0.0526
Epoch 5940 | Loss: 0.0524
Epoch 5960 | Loss: 0.0522
Epoch 5980 | Loss: 0.0521
Epoch 6000 | Loss: 0.0519


Training:  64%|██████▍   | 6420/10000 [00:02<00:01, 2471.22it/s]

Epoch 6020 | Loss: 0.0517
Epoch 6040 | Loss: 0.0515
Epoch 6060 | Loss: 0.0514
Epoch 6080 | Loss: 0.0512
Epoch 6100 | Loss: 0.0511
Epoch 6120 | Loss: 0.0509
Epoch 6140 | Loss: 0.0507
Epoch 6160 | Loss: 0.0506
Epoch 6180 | Loss: 0.0504
Epoch 6200 | Loss: 0.0502
Epoch 6220 | Loss: 0.0501
Epoch 6240 | Loss: 0.0499
Epoch 6260 | Loss: 0.0498
Epoch 6280 | Loss: 0.0496
Epoch 6300 | Loss: 0.0495
Epoch 6320 | Loss: 0.0493
Epoch 6340 | Loss: 0.0491
Epoch 6360 | Loss: 0.0490
Epoch 6380 | Loss: 0.0488
Epoch 6400 | Loss: 0.0487
Epoch 6420 | Loss: 0.0485
Epoch 6440 | Loss: 0.0484
Epoch 6460 | Loss: 0.0482
Epoch 6480 | Loss: 0.0481
Epoch 6500 | Loss: 0.0480
Epoch 6520 | Loss: 0.0478


Training:  69%|██████▉   | 6914/10000 [00:02<00:01, 2370.14it/s]

Epoch 6540 | Loss: 0.0477
Epoch 6560 | Loss: 0.0475
Epoch 6580 | Loss: 0.0474
Epoch 6600 | Loss: 0.0472
Epoch 6620 | Loss: 0.0471
Epoch 6640 | Loss: 0.0469
Epoch 6660 | Loss: 0.0468
Epoch 6680 | Loss: 0.0467
Epoch 6700 | Loss: 0.0465
Epoch 6720 | Loss: 0.0464
Epoch 6740 | Loss: 0.0463
Epoch 6760 | Loss: 0.0461
Epoch 6780 | Loss: 0.0460
Epoch 6800 | Loss: 0.0458
Epoch 6820 | Loss: 0.0457
Epoch 6840 | Loss: 0.0456
Epoch 6860 | Loss: 0.0454
Epoch 6880 | Loss: 0.0453
Epoch 6900 | Loss: 0.0452
Epoch 6920 | Loss: 0.0451
Epoch 6940 | Loss: 0.0449
Epoch 6960 | Loss: 0.0448
Epoch 6980 | Loss: 0.0447
Epoch 7000 | Loss: 0.0445


Training:  74%|███████▍  | 7410/10000 [00:03<00:01, 2405.84it/s]

Epoch 7020 | Loss: 0.0444
Epoch 7040 | Loss: 0.0443
Epoch 7060 | Loss: 0.0442
Epoch 7080 | Loss: 0.0440
Epoch 7100 | Loss: 0.0439
Epoch 7120 | Loss: 0.0438
Epoch 7140 | Loss: 0.0437
Epoch 7160 | Loss: 0.0435
Epoch 7180 | Loss: 0.0434
Epoch 7200 | Loss: 0.0433
Epoch 7220 | Loss: 0.0432
Epoch 7240 | Loss: 0.0431
Epoch 7260 | Loss: 0.0429
Epoch 7280 | Loss: 0.0428
Epoch 7300 | Loss: 0.0427
Epoch 7320 | Loss: 0.0426
Epoch 7340 | Loss: 0.0425
Epoch 7360 | Loss: 0.0424
Epoch 7380 | Loss: 0.0422
Epoch 7400 | Loss: 0.0421
Epoch 7420 | Loss: 0.0420
Epoch 7440 | Loss: 0.0419
Epoch 7460 | Loss: 0.0418
Epoch 7480 | Loss: 0.0417
Epoch 7500 | Loss: 0.0416


Training:  77%|███████▋  | 7652/10000 [00:03<00:01, 2253.64it/s]

Epoch 7520 | Loss: 0.0414
Epoch 7540 | Loss: 0.0413
Epoch 7560 | Loss: 0.0412
Epoch 7580 | Loss: 0.0411
Epoch 7600 | Loss: 0.0410
Epoch 7620 | Loss: 0.0409
Epoch 7640 | Loss: 0.0408
Epoch 7660 | Loss: 0.0407
Epoch 7680 | Loss: 0.0406
Epoch 7700 | Loss: 0.0405
Epoch 7720 | Loss: 0.0404
Epoch 7740 | Loss: 0.0403
Epoch 7760 | Loss: 0.0401
Epoch 7780 | Loss: 0.0400
Epoch 7800 | Loss: 0.0399
Epoch 7820 | Loss: 0.0398
Epoch 7840 | Loss: 0.0397


Training:  81%|████████  | 8084/10000 [00:03<00:01, 1906.19it/s]

Epoch 7860 | Loss: 0.0396
Epoch 7880 | Loss: 0.0395
Epoch 7900 | Loss: 0.0394
Epoch 7920 | Loss: 0.0393
Epoch 7940 | Loss: 0.0392
Epoch 7960 | Loss: 0.0391
Epoch 7980 | Loss: 0.0390
Epoch 8000 | Loss: 0.0389
Epoch 8020 | Loss: 0.0388
Epoch 8040 | Loss: 0.0387
Epoch 8060 | Loss: 0.0386
Epoch 8080 | Loss: 0.0385
Epoch 8100 | Loss: 0.0384
Epoch 8120 | Loss: 0.0383
Epoch 8140 | Loss: 0.0382
Epoch 8160 | Loss: 0.0381
Epoch 8180 | Loss: 0.0381


Training:  85%|████████▍ | 8467/10000 [00:03<00:00, 1838.82it/s]

Epoch 8200 | Loss: 0.0380
Epoch 8220 | Loss: 0.0379
Epoch 8240 | Loss: 0.0378
Epoch 8260 | Loss: 0.0377
Epoch 8280 | Loss: 0.0376
Epoch 8300 | Loss: 0.0375
Epoch 8320 | Loss: 0.0374
Epoch 8340 | Loss: 0.0373
Epoch 8360 | Loss: 0.0372
Epoch 8380 | Loss: 0.0371
Epoch 8400 | Loss: 0.0370
Epoch 8420 | Loss: 0.0369
Epoch 8440 | Loss: 0.0369
Epoch 8460 | Loss: 0.0368
Epoch 8480 | Loss: 0.0367
Epoch 8500 | Loss: 0.0366
Epoch 8520 | Loss: 0.0365
Epoch 8540 | Loss: 0.0364
Epoch 8560 | Loss: 0.0363


Training:  88%|████████▊ | 8835/10000 [00:03<00:00, 1757.50it/s]

Epoch 8580 | Loss: 0.0362
Epoch 8600 | Loss: 0.0362
Epoch 8620 | Loss: 0.0361
Epoch 8640 | Loss: 0.0360
Epoch 8660 | Loss: 0.0359
Epoch 8680 | Loss: 0.0358
Epoch 8700 | Loss: 0.0357
Epoch 8720 | Loss: 0.0356
Epoch 8740 | Loss: 0.0356
Epoch 8760 | Loss: 0.0355
Epoch 8780 | Loss: 0.0354
Epoch 8800 | Loss: 0.0353
Epoch 8820 | Loss: 0.0352
Epoch 8840 | Loss: 0.0351
Epoch 8860 | Loss: 0.0351
Epoch 8880 | Loss: 0.0350
Epoch 8900 | Loss: 0.0349


Training:  92%|█████████▏| 9190/10000 [00:04<00:00, 1682.93it/s]

Epoch 8920 | Loss: 0.0348
Epoch 8940 | Loss: 0.0347
Epoch 8960 | Loss: 0.0347
Epoch 8980 | Loss: 0.0346
Epoch 9000 | Loss: 0.0345
Epoch 9020 | Loss: 0.0344
Epoch 9040 | Loss: 0.0344
Epoch 9060 | Loss: 0.0343
Epoch 9080 | Loss: 0.0342
Epoch 9100 | Loss: 0.0341
Epoch 9120 | Loss: 0.0340
Epoch 9140 | Loss: 0.0340
Epoch 9160 | Loss: 0.0339
Epoch 9180 | Loss: 0.0338
Epoch 9200 | Loss: 0.0337
Epoch 9220 | Loss: 0.0337
Epoch 9240 | Loss: 0.0336
Epoch 9260 | Loss: 0.0335


Training:  95%|█████████▌| 9530/10000 [00:04<00:00, 1649.08it/s]

Epoch 9280 | Loss: 0.0334
Epoch 9300 | Loss: 0.0334
Epoch 9320 | Loss: 0.0333
Epoch 9340 | Loss: 0.0332
Epoch 9360 | Loss: 0.0331
Epoch 9380 | Loss: 0.0331
Epoch 9400 | Loss: 0.0330
Epoch 9420 | Loss: 0.0329
Epoch 9440 | Loss: 0.0329
Epoch 9460 | Loss: 0.0328
Epoch 9480 | Loss: 0.0327
Epoch 9500 | Loss: 0.0326
Epoch 9520 | Loss: 0.0326
Epoch 9540 | Loss: 0.0325
Epoch 9560 | Loss: 0.0324
Epoch 9580 | Loss: 0.0324
Epoch 9600 | Loss: 0.0323


Training:  99%|█████████▉| 9904/10000 [00:04<00:00, 1772.24it/s]

Epoch 9620 | Loss: 0.0322
Epoch 9640 | Loss: 0.0322
Epoch 9660 | Loss: 0.0321
Epoch 9680 | Loss: 0.0320
Epoch 9700 | Loss: 0.0319
Epoch 9720 | Loss: 0.0319
Epoch 9740 | Loss: 0.0318
Epoch 9760 | Loss: 0.0317
Epoch 9780 | Loss: 0.0317
Epoch 9800 | Loss: 0.0316
Epoch 9820 | Loss: 0.0315
Epoch 9840 | Loss: 0.0315
Epoch 9860 | Loss: 0.0314
Epoch 9880 | Loss: 0.0313
Epoch 9900 | Loss: 0.0313
Epoch 9920 | Loss: 0.0312


Training: 100%|██████████| 10000/10000 [00:04<00:00, 2174.17it/s]

Epoch 9940 | Loss: 0.0311
Epoch 9960 | Loss: 0.0311
Epoch 9980 | Loss: 0.0310
Epoch 10000 | Loss: 0.0310
Training complete!
Final Loss: 0.0310





In [32]:
# Extract learned parameters
weights_lib = linear_model.weight.detach().numpy()[0]
bias_lib = linear_model.bias.detach().item()
print(f"Final Weights: {weights_lib}")
print(f"Final Bias: {bias_lib:.4f}")

Final Weights: [-2.117927  4.005323]
Final Bias: 2.0733


In [34]:
# Exercise 4 (continued): Calculate accuracy

###################
# TODO: COMPLETE THE CODE BELOW
# Calculate the training accuracy

# Calculate accuracy
with torch.no_grad():
    final_logits = linear_model(X_linear_tensor)
    final_preds_lib = torch.sigmoid(final_logits)
    # Convert probabilities to binary predictions (threshold = 0.5)
    predicted_classes_lib = (final_preds_lib > 0.5).int().squeeze()
    # Calculate accuracy as percentage of correct predictions
    accuracy_lib = (predicted_classes_lib == y_linear_tensor.squeeze()).float().mean()

print(f"Training Accuracy: {accuracy_lib.item()*100:.1f}%")


Training Accuracy: 100.0%


In [35]:
# Verify results are similar to manual implementation
print("\n" + "=" * 60)
print("Comparison: Manual vs. Library Implementation")
print("=" * 60)
print(f"Loss difference: {abs(losses[-1] - losses_lib[-1]):.6f}")
print(f"Both implementations converged: {abs(losses[-1] - losses_lib[-1]) < 0.01}")
print("✓ PyTorch's nn.Linear produces equivalent results!")


Comparison: Manual vs. Library Implementation
Loss difference: 0.000829
Both implementations converged: True
✓ PyTorch's nn.Linear produces equivalent results!
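Both training loops apply a sigmoid and then compute binary cross-entropy on the resulting probability. The same loss can be written directly in terms of the logit $z$, which avoids evaluating $\log(0)$ when the sigmoid saturates; PyTorch exposes this combined form as `nn.BCEWithLogitsLoss`. The sketch below is a plain-Python illustration of that identity with made-up logits. It is not how the notebook computes its loss, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def bce_from_probability(p, y):
    """Binary cross-entropy from a probability p and a label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_from_logit(z, y):
    """Same loss computed from the logit: max(z, 0) - z*y + log(1 + exp(-|z|))."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

# Illustrative (logit, label) pairs: the two forms agree
for z, y in [(2.0, 1), (-1.5, 0), (0.3, 0), (-4.0, 1)]:
    via_p = bce_from_probability(sigmoid(z), y)
    via_z = bce_from_logit(z, y)
    print(f"z={z:+.1f}, y={y}: {via_p:.6f} vs {via_z:.6f}")

# For an extreme logit the sigmoid rounds to exactly 1.0 in floating point, so the
# probability form would need log(0); the logit form still returns a finite value.
extreme_loss = bce_from_logit(40.0, 0)
print(f"Loss for z=40, y=0: {extreme_loss:.4f}")
```

On a cleanly separable dataset like this one the difference is rarely visible, but the logit form is the safer default once predictions become very confident.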


# <img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/write.svg" width="30"/> 2. The Non-Linear Wall: When Lines Don't Work
***

## The Fundamental Limitation of Linear Models

Linear models (including single neurons) can only create **linear decision boundaries**: straight lines in 2D, planes in 3D, hyperplanes in higher dimensions. But what happens when the data isn't linearly separable?

### Real-World Non-Linear Systems

Many physical and biological systems exhibit non-linear behavior:
- **Phase transitions**: Solid/liquid/gas states don't follow linear boundaries
- **Chemical kinetics**: Reaction rates often follow non-linear curves (Michaelis-Menten, Hill equations)
- **Bifurcations**: Small parameter changes can cause dramatic system transitions
- **Classification problems**: Cancer vs. healthy cells may cluster in non-linear patterns

## 2.1 The Circles Dataset: A Non-Linear Challenge
***

Let's create a dataset where one class forms a circle inside another class's ring. No straight line can separate these classes.

<img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/code.svg" width="20"/> **Exercise 5**: Generate concentric circles and attempt linear classification.

In [36]:
# Generate concentric circles dataset
np.random.seed(42)
X_circles, y_circles = make_circles(
    n_samples=200,
    noise=0.1,
    factor=0.5,  # Ratio of inner-circle radius to outer-circle radius
    random_state=42
)

# Convert to PyTorch tensors
X_circles_tensor = torch.FloatTensor(X_circles)
y_circles_tensor = torch.FloatTensor(y_circles).unsqueeze(1)

print("=" * 60)
print("Circles Dataset Generated")
print("=" * 60)
print(f"Shape: {X_circles.shape} (samples × features)")
print(f"Class distribution: {np.bincount(y_circles)}")
print(f"Outer ring (Class 0): {np.sum(y_circles == 0)} samples")
print(f"Inner circle (Class 1): {np.sum(y_circles == 1)} samples")
print("=" * 60)

# Visualize the circles
plot_2d_classification(
    X_circles, y_circles,
    title="Non-Linearly Separable Data: Concentric Circles".upper(),
    show_boundary=False
)

Circles Dataset Generated
Shape: (200, 2) (samples × features)
Class distribution: [100 100]
Outer ring (Class 0): 100 samples
Inner circle (Class 1): 100 samples


## 2.2 The Failure of Linear Classification
***

Let's train our linear neuron on the circles dataset and see what happens. Spoiler: it will fail spectacularly, but understanding *why* is crucial.
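One way to see why the failure is structural rather than a training problem is to brute-force many straight lines on a noise-free version of the data and compare the best one with a single non-linear feature. The sketch below is plain Python under stated assumptions: `make_rings` is a hypothetical stand-in for `make_circles` (no noise, evenly spaced points), and the grid of candidate lines is arbitrary.

```python
import math

def make_rings(n_per_class=60, r_outer=1.0, r_inner=0.5):
    """Noise-free stand-in for make_circles: label 0 = outer ring, label 1 = inner circle."""
    points, labels = [], []
    for k in range(n_per_class):
        theta = 2 * math.pi * k / n_per_class
        points.append((r_outer * math.cos(theta), r_outer * math.sin(theta)))
        labels.append(0)
        points.append((r_inner * math.cos(theta), r_inner * math.sin(theta)))
        labels.append(1)
    return points, labels

def accuracy(predict, points, labels):
    """Fraction of points whose predicted label matches the true label."""
    hits = sum(predict(p) == y for p, y in zip(points, labels))
    return hits / len(labels)

points, labels = make_rings()

# Brute-force search over straight-line classifiers: predict 1 when w.x + b > 0
best_linear = 0.0
for step in range(72):
    angle = 2 * math.pi * step / 72
    w1, w2 = math.cos(angle), math.sin(angle)
    for b_step in range(-30, 31):
        b = b_step / 20
        acc = accuracy(lambda p: int(w1 * p[0] + w2 * p[1] + b > 0), points, labels)
        best_linear = max(best_linear, acc)

# One non-linear feature, the squared radius, separates the classes with one threshold
radial = accuracy(lambda p: int(p[0] ** 2 + p[1] ** 2 < 0.75 ** 2), points, labels)

print(f"Best straight line: {best_linear:.1%}")
print(f"Squared-radius threshold: {radial:.1%}")
```

A line can do somewhat better than chance by slicing off part of the outer ring, but it cannot get close to perfect, whereas thresholding the squared radius classifies every point correctly. This is the kind of feature a hidden layer has to learn to approximate.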

<img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/code.svg" width="20"/> **Exercise 6**: Train a linear classifier on non-linear data and observe the failure.

In [41]:
# Exercise 6: Train a linear model on the circles dataset

###################
# TODO: COMPLETE THE CODE BELOW
# Train a linear classifier on non-linear data and observe the failure

# Train a linear model on the circles dataset
torch.manual_seed(42)
linear_model_circles = nn.Linear(in_features=2, out_features=1)
criterion = nn.BCELoss()
optimizer = optim.SGD(linear_model_circles.parameters(), lr=0.01)

losses_linear_circles = []
epochs_circles = 500

print("Attempting to fit a straight line to circular data...")
print("=" * 60)

for epoch in tqdm(range(epochs_circles), desc="Training"):
    logits = linear_model_circles(X_circles_tensor)
    predictions = torch.sigmoid(logits)
    loss = criterion(predictions, y_circles_tensor)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    losses_linear_circles.append(loss.item())

    if (epoch + 1) % 50 == 0:
        print(f"Epoch {epoch+1:3d} | Loss: {loss.item():.4f}")


Attempting to fit a straight line to circular data...


Training:  52%|█████▏    | 258/500 [00:00<00:00, 2574.11it/s]

Epoch  50 | Loss: 0.7185
Epoch 100 | Loss: 0.7163
Epoch 150 | Loss: 0.7144
Epoch 200 | Loss: 0.7126
Epoch 250 | Loss: 0.7110
Epoch 300 | Loss: 0.7096
Epoch 350 | Loss: 0.7083
Epoch 400 | Loss: 0.7071
Epoch 450 | Loss: 0.7060


Training: 100%|██████████| 500/500 [00:00<00:00, 2424.73it/s]

Epoch 500 | Loss: 0.7050





In [42]:
# Calculate accuracy
with torch.no_grad():
    final_logits = linear_model_circles(X_circles_tensor)
    final_preds = torch.sigmoid(final_logits)
    predicted_classes = (final_preds > 0.5).int().squeeze()
    accuracy_linear = (predicted_classes == y_circles_tensor.squeeze()).float().mean()

print(f"Final Loss: {losses_linear_circles[-1]:.4f}")
print(f"Accuracy: {accuracy_linear.item()*100:.1f}%")
print(f"\n❌ Accuracy ≈ 50% = Random guessing!")
print("The linear model cannot learn the circular pattern.")

Final Loss: 0.7050
Accuracy: 47.5%

❌ Accuracy ≈ 50% = Random guessing!
The linear model cannot learn the circular pattern.


In [43]:
# Visualize the failed attempt
weights_circles_linear = linear_model_circles.weight.detach()[0]
bias_circles_linear = linear_model_circles.bias.detach().item()

plot_2d_classification(
    X_circles, y_circles,
    weights=weights_circles_linear,
    bias=bias_circles_linear,
    title="Linear Model Failure: Straight Line Cannot Fit Circles".upper(),
    show_boundary=True
)

# <img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/write.svg" width="30"/> 3. The Neural Solution: Multi-Layer Perceptrons
***

## Breaking Through the Linear Barrier

The solution: **stack multiple neurons in layers** and introduce **non-linear activation functions** between them. This creates a **Multi-Layer Perceptron (MLP)**, which, given enough hidden units, can learn curved decision boundaries that no single neuron can represent.

> <img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/docs.svg" width="20"/> **Definition**: A **Multi-Layer Perceptron (MLP)** is a feedforward neural network with:
> - An **input layer** (receives features)
> - One or more **hidden layers** (perform transformations)
> - An **output layer** (produces predictions)
> - **Non-linear activation functions** between layers (enable non-linear mappings)

<div align="center">
  <img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/ann.png" width="60%">
</div>

### Why Multiple Layers Work

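**Universal Approximation Theorem**: An MLP with a single hidden layer, a suitable non-linear activation, and enough neurons can approximate any continuous function on a bounded (compact) input region to arbitrary precision. The theorem guarantees that such a network exists; it does not say how many neurons are needed or that training will find it.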

**Intuition**:
- First hidden layer learns **feature combinations** (for circles, a piecewise-linear approximation of a radial feature such as $x_1^2 + x_2^2$)
- Subsequent layers combine these features to create complex boundaries
- Non-linear activations allow "bending" of decision surfaces
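The role of the activation can be checked with a few lines of arithmetic: without it, two stacked linear layers collapse into one linear layer with $W = W_2 W_1$ and $b = W_2 b_1 + b_2$, so depth alone adds nothing. The sketch below uses plain Python lists and made-up parameters for a 2 → 3 → 1 network; all helper names are illustrative.

```python
def matvec(W, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Matrix-matrix product for list-of-rows matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def add(u, v):
    """Element-wise vector sum."""
    return [a + b for a, b in zip(u, v)]

def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, a) for a in v]

# Made-up parameters for a 2 -> 3 -> 1 network
W1 = [[1.0, -2.0], [0.5, 0.5], [-1.0, 1.0]]
b1 = [0.0, 1.0, -0.5]
W2 = [[2.0, -1.0, 0.5]]
b2 = [0.25]
x = [1.0, 2.0]

# Two linear layers with no activation in between...
stacked = add(matvec(W2, add(matvec(W1, x), b1)), b2)

# ...equal a single linear layer with W = W2 W1 and b = W2 b1 + b2
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)
collapsed = add(matvec(W, x), b)

# Inserting ReLU between the layers breaks the equivalence
with_relu = add(matvec(W2, relu(add(matvec(W1, x), b1))), b2)

print("stacked:  ", stacked)
print("collapsed:", collapsed)
print("with ReLU:", with_relu)
```

The first two results coincide for every input, which is why a network of purely linear layers still draws a straight decision boundary.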

## 3.1 Common Activation Functions
***

Activation functions introduce non-linearity. Common choices:

| Activation | Formula | Use Case |
|-----------|---------|----------|
| **Sigmoid** | $$\sigma(x) = \frac{1}{1+e^{-x}}$$ | Output layer (probabilities) |
| **ReLU** | $$\text{ReLU}(x) = \max(0, x)$$ | Hidden layers (fast, effective) |
| **Tanh** | $$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$ | Hidden layers (zero-centered) |
| **GeLU** | $$\text{GeLU}(x) = x \cdot \Phi(x)$$ | Modern architectures (smooth; $\Phi$ is the standard normal CDF) |

<img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/reminder.svg" width="20"/> **Choosing Activations**:
- Use **ReLU** or **GeLU** in hidden layers (default choice)
- Use **sigmoid** for binary classification output
- Use **softmax** for multi-class classification output
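The formulas in the table can be evaluated directly. The following is a scalar, plain-Python sketch meant only to show how the functions behave around zero; in the notebook itself the PyTorch modules (`nn.ReLU`, `nn.Sigmoid`) do this work on whole tensors. GeLU is written with the error function, using $\Phi(x) = \tfrac{1}{2}\left(1 + \operatorname{erf}(x/\sqrt{2})\right)$.

```python
import math

def sigmoid(x):
    """1 / (1 + e^{-x}): squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """max(0, x): passes positive values, zeroes out negative ones."""
    return max(0.0, x)

def tanh(x):
    """(e^x - e^{-x}) / (e^x + e^{-x}): zero-centered, range (-1, 1)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def gelu(x):
    """x * Phi(x), with Phi the standard normal CDF written via the error function."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.3f}  relu={relu(x):.3f}  "
          f"tanh={tanh(x):.3f}  gelu={gelu(x):.3f}")
```

Note how ReLU is exactly zero for negative inputs, while GeLU dips slightly below zero before flattening out.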

## 3.2 Building an MLP in PyTorch
***

Let's build a simple MLP to solve the circles problem:

**Architecture**:
```
Input (2D) → Hidden (16 neurons, ReLU) → Hidden (8 neurons, ReLU) → Output (1 neuron, Sigmoid)
```

<img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/code.svg" width="20"/> **Exercise 7**: Implement an MLP classifier for the circles dataset.

In [58]:
# Exercise 7: Build a Multi-Layer Perceptron (MLP)

###################
# TODO: COMPLETE THE CODE BELOW
# Create an MLP class with multiple hidden layers

class CircleClassifier(nn.Module):
    """
    Multi-Layer Perceptron for binary classification.

    Architecture:
        Input(2) → Linear(16) → ReLU → Linear(8) → ReLU → Linear(1) → Sigmoid

    This demonstrates the standard PyTorch pattern:
    1. Define layers in __init__
    2. Define forward pass in forward()
    3. Activation functions are applied explicitly
    """

    def __init__(self, input_dim=2, hidden_dim1=16, hidden_dim2=8):
        super(CircleClassifier, self).__init__()

        # Layer 1: Input → First hidden layer
        self.fc1 = nn.Linear(input_dim, hidden_dim1)  # input_dim → hidden_dim1

        # Layer 2: First hidden → Second hidden layer
        self.fc2 = nn.Linear(hidden_dim1, hidden_dim2)  # hidden_dim1 → hidden_dim2

        # Layer 3: Second hidden → Output
        self.fc3 = nn.Linear(hidden_dim2, 1)  # hidden_dim2 → 1 (binary classification)

        # Activation functions
        self.relu = nn.ReLU()      # ReLU for hidden layers
        self.Sigmoid = nn.Sigmoid()   # Sigmoid for output layer

    def forward(self, x):
        """
        Forward pass through the network.

        Flow: x → Linear → ReLU → Linear → ReLU → Linear → Sigmoid → output
        """
        # First hidden layer with ReLU activation
        x = self.fc1(x)
        x = self.relu(x)

        # Second hidden layer with ReLU activation
        x = self.fc2(x)
        x = self.relu(x)

        # Output layer with Sigmoid activation
        x = self.fc3(x)
        x = self.Sigmoid(x)

        return x


In [59]:
# Exercise 7 (continued): Instantiate and inspect the model

###################
# TODO: COMPLETE THE CODE BELOW
# Create an instance of the CircleClassifier and examine its structure

# Instantiate the model
torch.manual_seed(42)
model = CircleClassifier(input_dim=2, hidden_dim1=16, hidden_dim2=8)

print("=" * 60)
print("MLP Architecture")
print("=" * 60)
print(model)
print("=" * 60)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
print(f"Total trainable parameters: {total_params}")
print("\nParameter breakdown:")
for name, param in model.named_parameters():
    print(f"  {name:20s}: {param.shape} = {param.numel()} parameters")

print("=" * 60)


MLP Architecture
CircleClassifier(
  (fc1): Linear(in_features=2, out_features=16, bias=True)
  (fc2): Linear(in_features=16, out_features=8, bias=True)
  (fc3): Linear(in_features=8, out_features=1, bias=True)
  (relu): ReLU()
  (Sigmoid): Sigmoid()
)
Total trainable parameters: 193

Parameter breakdown:
  fc1.weight          : torch.Size([16, 2]) = 32 parameters
  fc1.bias            : torch.Size([16]) = 16 parameters
  fc2.weight          : torch.Size([8, 16]) = 128 parameters
  fc2.bias            : torch.Size([8]) = 8 parameters
  fc3.weight          : torch.Size([1, 8]) = 8 parameters
  fc3.bias            : torch.Size([1]) = 1 parameters
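The parameter breakdown follows a simple rule: a linear layer with `n_in` inputs and `n_out` outputs has `n_in * n_out` weights plus `n_out` biases. A short sketch that reproduces the total for the 2 → 16 → 8 → 1 architecture (the helper name `count_linear_params` is illustrative):

```python
def count_linear_params(in_features, out_features):
    """Weights (out_features x in_features) plus one bias per output unit."""
    return in_features * out_features + out_features

layer_sizes = [2, 16, 8, 1]  # input -> hidden 1 -> hidden 2 -> output

# Pair each layer size with the next one to get (n_in, n_out) per linear layer
per_layer = [
    count_linear_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total = sum(per_layer)

print(f"Per layer: {per_layer}")
print(f"Total: {total}")
```

The activation modules contribute nothing to the count because ReLU and sigmoid have no learnable parameters.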


## 3.3 Training the MLP
***

Now let's train the MLP on the circles dataset and watch it succeed where the linear model failed.

<img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/code.svg" width="20"/> **Exercise 8**: Train the MLP and visualize the non-linear decision boundary.

In [60]:
# Exercise 8: Train the MLP on the circles dataset

###################
# TODO: COMPLETE THE CODE BELOW
# Implement the training loop for the MLP

# Training setup
criterion = nn.BCELoss()  # Binary Cross-Entropy loss
optimizer = optim.Adam(model.parameters(), lr=0.01)  # Adam: adaptive per-parameter step sizes, often converges faster than plain SGD

epochs_mlp = 500
losses_mlp = []

print("Training MLP on circles dataset...")
print("=" * 60)

for epoch in tqdm(range(epochs_mlp), desc="Training MLP"):
    # Forward pass (sigmoid is already in the model)
    predictions = model(X_circles_tensor)

    # Compute loss
    loss = criterion(predictions, y_circles_tensor)

    # Backward pass and optimization
    optimizer.zero_grad()    # Clear previous gradients
    loss.backward()         # Compute gradients
    optimizer.step()    # Update parameters

    losses_mlp.append(loss.item())

    if (epoch + 1) % 100 == 0:
        # Calculate current accuracy
        with torch.no_grad():
            current_preds = model(X_circles_tensor)
            predicted_classes = (current_preds > 0.5).int().squeeze()
            accuracy = (predicted_classes == y_circles_tensor.squeeze()).float().mean()
        print(f"Epoch {epoch+1:3d} | Loss: {loss.item():.4f} | Accuracy: {accuracy.item()*100:.1f}%")

print("=" * 60)
print("Training complete!")


Training MLP on circles dataset...


Training MLP:  16%|█▌        | 78/500 [00:00<00:00, 778.03it/s]

Epoch 100 | Loss: 0.0414 | Accuracy: 99.5%


Training MLP:  54%|█████▍    | 271/500 [00:00<00:00, 908.89it/s]

Epoch 200 | Loss: 0.0160 | Accuracy: 99.5%


Training MLP:  72%|███████▏  | 362/500 [00:00<00:00, 838.27it/s]

Epoch 300 | Loss: 0.0120 | Accuracy: 99.5%


Training MLP:  90%|████████▉ | 448/500 [00:00<00:00, 845.56it/s]

Epoch 400 | Loss: 0.0104 | Accuracy: 99.5%


Training MLP: 100%|██████████| 500/500 [00:00<00:00, 856.85it/s]

Epoch 500 | Loss: 0.0093 | Accuracy: 99.5%
Training complete!





In [61]:
# Exercise 8 (continued): Evaluate the MLP

###################
# TODO: COMPLETE THE CODE BELOW
# Calculate final accuracy and compare with linear model

# Final evaluation
with torch.no_grad():
    final_predictions = model(X_circles_tensor)
    predicted_classes = (final_predictions > 0.5).int().squeeze()
    final_accuracy = (predicted_classes == y_circles_tensor.squeeze()).float().mean()

print(f"Final Loss: {losses_mlp[-1]:.4f}")
print(f"Final Accuracy: {final_accuracy.item()*100:.1f}%")
print(f"\n✓ The MLP successfully learned the circular pattern!")
print(f"  Linear model accuracy: {accuracy_linear.item()*100:.1f}%")
print(f"  MLP accuracy: {final_accuracy.item()*100:.1f}%")
print(f"  Improvement: +{(final_accuracy.item() - accuracy_linear.item())*100:.1f}%")


Final Loss: 0.0093
Final Accuracy: 99.5%

✓ The MLP successfully learned the circular pattern!
  Linear model accuracy: 47.5%
  MLP accuracy: 99.5%
  Improvement: +52.0%


## 3.4 Visualizing the Non-Linear Decision Boundary
***

The true power of MLPs becomes apparent when we visualize the **probability field**: the predicted probability $P(y=1)$ at every point in the feature space. For the circles dataset, this field should show a "well" in the center (low probability) surrounded by a "hill" (high probability).

<img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/reminder.svg" width="20"/> **Physical Analogy**: Think of the probability field as a potential energy surface or phase diagram. The decision boundary (P=0.5) is analogous to a phase transition or separatrix in dynamical systems.
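
The implementation of `plot_2d_classification` is not shown in this notebook, so the following is only a rough sketch of how such a probability field can be computed: evaluate the model on a regular grid and reshape the outputs into a 2D array. The helper name `probability_field`, the grid ranges, and the untrained `toy_model` are assumptions made for illustration; in practice you would pass the trained classifier instead.

```python
import torch
import torch.nn as nn

def probability_field(model, x_range=(-1.5, 1.5), y_range=(-1.5, 1.5), resolution=50):
    """Evaluate P(y=1) on a regular 2D grid (hypothetical helper, not part of the workshop utils)."""
    xs = torch.linspace(x_range[0], x_range[1], resolution)
    ys = torch.linspace(y_range[0], y_range[1], resolution)
    # indexing="xy" requires PyTorch >= 1.10; rows follow y, columns follow x
    grid_x, grid_y = torch.meshgrid(xs, ys, indexing="xy")
    grid_points = torch.stack([grid_x.reshape(-1), grid_y.reshape(-1)], dim=1)
    with torch.no_grad():  # no gradients needed for visualization
        probs = model(grid_points)
    return grid_x, grid_y, probs.reshape(resolution, resolution)

# Untrained stand-in model (assumption): same input/output shapes as a 2D binary classifier
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())

grid_x, grid_y, field = probability_field(toy_model)

# Grid cells close to the decision boundary P = 0.5
boundary_mask = (field - 0.5).abs() < 0.02
print(field.shape)
```

The returned `field` can be handed to any contour plotting routine; the cells flagged by `boundary_mask` approximate the $P = 0.5$ contour.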

In [62]:
print("\n✓ The probability contours clearly 'hug' the circular structure!")
print("  Blue region (P ≈ 0): Inner circle")
print("  Red region (P ≈ 1): Outer ring")
print("  Boundary (P = 0.5): Curved separation")


✓ The probability contours clearly 'hug' the circular structure!
  Blue region (P ≈ 0): Inner circle
  Red region (P ≈ 1): Outer ring
  Boundary (P = 0.5): Curved separation


In [63]:
# Visualize the probability field with contours
plot_2d_classification(
    X_circles, y_circles,
    title="MLP Decision Boundary: Non-Linear Probability Field",
    show_boundary=False,
    model=model,
    show_probabilities=True
)

## 3.5 Comparative Analysis: Linear vs. Non-Linear
***

Let's create a side-by-side comparison to see the dramatic difference between linear and non-linear models.

In [64]:
# Create comprehensive comparison visualization
fig = plot_model_comparison(
    X=X_circles,
    y=y_circles,
    linear_weights=weights_circles_linear,
    linear_bias=bias_circles_linear,
    mlp_model=model,
    losses_linear=losses_linear_circles,
    losses_mlp=losses_mlp,
    accuracy_linear=accuracy_linear.item(),
    accuracy_mlp=final_accuracy.item(),
    width=1400,
    height=450
)
fig.show()

print("\n" + "=" * 60)
print("KEY OBSERVATIONS")
print("=" * 60)
print("1. Linear Model: Straight boundary, ~50% accuracy (random)")
print("2. MLP Model: Curved boundary perfectly hugs the circle")
print("3. Loss: MLP converges to near-zero, linear plateaus high")
print("=" * 60)


KEY OBSERVATIONS
1. Linear Model: Straight boundary, ~50% accuracy (random)
2. MLP Model: Curved boundary perfectly hugs the circle
3. Loss: MLP converges to near-zero, linear plateaus high


# <img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/write.svg" width="30"/> 4. Summary and Key Takeaways
***

## What We've Learned

### 1. **Single Neurons = Linear Regression + Activation**
   - A neuron computes $\hat{y} = \sigma(\mathbf{w}^T \mathbf{x} + b)$
   - This is logistic regression for binary classification
   - Decision boundary is a straight line (hyperplane)
   - Works perfectly for linearly separable data
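
As a minimal sketch of the formula above, a single neuron can be written in a few lines of NumPy. The weights, bias, and sample points are arbitrary values chosen for illustration, not parameters from the workshop.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    """Single neuron: sigma(w^T x + b), applied to each row of x."""
    return sigmoid(x @ w + b)

# Illustrative parameters (assumption): the boundary w^T x + b = 0 is the line x1 = x2
w = np.array([1.0, -1.0])
b = 0.0

points = np.array([
    [2.0, 0.0],   # below the line x1 = x2
    [0.0, 2.0],   # above the line
    [1.0, 1.0],   # exactly on the line, so P = 0.5
])

probs = neuron(points, w, b)
labels = (probs > 0.5).astype(int)  # same 0.5 threshold used in the exercises
print(probs)
print(labels)
```

Points on opposite sides of the line $\mathbf{w}^T \mathbf{x} + b = 0$ receive probabilities on opposite sides of 0.5, which is why the decision boundary of a single neuron is always a hyperplane.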

### 2. **Linear Models Have Fundamental Limitations**
   - Cannot learn non-linear patterns (circles, spirals, XOR, etc.)
   - Accuracy stays near chance on data such as the concentric circles (47.5% in this workshop)
   - No amount of training can overcome this structural limitation
   - Real-world data often exhibits non-linear structure
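
The structural limitation can be checked directly on XOR, the smallest of the patterns listed above. The sketch below brute-forces a coarse grid of linear classifiers (the grid itself is an arbitrary choice for illustration, not a proof) and reports the best accuracy any of them reaches.

```python
import itertools
import numpy as np

# The four XOR points and their labels
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_xor = np.array([0, 1, 1, 0])

# Coarse grid of candidate weights and biases: -2.0, -1.5, ..., 2.0
candidates = np.linspace(-2.0, 2.0, 9)

best_accuracy = 0.0
for w1, w2, b in itertools.product(candidates, repeat=3):
    # Linear decision rule: predict class 1 when w^T x + b > 0
    preds = (X_xor @ np.array([w1, w2]) + b > 0).astype(int)
    best_accuracy = max(best_accuracy, float((preds == y_xor).mean()))

print(f"Best linear accuracy on XOR: {best_accuracy:.2f}")
```

No straight line classifies more than three of the four XOR points correctly, so the search tops out at 0.75 however fine the grid is made.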

### 3. **Multi-Layer Perceptrons Break the Linear Barrier**
   - Stack multiple layers with non-linear activations
   - Hidden layers learn feature combinations
   - Can approximate any continuous function on a bounded domain, given enough hidden units
   - Decision boundaries "bend" to fit complex patterns
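
One way to see why the non-linear activations matter: without them, stacked linear layers collapse into a single linear layer. The sketch below checks this numerically with random matrices; the layer sizes and the random seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two linear layers with illustrative sizes: 2 -> 4 -> 1
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)
x = rng.normal(size=(5, 2))  # batch of 5 two-dimensional inputs

# Stacking two linear layers with no activation in between...
two_linear_layers = (x @ W1.T + b1) @ W2.T + b2

# ...equals one linear layer with W = W2 W1 and b = W2 b1 + b2
W_eq = W2 @ W1
b_eq = W2 @ b1 + b2
one_linear_layer = x @ W_eq.T + b_eq

collapses = np.allclose(two_linear_layers, one_linear_layer)
print(f"Two linear layers equal one linear layer: {collapses}")

# Inserting a ReLU between the layers breaks this equivalence in general
with_relu = np.maximum(x @ W1.T + b1, 0.0) @ W2.T + b2
print(f"ReLU network equals the linear layer: {np.allclose(with_relu, one_linear_layer)}")
```

Depth alone therefore adds no expressive power; it is the activation between layers that lets the decision boundary bend.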

### 4. **The PyTorch Standard Operating Procedure**
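   ```python
   import torch.nn as nn
   import torch.optim as optim

   class Model(nn.Module):
       def __init__(self):
           super().__init__()
           # Define layers (sizes here are illustrative)
           self.net = nn.Sequential(
               nn.Linear(2, 16), nn.ReLU(),
               nn.Linear(16, 1), nn.Sigmoid(),
           )

       def forward(self, x):
           # Define forward pass
           return self.net(x)

   model = Model()
   optimizer = optim.Adam(model.parameters())
   criterion = nn.BCELoss()  # Expects probabilities in [0, 1]

   # `inputs`, `targets`, and `epochs` come from your own data and training setup
   for epoch in range(epochs):
       predictions = model(inputs)             # Forward pass
       loss = criterion(predictions, targets)  # Compute loss
       optimizer.zero_grad()                   # Clear previous gradients
       loss.backward()                         # Compute gradients
       optimizer.step()                        # Update parameters
   ```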

## Physical and Scientific Parallels

| Concept | Neural Network | Physical System |
|---------|---------------|-----------------|
| **Decision Boundary** | P(y=1) = 0.5 surface | Phase transition, separatrix |
| **Probability Field** | P(y=1) at each point | Potential energy, field strength |
| **Gradient Descent** | Parameter optimization | Variational methods, energy minimization |
| **Non-linearity** | ReLU, Sigmoid, Tanh | Saturation, bifurcations, hysteresis |
| **Hidden Layers** | Feature extraction | Latent variables, order parameters |
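
To make the gradient descent row concrete, here is a minimal sketch of gradient descent relaxing a one-dimensional harmonic potential $E(x) = \tfrac{1}{2} k (x - x_0)^2$ to its minimum. The spring constant, equilibrium position, starting point, and learning rate are all arbitrary values chosen for illustration.

```python
def energy(x, k=2.0, x0=1.5):
    """Harmonic potential energy with minimum at x0."""
    return 0.5 * k * (x - x0) ** 2

def energy_gradient(x, k=2.0, x0=1.5):
    """Derivative dE/dx of the harmonic potential."""
    return k * (x - x0)

x = -3.0             # Initial "parameter" value
learning_rate = 0.1  # Step size

for step in range(100):
    # Same update rule an optimizer applies to network weights:
    # move against the gradient of the quantity being minimized
    x -= learning_rate * energy_gradient(x)

print(f"x after descent: {x:.6f}, energy: {energy(x):.2e}")
```

Training a network follows the same rule, with the loss playing the role of the energy and the weights playing the role of $x$.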


**Next Steps**:
1. Try modifying the MLP architecture (more layers, different activations)
2. Experiment with the `make_moons` dataset (another non-linear challenge)
3. Visualize how hidden layer features evolve during training
4. Explore other datasets from your scientific domain
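
For step 3, one possible approach is PyTorch's forward hooks, which capture the output of a layer on every forward pass. The sketch below uses a stand-in `nn.Sequential` network with assumed layer sizes rather than the workshop's classifier classes, and the names `captured` and `save_activation` are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in MLP (assumption): 2 -> 16 -> 8 -> 1 with ReLU activations
mlp = nn.Sequential(
    nn.Linear(2, 16), nn.ReLU(),
    nn.Linear(16, 8), nn.ReLU(),
    nn.Linear(8, 1), nn.Sigmoid(),
)

captured = {}  # Layer name -> most recent activations

def save_activation(name):
    """Build a hook that stores a detached copy of the layer output."""
    def hook(module, inputs, output):
        captured[name] = output.detach().clone()
    return hook

# Attach the hook to the second ReLU (index 3 in the Sequential)
handle = mlp[3].register_forward_hook(save_activation("hidden2"))

x = torch.randn(10, 2)  # Batch of 10 random 2D points
_ = mlp(x)              # The forward pass triggers the hook
handle.remove()         # Detach the hook once it is no longer needed

print(captured["hidden2"].shape)
```

Calling the model inside the training loop with the hook attached, and storing `captured["hidden2"]` every few epochs, gives a sequence of hidden representations that can be plotted over time.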

<img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/reminder.svg" width="20"/> **Remember**: Neural networks are tools, not magic. Understanding their mechanics empowers you to apply them rigorously to scientific problems.

## 4.1 Bonus Challenge: The Moons Dataset
***

Ready to test your understanding? The `make_moons` dataset presents another non-linear challenge: two interleaving half-circles.

<img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/code.svg" width="20"/> **Bonus Exercise**: Apply what you've learned to solve the moons problem!

In [65]:
# Generate moons dataset
np.random.seed(42)
X_moons, y_moons = make_moons(n_samples=200, noise=0.15, random_state=42)

# Convert to tensors
X_moons_tensor = torch.FloatTensor(X_moons)
y_moons_tensor = torch.FloatTensor(y_moons).unsqueeze(1)

print("=" * 60)
print("Moons Dataset Generated")
print("=" * 60)
print(f"Shape: {X_moons.shape}")
print(f"Class distribution: {np.bincount(y_moons)}")
print("=" * 60)

# Visualize the moons
plot_2d_classification(
    X_moons, y_moons,
    title="Bonus Challenge: Interleaving Moons",
    show_boundary=False
)

# TODO: Your turn!
# 1. Create a new MLP class called MoonClassifier
# 2. Train it on X_moons_tensor and y_moons_tensor
# 3. Visualize the decision boundary
# 4. Compare linear vs. MLP performance

print("\n💡 Hint: Use the same structure as CircleClassifier!")
print("💡 Try experimenting with different architectures:")
print("   - Change the number of hidden layers")
print("   - Change the number of neurons per layer")
print("   - Try different activation functions (Tanh, LeakyReLU, etc.)")

Moons Dataset Generated
Shape: (200, 2)
Class distribution: [100 100]

💡 Hint: Use the same structure as CircleClassifier!
💡 Try experimenting with different architectures:
   - Change the number of hidden layers
   - Change the number of neurons per layer
   - Try different activation functions (Tanh, LeakyReLU, etc.)


### Bonus Solution

Here's one possible solution for the moons dataset:

In [66]:
# Solution: MLP for moons dataset
class MoonClassifier(nn.Module):
    """MLP classifier for the moons dataset"""

    def __init__(self, input_dim=2, hidden_dim1=16, hidden_dim2=8):
        super(MoonClassifier, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim1)
        self.fc2 = nn.Linear(hidden_dim1, hidden_dim2)
        self.fc3 = nn.Linear(hidden_dim2, 1)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.sigmoid(self.fc3(x))
        return x

# Create and train model
torch.manual_seed(42)
model_moons = MoonClassifier()
criterion = nn.BCELoss()
optimizer = optim.Adam(model_moons.parameters(), lr=0.01)

losses_moons = []
for epoch in range(500):
    predictions = model_moons(X_moons_tensor)
    loss = criterion(predictions, y_moons_tensor)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    losses_moons.append(loss.item())

# Evaluate
with torch.no_grad():
    final_predictions = model_moons(X_moons_tensor)
    predicted_classes = (final_predictions > 0.5).int().squeeze()
    accuracy_moons = (predicted_classes == y_moons_tensor.squeeze()).float().mean()

print("=" * 60)
print("Moons Challenge Results")
print("=" * 60)
print(f"Final Loss: {losses_moons[-1]:.4f}")
print(f"Accuracy: {accuracy_moons.item()*100:.1f}%")
print("=" * 60)

# Visualize result
plot_2d_classification(
    X_moons, y_moons,
    title="MLP Solution: Moons Dataset",
    show_boundary=False,
    model=model_moons,
    show_probabilities=True
)

print("\n✓ Challenge complete! The MLP successfully learned the moon pattern.")

Moons Challenge Results
Final Loss: 0.0048
Accuracy: 100.0%

✓ Challenge complete! The MLP successfully learned the moon pattern.
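
The solution above leaves out step 4 of the TODO, the comparison against a linear model. A minimal sketch of such a baseline follows, assuming the same dataset settings and optimizer configuration as the MLP; the names `linear_baseline`, `losses_lin`, and `accuracy_lin` are introduced here for illustration so they do not overwrite the notebook's variables.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import make_moons

# Regenerate the moons data with the same settings as the exercise
X_np, y_np = make_moons(n_samples=200, noise=0.15, random_state=42)
X_t = torch.tensor(X_np, dtype=torch.float32)
y_t = torch.tensor(y_np, dtype=torch.float32).unsqueeze(1)

# Linear baseline: a single neuron (logistic regression)
torch.manual_seed(42)
linear_baseline = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())
criterion_lin = nn.BCELoss()
optimizer_lin = optim.Adam(linear_baseline.parameters(), lr=0.01)

losses_lin = []
for epoch in range(500):
    loss_lin = criterion_lin(linear_baseline(X_t), y_t)

    optimizer_lin.zero_grad()
    loss_lin.backward()
    optimizer_lin.step()

    losses_lin.append(loss_lin.item())

# Evaluate with the same 0.5 threshold used for the MLP
with torch.no_grad():
    preds_lin = (linear_baseline(X_t) > 0.5).float()
    accuracy_lin = (preds_lin == y_t).float().mean().item()

print(f"Linear baseline accuracy on moons: {accuracy_lin*100:.1f}%")
```

Comparing `accuracy_lin` with `accuracy_moons` shows how much of the moons problem a straight boundary can capture and how much requires the hidden layers.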
