A fully manual implementation of a feedforward neural network that learns a 2D binary classification boundary using the Adam optimiser. No deep learning frameworks are used, meaning every forward pass, analytical backpropagation step, and weight update is written from scratch in NumPy.
Adapted from the worked example in Higham & Higham (2019).
Overview
Given a random set of labelled 2D points whose classes are defined by a Voronoi tessellation, the network learns to approximate the class boundary. Training progress is captured as an animation, showing the decision region evolve from noise to a well-fitted boundary.
This is an example output:
(Animation frames: at step 2 500 the loss is ≈ 0.228 and the network is still undecided; by step 21 700 the loss is ≈ 0.00003 and the boundary is tightly fitted.)
Results may vary due to the random generation of Voronoi patches and training samples.
Data Generation
Training points $\mathbf{x}_i \in [0,1]^2$ are drawn uniformly at random. Labels are assigned by a Voronoi rule: three green seeds $G = \{g_1, g_2, g_3\}$ and three blue seeds $B = \{b_1, b_2, b_3\}$ are fixed, and each point is assigned the colour of its nearest seed:

$$
\text{colour}(\mathbf{x}_i) = \begin{cases} \text{green} & \text{if } \min_j \|\mathbf{x}_i - g_j\| \le \min_j \|\mathbf{x}_i - b_j\|, \\ \text{blue} & \text{otherwise.} \end{cases}
$$
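The labelling rule can be sketched in NumPy as follows; the seed positions, sample count, and variable names here are illustrative, not the script's actual values:

```python
import numpy as np

# Illustrative sketch of the Voronoi labelling rule: three green and three
# blue seeds in the unit square (positions drawn at random here).
rng = np.random.default_rng(42)
greens = rng.random((3, 2))    # g_1 .. g_3
blues = rng.random((3, 2))     # b_1 .. b_3
points = rng.random((200, 2))  # training points x_i in [0, 1]^2

# Distance from every point to its nearest seed of each colour.
d_green = np.linalg.norm(points[:, None, :] - greens[None, :, :], axis=2).min(axis=1)
d_blue = np.linalg.norm(points[:, None, :] - blues[None, :, :], axis=2).min(axis=1)

# Label = colour of the nearest seed (1 = green, 0 = blue).
labels = np.where(d_green <= d_blue, 1, 0)
```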
All weights and biases are concatenated into a single parameter vector $\mathbf{w} \in \mathbb{R}^p$ (where $p = 2{\cdot}2 + 8{\cdot}2 + 2{\cdot}8 + 2 + 8 + 2 = 48$) for uniform treatment during optimisation.
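A minimal sketch of that pack/unpack step, assuming the layer shapes implied by the count above (2×2, 8×2 and 2×8 weight matrices with bias vectors of sizes 2, 8 and 2); the function names are hypothetical:

```python
import numpy as np

# Hypothetical flattening of the weight matrices and bias vectors into one
# parameter vector w, and the inverse operation.
shapes = [(2, 2), (8, 2), (2, 8)]  # W1, W2, W3
bias_sizes = [2, 8, 2]             # b1, b2, b3

def pack(Ws, bs):
    """Concatenate all weights and biases into a single 1-D vector."""
    return np.concatenate([W.ravel() for W in Ws] + list(bs))

def unpack(w):
    """Recover the weight matrices and bias vectors from the flat vector."""
    Ws, bs, i = [], [], 0
    for r, c in shapes:
        Ws.append(w[i:i + r * c].reshape(r, c))
        i += r * c
    for n in bias_sizes:
        bs.append(w[i:i + n])
        i += n
    return Ws, bs
```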
Loss Function
The mean squared error over the $n$ training pairs $(\mathbf{x}_i, \mathbf{y}_i)$ is

$$
\mathcal{L}(\mathbf{w}) = \frac{1}{n} \sum_{i=1}^{n} \tfrac{1}{2} \left\| \mathbf{y}_i - \mathbf{a}^{(3)}(\mathbf{x}_i) \right\|^2 .
$$
The factor $\tfrac{1}{2}$ is a standard convenience: it cancels the factor of 2 brought down by the exponent when differentiating.
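This is easiest to see for a single scalar output, where differentiating the half-squared error leaves no stray constant:

$$
\frac{\mathrm{d}}{\mathrm{d}a}\, \tfrac{1}{2}(a - y)^2 = a - y,
$$

so the delta and gradient expressions in the backpropagation section carry no extra factor of 2.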
Backpropagation
Gradients are computed analytically via backpropagation. Defining the pre-activations $\mathbf{z}^{(k)}$ and post-activations $\mathbf{a}^{(k)}$ for a single sample $\mathbf{x}$:

$$
\mathbf{z}^{(k)} = W_k\, \mathbf{a}^{(k-1)} + \mathbf{b}_k, \qquad \mathbf{a}^{(k)} = \sigma\!\left(\mathbf{z}^{(k)}\right), \qquad k = 1, 2, 3.
$$
The error signals (deltas), which arise from successive applications of the chain rule to the inner derivatives, are propagated backwards with respect to the correct label $\mathbf{y}$:

$$
\boldsymbol{\delta}^{(3)} = \sigma'\!\left(\mathbf{z}^{(3)}\right) \odot \left(\mathbf{a}^{(3)} - \mathbf{y}\right), \qquad \boldsymbol{\delta}^{(k)} = \sigma'\!\left(\mathbf{z}^{(k)}\right) \odot \left(W_{k+1}^{\top} \boldsymbol{\delta}^{(k+1)}\right), \qquad k = 2, 1,
$$
where $\odot$ is the element-wise (Hadamard) product: given two vectors of the same length, it multiplies corresponding entries rather than computing a dot product or matrix product. It appears here because the chain rule requires multiplying the upstream error signal $\boldsymbol{\delta}$ by the local derivative $\sigma'(\mathbf{z})$ at each neuron independently — a scalar multiplication per neuron, not a linear combination across neurons.
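A toy NumPy illustration of the difference, with arbitrarily chosen values:

```python
import numpy as np

# Hadamard (elementwise) product versus dot product, as in the delta recursion.
delta = np.array([0.5, -1.0, 2.0])        # upstream error signal
sigma_prime = np.array([0.2, 0.1, 0.25])  # local derivative at each neuron

hadamard = delta * sigma_prime  # elementwise: array([0.1, -0.1, 0.5])
dot = delta @ sigma_prime       # scalar: 0.1 - 0.1 + 0.5 = 0.5
```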
The gradient contributions for each layer $k \in \{1, 2, 3\}$ (indexing the three weight matrices $W_1, W_2, W_3$) are then

$$
\frac{\partial \mathcal{L}}{\partial W_k} = \boldsymbol{\delta}^{(k)} \left(\mathbf{a}^{(k-1)}\right)^{\top}, \qquad \frac{\partial \mathcal{L}}{\partial \mathbf{b}_k} = \boldsymbol{\delta}^{(k)},
$$
with the convention $\mathbf{a}^{(0)} = \mathbf{x}$.
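The forward pass and backward recursion can be sketched for a single sample as follows, assuming the activation is the logistic sigmoid (as in Higham & Higham's example, so $\sigma'(\mathbf{z}) = \mathbf{a}(1 - \mathbf{a})$) and a 2 → 2 → 8 → 2 layout; all names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Ws, bs):
    """Forward pass: return pre-activations z^(k) and post-activations a^(k)."""
    a, zs, acts = x, [], [x]  # convention a^(0) = x
    for W, b in zip(Ws, bs):
        z = W @ a + b
        a = sigmoid(z)
        zs.append(z)
        acts.append(a)
    return zs, acts

def backprop(x, y, Ws, bs):
    """Analytical gradients dL/dW_k and dL/db_k for one sample (x, y)."""
    zs, acts = forward(x, Ws, bs)
    # Output-layer delta: sigma'(z) ⊙ (a - y), using sigma'(z) = a(1 - a).
    delta = acts[-1] * (1 - acts[-1]) * (acts[-1] - y)
    dWs, dbs = [], []
    for k in reversed(range(len(Ws))):
        dWs.insert(0, np.outer(delta, acts[k]))  # delta^(k) (a^(k-1))^T
        dbs.insert(0, delta.copy())              # delta^(k)
        if k > 0:
            # Propagate backwards: sigma'(z^(k)) ⊙ (W_{k+1}^T delta^(k+1)).
            delta = acts[k] * (1 - acts[k]) * (Ws[k].T @ delta)
    return dWs, dbs
```

The per-sample gradients are averaged over the training set to obtain $\nabla\mathcal{L}$; a finite-difference check is a quick way to validate the analytical derivatives.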
Optimiser — Adam (Adaptive Moment Estimation)
The ordinary gradient-descent update $\mathbf{w} \leftarrow \mathbf{w} - \eta \nabla\mathcal{L}$ uses the same step size for every parameter. Adam maintains per-parameter running estimates of the first and second moments of the gradient, effectively giving each weight its own adaptive learning rate. With $\mathbf{g}_t = \nabla\mathcal{L}(\mathbf{w}_{t-1})$ and all operations taken elementwise:

$$
\begin{aligned}
\mathbf{m}_t &= \beta_1 \mathbf{m}_{t-1} + (1 - \beta_1)\, \mathbf{g}_t, & \mathbf{v}_t &= \beta_2 \mathbf{v}_{t-1} + (1 - \beta_2)\, \mathbf{g}_t^{2}, \\
\hat{\mathbf{m}}_t &= \frac{\mathbf{m}_t}{1 - \beta_1^{t}}, & \hat{\mathbf{v}}_t &= \frac{\mathbf{v}_t}{1 - \beta_2^{t}}, \\
\mathbf{w}_t &= \mathbf{w}_{t-1} - \eta\, \frac{\hat{\mathbf{m}}_t}{\sqrt{\hat{\mathbf{v}}_t} + \varepsilon}. & &
\end{aligned}
$$
Intuition. The denominator $\sqrt{\hat{\mathbf{v}}_t}$ is a running estimate of the gradient's root-mean-square. Dividing by it normalises the effective step: parameters whose gradients are consistently large receive a smaller update (automatic damping), while parameters with small, uncertain gradients receive a larger relative update. The numerator $\hat{\mathbf{m}}_t$ is a momentum term that smooths the noisy gradient signal, helping the optimiser maintain direction across iterations.
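A single Adam update can be written as one NumPy step. The function name is hypothetical and the defaults mirror the hyperparameter table in this README; this is an illustrative sketch, not the script's actual code:

```python
import numpy as np

def adam_step(w, g, m, v, t, eta=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameter vector w given gradient g.

    m, v are the running first/second moment estimates; t is the 1-based
    iteration count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * g      # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * g**2   # second-moment estimate, elementwise
    m_hat = m / (1 - beta1**t)           # bias-corrected moments
    v_hat = v / (1 - beta2**t)
    w = w - eta * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

On the first iteration the bias correction makes the update approximately $-\eta\,\mathrm{sign}(\mathbf{g}_1)$, which is the "normalised step" behaviour described above.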
Hyperparameters used
| Parameter | Value | Meaning |
| --- | --- | --- |
| $\eta$ | $10^{-3}$ | global learning rate |
| $\beta_1$ | $0.9$ | exponential decay for 1st moment |
| $\beta_2$ | $0.999$ | exponential decay for 2nd moment |
| $\varepsilon$ | $10^{-8}$ | numerical stability constant |
| tol | $10^{-7}$ | convergence threshold on $\|\mathbf{g}_t\|$ |
| iter | $10^5$ | maximum number of iterations |
Usage
The script will:

1. Display the true Voronoi boundary with the random seeds and training points.
2. Run Adam training, printing the loss and gradient norm every 1 000 steps.
3. Open an animated plot showing the network's decision region evolving over training.
Reference
Higham, C. F., & Higham, D. J. (2019). Deep learning: An introduction for applied mathematicians. *SIAM Review*, 61(4), 860–891. https://doi.org/10.1137/18M1165748
About
A basic 2D decision-boundary classifier neural network, implemented from scratch in Python for study purposes.