libfpga/fpga-neuron

fpga-neuron

A neural network from first principles to FPGA hardware: train it, quantize it, synthesize it, prove it.

This is an educational repository. In ~200 lines of dependency-free Python and ~100 lines of Verilog-2005, it walks the complete journey: gradient descent on the CPU, fixed-point quantization, a hardware neuron, a synthesizable network, and an exhaustive testbench that proves the silicon-ready design matches the math, on every one of 293 test vectors.

The punchline is worth spoiling: the finished network is 832 LUT4s and zero flip-flops. A neural network that is pure combinational logic: inputs go in, the answer falls out, no clock required.

The problem: XOR

XOR is the classic "why we need hidden layers" function: no single straight line separates its classes, so a single neuron can't learn it. A hidden layer bends the space; four ReLU neurons are plenty.

| x0 | x1 | y |
|----|----|---|
| 0  | 0  | 0 |
| 0  | 1  | 1 |
| 1  | 0  | 1 |
| 1  | 1  | 0 |
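To make the "a hidden layer bends the space" claim concrete, here is a hand-built (not trained) ReLU network that computes XOR with only two hidden neurons. The weights are picked by hand for illustration and are not the ones the repository learns.

```python
def relu(v):
    return max(0.0, v)

def xor_by_hand(x0, x1):
    # Hidden layer: two ReLU neurons looking at the same input sum.
    h0 = relu(x0 + x1)        # 0, 1, 1, 2 over the truth table
    h1 = relu(x0 + x1 - 1.0)  # fires only when both inputs are 1
    # Output: subtracting 2*h1 pulls the (1, 1) corner back down to 0.
    return h0 - 2.0 * h1

for x0, x1 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x0, x1, xor_by_hand(x0, x1))
```

Two neurons are the minimum for this construction; training with four leaves some slack for unlucky initializations.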

Step 1: what a neuron is

[Figure: one artificial neuron]

A neuron multiplies each input by a learned weight, sums the products, adds a learned bias, and applies a nonlinearity (here ReLU: max(0, v)). In hardware that's multipliers, an adder, a comparator and a mux. Nothing else. The mystique of "neural" evaporates pleasantly when you draw the schematic.
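In floating point, the description above fits in a few lines. This is a sketch with a made-up function name, not code from the repository:

```python
def neuron(inputs, weights, bias):
    # Multiply each input by its weight, sum the products, add the bias.
    v = sum(x * w for x, w in zip(inputs, weights)) + bias
    # ReLU: compare against zero (comparator) and pick v or 0 (mux).
    return v if v > 0.0 else 0.0

print(neuron([1.0, 1.0], [1.0, 1.0], -1.0))  # 1.0
print(neuron([0.0, 0.0], [1.0, 1.0], -1.0))  # 0.0
```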

Step 2: train it (py/train.py)

Pure-Python gradient descent, no frameworks, because at this size nothing is hidden behind a library call. Forward pass, mean-squared error, backpropagation by hand: ~40 lines.
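A minimal sketch of what such a training loop might look like, assuming a 2-4-1 network with ReLU hidden units, a linear output with bias, and full-batch gradient descent. The function names, learning rate, epoch count and initialization range are illustrative assumptions, not taken from py/train.py.

```python
import random

DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
HIDDEN = 4

def init(seed):
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(HIDDEN)],
        "b1": [rng.uniform(-1, 1) for _ in range(HIDDEN)],
        "w2": [rng.uniform(-1, 1) for _ in range(HIDDEN)],
        "b2": 0.0,
    }

def forward(p, x):
    pre = [p["w1"][j][0] * x[0] + p["w1"][j][1] * x[1] + p["b1"][j] for j in range(HIDDEN)]
    hid = [max(0.0, v) for v in pre]                       # ReLU
    out = sum(p["w2"][j] * hid[j] for j in range(HIDDEN)) + p["b2"]
    return pre, hid, out

def loss(p):
    # Mean-squared error over the four XOR samples.
    return sum((forward(p, x)[2] - y) ** 2 for x, y in DATA) / len(DATA)

def gradients(p):
    g = {"w1": [[0.0, 0.0] for _ in range(HIDDEN)], "b1": [0.0] * HIDDEN,
         "w2": [0.0] * HIDDEN, "b2": 0.0}
    for x, y in DATA:
        pre, hid, out = forward(p, x)
        d_out = 2.0 * (out - y) / len(DATA)                # dL/d(out)
        g["b2"] += d_out
        for j in range(HIDDEN):
            g["w2"][j] += d_out * hid[j]
            # ReLU passes the gradient only where the pre-activation is positive.
            d_pre = d_out * p["w2"][j] if pre[j] > 0.0 else 0.0
            g["b1"][j] += d_pre
            g["w1"][j][0] += d_pre * x[0]
            g["w1"][j][1] += d_pre * x[1]
    return g

def step(p, lr=0.1):
    g = gradients(p)
    p["b2"] -= lr * g["b2"]
    for j in range(HIDDEN):
        p["w2"][j] -= lr * g["w2"][j]
        p["b1"][j] -= lr * g["b1"][j]
        p["w1"][j][0] -= lr * g["w1"][j][0]
        p["w1"][j][1] -= lr * g["w1"][j][1]

params = init(seed=0)
for epoch in range(2000):
    step(params)
print("final loss:", loss(params))
```

Whether a particular seed converges depends on the initialization, so no final loss is quoted here.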

[Figure: training loss]

One honest lesson baked in: ReLU networks can die. With an unlucky initialization every hidden pre-activation goes negative, gradients vanish, and the loss parks at 0.25 forever. The script searches seeds until training converges and the quantized network is still correct, which is both reproducible and true to real practice: initialization matters.
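Where the 0.25 comes from: with every hidden unit dead, the network can only output one constant for all four inputs, and the best constant under mean-squared error is the mean of the targets. A short check, assuming the loss is the plain mean of squared errors (no factor of 1/2):

```python
TARGETS = [0.0, 1.0, 1.0, 0.0]  # XOR outputs for (0,0), (0,1), (1,0), (1,1)

def mse_constant(c):
    # Loss when the network predicts the same value c for every input,
    # which is all a network with a dead hidden layer can do.
    return sum((c - y) ** 2 for y in TARGETS) / len(TARGETS)

# Scan constants in [0, 1]; the minimum sits at the mean of the targets.
best = min((mse_constant(k / 100), k / 100) for k in range(101))
print(best)  # (0.25, 0.5)
```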

The trained float network draws this decision surface (green = "1"), the XOR diagonal band:

[Figure: float decision boundary]

Step 3: quantize it

Hardware wants fixed point. Every weight, bias and activation becomes Q4.4: a signed 8-bit value with 4 fraction bits, range -8 to +7.9375, resolution 1/16 (why so few bits works, and when it wouldn't, is the subject of How many bits do you actually need?).
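A sketch of the float-to-Q4.4 conversion. The helper names are made up, and the rounding mode (round to nearest, ties away from zero) is an assumption; the repository may round differently.

```python
FRAC_BITS = 4
Q_MIN, Q_MAX = -128, 127  # signed 8-bit range

def to_q44(x):
    # Scale by 2**4 so that 1.0 becomes 16.
    scaled = x * (1 << FRAC_BITS)
    # Round to nearest, ties away from zero (assumed rounding mode).
    q = int(scaled + 0.5) if scaled >= 0 else -int(-scaled + 0.5)
    # Saturate instead of wrapping around.
    return max(Q_MIN, min(Q_MAX, q))

def from_q44(q):
    return q / (1 << FRAC_BITS)

print(to_q44(1.0), from_q44(to_q44(0.3)))  # 16 0.3125
print(to_q44(10.0), from_q44(127))         # 127 7.9375
```

The second line shows saturation: 10.0 is outside the representable range, so it clamps to the largest code, 127, which reads back as 7.9375.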

The quantized, integer-only model in train.py (fixed_forward) is bit-exact with the Verilog: same shifts, same saturation, same threshold. It regenerates two Verilog headers: weights.vh (the learned parameters) and golden.vh (expected outputs for a 17×17 grid over the input square).
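How a header like weights.vh could be generated is sketched below. The actual file layout is not shown here, so the localparam names (W1_j_i, B1_j) and the one-constant-per-line format are hypothetical.

```python
def emit_weights_header(weights, biases):
    # Hypothetical layout: one signed 8-bit localparam per parameter.
    def lit(q):
        # Two's-complement hex literal, e.g. -16 -> 8'shF0.
        return f"8'sh{q & 0xFF:02X}"
    lines = ["// Auto-generated; do not edit by hand."]
    for j, (row, b) in enumerate(zip(weights, biases)):
        for i, w in enumerate(row):
            lines.append(f"localparam signed [7:0] W1_{j}_{i} = {lit(w)};")
        lines.append(f"localparam signed [7:0] B1_{j} = {lit(b)};")
    return "\n".join(lines) + "\n"

# Made-up Q4.4 values for one hidden neuron: weights 1.0 and -1.0, bias -1/16.
print(emit_weights_header([[16, -16]], [-1]))
```

Generating the header from the same integers the Python model uses is what keeps the two sides from drifting apart.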

The quantized decision boundary, now a hard yes/no:

[Figure: fixed-point decision boundary]

Step 4: the hardware (rtl/)

[Figure: datapath]

neuron.v is one parameterized neuron that follows the golden rule of fixed-point datapaths: multiply narrow, accumulate wide. Q4.4×Q4.4 products land in an 18-bit accumulator, ReLU clamps, and only then is the result resized back to Q4.4 with saturation.
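The same datapath can be modeled with plain Python integers. This is a sketch, not the repository's fixed_forward: the function names are made up, and truncation (rather than rounding) when dropping fraction bits is an assumption.

```python
def sat8(v):
    # Saturate to the signed 8-bit range of Q4.4.
    return max(-128, min(127, v))

def neuron_q44(xs, ws, bias):
    # Multiply narrow: each Q4.4 x Q4.4 product is Q8.8 (8 fraction bits).
    # Accumulate wide: the sum is kept at full width, never truncated here.
    acc = sum(x * w for x, w in zip(xs, ws))
    # Align the Q4.4 bias to Q8.8 before adding it.
    acc += bias << 4
    # ReLU on the wide value, before any narrowing.
    if acc < 0:
        acc = 0
    # Drop 4 fraction bits to return to Q4.4 (truncation assumed), then saturate.
    return sat8(acc >> 4)

print(neuron_q44([16, 16], [16, 16], -16))  # 16, i.e. 1.0 + 1.0 - 1.0 = 1.0
print(neuron_q44([16, 16], [127, 127], 0))  # 127, saturated
```

With two inputs, the largest possible magnitude is 2 × 128 × 128 + (128 << 4) = 34816, which fits comfortably in an 18-bit signed accumulator.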

xor_net.v instantiates four hidden neurons and a thresholded output dot product:
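A fixed-point model of that topology, as a sketch. The parameters are hand-picked Q4.4 values (1.0 is 16), not the trained ones, and the output threshold of 0.5 is an assumption; two of the four hidden neurons are simply zeroed out.

```python
def hidden(x0, x1, w0, w1, b):
    acc = x0 * w0 + x1 * w1 + (b << 4)  # Q8.8 accumulator
    acc = acc if acc > 0 else 0         # ReLU
    return min(127, acc >> 4)           # back to Q4.4, saturated

# Hypothetical hand-picked parameters: (w0, w1, bias) per hidden neuron.
HIDDEN = [(16, 16, 0), (16, 16, -16), (0, 0, 0), (0, 0, 0)]
OUT_W = [16, -32, 0, 0]
THRESHOLD = 128  # 0.5 in Q8.8 (assumed threshold)

def xor_net(x0, x1):
    h = [hidden(x0, x1, w0, w1, b) for (w0, w1, b) in HIDDEN]
    # Output dot product stays wide; only the comparison result leaves.
    acc = sum(w * hj for w, hj in zip(OUT_W, h))
    return 1 if acc >= THRESHOLD else 0

for x0, x1 in [(0, 0), (0, 16), (16, 0), (16, 16)]:
    print(x0 // 16, x1 // 16, xor_net(x0, x1))
```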

[Figure: network topology]

Step 5: prove it (tb/)

The testbench checks the four canonical XOR cases and then sweeps the full 17×17 grid, comparing every output against the Python golden model:

TB PASS: xor_net (293 vectors)

Not "looks right on the waveform." Proven equal to the model, at every point we can enumerate. For a network this size, exhaustive verification is cheap; take it.
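The count of 293 is the four canonical cases plus the 17×17 grid. A sketch of the comparison, assuming the grid covers 0 to 1 in Q4.4 steps of 1/16 (integer codes 0 to 16); the check function and the stand-in model are hypothetical:

```python
CANONICAL = [(0, 0), (0, 16), (16, 0), (16, 16)]           # 0.0 and 1.0 in Q4.4
GRID = [(x0, x1) for x0 in range(17) for x1 in range(17)]  # 17 x 17 = 289 points
VECTORS = CANONICAL + GRID

def check(dut, golden):
    # Return every vector where the two models disagree.
    return [v for v in VECTORS if dut(*v) != golden(*v)]

def stand_in(x0, x1):
    # Placeholder model: threshold each input at 0.5, then XOR.
    return (x0 >= 8) ^ (x1 >= 8)

print(len(VECTORS), check(stand_in, stand_in))  # 293 []
```

In the real flow the golden side comes from golden.vh and the device under test is the simulated xor_net; an empty mismatch list is what "TB PASS" means.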

Run everything yourself

```sh
python3 py/train.py     # train, quantize, emit weights.vh + golden.vh
python3 py/plots.py     # regenerate the images from the artifacts
make                    # lint (Verilator), simulate (Icarus), synth (Yosys)
```

Requires iverilog, verilator and yosys, all packaged on most distros. No Python dependencies at all.

Where to go next

  • Make the network pipelined: register each layer, trade latency for clock speed. (The zero-FF version is the fun fact; the pipelined version is what a real datapath looks like.)
  • Grow it: MNIST-scale inference is the same ideas with more of everything; that's the libfpga library's upcoming neural micro-kit.
  • Learn the building blocks interactively: the free course and Verilog playground at libfpga.com, where you can paste neuron.v and watch it work.
  • Why FPGAs suit neural networks in the first place: the fabric is already shaped like one.

Follow @libfpga for new modules, examples and releases.

License

MIT · Copyright (c) 2026 Antonio Roldao, Ph.D.
