fpga-neuron

A neural network from first principles to FPGA hardware: train it, quantize it, synthesize it, prove it.

This is an educational repository. In ~200 lines of dependency-free Python and ~100 lines of Verilog-2005, it walks the complete journey: gradient descent on the CPU, fixed-point quantization, a hardware neuron, a synthesizable network, and an exhaustive testbench that proves the silicon-ready design matches the math, on every one of 293 test vectors.

The punchline is worth spoiling: the finished network is 832 LUT4s and zero flip-flops. A neural network that is pure combinational logic: inputs go in, the answer falls out, no clock required.

The problem: XOR

XOR is the classic "why we need hidden layers" function: no single straight line separates its classes, so a single neuron can't learn it. A hidden layer bends the space; four ReLU neurons are plenty.

x0	x1	y
0	0	0
0	1	1
1	0	1
1	1	0
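
To make the "no single straight line" claim concrete, here is a small sketch, not taken from the repository: it brute-forces a coarse grid of weights for one threshold neuron and finds none that reproduces the table, then shows a hand-picked two-ReLU hidden layer that does. The weight grid, the `step_neuron` helper, and the hand-picked weights are illustrative assumptions, not the trained network.

```python
import itertools

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def step_neuron(w0, w1, b, x0, x1):
    """A single neuron with a hard threshold: one straight line in the plane."""
    return 1 if w0 * x0 + w1 * x1 + b > 0 else 0

# Try every (w0, w1, b) on a coarse grid from -4 to 4 in steps of 0.25.
grid = [k / 4 for k in range(-16, 17)]
single_neuron_solutions = [
    (w0, w1, b)
    for w0, w1, b in itertools.product(grid, repeat=3)
    if all(step_neuron(w0, w1, b, *x) == y for x, y in XOR.items())
]

def relu(v):
    return max(0, v)

def two_layer(x0, x1):
    """Hand-picked hidden layer (illustrative only): two ReLUs, then a dot product."""
    h0 = relu(x0 + x1)        # rises as soon as any input is on
    h1 = relu(x0 + x1 - 1)    # rises only when both inputs are on
    return h0 - 2 * h1

print(len(single_neuron_solutions))                 # 0
print([two_layer(*x) for x in XOR])                 # [0, 1, 1, 0]
```

The empty search result is not an accident of the grid: the four constraints contradict each other for any real weights, which is exactly what "not linearly separable" means.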

Step 1: what a neuron is

A neuron multiplies each input by a learned weight, sums the products, adds a learned bias, and applies a nonlinearity (here ReLU: max(0, v)). In hardware that's multipliers, an adder, a comparator and a mux. Nothing else. The mystique of "neural" evaporates pleasantly when you draw the schematic.
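
As a floating-point sketch of that schematic (the function name `neuron` and its argument layout are assumptions for illustration, not the repository's code), each hardware block maps to one line:

```python
def neuron(inputs, weights, bias):
    """One ReLU neuron, written so each line matches a hardware block."""
    products = [x * w for x, w in zip(inputs, weights)]  # multipliers
    total = sum(products) + bias                         # adder
    is_positive = total > 0                              # comparator
    return total if is_positive else 0.0                 # mux: pass the sum or zero

print(neuron([1.0, 1.0], [1.0, 1.0], -1.0))  # 1.0
print(neuron([0.0, 0.0], [1.0, 1.0], -1.0))  # 0.0
```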

Step 2: train it (py/train.py)

Pure-Python gradient descent, no frameworks, because at this size nothing is hidden behind a library call. Forward pass, mean-squared error, backpropagation by hand: ~40 lines.
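
The following is a minimal sketch of what hand-written backpropagation looks like for a 2-input, 4-hidden-ReLU, 1-output network. It is not the contents of train.py: the function names, the linear output, the parameter layout, and the fixed starting weights are all assumptions chosen to keep the example deterministic.

```python
XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

def forward(params, x):
    """Return hidden pre-activations z, hidden activations h, and output y."""
    W1, b1, W2, b2 = params
    z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    h = [max(0.0, v) for v in z]                      # ReLU
    y = sum(w * hi for w, hi in zip(W2, h)) + b2      # linear output (assumption)
    return z, h, y

def loss_and_grads(params, data=XOR):
    """Mean-squared error and its gradient, derived by the chain rule."""
    W1, b1, W2, b2 = params
    n = len(data)
    gW1 = [[0.0] * len(row) for row in W1]
    gb1 = [0.0] * len(b1)
    gW2 = [0.0] * len(W2)
    gb2 = 0.0
    loss = 0.0
    for x, t in data:
        z, h, y = forward(params, x)
        err = y - t
        loss += err * err / n
        dy = 2.0 * err / n                            # d(loss)/d(y)
        gb2 += dy
        for j in range(len(W2)):
            gW2[j] += dy * h[j]
            dz = dy * W2[j] if z[j] > 0.0 else 0.0    # ReLU passes gradient only when active
            gb1[j] += dz
            for i, xi in enumerate(x):
                gW1[j][i] += dz * xi
    return loss, (gW1, gb1, gW2, gb2)

def sgd_step(params, grads, lr):
    """One plain gradient-descent update; returns new parameters."""
    W1, b1, W2, b2 = params
    gW1, gb1, gW2, gb2 = grads
    return (
        [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, gW1)],
        [b - lr * g for b, g in zip(b1, gb1)],
        [w - lr * g for w, g in zip(W2, gW2)],
        b2 - lr * gb2,
    )

# Hypothetical starting point (not a trained or random init).
params = (
    [[1.0, 1.0], [1.0, 1.0], [-1.0, 0.5], [0.3, -0.7]],
    [0.1, -0.9, 0.2, 0.25],
    [0.5, -1.0, 0.8, 0.3],
    0.1,
)
loss_before, grads = loss_and_grads(params)
loss_after, _ = loss_and_grads(sgd_step(params, grads, lr=0.01))
print(loss_before > loss_after)  # True
```

Repeating `sgd_step` in a loop is the whole training algorithm; a finite-difference comparison against `loss_and_grads` is a cheap way to check hand-derived gradients before trusting them.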

One honest lesson baked in: ReLU networks can die. With an unlucky initialization every hidden pre-activation goes negative, gradients vanish, and the loss parks at 0.25 forever. The script searches seeds until training converges and the quantized network is still correct, which is both reproducible and true to real practice: initialization matters.
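
The 0.25 figure can be reproduced by hand. The sketch below uses a deliberately unlucky, hypothetical initialization (these weights are made up for illustration) in which every hidden pre-activation is negative on all four XOR inputs. The output collapses to the output bias, the best constant guess is 0.5, and every gradient that could move the network is zero.

```python
XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

# Hypothetical dead initialization: negative weights and biases everywhere.
W1 = [[-0.5, -0.3]] * 4
b1 = [-0.1] * 4
W2 = [0.7, -0.2, 0.4, 0.9]
b2 = 0.5

def hidden(x):
    return [max(0.0, w[0] * x[0] + w[1] * x[1] + b) for w, b in zip(W1, b1)]

def output(x):
    return sum(w * h for w, h in zip(W2, hidden(x))) + b2

n = len(XOR)
all_dead = all(h == 0.0 for x, _ in XOR for h in hidden(x))
mse = sum((output(x) - t) ** 2 for x, t in XOR) / n
grad_b2 = sum(2.0 * (output(x) - t) for x, t in XOR) / n
grad_W2 = [sum(2.0 * (output(x) - t) * hidden(x)[j] for x, t in XOR) / n
           for j in range(4)]
# Gradients for W1 and b1 are also zero: the ReLU slope is 0 for negative inputs.

print(all_dead, mse, grad_b2)  # True 0.25 0.0
```

With every gradient at zero, no number of further steps changes anything, which is why retrying with a different seed is the practical fix.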

The trained float network draws this decision surface (green = "1"), the XOR diagonal band:

Step 3: quantize it

Hardware wants fixed point. Every weight, bias and activation becomes Q4.4: 8 signed bits, 4 of them fraction bits, range −8 to +7.9375, resolution 1/16. (Why so few bits works, and when it wouldn't, is the subject of "How many bits do you actually need?")
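
A minimal sketch of Q4.4 conversion, assuming round-to-nearest with ties rounded up and saturation at the ends of the range; the rounding mode and the helper names `to_q44` and `from_q44` are assumptions, since the text above only fixes the format itself:

```python
import math

FRAC_BITS = 4
SCALE = 1 << FRAC_BITS        # 16 codes per unit, so resolution is 1/16
Q_MIN, Q_MAX = -128, 127      # signed 8-bit code range

def to_q44(x):
    """Float -> signed 8-bit Q4.4 code, saturating instead of wrapping."""
    code = math.floor(x * SCALE + 0.5)   # round to nearest (assumed rounding mode)
    return max(Q_MIN, min(Q_MAX, code))

def from_q44(code):
    """Q4.4 code -> the real value it represents."""
    return code / SCALE

print(to_q44(0.5), to_q44(0.3), to_q44(10.0), to_q44(-10.0))  # 8 5 127 -128
print(from_q44(5), from_q44(127), from_q44(-128))             # 0.3125 7.9375 -8.0
```

The 0.3 example shows the cost of quantization: the nearest representable value is 0.3125, an error of 0.0125, which is under half a step.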

The quantized, integer-only model in train.py (fixed_forward) is bit-exact with the Verilog: same shifts, same saturation, same threshold. It regenerates two Verilog headers: weights.vh (the learned parameters) and golden.vh (expected outputs for a 17×17 grid over the input square).

The quantized decision boundary, now a hard yes/no:

Step 4: the hardware (rtl/)

neuron.v is one parameterized neuron that follows the golden rule of fixed-point datapaths: multiply narrow, accumulate wide. Q4.4×Q4.4 products land in an 18-bit accumulator, ReLU clamps negatives to zero, and only then is the result resized back to Q4.4 with saturation.
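
The same datapath can be modelled with plain Python integers. This is a sketch under stated assumptions, not a transcription of neuron.v: it assumes the bias is shifted left by 4 to align with the Q8.8 products, that the resize truncates with an arithmetic right shift, and that the helper is called `neuron_q44`.

```python
FRAC = 4          # fraction bits in Q4.4
Q_MAX = 127       # largest signed 8-bit code
ACC_BITS = 18     # accumulator width described above

def neuron_q44(xs, ws, bias):
    """Integer-only ReLU neuron: inputs, weights, bias and result are Q4.4 codes."""
    acc = bias << FRAC                    # align Q4.4 bias to Q8.8 (assumption)
    for x, w in zip(xs, ws):
        acc += x * w                      # each product is Q8.8, up to 16 bits
    # The wide accumulator must hold the sum without overflow.
    assert -(1 << (ACC_BITS - 1)) <= acc < (1 << (ACC_BITS - 1))
    acc = max(0, acc)                     # ReLU on the wide value
    out = acc >> FRAC                     # drop 4 fraction bits: back to Q4.4 scale
    return min(out, Q_MAX)                # saturate instead of wrapping

# 1.0 * 1.0 + 1.0 * 1.0 - 1.0 = 1.0, which is code 16
print(neuron_q44([16, 16], [16, 16], -16))        # 16
# Negative pre-activation is clamped to zero
print(neuron_q44([0, 0], [16, 16], -16))          # 0
# A sum far above the Q4.4 range saturates at 127 (7.9375)
print(neuron_q44([127, 127], [127, 127], 127))    # 127
```

Doing ReLU and saturation on the wide value matters: resizing first could wrap a large positive sum into a negative 8-bit code, and ReLU would then zero out what should have been the maximum.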

xor_net.v instantiates four hidden neurons and a thresholded output dot product:

Step 5: prove it (tb/)

The testbench checks the four canonical XOR cases and then sweeps the full 17×17 grid, comparing every output against the Python golden model:

TB PASS: xor_net (293 vectors)

Not "looks right on the waveform." Proven equal to the model, at every point we can enumerate. For a network this size, exhaustive verification is cheap, so take it.
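
The vector count and the shape of the check can be sketched in Python. Everything here is a stand-in: the grid is assumed to be the Q4.4 codes 0 through 16 on each axis, and the two models are a hand-picked float network and an integer rewrite of it, playing the roles of the golden model and the design under test.

```python
def float_model(x0, x1):
    """Stand-in golden model: hand-picked two-ReLU network, threshold at 0.5."""
    s = x0 + x1
    y = max(0.0, s) - 2.0 * max(0.0, s - 1.0)
    return 1 if y >= 0.5 else 0

def integer_model(a, b):
    """Stand-in design under test: the same network on Q4.4 codes (1.0 is 16)."""
    s = a + b
    y = max(0, s) - 2 * max(0, s - 16)
    return 1 if y >= 8 else 0

corners = [(0, 0), (0, 16), (16, 0), (16, 16)]            # the four canonical cases
grid = [(a, b) for a in range(17) for b in range(17)]     # 17 x 17 = 289 points
vectors = corners + grid

mismatches = [
    (a, b) for a, b in vectors
    if float_model(a / 16, b / 16) != integer_model(a, b)
]
print(len(vectors), len(mismatches))  # 293 0
```

The two models agree exactly here because every grid value is a multiple of 1/16, which floats represent without error. A trained network with arbitrary float weights would not have that property, which is why the comparison above is made against the quantized model rather than the float one.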

Run everything yourself

python3 py/train.py     # train, quantize, emit weights.vh + golden.vh
python3 py/plots.py     # regenerate the images from the artifacts
make                    # lint (Verilator), simulate (Icarus), synth (Yosys)

Requires iverilog, verilator, and yosys, all packaged on most distros. No Python dependencies at all.

Where to go next

Make the network pipelined: register each layer, trade latency for clock speed. (The zero-FF version is the fun fact; the pipelined version is what a real datapath looks like.)
Grow it: MNIST-scale inference is the same ideas with more of everything; that's the libfpga library's upcoming neural micro-kit.
Learn the building blocks interactively: the free course and Verilog playground at libfpga.com, where you can paste neuron.v and watch it work.
Why FPGAs suit neural networks in the first place: the fabric is already shaped like one.

Follow @libfpga for new modules, examples and releases.
