A bfloat16 matrix multiply-accumulate (MMA) unit written in Amaranth HDL.
It builds bottom-up from arithmetic primitives to a 4×4×4 MAC array (MMA)
that computes D = A·B + C over bf16 matrices, accumulating in extended (26-bit
mantissa) precision and rounding to bf16 only at the output.
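The small pure-Python sketch below shows why rounding only at the output matters. It treats bf16 as the upper 16 bits of an IEEE binary32 value and rounds with round-to-nearest-even. `to_bf16` is a hypothetical helper, not part of this package. NaN and infinity handling is omitted. The input is first rounded to binary32 by `struct`, which is harmless for the exactly representable values used here. Adding 2^-9 sixteen times to 1.0 with a bf16 rounding after each step never moves off 1.0, because each increment is only a quarter of an ulp. Summing wide and rounding once gives 1.03125.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bf16 value (ties to even), returned as a Python float.

    bf16 is the top 16 bits of IEEE binary32: 1 sign, 8 exponent, 7 mantissa bits.
    NaN/Inf inputs are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # binary32 bit pattern
    lsb = (bits >> 16) & 1                               # last kept bit, for ties-to-even
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000            # round, then drop the low 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


step = 2.0 ** -9

# Round to bf16 after every addition: each +2^-9 is a quarter ulp at 1.0 and is lost.
stepwise = 1.0
for _ in range(16):
    stepwise = to_bf16(stepwise + step)

# Accumulate wide, round once at the output.
once = to_bf16(1.0 + 16 * step)

print(stepwise, once)  # 1.0 1.03125
```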
```
uv sync
```

This installs the project in editable mode, putting `src/` on the import path so that tests can `from bfloat16 import ...` directly.
```
uv run pytest test/ -v      # all tests
uv run pytest test/ --vcd   # also dump .vcd waveforms
```

```
ruff check --fix && ruff format
```

`bf16_mac.py` (`BF16_MAC`) is the fused multiply-add core. `pe_mac.py` wraps it with a registered accumulator. `mma.py` (`MMA`) is the 16-PE array.
The rest are standalone arithmetic primitives: adders, aligner, normalizer, leading-zero anticipator (LZA), multiplier, and rounder.
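The plain-Python behavioral model below shows one way 16 PEs can cover a 4×4×4 product. It assumes an output-stationary arrangement. PE (i, j) is preloaded with C[i][j]. On each of four clock steps k, it adds A[i][k]·B[k][j] into its registered accumulator. `PE` and `mma_4x4x4` are hypothetical names, not the classes in `pe_mac.py` or `mma.py`, and the real array may route or pipeline operands differently. Rounding is left out so the dataflow is easy to follow.

```python
class PE:
    """Behavioral stand-in for one processing element: a MAC feeding a registered accumulator."""

    def __init__(self) -> None:
        self.acc = 0  # accumulator register contents

    def load(self, c) -> None:
        self.acc = c  # preload the addend C[i][j]

    def tick(self, a, b) -> None:
        # The MAC output a*b + acc is latched into the register on the clock edge.
        self.acc = a * b + self.acc


N = 4  # 4x4 grid of PEs, reduction depth 4


def mma_4x4x4(A, B, C):
    """D = A @ B + C on an N x N grid of PEs, one reduction step k per clock."""
    pes = [[PE() for _ in range(N)] for _ in range(N)]
    for i in range(N):
        for j in range(N):
            pes[i][j].load(C[i][j])
    for k in range(N):
        # Every PE (i, j) sees A[i][k] (shared along row i) and B[k][j] (shared along column j).
        for i in range(N):
            for j in range(N):
                pes[i][j].tick(A[i][k], B[k][j])
    return [[pes[i][j].acc for j in range(N)] for i in range(N)]


A = [[i + k for k in range(N)] for i in range(N)]
B = [[k - j for j in range(N)] for k in range(N)]
C = [[1] * N for _ in range(N)]
D = mma_4x4x4(A, B, C)
print(D[0])  # [15, 9, 3, -3]
```

Keeping each output element in one PE means the accumulator never leaves the PE between steps. That design keeps the wide intermediate sum internal until the final rounding.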
The tests are `amaranth.sim` benches. `test_mma.py` holds the single-rounding FMA reference model.
The per-primitive files cover the building blocks.
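Below is a minimal pure-Python sketch of a single-rounding reference, assuming the inputs are already bf16-representable. Every product and the running sum are kept exact with `fractions.Fraction`, and each output element is rounded to bf16 (round-to-nearest-even) exactly once. `round_bf16` and `fma_reference` are hypothetical names, not the model in `test_mma.py`. The exact sum is an idealization of the 26-bit accumulator, and subnormals, overflow and NaN are not handled.

```python
from fractions import Fraction


def round_bf16(x: Fraction) -> Fraction:
    """Round an exact rational to the nearest bf16 value, ties to even.

    bf16 has an 8-bit significand (7 stored bits + hidden 1). Only the normal
    range is handled: subnormals, overflow to infinity and NaN are ignored.
    """
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    m = abs(x)
    # Exponent e with 2**e <= m < 2**(e+1); the bit-length guess is off by at most one.
    e = m.numerator.bit_length() - m.denominator.bit_length()
    if Fraction(2) ** e > m:
        e -= 1
    ulp = Fraction(2) ** (e - 7)  # spacing of bf16 values in [2**e, 2**(e+1))
    q = m // ulp                  # integer significand, 128 <= q < 256
    r = m - q * ulp               # discarded remainder, 0 <= r < ulp
    half = ulp / 2
    if r > half or (r == half and q % 2 == 1):
        q += 1                    # may carry to 256, i.e. 2**(e+1), which is still exact
    return sign * q * ulp


def fma_reference(A, B, C):
    """D = A @ B + C with every product and sum exact, rounded to bf16 once per element."""
    rows, depth, cols = len(A), len(B), len(B[0])
    return [
        [
            round_bf16(
                Fraction(C[i][j])
                + sum(Fraction(A[i][k]) * Fraction(B[k][j]) for k in range(depth))
            )
            for j in range(cols)
        ]
        for i in range(rows)
    ]


A = [[2.0 ** -9] * 4 for _ in range(4)]
B = [[1.0] * 4 for _ in range(4)]
C = [[1.0] * 4 for _ in range(4)]
D = fma_reference(A, B, C)
print(D[0][0], float(D[0][0]))  # 129/128 1.0078125
```

With C = 1 and four products of 2^-9, the single rounding yields 1 + 2^-7. Rounding after each addition would leave every element at 1.0.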