This repository contains a fully pipelined, synthesizable 16-tap FIR (Finite Impulse Response) filter implemented in Verilog HDL.
The design demonstrates fixed-point DSP techniques, deterministic pipelining, a balanced adder tree, and full cycle-accurate verification using a self-checking testbench.
- Author
- Introduction
- What Is Fixed-Point Arithmetic?
- FIR Filter Theory
- Coefficient Design
- Architecture Overview
- Module Descriptions
- Simulation Waveforms
- Possible Improvements
- Conclusion
- License
Rom Barak
B.Sc. Electrical Engineering, Bar-Ilan University
Focus: Nanoelectronics and Communication Systems
This project implements a complete fixed-point low-pass FIR filter using:
- 16-tap input shift register
- Constant Q1.15 coefficients
- Two-stage pipelined MAC core
- Balanced 36-bit adder tree
- Fully self-checking testbench
The design outputs one filtered sample per clock with a deterministic two-cycle latency
and is fully synthesizable for FPGA/ASIC integration.
Fixed-point (Q-format) arithmetic is widely used in hardware DSP because it offers:
- Low area and power
- Deterministic rounding/overflow
- Reproducible bit-accurate behavior
- Efficient synthesis and timing closure
This design uses the Q1.15 format (16-bit two's complement):
- 1 sign bit
- 15 fractional bits
- Range: −1.0 ≤ x ≤ 1 − 2⁻¹⁵ (just below +1.0)
- Resolution: 2⁻¹⁵ ≈ 3.05×10⁻⁵
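To make the format concrete, here is a minimal Python sketch of converting between real values and Q1.15 integers. The helper names `to_q15` and `from_q15` are hypothetical (not part of this repository), and the rounding (Python's round-half-to-even) and saturation policy are assumptions; the RTL only ever sees the resulting 16-bit integers.

```python
# Minimal Q1.15 helpers (hypothetical names; illustrative, not taken from the RTL).
Q = 15                      # number of fractional bits
Q_MIN = -(1 << Q)           # -32768 -> -1.0
Q_MAX = (1 << Q) - 1        #  32767 -> 1 - 2**-15

def to_q15(x: float) -> int:
    """Round a real value to the nearest Q1.15 integer, saturating at the range limits."""
    n = round(x * (1 << Q))          # Python's round() is round-half-to-even
    return max(Q_MIN, min(Q_MAX, n))

def from_q15(n: int) -> float:
    """Interpret a signed 16-bit integer as a Q1.15 value."""
    return n / (1 << Q)

print(to_q15(0.5))        # 16384
print(to_q15(1.0))        # 32767 (saturated: +1.0 is not representable)
print(from_q15(Q_MIN))    # -1.0
print(from_q15(1))        # 3.0517578125e-05 (one LSB)
```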
Bit growth in the MAC:
- 16-bit × 16-bit → 32-bit product (Q2.30)
- Summing 16 products needs log₂ 16 = 4 extra guard bits → 36-bit accumulator, so the sum cannot overflow.
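The 36-bit figure can be checked directly by computing the worst-case sums. This is a small illustrative Python check (the `signed_bits` helper is hypothetical, not from the repository): only the product (−1.0) × (−1.0) = +2³⁰ needs all 32 bits, and sixteen of them (2³⁴) need exactly 36 bits.

```python
# Worst-case bit growth for 16 signed 16x16 products (illustrative check, not RTL).
MIN16, MAX16 = -32768, 32767
TAPS = 16

def signed_bits(v: int) -> int:
    """Smallest two's-complement width that can represent v."""
    return (v.bit_length() if v >= 0 else (-v - 1).bit_length()) + 1

largest_product = MIN16 * MIN16     # (-1.0) * (-1.0) = +2**30, the only product needing 32 bits
most_negative = MIN16 * MAX16       # slightly above -2**30

print(signed_bits(largest_product))          # 32
print(signed_bits(TAPS * largest_product))   # 36 (2**34 needs the 4 guard bits)
print(signed_bits(TAPS * most_negative))     # 35
```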
A 16-tap FIR computes:
$$
y[n] = \sum_{k=0}^{15} h[k] \cdot x[n-k]
$$
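A bit-accurate software reference for this equation can be written in a few lines. The sketch below is a hypothetical Python golden model (not the repository's testbench code); it keeps full integer precision (no truncation) and assumes samples before x[0] are zero, as they would be after a reset clears the shift register.

```python
# Bit-accurate direct-form reference for y[n] = sum_k h[k] * x[n-k] (illustrative).
from typing import List

H = [-84, -53, 120, 240, 350, 420, 450, 460,
     460, 450, 420, 350, 240, 120, -53, -84]   # Q1.15 taps listed in this README

def fir_reference(x: List[int], h: List[int] = H) -> List[int]:
    """Full-precision integer FIR output (Q2.30 products summed without truncation).

    ASSUMPTION: samples before x[0] are zero, as after a reset of the shift register.
    """
    y = []
    for n in range(len(x)):
        acc = 0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y

# A unit impulse reproduces the coefficient list (the impulse response).
impulse = [1] + [0] * 15
print(fir_reference(impulse) == H)   # True
```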
FIR properties:
- Always stable (no feedback path, so a bounded input always gives a bounded output)
- Linear phase when the coefficients are symmetric
- Deterministic timing
- Well suited to audio, communications, and sensor filtering
The filter uses the following symmetric low-pass FIR coefficients (signed Q1.15 integers, h[0] to h[15]):
[-84, -53, 120, 240, 350, 420, 450, 460,
460, 450, 420, 350, 240, 120, -53, -84]
Why they were chosen:
- Symmetry → linear phase
- Smooth shape → good attenuation
- Small magnitudes (Σ|h[k]| ≈ 0.133 in Q1.15) → the output always stays within the Q1.15 range
- 16 taps → a practical area/latency tradeoff
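These properties can be checked numerically. The following Python snippet is an illustrative analysis script (not part of the repository) that confirms the symmetry and computes the DC gain and the sum of absolute coefficient values, both expressed in Q1.15 units.

```python
# Sanity checks on the README's coefficient set (illustrative analysis script).
H = [-84, -53, 120, 240, 350, 420, 450, 460,
     460, 450, 420, 350, 240, 120, -53, -84]
SCALE = 1 << 15   # Q1.15: integer / 32768 = real value

is_symmetric = H == H[::-1]                 # h[k] == h[15-k] -> linear phase
dc_gain = sum(H) / SCALE                    # steady-state response to a constant input
l1_norm = sum(abs(c) for c in H) / SCALE    # bound on |y| when |x| <= 1

print(is_symmetric)                 # True
print(sum(H), round(dc_gain, 4))    # 3806 0.1161
print(round(l1_norm, 4))            # 0.1329
```

Because Σ|h[k]| is well below 1, no input sequence within the Q1.15 range can drive the output past full scale. For a symmetric 16-tap filter, the group delay is a constant (16 − 1)/2 = 7.5 samples.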
| Block | Description |
|---|---|
| Shift Register | Stores the last 16 samples |
| Coefficient ROM | Provides Q1.15 taps |
| MAC Core | 16 multipliers + 36-bit adder tree |
| Top-Level | Wiring + control |
| Testbench | Full cycle-accurate verification |
- Latency: 2 clock cycles
- Throughput: 1 output per clock
Shift Register: stores the last 16 samples in sequence (tap0 = newest).
It provides a stable 256-bit packed bus (16 taps × 16 bits) to the MAC core.
Simulation waveform: correct shifting of samples each cycle and proper reset initialization.
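A behavioral Python model of the shift register and its packed output bus is sketched below. The class name is hypothetical, and the bit layout (tap k in bits [16k+15 : 16k] of `samples_flat`) is an assumption; the actual RTL may pack the taps in a different order.

```python
# Behavioral model of a 16-entry shift register with tap0 = newest (illustrative).
# ASSUMPTION: tap k occupies bits [16*k+15 : 16*k] of the packed bus.
from collections import deque

TAPS, WIDTH = 16, 16
MASK = (1 << WIDTH) - 1

class ShiftReg:
    def __init__(self) -> None:
        self.taps = deque([0] * TAPS, maxlen=TAPS)   # reset clears all taps

    def clock(self, sample_in: int) -> None:
        self.taps.appendleft(sample_in)              # newest into tap0, oldest falls off the end

    def samples_flat(self) -> int:
        """Pack the 16 signed taps (two's complement) into one 256-bit unsigned integer."""
        bus = 0
        for k, s in enumerate(self.taps):
            bus |= (s & MASK) << (WIDTH * k)
        return bus

sr = ShiftReg()
for s in [1, 2, -1]:
    sr.clock(s)
print(list(sr.taps)[:4])          # [-1, 2, 1, 0]
print(hex(sr.samples_flat()))     # 0x10002ffff
```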
Coefficient ROM: provides 16 constant signed Q1.15 coefficients.
Zero latency (combinational) and fully synthesizable.
MAC Core, stage 1 (multipliers): each tap × coefficient pair is multiplied in parallel, and each product is captured in a 32-bit register.
Simulation waveform: correct multiplier timing and stable registered outputs.
MAC Core, adder tree: a balanced four-level adder tree reduces the 16 products pairwise:
16 → 8 → 4 → 2 → 1
Simulation waveform: progressive accumulation of partial sums without overflow.
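The pairwise reduction can be modeled in Python as below. This is an illustrative sketch (the `adder_tree` function is hypothetical, not from the RTL); it returns every level so the 16 → 8 → 4 → 2 → 1 structure is visible, using 8 + 4 + 2 + 1 = 15 adders in total.

```python
# Balanced pairwise reduction 16 -> 8 -> 4 -> 2 -> 1 (illustrative model of the adder tree).
from typing import List

def adder_tree(values: List[int]) -> List[List[int]]:
    """Return every level of a balanced binary adder tree; the last level holds the sum."""
    assert len(values) & (len(values) - 1) == 0, "length must be a power of two"
    levels = [list(values)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Each adder combines two neighbours; one level adds at most 1 bit of growth.
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels

products = list(range(16))                # stand-in for the 16 registered products
levels = adder_tree(products)
print([len(lvl) for lvl in levels])       # [16, 8, 4, 2, 1]
print(levels[-1][0] == sum(products))     # True
```

Compared with a linear chain of 15 adders, the balanced tree has a critical path of only four adder delays, which is what lets it sit between two register stages.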
MAC Core, stage 2 (output register): registers the final 36-bit result, giving a deterministic 2-cycle latency.
Simulation waveform: exact 2-cycle delay between input activity and the final output.
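The two-register timing can be illustrated with a cycle-level Python model. This is a sketch under stated assumptions, not the repository's RTL: latency is counted from the taps at the MAC input (`samples_flat`) to `y_out`, stage 1 is the 16 product registers, and stage 2 is the output register fed by a combinational adder tree. The class name `MacPipeline` is hypothetical.

```python
# Cycle-level model of a two-register MAC pipeline (illustrative, not the repository's RTL).
# ASSUMPTIONS: latency counted from samples_flat to y_out; stage 1 = product registers,
# stage 2 = output register after a combinational adder tree; registers reset to 0.
import random
from typing import List

H = [-84, -53, 120, 240, 350, 420, 450, 460,
     460, 450, 420, 350, 240, 120, -53, -84]
LATENCY = 2

class MacPipeline:
    def __init__(self, coeffs: List[int]) -> None:
        self.coeffs = coeffs
        self.prod_reg = [0] * len(coeffs)   # stage 1: registered products
        self.y_out = 0                      # stage 2: registered result

    def step(self, taps: List[int]) -> int:
        """Return the y_out visible during this cycle, then apply the clock edge ending it."""
        visible = self.y_out
        tree_sum = sum(self.prod_reg)       # combinational tree sees the current stage-1 values
        self.prod_reg = [t * c for t, c in zip(taps, self.coeffs)]  # stage 1 captures
        self.y_out = tree_sum               # stage 2 captures the previous products' sum
        return visible

rng = random.Random(0)
tap_vectors = [[rng.randint(-32768, 32767) for _ in range(16)] for _ in range(50)]
expected = [sum(t * c for t, c in zip(tv, H)) for tv in tap_vectors]

mac = MacPipeline(H)
outputs = [mac.step(tv) for tv in tap_vectors]

# Testbench-style aligned comparison: the output in cycle n + 2 matches the input in cycle n.
assert outputs[:LATENCY] == [0] * LATENCY
assert outputs[LATENCY:] == expected[:-LATENCY]
print("2-cycle alignment OK")
```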
Top-Level: connects the entire FIR structure:
- sample_in ──► shift_reg ──► samples_flat
- (samples_flat, coeffs) ──► MAC ──► y_out

All submodules share the same clock and reset, so every register updates on the same clock edge.
Testbench features:
- Software reference FIR
- 2-cycle aligned comparison between the DUT output and the reference
- Automatic mismatch detection
- Stimuli: ramp, step, alternating, random
- Generates `dump.vcd` for waveform inspection
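For readers building a similar flow, the stimulus types and the latency-aligned mismatch check can be sketched in Python as below. All function names and default parameters (ramp step, step position, amplitudes, seed) are hypothetical choices for illustration; they are not the testbench's actual Verilog tasks.

```python
# Illustrative stimulus patterns and a latency-aligned mismatch checker (hypothetical helpers).
import random
from typing import List, Tuple

MIN16, MAX16 = -32768, 32767

def ramp(n: int, step: int = 1024) -> List[int]:
    """Sawtooth that wraps around within the signed 16-bit range."""
    return [((i * step - MIN16) % 65536) + MIN16 for i in range(n)]

def step_input(n: int, at: int = 4, level: int = MAX16) -> List[int]:
    """Zero until index `at`, then a constant level."""
    return [0 if i < at else level for i in range(n)]

def alternating(n: int, amp: int = MAX16) -> List[int]:
    """+amp, -amp, +amp, ... (energy at the Nyquist frequency)."""
    return [amp if i % 2 == 0 else -amp for i in range(n)]

def random_input(n: int, seed: int = 1) -> List[int]:
    """Uniformly distributed signed 16-bit samples from a seeded generator."""
    rng = random.Random(seed)
    return [rng.randint(MIN16, MAX16) for _ in range(n)]

def find_mismatches(dut: List[int], ref: List[int], latency: int) -> List[Tuple[int, int, int]]:
    """Compare dut[n + latency] with ref[n]; return (cycle, got, expected) for each mismatch."""
    errors = []
    for n in range(len(ref)):
        if n + latency < len(dut) and dut[n + latency] != ref[n]:
            errors.append((n + latency, dut[n + latency], ref[n]))
    return errors

print(ramp(3))                                           # [0, 1024, 2048]
print(find_mismatches([0, 0, 10, 20, 31], [10, 20, 30], 2))   # [(4, 31, 30)]
```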
- Programmable coefficients
- 32/64/128-tap variants
- Deeper pipelining (>500 MHz)
- SIMD multi-channel FIR
- AXI-Stream interface
- Half-band / polyphase design
This project demonstrates a modular, synthesizable, pipelined FIR filter with:
- Q-format arithmetic
- Parallel multipliers
- Balanced adder tree
- Deterministic timing
- Self-checking verification
Well suited to FPGA and ASIC designs and to DSP learning environments.
Open for academic and educational use.
Modifications are welcome with credit.







