🚀 Real-Time High-Level Synthesis (HLS) Projects for UltraScale+ FPGAs. Accelerated design pipelines, dataflow architectures, and low-latency compute cores — powered by Vitis HLS, AXI4-Stream, and signal processing for 5G, SDR, and HFT applications. 🔧🔬💡
Welcome to the HLS_FPGA repository — a cutting-edge collection of real-world, production-grade FPGA accelerators built with Vitis HLS, tailored for:
- 🛰️ Software-Defined Radio (SDR)
- 📡 5G / O-RAN Signal Processing
- 📈 High-Frequency Trading (HFT)
- 🧠 AI/ML Edge Inference
- 🔐 Embedded Security Accelerators
- Fully pipelined, low-latency compute chains (filter → gain → envelope)
- AXI4-Stream + AXI-MM optimized for UltraScale+ SoCs
- Complete Vivado-ready project structure
- Benchmark reports: latency, throughput, utilization
- 🧪 Co-Sim & Waveform validation included
- `#pragma HLS DATAFLOW`, `PIPELINE`, `UNROLL`
- `hls::stream`, `ap_fixed`, `hls::vector`
- Loop optimization, interface pragmas, and dataflow chaining
- AXI4-Lite control interface + AXI-MM streaming buffer logic
- Real-time DSP on FPGAs
- Hardware-accelerated quant finance
- Latency-critical radio and vision systems
Want to optimize a pipeline, add a new design, or share your performance results? Fork this repo and open a PR. Let’s redefine hardware acceleration together.
📣 Let's make FPGAs mainstream for developers and innovators everywhere. Star ⭐ the repo, share with your network, and ignite the hardware revolution!
This project implements a real-time FPGA/HLS-based Doppler-Aware Spoof Detection Engine designed to secure RF and GNSS systems from spoofing attacks. Built with Vitis HLS + Vivado, it validates end-to-end dataflow from high-level C++ through synthesis, implementation, and hardware deployment on Xilinx Zynq UltraScale+ (ZCU104).
- ADC Demux + FFT Pipeline: Continuous real-time spectral scanning.
- Magnitude Estimator: Computes `sqrt(I² + Q²)` per cycle.
- Peak Detector: Finds max FFT bin + magnitude.
- Spoof Rule Engine: Detects spoofing with `(edge_bin && mag > threshold)` logic.
- Ultra-Low Latency: Timing closure at 7.107 ns, Fmax ≈ 140.71 MHz.
- Reusable IP: Packaged as an AXI4-Stream IP core for seamless FPGA/SoC integration.
- Input (adc_stream) → Demux → FFT → Magnitude → Peak Detection → Spoof Rule → Output Alert Stream.
- Bit-accurate functional verification between C++ and RTL.
- Validated pipeline parallelism and throughput.
- Confirmed cycle-by-cycle correctness of streams, handshake, and throughput.
- Timing closed at 7.107 ns (target: 10 ns).
- Achieved Fmax: ~140.71 MHz.
- Efficient resource utilization: LUT: 158 | FF: 130 | BRAM: 0 | DSP: 0.
- Testcases confirmed:
- Edge spoof → Detected 🚨
- Center strong (clean) → No alert ✅
- Weak edge noise → Ignored ✅
- Vivado Implementation reports: timing met, routing clean.
- Packaged as AXI4-Stream IP for integration.
- IP block packaged for Zynq SoC integration.
- Interfaces: `adc_stream`, `alert_stream`, `ap_ctrl`, `ap_clk`, `ap_rst_n`.
- Integrated into ZCU104 block design.
- End-to-end hardware datapath verified with DMA → FIFO → Spoof Engine → AXI interconnect.
- Clean placement on ZCU104 FPGA fabric.
- Resource-efficient footprint.
This project demonstrates how FPGA + HLS can deliver production-grade, ultra-low-latency spoof detection for GNSS, RF, and quantum-secure communication systems.
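The magnitude, peak-detection, and spoof-rule stages described above can be modelled in plain C++ for quick experiments. This is only a software sketch, not the repository's HLS source: the `Peak` struct, the `find_peak` and `is_spoof` names, and the `edge_margin` definition of an "edge bin" (a bin within a fixed distance of either end of the spectrum) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical result type: index and magnitude of the strongest FFT bin.
struct Peak {
    std::size_t bin;
    float mag;
};

// Computes sqrt(I^2 + Q^2) for every bin and keeps the largest one.
Peak find_peak(const std::vector<std::complex<float>>& bins) {
    assert(!bins.empty());
    Peak best{0, 0.0f};
    for (std::size_t k = 0; k < bins.size(); ++k) {
        const float i = bins[k].real();
        const float q = bins[k].imag();
        const float mag = std::sqrt(i * i + q * q);
        if (mag > best.mag) {
            best = Peak{k, mag};
        }
    }
    return best;
}

// Spoof rule from the text: (edge_bin && mag > threshold).
// ASSUMPTION: a bin counts as an edge bin when it lies within
// edge_margin bins of either end of the spectrum.
bool is_spoof(const Peak& p, std::size_t n_bins, std::size_t edge_margin,
              float threshold) {
    assert(edge_margin <= n_bins);
    const bool edge_bin = p.bin < edge_margin || p.bin >= n_bins - edge_margin;
    return edge_bin && p.mag > threshold;
}
```

With a 16-bin spectrum, a margin of 2, and a threshold of 1.0, a strong peak in bin 15 raises an alert, the same peak in bin 8 does not, and a weak peak in bin 0 is ignored. That mirrors the three test cases listed above.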
1. Open project in Vitis HLS 2024.1+
2. Run C Simulation → C/RTL Co-Simulation
3. Export RTL → Vivado → Package as IP
4. Integrate into Zynq block design
5. Generate Bitstream & Deploy
📌 Author 👤 John Bagshaw | FPGA/HLS & RF Security Expert | 7+ Years Industry Experience
✨ If you believe secure RF/GNSS systems are critical for the future — let’s connect!
Just completed — a fully pipelined, real-time DSP accelerator optimized using Vitis HLS for High-Frequency Trading, signal intelligence, and low-latency inferencing.
🧠 Project: dsp_pipeline — HLS + AXI4-Stream + Dataflow 🔥
✅ Dataflow Rocket: Every block—load_process, fir_filter, gain_adjust, store_process—runs truly in parallel. No bubbles. Ultrafast throughput.
✅ Timing Dominated: Required: 10 ns. Achieved: 4.641 ns. That’s roughly 215 MHz Fmax without even trying.
✅ DSP-Efficient: Just 18 DSPs, 0 BRAMs, 0 URAMs. UltraLow-resource. UltraFast.
✅ Latency Optimized: Full processing latency = 1,052 cycles (about 10.5 µs @ 100 MHz)
✅ Waveform Proven: Cosim ✅. RTL trace ✅. FIFO depth ✅. Parallel timeline ✅. Every bit validated.
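The timing figures can be sanity-checked with two one-line conversions: Fmax in MHz is 1000 divided by the achieved period in ns, and latency in µs is the cycle count divided by the clock in MHz. The helper names below are hypothetical and exist only for this check.

```cpp
#include <cassert>

// Fmax in MHz for an achieved clock period given in nanoseconds.
double fmax_mhz(double period_ns) {
    assert(period_ns > 0.0);
    return 1000.0 / period_ns;
}

// Latency in microseconds for a cycle count at a clock given in MHz.
double latency_us(unsigned long cycles, double clock_mhz) {
    assert(clock_mhz > 0.0);
    return static_cast<double>(cycles) / clock_mhz;
}
```

For a 4.641 ns period this gives about 215.5 MHz, and 1,052 cycles at 100 MHz comes to about 10.5 µs.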
When nanoseconds = millions, a pipelined FPGA-based DSP engine like this can preprocess, filter, amplify, and route market signals before a single CPU interrupt.
No latency tax. No wasted cycles.
- Market signal denoising before tick injection
- Real-time RF front-ends (e.g., GNSS, O-RAN, EW)
- Quant pipelines for proprietary strategy compute
- On-FPGA ML feature pre-processing (waveforms → vectors)
| Report | Preview |
|---|---|
| 🔧 Post-Implementation Resource Usage | ![]() |
| ⏱️ Co-Simulation Timeline Trace | ![]() |
| 🔁 Waveform Trace | ![]() |
| 🧠 Code Analyzer | ![]() |
| 🧮 Schedule Viewer | ![]() |
| 📊 Synthesis Report | ![]() |
| 📟 C Simulation Output | ![]() |
This is not your average FIFO. This is a fully pipelined, BRAM-backed, single-producer single-consumer (SPSC) queue designed to obliterate lock-based latency in high-frequency trading systems.
When nanoseconds = millions, software locks just don’t cut it. This lock-free SPSC queue:
- ❌ No locks
- ❌ No FIFOs
- ❌ No DSPs
- ✅ Fully pipelined
- ✅ 100% deterministic timing
- ✅ BRAM circular buffer
- ✅ Synthesized with II=1, latency = 5 cycles
- ✅ CoSim, CSim, and Waveform ✅
- FIX/ITCH packet pipelines
- Market snapshot replay
- FPGA risk-aware strategy engines
- Predictable FIFO-less queuing across accelerators
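For readers who want to see the circular-buffer idea in software, here is a minimal lock-free SPSC ring buffer in standard C++. It is a sketch under assumptions and not the repository's HLS code: the `SpscQueue` name, the power-of-two depth, and the use of `std::atomic` counters are choices made here, whereas the hardware version keeps its storage in BRAM.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical software model of a single-producer single-consumer queue.
// head_ counts pushes, tail_ counts pops; both run freely and are masked
// on access, so Depth must be a power of two.
template <typename T, std::size_t Depth>
class SpscQueue {
    static_assert(Depth >= 2 && (Depth & (Depth - 1)) == 0,
                  "Depth must be a power of two");

public:
    // Called by the producer only. Returns false when the queue is full.
    bool push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Depth) {
            return false;
        }
        buf_[head & (Depth - 1)] = value;
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Called by the consumer only. Returns false when the queue is empty.
    bool pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) {
            return false;
        }
        out = buf_[tail & (Depth - 1)];
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

private:
    std::array<T, Depth> buf_{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};
```

Because only the producer writes `head_` and only the consumer writes `tail_`, neither side ever needs a lock; the acquire/release pairs are enough to publish each slot before its counter moves.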
| Metric | Value |
|---|---|
| CoSim | ✅ Passed |
| II | 1 |
| Latency | 5 cycles |
| FFs | 38 |
| LUTs | 175 |
| BRAM | 1 |
| URAM | 0 |
| DSPs | 0 |
| Preview | Description |
|---|---|
| ![]() | AXI4 + BRAM Block Design |
| ![]() | RTL Cosim Waveform |
| ![]() | RTL Cosim |
| ![]() | Vivado Synthesis |
| ![]() | Real Trade Loop Output |
Ideas? Want to scale this to MPMC or adaptive queues? PRs welcome.
Pull requests, forks, and optimizations welcome.
📈 Ultra-efficient, pipelined, and zero-stall FIR+Gain+Decimation DSP Engine built with Vitis HLS
Imagine streaming 1024 samples through a FIR → Gain → Decimation pipeline in ~5µs on FPGA — no kernel, no interrupts, no stalls.
We built it. It’s fast, deterministic, and verified.
This HLS project demonstrates a fully pipelined digital signal processing accelerator optimized for 5G PHY, radar DSP, and ultra-low-latency financial compute (HFT).
`load_input()` → `fir_filter()` → `gain_control()` → `decimator()` → `store_output()`
✅ Built with #pragma HLS DATAFLOW and hls::stream for full task-level parallelism
✅ AXI4-Stream interfaces, scalable, synthesizable
✅ UltraScale+ friendly (ZCU104 / ZU9EG)
✅ Bit-accurate waveform and timing validated
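As a behavioural reference for the FIR → Gain → Decimation chain, the following plain C++ function processes a whole buffer at once. It is a sketch, not the HLS implementation: the function name, the integer sample types, the zero initial filter history, and the choice to keep samples whose index is a multiple of the decimation factor are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Direct-form FIR, then a scalar gain, then decimation by `decim`.
// ASSUMPTIONS: samples before x[0] are treated as zero, and the samples
// kept are those at indices 0, decim, 2*decim, ...
std::vector<std::int64_t> fir_gain_decimate(const std::vector<std::int32_t>& x,
                                            const std::vector<std::int32_t>& taps,
                                            std::int32_t gain,
                                            std::size_t decim) {
    assert(!taps.empty());
    assert(decim > 0);
    std::vector<std::int64_t> out;
    for (std::size_t n = 0; n < x.size(); ++n) {
        // y[n] = sum over k of taps[k] * x[n - k]
        std::int64_t acc = 0;
        for (std::size_t k = 0; k < taps.size() && k <= n; ++k) {
            acc += static_cast<std::int64_t>(taps[k]) * x[n - k];
        }
        acc *= gain;
        if (n % decim == 0) {
            out.push_back(acc);
        }
    }
    return out;
}
```

In the streaming design each of these steps is a separate stage connected by `hls::stream`; a buffer-based model like this one is mainly useful as a golden reference in a C testbench.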
| Feature | Status |
|---|---|
| ⏱ FIR Latency | ~1,052 cycles |
| 🚀 Max Frequency (Fmax) | > 200 MHz |
| 🧠 DSP Slice Usage | 18 DSPs only |
| 📦 BRAM / URAM Usage | Zero |
| 📉 Decimation Factor | 2× |
| ✅ Timing Closure | ✔️ Clean |
| 🔬 Waveform Match | ✔️ Co-sim + RTL |
Screenshots: Co-Simulation, Cycle-Accurate, Synthesis Report, Latency Timeline, Vivado Block Design, Vitis HLS Design Implementation, Benchmark.
| Metric | Value |
|---|---|
| 🔄 Full Pipeline Latency | 1,052 cycles |
| 🧠 DSP Usage | 17 DSPs |
| 🎯 Fmax Achieved (Post-Impl) | > 214 MHz (4.671ns period) |
| ⚙️ Timing Closure | ✅ Achieved |
| 🧩 Slices / LUT / FF | 0 / 3431 / 4256 |
- 📡 5G Uplink/Downlink PHY Chains
- 🛰️ Radar Signal Envelope Filtering
- 💸 HFT Market Signal Preprocessing
- 🤖 Edge AI Signal Cleaning (Pre-ML)
A Real-Time Rolling Volatility Engine Designed for High-Frequency Trading — in Hardware.
This project computes rolling variance of tick price streams using a fully pipelined architecture implemented in Vitis HLS with AXI4-Stream interfaces. It mimics real-world financial models used in HFT, portfolio risk estimation, and embedded analytics engines — but executes entirely in silicon.
- ✅ Zero software latency — pure hardware acceleration
- ✅ Fully pipelined (II = 1) for deterministic throughput
- ✅ Real-time streaming via `hls::stream<tick_t>`
- ✅ Fixed-point arithmetic with `ap_fixed`
- ✅ Modular stages: `fork_input`, `preload_window`, `compute_variance`
- ✅ Validated via waveform, CSim, CoSim, synthesis, and timing
σ² = (Σxᵢ²)/W − (Σxᵢ/W)²

Where:
- W = rolling window size
- xᵢ = tick price at time i

The three stages:
- `fork_input()` splits tick input into two streaming branches
- `preload_window()` initializes window sums for mean and variance
- `compute_variance()` maintains a sliding window using shift registers, computing the rolling variance σ²
All modules are connected using hls::stream and operate concurrently under #pragma HLS DATAFLOW.
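The rolling-variance formula σ² = (Σxᵢ²)/W − (Σxᵢ/W)² can be prototyped in software by keeping two running sums and subtracting the oldest tick as each new one arrives. The class below is a sketch under assumptions: it uses `double` instead of `ap_fixed`, a `std::deque` instead of shift registers, and the `RollingVariance` name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Software model of a sliding-window variance over the last W ticks.
class RollingVariance {
public:
    explicit RollingVariance(std::size_t window) : window_(window) {
        assert(window > 0);
    }

    // Feeds one tick. Returns true and writes the variance once the
    // window holds W ticks; returns false while it is still filling.
    bool update(double x, double& var_out) {
        buf_.push_back(x);
        sum_ += x;
        sum_sq_ += x * x;
        if (buf_.size() > window_) {
            const double old = buf_.front();
            buf_.pop_front();
            sum_ -= old;
            sum_sq_ -= old * old;
        }
        if (buf_.size() < window_) {
            return false;
        }
        const double w = static_cast<double>(window_);
        const double mean = sum_ / w;
        var_out = sum_sq_ / w - mean * mean;
        return true;
    }

private:
    std::size_t window_;
    std::deque<double> buf_;
    double sum_ = 0.0;
    double sum_sq_ = 0.0;
};
```

With W = 4 and ticks 1, 2, 3, 4 the sums are Σxᵢ = 10 and Σxᵢ² = 30, so σ² = 30/4 − (10/4)² = 7.5 − 6.25 = 1.25. Note that this sum-of-squares form can lose precision when prices are large relative to their spread, which is worth keeping in mind when choosing fixed-point widths.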
| Metric | Value |
|---|---|
| Latency | 6.079 ns |
| Fmax | 151.72 MHz |
| II | 1 |
| DSPs | 16 |
| LUTs | 8,587 |
| FFs | 9,693 |
| BRAMs | 6 |
Screenshots: Co-Simulation Output, Synthesis Utilization, Latency Timeline, Testbench Waveform 1, Testbench Waveform 2.
- ⚡ FPGA-native volatility filters in high-frequency trading
- 📉 DeFi and crypto risk analytics with ultra-low latency
- 🧠 AI models fed by hardware-accelerated financial stats
- 🚀 Edge inference on risk/volatility for embedded finance
Pull requests are welcome.
Feel free to fork and enhance — new features, new financial indicators, AXI-MM integration, or CI automation.
John Bagshaw
Award-winning researcher and HLS/FPGA/C++ Engineer
🔗 LinkedIn
DM me if you're hiring, collaborating, or building FPGA-based quant infrastructure.
If this project inspires you:
- ⭐ Star the repo
- 🔁 Share with fellow quant or hardware devs
- 💬 Comment your ideas or feature requests
Let’s build finance at the speed of light. 🚀
Silicon-native volatility is here.
Real-Time Multiply-Accumulate Hardware Accelerator built using Vitis HLS for DSP/AI workloads.
🧠 Pipelined. 💡 Coefficient-Instantiated. ⚡ Ultra-low-latency.
🎯 Ideal for real-time 5G PHY, radar pipelines, CNN inference, and embedded vision systems.
AXI4-Stream Input → `load_input()` → `multiply_accumulate()` → `store_result()` → AXI4-Stream Output
- ✅ Fully Pipelined Top-Level MAC
- ✅ Function Instantiation for coefficient-based specialization
- ✅ AXI4-Stream I/O + AXI-Lite Control (Vivado/Vitis Ready)
- ✅ II = 1 (Initiation Interval of 1) for Real-Time Performance
- ✅ Waveform, Cosim, Synthesis Snapshots Included
- `void load_input(hls::stream<axis_t>& in, data_t& a, data_t& b);` reads two operands from AXI4-Stream.
- `acc_t multiply_accumulate(data_t a, data_t b, data_t coefficient);` computes `(a + b) * coefficient`.
- `void store_result(hls::stream<axis_t>& out, acc_t result);` streams the result to AXI4-Stream.

🔁 `mac_top(...)` wires it all into a fully pipelined, real-time MAC unit.
MAC result: 90
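The arithmetic core, `(a + b) * coefficient`, is easy to model outside the tool. In the sketch below the 16-bit `data_t` and 32-bit `acc_t` widths are assumptions (the real typedefs live in `HLS_MAC_accel.hpp` and may differ), and `mac_fixed` is a hypothetical template showing one way a coefficient can be fixed at compile time, in the spirit of function instantiation.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMED widths for illustration only.
using data_t = std::int16_t;
using acc_t = std::int32_t;

// Computes (a + b) * coefficient in the wider accumulator type so the
// sum and product of 16-bit operands cannot overflow.
acc_t multiply_accumulate(data_t a, data_t b, data_t coefficient) {
    return (static_cast<acc_t>(a) + static_cast<acc_t>(b)) *
           static_cast<acc_t>(coefficient);
}

// Hypothetical compile-time specialization: one instance per coefficient.
template <data_t Coeff>
acc_t mac_fixed(data_t a, data_t b) {
    return multiply_accumulate(a, b, Coeff);
}
```

As one illustrative input set (not necessarily the testbench's), `multiply_accumulate(10, 20, 3)` evaluates to (10 + 20) × 3 = 90.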
Screenshots: Co-Simulation Waveform, Throughput Timeline, Synthesis Report, Implementation Block Diagram, Control Analysis.
| File Name | Description |
|---|---|
| `HLS_MAC_accel.hpp` | Top-level function and type declarations |
| `HLS_MAC_accel.cpp` | Pipeline logic with function directives |
| `HLS_MAC_accel_tb.cpp` | AXI stream testbench simulation |
| Port Name | Description |
|---|---|
| `input_stream_operands` | AXI4-Stream input (two operands) |
| `output_stream_result` | AXI4-Stream output (MAC result) |
| `coefficient` | AXI4-Lite control register for MAC weight |
| `ctrl_axi_lite` | AXI4-Lite interface |
| `mac_clk` | Clock signal |
| `mac_reset_n` | Active-low reset |
| `mac_done_interrupt` | Signals when result is valid |
- 📶 5G Baseband DSP
- 🧠 Edge AI Inference Engines
- 🛰️ Satellite Imaging & Radar
- 💻 Embedded Vision SoCs
- 📊 High-Frequency Financial Analytics
Want to port this to Zynq, add burst DMA, or wrap it in a full CNN accelerator?
Fork, enhance, and tag me!
If you're building ultra-low-latency pipelines, this project is your launchpad.
⭐ Star this repo and share your MAC benchmarks.












