HLS_FPGA

🚀 Real-Time High-Level Synthesis (HLS) Projects for UltraScale+ FPGAs. Accelerated design pipelines, dataflow architectures, and low-latency compute cores — powered by Vitis HLS, AXI4-Stream, and signal processing for 5G, SDR, and HFT applications. 🔧🔬💡

🔥 HLS_FPGA: High-Level Synthesis Design Vault

Welcome to the HLS_FPGA repository — a cutting-edge collection of real-world, production-grade FPGA accelerators built with Vitis HLS, tailored for:

  • 🛰️ Software-Defined Radio (SDR)
  • 📡 5G / O-RAN Signal Processing
  • 📈 High-Frequency Trading (HFT)
  • 🧠 AI/ML Edge Inference
  • 🔐 Embedded Security Accelerators

🚀 Key Features

  • Fully pipelined, low-latency compute chains (filter → gain → envelope)
  • AXI4-Stream + AXI-MM optimized for UltraScale+ SoCs
  • Complete Vivado-ready project structure
  • Benchmark reports: latency, throughput, utilization
  • 🧪 Co-Sim & Waveform validation included

🧠 Concepts Covered

  • #pragma HLS DATAFLOW, PIPELINE, UNROLL
  • hls::stream, ap_fixed, hls::vector
  • Loop optimization, interface pragmas, and dataflow chaining
  • AXI4-Lite control interface + AXI-MM streaming buffer logic

💡 Use Cases

  • Real-time DSP on FPGAs
  • Hardware-accelerated quant finance
  • Latency-critical radio and vision systems

🤝 Contribute

Want to optimize a pipeline, add a new design, or share your performance results? Fork this repo and open a PR. Let’s redefine hardware acceleration together.


📣 Let's make FPGAs mainstream for developers and innovators everywhere. Star ⭐ the repo, share with your network, and ignite the hardware revolution!

🚀 RF Doppler Spoof Detection Engine

📌 Overview

This project implements a real-time FPGA/HLS-based Doppler-Aware Spoof Detection Engine designed to secure RF and GNSS systems from spoofing attacks. Built with Vitis HLS + Vivado, it validates end-to-end dataflow from high-level C++ through synthesis, implementation, and hardware deployment on Xilinx Zynq UltraScale+ (ZCU104).


🔒 Key Features

  • ADC Demux + FFT Pipeline: Continuous real-time spectral scanning.
  • Magnitude Estimator: Computes sqrt(I² + Q²) per cycle.
  • Peak Detector: Finds max FFT bin + magnitude.
  • Spoof Rule Engine: Detects spoofing with (edge_bin && mag > threshold) logic.
  • Timing: closure at a 7.107 ns clock period (Fmax ≈ 140.71 MHz).
  • Reusable IP: Packaged as an AXI4-Stream IP core for seamless FPGA/SoC integration.

📊 System Architecture


Block Flow

  • Input (adc_stream) → Demux → FFT → Magnitude → Peak Detection → Spoof Rule → Output Alert Stream.
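The Magnitude → Peak Detection → Spoof Rule stages can be modelled in plain C++ as below. This is a minimal software sketch, not the engine's HLS source: it takes one frame of FFT output as std::complex<double> instead of fixed-point I/Q on an AXI4-Stream, and the names detect_spoof, SpoofResult, edge_width and threshold, as well as the "edge bin" definition (within edge_width bins of either end of the spectrum), are hypothetical assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical result record for one FFT frame.
struct SpoofResult {
    std::size_t peak_bin;  // index of the strongest FFT bin
    double peak_mag;       // magnitude sqrt(I^2 + Q^2) of that bin
    bool alert;            // true when the spoof rule fires
};

// Software model of Magnitude -> Peak Detection -> Spoof Rule for one frame.
// edge_width and threshold are illustrative parameters (assumptions).
SpoofResult detect_spoof(const std::vector<std::complex<double>>& fft_bins,
                         std::size_t edge_width, double threshold) {
    assert(!fft_bins.empty() && edge_width <= fft_bins.size());
    SpoofResult r{0, 0.0, false};
    for (std::size_t k = 0; k < fft_bins.size(); ++k) {
        // Magnitude estimator: sqrt(I^2 + Q^2).
        double mag = std::hypot(fft_bins[k].real(), fft_bins[k].imag());
        // Peak detector: keep the largest bin seen so far.
        if (mag > r.peak_mag) {
            r.peak_mag = mag;
            r.peak_bin = k;
        }
    }
    // Assumed edge definition: peak lies within edge_width bins of either end.
    bool edge_bin = r.peak_bin < edge_width ||
                    r.peak_bin >= fft_bins.size() - edge_width;
    // Spoof rule: (edge_bin && mag > threshold).
    r.alert = edge_bin && r.peak_mag > threshold;
    return r;
}
```

A strong tone near the band edge raises an alert, a strong tone in the centre does not, and a weak tone at the edge stays below the threshold, which mirrors the three validation scenarios listed further down.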

✅ Verification Flow

1. C/RTL Co-Simulation


  • Bit-accurate functional verification between C++ and RTL.
  • Validated pipeline parallelism and throughput.

2. Cycle-Accurate Verification


  • Confirmed cycle-by-cycle correctness of streams, handshake, and throughput.

⚡ Synthesis & Timing


  • Timing closed at 7.107 ns (target: 10 ns).
  • Achieved Fmax: ~140.71 MHz.
  • Efficient resource utilization: LUT: 158 | FF: 130 | BRAM: 0 | DSP: 0.

🔐 Spoof Detection Validated


  • Testcases confirmed:
    • Edge spoof → Detected 🚨
    • Center strong (clean) → No alert ✅
    • Weak edge noise → Ignored ✅

🏗️ Implementation


  • Vivado Implementation reports: timing met, routing clean.
  • Packaged as AXI4-Stream IP for integration.

🖥️ Customized IP Core


  • IP block packaged for Zynq SoC integration.
  • Interfaces: adc_stream, alert_stream, ap_ctrl, ap_clk, ap_rst_n.

🔄 From HLS to Hardware


  • Integrated into ZCU104 block design.
  • End-to-end hardware datapath verified with DMA → FIFO → Spoof Engine → AXI interconnect.

📍 Place & Route (PnR)


  • Clean placement on ZCU104 FPGA fabric.
  • Resource-efficient footprint.

🌐 Impact

This project demonstrates how FPGA + HLS can deliver production-grade, ultra-low-latency spoof detection for GNSS, RF, and quantum-secure communication systems.

⚡ How to Build & Run

1. Open the project in Vitis HLS 2024.1 or later.

2. Run C Simulation, then C/RTL Co-Simulation.

3. Export RTL to Vivado and package it as IP.

4. Integrate the IP into a Zynq block design.

5. Generate the bitstream and deploy.

📌 Author 👤 John Bagshaw | FPGA/HLS & RF Security Expert | 7+ Years Industry Experience

✨ If you believe secure RF/GNSS systems are critical for the future — let’s connect!

🚨 BREAKING: FPGA-Accelerated DSP Pipelines ⚡📈

Just completed — a fully pipelined, real-time DSP accelerator optimized using Vitis HLS for High-Frequency Trading, signal intelligence, and low-latency inferencing.

🧠 Project: dsp_pipeline — HLS + AXI4-Stream + Dataflow 🔥


🚀 Why This Is Mind-Blowing

  • Dataflow Rocket: Every block (load_process, fir_filter, gain_adjust, store_process) runs concurrently under DATAFLOW. No bubbles, full streaming throughput.
  • Timing Met: Required: 10 ns. Achieved: 4.641 ns. That's an Fmax of ≈215 MHz without even trying.
  • DSP-Efficient: Just 18 DSPs, 0 BRAMs, 0 URAMs. Ultra-low resource use.
  • Latency Optimized: Full processing latency = 1,052 cycles (≈10.5 µs at 100 MHz; ≈4.9 µs if clocked at the achieved ≈215 MHz).
  • Waveform Proven: Cosim ✅. RTL trace ✅. FIFO depth ✅. Parallel timeline ✅. Every bit validated.
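The Fmax and microsecond figures quoted in this README follow from two unit conversions: Fmax in MHz is 1000 divided by the clock period in ns, and latency in µs is cycles divided by the clock in MHz. A minimal sketch; the helper names fmax_mhz and latency_us are illustrative, not part of the project.

```cpp
#include <cassert>

// Convert an achieved clock period (ns) into Fmax (MHz): f = 1000 / T_ns.
double fmax_mhz(double period_ns) {
    assert(period_ns > 0.0);
    return 1000.0 / period_ns;
}

// Convert a latency in clock cycles into microseconds at a given clock (MHz):
// t_us = cycles / f_MHz.
double latency_us(unsigned long cycles, double clock_mhz) {
    assert(clock_mhz > 0.0);
    return static_cast<double>(cycles) / clock_mhz;
}
```

For example, latency_us(1052, 100.0) gives 10.52 µs, while fmax_mhz(4.641) is about 215.5 MHz, at which the same 1,052 cycles take roughly 4.9 µs.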


💥 Why It Matters for HFT

When nanoseconds = millions, a pipelined FPGA-based DSP engine like this can preprocess, filter, amplify, and route market signals before a single CPU interrupt.
No latency tax. No wasted cycles.


🔁 Use Cases

  • Market signal denoising before tick injection
  • Real-time RF front-ends (e.g., GNSS, O-RAN, EW)
  • Quant pipelines for proprietary strategy compute
  • On-FPGA ML feature pre-processing (waveforms → vectors)

📸 Visual Validation Gallery

  • 🔧 Post-Implementation Resource Usage
  • ⏱️ Co-Simulation Timeline Trace
  • 🔁 Waveform Trace (two captures)
  • 🧠 Code Analyzer
  • 🧮 Schedule Viewer
  • 📊 Synthesis Report
  • 📟 C Simulation Output

🚀 Lock-Free FPGA Queue for High-Frequency Trading (HFT)

This is not your average FIFO. This is a fully pipelined, BRAM-backed, single-producer single-consumer (SPSC) queue designed to obliterate lock-based latency in high-frequency trading systems.


🔥 Why It Matters

When nanoseconds = millions, software locks just don’t cut it. This lock-free SPSC queue:

  • ❌ No locks
  • ❌ No vendor FIFO IP
  • ❌ No DSPs
  • ✅ Fully pipelined
  • ✅ 100% deterministic timing
  • ✅ BRAM circular buffer
  • ✅ Synthesized with II=1, latency = 5 cycles
  • ✅ Verified with CSim, CoSim, and waveform inspection
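The head/tail discipline behind a lock-free SPSC circular buffer can be illustrated in software. This is a sketch under stated assumptions, not the project's HLS code: it is a standard C++17 std::atomic ring buffer, the class name SpscQueue is hypothetical, and the capacity must be a power of two so the index wrap is a bit mask (analogous to a BRAM address counter rolling over).

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Lock-free single-producer/single-consumer ring buffer (software analogue).
// head_ and tail_ are free-running counters; only the producer writes head_,
// only the consumer writes tail_, so no lock is ever needed.
template <typename T, std::size_t N>
class SpscQueue {
    static_assert(N >= 2 && (N & (N - 1)) == 0, "N must be a power of two");
public:
    // Producer side. Returns false when the queue is full.
    bool push(const T& v) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == N) return false;                  // full
        buf_[head & (N - 1)] = v;
        head_.store(head + 1, std::memory_order_release);   // publish the slot
        return true;
    }
    // Consumer side. Returns std::nullopt when the queue is empty.
    std::optional<T> pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;               // empty
        T v = buf_[tail & (N - 1)];
        tail_.store(tail + 1, std::memory_order_release);   // free the slot
        return v;
    }
private:
    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0};  // next write index (producer-owned)
    std::atomic<std::size_t> tail_{0};  // next read index (consumer-owned)
};
```

Because each counter has exactly one writer, the full/empty checks reduce to comparing two indices, which is the same property that lets a hardware SPSC queue accept one push and one pop per cycle without arbitration.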

💡 Built For:

  • FIX/ITCH packet pipelines
  • Market snapshot replay
  • FPGA risk-aware strategy engines
  • Predictable FIFO-less queuing across accelerators

📊 Performance Snapshot

| Metric | Value |
|---|---|
| CoSim | ✅ Passed |
| II | 1 |
| Latency | 5 cycles |
| FFs | 38 |
| LUTs | 175 |
| BRAM | 1 |
| URAM | 0 |
| DSPs | 0 |

🧪 Visual Proof

  • AXI4 + BRAM block design
  • RTL co-simulation waveform
  • RTL co-simulation
  • Vivado synthesis
  • Real trade-loop output

🙌 Contribute

Ideas? Want to scale this to MPMC or adaptive queues? Pull requests, forks, and optimizations are welcome.

🚀 HLS-based 5G FIR Chain Accelerator

📈 Ultra-efficient, pipelined, and zero-stall FIR+Gain+Decimation DSP Engine built with Vitis HLS

🔥 Project Summary

Imagine streaming 1024 samples through a FIR → Gain → Decimation pipeline in ~5µs on FPGA — no kernel, no interrupts, no stalls.
We built it. It’s fast, deterministic, and verified.

This HLS project demonstrates a fully pipelined digital signal processing accelerator optimized for 5G PHY, radar DSP, and ultra-low-latency financial compute (HFT).


⛓️ Processing Pipeline

load_input() ↓ fir_filter() ↓ gain_control() ↓ decimator() ↓ store_output()

✅ Built with #pragma HLS DATAFLOW and hls::stream for full task-level parallelism
✅ AXI4-Stream interfaces, scalable, synthesizable
✅ UltraScale+ friendly (ZCU104 / XCZU9EG)
✅ Bit-accurate waveform and timing validated
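The fir_filter() → gain_control() → decimator() stages can be modelled behaviourally as below. This is a minimal sketch, not the accelerator's source: it uses double instead of ap_fixed, whole vectors instead of hls::stream, and the taps, gain and decimation factor are hypothetical because the README does not list them. In the HLS design each stage runs concurrently under DATAFLOW; here they simply run one after another.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct-form FIR: y[n] = sum_k h[k] * x[n-k], with zero initial state.
std::vector<double> fir_filter(const std::vector<double>& x,
                               const std::vector<double>& h) {
    std::vector<double> y(x.size(), 0.0);
    for (std::size_t n = 0; n < x.size(); ++n)
        for (std::size_t k = 0; k < h.size() && k <= n; ++k)
            y[n] += h[k] * x[n - k];
    return y;
}

// Gain stage: scale every sample by a constant.
std::vector<double> gain_control(std::vector<double> x, double gain) {
    for (double& s : x) s *= gain;
    return x;
}

// Decimator: keep every d-th sample (d = decimation factor, assumed >= 1).
std::vector<double> decimator(const std::vector<double>& x, std::size_t d) {
    assert(d >= 1);
    std::vector<double> y;
    for (std::size_t n = 0; n < x.size(); n += d) y.push_back(x[n]);
    return y;
}
```

Feeding an 8-sample impulse through taps {1, 2, 3}, a gain of 2 and a decimation factor of 2 yields {2, 6, 0, 0}: the scaled impulse response, with every other sample dropped.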


💡 Technical Highlights

| Feature | Status |
|---|---|
| ⏱ FIR Latency | ~1,052 cycles |
| 🚀 Max Frequency (Fmax) | > 200 MHz |
| 🧠 DSP Slice Usage | 18 DSPs only |
| 📦 BRAM / URAM Usage | Zero |
| 📉 Decimation Factor | |
| ✅ Timing Closure | ✔️ Clean |
| 🔬 Waveform Match | ✔️ Co-sim + RTL |

📸 Waveform & Benchmark Snapshots

  • Co-Simulation
  • Cycle-Accurate
  • Synthesis Report
  • Latency Timeline
  • Vivado Block Design
  • Vitis HLS Design Implementation
  • Benchmark

🔍 Quick Analysis

| Metric | Value |
|---|---|
| 🔄 Full Pipeline Latency | 1,052 cycles |
| 🧠 DSP Usage | 17 DSPs |
| 🎯 Fmax Achieved (Post-Impl) | > 214 MHz (4.671 ns period) |
| ⚙️ Timing Closure | ✅ Achieved |
| 🧩 Slices / LUT / FF | 0 / 3431 / 4256 |

🎯 Use Cases

  • 📡 5G Uplink/Downlink PHY Chains
  • 🛰️ Radar Signal Envelope Filtering
  • 💸 HFT Market Signal Preprocessing
  • 🤖 Edge AI Signal Cleaning (Pre-ML)

🧪 How to Run Locally

1. Open in Vitis HLS 2023.2 or later

2. Import source files

3. Run C Simulation

4. Run C/RTL Co-Simulation

5. Check reports for Fmax and utilization

🚀 HFT_Volatility_Engine

A Real-Time Rolling Volatility Engine Designed for High-Frequency Trading — in Hardware.

This project computes rolling variance of tick price streams using a fully pipelined architecture implemented in Vitis HLS with AXI4-Stream interfaces. It mimics real-world financial models used in HFT, portfolio risk estimation, and embedded analytics engines — but executes entirely in silicon.


💡 Key Features

  • ✅ Zero software latency — pure hardware acceleration
  • ✅ Fully pipelined (II = 1) for deterministic throughput
  • ✅ Real-time streaming via hls::stream<tick_t>
  • ✅ Fixed-point arithmetic with ap_fixed
  • ✅ Modular stages: fork_input, preload_window, compute_variance
  • ✅ Validated via waveform, CSim, CoSim, synthesis, and timing

📈 Rolling Variance Formula

σ² = (Σxᵢ²)/W − (Σxᵢ/W)²

Where:

  • W = rolling window size
  • xᵢ = tick price at time i
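Because each new tick enters the window while the oldest leaves, both sums can be updated in constant time per tick instead of being recomputed over all W samples. Writing S₁ = Σxᵢ and S₂ = Σxᵢ² over the current window, with x_new the arriving tick and x_old the evicted one, a standard sliding update (assumed here, not confirmed as the exact compute_variance() arithmetic) is:

```latex
S_1 \leftarrow S_1 + x_{\text{new}} - x_{\text{old}}, \qquad
S_2 \leftarrow S_2 + x_{\text{new}}^2 - x_{\text{old}}^2, \qquad
\sigma^2 = \frac{S_2}{W} - \left(\frac{S_1}{W}\right)^2
```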

⚙️ How It Works

  1. fork_input() splits tick input into two streaming branches
  2. preload_window() initializes window sums for mean and variance
  3. compute_variance() maintains a sliding window using shift registers, computing the running sums Σxᵢ and Σxᵢ² over the window and, from them, σ² using the formula above.

All modules are connected using hls::stream and operate concurrently under #pragma HLS DATAFLOW.
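A software model of the rolling-variance computation, using the running sums from the formula above, might look like this. It is a sketch under assumptions: double replaces ap_fixed, a std::deque stands in for the shift-register window, and the class name RollingVariance is hypothetical rather than taken from the project.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Rolling variance over the last W ticks using running sums
// s1 = sum(x) and s2 = sum(x^2): var = s2/W - (s1/W)^2.
class RollingVariance {
public:
    explicit RollingVariance(std::size_t window) : w_(window) { assert(w_ > 0); }

    // Push one tick; returns true once the window is full and variance() is valid.
    bool push(double x) {
        win_.push_back(x);
        s1_ += x;
        s2_ += x * x;
        if (win_.size() > w_) {        // evict the oldest tick
            double old = win_.front();
            win_.pop_front();
            s1_ -= old;
            s2_ -= old * old;
        }
        return win_.size() == w_;
    }

    // Population variance of the current window.
    double variance() const {
        assert(win_.size() == w_);
        const double mean = s1_ / w_;
        return s2_ / w_ - mean * mean;
    }

private:
    std::size_t w_;
    std::deque<double> win_;  // stands in for the shift-register window
    double s1_ = 0.0;
    double s2_ = 0.0;
};
```

With W = 3, the ticks 1, 2, 3 give σ² = 14/3 − 2² = 2/3. One design note: the S₂/W − mean² form can lose precision through cancellation when prices are large relative to their spread, so fixed-point widths in a hardware version need enough headroom for S₂.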


📊 Performance Summary

| Metric | Value |
|---|---|
| Latency | 6.079 ns |
| Fmax | 151.72 MHz |
| II | 1 |
| DSPs | 16 |
| LUTs | 8,587 |
| FFs | 9,693 |
| BRAMs | 6 |

📸 Waveform & Benchmark Snapshots

  • Co-Simulation Output
  • Synthesis Utilization
  • Latency Timeline
  • Testbench Waveform 1
  • Testbench Waveform 2

💬 Real-World Use Cases

  • ⚡ FPGA-native volatility filters in high-frequency trading
  • 📉 DeFi and crypto risk analytics with ultra-low latency
  • 🧠 AI models fed by hardware-accelerated financial stats
  • 🚀 Edge inference on risk/volatility for embedded finance

🤝 Contributing

Pull requests are welcome.
Feel free to fork and enhance — new features, new financial indicators, AXI-MM integration, or CI automation.


👤 Author

John Bagshaw
Award-winning researcher and HLS/FPGA/C++ Engineer
🔗 LinkedIn

DM me if you're hiring, collaborating, or building FPGA-based quant infrastructure.


⭐ Support

If this project inspires you:

  • ⭐ Star the repo
  • 🔁 Share with fellow quant or hardware devs
  • 💬 Comment your ideas or feature requests

Let’s build finance at the speed of light. 🚀
Silicon-native volatility is here.

🚀 MAC_Accelerator_HLS

Real-Time Multiply-Accumulate Hardware Accelerator built using Vitis HLS for DSP/AI workloads.
🧠 Pipelined. 💡 Coefficient-Instantiated. ⚡ Ultra-low-latency.
🎯 Ideal for real-time 5G PHY, radar pipelines, CNN inference, and embedded vision systems.


🔧 Architecture Overview

AXI4-Stream Input ↓ load_input() ↓ multiply_accumulate() ↓ store_result() ↓ AXI4-Stream Output


📦 Features

  • Fully Pipelined Top-Level MAC
  • Function Instantiation for coefficient-based specialization
  • AXI4-Stream I/O + AXI-Lite Control (Vivado/Vitis Ready)
  • II = 1 (Initiation Interval of 1) for Real-Time Performance
  • Waveform, Cosim, Synthesis Snapshots Included

🧠 Core Functions

  • void load_input(hls::stream<axis_t>& in, data_t& a, data_t& b): reads two operands from the AXI4-Stream input.
  • acc_t multiply_accumulate(data_t a, data_t b, data_t coefficient): computes (a + b) * coefficient.
  • void store_result(hls::stream<axis_t>& out, acc_t result): streams the result to the AXI4-Stream output.

  • 🔁 mac_top(...) wires it all into a fully pipelined, real-time MAC unit.
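The load → compute → store chain can be exercised in plain C++ as a behavioural model. This is a sketch, not the HLS source: std::queue replaces hls::stream<axis_t>, the widths chosen for data_t and acc_t are assumptions (the real types live in HLS_MAC_accel.hpp), and the mac_top signature is hypothetical because the README elides its parameters. It follows the documented (a + b) * coefficient formula.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Assumed stand-ins for the project's types.
using data_t = std::int16_t;
using acc_t = std::int32_t;

// Reads two operands from the input stream.
void load_input(std::queue<data_t>& in, data_t& a, data_t& b) {
    assert(in.size() >= 2);
    a = in.front(); in.pop();
    b = in.front(); in.pop();
}

// Computes (a + b) * coefficient in the wider accumulator type.
acc_t multiply_accumulate(data_t a, data_t b, data_t coefficient) {
    return (static_cast<acc_t>(a) + b) * coefficient;
}

// Writes the result to the output stream.
void store_result(std::queue<acc_t>& out, acc_t result) { out.push(result); }

// Hypothetical software analogue of mac_top(...): one operand pair per call.
void mac_top(std::queue<data_t>& in, std::queue<acc_t>& out, data_t coefficient) {
    data_t a, b;
    load_input(in, a, b);
    store_result(out, multiply_accumulate(a, b, coefficient));
}
```

Widening to acc_t before the add and multiply keeps the intermediate result from overflowing the 16-bit operand type, which is the same reason the HLS version declares a separate accumulator type.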


🧪 Testbench Output

MAC result: 90


📸 Visual Highlights

3-Panel View

  • Co-Simulation Waveform
  • Throughput Timeline
  • Synthesis Report

2-Panel View

  • Implementation Block Diagram
  • Control Analysis

📂 Files Included

| File Name | Description |
|---|---|
| HLS_MAC_accel.hpp | Top-level function and type declarations |
| HLS_MAC_accel.cpp | Pipeline logic with function directives |
| HLS_MAC_accel_tb.cpp | AXI stream testbench simulation |

🔗 Interface Summary

| Port Name | Description |
|---|---|
| input_stream_operands | AXI4-Stream input (two operands) |
| output_stream_result | AXI4-Stream output (MAC result) |
| coefficient | AXI4-Lite control register for MAC weight |
| ctrl_axi_lite | AXI4-Lite interface |
| mac_clk | Clock signal |
| mac_reset_n | Active-low reset |
| mac_done_interrupt | Signals when result is valid |

💡 Real-World Applications

  • 📶 5G Baseband DSP
  • 🧠 Edge AI Inference Engines
  • 🛰️ Satellite Imaging & Radar
  • 💻 Embedded Vision SoCs
  • 📊 High-Frequency Financial Analytics

🤝 Contributing

Want to port this to Zynq, add burst DMA, or wrap it in a full CNN accelerator?
Fork, enhance, and tag me!


🔥 Let’s Go Viral Together

If you're building ultra-low-latency pipelines, this project is your launchpad.
Star this repo and share your MAC benchmarks.
