| # | Task | Status |
|---|---|---|
| 1 | MATLAB Model: floating-point & fixed-point Q3.13 CORDIC QR reference | ✅ Done |
| 2 | RTL Design: QR stage FSM, CORDIC datapath, AXI-Stream wrapper | ✅ Done |
| 3 | RTL Synthesis: Vivado synthesis & timing closure | 🔲 Pending |
| 4 | UVM Verification Environment: UVM testbench in which MATLAB HDL Verifier communicates with the RTL via DPI-C | 🔲 Pending |
| 5 | R⁻¹ Stage: CORDIC reciprocal in RTL | 🔲 Pending |
| 6 | A⁻¹ Assembly Stage | 🔲 Pending |

An FPGA hardware accelerator for real-time 3×3 matrix inversion using the QR decomposition method, implemented entirely with iterative CORDIC arithmetic on a fixed-point Q3.13 datapath.
Computing A⁻¹ directly requires divisions and square roots that are expensive in hardware. Instead, this design factors the problem using QR decomposition:
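
$$A = QR$$

where Q is orthogonal ($Q^{\mathsf{T}}Q = I$) and R is upper triangular. Because Q is orthogonal, $Q^{-1} = Q^{\mathsf{T}}$, so the inverse needs only the inverse of a triangular matrix and a transpose:

$$A^{-1} = (QR)^{-1} = R^{-1}Q^{-1} = R^{-1}Q^{\mathsf{T}}$$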
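
The same identity can be expanded in terms of the three Givens rotations produced by the QR stage. This is a short derivation, assuming the usual convention that stage k's rotation $G_k$ equals the identity except in the two rows (p, q) it mixes, where it holds the 2×2 block shown; each $G_k$ is orthogonal, so their product is too:

$$
G_k\big|_{(p,q)} = \begin{pmatrix} c_k & s_k \\ -s_k & c_k \end{pmatrix},
\qquad
G_3 G_2 G_1 A = R
\;\Longrightarrow\;
A = (G_3 G_2 G_1)^{\mathsf{T}} R = QR,
\qquad
A^{-1} = R^{-1} Q^{\mathsf{T}} = R^{-1} G_3 G_2 G_1
$$

This is why only the three (c, s) pairs need to be stored: Qᵀ is rebuilt from them rather than computed from A.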

```
Input A (3×3)
       │
       ▼
┌─────────────┐   ← Step 1 (this block)
│  QR Stage   │──> R matrix (upper triangular)
│  CORDIC-QR  │──> G1, G2, G3 (Givens params: c, s per stage)
└─────────────┘
       │
       ▼
┌─────────────┐   ← Step 2
│  R Inverse  │──> R⁻¹ via CORDIC reciprocals + back-substitution
└─────────────┘
       │
       ▼
┌─────────────┐   ← Step 3
│ Q Assembly  │──> Q = (G3·G2·G1)ᵀ built from stored (c, s) pairs
└─────────────┘
       │
       ▼
┌──────────────────┐   ← Step 4
│  A⁻¹ = R⁻¹ · Qᵀ  │──> Final inverse output
└──────────────────┘
```
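
Steps 2 to 4 can be modelled in floating point to show how little is needed once R and the (c, s) pairs exist. This is a minimal sketch, not the planned RTL: it uses plain division where the design calls for CORDIC reciprocals, and `STAGE_ROWS`, `r_inverse`, `q_transpose` and `assemble_inverse` are hypothetical names. The row pairs follow the Givens stage schedule (stage 1 mixes rows 1 and 2, stage 2 rows 1 and 3, stage 3 rows 2 and 3).

```python
# Hypothetical row schedule matching the Givens stage table: (pivot row, target row).
STAGE_ROWS = [(0, 1), (0, 2), (1, 2)]


def matmul(a, b):
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def r_inverse(r):
    """Step 2: invert upper-triangular R by back-substitution.

    Plain division is used here; the planned RTL uses CORDIC reciprocals.
    """
    inv = [[0.0] * 3 for _ in range(3)]
    for j in range(3):                  # solve R x = e_j for each column j
        for i in range(2, -1, -1):      # bottom row first
            rhs = (1.0 if i == j else 0.0) - sum(r[i][k] * inv[k][j] for k in range(i + 1, 3))
            inv[i][j] = rhs / r[i][i]
    return inv


def q_transpose(cs):
    """Step 3: Q^T = G3 * G2 * G1, built by applying each stored (c, s) to the identity."""
    qt = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for (c, s), (p, q) in zip(cs, STAGE_ROWS):
        row_p, row_q = qt[p], qt[q]     # old rows; the new lists below replace them
        qt[p] = [c * u + s * v for u, v in zip(row_p, row_q)]
        qt[q] = [-s * u + c * v for u, v in zip(row_p, row_q)]
    return qt


def assemble_inverse(r, cs):
    """Step 4: A^-1 = R^-1 * Q^T."""
    return matmul(r_inverse(r), q_transpose(cs))
```

Building Qᵀ by row operations on the identity mirrors what Step 3 has to do with the stored pairs: no trigonometry or square roots are needed at that point.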
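The QR stage is the first and most compute-intensive step. It reduces A to the upper-triangular R using three successive Givens rotations, each driven by a CORDIC vectoring + rotation kernel, and keeps each rotation's (c, s) pair so that Q can be assembled later.

Three stages zero the three sub-diagonal entries of A. Each stage operates on the values produced by the previous one: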
| Stage | Vector pair | Rotated pairs | Eliminates |
|---|---|---|---|
| 1 | (a₁₁, a₂₁) | (a₁₂, a₂₂), (a₁₃, a₂₃) | a₂₁ → 0 |
| 2 | (a₁₁, a₃₁) | (a₁₂, a₃₂), (a₁₃, a₃₃) | a₃₁ → 0 |
| 3 | (a₂₂, a₃₂) | (a₂₃, a₃₃) | a₃₂ → 0 |
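
The stage schedule in the table can be checked against a floating-point reference. This sketch stands in for the CORDIC kernels with exact `math.hypot` and division, so it shows what the three stages compute rather than how the RTL computes it; `givens_qr` and `STAGES` are hypothetical names, and values are not restricted to the Q3.13 range.

```python
import math

# Stage schedule from the table: (pivot row, target row), 0-based.
# The eliminated entry sits in the pivot column, which equals the pivot row here.
STAGES = [(0, 1), (0, 2), (1, 2)]


def givens_qr(a):
    """Reduce a 3x3 matrix to upper-triangular R with three Givens rotations.

    Returns (R, [(c, s), ...]) in stage order.
    """
    r = [row[:] for row in a]
    params = []
    for p, q in STAGES:
        x, y = r[p][p], r[q][p]              # vector pair, e.g. (a11, a21) in stage 1
        rad = math.hypot(x, y)
        c, s = (1.0, 0.0) if rad == 0 else (x / rad, y / rad)
        params.append((c, s))
        r[p][p], r[q][p] = rad, 0.0          # result of driving y to zero
        for col in range(p + 1, 3):          # rotated pairs, e.g. (a12, a22), (a13, a23)
            u, v = r[p][col], r[q][col]
            r[p][col], r[q][col] = c * u + s * v, -s * u + c * v
    return r, params
```

Because every rotation is orthogonal, the column norms of A carry over to R unchanged and |det R| = |det A|, which gives cheap sanity checks on any fixed-point result as well.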
Four CORDIC units operate in parallel per stage, sharing direction bits and pre-rotation flags from the vector unit:
- Vector CORDIC: drives y to zero; outputs the `angle_direction`, `swap`, `neg_x` and `neg_y` flags each iteration and asserts `slv_enable` to gate all slave rotators.
- Rotation CORDIC CS: rotates the fixed vector (0, 1); its outputs are the Givens sine and cosine for this stage, `s = x_out` and `c = y_out`.
- Rotation CORDIC R0: updates the first rotated matrix-element pair in place (e.g. (a₁₂, a₂₂) in stage 1).
- Rotation CORDIC R1: updates the second rotated pair; stage 3 has only one rotated pair, so R1 receives (0, 0) there to keep the timing of all stages uniform.
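
The master/slave arrangement can be sketched in bit-level Python: the vector unit records one direction bit per iteration, and each rotator replays those bits on its own input. This is a simplified sketch under stated assumptions, not a model of the RTL: a single hypothetical `pre_neg` flag (rotate by π when x < 0) stands in for the `swap`/`neg_x`/`neg_y` flags, the CS unit is assumed to start from (0, 1/K) to cancel the CORDIC gain K, word-width overflow is ignored, and all function names are hypothetical.

```python
import math

FRAC_BITS = 13                      # Q3.13, as in the FRAC_BITS parameter
NUM_ITERATIONS = 16                 # as in the NUM_ITERATIONS parameter
ONE = 1 << FRAC_BITS

# Each micro-rotation stretches the vector; the total gain is K ~ 1.6468.
K = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(NUM_ITERATIONS))


def to_fixed(v):
    return round(v * ONE)


def to_float(q):
    return q / ONE


def micro_rotate(x, y, i, clockwise):
    """One shift-and-add CORDIC step (Python >> is an arithmetic shift)."""
    if clockwise:
        return x + (y >> i), y - (x >> i)
    return x - (y >> i), y + (x >> i)


def vector_cordic(x, y):
    """Vectoring mode: drive y toward 0 and record the direction of every step.

    Hypothetical simplification: a left-half-plane vector is first rotated by pi.
    """
    pre_neg = x < 0
    if pre_neg:
        x, y = -x, -y
    directions = []
    for i in range(NUM_ITERATIONS):
        clockwise = y >= 0
        directions.append(clockwise)
        x, y = micro_rotate(x, y, i, clockwise)
    return x, pre_neg, directions


def rotation_cordic(x, y, pre_neg, directions):
    """Slave rotator: replays the master's pre-rotation and direction bits."""
    if pre_neg:
        x, y = -x, -y
    for i, clockwise in enumerate(directions):
        x, y = micro_rotate(x, y, i, clockwise)
    return x, y


def givens_cs(a, b):
    """Givens (c, s) in Q3.13 for the fixed-point pair (a, b)."""
    _, pre_neg, dirs = vector_cordic(a, b)
    # Assumption: the CS rotator starts from (0, 1/K), so its output is gain-free.
    s, c = rotation_cordic(0, to_fixed(1.0 / K), pre_neg, dirs)
    return c, s
```

For the pair (1.5, 2.0), `givens_cs` returns Q3.13 codes close to c = 0.6 and s = 0.8, and replaying the same direction bits on (1.0, 0.5) with `rotation_cordic` gives approximately K·(1.0, -0.5): the Givens rotation of that pair, still scaled by the uncompensated gain.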

The wrapper `qr_stage_axis_slave` accepts the 3×3 matrix as a stream of 9 Q3.13 samples in row-major order (a₁₁, a₁₂, …, a₃₃):

| Signal | Dir | Description |
|---|---|---|
| `s_axis_tvalid` | in | Element valid |
| `s_axis_tready` | out | Ready to accept; held low during compute |
| `s_axis_tdata` | in | Matrix element, Q3.13 signed 16-bit |
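
A testbench or host driver has to turn a real-valued matrix into the nine `s_axis_tdata` beats. This sketch assumes round-to-nearest (Python's `round`, ties to even) with saturation to the Q3.13 range; the RTL's own overflow handling is not described here, and the helper names are hypothetical.

```python
DATA_WIDTH = 16
FRAC_BITS = 13


def to_q313_word(value):
    """Encode a real value as an unsigned 16-bit two's-complement Q3.13 word."""
    code = round(value * (1 << FRAC_BITS))
    lo, hi = -(1 << (DATA_WIDTH - 1)), (1 << (DATA_WIDTH - 1)) - 1
    code = max(lo, min(hi, code))              # saturate to [-4, 4 - 2^-13]
    return code & ((1 << DATA_WIDTH) - 1)      # two's-complement bit pattern


def from_q313_word(word):
    """Decode a 16-bit tdata word back to a real value."""
    if word & (1 << (DATA_WIDTH - 1)):         # sign bit set
        word -= 1 << DATA_WIDTH
    return word / (1 << FRAC_BITS)


def matrix_to_stream(a):
    """Flatten a 3x3 matrix row-major (a11, a12, ..., a33) into 9 tdata beats."""
    return [to_q313_word(v) for row in a for v in row]
```

For example, 1.0 encodes as `0x2000` and -1.0 as `0xE000`; out-of-range values such as 10.0 clip to `0x7FFF`.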

The results are presented on parallel output ports:

| Signal | Description |
|---|---|
| `r_elem_o[0:8]` | Flattened 3×3 R matrix (Q3.13) |
| `g_c_o[0:2]` | Cosine c of each Givens stage |
| `g_s_o[0:2]` | Sine s of each Givens stage |
| `out_valid_o` | Outputs valid; held until `out_ready_i` is asserted |
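
On the output side, a scoreboard can unpack the parallel ports and run structural checks before comparing against the golden reference. This sketch assumes `r_elem_o` is flattened row-major (index 3·i + j) and that every word is two's-complement Q3.13; both are assumptions, as are the function names and the default tolerance of 4 LSB.

```python
FRAC_BITS = 13
DATA_WIDTH = 16


def q313(word):
    """Interpret an unsigned DATA_WIDTH-bit port value as signed Q3.13."""
    if word >= 1 << (DATA_WIDTH - 1):
        word -= 1 << DATA_WIDTH
    return word / (1 << FRAC_BITS)


def decode_outputs(r_elem, g_c, g_s):
    """Unpack r_elem_o[0:8] (assumed row-major) and the three (c, s) pairs."""
    r = [[q313(r_elem[3 * i + j]) for j in range(3)] for i in range(3)]
    cs = [(q313(c), q313(s)) for c, s in zip(g_c, g_s)]
    return r, cs


def check_outputs(r, cs, tol_lsb=4):
    """Structural checks: R is upper triangular and each c^2 + s^2 is close to 1."""
    tol = tol_lsb / (1 << FRAC_BITS)
    lower_zero = all(abs(r[i][j]) <= tol for i in range(3) for j in range(i))
    unit_norm = all(abs(c * c + s * s - 1.0) <= 2 * tol for c, s in cs)
    return lower_zero and unit_norm
```

These checks catch gross datapath or ordering errors cheaply; matching the golden reference value by value is still needed to confirm accuracy.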

| Parameter | Default | Description |
|---|---|---|
| `DATA_WIDTH` | 16 | Word width in bits |
| `NUM_ITERATIONS` | 16 | CORDIC iterations per stage |
| `FRAC_BITS` | 13 | Fractional bits (Q3.13 format) |
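
The parameter defaults fix a few derived quantities that are useful when choosing test tolerances: the representable range and LSB of Q3.13, the CORDIC gain K after `NUM_ITERATIONS` micro-rotations, and the Q3.13 code for 1/K. A short calculation, assuming the textbook CORDIC schedule with shifts 0 to `NUM_ITERATIONS` - 1:

```python
import math

DATA_WIDTH = 16
FRAC_BITS = 13
NUM_ITERATIONS = 16

lsb = 2.0 ** -FRAC_BITS                              # resolution of one code step
q_min = -(2 ** (DATA_WIDTH - 1)) * lsb               # most negative value
q_max = (2 ** (DATA_WIDTH - 1) - 1) * lsb            # most positive value

# CORDIC gain after NUM_ITERATIONS micro-rotations, and 1/K as a Q3.13 code.
cordic_gain = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(NUM_ITERATIONS))
inv_gain_code = round((1.0 / cordic_gain) * (1 << FRAC_BITS))

# Smallest micro-rotation angle: a rough bound on the achievable angular resolution.
min_angle = math.atan(2.0 ** -(NUM_ITERATIONS - 1))

print(f"Q3.13 range [{q_min}, {q_max}], LSB {lsb}")
print(f"K = {cordic_gain:.6f}, 1/K code = {inv_gain_code}, min angle = {min_angle:.3e} rad")
```

With these defaults the smallest micro-rotation (about 3e-5 rad) is finer than one Q3.13 LSB (about 1.2e-4), so quantization rather than the iteration count tends to limit accuracy.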

| File | Description |
|---|---|
| `src/qr_stage.sv` | Core QR FSM and CORDIC datapath |
| `src/qr_stage_axis_slave.sv` | AXI-Stream wrapper |
| `src/vector_cordic.sv` | Vector CORDIC (vectoring mode) |
| `src/rotation_cordic.sv` | Rotation CORDIC (rotation mode) |
| `MATLAB/cordic_qr_3x3_demo_fp.m` | Fixed-point golden reference |
