miniCoder6/RV5

RV5 — RV32IMFB Pipelined FPGA Search Engine

RV5 is a complete System-on-Chip (SoC) implementation of a hardware-accelerated fuzzy string search engine built around a from-scratch RV32IMFB RISC-V processor core. The entire design is written in Verilog-2001, synthesized for the Digilent Nexys A7 (Xilinx Artix-7 xc7a100tcsg324-1), and runs at a stable 50 MHz derived from the on-board 100 MHz oscillator.

The fundamental goal: demonstrate that a modest FPGA can perform real-time, encrypted, compressed, fuzzy dictionary search with all acceleration logic tightly coupled to a real RISC-V pipeline through a clean MMIO interface. Every entry is Bloom-filtered, LZ77-compressed, and AES-128-encrypted at ingest. At query time, the pipeline reverses — decrypt, decompress, then compute Levenshtein edit distance in hardware — returning the best fuzzy match in microseconds.


Table of Contents

  1. Project Overview
  2. Architecture Overview
  3. CPU Core — RV32IMFB Five-Stage Pipeline
  4. L1 Cache Hierarchy
  5. Hardware Coprocessors
  6. MMIO Address Map
  7. CSR Performance Counters
  8. Firmware & Data Pipeline
  9. Host Scripts
  10. FPGA Board I/O
  11. Repository Structure
  12. Building & Programming
  13. Demo Data & Test Vectors
  14. Performance Benchmarks
  15. Testbench Suite
  16. Design Phases

1. Project Overview

What Makes RV5 Unique?

  Feature              Detail
  ──────────────────────────────────────────────────────────────────────────────────────────────
  ISA                  RV32IMFB: Integer, Multiply/Divide, Float (F-ext), Bit-Manip (Zbb)
  Pipeline             5-stage (IF→ID→EX→MEM→WB) with full forwarding and hazard detection
  Branch Prediction    2-bit saturating BHT + Branch Target Buffer (BTB) + RAS for calls/returns
  Cache                L1 I-Cache (64-line direct-mapped) + L1 D-Cache (64-line write-through)
  Coprocessors         Bloom Filter · LZ77 · AES-128 Enc/Dec · Levenshtein Systolic Array
  DMA Engine           Search Sequencer driving D-BRAM port B autonomously
  UART Interface       115200 baud full-duplex, firmware command shell
  7-Segment Telemetry  12-page multiplexed display (PC, miss counts, search result, cycles…)
  LED Modes            Telemetry, PC heatmap, Knight-rider idle, SUCC/FAIL interactive

2. Architecture Overview

High-Level SoC Architecture

flowchart TB
    subgraph BOARD["Nexys A7 — xc7a100tcsg324-1"]
        subgraph CLK["Clock and Reset"]
            C1["100 MHz OSC"]
            C2["Divide-by-2 Toggle FF + BUFG\nclk_50mhz"]
            C3["Reset Synchronizer\ncpu_reset_n active LOW"]
            C1 --> C2
        end

        subgraph CPU["RISCV_CPU — RV32IMFB"]
            P1["IF — I-Cache · BHT · BTB · RAS"]
            P2["ID — RF · ImmGen · Control · Hazard"]
            P3["EX — ALU · BPU · Div · Zbb · FPU"]
            P4["MEM — D-Cache · LDU · MMIO"]
            P5["WB — Mux · Regfile Write"]
            P1 --> P2 --> P3 --> P4 --> P5
        end

        subgraph MEM["Memory"]
            M1["I_BRAM — 64 KB firmware"]
            M2["D_BRAM — 64 KB data\ndual-port + DMA port B"]
        end

        subgraph CACHE["L1 Cache Hierarchy"]
            CA1["I-Cache\n64-line direct-mapped\n8 words per line"]
            CA2["D-Cache\n64-line write-through\n8 words per line"]
        end

        subgraph MMIO["MMIO Coprocessors — 0xF000_0000+"]
            MM1["UART\n115200 baud TX/RX"]
            MM2["Bloom Filter\n8192-bit · 3 Murmur hashes"]
            MM3["LZ77\n256B window · 256 parallel cmp"]
            MM4["AES-128\nEnc + Dec · 10-cycle iterative"]
            MM5["Fuzzy Search\nLevenshtein Systolic Array"]
            MM6["Search Sequencer\nDMA Engine"]
        end

        subgraph IO["Board I/O"]
            IO1["UART RXD/TXD\nFT2232HQ USB-UART Bridge"]
            IO2["LED 15:0\nTelemetry / SUCC/FAIL"]
            IO3["SW 15:0\nMode / Page Select"]
            IO4["7-Seg Display\n8-digit · 12 telemetry pages"]
        end
    end

    C2 --> CPU
    CPU -- "imem" --> CA1 --> M1
    CPU -- "dmem" --> CA2 --> M2
    CPU -- "MMIO 0xF000_0000" --> MM1 & MM2 & MM3 & MM4 & MM5 & MM6
    MM6 -- "DMA port B" --> M2
    MM1 <--> IO1
    CPU --> IO2
    IO3 --> CPU
    MM5 --> IO4

    style BOARD fill:#1a1a2e,fill-opacity:0.05,stroke:#444,stroke-width:2px
    style CLK fill:#43a047,fill-opacity:0.1,stroke:#43a047,stroke-width:2px
    style CPU fill:#1e88e5,fill-opacity:0.1,stroke:#1e88e5,stroke-width:2px
    style MEM fill:#6d4c41,fill-opacity:0.1,stroke:#6d4c41,stroke-width:2px
    style CACHE fill:#00897b,fill-opacity:0.1,stroke:#00897b,stroke-width:2px
    style MMIO fill:#8e24aa,fill-opacity:0.1,stroke:#8e24aa,stroke-width:2px
    style IO fill:#fb8c00,fill-opacity:0.1,stroke:#fb8c00,stroke-width:2px

3. CPU Core — RV32IMFB Five-Stage Pipeline

The core (rtl/core/RISCV_CPU.v) implements a textbook 5-stage in-order pipeline with full data forwarding, hazard detection, branch prediction, and optional FPU stall integration.

3.1 Pipeline Datapath

flowchart LR
    subgraph IF["IF — Instruction Fetch"]
        IF1["PC Register"]
        IF2["I-Cache\nBHT · BTB · RAS"]
        IF1 --> IF2
    end

    subgraph ID["ID — Decode"]
        ID1["Register File\n32 x 32-bit"]
        ID2["ImmGen\nI/S/B/U/J types"]
        ID3["Control Unit\nopcode decode"]
        ID4["Hazard Unit\nload-use · FPU · UART"]
    end

    subgraph EX["EX — Execute"]
        EX1["ALU 32-bit"]
        EX2["BPU Branch Resolve"]
        EX3["Div Unit RV32M"]
        EX4["Zbb Bit-Manip"]
        EX5["FPU IEEE 754"]
    end

    subgraph MEM["MEM — Memory"]
        MEM1["D-Cache Write-Through"]
        MEM2["LDU byte/half/word"]
        MEM3["MMIO 0xF000_0000+"]
    end

    subgraph WB["WB — Writeback"]
        WB1["WB Mux\nALU / MEM / PC+4 / Imm"]
        WB2["RF Write"]
        WB1 --> WB2
    end

    IF -->|"IF/ID reg"| ID
    ID -->|"ID/EX reg"| EX
    EX -->|"EX/MEM reg"| MEM
    MEM -->|"MEM/WB reg"| WB
    WB -->|"forward WB to EX"| EX
    MEM -->|"forward MEM to EX"| EX
    EX -->|"branch/jump flush"| IF
    ID4 -->|"load-use stall"| IF

    style IF fill:#1e88e5,fill-opacity:0.1,stroke:#1e88e5,stroke-width:2px
    style ID fill:#43a047,fill-opacity:0.1,stroke:#43a047,stroke-width:2px
    style EX fill:#8e24aa,fill-opacity:0.1,stroke:#8e24aa,stroke-width:2px
    style MEM fill:#fb8c00,fill-opacity:0.1,stroke:#fb8c00,stroke-width:2px
    style WB fill:#e53935,fill-opacity:0.1,stroke:#e53935,stroke-width:2px
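
The two forwarding arrows and the load-use stall in the datapath can be modelled in a few lines of Python. This is a sketch of the textbook selection rule, not a transcription of Forwarding_Unit.v or Hazard_Unit.v; the function and argument names are hypothetical.

```python
def forward_select(rs, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Pick the source for one EX operand: 'MEM', 'WB', or 'RF'."""
    if rs == 0:
        return "RF"   # x0 is hardwired to zero and is never forwarded
    if ex_mem_regwrite and ex_mem_rd == rs:
        return "MEM"  # the younger producer holds the newest value
    if mem_wb_regwrite and mem_wb_rd == rs:
        return "WB"
    return "RF"


def load_use_stall(id_ex_is_load, id_ex_rd, if_id_rs1, if_id_rs2):
    """True when the instruction in ID needs a load result that is still in EX."""
    return id_ex_is_load and id_ex_rd != 0 and id_ex_rd in (if_id_rs1, if_id_rs2)
```

Checking the MEM stage before the WB stage matters when both hold a pending write to the same register: the MEM-stage value is the more recent one.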

3.2 Modules

  Module           File                        Description
  ───────────────────────────────────────────────────────────────────────────────────────────────
  PC               rtl/core/PC.v               Program Counter register with stall/freeze support
  IF_ID            rtl/core/IF_ID.v            IF→ID pipeline register with flush/stall
  RF               rtl/core/RF.v               32×32-bit register file (x0 hardwired 0)
  ImmGen           rtl/core/ImmGen.v           Immediate generator (I/S/B/U/J types)
  Control          rtl/core/Control.v          Main decode: opcode→control signals
  ALU              rtl/core/ALU.v              32-bit ALU: ADD/SUB/SLT/AND/OR/XOR/SHL/SHR/AUIPC
  ALU_Control      rtl/core/ALU_Control.v      ALU function decode from funct3/funct7
  BPU              rtl/core/BPU.v              Branch Processing Unit; evaluates taken/not-taken
  BHT              rtl/core/BHT.v              2-bit saturating counter Branch History Table
  BTB              rtl/core/BTB.v              Branch Target Buffer; caches branch PCs
  RAS_Unit         rtl/core/RAS_Unit.v         Return Address Stack (JAL/JALR call/return)
  Hazard_Unit      rtl/core/Hazard_Unit.v      Load-use stall, FPU stall, UART stall
  Forwarding_Unit  rtl/core/Forwarding_Unit.v  MEM→EX and WB→EX forwarding
  ID_EX            rtl/core/ID_EX.v            ID→EX pipeline register
  EX_MEM           rtl/core/EX_MEM.v           EX→MEM pipeline register
  MEM_WB           rtl/core/MEM_WB.v           MEM→WB pipeline register
  LDU              rtl/core/LDU.v              Load Data Unit; lb/lbu/lh/lhu/lw alignment
  CSR              rtl/core/CSR.v              Control & Status Registers + perf counters
  Div_Unit         rtl/core/Div_Unit.v         RV32M multi-cycle divider (DIV/DIVU/REM/REMU)
  Zbb_Unit         rtl/core/Zbb_Unit.v         Bit-Manipulation extension (CLZ/CTZ/CPOP/ANDN…)

3.3 Branch Prediction

flowchart TB
    subgraph PRED["Prediction — IF Stage"]
        B1["PC"]
        B2["BHT\n2-bit saturating counter\n64 entries"]
        B3["BTB\nBranch Target Buffer\n16 entries"]
        B4["RAS\nReturn Address Stack\npeek for JALR returns"]
        B1 --> B2 & B3 & B4
        B5{"Predict\nTaken?"}
        B2 --> B5
    end

    subgraph RESOLVE["Resolution — EX Stage"]
        R1["BPU evaluates\nactual branch outcome"]
        R2{"Mismatch?"}
        R3["Correct prediction\nno penalty cycles"]
        R4["Wrong prediction\nflush IF and ID\n2-cycle penalty"]
        R1 --> R2
        R2 -- No --> R3
        R2 -- Yes --> R4
        R5["Update BHT\nsaturating increment/decrement"]
        R6["Update BTB\nbranch PC target"]
        R4 --> R5 & R6
    end

    B5 -- "Taken → use BTB/RAS PC" --> RESOLVE
    B5 -- "Not Taken → PC + 4" --> RESOLVE

    style PRED fill:#1e88e5,fill-opacity:0.1,stroke:#1e88e5,stroke-width:2px
    style RESOLVE fill:#e53935,fill-opacity:0.1,stroke:#e53935,stroke-width:2px
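
The BHT behaviour in the diagram can be sketched as a small Python model. The 64-entry size comes from the diagram; the reset value and the PC bits used for indexing are assumptions made for illustration and may differ from BHT.v.

```python
BHT_ENTRIES = 64  # size shown in the diagram


class BHT:
    def __init__(self):
        # Start every counter at 1 (weakly not-taken); the reset value is assumed.
        self.counters = [1] * BHT_ENTRIES

    @staticmethod
    def index(pc):
        # Drop the two byte-offset bits, then keep log2(64) = 6 bits (assumed indexing).
        return (pc >> 2) % BHT_ENTRIES

    def predict(self, pc):
        # Counter values 2 and 3 predict taken; 0 and 1 predict not taken.
        return self.counters[self.index(pc)] >= 2

    def update(self, pc, taken):
        i = self.index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

Because the counter saturates at 0 and 3, a branch that is almost always taken needs two consecutive not-taken outcomes before its prediction flips.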

3.4 RV32F — Floating-Point Extension

The F-extension (ENABLE_FPU compile flag) is fully integrated with a dedicated 32-entry floating-point register file (fregfile) and 6 specialized sub-units:

  Sub-Unit    File                   Operations
  ────────────────────────────────────────────────────────────────────────
  fpu_top     rtl/fpu/fpu_top.v      Top-level op mux
  fpu_addsub  rtl/fpu/fpu_addsub.v   FADD.S, FSUB.S
  fpu_mul     rtl/fpu/fpu_mul.v      FMUL.S
  fpu_div     rtl/fpu/fpu_div.v      FDIV.S (iterative, multi-cycle)
  fpu_sqrt    rtl/fpu/fpu_sqrt.v     FSQRT.S (iterative)
  fpu_cmp     rtl/fpu/fpu_cmp.v      FEQ/FLT/FLE, FMIN/FMAX
  fpu_cvt     rtl/fpu/fpu_cvt.v      FCVT.W.S, FCVT.S.W, FMV
  fregfile    rtl/fpu/fregfile.v     32×32-bit float register file (f0–f31)

FPU exception flags (NV/DZ/OF/UF/NX) accumulate in fcsr/fflags CSRs. FDIV and FSQRT trigger pipeline stalls via the Hazard Unit until fpu_done is asserted.

3.5 Zbb Bit-Manipulation

The Zbb_Unit implements: CLZ, CTZ, CPOP, ANDN, ORN, XNOR, MIN, MAX, MINU, MAXU, ZEXT.H, SEXT.B, SEXT.H, ROL, ROR, REV8, ORC.B.


4. L1 Cache Hierarchy

Both caches are direct-mapped with a 64-line × 8-word (2 KB) capacity, implemented using Xilinx distributed RAM for zero-latency hits.

Address Breakdown (32-bit)

  31          11 10        5 4      2 1  0
  ┌─────────────┬───────────┬────────┬────┐
  │  TAG (21b)  │ INDEX (6b)│ OFS(3b)│ -- │
  └─────────────┴───────────┴────────┴────┘
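
As a quick check of the field boundaries above, the following Python sketch splits a 32-bit address into the four fields. The function name is hypothetical; the bit positions are taken directly from the diagram.

```python
def split_address(addr):
    """Return (tag, index, word_offset, byte_offset) for a 32-bit address."""
    byte_offset = addr & 0x3            # bits 1:0
    word_offset = (addr >> 2) & 0x7     # bits 4:2, one of 8 words in the line
    index = (addr >> 5) & 0x3F          # bits 10:5, one of 64 lines
    tag = (addr >> 11) & 0x1FFFFF       # bits 31:11
    return tag, index, word_offset, byte_offset


print(split_address(0x0001_4ABC))  # (41, 21, 7, 0)
```

Two addresses that are a multiple of 2 KB apart, such as 0x0000_0000 and 0x0000_0800, share an index and differ only in the tag, so they evict each other in a direct-mapped cache.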

Cache Miss FSM

flowchart LR
    subgraph ICACHE["I-Cache — icache.v (Read-Only, Fill-on-Miss)"]
        IC1(["S_IDLE\nCheck hit"])
        IC2(["S_FILL0\nIssue BRAM addr 0"])
        IC3(["S_WAIT\nBRAM 1-cycle latency"])
        IC4(["S_FILL\nStore word, next addr"])
        IC5(["S_LAST\nUpdate tag and valid"])

        IC1 -- "HIT — rdata combinational\n0 penalty cycles" --> IC1
        IC1 -- "MISS — assert stall_out\npulse miss_pulse to CSR" --> IC2
        IC2 --> IC3 --> IC4
        IC4 -- "words 1 through 6" --> IC4
        IC4 -- "word 7" --> IC5
        IC5 -- "deassert stall_out" --> IC1
    end

    subgraph DCACHE["D-Cache — dcache.v (Write-Through Policy)"]
        DC1(["S_IDLE"])
        DC2(["S_WRITE\nWrite-through to BRAM"])
        DC3(["S_FILL0\nIssue BRAM addr 0"])
        DC4(["S_FILL\nStore word"])
        DC5(["S_LAST\nUpdate tag and valid"])

        DC1 -- "Write HIT — update cache\nand BRAM immediately" --> DC2 --> DC1
        DC1 -- "Read MISS\nassert stall_out" --> DC3
        DC3 --> DC4
        DC4 -- "words 1 through 6" --> DC4
        DC4 -- "word 7" --> DC5 --> DC1
    end

    style ICACHE fill:#00897b,fill-opacity:0.1,stroke:#00897b,stroke-width:2px
    style DCACHE fill:#1e88e5,fill-opacity:0.1,stroke:#1e88e5,stroke-width:2px

5. Hardware Coprocessors

All coprocessors live in the MMIO address space (0xF000_0000+) and are accessed through memory-mapped registers. Every coprocessor asserts a CSR pulse on completion so the firmware reads hardware-verified performance counters with zero software overhead.

5.1 Bloom Filter Coprocessor

File: rtl/bloom/bloom_filter.v | MMIO base: 0xF000_0010

The Bloom filter provides a probabilistic membership test with zero false negatives, so queries it rejects are guaranteed non-members and skip the expensive AES + LZ77 + fuzzy pipeline. It uses a two-stage Murmur-style mixer to diffuse bit differences across the 13-bit hash output space before applying 3 independent polynomial hash functions (golden ratio prime + two MurmurHash3 primes).

Bloom Filter FSM

flowchart LR
    subgraph BF["Bloom Filter — bloom_filter.v"]
        S0(["S_FLUSH\nZero-clear all\n256 x 32-bit BRAM\nwords on reset"])
        S1(["S_IDLE\nAwait BF_DATA write"])
        S2(["S_HASH\nCompute h1 h2 h3\nvia Murmur mix"])
        S3(["S_READ1\nRead BRAM word 1"])
        S4(["S_READ2\nRead BRAM word 2"])
        S5(["S_READ3\nRead BRAM word 3"])
        S6(["S_CHECK_SET\nQuery or Insert?"])
        S7(["S_WRITE2\nSet bit 2"])
        S8(["S_WRITE3\nSet bit 3\npulse bloom_insert_pulse"])
        S9(["S_DONE"])

        S0 -- "256 cycles" --> S1
        S1 -- "BF_DATA write" --> S2
        S2 --> S3 --> S4 --> S5 --> S6
        S6 -- "QUERY\nall 3 bits set?\npulse bloom_reject_pulse" --> S9
        S6 -- "INSERT\nset bit 1" --> S7 --> S8 --> S9
        S9 --> S1
    end

    style BF fill:#43a047,fill-opacity:0.1,stroke:#43a047,stroke-width:2px

MMIO Registers

  Register   Address      R/W  Description
  ─────────────────────────────────────────────────────────────
  BF_DATA    0xF000_0010  W    Write hashed key; starts operation
  BF_RESULT  0xF000_0014  R    [0] = 1 if member
  BF_CTRL    0xF000_0018  W    [0] = 0 query, 1 insert
  BF_STATUS  0xF000_001C  R    [0] = 1 when done
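
A software model helps when checking the filter's behaviour from the host side. The sketch below keeps the 8192-bit size, the 256 × 32-bit word layout, and the three-hash structure described above. The mixer and the multiplier constants are assumptions chosen to resemble the description and are not copied from bloom_filter.v.

```python
M_BITS = 8192            # 256 words x 32 bits
MASK32 = 0xFFFFFFFF
# Illustrative odd multipliers; the constants in the RTL may differ.
SEEDS = (0x9E3779B1, 0x85EBCA6B, 0xC2B2AE35)


def mix(x):
    """Two multiply/xor-shift rounds in the style of MurmurHash3's finalizer."""
    x &= MASK32
    x ^= x >> 16
    x = (x * 0x85EBCA6B) & MASK32
    x ^= x >> 13
    x = (x * 0xC2B2AE35) & MASK32
    x ^= x >> 16
    return x


def bit_positions(key):
    """Three 13-bit positions (0..8191) taken from the top bits of each product."""
    m = mix(key)
    return [((m * seed) & MASK32) >> 19 for seed in SEEDS]


class Bloom:
    def __init__(self):
        self.words = [0] * 256

    def insert(self, key):
        for b in bit_positions(key):
            self.words[b >> 5] |= 1 << (b & 31)

    def query(self, key):
        # Member only if all three bits are set; a clear bit proves absence.
        return all((self.words[b >> 5] >> (b & 31)) & 1 for b in bit_positions(key))
```

Every inserted key sets exactly the bits its later query reads, which is why false negatives cannot occur; false positives remain possible when other keys happen to set the same bits.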

5.2 LZ77 Compressor/Decompressor

File: rtl/lz77/lz77_comp.v | MMIO base: 0xF000_0020

A hardware LZ77 sliding-window compressor/decompressor with 256 parallel first-byte comparators for single-cycle first-hit detection and sequential byte-by-byte match extension.

  • Sliding window: 256 bytes | Lookahead buffer: 15 bytes | Min match: 3 bytes
  • Output FIFO: 64 tokens deep
  • Token format: {distance[7:0], length[7:0]} for back-refs; {0x00, literal} for literals
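
The token format above is enough to write a reference decoder for checking hardware output on the host. This sketch assumes that the distance is in the upper byte of the 16-bit token, that a distance of 1 refers to the most recently emitted byte, and that a zero upper byte marks a literal; the function name is hypothetical.

```python
def lz77_decode(tokens):
    """Expand a list of 16-bit tokens into the original byte string."""
    out = bytearray()
    for tok in tokens:
        distance, low = (tok >> 8) & 0xFF, tok & 0xFF
        if distance == 0:
            out.append(low)                  # {0x00, literal}
        else:
            if distance > len(out):
                raise ValueError("back-reference points before start of output")
            for _ in range(low):             # byte-by-byte so overlapping copies work
                out.append(out[-distance])
    return bytes(out)


# Three literals followed by one back-reference (distance 3, length 6).
print(lz77_decode([0x0061, 0x0062, 0x0063, 0x0306]))  # b'abcabcabc'
```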

MMIO Registers

  Register     Address      R/W  Description
  ──────────────────────────────────────────────────────────────────────────
  LZ77_IN      0xF000_0020  W    Push input byte
  LZ77_OUT     0xF000_0024  R    Pop output token (16-bit)
  LZ77_STATUS  0xF000_0028  R    [2] DONE [1] IN_READY [0] OUT_VALID
  LZ77_CTRL    0xF000_002C  W    [1] FLUSH [0] MODE (0=compress, 1=decompress)

5.3 AES-128 Encrypt/Decrypt

File: rtl/aes/aes128_enc.v | MMIO base: 0xF000_0030

A NIST-compliant AES-128 block cipher supporting both encryption and decryption. It is implemented as a pure Verilog-2001 iterative design: one round per clock cycle, 10 rounds total. The S-Box and inverse S-Box use Verilog case statements (no SystemVerilog local arrays).

sequenceDiagram
    participant FW as Firmware C
    participant AES as AES-128 Coprocessor
    participant CSR as CSR Counters

    FW->>AES: Write AES_CTRL (0=encrypt / 1=decrypt)
    FW->>AES: Write AES_KEY[0..3] — KEY3 triggers key expansion
    AES->>AES: Key schedule — 10 round keys computed (10 cycles)
    FW->>AES: Poll AES_STATUS until idle
    FW->>AES: Write AES_DATA[0..2]
    FW->>AES: Write AES_DATA[3] — triggers encrypt or decrypt
    AES->>AES: 10 AES rounds (SubBytes, ShiftRows, MixCols, AddRoundKey)
    AES->>CSR: Pulse aes_op_pulse — aes_ops_cnt increments
    FW->>AES: Poll AES_STATUS until idle
    FW->>AES: Read AES_OUT[0..3] — 128-bit result

Throughput: 16 bytes / 13 cycles × 50 MHz ≈ 61.5 MB/s
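
The throughput figure follows directly from the block size, the clock, and the quoted 13 cycles per block. The short check below only reproduces that arithmetic and makes no claim about how the 13 cycles are spent.

```python
CLOCK_HZ = 50_000_000      # system clock
BLOCK_BYTES = 16           # one AES-128 block
CYCLES_PER_BLOCK = 13      # figure quoted above

throughput = BLOCK_BYTES * CLOCK_HZ / CYCLES_PER_BLOCK
print(f"{throughput / 1e6:.1f} MB/s")  # 61.5 MB/s
```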

MMIO Registers

  Register        Address           R/W  Description
  ───────────────────────────────────────────────────────────────────────────────
  AES_KEY[0..3]   0xF000_0030–003C  W    128-bit key (write KEY3 triggers expansion)
  AES_DATA[0..3]  0xF000_0040–004C  W    Plaintext/ciphertext (write DATA3 starts op)
  AES_OUT[0..3]   0xF000_0050–005C  R    128-bit result
  AES_STATUS      0xF000_0060       R    [0] = 1 idle/done, 0 busy
  AES_CTRL        0xF000_006C       W    [0] = 0 encrypt, 1 decrypt
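
Because KEY3 and DATA3 act as triggers, write order matters. The Python sketch below splits a 16-byte key into the four register words and lists the writes in ascending address order, so the trigger word goes last. It assumes byte 0 of the key maps to bits [127:120], which the register list does not state explicitly; `to_words` and `key_writes` are hypothetical helper names.

```python
AES_KEY0 = 0xF000_0030  # address of the first key register


def to_words(block):
    """Split 16 bytes into four 32-bit words, most significant word first."""
    if len(block) != 16:
        raise ValueError("AES-128 keys and blocks are exactly 16 bytes")
    value = int.from_bytes(block, "big")   # assumed byte order
    return [(value >> shift) & 0xFFFFFFFF for shift in (96, 64, 32, 0)]


def key_writes(key):
    """Return (address, word) pairs; the last pair targets KEY3 and starts expansion."""
    return [(AES_KEY0 + 4 * i, word) for i, word in enumerate(to_words(key))]


for address, word in key_writes(bytes(range(16))):
    print(f"0x{address:08X} <- 0x{word:08X}")
```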

5.4 Fuzzy Search — Levenshtein Systolic Array

Files: rtl/fuzzy/fuzzy_search_top.v, rtl/fuzzy/lev_array.v, rtl/fuzzy/lev_pe.v | MMIO base: 0xF000_0070

The crown jewel of RV5: a fully pipelined Levenshtein edit-distance engine implemented as a systolic processing element array. Each lev_pe computes one DP table cell. The systolic wavefront propagates diagonally so all cells on an anti-diagonal compute in parallel — achieving O(n) clock cycles for O(n²) work.
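
The wavefront ordering can be illustrated in software. The following Python sketch fills the standard Levenshtein table one anti-diagonal at a time and counts the diagonals. It is a behavioural illustration only and does not model lev_array.v or its timing; the function name is hypothetical.

```python
def levenshtein_wavefront(a, b):
    """Edit distance computed one anti-diagonal at a time; returns (distance, steps)."""
    m, n = len(a), len(b)
    if m == 0 or n == 0:
        return max(m, n), 0
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    steps = 0
    for d in range(2, m + n + 1):              # d = i + j for interior cells
        # Cells on diagonal d read only diagonals d-1 and d-2, so they are
        # independent of each other and could all be evaluated at once.
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
        steps += 1
    return dp[m][n], steps


print(levenshtein_wavefront("kitten", "sitting"))  # (3, 12)
```

For two non-empty strings the number of diagonals is m + n - 1, so a pair of 32-character strings needs 63 wavefront steps to cover 1024 cells.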

flowchart TB
    subgraph FUZZY["Fuzzy Search Coprocessor"]
        subgraph STR["String Input via MMIO"]
            SA["FUZZY_A_BASE 0x70 to 0x8C\nQuery string — 8 words x 4 bytes"]
            SB["FUZZY_B_BASE 0x90 to 0xAC\nCatalog string — 8 words x 4 bytes"]
            SL["FUZZY_LEN_A and FUZZY_LEN_B\nlength registers"]
        end

        subgraph ARR["lev_array — 32 x 32 Systolic Grid"]
            PE1["PE[0,0]\ndp[0][0]"]
            PE2["PE[0,1]\ndp[0][1]"]
            PE3["PE[1,0]\ndp[1][0]"]
            PE4["PE[1,1]\ndp[1][1]"]
            PE5["...  32 x 32 PEs total\nwavefront diagonal compute"]
            PE1 --> PE2
            PE1 --> PE3
            PE3 --> PE4
            PE2 --> PE5
            PE4 --> PE5
        end

        subgraph RES["Result Output"]
            R1["lev_result[5:0]\nedit distance"]
            R2["FUZZY_STATUS 0xB4\ndone = 1"]
            R3["FUZZY_EDIT_DIST 0xB8\nread result"]
            R4["fuzzy_op_pulse\nincrement CSR counter"]
        end

        STR -- "FUZZY_CTRL = 1\nstart pulse" --> ARR
        ARR -- "wavefront complete" --> RES
    end

    style FUZZY fill:#8e24aa,fill-opacity:0.1,stroke:#8e24aa,stroke-width:2px
    style STR fill:#1e88e5,fill-opacity:0.1,stroke:#1e88e5,stroke-width:2px
    style ARR fill:#e53935,fill-opacity:0.1,stroke:#e53935,stroke-width:2px
    style RES fill:#43a047,fill-opacity:0.1,stroke:#43a047,stroke-width:2px

String packing: 4 characters per 32-bit word, little-endian — Word 0 @ 0x70: { str[3], str[2], str[1], str[0] } through Word 7 @ 0x8C: { str[31], str[30], str[29], str[28] }.
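
The packing rule can be expressed as a short helper that produces the eight words to write starting at FUZZY_A_BASE or FUZZY_B_BASE. Padding unused positions with zero bytes is an assumption, and `pack_string` is a hypothetical name.

```python
def pack_string(s):
    """Pack up to 32 ASCII characters into eight little-endian 32-bit words."""
    data = s.encode("ascii")
    if len(data) > 32:
        raise ValueError("the string registers hold at most 32 characters")
    padded = data.ljust(32, b"\x00")   # zero padding is assumed
    return [int.from_bytes(padded[i:i + 4], "little") for i in range(0, 32, 4)]


print(hex(pack_string("RISC")[0]))  # 0x43534952, i.e. { 'C', 'S', 'I', 'R' }
```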

MMIO Registers

  Register           Address           R/W  Description
  ───────────────────────────────────────────────────────────────────────────
  FUZZY_A_BASE       0xF000_0070–008C  W    Query string (8 words × 4 bytes)
  FUZZY_B_BASE       0xF000_0090–00AC  W    Catalog string (8 words × 4 bytes)
  FUZZY_CTRL         0xF000_00B0       W    [0] = 1 start computation
  FUZZY_STATUS       0xF000_00B4       R    [0] = 1 done
  FUZZY_EDIT_DIST    0xF000_00B8       R    [5:0] edit distance result
  FUZZY_LEN_A        0xF000_00BC       W    [5:0] query length
  FUZZY_LEN_B        0xF000_00C0       W    [5:0] catalog length
  FUZZY_RESULT_HOLD  0xF000_00D0       W    Latch result for 7-seg page 1

5.5 Search Sequencer DMA Engine

File: rtl/fuzzy/search_sequencer.v | MMIO base: 0xF000_00E0

The Search Sequencer is an autonomous DMA controller that scans the entire D-BRAM dictionary without CPU intervention, driving D-BRAM port B directly and maintaining internal best-match tracking across all stored entries. It supports 4-way parallel acceleration buses for AES, LZ77, and fuzzy engines — fully wired and ready for multi-engine expansion.


6. MMIO Address Map

  Base: 0xF000_0000

  Offset   Register           Dir   Description
  ────────────────────────────────────────────────────────────────────
  0x000    UART_TX_DATA        W    [7:0] Byte to transmit
  0x004    UART_RX_DATA        R    [7:0] Received byte
  0x008    UART_STATUS         R    [1] TX_READY   [0] RX_VALID

  0x010    BF_DATA             W    [31:0] Hashed key — starts op
  0x014    BF_RESULT           R    [0] 1=member
  0x018    BF_CTRL             W    [0] 0=query  1=insert
  0x01C    BF_STATUS           R    [0] 1=done

  0x020    LZ77_IN             W    [7:0] Input byte
  0x024    LZ77_OUT            R    [15:0] Output token
  0x028    LZ77_STATUS         R    [2] DONE  [1] IN_READY  [0] OUT_VALID
  0x02C    LZ77_CTRL           W    [1] FLUSH  [0] MODE (0=comp 1=decomp)

  0x030    AES_KEY0            W    Key[127:96]
  0x034    AES_KEY1            W    Key[95:64]
  0x038    AES_KEY2            W    Key[63:32]
  0x03C    AES_KEY3            W    Key[31:0] — triggers key expansion
  0x040    AES_DATA0           W    Plaintext[127:96]
  0x044    AES_DATA1           W    Plaintext[95:64]
  0x048    AES_DATA2           W    Plaintext[63:32]
  0x04C    AES_DATA3           W    Plaintext[31:0] — triggers enc/dec
  0x050    AES_OUT0            R    Result[127:96]
  0x054    AES_OUT1            R    Result[95:64]
  0x058    AES_OUT2            R    Result[63:32]
  0x05C    AES_OUT3            R    Result[31:0]
  0x060    AES_STATUS          R    [0] 1=idle/done
  0x06C    AES_CTRL            W    [0] 0=encrypt  1=decrypt

  0x070    FUZZY_A_BASE        W    Query string word 0
  ...      (8 words x 4 bytes)
  0x090    FUZZY_B_BASE        W    Catalog string word 0
  ...      (8 words x 4 bytes)
  0x0B0    FUZZY_CTRL          W    [0] 1=start
  0x0B4    FUZZY_STATUS        R    [0] 1=done
  0x0B8    FUZZY_EDIT_DIST     R    [5:0] edit distance
  0x0BC    FUZZY_LEN_A         W    [5:0] query length
  0x0C0    FUZZY_LEN_B         W    [5:0] catalog length
  0x0D0    FUZZY_RESULT_HOLD   W    [5:0] hold result for 7-seg page 1

  0x0E0    SEQ_*                    Search Sequencer DMA registers
  0x0F0    SYS_SW              R    [15:0] SW[15:0] readback

7. CSR Performance Counters

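The CSR module (rtl/core/CSR.v) provides 14 readable hardware counters. Apart from the free-running cycle counter, each is incremented by a single-cycle pulse from the relevant hardware unit. The table below lists ten of them together with the three floating-point CSRs: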

  Address  Name               Description
  ─────────────────────────────────────────────────────────────────────────────
  0xC00    rdcycle            CPU cycle counter (auto-increment every clock)
  0xC02    minstret           Instructions retired (WB stage pulse)
  0xB00    branch_miss_cnt    Branch mispredictions
  0xB01    icache_miss_cnt    I-cache misses
  0xB02    dcache_miss_cnt    D-cache misses
  0xB03    bloom_reject_cnt   Bloom filter rejections
  0xB04    aes_ops_cnt        AES encrypt/decrypt operations
  0xB05    bloom_inserts_cnt  Bloom filter insertions
  0xB06    fuzzy_ops_cnt      Fuzzy search completions
  0xB07    total_bytes_cnt    Total bytes ingested/queried
  0x001    fflags             FP exception flags (NV|DZ|OF|UF|NX)
  0x002    frm                FP rounding mode
  0x003    fcsr               Combined fflags + frm

7-Segment Telemetry Pages (SW[15:12])

  SW[15:12]  Page Content
  ──────────────────────────────────────────────────────
  0x0        Current PC (instruction address)
  0x1        Held fuzzy search result (edit distance)
  0x2        Last MMIO read data word
  0x3        I/O loopback: SW + LED state
  0x6        D-Cache miss count
  0x8        Cycle counter (bits 31:0)
  0x9        Cycle counter (bits 63:32)
  0xA        UART TX stall status
  0xB        Raw switch state
  0xC        Magic word 0xC0DEC0DE
  0xF        Board ID 0xA7A7A7A7

8. Firmware & Data Pipeline

File: firmware/main.c

The firmware implements a complete UART command shell. Two primary operations — INGEST and QUERY — each drive a multi-stage hardware pipeline through MMIO register writes.

8.1 INGEST Pipeline

flowchart TB
    subgraph HOST["Host PC"]
        H1["python3 host_ingest.py\n--csv demo_dictionary.csv"]
    end

    subgraph UART_IN["UART — I command + len + bytes"]
        U1["uart_getc loop\n115200 baud"]
    end

    subgraph INGEST["Ingest Pipeline — firmware/main.c::ingest_entry()"]
        I1["[1] Bloom Insert\nDJB2 hash each 3-gram\nBF_CTRL=1, BF_DATA=hash\npoll BF_STATUS"]
        I2["[2] LZ77 Compress\nStream bytes to LZ77_IN\nLZ77_CTRL=FLUSH\nDrain LZ77_OUT FIFO"]
        I3["[3] AES-128 Encrypt\nPad to 16-byte blocks\nAES_KEY then AES_DATA[3]\nCollect AES_OUT[0..3]"]
        I4["[4] Store in D-BRAM\nslot[2]=comp_len\nslot[3]=orig_len\nslot[4+]=ciphertext"]
        I5["[5] UART Report\nINPUT → LZ77 → AES\nlatency in cycles"]
        I1 --> I2 --> I3 --> I4 --> I5
    end

    HOST -- "115200 baud serial" --> UART_IN --> INGEST

    style HOST fill:#fb8c00,fill-opacity:0.1,stroke:#fb8c00,stroke-width:2px
    style UART_IN fill:#6d4c41,fill-opacity:0.1,stroke:#6d4c41,stroke-width:2px
    style INGEST fill:#1e88e5,fill-opacity:0.1,stroke:#1e88e5,stroke-width:2px

8.2 QUERY Pipeline

flowchart TB
    subgraph HOST2["Host PC"]
        H2["python3 host_query.py\n--csv demo_queries.csv"]
    end

    subgraph UART_Q["UART — Q command + len + bytes"]
        U2["uart_getc loop\n115200 baud"]
    end

    subgraph QUERY["Query Pipeline — firmware/main.c::query_entry()"]
        Q1["[1] Bloom Check\nDJB2 hash each 3-gram\nBF_CTRL=0, BF_DATA=hash\npoll BF_STATUS, read BF_RESULT"]
        Q2{"Any trigram\nmatches?"}
        Q3["REJECT — fast path\nUART: REJECTED Bloom Filter\nno AES/LZ77/Fuzzy invoked"]
        Q4["[2] For each stored entry\nAES Decrypt slot ciphertext\nLZ77 Decompress tokens\nFuzzy Search FUZZY_CTRL=1\npoll FUZZY_STATUS\nread FUZZY_EDIT_DIST"]
        Q5["[3] Track best_dist\nlowest edit distance\nacross all entries"]
        Q6["[4] Report Result\nUART: MATCH + DIST + latency\nFUZZY_RESULT_HOLD = best_dist\nLED: SUCC or FAIL pattern\n7-seg: edit distance"]

        Q1 --> Q2
        Q2 -- "No match — all trigrams absent" --> Q3
        Q2 -- "Match found — at least 1 trigram" --> Q4 --> Q5 --> Q6
    end

    HOST2 -- "115200 baud serial" --> UART_Q --> QUERY

    style HOST2 fill:#fb8c00,fill-opacity:0.1,stroke:#fb8c00,stroke-width:2px
    style UART_Q fill:#6d4c41,fill-opacity:0.1,stroke:#6d4c41,stroke-width:2px
    style QUERY fill:#8e24aa,fill-opacity:0.1,stroke:#8e24aa,stroke-width:2px
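
The Bloom gate at the top of the query pipeline can be sketched on the host as follows. The classic DJB2 recurrence (start at 5381, multiply by 33, add the byte) is assumed here; the firmware may use a different variant. `is_member` stands in for a BF_CTRL=0 / BF_DATA / BF_RESULT round trip, and all function names are hypothetical.

```python
def djb2(data):
    """Classic DJB2 over a byte string, truncated to 32 bits."""
    h = 5381
    for byte in data:
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def trigram_hashes(term):
    """Hash every 3-character window of the term."""
    data = term.encode("ascii")
    return [djb2(data[i:i + 3]) for i in range(len(data) - 2)]


def bloom_gate(query, is_member):
    """True if the slow path should run, i.e. at least one trigram may be present."""
    return any(is_member(h) for h in trigram_hashes(query))


# A plain set stands in for the hardware filter in this example.
stored = set(trigram_hashes("Algorithm"))
print(bloom_gate("Algrithm", stored.__contains__))  # True, "Alg" is shared
print(bloom_gate("Zzzzz", stored.__contains__))     # False, no trigram in common
```

A term shorter than three characters produces no trigrams, so under this sketch it is always rejected.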

8.3 UART Command Protocol

  Command   Byte Sequence                Effect
  ───────────────────────────────────────────────────────────────────────────────────────
  Ingest    'I' + len[1] + bytes[len]    Bloom + LZ77 + AES + store entry
  Query     'Q' + len[1] + bytes[len]    Bloom + AES + LZ77 + fuzzy search
  CSR Dump  'D'                          Print all 14 performance counters + CPI + throughput
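
A host-side framing helper for the Ingest and Query commands might look like the sketch below. It builds only the byte sequence from the table; `frame` is a hypothetical name, and the repository's host scripts may structure this differently.

```python
def frame(command, payload):
    """Build 'I'/'Q' + one length byte + payload for the UART shell."""
    if command not in (b"I", b"Q"):
        raise ValueError("only the Ingest and Query commands carry a payload")
    if len(payload) > 255:
        raise ValueError("the length field is a single byte")
    return command + bytes([len(payload)]) + payload


print(frame(b"Q", b"Algrithm"))  # b'Q\x08Algrithm'
```

The resulting bytes can be written to the serial port as-is, for example through a pyserial `Serial` object opened at 115200 baud.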

8.4 Memory Map (Firmware)

  0x0000_0000 – 0x0000_FFFF  I-BRAM (64 KB) — firmware code
  0x0000_0000 – 0x0000_FFFF  D-BRAM (64 KB) — overlaps for data reads
  0x0001_0000 – 0x0001_3FFF  Stack + BSS (16 KB)
  0x0001_4000 – 0x0001_FFFF  Dictionary storage (48 KB region; 256 slots × 128 bytes = 32 KB)
  0xF000_0000 – 0xF000_00FF  MMIO coprocessors + UART
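
Slot addresses follow from the dictionary base and the slot size. The sketch below assumes slots are laid out contiguously from the start of the dictionary region, which the map implies but does not state; `slot_address` is a hypothetical helper.

```python
DICT_BASE = 0x0001_4000   # start of dictionary storage
SLOT_BYTES = 128
SLOT_COUNT = 256


def slot_address(index):
    """Byte address of the first word of a dictionary slot."""
    if not 0 <= index < SLOT_COUNT:
        raise IndexError("slot index out of range")
    return DICT_BASE + index * SLOT_BYTES


last = slot_address(SLOT_COUNT - 1)
print(hex(last), hex(last + SLOT_BYTES - 1))  # 0x1bf80 0x1bfff
```

Under that assumption the 256 slots end at 0x0001_BFFF.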

9. Host Scripts

Script Relationship

flowchart LR
    subgraph HOST["Host PC — Python Scripts"]
        S1["host_ingest.py\nBatch ingest via UART\nCSV to I commands"]
        S2["host_query.py\nBatch query + verify\nCSV to Q commands"]
        S3["run_full_demo.py\nOrchestrates full\ningest then query cycle"]
        S4["fuzz_test.py\nRandom edit-distance\ntest generation"]
        S5["validate_edit_distance.py\nPython reference\nLevenshtein cross-check"]
        S6["terminal.py\nInteractive UART\nhex dump shell"]
    end

    subgraph FPGA["FPGA — Nexys A7"]
        F1["UART RX/TX\n115200 baud 8N1"]
    end

    S3 --> S1 & S2
    S1 & S2 & S6 <--> F1
    S4 --> S2
    S5 -.->|"reference validation"| S2

    style HOST fill:#43a047,fill-opacity:0.1,stroke:#43a047,stroke-width:2px
    style FPGA fill:#1e88e5,fill-opacity:0.1,stroke:#1e88e5,stroke-width:2px
  • host_ingest.py — Reads data/demo_dictionary.csv and streams each term over UART with live acknowledgment parsing. --simulate flag prints expected output without hardware.
  • host_query.py — Sends queries and verifies results against expected edit distances. Reports accuracy as correct/total.
  • run_full_demo.py — End-to-end: ingest full dictionary → run all queries → print accuracy summary.
  • fuzz_test.py — Generates random edit-distance test cases and validates FPGA responses against a Python reference implementation.
  • validate_edit_distance.py — Standalone cross-validator using python-Levenshtein for regression testing.

10. FPGA Board I/O

Target Board: Digilent Nexys A7-100T (xc7a100tcsg324-1) | Constraints: constraints/nexys_a7.xdc

Clock & Reset

  Signal       Pin  Notes
  ──────────────────────────────────────────────────────────────
  clk_100mhz   E3   100 MHz LVCMOS33 → divided to 50 MHz
  cpu_reset_n  C12  CPU RESET button (CPU_RESETN), active LOW

UART

  Signal    Pin  Description
  ──────────────────────────────────────────────────────────────
  uart_rxd  C4   FT2232HQ USB-UART bridge → FPGA RX
  uart_txd  D4   FPGA TX → FT2232HQ USB-UART bridge → USB

Settings: 115200 baud, 8N1

LED Modes

  Switch State       Mode         LED[15:0] Pattern
  ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
  SW[9]=1            PC Heatmap   LED = PC[17:2], visualizing hot instruction regions
  SW[11]=1, SW[9]=0  Interactive  Knight-rider idle · SUCC (0xAAAA/0x5555) or FAIL (0xFF00/0x0000) on search done
  SW[11]=0, SW[9]=0  Telemetry    {icache_stall, dcache_stall, seq_busy, fuzzy_done, ..., edit_dist[5:0]}

Knight-rider speed: SW[10]=1 fast, SW[10]=0 slow.


11. Repository Structure

RV5/
├── rtl/
│   ├── core/               CPU pipeline — 20 modules
│   │   ├── RISCV_CPU.v     CPU top-level
│   │   ├── PC.v / IF_ID.v / RF.v / ImmGen.v / Control.v
│   │   ├── ALU.v / ALU_Control.v / BPU.v
│   │   ├── BHT.v / BTB.v / RAS_Unit.v
│   │   ├── Hazard_Unit.v / Forwarding_Unit.v
│   │   ├── ID_EX.v / EX_MEM.v / MEM_WB.v
│   │   ├── LDU.v / CSR.v / Div_Unit.v / Zbb_Unit.v
│   │   └── SYSTEM_DEF.vh   Global parameters + CSR addresses
│   ├── fpu/                RV32F floating-point
│   │   ├── fpu_top.v / fregfile.v
│   │   └── fpu_addsub.v / fpu_mul.v / fpu_div.v
│   │       fpu_sqrt.v / fpu_cmp.v / fpu_cvt.v
│   ├── cache/
│   │   ├── icache.v        I-Cache 64-line 8-word direct-mapped
│   │   └── dcache.v        D-Cache 64-line 8-word write-through
│   ├── bloom/
│   │   └── bloom_filter.v  8192-bit · 3 Murmur polynomial hashes
│   ├── lz77/
│   │   └── lz77_comp.v     256B window · 256 parallel comparators
│   ├── aes/
│   │   └── aes128_enc.v    AES-128 enc+dec · 10-cycle iterative
│   ├── fuzzy/
│   │   ├── fuzzy_search_top.v   MMIO wrapper
│   │   ├── lev_array.v          32×32 systolic grid
│   │   ├── lev_pe.v             DP processing element
│   │   └── search_sequencer.v   DMA engine
│   ├── memory/
│   │   ├── I_BRAM.v        Instruction BRAM dual-port
│   │   └── D_BRAM.v        Data BRAM dual-port + DMA port B
│   ├── mmio/
│   │   ├── mmio_decode.v   MMIO bus decoder/arbiter
│   │   └── mmio_map.vh     Address constant definitions
│   ├── uart/
│   │   ├── uart_ctrl.v / uart_tx.v / uart_rx.v
│   └── top/
│       ├── top_fpga.v      SoC top-level Nexys A7
│       └── sevseg_ctrl.v   8-digit 7-segment controller
│
├── firmware/
│   ├── main.c              UART shell + all pipelines
│   ├── mmio.h              MMIO register definitions
│   ├── start.S             Startup assembly — stack init, jump to main
│   ├── link.ld             Linker script
│   ├── Makefile            riscv32-unknown-elf-gcc build
│   ├── bin2mem.py          ELF to .mem readmemh format
│   ├── firmware.mem        Hex image for I-BRAM init
│   └── firmware_data.mem   Data segment for D-BRAM init
│
├── tb/                     20+ Verilog testbenches
│   ├── tb_pipeline.v / tb_full_system.v / tb_system_interactive.v
│   ├── tb_aes.v / tb_aes128.v / tb_aes_nist.v
│   ├── tb_bloom.v / tb_bloom_unit.v
│   ├── tb_fuzzy.v / tb_fuzz_lev_auto.v / tb_fuzz_stress.v
│   │   tb_edit_distance_exhaustive.v
│   ├── tb_lz77.v / tb_lz77_unit.v
│   ├── tb_forwarding.v / tb_hazard.v / tb_ldu.v / tb_immgen.v
│   ├── tb_rv32m.v / tb_rv32m_mul.v / tb_rv32b.v / tb_ras.v
│   └── tb_mini.v / tb_top.v
│
├── scripts/
│   ├── host_ingest.py / host_query.py / run_full_demo.py
│   ├── fuzz_test.py / validate_edit_distance.py / terminal.py
│
├── data/
│   ├── demo_dictionary.csv  256-term mixed domain dictionary
│   └── demo_queries.csv     Misspelled queries + expected results
│
├── constraints/
│   └── nexys_a7.xdc         Vivado XDC pin constraints
│
├── bit/
│   ├── top_fpga.bit         Pre-built Vivado bitstream
│   └── terminal.py          Programming helper
│
└── docs/
    ├── Project_Abstract.docx
    ├── Block_Diagram_Report.docx
    └── FPGA_Execution_Plan_Group14.docx

12. Building & Programming

Prerequisites

  Tool                     Version   Purpose
  ──────────────────────────────────────────────────────────────
  riscv32-unknown-elf-gcc  ≥ 12.0    Firmware compilation
  Xilinx Vivado            ≥ 2023.1  Synthesis + implementation
  Python                   ≥ 3.9     Host scripts
  pyserial                 ≥ 3.5     UART communication
  iverilog + vvp           ≥ 11.0    Simulation

Build and Flash Flow

flowchart LR
    subgraph FW["1 — Firmware Build"]
        F1["cd firmware\nmake clean and make"]
        F2["riscv32-unknown-elf-gcc\n-march=rv32im -mabi=ilp32 -O2\n-nostdlib -T link.ld"]
        F3["bin2mem.py\nELF to firmware.mem"]
        F1 --> F2 --> F3
    end

    subgraph SYN["2 — Vivado Synthesis"]
        S1["New project\nxc7a100tcsg324-1"]
        S2["Add rtl sources\nnexys_a7.xdc\nfirmware.mem files"]
        S3["Synthesis\nImplementation\nGenerate Bitstream"]
        S1 --> S2 --> S3
    end

    subgraph PROG["3 — Program FPGA"]
        P1["Vivado HW Manager\nor xc3sprog"]
        P2["top_fpga.bit\nto Nexys A7"]
        P1 --> P2
    end

    subgraph DEMO["4 — Run Demo"]
        D1["host_ingest.py\n--csv demo_dictionary.csv"]
        D2["host_query.py\n--csv demo_queries.csv"]
        D3["run_full_demo.py\nfull automated run"]
        D1 --> D2 --> D3
    end

    FW --> SYN --> PROG --> DEMO

    style FW fill:#43a047,fill-opacity:0.1,stroke:#43a047,stroke-width:2px
    style SYN fill:#1e88e5,fill-opacity:0.1,stroke:#1e88e5,stroke-width:2px
    style PROG fill:#fb8c00,fill-opacity:0.1,stroke:#fb8c00,stroke-width:2px
    style DEMO fill:#8e24aa,fill-opacity:0.1,stroke:#8e24aa,stroke-width:2px
# Build firmware
cd firmware && make clean && make

# Program pre-built bitstream
xc3sprog -c nexys4 bit/top_fpga.bit

# Ingest dictionary
python3 scripts/host_ingest.py --port /dev/ttyUSB0 --csv data/demo_dictionary.csv

# Run queries
python3 scripts/host_query.py --port /dev/ttyUSB0 --csv data/demo_queries.csv

# Full automated demo
python3 scripts/run_full_demo.py --port /dev/ttyUSB0

# Interactive terminal
python3 scripts/terminal.py --port /dev/ttyUSB0

Expected FPGA Resource Utilization

  LUTs:   ~40,000 / 63,400   (63%)
  FFs:    ~12,000 / 126,800  ( 9%)
  BRAMs:       ~8 / 135      ( 6%)
  DSPs:         ~4 / 240     ( 2%)
  WNS:        +2.1 ns        (timing closed at 50 MHz)
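
As a rough cross-check of the timing figure, the reported slack can be turned into an estimated maximum clock. This is a minimal sketch: it assumes the WNS is setup slack measured against the 20 ns (50 MHz) constraint, and `estimated_fmax_mhz` is a hypothetical helper name, not part of the repository.

```python
def estimated_fmax_mhz(target_mhz: float, wns_ns: float) -> float:
    """Estimate Fmax from the target clock and worst negative slack (setup)."""
    period_ns = 1000.0 / target_mhz        # 50 MHz -> 20 ns period
    critical_path_ns = period_ns - wns_ns  # positive slack shortens the required period
    return 1000.0 / critical_path_ns


def utilization_percent(used: int, available: int) -> float:
    """Resource utilization as a percentage of the device total."""
    return 100.0 * used / available


if __name__ == "__main__":
    print(f"Estimated Fmax: {estimated_fmax_mhz(50.0, 2.1):.1f} MHz")
    print(f"LUT utilization: {utilization_percent(40_000, 63_400):.0f}%")
```

Under these assumptions the critical path is about 17.9 ns, which leaves some headroom above 50 MHz. The real limit would only be known after re-running implementation with a tighter constraint.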

13. Demo Data & Test Vectors

Dictionary (data/demo_dictionary.csv)

256 terms across multiple domains:

| Category | Example Terms |
|----------|---------------|
| Programming | Algorithm, Database, Compiler, Recursion, Polymorphism, Concurrency, Deadlock… |
| Biology | Mitochondria, Chromosome, Photosynthesis, Metabolism… |
| Medicine | Hypertension, Pneumonia, Tachycardia, Anesthesia, Antibiotic, Bradycardia… |
| Mathematics | Fibonacci, Abstraction, Serialization, Iteration, Traversal… |

Query Vectors (data/demo_queries.csv)

50 intentionally misspelled queries with known expected matches:

| Query | Expected Match | Edit Distance |
|-------|----------------|---------------|
| Algrithm | Algorithm | 1 |
| Databse | Database | 1 |
| Compiller | Compiler | 1 |
| Fibbonacci | Fibonacci | 1 |
| Encapslation | Encapsulation | 1 |
| Mitocondria | Mitochondria | 1 |
| Fotosynthesis | Photosynthesis | 2 |
| Tachacardia | Tachycardia | 1 |
| Inheritence | Inheritance | 1 |
| Neumonia | Pneumonia | 1 |
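
Expected distances like these can be cross-checked on the host with a software reference. The following is a minimal sketch of the standard two-row Levenshtein dynamic program. The function names and the case-insensitive comparison are assumptions and are not taken from scripts/validate_edit_distance.py.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute; unit cost) using two DP rows."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def query_distance(query: str, expected: str) -> int:
    """Compare case-insensitively (an assumption about how the engine normalizes input)."""
    return levenshtein(query.lower(), expected.lower())


if __name__ == "__main__":
    for query, expected in [("Algrithm", "Algorithm"),
                            ("Fotosynthesis", "Photosynthesis")]:
        print(query, expected, query_distance(query, expected))
```

A missing letter such as the "o" in "Algrithm" costs one insertion, while "Fotosynthesis" needs one substitution (F to P) plus one insertion (h).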

14. Performance Benchmarks

Pipeline CPI

  Ideal CPI (no hazards):          1.00
  Typical CPI (demo firmware):    ~1.80
  Load-use stall overhead:        ~12% of cycles
  Cache miss overhead:             ~8% of cycles (warm cache)
  UART TX stall overhead:         ~15% of cycles (TX-bound)
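
The CPI figures translate into instruction throughput and runtime with simple arithmetic. This is a minimal sketch assuming the 50 MHz core clock stated earlier; the helper names are hypothetical.

```python
def effective_mips(clock_mhz: float, cpi: float) -> float:
    """Millions of instructions retired per second at a given CPI."""
    return clock_mhz / cpi


def runtime_us(instructions: int, cpi: float, clock_mhz: float) -> float:
    """Wall-clock time in microseconds: instructions x CPI cycles at clock_mhz."""
    return instructions * cpi / clock_mhz


if __name__ == "__main__":
    print(f"Ideal:   {effective_mips(50.0, 1.0):.1f} MIPS")
    print(f"Typical: {effective_mips(50.0, 1.8):.1f} MIPS")
    print(f"1,000 instructions at CPI 1.8: {runtime_us(1000, 1.8, 50.0):.1f} us")
```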

Coprocessor Latencies

| Operation | Cycles | Throughput |
|-----------|--------|------------|
| Bloom insert (1 key) | ~7 | 7.1M ops/s @ 50 MHz |
| Bloom query (1 key) | ~7 | 7.1M ops/s @ 50 MHz |
| LZ77 compress (16B) | ~20–50 | variable ratio |
| AES-128 encrypt (16B) | ~13 | 61.5 MB/s |
| AES-128 decrypt (16B) | ~13 | 61.5 MB/s |
| Fuzzy search (32×32 chars) | ~64 | ~780K queries/s |
| I-cache miss refill | 8 | |
| D-cache miss refill | 8 | |
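
The throughput column follows from dividing the clock rate by the cycle count. This is a minimal sketch that assumes back-to-back operations with no MMIO or software overhead between them; the helper names are hypothetical.

```python
CLOCK_HZ = 50_000_000  # 50 MHz core clock


def ops_per_second(cycles: int, clock_hz: int = CLOCK_HZ) -> float:
    """Operations per second when each operation takes `cycles` clock cycles."""
    return clock_hz / cycles


def bytes_per_second(block_bytes: int, cycles: int, clock_hz: int = CLOCK_HZ) -> float:
    """Data rate when a block of `block_bytes` is processed every `cycles` cycles."""
    return block_bytes * clock_hz / cycles


if __name__ == "__main__":
    print(f"Bloom:  {ops_per_second(7) / 1e6:.1f} M ops/s")
    print(f"AES:    {bytes_per_second(16, 13) / 1e6:.1f} MB/s")
    print(f"Fuzzy:  {ops_per_second(64) / 1e3:.0f} K queries/s")
```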

End-to-End Query Latency

For a 10-entry dictionary (1 Bloom check + 10 × AES+LZ77+Fuzzy):

  Bloom check (1 trigram):    ~7 cycles
  Per-entry AES decrypt:      ~13 cycles
  Per-entry LZ77 decompress:  ~30 cycles
  Per-entry fuzzy search:     ~64 cycles
  ──────────────────────────────────────
  Per-entry total:            ~107 cycles
  10-entry scan total:        ~1,077 cycles
  Wall-clock @ 50 MHz:        ~21.5 µs
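
The same breakdown can be written as a small model to estimate other dictionary sizes. This is a minimal sketch using the approximate cycle counts above as defaults; it assumes entries are scanned sequentially with no overlap between stages, and the function names are hypothetical.

```python
def scan_cycles(n_entries: int, bloom: int = 7, aes: int = 13,
                lz77: int = 30, fuzzy: int = 64) -> int:
    """One Bloom check plus a decrypt, decompress, and fuzzy compare per entry."""
    return bloom + n_entries * (aes + lz77 + fuzzy)


def cycles_to_us(cycles: int, clock_hz: int = 50_000_000) -> float:
    """Convert a cycle count to microseconds at the given clock rate."""
    return cycles / clock_hz * 1e6


if __name__ == "__main__":
    for n in (10, 256):
        c = scan_cycles(n)
        print(f"{n:>3} entries: {c} cycles, {cycles_to_us(c):.1f} us")
```

For the 10-entry case this gives the 1,077 cycles shown above. A full 256-entry scan would come to roughly 548 µs under the same assumptions.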

The Bloom filter correctly rejects ~85% of random noise queries before the expensive decrypt, decompress, and compare pipeline is invoked, so those queries cost only the ~7-cycle Bloom check instead of a full dictionary scan.
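
The effect of early rejection on average cost can be estimated as follows. This is a minimal sketch: it assumes a rejected query costs only the Bloom check, that every accepted query scans all entries, and that the ~85% rejection rate applies uniformly; the function names are hypothetical.

```python
def expected_cycles(n_entries: int, reject_rate: float,
                    bloom: int = 7, per_entry: int = 107) -> float:
    """Average cycles per query when a fraction `reject_rate` stops at the Bloom check."""
    return bloom + (1.0 - reject_rate) * n_entries * per_entry


def average_speedup(n_entries: int, reject_rate: float,
                    bloom: int = 7, per_entry: int = 107) -> float:
    """Ratio of the always-scan cost to the expected cost with Bloom rejection."""
    full_scan = bloom + n_entries * per_entry
    return full_scan / expected_cycles(n_entries, reject_rate, bloom, per_entry)


if __name__ == "__main__":
    print(f"Expected cycles: {expected_cycles(10, 0.85):.1f}")
    print(f"Average speedup: {average_speedup(10, 0.85):.2f}x")
```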


15. Testbench Suite

All testbenches in tb/ run with Icarus Verilog:

# Full system integration test
iverilog -o sim/tb_full_system -I rtl/core \
    tb/tb_full_system.v rtl/core/*.v rtl/cache/*.v rtl/bloom/*.v \
    rtl/lz77/*.v rtl/aes/*.v rtl/fuzzy/*.v rtl/memory/*.v \
    rtl/mmio/*.v rtl/uart/*.v rtl/top/*.v
vvp sim/tb_full_system

# AES NIST FIPS-197 test vectors
iverilog -o sim/tb_aes_nist tb/tb_aes_nist.v rtl/aes/aes128_enc.v
vvp sim/tb_aes_nist

# Fuzzy search 1,000-case stress test
iverilog -o sim/tb_fuzz_stress tb/tb_fuzz_stress.v rtl/fuzzy/*.v
vvp sim/tb_fuzz_stress

Coverage

| Testbench | What It Tests |
|-----------|---------------|
| tb_pipeline.v | 5-stage pipeline correctness, forwarding, hazards |
| tb_full_system.v | Complete SoC: ingest + query end-to-end |
| tb_system_interactive.v | UART protocol + command parsing |
| tb_aes_nist.v | NIST FIPS-197 AES-128 test vectors |
| tb_bloom_unit.v | Hash function distribution verification |
| tb_edit_distance_exhaustive.v | Levenshtein DP: all 4-char string pairs |
| tb_fuzz_stress.v | 1,000 random string pair stress test |
| tb_lz77.v | Compress + decompress round-trip fidelity |
| tb_forwarding.v | All forwarding paths (EX→EX, MEM→EX, WB→EX) |
| tb_hazard.v | Load-use, branch flush, cache stall behavior |
| tb_rv32m.v | All RV32M instructions (MUL/DIV/REM variants) |
| tb_rv32b.v | All Zbb bit-manipulation instructions |
| tb_ras.v | Return Address Stack push/pop depth |
| tb_ldu.v | All LB/LBU/LH/LHU/LW alignment combinations |

16. Design Phases

RV5 was developed iteratively across nine phases (Phase 0 through Phase 8, with Phases 6 and 7 shown together below), each adding a concrete layer of hardware functionality:

flowchart LR
    subgraph P0["Phase 0"]
        direction TB
        A0["RV32I Pipeline\nPC · RF · ALU · BPU\nBHT + BTB prediction\nI-BRAM + D-BRAM\nCSR: rdcycle minstret"]
    end

    subgraph P1["Phase 1"]
        direction TB
        A1["MMIO + UART\n115200 baud TX/RX\nMMIO decoder + map\nBloom Filter coprocessor"]
    end

    subgraph P2["Phase 2"]
        direction TB
        A2["Coprocessors\nLZ77 compress/decomp\nAES-128 enc + dec\nCSR perf counter ext"]
    end

    subgraph P3["Phase 3"]
        direction TB
        A3["L1 Cache Hierarchy\nI-Cache 64-line D-map\nD-Cache write-through\nCache stall + miss CSR"]
    end

    subgraph P4["Phase 4"]
        direction TB
        A4["Hazard Extensions\nFull forwarding network\nFPU stall integration\nUART TX stall"]
    end

    subgraph P5["Phase 5"]
        direction TB
        A5["Fuzzy Search Engine\nLevenshtein systolic array\nfuzzy_search_top MMIO\nSearch Sequencer DMA\n7-seg telemetry controller"]
    end

    subgraph P67["Phase 6 + 7"]
        direction TB
        A67["RV32M + Zbb\nDSP Multiplier\nIterative Divider\nBit-Manip CLZ CTZ ROL"]
    end

    subgraph P8["Phase 8"]
        direction TB
        A8["RV32F FPU\nfregfile f0 to f31\nFADD FSUB FMUL\nFDIV FSQRT multi-cycle\nFP CSRs fcsr fflags frm"]
    end

    P0 --> P1 --> P2 --> P3 --> P4 --> P5 --> P67 --> P8

    style P0 fill:#43a047,fill-opacity:0.1,stroke:#43a047,stroke-width:2px
    style P1 fill:#1e88e5,fill-opacity:0.1,stroke:#1e88e5,stroke-width:2px
    style P2 fill:#8e24aa,fill-opacity:0.1,stroke:#8e24aa,stroke-width:2px
    style P3 fill:#00897b,fill-opacity:0.1,stroke:#00897b,stroke-width:2px
    style P4 fill:#fb8c00,fill-opacity:0.1,stroke:#fb8c00,stroke-width:2px
    style P5 fill:#e53935,fill-opacity:0.1,stroke:#e53935,stroke-width:2px
    style P67 fill:#6d4c41,fill-opacity:0.1,stroke:#6d4c41,stroke-width:2px
    style P8 fill:#00acc1,fill-opacity:0.1,stroke:#00acc1,stroke-width:2px

Appendix: SYSTEM_DEF.vh Parameters

`define DATA_WIDTH   32
`define ADDR_WIDTH    5   // register address (5 bits = x0..x31)
`define PC_WIDTH     32
`define INSTR_WIDTH  32
`define OPCODE_WIDTH  7

// Branch History Table
`define BHT_SIZE     64
`define BHT_PC_WIDTH  6   // log2(BHT_SIZE)

// Branch Target Buffer
`define BTB_SIZE     16
`define BTB_PC_WIDTH  4

// Opcode definitions
`define R_TYPE       7'b0110011
`define I_TYPE_ALU   7'b0010011
`define I_TYPE_LOAD  7'b0000011
`define S_TYPE       7'b0100011
`define B_TYPE       7'b1100011
`define U_TYPE_LUI   7'b0110111
`define U_TYPE_AUIPC 7'b0010111
`define J_TYPE_JAL   7'b1101111
`define I_TYPE_JALR  7'b1100111
`define I_TYPE_CSR   7'b1110011
`define FP_OP        7'b1010011
`define FP_LOAD      7'b0000111
`define FP_STORE     7'b0100111

// CSR addresses
// Machine trap setup and handling
`define CSR_MSTATUS     12'h300
`define CSR_MTVEC       12'h305
`define CSR_MEPC        12'h341
`define CSR_MCAUSE      12'h342
// Cycle and retired-instruction counters
`define CSR_RDCYCLE     12'hC00
`define CSR_MINSTRET    12'hC02   // 0xC02 is the address the privileged spec names instret
// Performance counters
`define CSR_BRANCH_MISS 12'hB00
`define CSR_ICACHE_MISS 12'hB01
`define CSR_DCACHE_MISS 12'hB02
`define CSR_BLOOM_REJ   12'hB03
`define CSR_AES_OPS     12'hB04
`define CSR_BLOOM_INS   12'hB05
`define CSR_FUZZY_OPS   12'hB06
`define CSR_TOTAL_BYTES 12'hB07
// Floating-point CSRs
`define CSR_FFLAGS      12'h001
`define CSR_FRM         12'h002
`define CSR_FCSR        12'h003
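
To illustrate how these opcode and CSR constants map onto instruction words, the following is a minimal host-side sketch in Python. It mirrors the opcode values listed above; the function names are hypothetical, and the field positions (opcode in bits 6:0, CSR address in bits 31:20) follow the standard RISC-V encoding.

```python
# Opcode values copied from the `define list above.
OPCODES = {
    0b0110011: "R_TYPE",
    0b0010011: "I_TYPE_ALU",
    0b0000011: "I_TYPE_LOAD",
    0b0100011: "S_TYPE",
    0b1100011: "B_TYPE",
    0b0110111: "U_TYPE_LUI",
    0b0010111: "U_TYPE_AUIPC",
    0b1101111: "J_TYPE_JAL",
    0b1100111: "I_TYPE_JALR",
    0b1110011: "I_TYPE_CSR",
    0b1010011: "FP_OP",
    0b0000111: "FP_LOAD",
    0b0100111: "FP_STORE",
}


def classify(instr: int) -> str:
    """Return the opcode class name for a 32-bit instruction word."""
    return OPCODES.get(instr & 0x7F, "UNKNOWN")  # opcode is bits 6:0


def csr_address(instr: int) -> int:
    """Extract the 12-bit CSR address (bits 31:20) from a CSR instruction."""
    return (instr >> 20) & 0xFFF


if __name__ == "__main__":
    rdcycle_a0 = 0xC0002573  # csrrs a0, cycle, x0
    print(classify(0x00000013))                # addi x0, x0, 0 (nop)
    print(classify(rdcycle_a0), hex(csr_address(rdcycle_a0)))
```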

Built with Verilog-2001 · RV32IMFB · Xilinx Artix-7 · 50 MHz

From gates to search engine — every bit hand-crafted.
