# CPTR380 RISC-V RV32I FPGA Implementation

Jaron Brown & Ethan Jansen

Abstract—TBD

Index Terms—ISA, RISC, RISC-V, RV32I FPGA, VHDL.

#### I. Introduction

S INCE designing a custom ALU-based CPU last quarter for the Intro to Digital Design class, both of the authors developed an interest in implementing a more commonly supported instruction set architecture (ISA). When presented with the opportunity to pursue a project related to computer architecture in their Computer Architecture class, both authors jumped at the opportunity to implement another CPU design. They initially considered implementing the MIPS ISA but, after conducting more research, decided on implementing one of the RISC-V standard ISAs, RV32I. This decision was made based on the abundant documentation for the RISC-V ISAs, RV32I's relative simplicity, the high commercial interest in these ISA specifications, and the interest in supporting opensource projects.

One of the dominant reasons many companies are developing RISC-V implementations is it open-source nature. The dominant ISAs (x86, ARM, MIPS, etc.) all have steep licensing fees. The RISC-V ISAs, in comparison, are completely free. Additionally, the closed-source nature of the dominant ISAs limit the freedom those designing implementation of these ISAs. Having a completely open-source ISA just makes sense—it lowers the barrier to entry for designing hardware to implement standard ISAs and allows for more innovation in the realm of CPU design.

# II. A BRIEF INTRODUCTION TO RISC-V

The original design of RISC-V was begun in 2010 at the University of California, Berkeley. It was designed not only as an academic learning aid but also a set of ISAs that could have commercial viability. As its name suggests, RISC-V is a RISC (Reduced Instruction Set Computer) architecture. This means, in comparision to CISC (Complex Instruction Set Computer) ISAs, RISC ISAs have simple instructions. This means more individual instructions will be required to complete a given task compared to CISC ISAs but also allows the hardware implementation to be much simpler than that for CISC ISAs. In fact, implementations of x86-64, the most widely adopted CISC ISA, breaks its CISC instructions into "microinstructions" which are easier to implement in hardware, similar to RISC ISAs.

RISC-V is not actually *an* ISA but rather a collection of ISAs, ranging from 32-bit to 128-bit standards and supporting various levels of complexity by way of extensions to the base ISAs. Some official extensions include the M extension which

adds multiplication and division support, the A extension which adds support for atomic instructions, and the F and D extensions which add support for single- and double-precision floating numbers, respectively [1]. Developers are encouraged to design their own compatible extensions to customize the base RISC-V ISA to their specific application.

# III. PROJECT GOALS

In this project, revision 2.1 of the RV32I unprivileged ISA will be implemented in VHDL and tested on a Xilinx Artix 7 FPGA (Field-Programmable Gate Array). RV32I is the RISC-V base integer ISA and implements the core functionality of the RISC-V architecture family [1]. Using this, the implemented microcontroller should easily support C code cross-compilation via clang. This will help test/debug the microcontroller during implementation. Even more notable, however, is this will greatly improve the accessibility of this project as a useful processor.

### A. Stretch Goals

The following goals would all improve the design, however are not necessary to meet a basic RV32I implementation. Because of time constraints these are items are not actively being pursued.

- I/O: I/O support is not part of RV32I. For this reasoning
  an extension, or a co-processor, would be needed for
  proper implementation. For these reasons, proper I/O
  implementation is a stretch goal. However, because basic
  I/O support is required for testing/debugging, it will be
  unofficially implemented for these reasons.
- Pipelining: Although RV32 supports pipelining, this
  project's implementation will not implement this feature
  due to projected time constraints. Pipelining is not necessary to meet ISA requirements, and the lack of pipelining
  will only increase the clock cycles per instruction (CPI)
  but will greatly simplify the control logic and data path
  unit (DPU).
- Multiplication & Division: These are technically part of the M extension [1] and thus are not high priorities. However, because IEEE.numeric\_std (the VHDL library) has built in support for multiplication and division this would not be difficult to add to the existing arithment logic unit (ALU). The ALU would simply require another bit of control.
- Standard C Code Compilation Support: Basic C support is planned for this project, however exensive support is currently not guarnateed.

The authors intend to continue supporting this project by refining the design and implemented the discussed stretch goals.

Manuscript created March, 2024.

#### IV. IMPLEMENTATION

#### A. Overview

This project's implementation of non-pipelined RV32I consists of a few core components:

- instruction & data memory (separated)
- a register file
- an ALU
- various other DPU components related to sign-extension, incrementing the program counter, branching, and routing signals
- · a control module

Unlike the authors' previous CPU designs which used an accumulator-based design, this implementation uses a register file with 32 32-bit wide registers. Besides the different ISA support (the original accumulator-based CPU used a custom ISA), the switch to using a bank of registers is the most significant hardware design change between the two implementations. This change, however, is in keeping with the RV32I standard and allows for more flexibility when programming the CPU.

# B. Instruction & Data Memory



Fig. 1. Instruction Memory & Program Counter with Preliminary Step & Branch Support

Each memory (instruciton and data) block was implemented in VHDL using AMD's write-first memory example using an array of std\_logic\_vector's needs citation. This allows for quick implementation; furthermore, it ensures that the memory blocks will be mapped to a physical FPGA memory block for improved performance —the clock will not have to be further delayed as it may have been otherwise.

One change from AMD's example that needed to be made was support writing bytes and half-words for data memory. This is to support the sb and sh instructions as they do not overwrite the upper bits. Without the additional support baked into the data memory, an additional register would need to be added to temporarily hold the upper bits so that they do not get overwritten—this would require additional clock cycles to implement.

As of now the instruction and data memory depth is not defined, however RV32I supports 32-bit addresses. Because of this the goal to implement  $2^{32}$  deep memory blocks, however there may be some currently unkown limitation of the FPGAs used for testing.



Fig. 2. 32-bit Register File

# C. Register File

# Incomplete..

Similar to the Data Memory implementation, the register block supports writing byte and half-words in addition to full words. **Note:** this might be changed as when a register is written to the full register is overwritten regardless, even when only writing bytes and half-words.

## D. ALU



Fig. 3. ALU with ALU Control and Conrol Flag Output

The arithmetic logic unit is a fully combinational block which will perform the desired arithmetic operation based on the input control lines. Shown in Table IV-D are the main operations the ALU is required to support. Because there are various instructions associated with the same arithmetic operation (such as needing to determine the address used for load and store using ADD) the control line into the ALU is only 3-bits wide in order to support the eight different arithmetic operations, but the actual control signal is determined from a separate "ALU Control" block.

This ALU Control block takes in opcode, funct3, and funct7 from the encoded instruction and translates this into the necessary 3-bit ALU control signal. Separating this block also allows more easily decoding the necessary information from the six different [1] instruction types.

The ALU also outputs flags indicating if the result is less than or greater than zero. With these two flag outputs any of the branch conditions can be supported:

BEQ: equalsBNE: not equalsBLT & BLTU: less-thanBGE & BGEU: greater-than

Furthermore, the r<0 flag is also zero-extended and output (via a mux) in order to support the SLT & SLTI (set less-than) instructions—which set a register to 1 or 0 if the first imput is less-than/greater-than the second input.

| Operation            | Instructions            |
|----------------------|-------------------------|
| ADD                  | ADD(I),                 |
|                      | Load & Store (various), |
|                      | JALR, AUIPC             |
| SUB                  | SUB, SLT (various),     |
|                      | Branch (various)        |
| XOR                  | XOR(I)                  |
| OR                   | OR(I)                   |
| AND                  | AND(I)                  |
| Left Shift           | SLL(I)                  |
| Right Shift          | SRL(I)                  |
| Right Shift (Arith.) | SRA(I)                  |
| TARIFI               |                         |

RV32I BASE ARITHMETIC OPERATIONS MATCHED TO THE INSTRUCTIONS

## E. DPU Miscellaneous

**TBD** 

## F. Control

# Incomplete...

The control is primarily separate from the DPU (apart from a few components such as the ALU control). This nicely separates the control from the rest of the CPU processing and makes a nice division in the design. We were able to do this by not considering pipelining, thus removing any forwarding registers and branch prediction modules from being interweaved into the DPU.

However, because there is no pipelining, the controller needs to be aware of how long each instruction takes to execute. This is mostly separated into the six different instruction types—where each instruction within an instruction type takes the same number of clock cycles to execute. For the current implementations, the instruction length is hardcoded into the controller. For instance, the controller will wait four clock cycles for an ADD to execute before continuing to the next instruction.

The controller also has support of pausing and single-stepping. This greatly improves the debugging of the device. This is useful not only during hardware design of the CPU but also when writing software to run on the CPU. Currently, however, there is no proper I/O support, so there are dedicated buttons on the FPGA board to control the single-stepping. Thus, single-stepping can not be controlled via a software debugger. This is planned for future work.

# V. TESTING & DESIGN REVIEW

**TBD** 

VI. CONCLUSION

**TBD** 

COMPLETE CPU BLOCK DIAGRAMS

## VHDL IMPLEMENTATIONS

Sample: TBD due to current overfull issues...

```
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
entity program_counter is
  port
  (
    clk
                : in std_logic; --! Clock
    reset
                : in std logic; --! Reset
     \rightarrow to 0
                : in std_logic; --! Enable
    en
                : in std_logic; --! Branch
    jump_value : in std_logic_vector(11
     \rightarrow downto 0);
                : out std_logic_vector(10
    count.
       downto 0)
  );
end program_counter;
architecture counter_signed of
  program_counter is
  signal count_buf : signed(11 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        count buf <= to signed(0, 12);
      elsif br = '1' then
        count buf <= count buf +

    signed(jump_value) - 1;

      elsif en = '1' then
        count_buf <= count_buf + 1;</pre>
        if count\_buf(11) = '1' then
           count_buf <= to_signed(0, 12);</pre>
        end if;
      end if;
    end if;
  end process;
  count <=
      std logic vector (unsigned (count buf (10
   \rightarrow downto 0)));
end counter_signed;
```

# REFERENCES

[1] RISC-V Foundation, *The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Document Version 20191213*, pp. i–30, Dec 2019. [Online]. Available: https://riscv.org/wp-content/uploads/2019/12/riscv-spec-20191213.pdf