# A Minimal RISC-V processor in VHDL

Jesse E.J. op den Brouw\*

The Hague University of Applied Sciences

November 27, 2021

https://github.com/jesseopdenbrouw/riscv-minimal

#### **Abstract**

The RISC-V Instruction Set Architecture (ISA) is an open source instruction set for a processor. This means that anybody can create a processor that uses this instruction set. There are already processors available, such as the E2-core from SiFive. More free cores are available on several platforms (e.g. on GitHub). This document describes a basic RISC-V core in VHDL. The core can only execute the RV32I unprivileged instruction set. The processor incorporates a ROM, RAM and some simple I/O. It is targeted for implementation on an FPGA. It is tested on an Intel Cyclone V with a DE0-CV development board from Terasic. The GNU C compiler for RISC-V is used for software development. Currently only simple C programs can be compiled and run. C++ is currently not supported.

This processor is not intended as a replacement for commercially available processors. It is intended as a study object for Computer Science students. The standard processor executes each instruction in one clock cycle because the ROM is realized with logic cells; only reads from RAM need two clock cycles. The alternative processor executes each instruction in two clock cycles because the ROM is inferred using onboard RAM; reads from ROM and RAM need an extra clock cycle. The processor has a simple, non-pipelined instruction decoder. Exceptions are currently not implemented; they are planned for future development.

This is work in progress. Things will certainly change in the future.

<sup>\*</sup>J.E.J.opdenBrouw@hhs.nl

# **Contents**

| Section | Title                           |
|---------|---------------------------------|
| 1       | Introduction                    |
| 2       | Registers                       |
| 3       | ROM                             |
| 4       | RAM                             |
| 5       | I/O                             |
| 6       | ALU                             |
| 7       | PC                              |
| 8       | Instruction Decoder             |
| 9       | Address Decoder and Data Router |
| 10      | Stack pointer                   |
| 11      | Implemented instructions        |
| 12      | The FPGA                        |
| 13      | Simulation                      |
| 14      | Setting up the GNU C compiler   |
| 15      | Cloning the RISC-V project      |
| 16      | Register subset                 |
| 17      | Compiling a C program by hand   |
| 18      | VHDL files                      |
| 19      | srec2vhdl                       |
| 20      | Current problem                 |
| 21      | Future plans                    |
| 22      | Author's note                   |

## 1 Introduction

This document describes the design of a simple, single/dual clock cycle, single-core RISC-V processor, written entirely in VHDL. The processor is able to run a simple compiled C program. C++ is currently not supported. The processor can handle the RV32I Base Integer Instruction Set as set forth in "The RISC-V Instruction Set Manual Volume I: Unprivileged ISA". The RV32M Instruction Set is currently not supported, so multiplications and divisions have to be handled in software. The toolchain will take care of that when supplied with the correct parameters. The aim is to synthesize for a minimum clock frequency of 50 MHz.
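To illustrate what "handled in software" means for multiplication, the following is a minimal sketch of a shift-and-add multiply that uses only operations RV32I provides (add, shift, AND, compare). The function name `soft_mul32` is hypothetical; the toolchain links its own library routine for this purpose, and this sketch is not that implementation.

```c
#include <stdint.h>

/* Shift-and-add multiplication using only RV32I-style operations.
 * Returns the low 32 bits of a * b.
 * soft_mul32 is a hypothetical name used for illustration only. */
uint32_t soft_mul32(uint32_t a, uint32_t b)
{
    uint32_t result = 0;

    while (b != 0u) {
        if (b & 1u) {
            result += a;   /* add the shifted multiplicand when this bit is set */
        }
        a <<= 1;           /* next power of two of the multiplicand */
        b >>= 1;           /* move to the next multiplier bit */
    }
    return result;
}
```

The loop runs at most 32 times, which shows why a multiplication without the M extension costs many clock cycles on a core like this one.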

This RISC-V processor consists of the following building blocks:

- The registers contain intermediate data for calculations.
- The ROM contains the program instructions and constant (read-only) data.
- The RAM contains read-write data (mutable data).
- The I/O is an interface with the outside world.
- The ALU is responsible for almost all computations in the processor.
- The PC is used to point to the currently executing instruction.
- The Address Decoder and Data Router is an interface between the memory (ROM, RAM, I/O) and the ALU and registers.
- The Instruction Decoder decodes the currently executing instruction and provides control signals to other building blocks.

A block diagram is shown in Figure 1.

There are two versions available. The standard version uses Logic Cells to implement the ROM. This will limit the program size but each ROM access (instruction and data) only requires one clock cycle. The alternative version uses preprogrammed onboard RAM blocks to implement the ROM. Larger programs are possible but each ROM access (program and data) requires two clock cycles.

## 2 Registers

The processor has thirty-two 32-bit registers denoted by x0 to x31. Internally, the registers use Big Endian format. Register x0 (alias zero) is hardwired to all zeros. Writing to this register has no effect. Reading this register returns all zero bits. Normally, the x-names are not used, but they may be handy when simulating the designs. Table 1 shows the names of the registers as they should be used.



Figure 1: The complete RISC-V MCU.

Table 1: RISC-V registers and their purpose.

| Register | Name   | Purpose                           | Saver  |
|----------|--------|-----------------------------------|--------|
| x0       | zero   | Hard-wired zero                   | -      |
| x1       | ra     | Return address                    | Caller |
| x2       | sp     | Stack pointer                     | Callee |
| x3       | gp     | Global pointer                    | -      |
| x4       | tp     | Thread pointer                    | -      |
| x5       | t0     | Temporary/alternate link register | Caller |
| x6-x7    | t1-t2  | Temporaries                       | Caller |
| x8       | s0/fp  | Saved register/frame pointer      | Callee |
| x9       | s1     | Saved register                    | Callee |
| x10-x11  | a0-a1  | Function arguments/return values  | Caller |
| x12-x17  | a2-a7  | Function arguments                | Caller |
| x18-x27  | s2-s11 | Saved registers                   | Callee |
| x28-x31  | t3-t6  | Temporaries                       | Caller |

## 3 ROM

The ROM consists of bytes and is only word addressable for instructions. The ROM is byte, halfword and word addressable when reading constant data. Halfword and word entries are in Little Endian format. When reading data from the ROM, halfword accesses must be on 2-byte boundaries and word accesses must be on 4-byte boundaries. This simplifies the decoding circuitry. The ROM returns undefined data if an access is not aligned. The standard processor instantiates the ROM in logic cells, which limits the size of the program. The alternative processor instantiates the ROM in onboard RAM, so bigger programs are possible. Rearranging halfword and word data accesses into Big Endian format is handled by the ROM decoding unit.

Note: the alternative processor uses onboard RAM to simulate ROM. Because of this, each read from ROM (instruction and data) requires two clock cycles. See Section 4.

## 4 RAM

The RAM consists of bytes and is byte, halfword and word addressable. Halfword and word entries are in Little Endian format. The RAM itself is made up of word (i.e. 32-bit) entries and is instantiated with onboard RAM blocks. Because of this, halfword accesses are only permitted on 2-byte boundaries and word accesses are only permitted on 4-byte boundaries. This simplifies the decoding circuitry. The RAM returns undefined data if a read is not aligned, and a write does not take place if it is not aligned. For the Cyclone V a maximum of 65536 words of RAM can be instantiated. Rearranging halfword and word data accesses into Big Endian format is handled by the RAM decoding unit.

Note: the Cyclone V has 3,153,920 bits of RAM available. Because of the 32-bit entries a maximum of 2,097,152 (65536 x 32) bits can be instantiated. This is equivalent to 262,144 bytes.

Writing the RAM (byte, halfword or word) requires 1 clock cycle. Reading the RAM (byte, halfword or word) requires 2 clock cycles because the RAM output is buffered by a register. This is automatically handled by the processor.
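The alignment and byte-order rules for ROM and RAM can be modelled in a few lines of C. This is a sketch of the rules as described in the text, not a transcription of the VHDL; the function names `is_aligned` and `read_le` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* An access of `size` bytes (1, 2 or 4) is aligned when the address
 * is a multiple of the access size. */
bool is_aligned(uint32_t addr, uint32_t size)
{
    return (addr & (size - 1u)) == 0u;
}

/* Assemble a Little Endian value of `size` bytes (1, 2 or 4) from
 * byte-organised memory: the byte at the lowest address is the least
 * significant byte of the result. */
uint32_t read_le(const uint8_t *mem, uint32_t addr, uint32_t size)
{
    uint32_t value = 0;

    for (uint32_t i = 0; i < size; i++) {
        value |= (uint32_t)mem[addr + i] << (8u * i);
    }
    return value;
}
```

A model of the hardware would call `is_aligned` first and treat the result of an unaligned read as undefined, as the text specifies.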

## 5 I/O

Currently, the I/O consists of one 32-bit data input and one 32-bit data output. More I/O (timers, counters etc.) will be added in the future, but most I/O requires the use of interrupts (timer overflow etc.). Note that the I/O can only be accessed as words and the addresses must be on 4-byte boundaries. If an access is not on a 4-byte boundary, reads return undefined data and writes do not take place.
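From a C program, word-only I/O is typically reached through `volatile` 32-bit accesses. The sketch below assumes a register layout that is NOT taken from this design: the struct `io_regs_t`, its field order and the function `io_echo` are hypothetical. Only the I/O base address 0xF0000000 comes from the default memory map (see Section 9).

```c
#include <stdint.h>

/* Base address of the I/O region in the default memory map. */
#define IO_BASE 0xF0000000u

/* Hypothetical register layout, for illustration only: the real
 * offsets of the input and output words are not specified here. */
typedef struct {
    volatile uint32_t data_in;   /* assumed: 32-bit data input  */
    volatile uint32_t data_out;  /* assumed: 32-bit data output */
} io_regs_t;

/* Copy the input word to the output word using one aligned word read
 * and one aligned word write, and return the value that was read. */
uint32_t io_echo(io_regs_t *io)
{
    uint32_t value = io->data_in;

    io->data_out = value;
    return value;
}
```

On the target, the call would be `io_echo((io_regs_t *)IO_BASE)`; on a host machine the function can be exercised with an ordinary struct variable.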

## 6 ALU

The Arithmetic and Logic Unit (ALU) handles all computations on data. It can add, subtract, do logic operations such as AND, OR and XOR, shift data left or right, and sign extend byte and halfword data. Some operations require two registers, some only use one register. Furthermore, the ALU is also used to determine if a conditional branch should be taken. Note that the RISC-V programmer's model does not incorporate status flags as some other architectures do. This requires some extra instructions when adding or subtracting double word (64-bit) data. The ALU is also used to compute the return address from unconditional function calls (JAL and JALR instructions). The data is in Big Endian format.

Note that the computation of jump target addresses is handled by the Program Counter (PC).
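The remark about missing status flags can be made concrete with a 64-bit addition built from 32-bit halves. Without a carry flag, the carry out of the low word is recovered with an unsigned comparison, which corresponds to the SLTU instruction. The type `u64_pair` and the function `add64` are hypothetical names for this sketch.

```c
#include <stdint.h>

/* A 64-bit value held in two 32-bit halves, as on a 32-bit machine. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

/* Add two 64-bit values using only 32-bit operations. */
u64_pair add64(u64_pair a, u64_pair b)
{
    u64_pair r;
    uint32_t carry;

    r.lo  = a.lo + b.lo;               /* low word, wraps modulo 2^32 */
    carry = (r.lo < a.lo) ? 1u : 0u;   /* wrapped sum is smaller than an operand */
    r.hi  = a.hi + b.hi + carry;       /* high word plus recovered carry */
    return r;
}
```

Compared with an architecture that has an add-with-carry instruction, this costs one extra compare and one extra add per 64-bit addition.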

## 7 PC

The Program Counter contains the address of the currently executing instruction. The address is always on a 4-byte boundary, although the targets of function calls and conditional jumps (JAL, JALR and Bxx instructions) can in principle be on non-4-byte boundaries (the toolchain will always create 4-byte boundaries). The PC (or rather the VHDL description of the PC) handles the address calculations of jumps and taken branches.
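As an illustration of the address calculation for a taken branch, the sketch below extracts the B-type immediate as laid out in the RV32I specification and adds it to the current PC. It is a software model with hypothetical function names (`b_imm`, `branch_target`), not a description of how pc.vhd is written.

```c
#include <stdint.h>

/* Extract the sign-extended B-type immediate from a 32-bit instruction.
 * Bit layout per the RV32I specification:
 *   insn[31]    -> imm[12]
 *   insn[7]     -> imm[11]
 *   insn[30:25] -> imm[10:5]
 *   insn[11:8]  -> imm[4:1]   (imm[0] is always 0) */
int32_t b_imm(uint32_t insn)
{
    uint32_t imm = (((insn >> 31) & 0x1u)  << 12)
                 | (((insn >> 7)  & 0x1u)  << 11)
                 | (((insn >> 25) & 0x3Fu) << 5)
                 | (((insn >> 8)  & 0xFu)  << 1);

    if (imm & 0x1000u) {
        imm |= 0xFFFFE000u;   /* sign-extend from bit 12 */
    }
    return (int32_t)imm;
}

/* Target address of a taken branch: PC plus the signed offset. */
uint32_t branch_target(uint32_t pc, uint32_t insn)
{
    return pc + (uint32_t)b_imm(insn);
}
```

For example, the encoding 0xFE000EE3 (`beq x0, x0, -4`) at PC 0x100 yields the target 0xFC.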

## 8 Instruction Decoder

The instruction decoder decodes the instruction supplied by the ROM as pointed to by the PC. An instruction is 4 bytes wide and in Little Endian order. The instruction decoder provides all control signals for the ALU, the PC, the Address Decoder and the register file.

In the standard processor, a simple two-state Finite State Machine (FSM) is used. Almost all instructions are executed in one clock cycle. Only reads from RAM require two clock cycles. The FSM takes care of that.

In the alternative processor, a simple three-state FSM is used. All instructions require two clock cycles to be fetched and executed, and an extra clock cycle is needed when reading RAM or ROM. The FSM takes care of that.

The instruction decoder is non-pipelined. That simplifies the design but reduces the computational speed.
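The first step of decoding is splitting the 32-bit instruction word into its fixed fields. The sketch below shows the field positions defined by the RV32I specification; the struct `rv32_fields` and the function `decode_fields` are hypothetical names and do not mirror the signal names in instruction\_decoder.vhd.

```c
#include <stdint.h>

/* Fixed fields of a 32-bit RISC-V instruction word. */
typedef struct {
    uint32_t opcode;  /* bits  6:0  */
    uint32_t rd;      /* bits 11:7  */
    uint32_t funct3;  /* bits 14:12 */
    uint32_t rs1;     /* bits 19:15 */
    uint32_t rs2;     /* bits 24:20 */
    uint32_t funct7;  /* bits 31:25 */
} rv32_fields;

/* Split an instruction word into its fields. Not every field is
 * meaningful for every instruction format. */
rv32_fields decode_fields(uint32_t insn)
{
    rv32_fields f;

    f.opcode = insn & 0x7Fu;
    f.rd     = (insn >> 7)  & 0x1Fu;
    f.funct3 = (insn >> 12) & 0x07u;
    f.rs1    = (insn >> 15) & 0x1Fu;
    f.rs2    = (insn >> 20) & 0x1Fu;
    f.funct7 = (insn >> 25) & 0x7Fu;
    return f;
}
```

For the word 0x002081B3 (`add x3, x1, x2`) this gives opcode 0x33, rd 3, rs1 1, rs2 2, and zero for funct3 and funct7.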

## 9 Address Decoder and Data Router

The Address Decoder and Data Router routes reads and writes to the ROM (only reads), RAM and the I/O. The processor uses a 32-bit linear address space for ROM, RAM and I/O. In the default setting, ROM starts at address 0x00000000 and its length is implementation-defined: it depends on the program. Unused ROM addresses return don't cares in simulation; in hardware the data returned is implementation-defined. Currently a maximum of 16384 bytes of ROM is supported. The RAM starts at address 0x20000000 and the length can be up to 262144 bytes, in powers of 2. The I/O starts at address 0xF0000000 and the length is implementation-defined.

When data is read, the data is collected from the accessed memory and put on an internal bus to the ALU. The ALU can perform sign extension (byte and halfword accesses) if needed. Please note that reading data from the RAM and ROM requires an extra clock cycle. Reading data from I/O requires one clock cycle.
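The default memory map can be summarised as a small classification function. This is a sketch under stated assumptions: the RAM size of 16 KiB is inferred from the default stack top 0x20004000 (see Section 10), and the I/O region is assumed to extend to the top of the address space because its length is implementation-defined. The names `region_t` and `decode_region` are hypothetical.

```c
#include <stdint.h>

typedef enum { REGION_ROM, REGION_RAM, REGION_IO, REGION_NONE } region_t;

#define ROM_BASE 0x00000000u
#define ROM_SIZE 0x00004000u  /* 16384 bytes: the current maximum */
#define RAM_BASE 0x20000000u
#define RAM_SIZE 0x00004000u  /* ASSUMPTION: 16 KiB, from the default stack top */
#define IO_BASE  0xF0000000u  /* ASSUMPTION: I/O runs to the top of the space */

/* Classify an address according to the default memory map. */
region_t decode_region(uint32_t addr)
{
    /* Unsigned subtraction wraps for addresses below the base, so a
     * single comparison checks both bounds of a region. */
    if (addr - ROM_BASE < ROM_SIZE) {
        return REGION_ROM;
    }
    if (addr - RAM_BASE < RAM_SIZE) {
        return REGION_RAM;
    }
    if (addr >= IO_BASE) {
        return REGION_IO;
    }
    return REGION_NONE;
}
```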

## 10 Stack pointer

The stack pointer is fully implemented, although the ISA does not provide push and pop instructions. The stack pointer is used to allocate local variables and is updated with each allocation and deallocation. As usual, the stack grows downwards (to lower addresses) on allocations and upwards (to higher addresses) on deallocations. Therefore the stack pointer is set to the highest RAM address on startup (which is 0x20004000 by default). The ISA postulates that the stack is aligned on 16-byte boundaries.
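The effect of the 16-byte alignment rule on a stack allocation can be sketched as follows. The compiler does this arithmetic at compile time and emits a single subtraction from sp; the function `alloc_frame` is a hypothetical model of that calculation.

```c
#include <stdint.h>

/* Return the new stack pointer after allocating `bytes` of local
 * storage. The frame size is rounded up to a multiple of 16 so that
 * an aligned stack pointer stays aligned. The stack grows downwards. */
uint32_t alloc_frame(uint32_t sp, uint32_t bytes)
{
    uint32_t rounded = (bytes + 15u) & ~15u;

    return sp - rounded;
}
```

Starting from the default stack top 0x20004000, a request for 20 bytes is rounded up to 32 and gives a new stack pointer of 0x20003FE0.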

## 11 Implemented instructions

All RV32I Unprivileged instructions are implemented with the exception of FENCE, ECALL and EBREAK instructions. These instructions act as a no-operation (NOP). This is because exceptions are not implemented.

## 12 The FPGA

For this project, we use the Cyclone V FPGA from Intel (formerly Altera). See https://www.intel.com/content/www/us/en/products/details/fpga/cyclone/v.html.

The Cyclone V used is the 5CEFA4F23C7, which has about 18000 cells available. It has 3080 Kbit of onboard RAM available, which is partially used for RAM and ROM (in the alternative processor). Depending on the program and used resources, the compiled RISC-V processor uses about 2000-2500 cells (about 12% - 15%). This FPGA is mounted on a Terasic DE0-CV board, see http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&No=921. For downloading the program file, the onboard USB-Blaster is used.

## 13 Simulation

The designs can be simulated fully, using QuestaSim Intel Starter or ModelSim Intel Starter. You need a (free) license for QuestaSim. During simulation, all essential signals can be viewed, as can the RAM. The RAM is viewed as 32-bit entries, so we need to do some manual calculations to correctly find byte, halfword and word accesses. Simulation can be started from Quartus.
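The manual calculation mentioned above can be sketched in C. The sketch ASSUMES that each 32-bit RAM entry holds the byte with the lowest address in bits 7..0 (Little Endian packing) and that RAM starts at 0x20000000; the actual byte lanes in the simulator view may differ, so check against a known value first. All function names are hypothetical.

```c
#include <stdint.h>

#define RAM_BASE 0x20000000u

/* Index of the 32-bit RAM entry that contains byte address `addr`. */
uint32_t ram_word_index(uint32_t addr)
{
    return (addr - RAM_BASE) >> 2;
}

/* Byte at `addr` within its 32-bit entry (assumed Little Endian packing). */
uint32_t extract_byte(uint32_t word, uint32_t addr)
{
    return (word >> (8u * (addr & 3u))) & 0xFFu;
}

/* Aligned halfword at `addr` within its 32-bit entry (same assumption). */
uint32_t extract_half(uint32_t word, uint32_t addr)
{
    return (word >> (8u * (addr & 2u))) & 0xFFFFu;
}
```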

## 14 Setting up the GNU C compiler

The processor can run simple C programs that are compiled using the GNU C compiler for RISC-V. Besides that, a separate linker script is needed to set up the compiled code. Building the C compiler (for Linux) is straightforward:

1. You need a current GNU C compiler installed on your Linux box.
2. You need the texinfo package. On Ubuntu et al. issue

```
apt install texinfo
```

3. In your home directory, enter the command

4. Wait for the cloning to end (takes a long time, about 30 minutes on a Zbook G5 2020 with a 10 MB/s internet connection).
5. Change to the directory with

```
cd riscv-gnu-toolchain
```

6. Make the build directory with:

```
mkdir build; cd build
```

7. Check the current configuration with

```
../configure --help | grep abi
```

It should say:

The toolchain is currently configured for 64-bit RISC-V. That is not what we want.

8. Enter:

This will set the architecture to RV32I and the ABI to ilp32. This means that integers, long integers and pointers use 32-bit entries. The destination directory is /opt/riscv32.

9. Now enter the make command as root: make

MAKE SURE TO ENTER THIS COMMAND AS root, because the toolchain is put in /opt/riscv32. This takes a long time (about 45 minutes on a Zbook G5). At some points the compilation seems to hang, but it is just compiling complicated C-files. By the way, you will see a lot of warnings.

10. Now that the toolchain is set up, we have to put the path into the \$PATH environment variable, so enter

```
export PATH=/opt/riscv32/bin:$PATH
```

11. Check if the compiler is available:

```
riscv32-unknown-elf-gcc -v
```

It should say something like:

```
Using built-in specs.
COLLECT_GCC=riscv32-unknown-elf-gcc
COLLECT_LTO_WRAPPER=/opt/riscv32/libexec/gcc/riscv32-←
  →unknown-elf/11.1.0/lto-wrapper
Target: riscv32-unknown-elf
Configured with: /mnt/d/PROJECTS/RISCVDEV/riscv-gnu-←
  →toolchain/build/../riscv-gcc/configure --target=←
  →shared --disable-threads --enable-languages=c,c++ --←
  →with-system-zlib --enable-tls --with-newlib --with-←
  →sysroot=/opt/riscv32/riscv32-unknown-elf --with-native←
  →-system-header-dir=/include --disable-libmudflap --←
  →disable-libssp --disable-libquadmath --disable-libgomp ←
  →--disable-nls --disable-tm-clone-registry --src←
  →=../../riscv-gcc --disable-multilib --with-abi=ilp32 ←
  →--with-arch=rv32i --with-tune=rocket '←
  →CFLAGS_FOR_TARGET=-Os   -mcmodel=medlow' '←
  →CXXFLAGS_FOR_TARGET=-Os   -mcmodel=medlow'
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 11.1.0 (GCC)
```

## 15 Cloning the RISC-V project

Now we have to clone the RISC-V project. It incorporates the full Quartus Prime Lite project with the processor written in VHDL. It also incorporates some simple C program examples and a tailor-made program to convert a RISC-V executable to a VHDL table suitable for the ROM. Create a working directory (and change to that directory) and issue the command:

```
git clone https://github.com/jesseopdenbrouw/riscv-minimal
```

In the created directory, you will see the following directories:

```
CODE – Sample software programs
DOCS – Documentation
HARDWARE – the VHDL description
OLD – yes, really old files for backup
```

Change directory to CODE. Now enter the command make. It will compile all programs and the tailor-made conversion program. To clean up the programs, issue the command make clean.

Next, start your Quartus Prime Lite software and open the project in the HARDWARE directory. Now start a build by clicking on the play-symbol. It should compile a standard setting (this takes a long time). When finished, you can download the FPGA contents to the DE0-CV board.

To test one of the programs, change directory to one of the program directories and copy the file with the .vhd extension to the directory containing the VHDL description under the name processor\_common\_rom.vhd. Now start Quartus and start the compilation. After a successful compilation, you can program the Cyclone V on a DE0-CV board.

## 16 Register subset

It is possible to compile the toolchain to only use registers x0 to x15. This is the RV32E variant of the base ISA, referred to here as the RISC-V E extension. As a positive side effect, the register file can be cut down from 32 registers to 16 registers. This will lower the cell count and possibly speed up the device. A negative side effect is that the pressure on register allocation is higher, possibly increasing instruction count when saving registers on the stack.

To configure the GNU C compiler for the E extension issue the command:

```
../configure --prefix=/opt/riscv32 --with-arch=rv32e --with-abi=ilp32e
```

and build the compiler.

Now compile a C program with:

Make sure to use -march=rv32e and -mabi=ilp32e.

## 17 Compiling a C program by hand

Note: only very, very simple C programs can be compiled for the processor at this time. We tested some simple looping (with for) and reading/writing the I/O. We did not test the use of the C library at this time.

Compiling a program requires the following steps:

- In the program directory CODE, create a new directory and change to that directory.
- Create a C program file, we assume flash.c.
- Now issue the command:

We supply our own linker file (-T .../ldfiles/riscv.ld) and we supply our own startup file (.../crt/minimal.S). Make sure to use -nostartfiles; otherwise the contents of the default startup file will be linked and errors will be reported. There are three startup files:

- empty.S - Empty startup file only providing the entry symbol. Can be used with assembler programs.
- minimal.S - Loads the global pointer and stack pointer and calls main. On return of main, it waits in an endless loop. Can be used with minimalistic C programs.
- startup.c - Full support for C programs.

- Next issue the command:

```
riscv32-unknown-elf-objcopy -O srec flash flash.srec
```

This will create a file in Motorola S-record format.

- Next issue the command:

```
../bin/srec2vhdl -wf flash.srec flash.vhd
```

This will create a VHDL file with the ROM encoded. Note: the tailor-made srec2vhdl has to be compiled beforehand. See Section 15.

- Next issue the command:

cp flash.vhd ../../HARDWARE/riscv/processor\_common\_rom.vhd

This will copy the VHDL file to the RISC-V processor ROM file.

- Now start the compilation of the VHDL code in Quartus Prime Lite and program the compiled file. This file has the extension .sof. See Figures 2 to 4.



Figure 2: Image of the Quartus project (1).

## 18 VHDL files

Both descriptions are composed of the following files:

- processor\_common.vhd - Common types and constants.
- processor\_common\_rom.vhd - Description of the ROM contents.
- address\_decode.vhd - The address decoder and data router to the memory (ROM, RAM, I/O).
- alu.vhd - Description of the ALU.
- pc.vhd - Description of the Program Counter.
- instruction\_decoder.vhd - The instruction decoder.
- regs.vhd - Description of the register file.
- rom.vhd - Description of the ROM.
- ram.vhd - Description of the RAM interface between the address decoder and data router and the onboard RAM.
- ram\_inst.vhd - Description of the onboard RAM.
- io.vhd - Description of the I/O.
- riscv.vhd - Top-level description of the processor.
- riscv.sdc - Constraints file. Sets the target clock frequency.
- tb\_riscv.vhd - VHDL testbench to simulate the design.
- tb\_riscv.do - QuestaSim/ModelSim command script.

Figure 3: Image of the Quartus project (2).

Figure 4: Image of the programmer.

The alternative version has one more file:

- rom\_inst.vhd - Description of the ROM. The ROM contents will be placed in onboard, initialized RAM blocks. The file rom.vhd is then a frontend between the address decoder and data router and the onboard ROM.

## 19 srec2vhdl

This is a homebrew utility to convert a Motorola S-record file into a VHDL file suitable for inclusion in the processor. The program is called with:

```
srec2vhdl [-fbwhqv] [-i <arg>] inputfile [outputfile]
```

inputfile is the S-record file, created by the objcopy program. outputfile is the VHDL output file. When omitted, stdout is used. There are a number of options:

- -f makes a full output that directly can be used. If not used, only the ROM table contents itself is produced.
- -w ROM contents is in words (32 bits).
- -h ROM contents is in halfwords (16 bits).
- -b ROM contents is in bytes (8 bits).
- -v Verbose output
- -q Quiet output, only error messages are displayed.
- -i <arg> indents each line with <arg> spaces.
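For readers unfamiliar with the S-record format: each line starts with `S` and a type digit, followed by hexadecimal byte pairs for the byte count, the address, the data and a checksum. The checksum is the one's complement of the low byte of the sum of all preceding bytes, so the sum of all bytes including the checksum has a low byte of 0xFF. The sketch below validates one line on that basis. It is not taken from srec2vhdl; the function names are hypothetical, and the line must not contain a trailing newline.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Convert one hexadecimal digit to its value, or -1 if invalid. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    c = (char)toupper((unsigned char)c);
    if (c >= 'A' && c <= 'F') {
        return c - 'A' + 10;
    }
    return -1;
}

/* Return true if `line` is a well-formed S-record whose byte count
 * and checksum are consistent. */
bool srec_checksum_ok(const char *line)
{
    size_t len = strlen(line);
    size_t nbytes;
    unsigned sum = 0;
    unsigned count = 0;

    /* Shortest record: "S", type, count, 2-byte address, checksum. */
    if (len < 10 || line[0] != 'S' || (len - 2) % 2 != 0) {
        return false;
    }
    nbytes = (len - 2) / 2;              /* count byte + address + data + checksum */
    for (size_t i = 0; i < nbytes; i++) {
        int hi = hex_nibble(line[2 + 2 * i]);
        int lo = hex_nibble(line[3 + 2 * i]);
        if (hi < 0 || lo < 0) {
            return false;
        }
        if (i == 0) {
            count = (unsigned)(hi * 16 + lo);
        }
        sum += (unsigned)(hi * 16 + lo);
    }
    /* The count byte covers everything after itself. */
    if (count != nbytes - 1) {
        return false;
    }
    return (sum & 0xFFu) == 0xFFu;
}
```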

## 20 Current problem

The linker script riscv.ld is not working properly. Currently it is not possible to use static/global initialized data, or uninitialized static/global data that should be set to all zero. The data is stored in the executable (and hence in the VHDL file), but we cannot access it with linker global symbols. A workaround for this problem is to explicitly set the data at startup (i.e. in main).
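A minimal sketch of this workaround: globals are declared without relying on their initial values, and a function called at the start of main assigns every value explicitly, including the zeros. The variable and function names are hypothetical.

```c
#include <stdint.h>

/* Globals whose initial contents cannot be relied upon on the target. */
static uint32_t counter;
static uint32_t squares[4];
static uint32_t error_flag;

/* Assign every global explicitly. On the target this would be the
 * first call in main. Zero values are written as well, because
 * zero-initialisation at startup is not available either. */
void init_data(void)
{
    counter = 100u;
    for (uint32_t i = 0; i < 4u; i++) {
        squares[i] = i * i;
    }
    error_flag = 0u;
}

/* Accessors so the values can be inspected from outside this file. */
uint32_t get_counter(void)      { return counter; }
uint32_t get_square(uint32_t i) { return squares[i]; }
uint32_t get_error_flag(void)   { return error_flag; }
```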

The standard library is not fully supported. System calls (e.g. \_read and \_write) are not supported.

## 21 Future plans

Some future plans:

- Implement exceptions and interrupts in general. This will bring the possibility to add more I/O, such as timers.
- Implement the M standard extension: multiplier and divider. Multiplication can be achieved with the onboard multipliers, but division has to be handled with a multi-cycle FSM.
- Implement more General Purpose I/O (pins), with data direction registers. On the Cyclone V this is an issue, since the tri-state buffers must be in the top level of the design. This makes it hard to implement this processor as part of a larger design.
- Implement more functions of the standard library. Now only a few functions work.

## 22 Author's note

I managed to create this basic RISC-V processor within one week, including compiling the GNU C compiler and creating the C program examples. Of course, this is not the fastest core available, but it gives a good example of designing a RISC-V processor yourself. Next in line is to make the standard C library work. In the meantime, files will change, so be sure to grab the latest GitHub repository clone.