A small RISC-V RV32I core written in VHDL, intended as testbed for my personal VHDL learning
Switch branches/tags
Nothing to show
Clone or download


Tom Thumb RISC-V CPU core and demo system

WARNING: I stopped working on this project and basically developed something very similar in Verilog, which IMO has better open-source tools available. The result is SPU32, which can be found over at https://github.com/maikmerten/spu32 and applies some lessons learned from Tom Thumb.

This repository contains the VHDL sources for a simple CPU design which executes RV32I RISC-V instructions (http://riscv.org/) and some peripherals for testing. Also included are project files for the Terasic DE0-Nano board (containing an Altera Cyclone IV FPGA) and some programs (mostly assembler) to test the design.

My main motivation for this project is to learn VHDL - the code quality and feature set reflects this. No guarantees! Feel free to use the code in any way you see fit, though (see LICENSE file).

Primary design goals are simplicity of the design and lightness regarding consumption of FPGA resources. Currently, the complete designs fits within ~1400 LEs on a Cyclone IV FPGA, ~925 LEs being used by the CPU core. The design safely can be clocked at over 80 MHz on Cyclone IV devices, although the default configuration for the DE0-Nano board clocks it at merely 50 MHz.

To reflect its design sophistication and technical prowess, the design is named after Tom Thumb, an experimental locomotive design from 1830. https://en.wikipedia.org/wiki/Tom_Thumb_%28locomotive%29

CPU Core


The CPU executes the RV32I subset of the RISC-V instruction set. Instructions need several cycles to execute, as they progress through fetch, decode, execute, memory and writeback stages (depending on the instruction type). For compactness, the ALU is reused over several cycles to compute the instruction result, program counter value and memory address. Shift operations keep the ALU busy for several cycles, with the number of additional cycles being equal to the shift amount (shifts are done one bit position at a time).

The speed of instruction fetch and load/store instructions is highly dependent on the bus interface. Currently, the only CPU interface available implements a Wishbone bus (http://cdn.opencores.org/downloads/wbspec_b4.pdf) with a data width of 8 bit (!) and an address width of 32 bit (bus_wb8.vhd). Four bus cycles are needed to fetch 32 bits, which means that performance is completely dominated by the dozen clock cycles needed to complete instruction fetch. Performance can thus trivially be improved by, e.g., plumbing in a 32-bit bus interface.

Basically, Tom Thumb implements a 2014 instruction set with (at best) 1970ies implementation features ;-)

The core starts execution at address 0x00000000, where a jump instruction to a bootstrap routine should be present.

Interrupt support

During the instruction fetch phase, the CPU will check if the interrupt line is pulled high. If so (and if not already handling an interrupt) the processor will jump to 0x00000008 which should contain an interrupt service routine. A "return from interrupt" instruction restores normal program flow. For reasons of simplicity, interrupt handling does not follow any particular official RISC-V specification. Instead the custom0 opcode is used for interrupt handling. The "return from interrupt (rti)" instruction is simply defined as follows:

.macro rti
custom0 0,0,0,0

Note that the CPU will by default ignore interrupts. This is to prevent that the interrupt service routine will be executed directly after reset, before, e.g., the stack is set up properly. After the machine is set up properly, interrupt handling can be enabled via the "enable interrupt (eni)" instruction:

.macro eni
custom0 0,0,0,1

Interrupt handling can be disabled via the "disable interrupt (disi)" instruction:

.macro disi
custom0 0,0,0,2

If it is necessary to have support for more than one interrupt, a dedicated interrupt controller with several interrupt input lines should control the CPU's single interrupt line. This interrupt controller then should be queried and controlled by the interrupt service routine via memory-mapped I/O.

Trap support

Instructions with the SYSTEM opcode (as well as all unknown opcodes) will be trapped by the CPU. Normal program flow will be interrupted and the CPU will jump to 0x00000004, where a jump instruction to a trap handling routine should be present. For trap-handling, following custom instructions are defined:

The "return from trap (rtt)" instruction will resume execution of the program by jumping to the instruction following the instruction that caused the trap. A simple trap-handler can consist merely of the rtt-instruction ("ignore all traps").

.macro rtt
custom0 0,0,0,8

The "get trap return address (gtret)" instruction will provide the address where the rtt-instruction will resume execution. By loading from that address with an offset of -4 the instruction that caused the trap can be retrieved and analyzed.

.macro gtret rd
custom0 \rd,0,0,9

Note that the trap handling routine should never include instructions that cause another trap, as nested traps are not supported. Interrupts, however, may include instructions that cause trap-handling. To avoid nested traps, interrupts will not be served while handling a trap.

Bus arbiter


This is a basically a muxer for the output signals of the peripheral devices and generator for the device select signals. The topmost four bits of the bus address are used to determine the peripheral to be selected, which allows for up to 16 attached peripherals. Consequently, each device has 28 bits of address space available.

Synthesized RAM


Four kilobytes of RAM, synthesized from FPGA block RAM (should be device-independent) and containing a test program. Eventually this should contain a simple bootloader that loads programs into a "proper" RAM device (e.g., the SDRAM contained on DE0-Nano board) and then jumps into the respective memory region.

VGA output


This component generates a 640x480@60 Hz VGA signal with a 40x30 text mode. Eight colors (one bit per color channel) are supported.

Three RAM blocks are utilized by the design:

  • a text RAM, storing the character codes to be displayed
  • a color RAM, with a fore- and background-color coded in one byte (_RGB_RGB with _ denoting unused bits) per text character
  • a font RAM, containing a font for codepage 850, modifyable at runtime

These RAMs are synthesized from FPGA block RAM resources and thus should be device-independent.

Serial I/O


A simple two-wire (TX and RX) serial interface that can be interfaced to, e.g., by means of a 3.3 Volt USB-serial cable. The interface is by default configured for 115200 Baud connections and features status registers to denote if fresh data has arrived and whether the device is ready to transmit data.

LED output


The DE0-Nano board features eight LEDs, which can be controlled by the CPU by means of store instructions. The value currenty displayed can be read back.