

## Catamaran/ARIFIC: Architecture Research in FPGAs in the Cloud

Tutorial at HPCA-29, Montreal



Rishiyur S. Nikhil Bluespec, Inc. Sunday, February 26, 2023

## What is Catamaran/ARIFIC?

#### The Architecture Researcher's Dilemma

- The Architecture Researcher is typically focused on a specific artifact:
  - CPU pipeline organization, branch predictor, cache organization, store buffers, instruction fusion, ..., reorder buffer, accelerator...

Creative work; occupies you full-time

Lots of boring boilerplate; takes man-years to construct; often not portable, reusable or scalable (e.g., one-off, soon obsolete FPGA board)

- But for credible impact, needs whole-system environment:
  - Reasonable SoC with devices (networking, block storage, UART)
  - Running actual apps, often under a real operating system
  - On FPGA
    - For microarchitecture credibility
    - To run programs of credible length

Beyond Qemu, GEM5, ...

## What is Catamaran/ARIFIC?

#### Infrastructure so Architect can focus on their creative work



## What is Catamaran/ARIFIC?

What are "FPGAs in the cloud"?

Your laptop/

desktop



Catamaran/ARIFIC

(infrastructure in the cloud)

Host computer running Linux

**System Control:** 

Load ELF/memhex

Console (to/from UART)

Disk-device

support

Note: you need to set up an Amazon AWS account to access AWS services. This is your only cost for using Catamaran/ARIFIC.

**UART** 

(for console)

**Disk-device** 

support

(substitutable RISC-V core)

Disclaimer: this author has no formal relationship with Amazon AWS, and is merely an ordinary user of their services.

## Plan for this tutorial

### Walk through: how to substitute your own core, and run it

- Basics of Amazon AWS virtual machines
- Demo/description of what you can do with Catamaran/ARIFIC
  - Run ISA tests
  - Cross-compile and run bare-metal (no OS) C programs
  - Run Linux, with networking and block devices
  - Cross-compile and run C programs under Linux
- Demo/description of plugging in your own RISC-V Core<sup>1</sup>
  - Standard RTL-level interface for the core
  - Build a full-system simulation (using Verilator), and run it (can do on your local computer; does not need cloud)
  - Build a bitfile for AWS F1 FPGA, and run it
- Use GDB to control the RISC-V CPU on the FPGA

#### <sup>1</sup> "Your own RISC-V Core"

#### Options:

- Your own new CPU design
- Modify available CPU (many open-source)
  - microarchitecture change
  - new instruction
  - new CSRs (e.g., counters)
- Modify/replace memory system (caches, MMUs, PMPs, PTWs, ...)
- Add tightly-coupled or loosely coupled accelerator
- ... or other research idea ...

- AMIs (virtual machines in the Amazon cloud)
- Connecting to an AMI using "ssh"
- About AWS bitfiles; available Catamaran/ARIFIC bitfiles

### AMIs (virtual machines in Amazon cloud)



Amazon terminology for a virtual machine in their cloud: "AMI" = "Amazon Machine Instance"
We also just say "instance" for an AMI/virtual machine

With an Amazon AWS account, you can "create" one or more AMIs for yourself (see Appendix for info on how to create an AMI).

For Catamaran/ARIFIC we usually create two AMIs:

- For *building* FPGA bitfiles: choose the free "FPGA Developer AMI" from AWS Marketplace, which is CentOS + Xilinx tools (Vivado) and licenses + aws-fpga SDK and HDK, running on "m6i.2xlarge" instance type.
- For *running* Catamaran/ARIFIC on FPGA: choose the free basic standard Ubuntu 22.04 AMI from AWS Marketplace, running on an "f1.2xlarge" instance type. "f1" instances have FPGAs attached.

#### Costs:

- For using Catamaran/ARIFIC, your only costs are AWS costs (between you and Amazon)
- AWS charges vary by instance type. Charges for "f1" instances (with FPGA) are somewhat higher, hence our choice above of a cheaper instance for our "build" AMI, which does not need an attached FPGA.
- Hint: put your AMI into the "stopped" state when not in use, to save charges.

### AMIs (virtual machines in Amazon cloud)



Please see Appendix for how to install:

- aws-fpga SDK and HDK
  - Needed to go from RTL to bitfiles
  - Needed for installing Xilinx XDMA driver for host-FPGA PCIe communication
- CLI (command-line interface)
  - Needed for loading bitfiles into FPGA, etc.

- AMIs (virtual machines in the Amazon cloud)
- DEMO: Connecting to an AMI using "ssh"
- About AWS bitfiles; available Catamaran/ARIFIC bitfiles

#### Basics of Amazon AWS Connecting to your AMI with ssh (1/2)

In "EC2 Instances" dashboard ...



Note the IPv4 DNS address for the instance. Hint: click to copy to clipboard

# Basics of Amazon AWS Connecting to your AMI with ssh (2/2)

In a terminal on your laptop/desktop, connect to your AMI:

```
$ ssh -i ~/.ssh/MyPrivateKey.pem ubuntu@ec2-54-173-149-133.compute-1.amazonaws.com
```

Note: you can copy files to and from your AMI using "scp" (which is based on "ssh")

```
$ scp -i ~/.ssh/MyPrivateKey.pem \
localfile \
ubuntu@ec2-54-173-149-133.compute-1.amazonaws.com:~/remotefile
```

```
$ scp -i ~/.ssh/MyPrivateKey.pem \
ubuntu@ec2-54-173-149-133.compute-1.amazonaws.com:~/remotefile \
localfile
```

DEMO: connect to your "run" AMI

ssh

connection

Your laptop/

desktop



- On AWS web dashboard, "spin up" the AMI into "running" state
- Note its internet (DNS) address (assigned by AWS during spin-up)
- ssh to that DNS address
- Tour of tutorial directory contents

- AMIs (virtual machines in the Amazon cloud)
- DEMO: Connecting to an AMI using "ssh"
- About AWS bitfiles; available Catamaran/ARIFIC bitfiles

#### **About AWS bitfiles**

- Unlike "on-premises" FPGAs (on your lab bench/desktop), on AWS, FPGA bitfiles reside somewhere in the cloud (we have no direct access to the actual bitfile).
- Each bitfile has unique ids:
  - AFI (Amazon FPGA Image). These are unique within an AWS geographic region.
  - AGFI (...Global...). These are unique across all AWS geographic regions.
- See later section of this tutorial re. creating a bitfile and obtaining its AFI and AGFI
- AWS makes available software that, given an AGFI, will fetch the bitfile from somewhere in the cloud and load it into the FPGA in your instance.

(The author is unclear why AWS has two IDs for a bitfile, the AFI and AGFI, instead of just the AGFI?)

## Available pre-built bitfiles in Catamaran/ARIFIC (all based on open-source CPUs)

| CPU                                                      | Original HDL  | ISA                                                   | Runs bare-metal programs | Runs Linux               |
|----------------------------------------------------------|---------------|-------------------------------------------------------|--------------------------|--------------------------|
| Berkeley/SiFive Rocket                                   | Chisel        | RV64GC<br>MSU privilege levels<br>Sv39 virtual memory | Yes                      | Yes                      |
| Bluespec Flute                                           | BSV           | RV64GC<br>MSU privilege levels<br>Sv39 virtual memory | Yes                      | Yes                      |
| OpenHardware Group<br>CVA6<br>(a.k.a. ETH Zurich Ariane) | SystemVerilog | RV64GC<br>MSU privilege levels<br>Sv39 virtual memory | Yes                      | Yes<br>(expected 3/2023) |

In this tutorial we'll show demos on Amazon AWS using Catamaran with Rocket and CVA6

Other low-hanging fruit: Berkeley/SiFive BOOM and MIT RISCY-OOO (both designs speculative, superscalar, out-of-order)

DEMO: Loading an AWS bitfile into the FPGA



- DEMO: Cross-compile and run ISA tests
- DEMO: Cross-compile C programs for bare-metal and Linux
- DEMO: Run C programs on bare-metal
- DEMO: Run Linux, with networking and block devices
- DEMO: Run C programs under Linux

### Cross-compile and run ISA tests

#### What are "ISA tests"?

- RVI (RISC-V International, <a href="https://riscv.org">https://riscv.org</a>) provides a standard set of tests for the RISC-V Instruction Set Architecture (ISA)
  - https://github.com/riscv-software-src/riscv-tests
- Consists of a number of assembly-language program files; each one focuses on testing one particular RISC-V instruction (opcode)
  - 236 tests for RV64 IMAFDC + MSU privileges + Sv39 Virtual Memory
  - 173 tests for RV32 IMAFDC + MSU privileges + Sv32 Virtual Memory
- Each program contains a number of sub-tests (each self-checking)
- Each program ends by writing to a particular "tohost" MMIO address
  - 0, if all sub-tests succeed (pass)
  - N, if sub-test N failed

### **Cross-compile ISA tests**

Please see Appendix for info on installing the (free) RISC-V GNU Toolchain ("gcc" and friends).

This tutorial repository contains, in the dir ISA\_Tests/

- A copy of the RISC-V ISA tests source code in ISA\_Tests/isa/ (see GitHub URL on previous slide)
- A Makefile and linker scripts to cross-compile the RISC-V ISA tests for RV64GC+MSU+Sv39
- Pre-built results of cross-compiling: set of RISC-V ELF files for the tests in ISA\_Tests/isa/elfs

#### Note: our linker script sets

- Catamaran/ARIFIC program-start address = 0x\_8000\_0000
- "tohost" MMIO address = 0x\_6fff\_0010
- See ISA\_Tests/README.txt for details

# What you can do with Catamaran/ARIFIC DEMO: cross-compile RISC-V ISA tests



## What you can do with Catamaran/ARIFIC DEMO: run RISC-V ISA tests



dessert reset, allowing the CPU to run

await the 'tohost' result (PASS/FAIL/TIMEOUT)

- DEMO: Cross-compile and run ISA tests
- DEMO: Cross-compile C programs for bare-metal and Linux
- DEMO: Run C programs on bare-metal
- DEMO: Run Linux, with networking and block devices
- DEMO: Run C programs under Linux

Bare-metal (no OS) C programs and C programs under Linux

| Bare-metal C program                                                                                                                                             | C program under Linux                                                                                                                                         |  |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| No system calls. E.g., "printf/putc/putchar", "scanf/getc/getchar" are compiled to function calls to library code that directly interacts with a console device. | Has system calls for OS services. E.g., "printf/putc/putchar", "scanf/getc/getchar" are compiled to system calls, and the OS interacts with a console device. |  |
| No "exit"; typically ends in an infinite idle loop.                                                                                                              | Typically ends in an "exit" system call that reverts to the OS scheduling loop.                                                                               |  |

### Two small example C programs in the repository

#### This tutorial repository contains

- Two small C programs: "Hello World!" and "cat" (echoes stdin to stdout)
- A Makefile and linker scripts to cross-compile the C programs for RV64GC
- Pre-built results of cross-compiling the examples: two RISC-V ELF files

#### Note: our linker script sets

- Catamaran/ARIFIC program-start address = 0x\_8000\_0000
- UART MMIO address = 0x\_6010\_0000

DEMO: Cross-compile C programs to run on "bare-metal" (no OS)



#### Prerequisites:

- your RISC-V Gnu Toolchain (including cross-compiler gcc) is installed
- you've defined RISCV environment variable to point at your toolchain installation directory
- you've added \${RISCV}/bin to your PATH environment variable.

Classical "Hello World!" program

```
$ cd C_Examples/hello
$ make all_bare
```

Creates: hello.RV64.bare.elf hello.RV64.bare.map hello.RV64.bare.objdump

Program that just echoes stdin to stdout (UART-in to UART-out)

```
$ cd C_Examples/cat
$ make all_bare
```

# What you can do with Catamaran/ARIFIC DEMO: Cross-compile C programs to run under Linux



#### Prerequisites:

- your RISC-V Gnu Toolchain (including cross-compiler gcc) is installed
- you've defined RISCV environment variable to point at your toolchain installation directory
- you've added \${RISCV}/bin to your PATH environment variable.

Classical "Hello World!" program

\$ cd C\_Examples/hello \$ make all\_linux

Creates: hello.RV64.linux.elf hello.RV64.linux.map hello.RV64.linux.objdump

Program that just echoes stdin to stdout (UART-in to UART-out)

\$ cd C\_Examples/cat \$ make all\_linux

- DEMO: Cross-compile and run ISA tests
- DEMO: Cross-compile C programs for bare-metal and Linux
- DEMO: Run C programs on bare-metal
- DEMO: Run Linux, with networking and block devices
- DEMO: Run C programs under Linux

# What you can do with Catamaran/ARIFIC For most programs we need a UART (console device for "printf")

Catamaran/ARIFIC contains a UART and connections to a terminal on the host computer



# What you can do with Catamaran/ARIFIC DEMO: Open a second terminal on the AMI for the UART process



Make 2nd ssh connection to AMI; start UART console

```
$ ssh -i ~/.ssh/MyPrivateKey.pem \ ubuntu@ec2-54-173-149-133.compute-1.amazonaws.com
```

In the terminal, start the "UART\_Console" program

```
$ sudo UART_Console —log_UART.txt
```

```
--22:32:22--ip-172-31-83-224: ~/Git/Tutorial_at_HPCA-29/Host_side_exes

$ sudo ./UART_Console --log_UART.txt

ptyname_file filename = UART_ptyname_file

UART output logfile name = (null)

Awaiting file 'UART_ptyname_file' for reading pty name
```

# What you can do with Catamaran/ARIFIC DEMO: run bare-metal C programs

(Assume FPGA bitfile loaded from previous activity.) Virtual Machine with FPGA board (e.g., Amazon Web Services (AWS) "F1" instance) Run bare-metal C programs \$ cd Host\_side\_exes \$ ./exe\_Host\_AWSF1 Host-side command: assert\_reset two ssh connections Host-side command: loadelf C\_Examples/hello/hello.RV64.bare.elf Host-side command: deassert\_reset **UART** console ... observe "Hello World!" on UART console ... Your laptop/ Your laptop/ Repeat for desktop desktop next ELF

- DEMO: Cross-compile and run ISA tests
- DEMO: Cross-compile C programs for bare-metal and Linux
- DEMO: Run C programs on bare-metal
- DEMO: Run Linux, with networking and block devices
- DEMO: Run C programs under Linux

**DEMO: Run Linux** 



Load bitfile into FPGA (Catamaran+Rocket):

\$ sudo fpga-load-local-image -S 0 -l agfi-0e54cfdbc5e783a9b

Linux is just another ELF file, just like our bare-metal C programs

\$ cd Host\_side\_exes
\$ ./exe\_Host\_AWSF1

Host-side command: assert\_reset

Host-side command: loadelf <path to Linux ELF>

Host-side command: deassert\_reset

Observe console messages from OpenSBI boot loader, followed by console messages from Linux boot, finally the Linux shell prompt.

Then, ...

But note: without networking or block devices (disks), Linux is very limited

DEMO: Run Linux with a block device (1/2)



Create a file on the Ubuntu AMI host that will act as a block-device (disk) for Linux:

\$ truncate — size=1G disk.img

This represents a 1 GiB "unformatted" disk; we will format it with a filesystem from inside Linux.

Boot Linux again on the FPGA, providing it these facilities:

\$ sudo ./exe\_Host\_AWSF1. --elf ./Elfs/Linux.elf \
--blockdev ./disk.img \
--tundev tap0

... interact with the Linux shell and use block device (see next slide)

DEMO: Run Linux with block device (2/2)



... interact with the Linux shell and use block device

desktop

desktop

DEMO: Run Linux with a network and block device (1/2)



... interact with the Linux shell and use network and block device (see next slide)

DEMO: Run Linux with networking and block device (2/2)



After booting Linux, mount the block device:

# mount -t ext2 /dev/vda /mnt

Observe this network-setup shell script:

# cat /mnt/net.sh ip a add 192.168.3.2/24 dev eth0 ip link set eth0 up ip route add 0.0.0.0/0 via 192.168.3.1 echo "nameserver 9.9.9.9" > /etc/resolv.conf

Execute it:

#/mnt/net.sh

Use 'ssh' to log in to a remote machine, 'scp' to copy files, etc.:

# ssh <u>user@remote.machine.org</u>
# scp localfile user@remonte.machine.org:~/remotefile
# scp user@remonte.machine.org:~/remotefile localfile

... interact with the Linux shell and use network and block device

# What you can do with Catamaran/ARIFIC

- DEMO: Cross-compile and run ISA tests
- DEMO: Cross-compile C programs for bare-metal and Linux
- DEMO: Run C programs on bare-metal
- DEMO: Run Linux, with networking and block devices
- DEMO: Run C programs under Linux

# What you can do with Catamaran/ARIFIC DEMO: Run C programs under Linux



At the Linux shell running on the RISC-V CPU, use 'scp' to copy the ELF files we created earlier for 'hello' and 'cat' for running under Linux:

```
# Is /mnt/
... hello.RV64.linux.elf ...
... cat.RV64.linux.elf ...
```

#### Execute them:

```
# /mnt/hello.RV64.linux.elf
Hello, World!
# /mnt/cat.RV64.linux.elf
... echo chars typed on the keyboard to the screen ...
```

... interact with the Linux shell, run C programs under Linux

- The Catamaran/ARIFIC Core Interface (RTL)
- DEMO: Building and running a whole-system simulation executable
- DEMO: Building an FPGA bitfile for Amazon AWS

### Plug in your own RISC-V core: goals



Assume you've created your own core<sup>1</sup>

- Build a whole-system simulation executable of the FPGA-side, with your core substituted in place of the demo cores
- 2. Build the FPGA bitfile of the FPGA-side, with your core substituted in place of the demo cores, and run on AWS

#### <sup>1</sup> "Your own RISC-V Core"

#### Options:

- Your own new CPU design
- Modify available CPU (many open-source)
  - microarchitecture change
  - new instruction
  - new CSRs (e.g., counters)
- Modify/replace memory system (caches, MMUs, PMPs, PTWs, ...)
- Add tightly-coupled or loosely coupled accelerator
- ... or other research idea ...

- The Catamaran/ARIFIC Core Interface (RTL)
- DEMO: Building and running a whole-system simulation executable
- DEMO: Building an FPGA bitfile for Amazon AWS

The Catamaran/ARIFIC Core Interface (RTL)



Please see this file in the tutorial repository: Catamaran\_Core\_Interface/mkAWSteria\_Core\_EMPTY.v which contains a template "empty" core module with the required RTL inputs and outputs. The Architecture Researcher substitutes their core for the module body, connects inputs/outputs.

#### The Catamaran/ARIFIC Core Interface (RTL)

DEMO: View actual interface RTL in tutorial repository: Catamaran\_Core\_Interface/mkAWSteria\_Core\_EMPTY.v

Architecture Researcher's Core CPU can be RV32 or RV64, CLK from small ("embedded, IoT") RISC-V **RSTN** CPU to large ("application", "server"). Minimal core Memory subsystem (caches, MMUs, ...) AXI4 M mmio\_N AXI4 "Manager" interface 64-bit address, 64-bit data, connects to DDR memory, **UART**, other devices

CLK can connect to one of six available clocks provided by Catamaran/ARIFIC, ranging from about 15 MHz to 250 MHz, depending on the speed/timing requirements of the core.

#### The Catamaran/ARIFIC Core Interface (RTL)

DEMO: View actual interface RTL in tutorial repository: Catamaran\_Core\_Interface/mkAWSteria\_Core\_EMPTY.v



In some cores, dma\_S connects to a cache-coherent port in the memory subsystem

<=> DDR

# Plug in your own RISC-V core The Catamaran/ARIFIC Core Interface (RTL)

DEMO: View actual interface RTL in tutorial repository: Catamaran\_Core\_Interface/mkAWSteria\_Core\_EMPTY.v



# Plug in your own RISC-V core The Catamaran/ARIFIC Core Interface (RTL)

Typical modules found in more complete and complex cores



- Catamaran provides 6 clocks at various steps from 250MHz down to about 15 MHz
- All these components are available open-source from many projects; architecture researcher can focus on the subject module(s) of interest.

- The Catamaran/ARIFIC Core Interface (RTL)
- DEMO: Building and running a whole-system simulation executable
- DEMO: Building an FPGA bitfile for Amazon AWS

#### Build a whole-system simulation executable



Researcher's core can be written in any HDL, as long as you can generate RTL for it.

- E.g., Bluespec Flute is written in BSV
- E.g., Rocket is written in Chisel
- E.g., CVA6/Ariane is written in SystemVerilog

#### DEMO: Build a whole-system simulation executable

This tutorial repository contains

- Fixed RTL for Catamaran/ARIFIC
- Makefile for using Verilator to create a whole-system simulation executable
  - Note: Verilator is a free, open-source tool for creating RTL simulators
  - For installation, please see: <a href="https://verilator.org/guide/latest/install.html">https://verilator.org/guide/latest/install.html</a>

Build\_HW/
Catamaran\_C/
Catamaran\_RTL/
Core\_CVA6\_Wrapper\_RTL/
bsc\_lib\_RTL/

Fixed RTL/C for Catamaran/ARIFIC

RTL from OpenHardware Group CVA6 (Ariane) RISC-V Core repository

\$ cd Build\_HW; make exe\_HW\_AWSF1\_sim

Note: does not need Amazon AWS; can be built on your laptop/desktop.

exe\_HW\_AWSF1\_sim

DEMO: Run a whole-system simulation executable

exe\_Host\_ISA\_Tester\_AWSF1\_sim





exe\_HW\_AWSF1\_sim

```
Catamaran_simulation_HW
-13:05:30--Airedale: ~/Git/Tutorial_at_HPCA-29/Build_HW
 ./exe HW AWSF1 sim
 uespec Catamaran simulation v2.2
 yright (c) 2020-2023 Bluespec, Inc. All Rights Reserved.
NFO: Listening for connection from host-side on TCP port 30000
NFO: Accepted connection from host-side on TCP port 30000
ost_Control_Status: Assert Core Reset
ost_Control_Status: Deassert Core Reset
ost_Control_Status: Assert Core Reset
 st Control Status: Deassert Core Reset
   Control Status: Assert Core Reset
   Control_Status: Deassert Core Reset
ost Control Status: Assert Core Reset
ost_Control_Status: Deassert Core Reset
ost_Control_Status: Assert Core Reset
Connection closed by remote host (in c_host_recv2())
-13:10:44--Airedale: ~/Git/Tutorial_at_HPCA-29/Build_HW
```

- Simulation is, of course, much slower than running on FPGA Simulation (Linux boot can take a day or more)
- But simulation can print RTL \$displays, dump VCD waveforms, etc.
- So: simulation is good for running/debugging with small RISC-V programs

- The Catamaran/ARIFIC Core Interface (RTL)
- DEMO: Building and running a whole-system simulation executable
- DEMO: Building an FPGA bitfile for Amazon AWS

Building an FPGA bitfile for Amazon AWS



## Plug in your own RISC-V core DEMO: Building an FPGA bitfile for Amazon AWS

General Caveat

- The flow described here and the files currently in the tutorial repository are somewhat specialized for building CVA6/Ariane RV64GC\_MSU\_Sv39 as the RISC-V Core in Catamaran.
- For a different core (and therefore different core RTL)
  - Various Vivado Tcl scripts have to be modified accordingly
  - Other adjustments may be needed to select a different clock speed
  - Other adjustments may be needed to incorporate a RISC-V Debug Module, which must sit
    outside the "reset domain" of the core
- We expect to simplify some of these adjustments in the weeks/months after the tutorial

# Plug in your own RISC-V core DEMO: Building an FPGA bitfile for Amazon AWS

"DCP build" step (this example is for the CVA6/Ariane core)

Prepare a directory that is ready for the AWS HDK flow:

\$ cd Build\_HW \$ make for\_aws

- Creates a directory "AWS\_CL\_Catamaran/" which is initialized from "AWS\_CL\_Catamaran\_wo\_Core.tar.gz", i.e., fixed RTL for Catamaran
- Then, copies the core's RTL from your clone of <a href="https://github.com/openhwgroup/cva6.git">https://github.com/openhwgroup/cva6.git</a> (in the Makefile, "CVA6\_REPO\_DIR=..." should be edited to point at this clone)
- Finally, tars up this directory into "AWS\_CL\_Catamaran\_for\_AWS\_HDK.tar.gz"
- Copy this tar file to your "build AMI", where you have Vivado tools and the AWS aws-fpga HDK+SDK installed, and untar it.
- Launch the DCP build:
  - \$ cd AWS\_CL\_Catamaran
  - \$./aws\_build\_dcp\_from\_cl.sh -ignore\_memory\_requirement -clock\_recipe\_a A1
- This will launch a background process to perform Vivado synthesis resulting in a DCP (Design Checkpoint)
- The "clock\_recipe" argument here sets the core's clock frequency to 125MHz
- The Vivado log is saved in a file with a timestamped filename, like this: "23\_02\_21-012910.vivado.log". You should monitor this file to check that Vivado has not exited due to some error.
- This step takes ~ 2.5 hours for CVA6 RV64 GC SMU Sv39.
- When Vivado has finished, you should examine some files to check that it succeeded:
  - "build/scripts/<timestamp>.vivado.log" for any errors encountered by Vivado
  - "build/reports/<timestamp>.timing\_summary\_route\_design.rpt" to check that synthesis met the timing target

See AWS HDK documentation for more info on clock recipes, email-notification of synthesis completion, etc.

# Plug in your own RISC-V core DEMO: Building an FPGA bitfile for Amazon AWS

#### "AFI create" step: Launching the AFI-build

- A successful "DCP build" step produces a design checkpoint file: "build/checkpoints/to\_aws/23\_02\_21-012910.Developer\_CL.tar"
- Create an S3 bucket (if you have not already done so before):

```
$ aws s3 mb s3://rsnbucket1/Catamaran/
```

- "S3" is Amazon AWS' name for their "storage in the cloud" service
- "bucket" is Amazon AWS' name for "top-level directory"
- Upload your design checkpoint file to your S3 bucket:

```
$ aws s3 cp <path>/<timestamp>.Developer_CL.tar s3://rsnbucket1/Catamaran/
```

Launch the AFI build

```
$ aws ec2 create-fpga-image --region us-east-1 \
—name "Catamaran_CVA6_RV64GC_MSU_Sv39_Boot_ROM_125MHz" \
—description "Catamaran CVA6 RV64GC MSU Sv39 Boot ROM_125MHz" \
—input-storage-location Bucket=rsnbucket1,Key=Catamaran/<timestamp>.Developer_CL.tar \
—logs-storage-location Bucket=rsnbucket1,Key=Catamaran
```

• This will immediately print out the AFI and AGFI allocated for this new bitfile:

```
"FpgalmageId": "afi-0a3fa15056f3de4d7",
   "FpgalmageGlobalId": "agfi-0ab786417c04b7031"
}
```

IMPORTANT: please note these down immediately! You have no other way to refer to your bitfile other than these IDs!

See AWS HDK documentation for more info

#### Plug in your own RISC-V core DEMO: Building an FPGA bitfile for Amazon AWS

"AFI create" step: Checking on your AFI build

The following command prints info about an AFI:

\$ aws ec2 describe-fpga-images --fpga-image-ids "afi-0a3fa15056f3de4d7"

This will print out something like this:

AFI build time ~ 55 minutes. Seems independent of complexity of user's design (just stitches together user's DCP with AWS boilerplate)

The AFI is now ready for use (load into FPGA, ...)

```
"Fpgalmages": [
                                                                                         Finish time of AFI build
    "UpdateTime": "2023-02-21T16:28:22.000Z", <
    "Name": "Catamaran_CVA6_RV64GC_MSU_Sv39_Boot_ROM_125MHz",
    "Tags": [],
    "Pcild": {
      "SubsystemVendorId": "0xfedc",
      "Vendorld": "0x1d0f",
      "DeviceId": "0xf001",
                                                                                       "pending" at start of AFI build;
      "SubsystemId": "0x1d51"
                                                                                       "available" at finish of AFI build
    "DataRetentionSupport": true,
    "FpgalmageGlobalId": "agfi-0ab786417c04b7031",
    "Public": false,
    "State": {
      "Code": "available"
    "ShellVersion": "0x04261818"
    "Ownerld": "071524437452",
                                                                                         Start time of AFI build
    "Fpgalmageld": "afi-0a3fa15056f3de4d7"
    "CreateTime": "2023-02-21T15:32:09.000Z",
    "Description": "Catamaran CVA6 RV64GC MSU Sv39 Boot ROM_125MHz"
                                                                                  See AWS HDK documentation for more info
                              56
```

# Use GDB to control the RISC-V CPU on the FPGA

#### GDB control of RISC-V CPU on FPGA

Traditional vs. hardware remote GDB

Traditional "remote debugging" is done entirely with software:



Process being debugged

gdbserver control/ access

This assumes that the hardware (remote machine) is reliable (e.g., to run gdbserver); this may not be true for the architecture researcher.



On Catamaran:

# GDB control of RISC-V CPU on FPGA Connecting GDB

• When you start the Host-side Hub, provide a TCP/IP port number for GDB to connect:

\$ exe\_Host\_AWSF1 —debugport 5555

• In another terminal window, start GDB with the provided script, which will start GDB and OpenOCD and make the connections, and pause at the usual GDB prompt:

```
$ riscv64-unknown-elf-gdb —silent —command init_64.gdb ... (gdb)
```

Please peruse the supplied gdb and OpenOCD scripts for more details of how the connections are established, etc.

 At the (gdb) prompt, use GDB commands as usual: load ELF, start, stop, continue, step, step, break, read/write registers and CSRs and memory, ...

Hint: Loading an ELF from GDB can be slow; you can still use the "loadelf" command at the Host-side Hub prompt to load an ELF, much faster.



## Summary and Conclusion

#### Summary of tutorial



- Catamaran/ARIFIC provides the large and complex infrastructure that frees the Architecture Researcher to focus on their creative work: research into the CPU microarchitectures, memory systems, accelerators, etc.
- In this tutorial we showed the capabilities thereby immediately enabled for the Researcher:
  - Run on FPGA on Amazon AWS virtual machines
  - Run ISA tests
  - Run bare-metal C programs with console I/O
  - Run Linux and C programs-under-Linux with console I/O, networking and "disk" devices
  - GDB-debug bare-metal programs
  - Dump memory data (e.g., performance data collected during program execution)

### End of Tutorial Slides

#### Appendix with reference material follows

Thank you all for attending!

GitHub repo for this tutorial: https://github.com/rsnikhil/Tutorial\_at\_HPCA-29

## Appendix: Reference Material

- Installing the RISC-V Gnu Toolchain (gcc, as, Id, gdb, ...)
- Opening an account on Amazon AWS
- Creating an AMI (or two) for Catamaran/ARIFIC
- Installing aws-fpga HDK and SDK
- Installing XDMA driver for Host-FPGA PCIe communication

#### Installing the RISC-V Gnu Toolchain

- By "RISC-V Gnu Toolchain" we mean the familiar gcc, as, ld, gdb, objdump, ... etc.
- These are needed for compiling C, C++ and RISC-V Assembly Language programs into ELF binaries to be loaded and run on a RISC-V CPU.
- We install "cross-compilers" that run under Ubuntu, but generate RISC-V machine code.

Full details may be found at this URL; please follow the installation directions there.

https://github.com/riscv/riscv-gnu-toolchain

Hint: in the "configure" step, specify "medany" and "multilib":

```
$./configure --prefix=<path_to_installation_dir> --with-cmodel=medany --enable-multilib
```

- <path...> should be a full absolute path (do not use ~ for your home directory).
- "newlib" is for the compiler for bare-metal C RV32 and RV64 programs (no system calls)
- "multilib" produces a compiler that can produce both RV32 and RV64 code, even though it is named *riscv64-unknown-elf-gcc*

Before using the cross-compiler, export the following environment variables:

```
$ export RISCV=<path_to_installation_dir>
$ export PATH=${RISCV}/bin:${PATH}
```

## Appendix: Reference Material

- Installing the RISC-V Gnu Toolchain (gcc, as, Id, gdb, ...)
- Opening an account on Amazon AWS
- Creating an AMI (or two) for Catamaran/ARIFIC
- Installing aws-fpga HDK and SDK
- Installing XDMA driver for Host-FPGA PCIe communication

#### Create your Amazon AWS account

• At: <a href="https://aws.amazon.com">https://aws.amazon.com</a>



Amazon runs cloud farms in many parts of the world; you may be asked to select a region "near" you. E.g., I use: 'us-east-1 (N.Virginia)'

Caution: select a region that has support instances with FPGAs



## Appendix: Reference Material

- Installing the RISC-V Gnu Toolchain (gcc, as, ld, gdb, ...)
- Opening an account on Amazon AWS
- Creating an AMI (or two) for Catamaran/ARIFIC
- Installing aws-fpga HDK and SDK
- Installing XDMA driver for Host-FPGA PCIe communication

#### Navigate to your EC2 Instances dashboard



Then, move on to the "Instances" dashboard ("instances" = AWS terminology for your virtual machines)

FPGA (cheaper), such as the first one shown above.

#### The EC2 "instances" dashboard



#### Create one or two AMIs?

The "Instance type" determines, for your AMI:

- Host CPU type (x86, ARM, ...)
- Sizing (# of virtual cores, memory, ...)
- "f1.\*" types are the *only* ones with FPGAs attached.
- Note: AWS charges vary with instance type
- Warning: "f1" instances typically have higher AWS charges.

Hint (for lower AMS charges): Create two AMIs:

- A "build" AMI of type "m6i.2xlarge" for building bitfiles for Catamaran/ARIFIC
  - Use the free AWS "FPGA Developer AMI" for this, which comes preloaded with Vivado, Vivado licenses, etc. (free). The supplied OS is CentOS Linux.
- A "run" AMI of type "f1.2xlarge" for running Catamaran/ARIFIC with FPGA
  - You can choose an AMI with OS of your choice (we typically choose latest Ubuntu)

Note: when you build a bitfile on the "build" AMI, it'll reside in the cloud named by an AGFI (unique id), which you can access from the "run" AMI

### "Create" your virtual machine ("instance")

When you select "Launch Instances" to create a new instance, it will take you through a series of steps ("Launch Instances Wizard"). Here are some notes on the various steps:

- "Name and tags": choose a name for your instance (e.g., on our slides: "RSN\_AFI\_Build2", "RSN\_AFI\_run", "RSN\_AWSteria")
- Application and OS Images (Amazon Machine Image)
  - Choose one of the AMIs from the 1000s available in the search box
  - For Catamaran/ARIFIC bitfile building, we recommend the latest AWS "FPGA Developers AMI"
  - For Catamaran/ARIFIC running on FPGA, we recommend an AMI with the latest Ubuntu (22.04 LTS at time of writing), x86 (not ARM!), 64-bit
- "Select and Instance Type"
  - The menu shows 100s of possibilities, x86 and ARM, etc.
  - For Catamaran/ARIFIC bitfile building, we recommend "m6i.2xlarge" (e.g., see "Instance type" on previous slide)
  - For Catamaran/ARIFIC running on FPGA, this *must* be an "f1" instance; we recommend "f1.2xlarge" (e.g., see "Instance type" on our slides)
- "Create your Key Pair"
  - When one connects to an AMI (with "ssh"), for security reasons Amazon does not support password-based login authentication, only crypto-key based authentication.
  - This menu Item creates a key pair for you, a public key and private key.
  - The prompts will ask you to copy the private key, and save it on your laptop/desktop in a file ~/.ssh/MyPrivateKey.pem and set its protection to 0x400
  - Note: once you've created a key pair, you can reuse it for new instances; it will show you your existing key pairs and you can select one, instead of creating a new one.
- "Network Settings": Leave unchanged. In particular, "Auto-assign public IP" = "Enable" and "Allow SSH traffic from Anywhere"
- "Configure Storage": Leave unchanged, e.g., 8 GiB (these can be updated later, if needed)
- "Advanced Details": Leave unchanged
- Finally, select "Launch Instance" on the "Summary" panel which floats on the right during the wizard.
  - (but see caveat on next slide)

### "Create" your virtual machine ("instance")

Caveat: After selecting "Launch Instance" at the end of the AMI-creation wizard, you may encounter an error message like this:

You have requested more vCPU capacity than your current vCPU limit of 0 allows for the instance bucket that the specified instance type belongs to. Please visit

http://aws.amazon.com/contact-us/ec2-request to request an adjustment to this limit.

If you get this message, please visit the contact-us link shown, click on "Support" and lodge a request to increase your "vCPU quota" to 8, which is required for the f1.2xlarge instance type. Try to include a few sentences explaining your need for an f1.2xlarge instance type.

You will get an almost instance response saying:

This specific limit increase request requires further internal review before approval ...

These requests are processed by humans, so it may take a few hours before you receive an email indicating that your request has been approved, after which you can proceed.

## Appendix: Reference Material

- Installing the RISC-V Gnu Toolchain (gcc, as, ld, gdb, ...)
- Opening an account on Amazon AWS
- Creating an AMI (or two) for Catamaran/ARIFIC
- Installing aws-fpga HDK and SDK
- Installing XDMA driver for Host-FPGA PCIe communication

#### Installing aws-fpga HDK and SDK

"aws-fpga" is a free HDK (Hardware Development Kit) + SDK (Software Development Kit)

- The HDK is needed for creating AWS bitfiles for the FPGA
- The SDK is needed for creating host-side software that communicates with the FPGA
- It also contains the Xilinx XDMA Linux driver for the host-side to communicate with the FPGA over the PCIe connection

Git-clone it into your AMI from here:

\$ git clone <a href="https://github.com/aws/aws-fpga.git">https://github.com/aws/aws-fpga.git</a>

Each time you connect to your AMI, please do:

```
$ cd aws-fpga
```

\$ source hdk\_setup.sh

\$ source sdk\_setup.sh

## Appendix: Reference Material

- Installing the RISC-V Gnu Toolchain (gcc, as, Id, gdb, ...)
- Opening an account on Amazon AWS
- Creating an AMI (or two) for Catamaran/ARIFIC
- Installing aws-fpga HDK and SDK
- Installing XDMA driver for Host-FPGA PCIe communication

#### Installing XDMA driver for Host-FPGA PCIe communication

The XDMA driver is a Linux kernel driver for the host to communicate with the FPGA over the high-speed PCIe bus. The driver is included in the "aws-fpga" HDK + SDK.

Git-clone the aws-fpga HDK+SDK into your AMI from here:

```
$ git clone <a href="https://github.com/aws/aws-fpga.git">https://github.com/aws/aws-fpga.git</a>
```

Compile the driver (creates file "xdmi.ko"):

```
$ cd aws-fpga/sdk/linux_kernel_drivers/xdma
$ make
```

\$ Is xdma.ko

Install the driver into the AMI's running Linux kernel:

\$ sudo make install

Check that XDMA is installed and running:

```
$ Ismod | grep xdma
xdma 94208 0
$ Is /dev/xdma*
... will show many devices with the xdma prefix ...
```

### END

GitHub repo for this tutorial: <a href="https://github.com/rsnikhil/Tutorial">https://github.com/rsnikhil/Tutorial</a> at HPCA-29