Roadmap to support SPEC in the FPGA

UlisesLuzius edited this page May 23, 2022 · 36 revisions

This guide describes our methods for running SPEC in the FPGA. Because SPEC consists mostly of user-mode code, system calls are executed in QEMU.

Even though QEMU executes the system call, the call can affect state that is maintained in the FPGA, for example through changes to page permissions, page flushes, page synchronisation, and transplants.


Research

Benchmarking System

We want to measure the latency and throughput of key operations to estimate performance:

  • How much does a transplant cost?
  • What is the average cost to access DRAM?
  • What is the throughput in the best-case scenario (e.g. matrix multiply)?
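
The DRAM-latency question above could be approached with a pointer-chasing microbenchmark. The following is a minimal sketch, not part of the existing test suite: the function names `now_ns` and `chase_ns_per_access` are hypothetical, and it assumes a POSIX guest where `clock_gettime` is available. How the guest clock relates to FPGA time is not covered on this page, so the numbers are only meaningful under that assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Monotonic time in nanoseconds. */
uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Average time per dependent load while chasing pointers through a buffer
 * of n slots. Every load depends on the previous one, so the loop cannot
 * overlap memory accesses. Returns a negative value on invalid input or
 * allocation failure. */
double chase_ns_per_access(size_t n, size_t steps) {
    if (n == 0 || steps == 0) return -1.0;
    size_t *next = malloc(n * sizeof *next);
    if (!next) return -1.0;

    /* Sattolo's algorithm: build one random cycle covering all slots. */
    for (size_t i = 0; i < n; i++) next[i] = i;
    uint64_t seed = 12345; /* fixed seed keeps runs repeatable */
    for (size_t i = n - 1; i > 0; i--) {
        seed = seed * 6364136223846793005ull + 1442695040888963407ull;
        size_t j = (size_t)((seed >> 33) % i); /* j in [0, i) */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }

    volatile size_t idx = 0; /* volatile keeps the loop from being removed */
    uint64_t t0 = now_ns();
    for (size_t s = 0; s < steps; s++) idx = next[idx];
    uint64_t t1 = now_ns();

    free(next);
    return (double)(t1 - t0) / (double)steps;
}
```

Calling `chase_ns_per_access` with a buffer much larger than the caches approximates the DRAM access cost, while a small buffer gives a cache-hit baseline for comparison.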

System development

Here's a summary of what's working (in the FPGA):

  • Classic page faults: the FPGA requests pages on demand for reads and writes (any test)
  • Cache evictions and write-backs due to page synchronisation (any test)
  • TLB evictions due to page synchronisation (any test)
  • Page synchronisation: QEMU requesting a page held by the FPGA in order to modify it (multiple tests)
  • Page permission changes on segfault -> recall the page from the FPGA and update permissions (test_memsvc_segfault.c)
  • Page evictions when forking -> clearing TLBs, synchronising modified pages (fork.c)
  • Forking a simple program (fork.c)
  • Loading a binary with DevteroFlex attached (loader.c)
  • Complex system call with I/O (puts.c)
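
To illustrate the permission-change-on-segfault scenario from the list above, here is a minimal sketch of such a test. It is not the contents of test_memsvc_segfault.c: the function `segfault_permission_test` and the globals are hypothetical, and it assumes Linux behaviour, where a store to a read-only page raises SIGSEGV with `si_addr` set to the faulting address.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *g_page;
static size_t g_page_size;
static volatile sig_atomic_t g_faults;

/* SIGSEGV handler: upgrade the page to read/write so the faulting store
 * succeeds when it is retried. mprotect() is not on the POSIX list of
 * async-signal-safe functions, but this pattern is common on Linux. */
static void on_segv(int sig, siginfo_t *info, void *ctx) {
    (void)sig;
    (void)ctx;
    char *addr = (char *)info->si_addr;
    if (addr < g_page || addr >= g_page + g_page_size)
        _exit(2); /* fault outside the test page: bail out */
    g_faults++;
    if (mprotect(g_page, g_page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(3);
}

/* Returns 0 if exactly one fault occurred and the store became visible. */
int segfault_permission_test(void) {
    g_page_size = (size_t)sysconf(_SC_PAGESIZE);
    g_faults = 0;
    g_page = mmap(NULL, g_page_size, PROT_READ,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (g_page == MAP_FAILED) return -1;

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap(g_page, g_page_size);
        return -1;
    }

    volatile char *p = g_page;
    p[0] = 42; /* faults once; the handler fixes permissions; store retries */
    int ok = (g_faults == 1) && (p[0] == 42);

    sigaction(SIGSEGV, &old, NULL); /* restore the previous handler */
    munmap(g_page, g_page_size);
    return ok ? 0 : 1;
}
```

A test of this shape exercises `mmap`, `rt_sigaction`, `mprotect`, and `munmap` in one run, so a failure still needs the debug-mode comparison to find which call diverged.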

What's next:

  • Run multiple CPUs with QEMU

System calls

Methodology:

  1. Create a small unit test that exercises the system call
  2. If the unit test fails, run it in debug mode to find the point of divergence
  3. Fix the divergences one by one until the full system call passes
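
As an example of step 1, a unit test for a call that has no test yet could look like the sketch below. It targets `pread64`, under the assumption that on 64-bit Linux with glibc the `pread()` wrapper is serviced by that system call. The function name `pread_unit_test` and the temporary-file path are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Writes a known string to a temporary file, reads part of it back with
 * pread(), and checks both the data and that the file offset is unchanged.
 * Returns 0 on success, 1 on a mismatch, -1 if the file cannot be created. */
int pread_unit_test(void) {
    char path[] = "/tmp/pread_test_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;

    const char msg[] = "hello from pread";
    int rc = 1;
    if (write(fd, msg, sizeof msg) == (ssize_t)sizeof msg) {
        char buf[8] = {0};
        ssize_t n = pread(fd, buf, 4, 6);   /* 4 bytes at offset 6: "from" */
        off_t pos = lseek(fd, 0, SEEK_CUR); /* pread must not move the offset */
        if (n == 4 && memcmp(buf, "from", 4) == 0 && pos == (off_t)sizeof msg)
            rc = 0;
    }

    close(fd);
    unlink(path);
    return rc;
}
```

Keeping the test to a single return code makes it easy to compare a run with DevteroFlex attached against a plain QEMU run when looking for the point of divergence.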

List of all syscalls

SYSCALLS

  • FILE: openat ✔️, access, read ✔️, write ✔️ (puts.c), close ✔️, pread64, newfstatat
  • MEM: mmap ✔️, mremap ✔️, mprotect ✔️, munmap ✔️, brk ✔️
  • THREAD: set_tid_address, arch_prctl, set_robust_list (LOCK), exit_group ✔️
  • SCHEDULE: rseq
  • RESOURCES: prlimit64
  • FUNCTION: getrandom

More detailed list

| System call | Status | Which test? | Why is it not working? / Expected fix |
|---|---|---|---|
| Threads | | | |
| clone | ✔️ | fork.c, loader.c | |
| fork | ✔️ | fork.c, loader.c | Make sure to add copy-on-write to strengthen it |
| exec | ✔️ | loader.c | |
| exit_group | ✔️ | loader.c | |
| wait | ✔️ | loader.c | |
| set_tid_address, arch_prctl, set_robust_list | ? | | |
| File | | | |
| openat | ✔️ | test_fopen.c | |
| read | ✔️ | test_fread.c, test_fread_buffer.c | |
| write | ✔️ | puts.c | |
| close | ✔️ | test_fclose.c | |
| newfstatat | ? | gcc | |
| faccess. | ? | gcc | |
| getcwd | ? | gcc | |
| readlinkat | ? | gcc | |
| unlinkat | ? | gcc | |
| pread64 | ? | No test yet | |
| Memory | | | |
| mmap | ✔️ | test_mmap_strong.c, test_memsvc_segfault.c | |
| mremap | ✔️ | test_mmap.c, test_memsvc_segfault.c | |
| mprotect | ✔️ | test_mmap.c, test_memsvc_segfault.c | |
| munmap | ✔️ | test_mmap_strong.c, test_memsvc_segfault.c | |
| brk | ✔️ | puts.c, fopen.c | |
| Extra | | | |
| clock_nanosleep() | ✔️ | sleep.c | #35 |
| getpid() | ✔️ | getpid.c | |
| rt_sigaction() | | test_memsvc_segfault.c, gcc | Registers segfault routine |
| prlimit64 | | gcc | |
| pipe | | gcc | |
| access, rseq, prlimit64, getrandom | ? | No test yet | |


Known errors due to QEMU:

  • Running multiple CPUs: managing the CPU sleep flag (EXCP_HALTED) on transplants #36

Architectural errors (RTL)

MMU

  • A page table set fetched from DRAM can lead to set duplication #52

Pipeline

  • Re-add support for the MADD/MSUB instructions (data processing with 3 sources, 3DP) #58

Regression suite

  • Select-sort with 16 threads #67; probably the same error as #52

Known errors only in AWS FPGA

  • The first 512 bits of the page table in DRAM get trashed #66