FPU support #215

Closed
Dolu1990 opened this issue Mar 17, 2021 · 66 comments

@Dolu1990
Contributor

Dolu1990 commented Mar 17, 2021

Hi,

Got the FPU added in VexRiscv and linux-on-litex-vexriscv (not merged yet) via

Documentation about the FPU is available here:
https://github.com/SpinalHDL/VexRiscv/tree/fiber#fpu

In short:

  • supports both F32 and F64, subnormals, and all 5 rounding modes
  • can be shared between multiple CPUs to save area
  • can schedule most operations every cycle (as long as there are no interdependencies)
  • so far, it shouldn't impact FMax much (at least on Artix7)
  • tested with 2 CPUs; if more are used, the connections between the FPU and the cores might need a bit more pipelining
  • currently costs 3662 LUTs on Artix7

TODO:

Getting the buildroot image to build (see #214)

The only requirement to get the FPU working in buildroot is to add the following to the buildroot defconfig:
BR2_RISCV_ISA_CUSTOM_RVF=y
BR2_RISCV_ISA_CUSTOM_RVD=y

and to set CONFIG_FPU=y in linux.config.

The DTS generation is already fixed; OpenSBI does not seem to require a rebuild.
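A minimal sketch for sanity-checking both files before a long rebuild. The helper names (`parse_config`, `missing_options`) are hypothetical; the option names are the ones listed above. It treats `# KEY is not set` as disabled and drops an inline `#` comment after an unquoted value.

```python
# Hypothetical pre-build check: are the FPU-related options enabled?
REQUIRED_BUILDROOT = {
    "BR2_RISCV_ISA_CUSTOM_RVF": "y",
    "BR2_RISCV_ISA_CUSTOM_RVD": "y",
}
REQUIRED_KERNEL = {"CONFIG_FPU": "y"}


def parse_config(text):
    """Parse Kconfig-style KEY=VALUE lines into a dict.

    '# KEY is not set' lines are recorded as 'n'; an inline '#' comment
    after an unquoted value is dropped.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and line.endswith(" is not set"):
            values[line[2:-len(" is not set")]] = "n"
        elif line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            value = value.strip()
            if not value.startswith('"'):
                value = value.split("#", 1)[0].strip()
            values[key.strip()] = value
    return values


def missing_options(text, required):
    """Return the required options that are absent or set to another value."""
    values = parse_config(text)
    return sorted(key for key, want in required.items() if values.get(key) != want)
```

For example, given a kernel config that still carries `CONFIG_FPU=n`, `missing_options(text, REQUIRED_KERNEL)` returns `['CONFIG_FPU']`.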

@mithro
Contributor

mithro commented Mar 17, 2021

This is super cool!

@Dolu1990
Contributor Author

Dolu1990 commented Mar 17, 2021

Got it to work !

Welcome to Buildroot
buildroot login: root
                   __   _
                  / /  (_)__  __ ____ __
                 / /__/ / _ \/ // /\ \ /
                /____/_/_//_/\_,_//_\_\
                      / _ \/ _ \
   __   _ __      _  _\___/_//_/         ___  _
  / /  (_) /____ | |/_/__| | / /____ __ / _ \(_)__ _____  __
 / /__/ / __/ -_)>  </___/ |/ / -_) \ // , _/ (_-</ __/ |/ /
/____/_/\__/\__/_/|_|____|___/\__/_\_\/_/|_/_/___/\__/|___/
                  / __/  |/  / _ \
                 _\ \/ /|_/ / ___/
                /___/_/  /_/_/
  32-bit RISC-V Linux running on LiteX / VexRiscv-SMP.

login[77]: root login on 'console'
root@buildroot:~# cat /proc/cpuinfo 
processor	: 0
hart		: 0
isa		: rv32imafd
mmu		: sv32

processor	: 1
hart		: 1
isa		: rv32imafd
mmu		: sv32

It was also able to decode an MP3 using mpg123 with its default float backend and write it to a WAV file. So all should be good :)
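To confirm from userland that every hart advertises F and D, as in the `/proc/cpuinfo` dump above, a small sketch like the following could parse the `isa` lines. It is an illustration with hypothetical helper names. It assumes the layout shown above, where a `hart` line comes before each `isa` line, and it treats `g` as including IMAFD.

```python
import re


def isa_letters(isa):
    """Single-letter extensions of an ISA string such as 'rv32imafd'."""
    m = re.match(r"rv(?:32|64|128)([a-z]*)", isa.strip().lower())
    if m is None:
        raise ValueError(f"not a RISC-V ISA string: {isa!r}")
    # Stop at the first multi-letter extension (e.g. 'zicsr'), if any.
    letters = set(re.split(r"[zx]", m.group(1), maxsplit=1)[0])
    if "g" in letters:  # G is shorthand that includes IMAFD
        letters |= set("imafd")
    return letters


def fpu_by_hart(cpuinfo):
    """Map hart id -> (has_F, has_D) from the 'hart' and 'isa' lines of /proc/cpuinfo."""
    result = {}
    hart = None
    for line in cpuinfo.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "hart":
            hart = int(value)
        elif key == "isa" and hart is not None:
            letters = isa_letters(value)
            result[hart] = ("f" in letters, "d" in letters)
    return result
```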

@Dolu1990
Contributor Author

One thing: it was tested on an ArtyA7 35T using the following configuration:
VexRiscvLitexSmpCluster_Cc2_Iw64Is8192Iy2_Dw64Ds8192Dy2_Ldw128_Cdma_Ood_Fpu.v

The FPGA is quite full (89.07% LUT used), but the timings are ok.

@Dolu1990
Contributor Author

Dolu1990 commented Mar 18, 2021

@enjoy-digital Ready for merge (#217)

Also, I improved the SoC pipelining, and now one FPU is created for every 4 cores, so it should scale. Synthesis with 8 cores passes timing on nexys_video.

@enjoy-digital
Member

Great work @Dolu1990! I'll review the integration code soon.

@rdolbeau

Excellent :-) Seems I've got the FPU running as well, rebuilding the buildroot with a hard-float ABI now.

@Dolu1990: Is there a way to optionally configure more than one FPU per 4 cores? (e.g. one-for-two or one-for-one)

@Dolu1990
Contributor Author

Currently, the code could do it, but the parameter isn't exposed. Basically, it is done here:
https://github.com/SpinalHDL/VexRiscv/blob/80f64f0f9f0e434796f72d583aa1eaaa47d863b9/src/main/scala/vexriscv/demo/smp/VexRiscvSmpLitexCluster.scala#L37

(CPUs are grouped by 4)

I will add the option to the litex integration
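The grouping described above can be pictured as splitting the CPU list into consecutive chunks, with one shared FPU per chunk. The Python below is only a hypothetical illustration of that arithmetic. The function name and the `cpus_per_fpu` parameter are not taken from the Scala generator.

```python
def group_cpus(cpu_count, cpus_per_fpu=4):
    """Split CPU indices into consecutive groups, one shared FPU per group.

    Hypothetical illustration only; the real grouping is done in the
    VexRiscv Scala generator.
    """
    if cpu_count < 1 or cpus_per_fpu < 1:
        raise ValueError("cpu_count and cpus_per_fpu must be positive")
    cpus = list(range(cpu_count))
    return [cpus[i:i + cpus_per_fpu] for i in range(0, cpu_count, cpus_per_fpu)]
```

With 8 cores at the default ratio, this yields two FPUs. `cpus_per_fpu=2` or `1` corresponds to the one-for-two and one-for-one options asked about above.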

@Dolu1990
Contributor Author

@rdolbeau Here is the feature to change the ratio between the CPU count and the FPU count:
enjoy-digital/litex#859

@Dolu1990
Contributor Author

Did some mandelbrot tests. With F64, got 21 clocks per iteration using https://github.com/SpinalHDL/buildroot-spinal-saxon/blob/main/package/mandelbrot/src/mandelbrot.c#L248

Really hard to control the order in which GCC schedules operations XD
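A plain escape-time loop shows why instruction ordering matters. Each iteration's squares feed both the escape test and the next value of z, so on a pipelined FPU, dependent operations wait for their inputs unless independent ones are interleaved. This is a generic Python illustration, not the mandelbrot.c linked above. The operation counts in the comments refer to this formulation only.

```python
def mandelbrot_iterations(cr, ci, max_iter=256):
    """Escape-time iteration count for c = cr + i*ci, using plain double arithmetic."""
    zr = zi = 0.0
    for n in range(max_iter):
        zr2 = zr * zr              # 1 multiply
        zi2 = zi * zi              # 1 multiply, independent of the previous line
        if zr2 + zi2 > 4.0:        # 1 add + 1 compare, depends on both squares
            return n
        zi = 2.0 * zr * zi + ci    # 2 multiplies + 1 add
        zr = zr2 - zi2 + cr        # 2 adds, reuses the squares
    return max_iter
```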

@rdolbeau

@Dolu1990 Thanks :-) Now I need to figure out why 'init' crashes with a signal 11 when buildroot is compiled as ilp32d (hard-float)...

@Dolu1990
Contributor Author

On which board and with which config are you trying it?

ilp32d

You had to specify ilp32d manually?

@Dolu1990
Contributor Author

Here are the images I'm using on ArtyA7 for a dual-core system:
images.zip

Can you give them a try? (maybe the DTB isn't compatible with your board, so keep your own ^^)

@rdolbeau

@Dolu1990 Thanks, will try that ASAP (as I might have a HW problem, I'm trying the merge of my three-operands branch with dev to get RV32IMAFDCBK...)

@Dolu1990
Contributor Author

Hoo, I would say try without the merge first ^^

@rdolbeau

@Dolu1990 I'm an optimist :-) It seems your archive is soft-float (ilp32); the ELF headers have flags 0x0. Mine are 0x5 (I have both EF_RISCV_FLOAT_ABI_DOUBLE from using ilp32d and EF_RISCV_RVC, as some test binaries from K used C and I enabled it by default).

With my own soft-float buildroot, I was able to run a trivial test code in soft-float at -O0 (the lone fadd in it worked), but the same test at -O3 would crash with a memory error (sig11) somewhere in a library, I think after the fadd (maybe the printf). I figured using FP instructions in the soft-float ABI (-march=rv32imafdc -mabi=ilp32) might have issues, so I rebuilt the whole buildroot & the test binaries with the hard-float ABI (-march=rv32imafdc -mabi=ilp32d for my test code, BR2_RISCV_ABI_ILP32D=y along with F/D/C in the buildroot config, CONFIG_FPU=y for the kernel). In that configuration, init itself blows up with a sig11.

I'll have to retry on a pure 'dev' build, with none of my own stuff in it (the merge wasn't very smooth...) to make sure.
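The flags discussed here are the RISC-V `e_flags` bits defined by the ELF psABI. Bit 0 is `EF_RISCV_RVC`. Bits 1-2 select the float ABI: 0x0 soft, 0x2 single, 0x4 double, 0x6 quad. Bit 3 is `EF_RISCV_RVE`. So 0x5 decodes as RVC plus double-float. `readelf -h` prints the same field. The sketch below reads it directly from the first bytes of a binary. The function names are hypothetical, and only the ELF header is examined.

```python
import struct

EM_RISCV = 243
EF_RISCV_RVC = 0x0001
EF_RISCV_FLOAT_ABI_MASK = 0x0006
EF_RISCV_RVE = 0x0008
FLOAT_ABI_NAMES = {0x0: "soft", 0x2: "single", 0x4: "double", 0x6: "quad"}


def read_eflags(header):
    """Return e_flags from the first bytes of a RISC-V ELF32 or ELF64 file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    endian = "<" if ei_data == 1 else ">"
    e_machine = struct.unpack_from(endian + "H", header, 18)[0]
    if e_machine != EM_RISCV:
        raise ValueError(f"not a RISC-V binary (e_machine={e_machine})")
    offset = 36 if ei_class == 1 else 48  # e_flags offset in Elf32_Ehdr / Elf64_Ehdr
    return struct.unpack_from(endian + "I", header, offset)[0]


def decode_riscv_eflags(flags):
    """Split e_flags into the RVC bit, the float ABI name and the RVE bit."""
    return {
        "rvc": bool(flags & EF_RISCV_RVC),
        "float_abi": FLOAT_ABI_NAMES[flags & EF_RISCV_FLOAT_ABI_MASK],
        "rve": bool(flags & EF_RISCV_RVE),
    }
```

Usage: open a binary in `"rb"` mode, read its first 64 bytes, and pass them through `read_eflags` and then `decode_riscv_eflags`. A hard-float ilp32d build with C enabled should report `float_abi` as `"double"` and `rvc` as `True`.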

@rdolbeau

@Dolu1990 I seem to have the same issue when the core is generated from 'dev' (the only change is compressedGen = True):

[    7.446527] Run /init as init process                                                  
[    7.460148] init[1]: unhandled signal 11 code 0x1 at 0x95bd628c                        
[    7.461692] CPU: 0 PID: 1 Comm: init Not tainted 5.11.0-rc7 #2                         
[    7.462927] epc: 95bd628c ra : 95b017fe sp : 9d8fa610                                  
[    7.464346]  gp : c057a4a8 tp : 00000000 t0 : 0000000a                                 
[    7.465532]  t1 : 95af09cc t2 : 00000000 s0 : 9d8fac40                                 
[    7.466715]  s1 : 95b09a1c a0 : 9d8fa630 a1 : 00000000                                 
[    7.468260]  a2 : 9d8fa7d8 a3 : 00000024 a4 : 00000000                                 
[    7.469448]  a5 : 9d8fa628 a6 : 7efefeff a7 : 464d4824                                 
[    7.470648]  s2 : 9d8fa760 s3 : 000b2f08 s4 : 95b09a40                                 
[    7.472144]  s5 : 95b09a40 s6 : 00000000 s7 : 00000000                                 
[    7.473329]  s8 : 00000000 s9 : 9d8fa7cc s10: 9d8fa7d8                                 
[    7.474517]  s11: 95b08f30 t3 : 95b00638 t4 : 00000110                                 
[    7.475950]  t5 : 756e6900 t6 : 6ffffe35                                               
[    7.476734] status: 00000020 badaddr: 95bd628c cause: 0000000c                         
[    7.478851] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b    
[    7.480694] CPU: 0 PID: 1 Comm: init Not tainted 5.11.0-rc7 #2                         
[    7.481927] Call Trace:                                                                
[    7.482366] [<c000337e>] walk_stackframe+0x0/0xa6                                      
[    7.483410] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

@Dolu1990
Contributor Author

It seems your archive is soft-float (ilp32)

Hmm, that's right, I'm confused; I'm rebuilding things now.

compressedGen = True

XD that's it. I mean, at least it will not work with that.

The reason is that the compressed (RVC) FPU instructions aren't implemented yet. Will work on that.

@rdolbeau

@Dolu1990 OK good to know, I'll retry without RVC when I can - I will have to remove some stuff from my buildroot as RVC really helps with the size (and it's already over 60 MiB by now, I needed C++ and python for some test codes...)

@Dolu1990
Contributor Author

Hoo, I found something.

CONFIG_FPU=n was in the linux.defconfig

You need to set it to CONFIG_FPU=y too

@Dolu1990
Contributor Author

images.zip
That kernel has CONFIG_FPU enabled, so it includes the FPU register save/restore (fld/fsd).

@Dolu1990
Contributor Author

So I just tested on ArtyA7 and it works for me; let me know how things go for you.

@rdolbeau

@Dolu1990 went a slightly different course - it's only 8 expansion patterns to add to the decompressor, I figured it would take less time to add them than recompile the full buildroot :-) I had (at least...) a small bug but init was fixed so it's definitely the issue. Second bitstream being generated, patch to VexRiscv to follow if it works-for-me.
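On RV32, the eight compressed floating-point forms are c.flw, c.flwsp, c.fsw and c.fswsp for F, plus c.fld, c.fldsp, c.fsd and c.fsdsp for D. The four F forms reuse the funct3 slots that RV64 assigns to c.ld, c.ldsp, c.sd and c.sdsp.

The Python sketch below is not VexRiscv's decompressor, which is written in SpinalHDL/Scala. It shows how these instructions can be recognized by quadrant and funct3, and how the two CL-format loads expand to 32-bit `fld` and `flw`. The function names are hypothetical. The store and stack-pointer forms would follow the same pattern with their own immediate layouts.

```python
# RV32 compressed FP loads/stores, keyed by (quadrant = bits 1:0, funct3 = bits 15:13).
RVC_FP_RV32 = {
    (0b00, 0b001): "c.fld",
    (0b00, 0b011): "c.flw",
    (0b00, 0b101): "c.fsd",
    (0b00, 0b111): "c.fsw",
    (0b10, 0b001): "c.fldsp",
    (0b10, 0b011): "c.flwsp",
    (0b10, 0b101): "c.fsdsp",
    (0b10, 0b111): "c.fswsp",
}
LOAD_FP = 0b0000111  # major opcode of fld/flw


def classify_rvc_fp(insn):
    """Return the mnemonic if a 16-bit RV32 instruction is a compressed FP load/store."""
    return RVC_FP_RV32.get((insn & 0b11, (insn >> 13) & 0b111))


def expand_cl_fp_load(insn):
    """Expand c.fld / c.flw (CL format) into the equivalent 32-bit fld / flw."""
    name = classify_rvc_fp(insn)
    rd = 8 + ((insn >> 2) & 0b111)   # rd' selects f8..f15
    rs1 = 8 + ((insn >> 7) & 0b111)  # rs1' selects x8..x15
    imm_5_3 = ((insn >> 10) & 0b111) << 3
    if name == "c.fld":
        imm = imm_5_3 | (((insn >> 5) & 0b11) << 6)  # bits 6:5 -> uimm[7:6]
        funct3 = 0b011
    elif name == "c.flw":
        imm = imm_5_3 | (((insn >> 6) & 1) << 2) | (((insn >> 5) & 1) << 6)  # bit 6 -> uimm[2], bit 5 -> uimm[6]
        funct3 = 0b010
    else:
        raise ValueError(f"not c.fld/c.flw: {insn:#06x}")
    # I-type layout: imm[11:0] | rs1 | funct3 | rd | opcode
    return (imm << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | LOAD_FP
```

For example, `0x2000` is `c.fld fs0, 0(s0)` and expands to `0x00043407`, the encoding of `fld fs0, 0(s0)`.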

@Dolu1990
Contributor Author

Hoooo, I was adding the feature too XD I just finished, but was looking into how to verify it. Unfortunately, riscv-tests doesn't have any test case for it XD

So your PR is welcome; I will cross-check it with my implementation, which should give a good level of certainty.

@rdolbeau

Oops, didn't think you'd get around to it so fast... Anyway, I can do a PR, but while I get much further than before (init no longer crashes), I have an (apparent) hang after/around the time haveged starts:

[    7.516527] mmcblk0: mmc0:0001 EB1QT 29.8 GiB                                          
[    7.548444]  mmcblk0: p1                                                               
Starting syslogd: OK                                                                      
Starting klogd: OK                                                                        
Running sysctl: OK                                                                        
Saving random seed: [    9.030607] random: dd: uninitialized urandom read (512 bytes read)
OK                                                                                        
Starting haveged: haveged: can not open UNIX socket                                       
haveged: can not initialize command socket: Address family not supported by protocol      

Then nothing happens anymore and the console is not responsive :-(

@Dolu1990
Contributor Author

Starting haveged: haveged: can not open UNIX socket

Most of the time when I had a hang at that spot, it was due to a process actively waiting on /dev/random, which blocks because no entropy is being generated.

The entropy generation in that system should be done by haveged, but here it can't open the UNIX socket, so things are stuck.
Likely you need to add some package in order to enable it.

Do you have CONFIG_UNIX=y in the linux.config?

More broadly, on SaxonSoc, to get haveged working I have:

CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_PACKET_DIAG=y
CONFIG_UNIX=y
CONFIG_INET=y
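If the target has Python (the buildroot config later in this thread enables python3) and the kernel was built with CONFIG_IKCONFIG_PROC, the running kernel's configuration is exposed as /proc/config.gz. The options listed above can then be checked on the board itself. A sketch, with hypothetical function names:

```python
import gzip

HAVEGED_OPTIONS = ("CONFIG_NET", "CONFIG_PACKET", "CONFIG_PACKET_DIAG",
                   "CONFIG_UNIX", "CONFIG_INET")


def enabled_options(config_gz):
    """Return the options set to y or m in a gzip-compressed kernel .config."""
    text = gzip.decompress(config_gz).decode("utf-8", errors="replace")
    enabled = set()
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and not key.startswith("#") and value.strip() in ("y", "m"):
            enabled.add(key.strip())
    return enabled


def missing_haveged_options(config_gz):
    """Options from the list above that the running kernel does not enable."""
    enabled = enabled_options(config_gz)
    return [opt for opt in HAVEGED_OPTIONS if opt not in enabled]
```

Passing the bytes of /proc/config.gz (opened in `"rb"` mode) to `missing_haveged_options` lists what is still missing. An option reported as `m` still has to be loaded as a module.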

@rdolbeau

rdolbeau commented Mar 24, 2021

@Dolu1990 Generally speaking that could be the problem, but this configuration works in soft-float and I think that when waiting on /dev/urandom the console would still work as it doesn't rely on random data (but it was a while since I added haveged so I'm not 100% sure).

I have all those but CONFIG_UNIX, I'll add it and try again to make sure.

My current code in SpinalHDL/VexRiscv#167

Upd: unfortunately even with CONFIG_UNIX:

[    8.349725] mmcblk0: mmc0:0001 EB1QT 29.8 GiB
[    8.379964]  mmcblk0: p1
Starting syslogd: OK
Starting klogd: OK
Running sysctl: OK
Saving random seed: [   10.026069] random: dd: uninitialized urandom read (512 bytes read)
OK 
Starting haveged: haveged: command socket is listening at fd 3 

then, hang :-(

@Dolu1990
Contributor Author

Before the login as root there is no console; I mean, when I had that /dev/random issue (not urandom), everything froze.

Do you have the openssh package ?

@rdolbeau

rdolbeau commented Mar 24, 2021

no I didn't enable OpenSSH as this board has no ethernet (only micro-sd and serial; the new one crawling its way through the postal system should have ethernet ;-) )

Upd: I do have SSL and the AES stuff (patch to use K instead of the custom opcodes), but in this configuration the instructions are missing - but that should cause SIGILL rather than hang.

# Target options
BR2_riscv=y
BR2_RISCV_32=y

# Instruction Set Extensions
BR2_riscv_custom=y
BR2_RISCV_ISA_CUSTOM_RVM=y
BR2_RISCV_ISA_CUSTOM_RVA=y
BR2_RISCV_ISA_CUSTOM_RVF=y  # Uncomment to enable FPU
BR2_RISCV_ISA_CUSTOM_RVD=y  # Uncomment to enable FPU
BR2_RISCV_ISA_CUSTOM_RVC=y
#BR2_RISCV_ABI_ILP32=y
BR2_RISCV_ABI_ILP32D=y

# Patches
BR2_GLOBAL_PATCH_DIR="$(BR2_EXTERNAL_LITEX_VEXRISCV_PATH)/patches"

# GCC
BR2_GCC_VERSION_10_X=y
BR2_GCC_ENABLE_OPENMP=y
BR2_INSTALL_LIBSTDCPP=y
BR2_TOOLCHAIN_BUILDROOT_CXX=y
BR2_TOOLCHAIN_BUILDROOT_FORTRAN=y

# System
BR2_TARGET_GENERIC_GETTY=y
BR2_TARGET_GENERIC_GETTY_PORT="console"

# Filesystem
BR2_TARGET_ROOTFS_CPIO=y

# Kernel (litex-rebase branch)
BR2_LINUX_KERNEL=y
BR2_LINUX_KERNEL_CUSTOM_GIT=y
#BR2_LINUX_KERNEL_CUSTOM_REPO_URL="git://github.com/litex-hub/linux.git"
#BR2_LINUX_KERNEL_CUSTOM_REPO_VERSION="litex-rebase"
BR2_LINUX_KERNEL_CUSTOM_REPO_URL="git://github.com/geertu/linux.git"
BR2_LINUX_KERNEL_CUSTOM_REPO_VERSION="litex-v5.11"
BR2_PACKAGE_HOST_LINUX_HEADERS_CUSTOM_5_11=y
BR2_LINUX_KERNEL_USE_CUSTOM_CONFIG=y
BR2_LINUX_KERNEL_CUSTOM_CONFIG_FILE="$(BR2_EXTERNAL_LITEX_VEXRISCV_PATH)/board/litex_vexriscv/linux.config"
BR2_LINUX_KERNEL_IMAGE=y

# Rootfs customisation
BR2_ROOTFS_OVERLAY="$(BR2_EXTERNAL_LITEX_VEXRISCV_PATH)/board/litex_vexriscv/rootfs_overlay"
BR2_GLOBAL_PATCH_DIR="$(BR2_EXTERNAL_LITEX_VEXRISCV_PATH)/patches"

# Extra packages
#BR2_PACKAGE_DHRYSTONE=y
#BR2_PACKAGE_MICROPYTHON=y
#BR2_PACKAGE_SPIDEV_TEST=y
#BR2_PACKAGE_MTD=y
#BR2_PACKAGE_MTD_JFFS_UTILS=y

# Crypto
#BR2_PACKAGE_LIBATOMIC_OPS_ARCH_SUPPORTS=y
#BR2_PACKAGE_LIBATOMIC_OPS=y
BR2_PACKAGE_OPENSSL=y
BR2_PACKAGE_LIBOPENSSL_ENGINES=y
BR2_PACKAGE_LIBOPENSSL_BIN=y
#BR2_PACKAGE_LIBRESSL=y
#BR2_PACKAGE_LIBRESSL_BIN=y
BR2_PACKAGE_HAVEGED=y
BR2_PACKAGE_VEXRISCV_AES=y

# Python for Krypto tests
BR2_PACKAGE_PYTHON3=y
BR2_PACKAGE_PYTHON_PYCRYPTODOMEX=y

# for Hydro
BR2_PACKAGE_NUMACTL=y

@Dolu1990
Contributor Author

So, I was able to test rv32imafdc on SaxonSoc and got it to boot and work fine (mandelbrot in double, mpg123 in float).

Just for sanity, did you check the timing reports of the synthesis? (as that's quite a lot of additional features XD)

Also, your buildroot / linux config was working fine before the FPU addition, right?

@rdolbeau

Not sure about the simulation; I know Vivado can do some stuff, but I'm not enough of an expert. Anyway, don't worry, I'll try to figure out what's wrong when I have more time. I can fall back on the soft-float ABI anyway; that one should have been fixed by the C stuff.

@rdolbeau

Update: it might not be the ABI for me. Reloaded the soft-float buildroot; the simple test binary works, but the 'real' binary (a numerical mini-app) hangs for me. Good news is, that's an easier test to share.

@Dolu1990
Contributor Author

(so early test did not have it, it doesn't seem to change the userland behavior)

Which behaviour wasn't changed? The crashing behaviour or the correct behaviour?

readelf -h

Cool thanks, didn't know :)

@rdolbeau

Didn't change either behavior :-) init crashes in hard-float w/ or w/o RVC in the kernel, works fine in soft-float w/ or w/o RVC in the kernel; as far as I can tell it only changes whether the kernel uses C instructions (with, it is much smaller).

@rdolbeau

rdolbeau commented Mar 24, 2021

New update: partial PIBKAC. I had forgotten (again) to update that loathsome DTB, and that seems to have caused the hang of init, as I now reach login and can log in. It seems RVC was the primary 'real' problem preventing a full boot, but my test code still crashes the SoC:

a) soft-float test code 'hydro' hangs the system on the soft-float buildroot;
b) hard-float test code 'hydro' doesn't-fully-hang on the hard-float buildroot:

root@buildroot:~# ./hydro -i input.nml                                                    
+-------------------+                                                                     
|GlobNx=80          |                                                                     
|GlobNy=80          |                                                                     
|nx=80              |                                                                     
|ny=80              |                                                                     
|nxystep=80         |                                                                     
[   62.830654] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:                        
[   62.831893] rcu:     1-...0: (0 ticks this GP) idle=50e/1/0x40000000 softirq=522/522 fqs=2625
[   62.833793]  (detected by 0, t=5252 jiffies, g=-47, q=4)                               
[   62.834823] Task dump for CPU 1:                                                       
[   62.835386] task:hydro           state:R  running task     stack:    0 pid:   83 ppid:    75 flags:0x00000008
[   62.837479] Call Trace:                                                                
[   62.837886] [<c00020f0>] ret_from_fork+0x0/0xc                                         

The kernel output then repeats every 63 seconds (with t=XYZ jiffies the only thing that changes beyond the kernel timestamp); I can't stop or background the process or regain control, but the terminal acknowledges my actions (carriage return, displaying ^Z or ^C).

Edit: typo

I had forgotten to patch the updated DTS to accommodate a large buildroot, so the buildroot was truncated, and apparently that was enough to enable booting!?! Anyway, with enough space allocated in the FP-enabled DTS, same as before: hang after starting haveged in hard-float; boot, but hang in hydro, in soft-float.

@Dolu1990
Contributor Author

Is hydro using some of the custom instruction added in the CPU ?

I mean, we are currently debugging a system with many tuned parameters at once.
Would it be possible to try the system with a "vanilla" VexRiscv, without the additional custom extensions?

@rdolbeau

Normally no, it was compiled with the buildroot compiler:

[dolbeau@localhost Src]$ cat make.inc 
R5IMA_TOOLCHAIN=/home/dolbeau2/LITEX/buildroot-rv32/output/host
R5IMA_BUILDROOT=/home/dolbeau2/LITEX/buildroot-rv32/output/target

CC=$(R5IMA_TOOLCHAIN)/bin/riscv32-buildroot-linux-gnu-gcc
#CC=/opt/riscv64bk/bin/riscv64-unknown-elf-gcc
CFLAGS=-O3 -march=rv32imafdc -mabi=ilp32
LDFLAGS=-I$(R5IMA_BUILDROOT)include -L$(R5IMA_BUILDROOT)/lib -lnuma -lm -latomic

Does it work for you? (I don't think it's ever been tried on RV32GC, though it is known to work in RV64GC under QEmu). If it works, then the problem might be specific to my hardware build.

I was generating the core from a 'vanilla' dev branch, none of my stuff there to remove a variable.

I also tried the same code with -mno-fdiv (this avoids all fdiv and fsqrt instructions, as experience has taught me they can be troublesome), but as far as I can tell they are not the culprit.

I'll try again with a completely clean slate this weekend to make sure; disabling my stuff might not have cleaned up everything properly. Hopefully, I'll also have access to a bigger FPGA, and if Ethernet works on that one, I could use NFS to access binaries and speed up the testing considerably (although making NFS work from buildroot might be a project in itself...)

@rdolbeau

OK, so starting with a clean slate (current GitHub repos with just enough patches to generate for my board):

a) adding FD instructions in the core, buildroot, kernel, hydro, retaining the soft-float ABI (ilp32): hydro works
b) switching buildroot & hydro to the hard-float ABI (ilp32d): hydro works
c) adding C to the core, buildroot, kernel, hydro & bootloader: linux boots but hydro causes the system to hang

So while SpinalHDL/VexRiscv#167 helped by enabling boot (init no longer crashes), there's still a C+FD issue as C itself was not causing issue for my crypto tests (including running large binaries such as python3).

Other than that one issue, fantastic job to have a numerical code like hydro working fine on the SoC :-)

@Dolu1990
Contributor Author

Cool ^^

there's still a C+FD issue

So, here, try pulling VexRiscv dev: I did one fix for a bad interaction between wfi and the FPU which could, in some very specific conditions (I don't think that's the issue here), hang the system.

By the way what is that hydro binary doing ?

@Dolu1990
Contributor Author

@rdolbeau

Moving the core to dev (to get the fix) did not help unfortunately.

For Hydro:

HydroC is not an application, but rather a mini-application (...)It is a simplified version of the astrophysical code RAMSES. It is a 2-dimensional CFD using the Finite Volume Method with a Godunov’s scheme and a Riemann solver at each interface on a regular 2D mesh.

It's a numerical code really designed to run on multiple big HPC-oriented servers with high-speed interconnect, not inside a small SoC, so it's impressive it actually works :-) (though I've not enabled any parallelism yet).

@Dolu1990
Contributor Author

Ok ^^

Hmm, how many CPU cores do you have in the SoC, and how many cores per FPU?

Because that may be an important difference from my current configuration.

@rdolbeau

rdolbeau commented Mar 25, 2021

1 core so far. Will try with 2 cores/1 FPU (I don't think I can fit 2 cores/2 FPUs in my 35T).

Update: 2 cores/1 FPU doesn't help, hydro still crashes the SoC for me on rv32imafdc/ilp32d

@Dolu1990
Copy link
Contributor Author

OK, I'm now trying to build the whole system on LiteX.

@Dolu1990
Contributor Author

Locally, I added the possibility to have RVC on linux-on-litex-vexriscv and got it to work:

buildroot login: root
                   __   _
                  / /  (_)__  __ ____ __
                 / /__/ / _ \/ // /\ \ /
                /____/_/_//_/\_,_//_\_\
                      / _ \/ _ \
   __   _ __      _  _\___/_//_/         ___  _
  / /  (_) /____ | |/_/__| | / /____ __ / _ \(_)__ _____  __
 / /__/ / __/ -_)>  </___/ |/ / -_) \ // , _/ (_-</ __/ |/ /
/____/_/\__/\__/_/|_|____|___/\__/_\_\/_/|_/_/___/\__/|___/
                  / __/  |/  / _ \
                 _\ \/ /|_/ / ___/
                /___/_/  /_/_/
  32-bit RISC-V Linux running on LiteX / VexRiscv-SMP.

login[76]: root login on 'console'
root@buildroot:~# ./hydro -i 
hydro      input.nml
root@buildroot:~# ./hydro -i input.nml 
+-------------------+
|GlobNx=80          |
|GlobNy=80          |
|nx=80              |
|ny=80              |
|nxystep=80         |
|tend=20000.000     |
|nstepmax=10        |
|noutput=0          |
|dtoutput=0.000     |
+-------------------+
DMalloc: allocating 28224
Centered test case : 42 42
sh: cpupower: not found
Hydro starts in double precision.
Hydro: Main process running on buildroot
Hydro: standard build
Hydro starts main loop.
Page offset 8
IMalloc: allocating 6720
IMalloc: allocating 6720
DMalloc: allocating 26880
DMalloc: allocating 6720
DMalloc: allocating 6720
DMalloc: allocating 80
DMalloc: allocating 80
Hydro: init mem 0.052840s
Hydro computes initial deltat: 1.336306e-03
--> step=   1,  1.33631e-03, 1.33631e-03 0.007 MC/s {4.45 Mflops 4187396 Ops} (0.942s)
--> step=   2,  2.67261e-03, 1.33631e-03 0.007 MC/s {4.55 Mflops 4020988 Ops} (0.884s)
--> step=   3,  5.70914e-03, 3.03653e-03 0.007 MC/s {4.48 Mflops 4187396 Ops} (0.935s)
--> step=   4,  8.74568e-03, 3.03653e-03 0.007 MC/s {4.54 Mflops 4020988 Ops} (0.885s)
--> step=   5,  1.24942e-02, 3.74854e-03 0.007 MC/s {4.47 Mflops 4187396 Ops} (0.936s)
--> step=   6,  1.62428e-02, 3.74854e-03 0.007 MC/s {4.54 Mflops 4020988 Ops} (0.886s)
--> step=   7,  2.06309e-02, 4.38811e-03 0.007 MC/s {4.47 Mflops 4187396 Ops} (0.937s)
--> step=   8,  2.50190e-02, 4.38811e-03 0.007 MC/s {4.53 Mflops 4020988 Ops} (0.887s)
--> step=   9,  2.95700e-02, 4.55100e-03 0.007 MC/s {4.47 Mflops 4187396 Ops} (0.938s)
--> step=  10,  3.41210e-02, 4.55100e-03 0.007 MC/s {4.53 Mflops 4020988 Ops} (0.888s)
Hydro ends in 00:00:09.134s (9.134) <4.50 MFlops>.
Average MC/s: 0.007059 min 0.006825, max 0.007221 sig 0.000189
GATCON    CONPRI    EOS       SLOPE     TRACE     QLEFTR    RIEMAN    CMPFLX    UPDCON    COMPDT    MAKBOU    ALLRED    
PE0_DP 0.4217    0.5333    0.2485    1.2615    0.9557    0.2455    4.4893    0.2837    0.4056    0.2537    0.0142    0.0000    
%      4.6275    5.8521    2.7272    13.8429   10.4878   2.6945    49.2646   3.1128    4.4510    2.7839    0.1556    0.0000    
Average MC/s: 0.007
root@buildroot:~# 

But I have to try again with one specific config disabled.

@Dolu1990
Contributor Author

I successfully recreated a hang with the hydro binary.
To do that, I had to keep IBusCachedPlugin.injectorStage at its default (false). When things were working with RVC, it was set to true to improve timings.
But that injectorStage feature affects a very specific spot, which may change how instructions are forked to the FPU.
The only issue for the diagnosis is that right now, without that option, I get -4 ns of slack, which is way too much.

So I'm trying to set up something similar in SaxonSoc, but with a slower main clock.

@Dolu1990
Contributor Author

@rdolbeau So right, as a workaround for now, just set injectorStage to true; it should solve the issue. I'm now working on recreating the hang in simulation.

@Dolu1990
Contributor Author

Thanks for the catch ^^

@rdolbeau

@Dolu1990 Hehe I was synthesizing so soon as I saw the prospective fix :-) So I can confirm yes, injectorStage=true solves the problem; hydro is working with RV32IMAFDC/ILP32D now! It also seem to improve WNS, but performance is a bit lower (the MFlops in the output is a rough estimate but is useful to compare configurations with the same binary).

as a mini-app, hydro is really made to test and evaluate systems, so it did its job once more :-)

Next step, back to my own merged branch and testing OpenMP parallelism.

@Dolu1990
Contributor Author

Cool ^^

So, about the FPU: got 42 MFLOPS single-threaded with the mandelbrot test at 100 MHz. This could probably be pushed higher if it were properly scheduled in assembly.
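A back-of-the-envelope check relating the two mandelbrot figures in this thread. It rests on two assumptions. First, the 21 clocks per iteration reported earlier were measured on the same F64 build at the same 100 MHz clock, which the earlier comment does not state. Second, the MFLOPS figure counts each FP instruction as one operation.

$$
\frac{100\times10^{6}\ \text{cycles/s}}{21\ \text{cycles/iteration}} \approx 4.76\times10^{6}\ \text{iterations/s},
\qquad
\frac{42\times10^{6}\ \text{FLOP/s}}{4.76\times10^{6}\ \text{iterations/s}} \approx 8.8\ \text{FLOP/iteration}
$$

Equivalently, that is about $100/42 \approx 2.4$ cycles per FP operation. Under those assumptions, this leaves room relative to the one-operation-per-cycle rate the FPU can sustain when operations are independent, which is consistent with the remark that hand-scheduled assembly could go faster.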

@rdolbeau

@Dolu1990 I tried my merged version with injectorStage=true, hydro works in that configuration as well, alongside @mjosaarinen test codes from https://github.com/rvkrypto/rvkrypto-fips .

So I now have a RV32IMAFDCBK core booting Linux and running codes :-) Perhaps the first one as FPUs aren't common in RV32 FPGA cores and B/K are still under review...

Won't be able to fit 2 cores in the 35T, I'm already at > 83% of LUTs:

+----------------------------+-------+-------+-----------+-------+
|          Site Type         |  Used | Fixed | Available | Util% |
+----------------------------+-------+-------+-----------+-------+
| Slice LUTs                 | 17319 |     0 |     20800 | 83.26 |
|   LUT as Logic             | 16507 |     0 |     20800 | 79.36 |
|   LUT as Memory            |   812 |     0 |      9600 |  8.46 |
|     LUT as Distributed RAM |   776 |     0 |           |       |
|     LUT as Shift Register  |    36 |     0 |           |       |
| Slice Registers            | 11385 |     0 |     41600 | 27.37 |
|   Register as Flip Flop    | 11385 |     0 |     41600 | 27.37 |
|   Register as Latch        |     0 |     0 |     41600 |  0.00 |
| F7 Muxes                   |   307 |     0 |     16300 |  1.88 |
| F8 Muxes                   |    62 |     0 |      8150 |  0.76 |
+----------------------------+-------+-------+-----------+-------+

@Dolu1990
Contributor Author

@rdolbeau SpinalHDL/VexRiscv@6f481f5 fixes the issue properly. Got RVC without injectorStage to work on Saxon (at a lower frequency to meet timings).

So I would say, keep the injectorStage to ensure timing passes :)

@Dolu1990
Contributor Author

So I now have a RV32IMAFDCBK core booting Linux and running codes :-) Perhaps the first one as FPUs aren't common in RV32 FPGA cores and B/K are still under review...

I tried my merged version with injectorStage=true, hydro works in that configuration as well, alongside @mjosaarinen test codes from https://github.com/rvkrypto/rvkrypto-fips .

Great :D

Hmm, one question I have is how to verify that the output of the simulation is right?
Because even on x86, it seems like there is some seed in it, which makes the result different on each run XD

@rdolbeau

rdolbeau commented Mar 25, 2021

If there's some major problem, the two columns after "step = XYZ" will be different (and therefore, wrong), so step 10 should be "step= 10, 3.41210e-02, 4.55100e-03" always. Everything else is timing-related and irrelevant to the accuracy (in particular, the last two lines may look like results but are actually raw timings and percentage of total timings for various parts of the computation). Anything computed on the domain is just thrown away by default, as it's really a test code and not a production code.

Raising the parameter 'nstepmax' in the NML file will let the simulation run longer, to see if it diverges from a reference run on a different machine. Not sure how long it will remain accurate, as the domain size is very small to fit the SoC compared to what we would normally test (nx = ny = 80 vs. on the order of thousands for short single-system runs to millions or more for multi-system runs).

And for kicks, if C is too easy, CloverLeaf from https://github.com/UK-MAC/ is quite similar but in Fortran :-) (actually the more relevant version of HydroC nowadays is the C++ version, but I find the C version easier to handle for a first test).

Edit: did not see the fix had already landed :-) Will try that ASAP.
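The check described above can be automated by scraping the `--> step=` lines and comparing the two numeric columns with a reference. In this sketch, the printed strings are compared exactly, because the thread says step 10 should always read `3.41210e-02, 4.55100e-03` for the default input. A run with a larger nstepmax would need its own reference taken from a trusted machine. The function names are hypothetical.

```python
import re

# Reference quoted in this thread for the default input.nml (80x80, nstepmax=10).
REFERENCE = {10: ("3.41210e-02", "4.55100e-03")}

STEP_RE = re.compile(r"-->\s*step=\s*(\d+),\s*([0-9.eE+-]+),\s*([0-9.eE+-]+)")


def parse_steps(output):
    """Map step number -> (time, dt) strings, as printed on HydroC '--> step=' lines."""
    return {int(m.group(1)): (m.group(2), m.group(3)) for m in STEP_RE.finditer(output)}


def mismatched_steps(output, reference=REFERENCE):
    """Return the reference steps that are missing from, or differ in, the output."""
    steps = parse_steps(output)
    return sorted(step for step, expected in reference.items() if steps.get(step) != expected)
```

An empty list means every reference step matched. Everything after the two columns (MC/s, Mflops, timings) is ignored, since those are timing-related.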

@Dolu1990
Contributor Author

"step= 10, 3.41210e-02, 4.55100e-03"

Cool, thanks :D
That's a great benchmark to have.

Edit: did not see the fix had already landed :-) Will try that ASAP.

Be careful: having RVC without enabling the injectorStage can easily lead to critical path failures, especially with the FPU integration. Keep an eye on the critical slack ^^

@rdolbeau

rdolbeau commented Mar 25, 2021

@Dolu1990
Mmm, with the last (two) commits from dev merged in my branch, the kernel doesn't load, w/ or w/o injector; I get the Liftoff! marker and then nothing...

Edit: went back to previous stage, still works, got the two commits again, works with the injector ?!? ; trying again without...
Edit2: could be an artifact of timing failure, actually

@Dolu1990
Contributor Author

Ahh that's weird.
Can you do a diff between the two states ?

@Dolu1990
Contributor Author

(to be sure exactly what we are talking about)

@rdolbeau

rdolbeau commented Mar 25, 2021

Actually... timings are failing when injector=false, as you predicted :-/
I'll check with an even lower frequency

Upd: works at a lower frequency even without the injector (which you definitely want for an instruction-rich core like mine...)

@Dolu1990 Dolu1990 closed this as completed Apr 1, 2021