-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FPU support #215
Comments
This is super cool! |
Got it to work !
It was also able to decode a MP3 using mpg123 with its default float backend, and write it into a wave file. So all should be good :) |
So, one thing, it was tested on ArtyA7 35T using the following configuration : The FPGA is quite full (89.07% LUT used), but the timings are ok. |
@enjoy-digital Ready for merge (#217) Also, i improved the SoC pipelining, and now one FPU will be created for each 4 cores, so it should scale. Synthesis with 8 cores pass timings on nexys_video |
Great work @Dolu1990! I'll review the integration code soon. |
Excellent :-) Seems I've got the FPU running as well, rebuilding the buildroot with a hard-float ABI now. @Dolu1990 : Is there a way to optionally configure more FPU than one for 4 cores ? (e.g. one-fortwo or one-for-one) |
Currently, the code could do it, but the parameter isn't accessible. Basicaly it is done here : (group cpus by 4) I will add the option to the litex integration |
@rdolbeau There is the feature to change the ratio between CPU count and FPU : |
Did some mandelbrot tests. With F64, got 21 clock per iteration using https://github.com/SpinalHDL/buildroot-spinal-saxon/blob/main/package/mandelbrot/src/mandelbrot.c#L248 realy hard to ensure the ordering in which GCC schedule operations XD |
@Dolu1990 Thanks :-) Now I need to figure out why 'init' crashes with a signal 11 when buildroot is compiled as ilp32d (hard-float)... |
on which board with which config are you trying it ?
You had to specify ilp32d manualy ? |
There is the images i'm using on artyA7 for a dual corp system : Can you give a try ? (maybe the DTB isn't compatible with your board, so keep your own ^^) |
@Dolu1990 Thanks will try that ASAP (as I might have a HW problem, I'm trying the merge of my three-operands branch with dev to get RV32IMAFDCBK...) |
Hoo i would say try without the merge first ^^ |
@Dolu1990 I'm an optimist :-) It seems your archive is soft-float (ilp32); the ELF signature have flags 0x0. Mine are 0x5 (I have both EF_RISCV_FLOAT_ABI_DOUBLE from using ilp32d and EF_RISCV_RVC as some test binaries from K used C and I enabled it by default). With my own soft-float buildroot, I was able to run a trivial test code in soft-float at -O0 (the lone I'll have to retry on a pure 'dev' build, with none of my own stuff in it (the merge wasn't very smooth...) to make sure. |
@Dolu1990 I seem to have the same issue when the core is generated from 'dev' (the only change is compressedGen = True):
|
Hmm that's right, i'm confused, i'm trying to rebuild things now
XD that's it. I mean, at least it will not work with that. The reason is that the FPU instruction of RVC isn't implemented yet. Will work on that. |
@Dolu1990 OK good to know, I'll retry without RVC when I can - I will have to remove some stuff from my buildroot as RVC really helps with the size (and it's already over 60 MiB by now, I needed C++ and python for some test codes...) |
Hoo i found something. CONFIG_FPU=n was in the linux.defconfig You need to set it to CONFIG_FPU=y too |
images.zip |
So just tested on ArtyA7, works for me, let's me know how things goes for you |
@Dolu1990 went a slightly different course - it's only 8 expansion patterns to add to the decompressor, I figured it would take less time to add them than recompile the full buildroot :-) I had (at least...) a small bug but init was fixed so it's definitely the issue. Second bitstream being generated, patch to VexRiscv to follow if it works-for-me. |
Hoooo i was adding the feature too XD I just finished now, but was looking into how to verify it. Unfortunatly riscv-test do not has any case for it XD So your PR is welcome, i will cross check with my implementation, that should give a good level of certainty. |
Oups didn't think you'd get around to it so fast... anyway I can do a PR, but while I go much further than before (init no longer crashes) I have a(n apparent) hang after/around the time haveged starts :
Then nothing happens anymore and the console is not responsive :-( |
Most of the time i had a hang in that spot, was due to process to actively waiting on /dev/random, which isn't loaded because of no entropy generation. The entropy generation in that system should be done by haveged, but here a it can't open the unix socket, things are stuck. Do you have the CONFIG_UNIX=y in the linux.config ? More broadly on saxonsoc ot have haveged working, i have : CONFIG_NET=y |
@Dolu1990 Generally speaking that could be the problem, but this configuration works in soft-float and I think that when waiting on /dev/urandom the console would still work as it doesn't rely on random data (but it was a while since I added haveged so I'm not 100% sure). I have all those but CONFIG_UNIX, I'll add it and try again to make sure. My current code in SpinalHDL/VexRiscv#167 Upd: unfortunately even with CONFIG_UNIX:
then, hang :-( |
Before the look in root, there is no console, i mean, when i had that /dev/random issue (not urandom), everything freezed. Do you have the openssh package ? |
no I didn't enable OpenSSH as this board has no ethernet (only micro-sd and serial; the new one crawling its way through the postal system should have ethernet ;-) ) Upd: I do have SSL and the AES stuff (patch to use K instead of the custom opcodes), but in this configuration the instructions are missing - but that should cause SIGILL rather than hang.
|
So Just for sanity, did you checked the timings reports of the synthesis ? (as that's quite a lot of additionnal features XD) Also your buildroot / linux config were working fine before the FPU addition right ? |
Not sure for the simulation, I know Vivado can do some stuff but I'm not enough of an expert. Anyway don't worry, I'll try to figure out what's wrong when I have more time. I can fall back on the soft-float ABI anyway, that one should have been fixex by the C stuff. |
Update; it might not be the ABI for me. Reloaded the soft-float buildroot, simple test binary works, but the 'real' binary (a numerical mini-app) hangs for me. Good news is, that's an easier test to share. |
which behaviour wasn't changed ? crashing behaviour ? or correct behaviour ?
Cool thanks, didn't know :) |
Didn't change either behavior :-) init crashes in hard-float w/ or w/o RVC in the kernel, works fine in soft-float w/ or w/o RVC in the kernel; as far as I can tell it only changes whether the kernel uses C instructions (with, it is much smaller). |
New update; partial PIBKAC - I had forgotten (again) to update that loathsome DTB
I had forgotten to patch the updated DTS to accomodate a large buildroot, so the buildroot was truncated - and apparently that was enough to enable booting!?! Anyway with enough space allocated in the FP-enabled DTS, same as before: hang after starting haveged in hard-float, boot but hand in hydro in soft-float. |
Is hydro using some of the custom instruction added in the CPU ? I mean, we are currently debugging a system with many tunned parameters at once. |
Normally no, it was compiled with the buildroot compiler:
Does it work for you? (I don't think it's ever been tried on RV32GC, though it is known to work in RV64GC under QEmu). If it works, then the problem might be specific to my hardware build. I was generating the core from a 'vanilla' dev branch, none of my stuff there to remove a variable. I also tried the same code with -mno-fdiv (this avoids all fdiv and fsqrt instructions, as experience has taught me they can be troublesome), but as far as I can tell they are not the culprit. I'll try again with a completely clean slate this week-end to make sure ; disabling my stuff might not have cleaned up everything properly. Hopefully, I'll also have access to a bigger FPGA - and if ethernet works in that one, I could have NFS to access binaries and speed up the testing considerably (although making NFS works from buildroot might be a project in itself...) |
OK, so starting with a clean slate (current GitHub repos with just enough patches to generate for my board): a) adding FD instructions in the core, buildroot, kernel, hydro, retaining the soft-float ABI (ilp32): hydro works So while SpinalHDL/VexRiscv#167 helped by enabling boot (init no longer crashes), there's still a C+FD issue as C itself was not causing issue for my crypto tests (including running large binaries such as python3). Other than that one issue, fantastic job to have a numerical code like hydro working fine on the SoC :-) |
Cool ^^
So, here, try to pull the vexriscv dev, i did one fix, which fix one bad interraction between wfi and the fpu which could in some very specific condition (i don't think that the issue here) hang the system. By the way what is that hydro binary doing ? |
Moving the core to dev (to get the fix) did not help unfortunately. For Hydro:
It's a numerical code really designed to run on multiple big HPC-oriented servers with high-speed interconnect, not inside a small SoC, so it's impressive it actually works :-) (though I've not enabled any parallelism yet). |
Ok ^^ Hmm, how many CPU core do you have in the SoC, and how many core per FPU ? Because that may be a important delta with my actual configuration. |
1 core so far. Will try with 2 cores/1 FPU (I don't think I can fit 2 cores/2 FPUs in my 35T). Update: 2 cores/1 FPU doesn't help, hydro still crashes the SoC for me on rv32imafdc/ilp32d |
ok, I'm trying now to build the whole system on litex |
Localy, i added the possibility to have RVC on linux-on-litex-vexriscv, got it to work :
But i have to try again with one specific config disabled |
I successfully recreated a hang on the hydro binary. So i'm trying to setup something simlar in SaxonSoc but with a slower main clock. |
@rdolbeau So right, as a workaround for now, just turn injectorStage to true, it should solve the issue. I'm working now on recreating the hang in simulation. |
Thanks for the catch ^^ |
@Dolu1990 Hehe I was synthesizing so soon as I saw the prospective fix :-) So I can confirm yes, injectorStage=true solves the problem; hydro is working with RV32IMAFDC/ILP32D now! It also seem to improve WNS, but performance is a bit lower (the MFlops in the output is a rough estimate but is useful to compare configurations with the same binary). as a mini-app, hydro is really made to test and evaluate systems, so it did its job once more :-) Next step, back to my own merged branch and testing OpenMP parallelism. |
Cool ^^ So, about the fpu, got 42 MFLOPS single threaded with mandelbrot test 100 Mhz. This could probably be pushed higher if that was properly scheduled in assembly. |
@Dolu1990 I tried my merged version with injectorStage=true, hydro works in that configuration as well, alongside @mjosaarinen test codes from https://github.com/rvkrypto/rvkrypto-fips . So I now have a RV32IMAFDCBK core booting Linux and running codes :-) Perhaps the first one as FPUs aren't common in RV32 FPGA cores and B/K are still under review... Won't be able to fit 2 cores in the 35T, I'm already at > 83% of LUTs:
|
@rdolbeau SpinalHDL/VexRiscv@6f481f5 fix the issue properly. Got RVC without injectorStage to work on Saxon (lower frequancy to meet timings) So i would say, keep the injectorStage to ensure timing pass :) |
Greate :D Hmm one question that i have is how to verify that the output of the simulation is right ? |
If there's some major problem, the two columns after "step = XYZ" will be different (and therefore, wrong), so step 10 should be "step= 10, 3.41210e-02, 4.55100e-03" always. Everything else is timing-related and irrelevant to the accuracy (in particular, the last two lines may look like results but are actually raw timings and percentage of total timings for various parts of the computation). Anything computed on the domain is just thrown away by default, as it's really a test code and not a production code. Raising the parameter 'nstepmax' in the NML file will let the simulation run longer, to see if it diverges from a reference run on a different machine. Not sure how long it will remain accurate, as the domain size is very small to fit the SoC compared to what would test normally (nx = ny = 80 vs. on the order of thousands for short single-system runs to millions or more for multi-systems runs). And for kicks, if C is too easy, CloverLeaf from https://github.com/UK-MAC/ is quite similar but in Fortran :-) (actually the more relevant version of HydroC nowadays is the C++ version, but I find the C version easier to handle for a first test). Edit: did not see the fix had already landed :-) Will try that ASAP. |
Cool thanks :D
Be carefull, having RVC and not enabling the injectorStage can easily lead into critical path failure, especialy with the FPU integration. Keep an eye on the ciritcal slack ^^ |
@Dolu1990 Edit: went back to previous stage, still works, got the two commits again, works with the injector ?!? ; trying again without... |
Ahh that's weird. |
(to be sure exactly what we are talking about) |
Actually... timings are failing when injector=false, as you predicted :-/ Upd: works at a lower frequency even without the injector (which you definitely want for an instruction-rich core like mine...) |
Hi,
Got the FPU added in VexRiscv and linux-on-litex-vexriscv (not merged yet) via
There is some documentation about the FPU here :
https://github.com/SpinalHDL/VexRiscv/tree/fiber#fpu
In short,
TODO :
Getting the buildroot image to build (see #214)
The only requirements to get the FPU to work in buildroot is to add the following in the buildroot defconfig :
BR2_RISCV_ISA_CUSTOM_RVF=y
BR2_RISCV_ISA_CUSTOM_RVD=y
And set the CONFIG_FPU of the linux.config to y
The DTS generation is already fixed, opensbi do not seems to require a rebuild.
The text was updated successfully, but these errors were encountered: