picolibc: Rocket support broken? #1049

enjoy-digital · 2021-09-28T07:27:43Z

Reported by @gsomlo: https://libera.irclog.whitequark.org/litex/2021-09-27#30904583

enjoy-digital · 2021-09-28T12:38:03Z

The issue seems related to Atomics support: Changing the architecture from rv64imac to rv64imc here makes it work.

It can be reproduced with: litex_sim --threads 4 --opt-level Ofast --cpu-type rocket. The UART RX IRQs behave correctly, but the call to picolibc's getchar() seemingly never returns.

enjoy-digital · 2021-09-28T12:56:31Z

a588c3b fixes the issue (at least provides a workaround).

gsomlo · 2021-09-28T13:56:39Z

While a588c3b does fix it in the simulator, on the FPGA I still get no console output at all. Using the nexys4ddr as an example:

litex-boards/litex_boards/targets/digilent_nexys4ddr.py --build --cpu-type rocket --cpu-variant linux --sys-clk-freq 50e6 --with-ethernet --with-sdcard

enjoy-digital · 2021-09-28T17:29:35Z

As discussed with @gsomlo, the issue can be reproduced in simulation when enabling ethernet:

litex_sim --threads 4 --opt-level Ofast --cpu-type rocket --integrated-main-ram-size=0x10000 (works)
litex_sim --threads 4 --opt-level Ofast --cpu-type rocket --integrated-main-ram-size=0x10000 --with-ethernet (no longer works)

Removing -flto here fixes the --with-ethernet case but breaks the other...

gsomlo · 2021-09-28T19:04:26Z

Removing -flto allows the FPGA deployment (on nexys4ddr, with both ethernet and sdcard) to start, load linux, and work as well as before the picolibc switch-over.

It also results in no output whatsoever on the FPGA system's console when built without ethernet and sdcard (just the cpu) -- same tradeoff as shown by @enjoy-digital in simulation.

enjoy-digital · 2021-09-28T19:27:24Z

Thanks @gsomlo, I'll try to understand the issue tomorrow.

kgugala · 2021-09-28T19:49:09Z

@gsomlo @enjoy-digital I'll check this in renode - this should give us some insight into what is happening here

enjoy-digital · 2021-09-30T17:08:17Z

@gsomlo: @kgugala will try to have someone look at this. Sorry for the temporary breakage...

gsomlo · 2021-09-30T17:23:11Z

@enjoy-digital, @kgugala: thanks! I have no experience with LTO, and based on my brief googling it's a relatively new thing where people still run into the occasional toolchain bug. That said, it's super weird that turning it off fixes the case where ethernet/sdcard peripherals are added, but breaks the case where no peripherals are configured (on both fpga and simulator). Can't wait to find out what the actual problem was!

BTW: a588c3b (ability to type at the litex prompt) is apparently completely orthogonal to LTO, and the problem manifests (and is solved by that commit) identically on both fpga and simulator (again, specific to rocket only, apparently), just for the record :)

Thanks again,
--Gabriel

kgugala · 2021-09-30T17:32:09Z

@gsomlo which toolchain do you use?

gsomlo · 2021-09-30T17:39:19Z

https://github.com/riscv/riscv-gnu-toolchain

I'm on git commit 7553f35 (from mid-December 2020). Not sure what @enjoy-digital is using, but he's experienced the same symptoms on his end.

gsomlo · 2021-09-30T17:42:20Z

Oh and I should mention there's a pre-built binary of that I've made available at http://www.contrib.andrew.cmu.edu/~somlo/BTCP/RISCV-20201216git7553f35.tar.xz (description of how I built it at https://github.com/litex-hub/linux-on-litex-rocket part 3 of the Prerequisites section).

enjoy-digital · 2021-10-01T12:15:49Z

@kgugala: I've been testing with the one from litex_setup.py: riscv64-unknown-elf-gcc-8.3.0-2019.08.0-x86_64.

gsomlo · 2021-10-01T14:12:00Z

I'm currently building the latest riscv-gnu-toolchain (commit b39e361 tagged "2021.09.21"), and will report back on whether using it makes any difference in the observed behavior of litex+rocket.

gsomlo · 2021-10-02T12:15:44Z

@kgugala @enjoy-digital : unfortunately, same behavior with the latest riscv-gnu-toolchain (pre-built binary here in case anyone wants to try it without waiting for the build).

j-piecuch · 2021-10-04T15:41:27Z

After some investigation with @kgugala and other team members, we've boiled it down to two issues:

1. Misaligned memory accesses due to the lack of explicit .align directives in crt0.S; fixed by "cpu/rocket: naturally align data defined in crt0.S" (#1057).
2. The CPU faults with exception code 7 (Store/AMO access fault) upon execution of the following instruction in picolibc's fgetc():

00000000100021d6 <fgetc>:
    100021d6:   00454783                lbu     a5,4(a0)
    100021da:   8b85                    andi    a5,a5,1
    100021dc:   c3b9                    beqz    a5,10002222 <fgetc+0x4c>
    100021de:   1141                    addi    sp,sp,-16
    100021e0:   e022                    sd      s0,0(sp)
    100021e2:   e406                    sd      ra,8(sp)
    100021e4:   842a                    mv      s0,a0
    100021e6:   4781                    li      a5,0
    100021e8:   08f527af                amoswap.w       a5,a5,(a0) <= this instruction causes a Store/AMO access fault
    ...

I simulated the system using litex_sim, since I don't have access to hardware.

In my case, the faulting address is 0x11000000, which is right at the beginning of SRAM, so this is definitely not an alignment issue.
There's clearly a problem with atomic memory operations here. Perhaps atomic memory operations are not supported on SRAM (just a thought).

Compiling picolibc with atomic-ungetc=false prevents the compiler from emitting an atomic instruction here, so we don't get any faults. Still, this is just a workaround.
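For illustration, the difference that option makes can be sketched in portable C11. This is a minimal model and not picolibc's actual source: `struct fake_file`, `take_unget_atomic()` and `take_unget_plain()` are hypothetical names, and the field layout is only inferred from the disassembly above (flags byte at offset 4, the swapped word at offset 0). On RV64 with the A extension, a compiler would typically lower the atomic exchange to an `amoswap.w`, whereas the relaxed load/store pair needs no AMO instruction.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a stdio stream; not picolibc's real struct. */
struct fake_file {
    _Atomic uint32_t unget;  /* pushed-back character, 0 when empty (offset 0) */
    uint8_t flags;           /* bit 0: stream is readable (offset 4) */
};

/* Atomic variant: fetch the pushed-back value and clear it in one step.
 * This is the kind of operation that becomes an AMO instruction. */
uint32_t take_unget_atomic(struct fake_file *f)
{
    return atomic_exchange(&f->unget, 0);
}

/* Non-atomic variant: a separate relaxed load and store.
 * Not safe against concurrent ungetc(), but needs no AMO support. */
uint32_t take_unget_plain(struct fake_file *f)
{
    uint32_t old = atomic_load_explicit(&f->unget, memory_order_relaxed);
    atomic_store_explicit(&f->unget, 0, memory_order_relaxed);
    return old;
}
```

Both variants return the same value on a single hart; the only difference is whether the memory region holding the stream has to support AMOs.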

enjoy-digital · 2021-10-04T16:04:57Z

Thanks for looking at this @j-piecuch, @kgugala. Setting atomic-ungetc=false was indeed just a workaround.

gsomlo · 2021-10-04T20:14:18Z

> In my case, the faulting address is 0x11000000, which is right at the beginning of SRAM, so this is definitely not an alignment issue. There's clearly a problem with atomic memory operations here. Perhaps atomic memory operations are not supported on SRAM (just a thought).

Rocket routes its load/store instructions based on its own internal memory map, through one of two AXI ports dedicated to either MMIO (< 0x80000000) or MEM (>= 0x80000000).

There's a point-to-point link between the Rocket MEM AXI port and LiteDRAM. Rocket also has its internal L1 [i,d]cache, and MEM load/store accesses visible on the external MEM AXI port are "south" of the L1 cache (and maintain cache coherency between Rocket's L1 and LiteDRAM).

Accesses below 0x80000000 are routed through the MMIO AXI port (bypassing the L1 cache), which is then converted to WB by LiteX and connected to the shared MMIO bus, which also has the SRAM connected to it.

So long story short, SRAM accesses are routed in a way that bypasses the L1 cache, which may or may not explain the behavior w.r.t. atomic memory operations. I'll try to understand this better myself, but figured it'd be useful to clarify just in case...
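A minimal sketch of that routing, assuming only the 0x80000000 split described above and ignoring the devices internal to Rocket. The names `ROCKET_MEM_BASE`, `route_access()` and `goes_through_l1()` are hypothetical and only model the description; they are not taken from Rocket or LiteX.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed boundary between the MMIO and MEM AXI ports. */
#define ROCKET_MEM_BASE 0x80000000ULL

enum rocket_port { PORT_MMIO, PORT_MEM };

/* Pick the AXI port a load/store would leave the core on. */
enum rocket_port route_access(uint64_t addr)
{
    return addr >= ROCKET_MEM_BASE ? PORT_MEM : PORT_MMIO;
}

/* Only MEM accesses sit behind the L1 cache; MMIO accesses bypass it. */
bool goes_through_l1(uint64_t addr)
{
    return route_access(addr) == PORT_MEM;
}
```

Under this model the faulting SRAM address 0x11000000 maps to the MMIO port and bypasses L1, while anything in LiteDRAM at or above 0x80000000 goes through the cache.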

gsomlo · 2021-10-05T11:04:28Z

After a bit more digging, I found out that:

1. Rocket throws a trap when atomics are used on an "unsupported region" (see "Rocket should trap on unsupported AMOs", chipsalliance/rocket-chip#473).

2. The "mmio" range is such an unsupported region:

Generated Address Map
        0 -      1000 ARWX  debug-controller@0
     3000 -      4000 ARWX  error-device@3000
    10000 -     20000  R X  rom@10000
  2000000 -   2010000 ARW   clint@2000000
  c000000 -  10000000 ARW   interrupt-controller@c000000
 10000000 -  80000000  RWX  mmio-port-axi4@10000000
 80000000 - 100000000  RWXC memory@80000000

Atomics are supported only for regions that have attribute A or C, and mmio-port-axi4 has neither.
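The rule can be sketched as a small lookup over the generated map. This is only an illustrative model: `struct region`, the `ATTR_*` flags and `amo_supported()` are hypothetical names, and the table simply transcribes the address map printed above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Attribute letters as printed in the generated address map. */
enum { ATTR_A = 1, ATTR_R = 2, ATTR_W = 4, ATTR_X = 8, ATTR_C = 16 };

struct region {
    uint64_t base;   /* inclusive */
    uint64_t end;    /* exclusive */
    unsigned attrs;
    const char *name;
};

static const struct region address_map[] = {
    { 0x0ULL,        0x1000ULL,      ATTR_A | ATTR_R | ATTR_W | ATTR_X, "debug-controller" },
    { 0x3000ULL,     0x4000ULL,      ATTR_A | ATTR_R | ATTR_W | ATTR_X, "error-device" },
    { 0x10000ULL,    0x20000ULL,     ATTR_R | ATTR_X,                   "rom" },
    { 0x2000000ULL,  0x2010000ULL,   ATTR_A | ATTR_R | ATTR_W,          "clint" },
    { 0xc000000ULL,  0x10000000ULL,  ATTR_A | ATTR_R | ATTR_W,          "interrupt-controller" },
    { 0x10000000ULL, 0x80000000ULL,  ATTR_R | ATTR_W | ATTR_X,          "mmio-port-axi4" },
    { 0x80000000ULL, 0x100000000ULL, ATTR_R | ATTR_W | ATTR_X | ATTR_C, "memory" },
};

/* An AMO is expected to work only if the target region has A or C. */
bool amo_supported(uint64_t addr)
{
    for (size_t i = 0; i < sizeof address_map / sizeof address_map[0]; i++) {
        const struct region *r = &address_map[i];
        if (addr >= r->base && addr < r->end)
            return (r->attrs & (ATTR_A | ATTR_C)) != 0;
    }
    return false; /* unmapped address */
}
```

With this table, `amo_supported(0x11000000)` (SRAM behind the MMIO port) is false and `amo_supported(0x80000000)` (main memory) is true, which is consistent with the fault reported for `amoswap.w` on SRAM.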

The question now is what, if anything, should be done about it. I'd argue that since the LiteX bios explicitly spins all but one of the cores while loading something into the main memory (which does support atomics), it doesn't really need to use (and/or have support for) atomics. But I'm arguing that with weak confidence, and I'm definitely open to counter arguments... :)

j-piecuch · 2021-10-05T13:15:53Z

@gsomlo good find! That address map clearly indicates the culprit. I'm assuming that C indicates that the region is cacheable, correct?

Your argument for disabling atomics is totally reasonable IMO. I think disabling them by removing a from -march is the way to go, although this might have the side effect of also disabling them for applications loaded into RAM.

On a related note, I was investigating atomic operations in Rocket, and I inserted some atomic operations on main RAM into the bios' main() function, just before the boot sequence. Execution stops at that instruction, probably in some exception loop, which surprised me, since atomic operations should work on main RAM, and I also don't think main RAM needs any initialization.

Do you have any insight into why this might be happening? I know this sounds weird, but maybe the address of the atomic instruction itself matters here? Maybe the atomic instruction itself must be in main RAM?

gsomlo · 2021-10-05T14:53:58Z

> I'm assuming that C indicates that the region is cacheable, correct?

Yes, that's correct.

> On a related note, I was investigating atomic operations in Rocket, and I inserted some atomic operations on main RAM into the bios' main() function [...]
> Do you have any insight into why this might be happening? I know this sounds weird, but maybe the address of the atomic instruction itself matters here? Maybe the atomic instruction itself must be in main RAM?

I added the following patch to the bios main():

--- a/litex/soc/software/bios/main.c
+++ b/litex/soc/software/bios/main.c
@@ -83,6 +83,9 @@ int main(int i, char **c)
        struct command_struct *cmd;
        int nb_params;
        int sdr_ok;
+       unsigned long *foo = (unsigned long *)0x80000000;
+       __sync_fetch_and_add(foo, 2);
+       printf("\n###===--- Got Here\n");
 
 #ifdef CONFIG_CPU_HAS_INTERRUPT
        irq_setmask(0);

which translates into something like this in assembly:

0000000010001458 <main>:
    10001458:   714d                    addi    sp,sp,-336
    1000145a:   4785                    li      a5,1
    ...
    10001476:   07fe                    slli    a5,a5,0x1f
    10001478:   4709                    li      a4,2
    1000147a:   0f50000f                fence   iorw,ow
    1000147e:   04e7b02f                amoadd.d.aq     zero,a4,(a5)
    ...

While this does hang in the simulator, it works fine on a real FPGA (nexys4ddr in my case). Might it have something to do with the accuracy of how main-ram is simulated in litex_sim? @enjoy-digital might be able to speak to that with more authority.

EDIT: Oh, and @j-piecuch : sorry I missed your question about where the Rocket memory map came from. It's something that gets printed out to the console during chisel elaboration while building the precompiled rocket Verilog (part of what happens when you run https://github.com/litex-hub/pythondata-cpu-rocket/blob/master/pythondata_cpu_rocket/verilog/update.sh). The same information is stored (in much less human-readable json) in https://github.com/litex-hub/pythondata-cpu-rocket/tree/master/pythondata_cpu_rocket/verilog/generated-src/ *.memmap.json

EDIT 1: I also tried __sync_fetch_and_add() to address 0x11000000, and that did hang on the FPGA as well as in sim.

troibe · 2022-03-17T18:27:45Z

@gsomlo @enjoy-digital
What is the current state of this issue?
If it still exists, is there some way I can contribute to solving it?

Currently on both Rocket and Blackparrot I'm unable to start BBL or the demo using my Arty board.
I guess this is what the issues #1168 and #1212 refer to as well.

Oh and not using -flto as mentioned in the beginning of this thread fixes it on Rocket (but not BlackParrot).

troibe · 2022-03-20T14:44:55Z

Never mind, it turns out my toolchain was broken.
Upgrading to a more recent version allows me to leave -flto enabled.

enjoy-digital · 2022-03-29T07:42:30Z

I think all the issues have been solved and that we can now close this. @gsomlo: Please re-open if it's not the case.

enjoy-digital added the bug? label Sep 28, 2021

enjoy-digital added software-bug and removed bug? labels Sep 28, 2021

j-piecuch mentioned this issue Oct 4, 2021

cpu/rocket: naturally align data defined in crt0.S #1057

Merged

troibe mentioned this issue Mar 18, 2022

Version of LiteX tongchen126/Boot-Debian-On-Litex-Rocket#1

Open

enjoy-digital closed this as completed Mar 29, 2022
