picolibc: Rocket support broken? #1049

Closed
enjoy-digital opened this issue Sep 28, 2021 · 24 comments

@enjoy-digital
Owner

Reported by @gsomlo: https://libera.irclog.whitequark.org/litex/2021-09-27#30904583

cc @kgugala, @michalsieron.

@enjoy-digital
Owner Author

The issue seems related to atomics support: changing the architecture from rv64imac to rv64imc here makes it work.

It can be reproduced with: litex_sim --threads 4 --opt-level Ofast --cpu-type rocket. The UART RX IRQs behave correctly, but the call to picolibc's getchar() never seems to return.

@enjoy-digital
Owner Author

a588c3b fixes the issue (at least provides a workaround).

@gsomlo
Collaborator

gsomlo commented Sep 28, 2021

While a588c3b does fix it in the simulator, on the FPGA I still get no console output at all. Using the nexys4ddr as an example:

litex-boards/litex_boards/targets/digilent_nexys4ddr.py --build --cpu-type rocket --cpu-variant linux --sys-clk-freq 50e6 --with-ethernet --with-sdcard

@enjoy-digital
Owner Author

enjoy-digital commented Sep 28, 2021

As discussed with @gsomlo, the issue can be reproduced in simulation when enabling ethernet:

litex_sim --threads 4 --opt-level Ofast --cpu-type rocket --integrated-main-ram-size=0x10000 (works)
litex_sim --threads 4 --opt-level Ofast --cpu-type rocket --integrated-main-ram-size=0x10000 --with-ethernet (no longer works).

Removing -flto here fixes the --with-ethernet case but breaks the other...

@gsomlo
Collaborator

gsomlo commented Sep 28, 2021

Removing -flto allows the FPGA deployment (on nexys4ddr, with both ethernet and sdcard) to start, load linux, and work as well as before the picolibc switch-over.

It also results in no output whatsoever on the FPGA system's console when built without ethernet and sdcard (just the cpu) -- same tradeoff as shown by @enjoy-digital in simulation.

@enjoy-digital
Owner Author

Thanks @gsomlo, I'll try to understand the issue tomorrow.

@kgugala
Collaborator

kgugala commented Sep 28, 2021

@gsomlo @enjoy-digital I'll check this in Renode; this should give us some insight into what is happening here.

@enjoy-digital
Owner Author

@gsomlo: @kgugala will try to have someone look at this. Sorry for the temporary breakage...

@gsomlo
Collaborator

gsomlo commented Sep 30, 2021 via email

@kgugala
Collaborator

kgugala commented Sep 30, 2021

@gsomlo which toolchain do you use?

@gsomlo
Collaborator

gsomlo commented Sep 30, 2021 via email

@gsomlo
Collaborator

gsomlo commented Sep 30, 2021 via email

@enjoy-digital
Owner Author

@kgugala: I've been testing with the one from litex_setup.py: riscv64-unknown-elf-gcc-8.3.0-2019.08.0-x86_64.

@gsomlo
Collaborator

gsomlo commented Oct 1, 2021 via email

@gsomlo
Collaborator

gsomlo commented Oct 2, 2021

@kgugala @enjoy-digital : unfortunately, same behavior with the latest riscv-gnu-toolchain (pre-built binary here in case anyone wants to try it without waiting for the build).

@j-piecuch
Contributor

j-piecuch commented Oct 4, 2021

After some investigation with @kgugala and other team members, we've boiled it down to two issues:

00000000100021d6 <fgetc>:
    100021d6:   00454783                lbu     a5,4(a0)
    100021da:   8b85                    andi    a5,a5,1
    100021dc:   c3b9                    beqz    a5,10002222 <fgetc+0x4c>
    100021de:   1141                    addi    sp,sp,-16
    100021e0:   e022                    sd      s0,0(sp)
    100021e2:   e406                    sd      ra,8(sp)
    100021e4:   842a                    mv      s0,a0
    100021e6:   4781                    li      a5,0
    100021e8:   08f527af                amoswap.w       a5,a5,(a0) <= this instruction causes a Store/AMO access fault
    ...

I simulated the system using litex_sim, since I don't have access to hardware.

In my case, the faulting address is 0x11000000, which is right at the beginning of SRAM, so this is definitely not an alignment issue.
There's clearly a problem with atomic memory operations here. Perhaps atomic memory operations are not supported on SRAM (just a thought).

Compiling picolibc with atomic-ungetc=false prevents the compiler from emitting an atomic instruction here, so we don't get any faults. Still, this is just a workaround.
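For context on what the option changes: the disassembly suggests that, with atomic-ungetc enabled, fgetc() takes any pushed-back character with an atomic swap on the FILE object itself (the amoswap.w operates on (a0), the FILE pointer), so the fault hits wherever that object lives, here apparently SRAM. The following is a minimal, hypothetical C11 sketch of such a one-character pushback slot, not picolibc's actual code; the struct, the function names, the ATOMIC_UNGETC_SKETCH switch and the "0 = empty, c + 1 otherwise" encoding are all assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-in for picolibc's atomic-ungetc build option:
 * 1 models atomic-ungetc=true, 0 models atomic-ungetc=false. */
#ifndef ATOMIC_UNGETC_SKETCH
#define ATOMIC_UNGETC_SKETCH 1
#endif

/* Hypothetical FILE-like object with a one-character pushback slot.
 * Encoding (an assumption): 0 = empty, otherwise pushed-back char + 1. */
struct file_sketch {
#if ATOMIC_UNGETC_SKETCH
    atomic_uint unget_slot;
#else
    unsigned int unget_slot;
#endif
};

/* Push back one character; returns c, or EOF if c is EOF or the slot is full. */
static int ungetc_sketch(int c, struct file_sketch *f)
{
    if (c == EOF)
        return EOF;
    unsigned int val = (unsigned char)c + 1u;
#if ATOMIC_UNGETC_SKETCH
    unsigned int expected = 0;
    /* Store only if the slot is currently empty. */
    if (!atomic_compare_exchange_strong(&f->unget_slot, &expected, val))
        return EOF;
#else
    if (f->unget_slot != 0)
        return EOF;
    f->unget_slot = val;
#endif
    return (unsigned char)c;
}

/* Consume the pushed-back character, leaving the slot empty.
 * Returns the character, or -1 if nothing was pushed back. */
static int take_unget_sketch(struct file_sketch *f)
{
#if ATOMIC_UNGETC_SKETCH
    /* Exchange with 0: with the A extension this may compile to an
     * AMO such as amoswap.w, which faults on regions without A/C. */
    unsigned int old = atomic_exchange(&f->unget_slot, 0u);
#else
    /* Plain load and store: no AMO is emitted. */
    unsigned int old = f->unget_slot;
    f->unget_slot = 0;
#endif
    return old ? (int)(old - 1u) : -1;
}
```

Building with -DATOMIC_UNGETC_SKETCH=0 models the atomic-ungetc=false workaround: single-threaded behaviour is identical, but no AMO is emitted.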

@enjoy-digital
Owner Author

Thanks for looking at this @j-piecuch, @kgugala. Setting atomic-ungetc=false was indeed just a workaround.

@gsomlo
Collaborator

gsomlo commented Oct 4, 2021

> In my case, the faulting address is 0x11000000, which is right at the beginning of SRAM, so this is definitely not an alignment issue. There's clearly a problem with atomic memory operations here. Perhaps atomic memory operations are not supported on SRAM (just a thought).

Rocket routes its load/store instructions based on its own internal memory map, through one of two AXI ports dedicated to either MMIO (< 0x80000000) or MEM (>= 0x80000000).

There's a point-to-point link between the Rocket MEM AXI port and LiteDRAM. Rocket also has its internal L1 [i,d]cache, and MEM load/store accesses visible on the external MEM AXI port are "south" of the L1 cache (and maintain cache coherency between Rocket's L1 and LiteDRAM).

Accesses below 0x80000000 are routed through the MMIO AXI port (bypassing the L1 cache), which is then converted to WB by LiteX and connected to the shared MMIO bus, which also has the SRAM connected to it.

Long story short, SRAM accesses are routed in a way that bypasses the L1 cache, which may or may not explain the behavior w.r.t. atomic memory operations. I'll try to understand this better myself, but figured it'd be useful to clarify just in case...
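Reduced to code, the routing described above is a single address comparison. A minimal C sketch, assuming the 0x80000000 boundary given above; the enum, macro and function names are hypothetical, and this models only where Rocket sends data loads and stores, not the AXI or Wishbone plumbing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of Rocket's external port selection in this LiteX setup. */
enum rocket_port { PORT_MMIO, PORT_MEM };

#define ROCKET_MEM_BASE 0x80000000ull /* boundary described in the thread */

/* MEM port (>= 0x80000000): point-to-point AXI link to LiteDRAM.
 * MMIO port (< 0x80000000): AXI converted to Wishbone by LiteX and
 * shared with the other peripherals, including the SRAM. */
static enum rocket_port rocket_route(uint64_t addr)
{
    return addr >= ROCKET_MEM_BASE ? PORT_MEM : PORT_MMIO;
}

/* Only MEM-port data accesses go through Rocket's L1; MMIO ones bypass it. */
static bool rocket_goes_through_l1(uint64_t addr)
{
    return rocket_route(addr) == PORT_MEM;
}
```

With this model, the faulting address 0x11000000 lands on the MMIO port, i.e. outside the cached path.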

@gsomlo
Collaborator

gsomlo commented Oct 5, 2021

After a bit more digging, I found out that

  1. Rocket throws a trap when atomics are used on an "unsupported region" (see chipsalliance/rocket-chip#473, "Rocket should trap on unsupported AMOs").
  2. The "mmio" range is such an unsupported region:
    Generated Address Map
            0 -      1000 ARWX  debug-controller@0
         3000 -      4000 ARWX  error-device@3000
        10000 -     20000  R X  rom@10000
      2000000 -   2010000 ARW   clint@2000000
      c000000 -  10000000 ARW   interrupt-controller@c000000
     10000000 -  80000000  RWX  mmio-port-axi4@10000000
     80000000 - 100000000  RWXC memory@80000000
    
    Atomics are supported only for regions that have attribute A or C, and mmio-port-axi has neither.
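The A-or-C rule can be applied mechanically to the generated map. In this sketch the region table is transcribed from the address map above, while the flag, struct and function names are hypothetical; the real check is performed by Rocket itself when the AMO executes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Attribute letters as printed in Rocket's "Generated Address Map". */
enum {
    ATTR_A = 1u << 0, /* atomics supported */
    ATTR_R = 1u << 1,
    ATTR_W = 1u << 2,
    ATTR_X = 1u << 3,
    ATTR_C = 1u << 4, /* cacheable */
};

struct region {
    uint64_t base, end; /* half-open range [base, end) */
    unsigned attrs;
    const char *name;
};

/* Transcribed from the address map above. */
static const struct region rocket_map[] = {
    {0x0,        0x1000,      ATTR_A | ATTR_R | ATTR_W | ATTR_X, "debug-controller"},
    {0x3000,     0x4000,      ATTR_A | ATTR_R | ATTR_W | ATTR_X, "error-device"},
    {0x10000,    0x20000,     ATTR_R | ATTR_X,                   "rom"},
    {0x2000000,  0x2010000,   ATTR_A | ATTR_R | ATTR_W,          "clint"},
    {0xc000000,  0x10000000,  ATTR_A | ATTR_R | ATTR_W,          "interrupt-controller"},
    {0x10000000, 0x80000000,  ATTR_R | ATTR_W | ATTR_X,          "mmio-port-axi4"},
    {0x80000000, 0x100000000, ATTR_R | ATTR_W | ATTR_X | ATTR_C, "memory"},
};

/* Returns the region containing addr, or NULL if addr is unmapped. */
static const struct region *find_region(uint64_t addr)
{
    for (size_t i = 0; i < sizeof rocket_map / sizeof rocket_map[0]; i++)
        if (addr >= rocket_map[i].base && addr < rocket_map[i].end)
            return &rocket_map[i];
    return NULL;
}

/* An AMO is allowed only if its region has attribute A or C;
 * otherwise Rocket raises a Store/AMO access fault. */
static bool amo_allowed(uint64_t addr)
{
    const struct region *r = find_region(addr);
    return r != NULL && (r->attrs & (ATTR_A | ATTR_C)) != 0;
}
```

For the two addresses discussed in this thread, amo_allowed(0x11000000) is false (the SRAM sits inside mmio-port-axi4) and amo_allowed(0x80000000) is true (memory), consistent with the Store/AMO access fault @j-piecuch observed.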

The question now is what, if anything, should be done about it. I'd argue that since the LiteX bios explicitly spins all but one of the cores while loading something into the main memory (which does support atomics), it doesn't really need to use (and/or have support for) atomics. But I'm arguing that with weak confidence, and I'm definitely open to counter arguments... :)

@j-piecuch
Contributor

j-piecuch commented Oct 5, 2021

@gsomlo good find! That address map clearly indicates the culprit. I'm assuming that C indicates that the region is cacheable, correct?

Your argument for disabling atomics is totally reasonable IMO. I think disabling them by removing the a (atomics) extension from -march is the way to go, although this might have the side effect of also disabling them for applications loaded into RAM.
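Code that has to build under either -march setting can test __riscv_atomic, which GCC and Clang predefine when the A extension is enabled (per the RISC-V C API). The helper below is a hypothetical sketch; its non-atomic fallback is only safe under the assumption @gsomlo describes for the BIOS, i.e. a single active hart and no interrupt handler touching the same word.

```c
#include <stdint.h>

/* Hypothetical helper: atomic add when the A extension is available,
 * plain read-modify-write otherwise. The fallback is only correct if no
 * other hart or interrupt handler accesses *p concurrently, e.g. while
 * the BIOS keeps all other cores spinning. */
static inline uint64_t counter_add(volatile uint64_t *p, uint64_t v)
{
#if defined(__riscv_atomic)
    /* Built with 'a' in -march (e.g. rv64imac): may compile to amoadd.d. */
    return __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
#else
    /* Built without 'a' (e.g. rv64imc), or on a non-RISC-V host. */
    uint64_t old = *p;
    *p = old + v;
    return old;
#endif
}
```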

On a related note, I was investigating atomic operations in Rocket, and I inserted some atomic operations on main RAM into the bios' main() function, just before the boot sequence. Execution stops at that instruction, probably in some exception loop, which surprised me, since atomic operations should work on main RAM, and I also don't think main RAM needs any initialization.

Do you have any insight into why this might be happening? I know this sounds weird, but maybe the address of the atomic instruction itself matters here? Maybe the atomic instruction itself must be in main RAM?

@gsomlo
Collaborator

gsomlo commented Oct 5, 2021

> I'm assuming that C indicates that the region is cacheable, correct?

Yes, that's correct.

> On a related note, I was investigating atomic operations in Rocket, and I inserted some atomic operations on main RAM into the bios' main() function [...]
> Do you have any insight into why this might be happening? I know this sounds weird, but maybe the address of the atomic instruction itself matters here? Maybe the atomic instruction itself must be in main RAM?

I added the following patch to the bios main():

--- a/litex/soc/software/bios/main.c
+++ b/litex/soc/software/bios/main.c
@@ -83,6 +83,9 @@ int main(int i, char **c)
        struct command_struct *cmd;
        int nb_params;
        int sdr_ok;
+       unsigned long *foo = (unsigned long *)0x80000000;
+       __sync_fetch_and_add(foo, 2);
+       printf("\n###===--- Got Here\n");
 
 #ifdef CONFIG_CPU_HAS_INTERRUPT
        irq_setmask(0);

which translates into something like this in assembly:

0000000010001458 <main>:
    10001458:   714d                    addi    sp,sp,-336
    1000145a:   4785                    li      a5,1
    ...
    10001476:   07fe                    slli    a5,a5,0x1f
    10001478:   4709                    li      a4,2
    1000147a:   0f50000f                fence   iorw,ow
    1000147e:   04e7b02f                amoadd.d.aq     zero,a4,(a5)
    ...

While this does hang in the simulator, it works fine on a real FPGA (nexys4ddr in my case). Might it have something to do with the accuracy of how main-ram is simulated in litex_sim? @enjoy-digital might be able to speak to that with more authority.

EDIT: Oh, and @j-piecuch: sorry I missed your question about where the Rocket memory map came from. It's something that gets printed out to the console during Chisel elaboration while building the precompiled Rocket Verilog (part of what happens when you run https://github.com/litex-hub/pythondata-cpu-rocket/blob/master/pythondata_cpu_rocket/verilog/update.sh). The same information is stored (in much less human-readable JSON) in the *.memmap.json files under https://github.com/litex-hub/pythondata-cpu-rocket/tree/master/pythondata_cpu_rocket/verilog/generated-src/

EDIT 1: I also tried __sync_fetch_and_add() to address 0x11000000, and that did hang on the FPGA as well as in sim.

@troibe
Contributor

troibe commented Mar 17, 2022

@gsomlo @enjoy-digital
What is the current state of this issue?
If it still exists, is there some way I can help solve it?

Currently, on both Rocket and BlackParrot, I'm unable to start BBL or the demo on my Arty board.
I guess this is what issues #1168 and #1212 refer to as well.

Oh, and not using -flto, as mentioned at the beginning of this thread, fixes it on Rocket (but not on BlackParrot).

@troibe
Contributor

troibe commented Mar 20, 2022

Never mind, it turns out my toolchain was broken.
Upgrading to a more recent version allows me to leave -flto enabled.

@enjoy-digital
Owner Author

I think all the issues have been solved and that we can now close this. @gsomlo: Please re-open if it's not the case.

5 participants