Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLVM can't target the Playstation CPU (MIPS-I) #1

Open
simias opened this issue May 18, 2015 · 19 comments
Open

LLVM can't target the Playstation CPU (MIPS-I) #1

simias opened this issue May 18, 2015 · 19 comments

Comments

@simias
Copy link
Owner

simias commented May 18, 2015

I've been struggling for the past few days with extremely erratic behaviour from my playstation code. Simple code would work correctly but when things get more complicated (in particular function calls and BIOS calls) the program would appear to behave semi-randomly (bus errors, freezes, functions get executed several times in a row when they shouldn't etc...)

After some intense hair pulling I figured out the cause of the problem: the MIPS-I architecture has load delay slots. It means that when the CPU loads a value from memory (through a lw instruction for instance) the next instruction will run before the value is put in the target register. That means that you have to wait one cycle (using a nop or some independant instruction) before you can use the value.

The problem is that LLVM doesn't support MIPS-I at the moment and MIPS-II doesn't have this constraint. So for instance when the compiler generates the function epilogue that pops the value of ra (the return address) from the stack and jumps to it we end up with something like:

lw      ra,20(sp)
jr      ra

Unfortunately this code doesn't work on the Playstation's R2000 CPU: the lw doesn't have the time to finish before the jr is called (the jr is in the load delay slot), so the jump takes whatever value was in ra before the lw instruction. And that explains the very erratic behaviour.

Unfortunately as far as I can tell LLVM doesn't support MIPS-I at all, if I attempt to put "cpu": "mips1" in target.json I get this error:

LLVM ERROR: Code generation for MIPS-I is not implemented

Unfortunately there's no workaround. The solution would be to add MIPS-I support to LLVM but that's of course a pretty ambitious project and my plate is full at the moment.

As long as this issue is not fixed there's no point working on this SDK, nothing but the most trivial code will run as intended on the console.

@lf94
Copy link

lf94 commented Apr 18, 2018

If it's mostly the same, why not copy whatever MIPS-II does and change that behavior? Would be much easier than writing it all from scratch.

@simias
Copy link
Owner Author

simias commented May 8, 2018

I'm not at all familiar with LLVM so I didn't pursue this further. I assume it can't be too complicated but I really have no idea.

@asiekierka
Copy link

There is a patch available at impiaaa/llvm-project@f2ecffe but according to the developer it could use additional testing.

@simias
Copy link
Owner Author

simias commented Jan 17, 2020

That's great news although I must say that I wouldn't even know where to start to generate a rustc using a custom LLVM so I'm probably going to wait until it (hopefully) ends up in a stable release.

A few weeks ago I saw that there was a GCC backend for rustc in the works: https://github.com/sapir/gcc-rust

That might be an other way to get MIPSI support in Rust.

@XaviDCR92
Copy link

I wanted to play around with the Zig programming language (see https://github.com/XaviDCR92/psx-zig), whose compiler is also based on LLVM, so merged the patch into the release/10.x branch of https://github.com/XaviDCR92/llvm-project . It patch seems to work correctly, so I encourage you to resume your work on psx-sdk-rs. Well-known issues with Zig's packed structs are preventing further development, but I think should be now fine with Rust.

@ayrtonm
Copy link

ayrtonm commented Oct 27, 2020

I recently got rustc working with LLVM 11 patched to support load delay slots if there's still interest in this. My fork has instructions in the README along with @impiaaa's LLVM patch (with a tiny fix) and a patch to build target.json and psx.ld into rustc. I checked that both rustc and llc by itself handle load delay slots correctly, though the tests weren't very extensive and I've only run programs on ePSXe.

I think it would've been convenient to use the JSON target, but it seems building the target into rustc is required since libcore won't compile unless the Atomic* API is disabled (by setting max_atomic_width = Some(0) in the target spec). This is because MIPS I doesn't have a sync instruction, unlike MIPS II. Either way rustc has to be compiled using the patched LLVM so it wouldn't have saved much time in building the compiler anyway.

Also I threw in a stripped-down copy of @simias's Makefile for the linker and merged libbios into a submodule of libpsx to make the SDK more user-friendly.

@XaviDCR92
Copy link

@ayrtonm , the delay slot patch seems to work well for me too (I also had to apply the tiny fix to avoid a compile-time error).

However, I am having a curious issue that's making my Zig/C application crash by jumping into an invalid address. The root cause of this issue is $ra being assigned back to a value in $sp, 52, but $sp holds an invalid value and thus undefined behavior occurs.

After some investigation, it looks like bad generated code from the compiler side (I can't tell so far whether it's caused by either the LLVM backend or the Zig frontend). Consider the following lines of C code from PSXSDK:

for(a = 0; a < l; a+=2)
    GPU_DATA_PORT = image[a]|(image[a+1]<<16);

Among other low-level instructions, the following branch instruction is generated:

80012eb8:       14200017        bnez    at,80012f18 <LoadImage+0x1ac>

And this is what address 80012f18 contains (added some lines for context):

80012f00:       03c0e825        move    sp,s8
80012f04:       8fbe0000        lw      s8,0(sp)
80012f08:       8fbf0004        lw      ra,4(sp)
80012f0c:       00000000        nop
80012f10:       03e00008        jr      ra
80012f14:       27bd0008        addiu   sp,sp,8
        while(!(GPU_CONTROL_PORT & (1<<0x1c)));
80012f18:       0000000d        break

80012f1c <GsSetDrawEnv>:
{
80012f1c:       27bdfff8        addiu   sp,sp,-8

Therefore, 80012f18 is, much likely, not the address the program should be jumping to. Instead of leaving $sp to its original value and jumping to $ra, break is executed and GsSetDrawEnv is accidentally executed, causing undefined behaviour.

The compiler patch has added 18 nop instructions in LoadImage. I am not sure whether it's related or not, but 80012f18h - (18d * 4) = 80012ed0h, which casually belongs to the start of the loop.

        for(a = 0; a < l; a+=2)
80012ed0:       24e80002        addiu   t0,a3,2

Would it be possible that the LLVM backend is miscalculating destination addresses in branch instructions such as bnez since the patch is (apparently silently) adding several nop between instruction? I have not investigated other functions yet, but I must admit the application has already had erratic behavior several times now and this might explain it.

@simias
Copy link
Owner Author

simias commented Oct 31, 2020

I haven't touched this code in forever but I'm glad to see that there's progress!

@XaviDCR92 Have you tried to see the difference the various optimization levels make? Regardless of the miscalculated branch addresses it would be odd to end up with a bunch of useless NOPs in optimized code. What I'm getting at here is that a bunch of leftover NOPs with optimizations enabled seems like it would be a good sign that something messes up in the compiler.

@XaviDCR92
Copy link

@simias Zig provides a --release-small flag roughly equivalent to gcc's -Os, but my application would not work at all with these optimizations enabled. I still have to find a reason for this.

OTOH, these apparently useless nop instructions must be there because of the load delay slot problem on MIPS I that you reported on your first message. There is no workaround other than replacing them with useful instructions that do not depend on their previous instruction, but that's not a serious issue.

Surprisingly though, I have found a possible reason why the code above was inadvertently jumping to a break instruction - there is a buffer overflow caused by this loop when called from this function, as pal is only 2 elements long and LoadImage is attempting to access pal[0 .. 15].

gcc would not issue any diagnostic message on this, but zig instead adds runtime bound checks when building in debug mode, thus executing the break instruction once an out-of-bound access occurs. As a quick workaround, I expanded pal from 2 to 16 elements and now GsLoadFont executes successfully.

More worryingly, I have also found other places where zig generates wrong code with break instructions here and there e.g.: GsSetDrawEnv:

80012f1c <GsSetDrawEnv>:
{
80012f1c:       27bdfff8        addiu   sp,sp,-8
80012f20:       afbf0004        sw      ra,4(sp)
80012f24:       afbe0000        sw      s8,0(sp)
80012f28:       03a0f025        move    s8,sp
        draw_mode_packet = (0xe1<<24)|(drawenv->draw_on_display>=1)<<10|
80012f2c:       0000000d        break

80012f30 <gpu_data_ctrl>:
{
80012f30:       27bdfff8        addiu   sp,sp,-8
80012f34:       afbf0004        sw      ra,4(sp)

Compared to the implementation by mipsel-unknown-elf-gcc:

80034bc0 <GsSetDrawEnv>:
{
80034bc0:       27bdffe8        addiu   sp,sp,-24
80034bc4:       afbf0014        sw      ra,20(sp)
        draw_mode_packet = (0xe1<<24)|(drawenv->draw_on_display>=1)<<10|
80034bc8:       90820001        lbu     v0,1(a0)
80034bcc:       00000000        nop
80034bd0:       1440003a        bnez    v0,80034cbc <GsSetDrawEnv+0xfc>
80034bd4:       00802825        move    a1,a0
80034bd8:       3c02e100        lui     v0,0xe100
                (drawenv->dither>=1)<<9;
80034bdc:       90a30000        lbu     v1,0(a1)
        gpu_data_ctrl(0xe3, (drawenv->y<<10)|drawenv->x);
80034be0:       84a40004        lh      a0,4(a1)
                (drawenv->dither>=1)<<9;
80034be4:       0003182b        sltu    v1,zero,v1
80034be8:       00031a40        sll     v1,v1,0x9
        draw_mode_packet = (0xe1<<24)|(drawenv->draw_on_display>=1)<<10|
80034bec:       00431025        or      v0,v0,v1
80034bf0:       3c038008        lui     v1,0x8008
80034bf4:       ac625d78        sw      v0,23928(v1)
        GPU_CONTROL_PORT = 0x01000000;
80034bf8:       3c060100        lui     a2,0x100
80034bfc:       3c031f80        lui     v1,0x1f80
80034c00:       ac661814        sw      a2,6164(v1)
        gpu_data_ctrl(0xe3, (drawenv->y<<10)|drawenv->x);
80034c04:       84a80002        lh      t0,2(a1)
...

In the meantime, I'm improving my fork of PCSX-r so pcsxr notifies whenever it executes a break instruction when debugging with mipsel-unknown-elf-gdb.

@ayrtonm
Copy link

ayrtonm commented Nov 23, 2020

After playing around with rust on the playstation for a while, I haven't been able to reproduce @XaviDCR92's issue with out of place break and my demos are behaving as expected on emulators.

I haven't used zig extensively, but I think the issue may be runtime checks or UB on shift overflow. For example, drawenv->draw_on_display is a u8 and I'm not sure if it gets implicitly cast to bool after the >= or if it just becomes a 0 or 1 as a u8. But either way, it then gets shifted more than 8 bits which might be a problem depending on how strict zig's type system is.

I also tried compiling some C with zig (0.6.0) myself and found that unsigned int z = 0xe1 << 24 caused me to hit an illegal instruction while unsigned int z = (unsigned int)0xe1 << 24 didn't.

int f(unsigned char x, unsigned char y) {
  2014e0:       55                      push   %rbp
  2014e1:       48 89 e5                mov    %rsp,%rbp
    unsigned int z = (0xe1 << 24);
  2014e4:       0f 0b                   ud2
int f(unsigned char x, unsigned char y) {
  201560:       55                      push   %rbp
  201561:       48 89 e5                mov    %rsp,%rbp
    unsigned int z = ((unsigned int)0xe1 << 24);
    return z;
  201564:       b8 00 00 00 e1          mov    $0xe1000000,%eax
  201569:       5d                      pop    %rbp
  20156a:       c3                      retq
  20156b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

It seems to me that zig uses the smallest type possible to represent integer literals, but I couldn't find any specific documentation on that. Granted that was all on x86 not MIPS, but I think the issue might be in that general area.

@sajattack
Copy link

sajattack commented Dec 3, 2021

Not sure how relevant it is to you (does PSX have an FPU?), but floating point comparisons also have a delay slot rust-lang/rust#91442 (comment)

@ayrtonm
Copy link

ayrtonm commented Dec 3, 2021

nah the PSX doesn't have a floating point unit. It supports fixed point but that's all custom coprocessor opcodes from inline assembly so llvm's not an issue there.

sdardis added a commit to llvm/llvm-project that referenced this issue Apr 7, 2022
LLVM so far has only supported the MIPS-II and above architectures. MIPS-II is pretty close to MIPS-I, the major difference
being that "load" instructions always take one extra instruction slot to propogate to registers. This patch adds support for
MIPS-I by adding hazard handling for load delay slots, alongside MIPSR6 forbidden slots and FPU slots, inserting a NOP
instruction between a load and any instruction immediately following that reads the load's destination register. I also
included a simple regression test. Since no existing tests target MIPS-I, those all still pass.

Issue ref: simias/psx-sdk-rs#1

I also tested by building a simple demo app with Clang and running it in an emulator.

Patch by: @impiaaa

Differential Revision: https://reviews.llvm.org/D122427
@netthier
Copy link

Seems initial support for load delay slots is now part of the latest LLVM release if I understand that commit correctly.
Is this issue still relevant?

@ayrtonm
Copy link

ayrtonm commented Apr 13, 2022

It's not in upstream rust yet, but I was going to cherry-pick that commit onto the rustc branch and open a PR to add the psx target.

@impiaaa
Copy link

impiaaa commented Apr 13, 2022

Also, filling load delay slots in code generation is only one part of full MIPS-I support. Remaining tasks include:

  1. load delay slots in .set reorder
  2. sync instruction
  3. 64-bit coprocessor transfers
  4. Trap-on-condition instructions

@ayrtonm
Copy link

ayrtonm commented Apr 13, 2022

Yeah those would be definitely be good to upstream for MIPS-I, but the psx target's still fairly usable without them. I think the only really critical issue is the sync instruction since DMA is borderline unusable without it.

nikic pushed a commit to rust-lang/llvm-project that referenced this issue Apr 22, 2022
LLVM so far has only supported the MIPS-II and above architectures. MIPS-II is pretty close to MIPS-I, the major difference
being that "load" instructions always take one extra instruction slot to propogate to registers. This patch adds support for
MIPS-I by adding hazard handling for load delay slots, alongside MIPSR6 forbidden slots and FPU slots, inserting a NOP
instruction between a load and any instruction immediately following that reads the load's destination register. I also
included a simple regression test. Since no existing tests target MIPS-I, those all still pass.

Issue ref: simias/psx-sdk-rs#1

I also tested by building a simple demo app with Clang and running it in an emulator.

Patch by: @impiaaa

Differential Revision: https://reviews.llvm.org/D122427
@ayrtonm
Copy link

ayrtonm commented Jun 6, 2022

This target's now usable with just a target JSON on nightly, so building the rust compiler isn't necessary anymore. I updated the crate's instructions in case anyone wants to try it out.

nikic pushed a commit to rust-lang/llvm-project that referenced this issue Jun 20, 2022
LLVM so far has only supported the MIPS-II and above architectures. MIPS-II is pretty close to MIPS-I, the major difference
being that "load" instructions always take one extra instruction slot to propogate to registers. This patch adds support for
MIPS-I by adding hazard handling for load delay slots, alongside MIPSR6 forbidden slots and FPU slots, inserting a NOP
instruction between a load and any instruction immediately following that reads the load's destination register. I also
included a simple regression test. Since no existing tests target MIPS-I, those all still pass.

Issue ref: simias/psx-sdk-rs#1

I also tested by building a simple demo app with Clang and running it in an emulator.

Patch by: @impiaaa

Differential Revision: https://reviews.llvm.org/D122427
@ayrtonm
Copy link

ayrtonm commented Oct 5, 2022

I opened a PR adding a built-in target to rustc so hopefully the target JSON will soon be unnecessary.

mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
LLVM so far has only supported the MIPS-II and above architectures. MIPS-II is pretty close to MIPS-I, the major difference
being that "load" instructions always take one extra instruction slot to propogate to registers. This patch adds support for
MIPS-I by adding hazard handling for load delay slots, alongside MIPSR6 forbidden slots and FPU slots, inserting a NOP
instruction between a load and any instruction immediately following that reads the load's destination register. I also
included a simple regression test. Since no existing tests target MIPS-I, those all still pass.

Issue ref: simias/psx-sdk-rs#1

I also tested by building a simple demo app with Clang and running it in an emulator.

Patch by: @impiaaa

Differential Revision: https://reviews.llvm.org/D122427
@simias
Copy link
Owner Author

simias commented Oct 10, 2022

That's amazing work!

Now I have to find a way to get my test setup working again...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants