Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Entry point function cannot do a far jump #1243

Closed
weikengchen opened this issue Dec 18, 2023 · 2 comments · Fixed by #1257
Closed

[BUG] Entry point function cannot do a far jump #1243

weikengchen opened this issue Dec 18, 2023 · 2 comments · Fixed by #1257
Labels
bug Something isn't working

Comments

@weikengchen
Copy link
Contributor

weikengchen commented Dec 18, 2023

Bug Report

Based on the conversation in the Discord server (see: https://discord.com/channels/953703904086994974/1184915191411003463), if the guest program is sophisticated enough, the entry point function, showed below, cannot call the __start function, because it is too far.

@DefiCake reported this problem, from their codebase https://github.com/DefiCake/fuel-risc0-prover/.

// Entry point; sets up global pointer and stack pointer and passes
// to zkvm_start.  TODO: when asm_const is stablized, use that here
// instead of defining a symbol and dereferencing it.
#[cfg(target_os = "zkvm")]
core::arch::global_asm!(
    r#"
.section .text._start;
.globl _start;
_start:
    .option push;
    .option norelax;
    la gp, __global_pointer$;
    .option pop;
    la sp, {0}
    lw sp, 0(sp)
    jal ra, __start;
"#,
    sym STACK_TOP
);

where the definition of the __start function is as follows. This is the function that starts the guest main.

#[cfg(target_os = "zkvm")]
#[no_mangle]
unsafe extern "C" fn __start() -> ! {
    env::init();

    {
        extern "C" {
            fn main();
        }
        main()
    }

    env::finalize(true, 0);
    unreachable!();
}

jal is an RISC-V raw instruction that jumps to __start, where __start is represented by a signed 20-bit offset from the pc. This means that it can jump either backward up to 1MB or forward up to 1MB. But, if it takes more than 1MB to go to __start in the memory from the current pc, this instruction does not work.

This would result in a compilation time error, shown as follows.

= note: rust-lld: error: /Users/cusgadmin/r0/fuel-risc0-prover/target/riscv-guest/riscv32im-risc0-zkvm-elf/release/deps/prover-8ebe1890752312c1.prover.cc888d54-cgu.6.rcgu.o:(.text._start+0x14): relocation R_RISCV_JAL out of range: 1508240 is not in [-1048576, 1048575]; references __start
prover: >>> referenced by musl_memcpy.c
prover: >>> defined in /Users/cusgadmin/r0/fuel-risc0-prover/target/riscv-guest/riscv32im-risc0-zkvm-elf/release/deps/prover-8ebe1890752312c1.prover.cc888d54-cgu.6.rcgu.o

Solution

Tested locally. If we make a single-line change to the https://github.com/risc0/risc0/blob/main/risc0/zkvm/src/guest/mod.rs#L166.

- jal ra, __start;
+ call __start;

The problem can be fixed. call is a "pseudo" instruction, in that LLVM would expand it according to its need.

A remaining question is whether this change would affect the binary that is generated by previous code. I.e., whether the image file would be different. This is not really a big issue, but it would be a headache to, if necessary, update the IMAGE id that has been hardcoded.

I tested https://github.com/l2iterative/vfhe0 with Ghidra. The answer is positive---even if the jump can be done with "jal" using an offset, Rust (indeed, LLVM) would emit two instructions: auipc and jalr, as expected.

        002024d4 97 00 00 00     auipc      ra,0x0
        002024d8 e7 80 80 00     jalr       ra=>__start,ra,0x8

For this reason, I am reluctant to submit a PR since there is going to be a lot of work to update the IMAGE ids as it would affect everything. That being said, it is a good argument that this fix should be included in the incoming LTS release.

@weikengchen weikengchen added the bug Something isn't working label Dec 18, 2023
@weikengchen
Copy link
Contributor Author

weikengchen commented Dec 18, 2023

For more context, the pedantic definition for call is exactly:

auipc ra, TOP 20 bits
jalr ra, ra, LOWER 12 bits

There is no need to feel guilty about using pseudo instructions, as well. We are already using those. la is a pseudo instruction. la sp, {0} is as follows.

        002024c8 17 b1 10 00     auipc      sp,0x10b
        002024cc 13 01 c1 d5     addi       sp,sp,-0x2a4

@nategraf
Copy link
Contributor

Thanks for tracking down this issue! I've opened #1257 to replace jal with call.

This is not really a big issue, but it would be a headache to, if necessary, update the IMAGE id that has been hardcoded.

In general, we don't guarentee image ID stability unless the user pins to an exact version (git commit) of the zkVM and builds in Docker using cargo risczero. Any changes, including to comments or file locations, to the transitive dependencies of the guest will general result in a new image ID since the Cargo build system does not provide any reproducible build guarantees. We find in practice that building in Docker can at least reproduce a build with sufficient pinning of sources, Rust version, etc.

flaub added a commit that referenced this issue Dec 20, 2023
As described in #1243, our use of
the `jal` instruction in our entrypoint breaks when the jump distance
becomes too far. This PR changes the instruction to the `call`
psuedo-instruction to allow the assembler flexibility to solve this
issue and implement a far jump.

Co-authored-by: @weikengchen
Resolves: #1243

---------

Co-authored-by: Frank Laub <frank@risczero.com>
Co-authored-by: Frank Laub <flaub@risc0.com>
flaub added a commit that referenced this issue Dec 20, 2023
As described in #1243, our use of
the `jal` instruction in our entrypoint breaks when the jump distance
becomes too far. This PR changes the instruction to the `call`
psuedo-instruction to allow the assembler flexibility to solve this
issue and implement a far jump.

Co-authored-by: @weikengchen
Resolves: #1243

---------

Co-authored-by: Frank Laub <frank@risczero.com>
Co-authored-by: Frank Laub <flaub@risc0.com>
flaub added a commit that referenced this issue Jan 17, 2024
As described in #1243, our use of
the `jal` instruction in our entrypoint breaks when the jump distance
becomes too far. This PR changes the instruction to the `call`
psuedo-instruction to allow the assembler flexibility to solve this
issue and implement a far jump.

Co-authored-by: @weikengchen
Resolves: #1243

---------

Co-authored-by: Frank Laub <frank@risczero.com>
Co-authored-by: Frank Laub <flaub@risc0.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants