[BUG] Entry point function cannot do a far jump #1243

weikengchen · 2023-12-18T00:51:17Z

Bug Report

Based on the conversation in the Discord server (see: https://discord.com/channels/953703904086994974/1184915191411003463), if the guest program is sophisticated enough, the entry point function, showed below, cannot call the __start function, because it is too far.

@DefiCake reported this problem, from their codebase https://github.com/DefiCake/fuel-risc0-prover/.

// Entry point; sets up global pointer and stack pointer and passes
// to zkvm_start.  TODO: when asm_const is stablized, use that here
// instead of defining a symbol and dereferencing it.
#[cfg(target_os = "zkvm")]
core::arch::global_asm!(
    r#"
.section .text._start;
.globl _start;
_start:
    .option push;
    .option norelax;
    la gp, __global_pointer$;
    .option pop;
    la sp, {0}
    lw sp, 0(sp)
    jal ra, __start;
"#,
    sym STACK_TOP
);

where the definition of the __start function is as follows. This is the function that starts the guest main.

#[cfg(target_os = "zkvm")]
#[no_mangle]
unsafe extern "C" fn __start() -> ! {
    env::init();

    {
        extern "C" {
            fn main();
        }
        main()
    }

    env::finalize(true, 0);
    unreachable!();
}

jal is an RISC-V raw instruction that jumps to __start, where __start is represented by a signed 20-bit offset from the pc. This means that it can jump either backward up to 1MB or forward up to 1MB. But, if it takes more than 1MB to go to __start in the memory from the current pc, this instruction does not work.

This would result in a compilation time error, shown as follows.

= note: rust-lld: error: /Users/cusgadmin/r0/fuel-risc0-prover/target/riscv-guest/riscv32im-risc0-zkvm-elf/release/deps/prover-8ebe1890752312c1.prover.cc888d54-cgu.6.rcgu.o:(.text._start+0x14): relocation R_RISCV_JAL out of range: 1508240 is not in [-1048576, 1048575]; references __start
prover: >>> referenced by musl_memcpy.c
prover: >>> defined in /Users/cusgadmin/r0/fuel-risc0-prover/target/riscv-guest/riscv32im-risc0-zkvm-elf/release/deps/prover-8ebe1890752312c1.prover.cc888d54-cgu.6.rcgu.o

Solution

Tested locally. If we make a single-line change to the https://github.com/risc0/risc0/blob/main/risc0/zkvm/src/guest/mod.rs#L166.

- jal ra, __start;
+ call __start;

The problem can be fixed. call is a "pseudo" instruction, in that LLVM would expand it according to its need.

A remaining question is whether this change would affect the binary that is generated by previous code. I.e., whether the image file would be different. This is not really a big issue, but it would be a headache to, if necessary, update the IMAGE id that has been hardcoded.

I tested https://github.com/l2iterative/vfhe0 with Ghidra. The answer is positive---even if the jump can be done with "jal" using an offset, Rust (indeed, LLVM) would emit two instructions: auipc and jalr, as expected.

        002024d4 97 00 00 00     auipc      ra,0x0
        002024d8 e7 80 80 00     jalr       ra=>__start,ra,0x8

For this reason, I am reluctant to submit a PR since there is going to be a lot of work to update the IMAGE ids as it would affect everything. That being said, it is a good argument that this fix should be included in the incoming LTS release.

The text was updated successfully, but these errors were encountered:

weikengchen · 2023-12-18T00:53:26Z

For more context, the pedantic definition for call is exactly:

auipc ra, TOP 20 bits
jalr ra, ra, LOWER 12 bits

There is no need to feel guilty about using pseudo instructions, as well. We are already using those. la is a pseudo instruction. la sp, {0} is as follows.

        002024c8 17 b1 10 00     auipc      sp,0x10b
        002024cc 13 01 c1 d5     addi       sp,sp,-0x2a4

nategraf · 2023-12-19T23:13:11Z

Thanks for tracking down this issue! I've opened #1257 to replace jal with call.

This is not really a big issue, but it would be a headache to, if necessary, update the IMAGE id that has been hardcoded.

In general, we don't guarentee image ID stability unless the user pins to an exact version (git commit) of the zkVM and builds in Docker using cargo risczero. Any changes, including to comments or file locations, to the transitive dependencies of the guest will general result in a new image ID since the Cargo build system does not provide any reproducible build guarantees. We find in practice that building in Docker can at least reproduce a build with sufficient pinning of sources, Rust version, etc.

@weikengchen

As described in #1243, our use of the `jal` instruction in our entrypoint breaks when the jump distance becomes too far. This PR changes the instruction to the `call` psuedo-instruction to allow the assembler flexibility to solve this issue and implement a far jump. Co-authored-by: @weikengchen Resolves: #1243 --------- Co-authored-by: Frank Laub <frank@risczero.com> Co-authored-by: Frank Laub <flaub@risc0.com>

@weikengchen

As described in #1243, our use of the `jal` instruction in our entrypoint breaks when the jump distance becomes too far. This PR changes the instruction to the `call` psuedo-instruction to allow the assembler flexibility to solve this issue and implement a far jump. Co-authored-by: @weikengchen Resolves: #1243 --------- Co-authored-by: Frank Laub <frank@risczero.com> Co-authored-by: Frank Laub <flaub@risc0.com>

@weikengchen

As described in #1243, our use of the `jal` instruction in our entrypoint breaks when the jump distance becomes too far. This PR changes the instruction to the `call` psuedo-instruction to allow the assembler flexibility to solve this issue and implement a far jump. Co-authored-by: @weikengchen Resolves: #1243 --------- Co-authored-by: Frank Laub <frank@risczero.com> Co-authored-by: Frank Laub <flaub@risc0.com>

weikengchen added the bug Something isn't working label Dec 18, 2023

nategraf mentioned this issue Dec 19, 2023

Change jal to call in zkVM entrypoint #1257

Merged

flaub closed this as completed in #1257 Dec 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Entry point function cannot do a far jump #1243

[BUG] Entry point function cannot do a far jump #1243

weikengchen commented Dec 18, 2023 •

edited

weikengchen commented Dec 18, 2023 •

edited

nategraf commented Dec 19, 2023

[BUG] Entry point function cannot do a far jump #1243

[BUG] Entry point function cannot do a far jump #1243

Comments

weikengchen commented Dec 18, 2023 • edited

Bug Report

Solution

weikengchen commented Dec 18, 2023 • edited

nategraf commented Dec 19, 2023

weikengchen commented Dec 18, 2023 •

edited

weikengchen commented Dec 18, 2023 •

edited