-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Translating RISC-V programs directly from ELF files. #1453
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many but mostly small comments
@@ -155,6 +155,8 @@ jobs: | |||
key: ${{ runner.os }}-pilcom-node-modules | |||
- name: Install Rust toolchain 1.77 (with clippy and rustfmt) | |||
run: rustup toolchain install 1.77-x86_64-unknown-linux-gnu && rustup component add clippy --toolchain 1.77-x86_64-unknown-linux-gnu && rustup component add rustfmt --toolchain 1.77-x86_64-unknown-linux-gnu | |||
- name: Install test dependencies | |||
run: sudo apt-get install -y binutils-riscv64-unknown-elf lld |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not riscv32?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no riscv32 in Ubuntu. You use the 64 for both.
.section .data | ||
.align 4 | ||
.global data_section | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this test manually crafted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you call ChatGPT "manually"...
# rvc.S | ||
#----------------------------------------------------------------------------- | ||
# | ||
# Test RVC corner cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did this test come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly from the official testsuite, but I left some comments on the modifications I had to make.
riscv/src/elf.rs
Outdated
// Otherwise, there is something more fishy going on, and we | ||
// panic. | ||
|
||
// TODO: we should just emit `unimp` for every unknown. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can make the generated code very big and may mask other issues. I reworded the TODO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, there are unanswered questions: should this data be readable from the program itself? What about the valid instructions?
Ok((*r1, *r2, *r3)) | ||
} | ||
[Argument::Register(r1), Argument::Register(r2), Argument::Register(r3) | ||
| Argument::RegOffset(None | Some(Expression::Number(0)), r3)] => Ok((*r1, *r2, *r3)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you explain why the RegOffset
here? isn't there a rrro
case? Also, in the rr
case, the 0
offset case is not handled, should it be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #1453 (comment).
But indeed, for consistency rr
should be able to handle the 0 case when decoding the instruction lr.w rd, 0(rs1)
, but we have never encountered this case in handwritten assembly (the test for this instruction, lrsc.S
, omits the 0), and the compiler doesn't place the zero there, either.
I fixed it.
#[no_mangle] | ||
fn main() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should all programs be pub fn main
? if so poseidon_gl_via_coprocessor
and evm
should also be changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed, and I don't know why I changed it. Reverted.
fn rro(&self) -> Result<(Register, Register, u32), Self::Error>; | ||
fn rrro(&self) -> Result<(Register, Register, Register, u32), Self::Error>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why isn't rrro
there anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rrro
stands for "register, register, register, offset", but there is no RISC-V instruction that encodes this much information (there is not enough space in the 32 bits). It existed because the assembly syntax for these instructions allowed for writing amoadd.d rd, rs2, 0(rs1)
, but there is no place to put that 0
in the actual encoding (if I try to assemble with any other value I get an error).
So, at first I replaced rrro()
with rrr()
, and changed it to accept both None
and 0 offset. Then I figured out that the order of the registers expected by these instructions was different from other instructions using rrr()
, so I created rrr2()
(ok, the name could be better) to say that this is an instruction that expects 3 registers, but in a different order from rrr()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see, i think name is fine, can you add a small comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are comments on both rrr()
and rrr2()
explaining the difference between them.
riscv/src/lib.rs
Outdated
|
||
fs::write(powdr_asm_file_name.clone(), &powdr_asm).unwrap(); | ||
log::info!("Wrote {}", powdr_asm_file_name.to_str().unwrap()); | ||
|
||
Some((powdr_asm_file_name, powdr_asm)) | ||
} | ||
|
||
/// Compiles a riscv asm file all the way down to PIL and generates | ||
/// fixed and witness columns. | ||
#[allow(clippy::print_stderr)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this is not needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment was outdated, but I have no idea why where this line came from. I am removing it.
@@ -9,65 +9,61 @@ kind of arithmetic over `.text` label and addresses, nor alignment or spacing | |||
directives in `.text` sections. Most unsupported instructions are related to | |||
this limitation. | |||
|
|||
One consequence of this limitation is that all the supported uses of `auipc` and | |||
`lui`, when referring to a `.text` label, is when they can be fused with the | |||
next instruction to make use of the loaded address immediately (e.g. `addi` or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this phrase is a bit confusing, is it saying that these instructions are only supported when they can be merged with a following instruction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to write it better. But using auipc
and lui
to load a .text
address is only supported if it can be fused with the next instruction. It works fine if they are used to load a data address.
- amoswap_w | ||
- amoxor_w | ||
- `amoand_w` | ||
- `momax_w` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
amomax_w
?
#define RVC_TEST_CASE(n, r, v, code...) \ | ||
TEST_CASE (n, r, v, .option push; .option rvc; code; .align 2; .option pop) | ||
|
||
// Powdr note: there was a test case from the original riscv-tests here, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does the //
comment notation work here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because it worked.
2:j fail; \ | ||
1:) | ||
|
||
// Powdr note: tests 36 and 37 were removed, as they did arithmetic on text addresses, which we don't support. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense to comment them out instead of removing? also, 38 and 39 are not present here either
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
38 and 39 were already missing. Maybe they are RISC-V 64 specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I restored the removed tests, but commented out.
fn run_instruction_test(assembly: &str, name: &str) { | ||
fn run_instruction_test(path: &Path, name: &str) { | ||
// Test from ELF path: | ||
verify_riscv_asm_file(path, &Runtime::base(), false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick as this is only used here i think: have elf
in the function name somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this also executes the test from the assembly path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah right, i got confused by the asm
in the name, but asm here is from the input not output
Co-authored-by: Leo <leo@powdrlabs.com>
|
||
/// The name of the function that should be called to start the program. | ||
fn start_function(&self) -> &str; | ||
fn start_function(&self) -> impl AsRef<str>; | ||
} | ||
|
||
/// Translates a RISC-V program to POWDR ASM. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the comment about "will call the methods just once" can be removed now i think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is still true...
riscv/src/elf.rs
Outdated
let section_data = &file_buffer[p.p_offset as usize..(p.p_offset + p.p_filesz) as usize]; | ||
|
||
// Test if executable | ||
if p.p_flags & 1 == 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p.is_executable()
then you can remove the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no such method. But I improved it by using the named constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry. rust-analyzer didn't find it, so I didn't think it existed. But I'll try again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
/// Find all the references to text addresses in the instructions and add them | ||
/// to the set. | ||
fn search_text_addrs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a big deal, but i wonder if all these were part of the InstructionLifter
it wouldn't make the code a bit simpler to follow? e.g., maybe then you don't need the TwoOrOneMapper
trait or the ReadOrWrite
enum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason to have the TwoOrOneMapper
trait is to be able to write the algorithm in try_map_two_by_two()
independently of types, so it is was easier to write and reason about.
I don't see how to get rid of ReadOrWrite
enum without replacing with something very similar. The lifting algorithm has to run twice, once just collecting the references, and another making use of them.
riscv/src/elf.rs
Outdated
} | ||
|
||
/// Get the symbol or a default label formed from the address value. | ||
fn get_as_string(&self, addr: u32) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the method needed? get(addr).into()
is fine no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Co-authored-by: Leo <leo@powdrlabs.com>
Co-authored-by: Leo <leo@powdrlabs.com>
Co-authored-by: Leo <leo@powdrlabs.com>
CI failing |
The latests version our dependencies depends on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a
) -> (Option<PathBuf>, BTreeMap<String, PathBuf>) { | ||
pub struct CompilationResult { | ||
pub executable: Option<PathBuf>, | ||
assemblies: BTreeMap<String, PathBuf>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why just make one pub
and the other not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because there is no method to get the pub
one
@@ -359,9 +359,14 @@ fn compile_riscv_asm<F: FieldElement>( | |||
None => powdr_riscv::Runtime::base(), | |||
}; | |||
|
|||
powdr_riscv::compile_riscv_asm::<F>( | |||
powdr_riscv::compile_riscv_asm_bundle::<F>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is elf still the default compilation path here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no ELF path in this function, this only from assembly.
Co-authored-by: Leo <leo@powdrlabs.com>
The other thing that's lost are labels, which I assume are useful for the profiler? @pacheco For example, for the main:
trivial_dash_be3334ad2e9c9ee9___dot_Lfunc_begin0:
.debug loc 338 7 17;
tmp1 <== jump_dyn(x1); and via elf: main:
tmp1 <== jump_dyn(x1); |
There are many advantages in using standard assemblers and linkers, like:
std
).But it comes with a cost, as we need to lift references to text data back into labels, (because powdr operates on a higher abstraction level), and to do that, we need the ELF file to either be a PIE (Position Independent Executable), or to still have the linkage relocation tables (option
--emit-relocs
of GNU and LLVM linkers).