-
Notifications
You must be signed in to change notification settings - Fork 14.8k
[BOLT][RISCV] Implement basic instrumentation #71664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This patch implements support for instrumenting direct function calls and jumps on RISC-V (anything that relies on `createInstrIncMemory`) as well as the necessary runtime library support (init, fini, and syscall wrappers). Only `createInstrIncMemory` is implemented by this patch. However, most of the other `create*` functions are called when instrumenting even the simplest binary on RISC-V and their default implementation triggers an assert. Therefore, dummy implementations are added by this patch. Atomic instructions (used to increment counters) are only available on RISC-V when the A extension is used. A check was added to err when trying to instrument a binary that was compiled without this extension enabled. The syscall wrappers in the runtime library have been implemented by leveraging the fact that the syscall calling convention matches the user space one (at least when up to six scalars are passed as arguments): for every wrapper, a forward declaration is defined in C whereas its implementation is defined in assembly and simply loads the syscall number and uses `ecall`. This way, the compiler will prepare all the arguments in the correct registers when calling the function, saving us from a lot of repetitive work. The implementation is done in global assembly (instead of inline assembly in a C function) to make sure the compiler cannot optimize away the call and skip preparing the argument registers. I've taken a slightly different approach than the existing targets with respect to the data types used by syscalls. I'm not quite sure why `common.h` defines some system types but I believe it is fine to just use the system headers for this (we only want to avoid linking against system libraries; using their headers should not cause problems). For this reason, I've made sure the definitions in `common.h` are not used for RISC-V.
|
|
||
| createLA(Insts, RISCV::X5, Target, *Ctx); | ||
|
|
||
| MCInst LI = MCInstBuilder(RISCV::ADDI) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks nice and readable, we need to use it more extensively.
| #ifndef LLVM_TOOLS_LLVM_BOLT_SYS_RISCV64 | ||
| #define LLVM_TOOLS_LLVM_BOLT_SYS_RISCV64 | ||
|
|
||
| #include <dirent.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't depend on the host definitions – it should be possible to cross-compile an instrumentation runtime, as is the case for the rest of the targets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When cross-compiling, these headers come from the sysroot (not from the host) so this shouldn't be an issue. Or am I missing something?
By the way, I have been cross-compiling this while developing instrumentation. This is my workflow:
$ riscv64-linux-gnu-g++ -c -ffreestanding -fno-exceptions -fno-rtti -fno-stack-protector -fPIC $LLVM_SRC/bolt/runtime/instr.cpp -o instr.o -I $LLVM_BUILD/tools/bolt/bolt_rt-bins/
$ ar -crs $LLVM_BUILD/lib/libbolt_rt_instr-riscv64.a instr.o
$ $LLVM_BUILD/bin/llvm-bolt --instrument --runtime-instrumentation-lib=libbolt_rt_instr-riscv64.a ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The runtime files should match the system your input binaries were compiled for. Because BOLT is independent of the compiler, historically we never assumed that you actually have the full target toolchain, so the few target constants we depend on are hardcoded. But we can probably change to include the kernel header files directly and then make sure our cmake rules correctly configure the compiler used to build this runtime lib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was reading our current cmake file and we explicitly disable building the runtime if llvm is being cross-compiled. I imagine we would need to update the build rules to better support building the runtime for different scenarios.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned on another PR, the clean solution would probably be to move to runtimes build model (sanitizer, instrumentation runtimes etc). This should enable proper native and cross- compilation. But since we're not there yet, I'd stick with hard-coded definitions for syscall numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But since we're not there yet, I'd stick with hard-coded definitions for syscall numbers.
I feel like I'm missing the problem we're trying to solve here. It seems to me this would only be useful in a situation where the user does have a (cross-) compiler available for the target (otherwise they couldn't compile the instrumentation library) but somehow does not have those system headers available. Is this a realistic scenario? I don't think I have ever seen a toolchain in the wild that doesn't ship those headers.
I also just noticed that runtime/common.h includes cstddef and cstdint so we are already relying on the toolchain to provide certain headers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping.
This patch implements support for instrumenting direct function calls and jumps on RISC-V (anything that relies on
createInstrIncMemory) as well as the necessary runtime library support (init, fini, and syscall wrappers).Only
createInstrIncMemoryis implemented by this patch. However, most of the othercreate*functions are called when instrumenting even the simplest binary on RISC-V and their default implementation triggers an assert. Therefore, dummy implementations are added by this patch.Atomic instructions (used to increment counters) are only available on RISC-V when the A extension is used. A check was added to err when trying to instrument a binary that was compiled without this extension enabled.
The syscall wrappers in the runtime library have been implemented by leveraging the fact that the syscall calling convention matches the user space one (at least when up to six scalars are passed as arguments): for every wrapper, a forward declaration is defined in C whereas its implementation is defined in assembly and simply loads the syscall number and uses
ecall. This way, the compiler will prepare all the arguments in the correct registers when calling the function, saving us from a lot of repetitive work. The implementation is done in global assembly (instead of inline assembly in a C function) to make sure the compiler cannot optimize away the call and skip preparing the argument registers.I've taken a slightly different approach than the existing targets with respect to the data types used by syscalls. I'm not quite sure why
common.hdefines some system types but I believe it is fine to just use the system headers for this (we only want to avoid linking against system libraries; using their headers should not cause problems). For this reason, I've made sure the definitions incommon.hare not used for RISC-V.