Failure to raise binaries of Csmith-generated sources #166

Open
Hanseltu opened this issue Feb 18, 2022 · 4 comments
Labels
bug (Something isn't working), good first issue (Good for newcomers), help wanted (Extra attention is needed), x86-64 (Relates to raising x86-64 binaries)

Comments

@Hanseltu

Hello, folks,

May I ask whether it is possible to make this promising tool compatible with Csmith, a well-known random program generator that produces C programs covering a wide range of language features? At present, llvm-mctoll does not seem able to handle programs generated by Csmith.

For example, consider the following code

/*
 * This is a RANDOMLY GENERATED PROGRAM.
 *
 * Generator: csmith 2.3.0
 * Git version: 30dccd7
 * Options:   --no-pointers --no-packed-struct --no-volatile-pointers --no-volatiles --no-const-pointers --no-structs --no-unions --no-inline-function --max-funcs 1
 * Seed:      16720983486250502810
 */

#include "csmith.h"

static long __undefined;

/* --- Struct/Union Declarations --- */
/* --- GLOBAL VARIABLES --- */
static uint32_t g_10 = 4UL;
static const int64_t g_23 = 0x624B6749EFEDEDC8LL;
static uint32_t g_24 = 1UL;
static uint8_t g_58 = 252UL;
static int32_t g_59[4][9][1] = {{{0xF274C111L},{0xED9A830FL},{0xED9A830FL},{0xF274C111L},{0xED9A830FL},{0xED9A830FL},{0xF274C111L},{0xED9A830FL},{0xED9A830FL}},{{0xF274C111L},{0xED9A830FL},{0xED9A830FL},{0xF274C111L},{0xED9A830FL},{0xED9A830FL},{0xF274C111L},{0xED9A830FL},{0xED9A830FL}},{{0xF274C111L},{0xED9A830FL},{0xED9A830FL},{0xF274C111L},{0xED9A830FL},{0xED9A830FL},{0xF274C111L},{0xED9A830FL},{0xED9A830FL}},{{0xF274C111L},{0xED9A830FL},{0xED9A830FL},{0xF274C111L},{0xED9A830FL},{0xED9A830FL},{0xF274C111L},{0xED9A830FL},{0xED9A830FL}}};
static int32_t g_60[6] = {0x89989A98L,0x89989A98L,0x89989A98L,0x89989A98L,0x89989A98L,0x89989A98L};

/* --- FORWARD DECLARATIONS --- */
static uint64_t  func_1(void);

/* --- FUNCTIONS --- */
/* ------------------------------------------ */
/* 
 * reads : g_10 g_23 g_24 g_60
 * writes: g_10 g_24 g_58 g_59 g_60
 */
static uint64_t  func_1(void)
{ /* block id: 0 */
    uint8_t l_13 = 253UL;
    int32_t l_20[5] = {0L,0L,0L,0L,0L};
    uint64_t l_21 = 0UL;
    int32_t l_22 = 0xDD5D0881L;
    const uint8_t l_25 = 255UL;
    int32_t l_57 = 0xE9A03A1BL;
    int i;
    g_24 = (safe_mul_func_int8_t_s_s((safe_div_func_int32_t_s_s((safe_lshift_func_uint8_t_u_s(((safe_add_func_uint16_t_u_u((++g_10), l_13)) , (((safe_div_func_int16_t_s_s((safe_rshift_func_uint16_t_u_s(g_10, (safe_div_func_uint64_t_u_u(0x2B4837182ABBF9ACLL, l_13)))), g_10)) || (((((l_20[4] ^= 0xF2FDL) == ((((((g_10 | ((g_10 || (((g_10 >= l_13) | 0x2F20920A6FF609D7LL) < 0xD635290CL)) <= l_21)) | g_10) , g_10) || g_10) , l_22) <= l_13)) != l_22) , 0xC930L) <= l_22)) ^ l_21)), g_23)), 4L)), g_23));
    l_20[3] = (l_25 <= (l_20[0] >= (((g_60[3] = (g_59[3][4][0] = (safe_lshift_func_uint8_t_u_u(((safe_lshift_func_int16_t_s_s((safe_add_func_int64_t_s_s((((~((g_58 = (safe_add_func_int64_t_s_s((safe_div_func_int8_t_s_s((safe_rshift_func_uint16_t_u_s((safe_div_func_uint64_t_u_u((safe_mod_func_uint64_t_u_u(((g_24 , (safe_lshift_func_int16_t_s_u((safe_rshift_func_int16_t_s_u((safe_lshift_func_int16_t_s_s((safe_mul_func_uint8_t_u_u((l_57 &= (l_25 & (l_20[4] | (safe_add_func_int8_t_s_s((safe_mod_func_uint16_t_u_u(l_13, (safe_rshift_func_uint8_t_u_s(l_20[4], 2)))), (l_25 == g_24)))))), 0UL)), 11)), 0)), l_25))) & l_20[4]), l_22)), l_13)), g_24)), g_24)), l_22))) , l_22)) , 0xB76F056EL) & (-1L)), 0x77138BF9CC53A0DCLL)), 3)) , 0xC9L), l_25)))) != g_23) != l_21)));
    l_20[1] &= g_60[3];
    return l_13;
}
/* ---------------------------------------- */
int main (int argc, char* argv[])
{
    func_1();
    return 0;
}

Build with GCC-9 and then decompile it

$ gcc-9 -w -fno-stack-protector test.c -o test
$ ./llvm-mctoll -I /usr/include/stdio.h -I /usr/include/string.h test
MUL8m $rbp, 1, $noreg, -8, $noreg, <0x45a1538>, implicit-def $al, implicit-def $eflags, implicit-def $ax, implicit $al
llvm-mctoll: /home/haoxin/disk-dut/research/compilers/llvm-mctoll/llvm/tools/llvm-mctoll/X86/X86MachineInstructionRaiserUtils.cpp:75: llvm::Value *X86MachineInstructionRaiser::getMemoryRefValue(const llvm::MachineInstr &): Assertion `false && "Encountered unhandled memory load instruction"' failed.
 #0 0x0000000000f9d103 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (./llvm-mctoll+0xf9d103)
 #1 0x0000000000f9aecc llvm::sys::RunSignalHandlers() (./llvm-mctoll+0xf9aecc)
 #2 0x0000000000f9d5d6 SignalHandler(int) (./llvm-mctoll+0xf9d5d6)
 #3 0x00007ffff7bc6980 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12980)
 #4 0x00007ffff65e6fb7 raise /build/glibc-S9d2JN/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
 #5 0x00007ffff65e8921 abort /build/glibc-S9d2JN/glibc-2.27/stdlib/abort.c:81:0
 #6 0x00007ffff65d848a __assert_fail_base /build/glibc-S9d2JN/glibc-2.27/assert/assert.c:89:0
 #7 0x00007ffff65d8502 (/lib/x86_64-linux-gnu/libc.so.6+0x30502)
 #8 0x0000000000fbf1f5 X86MachineInstructionRaiser::getMemoryRefValue(llvm::MachineInstr const&) (./llvm-mctoll+0xfbf1f5)
 #9 0x0000000000fb561a X86MachineInstructionRaiser::raiseMemRefMachineInstr(llvm::MachineInstr const&) (./llvm-mctoll+0xfb561a)
#10 0x0000000000fbdb7d X86MachineInstructionRaiser::raiseMachineFunction() (./llvm-mctoll+0xfbdb7d)
#11 0x0000000000fbdd56 X86MachineInstructionRaiser::raise() (./llvm-mctoll+0xfbdd56)
#12 0x00000000004a7ab5 MachineFunctionRaiser::runRaiserPasses() (./llvm-mctoll+0x4a7ab5)
#13 0x00000000004a6498 ModuleRaiser::runMachineFunctionPasses() (./llvm-mctoll+0x4a6498)
#14 0x000000000045f094 DisassembleObject(llvm::object::ObjectFile const*, bool) (./llvm-mctoll+0x45f094)
#15 0x000000000045591b main (./llvm-mctoll+0x45591b)
#16 0x00007ffff65c9bf7 __libc_start_main /build/glibc-S9d2JN/glibc-2.27/csu/../csu/libc-start.c:344:0
#17 0x00000000004531fa _start (./llvm-mctoll+0x4531fa)

*** Please submit an issue at https://github.com/microsoft/llvm-mctoll
*** along with a back trace and a reproducer, if possible.
Stack dump:
0.	Program arguments: ./llvm-mctoll -I /usr/include/stdio.h -I /usr/include/string.h test
Aborted (core dumped)

Build with Clang-9 and decompile it

$ clang-9 -w -fno-stack-protector test.c -o test
$ ./llvm-mctoll -I /usr/include/stdio.h -I /usr/include/string.h test
*** Generic instruction not raised : func_1
	XOR64i32 62205, <0x4290008>, implicit-def $rax, implicit-def $eflags, implicit $rax
llvm-mctoll: /home/haoxin/disk-dut/research/compilers/llvm-mctoll/llvm/include/llvm/Support/Casting.h:104: static bool llvm::isa_impl_cl<llvm::UnreachableInst, const llvm::Instruction *>::doit(const From *) [To = llvm::UnreachableInst, From = const llvm::Instruction *]: Assertion `Val && "isa<> used on a null pointer"' failed.
 #0 0x0000000000f9d103 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (./llvm-mctoll+0xf9d103)
 #1 0x0000000000f9aecc llvm::sys::RunSignalHandlers() (./llvm-mctoll+0xf9aecc)
 #2 0x0000000000f9d5d6 SignalHandler(int) (./llvm-mctoll+0xf9d5d6)
 #3 0x00007ffff7bc6980 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12980)
 #4 0x00007ffff65e6fb7 raise /build/glibc-S9d2JN/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
 #5 0x00007ffff65e8921 abort /build/glibc-S9d2JN/glibc-2.27/stdlib/abort.c:81:0
 #6 0x00007ffff65d848a __assert_fail_base /build/glibc-S9d2JN/glibc-2.27/assert/assert.c:89:0
 #7 0x00007ffff65d8502 (/lib/x86_64-linux-gnu/libc.so.6+0x30502)
 #8 0x0000000002720c0e (anonymous namespace)::unifyUnreachableBlocks(llvm::Function&) (./llvm-mctoll+0x2720c0e)
 #9 0x000000000272090e llvm::UnifyFunctionExitNodesLegacyPass::runOnFunction(llvm::Function&) (./llvm-mctoll+0x272090e)
#10 0x00000000008b1658 llvm::FPPassManager::runOnFunction(llvm::Function&) (./llvm-mctoll+0x8b1658)
#11 0x00000000008b7ae8 llvm::FPPassManager::runOnModule(llvm::Module&) (./llvm-mctoll+0x8b7ae8)
#12 0x00000000008b1ca7 llvm::legacy::PassManagerImpl::run(llvm::Module&) (./llvm-mctoll+0x8b1ca7)
#13 0x0000000000fbde72 X86MachineInstructionRaiser::raise() (./llvm-mctoll+0xfbde72)
#14 0x00000000004a7ab5 MachineFunctionRaiser::runRaiserPasses() (./llvm-mctoll+0x4a7ab5)
#15 0x00000000004a6498 ModuleRaiser::runMachineFunctionPasses() (./llvm-mctoll+0x4a6498)
#16 0x000000000045f094 DisassembleObject(llvm::object::ObjectFile const*, bool) (./llvm-mctoll+0x45f094)
#17 0x000000000045591b main (./llvm-mctoll+0x45591b)
#18 0x00007ffff65c9bf7 __libc_start_main /build/glibc-S9d2JN/glibc-2.27/csu/../csu/libc-start.c:344:0
#19 0x00000000004531fa _start (./llvm-mctoll+0x4531fa)

*** Please submit an issue at https://github.com/microsoft/llvm-mctoll
*** along with a back trace and a reproducer, if possible.
Stack dump:
0.	Program arguments: ./llvm-mctoll -I /usr/include/stdio.h -I /usr/include/string.h test
1.	Running pass 'Function Pass Manager' on module 'test'.
2.	Running pass 'Unify function exit nodes' on function '@func_1'
Aborted                 (core dumped) ./llvm-mctoll -I /usr/include/stdio.h -I /usr/include/string.h test

Here is the version of llvm-mctoll I used, running on an Ubuntu 18.04 Linux system.

LLVM (http://llvm.org/):
  LLVM version 14.0.0git
  Optimized build with assertions.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: skylake-avx512

  Registered Targets:
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64

In my simple testing, even though I pass the options "--no-pointers --no-packed-struct --no-volatile-pointers --no-volatiles --no-const-pointers --no-structs --no-unions --no-inline-function --max-funcs 1" to make Csmith generate test programs with a small set of features and only one function, a large portion of these programs still make llvm-mctoll crash.

Is this the correct way to use llvm-mctoll, or am I doing something wrong? If the usage is correct, is it possible to make llvm-mctoll compatible with Csmith, and how much implementation effort would that take?

Thanks,
Haoxin

@bharadwajy
Contributor

Thanks for your interest in the project and for your question.

If the input to llvm-mctoll is a legitimate binary, whether built from a randomly generated source or otherwise, I'd like llvm-mctoll to be able to raise it correctly. So, it should not matter whether the source code was generated by Csmith or by another means, as long as it compiles to a well-behaved and correct binary.

In fact, Csmith could serve as a feeder of test cases for llvm-mctoll.

The examples you provided expose

  1. an unhandled memory instruction while raising the gcc-generated binary
  2. a bug in the pass "Unify function exit nodes" while raising the clang-generated binary.

Thanks for the bug report. I'll plan to look at them. However, if you or anyone else can help out before I get to them, I'd very much appreciate the help.

@Hanseltu
Author

Hi @bharadwajy. Thanks for your reply and bug confirmation.

Yeah, I think llvm-mctoll is a cool tool that has many benefits over other existing lifting tools (e.g., mcsema and retdec), and I would be happy to see it become stronger and more scalable.

I am sorry that I cannot help with the implementation, but if you need more test cases that trigger different assertion failures, I would be glad to find more useful ones (along with their reduced versions) to assist debugging. Do you need such cases? If so, is it better to file a new issue for each failure, or to pack them all into a compressed file and upload it here?

Best,
Haoxin

@bharadwajy
Contributor

Thanks @Hanseltu. The more test cases we have to make the tool robust and useful, the better.

I'd prefer that you create one issue per kind of failure, with sources (either C or assembly) that are as minimal as possible to help focus on the actual failure. It would also help if the sources incorporate a way to verify the correctness of the translation. Currently the tests are set up to raise a given binary and recompile the raised IR back to the x64 target. The outputs of the original binary and the recompiled binary are then compared to verify the correctness of the raised IR. So, if your bug report sources can output some verifiable set of results, or provide some other way to verify the correctness of the raised IR, it would be very helpful.

Thanks again for your offer to help.

@Hanseltu
Author

Hanseltu commented Feb 21, 2022

You are welcome @bharadwajy! I am happy that I can help here!

By the way, the reduced versions of the source code (reduced with C-Reduce) are as follows.

The source for the GCC failure:

int a;
char(f)() {
  unsigned char c = 0;
  return a * c;
}
int main() {return 0;}

and the source for the clang failure:

void a() {
  int b[5] = {0,0,0,0,0};
  b[4] ^= 253L;
}
int main() {return 0;}

Regarding the correctness of the translation, may I ask a further question? To verify the translation process, I think two fundamental requirements must be met. The first concerns the lifting tool itself: the lifted IR must be recompilable. The second is that we need to define what the correct behavior is after executing the compiled or recompiled binary. For llvm-mctoll we can verify correctness, because the IR it lifts can be recompiled, which satisfies the first requirement, and programs generated by Csmith meet the second requirement. However, for other lifting tools (e.g., retdec), the lifted LLVM IR cannot be recompiled because of some issues (e.g., avast/retdec#529). Is it possible to cross-check the lifted LLVM IR directly, without recompiling it? Do you have any suggestions?

As for new bug reports, I will try to keep filing new issues for them in my spare time. Thanks!

Best,
Haoxin

@bharadwajy bharadwajy changed the title [Question] Possibility to make llvm-mctoll compatible with Csmith? Failure to raise binaries of Csmith-generated sources Feb 28, 2022
@bharadwajy bharadwajy added bug (Something isn't working), good first issue (Good for newcomers), help wanted (Extra attention is needed), x86-64 (Relates to raising x86-64 binaries) labels Feb 28, 2022