New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[qubes-os] Debugging non-reproducible timeout crashes #703

Open
paraschetal opened this Issue Jul 6, 2017 · 17 comments

Comments

Projects
None yet
3 participants
@paraschetal
Contributor

paraschetal commented Jul 6, 2017

Based on the clusterfuzz logs, the libqubes-rpc-filecopy fuzz target is timing out (t > 25 seconds) on certain inputs which is causing it to crash. Apparently the testcase is not reproducible. How should I debug the crashes then?

I attempted to download the testcase and run it locally in the base-runner docker container, and it did not timeout:

root@d359bcf8f6ae:/out# ./libqubes-rpc-filecopy clusterfuzz-testcase-4748420839374848 < /dev/null
INFO: Seed: 1867781274
INFO: Loaded 1 modules (110 guards): [0x885410, 0x8855c8), 
./libqubes-rpc-filecopy: Running 1 inputs 1 time(s) each.
Running: clusterfuzz-testcase-4748420839374848
Executed clusterfuzz-testcase-4748420839374848 in 0 ms
***
*** NOTE: fuzzing was not performed, you have only
***       executed the target code on a fixed set of inputs.
***
==785==LeakSanitizer has encountered a fatal error.
==785==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
==785==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)

There was a LeakSanitizer error, but I believe that is not the issue here? Because the LeakSanitizer seems to encounter a fatal error on any input testcase.

root@d359bcf8f6ae:/out# echo "AAAA" > testcase
root@d359bcf8f6ae:/out# ./libqubes-rpc-filecopy testcase < /dev/null
INFO: Seed: 2200940229
INFO: Loaded 1 modules (110 guards): [0x885410, 0x8855c8), 
./libqubes-rpc-filecopy: Running 1 inputs 1 time(s) each.
Running: testcase
Executed testcase in 0 ms
***
*** NOTE: fuzzing was not performed, you have only
***       executed the target code on a fixed set of inputs.
***
==788==LeakSanitizer has encountered a fatal error.
==788==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
==788==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)

@Dor1s Dor1s self-assigned this Jul 6, 2017

@Dor1s

This comment has been minimized.

Collaborator

Dor1s commented Jul 6, 2017

Good that you've noticed that. Let me check what's going on.

Can you try to run:

LSAN_OPTIONS=verbosity=1:log_threads=1 ./libqubes-rpc-filecopy testcase
@paraschetal

This comment has been minimized.

Contributor

paraschetal commented Jul 6, 2017

It gives some more debugging info

root@d359bcf8f6ae:/out# export LSAN_OPTIONS=verbosity=1:log_threads=1
root@d359bcf8f6ae:/out# ./libqubes-rpc-filecopy clusterfuzz-testcase-4748420839374848 < /dev/null
==736==AddressSanitizer: failed to intercept '__isoc99_printf'
==736==AddressSanitizer: failed to intercept '__isoc99_sprintf'
==736==AddressSanitizer: failed to intercept '__isoc99_snprintf'
==736==AddressSanitizer: failed to intercept '__isoc99_fprintf'
==736==AddressSanitizer: failed to intercept '__isoc99_vprintf'
==736==AddressSanitizer: failed to intercept '__isoc99_vsprintf'
==736==AddressSanitizer: failed to intercept '__isoc99_vsnprintf'
==736==AddressSanitizer: failed to intercept '__isoc99_vfprintf'
==736==AddressSanitizer: failed to intercept '__cxa_throw'
==736==AddressSanitizer: libc interceptors initialized
|| `[0x10007fff8000, 0x7fffffffffff]` || HighMem    ||
|| `[0x02008fff7000, 0x10007fff7fff]` || HighShadow ||
|| `[0x00008fff7000, 0x02008fff6fff]` || ShadowGap  ||
|| `[0x00007fff8000, 0x00008fff6fff]` || LowShadow  ||
|| `[0x000000000000, 0x00007fff7fff]` || LowMem     ||
MemToShadow(shadow): 0x00008fff7000 0x000091ff6dff 0x004091ff6e00 0x02008fff6fff
redzone=16
max_redzone=2048
quarantine_size_mb=256M
thread_local_quarantine_size_kb=1024K
malloc_context_size=30
SHADOW_SCALE: 3
SHADOW_GRANULARITY: 8
SHADOW_OFFSET: 0x7fff8000
==736==Installed the sigaction for signal 11
==736==Installed the sigaction for signal 7
==736==Installed the sigaction for signal 8
==736==T0: stack [0x7ffd88632000,0x7ffd88e32000) size 0x800000; local=0x7ffd88e30128
==736==AddressSanitizer Init done
INFO: Seed: 4098379181
INFO: Loaded 1 modules (110 guards): [0x885410, 0x8855c8), 
==736==T1: stack [0x7f1812c79000,0x7f1813478f80) size 0x7fff80; local=0x7f1813478e88
./libqubes-rpc-filecopy: Running 1 inputs 1 time(s) each.
Running: clusterfuzz-testcase-4748420839374848
Executed clusterfuzz-testcase-4748420839374848 in 0 ms
***
*** NOTE: fuzzing was not performed, you have only
***       executed the target code on a fixed set of inputs.
***
==738==Could not attach to thread 736 (errno 1).
==738==Could not attach to thread 737 (errno 1).
==738==Failed suspending threads.
==736==LeakSanitizer has encountered a fatal error.
==736==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
==736==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)
@paraschetal

This comment has been minimized.

Contributor

paraschetal commented Jul 6, 2017

The Could not attach to thread seems suspicious. There is no multithreading involved in the fuzz target, so I think it's a libFuzzer error.

@Dor1s

This comment has been minimized.

Collaborator

Dor1s commented Jul 6, 2017

Below what I did:

$ python infra/helper.py build_image qubes-os && python infra/helper.py build_fuzzers qubes-os && python infra/helper.py shell qubes-os
Pull latest base images (compiler/runtime)? (y/N): y
Pulling latest base images...
....
# Copied testcase to the docker
$ /out/libqubes-rpc-filecopy ./testcase 
INFO: Seed: 2894952457
INFO: Loaded 1 modules (110 guards): [0x883350, 0x883508), 
/out/libqubes-rpc-filecopy: Running 1 inputs 1 time(s) each.
Running: ./testcase
Executed ./testcase in 2 ms
***
*** NOTE: fuzzing was not performed, you have only
***       executed the target code on a fixed set of inputs.
***
$ /out/libqubes-rpc-filecopy -runs=100 ./testcase 
INFO: Seed: 2900002431
INFO: Loaded 1 modules (110 guards): [0x883350, 0x883508), 
/out/libqubes-rpc-filecopy: Running 1 inputs 100 time(s) each.
Running: ./testcase
Executed ./testcase in 84 ms
***
*** NOTE: fuzzing was not performed, you have only
***       executed the target code on a fixed set of inputs.
***

The timeout is not reproducible (that happens, we simply do not report non reproducible crashes), but everything works fine.

@Dor1s

This comment has been minimized.

Collaborator

Dor1s commented Jul 6, 2017

Why do you run the target with < /dev/null in the end of command line? That shouldn't be needed, and our infrastructure doesn't run fuzz targets that way.

@Dor1s

This comment has been minimized.

Collaborator

Dor1s commented Jul 6, 2017

Below is my debugging info:

$ LSAN_OPTIONS=verbosity=1:log_threads=1 /out/libqubes-rpc-filecopy ./testcase 
==235==AddressSanitizer: failed to intercept '__isoc99_printf'
==235==AddressSanitizer: failed to intercept '__isoc99_sprintf'
==235==AddressSanitizer: failed to intercept '__isoc99_snprintf'
==235==AddressSanitizer: failed to intercept '__isoc99_fprintf'
==235==AddressSanitizer: failed to intercept '__isoc99_vprintf'
==235==AddressSanitizer: failed to intercept '__isoc99_vsprintf'
==235==AddressSanitizer: failed to intercept '__isoc99_vsnprintf'
==235==AddressSanitizer: failed to intercept '__isoc99_vfprintf'
==235==AddressSanitizer: failed to intercept '__cxa_throw'
==235==AddressSanitizer: libc interceptors initialized
|| `[0x10007fff8000, 0x7fffffffffff]` || HighMem    ||
|| `[0x02008fff7000, 0x10007fff7fff]` || HighShadow ||
|| `[0x00008fff7000, 0x02008fff6fff]` || ShadowGap  ||
|| `[0x00007fff8000, 0x00008fff6fff]` || LowShadow  ||
|| `[0x000000000000, 0x00007fff7fff]` || LowMem     ||
MemToShadow(shadow): 0x00008fff7000 0x000091ff6dff 0x004091ff6e00 0x02008fff6fff
redzone=16
max_redzone=2048
quarantine_size_mb=256M
thread_local_quarantine_size_kb=1024K
malloc_context_size=30
SHADOW_SCALE: 3
SHADOW_GRANULARITY: 8
SHADOW_OFFSET: 0x7fff8000
==235==Installed the sigaction for signal 11
==235==Installed the sigaction for signal 7
==235==Installed the sigaction for signal 8
==235==T0: stack [0x7ffe3c719000,0x7ffe3cf19000) size 0x800000; local=0x7ffe3cf1748c
==235==AddressSanitizer Init done
INFO: Seed: 3166578510
INFO: Loaded 1 modules (110 guards): [0x883350, 0x883508), 
==235==T1: stack [0x7ffb8cf79000,0x7ffb8d778f80) size 0x7fff80; local=0x7ffb8d778e9c
/out/libqubes-rpc-filecopy: Running 1 inputs 1 time(s) each.
Running: ./testcase
Executed ./testcase in 3 ms
***
*** NOTE: fuzzing was not performed, you have only
***       executed the target code on a fixed set of inputs.
***
==237==Processing thread 235.
==237==Stack at 0x7ffe3c719000-0x7ffe3cf19000 (SP = 0x7ffe3cf15318).
==237==TLS at 0x7ffb925d5000-0x7ffb925d6080.
==237==Processing thread 236.
==237==Stack at 0x7ffb8cf79000-0x7ffb8d778f80 (SP = 0x7ffb8d778e50).
==237==TLS at 0x7ffb8d778f80-0x7ffb8d77a000.
@paraschetal

This comment has been minimized.

Contributor

paraschetal commented Jul 6, 2017

#681 (comment)
@oliverchang mentioned that the targets are run as such

@Dor1s

This comment has been minimized.

Collaborator

Dor1s commented Jul 6, 2017

This can be the problem on your machine: google/sanitizers#764 (comment)

@Dor1s

This comment has been minimized.

Collaborator

Dor1s commented Jul 6, 2017

Yes, you're right, @oliverchang added that since some targets may incorrectly read something from stdin. It's not an issue though. Looks like the problem happens due to ptrace or --privileged

@paraschetal

This comment has been minimized.

Contributor

paraschetal commented Jul 6, 2017

Yes, running the docker container with the --privileged flag no longer gives any LSan errors

root@14dfcd0793e2:/out# ./libqubes-rpc-filecopy clusterfuzz-testcase-4588065282588672 
INFO: Seed: 3664674487
INFO: Loaded 1 modules (126 guards): [0x8865f0, 0x8867e8), 
./libqubes-rpc-filecopy: Running 1 inputs 1 time(s) each.
Running: clusterfuzz-testcase-4588065282588672
Executed clusterfuzz-testcase-4588065282588672 in 2 ms
***
*** NOTE: fuzzing was not performed, you have only
***       executed the target code on a fixed set of inputs.
***
@paraschetal

This comment has been minimized.

Contributor

paraschetal commented Jul 6, 2017

How should I fix the timeouts happening in clusterfuzz then, since the testcase is not reproducible?

@Dor1s

This comment has been minimized.

Collaborator

Dor1s commented Jul 6, 2017

Cool! The last question is why that non reproducible timeout happens so often... Looking at the stacktrace, it seems that there is an attempt to open a file?

  | #0 0x4e8483 in __sanitizer_print_stack_trace _asan_rtl_
  | #1 0x53c652 in fuzzer::Fuzzer::AlarmCallback() /src/libfuzzer/FuzzerLoop.cpp:234:7
  | #2 0x7fb2d300e38f in libpthread.so.0
  | #3 0x7fb2d270847c in __open64 /build/glibc-9tT8Do/glibc-2.23/sysdeps/unix/syscall-template.S:84
  | #4 0x7fb2d268bacd in _IO_file_open /build/glibc-9tT8Do/glibc-2.23/libio/fileops.c:221
  | #5 0x7fb2d268bd34 in _IO_file_fopen /build/glibc-9tT8Do/glibc-2.23/libio/fileops.c:328
  | #6 0x7fb2d267fd33 in __fopen_internal /build/glibc-9tT8Do/glibc-2.23/libio/iofopen.c:86
  | #7 0x49677a in __interceptor_fopen /src/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc:5379
  | #8 0x515fa9 in LLVMFuzzerTestOneInput /src/qubes-os/linux-utils/qrexec-lib/fuzzer/fuzzer.cc:22:19
  | #9 0x53f3c9 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:460:13
  | #10 0x53fbfa in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:399:3
  | #11 0x541321 in fuzzer::Fuzzer::MutateAndTestOne() /src/libfuzzer/FuzzerLoop.cpp:598:30
  | #12 0x541ae7 in fuzzer::Fuzzer::Loop() /src/libfuzzer/FuzzerLoop.cpp:626:5
  | #13 0x522368 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:744:6
  | #14 0x516148 in main /src/libfuzzer/FuzzerMain.cpp:20:10
  | #15 0x7fb2d263282f in __libc_start_main /build/glibc-9tT8Do/glibc-2.23/csu/libc-start.c:291
  | #16 0x41cb18 in _start
@Dor1s

This comment has been minimized.

Collaborator

Dor1s commented Jul 6, 2017

Yeah...

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
  
  const char* in_filename = "/tmp/in";
  const char* out_filename = "/tmp/out";
  FILE* in_file = fopen(in_filename, "wb+");
  int in_fd = fileno(in_file);
  write_all(in_fd, data, size);

  FILE* out_file = fopen(out_filename, "wb+");
  int out_fd = fileno(out_file);
  
  dup2(out_fd, STDOUT_FILENO);

  qfile_pack_init();
  register_error_handler(display_error);
  do_fs_walk(in_filename, 0);
  
  fclose(in_file);
  fclose(out_file);
  
  return 0;
}

Is there any way to avoid files creation?

@paraschetal

This comment has been minimized.

Contributor

paraschetal commented Jul 6, 2017

Yes. The target basically packs the data from an input file /tmp/in and sends it to STDOUT which I have dup2'd to an output file /tmp/out

@Dor1s Dor1s removed their assignment Jul 6, 2017

@paraschetal

This comment has been minimized.

Contributor

paraschetal commented Jul 6, 2017

Since it is a filecopy library I think file creation might be necessary. Still, I'll take a closer look and see if there can be a possible workaround.

@paraschetal

This comment has been minimized.

Contributor

paraschetal commented Jul 9, 2017

I don't think a feasible workaround is possible here. I could make the fuzzing work without file creation, but

  1. I would have to make a lot of patches in the library. This would not be favorable for the continuous fuzzing we are aiming for.
  2. The coverage of the target will be reduced.

Can the issue be resolved some other way? What exactly is causing libfuzzer to timeout due to file I/O?

@Dor1s

This comment has been minimized.

Collaborator

Dor1s commented Jul 10, 2017

Oliver @oliverchang do you have any ideas why existing fuzz target always dies with a non reproducible timeout?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment