Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test failures #4

Closed
qrilka opened this issue Mar 20, 2019 · 8 comments
Closed

Test failures #4

qrilka opened this issue Mar 20, 2019 · 8 comments

Comments

@qrilka
Copy link
Collaborator

qrilka commented Mar 20, 2019

I run tests on

$ uname -a
Linux qdesktop 4.14.65-gentoo #1 SMP Sat Sep 22 15:19:49 MSK 2018 x86_64 Intel(R) Core(TM) i5-2300 CPU @ 2.80GHz GenuineIntel GNU/Linux

and I get the following reproducible failures

Failures:

  test/HatraceSpec.hs:68:7: 
  1) Hatrace.traceCreateProcess does not hang when the traced program segfaults
       expected: ExitFailure 139
        but got: ExitFailure 11

  To rerun use: --match "/Hatrace/traceCreateProcess/does not hang when the traced program segfaults/"

  test/HatraceSpec.hs:84:7: 
  2) Hatrace.sourceTraceForkExecvFullPathWithSink allows obtaining all syscalls as a list for hello.asm
       expected: [KnownSyscall Syscall_execve,KnownSyscall Syscall_write,KnownSyscall Syscall_exit]
        but got: [KnownSyscall Syscall_rt_sigprocmask,KnownSyscall Syscall_execve,KnownSyscall Syscall_write,KnownSyscall Syscall_exit]

  To rerun use: --match "/Hatrace/sourceTraceForkExecvFullPathWithSink/allows obtaining all syscalls as a list for hello.asm/"

  test/HatraceSpec.hs:122:9: 
  3) Hatrace.sourceTraceForkExecvFullPathWithSink, subprocess tracing, can trace 'bash -c ./hello'
       expected: [Syscall_clone,Syscall_write]
        but got: [Syscall_write]

  To rerun use: --match "/Hatrace/sourceTraceForkExecvFullPathWithSink/subprocess tracing/can trace 'bash -c ./hello'/"

Randomized with seed 1551383026

I'll try to dig what could be wrong here but maybe @nh2 you could give some hints what I should check first?

@nh2
Copy link
Owner

nh2 commented Mar 21, 2019

       expected: ExitFailure 139
        but got: ExitFailure 11

I've observed this before as well.

When we run it in the shell, we get exit code 139 (tried bash and zsh).

When running under strace, we get exit code 139.

But when running under hatrace, we get 11.

Relevant code is perhaps:

      (newState, event) <- case status of
        -- `Exited` means that the process chose to exit by itself,
        -- as in calling `exit()` (as opposed to e.g. getting killed
        -- by a signal).
        Exited i -> do
          case i of
            0 -> pure (state, Death $ ExitSuccess)
            _ -> pure (state, Death $ ExitFailure i)
        Continued -> error $ "waitForTraceEvent: BUG: Continued status appeared even though WCONTINUE was not passed to waitpid"
        -- Note that `Signaled` means that the process was *terminated*
        -- by a signal.
        -- Signals that come in without killing the process appear in
        -- the `Stopped` case.
        Signaled _sig -> pure (state, Death $ ExitFailure (fromIntegral fullStatus))

Should check which of the two Death constructors is happening here.

@nh2
Copy link
Owner

nh2 commented Mar 21, 2019

For the third one, absence of clone() it may be an optimisation in newer bash.

On Ubuntu 16.04, in GNU bash, version 4.3.48(1)-release (x86_64-pc-linux-gnu), I get with strace:

% strace -fye clone,execve bash -c 'true && example-programs-build/hello-linux-x86_64'
execve("/bin/bash", ["bash", "-c", "true && example-programs-build/h"...], [/* 47 vars */]) = 0
clone(strace: Process 11440 attached
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f43759f39d0) = 11440
[pid 11440] execve("example-programs-build/hello-linux-x86_64", ["example-programs-build/hello-lin"...], [/* 46 vars */]) = 0
Hello, world!
[pid 11440] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=11440, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
+++ exited with 0 +++

While @qrilka gets:

On Gentoo, GNU bash, version 4.4.23(1)-release (x86_64-unknown-linux-gnu):

$ strace -fye clone,execve bash -c 'true && example-programs-build/hello-linux-x86_64'
execve("/home/qrilka/.nix-profile/bin/bash", ["bash", "-c", "true && example-programs-build/h"...], 0x7ffe60983030 /* 63 vars */) = 0
execve("example-programs-build/hello-linux-x86_64", ["example-programs-build/hello-lin"...], 0x720008 /* 63 vars */) = 0
Hello, world!
+++ exited with 0 +++

@nh2
Copy link
Owner

nh2 commented Mar 21, 2019

When trying with Bash 4.4.23 via nix on my Ubuntu, I also don't have the clone():

% strace -fye clone,execve /nix/store/czx8vkrb9jdgjyz8qfksh10vrnqa723l-bash-4.4-p23/bin/bash -c 'true && example-programs-build/hello-linux-x86_64'
execve("/nix/store/czx8vkrb9jdgjyz8qfksh10vrnqa723l-bash-4.4-p23/bin/bash", ["/nix/store/czx8vkrb9jdgjyz8qfksh"..., "-c", "true && example-programs-build/h"...], [/* 47 vars */]) = 0
execve("example-programs-build/hello-linux-x86_64", ["example-programs-build/hello-lin"...], [/* 47 vars */]) = 0
Hello, world!
+++ exited with 0 +++

So that really looks like an optimisation, and the true && I put into that test to trick bash into actually making a clone syscall doesn't work any more in newer bash.

@nh2
Copy link
Owner

nh2 commented Mar 21, 2019

OK, we found the solution to this one: true && example-programs-build/hello-linux-x86_64 should be example-programs-build/hello-linux-x86_64 && true to force bash to clone(). I'll fix it.

@nh2
Copy link
Owner

nh2 commented Mar 21, 2019

For the second one, regarding the extraneous Syscall_rt_sigprocmask, we found that reverting commit 874b29c (fork_exec_with_ptrace: Use raise() instead of kill()) gets rid of it.

From man 3 raise:

SYNOPSIS
       #include <signal.h>

       int raise(int sig);

DESCRIPTION
       The  raise()  function sends a signal to the calling process or thread.  In a sin‐
       gle-threaded program it is equivalent to

           kill(getpid(), sig);

       In a multithreaded program it is equivalent to

           pthread_kill(pthread_self(), sig);

       If the signal causes a handler to be called, raise() will return  only  after  the
       signal handler has returned.

This indicates that due to glibc differences between us, @qrilka's glibc does more (an extra rt_sigprocmask()) in raise().

My glibc version:

% ldd --version
ldd (Ubuntu GLIBC 2.23-0ubuntu11) 2.23

@qrilka's glibc version:

$ ldd --version
ldd (Gentoo 2.27-r6 p3) 2.27

@nh2
Copy link
Owner

nh2 commented Mar 21, 2019

Reverting the commit is the correct solution, as the extra handling for multi-threaded programs isn't necessary: We are just after fork() so there cannot be any other threads.

nh2 added a commit that referenced this issue Mar 21, 2019
This reverts commit 874b29c.

Also some commentary added.
@qrilka
Copy link
Collaborator Author

qrilka commented Mar 21, 2019

Ending with b968ad1 all of this seem to get resolved

@nh2
Copy link
Owner

nh2 commented Mar 21, 2019

Yep, that should be the case.

Commits 877fae7 and 8cd9ec5 fix the exit code 11 vs 139 issue, though I still do not really understand what's going on (comment added).

The two commits above fix the other issues.

@nh2 nh2 closed this as completed Mar 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants