Recording FF: Hanging at waitpid() after trying to step tracee into sighandler frame #288

Closed
joneschrisg opened this issue Aug 6, 2013 · 10 comments · Fixed by #359

Comments

@joneschrisg
Contributor

No guesses yet. The task being waited on is still alive, and in the interruptible sleep state.

@joneschrisg
Contributor Author

I just reproduced this while working on another bug. This happens when a SIGABRT is being delivered. Odd ...

@joneschrisg
Contributor Author

It's a non-main thread raising the signal to the "pid" (tgid). Perhaps Linux wants to do something in the context of the thread-group leader, and we're waiting on the wrong task. This is getting tricky :(.

@joneschrisg
Contributor Author

Call the off-main-thread T, main thread (thread-group leader) M.

Linux (or ptrace) does different things depending on the signal's termination disposition. For a "dump-core" signal, ptrace seems to want us to do the following:

  • (T gets the ptrace-signal-stop notification, say SIGABRT)
  • PTRACE_SINGLESTEP T, delivering the SIGABRT
  • waitpid(-1, __WALL) --- M stops with SIGTRAP, and ptrace event EXIT (msg 6).
  • waitpid(-1, __WALL) --- T stops with SIGTRAP, ptrace event EXIT (msg 134).

So now both tasks are dead, but we can't wait on the one we delivered the signal to :/.
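The "msg" values in the list above can be read with the standard wait-status macros. This is a minimal sketch, assuming (per ptrace(2)) that a PTRACE_EVENT_EXIT stop is reported as `status >> 8 == (SIGTRAP | (PTRACE_EVENT_EXIT << 8))` and that the message fetched with PTRACE_GETEVENTMSG is the tracee's exit status in wait-status encoding. The helper names `is_ptrace_exit_stop` and `decode_exit_msg` are hypothetical, not rr functions.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for WCOREDUMP on glibc */
#endif
#include <assert.h>
#include <limits.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Hypothetical helper: true if a waitpid() status describes a
 * PTRACE_EVENT_EXIT stop (SIGTRAP with the event code in the high bits). */
static bool is_ptrace_exit_stop(int status)
{
    return WIFSTOPPED(status) &&
           (status >> 8) == (SIGTRAP | (PTRACE_EVENT_EXIT << 8));
}

/* Decoded view of a PTRACE_GETEVENTMSG value at an exit stop. */
struct exit_msg {
    bool killed_by_signal;
    int termsig;      /* valid only if killed_by_signal */
    bool core_dumped; /* valid only if killed_by_signal */
};

/* Assumption: msg is an exit status in the same encoding waitpid() uses. */
static struct exit_msg decode_exit_msg(unsigned long msg)
{
    struct exit_msg out = { false, 0, false };
    int status;

    assert(msg <= (unsigned long)INT_MAX); /* must fit the W* macros */
    status = (int)msg;
    if (WIFSIGNALED(status)) {
        out.killed_by_signal = true;
        out.termsig = WTERMSIG(status);
        out.core_dumped = WCOREDUMP(status) != 0;
    }
    return out;
}
```

Under that assumption, msg 134 (0x80 | 6) for T decodes as "killed by SIGABRT, core dumped", while msg 6 for M decodes as "killed by SIGABRT" without the core flag.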

For "terminate"-disposition signals, we get a ptrace notification for the signaled task. HOWEVER, the task's status is WIFSIGNALED(), whereas in the case above it's WIFSTOPPED(). The WIFSIGNALED() status fails an assertion.
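To illustrate the WIFSIGNALED() versus WIFSTOPPED() distinction, here is a small sketch that classifies a waitpid() status before using it, rather than assuming every notification is a stop. The enum and function names are hypothetical and are not taken from rr.

```c
#include <assert.h>
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical classification of a waitpid() status. */
enum tracee_state {
    TRACEE_STOPPED, /* WIFSTOPPED: tracee is alive and in a ptrace stop */
    TRACEE_KILLED,  /* WIFSIGNALED: tracee is gone, killed by a signal */
    TRACEE_EXITED,  /* WIFEXITED: tracee is gone, exited normally */
    TRACEE_OTHER
};

static enum tracee_state classify_wait_status(int status)
{
    if (WIFSTOPPED(status))
        return TRACEE_STOPPED;
    if (WIFSIGNALED(status))
        return TRACEE_KILLED;
    if (WIFEXITED(status))
        return TRACEE_EXITED;
    return TRACEE_OTHER;
}

/* Signal associated with a status, for either a stop or a death. */
static int signal_of(int status)
{
    enum tracee_state st = classify_wait_status(status);

    assert(st == TRACEE_STOPPED || st == TRACEE_KILLED);
    return st == TRACEE_STOPPED ? WSTOPSIG(status) : WTERMSIG(status);
}
```

A recorder that only ever calls WSTOPSIG() would trip on the TRACEE_KILLED case; branching on the classification first keeps both paths explicit.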

The first case is going to be hard to handle without reworking the main recorder loop, which I've wanted to do for a while anyway. We might be able to hack it in. Not sure about all the details of the SIGTERM case yet.

@joneschrisg
Contributor Author

The SIGTERM case is a little simpler:

  • (T gets the ptrace-signal-stop notification for SIGTERM)
  • PTRACE_SINGLESTEP T delivers the SIGTERM
  • waitpid of T shows SIGTRAP, ptrace event EXIT, code 15
  • waitpid(-1) --- M stops with SIGTRAP, ptrace event EXIT, code 15

We may be able to squeeze this in without any changes to the main loop. Not sure about the SIGABRT case yet though.

@joneschrisg
Contributor Author

Even SIGTERM is going to be kind of annoying to handle, because we don't have a way to report "tracee-exited" from handle_signal. (I added USR_NOOP for another similar case, but don't want to pile onto that.) We may have a small step available where we split the logic of handle_signal to return "handled" or "not handled", and for the latter, advance execution from the central location and check for ptrace events. But we'll have to see.

@joneschrisg
Contributor Author

I don't think the kind of hacks we'd write in a detour to support this are going to get us closer to the 0.1 goals. So I'm thinking #293 is the way to go.

@joneschrisg
Contributor Author

Running into a pernicious problem now where delivering signals "correctly" is causing tracee status to change behind rr's back, while rr thinks the tracee is idle. We can't synchronously waitpid on the scheduled idle tracee, because its status may not have changed. I think I have a solution though ...

@joneschrisg
Contributor Author

Solution works, but there's a potential scheduling fairness problem when rr records multiple task groups. May need to teach rr about those. More details in a bit.

@joneschrisg
Contributor Author

The problem is that, when we get ready to deliver a signal to a tracee t, we may have another set of tracees S, some of which may be idle. It's an invariant of the recorder that tracee status only changes by rr resuming its execution. But when we deliver the signal to t, after-effects may ripple back to some s ∈ S, for example triggering a premature exit on a death signal. That violates the invariant above, and unsurprisingly, causes a variety of bugs where tracees die when rr doesn't expect them to, etc.

A solution is, when we deliver a signal (one that doesn't have a user handler, to be precise), set an "unstable" bit on all the tracees. The scheduler won't schedule a runnable but unstable task. Initially, the scheduler will have to enter a waitpid(-1) to ask the kernel for the next runnable task. At least one must change state: at least the one to which the signal was delivered. After that, if the task to which the signal was delivered resumes normally, we assume that the signal was ~ignored, and no more after-effects are expected. That lets us clear the "unstable" bit on all the tasks. Or if some other task is selected and runs, we clear just its unstable bit. Until all the unstable bits are cleared, we keep either (i) asking waitpid(-1) for a runnable task or (ii) scheduling tasks from the "stable" pool.
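The bookkeeping described above can be sketched as follows. This models only the "unstable" bits, not the actual waitpid(-1) calls, and every name here (`struct task`, `mark_all_unstable`, `pick_stable`, `on_task_ran`) is a hypothetical illustration rather than rr's scheduler code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-tracee record. */
struct task {
    int tid;
    bool runnable;
    bool unstable; /* status may change without rr resuming the task */
};

/* A signal with no user handler is about to be delivered: distrust
 * every tracee until the kernel tells us otherwise. */
static void mark_all_unstable(struct task *tasks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        tasks[i].unstable = true;
}

/* Index of a runnable, stable task, or -1 if there is none, in which
 * case the caller would have to block in waitpid(-1). */
static int pick_stable(const struct task *tasks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tasks[i].runnable && !tasks[i].unstable)
            return (int)i;
    return -1;
}

/* Task `which` was reported by the kernel and resumed normally.
 * If it is the signal's deliveree, the signal is assumed to have had no
 * further effect and all bits are cleared; otherwise only its own. */
static void on_task_ran(struct task *tasks, size_t n,
                        size_t which, size_t deliveree)
{
    assert(which < n && deliveree < n);
    if (which == deliveree) {
        for (size_t i = 0; i < n; i++)
            tasks[i].unstable = false;
    } else {
        tasks[which].unstable = false;
    }
}
```

Right after `mark_all_unstable`, `pick_stable` returns -1, which corresponds to the initial forced waitpid(-1) in the description.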

This seems to work pretty well within the same task group (process), but multiple tgs will cause problems. In the worst case, imagine there are two processes, p1 and p2. Let's say p1 has a pending SIG_DFL SIGSEGV, and let's say p2 was stopped by a time-slice interrupt. With the naive algorithm above, we'll deliver the segfault to p1, mark both p1 and p2 unstable, and then continue. ptrace will most likely schedule p1 next, and it will die. p2 won't be affected. However, p2 will still be marked "unstable", so the scheduler will go into a waitpid(-1) wait. Since p2 is just sitting idle, its process status will never change, and rr will hang forever.

The solution, I think, is if a signal is delivered to a task t, only mark "unstable" the other tasks in its task group. It's not too hard to track that, but there are some annoying details to get right.
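A sketch of the per-task-group variant, under the assumption that each tracee record carries its tgid. The names are hypothetical. Marking the deliveree itself along with its siblings is also an assumption made here, on the reasoning that its status is the first one expected to change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-tracee record with its thread-group id. */
struct task {
    int tid;
    int tgid;
    bool runnable;
    bool unstable;
};

/* Mark only the tasks in thread group `tgid` as unstable.
 * Returns how many tasks were marked. */
static size_t mark_group_unstable(struct task *tasks, size_t n, int tgid)
{
    size_t marked = 0;

    assert(tasks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].tgid == tgid) {
            tasks[i].unstable = true;
            marked++;
        }
    }
    return marked;
}

/* Index of a runnable, stable task, or -1 if the scheduler would have
 * to block in waitpid(-1). */
static int pick_stable(const struct task *tasks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tasks[i].runnable && !tasks[i].unstable)
            return (int)i;
    return -1;
}
```

In the p1/p2 scenario, delivering the SIGSEGV to p1 leaves p2 stable, so `pick_stable` can still return p2 after p1 dies instead of forcing a wait on a task whose status will never change.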

I would normally punt this with glee, but FF forks a process early in startup to poke the GLX implementation, and it's fully expected that that process will crash. But since rr only works on a very restricted subset of drivers, we can probably sneak by punting this until we have broader driver support.

@joneschrisg
Contributor Author

> Solution works,

I take that back: it works great for SIGABRT (core-dumping), but still intermittently fails with SIGTERM (terminating). Looking over the notes I took above, it looks like we want to use the behavior described in the prior comment for core-dumping signals, and synchronously waitpid the signal deliveree for the terminating signals. Sigh. Or punt the terminating signals, since they'll be far less common than abort()s ...
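The split proposed here could be sketched as a lookup from a signal's default action to a waiting strategy. The default actions listed follow signal(7) for a subset of signals; the strategy names and the mapping itself are a hypothetical reading of the tentative plan in this comment, and apply only when the tracee has no user handler installed for the signal.

```c
#include <assert.h>
#include <signal.h>

enum default_action { ACT_TERMINATE, ACT_CORE_DUMP, ACT_OTHER };

/* Hypothetical strategies, named after the two approaches in the thread. */
enum wait_strategy {
    STRAT_NONE,               /* no death expected; nothing special */
    STRAT_UNSTABLE_WAIT_ANY,  /* mark tasks unstable, then waitpid(-1) */
    STRAT_SYNC_WAIT_DELIVEREE /* waitpid() on the signaled task itself */
};

/* Default action per signal(7), for a subset of signals only. */
static enum default_action default_action_of(int sig)
{
    switch (sig) {
    case SIGABRT: case SIGBUS: case SIGFPE: case SIGILL: case SIGQUIT:
    case SIGSEGV: case SIGSYS: case SIGTRAP: case SIGXCPU: case SIGXFSZ:
        return ACT_CORE_DUMP;
    case SIGALRM: case SIGHUP: case SIGINT: case SIGKILL: case SIGPIPE:
    case SIGTERM: case SIGUSR1: case SIGUSR2:
        return ACT_TERMINATE;
    default:
        return ACT_OTHER; /* ignored, stop/continue, or not listed here */
    }
}

/* Strategy for a signal whose disposition is SIG_DFL. */
static enum wait_strategy strategy_for_default_signal(int sig)
{
    assert(sig > 0);
    switch (default_action_of(sig)) {
    case ACT_CORE_DUMP:
        return STRAT_UNSTABLE_WAIT_ANY;
    case ACT_TERMINATE:
        return STRAT_SYNC_WAIT_DELIVEREE;
    default:
        return STRAT_NONE;
    }
}
```

With this mapping, abort() (SIGABRT) takes the unstable-bit path and SIGTERM takes the synchronous path; punting terminating signals would amount to dropping the ACT_TERMINATE case.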
