Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ptrace support for BSDs #949

Merged
merged 1 commit into from Oct 21, 2018
Merged

ptrace support for BSDs #949

merged 1 commit into from Oct 21, 2018

Conversation

@xd009642
Copy link
Contributor

@xd009642 xd009642 commented Oct 7, 2018

This PR adds support to the ptrace API for BSDs to close #947. It also adds a read and write method for reading and writing to a traced processes memory. The ptrace API created for linux offers this via a deprecated function so I added this so they can be feature equivalent without replicating a deprecated part of the API.

Due to the differences in ptrace on BSD and linux I've made a ptrace module to keep things readable.

Still to do - revert travis config to remove my feature branch and update the changelog.

Copy link
Member

@asomers asomers left a comment

Good start so far. You should also try enabling the tests in test/sys/test_ptrace.rs.

.travis.yml Outdated
@@ -141,6 +141,7 @@ branches:
# bors-ng branches; see https://bors-ng.github.io/getting-started/
- trying
- staging
- issue-947

This comment has been minimized.

@asomers

asomers Oct 7, 2018
Member

This line shouldn't be necessary. Travis automatically builds PRs. This line is only necessary if you want nix-rust/nix to have a branch named "issue-947".

None => 0,
};
unsafe {
ptrace_other(Request::PT_CONTINUE, pid, ptr::null_mut(), data).map(|_| ()) // ignore the useless return value

This comment has been minimized.

@asomers

asomers Oct 7, 2018
Member

Please line wrap to 80 cols.

/// }
/// }
/// ```
#[cfg(not(any(target_os = "netbsd", target_os = "openbsd")))]

This comment has been minimized.

@asomers

asomers Oct 7, 2018
Member

Why not enabled for these OSes? They both define PT_STEP.

This comment has been minimized.

This comment has been minimized.

@krytarowski

krytarowski Oct 8, 2018

NetBSD supports PT_STEP in CPU specific headers, not all CPUs support it.


/// Step isn't supported on netbsd or openbsd
#[cfg(any(target_os = "netbsd", target_os = "openbsd"))]
pub fn Step<T: Into<Option<Signal>>>(pid: Pid, sig: T) -> Result<()> {

This comment has been minimized.

@asomers

asomers Oct 7, 2018
Member

You shouldn't define this function. It's better to make programs fail at compile time than at runtime. Also, the capitalization is wrong.

@@ -20,7 +20,7 @@ cfg_if! {
}

libc_enum!{
#[cfg_attr(not(any(target_env = "musl", target_os = "android")), repr(u32))]
#[cfg_attr(not(any(target_env = "musl", target_os = "android", target_os = "macos")), repr(u32))]

This comment has been minimized.

@asomers

asomers Oct 7, 2018
Member

This change shouldn't be necessary, since the file is not included on OSX.

@xd009642 xd009642 force-pushed the xd009642:issue-947 branch from 94c8190 to ed943a2 Oct 7, 2018
@xd009642
Copy link
Contributor Author

@xd009642 xd009642 commented Oct 7, 2018

Well that didn't quite go as expected. I'm unsure what setup the buildbot instances are (guessing freebsd though) and the apple ptrace calls just seem to be returning ENOTSUP As that's returned if you use things like PT_DENY_ATTACH in apple I'm thinking maybe it's blocking ptrace calls for security reasons. Though that is just a guess.

Copy link
Member

@asomers asomers left a comment

The compile failure is because src/sys/mod.rs needs a change, too.

@@ -26,6 +26,5 @@ mod test_uio;
#[cfg(target_os = "linux")]
mod test_epoll;
mod test_pthread;
#[cfg(any(target_os = "android",
target_os = "linux"))]
#[cfg(not(target_os = "ios"))]

This comment has been minimized.

@asomers

asomers Oct 8, 2018
Member

This #[cfg()] statement is too broad. Nix runs on many platforms. You should opt-in, not opt-out.

@xd009642
Copy link
Contributor Author

@xd009642 xd009642 commented Oct 8, 2018

Okay well I've just opened a libc PR that adds PT_STEP for the relevant archs. So once that's merged I'll update accordingly.

Any idea about the ENOTSUP for the apple ptrace tests? Might be a qemu based issue? I noticed that in the test comments it says it returns ENOSYS when it can't ptrace, might it do ENOTSUP on apple?

@krytarowski
Copy link

@krytarowski krytarowski commented Oct 8, 2018

"libc PR that adds PT_STEP" where?

@xd009642
Copy link
Contributor Author

@xd009642 xd009642 commented Oct 8, 2018

In libc rust-lang/libc#1091 (comment) none of the ptrace requests were available in BSDs so I did a PR, did another one to add the arch specific requests today 😄

Edit: Just realised you're more of a BSD person not a rust person! libc is a rust library we use to reexport libc functions and constants etc.

@asomers
Copy link
Member

@asomers asomers commented Oct 8, 2018

Okay well I've just opened a libc PR that adds PT_STEP for the relevant archs. So once that's merged I'll update accordingly.

Any idea about the ENOTSUP for the apple ptrace tests? Might be a qemu based issue? I noticed that in the test comments it says it returns ENOSYS when it can't ptrace, might it do ENOTSUP on apple?

I don't know. Have you tried it on real Apple hardware? I don't have any.

@xd009642
Copy link
Contributor Author

@xd009642 xd009642 commented Oct 8, 2018

I don't have any either, I'm trying to add OSX support for one of my projects for a user that asked for it. I'll ask them if they can run it

@xd009642
Copy link
Contributor Author

@xd009642 xd009642 commented Oct 8, 2018

Well it doesn't seem to work on an actual apple pc as well. I think it might need additional fiddling via functions that aren't in libc but are in https://github.com/fitzgen/mach/

In the interest of moving this along I could remove apple support from bsd, fix the buildbot errors then work on apple support as an additional PR? Or try and do it all in this one.

How would you feel about adding a dependency on https://github.com/fitzgen/mach/ for apple users?

@xd009642
Copy link
Contributor Author

@xd009642 xd009642 commented Oct 8, 2018

I might be onto something, I've got test_ptrace_cont passing on apple! So I'm going to assume there's now no need for mach as a crate

@xd009642
Copy link
Contributor Author

@xd009642 xd009642 commented Oct 8, 2018

Okay I might be all good now!

One potential issue with the tests I noticed there was a spurious test_wait failure, this might be caused by test_wait and test_ptrace_cont being ran at the same time by rustc-test if waitpid will respond to any signal from the same PGID on the system in question. I've had similar issues in my own project when trying to test things that use wait or ptrace.

Either way I'll check back tomorrow and see what travis/buildbot says

@asomers
Copy link
Member

@asomers asomers commented Oct 8, 2018

Tsk Tsk. You should always compile and test before pushing to Travis. Travis shouldn't be finding your unclosed delimiters for you.

@xd009642
Copy link
Contributor Author

@xd009642 xd009642 commented Oct 8, 2018

I know, it's this apple stuff has made me sloppy. I realised doing stuff in the bsd mode that cargo check and build didn't flag up syntax errors at all.

If I make a habit of OSX based PRs I'll try to figure out how to cross compile for OSX

@xd009642
Copy link
Contributor Author

@xd009642 xd009642 commented Oct 9, 2018

So I know I can't just change wait to waitpid in test_wait, but I just wanted to prove my theory on the last failure was corrected. With the design of test_wait you're vulnerable to stealing test_ptrace_cont signals if cargo test is ran with multiple threads.

As a result I've made test-threads=1 I know this will slow down CI runs, so there's also an alternative to make test_wait ignored then run it on it's own in a separate call to cargo test than the others.

While that did sort the apple issue for that one run TEST_THREADS was already 1 in the cargo cfg. So that might just be down to the fact the multiple processes essentially make this is a threading probably. I've added an extra ptrace::cont which isn't checked after the check for the kill just because some systems may need a continue to transition from raising the SIGKILL signal to actually exiting (I'll also try detach if that doesn't work).

Copy link
Member

@asomers asomers left a comment

test_ptrace_cont should grab the FORK_MTX, just like other tests that fork. That will fix your conflict with test_wait.

@@ -21,7 +21,7 @@ main() {

# Run tests on debug and release targets.
cross test --target $t
cross test --target $t --release
cross test --target $t --release

This comment has been minimized.

@asomers

asomers Oct 9, 2018
Member

Make sure to revert this whitespace change.

@@ -10,25 +12,29 @@ fn test_ptrace() {
// Just make sure ptrace can be called at all, for now.
// FIXME: qemu-user doesn't implement ptrace on all arches, so permit ENOSYS
let err = ptrace::attach(getpid()).unwrap_err();
assert!(err == Error::Sys(Errno::EPERM) || err == Error::Sys(Errno::ENOSYS));
assert!(err == Error::Sys(Errno::EPERM) || err == Error::Sys(Errno::EINVAL) ||

This comment has been minimized.

@asomers

asomers Oct 9, 2018
Member

Why are you allowing EINVAL?

This comment has been minimized.

@xd009642

xd009642 Oct 9, 2018
Author Contributor

https://www.freebsd.org/cgi/man.cgi?query=ptrace the BSDs seem to use EINVAL for attempting to attach to yourself. I don't know if all BSDs do that or if some mimic linux and do EPERM so I just had it as that for now. I can investigate and make it more specific though

This comment has been minimized.

@krytarowski

krytarowski Oct 9, 2018

NetBSD returns EINVAL

@xd009642
Copy link
Contributor Author

@xd009642 xd009642 commented Oct 9, 2018

@asomers FORK_MTX appears to have had no effect which seems super odd to me 😕

@krytarowski
Copy link

@krytarowski krytarowski commented Oct 9, 2018

If you find anything wrong with NetBSD and ptrace(2), please let me know!

@krytarowski
Copy link

@krytarowski krytarowski commented Oct 9, 2018

BTW. you might find it useful http://netbsd.org/~kamil/ptrace-netbsd/presentation.html a little bit dated, but quite good starter.

@asomers
Copy link
Member

@asomers asomers commented Oct 9, 2018

The compile failure on NetBSD is purely a problem in your PR. You have an unused use statement on line 7. Delete that and it should compile. The runtime failure on OSX is more interesting. It looks like the test_ptrace_const test is leaving behind a zombie process on OSX, even though it doesn't on Linux or FreeBSD. It would be interesting to run the test on OSX with this patch, and see if there is any zombie in the output. You'll have to run it with --nocapture, of course.

diff --git a/test/sys/test_ptrace.rs b/test/sys/test_ptrace.rs
index 68d7922a..73966b30 100644
--- a/test/sys/test_ptrace.rs
+++ b/test/sys/test_ptrace.rs
@@ -96,6 +96,9 @@ fn test_ptrace_cont() {
                 }
                 _ => panic!("The process should have been killed"),
             }
+            use std::process;
+            let output = process::Command::new("ps").arg("-axwwd").output().unwrap();
+            println!("{}", String::from_utf8_lossy(&output.stdout));
         },
     }
 }
@asomers
Copy link
Member

@asomers asomers commented Oct 10, 2018

ps -axwwd is evidently not the right command to list parent-child relationships in OSX. But nonetheless the theory is correct. The (test-4342e99f348) process looks like the child being traced. So fixing the test requires figuring out what combination of ptrace(2), kill(2), and/or wait(2) will reap that child process.

@xd009642
Copy link
Contributor Author

@xd009642 xd009642 commented Oct 10, 2018

Great, I figured as much.

I'll try things that come to mind as potential fixes, (like currently ptrace::detach) and also in parallel try to find some documentation or blogs that shed light on the differences between apple and the rest of the world

@xd009642
Copy link
Contributor Author

@xd009642 xd009642 commented Oct 10, 2018

I've seemingly cracked it, this part of the linux man page for ptrace helped me realise what may be happening:

PTRACE_KILL
      Send  the  tracee a SIGKILL to terminate it.  (addr and data are
      ignored.)

      This operation is deprecated; do not use it!   Instead,  send  a
      SIGKILL  directly  using kill(2) or tgkill(2).  The problem with
      PTRACE_KILL is that it requires the  tracee  to  be  in  signal-
      delivery-stop,  otherwise  it  may  not work (i.e., may complete
      successfully but won't kill the tracee).  By contrast, sending a
      SIGKILL directly has no such limitation.

And then in the apple man page for ptrace

PT_KILL       The traced process terminates, as if PT_CONTINUE had been
              used with SIGKILL given as the signal to be delivered.

I know they're different OSes but if PT_KILL in apple is subject to the same potential pitfalls of linux that would explain the issues we've observed. Therefore after waitpid catches the kill we sent, just in case it's a false positive kill I send a direct kill to the tracee.

I've also added a comment to document this behaviour in the test to explain the rationale for future maintainers.

@asomers
Copy link
Member

@asomers asomers commented Oct 10, 2018

I've seemingly cracked it, this part of the linux man page for ptrace helped me realise what may be happening:

PTRACE_KILL
      Send  the  tracee a SIGKILL to terminate it.  (addr and data are
      ignored.)

      This operation is deprecated; do not use it!   Instead,  send  a
      SIGKILL  directly  using kill(2) or tgkill(2).  The problem with
      PTRACE_KILL is that it requires the  tracee  to  be  in  signal-
      delivery-stop,  otherwise  it  may  not work (i.e., may complete
      successfully but won't kill the tracee).  By contrast, sending a
      SIGKILL directly has no such limitation.

And then in the apple man page for ptrace

PT_KILL       The traced process terminates, as if PT_CONTINUE had been
              used with SIGKILL given as the signal to be delivered.

I know they're different OSes but if PT_KILL in apple is subject to the same potential pitfalls of linux that would explain the issues we've observed. Therefore after waitpid catches the kill we sent, just in case it's a false positive kill I send a direct kill to the tracee.

I've also added a comment to document this behaviour in the test to explain the rationale for future maintainers.

The Linux manpage surely doesn't apply. ptrace(2) is very different in Linux than in the BSDs. OSX takes after the BSDs, but adds its own differences to ptrace(2) due to Mach. I'm not saying that your solution is wrong, but I don't accept the rationale.

@krytarowski
Copy link

@krytarowski krytarowski commented Oct 10, 2018

No idea about Darwin, but while PT_KILL is broken on Linux, it's perfectly fine on BSDs (at least NetBSD).

@xd009642 xd009642 force-pushed the xd009642:issue-947 branch from 1bca5fc to 9d4dad4 Oct 10, 2018
@xd009642
Copy link
Contributor Author

@xd009642 xd009642 commented Oct 10, 2018

It might be tenuous but it is the only source that could explain this strange apple behaviour. I'm now going to see if PT_KILL works on apple different to PT_CONTINUE + SIGKILL.

I've also gone over all the signal and tracing stuff in http://newosxbook.com/MOXiI.pdf and can find no justification. Apart from the fact ptrace is crippled on apple for security/obfuscation reasons, there are undocumented parts and it's generally a pain as a result.

@xd009642 xd009642 force-pushed the xd009642:issue-947 branch from 9d4dad4 to 76f7120 Oct 10, 2018
@xd009642
Copy link
Contributor Author

@xd009642 xd009642 commented Oct 12, 2018

Thinking of the kill, does it actually effect what the test is attempting to show. test_ptrace_cont wants to show that PT_CONTINUE works and also that you can use it to forward signals onto the tracee. This behaviour is still shown with the additional kill call in as we check the kill signal was received when sent via ptrace.

@xd009642
Copy link
Contributor Author

@xd009642 xd009642 commented Oct 15, 2018

@asomers any thoughts/issues RE my last comment?

Copy link
Member

@asomers asomers left a comment

I think I understand what you're saying: the extra trouble you go through to kill the process is irrelevant for the test coverage we're trying to achieve. I'm inclined to accept it. @Susurrus do you have any comments before we merge? @xd009642 you'll also need to rebase and squash.

@xd009642 xd009642 force-pushed the xd009642:issue-947 branch from f917ebf to 915b8ab Oct 15, 2018
@xd009642
Copy link
Contributor Author

@xd009642 xd009642 commented Oct 15, 2018

Yeah that's exactly the point I was making 😄 also rebased and squashed!

@xd009642 xd009642 force-pushed the xd009642:issue-947 branch from 915b8ab to 1955c24 Oct 17, 2018
@xd009642
Copy link
Contributor Author

@xd009642 xd009642 commented Oct 17, 2018

So I've just had to rebase this onto master again because of another changelog change, @Susurrus any comments before merging?

Copy link
Contributor

@Susurrus Susurrus left a comment

A few minor nits. I would prefer to see the kill, read, write additions to linux done first as a separate commit from adding all the BSD support, but I won't hold up merging this just for that change.

@@ -38,7 +38,13 @@ pub mod mman;

pub mod pthread;

#[cfg(any(target_os = "android", target_os = "linux"))]
#[cfg(any(target_os = "dragonfly",

This comment has been minimized.

@Susurrus

Susurrus Oct 19, 2018
Contributor

These should be in alphabetical order.

This comment has been minimized.

@Susurrus

Susurrus Oct 21, 2018
Contributor

This still isn't alphabetical.

This comment has been minimized.

@xd009642

xd009642 Oct 21, 2018
Author Contributor

Fixed, squashed and pushed. Sorry about that!

pub type RequestType = c_int;

cfg_if! {
if #[cfg(any(target_os = "macos",

This comment has been minimized.

@Susurrus

Susurrus Oct 19, 2018
Contributor

Alphabetical order here as well.

PT_WRITE_U,
PT_CONTINUE,
PT_KILL,
#[cfg(any(any(target_os = "macos",

This comment has been minimized.

@Susurrus

Susurrus Oct 19, 2018
Contributor

Alphabetical order

@@ -26,6 +26,11 @@ mod test_uio;
#[cfg(target_os = "linux")]
mod test_epoll;
mod test_pthread;
#[cfg(any(target_os = "android",
target_os = "linux"))]
#[cfg(any(target_os = "dragonfly",

This comment has been minimized.

@Susurrus

Susurrus Oct 19, 2018
Contributor

Alphabetical order here too.

@@ -50,10 +57,12 @@ fn test_ptrace_setsiginfo() {
fn test_ptrace_cont() {
use nix::sys::ptrace;
use nix::sys::signal::{raise, Signal};
use nix::sys::wait::{waitpid, WaitStatus};
use nix::sys::wait::{waitpid, WaitStatus, WaitPidFlag};

This comment has been minimized.

@Susurrus

Susurrus Oct 19, 2018
Contributor

Alphabetical order

src/sys/ptrace/bsd.rs Show resolved Hide resolved
src/sys/ptrace/linux.rs Show resolved Hide resolved
src/sys/ptrace/bsd.rs Show resolved Hide resolved
}

/// Issues a kill request as with ptrace(PT_KILL, ..-)
/// This request is equivalent to ptrace(PT_CONTINUE, ..., SIGKILL);

This comment has been minimized.

@Susurrus

Susurrus Oct 19, 2018
Contributor

Please put a blank comment line above this one. Otherwise this line looks weird in the generated docs. I know this was carried over from the Linux one, but we should fix it here anyways (feel free to fix it there if you'd like too).

This comment has been minimized.

@xd009642

xd009642 Oct 19, 2018
Author Contributor

Fix it as in a blank line above each function? I'm not sure which bit looks weird in the generated docs. Is it the function summary? https://docs.rs/nix/0.11.0/nix/sys/ptrace/index.html

src/sys/ptrace/linux.rs Show resolved Hide resolved
src/sys/ptrace/linux.rs Show resolved Hide resolved
src/sys/ptrace/linux.rs Show resolved Hide resolved
Copy link
Contributor

@Susurrus Susurrus left a comment

Just these 3 things, then let's squash and merge!


/// Issues a kill request as with ptrace(PTRACE_KILL, ..-)
/// This request is equivalent to ptrace(PTRACE_CONT, ..., SIGKILL);
///

This comment has been minimized.

@Susurrus

Susurrus Oct 20, 2018
Contributor

Sorry I wasn't clear, can you put this blank comment line between the two existing comment lines on 326 and 327? A blank line separates a brief summary line from a further longer description. Otherwise this shows as one really long line on the main ptrace doc page.

src/sys/ptrace/bsd.rs Show resolved Hide resolved
@xd009642 xd009642 force-pushed the xd009642:issue-947 branch 2 times, most recently from 71f77ea to 373536c Oct 20, 2018
@xd009642
Copy link
Contributor Author

@xd009642 xd009642 commented Oct 21, 2018

@Susurrus I think we're all good to go now 👍

* Moved ptrace API into it's own module with cfg'ed modules exported for linux/android or BSDs.
* Replicated current linux API for BSD
* Added API functions to peek and poke memory to avoid needing to replicate deprecated linux API and remaining feature complete
* Added helper function for `PTRACE_KILL` requests
* Updated tests based on new API changes
* Added addition kill calls to `test_ptrace_cont` as inferior death doesn't happen immediately on OSX which caused issues in the tests.
@xd009642 xd009642 force-pushed the xd009642:issue-947 branch from 373536c to f1573a7 Oct 21, 2018
@Susurrus
Copy link
Contributor

@Susurrus Susurrus commented Oct 21, 2018

bors r+

bors bot added a commit that referenced this pull request Oct 21, 2018
Merge #949
949: ptrace support for BSDs r=Susurrus a=xd009642

This PR adds support to the ptrace API for BSDs to close #947. It also adds a read and write method for reading and writing to a traced processes memory. The ptrace API created for linux offers this via a deprecated function so I added this so they can be feature equivalent without replicating a deprecated part of the API.

Due to the differences in ptrace on BSD and linux I've made a ptrace module to keep things readable.

Still to do - revert travis config to remove my feature branch and update the changelog. 

Co-authored-by: xd009642 <danielmckenna93@gmail.com>
@bors bors bot merged commit f1573a7 into nix-rust:master Oct 21, 2018
4 checks passed
4 checks passed
bors Build succeeded
Details
buildbot/nix-rust/nix amd64_fbsd11 Build done.
Details
buildbot/nix-rust/nix i386_fbsd11 Build done.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
bors bot added a commit that referenced this pull request Dec 1, 2019
Merge #1083
1083: Allow signal injection in ptrace::syscall and ptrace::detach r=asomers a=frangio

Fixes #1049

Should I add tests for this functionality?

By the way, I noticed that the BSD module doesn't have `ptrace::syscall`. I couldn't find a reason behind this in #949. Should we add it?

Co-authored-by: Francisco Giordano <frangio.1@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

4 participants