Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support subprocess creation and management (was "fork and exec") #1987

Closed
3 of 5 tasks
sporksmith opened this issue Mar 24, 2022 · 9 comments
Closed
3 of 5 tasks

Support subprocess creation and management (was "fork and exec") #1987

sporksmith opened this issue Mar 24, 2022 · 9 comments
Assignees
Labels
Type: Bug Error or flaw producing unexpected results

Comments

@sporksmith
Copy link
Contributor

sporksmith commented Mar 24, 2022

Supporting fork and exec would make it easier to delegate complexity to wrapper shell or python scripts instead of adding more features to Shadow itself.

e.g. rather than Shadow natively supporting compression (#1554), a user could use a config like:

-path /bin/bash
-args -c "tor | gzip -"

This could also be used to address clean shutdown of processes (#1491) with something like:

hostname:
  processes:
  - path: /bin/bash
    args: -c "tor -f torrc & PID=$! ; sleep 100 && kill $PID"

Using killall #1986 might be a better alternative for this particular case, but a shell script would allow for greater flexibility

The biggest potential blocker right now is that not all file descriptors support duplication yet. However, unix pipes and regular files are probably sufficient to cover a lot of use cases. In the meantime we can log a warning for any file descriptors we can't duplicate into the child.

@stevenengler
Copy link
Contributor

The biggest potential blocker right now is that not all file descriptors support duplication yet. However, unix pipes and regular files are probably sufficient to cover a lot of use cases. In the meantime we can log a warning for any file descriptors we can't duplicate into the child.

We now support duplicating all descriptor types. There is still an existing issue with TCP sockets that would prevent accept() calls on duplicated listening sockets from working correctly in a forked process.

@robgjansen
Copy link
Member

FYI: Someone at USENIX ATC'22 told me that support for fork would unlock a lot of value for their use cases.

@stevenengler
Copy link
Contributor

We may also need to consider futexes shared between processes:

// TODO: currently only supports uaddr from the same virtual address space (i.e., threads)
// Support across different address spaces requires us to compute a unique id from the
// hardware address (i.e., page table and offset). This is needed, e.g., when using
// futexes across process boundaries.
SysCallReturn syscallhandler_futex(SysCallHandler* sys, const SysCallArgs* args) {

@sporksmith
Copy link
Contributor Author

Another motivating example: arti's way of using the obs4 pluggable transport currently involves forking and execing an obs4 process.

Investigating whether it makes sense to add an alternative mechanism to arti that would allow it to use an independently started process...

@trinity-1686a
Copy link
Contributor

fwiw, using fork/exec to start a pluggable transport such as obfs4 is not an arti thing, it also concerns tor, and is actually a requirement from the PT spec.

The parent (arti/tor/...) also wants access to PT stdout, so such a solution based on independently started process would probably require some form of UDS/named pipes

@sporksmith
Copy link
Contributor Author

sporksmith commented Dec 8, 2022

fwiw, using fork/exec to start a pluggable transport such as obfs4 is not an arti thing, it also concerns tor, and is actually a requirement from the PT spec.

The parent (arti/tor/...) also wants access to PT stdout, so such a solution based on independently started process would probably require some form of UDS/named pipes

@trinity-1686a In practice tor definitely supports a separately started proxy. e.g. the ClientTransportPlugin config param optionally accepts a socks IP and port instead of a binary to exec.

From my quick, possibly incorrect, read, I think the spec is intended to allow this, and is just made a little confusing here by trying to be both general and concise. I think the "parent process" that initially forks the PT process could be a shell rather than tor or arti themselves. The rest of the spec seems to indicate that all communication between tor/arti and the PT would be over the socks connection, with some configuration passed around in environment variables. (e.g. nothing about the client needing to capture stdin/stdout or use a named pipe etc)

@sporksmith sporksmith self-assigned this May 4, 2023
@sporksmith
Copy link
Contributor Author

sporksmith commented May 4, 2023

Starting to look at this now. My plan is roughly:

  • Migrate the clone handler to Rust
  • Implement clone3 (which is a superset of clone), and change the clone handler to share the internal implementation
  • Add support for the flags that would effectively do a fork
  • Add a fork handler that uses the same clone3 internal implementation
  • Implement exec

A couple other thoughts:

  • We want the native forked processes' native parent to (effectively) be shadow so that we can waitpid them. I think we can accomplish this with the CLONE_PARENT flag to clone/clone3
  • ChildPidWatcher's current mechanism of creating a pipe s.t. shadow's end of the pipe is notified when the child exits won't work well when shadow isn't the direct parent. we might be able to do it by sending a descriptor between shadow and the managed parent process over a unix pipe, but that's a bit complex. Changing ChildPidWatcher to use pidfd_open would be a much nicer solution but would mean requiring a backports kernel for Debian 10, since it needs kernel version 5.3. EDIT: This is done now: ChildPidWatcher: use pidfd_open #2937

sporksmith added a commit to sporksmith/shadow that referenced this issue May 11, 2023
Currently uses `clone` internally.

Progress on shadow#1987
sporksmith added a commit to sporksmith/shadow that referenced this issue May 11, 2023
Currently uses `clone` internally.

Progress on shadow#1987
sporksmith added a commit to sporksmith/shadow that referenced this issue May 11, 2023
Currently uses `clone` internally.

Progress on shadow#1987
sporksmith added a commit to sporksmith/shadow that referenced this issue May 11, 2023
Currently uses `clone` internally.

Progress on shadow#1987
sporksmith added a commit to sporksmith/shadow that referenced this issue May 12, 2023
Currently uses `clone` internally.

Progress on shadow#1987
@sporksmith
Copy link
Contributor Author

sporksmith commented May 13, 2023

Thinking a bit about how seccomp filters are going to work after exec.

Currently the filter is created and loaded in the LD_PRELOAD'd shim during initialization. It assumes that the SIGSYS signal handler has already been installed (earlier in shim initialization), which is where we route syscalls we want to emulate. We allow native syscalls from the shim itself by inspecting the instruction pointer of the call site and seeing if it's in one of the shim's functions.

Both of these will go wrong if we allow the managed process itself to exec. A syscall could be made before we've had a chance to reinstall the SIGSYS handler, which would result in a crash. The shim may also be loaded at a different address, so it's native syscall functions would no longer be correctly allow-listed. (And some other random code will be allow-listed).

We can't uninstall the filter before doing the exec - that's impossible by design.

I don't think there's a feasible way to have some sort of "shadow cookie" that the seccomp filter recognizes. The filter is BPF (not eBPF), and only gets read-only access to the instruction pointer and syscall number and args. If we stored a cookie in, for example, the high bits of the syscall number, we wouldn't be able to clear the cookie before allowing the syscall, so Linux would reject it as an invalid syscall number.

We can't access the program's other registers or memory, so e.g. storing a cookie on the stack also won't work.

It might be possible to replace our seccomp usage with eBPF, which has richer functionality, but using eBPF typically requires root access.

The best solution I can think of is to handle exec by killing the native process and spawning a fresh process from shadow's process. We'll need to be careful to migrate over simulated state that is preserved across exec, such as file descriptors.

sporksmith added a commit to sporksmith/shadow that referenced this issue May 15, 2023
This is mostly a refactor in preparation for supporting fork.
shadow#1987

There are also a couple minor behavioral changes, that shouldn't affect
any of our supported thread libraries:

* Some flags that we don't actually care about are no longer required
(e.g. CLONE_SYSVEM).
* CLONE_SETTLS *is* now required. Previously we would have tried to go
ahead and execute the native clone without it, but the shim would
misbehave since it depends on thread local storage.
sporksmith added a commit to sporksmith/shadow that referenced this issue May 15, 2023
This is mostly a refactor in preparation for supporting fork.
shadow#1987

There are also a couple minor behavioral changes, that shouldn't affect
any of our supported thread libraries:

* Some flags that we don't actually care about are no longer required
(e.g. CLONE_SYSVEM).
* CLONE_SETTLS *is* now required. Previously we would have tried to go
ahead and execute the native clone without it, but the shim would
misbehave since it depends on thread local storage.
sporksmith added a commit to sporksmith/shadow that referenced this issue May 15, 2023
The previous design using a pipe would have been painful to use while
implementing fork, since we would somewhow want the two ends of the pipe
owned by shadow and its "grandchild" process being forked.
shadow#1987

Conversely, using pidfd_open is simpler and more flexible in general. We
couldn't use it before since it requires Linux 5.3 or later, but we now
require such a kernel.
sporksmith added a commit to sporksmith/shadow that referenced this issue May 15, 2023
The previous design using a pipe would have been painful to use while
implementing fork, since we would somewhow want the two ends of the pipe
owned by shadow and its "grandchild" process being forked.
shadow#1987

Conversely, using pidfd_open is simpler and more flexible in general. We
couldn't use it before since it requires Linux 5.3 or later, but we now
require such a kernel.
sporksmith added a commit to sporksmith/shadow that referenced this issue May 16, 2023
The previous design using a pipe would have been painful to use while
implementing fork, since we would somewhow want the two ends of the pipe
owned by shadow and its "grandchild" process being forked.
shadow#1987

Conversely, using pidfd_open is simpler and more flexible in general. We
couldn't use it before since it requires Linux 5.3 or later, but we now
require such a kernel.
sporksmith added a commit that referenced this issue May 16, 2023
The previous design using a pipe would have been painful to use while
implementing fork, since we would somewhow want the two ends of the pipe
owned by shadow and its "grandchild" process being forked.
#1987
    
Conversely, using pidfd_open is simpler and more flexible in general. We
couldn't use it before since it requires Linux 5.3 or later, but we now
require such a kernel.
sporksmith added a commit to sporksmith/shadow that referenced this issue May 16, 2023
This is mostly a refactor in preparation for supporting fork.
shadow#1987

There are also a couple minor behavioral changes, that shouldn't affect
any of our supported thread libraries:

* Some flags that we don't actually care about are no longer required
(e.g. CLONE_SYSVEM).
* CLONE_SETTLS *is* now required. Previously we would have tried to go
ahead and execute the native clone without it, but the shim would
misbehave since it depends on thread local storage.
sporksmith added a commit that referenced this issue May 16, 2023
This is mostly a refactor in preparation for supporting fork.
#1987

There are also a couple minor behavioral changes, that shouldn't affect
any of our supported thread libraries:

* Some flags that we don't actually care about are no longer required
(e.g. CLONE_SYSVEM).
* CLONE_SETTLS *is* now required. Previously we would have tried to go
ahead and execute the native clone without it, but the shim would
misbehave since it depends on thread local storage.
sporksmith added a commit that referenced this issue May 22, 2023
The previous initialization had a complex inter-relationship between
ManagedThread, Thread, and Process; e.g. we needed the Process's shared
memory handle to create the ManagedThread, but don't create the Process
itself until the ManagedThread and Thread have been created.
Additionally, the thread shared memory handle was being sent through an
env variable, which won't work with fork.

The first thing this does is detangle that so that we create a
ManagedThread first, then a Thread, then a Process. We do this by moving
the Process's and Thread's shared memory handles into an initialization
message that is sent the first time we want to run the thread, so that
we no longer need them when *creating* thread.

That step creates a large initialization message, which increased the
size of all shadow -> shim messages because of the enum used for such
messages. So we next move the initialization data out of band and change
the "start handshake" so that the plugin first sends pointers to the
data to be initialized. Shadow writes the data directly, and finishes
with another message to let the shim know it's done.

Progress on #1987
sporksmith added a commit to sporksmith/shadow that referenced this issue May 23, 2023
sporksmith added a commit to sporksmith/shadow that referenced this issue May 23, 2023
sporksmith added a commit to sporksmith/shadow that referenced this issue May 23, 2023
sporksmith added a commit to sporksmith/shadow that referenced this issue May 24, 2023
When a Process is forked (without CLONE_FILES), the child gets a copy of
the descriptor table. IIUC simply cloning the table should give us the
right behavior - the child will gets its own table, with cloned
references to the same descriptors.

Progress on shadow#1987
sporksmith added a commit to sporksmith/shadow that referenced this issue May 24, 2023
When we fork a Process, the child will use the same strace file (if
any). This is analogous to `strace -f`.

Progress on shadow#1987.
sporksmith added a commit to sporksmith/shadow that referenced this issue May 24, 2023
When we fork a Process, the child will use the same strace file (if
any). This is analogous to `strace -f`.

Progress on shadow#1987.
sporksmith added a commit that referenced this issue May 25, 2023
For clone in particular the logic was distributed in a way that was a
bit confusing, and was about to become more confusing as we started
implementing emulation for more CloneFlags; different components would
need to handle emulating different flags, making it difficult to keep
track of where each flag is handled.

The clone syscallhandler now directly creates the `ManagedThread` and
`Thread`, and adds the `Thread` to the current `Process`. It will later
be extended to support creating a new `Process` containing the new
`Thread` instead.

We likewise cut out the `Thread::spawn` "middle-man" and handle all the
orchestration from `Process::spawn`.

Progress on #1987.
@sporksmith sporksmith changed the title Support fork and exec syscalls Support subprocess creation and management (was "fork and exec") Jun 1, 2023
sporksmith added a commit to sporksmith/shadow that referenced this issue Aug 1, 2023
When a Process is forked (without CLONE_FILES), the child gets a copy of
the descriptor table. IIUC simply cloning the table should give us the
right behavior - the child will gets its own table, with cloned
references to the same descriptors.

Progress on shadow#1987
sporksmith added a commit that referenced this issue Aug 1, 2023
When a Process is forked (without CLONE_FILES), the child gets a copy of
the descriptor table. IIUC simply cloning the table should give us the
right behavior - the child will gets its own table, with cloned
references to the same descriptors.

Progress on #1987
@sporksmith
Copy link
Contributor Author

sporksmith commented Sep 25, 2023

Closing this, since the MVP is done. Leaving open the execveat and vfork issues, and the "support subprocess creation" milestone, to track further enhancements and fixes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Bug Error or flaw producing unexpected results
Projects
None yet
Development

No branches or pull requests

4 participants