Support subprocess creation and management (was "fork and exec") #1987

sporksmith · 2022-03-24T18:29:58Z

Supporting fork and exec would make it easier to delegate complexity to wrapper shell or python scripts instead of adding more features to Shadow itself.

e.g. rather than Shadow natively supporting compression (#1554), a user could use a config like:

```yaml
- path: /bin/bash
  args: -c "tor | gzip -"
```
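For reference, this is roughly the work `bash -c "tor | gzip -"` performs on the managed process's behalf, and therefore what Shadow would need to emulate: a pipe, two forks, a `dup2` onto stdout/stdin in each child, two execs, and two waits. A minimal native sketch (not Shadow code); `run_pipeline` is a hypothetical helper name:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `left | right` roughly the way a shell does: one pipe, two forks,
 * and an exec in each child. Returns the exit status of `right` (like bash
 * without `set -o pipefail`), or -1 on failure. */
static int run_pipeline(char *const left[], char *const right[]) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t lpid = fork();
    if (lpid == 0) {
        /* Left child: its stdout becomes the pipe's write end. */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(left[0], left);
        _exit(127); /* only reached if exec failed */
    }

    pid_t rpid = (lpid < 0) ? -1 : fork();
    if (rpid == 0) {
        /* Right child: its stdin becomes the pipe's read end. */
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(right[0], right);
        _exit(127);
    }

    /* The parent must close both ends, or `right` never sees EOF. */
    close(fds[0]);
    close(fds[1]);

    int status = 0;
    if (lpid > 0)
        waitpid(lpid, &status, 0);
    if (rpid < 0)
        return -1;
    if (waitpid(rpid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because a shell reports only the last command's status by default, a crashing `tor` would go unnoticed in such a wrapper unless the script enables `set -o pipefail`.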

This could also be used to address clean shutdown of processes (#1491) with something like:

```yaml
hostname:
  processes:
  - path: /bin/bash
    args: -c "tor -f torrc & PID=$! ; sleep 100 && kill $PID"
```
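The shell one-liner above boils down to fork, exec, sleep, `kill`, and `waitpid`. A minimal native sketch of the same sequence, with `run_then_terminate` as a hypothetical name and SIGTERM (the default signal sent by `kill`) as the shutdown request:

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start argv as a child, let it run for `seconds`, then send SIGTERM and
 * reap it: the C equivalent of `cmd & PID=$!; sleep N && kill $PID`.
 * Returns the raw wait status, or -1 on error. */
static int run_then_terminate(char *const argv[], unsigned seconds) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }
    sleep(seconds);
    /* If the child already exited, it stays a zombie until we reap it, so
     * this kill() is harmless and cannot hit a recycled PID. */
    kill(pid, SIGTERM);
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```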

Using killall #1986 might be a better alternative for this particular case, but a shell script would allow for greater flexibility.

The biggest potential blocker right now is that not all file descriptors support duplication yet. However, unix pipes and regular files are probably sufficient to cover a lot of use cases. In the meantime we can log a warning for any file descriptors we can't duplicate into the child.

stevenengler · 2022-06-30T18:18:30Z

> The biggest potential blocker right now is that not all file descriptors support duplication yet. However, unix pipes and regular files are probably sufficient to cover a lot of use cases. In the meantime we can log a warning for any file descriptors we can't duplicate into the child.

We now support duplicating all descriptor types. There is still an existing issue with TCP sockets that would prevent accept() calls on duplicated listening sockets from working correctly in a forked process.
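For concreteness, the affected pattern is the classic pre-fork server, where a forked child calls `accept()` on a listening socket it inherited from its parent. A minimal native sketch, assuming loopback TCP with a kernel-chosen port; `prefork_accept_demo` is a hypothetical name:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The parent creates a listening socket; the forked child inherits a
 * duplicate of it and calls accept() on it. Returns 0 if the client
 * (the parent) received "ok" from the child, -1 otherwise. */
static int prefork_accept_demo(void) {
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */
    socklen_t len = sizeof(addr);
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        listen(lfd, 1) != 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) != 0) {
        close(lfd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(lfd);
        return -1;
    }
    if (pid == 0) {
        /* Child: accept on the inherited listening socket. */
        int cfd = accept(lfd, NULL, NULL);
        if (cfd < 0)
            _exit(1);
        ssize_t n = write(cfd, "ok", 2);
        close(cfd);
        _exit(n == 2 ? 0 : 1);
    }

    /* Parent: the child keeps its own duplicate, so we can close ours
     * and act as the client. */
    close(lfd);
    char buf[3] = {0};
    int ok = 0;
    int sfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sfd >= 0) {
        ok = connect(sfd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
             read(sfd, buf, 2) == 2 && strcmp(buf, "ok") == 0;
        close(sfd);
    }
    if (!ok)
        kill(pid, SIGKILL); /* don't leave the child blocked in accept() */
    int status;
    waitpid(pid, &status, 0);
    return (ok && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```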

robgjansen · 2022-07-14T17:36:36Z

FYI: Someone at USENIX ATC'22 told me that support for fork would unlock a lot of value for their use cases.

stevenengler · 2022-08-09T16:44:18Z

We may also need to consider futexes shared between processes:

shadow/src/main/host/syscall/futex.c, lines 143 to 147 in ee157d0:

```c
// TODO: currently only supports uaddr from the same virtual address space (i.e., threads)
// Support across different address spaces requires us to compute a unique id from the
// hardware address (i.e., page table and offset). This is needed, e.g., when using
// futexes across process boundaries.
SysCallReturn syscallhandler_futex(SysCallHandler* sys, const SysCallArgs* args) {
```
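As background for that TODO, a minimal native sketch of a futex shared across a process boundary: the word lives in a `MAP_SHARED` page, and the non-`PRIVATE` `FUTEX_WAIT`/`FUTEX_WAKE` operations are used. For such futexes Linux keys the wait queue on the underlying shared page and offset rather than on the virtual address, which is roughly what the TODO says Shadow would need to emulate. `cross_process_futex_demo` is a hypothetical name, and the `futex` wrapper is defined locally because glibc does not export one:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static long futex(uint32_t *uaddr, int op, uint32_t val) {
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Parent and forked child synchronize on a futex word in a MAP_SHARED
 * page. Returns the child's exit code (0 on success), or -1 on error. */
static int cross_process_futex_demo(void) {
    uint32_t *word = mmap(NULL, sizeof(*word), PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (word == MAP_FAILED)
        return -1;
    *word = 0;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: sleep until the parent flips the word to 1. FUTEX_WAIT
         * returns immediately if *word is no longer 0, so a wakeup cannot
         * be lost; the loop also absorbs spurious wakeups. */
        while (__atomic_load_n(word, __ATOMIC_ACQUIRE) == 0)
            futex(word, FUTEX_WAIT, 0);
        _exit(0);
    }

    /* Parent: publish the new value, then wake one waiter. */
    __atomic_store_n(word, 1, __ATOMIC_RELEASE);
    futex(word, FUTEX_WAKE, 1);

    int status;
    waitpid(pid, &status, 0);
    munmap(word, sizeof(*word));
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```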

sporksmith · 2022-11-30T18:44:39Z

Another motivating example: arti's way of using the obfs4 pluggable transport currently involves forking and exec'ing an obfs4 process.

Investigating whether it makes sense to add an alternative mechanism to arti that would allow it to use an independently started process...

trinity-1686a · 2022-12-04T00:26:55Z

fwiw, using fork/exec to start a pluggable transport such as obfs4 is not an arti thing, it also concerns tor, and is actually a requirement from the PT spec.

The parent (arti/tor/...) also wants access to the PT's stdout, so a solution based on an independently started process would probably require some form of UDS/named pipes.
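For reference, a managed-PT launch looks roughly like this from the parent's side: configuration is passed in environment variables such as `TOR_PT_MANAGED_TRANSPORT_VER`, and the PT reports back with lines like `VERSION 1` on its stdout, which the parent reads through a pipe. A minimal native sketch; `spawn_and_read_first_line` is a hypothetical helper, and the test uses a `sh -c` stand-in rather than a real PT:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Launch argv with one extra environment variable and its stdout piped
 * back to us. Copies the child's first output line (newline stripped)
 * into `line`. Returns 0 on success, -1 otherwise. */
static int spawn_and_read_first_line(char *const argv[], const char *env_name,
                                     const char *env_value, char *line,
                                     size_t line_len) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        /* setenv after fork is fine here only because this demo is
         * single-threaded; a threaded parent should build envp instead. */
        setenv(env_name, env_value, 1);
        execvp(argv[0], argv);
        _exit(127);
    }
    close(fds[1]);
    FILE *out = fdopen(fds[0], "r");
    int rc = -1;
    if (out != NULL && fgets(line, (int)line_len, out) != NULL) {
        line[strcspn(line, "\n")] = '\0';
        rc = 0;
    }
    if (out != NULL)
        fclose(out); /* also closes fds[0] */
    else
        close(fds[0]);
    int status;
    waitpid(pid, &status, 0);
    return rc;
}
```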

sporksmith · 2022-12-08T15:46:34Z

> fwiw, using fork/exec to start a pluggable transport such as obfs4 is not an arti thing, it also concerns tor, and is actually a requirement from the PT spec.
>
> The parent (arti/tor/...) also wants access to PT stdout, so such a solution based on independently started process would probably require some form of UDS/named pipes

@trinity-1686a In practice tor definitely supports a separately started proxy, e.g. the `ClientTransportPlugin` config option optionally accepts a SOCKS IP and port instead of a binary to exec.

From my quick, possibly incorrect, read, I think the spec is intended to allow this, and is just made a little confusing here by trying to be both general and concise. I think the "parent process" that initially forks the PT process could be a shell rather than tor or arti themselves. The rest of the spec seems to indicate that all communication between tor/arti and the PT would be over the SOCKS connection, with some configuration passed around in environment variables (e.g. nothing about the client needing to capture stdin/stdout or use a named pipe, etc.).

sporksmith · 2023-05-04T22:44:59Z

Starting to look at this now. My plan is roughly:

1. Migrate the clone handler to Rust
2. Implement clone3 (which is a superset of clone), and change the clone handler to share the internal implementation
3. Add support for the flags that would effectively do a fork
4. Add a fork handler that uses the same clone3 internal implementation
5. Implement exec
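The plan relies on fork being a special case of clone3. A minimal native sketch of that equivalence, assuming kernel headers recent enough to define `struct clone_args` (Linux 5.3+); `fork_via_clone3` is a hypothetical name, and it falls back to `fork()` when clone3 returns `ENOSYS`, as can happen on older kernels or under some container seccomp profiles:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/sched.h> /* struct clone_args, CLONE_* flags */
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_clone3
#define SYS_clone3 435
#endif

/* fork() expressed as clone3(): no CLONE_* sharing flags, and SIGCHLD as
 * the exit signal so the parent can waitpid() for the child as usual.
 * Flags such as CLONE_PARENT would be ORed into args.flags. */
static pid_t fork_via_clone3(void) {
    struct clone_args args;
    memset(&args, 0, sizeof(args));
    args.exit_signal = SIGCHLD;
    long ret = syscall(SYS_clone3, &args, sizeof(args));
    if (ret == -1 && errno == ENOSYS)
        return fork(); /* old kernel, or clone3 blocked by a seccomp policy */
    return (pid_t)ret;
}
```

A caller uses it exactly like `fork()`: it returns 0 in the child and the child's PID in the parent.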

A couple of other thoughts:

- We want the native forked processes' native parent to (effectively) be shadow so that we can waitpid them. I think we can accomplish this with the CLONE_PARENT flag to clone/clone3.
- ChildPidWatcher's current mechanism of creating a pipe such that shadow's end of the pipe is notified when the child exits won't work well when shadow isn't the direct parent. We might be able to do it by sending a descriptor between shadow and the managed parent process over a Unix domain socket, but that's a bit complex. Changing ChildPidWatcher to use pidfd_open would be a much nicer solution but would mean requiring a backports kernel for Debian 10, since it needs kernel version 5.3. EDIT: This is done now: ChildPidWatcher: use pidfd_open #2937
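The pidfd-based approach can be sketched natively: a pidfd becomes readable when the process it refers to terminates, so a watcher needs only `pidfd_open` and `poll`, with no pipe end that the watched process must inherit. This assumes Linux 5.3+; `wait_for_exit_via_pidfd` is a hypothetical name, and 434 is the `pidfd_open` syscall number used as a fallback if the libc headers lack `SYS_pidfd_open`:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif

/* Block until process `pid` has exited. Reaping (waitpid) is still the
 * job of the real parent. Returns 0 once it has exited, -1 on error. */
static int wait_for_exit_via_pidfd(pid_t pid) {
    int pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
    if (pidfd < 0)
        return -1;
    struct pollfd pfd = {.fd = pidfd, .events = POLLIN};
    int rc;
    do {
        rc = poll(&pfd, 1, -1);
    } while (rc < 0 && errno == EINTR);
    close(pidfd);
    return (rc == 1 && (pfd.revents & POLLIN)) ? 0 : -1;
}
```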

sporksmith · 2023-05-13T19:14:22Z

Thinking a bit about how seccomp filters are going to work after exec.

Currently the filter is created and loaded in the LD_PRELOAD'd shim during initialization. It assumes that the SIGSYS signal handler has already been installed (earlier in shim initialization), which is where we route syscalls we want to emulate. We allow native syscalls from the shim itself by inspecting the instruction pointer of the call site and seeing if it's in one of the shim's functions.

Both of these will go wrong if we allow the managed process itself to exec. A syscall could be made before we've had a chance to reinstall the SIGSYS handler, which would result in a crash. The shim may also be loaded at a different address, so its native syscall functions would no longer be correctly allow-listed (and some other, arbitrary code would be allow-listed instead).

We can't uninstall the filter before doing the exec - that's impossible by design.
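To make the problem concrete: a seccomp filter survives `execve`, so the new program image runs under the old filter from its first instruction. A minimal native demo, assuming x86-64 (other architectures are killed by the filter rather than mis-filtered) and a `uname` binary on `PATH`. It uses `SECCOMP_RET_ERRNO` instead of SIGSYS-based trapping so the effect is an observable failure rather than a crash; `install_uname_filter` and `run_uname` are hypothetical names:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Install a filter that makes uname(2) fail with EPERM. */
static int install_uname_filter(void) {
    struct sock_filter insns[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_uname, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof(insns) / sizeof(insns[0]),
        .filter = insns,
    };
    /* Lets an unprivileged process install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Run `uname` in a child, optionally installing the filter in the child
 * *before* exec. Returns uname's exit code, or -1. */
static int run_uname(int filtered) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (filtered && install_uname_filter() != 0)
            _exit(126);
        execlp("uname", "uname", (char *)NULL);
        _exit(127);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The filtered run fails even though the filter was installed by code that no longer exists after the exec.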

I don't think there's a feasible way to have some sort of "shadow cookie" that the seccomp filter recognizes. The filter is BPF (not eBPF), and only gets read-only access to the instruction pointer and syscall number and args. If we stored a cookie in, for example, the high bits of the syscall number, we wouldn't be able to clear the cookie before allowing the syscall, so Linux would reject it as an invalid syscall number.

We can't access the program's other registers or memory, so e.g. storing a cookie on the stack also won't work.

It might be possible to replace our seccomp usage with eBPF, which has richer functionality, but using eBPF typically requires root access.

The best solution I can think of is to handle exec by killing the native process and spawning a fresh process from shadow's process. We'll need to be careful to migrate over simulated state that is preserved across exec, such as file descriptors.
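What migrating descriptor state across exec entails can be sketched with a hypothetical, simplified table: each descriptor keeps its number and its open file description in the replacement process, unless it is marked close-on-exec. None of these types are Shadow's; they only illustrate the rule:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_FDS 16

/* Hypothetical simulated descriptor entry. */
struct sim_fd {
    bool open;
    bool cloexec;    /* FD_CLOEXEC: close this descriptor across exec */
    int description; /* stand-in for a reference to an open file description */
};

struct sim_fd_table {
    struct sim_fd fds[SIM_MAX_FDS];
};

/* Build the table for the process that replaces the exec'ing one:
 * descriptors keep their numbers and open file descriptions, except
 * close-on-exec ones, which are dropped. Returns how many were kept. */
static size_t sim_fd_table_after_exec(const struct sim_fd_table *old,
                                      struct sim_fd_table *fresh) {
    size_t kept = 0;
    for (size_t i = 0; i < SIM_MAX_FDS; i++) {
        const struct sim_fd *e = &old->fds[i];
        if (e->open && !e->cloexec) {
            fresh->fds[i] = *e;
            kept++;
        } else {
            fresh->fds[i] = (struct sim_fd){0};
        }
    }
    return kept;
}
```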

This is mostly a refactor in preparation for supporting fork (shadow#1987). There are also a couple of minor behavioral changes that shouldn't affect any of our supported thread libraries:

* Some flags that we don't actually care about are no longer required (e.g. CLONE_SYSVSEM).
* CLONE_SETTLS *is* now required. Previously we would have tried to go ahead and execute the native clone without it, but the shim would misbehave since it depends on thread-local storage.
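A hypothetical sketch of the kind of explicit flag check described here: a clone with CLONE_THREAD becomes a new thread and must also pass CLONE_SETTLS, while one without CLONE_VM becomes a new, fork-like process. This is illustrative only, not Shadow's handler; `classify_clone` and the enum are made-up names:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdint.h>

/* Hypothetical outcome of inspecting clone flags. */
enum clone_kind { CLONE_KIND_THREAD, CLONE_KIND_PROCESS, CLONE_KIND_UNSUPPORTED };

static enum clone_kind classify_clone(uint64_t flags) {
    if (flags & CLONE_THREAD) {
        /* The kernel already requires THREAD => SIGHAND => VM; the shim
         * additionally relies on the new thread getting its own TLS.
         * Flags like CLONE_SYSVSEM are accepted but not required. */
        const uint64_t need = CLONE_VM | CLONE_SIGHAND | CLONE_SETTLS;
        return (flags & need) == need ? CLONE_KIND_THREAD
                                      : CLONE_KIND_UNSUPPORTED;
    }
    /* A new process that shares our address space (vfork-style) is left
     * unsupported in this sketch. */
    if (flags & CLONE_VM)
        return CLONE_KIND_UNSUPPORTED;
    /* Fork-like: CLONE_FILES would decide whether the descriptor table is
     * shared or copied. */
    return CLONE_KIND_PROCESS;
}
```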

The previous design using a pipe would have been painful to use while implementing fork, since we would somehow want the two ends of the pipe owned by shadow and its "grandchild" process being forked (shadow#1987). Conversely, using pidfd_open is simpler and more flexible in general. We couldn't use it before since it requires Linux 5.3 or later, but we now require such a kernel.

The previous initialization had a complex inter-relationship between ManagedThread, Thread, and Process; e.g. we needed the Process's shared memory handle to create the ManagedThread, but didn't create the Process itself until the ManagedThread and Thread had been created. Additionally, the thread shared memory handle was being sent through an environment variable, which won't work with fork. The first thing this does is detangle that so that we create a ManagedThread first, then a Thread, then a Process. We do this by moving the Process's and Thread's shared memory handles into an initialization message that is sent the first time we want to run the thread, so that we no longer need them when *creating* the thread. That step created a large initialization message, which increased the size of all shadow -> shim messages because of the enum used for such messages. So we next move the initialization data out of band and change the "start handshake" so that the plugin first sends pointers to the data to be initialized. Shadow writes the data directly, and finishes with another message to let the shim know it's done. Progress on #1987
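The message-size problem is a general property of tagged unions (Rust enums and C unions alike): every message is as large as its largest variant. A hypothetical C sketch contrasting an inline layout with an out-of-band one; the struct names, the 4 KiB size, and `write_init_data` are illustrative, and the plain `memcpy` stands in for a cross-process write such as `process_vm_writev`:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical initialization payload. */
struct init_data {
    char blob[4096]; /* stand-in for shared-memory handles, config, etc. */
};

enum msg_kind { MSG_SYSCALL_COMPLETE, MSG_START };

/* Inline: carrying init_data in the union makes *every* message > 4 KiB. */
struct msg_inline {
    enum msg_kind kind;
    union {
        int64_t syscall_result;
        struct init_data init;
    } u;
};

/* Out of band: START carries only the address where the shim wants the
 * data written, so all messages stay small. */
struct msg_oob {
    enum msg_kind kind;
    union {
        int64_t syscall_result;
        uint64_t init_data_addr;
    } u;
};

/* Simulator side: write the data straight to the address the shim named,
 * after which a small "done" message would follow. */
static void write_init_data(const struct msg_oob *start,
                            const struct init_data *src) {
    memcpy((void *)(uintptr_t)start->u.init_data_addr, src, sizeof(*src));
}
```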

When a Process is forked (without CLONE_FILES), the child gets a copy of the descriptor table. IIUC simply cloning the table should give us the right behavior: the child will get its own table, with cloned references to the same descriptors. Progress on shadow#1987
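The native semantics that cloning the table needs to reproduce can be checked directly: after `fork()`, parent and child have separate descriptor tables whose entries refer to the same open file descriptions, so closing a descriptor in the child leaves the parent's open, while the file offset is shared. A minimal native sketch; `shared_offset_demo` is a hypothetical name:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child writes 3 bytes and closes its copy; the parent then writes 2
 * bytes. Returns the parent's resulting offset (5 if the offset is shared),
 * or -1 on error. */
static off_t shared_offset_demo(void) {
    char path[] = "/tmp/fork-fd-demoXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the open descriptor keeps the file alive */

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Closing here removes only the child's table entry. */
        ssize_t n = write(fd, "abc", 3);
        close(fd);
        _exit(n == 3 ? 0 : 1);
    }
    int status;
    waitpid(pid, &status, 0);
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        close(fd);
        return -1;
    }

    /* Parent: still open, and continuing from the child's offset. */
    if (fcntl(fd, F_GETFD) == -1 || write(fd, "de", 2) != 2) {
        close(fd);
        return -1;
    }
    off_t off = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return off;
}
```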

When we fork a Process, the child will use the same strace file (if any). This is analogous to `strace -f`. Progress on shadow#1987.

For clone in particular the logic was distributed in a way that was a bit confusing, and was about to become more confusing as we started implementing emulation for more CloneFlags; different components would need to handle emulating different flags, making it difficult to keep track of where each flag is handled. The clone syscallhandler now directly creates the `ManagedThread` and `Thread`, and adds the `Thread` to the current `Process`. It will later be extended to support creating a new `Process` containing the new `Thread` instead. We likewise cut out the `Thread::spawn` "middle-man" and handle all the orchestration from `Process::spawn`. Progress on #1987.

sporksmith · 2023-09-25T23:44:17Z

Closing this, since the MVP is done. Leaving open the execveat and vfork issues, and the "support subprocess creation" milestone, to track further enhancements and fixes.

sporksmith added the Type: Bug Error or flaw producing unexpected results label Mar 24, 2022

sporksmith mentioned this issue Mar 24, 2022

Ability to shut down managed processes gracefully #1491

Closed

stevenengler mentioned this issue Jun 28, 2022

Remove ownerProcess field from LegacyDescriptor #2245

Merged

sporksmith self-assigned this May 4, 2023

sporksmith added a commit to sporksmith/shadow that referenced this issue May 11, 2023

Implement clone3

df4c150

Currently uses `clone` internally. Progress on shadow#1987

sporksmith mentioned this issue May 15, 2023

clone handler: handle individual flags more explicitly #2935

Merged

sporksmith mentioned this issue May 15, 2023

ChildPidWatcher: use pidfd_open #2937

Merged

sporksmith mentioned this issue May 18, 2023

Rework process initialization for fork #2948

Merged

sporksmith added a commit to sporksmith/shadow that referenced this issue May 23, 2023

Add stubs for fork and vfork

6fe8ad1

Progress on shadow#1987

sporksmith mentioned this issue May 24, 2023

Make DescriptorTable clonable #2970

Merged

sporksmith added a commit to sporksmith/shadow that referenced this issue May 24, 2023

Make Process's strace file shareable across Processes

4c83888

When we fork a Process, the child will use the same strace file (if any). This is analogous to `strace -f`. Progress on shadow#1987.

sporksmith mentioned this issue May 24, 2023

Make Process's strace file shareable across Processes #2971

Closed

sporksmith mentioned this issue May 24, 2023

Consolidate construction of new threads and processes #2972

Merged

sporksmith added this to the Support shell-style spawning and management of processes milestone Jun 1, 2023

sporksmith changed the title ~~Support fork and exec syscalls~~ Support subprocess creation and management (was "fork and exec") Jun 1, 2023

This was referenced Jun 1, 2023

Implement fork + equivalent clone/clone3 #2987

Closed

Implement execve #2988

Closed

Implement wait4 (syscall for waitpid, wait) #2989

Closed

sporksmith mentioned this issue Aug 22, 2023

Implement vfork syscall #3123

Closed

This was referenced Sep 23, 2023

Support linux kernel self tests #3167

Open

Process's current working directory (cwd) is *partly* virtualized and has some related bugs #2960

Open

sporksmith closed this as completed Sep 25, 2023
