Support subprocess creation and management (was "fork and exec") #1987
Comments
We now support duplicating all descriptor types. There is still an existing issue with TCP sockets that would prevent […]
FYI: Someone at USENIX ATC'22 told me that support for fork would unlock a lot of value for their use cases.
We may also need to consider futexes shared between processes: see shadow/src/main/host/syscall/futex.c, lines 143 to 147 at ee157d0.
Another motivating example: arti's way of using the obfs4 pluggable transport currently involves forking and execing an obfs4 process. I'm investigating whether it makes sense to add an alternative mechanism to arti that would allow it to use an independently started process...
fwiw, using fork/exec to start a pluggable transport such as obfs4 is not an arti thing; it also concerns tor, and is actually a requirement from the PT spec. The parent (arti/tor/...) also wants access to the PT's stdout, so a solution based on an independently started process would probably require some form of UDS/named pipes.
@trinity-1686a From my quick, possibly incorrect, read, I think the spec is intended to allow this, and is just made a little confusing here by trying to be both general and concise. I think the "parent process" that initially forks the PT process could be a shell rather than tor or arti themselves. The rest of the spec seems to indicate that all communication between tor/arti and the PT would be over the socks connection, with some configuration passed around in environment variables (e.g. nothing about the client needing to capture stdin/stdout or use a named pipe, etc.).
Starting to look at this now. My plan is roughly: […]

A couple other thoughts: […]
Currently uses `clone` internally. Progress on shadow#1987
Thinking a bit about how seccomp filters are going to work after […]. Currently the filter is created and loaded in the […]. Both of these will go wrong if we allow the managed process itself to […]. We can't uninstall the filter before doing the […].

I don't think there's a feasible way to have some sort of "shadow cookie" that the seccomp filter recognizes. The filter is BPF (not eBPF), and only gets read-only access to the instruction pointer and the syscall number and args. If we stored a cookie in, for example, the high bits of the syscall number, we wouldn't be able to clear the cookie before allowing the syscall, so Linux would reject it as an invalid syscall number. We can't access the program's other registers or memory, so e.g. storing a cookie on the stack also won't work.

It might be possible to replace our seccomp usage with […]. The best solution I can think of is to handle […]
This is mostly a refactor in preparation for supporting fork. shadow#1987

There are also a couple of minor behavioral changes that shouldn't affect any of our supported thread libraries:
* Some flags that we don't actually care about are no longer required (e.g. CLONE_SYSVSEM).
* CLONE_SETTLS *is* now required. Previously we would have tried to go ahead and execute the native clone without it, but the shim would misbehave since it depends on thread-local storage.
The previous design using a pipe would have been painful to use while implementing fork, since we would somehow want the two ends of the pipe owned by shadow and its "grandchild" process being forked. shadow#1987

Using pidfd_open instead is simpler and more flexible in general. We couldn't use it before since it requires Linux 5.3 or later, but we now require such a kernel.
The previous initialization had a complex inter-relationship between ManagedThread, Thread, and Process; e.g. we needed the Process's shared memory handle to create the ManagedThread, but didn't create the Process itself until the ManagedThread and Thread had been created. Additionally, the thread's shared memory handle was being sent through an environment variable, which won't work with fork.

The first thing this does is detangle that so that we create a ManagedThread first, then a Thread, then a Process. We do this by moving the Process's and Thread's shared memory handles into an initialization message that is sent the first time we want to run the thread, so that we no longer need them when *creating* the thread.

That step creates a large initialization message, which increased the size of all shadow -> shim messages because of the enum used for such messages. So we next move the initialization data out of band and change the "start handshake" so that the plugin first sends pointers to the data to be initialized. Shadow writes the data directly, and finishes with another message to let the shim know it's done.

Progress on #1987
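A rough analogue of that handshake pattern, as a hedged sketch: the child tells the parent where to write the init data, the parent writes it out of band, then sends a small fixed-size completion message. A temp file stands in for the shared memory region here, and none of this is Shadow's actual IPC code:

```python
import os
import socket
import tempfile

parent_sock, child_sock = socket.socketpair()
pid = os.fork()
if pid == 0:
    # Child ("shim" role): allocate space for the init data and send the
    # parent its location (a temp file stands in for shared memory here).
    parent_sock.close()
    fd, path = tempfile.mkstemp()
    child_sock.sendall(path.encode() + b"\n")
    # Block on the parent's small, fixed-size "done" message.
    assert child_sock.recv(4) == b"done"
    # The init data was written out of band, not inside the message itself.
    assert os.pread(fd, 5, 0) == b"hello"
    os.close(fd)
    os.unlink(path)
    os._exit(0)

# Parent ("shadow" role): write the init data directly to the location the
# child supplied, then send the completion message.
child_sock.close()
path = parent_sock.makefile().readline().strip()
with open(path, "r+b") as f:
    f.write(b"hello")
parent_sock.sendall(b"done")
_, status = os.waitpid(pid, 0)
```

The payoff of this shape is that the per-message type stays small regardless of how large the initialization data grows.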
Progress on shadow#1987
When a Process is forked (without CLONE_FILES), the child gets a copy of the descriptor table. IIUC simply cloning the table should give us the right behavior - the child will get its own table, with cloned references to the same descriptors. Progress on shadow#1987
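The native semantics being emulated can be checked with a quick sketch: after a fork, parent and child have independent descriptor-table entries, but those entries refer to the same underlying pipe created before the fork:

```python
import os

# After fork (without CLONE_FILES), the child gets its own copy of the
# descriptor table; the entries are independent, but refer to the same
# underlying pipe created before the fork.
r, w = os.pipe()
pid = os.fork()
if pid == 0:
    os.close(r)          # closing the child's copy of r ...
    os.write(w, b"hi")
    os._exit(0)

os.close(w)              # ... and the parent's copy of w are independent
data = os.read(r, 2)     # the pipe itself is shared, so the data arrives
os.waitpid(pid, 0)
```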
When we fork a Process, the child will use the same strace file (if any). This is analogous to `strace -f`. Progress on shadow#1987.
For clone in particular, the logic was distributed in a way that was a bit confusing, and was about to become more confusing as we started implementing emulation for more CloneFlags; different components would need to handle emulating different flags, making it difficult to keep track of where each flag is handled.

The clone syscall handler now directly creates the `ManagedThread` and `Thread`, and adds the `Thread` to the current `Process`. It will later be extended to support creating a new `Process` containing the new `Thread` instead. We likewise cut out the `Thread::spawn` "middle-man" and handle all the orchestration from `Process::spawn`.

Progress on #1987.
Closing this, since the MVP is done. Leaving open the […]
Supporting fork and exec would make it easier to delegate complexity to wrapper shell or Python scripts instead of adding more features to Shadow itself.
e.g. rather than Shadow natively supporting compression (#1554), a user could use a config like: […]
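The original example wasn't preserved here; as a purely hypothetical sketch, assuming Shadow's YAML host/process config format, it might look something like the following, where `myserver` is a placeholder for the user's actual binary:

```yaml
# Hypothetical sketch: rather than Shadow compressing output itself, run the
# program under a shell that pipes its output through gzip.
hosts:
  server:
    processes:
    - path: /bin/sh
      args: ["-c", "myserver 2>&1 | gzip > myserver.log.gz"]
```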
This could also be used to address clean shutdown of processes (#1491) with something like: […]
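The original example wasn't preserved either; one hedged sketch of the idea is a small wrapper that runs the real program as a child and forwards SIGTERM/SIGINT to it, so the program gets a chance to shut down cleanly. This is illustrative only, not a recommended or tested Shadow integration:

```python
import signal
import subprocess

def run_with_signal_forwarding(argv):
    """Run argv as a child process, forwarding SIGTERM and SIGINT to it.

    Returns the child's exit code. Hypothetical helper for illustration.
    """
    child = subprocess.Popen(argv)

    def forward(signum, _frame):
        # Relay the signal to the child so it can shut down cleanly.
        child.send_signal(signum)

    old_term = signal.signal(signal.SIGTERM, forward)
    old_int = signal.signal(signal.SIGINT, forward)
    try:
        return child.wait()
    finally:
        # Restore the previous handlers once the child has exited.
        signal.signal(signal.SIGTERM, old_term)
        signal.signal(signal.SIGINT, old_int)
```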
Using `killall` (#1986) might be a better alternative for this particular case, but a shell script would allow for greater flexibility.
The biggest potential blocker right now is that not all file descriptors support duplication yet. However, unix pipes and regular files are probably sufficient to cover a lot of use cases. In the meantime we can log a warning for any file descriptors we can't duplicate into the child.