You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Each managed process has a working directory stored in Shadow, which is returned by process_getWorkingdir.
When spawning a new managed process, we currently set the native working directory to match that directory (by executing a native chdir from the shim).
We currently don't allow chdir, but if we did we'd need to be careful to both change the native working directory and update the stored directory string in Process.
Some things that can currently go wrong:
If the directory or an ancestor is renamed, the path stored in Process will be incorrect
If the directory is removed, shadow will no longer have a way of accessing files inside of it I think this is impossible. rmdir fails if the directory isn't empty; experimentally new files can't be created in an unlinked directory; experimentally it's impossible to create a hard link to a directory.
If the stored strings gets out of sync somehow, things will go wrong. e.g. we implement chdir or some other syscall that can change the working directory (are there any?) and don't correctly update both
Potential solutions/mitigations:
We could drop the stored string completely and only use the native working directory. process_workingDir could be implemented to get the current native working directory of the process, e.g. by reading the /proc/<pid>/cwd symlink. This makes it a bit slower since it'd require a system call, but it probably wouldn't be in the hot path. This might still leave problems if the directory is removed.
We could store a file descriptor to the working directory instead of a string path. We would still need to resynchronize it when implementing chdir, but renames would be handled transparently. With this approach we could change our handlers that call process_workingDir to use this fd instead; e.g. use thid fd with renameat. There's no statat, but we might be able to work around it with openat + fstat.
Fully virtualize the current working directory. e.g. stop setting the native working directory at all, and when implementing chdir, don't change the native working directory; only change the path stored in Process. This would require implementing handlers for a handful of syscalls that we currently allow natively.
I'm leaning towards doing the former for the moment since it's easy to implement and will hopefully head off future confusion and bugs. We could always start tracking the dir fd and migrating handlers to use it at some later point.
The text was updated successfully, but these errors were encountered:
sporksmith
changed the title
process current working directory is *partly* virtualized and has some related bugs
Process current working directory (cwd) is *partly* virtualized and has some related bugs
May 23, 2023
sporksmith
changed the title
Process current working directory (cwd) is *partly* virtualized and has some related bugs
Process's current working directory (cwd) is *partly* virtualized and has some related bugs
May 23, 2023
Somewhat related: If the thread is created using clone() with CLONE_THREAD but not CLONE_FS, the threads will have individual working directories. So the working directory should be a per-thread property rather than a per-process one (but some threads will share a working directory if they did use CLONE_FS).
Another bit of complexity here is that there are a fair number of places where we assume that the native working directory is the same as the virtualized working directory. In particular for a lot of file-system related syscalls we just forward to the kernel and let it run it natively, using the native cwd.
Eventually it'd probably be good to go through and fix all of those to not do that (e.g. by translating paths to absolute paths using the virtualized cwd before forwarding to the kernel), but in the shorter term it might be worthwhile to implement chdir and fchdir, and have them just update both the native and virtualized cwd. Once #1987 is fixed, not supporting these syscalls will likely be the next blocker for a decent number of use-cases (such as #3167).
While it's a bit ugly, I think those are the only two places we'd need to be careful to update both places, so it's not too bad as an intermediate step.
Each managed process has a working directory stored in Shadow, which is returned by
process_getWorkingdir
.When spawning a new managed process, we currently set the native working directory to match that directory (by executing a native
chdir
from the shim).We currently don't allow
chdir
, but if we did we'd need to be careful to both change the native working directory and update the stored directory string inProcess
.Some things that can currently go wrong:
Process
will be incorrectIf the directory is removed, shadow will no longer have a way of accessing files inside of itI think this is impossible.rmdir
fails if the directory isn't empty; experimentally new files can't be created in an unlinked directory; experimentally it's impossible to create a hard link to a directory.chdir
or some other syscall that can change the working directory (are there any?) and don't correctly update bothPotential solutions/mitigations:
process_workingDir
could be implemented to get the current native working directory of the process, e.g. by reading the/proc/<pid>/cwd
symlink. This makes it a bit slower since it'd require a system call, but it probably wouldn't be in the hot path. This might still leave problems if the directory is removed.chdir
, but renames would be handled transparently. With this approach we could change our handlers that callprocess_workingDir
to use this fd instead; e.g. use thid fd withrenameat
. There's nostatat
, but we might be able to work around it withopenat
+fstat
.chdir
, don't change the native working directory; only change the path stored inProcess
. This would require implementing handlers for a handful of syscalls that we currently allow natively.I'm leaning towards doing the former for the moment since it's easy to implement and will hopefully head off future confusion and bugs. We could always start tracking the dir fd and migrating handlers to use it at some later point.
The text was updated successfully, but these errors were encountered: