Shared process state management #92

trungnt2910 · 2023-03-18T12:10:10Z

Background

Over the last few months, blink has been greatly expanded to serve more use cases than its original purpose of distributing portable apps. With features such as dynamic object support and a virtual root filesystem, blink is becoming more and more suitable as a portable Linux compatibility layer: in other words, a replacement for the abandoned flinux project, or something of a substitute for the dying WSL1⁽¹⁾.

To serve this use case, more syscalls and some other features (for example, a complete procfs) need to be implemented. Many of these:

Are Linux-specific features without counterparts on popular platforms.
Require storing, managing, and retrieving cross-process data/states.

Therefore, I propose adding a "kernel server daemon" mode to blink. This is how other popular projects solve the same problem, for example Wine's wineserver or DarlingHQ's darlingserver.

Scenarios

`eventfd` support (#76)

eventfds can be implemented as an anonymous device on the VFS layer, with all calls forwarded to the daemon process:

int VfsEventfd(unsigned int initval, int flags) {
  struct VfsInfo *output;
  int ret;
  if (EventfdCreate(initval, flags, &output) == -1) {
    return -1;
  }
  ret = VfsAddFd(output);
  if (ret != -1) {
    unassert(!VfsFreeInfo(output));
  }
  return ret;
}

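int EventfdCreate(unsigned int initval, int flags, struct VfsInfo **output) {
  int servercookie;
  servercookie = ServerCall(EVENTFD_CREATE, initval, flags);
  if (servercookie < 0) {
    // Assumes ServerCall reports failure as a negated errno value.
    errno = -servercookie;
    return -1;
  }
  if (VfsCreateInfo(output) == -1) {
    // Release the daemon-side object so it does not leak.
    ServerCall(EVENTFD_CLOSE, servercookie);
    return -1;
  }
  // `->` binds tighter than unary `*`, so the parentheses are required.
  (*output)->data = (void *)(intptr_t)servercookie;
  return 0;
}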

int EventfdRead(struct VfsInfo *info, void *buf, size_t nbyte) {
  return ServerCall(EVENTFD_READ, info->data, buf, nbyte);
}

The daemon process would manage the eventfd object the way Linux manages it.
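To make "the way Linux manages it" more concrete, here is a minimal sketch of the counter semantics the daemon would have to reproduce, based on the behaviour documented in eventfd(2). The names `EventfdObject`, `EventfdObjectInit`, `EventfdObjectRead` and `EventfdObjectWrite` are hypothetical and not part of blink, and blocking behaviour is left out: the sketch always acts like a non-blocking descriptor.

```c
#include <errno.h>
#include <stdint.h>

// Hypothetical daemon-side state for one eventfd object.
struct EventfdObject {
  uint64_t counter;  // current value, never more than UINT64_MAX - 1
  int semaphore;     // nonzero if created with EFD_SEMAPHORE
};

// Sets up the object from the arguments of eventfd().
void EventfdObjectInit(struct EventfdObject *e, unsigned int initval,
                       int semaphore) {
  e->counter = initval;
  e->semaphore = semaphore;
}

// read(): stores the value for the caller and returns 0, or returns
// -EAGAIN when the counter is zero (a blocking fd would wait instead).
int EventfdObjectRead(struct EventfdObject *e, uint64_t *out) {
  if (!e->counter) return -EAGAIN;
  if (e->semaphore) {
    *out = 1;  // semaphore mode hands out one unit per read
    e->counter -= 1;
  } else {
    *out = e->counter;  // normal mode returns the value and resets it
    e->counter = 0;
  }
  return 0;
}

// write(): adds `add` to the counter. UINT64_MAX is rejected with
// -EINVAL, and an addition that would push the counter past
// UINT64_MAX - 1 returns -EAGAIN (a blocking fd would wait instead).
int EventfdObjectWrite(struct EventfdObject *e, uint64_t add) {
  if (add == UINT64_MAX) return -EINVAL;
  if (add > UINT64_MAX - 1 - e->counter) return -EAGAIN;
  e->counter += add;
  return 0;
}
```

In this sketch the `EVENTFD_READ` server call would map to `EventfdObjectRead`, with the daemon copying the 8-byte value back to the calling process.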

`ptrace` support (#56)

If/When ptrace is implemented using the "cooperative debugging" mentioned in the related issue, it can use the daemon process as a means of communication instead of having to open a temporary UNIX socket.

`procfs` support

While #88 brought some initial support for procfs, it is nowhere near enough for some common UNIX tools like ps to function, because the only information this procfs implementation gives is about the current process.

A daemon process can store all required information for implementing procfs.
For example, some functions could be re-written:

// https://github.com/jart/blink/blob/0bdacfedaeb77a3c122bbd80c9f12394f17da772/blink/procfs.c#L980

int ProcfsRegisterExe(i32 pid, const char *path) {
  // The server should know the process's mount namespace
  // and should be able to traverse and resolve `path`.
  unassert(!ServerCall(EXE_REGISTER, pid, path));
  return 0;
}

// https://github.com/jart/blink/blob/0bdacfedaeb77a3c122bbd80c9f12394f17da772/blink/procfs.c#L1114

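static ssize_t ProcfsPiddirExeReadlink(struct VfsInfo *info, char **buf) {
  ssize_t ret;
  size_t len;
  len = PATH_MAX;
  if (!(*buf = malloc(len))) return -1;
  // `pid` is the process this /proc/<pid> entry describes;
  // deriving it from `info` is elided here.
  ret = ServerCall(EXE_GET, pid, *buf, len);
  // reallocate *buf and retry until it is large enough
  return ret;
}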

Running `init`

init would complain under blink because it is not running as PID 1.
This can be solved by letting the daemon process manage all blink PIDs, effectively putting all blink processes under a new emulated PID namespace.

The daemon could also emulate wait calls for the emulated PID 1 to make sure init manages blink processes whose parent has died/exited.
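A rough sketch of the bookkeeping this implies, assuming the daemon keeps a fixed-size table mapping emulated PIDs to host PIDs. `PidTable`, `ProcEntry`, `MAX_PROCS` and the functions below are hypothetical names for illustration only. The first registered process gets emulated PID 1, and children of a removed process are handed over to PID 1:

```c
#include <string.h>
#include <sys/types.h>

#define MAX_PROCS 64  // arbitrary capacity for this sketch

struct ProcEntry {
  int pid;        // emulated PID, 0 while the slot is free
  int ppid;       // emulated parent PID, 0 for the first process
  pid_t hostpid;  // PID of the blink process on the host
};

struct PidTable {
  struct ProcEntry procs[MAX_PROCS];
  int nextpid;  // next emulated PID to hand out (never reused here)
};

void PidTableInit(struct PidTable *t) {
  memset(t, 0, sizeof(*t));
  t->nextpid = 1;  // the first registered process becomes PID 1
}

// Registers a host process; returns its emulated PID, or -1 if full.
int PidTableAdd(struct PidTable *t, pid_t hostpid, int ppid) {
  int i;
  for (i = 0; i < MAX_PROCS; ++i) {
    if (!t->procs[i].pid) {
      t->procs[i].pid = t->nextpid++;
      t->procs[i].ppid = ppid;
      t->procs[i].hostpid = hostpid;
      return t->procs[i].pid;
    }
  }
  return -1;
}

// Forgets `pid` and reparents its surviving children to PID 1.
void PidTableRemove(struct PidTable *t, int pid) {
  int i;
  for (i = 0; i < MAX_PROCS; ++i) {
    if (t->procs[i].pid == pid) {
      t->procs[i].pid = 0;
    } else if (t->procs[i].pid && t->procs[i].ppid == pid) {
      t->procs[i].ppid = 1;
    }
  }
}

// Returns the emulated parent of `pid`, or -1 if `pid` is unknown.
int PidTableParent(const struct PidTable *t, int pid) {
  int i;
  for (i = 0; i < MAX_PROCS; ++i) {
    if (pid > 0 && t->procs[i].pid == pid) return t->procs[i].ppid;
  }
  return -1;
}
```

A real implementation would also need PID reuse, a wait queue for PID 1, and a lookup by host PID. None of that is shown here.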

Requirements

Goals

The daemon should:

Correctly manage process information and cross-process objects.
Properly clean up related data when a blink process exits.
Not consume an unreasonably high amount of the host's resources.
Not have too much overhead for commonly used syscalls.
Not cost anything for those who don't need it.

Non-goals

The daemon should not/does not need to:

Manage processes outside of blink. This includes native processes execved from a blink process.
Manage the process state of multiple different direct blink invocations. This means that each time the user runs blink from the host shell, a different daemon process is created.

Design

Build

As this is a costly feature and is not required for the original purpose of "distributing portable apps", it should be disabled by default, like the VFS feature.

A flag DISABLE_DAEMON should be created. To enable this feature, builders of blink should pass an additional argument to ./configure:

./configure --enable-daemon

The --disable-daemon flag should also exist to negate a previously passed --enable-daemon flag.

Runtime

When a special flag is passed to blink, for example, -d, instead of directly executing the required binary, blink should fork itself.

The parent process should enter a daemon mode: it sets up the necessary subsystems and opens a UNIX socket with a fixed name located at the root of $BLINK_PREFIX.

The child process should wait for the parent to complete its setup and continue with normal blink operations.
Every time a descendant process starts, right after initializing the VFS subsystem and before emulating, it should attempt to connect to the UNIX socket opened by the daemon.

When this special flag is not passed to blink, processes are created normally and features that require this daemon are disabled.
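The startup sequence above could look roughly like the following sketch. Everything here is an assumption made for illustration: the function names `DaemonListen`, `DaemonConnect` and `StartWithDaemon` are hypothetical, the socket path is passed in by the caller rather than derived from $BLINK_PREFIX, and a pipe is used as one possible way for the child to wait until the parent has finished its setup.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

// Fills `addr` for `path`; fails if the path does not fit in sun_path.
static int MakeAddr(struct sockaddr_un *addr, const char *path) {
  if (strlen(path) >= sizeof(addr->sun_path)) return -1;
  memset(addr, 0, sizeof(*addr));
  addr->sun_family = AF_UNIX;
  strcpy(addr->sun_path, path);
  return 0;
}

// Daemon side: creates the listening socket at `path`.
int DaemonListen(const char *path) {
  struct sockaddr_un addr;
  int fd;
  if (MakeAddr(&addr, path) == -1) return -1;
  if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) return -1;
  unlink(path);  // remove a stale socket left by an earlier run
  if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
      listen(fd, 16) == -1) {
    close(fd);
    return -1;
  }
  return fd;
}

// Client side: connects to the daemon's socket at `path`.
int DaemonConnect(const char *path) {
  struct sockaddr_un addr;
  int fd;
  if (MakeAddr(&addr, path) == -1) return -1;
  if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) return -1;
  if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
    close(fd);
    return -1;
  }
  return fd;
}

// Returns 1 in the parent (now the daemon, *fd is the listening socket),
// 0 in the child (*fd is connected to the daemon), or -1 on failure.
int StartWithDaemon(const char *path, int *fd) {
  int ready[2];
  char c;
  pid_t pid;
  if (pipe(ready) == -1) return -1;
  if ((pid = fork()) == -1) {
    close(ready[0]);
    close(ready[1]);
    return -1;
  }
  if (pid) {
    close(ready[0]);
    *fd = DaemonListen(path);
    // One byte means "ready"; EOF without a byte means setup failed.
    if (*fd != -1 && write(ready[1], "x", 1) != 1) {
      close(*fd);
      *fd = -1;
    }
    close(ready[1]);
    return *fd == -1 ? -1 : 1;
  }
  close(ready[1]);
  if (read(ready[0], &c, 1) != 1) {
    close(ready[0]);
    return -1;
  }
  close(ready[0]);
  *fd = DaemonConnect(path);
  return *fd == -1 ? -1 : 0;
}
```

Descendant processes would only need `DaemonConnect`, since the daemon is already listening by the time they start.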

Process lifetime

When a process/thread is initialized successfully, it should send a message to the daemon. The daemon then allocates a PID/TID as well as some other necessary resources.
When a process/thread exits normally, it should also send a message to the daemon with the status code. The daemon should then close the file descriptor associated with the thread and clean up all related process information if there are no more threads for that PID.
When a process execves into a native process, it should send a message to the daemon. The daemon should clean all resources as if the process exited.
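The three rules above amount to a small state machine on the daemon side. The sketch below shows one possible shape, assuming PIDs have already been allocated and are passed in with each message. The message names (`MSG_THREAD_START`, `MSG_THREAD_EXIT`, `MSG_EXEC_NATIVE`) and the `Lifetime` table are hypothetical.

```c
#include <stddef.h>

#define MAX_PIDS 64  // arbitrary capacity for this sketch

enum LifetimeMsg { MSG_THREAD_START, MSG_THREAD_EXIT, MSG_EXEC_NATIVE };

struct ProcState {
  int pid;      // emulated PID, 0 while the slot is free
  int threads;  // number of live threads
  int status;   // most recently reported exit status
};

// Must be zero-initialized before first use.
struct Lifetime {
  struct ProcState procs[MAX_PIDS];
};

// Finds the slot for `pid`; passing 0 finds a free slot.
static struct ProcState *FindSlot(struct Lifetime *l, int pid) {
  int i;
  for (i = 0; i < MAX_PIDS; ++i) {
    if (l->procs[i].pid == pid) return &l->procs[i];
  }
  return NULL;
}

// Applies one message. Returns the number of live threads left for
// `pid` (0 means its state was cleaned up), or -1 on error.
int LifetimeHandle(struct Lifetime *l, enum LifetimeMsg msg, int pid,
                   int status) {
  struct ProcState *p;
  if (pid <= 0) return -1;
  p = FindSlot(l, pid);
  switch (msg) {
    case MSG_THREAD_START:
      if (!p) {
        if (!(p = FindSlot(l, 0))) return -1;  // table is full
        p->pid = pid;
        p->threads = 0;
        p->status = 0;
      }
      return ++p->threads;
    case MSG_THREAD_EXIT:
      if (!p) return -1;
      p->status = status;
      if (--p->threads > 0) return p->threads;
      p->pid = 0;  // last thread gone: release the process state
      return 0;
    case MSG_EXEC_NATIVE:
      if (!p) return -1;
      p->pid = 0;  // treated exactly like an exit of the whole process
      p->threads = 0;
      return 0;
  }
  return -1;
}
```

A full daemon would attach more than a thread count to each slot, for example open eventfd cookies or the registered exe path, and release those at the same point.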

Server calls

A server call should be one write call to the socket followed by a read call for a 64-bit result.
If the read call fails with EINTR, a message should be sent to the daemon that the process wants to interrupt this call. After sending this message, the process should block all signals and read until it gets a reply from the daemon.

Q & A

Why bother creating a separate process? Isn't shared memory enough?

Some problems can be solved with shared memory. However, if a process dies unexpectedly (killed by Task Manager or through kill -9 on the host), there's no way to know it died and clean up the shared memory.

Furthermore, as many features require this shared memory, a good memory allocator is required (it is not optimal to map a few pages per feature per process). The amount of shared memory would also be limited, as the shared pages have to be mapped by the first init process.
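With a socket-based daemon, by contrast, an unexpected death is visible: when a process is killed, the kernel closes its end of the socket, and the daemon sees end-of-file on its end. A minimal sketch of that check, with `ClientIsDead` being a hypothetical helper name:

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Returns 1 if the peer of `fd` has gone away, 0 if the connection is
// still open (or still has unread data), and -1 if poll() itself fails.
int ClientIsDead(int fd) {
  struct pollfd p;
  char c;
  int rc;
  p.fd = fd;
  p.events = POLLIN;
  p.revents = 0;
  rc = poll(&p, 1, 0);  // timeout 0: just check, never wait
  if (rc == -1) return -1;
  if (rc == 0) return 0;  // nothing to report, the client is alive
  if (p.revents & POLLIN) {
    // Readable: peek to tell pending data (> 0) apart from EOF (0).
    return recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT) == 0;
  }
  return (p.revents & (POLLHUP | POLLERR)) != 0;
}
```

In a real daemon this check would be part of the main poll loop rather than a separate call, and a dead client would trigger the same cleanup as a normal exit message.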

Why bother creating this feature in the first place, no one will run it.

That is true for everyone who sticks to blink's original purposes.
Similarly, nobody would use --enable-vfs except for someone who needs their Alpine root to run out of the box, or someone who needs /proc/self/exe to work properly.
Similarly, people using blink as a compatibility layer and userspace emulator don't care much about the system emulation and 16-bit features, as QEMU and DOSBox already do the job well. But there seems to be a thriving community around this use.

Isn't this going too far? With PID management aren't `blink` processes too isolated from the host now? At this point, isn't it better to just run a Linux image?

No. A lot of things are still integrated, such as viewing (and killing) blink processes from the host's Task Manager, sharing files and UNIX sockets between the host and emulated processes, and using the same networks as the host. None of these can be achieved by solutions that run a whole Linux image, like WSL2.

Conclusion

This proposal was written quickly, in a little over an hour, so it may have unclear points and/or mistakes. My idea may also simply be suboptimal: there might be better ways to share process state, or to implement these Linux features without sharing process state at all.

If you support this proposal, please give it a 👍 so I can see that it's worth allocating my time for.

If you have any suggestions, please let me know through this issue or ping me on the redbean Discord for a quicker reply.

⁽¹⁾: For any potential pedantic readers, no, blink as an emulator can never replace WSL1, as WSL1 uses NT kernel magic to natively execute instructions while blink has to use a JIT. However, this doesn't mean that blink cannot catch up with WSL1 in terms of userland emulation.


tkchia · 2023-03-20T19:37:19Z

@trungnt2910 : sounds like such a feature might be useful. I guess one challenge will be to make it small and lightweight.

Thank you!

trungnt2910 mentioned this issue Mar 18, 2023

eventfd support for Node.js #87

Open

jart added the enhancement New feature or request label Mar 19, 2023

trungnt2910 self-assigned this Mar 28, 2023

jart mentioned this issue Apr 25, 2023

VFS: Add documentation #115

Merged

trungnt2910 mentioned this issue Apr 27, 2023

epoll support #73

Open
