Implement pipelines #10
While the syntax parser is practically complete (and well designed, too), the interpreter is functional but still needs quite a bit of work. Currently, pipelines with more than one step (that is, anything more than a regular statement) are ignored by the interpreter.
There are several possible ways of parallelizing pipelines:
Pipes and fork
In this approach, you set up N-1 anonymous pipes, where N is the number of steps, fork N-1 times, and set the stdin/stdout of the child processes to use the anonymous pipes in order to form a chain. Then you poll the right-most step, or execute it directly if it is a shell function. This is the traditional approach taken by many shells, including popular ones like Bash and Z shell.
The advantage of this approach is that it unifies the pipeline implementation: you always pipe multiple processes together, whether they are external commands or forks of the shell. The biggest downside is that the subshells are disconnected from the scope they came from and cannot (easily) communicate with the parent. This causes lots of confusion when learning how to write scripts in these languages, so I decided that this approach was unacceptable for Riptide. Take the following example code:
While you might expect the above script to print out the value assigned inside the pipeline, the assignment happens in a forked subshell and is lost when that subshell exits; the parent scope never sees it.
Threads

Another approach is to use background threads or a thread pool to run multiple steps of a pipeline in parallel. "Script" steps are run in threads, while external commands are run as real processes. We then use abstractions inside these threads to make them behave as if they were normal processes. The current variable scope is shared between these threads behind a mutex, so that they can all mutate their scope normally. Interestingly, this seems to be the approach that Fish Shell takes.
This lets you pipeline multiple script blocks together while keeping the expected "normal" scoping rules for each, but it has several disadvantages:
Single-threaded async tasks

This is the approach I propose we use in Riptide (see #11): turn every potentially parallel bit of code into an asynchronous task, then execute all tasks on the main thread using a single-threaded executor. This has many benefits:
This will require some refactoring, since the interpreter must essentially be asynchronous and be able to "yield" in the middle of a script and be resumed later. Typically this would be incredibly difficult to do using a tree-walk interpreter, but Rust's async/await makes this almost trivial by generating the complex state machine for us.
We could also accomplish the same thing by rewriting the interpreter as a JIT VM, but that would be significantly more work and would add considerable complexity to the interpreter. I'd rather take tree walking as far as it can possibly go in order to keep the implementation simple, and only reconsider if it proves to be an actual bottleneck.