Project Goals

andychu edited this page Apr 1, 2017 · 39 revisions


  • Immediate goal: Implement a bash-compatible shell called OSH.
  • Long term goal: Design a modern Unix shell language called Oil that can do everything bash/zsh/etc. can do, and more.

Oil treats shell seriously as a programming language, both in how it is implemented and in how its semantics are defined.

For a more immediate view of the project, see the Oil blog. In particular, this blog entry was written at the same time as this page.

Use Cases

  • System Administration
    • Building Linux distributions (e.g. Arch Linux uses bash for PKGBUILD).
    • Startup scripts
    • Configure and build scripts. Reproducible and distributed builds.
  • Distributed Computing
    • Building containers
    • Specifying remote jobs
    • Feedback and Monitoring: performance measurement, security testing.
  • Data Science / Scientific Computing
    • Heterogeneous "big data" and small data pipelines. The language should scale down as well as scale up, i.e. low startup latency for small jobs.
    • Incorporate features of "workflow languages" and systems in the MapReduce family.
    • Concise data cleaning, transformation, and summarization.
    • Reproducible Research.
    • Non-goal: mathematical modeling. That should be left to specialized languages like R, Julia, and Matlab. Communicate with those languages through coprocesses, which avoid repeated startup overhead and allow concurrency.
  • Interactive Computing
    • A general purpose REPL (terminal and probably a Jupyter kernel).
  • Document Publishing
    • Many programming books are built and orchestrated with shell scripts and Makefiles.

Oil Language Design Goals

  • Easy upgrade path from bash, the most popular shell in the world.
    • To do this, I've written a very compatible bash parser, which will allow automatic conversion of bash (OSH) to Oil. As a result, Oil can have a different syntax while keeping a superset of bash semantics.
  • Consistent syntax.
  • Fix sh and bash semantics to be more developer-friendly (in a backward compatible way).
    • Proper Arrays
    • Strict mode for developer productivity (enhanced set -o errexit, nounset, pipefail)
  • Enhance the shell language; treat it as a real programming language.
    • Fill in obvious gaps, like abspath, etc.
    • Compound data structures
    • Example: Completion functions in bash have a bad API involving globals and are difficult to write. It should feel more like writing completion functions in Python or JavaScript.
    • Selected influences: Python, R, Ruby, Perl 6, Lua (API), ML, C, C++, and PowerShell.
  • Reduce language cacophony in shell programming by reimplementing tools closely related to the shell.
    • Example: combine shell, awk, and make.
    • Also combine tools like find (which has its own expression parser and starts processes), and xargs/GNU parallel, which start processes in parallel. GNU parallel is actually mentioned in the bash manual.
  • Richer constructs for concurrency and parallelism.
    • Folding in make -j and xargs -P goes a long way.
  • Allow secure programs to be written.
    • In emitting strings: escaping
    • In reading strings: error checking should be easy, better control over "read" delimiters, etc.
    • Fix issues with globs and flags, i.e. untrusted file system and untrusted variables
  • C and C++ bindings
    • Provide access to advanced Linux kernel features such as namespaces, cgroups, seccomp, tracing, and /proc (but remain portable to other Unices).
    • It should be possible to write a busybox in Oil.
  • Should be the best language for writing quick command line tools.
    • In particular, replace the getopt interface in bash with something much better.
  • Expand the range of things that can be done with the "polyglot" model.
    • Coprocesses
    • Built-in serialization formats like CSV, JSON, maybe HTML
    • Maybe some binary formats as libraries
  • No extra "macro processing" on top of the parser. History substitution will be built in, but disabled in batch mode. procs can be used instead of aliases.
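
Two of the goals above, strict mode and the glob/flag problem, refer to pitfalls that exist in bash today. The following is a minimal sketch of the usual present-day bash workarounds, not Oil or OSH syntax. The scratch directory and the variable names (`workdir`, `files`, `pipeline_status`) are made up for illustration.

```shell
#!/usr/bin/env bash
# Bash's existing "strict mode" options (the ones the page wants to enhance).
set -o errexit    # exit when an unchecked command fails
set -o nounset    # treat expansion of an unset variable as an error
set -o pipefail   # a pipeline fails if any stage fails, not only the last

# With pipefail, the failure of the first stage is visible to the caller.
if false | true; then
  pipeline_status=ok
else
  pipeline_status=failed
fi

# Hypothetical scratch directory holding a file whose name looks like a flag.
workdir=$(mktemp -d)
touch "$workdir/-n" "$workdir/a.txt"
cd "$workdir"

# Unsafe: a bare `ls *` expands to include `-n`, which ls would typically
# treat as an option rather than as a file name.
# Safer: prefix the glob with ./ so no expanded word can begin with a dash.
files=(./*)

# Return to the previous directory and clean up the scratch directory.
cd - >/dev/null
rm -rf -- "$workdir"
```

After this runs, `pipeline_status` is `failed` and every element of `files` starts with `./`. The `--` passed to `rm` is the other common defense: it marks the end of options so that later arguments are always treated as operands.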

Language Design Style

  • Imperative on the scale of code, but declarative/functional/concurrent on the scale of architecture, not unlike sh itself.
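
The way sh already combines the two styles can be sketched with an ordinary pipeline. The three function names below are hypothetical. Each body is written imperatively with loops and mutable variables, while the pipeline on the last line composes them as concurrently running stages connected by streams.

```shell
# Imperative stage: emit one number per line.
emit_numbers() {
  for i in 1 2 3 4 5; do
    echo "$i"
  done
}

# Imperative stage: pass through only the even numbers.
keep_even() {
  while read -r n; do
    if [ $((n % 2)) -eq 0 ]; then
      echo "$n"
    fi
  done
}

# Imperative stage: accumulate a running total and print it at end of input.
sum_lines() {
  total=0
  while read -r n; do
    total=$((total + n))
  done
  echo "$total"
}

# Architecture level: a dataflow description; the shell runs the stages
# as separate processes at the same time.
result=$(emit_numbers | keep_even | sum_lines)
echo "$result"
```

The final line states what flows where without saying how the stages are scheduled, which is the declarative part of the design.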

Implementation Goals

  • Proper error messages like Clang/Swift. Static Parsing.
  • Provide end-to-end tracing and profiling tools (e.g. for pipelines that run for hours)
  • Library-based design like LLVM. Example: the same parser is used in batch mode as well as completion mode, which is not true of all shell implementations. The parser can be used for auto-formatting and linting, which is also not true of other implementations.
  • Few dependencies so it can be used in bootstrapping Unix systems and clusters (e.g. distributed as a C++ file and optional Oil source).
  • Much of Oil should be written in Oil (which means the VM needs to be fast enough for this).

Longer Term Goals

  • Expose our toolkit for little languages (lexing, parsing, AST representation, etc.) so that other languages can be built in the same way.
  • Metaprogramming with ASTs as first class data structures.
  • FastCGI Scripts on shared hosting (using strict input validation and hygienic text generation).