Language Design Principles

andychu edited this page Apr 11, 2024 · 92 revisions

General Principles

  • Syntax and Semantics Should Correspond
    • The same semantics should use the same syntax
    • Different semantics should use different syntax
    • e.g. discussion in The Five Meanings of #
    • e.g. find -type f and the "I'm too lazy to write a lexer" pattern are BANNED!
  • Users don't read the manual, so syntax should be "guessable" based on common, established behavior: Python, JavaScript, C, JSON, etc.
  • The common behavior should be the default behavior. The short thing should be the right thing.
    • For example, simple word evaluation makes it so that you can use $var instead of "$var". That's almost always what you want.
    • read -r should have been the default in bash -- i.e. it inhibits backslash processing, which most people didn't intend with read
    • Note that bin/ysh has all the right defaults with shopt --set ysh:all. bin/osh is compatible.
  • Failures should not be ignored.
    • Example: in bash, when evaluating strftime in printf strings like %(%Y)T, if the result overflows a 128-byte buffer, it's silently truncated!
  • Every feature should have Predictable, Linear Performance (extended globs break this rule with backtracking, so they're not in YSH)
  • Minimize the use of global options (shopt)
    • YSH started out with many such options, but I eliminated them over time because it got unwieldy to explain and document.
    • There are still many of them and they should be used sparingly. But note that the strict_ ones don't really have any cost, because they abort your program on disallowed behavior. They don't silently change the semantics.
    • Rationale: Global state makes code harder to read. It's a "hidden mode".
    • They should mostly be hidden under groups like ysh:upgrade
    • Counterexample: simple_word_eval is probably the most important one that silently changes behavior, and I think it's justified in that case.
  • Borrowed from the Zen of Python (import this)
    • If the implementation is hard to explain, it's a bad idea.
    • If the implementation is easy to explain, it may be a good idea.
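
The two "default behavior" examples above (quoting $var and read -r) can be reproduced in plain POSIX shell. This is a minimal sketch, not taken from the Oils code base: count_args and the sample strings are hypothetical, and it assumes the default IFS.

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

file='my file.txt'
unquoted=$(count_args $file)    # legacy word splitting yields 2 arguments
quoted=$(count_args "$file")    # quoting keeps it as 1 argument

# Without -r, read treats backslashes as escape characters and drops them.
plain=$(printf '%s\n' 'C:\temp\new' | { read line; printf '%s' "$line"; })
# With -r, the line is read as-is.
raw=$(printf '%s\n' 'C:\temp\new' | { read -r line; printf '%s' "$line"; })

printf 'unquoted=%s quoted=%s\n' "$unquoted" "$quoted"
printf 'plain=%s\n' "$plain"
printf 'raw=%s\n' "$raw"
```

In both cases the shorter spelling is the one with the surprising behavior, which is the situation the "short thing should be the right thing" principle is meant to avoid.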

Oils Principles (both OSH and YSH)

YSH Principles

YSH is less constrained by compatibility, although there is still some consideration for it.

  • It should be a smooth upgrade from OSH. Avoid "wild" breakage.
    • We keep all the good concepts and throw out some bad ones.
  • It should be explainable as a clean-slate language! This principle is heavily in conflict with the first, but there were surprisingly few compromises necessary!
  • Avoid inventing syntax that doesn't exist in any other language. Most of YSH should look familiar to programmers and shell users.
    • @ has precedent in Perl, PowerShell, etc.
    • the expression syntax comes from Python, JavaScript, etc.
    • However, a corollary of the principle above is: If YSH has completely new semantics, then inventing a new syntax is justified.
    • See YSH Language Influences
  • YSH should be familiar to Python and JavaScript users. Common features like assignment should behave similarly.
    • This principle has "leaked" into OSH when omitting declare -i. Also, our initial reluctance to implement $a == ${a[0]} is shaped by this.
  • Conversely, if our syntax looks like JavaScript or Python, it should behave like JavaScript or Python, unless we're fixing a wart.
    • e.g. See #language-design > Things Oils Shipped Without
    • This is a corollary of "syntax and semantics should correspond", but across languages
  • Don't break the interactive shell / top level / examples printed in books
    • e.g. We don't break redirect syntax, and we don't break PYTHONPATH=. foo.py
  • There Should Only Be One Kind of Expression
    • Shell has 3 to 4 recursive expression languages: arith, bool, word. And bash has regexes.
    • In contrast, YSH has just one expression language. Note that eggexes are "first class".
    • Exception: Globs are still a separate expression language. (But they're unchanged in YSH, inherited from POSIX)
  • Avoid single-letter flags and names. This was OK in the 70's but no longer scales!
    • For example, shopt --set is better than shopt -s; test --file is better than test -f
  • Arrays are first class
    • In particular, no silent splitting and joining, as happens with unquoted substitutions, $@, echo and eval, etc.
  • YSH has reference semantics in general, but value semantics for everything that shell does
    • Making copies of List
    • Passing List as ARGV
    • But for Python and JS stuff, you have reference semantics
  • It's OK to make common things look pretty, even if they are slightly inconsistent
    • if is-main is nicer than if (_is_main()), even though it conflates success/fail and true/false. (It is also a builtin, so it doesn't have errexit pitfalls.)
  • Avoid syntax with confusing corner cases -- e.g. What does ${####} Mean? and Shell WTFs
  • Avoid adding syntax that will be used rarely.
    • Example: All of these are valid in YSH, and will be common: --flag foo, --flag 'foo', --flag $mystr, --flag=$mystr, --flag u'\n', etc.
    • There is a corner case for --flag=u'' - the u is not significant. But so far, all proposed "cures" are worse than the disease.
  • Don't take on problems you can't solve correctly
    • A major example of this is that we don't assume we know the syntax of external commands like cp, ls, etc.
    • for both completion and linting
  • No implicit serialization / deserialization from typed data to strings
    • e.g. flags, env vars, or J8 notation
    • Conversions are always explicit. This is mainly because they always involve the possibility of errors, and we don't hide errors.
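
The "silent splitting and joining" that the first-class arrays principle refers to can be observed in any POSIX shell. The sketch below shows legacy shell behavior only, not YSH code; count_args is a hypothetical helper and the default IFS is assumed.

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

set -- 'a b' c                     # two arguments; the first contains a space

split=$(count_args $@)             # unquoted $@ is split again: 3 arguments
kept=$(count_args "$@")            # "$@" preserves the boundaries: 2 arguments

# echo joins its arguments with spaces, so the boundaries are lost in the output.
joined=$(echo "$@")

# eval also joins its arguments into one string, then parses that string again.
via_eval=$(eval count_args "$@")   # sees 3 arguments

printf 'split=%s kept=%s via_eval=%s joined=%s\n' "$split" "$kept" "$via_eval" "$joined"
```

Only the "$@" form keeps the original two elements intact; every other form changes the element count without reporting an error.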

OSH Principles

OSH is a "cleaned up shell/bash" and heavily constrained by compatibility. But there are edge cases where we have to make choices. The spec tests have uncovered dozens of cases where existing shells disagree, so we have to make a choice!

  • Avoid complex "line noise" syntax. We won't add more syntax that looks like ${x@P}, ${x^^}, cat <<< 'hi', or exec 2>&-. It's too elaborate and unfamiliar.
  • The Common Subset Principle -- In general, OSH shouldn't introduce incompatible semantics for the same syntax, and it should be very compatible with legacy shells. It might not run every last bash script. However, in those cases, you should be able to make small modifications that allow your script to run under both OSH and bash. Most often these changes improve clarity.
    • Example: In bash, echo X > @(*.py) means the same thing as echo X > '@(*.py)' (yes really). OSH disallows the former for clarity, but the latter is in the common subset of OSH and bash.
    • Example: The meaning of () in declare -A assoc=() is changed to obey the common subset principle. It means empty assoc array rather than empty indexed array because the context is clear, and because in bash declare -A dict means something different.
  • Static Parsing
    • Dynamic Parsing (parsing at runtime) Confuses Code and Data.
  • Consider Interactions Between Language Features (bash doesn't do this, e.g. extended globs)
  • Minimize the combined OSH+YSH language size to the degree possible.
    • Where YSH duplicates functionality from OSH (like arithmetic), it has to be significantly better.
    • This partly explains why we keep OSH string literals in YSH, and why bash declare -a/-A behave differently in YSH, and why declare -i isn't supported.
    • It also explains some constraints on the syntax, i.e. that we only have a ShCommand lexer mode, and no YshCommand lexer mode
  • Don't Silently Change What Code Means. Instead choose a new syntax
    • Early on, I wanted to take over set for assignment (leaving all options for shopt). But now it's setvar. It was tempting to take it over, but a bad idea.
    • cols could have been select, but that rare feature was taken.
    • An exception is shopt -s simple_word_eval, which does (silently) change the meaning of unquoted $x. But most newcomers and even some long-time shell users are surprised by the splitting; that is, many shell scripts only operate correctly on names without spaces. So in many cases this option will silently fix bugs, but it will require adding an explicit split() when looping over unquoted variables.
  • Local reasoning about code. You shouldn't have to look at the top of the file constantly to figure out how code behaves.
    • Blocks like shopt --set errexit { } allow local reasoning, rather than setting the global permanently
    • redefine_proc prevents distant definitions from clobbering your code
    • TODO: tag procs with ysh:all ? issue 1147
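
As an illustration of the "Dynamic Parsing Confuses Code and Data" point in the list above, here is a small POSIX shell sketch. It uses eval, the most explicit form of runtime parsing; the variable names and the input string are hypothetical.

```shell
# Hypothetical untrusted input that happens to contain shell syntax.
name='world; echo INJECTED'

# eval parses the string at runtime, so the text after ';' runs as a command.
dynamic=$(eval "echo hello $name")

# When the code is parsed once, up front, the variable is only ever data.
static=$(printf 'hello %s\n' "$name")

printf 'dynamic:\n%s\n' "$dynamic"
printf 'static:\n%s\n' "$static"
```

The dynamic version prints a second line, INJECTED, that came from the data, while the static version prints the input unchanged on one line.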

Blog: HOW OSH Is Designed / Why OSH Isn't Bash

Interchange Format / Protocol Design Principles

  • You should be able to express arbitrary byte strings. Everything should be "8-bit clean" by default.
  • UTF-8 is an optional (but common) layer on top. (Ditto for other encodings.)
  • You should be able to use existing Unix tools with new protocols. (e.g. grep still works with lines of QSN. In contrast, the \0 delimited format of find -print0 doesn't work with grep.)
    • This is a narrow waist argument -- conforming to the waist enables code reuse
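
The difference between a line-based encoding and a \0 delimited one can be sketched in POSIX shell. The escaping here is a hand-written simplification (a newline inside a name is written as the two characters \ and n) and is an assumption for illustration, not the actual QSN rules; the function names are hypothetical.

```shell
# Three records, one per line; the embedded newline in the second name is
# escaped by hand as backslash + n, so each record stays on a single line.
escaped_lines() { printf '%s\n' 'a.py' 'odd\nname.py' 'b.txt'; }

# The same three records, NUL-terminated, with a raw newline inside the second.
nul_records() { printf 'a.py\0odd\nname.py\0b.txt\0'; }

# A line-oriented tool like grep handles the escaped form record by record.
py_count=$(escaped_lines | grep -c '\.py$')

# The NUL-delimited stream has 3 records, but line-oriented tools see 1 newline.
nul_record_count=$(( $(nul_records | tr -cd '\0' | wc -c) ))
nul_line_count=$(( $(nul_records | wc -l) ))

printf 'py_count=%s nul_records=%s nul_lines=%s\n' \
  "$py_count" "$nul_record_count" "$nul_line_count"
```

With one record per line, existing tools agree with the protocol about where records begin and end, which is the code reuse the narrow waist argument is about.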

(Referring to: CSTR Proposal and TSV2 Proposal, as well as the deferred Shellac Protocol Proposal and Coprocess Protocol Proposal.)

Related
