Language Design and Theory of Computation

andychu edited this page Dec 17, 2017 · 24 revisions

Undecidable Parsing

Deciding whether to print "syntax error" shouldn't require simulating a Turing machine.

  • Parsing Bash is Undecidable
    • Bash, Perl, and Make can't be statically parsed because they interleave parsing and execution.
    • C++ can't be statically parsed because it has two Turing complete computation mechanisms -- templates at compile time and normal code at runtime -- and parsing sits in between them.

Accidentally Turing Complete

  • Shell wasn't Turing complete when first designed.

  • Make wasn't Turing complete when first designed. See Example Code in Shell, Awk, and Make.

  • sedlisp is proof that sed is Turing complete. Not sure what constructs are used.

  • Accidentally Turing Complete -- nice comprehensive list, including some games which we don't care about here

  • What about TeX and PostScript? I suspect those were both intended to be Turing Complete, although the syntax is odd.

  • CSS3 proven to be Turing Complete --

  • printf is Turing complete. %n specifier writes to memory?

    When we control the arguments to printf(), it is possible to obtain Turing-complete computation. We show this formally in Appendix B by giving calls to printf() which create logic gates.

  • x86 MMU Fault Handling

    • trapcc -- Compute with 0 instructions on Intel!

    This is a proof by construction that the Intel MMU's fault handling mechanism is Turing complete. We have constructed an assembler that translates 'Move, Branch if Zero, Decrement' instructions to C source that sets up various processor control tables. After this code has executed, the CPU computes by attempting to fault without ever executing a single instruction.

Context-Free Grammars are not Powerful Enough in Practice

Parsing is Difficult

TODO: Move to Parsing Is Difficult page?

Humans are better than computers at parsing programming languages.

To-do: collect instances of people arguing for grammars . For tools, IDEs?

Yacc isn't Powerful Enough in Practice

  • Zinc Abstract Machine paper for CamlLight: the yacc parser had ~50 reduce/reduce conflicts and ~500 shift/reduce conflicts, or it had to be written in a convoluted style. Problem: Caml doesn't have closing delimiters.

  • CoffeeScript video: "Rise of Transpilers": "cheating" by adding fake tokens in the input

  • JRuby: cheating by massaging output

  • How Clang handles the type/variable name ambiguity of C++ -- My gripe is not with the grammar itself (although I admit it's needlessly complex), it's with the inability of Yacc-generated LALR(1) parsers to parse it without considerable hacks. As I've mentioned numerous times before, industrial-strength compilers for C/C++ exist after all, so they do manage to somehow parse these languages.

Alternative idea: parsers should be architected as pure functions from tokens to AST... (Doesn't matter what computational model) And then you can instrument them and empirically determine backtracking/efficiency ?

  • TODO Review/Summarize papers
    • Yakker -- scannerless, code available in OCaml
    • Niall -- for data formats, length prefixes and checksums, etc.
    • Pure Declarative Syntax Lost and Regained (GLR). From Nix authors.
    • Langsec: this is a different but related issue. CFG is the wrong model. Models should be based on reality.
      • finite vs infinite languages. Length prefixes.

Sweet spot? Context-sensitive / Turing complete, linear time lexing, and statically parseable

Lexing and Parsing should be Separate.

PEGs combine them, only for reasons of presentation.

Had this discussion twice: at work about GLR, and maybe about Gazelle. And on HN: https://news.ycombinator.com/item?id=13042835

Related Issues

  • Turing completeness vs. side effects/capabilities. Both of these are design issues and shouldn't be conflated.
  • Whether there is abstraction power like functions. I think Awk was Turing complete at first, but it didn't have functions. Related lobsters thread.
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.
Press h to open a hovercard with more details.