Language Design and Theory of Computation
Undecidable Parsing
Deciding whether to print "syntax error" shouldn't require simulating a Turing machine.
- Parsing Bash is Undecidable
  - Bash, Perl, and Make can't be statically parsed because they interleave parsing and execution.
  - C++ can't be statically parsed because it has two Turing-complete computation mechanisms -- templates at compile time and normal code at runtime -- and parsing sits in between them.
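The bash claim can be demonstrated in a few lines. This is a minimal sketch (the `begin`/`end` alias names are invented for illustration): aliases are expanded when a line is *parsed*, but the `alias` builtin only takes effect when it is *executed*, so parsing line N requires executing lines 1..N-1.

```shell
#!/bin/bash
# Aliases are expanded at parse time; 'alias' itself runs at execution time.
# (expand_aliases is off by default in non-interactive shells.)
shopt -s expand_aliases

# This only affects later lines because bash executes each complete command
# before reading and parsing the next one.
alias begin='{' end='}'

# Without the alias command above having *executed*, the next line would be
# a syntax error. With it, it parses as: { echo ok; }
begin echo ok; end
```

Put a conditional around the `alias` command and a static parser must decide an arbitrary computation to know whether the last line is a syntax error.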
Accidentally Turing Complete
- Shell wasn't Turing complete when first designed.
- Make wasn't Turing complete when first designed. See Example Code in Shell, Awk, and Make.
- sedlisp is proof that sed is Turing complete. Not sure what constructs are used.
- Accidentally Turing Complete -- a nice, comprehensive list, including some games, which we don't care about here
  - SQL
    - With Common Table Expressions and windowing, SQL is Turing complete.
    - https://wiki.postgresql.org/wiki/Cyclic_Tag_System
  - Apache rewrite rules
  - sendmail
  - MediaWiki templates
- What about TeX and PostScript? I suspect those were both intended to be Turing complete, although the syntax is odd.
- printf is Turing complete.
  - The %n specifier writes the number of characters printed so far to memory.
  - printbf -- a Brainfuck interpreter in printf
  - Section 6.2.1 of http://nebelwelt.net/publications/files/15SEC.pdf: "When we control the arguments to printf(), it is possible to obtain Turing-complete computation. We show this formally in Appendix B by giving calls to printf() which create logic gates."
- x86 MMU Fault Handling
  - trapcc -- "Compute with 0 instructions on Intel!"
    - "This is a proof by construction that the Intel MMU's fault handling mechanism is Turing complete. We have constructed an assembler that translates 'Move, Branch if Zero, Decrement' instructions to C source that sets up various processor control tables. After this code has executed, the CPU computes by attempting to fault without ever executing a single instruction."
Context-Free Grammars are not Powerful Enough in Practice
TODO: Move to Parsing Is Difficult page?
Humans are better than computers at parsing programming languages.
To-do: collect instances of people arguing for grammars. For tools, IDEs?
- A nice series by Trevor Jim:
  - Python is not context-free?
  - Is Java context-free?
    - Needing two grammars is a sign that the language is not context-free.
  - about 4 or 5 other posts
- [The Context-Sensitivity of C's Grammar](http://eli.thegreenplace.net/2007/11/24/the-context-sensitivity-of-cs-grammar)
- Perl/C++/Make: not really on the table, because they're not statically parseable.
- Ruby: experience with writing a full Ruby parser in Ruby https://news.ycombinator.com/item?id=13041646
Yacc isn't Powerful Enough in Practice
- The ZINC abstract machine paper (for Caml Light): the yacc parser had ~50 reduce/reduce conflicts and ~500 shift/reduce conflicts, or it had to be written in a convoluted style. Problem: Caml doesn't have closing delimiters.
- CoffeeScript video "Rise of Transpilers": "cheating" by adding fake tokens in the input
- JRuby: cheating by massaging output
- How Clang handles the type/variable name ambiguity of C++ -- "My gripe is not with the grammar itself (although I admit it's needlessly complex), it's with the inability of Yacc-generated LALR(1) parsers to parse it without considerable hacks. As I've mentioned numerous times before, industrial-strength compilers for C/C++ exist after all, so they do manage to somehow parse these languages."
Alternative idea: parsers should be architected as pure functions from tokens to an AST (the computational model doesn't matter). Then you can instrument them and empirically determine backtracking/efficiency.
- TODO Review/Summarize papers
- Yakker -- scannerless, code available in OCaml
- Nail -- for data formats: length prefixes, checksums, etc.
- Pure and Declarative Syntax Definition: Paradise Lost and Regained (GLR). From the Nix authors.
- Langsec: this is a different but related issue. CFG is the wrong model. Models should be based on reality.
- finite vs infinite languages. Length prefixes.
Sweet spot? Context-sensitive / Turing complete, linear time lexing, and statically parseable
Lexing and Parsing should be Separate.
PEGs combine them, only for reasons of presentation.
I've had this discussion twice: at work (about GLR, and maybe about Gazelle), and on HN: https://news.ycombinator.com/item?id=13042835
Related Issues
- Turing completeness vs. side effects/capabilities. Both of these are design issues and shouldn't be conflated.
- Whether there is abstraction power like functions. I think Awk was Turing complete at first, but it didn't have functions. Related lobsters thread.
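The Awk point can be made concrete. A sketch (my example, not from the original Awk papers): loops and variables were in Awk from the start, and they suffice for arbitrary computation, while user-defined functions only arrived later with nawk.

```shell
# Turing-complete control flow without any user-defined functions:
# compute 2^10 using only a loop and a variable.
awk 'BEGIN {
  n = 1
  for (i = 0; i < 10; i++) n = n * 2
  print n
}'
```

This prints 1024, illustrating the distinction above: Turing completeness (loops, state) is separate from abstraction power (functions).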