kopchik/deadscript

Deadscript

My experiments with compilers.

How it works

  1. Indent parsing

    Parses the source into a tree with scopes defined by indentation (Python-like).

  2. Grammar parsing

    Builds an abstract syntax tree over the previous tree using a "top-down" parser. No semantic analysis happens yet; the tree matches the original program one to one.

  3. Semantic analysis

    Find functions, loops, branches and other main building blocks.

  4. Type inference

    The compiler tries to guess the types of variables.

  5. Sanity check

    Checks that, e.g., main() has correct arguments and so on.

  6. Code generation

    Traverses the AST and generates LLVM intermediate representation (IR).

  7. Compilation

    Invokes LLVM to build the program.
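
As an illustration of the first step, here is a minimal sketch of indentation-based scoping. It is not the code from this repository: the function name `parse_indent` and the `(text, children)` node shape are made up for the example, and it assumes indentation uses spaces only.

```python
def parse_indent(source):
    """Turn indented text into a nested tree of (text, children) nodes.

    Hypothetical helper for illustration; assumes spaces-only indentation.
    """
    root = []
    # Stack of (indent_width, children_list); the root sits at indent -1
    # so that every real line is nested below it.
    stack = [(-1, root)]
    for raw in source.splitlines():
        if not raw.strip():
            continue  # blank lines do not affect scoping
        indent = len(raw) - len(raw.lstrip(" "))
        # Close every scope that is at the same depth or deeper.
        while stack[-1][0] >= indent:
            stack.pop()
        node = (raw.strip(), [])
        stack[-1][1].append(node)
        stack.append((indent, node[1]))
    return root


if __name__ == "__main__":
    sample = "a\n  b\n    c\n  d\ne"
    for node in parse_indent(sample):
        print(node)
```

Each line becomes a child of the nearest preceding line with a smaller indent, which is the kind of tree the grammar parser could then walk.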

Design Goals

What do computer programs do? They apply a sequence of transformations to your data in order to produce meaningful output. The goal of a programming language is to support writing such transformations.

  1. Be safe, compact and friendly
  2. Static typing
  3. ML-like syntax (inspired by LiveScript)
  4. Public/Protected/Private attributes of the classes
  5. Built-in regexp support
  6. Built-in shell commands invocation
  7. Function overloading
  8. Custom operators
  9. Garbage collection
  10. Warn about useless statements (such as forgetting to call a function)
  11. Substitute vars in strings: "Hello, {username}!"
  12. UTF8 strings
  13. Assignments in if-clauses (but the expression must evaluate to bool, as a safety measure)
  14. Support comments: shell-style (# blah), C++-style (// here is the comment) and C-style (/* Hi! */)
  15. All programs can be opened as libraries
  16. No header files needed; everything is in the ELF file (possibly in compressed format)
  17. Keep it simple (to learn, to read, to extend)
  18. Error-resistant coding

Files

  1. dead.py -- the launcher for everything
  2. peg.py -- PEG parser that allows defining a grammar in a BNF-like way
  3. pratt.py -- Pratt parser, used to parse expressions
  4. tokenizer.py -- splits input into tokens, uses PEG
  5. ast.py -- abstract syntax tree and rewrite tools
  6. codegen.py -- a small helper script to write correctly-indented code
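
To give an idea of what a helper for writing correctly-indented code can look like, here is a small sketch. It is not the actual codegen.py: the class `IndentedWriter` and its methods `emit`, `block` and `render` are hypothetical names chosen for this example.

```python
from contextlib import contextmanager


class IndentedWriter:
    """Collects lines of output and tracks the current nesting depth.

    Hypothetical helper for illustration, not the API of codegen.py.
    """

    def __init__(self, indent="  "):
        self.indent = indent
        self.depth = 0
        self.lines = []

    def emit(self, line):
        # Prefix the line with one indent unit per nesting level.
        self.lines.append(self.indent * self.depth + line)

    @contextmanager
    def block(self, header, footer=None):
        # Everything emitted inside the `with` body is one level deeper.
        self.emit(header)
        self.depth += 1
        try:
            yield self
        finally:
            self.depth -= 1
        if footer is not None:
            self.emit(footer)

    def render(self):
        return "\n".join(self.lines)


if __name__ == "__main__":
    out = IndentedWriter()
    with out.block("define i32 @main() {", "}"):
        out.emit("ret i32 0")
    print(out.render())
```

Using a context manager keeps the nesting of the generator code the same as the nesting of the generated text, so it is hard to forget to dedent.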

Other

The normal assumption is that memory allocation will never fail, because most programs don't know how to deal with such errors anyway. If a program must not fail silently, there is a method to provisionally allocate the required amount of memory.

Why static: Just today I found typing bugs in pypeg and modgrammar. I see typing problems almost every day in many programs and libraries!

Why methods instead of functions: Python's namespace is heavily polluted with abs, len, sum, all, vars, min/max, next, list, id, to, dict, etc...

Terminology

  1. Parser (definitions are from https://siod.svn.codeplex.com/svn/winsiod/pratt.scm, A simple Pratt-Parser for SIOD: 2-FEB-90, George Carrette, GJC@PARADIGM.COM):

    1. NUD -- NUll left Denotation (op has nothing to its left (prefix))
    2. LED -- LEft Denotation (op has something to left (postfix or infix))
    3. LBP -- Left Binding Power (the stickiness to the left)
    4. RBP -- Right Binding Power (the stickiness to the right)
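
The four terms above fit together as in the following minimal Pratt parser sketch. It is not the code from pratt.py: the names `Parser`, `parse`, `LBP`, `RIGHT_ASSOC` and `PREFIX_RBP`, the binding power values and the tuple-based tree are all assumptions made for this example.

```python
import re

# Left binding powers of the infix operators (illustrative values).
LBP = {"+": 10, "-": 10, "*": 20, "/": 20, "^": 30}
RIGHT_ASSOC = {"^"}
PREFIX_RBP = 25  # right binding power of unary minus (assumed)


class Parser:
    def __init__(self, text):
        # Only integers, the operators above and parentheses are recognised.
        self.tokens = re.findall(r"\d+|[-+*/^()]", text) + ["<end>"]
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def nud(self, tok):
        """NUD: the token has nothing to its left (literal or prefix)."""
        if tok.isdigit():
            return int(tok)
        if tok == "-":
            return ("neg", self.expression(PREFIX_RBP))
        if tok == "(":
            inner = self.expression(0)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        raise SyntaxError(f"unexpected token {tok!r}")

    def led(self, tok, left):
        """LED: the token has an already parsed expression to its left."""
        lbp = LBP[tok]
        # A slightly lower RBP lets a right-associative operator
        # grab another operator of the same kind on its right.
        rbp = lbp - 1 if tok in RIGHT_ASSOC else lbp
        return (tok, left, self.expression(rbp))

    def expression(self, rbp=0):
        left = self.nud(self.advance())
        # Keep extending `left` while the next operator binds more
        # tightly to the left than the caller binds to the right.
        while LBP.get(self.peek(), 0) > rbp:
            left = self.led(self.advance(), left)
        return left


def parse(text):
    return Parser(text).expression()


if __name__ == "__main__":
    print(parse("1 + 2 * 3"))
    print(parse("2 ^ 3 ^ 2"))
```

The comparison of the next token's LBP against the caller's RBP in `expression` is the whole precedence mechanism; tokens without an entry in `LBP` (such as `)` or the end marker) get power 0 and stop the loop.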

Other project names

  1. Brainduck (busy)
  2. Concrete mixer

References

Simple top-level parsing

  1. http://en.wikipedia.org/wiki/Parsing_expression_grammar

Expression parsing (with precedence)

  1. http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/
  2. http://effbot.org/zone/simple-top-down-parsing.htm

Useful Links

  1. http://roscidus.com/blog/blog/2013/06/20/replacing-python-round-2/#syntax
  2. http://en.wikipedia.org/wiki/Linear_type_system

Types

Str, Int, Bool, Tuple, Array
