-
Notifications
You must be signed in to change notification settings - Fork 2
phillbush/hoc
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
hoc: High-Order Calculator This is an implementation of a stage-6 hoc(1) with added features. Hoc (High-Order Calculator) is an interpreter for a simple language used for floating point arithmetic with C-like syntax. § FILES • README: This file. • Makefile: The makefile. • code.[hc]: Routines for executing the machine instructions. • error.[hc]: Routines for error printing. • main.c: The main routine. • lex.l: The lexical analyzer. • gramm.y: The grammar. § USAGE $ hoc [file [arguments ...]] Hoc reads the file given as argument and interpret it. If no file is given, or if file is “-”, hoc interprets the standard input; in this case, I recommend running hoc with rlwrap(1), for a better interactive shell with history support. § TODO • Add a facility to execute system commands from within hoc (and assign their return value to variables) (exercise 8-7). • Add interrupt handling, so that a runaway computation can be stopped without losing the state of variables already computed (exercise 8-16). • Add arrays to hoc. Pass they by reference to function and procedures. Return a pointer to them (exercise 8-20). • Add string concatenation. • Add option -e to read code from command-line. • Read environment variables by a getenv() built-in function. § FEATURES Exercise 8-2 (modulus and unary +). This version of hoc(1) supports the modulus operator (%) and the unary plus operator (+). Exercise 8-3 (printed value). The previously printed value can be accessed using the special variable `.` (period). Exercise 8-4 (semicolon). This version of hoc(1) supports both semicolon and newlines as statement separators. (See the `term` production rule on gramm.y). Exercise 8-5 (prohibit assignment to constants). In this version of hoc(1), constants such as pi are implemented as nullary built-in functions. So, instead of using `PI`, you must use `pi()`. Since built-in functions cannot be assigned, this prohibit assignment to constants. Exercise 8-6 (n-ary functions). This version of hoc(1) supports built-in functions with variable arity. It also has the built-in function rand(), which returns a floating point random variable uniformly distributed on the interval (0,1). Exercise 8-8 (table of built-in functions). This version of hoc(1) uses a table instead of a set of essentially identical functions for implementing the built-in functions. Exercise 8-10 (dynamic machine). The sizes of `stack` and `prog` are dynamic, so hoc never runs out of space as memory can be obtained by calling malloc(3). Exercise 8-12 (debug). By compiling with -DDEBUG=1, the machine instructions hoc generates are printed in a readable form for debugging. Exercise 8-13 (assignment operators, short-circuit). This version of hoc(1) supports the assignment operators of C, such as +=, *=, etc, and the increment and decrement operators ++ and --. It also supports short-circuit evaluation of the && and || operators, so they guarantee left-to-right evaluation and early termination, as in C. Exercise 8-14 (for loops). This version of hoc(1) supports for loops, and break and continue statements. Each element of the for loop (pre-loop; condition; post-loop) can be omitted, as in C. An omitted condition implies in an infinite loop. Exercise 8-15 (comments). This version of hoc(1) supports comments beginnign with “#”. Exercise 8-18 (named formal parameters). This version of hoc(1) supports named formal parameters in subroutines as an alternative to $1, etc. Exercise 8-19 (local variables). This version of hoc(1) supports local variables by the same inelegant way that awk(1) does. Exercise 8-21 (string handling). This version of hoc(1) supports generalized string handling, so that variables can hold strings instead of numbers. (I have to add a concatenation operand though). It also has facilities for output formatting (the printf statement). Lex. This version of hoc(1) uses lex(1) for implementing the lexical analyzer. Expression list. This version of hoc(1) supports list of expressions separated by comma, for example in the for condition `for (i = 0, j = 1; i < 3; i++, j++)`. The value of a expression list is that of the rightmost expression. print, printf and sprintf(). The `print` statement in this version of hoc actually works like awk(1)'s `print` statement (separates each argument with a space, and prints a newline at the end). As awk(1), this version of hoc(1) also has a `printf` statement and a `sprintf()` built-in function. getline The `getline VAR` expression reads a line into the variable VAR. do-while. This version of hoc(1) supports do-while statements. Access command-line arguments. This version of hoc(1) can access command-line arguments with the '$' operator. $1 is the first command-line argument, $2 is the second, and so on. $0 is the name of the input file. Escape line break. This version of hoc(1) can escape new lines by ending a line with a backslash (\). § NON-FEATURES Exercise 8-11 will not be implemented. It requires using a switch on the type operation instead of calling functions from a function pointer. Using function pointer is more elegant and easier to maintain than using switch cases. Exercise 8-17 will not be implemented. If you want editing features use a shell wrapper such as rlwrap(1). Reading from multiple files will not be implemented. Use cat(1). § HOW IT WORKS Hoc compiles input into a stack machine. As input is parsed, code is generated for a simple computer instead of immediately computing answers. Once the end of a statement is reached, the generated code is executed (interpreted) to compute the desired result. The simple computer is a ‘stack machine’: when an operand is encountered, it's pushed onto a stack (more precisely, code is generated to push it onto a stack). Stack machines usually result in simple interpreters (it's just an array containing operators and operands). The operators are the machine instructions; each is a function call with its arguments, if any, following the instruction. Before parsing and execution begins, the machine is initialized and the symbol table is populated by init(). Calls to the function install() installs both keywords and built-in functions in the symbol table. The main loop reverts the stack machine to its initial state and parses the input, one statement at a time. While the input is parsed, code is generated for later execution by calls to the function `code()`, which simply puts an `Inst` data (see bellow) into the next free spot in the program memory (pointed by `prog.progp`). Once a statement is parsed, the generated code is printed if DEBUG is set and then executed. For example, to handle the assignment `x = 2 * y`, the following code is generated. When this code is executed, the expression is evaluated and the result is stored in x. The final `pop` clears the value off the stack because it is not needed any longer. OPERATION: constpush Push a constant onto stack VALUE: 2 … the constant 2 OPERATION: varpush Push symbol table pointer onto stack SYMBOL: y … for the variable y OPERATION: eval Evaluate (replace pointer by value) OPERATION: mul Multiply top two items; product replaces them OPERATION: varpush Push symbol table pointer onto stack SYMBOL: x … for the variable x OPERATION: assign Store value in variable, pop pointer OPERATION: pop Clear top value from stack OPERATION: STOP End of instruction sequence The machine itself is a list of entries of data of type `Inst`. An `Inst` is a union of stuff that can go into the memory; such as pointer to routines like `mul` that perform an arithmetic operation; or pointer to a entry in the symbol table; or a floating point number (for constant values); or a pointer to another entry in the machine (used by control flow statements), a string, etc. Execution of the machine is simple. Each cycle calls `execute()`, which executes the function pointed to by the instruction pointed to by the program counter `prog.pc`, and makes `prog.pc` point to the next instruction so it's ready for the next instruction. An NULL instruction terminates the execution. Some instructions also increment `prog.pc` to step over any arguments that follows the instruction. The code generated for while and if needs particular study. When the keyword `while` is encountered, the operation whilecode() is generated, and its position in the machine is returned as the value of the while production. At the same time, however, the two following positions in the machine are also reserved, to be filled in later. The next code generated is the expression that makes up the condition part of the `while`. After the whole `while` statement has been recognized, the two extra positions reserved after the `whilecode` instruction are filled with pointers to the locations of the loop body and the statement that follows the loop. Code generated for `for` and `if` works similar. You can compile with -DDEBUG=1 for hoc to print the generated machine code after it is generated. Strings used as values are allocated in two linked-lists of strings. `autostrings` contains strings that appears during evaluation as the result of an expression. Strings in `autostrings` are freed after each execution. `finalstrings` contain strings that are the value of a variable, and thus should not be freed. Different from the book, where symbols are used for variables, keywords, built-in functions, etc, in this implementation there is two different structures, Name and Symbol. The first is looked up while the input is being read, while Symbols are looked up during execution. Names for variable, functions and keywords are installed in a name table. There is a global name table, used for keywords, function names and global variable names; and a name table for each function definition, used for local variables. When a variable is evaluated, it is looked up in a symbol table, which associate names with values. There is a global symbol table, for global values; and one symbol table for each function frame, containing automatic values. § SEE ALSO The UNIX Programming Environment, by Brian W. Kernighan and Rob Pike, Prentice Hall, 1984. ISBN: 0-13-937681-X.