GCI: Relocate PIRCTasklist #15

Closed
wants to merge 1 commit into from

2 participants

Matt Rajca Andrew Whitworth
Matt Rajca

I converted the entire wiki article on PIRC development into .mdown files. I also fixed all broken links.

Andrew Whitworth
Owner

Your changes have been pulled into the gci_pirctasklist branch in the PIRC repository for review. Thanks!

This issue was closed.

Showing 1 unique commit by 1 author.

Dec 04, 2010
Matt Rajca mattrajca Added PIRC Development docs f3135e7
16 docs/TODO.mdown
... ... @@ -0,0 +1,16 @@
  1 +PIRC Development Tasks
  2 +======================
  3 +
  4 +Shouldn't-be-too-hard tasks
  5 +---------------------------
  6 +
  7 +* write tests for the generated output.
  8 +
  9 +Hardcore hacking tasks
  10 +----------------------
  11 +
  12 +* Fix parser to "calculate" the right signature for ops such as: `$P0 = new ['Integer']`
  13 +
  14 +Currently, the argument is encoded as `_ksc`, for key, string-constant.
  15 +
  16 +* Convert all C strings in PIRC into `STRINGs`. All identifiers and strings that are scanned should be stored as `STRING` objects, not C strings.
4 docs/current_status.mdown
... ... @@ -0,0 +1,4 @@
  1 +PIRC Status
  2 +===========
  3 +
  4 +PIRC is not complete yet. All stages are implemented (lexer, parser, bytecode generator), but each of them needs some additional work. See the section below for the specific items that need to be fixed. Once these are fixed, PIRC will be about 98% done.
49 docs/general.mdown
... ... @@ -0,0 +1,49 @@
  1 +PIRC Introduction
  2 +=================
  3 +
  4 +PIRC is a fresh implementation of the PIR language. It is being developed as a replacement for the current PIR compiler, IMCC. Somewhere in the future, we all hope to be able to finish it. However, some help is needed. Most of the tricky parts have been done for you, such as implementing all sorts of weird features of the PIR language.
  5 +
  6 +The basic workflow of PIRC is as follows. The lexer and parser are implemented with Flex and Bison specifications. During the parsing phase, a data structure is built that represents the input. To stick with compiler jargon, let's call this the Abstract Syntax Tree (AST). After the parse, this AST is traversed and for each instruction the appropriate bytecode is emitted. Registers are allocated by the built-in vanilla register allocator. This means that for the following code:
  7 +
  8 + .sub main
  9 + $S12 = "Hi there"
  10 + print $S12
  11 + $I44 = 42
  12 + print $I44
  13 + .end
  14 +
  15 +`$S12` and `$I44` will be mapped to the registers `S0` and `I0` respectively (yes, you guessed it, it starts allocating from 0). As you would expect, the vanilla register allocator is pretty stupid, but the generated bytecode is not too bad, really. If you want to optimize the register usage (which saves runtime memory), you can activate the register optimizer. The register optimizer is based on a Linear Scan Register allocator. The original algorithm, as described in [this paper](http://www.google.ie/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ucla.edu%2F~palsberg%2Fcourse%2Fcs132%2Flinearscan.pdf&ei=w9F5SvzVDpOqsAa_7tyeBQ&usg=AFQjCNETIxGGy87F9GzLawd4euXEaldcnQ&sig2=Hd7nnjdQrgnOqix-8sx92g), assumes a fixed number of available registers. Since Parrot has a variable number of registers available per subroutine, the algorithm has been changed here and there. See the file [/src/pirregalloc.c](https://github.com/parrot/pirc/blob/master/src/pirregalloc.c) for the implementation.
  16 +
  17 +PIRC vs IMCC
  18 +============
  19 +
  20 +While PIRC is an implementation of the PIR language which is specified in PDD19, there are some subtle differences with the current implementation, IMCC. In case you were wondering, IMCC stands for IMC Compiler, with IMC being the old name of the PIR language, standing for Intermediate Machine Code. The name was changed a long time ago.
  21 +
  22 +* "nested" heredocs can be handled by PIRC, but not by IMCC. Yes, they were very painful to implement, which is why IMCC doesn't support them.
  23 +* comments or whitespace in the parameter lists are accepted by PIRC, but not by IMCC. It sounds like an easy fix, but it isn't. Hence, PIRC!
  24 +* reentrant: PIRC is, IMCC is not.
  25 +* checks for improper use of syntactic sugar with OUT arguments, such as `$S0 = print`. PIRC checks for this, IMCC doesn't. Again, it sounds (and looks) like an easy fix, but it isn't.
  26 +* PIRC handles macros in the lexer and parser; the syntax to define macros is defined in the parser. IMCC, on the other hand, implements macros completely in the lexer.
  27 +
  28 +Building and running PIRC
  29 +=========================
  30 +
  31 +PIRC is located in compilers/pirc. In order to compile, do the following:
  32 +
  33 + cd compilers/pirc
  34 + make
  35 + make test
  36 +
  37 +At this point (August 5, 2009) some tests are failing, so don't be alarmed if you see them failing.
  38 +
  39 +In order to run PIRC:
  40 +
  41 + ./pirc -h
  42 + ./pirc -b test.pir # will generate a file a.pbc
  43 +
  44 +Enjoy!
  45 +
  46 +PIRC Data Flow
  47 +==============
  48 +
  49 +The data flow of PIRC is as follows. First, the [heredoc preprocessor](https://github.com/parrot/pirc/blob/master/src/hdocprep.l) takes the PIR file and flattens all heredoc strings. The output is written to a temporary file, which is then parsed by the [Flex based lexer](https://github.com/parrot/pirc/blob/master/src/pir.l) and the [Bison based parser](https://github.com/parrot/pirc/blob/master/src/pir.y). The parser creates an Abstract Syntax Tree (AST); the AST nodes for that are defined in [/src/pircompunit.c](https://github.com/parrot/pirc/blob/master/src/pircompunit.c). If the parse was successful, control is passed on to the [/src/piremit.c](https://github.com/parrot/pirc/blob/master/src/piremit.c) module, which traverses the AST. During the traversal, bytecode is generated through the [/src/bcgen.c](https://github.com/parrot/pirc/blob/master/src/bcgen.c) module. The output is written to a file named a.pbc; the name of the output file can be overridden with the `-o` option.
285 docs/internals.mdown
... ... @@ -0,0 +1,285 @@
  1 +PIRC Internals
  2 +==============
  3 +
  4 +In this section, PIRC's guts are dissected in order to explain what exactly is going on under the hood. If you are interested in the nitty-gritty details, keep on reading. (Note that this is a work-in-progress and will take some time to be completed).
  5 +
  6 +PIRC Lexer
  7 +----------
  8 +
  9 +Heredoc processor
  10 +-----------------
  11 +
  12 +The Heredoc processor has only one task: flattening heredoc strings. By "flattening", I mean the following. This string:
  13 +
  14 + $S0 = <<'EOS'
  15 + This is
  16 + a multi-line
  17 + heredoc
  18 + string
  19 + with
  20 + increasing
  21 + indention
  22 + on each line.
  23 + EOS
  24 +
  25 +is "flattened" into:
  26 +
  27 + $S0 = "This is\n a multi-line\n heredoc\n string\n with\n increasing\n indention\n on each line."
  28 +
  29 +Note that "newline" characters are inserted as well, so that the string is equivalent to the original heredoc string. Besides assigning heredoc strings to String registers, the PIR specification also allows you to use heredoc strings as arguments in subroutine invocations:
  30 +
  31 + .sub main
  32 + foo(<<'A')
  33 + This is a heredoc
  34 + string argument
  35 + A
  36 + .end
  37 +
  38 + .sub foo
  39 + # ...
  40 + .end
  41 +
  42 +Again, the heredoc string (delimited by the string "A") will be flattened. According to the PIR specification, you can even pass multiple heredoc string arguments, like so:
  43 +
  44 + .sub main
  45 + foo(<<'A', 42, <<'B', 3.14, <<'C')
  46 + I have a Parrot
  47 + A
  48 + It is not a bird
  49 + B
  50 + It is a virtual machine
  51 + C
  52 + .end
  53 +
  54 +Note that the heredoc arguments may be mixed with other, simple arguments such as integers and numbers. In the rest of this section, the implementation will be discussed.
  55 +
  56 +Heredoc parsing implementation
  57 +------------------------------
  58 +
  59 +The implementation of the Heredoc preprocessor can be found in [/src/hdocprep.l](https://github.com/parrot/pirc/blob/master/src/hdocprep.l). It is a Lex/Flex lexer specification, which means you need the Flex program to generate the C code for this preprocessor. The preprocessor takes a PIR file that contains heredoc strings, and flattens out all heredoc strings. It writes a temporary file to disk that is exactly the same as the original PIR file, except that all heredoc strings are flattened.
  60 +
  61 +For this discussion, it is assumed you have a basic understanding of the Flex program. For instance, you need to know what "state" means in Flex context. If you don't know, please refer to [the Flex documentation page](http://flex.sourceforge.net/manual/).
  62 +
  63 +In order to make the heredoc preprocessor reentrant, no global variables are used. Instead, [lines 83 to 98](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L83) define a `struct global_state`. The comments in the code briefly describe what each field is for, but they will be discussed in more detail later when we walk through the actual processing of the heredocs. A new instance of this struct can be created by invoking [`init_global_state`](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L157). For now, it is useful to know that this struct has a pointer to a Parrot interpreter object, the name of the file being processed, and a pointer to the output file.
  64 +
  65 +The function [`process_heredocs`](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L208) is the main function of the heredoc preprocessor that the main compiler program (PIRC) invokes. This function opens the file to be processed, initializes the lexer, creates a new `global_state` struct instance, as described above, invokes the lexer to do the processing and cleans up afterwards.
  66 +
  67 +We will now walk through two different scenarios, in order to simplify the discussion. Scenario 1 discusses the case of single heredoc parsing, and Scenario 2 discusses multiple heredoc parsing. Multiple heredoc parsing starts out like Scenario 1, but is a bit more advanced.
  68 +
  69 +**Scenario 1a: single heredoc string parsing**
  70 +
  71 +Consider the following input:
  72 +
  73 + .sub main
  74 + $S0 = <<'EOS'
  75 + This
  76 + is
  77 + a
  78 + heredoc
  79 + string.
  80 +
  81 + EOS
  82 + .end
  83 +
  84 +The lexer starts out in the `INITIAL` state by default (as per Flex specification). When reading input such as `<<'EOS'`, the rule on [line 306](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L306) is activated. The actual string ("EOS") is stored in the field `state->delimiter`, and an escaped newline character is stored in the heredoc buffer.
  85 +
  86 +Since the preprocessor does not build a data structure representing the input, but instead writes the output directly (to a file), the "rest of the line" needs to be stored somewhere. This is because the `<<'EOS'` heredoc token is basically a placeholder for the actual (heredoc) string contents. Hence, the [activation of `SAVE_REST_OF_LINE` state](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L318).
  87 +
  88 +The state `SAVE_REST_OF_LINE` has only one function, and that is to SAVE the REST OF the LINE :-). It will match all the text after the `<<'EOS'` heredoc marker up to and including the end-of-line character. This text, plus an additional "\n" character, is stored in the `linebuffer` field, which always contains the "rest of the line". As you can see, in this scenario there is no "rest of the line", except for the end-of-line character ("\n", or "\r\n" on Windows). See Scenario 1b below for a variant on this, in which the "rest of the line" contains a closing parenthesis of a subroutine invocation.
  89 +
  90 +After the heredoc marker the actual heredoc string must be scanned, hence the activation of the `HEREDOC_STRING` state on [line 331](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L331). In the state `HEREDOC_STRING`, there are three different types of input:
  91 +
  92 +1. "end-of-line" characters, basically an empty line (see [line 357](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L357)). An escaped newline character ("\\n") will be stored as part of the heredoc string.
  93 +2. "normal" heredoc string lines (see [line 376](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L376)). The line we just read may be the heredoc string delimiter that was stored earlier. In order to compare the strings, the newline character is first chopped off (see [lines 381-384](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L381)). Then, a string comparison is done in order to see whether we just read the heredoc string delimiter. If so, then we need to continue scanning the "rest of the line" that was saved earlier. However, since we need to switch back later to the current buffer, we need to store this current buffer ([line 395](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L395)). Also, the lexer's state is changed to `SCAN_STRING`, since we're going to scan a saved string. Then, the lexer is told to read the next input from the string buffer ([line 406](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L406)). If, however, we did not read the heredoc delimiter, then it's just a line that's part of the heredoc string, which needs to be stored. In that case, a new buffer is allocated to store the heredoc string so far, plus the new line that's just been scanned. The old buffer is released.
  94 +3. End of file ([line 423](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L423)). When the lexer encounters end-of-file, an error is printed to the screen, and the lexer terminates.
  95 +
  96 +Once the heredoc string has been completely scanned, the `SCAN_STRING` state is activated. Again, there's a number of different input patterns that may be scanned:
  97 +
  98 +1. Another heredoc marker (`<<{Q_STRING}`, [line 428](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L428)). See Scenario 2 for a discussion of this.
  99 +2. End of line ([line 447](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L447)). Nothing is done.
  100 +3. Any character ([line 449](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L449)). The character (for instance, a parenthesis) is written to the output.
  101 +4. End of file ([line 451](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L451)). End of file, in this context, means end of string. So, we've finished scanning the "rest of line" string buffer, so now the lexer needs to switch back to read the next input from the file again. Also, the lexer's state is switched back to the default state (`INITIAL`).
  102 +
  103 +This completes the processing of a single heredoc string.
  104 +
  105 +**Scenario 1b: single heredoc argument parsing**
  106 +
  107 +Scenario 1b is almost the same as Scenario 1a, except that instead of a heredoc string being assigned to some target (register), the heredoc string is an argument to a function. Consider the following input:
  108 +
  109 + .sub main
  110 + foo(<<'EOS')
  111 + This
  112 + is
  113 + a
  114 + heredoc
  115 + string.
  116 +
  117 + EOS
  118 + .end
  119 +
  120 +The process of parsing this heredoc string is pretty much the same as in Scenario 1a, except that the "rest of the line" contains the closing parenthesis ")" to close the argument list of the invocation of `foo`.
  121 +
  122 +**Scenario 2: multiple heredoc parsing**
  123 +
  124 +Consider the following input:
  125 +
  126 + .sub main
  127 + foo(<<'A', 42, <<'B', <<'C')
  128 + heredoc text a
  129 + A
  130 + heredoc text b
  131 + B
  132 + heredoc text c
  133 + C
  134 +
  135 + .end
  136 +
  137 +Now, scanning up to and including the first heredoc marker:
  138 +
  139 + foo(<<'A'
  140 +
  141 +is done exactly the same as described in Scenario 1. Assume that the lexer just found the heredoc delimiter for heredoc string A. The lexer's current state is `HEREDOC_STRING`, but as can be seen in [line 404](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L404), the lexer will now switch to `SCAN_STRING` state in order to scan the "rest of the line". The rest of the line buffer contains:
  142 +
  143 + , 42, <<'B', <<'C')
  144 +
  145 +First the comma and whitespace are scanned, handled by [line 449](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L449). Then the argument "42" is matched (also by [line 449](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L449), "any character"), as well as the comma that follows it.
  146 +
  147 +Then the heredoc marker for heredoc B is scanned ([line 428](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L428)). This section of code is almost identical to the section that matches heredoc markers in the `INITIAL` state ([line 306](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L306)). The difference is that instead of activating the `SAVE_REST_OF_LINE` state, the `SAVE_REST_AGAIN` state is activated. `SAVE_REST_AGAIN` is almost the same as `SAVE_REST_OF_LINE`. The difference is that in `SAVE_REST_OF_LINE` the lexer is still reading from the file buffer, whereas in `SAVE_REST_AGAIN` it is scanning a string buffer. Therefore, the lexer must switch from the string buffer back to reading the file buffer, which is done in [line 350](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L350).
  148 +
  149 +At this point, heredoc string B is scanned. After that, heredoc string C is scanned. It is left as the proverbial exercise to the reader to try to understand how this is done. The previous discussion of the involved lexer states should greatly help in this.
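For readers who would rather see the idea than trace the Flex states, here is a minimal Python sketch of the same control flow. It is not PIRC's code: the function name `flatten_heredocs`, the regular expression, and the choice to join body lines with an escaped newline (following the flattening example at the top of this section) are all assumptions made for illustration. It also assumes the delimiter stands alone on its line and that the heredoc text contains no quotes or backslashes.

```python
import re

# Hypothetical marker pattern; the real rule lives in hdocprep.l.
HEREDOC_MARKER = re.compile(r"<<'(\w+)'")

def flatten_heredocs(lines):
    """Flatten every heredoc in `lines` (strings without trailing newlines)."""
    out = []
    it = iter(lines)
    for line in it:
        pieces = []
        rest = line                       # the "rest of the line" still to scan
        while True:
            m = HEREDOC_MARKER.search(rest)
            if m is None:
                pieces.append(rest)       # plain characters: copy to the output
                break
            pieces.append(rest[:m.start()])
            rest = rest[m.end():]         # save the rest of the line (again)
            body = []
            for text in it:               # switch back to the file for the body
                if text == m.group(1):    # found the delimiter
                    break
                body.append(text)
            else:
                raise SyntaxError("end of file inside heredoc")
            pieces.append('"' + "\\n".join(body) + '"')
        out.append("".join(pieces))
    return out
```

The outer loop plays the role of the `INITIAL` state, the inner `for` loop that of `HEREDOC_STRING`, and the `while` loop that of `SCAN_STRING` working through the saved rest of the line.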
  150 +
  151 +**POD parsing**
  152 +
  153 +POD comments are filtered out from the input. This is implemented in [lines 287 to 301](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L287). Note that [line 287](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L287) is very important: it matches a "=cut" directive (which ends a POD comment) in the `INITIAL` state (so, when no previous POD comment was seen yet). If this pattern were not matched in the `INITIAL` state, the "=cut" directive would actually activate the POD state. This is because "=cut" starts with a "=", which is the first character of a POD directive (see [line 289](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L289)).
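The ordering issue is easy to see in a small Python sketch. This is only an illustration under assumptions: the function name `strip_pod` is made up, POD directives are assumed to start in the first column, and a stray "=cut" is assumed to be silently dropped.

```python
def strip_pod(lines):
    """Drop POD blocks; a stray "=cut" outside POD must not open a new block."""
    out = []
    in_pod = False
    for line in lines:
        if line.startswith("=cut"):
            in_pod = False      # tested first, so "=cut" never enters POD state
        elif line.startswith("="):
            in_pod = True       # any other "=" directive starts a POD comment
        elif not in_pod:
            out.append(line)    # ordinary PIR text is passed through
    return out
```

If the two `startswith` tests were swapped, a "=cut" seen outside a POD comment would switch POD mode on and swallow the code that follows it.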
  154 +
  155 +**include directives**
  156 +
  157 +The `.include` directive is logically a macro expansion directive. It takes one argument, which is the name of a file. If the `.include` directive is encountered, the lexer switches to the specified file, and starts reading from that file. Once the end of the file has been reached, the lexer switches back to the original file.
  158 +
  159 +The `.include` directive is implemented in the heredoc preprocessor. This is necessary in order to be able to use heredoc strings in the included file. If the directive would have been implemented in the normal PIR lexer (that implements macro expansion), then the heredoc preprocessor would have to be invoked first on the included file.
  160 +
  161 +Once the `.include` directive is read, the lexer switches state from `INITIAL` to `INCLUDE` ([line 479](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L479)). This is done using the built-in state stack in the Flex-generated lexer. The `INCLUDE` state is pushed onto the state stack, and immediately activated. (Once the state is popped off, the lexer switches to the state that's then the new top-of-stack.) Since an included file can include other files, a stack is used to keep track of this. Four different input patterns are distinguished:
  162 +
  163 +1. whitespace ([line 483](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L483)). Whitespace is skipped.
  164 +2. a quoted string, which is the name of the file to be included ([line 485](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L485)). Once the quoted string is stripped from its quotes, the file is located and the lexer will start processing that file.
  165 +3. end of line ([line 528](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L528)). This would be the end-of-line after the quoted string that was included. Once this is encountered, the included file has already been completely processed. Therefore, the lexer's state is popped off the lexer state stack.
  166 +4. any other character ([line 532](https://github.com/parrot/pirc/blob/master/src/hdocprep.l#L532)), resulting in an error message.
  167 +
  168 +Macro layer
  169 +-----------
  170 +
  171 +The macro layer is implemented in both the lexer and the parser. The syntax to define and expand macros is defined in the parser. This is a fundamental difference from how macros are implemented in IMCC. In IMCC, the macro layer is completely implemented in the lexer.
  172 +
  173 +Currently, basic macros work, but nested macros do not. This needs to be fixed.
  174 +
  175 +PIRC Parser
  176 +-----------
  177 +
  178 +The parser is implemented in [/src/pir.y](https://github.com/parrot/pirc/blob/master/src/pir.y). This is a parser specification that needs to be processed by the Bison program in order to generate the C file.
  179 +
  180 +Symbol Management
  181 +-----------------
  182 +
  183 +Symbol management is implemented in [/src/pirsymbol.c](https://github.com/parrot/pirc/blob/master/src/pirsymbol.c). Symbols declared using the `.local` directive are stored in a symbol table. Whenever an identifier is parsed, it will be looked up in this symbol table.
  184 +
  185 +All uses of PIR registers (e.g. `$I42`) are registered as well. The first time a PIR register is used, it is assigned a PASM register. This process is called "coloring". The word "color" is often used in the context of register allocation, since the "classic" algorithm to do so is called "graph-coloring". While the vanilla register allocator does not use such an algorithm, the field "color" is used for storing the actual PASM register number that was assigned.
  186 +
  187 +Constant Folding
  188 +----------------
  189 +
  190 +Strength Reduction
  191 +------------------
  192 +
  193 +Abstract Syntax Tree
  194 +--------------------
  195 +
  196 +During the parsing phase, an Abstract Syntax Tree (AST) is constructed. There are a number of different node types. There were two approaches for defining the node types:
  197 +
  198 +1. Define one node type, that contains all fields that could be needed. An advantage of this approach would be that it simplifies the code. On the other hand, it would probably make the code more obscure to read (since you can't really see what a node represents anymore), and also it would waste memory, since many fields would not be used by most of the instances. Furthermore, it would be easier to misuse certain fields for other purposes than the field was supposed to be used for.
  199 +2. Define specialized types. This is the approach taken.
  200 +
  201 +PIRC defines the following node types in [/src/pircompunit.h](https://github.com/parrot/pirc/blob/master/src/pircompunit.h):
  202 +
  203 +* [constdecl](https://github.com/parrot/pirc/blob/master/src/pircompunit.h#L162), used for a .const or `.globalconst` declaration
  204 +* [constant](https://github.com/parrot/pirc/blob/master/src/pircompunit.h#L172), used to represent literal constants in the source code (e.g. 42, 3.14, "hello")
  205 +* [label](https://github.com/parrot/pirc/blob/master/src/pircompunit.h#L180), used to store a label and its instruction offset
  206 +* [expression](https://github.com/parrot/pirc/blob/master/src/pircompunit.h#L196), used to represent an instruction operand. Since there are many different AST node types, and an instruction can have various types of operands, the expression node type is used to wrap these.
  207 +* [key_entry](https://github.com/parrot/pirc/blob/master/src/pircompunit.h#L216), used to represent a key value; for instance the key [1;"hi"] has 2 entries: 1 and "hi".
  208 +* [key](https://github.com/parrot/pirc/blob/master/src/pircompunit.h#L225), used to represent a key; it has a pointer to the first key value, and keeps track of the total number of key entries ([1;"hi"] has 2 key entries)
  209 +* [target](https://github.com/parrot/pirc/blob/master/src/pircompunit.h#L238), used to represent a left-hand side (LHS) object. As such, it can be assigned a value (hence the name target), and it can be used as a right-hand side (RHS) value.
  210 +* [argument](https://github.com/parrot/pirc/blob/master/src/pircompunit.h#L255), used to represent argument values for subroutine invocations, or for return statements. It has a pointer to an expression node that is the actual value, a `flags` field that encodes any flags (such as `:flat`), and an `alias` field, if the argument is passed by name.
  211 +* [invocation](https://github.com/parrot/pirc/blob/master/src/pircompunit.h#L275), used to temporarily represent a subroutine invocation or a return statement. It is used only temporarily; `invocation` nodes are not stored in the AST. Instead, they are converted into a set of instructions after the subroutine invocation or return statement has been parsed.
  212 +* [instruction](https://github.com/parrot/pirc/blob/master/src/pircompunit.h#L288), used to represent a single instruction.
  213 +* [subroutine](https://github.com/parrot/pirc/blob/master/src/pircompunit.h#L354), used to represent a subroutine definition.
  214 +
  215 +Vanilla Register Allocator
  216 +--------------------------
  217 +
  218 +PIRC has a built-in vanilla register allocator. The vanilla register allocator (or "register allocator" as we shall call it from now) maps PIR registers, such as `$P44`, `$I9999`, etc., to actual Parrot registers (or "PASM registers" as they are also referred to). Parrot allocates a variable number of registers per sub invocation. Some simple subs only need a few registers, whereas complex subroutines may need several tens of registers.
  219 +
  220 +Now, how does this work? PIR registers should be considered as "pre-declared" symbols; they are just symbols that you can use without declaring them. If you want fancy names, you would use the `.local` directive to declare them, after which you can use symbolic names (which are more descriptive than PIR registers).
  221 +
  222 +Basically, PIR registers and declared symbols are the same. The register allocator is reset for each subroutine. Whenever a new register is needed, it will start at 0, and increment a counter. PIR registers will always be allocated a PASM register, whereas declared symbols will only be assigned a PASM register if the symbol is actually used. This is because you could declare a bunch of `.local` symbols, but never use them. Allocating registers to them would be wasteful.
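As a rough illustration of this scheme, here is a Python sketch. The class and method names are invented for the example, and it assumes one counter per register type (I, N, S, P), which matches the earlier example where `$S12` and `$I44` map to `S0` and `I0`.

```python
class VanillaAllocator:
    """Toy model of a first-come, first-served register allocator."""

    def __init__(self):
        self.next_free = {}   # register type ("I", "N", "S", "P") -> next number
        self.color = {}       # symbol name -> assigned PASM register number

    def reset(self):
        """Start over; called once per subroutine."""
        self.next_free.clear()
        self.color.clear()

    def use(self, symbol, reg_type):
        """Return the PASM register for `symbol`, allocating on first use."""
        if symbol not in self.color:
            number = self.next_free.get(reg_type, 0)
            self.color[symbol] = number
            self.next_free[reg_type] = number + 1
        return f"{reg_type}{self.color[symbol]}"
```

A declared `.local` symbol only costs a register once `use` is called for it, which mirrors the "only if the symbol is actually used" rule.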
  223 +
  224 +Register Usage Optimizer
  225 +------------------------
  226 +
  227 +The vanilla register allocator is pretty dumb, in the sense that it does not consider the lifetime of variables. Or, put in another way, it assumes that all registers' lifetime is the complete subroutine. However, in real life, a register is typically only used in a small part of the subroutine. Consider this example:
  228 +
  229 + .sub main
  230 +
  231 + .local int a, b, c
  232 + a = 1
  233 + b = 2
  234 + c = 3
  235 +
  236 + .end
  237 +
  238 +The vanilla register allocator will allocate registers 0 to 2 to the symbols `a`, `b` and `c`. However, as you can guess, since `a` is never used after the initial assignment, there is no need to assign a different register to `b`. Likewise for `b`, which can share the same register with `c`. So, in the above example, there is really only one register needed.
  239 +
  240 +However, suppose we change the example into the following:
  241 +
  242 + .sub main
  243 +
  244 + .local int a, b, c
  245 + a = 1
  246 + b = 2
  247 + c = 3
  248 + print a
  249 + print b
  250 +
  251 + .end
  252 +
  253 +In this case, the lifetimes of `a` and `b` are extended, as both variables are used in the `print` statements. So, `a` cannot share a register with `b` nor with `c`. The rest of this subsection explains how this can be calculated.
  254 +
  255 +The register optimizer is a variant of the Linear Scan Register allocation algorithm as described in [this paper](http://www.google.ie/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ucla.edu%2F~palsberg%2Fcourse%2Fcs132%2Flinearscan.pdf&ei=w9F5SvzVDpOqsAa_7tyeBQ&usg=AFQjCNETIxGGy87F9GzLawd4euXEaldcnQ&sig2=Hd7nnjdQrgnOqix-8sx92g). Since that algorithm assumes there's a fixed number of registers (which is the case for hardware processors), the algorithm is changed in a few places.
  256 +
  257 +The implementation can be found in [/src/pirregalloc.c](https://github.com/parrot/pirc/blob/master/src/pirregalloc.c). Whether or not to use the register optimizer depends on how your program is used. If you have a large program that you will run many times, and memory usage is important, then you should activate it. If, on the other hand, runtime performance (compilation time included) is important, you should not activate it, as it takes additional time to perform the register optimization. In order to activate the register optimizer, use the `-r` command line option when running PIRC.
  258 +
  259 +For each symbol (PIR register or declared symbol), a [live_interval](https://github.com/parrot/pirc/blob/master/src/pirregalloc.h#L29) struct instance is allocated. Most important are the `startpoint` and `endpoint` fields, which keep track of the start and end point respectively of the live interval of the variable. Consider the following example:
  260 +
  261 + .sub main
  262 + 0 $I0 = 1
  263 + 1 $I1 = 2
  264 + 2 print $I0
  265 + 3 print $I1
  266 + .end
  267 +
  268 +In this code snippet, the numbers in front of the statements indicate the sequence of instructions. As you can see, `$I0` lives from 0 to 2, whereas `$I1` lives from 1 to 3. Since these live intervals are overlapping, this means that these variables cannot share a register. On the other hand, consider the following example:
  269 +
  270 + .sub main
  271 + 0 $I0 = 1
  272 + 1 print $I0
  273 + 2 $I1 = 2
  274 + 3 print $I1
  275 + .end
  276 +
  277 +In this case, `$I0` lives from 0 to 1, whereas `$I1` lives from 2 to 3. Since they do not overlap, these variables can share a register. This can be calculated by the algorithm described in the above mentioned paper. These details will not be discussed here; instead the reader is referred to the paper.
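The two examples can be checked with a short Python sketch. The function names and the input format (one list of register names per instruction) are assumptions for illustration; only the start point and end point idea comes from the `live_interval` struct.

```python
def live_intervals(instructions):
    """Map each register to (startpoint, endpoint).

    `instructions` is a list with, per instruction, the registers it mentions.
    """
    intervals = {}
    for index, registers in enumerate(instructions):
        for reg in registers:
            start, _ = intervals.get(reg, (index, index))
            intervals[reg] = (start, index)   # extend the end point to this use
    return intervals

def overlap(a, b):
    """True if two (start, end) intervals share at least one instruction."""
    return a[0] <= b[1] and b[0] <= a[1]

# First example: $I0 lives from 0 to 2, $I1 from 1 to 3.
first = live_intervals([["$I0"], ["$I1"], ["$I0"], ["$I1"]])
# Second example: $I0 lives from 0 to 1, $I1 from 2 to 3.
second = live_intervals([["$I0"], ["$I0"], ["$I1"], ["$I1"]])
```

Here `overlap(first["$I0"], first["$I1"])` is true, so the two registers must stay separate, while the same call on `second` is false, so they may share one.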
  278 +
  279 +Now that you know the basic working and purpose of the register optimizer, let's look at the implementation. Following the design principle of PIRC to be as modular as possible, the register optimizer's state is stored in a struct. A new [`lsr_allocator`](https://github.com/parrot/pirc/blob/master/src/pirregalloc.h#L66) object (lsr stands for Linear Scan Register) can be created in the function [`new_linear_scan_register_allocator`](https://github.com/parrot/pirc/blob/master/src/pirregalloc.h#L85). This constructor takes a pointer to the PIRC compiler struct instance. Yes, this does mean it is somewhat dependent on this other object, but it made the implementation somewhat easier. The struct keeps a list of all "active" live intervals (one for each variable that's alive).
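To give a feel for how an "active" list is used, here is a simplified Python sketch of linear scan with an unlimited register supply. It is not a transcription of `pirregalloc.c`: the function name, the data layout, and the policy of reusing the lowest freed register are assumptions. Because registers never run out, the spilling step of the original algorithm is simply left out.

```python
def linear_scan(intervals):
    """Assign a register number to each variable.

    `intervals` maps a variable name to (startpoint, endpoint).
    """
    assignment = {}
    active = []          # (endpoint, name) of variables that are still alive
    free = []            # register numbers released by expired intervals
    next_register = 0
    by_start = sorted(intervals.items(), key=lambda item: item[1][0])
    for name, (start, end) in by_start:
        # Expire intervals that ended before this one starts.
        still_active = []
        for active_end, active_name in active:
            if active_end < start:
                free.append(assignment[active_name])
            else:
                still_active.append((active_end, active_name))
        active = still_active
        if free:
            register = min(free)          # reuse a released register
            free.remove(register)
        else:
            register = next_register      # otherwise take a fresh one
            next_register += 1
        assignment[name] = register
        active.append((end, name))
    return assignment
```

With the intervals from the `a`, `b`, `c` example (each variable alive for a single instruction), every variable ends up in register 0, which is the "only one register needed" result described earlier.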
  280 +
  281 +Bytecode Generation
  282 +-------------------
  283 +
  284 +Running code at compile time: the :immediate flag
  285 +-------------------------------------------------
