
pirgrammar.pod - The Grammar of languages/PIR


This document provides a more readable grammar of languages/PIR. The actual specification for PIR is a bit more complex. This grammar for humans does not contain error handling and other issues unimportant for this PIR reference.


For bugs and issues, see the section KNOWN ISSUES AND BUGS.

The grammar includes some constructs that are in the IMCC parser, but are not implemented.

Please note that languages/PIR is not the official definition of the PIR language. The reference implementation of PIR is IMCC, located in parrot/compilers/imcc. However, languages/PIR tries to be as close to IMCC as possible. IMCC's grammar could use some cleaning up; languages/PIR might be a basis to start with a clean reimplementation of PIR in C (using Lex/Yacc).




PIR Directives

PIR has a number of directives. All directives start with a dot. Macro identifiers (when using a macro, on expansion) also start with a dot (see below). Therefore, it is important not to use any of the PIR directives as a macro identifier. The PIR directives are:

  .arg            .invocant          .begin_call
  .const          .lex               .call
                  .line              .end_return
  .end            .loadlib           .end_yield
  .endnamespace   .local             .end_call
                  .meth_call         .pragma
  .get_results    .namespace         .return
  .globalconst    .nci_call          .result
  .HLL_map        .param             .sub
  .HLL            .begin_return      .yield
  .include        .begin_yield
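Because macro identifiers and directives share the leading dot, a tool that accepts macro definitions may want to reject names that collide with a directive. The following is a minimal Python sketch of such a check, not part of languages/PIR. The directive names are taken from the table above (`.end_call` is included as well, since it closes a `.begin_call` block), and the helper name `clashes_with_directive` is hypothetical.

```python
# Directive names from the table above; this set is an illustration only.
PIR_DIRECTIVES = {
    ".arg", ".invocant", ".begin_call", ".const", ".lex", ".call",
    ".line", ".end_return", ".end", ".loadlib", ".end_yield",
    ".endnamespace", ".local", ".end_call", ".meth_call", ".pragma",
    ".get_results", ".namespace", ".return", ".globalconst", ".nci_call",
    ".result", ".HLL_map", ".param", ".sub", ".HLL", ".begin_return",
    ".yield", ".include", ".begin_yield",
}


def clashes_with_directive(macro_name: str) -> bool:
    """Return True if '.<macro_name>' would collide with a PIR directive."""
    return "." + macro_name in PIR_DIRECTIVES
```

With this check, a macro named `local` would be rejected, while `myMacro` would be accepted.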


PIR has two types of registers: real registers and symbolic or temporary (or virtual if you like) registers. Real registers are actual registers in the Parrot VM. The symbolic, or temporary registers are mapped to those actual registers. Real registers are written like:

  [S|N|I|P]n, where n is a non-negative integer.

whereas symbolic registers have a $ prefix, like this: $P10.

Symbolic registers can be thought of as local variable identifiers that don't need a declaration. This saves you from writing .local directives if you're in a hurry. Of course, the code would be more self-documenting if .locals were used.
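The two register spellings can be told apart with a pair of regular expressions. This is a hedged Python sketch of the rule as described above, not the languages/PIR implementation; the function name `classify_register` is hypothetical.

```python
import re

# Real registers: one of S, N, I, P followed by digits, e.g. P0.
REAL_REGISTER = re.compile(r"[SNIP]\d+")
# Symbolic registers: the same shape with a leading '$', e.g. $P10.
SYMBOLIC_REGISTER = re.compile(r"\$[SNIP]\d+")


def classify_register(token: str) -> str:
    """Classify a token as a real register, a symbolic register, or neither."""
    if REAL_REGISTER.fullmatch(token):
        return "real"
    if SYMBOLIC_REGISTER.fullmatch(token):
        return "symbolic"
    return "not a register"
```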


An integer constant is a string of one or more digits. Examples: 0, 42.

A floating-point constant is a string of one or more digits, followed by a dot and one or more digits. Examples: 1.1, 42.567.

A string constant is a single or double quoted series of characters. Examples: 'hello world', "Parrot".
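The three constant forms described above translate directly into regular expressions. The Python sketch below is illustrative only; it assumes string constants contain no escape sequences and no embedded quote of the same kind, which the description above does not specify. The name `constant_kind` is hypothetical.

```python
import re

INT_CONSTANT = re.compile(r"\d+")
FLOAT_CONSTANT = re.compile(r"\d+\.\d+")
# Assumption: no escapes; the next quote of the same kind ends the string.
STRING_CONSTANT = re.compile(r"'[^']*'|\"[^\"]*\"")


def constant_kind(text: str):
    """Return 'int', 'float', 'string', or None if text is not a constant."""
    if FLOAT_CONSTANT.fullmatch(text):
        return "float"
    if INT_CONSTANT.fullmatch(text):
        return "int"
    if STRING_CONSTANT.fullmatch(text):
        return "string"
    return None
```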

TODO: PMC constants.


An identifier starts with a character from [_a-zA-Z], followed by zero or more characters from [_a-zA-Z0-9].

Examples: x, x1, _foo


A label is an identifier with a colon attached to it.

Examples: LABEL:

Macro identifiers

A macro identifier is an identifier prefixed with a dot. A macro identifier is used when expanding the macro (on usage), not in the macro definition.

Examples: .myMacro
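Identifiers, labels and macro identifiers differ only in a prefix or suffix around the same identifier pattern. A small Python sketch, under the assumption that the token has already been isolated from the surrounding text (the name `token_kind` is hypothetical):

```python
import re

IDENTIFIER = r"[_a-zA-Z][_a-zA-Z0-9]*"


def token_kind(text: str):
    """Classify text as an identifier, label, or macro identifier."""
    if re.fullmatch(IDENTIFIER, text):
        return "identifier"
    if re.fullmatch(IDENTIFIER + ":", text):
        return "label"
    # Note: PIR directives such as .local have the same shape, so a real
    # lexer has to check for directive names before reaching this case.
    if re.fullmatch(r"\." + IDENTIFIER, text):
        return "macro identifier"
    return None
```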


Compilation Units

A PIR program consists of one or more compilation units. A compilation unit is a global, sub, constant or macro definition, or a pragma or emit block. PIR is a line-oriented language, which means that each statement ends in a newline (indicated as "nl"). Moreover, compilation units are always separated by a newline. Each of the different compilation units is discussed in this document.

    compilation_unit [ nl compilation_unit ]*

    | sub_def
    | const_def
    | expansion
    | pragma

Subroutine definitions

    ".sub" sub_id sub_modifiers* nl body

    identifier | string_constant

    | ":init"
    | ":immediate"
    | ":postcomp"
    | ":main"
    | ":anon"
    | ":lex"
    | vtable_modifier
    | multi_modifier
    | outer_modifier

    ":vtable" parenthesized_string?

    "(" string_constant ")"

    ":multi" "(" multi_types? ")"

    ":outer" "(" sub_id ")"

    multi_type [ "," multi_type ]*

    | "_"
    | keylist
    | identifier
    | string_constant


    ".param"  [ [ string_constant "=>" ] type identifier ] [ get_flags | ":unique_reg" ]* nl

    [ ":slurpy"
    | ":optional"
    | ":opt_flag"
    | named_flag

    ":named" parenthesized_string?

Examples subroutine

The simplest example for a subroutine definition looks like:

  .sub foo
  # PIR instructions go here
  .end

The body of the subroutine can contain PIR instructions. The subroutine can be given one or more flags, indicating the sub should behave in a special way. Below is a list of these flags and their meaning. The flag :unique_reg is discussed in the section defining local declarations.

  •  :load

    Run this subroutine during the load_library opcode. :load is ignored if another subroutine in that file is marked with :main. If multiple subs have the :load modifier, the subs are run in source code order.

  •  :init

    Run the subroutine when the program is run directly (that is, not loaded as a module). This is different from :load, which runs a subroutine when a library is being loaded. To get both behaviours, use :init :load.

  •  :postcomp

    Same as :immediate, except that the sub is not executed when compilation was triggered by a load_bytecode instruction (in a different file).

  •  :immediate

    This subroutine is executed immediately after being compiled. (Analogous to BEGIN in Perl 5.)

  •  :main

    Indicates that the sub being defined is the entry point of the program. It can be compared to the main function in C.

  •  :method

    Indicates the sub being defined is an instance method. The method belongs to the class whose namespace is currently active. (So, to define a method for a class 'Foo', the 'Foo' namespace should be currently active.) In the method body, the object PMC can be referred to with self.

  •  :vtable or :vtable('x')

    Indicates the sub being defined replaces a vtable entry. This flag can only be used when defining a method.

  •  :multi(type [, type]*)

    Engage in multiple dispatch with the listed types.

  •  :outer('bar')

    Indicates the sub being defined is lexically nested within the subroutine 'bar'.

  •  :anon

    Do not install this subroutine in the namespace. Allows the subroutine name to be reused.

  •  :lex

    Indicates the sub being defined needs to store lexical variables. This flag is not necessary if any lexical declarations are present (see below); the PIR compiler will figure this out by itself. Otherwise, the :lex flag is needed to tell Parrot that the subroutine will store or find lexicals.

The sub flags are listed after the sub name. The subroutine name can also be a string instead of a bareword, as is shown in this example:

  .sub 'foo' :load :init :anon
  # PIR body
  .end

Parameter definitions have the following syntax:

  .sub main
    .param int argc :optional
    .param int has_argc :opt_flag
    .param num nParam
    .param pmc argv :slurpy
    .param string sParam :named('foo')
    # body
  .end

As shown, parameter definitions may take flags as well. These flags are listed here:

  •  :slurpy

    The parameter should be of type pmc and acts like a container that slurps up all remaining arguments. Details can be found in PDD03 - Parrot Calling Conventions.

  •  :named('x')

    The parameter is known in the called sub by name 'x'. The :named flag can also be used without an identifier, in combination with the :flat or :slurpy flag, i.e. on a container holding several values:

      .param pmc args :slurpy :named


      .arg args :flat :named

  •  :optional

    Indicates the parameter being defined is optional.

  •  :opt_flag

    This flag can be given to a parameter defined after an optional parameter. During runtime, the parameter is automatically given a value, and is not passed by the caller. The value of this parameter indicates whether the previous (optional) parameter was present.

The correct order of the parameters depends on the flag they have.

PIR instructions

    label? instr nl

    label? pasm_instr nl

    pir_instr | pasm_instr

NOTE: the rule 'pasm_instr' is not included in this reference grammar. pasm_instr defines the syntax for pure PASM instructions.

    | lexical_decl
    | const_def
    | globalconst_def
    | conditional_stat
    | assignment_stat
    | return_stat
    | sub_invocation
    | macro_invocation
    | jump_stat
    | source_info

Local declarations

    ".local" type local_id_list

    local_id [ "," local_id ]*

    identifier ":unique_reg"?

Examples local declarations

Local temporary variables can be declared with the .local directive.

  .local int i
  .local num a, b, c

The optional :unique_reg modifier will force the register allocator to associate the identifier with a unique register for the duration of the compilation unit.

  .local int j :unique_reg

Lexical declarations

    ".lex" string_constant "," target

Example lexical declarations

The declaration

  .lex 'i', $P0

indicates that the value in $P0 is stored as a lexical variable, named by 'i'. Once the above lexical declaration is written, and given the following statement:

  $P1 = new 'Integer'

then the following two statements have an identical effect:

  •   $P0 = $P1
  •   store_lex "i", $P1

Likewise, these two statements also have an identical effect:

  •   $P1 = $P0
  •   $P1 = find_lex "i"

Instead of a register, one can also specify a local variable, like so:

  .local pmc p
  .lex 'i', p

The same is true when a parameter should be stored as a lexical:

  .param pmc p
  .lex 'i', p

So, now it is also clear why .lex 'i', p is not a declaration of p: it needs a separate declaration, because it may either be a .local or a .param. The .lex directive merely is a shortcut for saving and retrieving lexical variables.

Constant definitions

    ".const" type identifier "=" constant_expr

Example constant definitions

  .const int answer = 42

defines an integer constant by name 'answer', giving it a value of 42. Note that the constant type and the value type should match, i.e. you cannot assign a floating point number to an integer constant. The PIR parser will check for this.

Global constant definitions

    ".globalconst" type identifier "=" constant_expr

Example global constant definitions

This directive is similar to const_def, except that once a global constant has been defined, it is accessible from all subroutines.

  .sub main :main
    .globalconst int answer = 42
  .end

  .sub foo
    print answer # prints 42
  .end

Conditional statements

      [ "if" | "unless" ]
    [ [ "null" target ]
    | [ simple_expr [ relational_op simple_expr ]? ]
    ] "goto" identifier

Examples conditional statements

The syntax for if and unless statements is the same, except for the keyword itself. Therefore the examples will use either.

  if null $P0 goto L1

Checks whether $P0 is null; if it is, flow of control jumps to label L1.

  unless $P0 goto L2
  unless x   goto L2
  unless 1.1 goto L2

Unless $P0, x or 1.1 are 'true', flow of control jumps to L2. When the argument is a PMC (like the first example), true-ness depends on the PMC itself. For instance, in some languages, the number 0 is defined as 'true', in others it is considered 'false' (like C).

  if x < y goto L1
  if y != z  goto L1

are examples that check for the logical expression after if. Any of the relational operators may be used here.
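The three shapes of conditional statement shown above (null test, truth test, relational test) can be separated with a few lines of code. The Python sketch below is an illustration, not the languages/PIR parser. It assumes tokens are separated by whitespace, so string constants containing spaces are not handled, and the name `parse_conditional` is hypothetical.

```python
RELATIONAL_OPS = ("==", "!=", "<=", ">=", "<", ">")


def parse_conditional(stmt: str):
    """Split an if/unless statement into (keyword, condition, label)."""
    words = stmt.split()
    if len(words) < 4 or words[0] not in ("if", "unless") or words[-2] != "goto":
        raise ValueError("not a conditional statement: " + stmt)
    keyword, label = words[0], words[-1]
    cond = words[1:-2]  # everything between the keyword and 'goto'
    if len(cond) == 2 and cond[0] == "null":
        return keyword, ("null", cond[1]), label
    if len(cond) == 1:
        return keyword, ("truth", cond[0]), label
    if len(cond) == 3 and cond[1] in RELATIONAL_OPS:
        return keyword, (cond[1], cond[0], cond[2]), label
    raise ValueError("unsupported condition: " + " ".join(cond))
```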

Branching statements

    "goto" identifier

Examples branching statements

  goto MyLabel

The program will continue running at label 'MyLabel:'.


      "==" | "!=" | "<=" | "<" | ">=" | ">"

      "+"  | "-"   | "/"  | "**"
    | "*"  | "%"   | "<<" | ">>>"
    | ">>" | "&&"  | "||" | "~~"
    | "|"  | "&"   | "~"  | "."

      "+=" | "-=" | "/=" | "%="  | "*="  | ".="
    | "&=" | "|=" | "~=" | "<<=" | ">>=" | ">>>="

      "!" | "-" | "~"


    | simple_expr binary_op simple_expr
    | unary_op simple_expr

    | int_constant
    | string_constant
    | target

Example expressions

  42 + x
  1.1 / 0.1
  "hello" . "world"
  str1 . str2

Arithmetic operators are only allowed on floating-point numbers and integer values (or variables of that type). Likewise, string concatenation (".") is only allowed on strings. These checks are not done by the PIR parser.


      target "=" short_sub_call
    | target "=" target keylist
    | target "=" expression
    | target "=" "new" string_constant
    | target "=" "new" keylist
    | target "=" "find_type" [ string_constant | string_reg | id ]
    | target "=" heredoc
    | target assign_op simple_expr
    | target keylist "=" simple_expr
    | result_var_list "=" short_sub_call

NOTE: the definition of assignment statements is not complete yet. As languages/PIR evolves, this will be completed.

    "[" keys "]"

    key [ sep key ]*

    "," | ";"


    "(" result_vars? ")"

    result_var [ "," result_var ]*

    target get_flags?

Examples assignment statements

  $I1 = 1 + 2
  $I1 += 1
  $P0 = foo()
  $I0 = $P0[1]
  $I0 = $P0[12.34]
  $I0 = $P0["Hello"]
  $P0 = new 42 # but this is really not very clear, better use identifiers

  $S0 = <<'HELLO'

  .local int a, b, c
  (a, b, c) = foo()


NOTE: the heredoc rules are not complete or tested. Some work is required here.

    "<<" string_constant nl

    ^^ identifier

    [ \N | \n ]*

Example Heredoc

  .local string str
  str = <<'ENDOFSTRING'
    this text
         is stored in the
      named 'str'. Whitespace and newlines
    are                  stored as well.
ENDOFSTRING

Note that the Heredoc identifier should be at the beginning of the line, no whitespace in front of it is allowed. Printing str would print:

    this text
         is stored in the
      named 'str'. Whitespace and newlines
    are                  stored as well.
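The rule that the closing identifier must start in the first column can be made concrete with a small reader. This Python sketch is not the languages/PIR heredoc implementation (which the note above describes as incomplete). It assumes the source is already split into lines and that the closing line contains the identifier and nothing else; `read_heredoc` is a hypothetical name.

```python
def read_heredoc(lines, start):
    """Return (body_lines, next_index) for a heredoc opened on lines[start]."""
    opener = lines[start]
    # The delimiter is the quoted identifier after '<<', e.g. <<'ENDOFSTRING'.
    delimiter = opener.split("<<", 1)[1].strip().strip("'\"")
    body = []
    for i in range(start + 1, len(lines)):
        # Exact match only: an indented delimiter is ordinary heredoc text.
        if lines[i] == delimiter:
            return body, i + 1
        body.append(lines[i])
    raise ValueError("unterminated heredoc: " + delimiter)
```

An indented copy of the identifier is treated as part of the text, so only the unindented line ends the heredoc.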

In IMCC, a heredoc identifier can be specified as an argument, like this:

    foo(42, "hello", <<'EOS')

    This is a heredoc text argument.
EOS


In IMCC, only one such argument can be specified. The languages/PIR implementation aims to allow for any number of heredoc arguments, like this:

    foo(<<'STR1', <<'STR2')

    argument 1
STR1
    argument 2
STR2

Currently, this is not working.

Invoking subroutines and methods

    long_sub_call | short_sub_call

    ".begin_call" nl
    [ method_call | non_method_call] nl
    [ local_decl nl ]*

    [ ".call" | ".nci_call" ] target

    ".invocant" target nl
    ".meth_call" [ target | string_constant ]

    "(" args ")"

    arg [ "," arg ]

    [ float_constant
    | int_constant
    | string_constant [ "=>" target ]?
    | target

    [ ".arg" simple_expr set_flags? nl ]*

    [ ".result" target get_flags? nl ]*

    [ ":flat"
    | named_flag

Example long subroutine call

The long subroutine call syntax is very suitable to be generated by a language compiler targeting Parrot. Its syntax is rather verbose, but easy to read. The minimal invocation looks like this:

  .begin_call
  .call $P0
  .end_call

Invoking instance methods is a simple variation:

  .begin_call
  .invocant $P0
  .meth_call $P1
  .end_call

Passing arguments and retrieving return values is done like this:

  .begin_call
  .arg 42
  .call $P0
  .local int res
  .result res
  .end_call

Arguments can take flags as well. The following argument flags are defined:

  •  :flat

    Flatten the (aggregate) argument. This argument can only be of type pmc.

  •  :named('x')

    Pass the denoted argument into the named parameter that is denoted by 'x', like so:

     .param int myX :named('x')   # the type 'int' is just an example

    As mentioned in the section on parameter declarations, the :named flag can be used on an aggregate value in combination with the :flat flag.

     .arg pmc myArgs :flat :named

  .local pmc arr
  arr = new .Array
  arr = 2
  arr[0] = 42
  arr[1] = 43
  .begin_call
  .arg arr :flat
  .arg $I0 :named('intArg')
  .call foo
  .end_call

The Native Calling Interface (NCI) allows for calling C routines, in order to talk to the world outside of Parrot. Its syntax is a slight variation; it uses .nci_call instead of .call.

  .nci_call $P0

Short subroutine invocation

    invocant? [ target | string_constant ] parenthesized_args


Example short subroutine call

The short subroutine call syntax is useful when manually writing PIR code. Its simplest form is:

  foo()


Or a method call:

  obj.'toString'() # call the method 'toString'
  obj.x() # call the method whose name is stored in 'x'.

Note that no spaces are allowed between the invocant and the dot; "obj . 'toString'" is not valid as a method call and will be interpreted as a concatenation.

And of course, using the short version, passing arguments can be done as well, including all flags that were defined for the long version. The same example from the 'long subroutine invocation' is now shown in its short version:

  .local pmc arr
  arr = new .Array
  arr = 2
  arr[0] = 42
  arr[1] = 43
  foo(arr :flat, $I0 :named('intArg'))

In order to do a Native Call Interface invocation, the subroutine to be invoked needs to be referenced from a PMC register, as its name is not visible from Parrot. An NCI call looks like this:

  .local pmc nci_sub, nci_lib
  .local string c_function, signature

  nci_lib = loadlib "myLib"

  # name of the C function to be called
  c_function = "sayHello"

  # set signature to "void" (no arguments)
  signature  = "v"

  # get a PMC representing the C function
  nci_sub = dlfunc nci_lib, c_function, signature

  # and invoke
  nci_sub()

Return values from subroutines

    | short_return_stat
    | long_yield_stat
    | short_yield_stat
    | tail_call

    ".begin_return" nl

    ".return" simple_expr set_flags? nl

Example long return statement

Returning values from a subroutine is in fact similar to passing arguments to a subroutine. Therefore, the same flags can be used:

  .begin_return
  .return 42 :named('answer')
  .return $P0 :flat
  .end_return

In this example, the value 42 is passed back as the named return value 'answer'. The aggregate value in $P0 is flattened, and each of its values is passed as a return value.

Short return statement

    ".return" parenthesized_args

Example short return statement

  .return(myVar, "hello", 2.76, 3.14)

Just as the return values in the long return statement could take flags, the short return statement may as well:

  .return(42 :named('answer'), $P0 :flat)

Long yield statements

    ".begin_yield" nl

Example long yield statement

A yield statement works the same as a normal return statement, except that the point where the subroutine was left is stored, so that the subroutine can be resumed from that point as soon as it is invoked again. Returning values is identical to normal return statements.

  .sub foo
    .begin_yield
    .return 42
    .end_yield

    # and later in the sub, one could return another value:

    .begin_yield
    .return 43
    .end_yield
  .end

  # when invoking twice:
  foo() # returns 42
  foo() # returns 43
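For readers more familiar with Python, the resume-where-you-left-off behaviour has a close analogue in generators. The sketch below is only an analogy and not PIR: in Python the generator object is advanced with next(), whereas the PIR sub above is resumed simply by invoking it again.

```python
def foo():
    yield 42  # the first invocation leaves the body here
    # the next invocation resumes at this point
    yield 43


gen = foo()
first = next(gen)
second = next(gen)
```

After running this, `first` holds 42 and `second` holds 43, mirroring the two invocations in the PIR example.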

Short yield statements

    ".yield" parenthesized_args

Example short yield statement

Again, the short yield statement has the same form as the short return statement.

  .yield("hello", 42)

Tail calls

    ".return" short_sub_call

Example tail call

  .return foo()

Returns the return values from foo. This is implemented by a tail call, which is more efficient than:

  .local pmc results
  results = foo()
  .return(results)

The call to foo can be considered a normal function call with respect to parameters: it can take the exact same format using argument flags. The tail call can also be a method call, like so:

  .return obj.'foo'()


    | include
    | pasm_constant

    ".include" string_constant

    ".macro_const" identifier [ constant_value | register ]


    ".macro" identifier macro_parameters? nl

    "(" id_list? ")"

    ".endm" nl

    macro_id parenthesized_args?

Note that before a macro body will be parsed, some grammar rules will be changed. In a macro body, local variable declarations are done using the .macro_local directive. TODO: decide on keyword for this.

The .label directive is available for declaring unique labels.

    ".macrolabel" "$"identifier":"

Example Macros

When the following macro is defined:

  .macro add2(n)
    inc .n
    inc .n
  .endm

then one can write in a subroutine:

  .sub foo
    .local int myNum
    myNum = 42
    .add2(myNum)
    print myNum  # prints 44
  .end
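Macro expansion of this kind is textual: each dotted parameter reference in the body is replaced by the corresponding argument. The Python sketch below illustrates the idea under simplifying assumptions (no .macro_local or label handling, and no attempt to skip dots inside string constants or method calls). It is not how languages/PIR implements macros, and `expand_macro` is a hypothetical name.

```python
import re


def expand_macro(params, body_lines, args):
    """Replace each '.param' reference in the body with its argument."""
    bindings = dict(zip(params, args))

    def substitute(match):
        name = match.group(1)
        # Leave dotted names that are not macro parameters untouched.
        return bindings.get(name, match.group(0))

    pattern = r"\.([_a-zA-Z][_a-zA-Z0-9]*)"
    return [re.sub(pattern, substitute, line) for line in body_lines]
```

Expanding the add2 macro with the argument myNum yields two `inc myNum` lines, which is why the example prints 44.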

PIR Pragmas

    | loadlib
    | namespace
    | hll_mapping
    | hll_specifier
    | source_info

    ".pragma" "n_operators" int_constant

    ".loadlib" string_constant

    ".namespace" [ "[" namespace_id "]" ]?

    ".HLL" string_constant "," string_constant

    ".HLL_map" string_constant "," string_constant

    string_constant [ ";" string_constant ]*

    ".line" int_constant [ "," string_constant ]?

    identifier [ "," identifier ]*

Examples pragmas

  .include "myLib.pir"

includes the source from the file "myLib.pir" at the point of this directive.

  .pragma n_operators 1

makes Parrot automatically create new PMCs when using arithmetic operators, like:

  $P1 = new 'Integer'
  $P2 = new 'Integer'
  $P1 = 42
  $P2 = 43
  $P0 = $P1 * $P2
  # now, $P0 is automatically assigned a newly created PMC.

  .line 100
  .line 100, "myfile.pir"

NOTE: currently, the line directive is implemented in IMCC as #line. See the PROPOSALS document for more information on this.

  .namespace ['Foo'] # namespace Foo

  .namespace ['Object';'Foo'] # nested namespace

  .namespace # no [ id ] means the root namespace is activated

The first line opens the namespace 'Foo'. When doing Object Oriented programming, this would indicate that sub or method definitions belong to the class 'Foo'. Of course, you can also define namespaces without doing OO-programming.
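The three forms of the directive can be reduced to a list of namespace keys, with the empty list standing for the root namespace. This Python sketch is illustrative only; it assumes that '#' starts a comment and does not occur inside a key, and `parse_namespace` is a hypothetical name.

```python
import re


def parse_namespace(line: str):
    """Return the list of namespace keys; [] means the root namespace."""
    line = line.split("#", 1)[0].strip()  # drop a trailing comment
    match = re.fullmatch(r"\.namespace(?:\s*\[(.*)\])?", line)
    if match is None:
        raise ValueError("not a .namespace directive: " + line)
    if match.group(1) is None:
        return []
    # Keys are string constants separated by ';'.
    return [key.strip().strip("'\"") for key in match.group(1).split(";")]
```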

Please note that this .namespace directive is different from the .namespace directive that is used within subroutines.

  .HLL "Lua", "lua_group"

is an example of specifying the High Level Language (HLL) for which the PIR is being generated. It is a shortcut for setting the namespace to 'Lua', and for loading the PMCs in the lua_group library.

  .HLL_map "Integer", "LuaNumber"

is a way of telling Parrot that whenever an Integer is created somewhere in the system (in C code), a LuaNumber object is created instead.

  .loadlib "myLib"

is a shortcut for telling Parrot that the library "myLib" should be loaded when running the program. In fact, it is a shortcut for:

  .sub _load :load :anon
    loadlib "myLib"
  .end

TODO: check flags and syntax for this.

Tokens, types and targets

    [ encoding_specifier? charset_specifier ]?  quoted_string


    | "binary:"
    | "unicode:"
    | "iso-8859-1:"

    | "num"
    | "pmc"
    | "string"

    identifier | register

Notes on Tokens, types and targets

A string constant can be written like:

  "Hello world"

but if desirable, the character set can be specified:

  unicode:"Hello world"

When using the "unicode" character set, one can also specify an encoding specifier; currently only utf8 is allowed:

  utf8:unicode:"hello world"
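The optional prefixes can be peeled off a string constant from left to right. The Python sketch below handles only the charset and encoding names that appear in this section; any others would need to be added. It is an illustration rather than the actual lexer, and `split_string_constant` is a hypothetical name.

```python
ENCODINGS = ("utf8",)
# Only the charset names visible in this section; the list may be incomplete.
CHARSETS = ("binary", "unicode", "iso-8859-1")


def split_string_constant(token: str):
    """Return (encoding, charset, quoted_string); absent parts are None."""
    positions = [i for i in (token.find("'"), token.find('"')) if i != -1]
    if not positions:
        raise ValueError("no quoted string in: " + token)
    quote_at = min(positions)
    prefixes = token[:quote_at].split(":")[:-1] if quote_at else []
    if len(prefixes) > 2:
        raise ValueError("too many prefixes: " + token)
    charset = prefixes[-1] if prefixes else None
    encoding = prefixes[0] if len(prefixes) == 2 else None
    if charset is not None and charset not in CHARSETS:
        raise ValueError("unknown charset: " + charset)
    # An encoding is only accepted together with the unicode charset.
    if encoding is not None and (encoding not in ENCODINGS or charset != "unicode"):
        raise ValueError("unsupported encoding: " + encoding)
    return encoding, charset, token[quote_at:]
```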

IMCC currently allows identifiers to be used as types. During the parse, IMCC checks whether the identifier is a defined class. The built-in types int, num, pmc and string are always available.

A target is something that can be assigned to; it is an L-value (but of course may be read just like an R-value). It is either an identifier or a register.


Klaas-Jan Stol


KNOWN ISSUES AND BUGS

Some work should be done on:

  • Heredoc parsing
  • Test. A lot.

    Bugs or improvements may be sent to the author, and are of course greatly appreciated. Moreover, if you find any missing constructs that are in IMCC, indications of these would be appreciated as well.

    Please see the PROPOSALS document for some proposals of the author to clean up the official grammar of PIR (as defined by the IMCC compiler).


  • languages/PIR/lib/ - The actual PIR grammar implementation
  • PDD03 - Parrot Calling Conventions
  • PDD20 - Lexically scoped variables in Parrot
  • docs/pdds/draft/pdd19_pir.pod



  • Remove .namespace for scopes
  • Some clean-ups


  • Remove .pcc_ prefix on PCC directives
  • Remove .emit and .eom directives.


  • Many clean ups; remove experimental :wrap flag, remove .global directive, remove .sym directive, add .label directive for macros, remove .sub; remove some comments that are not true any more. In all, it's getting much cleaner!


  • Added expansion rule, moved include and macro_def rules to that rule. Added pasm_constant definition.
  • Removed newlines in operator definition to save some lines for readability.


  • Updated short sub invocation for NCI invocations.
  • Added an example for .globalconst.
  • Added some remarks at section for Macros.
  • Added some remarks here and there, and fixed some style issues.


  • Removed .immediate, it is :immediate, and thus not a PIR directive, but a flag. This was a mistake.
  • Added .globalconst
  • Added macro parsing example (it is now fixed in languages/PIR).
  • Added reference to official doc for IMCC syntax.
  • Added :unique_reg to allowed flags for incoming parameters.


  • Switch to x.y.z version number; many fixes will follow.
  • Added more examples.
  • Fixed some errors.


  • Initial version having a version number.