compiler.pod - overview of Niecza compiler pipeline
The Perl 6 script src/niecza is the command line wrapper that controls the startup of Niecza.
After reading command line options it constructs a compiler object,
tweaks its properties and calls one of its compile methods -
The compiler object is defined in src/NieczaCompiler.pm6.
Each compile method translates its arguments into a common format and then delegates to
This sets up some important contextuals like
$*backend and delegates to
parse in src/NieczaFrontendSTD.pm6.
These two methods may be worth combining.
Between them these two methods perform necessary setup and teardown and start the compilation process.
The compilation process proper is best thought of as a succession of stages, although for BEGIN reasons the stages are actually interleaved - a sub must be completely ready to run soon after the closing brace because a BEGIN might need to call it.
The binary representation of the program being passed between the components is an abstract syntax tree (AST) of some type.
Four types of AST are used in Niecza.
"Op" tree nodes are objects in subclasses of the base class
Op from src/Op.pm6.
"CgOp" tree nodes are actually
Array instances for some combination of flexibility and historical reasons; they are constructed by the methods in class
CgOp from src/CgOp.pm6.
CgOp tree nodes can be flattened into a JSON-like form,
which is used to move them from Perl 6 space to C# space,
and also as part of the bootstrap procedure (see below).
In C# space there are two more AST types we are concerned with,
both defined in lib/CodeGen.cs.
CpsOp nodes exist in direct correspondence with sections of the
CgOp tree and can do any Perl 6 task.
ClrOp nodes are created as the output of the ANF-converter,
and are restricted to only the kinds of control flow that the CLR natively supports.
ClrOp data is handed off to Mono and is no longer of our concern.
It should be noted that all of these data types are used only for executable statements and expressions. Structural information from the source code is directly passed to the backend where it is used to create the ClassHOW, SubInfo, etc objects that will be used at runtime.
The first stage is the parser,
which accepts source code (read in by src/NieczaPathSearch.pm6) and converts it into a tree of
Match objects while calling action methods.
The parser exists mostly in src/STD.pm6 with some Niecza extensions coded in src/NieczaGrammar.pm6 and src/NieczaFrontendSTD.pm6.
The grammar itself is a branch of Larry Wall's standard Perl 6 grammar,
which continues to evolve at https://github.com/perl6/std,
but Niecza tries to track relevant changes.
It is worth noting that there is a significant degree of feedback into the parser,
especially for disambiguating types and function calls.
The actions module in src/NieczaActions.pm6 accepts
Match objects and method calls from the grammar; it uses a collection of other modules other modules (Op,
Operator) to create the Op AST from Perl 6 source code.
The parser triggers each action when it matches the corresponding token in the grammar.
The actions system directly calls into the backend to create metaobjects for non-code grammatical constructs,
and can even run code for BEGIN and constants.
While constructing subs,
NieczaActions uses two external tree walkers that are perhaps best regarded as stages in their own right.
These perform specific optimizations; it should be noted that they are applied to one sub at a time,
again for BEGIN reasons.
The two external walkers are in src/NieczaPassSimplifier.pm6 and src/OptRxSimple.pm6.
They use a combination of top-down and bottom-up analysis to convert certain expressions or regex subterms,
into simpler forms.
NieczaPassSimplifier handles inlining of some simple functions that just wrap a single runtime operator,
After simplification the
Op tree must be converted into a
This is done recursively by the
cgop methods on
You should implement
code in your subclasses but call
cgop should not be overridden because it is responsible for adding line number annotations,
and possibly more stuff later.
(Think of it as the
augment/inner emulation pattern.)
Once the code is converted to
CgOp it is passed to the backend via the
finish method on static sub objects.
The code is then marshalled over to the C# side of the fence and saved.
The final code generation step is postponed to after UNITCHECK time in order that as much information as possible be available for optimizations. However, the code generator is integrated with the runloop, so if a function needs to be invoked early it can be code-generated early. Due to limitations of the CLR (essentially, a class must be closed before it can be used), functions which are used early will need to be code-generated a total of twice if they are to be saved - one copy going into the saved class, which cannot be used yet, and one copy to be used immediately, which cannot be very usefully saved.
The code generation process is controlled by
NamProcessor.Scan in lib/CodeGen.cs,
which walks over the C# version of the
CgOp tree bottom-up mapping it into
CpsOp tree is converted into a
ClrOp tree by the smart constructors in the
CpsOp tree only exists in a notional sense,
as the data flow graph of the constructor calls.) What the smart constructors do is to rearrange the code so that it can be used in a context with language-defined control flow,
such as resumable exceptions and gather/take.
The process is often inaccurately referred to as "Continuation Passing Style" (CPS) conversion; a better term would most likely be "Applicative Normal Form" (ANF),
since the functions are not actually being split into separate continuation blocks.
At last the
ClrOp data is made executable by the
CodeGen methods on the various
which produce MSIL in the form of calls to methods on a
Actually this is a two-step process.
Language-defined control flow requires the use of a master switch statement at the beginning of each CLR-level function.
ListCases methods on the op nodes are called first to calculate the correct indexes into the switch.
The IL generated is then dealt with appropriately by the underlying runtime.
If we are precompiling a module,
it will be saved (by the call to
AssemblyBuilder.Save in lib/Kernel.cs) into a
it will be converted into native code by the JIT (involving several more intermediate stages,
Mono method IR,
possibly even LLVM IR).
The corresponding metaobjects and constants are then saved alongside the module into a
There is an important subtlety with regards to the Perl 6 / C# transition.
The compiler is,
compiled using (an earlier version of) Niecza,
and is running on top of a C# kernel with a copy of
it cannot be used.
In order to continue evolving Niecza we need the flexibility to make incompatible changes to the runtime library!
So the compiler must access a kernel from the current Niecza,
simultaneously with running on the old Niecza.
This is accomplished by a renaming trick.
All files associated with the current Niecza are prefixed with
The CLR will of course happily load two files,
Kernel.dll and one named
and allow them to be independent; although just renaming the file isn't enough,
it has to be compiled twice to get the "assembly name" correct.
(Previous versions of Niecza used a more general feature called application domains instead.
This was changed because it was too slow.)
So the compiler,
can compile user code using
Run.Kernel.dll while referencing
Works fine but there is one remaining catch.
When the compiler is to compile a new version of itself,
it needs to generate an assembly linked against a new version of
which it cannot load.
The workaround used here is to allow the compiler to only partially compile itself,
.ser files; then the newly-compiled
Kernel.dll can finish the job,
Niecza.ser linked against itself,
without loading the old compiler or old Kernel.dll.
It turned out to be more convenient to merge all modules into a single output file at the same time.
It would also be worth pointing out src/CompilerBlob.cs, which is a C# module that extends the compiling compiler's kernel, but can be updated along with the current compiler. I try not to think about it too hard, but it's very useful.