Macros tutorial
Nemerle type-safe macros
Basically every macro is a function, which takes fragments of code as parameters and returns some other code. On the highest level of abstraction it doesn't matter if the parameters are function calls, type definitions or just a sequence of assignments. The most important fact is that they are not common objects (e.g. instances of some types, like integer numbers), but that they are internal representations in the compiler (i.e. syntax trees).
A macro is defined in the program just like any other function, using common Nemerle syntax. The only difference is the structure of the data it operates on and the way in which it is used (executed at compile-time).
A macro, once created, can be used to process some parts of the code. This is done by calling it with blocks of code as parameters. In most cases this operation is indistinguishable from a common function call (like f(1)), so a programmer using a macro is not confused by unknown syntax. The main concept of our design is to make the usage of macros as transparent as possible. From the user's point of view, it does not matter whether particular parameters are passed to a macro (which would process them at compile-time and insert some new code in their place) or to an ordinary function.
Writing a macro is as simple as writing a common function. It looks the same, except that it is preceded by the keyword macro and it lives at the top level (not inside any class). This tells the compiler how to use the defined method (i.e. run it at compile-time in every place where it is used).
Macros can take zero (if we just want to generate new code) or more parameters. They are all elements of the language grammar, so their type is limited to the set of defined syntax objects. The same holds for a return value of a macro.
Example:
macro generate_expression ()
{
MyModule.compute_some_expression ();
}
This example macro does not take any parameters and is used in the code by simply writing generate_expression ();
The most important point is the difference between generate_expression and compute_some_expression: the first one is a function executed by the compiler during compilation, while the latter is just an ordinary function that must return the syntax tree of an expression (which is here returned and inserted into the program code by generate_expression).
In order to create and use a macro you have to write a library, which will contain its executable form. You simply create a new file mymacro.n, which can contain for example
macro m () {
Nemerle.IO.printf ("compile-time\n");
<[ Nemerle.IO.printf ("run-time\n") ]>;
}
and compile it with command
ncc -r Nemerle.Compiler.dll -t:dll mymacro.n -o mymacro.dll
Now you can use m() in any program, like here
module M {
public Main () : void {
m ();
}
}
You must add a reference to mymacro.dll during compilation of this program. It might look like
ncc -r mymacro.dll myprog.n -o myprog.exe
The body of m runs inside the compiler, so compile-time is printed while myprog.n is being compiled. Only the quoted expression is inserted into the program, so run-time is printed when myprog.exe is executed.
Exercise: Write a macro which, when used, slows down the compilation by 5 seconds (use the System.Timers namespace) and prints the version of the operating system used to compile the program (use the System.Environment class).
A definition of the function compute_some_expression might look like:
using Nemerle.Compiler.Parsetree;
module MyModule
{
public mutable debug_on : bool;
public compute_some_expression () : PExpr
{
if (debug_on)
<[ System.Console.WriteLine ("Hello, I'm debug message") ]>
else
<[ () ]>
}
}
The examples above show a macro which conditionally inlines an expression printing a message. It is not very useful yet, but it introduces the idea of compile-time computation and also some new syntax used only when writing macros and functions operating on syntax trees.
We have written here the <[ ... ]> constructor to build the syntax tree of an expression (e.g. '()').
<[ ... ]> is used in both the construction and decomposition of syntax trees. Those operations are similar to quotation of code. Simply put, everything written inside <[ ... ]> corresponds to its own syntax tree. It can be any valid Nemerle code, so a programmer does not have to learn the internal representation of syntax trees in the compiler.
macro print_date (at_compile_time)
{
match (at_compile_time) {
| <[ true ]> => MyModule.print_compilation_time ()
| _ => <[ WriteLine (DateTime.Now.ToString ()) ]>
}
}
Quotation alone allows only constant expressions to be used, which is insufficient for most tasks. For example, to write the function print_compilation_time, we must be able to create an expression based on a value known at compile-time. In the next sections we introduce the rest of the macro syntax, which operates on general syntax trees.
When we want to decompose some large code (or more precisely, its syntax tree), we must bind its smaller parts to variables. Then we can process them recursively or just use them in an arbitrary way to construct the result.
We can operate on entire subexpressions by writing $( ... ) or $ID inside the quotation operator <[ ... ]>. This binds the value of ID, or of the parenthesized expression, to the part of the syntax tree at the corresponding position in the quotation.
macro for (init, cond, change, body)
{
<[
$init;
def loop () : void {
if ($cond) { $body; $change; loop() }
else ()
};
loop ()
]>
}
The above macro defines the construct for, which is similar to the loop known from C. It can be used like this
for (mutable i = 0, i < 10, i++, printf ("%d", i))
Later we show how to extend the language syntax to make the syntax of for exactly as in C.
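To see what the macro does, the call above can be expanded by hand. The following is only a sketch of the code obtained by substituting the four arguments into the quotation. It ignores the renaming the compiler applies to the macro's own loop identifier.

```nemerle
// Hand-written sketch of the expansion of
//   for (mutable i = 0, i < 10, i++, printf ("%d", i))
mutable i = 0;                 // $init
def loop () : void {
  if (i < 10) {                // $cond
    printf ("%d", i);          // $body
    i++;                       // $change
    loop ()
  }
  else ()
};
loop ()
```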
Sometimes quoted expressions have literals inside of them (like strings, integers, etc.) and we want to operate on their value, not on their syntax trees. It is possible, because they are constant expressions and their runtime value is known at the compile-time.
Let's consider the previously used function print_compilation_time.
using System;
using Nemerle.Compiler.Parsetree;
module MyModule {
public print_compilation_time () : PExpr
{
<[ System.Console.WriteLine ($(DateTime.Now.ToString () : string)) ]>
}
}
Here we see a new extension of the splicing syntax, where we create the syntax tree of a string literal from a known value. It is done by adding : string inside the $(...) construct. One can think of it as enforcing the type of the spliced expression to a literal (similar to common Nemerle type enforcement), but as a matter of fact something more is happening here: a real value is lifted to its representation as the syntax tree of a literal.
Other types of literals (int, bool, float, char) are treated the same.
This notation can be used also in pattern matching. We can
match constant values in expressions this way.
There is also a similar scheme for splicing and matching variables of a given name. $(v : name) denotes a variable whose name is contained in the object v (of the special type Name). There are some good reasons for encapsulating a real identifier within this object.
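Both kinds of typed splice can appear in patterns. The macro below is a minimal sketch, not part of the original tutorial: describe is a hypothetical name, and the Id property of Name is assumed to return the identifier as a string.

```nemerle
// Hypothetical macro: returns a string literal describing its argument.
macro describe (e)
{
  match (e) {
    // matches an integer literal and binds its compile-time value to n
    | <[ $(n : int) ]> =>
      def text = "int literal " + n.ToString ();
      <[ $(text : string) ]>
    // matches a plain identifier and binds its Name object to v
    // (assumption: Name exposes the identifier text as Id)
    | <[ $(v : name) ]> =>
      def text = "variable " + v.Id;
      <[ $(text : string) ]>
    // anything else is not inspected further
    | _ =>
      <[ "some other expression" ]>
  }
}
```

Under these assumptions, describe (42) would be replaced by a string literal built from the value 42, and describe (foo) by one built from the name foo.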
You might have noticed that Nemerle has a few grammar elements which are composed of a list of subexpressions. For example, a sequence of expressions enclosed in { .. } braces may contain zero or more elements.
When splicing values of some expressions, we would like to decompose or compose such constructs in a general way, i.e. obtain all expressions in a given sequence. It is natural to think of them as a list of expressions and to bind this list to some variable in the meta-language. It is done with the special syntax ..:
mutable exps = [ <[ printf ("%d ", x) ]>, <[ printf ("%d ", y) ]> ];
exps = <[ def x = 1 ]> :: <[ def y = 2 ]> :: exps;
<[ {.. $exps } ]>
We have used { .. $exps } here to create the sequence of expressions from the list exps : list[PExpr].
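Written out by hand, the list built above contains the two definitions followed by the two printf calls, so the quotation denotes a block like the following sketch:

```nemerle
// Hand-written sketch of the code denoted by <[ {.. $exps } ]>
{
  def x = 1;
  def y = 2;
  printf ("%d ", x);
  printf ("%d ", y)
}
```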
A similar syntax is used to splice the content of tuples (( .. $elist )) and other constructs, like array []:
using Nemerle.Collections;
macro castedarray (e) {
match (e) {
| <[ array [.. $elements ] ]> =>
def casted = List.Map (elements, fun (x) { <[ ($x : object) ]> });
<[ array [.. $casted] ]>
| _ => e
}
}
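A short usage sketch may help here. The array literal is an arbitrary example, and the second definition is a hand-written rendering of what the first branch of the macro would produce for it.

```nemerle
// What the user writes:
def a = castedarray (array [1, 2, 3]);

// What the first branch of the macro produces for that call:
def b = array [(1 : object), (2 : object), (3 : object)];

// Any argument that is not an array literal falls into the `_` branch
// and is returned unchanged.
```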
If the exact number of expressions in a tuple or sequence is known when the quotation is written, then it can be expressed with
<[ $e_1; $e_2; $e_3; x = 2; f () ]>
The .. syntax is used when there is a list of expressions e_i : Expr for 1 <= i <= n, where n is not known when the quotation is written.
Exercise: Write a macro rotate, which takes two parameters: a pair of floating point numbers (describing a point in 2D space) and an angle (in radians). The macro should return a new pair: the point rotated by the given angle. The macro should use as much information as is available at compile-time, e.g. if all numbers supplied are constant, then only the final result should be inlined, otherwise the result must be computed at runtime.
After we have written the for macro, we would like the compiler to understand some changes to its syntax, especially the C-like notation
for (mutable i = 0; i < n; ++i) {
sum += i;
Nemerle.IO.printf ("%d\n", sum);
}
In order to achieve that, we have to define which tokens and grammar elements may form a call of the for macro. We do that by changing its header to
macro for (init, cond, change, body)
syntax ("for", "(", init, ";", cond, ";", change, ")", body)
The syntax keyword is used here to define a list of elements forming the syntax of the macro call. The first token must always be a unique identifier (from now on it is treated as a special keyword triggering parsing of the defined sequence). It is followed by tokens composed of operators or identifiers passed as string literals, or by names of parameters of the macro. Each parameter must occur exactly once.
Parsing of a syntax rule is straightforward: tokens from the input program must match those from the definition, and parameters are parsed according to their type. The default type of a parameter is Expr, which is just an ordinary expression (consult the Nemerle grammar in the Reference). All allowed parameter types will be described in the extended version of the reference manual corresponding to macros.
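Putting the new header together with the body written earlier gives the following sketch. It only combines the two fragments from this tutorial. A real project might need a different macro name if a for construct is already available from the libraries it references.

```nemerle
// The for macro from the earlier section, now with a C-like syntax rule.
macro for (init, cond, change, body)
syntax ("for", "(", init, ";", cond, ";", change, ")", body)
{
  <[
    $init;
    def loop () : void {
      if ($cond) { $body; $change; loop () }
      else ()
    };
    loop ()
  ]>
}
```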
Exercise: Add a new syntactic construct forpermutation to your program. It should be defined as the macro
macro forp (i, n : int, m : int, body)
and introduce syntax, which allows writing the following program
mutable i = 0;
forpermutation (i in 3 to 10) Nemerle.IO.printf ("%d\n", i)
It should create a random permutation p of numbers x_j, m <= x_j <= n at compile-time. Then it should generate code executing the body of the loop n - m + 1 times, preceding each execution with an assignment of a permutation element to i.
Nemerle macros are simply plugins to the compiler. We decided not to restrict them only to operations on expressions, but allow them to transform almost any part of program.
Macros can be used within custom attributes written near methods, type declarations, method parameters, fields, etc. They are executed with those entities passed as their parameters.
As an example, let us take a look at the Serializable macro. Its usage looks like this:
[Serializable]
class S {
public this (v : int, m : S) { a = v; my = m; }
my : S;
a : int;
}
From now on, S has an additional method Serialize and it implements the interface ISerializable. We can use it in our code like this
def s = S (4, S (5, null));
s.Serialize ();
And the output is
<a>4</a>
<my>
<a>5</a>
<my>
<null/>
</my>
</my>
The macro modifies the type S at compile-time and adds some code to it. The inheritance relation of the given class is also changed, by making it implement the interface ISerializable:
public interface ISerializable {
Serialize () : void;
}
In general, macros placed in attributes can perform many transformations and analyses of the program objects passed to them. To see the internals of the Serializable macro and discuss some design issues, let's go into its code.
[Nemerle.MacroUsage (Nemerle.MacroPhase.BeforeInheritance, Nemerle.MacroTargets.Class,
Inherited = true)]
macro Serializable (t : TypeBuilder)
{
t.AddImplementedInterface (<[ ISerializable ]>)
}
First we have to add the interface which the given type is about to implement. The more important thing is the phase modifier BeforeInheritance in the macro's custom attribute. In general, we distinguish three stages of execution for attribute macros. BeforeInheritance specifies that the macro will be able to change the subtyping information of the class it operates on.
Having added the interface to our type, we now have to create the Serialize () method.
[Nemerle.MacroUsage (Nemerle.MacroPhase.WithTypedMembers, Nemerle.MacroTargets.Class,
Inherited = true)]
macro Serializable (t : TypeBuilder)
{
/// here we list its fields and choose only those, which are not derived
/// or static
def fields = t.GetFields (BindingFlags.Instance %| BindingFlags.Public %|
BindingFlags.NonPublic %| BindingFlags.DeclaredOnly);
/// now create list of expressions which will print object's data
mutable serializers = [];
/// traverse through fields, taking their type constructors
foreach (x : IField in fields) {
def tc = x.GetMemType ().TypeInfo;
def nm = Macros.UseSiteSymbol (x.Name);
if (tc != null)
if (tc.IsValueType)
/// we can safely print value types as strings
serializers = <[
printf ("<%s>", $(x.Name : string));
System.Console.Write ($(nm : name));
printf ("</%s>\n", $(x.Name : string));
]>
:: serializers
else
/// we can try to check, if type of given field also implements ISerializable
if (x.GetMemType ().Require (<[ ttype: ISerializable ]>))
serializers = <[
printf ("<%s>\n", $(x.Name : string));
if ($(nm : name) != null)
$(nm : name).Serialize ()
else
printf ("<null/>\n");
printf ("</%s>\n", $(x.Name : string));
]>
:: serializers
else
/// and finally, we encounter case when there is no easy way to serialize
/// given field
Message.FatalError ("field `" + x.Name + "' cannot be serialized")
else
Message.FatalError ("field `" + x.Name + "' cannot be serialized")
};
// after analyzing fields, we create method in our type, to execute created
// expressions
t.Define (<[ decl: public Serialize () : void
implements ISerializable.Serialize {
.. $serializers
}
]>);
}
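For the class S shown earlier, the method that this macro adds can be written out by hand. The following is a sketch only: the order of the two fields is assumed to be the one that yields the output shown above, and the actual generated code nests each group of statements in its own sequence.

```nemerle
// Hand-written sketch of the member added to class S by the macro.
public Serialize () : void
  implements ISerializable.Serialize
{
  // field a : int is a value type, so it is printed directly
  printf ("<%s>", "a");
  System.Console.Write (a);
  printf ("</%s>\n", "a");

  // field my : S implements ISerializable, so it is serialized recursively
  printf ("<%s>\n", "my");
  if (my != null)
    my.Serialize ()
  else
    printf ("<null/>\n");
  printf ("</%s>\n", "my");
}
```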
Analysing object-oriented hierarchy and class members is a separate pass of the compilation. First it creates inheritance relation between classes, so we know exactly all base types of given type. After that every member inside of them (methods, fields, etc.) is being analysed and added to the hierarchy and its type annotations are resolved. After that also the rules regarding implemented interface methods are checked.
For the needs of macros we have decided to distinguish three moments in this pass at which they can operate on elements of class hierarchy. Every macro can be annotated with a stage, at which it should be executed.
- BeforeInheritance stage is performed after parsing whole program and scanning declared types, but before building subtyping relation between them. It gives macro a freedom to change inheritance hierarchy and operate on parse-tree of classes and members
- BeforeTypedMembers is when inheritance of types is already set. Macros can still operate on bare parse-trees, but utilize information about subtyping.
- WithTypedMembers stage is after headers of methods, fields are already analysed and in bound state. Macros can easily traverse entire class space by reflecting type constructors of fields, method parameters, etc. Original parse-trees are no longer available and signatures of class members cannot be changed.
Every executed attribute macro operates on some element of class hierarchy, so it must be supplied with an additional parameter describing the object, on which macro was placed. This way it can easily query for properties of that element and use compiler's API to reflect or change the context in which it was defined.
For example a method macro declaration would be
[Nemerle.MacroUsage (Nemerle.MacroPhase.WithTypedMembers,
Nemerle.MacroTargets.Method)]
macro MethodMacro (t : TypeBuilder, f : MethodBuilder, expr)
{
// use 't' and 'f' to query or change class-level elements
// of program
}
The macro is annotated with additional attributes specifying, respectively, the stage in which the macro will be executed and the macro target.
The available parameters contain references to the class hierarchy elements that the given macro operates on. They are automatically supplied by the compiler and they depend on the target and stage of the given macro. Here is a small table specifying valid parameters for each stage and target of an attribute macro.
MacroTarget | MacroPhase.BeforeInheritance | MacroPhase.BeforeTypedMembers | MacroPhase.WithTypedMembers |
---|---|---|---|
Class | TypeBuilder | TypeBuilder | TypeBuilder |
Method | TypeBuilder, ParsedMethod | TypeBuilder, ParsedMethod | TypeBuilder, MethodBuilder |
Field | TypeBuilder, ParsedField | TypeBuilder, ParsedField | TypeBuilder, FieldBuilder |
Property | TypeBuilder, ParsedProperty | TypeBuilder, ParsedProperty | TypeBuilder, PropertyBuilder |
Event | TypeBuilder, ParsedEvent | TypeBuilder, ParsedEvent | TypeBuilder, EventBuilder |
Parameter | TypeBuilder, ParsedMethod, ParsedParameter | TypeBuilder, ParsedMethod, ParsedParameter | TypeBuilder, MethodBuilder, ParameterBuilder |
Assembly | (none) | (none) | (none) |
The intuition is that every macro has a parameter holding its target and, additionally, the objects containing it (for example, TypeBuilder is available in most attribute macros).
After those implicitly available parameters come the standard parameters explicitly supplied by the user. They are the same as for expression-level macros.
Identifiers in quoted code (object code) must be treated in a special way, because we usually do not know in which scope they would appear. Especially they should not mix with variables with the same names from the macro-use site.
Consider the following macro defining a local function f
macro identity (e) { <[ def f (x) { x }; f($e) ]> }
Calling it with identity (f(1)) might generate confusing code like
def f (x) { x }; f (f (1))
To prevent name capture, all macro-generated variables should be renamed to unique counterparts, like in
def f_42 (x_43) { x_43 }; f_42 (f (1))
The idea of separating variables introduced by a macro from those defined in the plain code (or other macros) is called 'hygiene', after the Lisp and Scheme languages. In Nemerle we define it as putting identifiers created during a single macro execution into a unique namespace. Variables from different namespaces cannot bind to each other.
In other words, a macro cannot create identifiers that capture external variables or that are visible outside of its own generated code. This means that the macro author does not need to worry about which names are used locally.
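A classic illustration of this guarantee is a macro that needs a temporary variable. The sketch below is not part of the original tutorial. swap is a hypothetical macro, and the renamed identifier in the comment only illustrates the idea, not the compiler's actual naming scheme.

```nemerle
// Hypothetical macro: exchanges the values of two mutable variables.
macro swap (a, b)
{
  <[
    def tmp = $a;   // tmp belongs to this macro expansion only
    $a = $b;
    $b = tmp
  ]>
}

// Use site that happens to have its own variable called tmp:
//   mutable tmp = 1;
//   mutable y = 2;
//   swap (tmp, y);
//
// Thanks to hygiene this behaves like
//   def tmp_1 = tmp; tmp = y; y = tmp_1
// so the user's tmp is not shadowed by the macro's temporary.
```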
Hygiene is obtained by encapsulating identifiers in the special Name class. The compiler uses it to distinguish names from different macro executions and scopes (for details of the implementation consult the paper about macros). Variables with the appropriate information are created automatically by quotation.
def definition = <[ def y = 4 ]>;
<[ def x = 5; $definition; x + y ]>
When a macro creates the above code, identifiers y and x are tagged with the same unique mark. Now they cannot be captured by any external variables (with a different mark). We operate on the Name class when the quoted code is composed or decomposed and we use the <[ $(x : name) ]> construct. Here x is bound to an object of type Name, which we can use in another place to create exactly the same identifier.
An identifier can also be created by calling the method Macros.NewSymbol(), which returns a Name with a unique identifier, tagged with the current mark.
def x = Macros.NewSymbol ();
<[ def $(x : name) = 5; $(x : name) + 4 ]>
Sometimes it is useful to generate identifiers, which bind to variables visible in place where a macro is used. For example one of macro's parameters is a string with some identifiers inside. If we want to use these as real identifiers, then we need to break automatic hygiene. It is especially useful in embedding domain-specific languages, which reference symbols from the original program.
As an example consider the Nemerle.IO.sprint (string literal) macro (which has the syntax shortcut $"some text $id "). It searches the given string literal for $var and creates code concatenating the text before and after $var with the value of var.ToString ().
def x = 3;
System.Console.WriteLine ($"My value of x is $x and I'm happy");
expands to
def x = 3;
System.Console.WriteLine ({
def sb = System.Text.StringBuilder ("My value of x is ");
sb.Append (x.ToString ());
sb.Append (" and I'm happy");
sb.ToString ()
});
Breaking hygiene is necessary here, because we generate code (a reference to x) which needs to have the same context as the variables at the place where the macro is invoked.
To make a given name bind to symbols from the macro use site, we use the Nemerle.Compiler.Macros.UseSiteSymbol (name : string) : Name function, or the special splicing target usesite in quotations. Their use is shown in this simplified implementation of the macro
macro sprint (lit : string)
{
def (prefix, symbol, suffix) = Helper.ExtractDollars (lit);
def varname = Nemerle.Compiler.Macros.UseSiteSymbol (symbol);
<[
def sb = System.Text.StringBuilder ($(prefix : string));
sb.Append ($(varname : name).ToString ());
// or alternatively $(symbol : usesite)
sb.Append ($(suffix : string));
sb.ToString ()
]>
}
Note that this operation is 'safe', that is, it changes the context of the variable to the place where the macro invocation was created (see the paper for more details).
Sometimes it is useful to break hygiene completely, for example when the programmer only wants to experiment with new ideas. From our experience, it is often hard to reason about the correct contexts for variables, especially when writing class-level macros. In this case it is useful to be able to break hygiene easily.
Nemerle provides this with the <[ $("id" : dyn) ]> construct. It makes the produced variable break hygiene rules and always bind to the nearest definition with the same name.
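A small sketch of how this might look in practice follows. It assumes the construct is spelled $("id" : dyn). print_counter and counter are hypothetical names chosen for illustration.

```nemerle
// Hypothetical macro: refers to whatever variable named counter is
// nearest at the place where the generated code ends up.
macro print_counter ()
{
  // assumption: the dyn splice target disables hygiene for this name
  <[ System.Console.WriteLine ($("counter" : dyn)) ]>
}

// Possible use site:
//   def counter = 5;
//   print_counter ();   // the generated code reads the user's counter
```

Unlike UseSiteSymbol, this form makes no attempt to tie the name to the macro invocation context, which is why it suits quick experiments more than finished macros.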
If you need to match the cases of a 'match' operator, you can use the 'case' specifier:
match (caseExpr)
{
| <[ case: $(x : name) => $exp ]> with exc = <[ System.Exception ]>
| <[ case: $(x : name) is $exc => $exp ]> =>
PT.TryCase.Catch(c.Location, PT.Splicable.Name(x), exc, exp)
| <[ case: $(PT.PExpr.Wildcard as wc) => $exp ]> with exc = <[ System.Exception ]>
...
match (caseExpr)
{
| <[ case: | ..$guards => $expr ]> =>
match (guards)
{
| <[ $_ when $_ ]> :: _ => // process guarded pattern
| _ :: _ => // process other pattern