New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation update on lexical lookup via quasis/injectiles, etc #410

Closed
masak opened this Issue Oct 22, 2018 · 0 comments

Comments

Projects
None yet
1 participant
@masak
Owner

masak commented Oct 22, 2018

This is probably something I should publish as a blog post on strangelyconsistent. For now, there's time only to write it up here as a self-closing issue.

Who knows, maybe there's some interested reader out there for whom this is useful. The main use is for me to info-dump what I feel is a first complete, workable solution to hygiene in 007 and Perl 6. (As a consequence, I won't pull any punches. If the below sounds like complete gobbledygook, assume that's me, not you. But maybe see the examples at the end.)

I should also link to this gist because it contains the germs of the thinking that's outlined here. Think of the gist as the exploration part and this issue as its final distillation.

Unique variables

In order to explain the solution, let's introduce a non-established term: unique variables.

In general, each variable in the source code is declared somewhere, and is then used a number of times:

my foo = ...;     # declaration
# ...
...foo...;        # usage (read)
# ...
foo = ...;        # usage (write)

Without any further information, we don't actually know whether the variable foo corresponds to one single allocated location in memory, or several. Here are a few reasons it might be several:

  • The surrounding block might be a loop. Each iteration of the loop, in the general case, needs fresh locations for all its variable declarations.
  • The surrounding block might be a routine. Each time the function is called, it needs fresh locations for all its variable and parameter declarations.
  • As a special case of the previous point, pointy blocks also have parameters.

But there are also a few well-known cases where a variable declaration corresponds exactly to one single location — let's call such a variable (both its declaration and its usages) a unique variable. Here are some examples of variables that turn out to be unique:

  • The built-ins (kind of implicit, but it fits the definition otherwise).
  • The mainline code of the main program.
  • The mainline code of any loaded module (because modules are singletons, this holds true even if a module is imported multiple times in the code base; its mainline is guaranteed to only run once).
  • Any code nested within such a mainline but without a pointy-block parameter — kind of "honorary mainline variables" — but here we're sort of hitting diminishing returns...
  • (Most important for this issue and the implementation it proposes) A macro as seen from inside its quasi blocks.

In all of these cases, we can start to think about optimizations where we do the lookup at compile time, and replace the runtime lookup with just the unique location. (If we can infer that all we're ever doing is reading form that location, we can further optimize reading from the location down to just its resulting constant value.)

Why are macro variables unique?

The last point in the preceding list is the odd one out, so it bears pausing for a bit and motivating that one. Macros are routines just like functions, so on the face of it, they should sort into the top list, not the bottom one.

While it is true that a macro can be called multiple times, and that each such call will generate a fresh location for each of its variables, code generated in the macro will only ever see a fixed variable. Whenever the macro gets called again, it's another quasi being called; one does not simply declare the same variable in the same macro call twice. It might be related to the fact that the macro runs at compile time, so what's compile time for the injectile is actually runtime for the macro, even though that's... the same time.

macro moo() {
    my x = ...;
    quasi {
        ...x...;        # for *this* quasi, `x` refers to a *unique* location
    }
}

I can't stress enough how this "happy coincidence" feels both necessary and sufficient in some way. In the sense that it's been really tricky to see how to make macro hygiene realistic... but the fact that quasis in macros only ever see unique variables in their surrounding macros means that, by a cosmic coincidence, all the runtime lookups that would have been "detached" (in the terminology of the gist) can instead be optimized away.

I dunno, YMMV, but to me it seems like quite a wonderful generalization. Who knew global variables and macro "closure" variables had this trait in common?

Implementation details

It gets cuter. Let's assume that it's possible both to uniquify a variable (to turn all its runtime lookups into global location accesses), and to de-uniquify it (to turn all its location accesses back into runtime lookups). Enough information needs to be stored in the location object itself to be able to restore it that way.

Two points in time are of interest during macro expansion:

  1. When we reach the line with the quasi (during the macro call) the quasi is interpolated, that is, for the first time it gets rendered as concrete code (a Qtree).
    • All the variables that are both defined and used inside the quasi block are left alone. (These are subject to normal lexical variable semantics inside the eventual injectile.)
    • All the variables that are defined outside the quasi block (maybe in the macro, maybe further out) are uniquified.
    • All the variables that are spliced into the quasi through unquotes are also left as they are. (If they come from macro arguments, they will find the corresponding declaration in the mainline. If they come from another Qtree, they will have already been uniqified.)
  2. At some point, the macro exits, we inject the quasi block into the mainline code (possibly after cloning it just to be safe), and
    • All the unique variables that are visible from that point in the mainline code are de-uniquified.

I think a good metaphor here is the freeze-drying of foodstuffs. We freeze-dry the food in order to transport it long distances or store it for a long time. When the food arrives at its destination, we can rehydrate it, returning it to its original fresh state.

In the case of the quasi, there are some variables that we know won't be lexically available from the mainline code. The two "lookup sites" involved form an inverted Y shape:

         builtins
         globals
            |
            |
            ^
           / \
quasi (1) /   \ (2) mainline

All the variables in the left branch of the Y shape will be unavailable from the point of view of (2), because they are no longer part of the sequence of OUTER blocks from (2). Preemptively, the quasi interpolation uniquifies all the variables used in the quasi but declared outside of it. (This includes the "stem" of the Y shape.) Later, the quasi injection de-uniquifies those variables declared in the stem of the Y.

Why? Because while uniquification is a "necessary evil" and something we need to do to get hygiene at all, non-unique variables are still preferable and more in line with the end user's intuition/expectations.

Some examples

my a = "mainline";
macro moo() {
    my a = "macro";
    return quasi { a };
}
say(moo());     # "macro" (because the macro's `a` has been uniquified)

my a = "mainline";
macro moo(x) {
    my a = "macro";
    return quasi { {{{x}}} };
}
a = a ~ " and mutable!";
say(moo(a));    # "mainline and mutable!" (because `a` from the mainline is being sent in
                # as a macro argument and spliced into the unquote; not uniquified)

BEGIN my a = "static mainline";
a = "mainline";
macro moo() {
    return quasi { a };
}
a = a ~ " and mutable!";
say(moo(a));    # "mainline and mutable!" (`a` is first uniqified and then de-uniquified;
                # if it weren't de-uniquified, the output would be "static mainline")

my a = "mainline";
macro moo(x) {
    return quasi { say(a, " and ", {{{x}}}) };
}
{
    my a = "block";
    moo(a);     # "mainline and block" (both `a` end up uniquified and then de-uniquified
                # but can correctly "see"/reach their definitions even though one shadows
                # the other)
}

my a = "mainline";
macro moo(x) {
    return quasi {
        my a = "quasi";
        say(a, " and ", {{{x}}});
    };
}
moo(a);         # "quasi and mainline" (the `a` being passed in has a lookup which
                # "leapfrogs" the lexical lookup of the `a` declared in the quasi)

my a = "mainline";
my moo;
{
    BEGIN my a = "static block";
    a = "block";
    macro car() {
        return quasi { a };
    }
    BEGIN moo = car;    # terribly realistic, this code
}
say(moo());     # "static block" (the uniqified variable is outside the macro, but still
                # not visible from the point of macro expansion; therefore not
                # de-uniquified)

# foo.007
BEGIN my a = "static module";
a = "module";
export macro moo() {
    return quasi { a };
}

# main.007
my a = "main program";
import { moo } from foo;
say(moo());     # "static module" (no lexical environment in common except for the
                # built-ins; the `a` is uniquified and not de-uniquified)

Hesitant addendum: with those last two examples, we might be able to get our cake and have it, too, getting "block" and "module" as outputs, respectively. A strong-enough "fixup mechanism" ought to be able to make those variables refer to their runtime locations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment