Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose control flow as an API #145

Open
masak opened this issue Jun 12, 2016 · 13 comments
Open

Expose control flow as an API #145

masak opened this issue Jun 12, 2016 · 13 comments

Comments

@masak
Copy link
Owner

masak commented Jun 12, 2016

I would like to be able to provide next, last, and redo in loops as statement macros. But I would also like the underlying mechanism that they lean upon to be exposed as a clean API.

What would the API look like? Something like this:

  • for and while loops would register their loop Q::Block in some way with the tag loop.
  • In a statement macro, control flow would somehow — handwave — be exposed to the user. Let's say the syntax is ctrlflow.goto(location).
  • The implementation of next would then be if loopcondition { ctrlflow.goto(findtag("loop").beginning); } else { ctrlflow.goto(findtag("loop").end); }
  • The implementation of last would be ctrlflow.goto(findtag("loop").end);
  • The implementation of redo would be ctrlflow.goto(findtag("loop").beginning);

I think we don't need to make this mechanism dynamic (like in Perl 6) — in fact, it's easier to statically check if we don't. Which also means that .map won't be considered a loop in the above sense. On the other hand, if someone wanted to write a custom loop, they'd simply provide it with the tag loop, and it'd play along.

@masak
Copy link
Owner Author

masak commented Jun 12, 2016

  • The implementation of next would then be if loopcondition { ctrlflow.goto(findtag("loop").beginning); } else { ctrlflow.goto(findtag("loop").end); }

I'm subtly and dangerously wrong above.

A next is not a simple goto. Both for and while loop blocks can accept parameters, and these need to be bound to their new values. The simplest way to do that, of course, is to cleanly exit the current iteration, and to call the block again with the new parameters.

It may seem a bit silly to provide .next as a method on ctrlflow, but that might be the easiest way to write the implementation of next in a sensible way:

ctrlflow.next();

Doing it this way would require slightly more than a loop tag to be attached to loop blocks; they'd also need a "next handler".

We could rewrite the runtime code so that it calls into the same handler during normal loop control flow. In the fullness of time, we might even be able to provide the loops themselves as statement macros, using the same API from with 007.

@masak
Copy link
Owner Author

masak commented Jun 12, 2016

Note that, even if we make the feature totally static and lexical, we will still have interaction between next et al. and the call stack:

for [1, 2, 3] -> e {
    sub foo() {
        if e == 2 {
            next;    # needs to unwind the call stack to the loop block
        }
    }
    foo();
    say(e);    # 1\n3\n
}

And, since the call stack is dynamic and subs are first-class, this could also happen:

my fn;
for [1, 2, 3] -> e {
    fn = sub () {
        next;    # lexically inside the loop block, but not dynamically
    };
}
fn();        # runtime error: attempt to `next` a completed loop

@masak
Copy link
Owner Author

masak commented Jul 13, 2016

Coming back to this issue and re-reading it, I say we might consider going the Python route and forbidding next et al. if there's a layer of sub between the loop and the next.

Yes, that's a sad cop-out, I know. But (a) since Python does this, it's within 007's rights to choose that route too, and (b) having the conservative version of the feature early is more useful than not having the ambitious version of the feature for a long time, and (c) seeing the feature implemented in a more static way might make us realize things we otherwise would not.

@masak
Copy link
Owner Author

masak commented Sep 19, 2016

This issue is bounded above in an interesting way by #166 and similar. When we compile to a different backend, we can no longer do very exotic things with the control flow. And since we want to have other backends, 007 also needs to be non-exotic enough so that code can be generated that corresponds to everything the control flow API can do.

@masak
Copy link
Owner Author

masak commented Jan 29, 2017

When we compile to a different backend, we can no longer do very exotic things with the control flow.

But (as I come back to this later and re-read it) this should be perfectly fine considering what we want to do. It simply means that all the control flow we want to be able to generate needs to be translatable down into conditionals and loops, possibly on gensym'd boolean variables. My intuition tells me this is true of next, last, and redo. It would also be true of something like #13.

Note, however, that it's not obviously true of the next-and-call-stack problems outlined above. In fact, the dictum that things be directly translatable down to local conditionals and loops may well replace "Python doesn't do it" as the prime reason for 007 to make control flow entirely static.

@masak
Copy link
Owner Author

masak commented Aug 2, 2017

Been thinking lately about how return interacts with catch and #156, and how that needs to be exposed in some sensible way. Essentially, return doesn't really mean "return immediately" in the cases where some surrounding cleanup mechanism has been installed; rather, it means "hand things over to the nearest cleanup handler".

I have a feeling this could be expressed/designed very neatly in terms of basic blocks. Just need to find the time to sit down and do so.

@masak
Copy link
Owner Author

masak commented Sep 15, 2017

Quite a bit of overlap here with #252 which I just filed.

@masak
Copy link
Owner Author

masak commented Feb 26, 2018

When I re-read this issue, I feel what's missing from the OP is that control flow is an aspect of code generation. (That's also why I just closed #252.)

Getting down to brass tacks instead of trying to explain in an abstract manner, here's what I imagine a code generation API might look like:

if statements

cg.jump_unless(condition_expr, "else")
cg.gen(then_body);
cg.label("else");
if else_q ~~ Q::Statement::If {
    # recursively generate an if statement here
}
else if else_q ~~ Q::Block {
    cg.gen(else_q);
}

while loops

cg.label("next");
cg.jump_unless(condition_expr, "last");
cg.label("redo");
cg.register_loop(WHILE_LOOP, "next", "redo", "last", func() {
    cg.gen(loop_body);
});
cg.jump("next");
cg.label("last");

try statements

cg.reroute_all_exits("finally", sub() {
    cg.catch_exceptions("catch", func() {    # *
        cg.gen(try_body);
    });
});
cg.gen(catch_blocks);
cg.label("finally");
cg.gen(finally_body);

Marked with an asterisk because the design is not all there. For one thing, the finally block will need to be informed of whether a return happened inside of the try block, so that it can resume that return (or override it) as appropriate.

with statements (as in #156)

my closable_ref = cg.gen(with_expression);
cg.reroute_all_exits("close", func() {
    cg.gen(with_body);
});
cg.label("close");
cg.gen(closing_code(closable_ref));

amb macro (as in #13)

(Would just re-write to a for loop, which would just be sugar for a while loop. An assert would code-gen to a conditional next with the registered for loop. The first parameter of register_loop indicates what type of loop it is. We could introduce a unique symbol AMB_LOOP, and have the assert code generation target it.)

Scattered notes about the code generation

  • The labels have human-readable names when we declare them, but on the generated bytecode end they will of course only be memory addresses.
  • Furthermore, the label names are local, in the sense that even when, say, many loops are nested in each other, all of them can locally call their next label "next" and there'll be no collision.
  • The cg object makes sure that for each local bit of code, all labels are the target of at least one jump, and all jumps have a declared label. It's not exactly Structured Programming, but it's, um, type-safe assembly. No sense in not making life comfortable for ourselves.
  • The above is mainly meant for the innards of a 007 compiler. But as an added bonus, macro authors could use cg; inside their module, and just make cg calls like we've done above. The compiler could intercept them and code-generate accordingly.

@masak
Copy link
Owner Author

masak commented Jul 13, 2018

Note that even the conservatism suggested in #145 (comment) is not going to suffice. Why? Because blocks with parameters code-generate as separate "units" of callable code in the general case, because that's the only way closures inside of them can work.

I didn't realize this until I started thinking about bytecode and trying to transcribe our example programs into bytecode.

I'm still mulling on it all, but I think this pushes us towards treating next et al. less like a goto and more like a control exception.

We're still free to lock it down into being syntactically lexical, but (again, from what I can see so far) there would be no simplification in semantics in the general case from doing so.

@masak
Copy link
Owner Author

masak commented Oct 11, 2018

Just driving by to point out that Symbol (#312) would be a really good fit for constructing labels at the bytecode level.

@masak
Copy link
Owner Author

masak commented Mar 31, 2019

  • The above is mainly meant for the innards of a 007 compiler. But as an added bonus, macro authors could use cg; inside their module, and just make cg calls like we've done above. The compiler could intercept them and code-generate accordingly.

I was reminded of this phrase ("The compiler could intercept them") as I read Dan Abramov's React as a UI Runtime, especially the section about Inversion of Control. To wit,

  • The direct way to render components would be for the user to call their functions directly.
  • Instead, the call happens as JSX, which desugars to a React.createElement call, which is controlled by React so that React can defer and otherwise strategically call the component when appropriate.
  • This extra IoC layer in-between is also what allows functional components to be stateful; each new function call can refer back into the IoC layer to persist state. (Cf. hooks.)
  • Conversely, the IoC layer is always involved in calling the component functions, and so it has knowledge of the component hierarchy, not just its output.

It seems to me we want the same sort of layer for cg calls; that is, just like React tooling enjoys knowing about the component hierarchy, 007 tooling enjoys knowing about the implicit "flow chart" generated by different Qnodes. Just as the React renderer reserves the right to lazily evaluate components, so the 007 code generator sometimes chooses to code-generate some bits but not others.

@masak
Copy link
Owner Author

masak commented Jan 26, 2021

After several years of waiting for the other shoe to drop, I now think that the control API should be phrased in terms of continuations. Hoping to be able to flesh that out in a concrete way soon, but basically:

  • Something like an if statement receives its continuation Done, and then constructs a control flow Start -> Then -> Done and Start -> Else -> Done; the code in the Then and Else parts can be opaque to the if statement during code generation.
  • Similarly, a for loop or a while loop concerns itself with code-generating the loop-controlling parts, and has an opaque body that it can code-generate for. The twist, though, is that it sends its End continuation down into a context surrounding the body so that a last; statement inside the body can invoke that continuation. Similarly for next and redo.

Again, this is but a rough sketch of the idea. Continuations will generally be compiled away into weaker constructs when at all possible. Either we rule out the remaining cases (which was basically the idea with forbidding loop jumps through function boundaries, Python-style), or we accept that continuations do exist sometimes, and basically base our runtime architecture on them. I'm not in a rush to do the latter, but I also think it wouldn't be so bad. Basically Alma could run on a CESK machine.

@masak
Copy link
Owner Author

masak commented Oct 18, 2022

Maybe I'm not really contradicting 2021-me when I say that it feels like we're essentially talking about algebraic effects here. Maybe the connection is this: if you have continuations and a CESK machine, then algebraic effects can be phrased in terms of having a bit more access to the machine itself, especially the K part.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant