Relationship of P6 and/or 007 macros to FEXPRs #302

Closed
raiph opened this issue Jun 2, 2018 · 117 comments

Comments

@raiph

raiph commented Jun 2, 2018

I'm curious to read an explanation of fundamental technical differences or similarities between Perl 6 macros (and, separately if appropriate, 007 macros) and Lisp FEXPRs. I thought ven especially might be able to shed light on the comparison.

(If this post, and ones like it, ought not be an issue in this repo, but rather, say, an SO or reddit post, and would still get your attention in this other location, please just close it with a comment explaining your preferences about where to post it and I'll cut/paste to that other location. I'm especially thinking it might be appropriate to start an SO tag for 007, perhaps initially justified as a tag to be used in conjunction with [perl6] but which would then immediately be available for use on its own anyway.)


The "Early Lisp macros" section of the wikipedia Macros page starts (apparently incorrectly) with:

Before Lisp had macros, it had so-called FEXPRs, function-like operators whose inputs were not the values computed by the arguments but rather the syntactic forms of the arguments

I say "(apparently incorrectly)" because the footnote at the end of the paragraph that begins as above links to an email that says:

The arguments to a FEXPR were definitely not the syntactic forms, but were instead the literal list structure of the source code as returned by read.

This suggests FEXPRs were, perhaps, very loosely akin to source filters in P5.

But, putting that aside, even if the wikipedia text is fundamentally wrong, and/or even if P6 and/or 007 macros are nothing whatsoever like FEXPRs, the first sentence in full sounds exactly like P6 macros to me, if one substitutes "abstract syntax tree forms of the arguments" for "syntactic forms of the arguments", and "compilation" for "computation":

Before Lisp had macros, it had so-called FEXPRs, function-like operators whose inputs were not the values computed by the arguments but rather the syntactic forms of the arguments, and whose output were values to be used in the computation.

Would you agree? In other words, is this accurate:

P6 / 007 macros are function-like operators whose inputs are not the values computed by the arguments but rather the abstract syntax tree forms of the arguments, and whose outputs are values to be spliced into the compilation output.
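(To check my own restatement against something concrete, here is a rough analogy using Python's ast module; this is not P6/007 syntax, and the unless rewrite below is entirely made up. The point is just that the macro's inputs are ASTs, not values, and its output is spliced into what actually gets compiled; the unless call itself is gone before anything runs.)

import ast

# Hypothetical "unless" macro: its inputs are the AST forms of its arguments
# (not their values), and its output is an AST fragment spliced into the
# compilation output.
def unless_macro(call: ast.Call) -> ast.AST:
    cond, body = call.args                        # these are ASTs, not values
    return ast.IfExp(
        test=ast.UnaryOp(op=ast.Not(), operand=cond),
        body=body,
        orelse=ast.Constant(value=None),
    )

class Expand(ast.NodeTransformer):
    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id == "unless":
            return unless_macro(node)
        return node

tree = Expand().visit(ast.parse("unless(x > 3, print('x is small-ish'))", mode="eval"))
ast.fix_missing_locations(tree)
code = compile(tree, "<expanded>", "eval")        # only the expansion gets compiled
eval(code, {"x": 1})                              # prints: x is small-ish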

The paragraph continues:

In other words, FEXPRs were implemented at the same level as EVAL, and provided a window into the meta-evaluation layer.

Does this map well to how P6 macros (and/or 007 macros) work?

The paragraph ends with:

This was generally found to be a difficult model to reason about effectively.

This difficulty clearly did occur historically, and it led to Lisp macros. But perhaps it was due to the actual nature of FEXPRs (which, I speculate, made them akin to source filters), rather than to the above apparently incorrect description of FEXPRs, the description which, as I have said, was rejected in the email the wikipedia page footnotes and which sounds to me reminiscent of Perl 6 macros.

To summarize, I'm interested to see what comes out of y'all reflecting on the nature of P6 macros, and/or 007 macros, as they might relate to FEXPRs, especially as described in the quoted email. If they're fundamentally different, what is the nature of that difference? If there are similarities, why do you think P6/007 macros will avoid the reasoning difficulty?

@raiph changed the title from "Relationship of 007 macros to FEXPRs" to "Relationship of P6 and/or 007 macros to FEXPRs" on Jun 2, 2018
@masak
Owner

masak commented Jun 3, 2018

I don't have the expertise to comment on FEXPRs, unfortunately. Especially since there seems to be some genuine debate about their true nature.

What I can say though is that from my vantage point there is one big difference between anything Lisp did in the early days and what we're doing in 007 today (and ultimately Perl 6): whether you're in a quasiquote or not, a variable that's used has to be declared. (And since this is a static relationship, the program won't compile if it doesn't hold.) This is so fundamental to the language that it ends up also permeating how macros work, leading eventually to macro hygiene.

If I understand correctly, Scheme's view on macros is quite close to this. Scheme has a flavor of lexical hygiene. I haven't studied it in depth, though I'm planning to.

Yesternight I wrote this gist, which is something I've been planning to write for a while. I meant to write it long ago, but found it hard going until inspiration struck yesterday. It's really related to #212 and figuring that one out in order to make #207 work. But I think it's also relevant to this issue, or at least to this comment. 😄

@vendethiel
Collaborator

vendethiel commented Jun 4, 2018

Note: I got a lot about FEXPRs from this dissertation

This suggests FEXPRs were, perhaps, very loosely akin to source filters in P5.

I've seen the term "runtime macros" used, and I think I like that term for FEXPRs. They are more similar to runtime macros than to source filters, and I'll come back to that.
From what I'm reading in the email, the difference it draws between "syntactic forms" and "literal list structure" is only interesting for a meta-circular interpreter. Some other, related email linked to a paper called "Du déchiffrage des formules logico-mathématiques par la machine même dans la conception du programme", which is also an interesting read (...if you speak French; the abstract is available in 3 or 4 languages). It contains the first mentions of a "library" as we understand it today in programming.

A macro is a "source->source" transform. That's where macros are similar to source filters.
FEXPRs are "source->?": they are very much allowed to do whatever they want, at runtime. The email mentions that they were really confusing, and that the only uses which were legible enough are also expressible via macros.

The arguments to a FEXPR were definitely not the syntactic forms, but were instead the literal list structure of the source code as returned by read.
The difference is subtle but critical. It hinges on the implementation details of LISP I.

The only meta-circular interpreter I have looked at deeply enough would be Rubinius, which... isn't homoiconic, so obviously has huge differences between "source code read" and "syntactic forms".

If I were to draw a comparison, FEXPRs are as if you kept the "unevaluated arguments" part of macros, but instead of being replaced by a quasi at compile-time, they sit there until runtime, where they get evaluated (often containing eval).

# FEXPR style
fexpr $if(cond, t, f) {
  my @fn = f, t # reverse arg, 0 = false, 1 = true
  eval(@fn[!!eval cond]);
}
$if(1, print("true"), print("false"));

Here, no transformation is done at compile-time. It's the FEXPR's job to eval its argument. Compare to a macro:

macro if(cond, t, f) {
  quasi {
    my @fn = {; {{{f}}} }, {; {{{t}}} };
    @fn[!!{{{cond}}}]();
  };
}
if(1, print("true"), print("false"));

which is transformed, at compile-time, to:

[{; print("false") }, {; print("true") }][!!1]();

(as a side-note: 007 used to have eval, called melt, but it doesn't anymore)


About @masak 's gist:

The only viable solution I've come up with is this: the Qnode corresponding to the quasi block needs to carry information about its outer pointer. It then gets returned from the macro to the thing in the parser that handles injection into the mainline, and this thing can then do the outer pointer dislodging as it needs to.

What is the "outer pointer" here? The macro context itself?

@masak
Owner

masak commented Jun 4, 2018

Here's the Calculatrices digitales du déchiffrage. It's from 1954, and describes the first metacircular compiler, a few years before Lisp.

@masak
Owner

masak commented Jun 4, 2018

melt was not meant to be eval, it was meant to turn ASTs encoding "fixed" values into those values. See #61.

I think we might get something like melt back. We just need to be more careful about whitelisting really constant/literal things. (Update: see #432.)

@masak
Owner

masak commented Jun 4, 2018

@vendethiel:

What is the "outer pointer" here? The macro context itself?

Yes. In the quasi in the format macro, for example, the replaceAll function is referenced. The quasi's outer pointer is (the frame of this invocation of) the macro itself.

But the quasi is not what eventually runs. The injected code in the mainline (the "injectile") is what runs. It looks like the quasi, except unquotes have been filled in with Qtree fragments. The injectile's outer pointer is still the macro, though, just like with the quasi. Since the injectile is somewhere else, its outer pointer is "displaced".

@masak
Owner

masak commented Jun 4, 2018

...huh.

As I look at the format macro so soon after mentioning melt, I notice there is something quite dissatisfying about it. But it's also a bit unrelated to this issue, so I'm going to open up a new one about it. 😄

@vendethiel
Collaborator

melt was not meant to be eval

I know it wasn’t, but it really is quite similar in how it works.

@masak
Owner

masak commented Jun 18, 2018

@raiph I created a stack-overflow label like you suggested.

Close this one? Or are we still waiting for some resolution wrt FEXPRs?

@masak
Owner

masak commented Jul 13, 2018

This thread seems to have stalled, and there is no Definition of Done. Closing.

@masak closed this as completed on Jul 13, 2018
@masak
Owner

masak commented Apr 16, 2019

@masak
Owner

masak commented May 3, 2019

@raiph Commenting in a closed thread, but (as I'm learning more about fexprs) I just found this quote:

The idea of first-class operative combiners, i.e., first-class combiners whose operands are never evaluated, has been around a long time. Such creatures were supported by mainstream Lisps through the 1970s, under the traditional name fexprs, but they made a mess out of the language semantics because they were non-orthogonal to the ordinary variety of procedures constructed via lambda — and, more insidiously, because at that time the mainstream Lisps were dynamically scoped (a language feature that causes more problems for fexprs than it does for the less powerful macros).

(And so it's kinda implied that we can do better nowadays, with modern tools and lexical scope.)

I found Kernel because allegedly its specification is very good. I haven't read it yet.

@masak
Owner

masak commented May 3, 2019

@vendethiel This is my current take on fexprs (still not having read the dissertation):

There are three "phases of matter", three forms of computation that interest us:

  1. Code-as-structure (AST)
  2. Code-as-callable-value (first-class functions/lambdas but also code blocks)
  3. Code that's already running

A compiler basically takes us 1->2, and a runtime does 2->3. What Common Lisp, Scheme and 007 do is keep us in 1 for as long as we need. What fexprs do is allow us to take fragments of 1 all the way into 3 and only then "evaluate" them (1->2->3) at compile time.
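In Python terms, just to make the three forms tangible (nothing Lisp- or 007-specific about this):

import ast

# 1. Code-as-structure: an AST, still just data we can inspect or rewrite.
tree = ast.parse("print('hello from phase 3')")

# 2. Code-as-callable-value: the compiler takes us 1 -> 2.
code = compile(tree, "<three-phases>", "exec")

# 3. Code that's already running: the runtime takes us 2 -> 3.
exec(code)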

(Oh! The dissertation is by the Kernel author. D'oh.)

@vendethiel
Collaborator

only then "evaluate" them (1->2->3) at compile time.

Should it still be called "compile-time" then? :)
I think this brings us back to meta-circularity (at least for a bit). C++ (for example) has heavy limitations on what can be done at constexpr time, and the mechanisms to implement this in compilers often mimic an actual interpreter...


I now understand what it means to have someone post interesting links faster than I can read them; serves me right... I'll try to read what you've sent me.

@masak
Owner

masak commented May 4, 2019

Hey, no need to feel "behind" on reading. 😉 Did you see the new papers.md? That's an (incomplete) list of things I haven't read but probably should. Remember, the end goal of all of this is to bring decent macros to Perl 6.

@raiph
Author

raiph commented Apr 28, 2020

Hi @masak,

My summary of my latest (but very tentative) understanding of fexprs and their relationship to Raku is that the macro declarator declares a compile-time fexpr, not a lisp macro:

https://www.reddit.com/r/rakulang/comments/g7xluv/question_about_macros/fow63tp/

I've drawn this (again, very tentative) conclusion based on reading Shutt's comments on fexprs, mostly at LtU.

I note his comments on how macros are more about filling in a template (which reminds me of things you've said about 007/alma macros), plus some sweeping simplifications that become possible related to hygiene and quasiquoting (which may at least trigger insights for you). Perhaps you have already come to the same conclusions without realizing it's because you're implementing fexprs, not macros (in Lisp parlance), presuming I'm at least somewhat right about that.

@masak
Owner

masak commented Apr 29, 2020

Hi @raiph,

I'm sorry, no, I don't believe Perl 6 macros (or Alma macros) are fexprs at all. At the risk of revealing my still-very-shaky understanding of fexprs — the big distinction is that in Lisp macros (and Perl 6 macros, and Alma macros), after the macro has expanded at compile time, it's "spent". Only its expansion remains in the target code. A fexpr, on the other hand, inserts itself as the thing to be called, and so itself remains in the target code as the "expansion". That's very different, and according to that difference, Perl 6 does not have fexprs.

I'm sorry to knee-jerk reply like this without properly reading your reddit post. But I wanted to communicate my understanding and point out a possible confusion, even though my understanding is partial and the confusion is possibly also mine.

I promise to go back and read the reddit post (and the source material) more carefully.

@raiph
Author

raiph commented Apr 29, 2020

after the macro has expanded at compile time, it's "spent".

I meant the same when I wrote "compile-time fexprs".

Aiui, an alma macro returns an AST. It's simply a value, one that happens to be an AST, that's then just directly spliced into the overall AST. Perhaps I'm wrong about that; perhaps what's returned from an alma macro is an expression that's then immediately evaluated, but I would be fairly shocked to hear that.

According to John Shutt, who oughtta know how this stuff works given how central it is to his doctoral dissertation:

when an ordinary function produces its output, that output becomes the result of the call to the function. The same for a fexpr. But the output of the macro is an expression that is then evaluated

when you're trying to make sure your code will do what you really want it to do, it's easier to do so by actually specifying what you want done (the fexpr strategy), than by specifying how to build an expression that will specify what you want done (the macro strategy).

@masak
Owner

masak commented Apr 30, 2020

@raiph I still believe what you are saying does not match what we are seeing in Alma and Perl 6. I say this as the implementor of macro systems in both languages, although I also don't doubt the credentials of John Shutt.

Aiui, an alma macro returns an AST. It's simply a value, one that happens to be an AST tree, that's then just directly spliced into the overall AST.

You are correct, as far as that goes.

But — since the format of comments allows me to interrupt you to ask a question — are you aware that this is exactly what Common Lisp and Scheme also do? And many many other Lisp dialects. And other languages, too. Including Perl 6.

To spell it out, just to make super-sure that we are on the same page:

# Perl 6 or Alma
macro sayHi() {
    quasi {
        say("hi");
    };
}
;; Bel
(mac prhi ()
  `(pr "hi"))

These two examples are as identical as the different syntax allows them to be. The little backquote in the Bel code corresponds to the quasi keyword in Perl 6.

Both of these are macro mechanisms. Neither of these is a fexpr mechanism. Talking about "compile-time fexprs" further confuses the issue when "macro" already correctly describes the thing. It's a bit like saying: "no no, I don't mean food, I'm talking about more like a kind of solid drink".

it's easier to do so by actually specifying what you want done (the fexpr strategy), than by specifying how to build an expression that will specify what you want done (the macro strategy).

Instead of trying to pinpoint and explain exactly what Mr Shutt means when he's writing this, let me show you what a Perl 6 or Alma macro would look like if it were a fexpr:

# imagined Perl 6 or Alma with fexprs
macro sayHi() {
    say("hi");
}

As you can see, the big difference is the lack of quasi. The macro body is kind of "auto-quasi'd", and what the fexpr returns is the code inside it.

Perl 6 and Alma do not work this way. Kernel and picolisp do, as far as I understand.

@masak
Owner

masak commented Apr 30, 2020

Oh!

As I started reading the reddit post, it pretty quickly became clear to me where this confusion comes from.

Quoting the relevant part:

  • Macros look similar but they take (a list of) unevaluated things ("operands") and return a replacement in the form of (a list of) unevaluated things that are then immediately evaluated as if they were at the original macro call site.

  • Fexprs take (a list of) unevaluated things ("operands") and return a replacement in the form of (a list of) unevaluated things that are left as is, i.e. not immediately evaluated.

You are right, the macros in Perl 6 and Alma are not immediately evaluated. But...

...this is an artifact of a deeper language difference, more like a philosophical difference between "compile-then-run languages", and Lisp. (I've been returning to this point in various ways over the years when talking with @vendethiel. Only recently, like last year or so, do I grok this difference.)

  • There is a group of languages that look like this: compile ABC — run ABC
  • And then there is a group of languages that look like this: compile A, run A — compile B, run B — compile C, run C

Perl 5, Perl 6, Alma, C, C++, Java, C# and many others belong to the first group. Various Lisps belong to the second group. Python has some aspects of the second group — in that both class and function definitions "happen at runtime", but in the end it also belongs to the first group — in that compilation completes for a whole unit before the first statement ever runs.

In Alma and Perl 6, things are not immediately evaluated out of macros, because Alma and Perl 6 are not languages of this second type.

In languages of the second type, the macro tends to be "expanded at runtime", which is OK because compile time and runtime are so close together. That's why these languages talk about the macro returning something which is then immediately evaluated. Alma and Perl 6 not being of this second type, their macros don't — can't — work like this. More generally, in any language of the first type, macros don't work like this.
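To make the two groups concrete, here's a small Python sketch (with Python standing in for both groups, purely as an illustration):

import ast

source = "x = 1\nprint(x)\nprint(x + 1)"

# Group 1: compile ABC -- run ABC
exec(compile(source, "<whole unit>", "exec"), {})

# Group 2: compile A, run A -- compile B, run B -- compile C, run C
env = {}
for stmt in ast.parse(source).body:
    unit = ast.Module(body=[stmt], type_ignores=[])
    exec(compile(unit, "<one statement>", "exec"), env)

In a language of the second group, whatever statement A defined (a function, a macro) is already a live value by the time statement B gets compiled; that's what makes "expand, then immediately evaluate" feel natural there.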

To quote the conclusion of your reddit post:

As far as I can tell, what the Raku design is calling macros are actually a limited form of compile-time fexprs such that their result is spliced back into the AST. So they have neither the complexity of lisp macros (no tongs) nor the complexity/downsides of lisp fexprs (no static analysis / optimization hit).

But because fexpr is a horrible name, the declarator is macro.

I respectfully disagree with this conclusion. They are macros; nothing to do with fexprs.

@masak
Owner

masak commented Jan 10, 2021

Something, I don't quite remember what, prompted me to start reading John Shutt's dissertation about fexprs and vau calculus. And... wow.

Don't get me wrong, it's not that I'm now suddenly a raving fexpr convert. I'm not. (For example, I don't want to uproot everything we've done with Alma and start over, just to do it "right" with fexprs. In fact, I think the end result would maybe look more like Kernel and less like macros for Raku, which wouldn't be in line with 007's or Alma's mission statement.)

No, the reason I say "wow" is that it's almost ridiculous how much falls, for lack of a better metaphor, into place when I read this. There's so much to process and appreciate from the dissertation, and I'm not even through the entire text yet, but I still want to try and summarize it.

This will be a multi-comment kind of thing, by necessity. At some point when reading through the dissertation I needed to take a detour and read the Kernel Report. I heartily recommend doing so; more below.

Ping @raiph @vendethiel @jnthn.

@masak
Owner

masak commented Jan 10, 2021

In medias res

Instead of building up to an exact description of my current understanding of Kernel's central abstraction — the operative — let's just define one, and talk about what falls out:

($define! $and?
   ($vau x e
      ($cond ((null? x)        #t)
             ((null? (cdr x))  (eval (car x) e)) ; tail context
             ((eval (car x) e) (apply (wrap $and?) (cdr x) e))
             (#t               #f))))

In textual order:

  • Operatives (which we'll provisionally define as a generalization both of Lisp macros and of "special forms") are prefixed with a $ to signal that they are not a normal "applicative", i.e. a function that expects its arguments pre-evaluated. Yes, the words "operative" and "applicative" are particular to Kernel and Shutt's work; one does get used to them quickly, though.

  • As in Scheme, something with a side effect (usually modifying an environment) is suffixed !. Unlike Scheme, this also goes for the $define! operative.

  • Speaking of which, ? goes on anything which is going to return a boolean. So predicates like null? get the ? suffix, but also (unlike other Scheme-like languages I know) combiners like $and?, including (not shown in the snippet above) relational combiners like <?.

  • By the way, there's both an operative $and? and an applicative and? in Kernel. Only the operative is short-circuiting. This is very much in line with what I feel is the point of the language — to give you precise tools. I was going to say that no other languages offer both like this — but Raku has && and ?&, and Eiffel has `and then` and `and`.

  • When you get to ($vau x e, you would rightly wonder where someone put all the arguments to the $and? call. Short answer: they're in x. Ok, so what's e then? For that matter, what's $vau? Well, $vau is a built-in operative for defining operatives in the language — similar to lambda in a Lisp. Before the body, a $vau definition expects a so-called parameter tree (x) and an environment (e).

  • The parameter tree is just a clean generalization of a parameter list. Bel does pretty much exactly the same, so I'm quite used to both the idea of it and the syntax. Nested lists in a parameter tree give you individual parameters easily: (a b c) represents a parameter list with exactly three elements, while in (a b . c) the c is a "rest parameter" absorbing zero or more remaining arguments. But if you just write your parameter tree as x, you bind the entire tree to x. It doesn't even have to be a cons pair in the general case. (Although in the definition of $and?, it's expected to be null or a pair.)

  • The e parameter for the environment, and the explicit environment passing in general, are central here. That's probably a good place to start unpacking what (Kernel's) fexprs and $vau operatives really are. Note especially how we end up using the e parameter in the $and? macr... operative: it gets sent along to two eval calls, and it gets passed in a recursive apply call, more about which will be said below.

  • The $and? operative is defined in terms of the $cond operative, which is not built-in but defined in terms of $if (which is). Stating the obvious, $cond is an operative because it also has short-circuiting semantics (similar to $if and $and?).

  • The four cases of the $cond combiner are: empty list (return true); singleton list (evaluate and return element); (evaluate) first argument true (recurse down); otherwise (first argument false, so return false).

  • Specifying which environment to evaluate things in makes a lot of sense. One thing stayed with me from the Kernel specification: just like booleans are conjugate with conditional jumps, and natural numbers are conjugate with bounded iteration — environments are conjugate with evaluation (!). (A footnote in the Kernel specification says that you get to pick "evaluation" or "variable lookup", but they are equi-powerful since having one lets you implement the other.)

  • Because the environment the $vau gets through e is the dynamic environment from the point in the program when the $and? is evaluated, and this e gets threaded down into the evals, whatever operands are in that $and? combination get treated hygienically. Hygiene by explicit environment passing.

  • As to (apply (wrap $and?) (cdr x) e), the main reason we're recursing via apply is that we want to pass the whole of (cdr x) as a single parameter tree, which needs an apply instead of a normal call.

  • So why the wrap? Well, you can't call apply on operatives, only on applicatives. ("Applicative", apply, get it?) The wrap built-in wraps our $and? operative into an applicative so that we can call apply on it. (But even wrapped, it doesn't lose its fexpr-ness.) Since the $vau expected an x and an e, that's what we pass it via the apply.

  • You might be used to seeing else or => instead of #t at the end of the $cond. Kernel has a number of opinions about things, and "unevaluated symbols" (as else and => are called) are generally considered to be a bad idea. I guess this is a bit clearer when $cond is defined as an operative instead of as a define-syntax... but it also feels like a small thing to give up given that #t always worked anyway.

@vendethiel
Collaborator

vendethiel commented Jan 10, 2021

* I was going to say that no other languages offer both like this

Is it cheating to say & and &&? :)

* The `wrap` built-in wraps our `$and?` operative into an applicative so that we can call `apply` on it.

Is it special? Shouldn't it also have a dollar sign otherwise?

wraps our $and? operative into an applicative

So we could've used and? instead of (wrap $and?), right?

* But even wrapped, it doesn't lose its fexpr-ness

I'm interested in how.

@masak
Owner

masak commented Jan 11, 2021

I was going to say that no other languages offer both like this

Is it cheating to say & and &&? :)

Yes and no, I think. 😉 It kind of depends on what you think qualifies as a boolean. By Kernel's/Shutt's standards, & doesn't qualify because it operates on unsigned integers/bit vectors. By C's standards, it's a perfectly cromulent "eager and" operation. I mean to talk a bit more about Kernel's strict treatment of booleans, which was a surprise to me.

The wrap built-in wraps our $and? operative into an applicative so that we can call apply on it.

Is it special? Shouldn't it also have a dollar sign otherwise?

Ah, trust @vendethiel to sniff out the one part of what I wrote where I felt I didn't have enough to stand on to be sure. But this is important, so let's try.

I think the answer to your question is in the metacircular evaluator (here using the one from SINK, Shutt's old prototype):

(define combine
  (lambda (combiner operand-tree env context)
    (cond ((operative? combiner)
             (operate combiner operand-tree env context))
          ((applicative? combiner)
             (combine (unwrap combiner)
                      (map-eval operand-tree env context combiner)
                      env
                      context))
          (else
             (error-pass (make-error-descriptor
                           (list "Tried to call a non-combiner: "
                                 (list combiner)))
                         context)))))

Note crucially that map-eval is called on the operand tree of an applicative, but not that of an operative. That is, during "normal" evaluation, the operands of an applicative (but not an operative) are subjected to evaluation before calling the underlying combiner. This should be non-surprising, as it parallels the difference between macros and procedures, or between special forms and procedures.

But when we call apply in the $and? code, we circumvent this part, and there is no evaluation of the operands:

($define! apply
   ($lambda (appv arg . opt)
      (eval (cons (unwrap appv) arg)
            ($if (null? opt)
                 (make-environment)
                 (car opt)))))

Maybe a different way to state this is that $and? is certainly special, and nothing of that is lost. If we bound the result of wrapping $and? to some name, it wouldn't deserve a dollar sign, because evaluating it in the normal way makes sure to also evaluate its arguments first.

wraps our $and? operative into an applicative

So we could've used and? instead of (wrap $and?), right?

But even wrapped, it doesn't lose its fexpr-ness

I'm interested in how.

These two questions are answered above, I believe, but I'm not sure how clearly.

In summary: when we call apply, we circumvent the normal evaluation of arguments that would have made a normal applicative call unsuitable for a (wrapped) operative.

and? and (wrap $and?) are not the same; there are two orthogonal changes going on here at the same time. Think of wrap (and unwrap) as the constructor (and destructur-or, respectively) of applicatives. We're used to thinking of applicatives ("normal functions") as being fundamental in some sense, but in Kernel, they are not — they have been factored into "evaluation of the arguments" + "operative", the latter being the fundamental concept. The "evaluation of the arguments" is the second change that's going on, but with apply we can circumvent it.

I realize this is bewildering; it's still a bit messy to describe. It gets even weirder when one realizes that $lambda in Kernel is defined as basically wrap of $vau — so where's that evaluation of the arguments, then? Well, the answer is that it happens when we evaluate the resulting lambda value as the head of a combination (form), in the normal course of a program.
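Maybe a small Python model helps (all names made up; contexts, parameter trees, and first-class environments omitted):

class Operative:
    def __init__(self, fn):              # fn(operands, env) -> value
        self.fn = fn

class Applicative:
    def __init__(self, underlying):      # what wrap produces
        self.underlying = underlying

def wrap(combiner):   return Applicative(combiner)
def unwrap(appv):     return appv.underlying

def evaluate(form, env):
    if isinstance(form, str):                                 # symbol
        return env[form]
    if isinstance(form, list):                                # combination
        return combine(evaluate(form[0], env), form[1:], env)
    return form                                               # literal

def combine(combiner, operands, env):
    if isinstance(combiner, Operative):
        return combiner.fn(operands, env)                     # operands passed through untouched
    evaluated = [evaluate(o, env) for o in operands]          # the "wrap" layer: evaluate first
    return combine(unwrap(combiner), evaluated, env)

def kernel_apply(appv, args, env):
    # like Kernel's apply: unwrap first, so the args are NOT evaluated again
    return combine(unwrap(appv), args, env)

In this model, (wrap $and?) is an Applicative whose underlying combiner is still the $and? Operative: evaluating it as the head of a combination evaluates the arguments first, while kernel_apply skips that step, which is why the wrapped thing "doesn't lose its fexpr-ness".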

I hope to get back to this, and be able to state it in a more straightforward way. In the meantime, hope this helps.

@masak
Owner

masak commented Jan 11, 2021

Self-replying because I'm thinking about what I wrote and I'm kind of understanding it a bit deeper.

Combine this:

The e parameter [to $vau] for the environment, and the explicit environment passing in general, are central here.

With this:

[...] environments are conjugate with evaluation.

(And with the rather abstract things I attempted to clarify to @vendethiel.)

What you get is this: a $vau is like a lambda, but it has been provided with "the means of evaluation", to use at its discretion. The (dynamic) environment passed to it represents more of an opportunity than a mandate — interestingly, different library operatives have different policies about the environment they choose for evaluation.

Contrast this with a normal macro in the style of Alma (or Raku), where the focus is not on the evaluation more than perhaps indirectly; by passing things through from macro parameters to unquotes, you allow an invisible third party (the compiler + runtime) to evaluate things later. The focus is on the (quasiquoted) program text/structure/syntax, not on its semantics.

The fexpr paper calls the former style the explicit-evaluation paradigm, and the latter the implicit-evaluation paradigm. Macros have largely been heavily associated with the latter, in Lisp/Scheme and elsewhere. Apparently some experimental languages called reflective Lisps (of which I know next to nothing) from the 1980s share with Kernel their focus on explicit evaluation.

Quoting the fexpr dissertation:

Fexprs have an inherent clarity advantage over macros.

In part, this is because a fexpr specifies its entire computation explicitly, whereas a macro specifies only the front (and usually lesser) half of its computation, arranging the back half indirectly. Naturally, an algorithm is easier to understand when it is actually stated.

@masak
Owner

masak commented Jan 11, 2021

Kernel and the lack of a static phase

I was going to establish the terms "1-phase languages" and "2-phase languages" (analogously to Lisp-1 and Lisp-2), but now I fear if I say "two-face" too much, I'll only think of Harvey Dent.

Anyway. There are languages that just go ahead and run your program — but then there are also languages that first spend some time, well... type-checking, optimizing, and code-generating. You know, compiler-y stuff. Technically, this could be a difference between implementations, but usually it goes deep enough to be a feature of the language itself.

We could refer to these as "dynamic languages" and "static languages". We could also talk about "interpreted languages" and "compiled languages". I think what we have here are three axes, but they're almost aligned.

Kernel has no compile phase. It's pretty into lexical lookup, which is usually compilation-friendly, but there's also something about operatives/fexprs that ruins all that static knowledge. (Shutt is aware of this. But it might not be an "all hope is lost" situation. The Kernel specification talks hopefully about a sufficiently smart "optimizing interpreter".)

With all this in mind, it's kind of interesting that a $vau ends up dealing with two environments, referred to as "the static environment" and "the dynamic environment". The former is the active environment in which the $vau combination is evaluated; the latter is that e parameter that comes in, later to be bound to the environment where the resulting operative is invoked.

Alma and Raku are two-phase languages. But the distinction between "static environment" and "dynamic environment" here is pretty much the same; the former belongs to the macro body, and the latter to the mainline where the macro is expanded. Biggest difference is that the "static environment" also happens in the "static phase" (BEGIN time), of which there is one.

A bit earlier I wrote

(For example, I don't want to uproot everything we've done with Alma and start over, just to do it "right" with fexprs. In fact, I think the end result would maybe look more like Kernel and less like macros for Raku, which wouldn't be in line with 007's or Alma's mission statement.)

This distinction is mostly what I meant. Neither Raku nor Alma would be inclined at all to lose its compilation stage, not even for the pleasing theoretical properties of operatives.

@masak
Owner

masak commented Jan 11, 2021

I want to link to an LTU comment by John Shutt, that explains the origins of Kernel.

Worth reading in its entirety, but here's the central paragraph:

Notice that the key innovation here is not reduction of the usual processing of an ordinary Scheme procedure call to special-procedure calls that don't evaluate their operands; not, somehow, "elimination" of operand-evaluations; but simply factorization of an ordinary Scheme procedure call into two stages: first, evaluate the operands to produce arguments, and second, apply the underlying special-procedure to the argument list. All the things done before are still done, we're just keeping track of them separately.

The factorization is the one where $lambda consists of a $vau and a wrap, as mentioned earlier.

@masak
Owner

masak commented Jan 11, 2021

With the knowledge of Kernel as an intrinsically 1-phase language, I can now amend my reply to @raiph a little:

I'm sorry, no, I don't believe Perl 6 macros (or Alma macros) are fexprs at all. At the risk of revealing my still-very-shaky understanding of fexprs — the big distinction is that in Lisp macros (and Perl 6 macros, and Alma macros), after the macro has expanded at compile time, it's "spent". Only its expansion remains in the target code. A fexpr, on the other hand, inserts itself as the thing to be called, and so itself remains in the target code as the "expansion". That's very different, and according to that difference, Perl 6 does not have fexprs.

A fexpr/operative in Kernel doesn't really insert itself anywhere. Very similar to a function, it just gets invoked. There's no expansion going on. But the interpreter knows the difference between operatives and applicatives, and only evaluates the operands for the latter type.

There's no way for Kernel operatives to be "spent" at compile time, because there's no compile time.

@JonChesterfield

A small note on partial evaluation. Given a call to an unknown function, (f x y z), whether the arguments should be evaluated depends on whether f will resolve to an applicative or an operative. If this is rewritten, preferably by the compiler, to the moral equivalent (i.e. passing environment around) of:

(if (applicative? f)
  (f (eval x) (eval y) (eval z))
  (f x y z))

then the first call can compile to one where the arguments have been (partially-) evaluated and the second call to one where the arguments are passed as data.

I'm pretty sure that's sufficient to eliminate the overhead of fexpr relative to macros. It'll also constant fold correctly in the cases where f later turns out to be known.

@masak
Owner

masak commented Nov 7, 2022

@JonChesterfield Interesting. Kind of a late-bound conditional wrap.

I agree, this looks feasible. I was thinking about how to do exactly this (compile Kernel) yesterday, and your technique will definitely help.

What I would also like to understand better is the following: when do we late-bind operatives? (Not to mention the applicative/operative distinction.) It's possible to read the entire Kernel spec and Shutt's entire PhD thesis and be none the wiser about this. Shutt seems to take pride in not knowing, with phrases like "in fact, not knowing this is an expected feature of a smooth language". [citation needed]


Ok, thinking about this a bit more while getting coffee. I'm not sure we're in the clear yet with this technique. Yes, it will cover the things that are in the code at program start, but — and this is crucial and impossible to ignore with Kernel — it will not cover things that are beyond an eval barrier (which includes most of what operatives do). So

If this is rewritten, preferably by the compiler

(emphasis mine); this is a good start but not enough. Think of the degenerate case of building a list (f x y z) from parts, and passing it to eval; the right thing still needs to happen no matter if f is applicative or operative.

Which is not to say that this technique won't work — it seems to me it might catch a lot of the "low-hanging fruit" in the actual program — only that it needs to be complemented by the regular (late-bound) discrimination between applicatives and operatives in eval.

I think what it comes down to is this: in the general case, we can't eliminate eval from a compiled program. But there might be something like a "90% case" where we can.

@JonChesterfield

I agree with the conclusion. The runtime dispatch is not always eliminated, particularly in cases that look less like macros, so tagging remains. I'm not sure late bound conditional wrap matches the premise though. Perhaps I don't see what late bound means in a single phase language.

My working premise is that operatives and applicatives are the same thing modulo a Boolean flag that eval/apply queries to decide whether to evaluate arguments before binding to the parameter tree. Wrap returns the same underlying function with the Boolean true, applicative? tests it. So both applicative and operative functions exist at runtime, it's only a difference in calling convention.

Replacing evaluated lists with that branch is always meaning-preserving, where we know we have a list being passed to eval. Well strictly it needs to be a cond/case so that lists that don't start with either fail as they should. And care is needed if reflection is involved, depending on how you wish to expose that.

For lists built on the fly that thwart constant folding / partial evaluation, or equivalently when the compiler optimisation is disabled, the function still has the runtime tag for eval/apply to dispatch on. On second thoughts, there's a decent chance of being able to tell eval is being passed a list even when some elements are unknown if the construction is visible, in which case those elements which are known still qualify.

In general compilation is probably necessary to achieve competitive performance. I don't know the design tradeoffs in Bel. For Kernel, I think a caching JIT (transparent to the application) is the way to go.

@masak
Owner

masak commented Nov 7, 2022

Perhaps I don't see what late bound means in a single phase language.

Yes, that was likely a careless phrase on my part. What I meant was more like "we surface in the code itself what used to be a hidden part in combiners".

My working premise is that operatives and applications are the same thing modulo a Boolean flag that eval/apply queries to decide whether to evaluate arguments before binding to the parameter tree. Wrap returns the same underlying function with the Boolean true, applicative? tests it. So both applicative and operative functions exist at runtime, it's only a difference in calling convention.

When you phrased it that explicitly, I realized it's not quite true: wrap takes a combiner, not just an operative. So instead of "Boolean flag", you'd actually have a non-negative number N of wraps; and instead of your if, you'd have a loop doing 1..N evals.
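Roughly, in Python pseudocode (made-up names; evaluate is passed in just to keep the sketch self-contained):

from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Combiner:
    operate: Callable[[list, dict], Any]    # the underlying operative behaviour
    wraps: int = 0                          # 0 = bare operative, N = wrapped N times

def wrap(c: Combiner) -> Combiner:
    return Combiner(c.operate, c.wraps + 1)

def combine(c: Combiner, operands: list, env: dict,
            evaluate: Callable[[Any, dict], Any]) -> Any:
    for _ in range(c.wraps):                # one round of operand evaluation per layer of wrap
        operands = [evaluate(o, env) for o in operands]
    return c.operate(operands, env)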

In general compilation is probably necessary to achieve competitive performance. I don't know the design tradeoffs in Bel. For Kernel, I think a caching JIT (transparent to the application) is the way to go.

Aye; "interpreter with a JIT compiler" is my working hypothesis/hope as well. Basically a lot of assumptions are made, asserted by guards (but usually not broken).

This made me think of both macro-writer's bill of rights and Surgical Precision JIT Compilers.

@JonChesterfield

Ah, so that's a subtlety I missed from the language. I didn't realize an operative could require N applications of wrap to get an applicative. That probably means an applicative can be wrapped to get one that evaluates its arguments twice, up to N. That's not obviously a good property to me, would prefer wrap on an applicative to return an error or the original applicative unchanged.

I hadn't seen the second reference, thanks. That's definitely not what I'm thinking of, as it involves the program explicitly invoking the JIT on parts of itself. By transparent I mean the program runs identically under the interpreter and under the JIT compiler, which means no compiled? predicate and no compile function, as well as no behaviour change other than wall clock time.

Thinking of accepting transforms that send infinite loops in the interpreter to terminating ones under compilation, e.g. discarding an unused argument that loops forever, which is a slightly generous definition of only changing wall clock time.

@masak
Owner

masak commented Nov 8, 2022

By transparent I mean the program runs identically under the interpreter and under the JIT compiler, which means no compiled? predicate and no compile function, as well as no behaviour change other than wall clock time.

I think we have the same goal, but maybe we're thinking of it slightly differently. I agree about "no compiled? predicate" — to me, that's an implementation detail. I also agree about "no behavior change other than wall clock time". Recall that Bel is an untyped language, and that if you want to, you can declare a function with code that's clearly going to throw a runtime error as soon as you run it:

> (def foo ()
    (cdr 'boom))

There's no type checker here that will stop you from expressing this obviously bad thought. If you run it, though...

> (foo)
Error: cdr-on-atom

(In passing: error messages/error handling not yet good enough. Working on that.)

Now, was foo compiled under the hood at any time before I ran it? No, not in current Bel. Could it have been, even with the obvious runtime error? Yes! You can compile Bel code that "doesn't typecheck" into some target form whose meaning is "throw the error cdr-on-atom". Maybe it's just me, but it seems the world has unnecessarily split into two "extreme" views that miss this point:

  • Compileristas: "You can compile functions to efficient target code, but only if they typecheck first."
  • Interpreter beatniks: "We don't like type errors, so we're not going to compile at all."

As seen above, there's a middle ground: typechecking, in whatever form we imagine it, can be a "nice-to-have", but its absence need not rule out compilation — not if we're willing to emit target code that we know will "go wrong". (Edit: or even, in the less extreme case, code that we don't know won't go wrong. That annoying gray area that arises due to Gödel, Turing, and Rice.)
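A toy version of that middle ground (an invented mini-language, with Python closures standing in for the emitted target code):

# The "compiler" happily compiles code it can already see will fail,
# emitting target code whose meaning is "raise cdr-on-atom at runtime".
def compile_expr(expr):
    op, arg = expr
    if op == "cdr" and isinstance(arg, str):       # cdr on a quoted atom: known bad
        def boom():
            raise RuntimeError("cdr-on-atom")
        return boom                                 # still compiles, to a thunk that fails
    if op == "cdr":
        compiled = compile_expr(arg)
        return lambda: compiled()[1:]
    if op == "lit":
        return lambda: arg
    raise ValueError(f"unknown form: {op!r}")

foo = compile_expr(("cdr", "boom"))    # compiles fine, like (def foo () (cdr 'boom))
# foo()                                # only fails when run: RuntimeError: cdr-on-atom
print(compile_expr(("cdr", ("lit", [1, 2, 3])))())   # prints [2, 3]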

Girard makes this point well in Locus Solum (fair warning: this text is heavy stuff, and "not peer-reviewed"). In Example 1 on page 309, he introduces ✠, the daimon 𝕯𝖆𝖎, whose semantics in a program would be any of the following, whichever you feel makes the most sense:

(This suggests a certain kind of REPL- or feedback-based style of programming, well-known in Lisp and Smalltalk circles.)

The point is, we can get a richer notion of typechecking (and other kinds of checking) by not committing the very common error of thinking that type error = go home.

Tying this back to the original topic of compiling code: I think compilation can be an implementation detail. I think we can uphold the "transparency" of compilation (i.e. "any optimization that doesn't get you caught red-handed is allowed") if we're careful enough, and I believe such care consists mostly of what I mention above (never assume that types have to check out or that the typechecker has the right to call the shots), plus code-generating "guards" (in the sense JIT compilation uses them) for things that are impossible to guarantee statically (in Bel, things like untyped function parameters and dynamic variables get guards).
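(For "guards", think of something like this minimal sketch, all names invented:)

# A JIT-style guard: the specialized code checks the assumption it was
# compiled under, and bails out to the general path when it doesn't hold.
def general_add(x, y):                  # the slow, fully general implementation
    return x + y

def specialized_add(x, y):
    if not (type(x) is int and type(y) is int):   # the guard
        return general_add(x, y)                  # deoptimize / fall back
    return x + y                                  # fast path: assumption holds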

I'd say things don't even get particularly interesting or difficult until you start messing with the Bel interpreter. Now, there I confess I don't yet have the full story. But I'm really looking forward to getting to the point where I can easily ask such questions in the form of code.

@masak
Owner

masak commented Nov 8, 2022

(This suggests a certain kind of REPL- or feedback-based style of programming, well-known in Lisp and Smalltalk circles.)

Now I also thought of Programming as collaborative reference by Oleg.

We argue that programming-language theory should face the pragmatic fact that humans develop software by interacting with computers in real time. This interaction not only relies on but also bears on the core design and tools of programming languages, so it should not be left to Eclipse plug-ins and StackOverflow. We further illustrate how this interaction could be improved by drawing from existing research on natural-language dialogue, specifically on collaborative reference.

I think "programming with holes" or the REPL/feedback based style of Lisp or Smalltalk refer to this kind of "collaborative reference", at least after the fact, as it were.

Allowing ✠ 𝕯𝖆𝖎 means making it socially acceptable to manipulate, inspect, and execute incomplete programs. I think the compileristas have adopted a too-narrow view that compilation requires completeness/validity — don't tell me it couldn't be useful (a) to run a program up to the point where it fails (getting, I dunno, side effects and debug/trace output along the way there), or even (b) to run a program to the point where it fails, confirming that it does fail. Getting the failure at runtime is strictly more informative than getting it at compile-time — for one thing, now you know that the failure occurs on a non-dead code path! As someone who likes these tight feedback loops, it counts as feedback saying "this ✠ is the one that needs your attention now". (Counterexample-Driven Development, CDD.) Similar to the beneficial effects of TDD, it's like the execution is throwing you back into the code, pointing to something that doesn't add up, the next counterexample-driven focus for your attention.

@masak
Owner

masak commented Jan 30, 2023

Robert Harper, in this supplement to PFPL, goes (if possible) even further, claiming that there is no essential difference between compilers and interpreters:

Peculiarly, it is commonly thought that interpreters and compilers are somehow in opposition to one another, with interpreters being interactive, and necessarily inefficient, and compilers being non-interactive, but efficient. This is total nonsense! There is no fundamental difference between an interpreter and a compiler. Rather, a compiler, which works by transformation, is just one strategy for implementing an interpreter. There is no bright-line distinction between the two concepts, and neither is inherently tied to being interactive or non-interactive.

For what it's worth, I think this is also part of Futamura's message.

@masak
Owner

masak commented Mar 24, 2023

(How does Kernel fit into all that? That's something I'm still thinking actively about. Kernel takes a lot of the things that have been structured for decades, and "softens them up" again, conceptually taking us back to a soup of GOTOs. Apparently no-one has done the intellectual work to find out whether it must remain a soup, or whether optimizers can in practice re-establish enough guarantees to bring back the structure. Shutt himself clearly suspected that the latter was possible. The author of Klisp wanted to try, but seems to have run out of steam before he started.)

The other day this paper was uploaded to arXiv. It's named "Practical compilation of fexprs using partial evaluation: Fexprs can performantly replace macros in purely-functional Lisp", and (from a quick skim) seems to address the above question of, can we get structure (and performance) back after choosing fexprs?

Hoping to read it more in detail and write more.

@masak
Owner

masak commented Mar 24, 2023

So where did checklet go? What took its place? The answer, prosaic as it is, is "small functions". Bel's style in bel.bel tends towards this already, and it acts as a kind of "community guideline" since there (frankly) isn't much else right now.

I'd like to add two things, almost two years later, as it were.

  1. If you compare the "before" and "after" bits of code in that comment, you'll notice that "before" has one relatively large, monolithic chunk of code, whereas "after" has five relatively smaller functions. That isn't the main characteristic of the refactor, though — I think it's the typical thing where we take on some tighter coupling (between functions) for the sake of higher cohesion (within functions). Someone who wanted to turn this into a principle would say something like "a function should do one thing and one thing only". I find such principles almost completely useless, seeing as how they try to dress up a value judgement as a law. We should be clear that it's a tradeoff — do we want large, monolithic functions, or do we want many small functions? Even though I side with the many small functions in this case, I acknowledge a case could certainly be made that the monolithic function more clearly exposes its control flow.

  2. There's a correspondence here between "before" relying more on Dijkstra's structured programming, and "after" relying more on tail calls. Not sure if it's that spectacular, really; the two while loops in "before" have turned into direct (tail-)recursive calls in "after". It's not that we couldn't still do it with a loop; it's just that it feels more straightforward and shorter this way.

So, why weren't checkpoint, checklet and retry needed in the end? Because the feature they implemented, "go back to this earlier point", was already provided by the much more straightforward mechanism of factoring out a function, and then calling that function to go back.

Right. I agree with past-me. Though I will point out that the first recursive call could have been replaced by a repeat-while loop, and the result would be one line shorter (because we wouldn't need to call ourselves). In the second case, when I try to replace the recursion with a loop, I get loop-and-a-half issues from needing to repeat the sort. Maybe there is a loop solution there, but I don't immediately see it.

@masak
Owner

masak commented Mar 24, 2023

I was reading the first program in Basic Computer Games today, and was struck by how programs were structured back then, in the mid-70s. All those GOTOs, they work, mind you, but they put the onus entirely on me, the reader, to build a structured narrative, consisting of switch statements, if-else chains, loops (with and without breaks), even just well-placed and well-named functions on top of the substrate of GOTOs.

In all fairness I should add this, which I feel matters: the first edition of Dartmouth BASIC already had FOR...NEXT. That is, if your loop had a lower and an upper numeric bound, then you were in luck and could express that in a modern, structured way. (There were also no break and continue statements, although I imagine those could be emulated with GOTOs.)

It also had GOSUB...RETURN, which, you know, functions. Lambda calculus. 😄

@masak
Owner

masak commented Apr 23, 2023

(How does Kernel fit into all that? That's something I'm still thinking actively about. Kernel takes a lot of the things that have been structured for decades, and "softens them up" again, conceptually taking us back to a soup of GOTOs. Apparently no-one has done the intellectual work to find out whether it must remain a soup, or whether optimizers can in practice re-establish enough guarantees to bring back the structure. Shutt himself clearly suspected that the latter was possible. The author of Klisp wanted to try, but seems to have run out of steam before he started.)

There's also this paper, which correctly identifies the problem of "everything gets pessimized because operatives":

In effect, all expressions must remain unmodified in order to not alter the meaning of the program as long as it is not known whether an expression is to be evaluated by an applicative or operative combiner. This precludes all program optimization since there is no way to statically distinguish between applicative and operative combiners in the general case.

And then hints at a solution, which I'll summarize as "phase separation between modules". I don't have a sense of whether that's a good solution, or too limiting in practice, or something else.

@buzmeg

buzmeg commented Apr 23, 2023

The other day this paper was uploaded to arXiv. It's named "Practical compilation of fexprs using partial evaluation: Fexprs can performantly replace macros in purely-functional Lisp", and (from a quick skim) seems to address the above question of, can we get structure (and performance) back after choosing fexprs?

Also, I'd like to point out the GitHub repo and the website, as they don't seem to be in the paper:
https://kraken-lang.org/
https://github.com/limvot/kraken

@masak Did you ever dig further on this?

In particular this is timely given that Zig seems to be somewhat flirting with partial static evaluation with "comptime". They're not going the whole way to changing the evaluation of arguments as far as I can see, though.

In the paper, the particularly interesting thing is just how many eval calls get elided--almost all wrap0 and wrap1. It's huge.

To me, I'm still looking for a small embedded language for microcontrollers that gives me full power but fits on <32KB systems. Kernel was conceptually simple but generated MASSIVE amounts of garbage to hold unwrapped evaluation results as well as environments (to the point I was wondering if we could go back to dynamic environments). I wonder if just a couple of optimizations gets most of the performance without bloating code too far.

@masak
Owner

masak commented Apr 24, 2023

Also, I'd like to point out the Github and website as they don't seem to be in the paper: https://kraken-lang.org/ https://github.com/limvot/kraken

Thanks! Those are some intriguing references from that site, including the one to FUN-GLL, which I didn't know about before. Need to check that out more.

@masak Did you ever dig further on this?

Not the paper, no. I have it open now, but it's unlikely I will find the tuits soon to give it the attention it deserves.

For now, three very minor points:

  • I find it noteworthy (and a bit sad) that Kraken drops the $ sigils on operatives. This is a very "easy" thing to do, since it aligns with the mainstream in some way; note that macros are usually not marked in Lisp and Scheme and Racket dialects. But I know Shutt felt this detail (which, it must be pointed out, counts as a kind of Hungarian Notation, and can't be fully disentangled from Shutt's distaste for static types) was an important contribution, perhaps the most important contribution. On the surface, this sounds like a trifling syntax detail, but I think it's very related to the "loss of static certainty" that partial evaluation is also trying to address; the $ sigils just try to address it at the code/readability level. (See the small example after this list.) When the Kraken page says that Kraken is "heavily inspiried [sic] by John Shutt's thesis", my first thought was "but no dollar signs".

  • From the examples on the web page, one of the first things the authors do is define quote. Yes, that is very easy to do. In fact, the Klisp test suite does the same in order to express certain ideas. But this is another thing that Shutt emphasizes many times, in his dissertation, in the Kernel specification, and in various comments on lambda-the-ultimate: operatives and quote do not mix. It's as if quote is the minimal combinator for introducing wrongness and undesired behavior into an operative-rich programming environment. Shutt usually calls this idea of leaning on quote "unevaluated symbols".

Here, let me excerpt from the Revised-1 Kernel Report:

In R5RS Scheme, the test on the last clause of a cond expression may be replaced by the keyword else. This is something like a local second-class special form operator (flouting G1a of §0.1.2); and, moreover, it would encourage the embedding of unevaluated symbols in expressions constructed for evaluation, and thus (indirectly) the likelihood of accidental bad hygiene (flouting G3 of §0.1.2); so Kernel omits it. The same effect can be achieved at least as clearly, and more uniformly, by specifying #t as the ❬test❭ on the last clause.

(For reference, G1a is "All manipulable entities should be first-class objects" (which else wouldn't be), and G3 is "Dangerous computation behaviors (e.g., hygiene violations), while permitted on general principle, should be difficult to program by accident." (i.e. the use of else and similar unevaluated symbols, and by extension the quote operative which is the primary means of generating unevaluated symbols, is a primary offender in "bad hygiene", and antithetical to the hygienic use of operatives).)

  • Thirdly, and here I will say the least because I haven't read the paper deeply enough, I don't think the technique they're employing would work directly on Kernel itself. Kernel is "too late-bound"; when you realize that things have been set up so that nothing is sacred and even something like $if or $eval can be re-bound in a way that affects all existing definitions, it gets hard even to imagine a solution that speeds things up by partial evaluation. I'm not quite sure how they deal with that in the paper.
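
Regarding the first point, here's a small concrete example of what the sigil buys the reader. Kernel ships both an applicative and? and an operative $and?, and the $ tells you at the call site whether the operands will be evaluated before the combiner runs (launch-missiles is a hypothetical stand-in that returns a boolean):

(and? #f (launch-missiles))    ; applicative: both operands are evaluated first,
                               ; so launch-missiles runs even though the result is #f
($and? #f (launch-missiles))   ; operative: operands are evaluated left to right,
                               ; only as needed, so launch-missiles never runs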

@buzmeg

buzmeg commented Apr 24, 2023

  • when you realize that things have been set up so that nothing is sacred and even something like $if or $eval can be re-bound in a way that affects all existing definitions, it gets hard even to imagine a solution that speeds things up by partial evaluation.

Is this actually true? Aren't most builtins accessed via lexical scoping, which is effectively static (chained to parent)? Yes, you can access "$if" via the dynamic environment, which could get redefined, but you're just not likely to unless you are specifically attempting to emulate dynamic scoping (which couldn't be compiled anyway).

My read of the paper is that it is leaving breadcrumbs on the way down so that if you don't rebind "$if" along the way but bottom out in the partial evaluation, the system will unwind all the breadcrumbs and replace them with "$if symbol from definition environment".

One particular quote from the paper stuck out: "Any use of fexprs that is not partially evaluated away is something that could not be expressed via macros, and thus paying a performance penalty in these cases is not onerous."

@masak
Owner

masak commented Apr 24, 2023

  • when you realize that things have been set up so that nothing is sacred and even something like $if or $eval can be re-bound in a way that affects all existing definitions, it gets hard even to imagine a solution that speeds things up by partial evaluation.

Is this actually true? Aren't most builtins accessed via lexical scoping, which is effectively static (chained to parent)? Yes, you can access "$if" via the dynamic environment, which could get redefined, but you're just not likely to unless you are specifically attempting to emulate dynamic scoping (which couldn't be compiled anyway).

Yes, it's true. Here's my chain of reasoning about Kernel — feel free to fact-check it. I've annotated with section numbers from the Report to show where I get things from.

  • (4.9.1) You can use $define! to override a built-in operative like $if.
  • (3.2) Typically, a Kernel "session" (a REPL or a program) would run in a so-called "standard environment", which is an initially-empty environment with the ground environment as parent. The ground environment is where all the Kernel built-ins live; using $define! to override $if doesn't overwrite $if (this is explicitly disallowed), it only adds a new binding in the standard environment. Therefore, the mutation only affects the current program, but it does affect already-made definitions using $if earlier in the program, since these would make a late-bound lookup and find the new definition.
  • (6.7.3, 6.7.8) The suggested way to "shield" yourself from this kind of mutation by your peers is to use the built-in $let-safe; it's like a $let declaration, but it evaluates the initializer expression and the body in a fresh standard environment. Note how opt-in this is, though; I find it significant that Kernel, with its strict emphasis on not allowing bad behavior by accident, makes the mutating behavior the default in this case. You can totally change the meaning of $if for your peers (functions in the same program), and presumably only the functions that felt it was really important to shield against this would use $let-safe; for the others, the risk of built-in mutation is kind of part of the deal.
  • (15.2) The module system is not fully mapped out in the Kernel Report, but the description of get-module already makes it clear that other modules would evaluate in a fresh standard environment, and so they would be shielded as if by $let-safe.

It's less bad than I remembered (which is a relief), but it's still definitely bad. To what I write above, add the fact that such mutation can happen at any point and in arbitrarily dynamic ways, and that no static analysis will ever be able to catch it reliably due to Rice's theorem. Which makes it impossible to make any reliable assumptions for partial evaluation at compile time.
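
To make the shadowing scenario concrete, here's a minimal sketch (describe is just an illustrative name; this assumes a standard environment as in (3.2)):

($define! describe ($lambda (x) ($if x 1 0)))   ; an "already-made definition" that mentions $if

(describe #t)   ; => 1

;; Shadow $if in the standard environment. The ground binding is untouched,
;; but describe's body looks $if up late and now finds this binding instead.
($define! $if
  ($vau (test then else) env
    ;; deliberately perverse: swap the branches, to make the effect visible
    ($cond ((eval test env) (eval else env))
           (#t              (eval then env)))))

(describe #t)   ; => 0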

@buzmeg

buzmeg commented Apr 24, 2023

To what I write above, add the fact that such mutation can happen at any point and in arbitrarily dynamic ways

While this is allowed, I don't think Shutt expected it to be common.

(5.3.2) Rationale:
In ordinary Kernel programming, applicatives that care about their dynamic environment are rare. Moreover, such applicatives pose a potential hazard to proper tail recursion (as noted in §4.10.3); so, in the interest of making dangerous things difficult to do by accident (G3 of §0.1.2), $lambda constructs compound applicatives that ignore their dynamic environments, encouraging the programmer to flag out the rare dynamic applicatives by explicit use of wrap and $vau. See, for example, §6.7.2
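
(As a minimal illustration of that rationale: the common case stays quiet, and the rare environment-sensitive case is visibly spelled out with wrap and $vau, essentially the way the report derives get-current-environment.)

($define! double ($lambda (x) (* 2 x)))   ; typical applicative: ignores its dynamic environment

($define! get-current-environment         ; the rare, environment-sensitive applicative is
  (wrap ($vau () e e)))                   ; flagged at its definition site by wrap + $vau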

Even $define! is considered "optional" (although I question whether that is really true ...). In addition, if you look at the uses of $define! that aren't just top-level variable setting, there are very few that rely on redefinition to mutate during calls down the stack. And even those can often be replaced by constructs that don't rely on $define!, per se. I don't see any obvious case of Shutt redefining something primitive once he has made it. If Shutt didn't use this stuff when defining his own language, it's probably not a tool he expected others to reach for very often.

I think the only mutational applicative is "set-last!" from "append"/"append!". "$letrec" is a little odd so I'd have to think about it. Everything else is just defining a helper applicative (no operatives).

In my opinion, a lot of the "we can redefine everything" stems from Shutt attempting to make Kernel axiomatic--based around a small, well-understood set of primitives to construct everything and avoid having to provide various pieces of ad hoc "implementation assistance" that other academics would challenge on theoretical grounds. He was, after all, wading into an area that was more than a little contentious by academic standards.

If we are attempting to precompile/partial evaluate for speed, we need "implementation assistance" and that assumption is already out the window. I, personally, don't find the idea of "shadowing a builtin severely hampers your ability to compile/pre-evaluate" a very problematic restriction. Redefining builtins like "$if" would be shocking to most programmers, anyway.

@masak
Owner

masak commented Apr 25, 2023

To what I write above, add the fact that such mutation can happen at any point and in arbitrarily dynamic ways

While this is allowed, I don't think Shutt expected it to be common.

I totally understand where you're coming from. And I agree with the specifics of what you write after that. You are doing me the service right now of arguing in the exact way I was hoping you would, giving voice to one side of this debate (which I've kind of had with myself for the past two years or so).

So, when I disagree on the wider point, please don't take it the wrong way. In fact, I don't know you at all, and I don't have time for "someone is wrong on the Internet"-style discussions. I think there is a really interesting central point that needs to be discussed here, though.

Our discussion is an instance of the following common discussion:

P1: "Behavior X in this programming system is extremely disruptive and destroys all our static analysis!"
P2: "Yeah, but while it is allowed, it's not that common."

One instance of X is "pointer aliasing"; it wreaks havoc with the analysis of programs, up to and including data races, and there are reams of research published on how to combat this. Rust and Facebook Infer can be considered direct attacks on the problem. This story is not fully written.

Another instance of X is "late-bound evaluation of code". Forget about the injection attack problem, that's a red herring. The real issue is that anything might happen in that code, including arbitrary heap mutations. Any assumption about the heap might be invalidated. (Which conservatively means it is invalidated.)

One of my favorite issues in the TypeScript repository is #9998, Trade-offs in Control Flow Analysis. Look how quickly we lose control! The issue thread stays open and keeps being linked from all over the place, because a lot of other issues/requests are actually this issue in disguise.

Optimizations need to be based on some kind of static analysis, and static analysis needs to be "conservative" in the sense that it's not allowed to make assumptions about bad things X not happening — in fact, if X can happen, then for all we know it does, Murphy's Law-style.

In other words, P1 is right: X is bad, and it sucks. P2 is also right (X is less than 100% common), but P2's argument is not strong enough to refute P1's; it does not mean that We Can Have Nice Things again.

This is all very frustrating. We are prevented from doing nice static analyses and nice optimizations by the worst imaginable behavior from the components we can't see into. It has led some people to start talking about "soundiness" as a reasonable compromise and an alternative to strictly conservative analyses. I have a lot of sympathy for that, and I think of it as a pragmatic stance. I include Incorrectness Logic in that stance as well. But I think the trade-off it makes (away from the pessimistic/conservative point) means that it can't be used for optimizations.

What I do think is possible is to optimistically compile the efficient code, and then call it under a "guard" that checks that $if still hasn't been locally shadowed. This is still pretty good, and it can be made reasonably fast. The guards can be made reasonably sparse, because there are big chunks of contiguous code where we do have total knowledge of no dynamic environment mutation. The idea of "guards" is not mine; it's how JIT compilation works when it does specialization of monomorphic code; the specialization is a "gamble" in the sense that past experience may back it up, but the future may contain surprises. I also think this is what Shutt had in mind when he (occasionally) talked about an "optimizing interpreter".
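
A minimal sketch of the guard idea, expressed in Kernel itself rather than in the compiler's internals (original-$if, fast-path and slow-path are hypothetical names; a real implementation would plant the guard in the runtime, not in user code):

($define! original-$if $if)              ; captured when the fast code was compiled

($define! guarded-run
  ($lambda (fast-path slow-path)
    ;; guard: has $if been shadowed since the specialization was made?
    ;; ($cond is used so that the guard itself doesn't go through $if)
    ($cond ((eq? $if original-$if) (fast-path))
           (#t                     (slow-path)))))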

So there is hope. It's just, it's not the wishful-thinking kind of hope, where we say "X is not common, let's ignore X". It's of the complicated kind, where we carefully insert guards that allow us to monitor our wishful assumption, and we stay ready to divert to a slower "Plan B" path when it's invalidated, which is hopefully very rare.

(Does the paper do it this way? I don't know, because I still haven't read it carefully enough. If it doesn't, it's either unsound, or not really capturing Kernel semantics.)

In my opinion, a lot of the "we can redefine everything" stems from Shutt attempting to make Kernel axiomatic--based around a small, well-understood set of primitives to construct everything and avoid having to provide various pieces of ad hoc "implementation assistance" that other academics would challenge on theoretical grounds. He was, after all, wading into an area that was more than a little contentious by academic standards.

Kernel is based on a small, well-understood set of primitives, yes. According to a previous comment in this issue thread, three. And of those three $define! is in an "optional module", because Shutt saw that it might be interesting to study a subset of Kernel where environment mutation is not allowed. (I back this up by agreeing that the loss of $define! would not mean the loss of any real expressivity in the language. Let me handwave that and say that things like recursion and first-class environments are pretty powerful; it's folklore that a global CPS transformation can simulate mutable state.)

For simplicity, let's keep $define! instead of simulating it the hard way. I think it's absolutely true that Shutt was aiming for a "small core" design. But I don't think Shutt was unaware of the redefine-$if consequence. I think he absolutely considered the design option to make $if "fixed" — more like a keyword than a shadowable identifier — and rejected it.

Meaning absolutely no disrespect to those who have passed before us, here's how I think that conversation would go. I'd ask whether name lookup in Kernel is "late-bound". Shutt would say yes, sure, because environments are first-class, and asking "what is myvar" is like sending a message ("at runtime") to the actor-like entity that is the current dynamic environment. I'd ask if that includes $if, and Shutt would say "yes, of course". I'd point out that this would preclude a lot of convenient up-front code optimizations that would assume that the $if in the code is the standard $if. Shutt would say I sound like one of those "preprocessor macros" people or static-types people, who wish for a world that is more orderly and less dynamic than it actually is. He would say there are a lot of very intelligent folks working on things like that, but he doesn't feel a need to add himself to their number, nor to cripple his language just to make their life easier by (say) adding a guarantee that $if (the identifier) always means $if (the built-in operative).

I never met Shutt in real life, but I've read a lot of his stuff at this point. My confidence that the conversation would go something like that is high.

If we are attempting to precompile/partial evaluate for speed, we need "implementation assistance" and that assumption is already out the window. I, personally, don't find the idea of "shadowing a builtin severely hampers your ability to compile/pre-evaluate" a very problematic restriction. Redefining builtins like "$if" would be shocking to most programmers, anyway.

You're either implementing Kernel the way it's specified, or you're not. If you make assumptions for your own convenience, but it means you deviate from the letter of the spec, then wouldn't it be easier to instead pick a language where those assumptions hold in the language specification? "Really late-bound everything" seems to be a vital and non-negotiable part of Kernel's feature set.

@buzmeg

buzmeg commented Jun 5, 2023

If you haven't seen it, there is a nice post by Graydon Hoare talking about Rust ("The Rust I Wanted Had No Future"):
https://graydon2.dreamwidth.org/307291.html

I especially like the comment about lambda and environment capture:

I often say (provocatively) that "I hate lambda", but lambda is actually (at least) two separate language features: one is a notation for anonymous function literals; another is equipping those anonymous function literals with environment capture. I don't really care either way about having such a literal syntax, but I really do dislike equipping it with environment capture and think it's a mistake, especially in a systems language with observable mutation.

It's funny, the high-end CS people keep dancing around the whole idea that the environment is a thing and it really needs to be reified properly.

@masak
Owner

masak commented Jun 6, 2023

If you haven't seen it, there is a nice post by Graydon Hoare talking about Rust ("The Rust I Wanted Had No Future"): https://graydon2.dreamwidth.org/307291.html

Interesting! I hadn't seen it. There's a HN discussion as well (for what it's worth).

I haven't completely digested the entire post, and I'll have to read it again more carefully. But it's easy to see that a lot of it is speculation and what-could-have-been thinking. I guess the truth is that language design and language development are contingent, path-dependent, and have a lot more choices than easy answers.

I especially like the comment about lambda and environment capture:

I often say (provocatively) that "I hate lambda", but lambda is actually (at least) two separate language features: one is a notation for anonymous function literals; another is equipping those anonymous function literals with environment capture. I don't really care either way about having such a literal syntax, but I really do dislike equipping it with environment capture and think it's a mistake, especially in a systems language with observable mutation.

It's funny, the high-end CS people keep dancing around the whole idea that the environment is a thing and it really needs to be reified properly.

It's part of the static/dynamic dichotomy, the eternal divide between Those Who Compile and Those Who Interpret.

If you're building a compiler, then the environment is something like a series of linked stack frames, and it's typically not visible. Some people from this camp even say that making it visible would be a huge security flaw, although others disagree.

If you're building an interpreter, then the environment is not strictly necessary — you could do everything "by substitution", and the environment just represents an obvious laziness-based optimization on that, to avoid traversing subtrees. So typically the environment is present; since the wall between interpreter and interpreted code is so thin and porous, it's much easier in this scenario to want to reify the environment and make it available to the program itself.

So, I dunno. I'm not sure about the phrasing "it [the environment] really needs to be reified properly". 😄 In a much earlier comment in this thread (over two years ago), I talk about the double-edged practice of first-classing things, including environments:

Kernel starts out by first-classing "special forms" (by turning them into operatives), which already totally hoses static analysis, but it can't stop there. The slogan is that "all manipulable entities should be first-class", and so it goes on to first-class environments, hosing static analysis again for emphasis.

I'm reminded of the phrase "what you lose on the swings, you gain on the roundabouts" (clearly coined by some amusement park mogul). In this case, the trade-off involved in reifying and first-classing environments is pretty clear: first-class environments are powerful and flexible in some objective, undeniable sense — but letting them loose like that also means that there's more "fog of war" statically, and less static analysis can be done. Since both flexibility and static guarantees are good things, it's not an obvious choice.

I have this idea of creating an object system in Kernel, a little bit like CLOS maybe. The objects themselves would be environments (for data and methods) wrapped in encapsulations (for access control). I still haven't done that; it would be nice to try. It's hard to say whether the exercise would make me more warm towards first-class environments, or less.
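
For what it's worth, here is roughly the seed of that idea, as a minimal sketch (make-point, norm-squared and $remote-eval are illustrative names; $remote-eval is derivable in one line, and the encapsulation part is exactly what's still missing):

($define! get-current-environment        ; as derivable in the report
  (wrap ($vau () e e)))

($define! $remote-eval                   ; evaluate an expression inside another environment
  ($vau (o e) d (eval o (eval e d))))

($define! make-point
  ($lambda (x y)
    ($sequence
      ($define! norm-squared ($lambda () (+ (* x x) (* y y))))
      (get-current-environment))))       ; the local environment *is* the object

($define! p (make-point 3 4))
($remote-eval (norm-squared) p)          ; => 25
($remote-eval x p)                       ; => 3 -- nothing is private yet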

@buzmeg

buzmeg commented Jun 6, 2023

Kernel starts out by first-classing "special forms" (by turning them into operatives), which already totally hoses static analysis

Except that I'm not necessarily sure it does.

As I poke more at the ideas behind Kernel, one of the things you can explicitly say is "Please evaluate this in the immutable ground environment." Once you do that, things get a lot cleaner.

Yes, you can still do things that rip the ground out from under yourself, but you can also not do them. If you start from the immutable ground environment, the only things not the same as ground are the things that you, yourself, changed. If you only use and define applicatives, then things seem to be able to be compiled.

Of course, once you open up the can of worms with an operative, that all goes out the window. But you don't have to use operatives, that is your choice.

I really wish Shutt had done an implementation of Kernel in something like C or Java. There is some wonkiness around environments and their implications that really needs a solid, non-Lisp implementation to lock down the semantics. With everything being Lisp-to-Lisp, there's a bit too much hand-waviness that you can get away with.

This also applies to a LOT of the "machinery". The whole wrap/unwrap machinery seems overly generic (I don't think I have seen any example that wraps more than once). Having to cons up arguments for applicatives is known to be a performance disaster ("CONS should not CONS its arguments"). Lots of stuff gets merged into a single list only to be consed and unconsed again, when keeping things separate as (operator, args) would avoid generating a whole bunch of garbage from all the prepending/appending. Etc.

Shutt also puts a lot of emphasis on "$define!" to bootstrap everything which completely defeats any attempts at compilation/static analysis. It's not clear that this is actually required--being able to bootstrap from far less powerful primitives would benefit the performance of the interpreter significantly.

@masak
Owner

masak commented Jun 7, 2023

As I poke more at the ideas behind Kernel, one of the things you can explicitly say is "Please evaluate this in the immutable ground environment." Once you do that, things get a lot cleaner.

Yes, you can still do things that rip the ground out from under yourself, but you can also not do them. If you start from the immutable ground environment, the only things not the same as ground are the things that you, yourself, changed. If you only use and define applicatives, then things seem to be able to be compiled.

Of course, once you open up the can of worms with an operative, that all goes out the window. But you don't have to use operatives, that is your choice.

Yes, I've been thinking something similar since my Apr 25 comment. It is as you say: there are entire Kernel programs that one could validate as being non-weird at a glance, for example by noting "we're only calling known-good applicatives here".

It's possible to go even further than that. I bet there's a technique of finding a "trusted kernel" of code at one or more points in the program, and then using that as the base case of an inductive argument showing that (in many cases) the whole program is trusted. In this case, I use "trusted" to specifically mean "doesn't mess with environments in a dynamic way".

I still need to write that out formally, but I have a feeling it's true. The point is that a lot of the desired optimizations can be made, in the comfortable shadow of such an inductive argument.

This also applies to a LOT of the "machinery". The whole wrap/unwrap machinery seems overly generic (I don't think I have seen any example that wraps more than once). Having to cons up arguments for applicatives is known to be a performance disaster ("CONS should not CONS its arguments"). Lots of stuff gets merged into a single list only to be consed and unconsed again, when keeping things separate as (operator, args) would avoid generating a whole bunch of garbage from all the prepending/appending. Etc.

Yes.

I think it's the same dynamic here. Once we have the inductive trust argument, we can also start reasoning about which values escape and which ones don't. For values that don't escape, an optimizer is free to change representations and improve performance under the hood.

Shutt also puts a lot of emphasis on "$define!" to bootstrap everything which completely defeats any attempts at compilation/static analysis. It's not clear that this is actually required--being able to bootstrap from far less powerful primitives would benefit the performance of the interpreter significantly.

I'm not sure this is a big deal-breaker either. In Scheme, there's a restriction saying that define declarations have to come first in a block, for reasons having to do with simpler static analysis. Even though Kernel doesn't have such a restriction, I bet that if you went looking at how $define! is actually used, you'd find it used first in a block in 90% of the cases or more. A static analysis could show (in many cases) that after a given point, the environment doesn't get mutated because there are no $define! forms, the environment doesn't escape, etc.
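
A sketch of the kind of block such an analysis could bless (illustrative names only): all the $define!s come first, nothing captures or returns the local environment, so after the last $define! the local bindings are effectively frozen.

($define! example
  ($lambda (x)
    ($sequence
      ($define! double ($lambda (y) (* 2 y)))
      ($define! offset 10)
      (+ (double x) offset))))   ; from here on, the local environment is stable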

@masak
Owner

masak commented Jun 27, 2023

Coming at this from a slightly different angle.

I've been trying to think up a simple module system for Bel. Modules don't have to have a static component, but it's not far-fetched for them to do so, since this is a sensible level at which to ask "what names does it export?" This is a static question, in the sense of "shouldn't have to run the code to know that". (I can't back up this sentiment with any fact — it's more of a handwaving argument, along the lines of "what the heck would you be doing in your module code such that it makes the question 'what are its exports?' a hard one to answer!?".)

At the same time, I don't want to dumb it down or put up limits just for the sake of some module-analyzing code. Consider this first part of a prelude.bel module I am using more and more frequently:

(def caar (x) (car (car x)))
(def cdar (x) (cdr (car x)))

(def caaar (x) (car (caar x)))
(def caadr (x) (car (cadr x)))
(def cadar (x) (car (cdar x)))
(def cdaar (x) (cdr (caar x)))
(def cdadr (x) (cdr (cadr x)))
(def cddar (x) (cdr (cdar x)))
(def cdddr (x) (cdr (cddr x)))

(def caaaar (x) (car (caaar x)))
(def caaadr (x) (car (caadr x)))
(def caadar (x) (car (cadar x)))
(def caaddr (x) (car (caadr x)))
(def cadaar (x) (car (cdaar x)))
(def cadadr (x) (car (cdadr x)))
(def caddar (x) (car (cddar x)))
(def cadddr (x) (car (cdddr x)))
(def cdaaar (x) (cdr (caaar x)))
(def cdaadr (x) (cdr (caadr x)))
(def cdadar (x) (cdr (cadar x)))
(def cdaddr (x) (cdr (caddr x)))
(def cddaar (x) (cdr (cdaar x)))
(def cddadr (x) (cdr (cdadr x)))
(def cdddar (x) (cdr (cddar x)))
(def cddddr (x) (cdr (cdddr x)))

(Did you spot the error? I deliberately changed caaddr to have the wrong definition, and you didn't even notice, did you? The body is just boilerplate-ly repeating the information in the function name, and it quickly blends into noise.)

I'm writing it as those repetitive top-level definitions right now mainly due to "speed constraints". In a faster Bel, the first thing I would consider would be to start abstracting and writing it something like this:

(def-cxrs
  caar cdar
  caaar caadr cadar cdaar cdadr cddar cdddr
  caaaar caaadr caadar caaddr cadaar cadadr
  caddar cadddr cdaaar cdaadr cdadar cdaddr
  cddaar cddadr cdddar cddddr)

For some appropriately-defined def-cxrs macro.

We would then be in a situation where a bunch of defs are effectively code-generated by the def-cxrs macro, and (which is saying the same thing) syntactically abstracted by that macro. A module-analyzer whose job it is to statically answer "what are the exports?" would need to opportunistically expand the macro call, because the definitions of interest are in the expansion. This should be OK, because the def-cxrs is "first-order" or "harmless" in a sense I'm afraid I can't make very precise — it's a code-shuffling template-based thing which doesn't require any dynamic context or timing; furthermore, this can be seen from the macro's definition. (Very similar to the "trusted kernel" reasoning above.)

I hereby coin the term "scrutable" for module code that meets this definition. We want to be able to ask "what are the exports?", and not just get a clear answer back (which might involve opportunistically expanding macros a bit), but also have a guarantee that we didn't miss any definitions hiding anywhere. I posit that this is possible, because macros tend to be very well-behaved in the sense that they depend on static stuff, not dynamic stuff. Module code needs to be scrutable, which is what guarantees that its exports are statically known.

Keep in mind that in Bel, macros expand late, at the same time as functions are called. So the fact that this should hold true and be possible is not obvious, and more than a little surprising. Bel macros have the ability to do dynamic stuff, but none of them actually use it.

@masak
Owner

masak commented Jun 27, 2023

(continuing)

It's a unification of these two points of view:

  • Static: The module is the document. (What you see is what you get.)
  • Dynamic: The module is the side effects (changes to globe) resulting from running some code.

What the adjective "scrutable" does is establish a kind of peace treaty, an effective bridge between these two views, by saying that you can reliably get from the document to the side effects — essentially because the syntax-abstracting macros are upper-bounded in how weird they are. Moreover, this can also be checked statically.

Just for completeness, here's a thing you cannot do in a scrutable module file:

(def random-symbol ()
  ;; some code that puts together a random 5-letter symbol
  ..)

(mac def-random ()
  `(def ,(random-symbol) ()
     nil))

(def-random)  ;; define and export something random

@masak
Owner

masak commented Aug 23, 2023

What I would also like to understand better is the following: when do we late-bind operatives? (Not to mention the applicative/operative distinction.) It's possible to read the entire Kernel spec and Shutt's entire PhD thesis and be none the wiser about this. Shutt seems to take pride in not knowing, with phrases like "in fact, not knowing this is an expected feature of a smooth language". [citation needed]

I found the relevant Shutt quote I must have half-remembered above, from the comments of this blog post:

In particular, I can't think of any good uses for higher-order operatives.

If you think you're on to an interesting appraoch, pursue it. That said, I'm not concerned with a dearth of higher-order operatives (though $make-tag-predicate in my dissertation, a basic technique for avoiding implicit evaluation, comes close). I see the Smoothness Conjecture (dissertation section1.1.2) as predicting that one can't foresee specific particular benefits of smoothness.

That dissertation link has bitrotted; here is a working link.

Just for completeness, here is the Smoothness Conjecture, verbatim from section 1.1.2:

(Smoothness Conjecture) Every roughness (violation of smoothness) in a language design ultimately bounds its radius of abstraction.

Implicit in all of this is (of course) that lambdas/functions factor into, on the one hand, vaus/operatives/fexprs, and on the other hand, evaluation of the operands into values before passing them — therefore vaus are more "smooth" than lambdas. (Also, pointed out in that same comment, lambda calculus is really about operatives, not about functions.)

I'm all for smoothness, and I still agree with the reasons to seek it out... but it's worth thinking about how the paradigmatic countercurrent of macros and compilation came about. It has something to do with preferring not just dynamism and infinite flexibility, but also static guarantees, hardcoding/partial evaluation, and (not putting too fine a point on it) performance.

@masak
Owner

masak commented Aug 25, 2023

In stark contrast to the "what should you use fexprs for?" undercurrent developing through this issue thread, I just found this example of the opposite sentiment — "what shouldn't you use fexprs for?" — at the end of a 2020 blog post by Shutt (about a conlang):

As it happens, I have a notion how to approach this, dormant since early in the development of my dissertation: I've had in mind that, if (as I've been inclined to believe for some years now) fexprs ought to replace macros in pretty much all situations where macros are used, then it follows that TeX, which uses macros as its basic extension mechanism, should be redesigned to use fexprs instead. LaTeX is a (huge) macro package for TeX.

If fexprs ought to replace macros in pretty much all situations where macros are used, then... not only does a quite big overlap between macros and fexprs need to exist, in order for the replacement to be possible at all; it also suggests that fexprs are somehow more suited for the role macros have taken on. In other words, there's a point (in feature-providing state space) where macros cannot follow, and fexprs just keep going.

I'm curious about that point. I'm curious about real-world concrete examples of such fexprs. These fexprs must, by construction, contain some aspect of runtime-only information, or decision-making delayed until runtime, which macros are too static to emulate. Annoyingly, as Shutt himself has pointed out, actually trying to guess at this group of fexprs (and their benefits) is Hard.

Perhaps Alan Kay's dictum, that "the best way to predict the future is to invent it" applies here. I simply need to build enough awesomely dynamic fexprs until I stumble on a useful one. 😄

@masak
Owner

masak commented Sep 22, 2023

I've been meaning to write about an idea I've been throwing around inside my head for a while: in brief, what's with the dangerous and accident-prone separation of Lisp code/syntax, and its appropriate environment?

But I just found that in this LtU thread, Ray Dillinger talks about exactly that. John Shutt replies (with what looks like skepticism).

I haven't read the comment and the replies carefully enough to say anything more. I just wanted to dump it here for reference. Maybe I'll come back to this idea when I have read it carefully and/or thought about it some more.

@masak
Owner

masak commented Jan 23, 2024

Here's something I'd like to put into writing; been thinking about it for a while (some of which has leaked into above discussions).

  • Consider all the things defined in bel.bel: functions, macros, values. What my current Bel implementation does is interpret these from the source code, which is excruciatingly slow, mostly due to a kind of inefficiency introduced by many, many layers of function calls. Think of it, conceptually, as an enormous dependency pyramid of definitions, most of which are at least a few levels up from the ground, many of which are a ridiculous number of levels up.
  • So we precompile the whole thing. This makes sense. I'm still in the process of writing the compiler that will do this, but there is no essential difficulty here. We don't even need to worry about compilation time, because this task is an offline, one-off thing, and we can ship Bel with the resulting optimized bytecode. (Even so, I'm guessing it will be relatively fast; seconds, at most minutes.)
  • But Bel is a radically dynamic language, and built-ins can be redefined. When they are, they trigger an "invalidation" of all the downstream things that depended on the redefined thing. This is a cascading, transitive action which I think needs to be atomic/GIL'd, on pain of producing observably different results from just a pure interpreter, breaking (as it were) an implicit guarantee of compilation that it not alter semantics.
  • Of course, once some definition has changed and its downstream dependents are invalidated, what do you do? You recompile them to new fast bytecode forms and clear their "invalid" bit once more. The question is when and how.
  • Doing it synchronously (holding up all other evaluation) would be straightforward, and is maybe an option for user-defined functions (whose dependents are likely few and shallow). But it would lead to severe delays (even in the optimistic case of "seconds" above) in the case of definitions near the bottom of the pyramid (i.e. imagine redefining cdr or no). I'm fairly sure this isn't a viable option, but I'm still looking forward to confirming this in practice.
  • The diametric opposite, doing it lazily, on demand (triggering compilation only when something invalidated is called) is possibly a better idea, but can also lead to delays when a large graph of dependencies needs to be recompiled; worse, such delays happen at the worst possible time, when we know that we need to run this function and are waiting to do so.
  • Beyond those two extremes, what should actually happen depends on whether we're in the REPL or running a program.
    • If we're in the REPL, my guess is that the time between redefining a function and hitting Enter on the next evaluation is long enough (and consists mostly of free gaps between handling "UI events" on the prompt) that we can just recompile everything in the background, on a separate background thread, as it were. Someone who wanted to call an invalidated function could probably do so, but it'd be a real race against the background thread. Largely, there'd be no visible effect of recompilation at all.
    • If we're running a program, things are trickier. There is now a more severe zero-sum game between executing subsequent code and recompiling — even if we do the latter in a separate thread. Maybe in some cases we can see from the immediately subsequent code which invalidated functions will be used, and give those priority. But such an action is by its nature partial (because we can't see all dynamic events from the static perspective), and might not help much anyway. In that scenario, we need to make peace with the fact that we'll sometimes call an invalidated function.
      • What do we do in such a case? Well, we can block, of course, bringing us back to the synchronous strategy above. Again, this is straightforward, but it feels a little sad that this puts us in a situation where compilation literally introduces delays in our fast, compiled code.
      • Or we can switch over to the interpreter (or more exactly, the interpreter, supplemented with calls to all the fast bytecode that wasn't invalidated) while we recompile the relevant dependency graph in the background (force-killing any recompilation tasks which aren't involved in that dependency graph; hm, that might be too simplistic). We might either let the interpreter finish running that function or, once compilation is finished and the "invalid" bit has been cleared, we could do on-stack replacement and switch the running function over to the bytecode. (And there needs to be support in the interpreter for doing that.)

Anyway, all this falls out from the interaction between two factors: supplanting interpretation with compilation (for performance) and keeping the original promise of radical mutability of everything, including built-ins.
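
To pin down the "cascading, transitive" step from the third bullet above, here is a small functional sketch in Kernel-ish code (the deps alist, mapping a name to the list of its direct dependents, is an assumed representation; a real Bel implementation would store its dependency graph differently):

($define! contains?                       ; membership test using equal?
  ($lambda (xs x)
    ($cond ((null? xs)           #f)
           ((equal? (car xs) x)  #t)
           (#t                   (contains? (cdr xs) x)))))

($define! invalidated                     ; everything downstream of root, root included
  ($lambda (deps root)
    ($letrec ((walk ($lambda (pending seen)
                      ($cond ((null? pending) seen)
                             ((contains? seen (car pending)) (walk (cdr pending) seen))
                             (#t ($let ((entry (assoc (car pending) deps)))
                                   (walk (append (cdr pending)
                                                 ($if (pair? entry) (cdr entry) ()))
                                         (cons (car pending) seen))))))))
      (walk (list root) ()))))

;; Toy usage, with numbers standing in for names: 1 is depended on by 2 and 3,
;; and 3 is depended on by 4. Redefining 1 therefore invalidates all four.
($define! deps (list (list 1 2 3) (list 3 4)))
(invalidated deps 1)   ; => (4 3 2 1)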

@masak
Owner

masak commented Aug 5, 2024

Besides being closed and not really actionable, this issue thread is also low-traffic and mostly "saturated" these days.

But I'd like to submit, for the pleasure of what dedicated readership it might still have, Manuel Simoni's LispX, which I just discovered via this blog post of his about a topic which would be better suited for issue #569.

If you dig a bit into LispX, you will find it's built on top of Shutt's vau calculus. That makes it, alongside klisp, one of the few projects that actually implement the ideas from Kernel — but unlike klisp, LispX presents a much more Common-Lispy surface language to the user. Which is kinda cool.
