bootstrap syntax-case and quasiquote using racket #115
Turns out (raw-syntax-case stx), with zero cases to match on, does not fail at runtime with an error message complaining that stx doesn't match any of its (zero) cases, but instead fails at compile time because that's invalid syntax for raw-syntax-case.
thus making it possible to define multiple macros which share the same keywords.
not on an intermediate-stx. an intermediate-stx is a piece of Klister code, typically the variable name 'raw-stx, which the generated code will match on or copy its location.
previously, intermediate-define-syntax was silently introducing a variable named raw-stx, and intermediate-quasiquote was silently assuming its existence. no more! it is now bound and passed along explicitly.
previously, intermediate-syntax-case was generating a `(...) expression, and thus right-hand-sides could use ,(...) expressions to splice some code. that was strange because those ,(...) were not lexically surrounded by `(...). Now each right-hand-side is a Racket expression, not a Klister expression, and must thus use `(...) itself if it wants to return a Klister expression.
eliminate a lot of mind-bending complexity like ,#,(...) blocks, at the cost of slightly more verbose call sites like (generate-syntax-case 'my-macro 'raw-stx (list 'keyword) (list (cons '() 'rhs1) (cons '((a b) (c d)) 'rhs2) (cons '(keyword tail ...) 'rhs3))) instead of (intermediate-syntax-case (my-macro raw-stx) (keyword) [() 'rhs1] [((a b) (c d)) 'rhs2] [(keyword tail ...) 'rhs3])
I thought comparing an identifier with a number would give False, but instead it fails with a runtime error.
this will be useful when implementing fancy-quasiquote
there is only one define-macros and one stx, no need to distinguish intermediate-stx from raw-stx from racket-stx.
not the loc of the implementation of quasiquote
the old quasiquote didn't support splicing-in lists, so we had to use cons-list-syntax instead.
given a list of N+1 elements, it assigns the first N to xs and the last one to x. This is convenient, but it doesn't generalize well: what if we had (,@(list xs ...) ,@(list ys ...)), which fraction should be assigned to which side? for this reason, generate-syntax-parse doesn't support this kind of pattern, and thus we cannot use it to implement fancy-syntax-parse. since generate-syntax-parse is used as a template demonstrating how to implement such a function, I've updated generate-syntax-parse to use the more reasonable pattern (x ,@xs).
it is easy to emulate using generate-quasiquote-inside, and it is not nearly as useful as generate-quasiquote-inside.
there's only one function left, might as well give it the simpler name.
simplifies code-generation. another alternative would have been to define, for each Klister form, a Racket macro which mimics that form and outputs racket code which constructs the Klister code for that form by taking Racket expressions producing the code for the arguments and splicing the results in the generated code for the form. That would require work proportional to the number of syntax forms in Klister, whereas the auto-splice approach only requires work proportional to the number of generate- functions defined in bootstrap.rkt, which is currently much smaller.
"fancy-syntax-case" is only used in the implementation, to distinguish between the many different variants of syntax-case in bootstrap.rkt; no need to show that name to the user.
raw-syntax-case only supported shallow pattern-matching, thus requiring us to write several nested raw-syntax-case calls in order to destructure the input. with fancy-syntax-case, this can now be done in a single call.
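To make the earlier note about the "first N elements to xs, last one to x" pattern concrete, here is a small sketch using Racket's own `match`, not the PR's generate- functions. The names `split-last` and `split-first` are made up for illustration; the point is only to contrast the two pattern shapes.

```racket
#lang racket

;; Hypothetical helper: bind everything but the last element to xs,
;; and the last element to x. This mirrors the pattern that was dropped.
(define (split-last lst)
  (match lst
    [(list xs ... x) (list xs x)]))

;; Hypothetical helper: bind the first element to x and the rest to xs.
;; This mirrors the (x ,@xs) shape, which has only one possible split.
(define (split-first lst)
  (match lst
    [(list x xs ...) (list x xs)]))

(split-last '(1 2 3))   ; => '((1 2) 3)
(split-first '(1 2 3))  ; => '(1 (2 3))
```

With a single repeated sub-pattern the split is forced, but with two adjacent repeated sub-pattern there are many valid splits, so a matcher has to pick a policy. Supporting only the head-then-rest shape sidesteps that choice.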
I care more about the idea than about the two macros. In particular, it's probably a good idea to provide nested patterns to the user in a different way, e.g. by providing functions converting between a |
Reviewers, don't be afraid of the |
Note that this PR introduces a huge performance regression: the test suite used to pass in 16 seconds, but now it takes 63 seconds! I think this is partially because the generated |
; [_ (failure-cc)]))]
; (failure-cc)))))
(define (generate-syntax-case macro-name stx-expr keywords cases)
(letrec ([stx-name (gensym 'stx-)]
I have to use gensym a lot. One reason is that when the generated code is written to a file, only the names of the symbols are written; if there was any in-memory information about scope, such as which macro invocation introduced which name, that information is lost.
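As a rough illustration of why generated code needs fresh names, here is a sketch in plain Racket. Both generators are hypothetical and are not taken from bootstrap.rkt; they only show what goes wrong when a code generator picks a fixed name for its own temporary.

```racket
#lang racket

;; Hypothetical generator: binds a temporary called x, which captures
;; any x that appears in the caller-supplied expression.
(define (gen-add-one-naive expr)
  `(let ([x 1]) (+ x ,expr)))

;; Same generator, but the temporary gets a fresh name from gensym,
;; so the caller's x is left alone.
(define (gen-add-one-gensym expr)
  (define tmp (gensym 'x-))
  `(let ([,tmp 1]) (+ ,tmp ,expr)))

;; Evaluate the generated code in a context where the caller's x is 10.
(define ns (make-base-namespace))
(define (run-with-x-bound-to-10 code)
  (eval `(let ([x 10]) ,code) ns))

(run-with-x-bound-to-10 (gen-add-one-naive 'x))   ; => 2, the caller's x was captured
(run-with-x-bound-to-10 (gen-add-one-gensym 'x))  ; => 11
```

Note that once such code is printed to a file and read back, the fresh symbol becomes an ordinary interned symbol, so uniqueness then rests only on the generated name not appearing anywhere else.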
(match guard-rhs
[`(,guard ,rhs)
`(>>= ,guard
(lambda (guard-approves)
Technically I should be using gensym a lot more, for this guard-approves for example. However, generate-syntax-case is only used in this file, so the set of identifiers with which we might clash is known and finite, so for most symbols, I did not bother to use gensym. Even lambda and >>= could theoretically be shadowed, so using gensym everywhere would make the code a lot less readable.
stx)
stx))]
[(,pat-head ,pat-tail ...)
(>>= (make-temporary 'tail)
I think I might not have the right approach to writing macros. I don't like writing my recursive calls by generating code which calls my macro again, because that requires encoding all the information required by that recursive call in a syntax object. I find it much clearer to write recursive functions like fancy-case and fancy-cases which simply take ordinary values as input and return ordinary values as output, and then to have my macro call those functions in order to produce the final syntactic object. Unfortunately, I have found that this approach has a disadvantage: those helper functions all run in the same macro context, and so all the identifiers they manipulate have the same scope! Hygiene is thus not helping me here, and I have to call make-temporary (the Klister version of gensym). Is there a third alternative?
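The "ordinary recursive function" style described above can be sketched in Racket as follows. This is not the PR's fancy-case; `compile-pattern` is a made-up, much simplified helper that turns a nested list pattern into nested shallow checks, using gensym where the Klister code would call make-temporary.

```racket
#lang racket

;; Hypothetical helper: compile a nested pattern into code that
;; destructures scrut one level at a time, then runs body.
;; A pattern is a symbol (binds a variable), '() (matches the empty
;; list), or a pair of patterns.
(define (compile-pattern pat scrut body)
  (cond
    [(symbol? pat)
     `(let ([,pat ,scrut]) ,body)]
    [(null? pat)
     `(if (null? ,scrut) ,body (error "no match"))]
    [(pair? pat)
     ;; Fresh name for the intermediate value, since every recursive
     ;; call runs in the same context and would otherwise reuse it.
     (define tmp (gensym 'stx-))
     `(let ([,tmp ,scrut])
        (if (pair? ,tmp)
            ,(compile-pattern (car pat) `(car ,tmp)
               (compile-pattern (cdr pat) `(cdr ,tmp) body))
            (error "no match")))]))

;; Generate the matcher for a nested pattern and run it on sample data.
(define code
  (compile-pattern '((a b) (c d)) ''((1 2) (3 4)) '(list a b c d)))
(eval code (make-base-namespace))  ; => '(1 2 3 4)
```

The recursion passes plain values (the pattern, a scrutinee expression, a body expression) instead of re-invoking a macro, which is the trade-off the comment describes: simpler data flow, but no help from hygiene for the temporaries.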
([fancy-unquote (lambda (stx) (syntax-error '"unquote used out of context" stx))]
[fancy-... (lambda (stx) (syntax-error '"... used out of context" stx))]
[fancy-_ (lambda (stx) (syntax-error '"_ used out of context" stx))]
[fancy-syntax-case
This definition of fancy-syntax-case is almost the same as the definition of generate-syntax-case, except that it uses the Klister names for things instead of the Racket names. I considered writing a macro which would allow me to write the code once in some language-agnostic way and then specialize it to both Racket and Klister. I'm still not sure if that would make the code simpler or more complicated.
('"syntax-case: the input"
,stx-name
'"does not match any of the following patterns"
',(map car cases))
I spent some effort on the quality of my error messages. Most of them are for when syntax-case itself is misused, but I'm especially proud of this one: if you define a macro using fancy-syntax-case, the callers of that macro also get a nice error message if they misuse that macro!
[(_ ,pat)
(let [stx-name (generate-quasiquote
',(replace-loc pat 'here)
'here)]
That incantation from quasiquote.kl allowed me to get error messages to point to the callers of fancy-quasiquote rather than to the source code of fancy-quasiquote, but I don't fully understand the magic behind the incantation.
(pure `(define-macros
([,macro-name
(lambda (stx)
(raw-syntax-case stx
I'm still using raw-syntax-case here so that I don't have to generate a variant of args in which every argument is wrapped in an unquote. I guess that's one small downside of fancy-syntax-case's unorthodox syntax.
(replace-identifier arg `(,'unquote ,arg) t))
template
args))
(quasiquoted-template <- (pure `(,'quasiquote ,unquoted-template)))
(pure ``,unquoted-template) would have worked too, but only because I don't implement the R6RS spec for nested quotations, according to which the result of
(let ([x 'foo])
``,x)
should be
`,x
not
`foo
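For comparison, Racket's built-in quasiquote does track nesting levels, so the two spellings give different results there. The sketch below is plain Racket, not Klister, and the names `nested` and `explicit` are made up for illustration.

```racket
#lang racket

(define x 'foo)

;; The inner unquote belongs to the inner quasiquote, so x is not
;; evaluated: the result is the datum (quasiquote (unquote x)).
(define nested ``,x)

;; Here the symbol quasiquote is produced by an unquote, so it does not
;; open a new nesting level and x is evaluated: (quasiquote foo).
(define explicit `(,'quasiquote ,x))

(equal? nested (list 'quasiquote (list 'unquote 'x)))  ; => #t
(equal? explicit (list 'quasiquote 'foo))              ; => #t
```

This is why the explicit spelling is the safer one for code generation: it produces the same result whether or not the quasiquote implementation tracks nesting.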
([,macro-name
(lambda (stx)
(raw-syntax-case stx
[(list (_ ,args ...))
Once again, I am using raw-syntax-case in order to avoid generating a variant of args in which every argument is wrapped in an unquote. In this case, however, it wouldn't have been a big deal to do so, because I am already generating a variant of template in which every argument is wrapped in an unquote.
(import (shift "let.kl" 1))
(import (shift "dot-dot-dot.kl" 1))
(import (shift "do-keywords.kl" 1))
(import "do-keywords.kl")
These last two imports are a symptom of something really sad about fancy-syntax-case: keywords must be visible both at the macro-definition phase and at the macro-caller's phase. This can either be done by defining them twice (once with meta) or by defining them in a separate module and then importing that module twice. Either way, it's really annoying.
The reason they need to be available in the macro-definition phase is that fancy-syntax-case uses free-identifier=? to check that every keyword used in the patterns has been listed in the keyword list. But if the keyword isn't in scope, then the occurrence of the keyword in the keyword list is not free-identifier=? to the occurrence in the pattern, resulting in spurious "did you mean to add the symbol to the keyword list?" errors.
The reason they need to be available in the macro-caller's phase is that the generated code uses free-identifier=? to compare the input syntax object against the various patterns, including against keywords. So keywords must be in scope in the phase in which the generated code runs or they won't be free-identifier=?.
One solution could be to drop that keyword list, and to trust that fancy-syntax-case's caller doesn't accidentally write [(_ cond then-expr else-expr) ...] instead of [(_ ,cond ,then-expr ,else-expr) ...]. But since this is an unorthodox syntax, I expect that mistake to be quite common!
My preferred solution would be to add a new primitive, define-keyword, which would bind an identifier at all phases. It would hardcode the value to which it is bound to a macro complaining that the keyword is used out of context, and thus we won't have to worry about things like whether the closure of the definition includes values which only exist at some phases. Would that make sense?
(lambda (x)
(syntax-case x
[(ident x) (true)]
[_ (false)])))
A few modules were defining this same helper function, so I moved it to its own module so I could use it in fancy-syntax-case as well.
(syntax-case stx ()
[(_ ,scrut ,cases ...)
(pure (replace-loc stx
`(let (x ,scrut) (free-identifier-case-aux x ,cases ...))))]))]
Here, I slightly tweaked the syntax of free-identifier-case-aux: a call now looks like
(free-identifier-case-aux (just 3)
  [(just x) x]
  [(nothing) 0])
instead of
(free-identifier-case-aux (just 3)
  ([(just x) x]
   [(nothing) 0]))
I suspect the reason for these seemingly-spurious extra parentheses is simply that it was not easy to construct the recursive call using quasiquote/loc. Since fancy-quasiquote now makes it easy, I removed the spurious parentheses.
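The difference between the two call shapes comes down to whether the quasiquote can splice a list. A rough Racket analogue, using Racket's `,@` where fancy-quasiquote would use `,cases ...`, might look like this; both helper names are hypothetical.

```racket
#lang racket

;; Without splicing, the remaining cases can only be inserted as one
;; element, which forces the extra pair of parentheses.
(define (recursive-call-wrapped scrut rest-cases)
  `(free-identifier-case-aux ,scrut ,rest-cases))

;; With splicing, each remaining case becomes its own argument.
(define (recursive-call-spliced scrut rest-cases)
  `(free-identifier-case-aux ,scrut ,@rest-cases))

(define cases '([(just x) x] [(nothing) 0]))

(recursive-call-wrapped '(just 3) cases)
;; => '(free-identifier-case-aux (just 3) (((just x) x) ((nothing) 0)))
(recursive-call-spliced '(just 3) cases)
;; => '(free-identifier-case-aux (just 3) ((just x) x) ((nothing) 0))
```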
Speaking of quasiquote/loc: I considered writing a fancy-quasiquote/loc around fancy-quasiquote, but since fancy-quasiquote can be invoked very easily using a single backtick character, there didn't seem to be much value over simply calling replace-loc around the result of a fancy-quasiquote call.
(syntax-case stx (else)
[(_ ,scrut)
(pure '(syntax-error '"Nothing matched"))]
[(_ ,scrut ((else ,x) ,val))
Note how much simpler it now is to match on the else keyword!
Speaking of the else keyword, I didn't have to define it in a separate module and then import it twice because else is provided by prelude.kl.
[(cons car cdr)
(cons-list-syntax car
(append-list-syntax cdr tail stx)
stx)]))
I added these two new helper functions in order to make it easier to write the code-generation code. I'm not using anything else from this module, but this seemed like the best place to put them.
(meta
(define identifier?
(lambda (stx)
(syntax-case stx [(ident x) (true)] [_ (false)]))))
Moved to identifier.kl.
[(cons x xs) (:: x (syntax->list xs))])))

(meta
(define else-case
else-case is no longer needed; it's much easier to use a nested pattern on the else keyword.
[(,x ,xs ...)
(>>= (syntax->list xs)
(lambda (list)
(pure (:: x list))))])))
This definition of syntax->list is longer than before... and it has a different type! That's because while raw-syntax-case is a pure expression, fancy-syntax-case runs in the Macro monad, so that we can use free-identifier=? to compare identifiers to keywords.
[(:: id-and-stx rest)
`(ppat ,(fst id-and-stx) ,(snd id-and-stx) ,(combine rest) ,kf)]))
(pure `(case ,tgt
[,(cons-list-syntax what (list->syntax temp-names args) args)
Without the nested calls to raw-syntax-case, we no longer have the name pat to refer to the (,what ,args ...) part of the input. So I used args instead. I don't think this matters much?
@langston-barrett as discussed, this is the branch in which I am trying to define a variant of The reason I want to delete it is because it is needlessly repetitive, so I want to try a different approach. @david-christiansen suggested defining a Racket https://github.com/gelisam/klister/blob/d2f9680d27a1e39d3df5012091cdecf37972c09f/bootstrap/README.md |
This PR provides two things: an idea for bootstrapping complex macros, and an example implementation of two such macros. The two macros are:
- a syntax-case variant supporting keywords (e.g. else below), nested patterns, and matching on the rest of the list using ,xs ...
- a quasiquote which also supports ,xs ... for splicing-in a list.
Here is an example demonstrating those features:
Next, the idea. Theoretically it should be possible to implement many quality-of-life improvements as libraries, e.g. by writing a Klister macro implementing a fancy-syntax-case which supports nested patterns in terms of a primitive raw-syntax-case which does not. In practice, however, the lack of nested patterns makes raw-syntax-case so inconvenient that it makes that task very difficult. My idea to make that task easier is to generate the Klister code which implements fancy-syntax-case in terms of raw-syntax-case, and to write the code-generation tool in a more convenient language which does support nested patterns and more.
I chose Racket. Note that the task is now to generate the code for a Klister macro; it is not to define a Racket macro. Nevertheless, Racket's macros were very helpful, as I wrote myself a DSL making it easy to generate Klister code. The DSL looks like regular Klister code, augmented with a magic generate-syntax-case primitive which makes it look like Klister already supports nested pattern-matching.