bootstrap syntax-case and quasiquote using racket #115

Open. 35 commits to merge into base: main

gelisam (Owner) commented Oct 18, 2020

This PR provides two things: an idea for bootstrapping complex macros, and an example implementation of two such macros. The two macros are:

  1. a syntax-case variant supporting keywords (e.g. else below), nested patterns, and matching on the rest of the list using ,xs ....
  2. a variant of quasiquote which also supports ,xs ... for splicing-in a list.

Here is an example demonstrating those features:

(define-macros
  ([multiway-if
    (lambda (stx)
      (syntax-case stx (else)
        [(_ [else ,e])
         (pure e)]
        [(_ [,cond ,e] ,cases ...)
         (pure `(if ,cond ,e (multiway-if ,cases ...)))]))]))

Next, the idea. Theoretically it should be possible to implement many quality-of-life improvements as libraries, e.g. by writing a Klister macro implementing a fancy-syntax-case which supports nested patterns in terms of a primitive raw-syntax-case which does not. In practice, however, the lack of nested patterns makes raw-syntax-case so inconvenient that the task becomes very difficult. My idea for making that task easier is to generate the Klister code which implements fancy-syntax-case in terms of raw-syntax-case, and to write the code-generation tool in a more convenient language which does support nested patterns and more.

I chose Racket. Note that the task is now to generate the code for a Klister macro; it is not to define a Racket macro. Nevertheless, Racket's macros were very helpful, as I wrote myself a DSL making it easy to generate Klister code. The DSL looks like regular Klister code, augmented with a magic generate-syntax-case primitive which makes it look like Klister already supports nested pattern-matching.
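To make the idea concrete, here is a minimal Racket sketch of such a generator. It is not the code from bootstrap.rkt: the function name generate-nested-match, the pattern representation (a plain symbol binds a variable, '() matches the empty list), and the exact shape of the emitted raw-syntax-case clauses are all assumptions made for illustration.

```racket
#lang racket

;; Hypothetical sketch, not the bootstrap.rkt implementation.
;; Compiles a nested pattern into Klister-like code which only uses
;; shallow matches. A pattern is:
;;   '()            matches the empty list
;;   a symbol       binds that variable to the matched syntax
;;   (head . tail)  matches a non-empty list, recursively
;; `success` and `failure` are the pieces of code to splice in.
(define (generate-nested-match stx-name pattern success failure)
  (match pattern
    ['()
     `(raw-syntax-case ,stx-name
        [() ,success]
        [_ ,failure])]
    [(? symbol? var)
     `(let [,var ,stx-name] ,success)]
    [(cons head tail)
     ;; Fresh names, so that nested matches never capture each other.
     (define head-name (gensym 'head-))
     (define tail-name (gensym 'tail-))
     `(raw-syntax-case ,stx-name
        [(cons ,head-name ,tail-name)
         ,(generate-nested-match
           head-name head
           (generate-nested-match tail-name tail success failure)
           failure)]
        [_ ,failure])]))

;; One nested pattern becomes several nested shallow matches.
(pretty-write
 (generate-nested-match 'stx '((a b) c) '(pure (list a b c)) '(fail)))
```

The generator is an ordinary recursive Racket function over ordinary data, so it can use Racket's own nested match even though the code it emits cannot.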

Turns out (raw-syntax-case stx), with zero cases to match on, does not
fail at runtime with an error message complaining that stx doesn't match
any of its (zero) cases, but instead fails at compile time because
that's invalid syntax for raw-syntax-case.

thus making it possible to define multiple macros which share the same
keywords.

not on an intermediate-stx. an intermediate-stx is a piece of Klister
code, typically the variable name 'raw-stx, which the generated code
will match on or whose location it will copy.

previously, intermediate-define-syntax was silently introducing a
variable named raw-stx, and intermediate-quasiquote was silently
assuming its existence. no more! it is now bound and passed along
explicitly.

previously, intermediate-syntax-case was generating a `(...) expression,
and thus right-hand-sides could use ,(...) expressions to splice some
code. that was strange because those ,(...) were not lexically
surrounded by `(...). Now each right-hand-side is a Racket expression,
not a Klister expression, and must thus use `(...) itself if it wants to
return a Klister expression.

eliminate a lot of mind-bending complexity like ,#,(...) blocks, at the
cost of slightly more verbose call sites like

    (generate-syntax-case 'my-macro 'raw-stx (list 'keyword)
      (list
        (cons '()
              'rhs1)
        (cons '((a b) (c d))
              'rhs2)
        (cons '(keyword tail ...)
              'rhs3)))

instead of

    (intermediate-syntax-case (my-macro raw-stx) (keyword)
      [()
       'rhs1]
      [((a b) (c d))
       'rhs2]
      [(keyword tail ...)
       'rhs3])

I thought comparing an identifier with a number would give False, but
instead it fails with a runtime error.

this will be useful when implementing fancy-quasiquote

there is only one define-macros and one stx, no need to distinguish
intermediate-stx from raw-stx from racket-stx.

not the loc of the implementation of quasiquote

the old quasiquote didn't support splicing-in lists, so we had to use
cons-list-syntax instead.

given a list of N+1 elements, it assigns the first N to xs and the last
one to x. This is convenient, but it doesn't generalize well: what if we
had (,@(list xs ...) ,@(list ys ...)), which fraction should be assigned
to which side? for this reason, generate-syntax-parse doesn't support
this kind of pattern, and thus we cannot use it to implement
fancy-syntax-parse. since generate-syntax-parse is used as a template
demonstrating how to implement such a function, I've updated
generate-syntax-parse to use the more reasonable pattern (x ,@xs).
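
The ambiguity can be made concrete with a small Racket sketch. The helper all-splits is hypothetical and not part of bootstrap.rkt; it only enumerates the candidate matches. A pattern made of two adjacent splices can match a list of N elements in N+1 different ways, so some arbitrary policy would be needed to pick one.

```racket
#lang racket

;; Hypothetical helper: enumerate every way to divide `lst` between
;; two adjacent splices, as (prefix suffix) pairs.
(define (all-splits lst)
  (for/list ([i (in-range (add1 (length lst)))])
    (list (take lst i) (drop lst i))))

(all-splits '(a b c))
;; => '((() (a b c)) ((a) (b c)) ((a b) (c)) ((a b c) ()))
```

By contrast, (x ,@xs) has exactly one decomposition for any non-empty list.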

it is easy to emulate using generate-quasiquote-inside, and it is not
nearly as useful as generate-quasiquote-inside.

there's only one function left, might as well give it the simpler name.

simplifies code-generation. another alternative would have been to
define, for each Klister form, a Racket macro which mimics that form and
outputs Racket code which constructs the Klister code for that form by
taking Racket expressions producing the code for the arguments and
splicing the results in the generated code for the form. That would
require work proportional to the number of syntax forms in Klister,
whereas the auto-splice approach only requires work proportional to the
number of generate- functions defined in bootstrap.rkt, which is
currently much smaller.

"fancy-syntax-case" is only used in the implementation, to distinguish
between the many different variants of syntax-case in bootstrap.rkt; no
need to show that name to the user.

raw-syntax-case only supported shallow pattern-matching, thus requiring
us to write several nested raw-syntax-case calls in order to destructure
the input. with fancy-syntax-case, this can now be done in a single
call.

gelisam (Owner, Author) commented Oct 18, 2020

I care more about the idea than about the two macros. In particular, it's probably a good idea to provide nested patterns to the user in a different way, e.g. by providing functions converting between a Syntax and an ADT representation of a syntax object, as @david-christiansen suggested. But I think this Racket DSL will provide a good foundation for tackling any future bootstrapping issues we might encounter.

gelisam (Owner, Author) commented Oct 18, 2020

Reviewers, don't be afraid of the +1,818 −218 summary, the actual diff is much smaller! The main addition is the +507 of bootstrap.rkt, plus +23 for two new helper functions in list-syntax.kl. The rest consists of +1,072 of generated code (I don't think bootstrap.rkt will change very often, so I think it's better to include the generated examples/dot-dot-dot.kl in the repository in order to avoid making Racket an extra requirement for building Klister), and the remaining +216 −218 consists of simplifications to other .kl files which previously had to use nested calls to syntax-case but can now use nested patterns.

gelisam (Owner, Author) commented Oct 18, 2020

Note that this PR introduces a huge performance regression: the test suite used to pass in 16 seconds, but now it takes 63 seconds! I think this is partially because the generated dot-dot-dot.kl file is relatively large (1 KLOC) and takes 1.5 seconds to load. Multiplied by the 16 files which import it directly or indirectly, this accounts for half of the slowdown. I guess the other half is because fancy-syntax-case does more work than raw-syntax-case? Or maybe it generates code which does more work?
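
As a quick sanity check on that estimate, the arithmetic can be redone using only the numbers quoted above:

```racket
#lang racket

(define slowdown (- 63 16))    ; total slowdown of the test suite, in seconds
(define load-cost (* 1.5 16))  ; 1.5 s per load, 16 files importing dot-dot-dot.kl
(/ load-cost slowdown)         ; fraction of the slowdown explained: about 0.51
```

So loading the generated file accounts for roughly 24 of the 47 extra seconds, assuming each of the 16 importing files pays the full 1.5 seconds.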

; [_ (failure-cc)]))]
; (failure-cc)))))
(define (generate-syntax-case macro-name stx-expr keywords cases)
(letrec ([stx-name (gensym 'stx-)]

I have to use gensym a lot. One reason is that when the generated code is written to a file, only the names of the symbols are written; if there was any in-memory information about scope, such as which macro invocation introduced which name, that information is lost.
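
A small Racket sketch of that loss of information; the round-trip through a string stands in for writing the generated code to a file and loading it again:

```racket
#lang racket

;; A gensym is a fresh, uninterned symbol: distinct from every other symbol.
(define fresh (gensym 'stx-))

;; Simulate writing the generated code to a file and reading it back.
(define reloaded
  (read (open-input-string (format "~s" fresh))))

(symbol-interned? fresh)            ; #f
(symbol-interned? reloaded)         ; #t
(eq? fresh reloaded)                ; #f: the in-memory identity is gone
(equal? (symbol->string fresh)
        (symbol->string reloaded))  ; #t: only the spelling survives
```

Since only the spelling survives, the generated names have to be distinct as text, which the numeric suffix added by gensym provides.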

(match guard-rhs
[`(,guard ,rhs)
`(>>= ,guard
(lambda (guard-approves)

Technically I should be using gensym a lot more, for this guard-approves for example. However, generate-syntax-case is only used in this file, so the set of identifiers with which we might clash is known and finite, so for most symbols I did not bother to use gensym. Even lambda and >>= could theoretically be shadowed, so using gensym everywhere would make the code a lot less readable.

stx)
stx))]
[(,pat-head ,pat-tail ...)
(>>= (make-temporary 'tail)

I think I might not have the right approach to writing macros. I don't like writing my recursive calls by generating code which calls my macro again, because that requires encoding all the information required by that recursive call in a syntax object. I find it much clearer to write recursive functions like fancy-case and fancy-cases which simply take ordinary values as input and return ordinary values as output, and then to have my macro call those functions in order to produce the final syntactic object. Unfortunately, I have found that this approach has a disadvantage: those helper functions all run in the same macro context, and so all the identifiers they manipulate have the same scope! Hygiene is thus not helping me here, and I have to call make-temporary (the Klister version of gensym). Is there a third alternative?

([fancy-unquote (lambda (stx) (syntax-error '"unquote used out of context" stx))]
[fancy-... (lambda (stx) (syntax-error '"... used out of context" stx))]
[fancy-_ (lambda (stx) (syntax-error '"_ used out of context" stx))]
[fancy-syntax-case

This definition of fancy-syntax-case is almost the same as the definition of generate-syntax-case, except that it uses the Klister names for things instead of the Racket names. I considered writing a macro which would allow me to write the code once in some language-agnostic way and then specialize it to both Racket and Klister. I'm still not sure if that would make the code simpler or more complicated.

('"syntax-case: the input"
,stx-name
'"does not match any of the following patterns"
',(map car cases))

I spent some effort on the quality of my error messages. Most of them are for when syntax-case itself is misused, but I'm especially proud of this one: if you define a macro using fancy-syntax-case, the callers of that macro also get a nice error message if they misuse that macro!

[(_ ,pat)
(let [stx-name (generate-quasiquote
',(replace-loc pat 'here)
'here)]
gelisam commented Oct 18, 2020

That incantation from quasiquote.kl allowed me to get error messages to point to the callers of fancy-quasiquote rather than to the source code of fancy-quasiquote, but I don't fully understand the magic behind the incantation.

(pure `(define-macros
([,macro-name
(lambda (stx)
(raw-syntax-case stx
gelisam commented Oct 18, 2020

I'm still using raw-syntax-case here so that I don't have to generate a variant of args in which every argument is wrapped in an unquote. I guess that's one small downside of fancy-syntax-case's unorthodox syntax.

(replace-identifier arg `(,'unquote ,arg) t))
template
args))
(quasiquoted-template <- (pure `(,'quasiquote ,unquoted-template)))

(pure ``,unquoted-template)

would have worked too, but only because I don't implement the R6RS spec for nested quotations, according to which the result of

(let ([x 'foo])
  ``,x)

should be

`,x

not

`foo
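
For reference, Racket's own quasiquote does implement the nesting rule described here, so the two behaviours can be compared directly in Racket (this is Racket code, not Klister code):

```racket
#lang racket

;; An unquote nested inside two quasiquotes is kept as data.
(define nested
  (let ([x 'foo])
    ``,x))

;; A single quasiquote around a quoted 'quasiquote symbol evaluates the
;; unquote, which is the construction used in the diff above.
(define explicit
  (let ([x 'foo])
    `(,'quasiquote ,x)))

(equal? nested '(quasiquote (unquote x)))  ; #t, i.e. `,x
(equal? explicit '(quasiquote foo))        ; #t, i.e. `foo
```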

([,macro-name
(lambda (stx)
(raw-syntax-case stx
[(list (_ ,args ...))

Once again, I am using raw-syntax-case in order to avoid generating a variant of args in which every argument is wrapped in an unquote. In this case, however, it wouldn't have been a big deal to do so, because I am already generating a variant of template in which every argument is wrapped in an unquote.

(import (shift "let.kl" 1))
(import (shift "dot-dot-dot.kl" 1))
(import (shift "do-keywords.kl" 1))
(import "do-keywords.kl")

These last two imports are a symptom of something really sad about fancy-syntax-case: keywords must be visible both at the macro-definition phase and at the macro-caller's phase. This can either be done by defining them twice (once with meta) or by defining them in a separate module and then importing that module twice. Either way, it's really annoying.

The reason they need to be available in the macro-definition phase is that fancy-syntax-case uses free-identifier=? to check that every keyword used in the patterns has been listed in the keyword list. But if the keyword isn't in scope, then the occurrence of the keyword in the keyword list is not free-identifier=? to the occurrence in the pattern, resulting in spurious "did you mean to add the symbol to the keyword list?" errors.

The reason they need to be available in the macro-caller's phase is because the generated code uses free-identifier=? to compare the input syntax object against the various patterns, including against keywords. So keywords must be in scope in the phase in which the generated code runs or they won't be free-identifier=?.

One solution could be to drop that keyword list, and to trust that fancy-syntax-case's caller doesn't accidentally write [(_ cond then-expr else-expr) ...] instead of [(_ ,cond ,then-expr ,else-expr) ...]. But since this is an unorthodox syntax, I expect that mistake to be quite common!

My preferred solution would be to add a new primitive, define-keyword, which would bind an identifier at all phases. It would hardcode the value to which it is bound to a macro complaining that the keyword is used out of context, and thus we won't have to worry about things like whether the closure of the definition includes values which only exist at some phases. Would that make sense?

(lambda (x)
(syntax-case x
[(ident x) (true)]
[_ (false)])))

A few modules were defining this same helper function, so I moved it to its own module so I could use it in fancy-syntax-case as well.

(syntax-case stx ()
[(_ ,scrut ,cases ...)
(pure (replace-loc stx
`(let (x ,scrut) (free-identifier-case-aux x ,cases ...))))]))]

Here, I slightly tweaked the syntax of free-identifier-case-aux: a call now looks like

(free-identifier-case-aux (just 3)
  [(just x) x]
  [(nothing) 0])

instead of

(free-identifier-case-aux (just 3)
  ([(just x) x]
   [(nothing) 0]))

I suspect the reason for these seemingly-spurious extra parentheses is simply that it was not easy to construct the recursive call using quasiquote/loc. Since fancy-quasiquote now makes it easy, I removed the spurious parentheses.
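
The difference can be illustrated in plain Racket, where ,@ plays a role analogous to the ,cases ... splice of fancy-quasiquote. This is only an analogy written in Racket; the names scrut and cases are placeholders:

```racket
#lang racket

(define cases '([(just x) x] [(nothing) 0]))

;; Without splicing, the list of cases is inserted as a single element,
;; hence the extra pair of parentheses.
(define wrapped `(free-identifier-case-aux scrut ,cases))
;; => '(free-identifier-case-aux scrut (((just x) x) ((nothing) 0)))

;; With splicing, each case becomes its own argument.
(define spliced `(free-identifier-case-aux scrut ,@cases))
;; => '(free-identifier-case-aux scrut ((just x) x) ((nothing) 0))
```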


Speaking of quasiquote/loc: I considered writing a fancy-quasiquote/loc around fancy-quasiquote, but since fancy-quasiquote can be invoked very easily using a single backtick character, there didn't seem to be much value over simply calling replace-loc around the result of a fancy-quasiquote call.

(syntax-case stx (else)
[(_ ,scrut)
(pure '(syntax-error '"Nothing matched"))]
[(_ ,scrut ((else ,x) ,val))

Note how much simpler it now is to match on the else keyword!


Speaking of the else keyword, I didn't have to define it in a separate module and then import it twice because else is provided by prelude.kl.

[(cons car cdr)
(cons-list-syntax car
(append-list-syntax cdr tail stx)
stx)]))

I added these two new helper functions in order to make it easier to write the code-generation code. I'm not using anything else from this module, but this seemed like the best place to put them.

(meta
(define identifier?
(lambda (stx)
(syntax-case stx [(ident x) (true)] [_ (false)]))))

Moved to identifier.kl.

[(cons x xs) (:: x (syntax->list xs))])))

(meta
(define else-case

else-case is no longer needed; it's much easier to use a nested pattern on the else keyword.

[(,x ,xs ...)
(>>= (syntax->list xs)
(lambda (list)
(pure (:: x list))))])))

This definition of syntax->list is longer than before... and it has a different type! That's because while raw-syntax-case is a pure expression, fancy-syntax-case runs in the Macro monad, so that we can use free-identifier=? to compare identifiers to keywords.

[(:: id-and-stx rest)
`(ppat ,(fst id-and-stx) ,(snd id-and-stx) ,(combine rest) ,kf)]))
(pure `(case ,tgt
[,(cons-list-syntax what (list->syntax temp-names args) args)

Without the nested calls to raw-syntax-case, we no longer have the name pat to refer to the (,what ,args ...) part of the input. So I used args instead. I don't think this matters much?

Base automatically changed from master to main January 12, 2021 23:13
gelisam (Owner, Author) commented Mar 9, 2021

@langston-barrett as discussed, this is the branch in which I am trying to define a variant of syntax-case which supports deep patterns. There is a lot of code in this branch already, but most of it is just to change the existing syntax-case calls to use the version which supports deep patterns, and the rest is some Racket code which I plan to delete anyway.

The reason I want to delete it is because it is needlessly repetitive, so I want to try a different approach. @david-christiansen suggested defining a Racket #lang which implements the Klister syntax, but during our call I said that I could not remember how that would help. I am happy to say that I have now figured out how it would help. To make sure I don't forget again, I wrote a document explaining my plan in detail:

https://github.com/gelisam/klister/blob/d2f9680d27a1e39d3df5012091cdecf37972c09f/bootstrap/README.md
