Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Typed Macros #1251

Merged
merged 45 commits into from
Jun 27, 2022
Merged

Typed Macros #1251

merged 45 commits into from
Jun 27, 2022

Conversation

Kha
Copy link
Member

@Kha Kha commented Jun 24, 2022

Compared to macro languages similar to Lean but with "simple" syntax structure such as Racket, optional syntax parts and slight syntactic variations are a frequent source of bugs in Lean macro programming resulting in ill-formed syntax trees.

macro "make_unary" e:term " => " f:term : command =>
  `(def foo $e := $f)  -- oops, `def` parameters do not accept `term` (while others like `fun` do)
make_unary (n : Nat) => n  -- "unknown identifier 'n'"??

macro "mk_0" id:ident : command =>
  `(def $id := 0)  -- oops, `def` actually expects a `declId`, with is an `ident` with an optional universes specifier
mk_0 bar  -- works only because it's such a frequent bug that we included a workaround :/
-- the pretty printer still complains

The most robust solution to date, apart from learning the grammar by heart, was to excessively apply antiquotation kinds like $e:term and $id:ident everywhere, which is not pleasant to read or write. However, Lean already has this handy feature called "types" (though they overdid it, I've heard) to detect ill-formed programs before running them, which this PR puts to use on the syntax manipulation level as well.

application type mismatch
  e.raw
argument
  e
has type
  TSyntax `term : Type
but is expected to have type
  TSyntax `Lean.Parser.Command.optDeclSig : Type

The first example errors with the above when defining the macro, while the second one magically constructs the correct syntax tree now via a coercion from TSyntax `ident to TSyntax `Lean.Parser.Command.declId.

Main surface-level changes

  • a new type TSyntax k that represents a syntax tree of syntax kind k
    • analogously, TSepArray k sep as well as TSyntaxArray k := Array (TSyntax k)
    • in truth, all these are parameterized by an entire list of syntax kinds ks of choices for the run-time syntax kind, though the singleton kind case is by far the most common and presented as above by a coercion and unexpander
    • comes with various helpful coercions, see Init.Meta
      • the insight that a relatively small set of these would be sufficient in practice instead of synthesizing them automatically from
        the grammar or something was the main reason I even embarked on this journey at this point!
  • quotation terms as well as antiquotations produce/accept the corresponding kinded type

Notably, the following fundamental parts did not change (yet) and still expect/produce Syntax:

  • Lean.Macro, as macros are currently unaware of the syntax category they are operating in (and mixing up syntax categories at the macro input/output boundary seems quite rare compared to errors like the above)
    • which technically refutes the PR title, but I like this name by Leo too much to get hung up on such details
  • the various elaborator types, low priority due to the same reason and expected churn in the code base
  • the discriminants of syntax match, necessarily due to the above

Notable implementation changes/refactorings

I've tried to categorize all the breakage after update-stage0 into different commits to give an idea as to what and how many changes are necessary to adapt existing code, as well as what could still be improved:

  • ddafed3 chore: changes to placate coercions
    7 files changed, 13 insertions(+), 12 deletions(-)

  • 596d0f2 refactor: make more use of quotations
    Automatically fixes many TSyntax type errors
    15 files changed, 96 insertions(+), 129 deletions(-)

  • c2fab9d chore: raw syntax kind accesses
    Sometimes just checking the kind simply is the simplest solution.
    5 files changed, 7 insertions(+), 7 deletions(-)

  • 013c815 chore: work around parameterized parser syntax kinds
    We should find a better solution for calling parameterized parsers such
    as matchAlt than these helper definitions that confuse the antiquotations.
    6 files changed, 16 insertions(+), 16 deletions(-)

  • debb002 fix: incorrect antiquotation kind annotations
    Apparently, none of them led to incorrect syntax trees, but TSyntax
    still disapproves.
    8 files changed, 33 insertions(+), 33 deletions(-)

  • ee98c91 chore: work around for type inference
    It seems type information often flows from the body to the loop
    variable, which is problematic with coercions.
    3 files changed, 3 insertions(+), 3 deletions(-)

  • 869ecdf chore: postpone TSyntax adoption for some parts
    The namespace TSyntax.Compat can be opened for a coercion
    Syntax -> TSyntax k for any k, as otherwise this PR would never be done.
    16 files changed, 25 insertions(+), 9 deletions(-)

  • e71e4ac fix: unclear TSyntax breakage
    1 file changed, 1 insertion(+), 1 deletion(-)

  • 4efeef2 refactor: adapt raw syntax manipulations to TSyntax
    Sometimes there's just no structure to work on
    15 files changed, 49 insertions(+), 44 deletions(-)

  • 5f415af refactor: remove some unnecessary antiquotation kind annotations
    These are just the ones I stumbled upon, there should be a lot more now
    rendered obsolete by the coercions.
    7 files changed, 12 insertions(+), 13 deletions(-)

  • ffdf7ae refactor: strengthen Syntax signatures
    Most notable change: Quote is now parameterized by the target kind.
    Which means that Name etc. could actually have different
    implementations for quoting into term and level, if that need ever
    arises.
    30 files changed, 162 insertions(+), 154 deletions(-)

  • 233669b chore: work around macro limitations
    It would be nice if macro was as expressive as syntax quotations, but
    right now it's not.
    4 files changed, 15 insertions(+), 11 deletions(-)

Kha added 30 commits June 24, 2022 17:58
Would be nice to generalize this at some point
It would be nice if `macro` was as expressive as syntax quotations, but
right now it's not.
Most notable change: `Quote` is now parameterized by the target kind.
Which means that `Name` etc. could actually have different
implementations for quoting into `term` and `level`, if that need ever
arises.
These are just the ones I stumbled upon, there should be a lot more now
rendered obsolete by the coercions.
Sometimes there's just no structure to work on
The namespace `TSyntax.Compat` can be opened for a coercion
`Syntax -> TSyntax k` for any `k`, as otherwise this PR would never be done.
It seems type information often flows from the body to the loop
variable, which is problematic with coercions.
@Kha
Copy link
Member Author

Kha commented Jun 24, 2022

And yes, this unapologetically breaks basically all the procedural Lean macros mentioning Syntax out there.

@tydeu
Copy link
Member

tydeu commented Jun 25, 2022

@Kha Quick short suggestion I had while pursuing the code: It would be nice to have some abbreviations for common TSyntax kinds. For example:

abbrev Term := TSyntax `term
abbrev Command := TSyntax `command
abbrev Ident := TSyntax identKind
abbrev StrLit := TSyntax strLitKind
abbrev NumLit := TSyntax numLitKind
-- ... and more

It would also save a lot of horizontal space in signatures. 😉

@Kha
Copy link
Member Author

Kha commented Jun 25, 2022

Yes, I think that's a good idea. They should probably go into Lean.Syntax, though we may want to export them into Lean.

Comment on lines 261 to 265
instance : Coe (TSyntax charLitKind) (TSyntax `term) where
coe s := ⟨s.raw⟩

instance : Coe (TSyntax numLitKind) (TSyntax `prec) where
coe s := ⟨s.raw⟩
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could make these instances "safe" by moving Unhygienic into here so that we can do e.g. Unhygienic.run `(prec| $s:num)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a command that automates this construction (presumably with some unconventional parser use)?

tsyntax_coe ident : declId

-- "expands" to
instance : Coe (TSyntax `ident) (TSyntax ``declId) where
  coe s := Unhygienic.run `(declId|$s:ident)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was deliberating/dreading that idea... :)

@gebner
Copy link
Member

gebner commented Jun 25, 2022

in truth, all these are parameterized by an entire list of syntax kinds ks of choices for the run-time syntax kind, though the singleton kind case is by far the most common and presented as above by a coercion and unexpander

Can you explain what the list means? Is the syntax meant to be an element of all of the "kinds", or one of the "kinds", or something else entirely? (I'm putting "kind" in quotes here because it seems to refer to both node kinds as well as category names.)

Maybe an example for when you need a non-singleton list would help as well.

@Kha
Copy link
Member Author

Kha commented Jun 25, 2022

in truth, all these are parameterized by an entire list of syntax kinds ks of choices for the run-time syntax kind, though the singleton kind case is by far the most common and presented as above by a coercion and unexpander

Can you explain what the list means? Is the syntax meant to be an element of all of the "kinds", or one of the "kinds", or something else entirely? (I'm putting "kind" in quotes here because it seems to refer to both node kinds as well as category names.)

Maybe an example for when you need a non-singleton list would help as well.

The run-time kind may be any of the given kinds. This is mostly relevant for parsers of the shape ... (a <|> b) ..., where the quotation `(... $x ...) will bind x : TSyntax [`a, `b] (when those are the kinds produced by the parsers; a or b being a token is problematic as before). It took me quite a while to realize that

  • this is important to have if we want to keep nested uses of <|> (I encountered it quite a bit in relation to uses of the various binder parsers), and
  • this can be implemented relatively easily by relaxing the eager nature of <|> for the special case of antiquotations, without affecting semantics or performance of the regular cases. Now I wonder if we can apply this principle to antiquotations in other places as well, e.g. to improve $[ splice parsing over the current "all or nothing" semantics resulting from a hidden atomic(...).

@Kha
Copy link
Member Author

Kha commented Jun 25, 2022

An example for a choice of kinds and where coercions are doing a lot of work behind the secnes to translate syntax between contexts:

    let structInstFields ← letIdDecls.mapM fun
      ...
        `(structInstField|$id:ident := $val)
    let body ← `({ $structInstFields,* })

Without the coercion from TSyntax [k] to TSyntax (k :: _):

argument
  structInstFields
has type
  Array (TSyntax `Lean.Parser.Term.structInstField) : Type
but is expected to have type
  Syntax.TSepArray [`Lean.Parser.Term.structInstFieldAbbrev, `Lean.Parser.Term.structInstField] "," : Type

@gebner
Copy link
Member

gebner commented Jun 25, 2022

The run-time kind may be any of the given kinds. This is mostly relevant for parsers of the shape ... (a <|> b) ..., where the quotation `(... $x ...) will bind x : TSyntax [`a, `b] (when those are the kinds produced by the parsers; a or b being a token is problematic as before).

Thank you for the explanation! Now the list makes sense.

@Kha
Copy link
Member Author

Kha commented Jun 25, 2022

  • Now I wonder if we can apply this principle to antiquotations in other places as well, e.g. to improve $[ splice parsing over the current "all or nothing" semantics resulting from a hidden atomic(...).

Thinking further on this, relaxing withAntiquot antiP p semantics to "longest match of antiP and p" would allow us to write

`(structInstField| $id := $val)

without the need for $id:ident to placate the parser.

Going further, since longestMatch builds choice nodes by default, this would mean that in def $id we would actually have id : TSyntax [`ident, ``declId], reducing the need for coercions. In quotation patterns, we should probably keep id : TSyntax ``declId however as the most general kind if that asymmetry is acceptable. And for categories we should definitely still rely on coercions. This would require the quotation elaborator to choose the corresponding choice alternative based on the run time kind of id, though we may want to do that in any case https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/Typed.20Macro.20Issues.3F/near/287989131.

@Kha
Copy link
Member Author

Kha commented Jun 25, 2022

!bench

@leanprover-bot
Copy link
Collaborator

Here are the benchmark results for commit 5b8d558.
There were significant changes against commit 220d2e3:

  Benchmark   Metric         Change
  ============================================
- stdlib      instructions       3% (1088.0 σ)
- stdlib      task-clock         3%  (722.9 σ)
- stdlib      wall-clock         3%   (33.6 σ)
- unionfind   instructions       1%  (147.4 σ)

src/Lean/Level.lean Outdated Show resolved Hide resolved
@Kha
Copy link
Member Author

Kha commented Jun 27, 2022

We're intending to merge this today unless there are objections. tsyntax_coe would be a nice-to-have, but it doesn't save all that much typing and I would first like to see how many coercions people are adding in any case.
Note that there will not be a nightly anyway until the Windows CI is fixed...

@Kha Kha merged commit 19c2644 into leanprover:master Jun 27, 2022
@Kha Kha deleted the tsyntax branch June 27, 2022 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants