Pattern matching implementation #33

haxscramper · 2020-10-25T12:24:46Z

Pattern matching implementation as described in nim-lang/RFCs#245

WIP article on how to use this for writing macros.

Changed: - Update unit tests - add more examples from documentation, fix edge cases and test agains different allowed variants of the pattern - Check for consistensy of documentation and implementation - Simplify some code in implementation - Remove exports from certain implementation details

Add unit tests from gara package - https://github.com/alehander92/gara/blob/master/tests/test_gara.nim

Add more tests for use in generics Right now use of `case` statement inside of generics fails, if identifiers like `_` are used in body. There are two workarounds right now: first one is to use `expand` macro that forces evaluation of the body, so you write things like ```nim expand case a: of [_]: discard else: fail() ``` Second option is adding `mixin` statement for each unknown symbol, like this: ```nim mixin _ case a: of [_]: discard else: fail() ``` This can be done automatically, but I don't know of a way to determine if a macro has been expanded inside of generic function or not (since use of `mixin` statement outside of procs/functions/methods is an error)

haxscramper · 2020-10-25T12:29:01Z

Right now use of case statement inside of generics fails, if identifiers like _ are used in body. There are two workarounds right now: first one is to use expand macro that forces evaluation of the body, so you write things like

expand case a:
  of [_]: discard
  else: fail()

Second option is adding mixin statement for each unknown symbol, like this:

mixin _
case a:
  of [_]: discard
  else: fail()

This can be done automatically, but I don't know of a way to determine if a macro has been expanded inside of generic function or not (since use of mixin statement outside of procs/functions/methods is an error)

src/fusion/matching.rst

@@ -0,0 +1,293 @@
+.. raw:: html
+  <blockquote><p>
+  "you can probably make a macro for that" -- Rika, 22-09-2020 10:41:51


juancarlospaco · 2020-10-27T15:12:32Z

Needs changelog.

Changed type naming in unit tests to account for differences in codegen between nim 1.0.0 and 1.2.0+

If you try to compiler code for `tMatching.nim` it will fail with regular type mismatch error - right now I'm figuring out how to properly handle things like `fld[0].len: < 10`. I think a lot of people would expect to write something like that and that it would work, so I'm adding it. Not the point though. So test suite errors out with type mismatch error - just no overload for `<`, nothing interesting. But before that, I used default `echo result.toStrLit()` to show compilation results - but I *cannot* see them *at all*, since there is a approximately ~2 screens worth of text (21") dumped by compiler on overload. So shit looks like this: let expr_23990001 = val3 ((((((expr_23990001.hello3[0][0] < 10)))))) /home/test/workspace/git-sandbox/fusion/tests/tMatching.nim(24, 7) template/generic instantiation of `suite` from here /home/test/workspace/git-sandbox/fusion/tests/tMatching.nim(783, 8) template/generic instantiation of `test` from here /home/test/workspace/git-sandbox/fusion/tests/tMatching.nim(802, 35) Error: type mismatch: got <seq[string], int literal(10)> but expected one of: proc `<`(x, y: bool): bool first type mismatch at position: 1 required type for x: bool but expression 'expr_23990001.hello3[0][0]' is of type: seq[string] proc `<`(x, y: char): bool first type mismatch at position: 1 required type for x: char but expression 'expr_23990001.hello3[0][0]' is of type: seq[string] proc `<`(x, y: float): bool first type mismatch at position: 1 required type for x: float but expression 'expr_23990001.hello3[0][0]' is of type: seq[string] proc `<`(x, y: float32): bool first type mismatch at position: 1 required type for x: float32 but expression 'expr_23990001.hello3[0][0]' is of type: seq[string] proc `<`(x, y: int): bool first type mismatch at position: 1 required type for x: int but expression 'expr_23990001.hello3[0][0]' is of type: seq[string] proc `<`(x, y: int16): bool first type mismatch at position: 1 required type for x: int16 but expression 'expr_23990001.hello3[0][0]' is of type: seq[string] proc `<`(x, y: int32): bool first type mismatch at position: 1 required type for x: int32 but expression 'expr_23990001.hello3[0][0]' is of type: seq[string] proc `<`(x, y: int64): bool first type mismatch at position: 1 required type for x: int64 but expression 'expr_23990001.hello3[0][0]' is of type: seq[string] proc `<`(x, y: int8): bool first type mismatch at position: 1 required type for x: int8 but expression 'expr_23990001.hello3[0][0]' is of type: seq[string] proc `<`(x, y: pointer): bool first type mismatch at position: 1 required type for x: pointer but expression 'expr_23990001.hello3[0][0]' is of type: seq[string] proc `<`(x, y: string): bool first type mismatch at position: 1 required type for x: string but expression 'expr_23990001.hello3[0][0]' is of type: seq[string] proc `<`(x, y: uint): bool first type mismatch at position: 1 required type for x: uint but expression 'expr_23990001.hello3[0][0]' is of type: seq[string] proc `<`(x, y: uint16): bool first type mismatch at position: 1 required type for x: uint16 but expression 'expr_23990001.hello3[0][0]' is of type: seq[string] proc `<`(x, y: uint32): bool first type mismatch at position: 1 required type for x: uint32 but expression 'expr_23990001.hello3[0][0]' is of type: seq[string] proc `<`(x, y: uint64): bool first type mismatch at position: 1 required type for x: uint64 but expression 'expr_23990001.hello3[0][0]' is of type: seq[string] proc `<`(x, y: uint8): bool first type mismatch at position: 1 required type for x: uint8 but expression 'expr_23990001.hello3[0][0]' is of type: seq[string] proc `<`[Enum: enum](x, y: Enum): bool first type mismatch at position: 1 required type for x: Enum: enum but expression 'expr_23990001.hello3[0][0]' is of type: seq[string] proc `<`[T: tuple](x, y: T): bool first type mismatch at position: 1 required type for x: T: tuple but expression 'expr_23990001.hello3[0][0]' is of type: seq[string] proc `<`[T](x, y: ptr T): bool first type mismatch at position: 1 required type for x: ptr T but expression 'expr_23990001.hello3[0][0]' is of type: seq[string] proc `<`[T](x, y: ref T): bool first type mismatch at position: 1 required type for x: ref T but expression 'expr_23990001.hello3[0][0]' is of type: seq[string] proc `<`[T](x, y: set[T]): bool first type mismatch at position: 1 required type for x: set[T] but expression 'expr_23990001.hello3[0][0]' is of type: seq[string] expression: expr_23990001.hello3[0][0] < 10 If you scrolled past that - good for you. Very readable, indeed

Support Dot/Bracket expression for field access in pattern matching checks. `(fld.len: < 10)` is a short way of writing `(fld: (len: < 10))`. Less visual clutter and more intuitive.

haxscramper · 2020-10-30T23:10:23Z

Previous CI failure was related to C codegen differences between nim 1.0.0 and 1.2.0+ and happened only because I didn't account for it in tests.

In addition to fixing this I also added support for (field.subfield: <pattern>) syntax - previously such construction had to be written using (field: (subfield: <pattern>)) which was not really intuitive and quire bothersome to write. This addition is just purely for ergonomics/syntactic sugar.

I think that implementation is mostly feature-complete (almost all things described in RFC are covered, with some minor adjustments and several additions).

Things left are mostly related to better documentation and defensive programming of the DSL (test against more possible use cases & provide better error messages in case of malformed code).

When view types become non-experimental it might be possible to match objects using them.

Added support for more detailed exception messages on `assertMatch` and `:=`. Not all matching expressions are currently supported, but that is just a matter of implementing support for possible cases.

Removed support for `{kind1, kind2}` for `hasKind` pattern. Added checking for correct identifier name - now invalid kind name will be detected by DSL and not inside of macro-generated code.

Added quick syntax reference to documentation, added more unit tests (for new documentation examples and more involved generic ref object). Added better error messages for undefined fields/procs for `makeTree`

Add more tests. Use `{.line.}` pragma to generate correct backtraces for failed match exceptions.

haxscramper · 2020-11-05T19:46:47Z

All fixes, additions and features are finally implemented, so now I need someone to review this.

I've significantly improved failed match exceptions messages, position for backtraces (using {.line.}), added better DSL compilation checks and added more tests. No new functionality was added.

haxscramper · 2020-11-06T12:36:52Z

There were several things that I considered, but ultimately decided to not implement, or postpone until after the current version is accepted

Support for capturing values into existing variables.

It could be done by adding yet another syntax - @=, to explicitly declare that the existing variable should be reused. The another alternative is to check declared and then for typeof equality, but this is even less intuitive. So I decided it is not worth it, and you can just capture into a new variable and then add/incl/= to existing one however you like.

Pragma annotations for variable captures

There is just not enough place in DSL, and honestly I don't think that this is something all that important, so I decided to not implement this.

Using views for captured variables

Right now all captured variables introduce copies. I declare each variable at the top of the match expression, and then = or add to it during match evaluation. So matching [until @vals == 5, .._] := @[2, 3, 4, 5] would create vals == @[2, 3, 4]] sequence, where all elements will be copied. With view types (I suppose) it might be possible to declare vals as seq[lent int] and no copying would occur.

Pattern	Current types for injected variables	New types
`[@a]`	`var a: typeof(expr[0])`	`var a: lent typeof(expr[0])`
`{"key": @val}`	`var val: typeof(expr["key"])`	`var val: lent typeof(expr["key"])`
`[all @a]`	`var a: seq[typeof(expr[0])]`	`var a: seq[lent typeof(expr[0])]`
`[opt @val]`	`var a: Option[typeof(expr[0])]`	`var a: Option[lent typeof(expr[0])]`
`[opt @val or default]`	`var a: typeof(expr[0])`	`var a: lent typeof(expr[0])`
`(fld: @val)`	`var val: typeof(expr.fld)`	`var val: lent typeof(expr.fld)`

Because views are currently experimental this has not been implemented, but this is possibility for better performance, so it should not be overlooked.

Another related problem - when (maybeFuncMaybeField: [all < 10]) is captured, maybeFuncMaybeField is called multiple times, and this is not possible to determine whether maybe.. is a field or a procedure during pattern construciton. I believe view types can also solve this problem, by doing

let vals: lent typeof(expr) = expr

and then using vals instead of an expr throughout the code. Without view types this can be done too, quite easily, just using

let vals: typeof(expr) = expr

But it could (most likely will) create a lot of copies when expr is actually as field access. I see this is a tradeoff between calling maybeFuncMaybeField multiple times, and copying field on each pattern check. Now there is no copying (since I assumed pattern matching will be done mostly on fields, and things like len that are quite cheap to evaluate).

NOTE due to possibility to switching to immutable views for captured variables there is no guarantee whether the captured variable would be mutable or not. This is considered implementation detail and should not be exploited - next version might break this.

Removed leftover pieces from pgarma parser implementation support

Show original expression for pattern match fail (`:=`). Pattern like `[@username] := line.split(":")` will no fail with: Match failure for pattern '[@username]'. Expected length in range '1 .. 1', but line.split(":") has .len of 7

Added example of dataflow macro implementation and started draft or the 'Getting started with nim macros using pattern matching' article.

Handle both ways of writing array matching. E.g. all of those expressions must match the same input data. ```nim Par[Infix()] Par [Infix()] Par[Infix ()] Par [Infix ()] ```

This is an unholy abomination that was added in order to be somewhat more consistent wrt to `of` operator, but it works quite well though. And didn't take that long to implement (most of the changes in this commit come from `$` operators for `Match` and related things). All thanks to MCS

tests/tMatching.nim

juancarlospaco · 2020-12-07T02:13:18Z

LGTM 👍

changelog.md

tests/tMatching.nim

src/fusion/matching.nim

juancarlospaco · 2020-12-10T00:46:42Z

Looks good still 👍

haxscramper · 2020-12-11T18:31:29Z

Any thoughts on the article? It mostly illustrates the main use case for the library - writing macros, but maybe I should add something else. I hope to put it on nim blog after this is merged and somewhat user-tested

juancarlospaco · 2020-12-11T23:11:19Z

src/fusion/matching.nim

+    of nnkOpenSymChoice: n[0].strVal()
+    of nnkSym: n.strVal()
+    of nnkStrKinds: n.strVal()
+    else: raiseAssert(&"Cannot get string value from node kind {n.kind}")


Some of the code formats error messages using strformat and some uses & and $,
I do not know if theres a convention for this, maybe ask someone, I do not know if theres a reason,
but stdlib error messages are formatted using & and $ AFAIK
:)

Most likely that is done to avoid dependencies on strformat.

I used $ & in quote parts of codegen, to generate messages and strformat otherwise. I must use &$ for code that will be generate by macro, but otherwise I really prefer strformat.

haxscramper · 2020-12-15T17:29:57Z

@timotheecour is there anything else that needs fixing, or this looks good enough for you too?

Sequence matching now uses `items` iterator instead of random-access operator (`[]`).

haxscramper · 2020-12-17T13:52:22Z

Added custom unpackers and changed sequence pattern matching to use items instead of random access via [].

This code doesn't work prior to 1.4.2. and I have no idea as to why, so custom unpackers are only supported from 1.4.2 onwards.

type FieldIndex* = distinct int

template `[]`*(t: tuple, idx: static[FieldIndex]): untyped =
  t[idx.int]
  
echo (12, 12)[FieldIndex(0)]

src/fusion/matching.rst

tests/tmatching.nim

Changed implementation of set handing to more intuitive one - now it is possible to test for things like "aA".matches([{'a' .. 'z', 'A' .. 'Z'}]). Previously tests patterns were implemented as set-set intersections, which was not all that useful.

Araq · 2020-12-29T08:38:16Z

src/fusion/matching.nim

+import sequtils, macros, tables, options, strformat, strutils,
+       parseutils, algorithm, hashes
+
+export options


In a follow-up PR, try to avoid this please.

Araq · 2020-12-29T08:55:00Z

Merging this now but there will be followup changes to the documentation. This is still highly experimental. In the longer run I expect Nim to grow let / var keywords as expressions so that identifier injections become clearer. This means @val will become (let val) or similar. Happy to write an RFC for this.

haxscramper · 2020-12-29T09:03:11Z

@Araq I planned to collect some feedback after this library is somewhat user-tested and update documentation & related parts accordingly.

About let val vs @val - I fully support this, it would look much better compared to current syntax.

haxscramper added 6 commits October 23, 2020 15:14

[>>>] Add matching from hmisc

306c90a

[FIX] Change import name

79a3b00

[FEATURE] Support regular value unification

8d5866a

[TEST] Add unit tests from gara

38628c3

Add unit tests from gara package - https://github.com/alehander92/gara/blob/master/tests/test_gara.nim

ghost reviewed Oct 25, 2020

View reviewed changes

src/fusion/matching.rst

@@ -0,0 +1,293 @@

.. raw:: html

<blockquote><p>

"you can probably make a macro for that" -- Rika, 22-09-2020 10:41:51

This comment was marked as off-topic.

Sign in to view

haxscramper added 5 commits October 27, 2020 21:15

[DOC] Update changelog [skip ci]

333b82a

[FIX|TEST] Type naming in unit tests

3d241bb

Changed type naming in unit tests to account for differences in codegen between nim 1.0.0 and 1.2.0+

[FEATURE] Support fld1.sub2.sub3 for patterns

9200397

[FEATURE] Add support for (fld.len: < 10) checks

80ca957

Support Dot/Bracket expression for field access in pattern matching checks. `(fld.len: < 10)` is a short way of writing `(fld: (len: < 10))`. Less visual clutter and more intuitive.

haxscramper added 5 commits November 1, 2020 14:02

[WIP][FEATURE] Detailed exception on assertMatch fail

33469f9

Added support for more detailed exception messages on `assertMatch` and `:=`. Not all matching expressions are currently supported, but that is just a matter of implementing support for possible cases.

[DOC] Documentation update; explain makeTree

f01d835

Removed support for `{kind1, kind2}` for `hasKind` pattern. Added checking for correct identifier name - now invalid kind name will be detected by DSL and not inside of macro-generated code.

[DOC|TEST] Add cheat-sheet to docs, add more tests

02aee66

Added quick syntax reference to documentation, added more unit tests (for new documentation examples and more involved generic ref object). Added better error messages for undefined fields/procs for `makeTree`

[TEST] More tests, fix backtraces

38a2f4f

Add more tests. Use `{.line.}` pragma to generate correct backtraces for failed match exceptions.

[DOC|CLEAN] Documentation & code cleanup

7d0c08f

[TEST] AST-AST rewriting test

a0b4f65

[CLEAN] Remove pragma parsing support

4dbe1be

Removed leftover pieces from pgarma parser implementation support

haxscramper mentioned this pull request Nov 6, 2020

Early version of RFC: Pattern matching ? nim-lang/RFCs#245

Closed

haxscramper added 5 commits November 7, 2020 19:30

[FEATURE] Show original expression for match fail

285ae2b

Show original expression for pattern match fail (`:=`). Pattern like `[@username] := line.split(":")` will no fail with: Match failure for pattern '[@username]'. Expected length in range '1 .. 1', but line.split(":") has .len of 7

[FEATURE] Dataflow macro implementation example

4451266

Added example of dataflow macro implementation and started draft or the 'Getting started with nim macros using pattern matching' article.

[FIX] Handle A [1] and A[1] unifromly

5778f16

Handle both ways of writing array matching. E.g. all of those expressions must match the same input data. ```nim Par[Infix()] Par [Infix()] Par[Infix ()] Par [Infix ()] ```

[TEST] More tests for broken brackets

b5ced19

juancarlospaco reviewed Dec 7, 2020

View reviewed changes

tests/tMatching.nim Outdated Show resolved Hide resolved