Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pattern matching implementation #33

Merged
merged 41 commits into from
Dec 29, 2020
Merged

Pattern matching implementation #33

merged 41 commits into from
Dec 29, 2020

Conversation

haxscramper
Copy link
Contributor

@haxscramper haxscramper commented Oct 25, 2020

Pattern matching implementation as described in nim-lang/RFCs#245

WIP article on how to use this for writing macros.

Changed:
  - Update unit tests - add more examples from documentation, fix edge
    cases and test agains different allowed variants of the pattern
  - Check for consistensy of documentation and implementation
  - Simplify some code in implementation
  - Remove exports from certain implementation details
Add more tests for use in generics

Right now use of `case` statement inside of generics fails, if
identifiers like `_` are used in body. There are two workarounds right
now: first one is to use `expand` macro that forces evaluation of the
body, so you write things like

```nim
expand case a:
  of [_]: discard
  else: fail()
```

Second option is adding `mixin` statement for each unknown symbol,
like this:

```nim
mixin _
case a:
  of [_]: discard
  else: fail()
```

This can be done automatically, but I don't know of a way to determine
if a macro has been expanded inside of generic function or not (since
use of `mixin` statement outside of procs/functions/methods is an
error)
@haxscramper
Copy link
Contributor Author

Right now use of case statement inside of generics fails, if identifiers like _ are used in body. There are two workarounds right now: first one is to use expand macro that forces evaluation of the body, so you write things like

expand case a:
  of [_]: discard
  else: fail()

Second option is adding mixin statement for each unknown symbol, like this:

mixin _
case a:
  of [_]: discard
  else: fail()

This can be done automatically, but I don't know of a way to determine if a macro has been expanded inside of generic function or not (since use of mixin statement outside of procs/functions/methods is an error)

@@ -0,0 +1,293 @@
.. raw:: html
<blockquote><p>
"you can probably make a macro for that" -- Rika, 22-09-2020 10:41:51

This comment was marked as off-topic.

@juancarlospaco
Copy link
Contributor

Needs changelog.

Changed type naming in unit tests to account for differences in
codegen between nim 1.0.0 and 1.2.0+
If you try to compiler code for `tMatching.nim` it will fail with
regular type mismatch error - right now I'm figuring out how to
properly handle things like `fld[0].len: < 10`. I think a lot of
people would expect to write something like that and that it would
work, so I'm adding it. Not the point though. So test suite errors out
with type mismatch error - just no overload for `<`, nothing
interesting. But before that, I used default `echo result.toStrLit()`
to show compilation results - but I *cannot* see them *at all*, since
there is a approximately ~2 screens worth of text (21") dumped by
compiler on overload. So shit looks like this:

let expr_23990001 = val3
((((((expr_23990001.hello3[0][0] < 10))))))
/home/test/workspace/git-sandbox/fusion/tests/tMatching.nim(24, 7) template/generic instantiation of `suite` from here
/home/test/workspace/git-sandbox/fusion/tests/tMatching.nim(783, 8) template/generic instantiation of `test` from here
/home/test/workspace/git-sandbox/fusion/tests/tMatching.nim(802, 35) Error: type mismatch: got <seq[string], int literal(10)>
but expected one of:
proc `<`(x, y: bool): bool
  first type mismatch at position: 1
  required type for x: bool
  but expression 'expr_23990001.hello3[0][0]' is of type: seq[string]
proc `<`(x, y: char): bool
  first type mismatch at position: 1
  required type for x: char
  but expression 'expr_23990001.hello3[0][0]' is of type: seq[string]
proc `<`(x, y: float): bool
  first type mismatch at position: 1
  required type for x: float
  but expression 'expr_23990001.hello3[0][0]' is of type: seq[string]
proc `<`(x, y: float32): bool
  first type mismatch at position: 1
  required type for x: float32
  but expression 'expr_23990001.hello3[0][0]' is of type: seq[string]
proc `<`(x, y: int): bool
  first type mismatch at position: 1
  required type for x: int
  but expression 'expr_23990001.hello3[0][0]' is of type: seq[string]
proc `<`(x, y: int16): bool
  first type mismatch at position: 1
  required type for x: int16
  but expression 'expr_23990001.hello3[0][0]' is of type: seq[string]
proc `<`(x, y: int32): bool
  first type mismatch at position: 1
  required type for x: int32
  but expression 'expr_23990001.hello3[0][0]' is of type: seq[string]
proc `<`(x, y: int64): bool
  first type mismatch at position: 1
  required type for x: int64
  but expression 'expr_23990001.hello3[0][0]' is of type: seq[string]
proc `<`(x, y: int8): bool
  first type mismatch at position: 1
  required type for x: int8
  but expression 'expr_23990001.hello3[0][0]' is of type: seq[string]
proc `<`(x, y: pointer): bool
  first type mismatch at position: 1
  required type for x: pointer
  but expression 'expr_23990001.hello3[0][0]' is of type: seq[string]
proc `<`(x, y: string): bool
  first type mismatch at position: 1
  required type for x: string
  but expression 'expr_23990001.hello3[0][0]' is of type: seq[string]
proc `<`(x, y: uint): bool
  first type mismatch at position: 1
  required type for x: uint
  but expression 'expr_23990001.hello3[0][0]' is of type: seq[string]
proc `<`(x, y: uint16): bool
  first type mismatch at position: 1
  required type for x: uint16
  but expression 'expr_23990001.hello3[0][0]' is of type: seq[string]
proc `<`(x, y: uint32): bool
  first type mismatch at position: 1
  required type for x: uint32
  but expression 'expr_23990001.hello3[0][0]' is of type: seq[string]
proc `<`(x, y: uint64): bool
  first type mismatch at position: 1
  required type for x: uint64
  but expression 'expr_23990001.hello3[0][0]' is of type: seq[string]
proc `<`(x, y: uint8): bool
  first type mismatch at position: 1
  required type for x: uint8
  but expression 'expr_23990001.hello3[0][0]' is of type: seq[string]
proc `<`[Enum: enum](x, y: Enum): bool
  first type mismatch at position: 1
  required type for x: Enum: enum
  but expression 'expr_23990001.hello3[0][0]' is of type: seq[string]
proc `<`[T: tuple](x, y: T): bool
  first type mismatch at position: 1
  required type for x: T: tuple
  but expression 'expr_23990001.hello3[0][0]' is of type: seq[string]
proc `<`[T](x, y: ptr T): bool
  first type mismatch at position: 1
  required type for x: ptr T
  but expression 'expr_23990001.hello3[0][0]' is of type: seq[string]
proc `<`[T](x, y: ref T): bool
  first type mismatch at position: 1
  required type for x: ref T
  but expression 'expr_23990001.hello3[0][0]' is of type: seq[string]
proc `<`[T](x, y: set[T]): bool
  first type mismatch at position: 1
  required type for x: set[T]
  but expression 'expr_23990001.hello3[0][0]' is of type: seq[string]

expression: expr_23990001.hello3[0][0] < 10

If you scrolled past that - good for you. Very readable, indeed
Support Dot/Bracket expression for field access in pattern matching
checks. `(fld.len: < 10)` is a short way of writing `(fld: (len: <
10))`. Less visual clutter and more intuitive.
@haxscramper
Copy link
Contributor Author

haxscramper commented Oct 30, 2020

Previous CI failure was related to C codegen differences between nim 1.0.0 and 1.2.0+ and happened only because I didn't account for it in tests.

In addition to fixing this I also added support for (field.subfield: <pattern>) syntax - previously such construction had to be written using (field: (subfield: <pattern>)) which was not really intuitive and quire bothersome to write. This addition is just purely for ergonomics/syntactic sugar.

I think that implementation is mostly feature-complete (almost all things described in RFC are covered, with some minor adjustments and several additions).

Things left are mostly related to better documentation and defensive programming of the DSL (test against more possible use cases & provide better error messages in case of malformed code).

When view types become non-experimental it might be possible to match objects using them.

Added support for more detailed exception messages on `assertMatch`
and `:=`. Not all matching expressions are currently supported, but
that is just a matter of implementing support for possible cases.
Removed support for `{kind1, kind2}` for `hasKind` pattern. Added
checking for correct identifier name - now invalid kind name will be
detected by DSL and not inside of macro-generated code.
Added quick syntax reference to documentation, added more unit
tests (for new documentation examples and more involved generic ref
object). Added better error messages for undefined fields/procs for
`makeTree`
Add more tests. Use `{.line.}` pragma to generate correct backtraces
for failed match exceptions.
@haxscramper
Copy link
Contributor Author

haxscramper commented Nov 5, 2020

All fixes, additions and features are finally implemented, so now I need someone to review this.

I've significantly improved failed match exceptions messages, position for backtraces (using {.line.}), added better DSL compilation checks and added more tests. No new functionality was added.

@haxscramper
Copy link
Contributor Author

haxscramper commented Nov 6, 2020

There were several things that I considered, but ultimately decided to not implement, or postpone until after the current version is accepted

Support for capturing values into existing variables.

It could be done by adding yet another syntax - @=, to explicitly declare that the existing variable should be reused. The another alternative is to check declared and then for typeof equality, but this is even less intuitive. So I decided it is not worth it, and you can just capture into a new variable and then add/incl/= to existing one however you like.

Pragma annotations for variable captures

There is just not enough place in DSL, and honestly I don't think that this is something all that important, so I decided to not implement this.

Using views for captured variables

Right now all captured variables introduce copies. I declare each variable at the top of the match expression, and then = or add to it during match evaluation. So matching [until @vals == 5, .._] := @[2, 3, 4, 5] would create vals == @[2, 3, 4]] sequence, where all elements will be copied. With view types (I suppose) it might be possible to declare vals as seq[lent int] and no copying would occur.

Pattern Current types for injected variables New types
[@a] var a: typeof(expr[0]) var a: lent typeof(expr[0])
{"key": @val} var val: typeof(expr["key"]) var val: lent typeof(expr["key"])
[all @a] var a: seq[typeof(expr[0])] var a: seq[lent typeof(expr[0])]
[opt @val] var a: Option[typeof(expr[0])] var a: Option[lent typeof(expr[0])]
[opt @val or default] var a: typeof(expr[0]) var a: lent typeof(expr[0])
(fld: @val) var val: typeof(expr.fld) var val: lent typeof(expr.fld)

Because views are currently experimental this has not been implemented, but this is possibility for better performance, so it should not be overlooked.

Another related problem - when (maybeFuncMaybeField: [all < 10]) is captured, maybeFuncMaybeField is called multiple times, and this is not possible to determine whether maybe.. is a field or a procedure during pattern construciton. I believe view types can also solve this problem, by doing

let vals: lent typeof(expr) = expr

and then using vals instead of an expr throughout the code. Without view types this can be done too, quite easily, just using

let vals: typeof(expr) = expr

But it could (most likely will) create a lot of copies when expr is actually as field access. I see this is a tradeoff between calling maybeFuncMaybeField multiple times, and copying field on each pattern check. Now there is no copying (since I assumed pattern matching will be done mostly on fields, and things like len that are quite cheap to evaluate).

NOTE due to possibility to switching to immutable views for captured variables there is no guarantee whether the captured variable would be mutable or not. This is considered implementation detail and should not be exploited - next version might break this.

Removed leftover pieces from pgarma parser implementation support
Show original expression for pattern match fail (`:=`).

Pattern like `[@username] := line.split(":")` will no fail with:

Match failure for pattern '[@username]'. Expected length in range '1
.. 1', but line.split(":") has .len of 7
Added example of dataflow macro implementation and started draft or
the 'Getting started with nim macros using pattern matching' article.
Handle both ways of writing array matching. E.g. all of those
expressions must match the same input data.

```nim
 Par[Infix()]
 Par [Infix()]
 Par[Infix ()]
 Par [Infix ()]
```
This is an unholy abomination that was added in order to be somewhat
more consistent wrt to `of` operator, but it works quite well though.
And didn't take that long to implement (most of the changes in this
commit come from `$` operators for `Match` and related things). All
thanks to MCS
tests/tMatching.nim Outdated Show resolved Hide resolved
@juancarlospaco
Copy link
Contributor

juancarlospaco commented Dec 7, 2020

LGTM 👍

changelog.md Outdated Show resolved Hide resolved
tests/tMatching.nim Outdated Show resolved Hide resolved
tests/tMatching.nim Outdated Show resolved Hide resolved
tests/tMatching.nim Outdated Show resolved Hide resolved
tests/tMatching.nim Outdated Show resolved Hide resolved
tests/tMatching.nim Outdated Show resolved Hide resolved
tests/tMatching.nim Outdated Show resolved Hide resolved
src/fusion/matching.nim Show resolved Hide resolved
src/fusion/matching.nim Outdated Show resolved Hide resolved
src/fusion/matching.nim Show resolved Hide resolved
src/fusion/matching.nim Show resolved Hide resolved
@juancarlospaco
Copy link
Contributor

Looks good still 👍

@haxscramper
Copy link
Contributor Author

Any thoughts on the article? It mostly illustrates the main use case for the library - writing macros, but maybe I should add something else. I hope to put it on nim blog after this is merged and somewhat user-tested

of nnkOpenSymChoice: n[0].strVal()
of nnkSym: n.strVal()
of nnkStrKinds: n.strVal()
else: raiseAssert(&"Cannot get string value from node kind {n.kind}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the code formats error messages using strformat and some uses & and $,
I do not know if theres a convention for this, maybe ask someone, I do not know if theres a reason,
but stdlib error messages are formatted using & and $ AFAIK
:)

Copy link
Contributor Author

@haxscramper haxscramper Dec 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most likely that is done to avoid dependencies on strformat.

I used $ & in quote parts of codegen, to generate messages and strformat otherwise. I must use &$ for code that will be generate by macro, but otherwise I really prefer strformat.

@haxscramper
Copy link
Contributor Author

@timotheecour is there anything else that needs fixing, or this looks good enough for you too?

@haxscramper
Copy link
Contributor Author

Added custom unpackers and changed sequence pattern matching to use items instead of random access via [].

This code doesn't work prior to 1.4.2. and I have no idea as to why, so custom unpackers are only supported from 1.4.2 onwards.

type FieldIndex* = distinct int

template `[]`*(t: tuple, idx: static[FieldIndex]): untyped =
  t[idx.int]
  
echo (12, 12)[FieldIndex(0)]

Changed implementation of set handing to more intuitive one - now it is
possible to test for things like "aA".matches([{'a' .. 'z', 'A' .. 'Z'}]).
Previously tests patterns were implemented as set-set intersections, which
was not all that useful.
import sequtils, macros, tables, options, strformat, strutils,
parseutils, algorithm, hashes

export options
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a follow-up PR, try to avoid this please.

@Araq
Copy link
Member

Araq commented Dec 29, 2020

Merging this now but there will be followup changes to the documentation. This is still highly experimental. In the longer run I expect Nim to grow let / var keywords as expressions so that identifier injections become clearer. This means @val will become (let val) or similar. Happy to write an RFC for this.

@Araq Araq merged commit fadb913 into nim-lang:master Dec 29, 2020
@haxscramper
Copy link
Contributor Author

@Araq I planned to collect some feedback after this library is somewhat user-tested and update documentation & related parts accordingly.

About let val vs @val - I fully support this, it would look much better compared to current syntax.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants