Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ADTs, and pattern matching #34

Closed
masak opened this issue Oct 10, 2015 · 15 comments
Closed

Implement ADTs, and pattern matching #34

masak opened this issue Oct 10, 2015 · 15 comments

Comments

@masak
Copy link
Owner

masak commented Oct 10, 2015

Here's the proposed syntax, based on the same declaration in an article by Oleg:

type Exp {
    | Var (Str)
    | App (Exp, Exp)
    | Lam (Var, Exp)
    | Let (Var, Exp, Exp)
}

Semantically, the type Exp declaration would introduce five new types into the current scope, the latter four of which are instantiable classes. (For example, Lam(Var("foo"), Var("foo")).) So far, nothing particularly exciting about that. As you can see, ADTs can be recursively defined without much fanfare. If they don't have a base case, you simply won't be able to instantiate one using a finite program.

The | things stand out a little and make it clear that this is a DSL. They're also a little bit easier on the eyes than ; at the end of the line, and visually hint that there is something very declarative going on here, rather than imperative ; stuff.

We also introduce a case statement:

sub depth(Exp e): Int {
    return case e {
        | Var (_): 0
        | Lam (_, e): depth(e) + 1
        | App (e1, e2): max(depth(e1), depth(e2))
        | Let (_, e1, e2): max(depth(e1), depth(e2))
    };
}

There's both a case statement and (as above) a case expression. Both require an xblock. Notice how the individual cases mirror the syntax of the ADT declaration itself, although here we're binding against values instead of declaring their types. The thing after the colon can be either an expression or a block.

In the case of the App and Let cases above, e1 and e2 are introduced as lexical variables whose scope is only the expression or block immediately following the colon. Written as imperative code, the App case would come out as something like this:

if type(e) == "App" {
    my e1 = e.getProp(0);
    my e2 = e.getProp(1);
    return max(depth(e1), depth(e2));
}

The underscores simply mean "no binding". (I wanted to do asterisks, following Perl 6's lead, but it didn't look nice. Dollar signs don't make sense in 007 because we don't have sigils.) In the case of Var we don't care about any of the properties, and we could elide the (_) completely. But even a paren full of underscores would do shape-checking on the type, which provides a little bit of extra consistency.

I guess we could also allow multiple matchers with | between them. Hm.

    return case e {
        | Var (_): 0
        | Lam (_, e): depth(e) + 1
        | App (e1, e2) | Let (_, e1, e2): max(depth(e1), depth(e2))
    };

In this particular instance that helps us. I'm not sure it's worth the extra complexity with the matching machinery, though. We'd need to throw an error if there was a type mismatch somewhere, or if two matchers didn't introduce exactly the same variables.

If you do otherwise as the last case, it will match when nothing else matched. Notice that you can't match on structure with this one, though; it's just a catch-all.

The compiler statically detects whether you've "covered all the cases". Given what we've said so far, that looks entirely tractable, even with things such as nested matching and multiple parameters. If you haven't covered all the cases, and don't have an otherwise, the compiler fails and describes in poignantly descriptive prose what you missed.

I'm sorely tempted to allow individual cases to happen right inside any function or pointy block, just binding directly against its signature. Both the case statement itself and return could then be implicit. The depth sub could then be written as:

sub depth(Exp _): Int {
    | Var (_): 0
    | App (e1, e2): max(depth(e1), depth(e2))
    | Lam (_, e): depth(e) + 1
    | Let (_, e1, e2): max(depth(e1), depth(e2))
}

This kind of "implicit block case matching" would go a long way to compensate for our lack of multi things. In fact, I'd say it's a pretty competitive alternative, with a bunch of advantages of its own.

It is unclear how much we need visitors à la #26 when we have pattern matching like this. But let's do both and see which one wins. :) Who knows, maybe they'll end up occupying different niches.

@masak
Copy link
Owner Author

masak commented Oct 11, 2015

Almost immediately after I came up with pattern matching, I wanted to use it for primitive values too, not just ADTs. I haven't decided yet whether this is an inane idea, or a very worthy one.

None

No-one is interested in matching on None, as it has no structure. Forget I mentioned it, OK?

Int

The problem with division is if you give people division on Int, you'll either have to provide non-integral types (which we won't), or you'll have to do something stupid like truncate the value, and then people are all like (╯°□°)╯︵ ┻━┻ when their values don't math right:

say 5 / 2;    # 2 (╯°□°)╯︵ ┻━┻
say 2 / 3;    # 0 (ノಠ益ಠ)ノ彡┻━┻

I briefly toyed with a divmod operator, which clearly is the right name for what it does, and which technically solves the problem of truncation:

say 5 divmod 2;    # [2, 1] ┬─┬ノ( º _ ºノ)
say 2 divmod 3;    # [0, 2]  ┬─┬ ¯\_(ツ)

But... meh. What's that Array doing in my math result? And now I have to [0] and [1] all over the place.

'Course, with the new object system we could instead return something like { quotient: 0, remainder: 2 }, but... still meh. What's that DivMod object doing in my math result?

Then I realized pattern matching could be co-opted to "destructure" a number into its q and r parts:

case 5 {
    | Int (2 * q + r): say("The quotient is " ~ str(q) ~ " and the remainder " ~ str(r));
}

Here the Int should probably be optional, because we know it's an Int already. Note that the names q and r are just a convention; it's the pattern that matters, not the names.

Note the assumptions that go into the 2 * q + r expression. q can be any integer, but we (implicitly) require 0 <= r < q for positive q or q < r <= 0 for negative q. (Just like both Perl 6 and Python.)

We can also check for divisibility by setting r to 0. Here's an implementation of FizzBuzz using the above-conjectured sugar notation:

for range(1, 101) -> n {  # `range()` from Python assumed
    | 15 * _: say("Fizzbuzz")
    | 3 * _: say("Fizz")
    | 5 * _: say("Buzz")
    | otherwise: say(n)
}

15 * _ is simply sugar for 15 * _ + 0 when we expect 15 to factor cleanly into n.

We can even check for primality by parameterizing both factors in the multiplication. Here's a function that decomposes an integer into its constituent prime factors:

sub factorize(n: Int): Array<Int> {
    | p * q: return p :: factorize(q)    # infix:<::> cons operator assumed
    | otherwise: return [n]
}

That, I must say, looks very nice.

(A silent assumption made in the p * q + 0 pattern is that p be a prime. That's easy to arrange, though. (And note that we need to assume p >= 2 anyway to avoid getting trivial answers.) If p is either the smallest possible or largest possible prime, then we get a nice sorted array back.)

Str

(Added later: Please consider this whole section highly conjectural and not something we're likely to add in its current form. See this later comment for why I no longer think this is a good idea.)

We can easily check the beginning, middle, and end of a string for patterns:

| ("huey" ~ _)
| (_ ~ "dewey" ~ _)
| (_ ~ "louie")

This eliminates Perl 6's starts-with, index, and ends-with, and also eliminates the need for something like with just to distinguish position 0 from no-such-substring failure. It's all handled by the pattern matching.

If we throw (a hypothetical) infix:<x> into the mix, we can even trim spaces, or count repetitions:

| (" " x _ ~ trimmed ~ " " x _)
| (_ ~ "ook " x n ~ _)

The only slight difficulty we're sweeping under the carpet here is how greedy the parameterized substring captures should be. If they're all greedy, then we'd find the last dewey above. (Which seems like a somewhat odd default.) If they're all frugal, we'd find the first one. Both seem to be useful behaviors, depending.

The trimming example would definitely want the first " " x _ to be greedy and the trimmed capture to be frugal. Hm. Feels like there's a good set of heuristics that could be applied here that would do the right thing most of the time, but I don't quite see it. And even then, sometimes we'd simply want to tell the match which one to match greedily and which one frugally. (Not to mention ratcheting semantics.) I don't see an elegant way to cram that information into the expression, though.

Here's an implementation of split:

sub split(s: Str, sep: Str): Array<Str> {
    return case s {
        | (pre ~ sep ~ rest): pre :: split(rest)
        | otherwise: [s]
    };
}

Array

Similarly, one could do destructuring of arrays, using infix:<::> (cons), infix:<++> (append), and structural matching of the contents of the array. But I don't have much to say about that, so... here, have a quicksort implementation:

sub quicksort(list: Array<Int>): Array<Int> {
    | ([]): []
    | (p :: xs):
        quicksort(xs.filter({ p < l }))    # yeah, yeah, I know literal blocks were killed off
        ++ [p]
        ++ quicksort(xs.filter({ p >= l }))    # and they never implicitly returned values like that
}

@masak
Copy link
Owner Author

masak commented Oct 12, 2015

The only slight difficulty we're sweeping under the carpet here is how greedy the parameterized substring captures should be.

Here's one possible solution: introduce two different concatenation operators <~ and ~>. (Frugal and greedy.) They always apply to their left operand. We can still use ~ when the left operand is a constant string.

Rewriting the examples using this notation:

| case ("huey" ~ _)
| case (_ <~ "dewey" ~ _)   # first match
| case (_ ~> "dewey" ~ _)   # last match
| case (_ ~> "louie")

| case (" " x _ ~> trimmed <~ " " x _)
| case (_ <~ "ook " x n ~> _)

sub split(s: Str, sep: Str): Array<Str> {
    return case s {
        | (pre <~ sep ~ rest): pre :: split(rest)
        | otherwise: [s]
    };
}

@vendethiel
Copy link
Collaborator

Seems like we're mixing grammars into something weird. Also, couldn't ADTs be implemented with macros?

@masak
Copy link
Owner Author

masak commented Oct 12, 2015

Seems like we're mixing grammars into something weird.

Aye. It was a thought experiment, but I don't like it much now that I see the result.

Interestingly, substructural matching on Ints is such a success essentially because of the fundamental theorem of arithmetic, which gives all nonzero integers a unique decomposition into factors. But ~ on strings is additive, and there's no fundamental theorem to back us up there.

I think it would be possible to combine the case statement with parser combinators, and that might be really nice. But exactly how that'd work hasn't clicked for me yet, so I'll stay silent on that.

Anyway, the important bit of the case statement is its conjugate role in destructing type constructors. All the other toying around with Ints and Strs is just bonus.

Also, couldn't ADTs be implemented with macros?

Yes, I sure hope so. We just need the following tech in 007:

  • is parsed (because case introduces syntax)
  • the ability to define and use your own Q types
  • the ability to hook into (the possibly user-land) type system
  • the ability to hook into the parser to declare things in a lexical scope not tied to a { } block

"Just". 😸

@vendethiel
Copy link
Collaborator

I think it would be possible to combine the case statement with parser combinators, and that might be really nice. But exactly how that'd work hasn't clicked for me yet, so I'll stay silent on that.

Another interesting bit: F#'s active patterns.

@masak
Copy link
Owner Author

masak commented Oct 12, 2015

Another interesting bit: F#'s active patterns.

Active patterns, eh? Looks very nice. I'll have to mull over that for a while.

@vendethiel
Copy link
Collaborator

... Though I have no idea how to provide exhaustivity checks in some cases :).

@masak
Copy link
Owner Author

masak commented Oct 28, 2015

Coming back to the type keyword, I think I'd want to call it enum instead. There's prior art; Rust enums look very much like I picture 007 ADTs.

@masak
Copy link
Owner Author

masak commented Jul 16, 2016

for range(1, 101) -> n {  # `range()` from Python assumed
    | 15 * _: say("Fizzbuzz")
    | 3 * _: say("Fizz")
    | 5 * _: say("Buzz")
    | otherwise: say(n)
}

I came back to this issue to provide an case version of FizzBuzz, and was happy to find one was already written. I don't like the Python range, though, and I would probably write it like this:

use case;
use range;

for 1 .. 100 -> n {
    | 15 * _: say("FizzBuzz")
    | 3 * _: say("Fizz")
    | 5 * _: say("Buzz")
    | otherwise: say(n)
}

One thought occurred to me as I was thinking about the above version: there's a notion of "simple statement" that we're leaning on but haven't made explicit in 007. Expression statements like say(n) are simple statements. So are return, (conjectural) next, and throw. I guess basically any future keyword-only statements would be simple. (For example, a hypothetical yield statement or a take statement.) Bigger constructs like sub or while or BEGIN are not simple statements and won't work as the consequent of an alternative in a case.

The same conceptual subdivision would serve us if we introduced <stmt> if <expr>; and <stmt> while <expr>; statements.

@vendethiel
Copy link
Collaborator

vendethiel commented Jul 16, 2016

I coded an Elixir ADT module, and it seems it'll be getting exhaustivity checking soon(tm).

The code is mostly cleanly contained, but Elixir doesn't really prevent you from creating dynamically-named modules.

@masak
Copy link
Owner Author

masak commented Aug 7, 2016

Not sure how much of a problem this is, but...

sub depth(Exp e): Int {
    return case e {
        | Var (_): 0
        | Lam (_, e): depth(e) + 1
        | App (e1, e2): max(depth(e1), depth(e2))
        | Let (_, e1, e2): max(depth(e1), depth(e2))
    };
}

Those e1 and e2 variables on the left of the colon are declarations. But what happens if there's a my e2 somewhere in an outer scope? I see only two options:

  • It always prefers the pre-existing variable. So the outer e2 gets used. This basically creates a spooky-dependency-at-a-distance with that outer variable. What if the code with the outer e2 gets refactored at some point? Does this one then turn back into a declaration? Worse, this code clearly didn't expect to use the outer e2 in the first place. In short, it's not refactor-safe.
  • It's always a declaration. So the outer e2 gets shadowed, and generally everything gets shadowed. This is nice and consistent and local in the sense that you look at the code and know what it means. The only downside is that there's now no way to case-match against existing variables. (Though some small bit of syntax would help fix that.)

In short, something like | App (e1, e2) is actually a declaration of e1 and e2. It should act like one and shadow outer things, by default. We might want to consider a syntax that allows the use of variables from the outside. Is there prior art there somewhere? Otherwise we could go with something like | App (^e1, ^e2).

@vendethiel
Copy link
Collaborator

Erlang picks the action-at-a-distance one (add a declaration, and some later bind in scope became a match), Elixir picks the always-bind, using ^ to match (I think they changed it early in, around 0.9).

I'd rather we go the Elixir route for clarity's sake.

@masak
Copy link
Owner Author

masak commented Aug 16, 2016

Yes, agree.

@eritain
Copy link

eritain commented Aug 18, 2016

Between this and #13 I believe 007 becomes a superset of Prolog.

@masak
Copy link
Owner Author

masak commented Feb 25, 2018

I'm going to close this one; not because it's not going to happen, but because it's a bit too ambitious as-is, the ideas in it have drifted since it was opened, and what we're actually going to do is adequately covered by four other issues:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants