Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for integer ranges #8504

Open
XVilka opened this Issue Mar 15, 2019 · 6 comments

Comments

Projects
None yet
4 participants
@XVilka
Copy link
Contributor

commented Mar 15, 2019

To be able to support integer ranges like 1..100 in match cases and other expressions.

See https://discuss.ocaml.org/t/numerical-ranges-and-pattern-matching/3102

@XVilka XVilka changed the title Support for numberical ranges Support for numerical ranges Mar 15, 2019

@XVilka XVilka changed the title Support for numerical ranges Support for integer ranges Mar 15, 2019

@gasche

This comment has been minimized.

Copy link
Member

commented Mar 15, 2019

Let me repeat my comment from the forum discussion:

This is a natural idea, but so far nobody has produced a patch to implement the feature, because it is harder to implement than one may expect. If I remember correctly, character ranges '0'..'4' are exploded into or-patterns '0'|'1'|'2'|'3'|'4', and manipulated in that non-compact form in the pattern-matching compiler (which later optimizes continuous ranges to compact/efficient code, so it doesn’t show in the compiled output). If you tried to extend this to integer ranges, you would get compile-time blowups for any large range. Instead you have to change the structure of the pattern-matching compiler to support a first-class notion of range, but that is an ambitious refactoring to a part of the compiler codebase that is unpleasant to work with. Nobody has done it for now.

"cfcs" later came with an interesting suggestion to decompose integer ranges into products of byte ranges. It is a cool idea but I still think it is going to be pretty hard to implement correctly in the current state of the pattern-matching compiler implementation.

@trefis, and myself have been planning to work on refactoring the pattern-matching compiler codebase with @maranget (as we did for parmatch.ml, the type-checking/analysis/warnings side of pattern-matching), and I think it would be easier to consider these features after that work is done. But we don't have any announced time for when to start working on this, and it would be a massive work investment, so please don't hold your breath in the short-term.

@gasche

This comment has been minimized.

Copy link
Member

commented Mar 20, 2019

This encoding is incorrect in general, please see the discussion on the Discourse forum.

@johnwhitington

This comment has been minimized.

Copy link
Contributor

commented Apr 18, 2019

The reasons for not allowing when except on the outermost pattern, so far as I remember from reading previous discussions, are that it complicates further the question of when and how many times when guards are executed, and that it would be a problem matching on mutable patterns, if the guard could mutate the pattern.

But, could we not solve the problem of integer (and character) ranges, by having the pattern-match compiler insert whens to each clause i.e the pattern 1..3 compiles to x when x >= 0 && x <= 3. These auto-generated whens would be ok, because there is no mutability at all. (Or, would {contents = 1..3} still be a problem?)

Partial patterns like 1.. and ..1 could also be introduced, and even checked for completeness.

@gasche

This comment has been minimized.

Copy link
Member

commented Apr 18, 2019

Deep when is hard to implement for more reasons than that (by the way, your second reason also applies to when guards at the toplevel only). Suddenly having expressions in the middle of patterns would be difficult/painful for the current compilation scheme to handle, and that explains why this encoding isn't feasible today.

@johnwhitington

This comment has been minimized.

Copy link
Contributor

commented Apr 23, 2019

Thanks for the explanation @gasche. Allow me to have one more go at this before I admit defeat :-) This one needs only top-level guards.

In the discuss.ocaml.org thread, @gasche says:

Though numbers could be desugared into | x when x > = 0 && x <= 9 , no?

This desugaring is not correct because when can only exist at the top-level of the whole clause, >whereas interval patterns can be used in-depth. For example, consider the or-pattern

(Some (0..9, true) | Some (100..109, false))

Assume, for generality, that there is also a guard which the programmer has added:

(Some (0..9, true) | Some (100..109, false)) when guard

Can this not be desugared into:

(Some (x, true) | Some (y, false)) when x >= 0 && x <= 9 && y >=100 && y <= 109 && guard

The short-circuiting nature of the && operator means the original part of the guard will not be executed unless the generated guards are true.

This could work for characters too. We would need to annotate the generated parts of the guard with say [@ocaml.generated_pattern] so that the completeness-checker, either for characters or integers, could find them.

@gasche

This comment has been minimized.

Copy link
Member

commented Apr 23, 2019

(Some (x, true) | Some (y, false)) is not a valid pattern, as both sides of an or-pattern must bind the same variables.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.