Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pattern matching for chars are broken #5557

Closed
amiralies opened this issue Jul 15, 2022 · 20 comments
Closed

Pattern matching for chars are broken #5557

amiralies opened this issue Jul 15, 2022 · 20 comments
Labels
Milestone

Comments

@amiralies
Copy link
Contributor

let isA = c =>
  switch c {
  | 'a' => true
  }

produces this

// Generated by ReScript, PLEASE EDIT WITH CARE
'use strict';


function isA(c) {
  return true;
}

exports.isA = isA;
/* No side effect */

without warnigs.

@cristianoc
Copy link
Collaborator

Indeed thanks for reporting. Moving to the syntax repo.

@cristianoc cristianoc transferred this issue from rescript-lang/rescript Jul 16, 2022
@cristianoc cristianoc added the bug label Jul 16, 2022
@IwanKaramazow
Copy link

What makes this a syntax problem? Is this also unicode related?

@cristianoc
Copy link
Collaborator

No quotes are added in the generated code.

@cristianoc
Copy link
Collaborator

Which means it's probably a compiler problem.
And might need to transfer this back.

@cristianoc cristianoc transferred this issue from rescript-lang/syntax Jul 16, 2022
@cristianoc cristianoc added this to the v10.0 milestone Jul 17, 2022
@cristianoc
Copy link
Collaborator

Not sure what the issue is. The generated code looks file.
Please reopen with an example that illustrates the issue.

@amiralies
Copy link
Contributor Author

The above isA function is certainly includes a partial matching.
It should return true when the input is 'a' and throw Match_failure on other inputs.
The generated code for that block is something like this in 9.1.4

function isA(c) {
  if (c === 97) {
    return true;
  }
  throw {
        RE_EXN_ID: "Match_failure",
        _1: [
          "Main.res",
          2,
          2
        ],
        Error: new Error()
      };
}

@cristianoc
Copy link
Collaborator

Got it.
Have you observed anything except char? Is that the only type leading to this issue?

@cristianoc
Copy link
Collaborator

So the No side effect seems like the clue. Perhaps the code is transformed before it even reached the type checker, which would definitely complain.

@cristianoc
Copy link
Collaborator

It seems to be only for chars, e.g. not strings or integers. That's peculiar.

@amiralies
Copy link
Contributor Author

Got it. Have you observed anything except char? Is that the only type leading to this issue?

Didn't see this happening with other types. just char and char interval (which is the same from typechecker pov)

@IwanKaramazow
Copy link

It seems to be only for chars, e.g. not strings or integers. That's peculiar.

Did something change with Unicode support? I remember chars now accepting codepoints

@cristianoc cristianoc reopened this Jul 17, 2022
@cristianoc
Copy link
Collaborator

I've chased down pattern matching to receiving an integer constant, and presumably expecting a char constant.

I think there's no way to change the type of char without adapting the pattern matching also. As the notion of exhaustive matching would change.

@IwanKaramazow do you have any thoughts?

@cristianoc
Copy link
Collaborator

@amiralies is it true that the only issue you have observed is the detection of exhaustiveness?

@cristianoc
Copy link
Collaborator

Perhaps one possibility would be to make char be an alias to int, and try to get int exhaustive matching to kick in. But is that even right?

@amiralies
Copy link
Contributor Author

@amiralies is it true that the only issue you have observed is the detection of exhaustiveness?

Yes. I'm not sure if there are other problems or no.
But it seems unused match detection is working somehow

ie

 switch c {
| 'a' => true
| 'a' => false
}

shows a warning

@IwanKaramazow
Copy link

Seems like it's a lot more involved cf. the original doc: https://docs.google.com/document/d/1V99iVjIncN7seNe-oECa_PU7Sx4ajz8xIH-qDg1vS6Y/edit

@cristianoc cristianoc modified the milestones: v10.0, v10.1 Jul 18, 2022
@cristianoc
Copy link
Collaborator

I've moved it to milestone v10.1 so there's a little time to figure it out.

@cristianoc cristianoc modified the milestones: v10.1, v10.2 Sep 8, 2022
cristianoc added a commit that referenced this issue Oct 19, 2022
…t working

Fixes #5557
Fixes #5743

Exhaustiveness check for pattern matching relies on checking that the pattern does not lead to a type error.
Since char constants are represented internally as int, this causes an internal type error when processing patterns of the form 'x', where the constant is represented as `Pconst_integer` but the expected type is `char`.

This PR intercepts this situation and changes the expected type to `int`.
A cleaner solution would be to represent chars differently (but we can't change the AST), or make it official that char and int is the same type (and lose some abstraction).
cristianoc added a commit that referenced this issue Oct 19, 2022
#5744)

* Fix issue where exhaustiveness check for pattern matching char was not working

Fixes #5557
Fixes #5743

Exhaustiveness check for pattern matching relies on checking that the pattern does not lead to a type error.
Since char constants are represented internally as int, this causes an internal type error when processing patterns of the form 'x', where the constant is represented as `Pconst_integer` but the expected type is `char`.

This PR intercepts this situation and changes the expected type to `int`.
A cleaner solution would be to represent chars differently (but we can't change the AST), or make it official that char and int is the same type (and lose some abstraction).

* comment
@bobzhang
Copy link
Member

Perhaps one possibility would be to make char be an alias to int, and try to get int exhaustive matching to kick in. But is that even right?

This is correct and seems cleaner. Maybe type char = private int ?

@cristianoc
Copy link
Collaborator

cristianoc commented Oct 23, 2022

Perhaps one possibility would be to make char be an alias to int, and try to get int exhaustive matching to kick in. But is that even right?

This is correct and seems cleaner. Maybe type char = private int ?

That would be nice, though not sure how one would do that in practice.
char is used early on in the compiler, as a predefined type, so not sure where type char = private int would go to make it globally visible.

@cristianoc
Copy link
Collaborator

Specifically the constant 'x' would need to have type char, which is not the same as int, but the compiler internally somewhere would need to consider it as int (for pattern matching mechanics) but also not as int (for unification).

bobzhang added a commit that referenced this issue Oct 24, 2022
butterunderflow pushed a commit to butterunderflow/rescript-compiler that referenced this issue Oct 31, 2022
butterunderflow pushed a commit to butterunderflow/rescript-compiler that referenced this issue Oct 31, 2022
butterunderflow pushed a commit to butterunderflow/rescript-compiler that referenced this issue Oct 31, 2022
add a test case

snapshot

change Pconst_char payload (WIP)

tweak

tweak

representation of char for lambda

lib

bugfix: replace wrong pp

libs

bugfix: replace wrong print

use unsafe_chr to handle possible overflow char

safe print int as char

reduce duplication

(re)use encodeCodepoint to support string_of_int_as_char

some refactor

libs

CHANGELOG.md
butterunderflow pushed a commit to butterunderflow/rescript-compiler that referenced this issue Oct 31, 2022
add a test case

snapshot

change Pconst_char payload (WIP)

representation of char for lambda

safe print int as char

reduce duplication

(re)use encodeCodepoint to support string_of_int_as_char

some refactor

CHANGELOG.md

some cleanup

CHANGELOG.md
butterunderflow pushed a commit to butterunderflow/rescript-compiler that referenced this issue Oct 31, 2022
add a test case

snapshot

change Pconst_char payload (WIP)

representation of char for lambda

safe print int as char

reduce duplication

(re)use encodeCodepoint to support string_of_int_as_char

some refactor

CHANGELOG.md

changelog
bobzhang added a commit that referenced this issue Oct 31, 2022
bobzhang added a commit that referenced this issue Oct 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants