Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved error messages #102

Open
wants to merge 38 commits into
base: 4.02
from
Open

Improved error messages #102

wants to merge 38 commits into from

Conversation

@charguer
Copy link
Contributor

@charguer charguer commented Sep 19, 2014

Short paper:
http://www.chargueraud.org/research/2014/ocaml_errors/ocaml_errors.pdf

Slides:
http://www.chargueraud.org/research/2014/ocaml_errors/demo.html

Video:
https://www.youtube.com/watch?v=V_ipQZeBueg

Technical description:
See file improved_errors_branch.txt in the new branch (or in the diff page at the url below).

Patch:
charguer/ocaml@ocaml:4.02...improved-errors

@johnwhitington
Copy link
Contributor

@johnwhitington johnwhitington commented Sep 19, 2014

This was a fun presentation!

Do you think anything can be done about this one with your new system?

http://caml.inria.fr/mantis/view.php?id=6171

@johnwhitington
Copy link
Contributor

@johnwhitington johnwhitington commented Sep 19, 2014

The other classic one that trips almost everyone up at some time or another is this sort of thing with imperative code - it's a syntax error not a typing one:

let f t = ()

let x a b =
  f a;
  f b;

let y a b =
  f b;
  f a

let z = 1

The error is "syntax error", but the location is at the final "let" - not just one declaration later than the real problem, but two declarations later!

@charguer
Copy link
Contributor Author

@charguer charguer commented Sep 19, 2014

Parsing errors, that's another topic, sorry :)

@Drup
Copy link
Contributor

@Drup Drup commented Sep 19, 2014

Since we are discussing type errors made by beginners, what about this piece of code ? It's the usual Foo of a * bFoo of (a * b).

@charguer
Copy link
Contributor Author

@charguer charguer commented Sep 20, 2014

It's the usual Foo of a * b ≠ Foo of (a * b).

I can fix this one. Instead of the message

"The constructor T expects 3 argument(s), but is here applied to 1 argument(s)"

I could do:

"The constructor T expects 3 arguments, but is here applied to 1 argument of type [int * float * string]".

would that be sufficient?

@johnwhitington
Copy link
Contributor

@johnwhitington johnwhitington commented Sep 20, 2014

"The constructor T expects 3 arguments, but is here applied to 1 argument of type [int * float * string]".

Could you consider changing the way you quote types in the error messages - it seems confusing to use [ and ], since it looks a bit like list syntax. The normal OCaml error messages don't demarcate the type with quotes at all -- are they needed?

@pw374
Copy link

@pw374 pw374 commented Sep 20, 2014

... of type int * float * string

would be better in my opinion.

Or, let's use unicode:

... of type “int * float * string”

@charguer
Copy link
Contributor Author

@charguer charguer commented Sep 20, 2014

Introducing unicode is a good way to get my patch rejected :)

Regarding the bracket notation, I haven't taken the time to argue for it, indeed. They are easier to parse than the double quotes, because they are asymmetric delimiters. Also, the brackets makes the types stand out very clearly from the message, saving a lot of time for visually parsing the message. I don't think there is any confusion with the list notation, since OCaml does not use the bracket notation for the list type (unlike other languages).

Now, I agree that in the current state of my patch, there is a mismatch in that some error messages use brackets, while other don't. My proposal is that, if my patch receives positive feedback from the users and goes mainstream, I could add brackets to other error messages, to make them match the new ones.

@hcarty
Copy link
Member

@hcarty hcarty commented Sep 21, 2014

I like the brackets. In my opinion this would be a positive change even if that were the only update to the error messages.

@pw374
Copy link

@pw374 pw374 commented Sep 21, 2014

I don't think there's any confusion with lists, in [foo * bar * baz].
However brackets already have a meaning in OCaml types: for instance, [ A | B ] is a type, as in

# let a : [ `A | `B ]  = `A ;;
val a : [ `A | `B ] = `A

then what about the error message for this one?

# let a : [ `A | `B ]  = `C ;;

Would we add additional brackets for uniformity?

And uniformity means more changes than just that. Could we be willing to write blahblah type t while writing blahblah type [t * y]? I guess not but otherwise, what about int list list? Would it be blahblah type int list list or blahblah type [int list list]? What about [ A | B ] list array?

I have the strong feeling that OCaml authorS have already used all ASCII asymmetric "brackets", so any solution will have to be some sort of hack...

P.S. when I suggested unicode, I was not considering the patch-acceptation policy at all. And in this post, I'm doing my best to keep ignoring it, otherwise it would quickly wreck my objectivity.

N.B. All that being said, I'm glad and grateful that some people are working on improving the error messages!

@charguer
Copy link
Contributor Author

@charguer charguer commented Sep 21, 2014

I didn't anticipate the conflict with [ A | B ] --because I never use these types.
It's indeed a potential issue. What would be the implications of changing the delimiters for these union types to parentheses?

Regarding the use of pipes, e.g. | int * int | , I insist on using asymmetric delimiters, which are much easier to parse when listing several types next to each others.

@Drup
Copy link
Contributor

@Drup Drup commented Sep 21, 2014

"The constructor T expects 3 arguments, but is here applied to 1 argument of type [int * float * string]".

I think you should add paranthesis around, to be consistent with how the type should be declared to accept this.
I would prefer a better explanation in the error message, like a warning when you declare the type without parenthesis explaining the construct.

They are easier to parse than the double quotes, because they are asymmetric delimiters.

I personally don't mind having symmetric delimiters, quotes are properly highlighted in most text editors anyway.

@pw374
Copy link

@pw374 pw374 commented Sep 21, 2014

I didn't anticipate the conflict with [ A | B ] --because I never use these types.
It's indeed a potential issue. What would be the implications of changing the delimiters for these union types to parentheses?

If you mean writing ( A | B ) instead of [ A | B ], well, except breaking compatibility with a lot of programs, and readability, I don't think it'd have a great impact. However I bet that changing [ A ]to( A ) is even less likely to happen than introducing unicode in error messages...

@charguer
Copy link
Contributor Author

@charguer charguer commented Sep 21, 2014

What would be the implications of changing the delimiters for these union types to parentheses?

ok, let's forget about parentheses then.

But if we stick do brackets, do we have an ambiguity? Is it the case than variant constructors always are prefixed by a ` symbol?

@pw374
Copy link

@pw374 pw374 commented Sep 21, 2014

I think that there's no ambiguity in the [...] proposal, and that the question relies more on the aesthetic side once the uniformity problem has been solved.

I'm fine with having all type T in current compiler messages converted to [T].

I think it can not be acceptable to have non-uniformity as in having blahblah type t (for type names made of a single word) and blahblah type [int map] or blahblah type [[> bar | foo of bar | X ]] (for all types represented by more than just a single word). So, no matter what is chosen, what I really care about is uniformity, the rest is tiny details...

@damiendoligez
Copy link
Member

@damiendoligez damiendoligez commented Sep 27, 2014

I would prefer a better explanation in the error message, like a warning when you declare the type without parenthesis explaining the construct.

We can't have a warning for that because it's the normal case.

As for the delimiter, what about < t > or even << t >> ? It would be much less prone to clashing with the syntax of types.

@Drup
Copy link
Contributor

@Drup Drup commented Sep 27, 2014

As for the delimiter, what about < t > or even << t >> ? It would be much less prone to clashing with the syntax of types.

It would "clash" with object types.

@pw374
Copy link

@pw374 pw374 commented Sep 27, 2014

On 27 Sep 2014, at 22:58 pm, Damien Doligez notifications@github.com wrote:

As for the delimiter, what about < t > or even << t >> ? It would be much less prone to clashing with the syntax of types.

Huh?! What about object types?!

@damiendoligez
Copy link
Member

@damiendoligez damiendoligez commented Sep 27, 2014

Oops, I'm not a big fan of objects, so I forgot about that.

Then what about (( t )) ? Maybe a bit heavy but has the advantage of being syntactically neutral.

@gasche
Copy link
Member

@gasche gasche commented Sep 27, 2014

What about just using parentheses to disambiguate nesting, as we do in all other contexts? To reduce the syntactic burden, I would suggest that we don't need to parenthesize types that don't contain spaces.

"The constructor T expects 3 arguments, but is here applied to 1 argument of type (int * float * string)".

I think it would also be helpful, to go forward, to have reviews not only on the lexical syntax of types displayed in error messages, but also on the implementation itself. I made some privately to Arthur before he sent the pull request, but more eyeballs are always good.

@charguer
Copy link
Contributor Author

@charguer charguer commented Jan 3, 2017

@hcarty
Copy link
Member

@hcarty hcarty commented Jan 3, 2017

Would it be possible to have an environment variable or configuration switch to enable this by default?

@alainfrisch
Copy link
Contributor

@alainfrisch alainfrisch commented Jan 3, 2017

For this reason, I thought it would be better to have an explicit flag to activate the "improved error messages".

Understood, but since error messages are usually nicer and rarely misleading, tutorials will tell beginners to activate the option, everything will go well, and at some point, it will be very confusing and the user will not thing about disabling the option.

@charguer
Copy link
Contributor Author

@charguer charguer commented Jan 5, 2017

Hi all,

I've prepared an opam switch for easily testing this new error messages patch.

Before announcing on the OCaml mailing list, it would be great if some of
you could test the switch and confirms that it works.

opam switch 4.02.2+improved-errors

To activate the new errors, use the -easy-type-errors compiler flag, e.g.:

ocamlc -easy-type-errors foo.ml

Documentation:

Thanks to Armaël Guéneau and Gabriel Scherer for their help with opam packaging.

PS: Alain's concerned are valid, but since properly handling the detection of the cases where the user falls outside of the (large) fragment of OCaml that I support would require some additional work, I would like to know if people like the new messages before hacking further.

@charguer
Copy link
Contributor Author

@charguer charguer commented Jan 10, 2017

To @hcarty : indeed, if you need, we could create an alternative opam switch that sets -easy-type-errors to true by default. This might be useful for teaching, for example. However, like Alain explains, it may not be good idea to hide altogether the fact that the error messages displayed are not the "real" ones.

@boxed
Copy link

@boxed boxed commented Feb 22, 2017

The failing tests in travis, are these all because the tests check exact error messages? (I'm not very familiar with OCaml and the output is terrible so hard to check myself)

@gasche
Copy link
Member

@gasche gasche commented Feb 22, 2017

The tests that are failing are typing tests, so it is very possible that the only reason why they are failing are that the error messages differ. On the other hand, it is also a possibility that other things are broken by the patch that shouldn't. This means that test failures should be manually inspected to tell.

The easiest way to do this is to clone the branch, build it, and then in the testsuite directory (1) run the tests (make all) and (2) use make promote DIR=tests/foo on each failing test foo, which will replace the reference file for the test output with the new result, and use git diff to inspect the change each time to make sure that it is just a message difference, and not a deeper issue.

P.S.: see testsuite/HACKING.adoc for a documentation of this process.

@ShalokShalom
Copy link

@ShalokShalom ShalokShalom commented Aug 29, 2017

Is there any progress?

@damiendoligez damiendoligez added this to the long-term milestone Sep 27, 2017
@ShalokShalom
Copy link

@ShalokShalom ShalokShalom commented Mar 24, 2018

There are prettier error messages available in Bucklescript since over a year.
Why is this so hard to port?

@gasche
Copy link
Member

@gasche gasche commented Mar 24, 2018

@ShalokShalom you are talking about a completely different thing: this PR changes the way type-checking is done to have more informative type errors, and the Bucklescript work you have in mind is about changing the formatting of (the same) error messages to be clearer. I don't think it is helpful to confuse the two issues.

Also, to the question of "is this hard to port?", a first question would be "has anyone tried to do a port?". To my knowledge, the people who did this work in the Bucklescript developer never tried to backport it -- and nor has anyone else.

Would you be interested in having a look at their patches to see what could usefully be brought up upstream?

Regarding the present PR, you may have noticed the linked PRs #1472, #1505, #1510 and #1542, which come from a recent (Fall-Winter 2017) effort to merge many aspects of the present PR in the compiler. There has been work on the issue.

@Armael
Copy link
Member

@Armael Armael commented Mar 24, 2018

@ShalokShalom quick reminder: please be a decent human being.
@gasche has answered comprehensively and politely, so I think the factual aspects of the answer have been covered. I would just add this: I do not think it is a good strategy to come with an aggressive comment, complaining about an unrelated issue, to people that actually do work on the issue for free on their free time.
We are not at your disposal. In particular for porting modifications that have been done on a fork of ocaml that the authors never tried to backport. Don't get me wrong: I would be very happy to see these patches merged -- and it's probably not that hard to do; so yes, please do it.

@xavierleroy
Copy link
Contributor

@xavierleroy xavierleroy commented Oct 14, 2019

Any news on this error message saga?

@gasche
Copy link
Member

@gasche gasche commented Oct 15, 2019

We have merged the low-hanging/easy parts of Arthur's changes in the compiler already. What remains (delaying unification of the type of independent program fragments, making specific assumptions on non-GADT code etc.) is harder to integrate nicely. This may be a target for consideration for a constraint-based approach to type inference (we could ask that it does these nice things from the start), which would suggest closing the current PR. That said, I would argue that this year (with more direct interactions with Jacques Garrigue) is a good year to reconsider this and think about whether we could do more of it with the current type-checker implementation (given its ongoing cleanup). I would therefore propose to wait for yet another year before closing this PR.

@charguer
Copy link
Contributor Author

@charguer charguer commented Oct 16, 2019

My feeling is that:

  • At some point of the life of OCaml, it would have been really straightforward to integrate my changes.
  • After the introduction of GADTs, and of resolution of overloaded record fields (which made typing nonprincipal), it has become challenging (thought not impossible, provided sufficient expertise and refactoring) to have a single code compatible with current behavior and that improves typing errors.
  • There is (legitimate) concern about maintaining two distinct variants for the code that typechecks the core ocaml construct, with one variant that would provide better error messages yet that would work only over a subset of the language (OCaml minus the two advanced features mentioned above). Even though the absolute number of "redundant" lines is relatively small, it makes the compiler harder to maintain and to make evolve.
  • In addition, note that the patch that I implemented is a well-tested prototype, yet it does not have production maturity, so I would bet there exists a few corner cases that my code does not cover as well as it could. Hence, I am not sure that it would be desirable to release such important changes in the default version of the compiler without prior extensive beta-testing, or through an opt-in option for some time.

I agree with @gasche that if there is ongoing cleanup/refactoring in the typechecker, the question about what of this patch could be merged should be re-evaluated after (or during) the refactoring.

mshinwell added a commit to mshinwell/ocaml that referenced this pull request Apr 15, 2020
lthls added a commit to lthls/ocaml that referenced this pull request Sep 23, 2020
lthls added a commit to lthls/ocaml that referenced this pull request Sep 23, 2020
lthls added a commit to lthls/ocaml that referenced this pull request Sep 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

You can’t perform that action at this time.