Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved error messages #102

Closed
wants to merge 38 commits into from
Closed

Conversation

charguer
Copy link
Contributor

Short paper:
http://www.chargueraud.org/research/2014/ocaml_errors/ocaml_errors.pdf

Slides:
http://www.chargueraud.org/research/2014/ocaml_errors/demo.html

Video:
https://www.youtube.com/watch?v=V_ipQZeBueg

Technical description:
See file improved_errors_branch.txt in the new branch (or in the diff page at the url below).

Patch:
charguer/ocaml@ocaml:4.02...improved-errors

@johnwhitington
Copy link
Contributor

This was a fun presentation!

Do you think anything can be done about this one with your new system?

http://caml.inria.fr/mantis/view.php?id=6171

@johnwhitington
Copy link
Contributor

The other classic one that trips almost everyone up at some time or another is this sort of thing with imperative code - it's a syntax error not a typing one:

let f t = ()

let x a b =
  f a;
  f b;

let y a b =
  f b;
  f a

let z = 1

The error is "syntax error", but the location is at the final "let" - not just one declaration later than the real problem, but two declarations later!

@charguer
Copy link
Contributor Author

Parsing errors, that's another topic, sorry :)

@Drup
Copy link
Contributor

Drup commented Sep 19, 2014

Since we are discussing type errors made by beginners, what about this piece of code ? It's the usual Foo of a * bFoo of (a * b).

@charguer
Copy link
Contributor Author

It's the usual Foo of a * b ≠ Foo of (a * b).

I can fix this one. Instead of the message

"The constructor T expects 3 argument(s), but is here applied to 1 argument(s)"

I could do:

"The constructor T expects 3 arguments, but is here applied to 1 argument of type [int * float * string]".

would that be sufficient?

@johnwhitington
Copy link
Contributor

"The constructor T expects 3 arguments, but is here applied to 1 argument of type [int * float * string]".

Could you consider changing the way you quote types in the error messages - it seems confusing to use [ and ], since it looks a bit like list syntax. The normal OCaml error messages don't demarcate the type with quotes at all -- are they needed?

@pw374
Copy link

pw374 commented Sep 20, 2014

... of type int * float * string

would be better in my opinion.

Or, let's use unicode:

... of type “int * float * string”

@charguer
Copy link
Contributor Author

Introducing unicode is a good way to get my patch rejected :)

Regarding the bracket notation, I haven't taken the time to argue for it, indeed. They are easier to parse than the double quotes, because they are asymmetric delimiters. Also, the brackets makes the types stand out very clearly from the message, saving a lot of time for visually parsing the message. I don't think there is any confusion with the list notation, since OCaml does not use the bracket notation for the list type (unlike other languages).

Now, I agree that in the current state of my patch, there is a mismatch in that some error messages use brackets, while other don't. My proposal is that, if my patch receives positive feedback from the users and goes mainstream, I could add brackets to other error messages, to make them match the new ones.

@hcarty
Copy link
Member

hcarty commented Sep 21, 2014

I like the brackets. In my opinion this would be a positive change even if that were the only update to the error messages.

@pw374
Copy link

pw374 commented Sep 21, 2014

I don't think there's any confusion with lists, in [foo * bar * baz].
However brackets already have a meaning in OCaml types: for instance, [ A | B ] is a type, as in

# let a : [ `A | `B ]  = `A ;;
val a : [ `A | `B ] = `A

then what about the error message for this one?

# let a : [ `A | `B ]  = `C ;;

Would we add additional brackets for uniformity?

And uniformity means more changes than just that. Could we be willing to write blahblah type t while writing blahblah type [t * y]? I guess not but otherwise, what about int list list? Would it be blahblah type int list list or blahblah type [int list list]? What about [ A | B ] list array?

I have the strong feeling that OCaml authorS have already used all ASCII asymmetric "brackets", so any solution will have to be some sort of hack...

P.S. when I suggested unicode, I was not considering the patch-acceptation policy at all. And in this post, I'm doing my best to keep ignoring it, otherwise it would quickly wreck my objectivity.

N.B. All that being said, I'm glad and grateful that some people are working on improving the error messages!

@charguer
Copy link
Contributor Author

I didn't anticipate the conflict with [ A | B ] --because I never use these types.
It's indeed a potential issue. What would be the implications of changing the delimiters for these union types to parentheses?

Regarding the use of pipes, e.g. | int * int | , I insist on using asymmetric delimiters, which are much easier to parse when listing several types next to each others.

@Drup
Copy link
Contributor

Drup commented Sep 21, 2014

"The constructor T expects 3 arguments, but is here applied to 1 argument of type [int * float * string]".

I think you should add paranthesis around, to be consistent with how the type should be declared to accept this.
I would prefer a better explanation in the error message, like a warning when you declare the type without parenthesis explaining the construct.

They are easier to parse than the double quotes, because they are asymmetric delimiters.

I personally don't mind having symmetric delimiters, quotes are properly highlighted in most text editors anyway.

@pw374
Copy link

pw374 commented Sep 21, 2014

I didn't anticipate the conflict with [ A | B ] --because I never use these types.
It's indeed a potential issue. What would be the implications of changing the delimiters for these union types to parentheses?

If you mean writing ( A | B ) instead of [ A | B ], well, except breaking compatibility with a lot of programs, and readability, I don't think it'd have a great impact. However I bet that changing [ A ]to( A ) is even less likely to happen than introducing unicode in error messages...

@charguer
Copy link
Contributor Author

What would be the implications of changing the delimiters for these union types to parentheses?

ok, let's forget about parentheses then.

But if we stick do brackets, do we have an ambiguity? Is it the case than variant constructors always are prefixed by a ` symbol?

@pw374
Copy link

pw374 commented Sep 21, 2014

I think that there's no ambiguity in the [...] proposal, and that the question relies more on the aesthetic side once the uniformity problem has been solved.

I'm fine with having all type T in current compiler messages converted to [T].

I think it can not be acceptable to have non-uniformity as in having blahblah type t (for type names made of a single word) and blahblah type [int map] or blahblah type [[> bar | foo of bar | X ]] (for all types represented by more than just a single word). So, no matter what is chosen, what I really care about is uniformity, the rest is tiny details...

@damiendoligez
Copy link
Member

I would prefer a better explanation in the error message, like a warning when you declare the type without parenthesis explaining the construct.

We can't have a warning for that because it's the normal case.

As for the delimiter, what about < t > or even << t >> ? It would be much less prone to clashing with the syntax of types.

@Drup
Copy link
Contributor

Drup commented Sep 27, 2014

As for the delimiter, what about < t > or even << t >> ? It would be much less prone to clashing with the syntax of types.

It would "clash" with object types.

@pw374
Copy link

pw374 commented Sep 27, 2014

On 27 Sep 2014, at 22:58 pm, Damien Doligez notifications@github.com wrote:

As for the delimiter, what about < t > or even << t >> ? It would be much less prone to clashing with the syntax of types.

Huh?! What about object types?!

@damiendoligez
Copy link
Member

Oops, I'm not a big fan of objects, so I forgot about that.

Then what about (( t )) ? Maybe a bit heavy but has the advantage of being syntactically neutral.

@gasche
Copy link
Member

gasche commented Sep 27, 2014

What about just using parentheses to disambiguate nesting, as we do in all other contexts? To reduce the syntactic burden, I would suggest that we don't need to parenthesize types that don't contain spaces.

"The constructor T expects 3 arguments, but is here applied to 1 argument of type (int * float * string)".

I think it would also be helpful, to go forward, to have reviews not only on the lexical syntax of types displayed in error messages, but also on the implementation itself. I made some privately to Arthur before he sent the pull request, but more eyeballs are always good.

@gasche
Copy link
Member

gasche commented Feb 22, 2017

The tests that are failing are typing tests, so it is very possible that the only reason why they are failing are that the error messages differ. On the other hand, it is also a possibility that other things are broken by the patch that shouldn't. This means that test failures should be manually inspected to tell.

The easiest way to do this is to clone the branch, build it, and then in the testsuite directory (1) run the tests (make all) and (2) use make promote DIR=tests/foo on each failing test foo, which will replace the reference file for the test output with the new result, and use git diff to inspect the change each time to make sure that it is just a message difference, and not a deeper issue.

P.S.: see testsuite/HACKING.adoc for a documentation of this process.

@ShalokShalom
Copy link

Is there any progress?

@damiendoligez damiendoligez added this to the long-term milestone Sep 27, 2017
@ShalokShalom
Copy link

There are prettier error messages available in Bucklescript since over a year.
Why is this so hard to port?

@gasche
Copy link
Member

gasche commented Mar 24, 2018

@ShalokShalom you are talking about a completely different thing: this PR changes the way type-checking is done to have more informative type errors, and the Bucklescript work you have in mind is about changing the formatting of (the same) error messages to be clearer. I don't think it is helpful to confuse the two issues.

Also, to the question of "is this hard to port?", a first question would be "has anyone tried to do a port?". To my knowledge, the people who did this work in the Bucklescript developer never tried to backport it -- and nor has anyone else.

Would you be interested in having a look at their patches to see what could usefully be brought up upstream?

Regarding the present PR, you may have noticed the linked PRs #1472, #1505, #1510 and #1542, which come from a recent (Fall-Winter 2017) effort to merge many aspects of the present PR in the compiler. There has been work on the issue.

@Armael
Copy link
Member

Armael commented Mar 24, 2018

@ShalokShalom quick reminder: please be a decent human being.
@gasche has answered comprehensively and politely, so I think the factual aspects of the answer have been covered. I would just add this: I do not think it is a good strategy to come with an aggressive comment, complaining about an unrelated issue, to people that actually do work on the issue for free on their free time.
We are not at your disposal. In particular for porting modifications that have been done on a fork of ocaml that the authors never tried to backport. Don't get me wrong: I would be very happy to see these patches merged -- and it's probably not that hard to do; so yes, please do it.

@xavierleroy
Copy link
Contributor

Any news on this error message saga?

@gasche
Copy link
Member

gasche commented Oct 15, 2019

We have merged the low-hanging/easy parts of Arthur's changes in the compiler already. What remains (delaying unification of the type of independent program fragments, making specific assumptions on non-GADT code etc.) is harder to integrate nicely. This may be a target for consideration for a constraint-based approach to type inference (we could ask that it does these nice things from the start), which would suggest closing the current PR. That said, I would argue that this year (with more direct interactions with Jacques Garrigue) is a good year to reconsider this and think about whether we could do more of it with the current type-checker implementation (given its ongoing cleanup). I would therefore propose to wait for yet another year before closing this PR.

@charguer
Copy link
Contributor Author

My feeling is that:

  • At some point of the life of OCaml, it would have been really straightforward to integrate my changes.
  • After the introduction of GADTs, and of resolution of overloaded record fields (which made typing nonprincipal), it has become challenging (thought not impossible, provided sufficient expertise and refactoring) to have a single code compatible with current behavior and that improves typing errors.
  • There is (legitimate) concern about maintaining two distinct variants for the code that typechecks the core ocaml construct, with one variant that would provide better error messages yet that would work only over a subset of the language (OCaml minus the two advanced features mentioned above). Even though the absolute number of "redundant" lines is relatively small, it makes the compiler harder to maintain and to make evolve.
  • In addition, note that the patch that I implemented is a well-tested prototype, yet it does not have production maturity, so I would bet there exists a few corner cases that my code does not cover as well as it could. Hence, I am not sure that it would be desirable to release such important changes in the default version of the compiler without prior extensive beta-testing, or through an opt-in option for some time.

I agree with @gasche that if there is ongoing cleanup/refactoring in the typechecker, the question about what of this patch could be merged should be re-evaluated after (or during) the refactoring.

mshinwell added a commit to mshinwell/ocaml that referenced this pull request Apr 15, 2020
lthls pushed a commit to lthls/ocaml that referenced this pull request Sep 23, 2020
lthls pushed a commit to lthls/ocaml that referenced this pull request Sep 23, 2020
lthls pushed a commit to lthls/ocaml that referenced this pull request Sep 24, 2020
chambart pushed a commit to chambart/ocaml-1 that referenced this pull request Aug 4, 2021
stedolan pushed a commit to stedolan/ocaml that referenced this pull request Oct 5, 2021
@gasche
Copy link
Member

gasche commented Mar 10, 2023

We are looking at old PRs again. I think that this one is not going to get merged in the future -- rather, we want to take ideas from it and merge them piece by piece, as we already did it in parts. I believe that this could be closed, but we can wait a bit to see if people have different opinions.

@damiendoligez
Copy link
Member

We have waited a bit and nothing happened. I'm archiving (closing) this PR, which did its job and deserves retirement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet