Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Location: refactor the code responsible for displaying errors and warnings #1952

Merged
merged 6 commits into from Aug 9, 2018

Conversation

Projects
None yet
7 participants
@Armael
Copy link
Member

commented Aug 1, 2018

This PR is an attempt at refactoring the part of Location responsible for printing errors and warnings, to have a clearer separation between the structure of errors(/warnings), and backend-specific display of that structure.

There are indeed several "backends" on which an error can get printed, all with slightly different capabilities, in particular with respect to the highlighting of locations. I can distinguish:

  • "batch mode", i.e. the standard compiler output when called from the command-line
  • the toplevel, which e.g. can have the extra capability of going up and highlighting parts of the input to point at locations
  • merlin, which has different support from the editor to highlight code directly in the source code (currently merlin maintains a patched location.ml)
  • expect_test and the manual script caml_tex.ml also both display errors/collect locations in a custom way using the compiler-libs api.

I would say that the current implementation has the following issues:

  • The current Location module exposes many functions related to the printing of errors/warnings/locations; with some of them being re-definable, and indeed re-defined in some places to allow for different printing backends. This makes the printing code entangled with the rest of the implementation, and hard to keep track of.
  • On the printing code side, a few quirks make it hard to print errors nicely (example 1: there's a hack to feed a custom formatter to errorf which adds fake indentation to anticipate the fact that the message will be printed in front of Error: ) (example 2: poor indentation in the printing of sub-locations for name conflicts)
  • On the error-message-writer side, it is hard to write errors that look good on all the backends, the result being that often, information that could be propagated to the backend for backend-specific display (e.g. sub-locations of an error) are instead baked in the error message.
  • As a corollary, trying to improve the printing of errors is hard (this was the initial motivation for this PR)

In this PR

This PR proposes to rework the current implementation, with minimal changes to its observational behavior, and pave the way for further improvements, in particular to the display of errors with several locations (typically, a main location and several sub-locations).

More specifically:

  • Rework the Location.error type.
    • messages as strings do not compose well as soon as they are multiline (e.g. with regard to Format boxes). In this PR error messages become of the type formatter -> unit. This required some refactoring in some places (e.g. printtyp.ml) to make sure the side effects performed when crafting an error message are still triggered at the right time. We discussed a bit with @Drup and @Octachron of an alternative solution (where messages would instead be formats where all %a, %t, %d... have been expanded) but I leave this as a potential future improvement.
    • remove the if_highlight field as it is intrinsically backend-dependent (it provided an alternative error message when in the toplevel if parts of the input had been highlighted) and is used in only one error message ("syntax error: missing closing )" )
    • there is already a notion of sub-errors, which is used only once across the codebase (also for "syntax error: missing closing )"). Sub-errors were recursively error themselves; this PR changes it to a list of sub-messages (and not recursively). The rationale is that it is simpler, but hopefully expressive enough to describe somewhat faithfully the structure of errors with sub-locations. The plan in future PRs is to rework existing errors to actually use this feature, instead of baking sub-locations in the text of the error itself as it is currently done almost everywhere.
  • Generalize Location.error to a Location.report type, which can then be used to display warnings as well. The idea is that is gives us a more uniform printing of errors and warnings. The intent is that Location.report is a backend-agnostic description of an error or warning.
  • Define "report printers", which define backend-specific ways of printing a report. Also make it possible to easily reuse code between report printers (e.g. the toplevel printer is defined after the batch mode printer; expect_test and the manual backend are also defined by reusing the toplevel/batch mode backend and modifying some bits)
  • Limit the number of printing-related functions exposed by Location; the point is that everyone printing errors should emit a Location.report

Unfortunately, because of the way the current implementation is entangled, I not manage to make these changes independently -- so they come in a big one commit, sorry for that!

Changes to the compiler output

The objective is to minimize the changes in the output; actual improvements to the way reports are printed should come in future PRs. There are nevertheless some changes:

  • indentation-related, and arguably always an improvement
  • there is a regression when displaying name conflicts in the terminal (there is no highlighting anymore). I plan to fix that in an upcoming PR which will turn such conflicts into proper sub-messages (I could do that here but it involves a whole bunch of uninteresting plumbing)
  • changes to the "missing closing)" message; maybe not for the best but at least it is consistent with the rest now
  • one weird change that Format experts (@Drup @Octachron maybe?) may want to look at -- I point to it in a comment below.

Comments on the code

  • A report_printer is currently defined as a class type. The goal is to use inheritance to be able to re-use code between printers (the toplevel, expect_test and manual printers are defined that way). The currently used report printer can be changed when using the compiler-libs API. This way I that merlin can define its own report_printer (inheriting from the batch mode printer for example) and simply switch to it, instead of having to patch location.ml.

    If people are reluctant to have objects in this part of the compiler, I could rewrite the printers as a record of functions -- given the way inheritance between printers is used currently, it should not be too painful. EDIT: printers have been rewritten as a record of functions

  • I expect this to cause some breakage at least in utop; I can send patches to fix it if needed.

  • The PR is best reviewed commit by commit. The two first commits move some code around and add docstrings in location.{ml,mli}, but do not change anything else. The third commit does the refactoring, and should probably be read starting from location.mli.

int
Error: This expression has type
string but an expression was expected of type
int

This comment has been minimized.

Copy link
@Armael

Armael Aug 1, 2018

Author Member

I don't know why this is happening. Any ideas?

This comment has been minimized.

Copy link
@Octachron

Octachron Aug 3, 2018

Contributor

This is yet another memory effect due to the coupling of formatter margin and max_indent .

More precisely, in this test, the format margin is first set to 20 then to 80 . This has the insidious effect of setting max_indent for Format.std_formatter to 10.

However, previously, Location.errorf printed to a new private formatter which inherited the current margin (and only the margin) from Format.std_formatter at each new call of errorf (this setup was done by Misc.Color.set_color_tag_handling). Thus, this formatter kept the default max_indent value of 68 and since there was no formatting boxes remaining during the printing of Location.error to stdout, the problematic value of max_indent was unseen.

The fix here is thus to properly set the geometry of Format.std_formatter by replacing
Format.set_margin 80 with Format.set_margin 80; Format.set_max_indent 70 (in that order!).

This comment has been minimized.

Copy link
@Armael

Armael Aug 4, 2018

Author Member

Wow, thanks for the explanation! That's not a very intuitive behavior... :) I'll fix the test as you say.

@gasche

This comment has been minimized.

Copy link
Member

commented Aug 1, 2018

I don't have the time budget to look at this right now (not before the end of August, I think), but a few quick remarks:

  • Indeed, objects are a no-go. You should use records of functions. If you want to have a form of inheritance, you can use open recursion, although it is not totally clear what this is necessary here. Moving useful reusable block to a shared library (eg. in Misc) could also be an option for code reuse.

  • Your commit list is confusing because I cannot tell where the big changes are. After a while my names adjusted to the dimness of your naming choice and I noticed that you use "reorganize" for changes that (I suppose) don't change the behavior and "refactor" for changes that may change the program behavior. Please use "rework" or some other word that does not mean "move around without changes" for the latter -- I do understand that the initial goal was to have a proper refactoring.

  • You say that the PR "probably breaks utop", maybe you could try to actually build utop with the changes and look at the necessary patches? This could be a good way to battle-test the generality of the new interface you propose.

@trefis

This comment has been minimized.

Copy link
Contributor

commented Aug 1, 2018

To add to @gasche's comment:

  • I have the time budget to look at the code right now: unless someone beats me to it, you can expect a proper review from me soon

  • This could be a good way to battle-test the generality of the new interface you propose.

    UTop doesn't really do anything different from the toplevel in terms of error printing as far as I remember. It should be fine.

  • thank you for the very nice PR description!

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Aug 1, 2018

I always found it weird that the module called Location was responsible for printing errors and had a dependency on Warnings (which is were the actual location type is really defined first). Let's use this big redesign to organize features in a more logical way:

  • Location should be responsible for managing locations: type definitions, "constructors", perhaps printers.

  • A new Error module should be introduced, with a dependency on Location; responsible for printing error.

  • The Warnings module should depend on Location (and probably Error)

Location and Error should have no or little dependencies on specificities of the OCaml tools (contrary to Warnings, which has an exhaustive list of warnings known to OCaml). One might be able (in theory) to use them is other projects -- I'm not saying that we want to encourage that, but low-level manipulation of locations and error reporting have no reason to be tied too much to the rest of the compiler code base.

@Armael

This comment has been minimized.

Copy link
Member Author

commented Aug 1, 2018

I always found it weird that the module called Location was responsible for printing errors and had a dependency on Warnings (which is were the actual location type is really defined first). Let's use this big redesign to organize features in a more logical way

I completely agree. I would have suggested a PR doing just that earlier, but I wasn't sure if it was OK to introduce new modules in compiler-libs, since this pollutes the global namespace.

@Armael Armael force-pushed the Armael:location-refactor branch from 768ca43 to 2a7e9ef Aug 1, 2018

@trefis

This comment has been minimized.

Copy link
Contributor

commented Aug 1, 2018

I wasn't sure if it was OK to introduce new modules in compiler-libs, since this pollutes the global namespace.

I thought about that as well, and my intuition would be: it's fine. But I guess to properly answer it we want to know who links against compiler-libs.
I don't expect ppxs to have problems with the apparition of an Error module. Do we know any other use cases of compiler-libs?

Also, we can always point people to https://github.com/janestreet/ocaml-compiler-libs

@gasche

This comment has been minimized.

Copy link
Member

commented Aug 1, 2018

Yep: it's "not really ok" right now (except if your module has a really long name, right?), and a PR to prefix compiler-libs modules in the same horrible way that Stdlib__ works is overdue.

@nojb

This comment has been minimized.

Copy link
Contributor

commented Aug 1, 2018

a PR to prefix compiler-libs modules in the same horrible way that Stdlib__ works is overdue.

@gasche do you think that prefixing compiler-libs has a chance of getting merged, given that it will break so much code out there?

@Drup

This comment has been minimized.

Copy link
Contributor

commented Aug 1, 2018

I have been designated as "Format expert", so I'll rise to the task and review later this week! I general though, I strongly agree with all the motivations for doing these changes. I also agree with @alainfrisch's remark, but I'm afraid it's going to divert our attention too much if we try to deal with that right now.

I just want to point out that I have used sub-errors for PPXs for multi-location errors (in jsoo's ppx in particular, iirc). However, Location is versioned as part of ocaml-migrate-parsetree, so you can make breaking changes there.

@gasche wrt objects in the compiler. It would be nice to make that extra clear somewhere in the contributing guidelines and explain why. It would give the occasion to confirm with all the core devs that this restriction still stands.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Aug 1, 2018

given that it will break so much code out there?

Cannot we apply the same hacks as for Stdlib to avoid breaking code?

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Aug 1, 2018

Anyway, I think that people linking against compiler-libs implicit agree to get the occasional(?) breakage. I don't want to wait until a namespacing solution for compiler-libs to allow adding any module to the compiler code base, that would be crazy (it's not like the set of compiler-libs modules had been stable anyway).

@Armael Armael force-pushed the Armael:location-refactor branch from 2a7e9ef to 2048619 Aug 4, 2018

@@ -0,0 +1,6 @@
File "hello.ml", line 18, characters 20-27:
Error: These are the contents of the main error message. It is very long and should wrap across several lines.

This comment has been minimized.

Copy link
@Armael

Armael Aug 4, 2018

Author Member

Interestingly, with the changes to use kdprintf, this line is not wrapped anymore, and I have no idea why. Also sorry for rebasing, I just realised it would have been useful to have kept the previous version...

This comment has been minimized.

Copy link
@Armael

Armael Aug 5, 2018

Author Member

Nevermind, that was a mistake on my side: a Format.pp_print_text was missing after refactoring errors_batch.ml

@Armael

This comment has been minimized.

Copy link
Member Author

commented Aug 4, 2018

I just pushed a new version of the patch that uses the new Format.kdprintf function implemented by @Drup -- this makes crafting the error messages as nice as before!

@Armael Armael force-pushed the Armael:location-refactor branch 2 times, most recently from 774cd08 to 6526979 Aug 4, 2018

Line 2, characters 2-12:
type t = A
^^^^^^^^^^
Definition of type t/2

This comment has been minimized.

Copy link
@gasche

gasche Aug 4, 2018

Member

Maybe you have said it before, but how come the definitions are not printed anymore? Arguably that is a regression for the message, but maybe that comes from (otherwise sensible) expect vs. batch-mode difference.

This comment has been minimized.

Copy link
@Armael

Armael Aug 4, 2018

Author Member

Yes it is buried towards the end of the first message:

a regression when displaying name conflicts in the terminal (there is no highlighting anymore). I plan to fix that in an upcoming PR which will turn such conflicts into proper sub-messages (I could do that here but it involves a whole bunch of uninteresting plumbing)

@Armael Armael force-pushed the Armael:location-refactor branch 2 times, most recently from 0c62dd9 to 75869e8 Aug 5, 2018

@gasche

This comment has been minimized.

Copy link
Member

commented Aug 5, 2018

This is a minor remark for something that needn't go into the PR, but maybe you should know if you keep working on this code. Most of the formatting in error messages was written before pp_print_text, and uses @ explicitly to cut lines. I personally think that it would be nicer to use pp_print_text most of the time for free-flowing text (it is more flexible to adapt to changed print width, higher indentations, etc.); sometimes some cut places make sense because you might want to show code/types/formulas on their own line, but most of the time the cuts are mostly arbitrary. In the future, please feel free to change manual cuts into pp_print_text if it is a natural part of the change you are making.

@Armael Armael force-pushed the Armael:location-refactor branch 2 times, most recently from 195febc to 0996b6d Aug 6, 2018

@Armael

This comment has been minimized.

Copy link
Member Author

commented Aug 6, 2018

Objects and classes are now gone.

@trefis
Copy link
Contributor

left a comment

I think there is a small bug (on top of the regression pointed out by @gasche) but appart from that I really like this PR:

  • location.ml is much cleaner than before you started working on it
  • the reporting/printing of errors throughout the codebase is a lot more regular
  • thanks to @Drup's recent PR reporting an error is still as convenient as before this (that wasn't the case in the initial version of this PR), yeay!

I'll let you have a look at the potential bug that I pointed out before I approve though. :)

@@ -661,7 +661,6 @@ module rec Bad : A = Bad;;
[%%expect{|
module type Alias = sig module N : sig end module M = N end
module F : functor (X : sig end) -> sig type t end
Line 1:

This comment has been minimized.

Copy link
@trefis

trefis Aug 8, 2018

Contributor

I assume this is an example of a "dummy loc" that doesn't get printed anymore?
I actually agree with the choice, the "Line 1" was just nonsense.

I know it's unrelated but did you happen to have a look at why the error gets attached to that location?

This comment has been minimized.

Copy link
@Armael

Armael Aug 8, 2018

Author Member

Yes precisely, this was a ghost dummy location that does not get printed anymore. For your second question, I have no idea. I'll try to investigate.

Line 2, characters 0-5,
Line 2, characters 10-12:
Line 5, characters 5-7,
Line 5, characters 0-1:

This comment has been minimized.

Copy link
@trefis

trefis Aug 8, 2018

Contributor

The order of positions has been reversed here and in all the other tests displaying a location / range.
I assume that's a bug.

This comment has been minimized.

Copy link
@Armael

Armael Aug 8, 2018

Author Member

Interestingly, this is not a bug per-se.

The new behavior is to print the locations in the same order they appear in the message below (so first the main location, then the sub-locations in order). The current behavior is to print locations in reverse order. This makes little sense in general, but on this particular error (which is in fact the only one using sub-locations currently) it matches the left-to-right/top-to-bottom intuitive ordering of locations. Since on this error, the main location (the token that should be ')' but isn't) appears in the source code after the sub-location (the opening '('), we get a reversed order which looks weird.

What I could do is sort the locations lexicographically, instead of printing them in the order they appear in the message. I think the two make sense in general, we just have to choose. Any opinions?

This comment has been minimized.

Copy link
@gasche

gasche Aug 8, 2018

Member

Given the constraint on this first step (not change the current code or results too much), I would think that sorting the locations before printing them textually (and highlighting them) would be slightly nicer than the current output, indeed.

This comment has been minimized.

Copy link
@trefis

trefis Aug 8, 2018

Contributor

You could also not print the location that is printed in the sub-message. But I guess that would remove half of the highlighting?

I might still prefer that to adding any kind of sorting code if you indeed fix the highlighting for sub-messages in a later PR.
If you've done that work already and simply don't want it part of the present PR, you could perhaps setup a branch on your repo for us to have a look.

This comment has been minimized.

Copy link
@Armael

Armael Aug 8, 2018

Author Member

To clear things up, here's a summary of the situation.

On the following code:

let x = (1 +
  2 in ();;

1. Before this PR

1.1 In batch mode

(modulo a missing \n)

File "/tmp/toto.ml", line 2, characters 4-6:
Error: Syntax error: ')' expected
File "/tmp/toto.ml", line 1, characters 8-9:
Error: This '(' might be unmatched

1.2 In an ANSI toplevel

scrot1 2
-> in ANSI mode, no location is printed and all locations are highlighted

1.3 In a dumb toplevel

On 4.07:

Characters 8-9:
  let x = (1 +
          ^
Syntax error: ')' expected, the highlighted '(' might be unmatched

(the sub-location gets highlighted arbitrarily)

On trunk (modulo some missing \n):

Line 1, characters 8-9,
Line 2, characters 4-6:
  ........(...
  ....in.....
Syntax error: ')' expected, the highlighted '(' might be unmatched

-> highlighting in dumb mode prints the location(s)

2. With this PR (currently)

2.1 In batch mode

File "/tmp/toto.ml", line 2, characters 4-6:
Error: Syntax error: ')' expected
       File "/tmp/toto.ml", line 1, characters 8-9:
       This '(' might be unmatched

2.2 In an ANSI toplevel

scrot2 2
-> in ANSI mode, all locations are highlighted and the main location is not printed

2.3 In a dumb toplevel

Line 2, characters 4-6,
Line 1, characters 8-9:
  ........(...
  ....in.....
Error: Syntax error: ')' expected
       Line 1, characters 8-9:
       This '(' might be unmatched

-> in dumb mode, highlighting includes printing the locations, and the main location is not printed

This comment has been minimized.

Copy link
@Armael

Armael Aug 8, 2018

Author Member

Here's an alternative propositon for 2.3: split the highlighting part across the locations:

Line 2, characters 4-6:
    2 in ();;
      ^^
Error: Syntax error: ')' expected
       Line 1, characters 8-9:
         let x = (1 +
                 ^         
       This '(' might be unmatched

This comment has been minimized.

Copy link
@trefis

trefis Aug 8, 2018

Contributor

-> in dumb mode, highlighting includes printing the locations, and the main location is not printed

Isn't it?

Here's an alternative propositon for 2.3: split the highlighting part across the locations:

I think that's a clear improvement over the current (and previous) situation.
I vote yes.

This comment has been minimized.

Copy link
@Armael

Armael Aug 8, 2018

Author Member

Isn't it?

Hum, it is indeed. I guess I meant that it is printed as part of the "dumb" highlighting function, but not re-printed after as part of the error (just like in ANSI mode).

let highlight_locations ppf locs =
setup_terminal ();
let norepeat =
try Sys.getenv "TERM" = "norepeat" with Not_found -> false

This comment has been minimized.

Copy link
@trefis

trefis Aug 8, 2018

Contributor

Somewhat unrelated to your PR but I noticed there were no tests exercising this norepeat behavior.
Could you add one as a preliminary commit? (I assume we can pass some environment variables through ocamltest, is that correct @shindere?)

This comment has been minimized.

Copy link
@gasche

gasche Aug 8, 2018

Member

Yes, flags = "..." works as expected for the toplevel action.

This comment has been minimized.

Copy link
@trefis

trefis Aug 8, 2018

Contributor

That's not what I want though, flags is for cmdline flags. Here the behavior is controlled through an environment variable (TERM).

This comment has been minimized.

Copy link
@Armael

Armael Aug 8, 2018

Author Member

Good idea, I'll investigate how to test this. It would also be interesting to know what is the motivation behind TERM=norepeat -- is it actually used somewhere? Maybe @xavierleroy knows?

This comment has been minimized.

Copy link
@Armael

Armael Aug 8, 2018

Author Member

Digging up a bit, TERM=norepeat has been introduced by @garrigue as TERM=dumb in b92bf2f , then renamed to character in 9a32677 , then to norepeatin 59d5d16. The text of the first commit seems to imply that it was used by the manual, but this is not the case anymore AFAIK.

This comment has been minimized.

Copy link
@gasche

gasche Aug 8, 2018

Member

Indeed, relying on norepeat in the manual build was removed in 4be6caf. Feel free to remove that special support in a later PR or this one.

This comment has been minimized.

Copy link
@Armael

Armael Aug 8, 2018

Author Member

I'll do that in a separate PR just in case someone actually uses TERM=norepeat in the wild.

This comment has been minimized.

Copy link
@Octachron

Octachron Aug 8, 2018

Contributor

In theory, norepeat is useful when communicating with a toplevel launched in a separate process, where repeating the faulty line is not a very useful behavior. (At least, this was the use case of the manual). This kind of use is partially superseded by compiler-libs , but I am not sure if it is still used in the wild.

(** Hook for redefining the printer of reports. *)

val default_report_printer: unit -> report_printer
(** Original report printer for use in hooks. *)

This comment has been minimized.

Copy link
@trefis

trefis Aug 8, 2018

Contributor

I understand why it is that way, but still find it a bit unpleasant that default_report_printer is unit -> report_printer instead of report_printer (and likewise for the ref above).

I don't have any nice alternative though (that is: I don't think saying that batch_mode_printer is the default and that the toplevel should update report_printer would be a good decision).

This comment has been minimized.

Copy link
@Armael

Armael Aug 8, 2018

Author Member

Good point, I agree it is a bit unpleasant.

What do you think of this alternative proposition:
default_report_printer is a report_printer (and likewise for the ref). Then, default_report_printer is defined as a new report_printer named compiler_printer, which does the autodetection and dispatches between batch_mode_printer and toplevel_printer.

I think it makes sense, and possibly nicer than the current proposition.

This comment has been minimized.

Copy link
@trefis

trefis Aug 8, 2018

Contributor

Do you mean: each of the functions inside the record would do the dispatch?
That seems fine yes.

This comment has been minimized.

Copy link
@Armael

Armael Aug 8, 2018

Author Member

Hum actually I spoke too fast, I was thinking of doing the dispatch at the entry point, but this does not make sense since you have to also export the other functions. Dispatching at each function would kinda work I guess but it is not very nice.

This comment has been minimized.

Copy link
@trefis

trefis Aug 8, 2018

Contributor

Indeed. Do as you see fit (not doing anything is also an acceptable choice)

This comment has been minimized.

Copy link
@Armael

Armael Aug 8, 2018

Author Member

I think I will just add a comment to explain the purpose of the unit -> ...

@Armael Armael force-pushed the Armael:location-refactor branch from 0996b6d to 7303e5d Aug 8, 2018

@Armael

This comment has been minimized.

Copy link
Member Author

commented Aug 8, 2018

I just rebased on top of trunk, and added a commit that implements the remarks above (mainly a better printer for multiple locations in a dumb toplevel). This allowed me to simplify the code a bit more (highlight_dumb does not print its own locations anymore).
On the downside, we got the two Line 1: back, and two new ones in an other place where dummy locs leak out. I don't think it is a big deal, I plan to do some minor refactoring in print_loc in an other PR to try to get rid of them.

@Armael Armael force-pushed the Armael:location-refactor branch from 7303e5d to b90a058 Aug 8, 2018

@trefis

trefis approved these changes Aug 8, 2018

Copy link
Contributor

left a comment

Ship it.

@Armael Armael force-pushed the Armael:location-refactor branch from b90a058 to 79c8a94 Aug 8, 2018

@trefis trefis merged commit 0312eb3 into ocaml:trunk Aug 9, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

gasche added a commit to gasche/ocaml that referenced this pull request Aug 19, 2018

gasche added a commit to gasche/ocaml that referenced this pull request Aug 19, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.