Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Print end line and valid end character for multi-lines locations #8541

Merged
merged 1 commit into from Apr 8, 2019

Conversation

@Khady
Copy link
Contributor

commented Mar 24, 2019

When a location is related to multiple lines of code, it is printed
incorrectly. More specifically, the end character is actually an
offset between the beginning and the end of the location.

This commit changes the format of the locations when they cover
multiple lines. It adds the end line and the end character is now a
proper column rather than an offset. It doesn't affect locations
related to a single line.

The old format was:

Line STARTLINE, characters STARTCHAR-OFFSET

The new format is:

Lines STARTLINE-ENDLINE, characters STARTCHAR-ENDCHAR

The new format has been chosen so that emacs doesn't need any
change. I unfortunately don't have a vim with enough configuration
to check if it is affected.

Copy link
Member

left a comment

Seems fine in principle, see code comments.

parsing/location.ml Outdated Show resolved Hide resolved
parsing/location.ml Outdated Show resolved Hide resolved
@Khady Khady force-pushed the Khady:message-location-multi-lines branch from bb80c31 to 2b034a7 Mar 24, 2019
@Armael

This comment has been minimized.

Copy link
Member

commented Mar 24, 2019

I'm all in favor for that change. I think it would still be nice to have an idea of whether we break some of the text editors people use (@Khady says emacs is fine, what about vim, vscode..?), so that we can ping the respective maintainers of the ocaml modes/plugins.

@Armael

This comment has been minimized.

Copy link
Member

commented Mar 24, 2019

Navigating the world of ocaml vscode plugins turns out to be a bit confusing, but it seems at least vscode-reasonml and reason-language-server will need updating (vscode-reasonml , reason-language-server).
I also found https://github.com/hackwaly/vscode-ocaml but it doesn't seem to be maintained anymore.

For the vim users out there, is there a plugin that does something like "jump to next error"? I couldn't tell if there's support for that actually.

@c-cube

This comment has been minimized.

Copy link
Contributor

commented Mar 25, 2019

@rgrinberg probably knows more about vim; I definitely do have a jump to error in vim but it comes from merlin, afaik.

@let-def

This comment has been minimized.

Copy link
Contributor

commented Mar 25, 2019

The error format interpreted by vim is defined there: https://github.com/rgrinberg/vim-ocaml/blob/master/compiler/ocaml.vim
This is a fork of the upstream vim support for OCaml. I am not that familiar with vim error format, but it seems that the current change will break it.

@gasche

This comment has been minimized.

Copy link
Member

commented Mar 25, 2019

Note that the breakage discussed is mild because, if I understand correctly, it will be possible for tool authors to write backward-compatible regexps that uniformly support both the old and the new messages. (Of course I appreciate the idea of warning tool authors in any case.)

(This discussion is ripe for a mention of the machine-friendly and human-obscure GNU error format that could be used in addition to the human-readable format.)

@pmetzger

This comment has been minimized.

Copy link
Member

commented Mar 25, 2019

FYI, the "correct" error message format for Emacs is defined in the Gnu Coding Standards here:

https://www.gnu.org/prep/standards/html_node/Errors.html#Errors

This is also the format followed by GCC and by many other compilers that imitate GCC.

@dbuenzli

This comment has been minimized.

Copy link
Contributor

commented Mar 25, 2019

There are a few issues about this #5070, #6518. Though I suspect #5070 is actually fixed.

@gasche
gasche approved these changes Mar 25, 2019
@gasche

This comment has been minimized.

Copy link
Member

commented Mar 25, 2019

I gave my approval in principe because the handling of final character position in the new error message is nicer.

That said, I still have a proposition for a change: if we dropped the idea of using "Lines" instead of "Line", would be remain compatible with more of the existing error-message parsers? If that is the case, maybe it would be a nice compromise between human-friendliness and backward compatibility?

P.S.: And yes, people willing to submit a PR to implement the GNU location format are more than welcome to do that.

@dbuenzli

This comment has been minimized.

Copy link
Contributor

commented Mar 25, 2019

P.S.: And yes, people willing to submit a PR to implement the GNU location format are more than welcome to do that.

I'm actually unsure whether it's widely supported or not.

In any case if you are using emacs' compile-mode, it does recognize OCaml's format out of the box nowadays see here for the regexp that defines the recognition.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Mar 25, 2019

I'm actually unsure whether it [the GNU error format] is widely supported or not.

I'm intrigued. What makes you doubt that? A quick look around tells me that gcc, clang, javac (from OpenJDK), GHC, the Rust compiler, and CompCert all use the GNU error format.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Mar 25, 2019

My gut feeling is that if we change the formatting of locations for error messages, we could just as well use the GNU format, at least for batch compilation, because it is familiar to many users and understood by many tools.

For interactive use (the toplevel REPL) we already use several different methods to report error locations (highlighting, Characters N-M:) and I don't think the GNU format would improve things much.

@dbuenzli

This comment has been minimized.

Copy link
Contributor

commented Mar 25, 2019

What makes you doubt that?

Apparently I program too much in OCaml...

More seriously I vaguely remember trying the GNU format in a project and got mixed result with it mainly due to Unicode.

One advantage of the current OCaml format is that if you are able to convince your editor to treat the Characters part as byte offsets into your UTF-8 encoded buffer you are more likely to get correct highlighting in presence of non US-ASCII Unicode characters (which may happen for example in string literals).

The GNU format requires counting columns and counting them as bytes in an UTF-8 encoded buffer is doomed to fail; counting the decoded Unicode scalar values will also fail and even folding wcwidth over them will fail (for a discussion about this see here).

In any case if the format for batch compilation is changed, please do not forget about the format string in printexc.ml aswell.

@Khady

This comment has been minimized.

Copy link
Contributor Author

commented Mar 26, 2019

In any case if you are using emacs' compile-mode, it does recognize OCaml's format out of the box nowadays see here for the regexp that defines the recognition.

A general remark about those problem matchers: they capture more than just the location. For emacs and vscode at least (maybe for vim too), they try to detect the severity and eventually the message. Since 4.08, the emacs regexp will not be able to capture the severity. I wrote about it in this issue.

I don't mind changing the format of the locations to something else (if it's not too complicated). But there is "little" advantage to use a common format for the location if the rest of the message has a custom shape that will require customization in editors anyway.

One advantage of the current OCaml format is that if you are able to convince your editor to treat the Characters part as byte offsets into your UTF-8 encoded buffer you are more likely to get correct highlighting in presence of non US-ASCII Unicode characters (which may happen for example in string literals).

Is there any editor doing this extra work?

Some kind of conversion support from byte offsets to column might be interesting to help the LSP tools. As in the LSP protocol, the file must be sent in utf8 and positions must use utf16 offset (what it means, I don't know).

@dbuenzli

This comment has been minimized.

Copy link
Contributor

commented Mar 26, 2019

Is there any editor doing this extra work?

I don't think I ever managed to get emacs do the right thing but was never bothered enough to dig in into it to find the spot where it breaks. However in theory it should be possible see the compilation-error-screen-columns variable and the, byte-to-position and/or filepos-to-bufferpos functions (see here).

Some kind of conversion support from byte offsets to column might be interesting to help the LSP tools. As in the LSP protocol, the file must be sent in utf8 and positions must use utf16 offset (what it means, I don't know).

With Unicode you really can't count visual columns if you don't have feedback from the rendering layer. However what LSP does seems to be one of the possible right thing: which is if count lines (that you can easily do) and give me offsets into (here an UTF-16 encoding of) the Unicode data on the line for me and I will do the highlight.

Though given that you send the data in UTF-8, the choice LSP made for the position on the line is a bit odd (to remain polite). But I guess is explained by the prevalent (broken) string model in programming languages which is an array of UTF-16 code points; I don't know how LSP came to be but that looks like an implementation detail leaking out to the protocol.

Converting from the current OCaml output to an LSP range is possible though you need the initial file:

  1. Find on which zero-based line the OCaml start character lies, here's your "LSP position line" for the start position of the range.
  2. Convert that line to UTF-16BE or UTF-16LE, find the zero-based offset at which the OCaml character starts in that encoding that's your "LSP position character" for the starting point.
  3. Do the same for the OCaml end character to get the end position of the range.
@gasche

This comment has been minimized.

Copy link
Member

commented Mar 27, 2019

There is a small concrete question (do we want the change in this PR?) and a broad question (what's a robust way to print error positions?). I would be happy if people volunteered to experiment with the GNU format (personally I would rather have it in addition to the human-readable format, as a prefix to the error message), but in the shorter term I would like to make a decision on this PR.

The change is a clear win for human readers (and yes, I notice people jumping to lines by hand on a regular basis), but it requires adaptation in OCaml-consuming tools (usually it's easy to fix/adapt the existing patterns to the new one). Which do we favor?

Personally I would be in favor of merging, but I also understand Xavier's gut feeling that it may not be worth bothering tool authors for a relatively small change.

@Armael

This comment has been minimized.

Copy link
Member

commented Mar 27, 2019

I also think the GNU format and related questions should be explored in an other PR if people feel motivated to actually tackle the issue.
I'd be in favor of merging this one; as Gabriel said, it's a clear improvement for the end users (you'd be surprised how many university students learn ocaml with zero tooling, and are directly impacted with the various bugs (or respectively, improvements) in the error reporting code).

@dbuenzli

This comment has been minimized.

Copy link
Contributor

commented Mar 27, 2019

I would be happy if people volunteered to experiment with the GNU format (personally I would rather have it in addition to the human-readable format, as a prefix to the error message)

Given what I (maybe unsuccessfully) tried to communicate above I think the GNU format is a waste of time. Working on machine readable outputs (e.g. JSON) seems a better idea since it seems "modern" IDEs rightly prefer this than regexping around.

I'm also in favour of merging this, but then I'm not affected by this change.

@gasche

This comment has been minimized.

Copy link
Member

commented Mar 29, 2019

@let-def, what do you think?

(Also: if we decided to merge the patch, who is volunteering to ping the affected maintainers?)

@Khady

This comment has been minimized.

Copy link
Contributor Author

commented Mar 29, 2019

@Armael

This comment has been minimized.

Copy link
Member

commented Apr 6, 2019

@Khady : regarding vim, I think it should be enough to open an issue on https://github.com/rgrinberg/vim-ocaml. You also need to rebase the PR :-).

@gasche should we merge this after the PR is rebased?

@gasche

This comment has been minimized.

Copy link
Member

commented Apr 6, 2019

@Armael: yes, let's do that. (You should exercise your sense of responsibility by also approving the PR.)

@Armael
Armael approved these changes Apr 6, 2019
Copy link
Member

left a comment

ack

When a location is related to multiple lines of code, it is printed
incorrectly. More specifically, the end character is actually an
offset between the beginning and the end of the location.

This commit changes the format of the locations when they cover
multiple lines. It adds the end line and the end character is now a
proper column rather than an offset. It doesn't affect locations
related to a single line.

The old format was:

```
Line STARTLINE, characters STARTCHAR-OFFSET
```

The new format is:

```
Lines STARTLINE-ENDLINE, characters STARTCHAR-ENDCHAR
```
@Khady Khady force-pushed the Khady:message-location-multi-lines branch from 2b034a7 to 4fee301 Apr 8, 2019
@Khady

This comment has been minimized.

Copy link
Contributor Author

commented Apr 8, 2019

Rebased.

@gasche gasche merged commit b1dcc2b into ocaml:trunk Apr 8, 2019
1 of 2 checks passed
1 of 2 checks passed
continuous-integration/travis-ci/pr The Travis CI build failed
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
@gasche

This comment has been minimized.

Copy link
Member

commented Apr 8, 2019

Merged, thanks! Please notify the tool maintainers as necessary.

(This will be in 4.09 at the earliest, not 4.08)

@dbuenzli

This comment has been minimized.

Copy link
Contributor

commented Apr 14, 2019

In fact I think caml-mode will need to be updated see ocaml/caml-mode#5 for a discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants
You can’t perform that action at this time.