Error message shows wrong source text with preprocessor #12238
Attached files |
Reproduced in 5.0.0 |
This is your
The |
Thanks for your response. Yes, it is confusing, but très respectueusement I suggest that ce n'est pas moi qui est confus. Line 2 of the trivpp output comes from line 1 of the input file, so it's right to label it as line 1. The error is truly on line 100, and that is the line number given in the error header.
What's wrong is the subsequent display of the source text. Making the change you suggest doesn't fix things, but just results in the header being wrong as well! And
which starts with a lot of nonsense, but does indeed label the first true line of its output as coming from line 1 of the input file. As further evidence that the trouble is elsewhere, keeping trivpp as I wrote it, but deleting the tail of the input file beyond line 100 or so results on my machine in a correct display of source text, suggesting that the process that is selecting fragments of source to display is losing track of where it is. |
I can confirm that this behaviour exists on trunk. The fun part is that it depends on the path used to refer to
A bit of quick experimentation seems to suggest that in |
The C preprocessors label it as line 2, so regardless of what you think and of how much French you use, OCaml will stick to this behavior. You can adapt your script as I suggested or keep complaining about it, that's your choice. |
I apologise for the offence, which was not intended. The facts, however, are as I stated them. It seems fruitless to say more. |
Just to be clear, this bug has nothing to do with the numbering of lines. I see that |
Thanks for the report. Just paraphrasing what @lthls has already said, the problem is simple to explain: when using line directives, the specified file name is used to look up the "context" material printed in error messages; this is nonsense because the lexing offsets only make sense in the actual input file, not the one mentioned in the line directive. There's also an optimization that avoids looking inside the file if the material is still present in the lexing buffer; in this case the bug is not triggered (as the lexing offsets are correct for the contents of the lexing buffer). |
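To make the offset mismatch concrete, here is a minimal sketch using only the standard library. It is not the compiler's code: `line_at`, the sample strings, and the `# 1 "foo.ml"` directive are illustrative assumptions. It takes a byte offset that is valid in the preprocessed input (what the lexer actually read) and looks it up in the file named by the directive.

```ocaml
(* Illustrative only: [line_at] and the sample inputs are hypothetical,
   not taken from the compiler sources. *)

(* The file named in the line directive. *)
let original = "let x = 1\nlet y = oops\nlet z = 3\nlet w = 4\n"

(* What the lexer actually reads: the directive followed by the file. *)
let directive = "# 1 \"foo.ml\"\n"
let preprocessed = directive ^ original

(* Return the line of [text] that contains byte offset [pos]. *)
let line_at text pos =
  let start =
    match String.rindex_from_opt text (pos - 1) '\n' with
    | Some i -> i + 1
    | None -> 0
  in
  let stop =
    match String.index_from_opt text pos '\n' with
    | Some i -> i
    | None -> String.length text
  in
  String.sub text start (stop - start)

(* Offset of "oops" in the lexer's input. *)
let error_pos = String.length directive + String.length "let x = 1\nlet y = "

let () =
  (* Looking the offset up in the real input finds the offending line. *)
  Printf.printf "lexer input: %S\n" (line_at preprocessed error_pos);
  (* Looking it up in the named file lands on a later, unrelated line. *)
  Printf.printf "named file:  %S\n" (line_at original error_pos)
```

The line number reported in the header can still be right, because it is tracked through the directive; only the byte offset used to fetch the context text is shifted.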
My understanding is that nothing special will happen here so I am
closing this issue, anybody should feel free to re-open it if necessary.
|
I don't see the point of closing this issue. There is something wrong, the cause is known, we just don't have any people working on the fix at the moment. It actually has a "bug" label, so I think we should make sure that the bug has actually disappeared before closing such issues. |
I agree. Though it doesn't affect my teaching any more (I'm setting |
So what should we do to fix this issue? I can think of the following approaches:
Currently the default buffer size appears to be 1024 bytes. The largest .ml file in the compiler distribution is the menhir-generated (Note: parser.cmo is 1.7Mio on my machine, and parser.cmt is 3.2Mio; the 4Mio of memory usage caused by the lexing buffer is more than those but not outlandish.) |
Some of our error printing styles print the source code at the location of the error. This source is found either in the lexing buffer when available (that is, when it has not been discarded to make room for more source code), or else by trying to re-open the source file. Re-opening the source file is not reliable; in particular, it fails in the presence of preprocessor directives (the user-facing locations we have do not necessarily refer to real locations in the input file). We propose to work around this issue by simply using large lexing buffers by default, so that the vast majority of programs keep the whole source input in the lexing buffer, and the unreliable fallback is rarely used.
My bad, I made that decision too quickly. My apologies to everybody
and thanks to the other developers and participants for their awareness.
|
Some of our error printing styles print the source code at the location of the error. This source is found either in the lexing buffer when available (that is, when it has not been discarded to make room for more source code), or else by trying to re-open the source file. Re-opening the source file is not reliable; in particular, it fails in the presence of preprocessor directives (the user-facing locations we have do not necessarily refer to real locations in the input file), see ocaml#12238. This commit fixes the issue by reading the whole source file and then using Lexing.from_string, which preserves all the input in the buffer.
This logic is not robust in the presence of lexer directives and will silently read the wrong file. We don't need it in the compiler anymore now that we read the entire file at once -- we do not need a fallback strategy after lines_around_from_lexbuf anymore. In theory this might make a difference for compiler-libs users that would set Location.input_name but not Location.input_lexbuf, and rely on the read-from-file fallback logic. Those users can fix their code (in a backward-compatible way) by setting Location.input_lexbuf themselves.
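The approach described in these commit messages can be sketched as follows. This is a minimal illustration using only the standard library, not the code merged in the compiler: `lexbuf_of_file` is a hypothetical helper name, and `In_channel` assumes OCaml 4.14 or later.

```ocaml
(* Hypothetical helper, not the compiler's implementation: read the whole
   file first, then lex from the in-memory string so that no input is ever
   discarded from the lexing buffer. *)
let lexbuf_of_file path =
  (* Read the entire file into memory (In_channel: OCaml >= 4.14). *)
  let contents = In_channel.with_open_bin path In_channel.input_all in
  (* A lexbuf built from a string keeps all of its input in the buffer. *)
  let lexbuf = Lexing.from_string contents in
  (* Record the file name in the positions the lexer will report. *)
  Lexing.set_filename lexbuf path;
  lexbuf

(* Assumption for compiler-libs users, following the commit message above:
   registering the buffer would look roughly like
     Location.input_lexbuf := Some (lexbuf_of_file "foo.ml")
   It is left as a comment so this sketch does not depend on compiler-libs. *)
```

The cost is holding each source file fully in memory while it is parsed, which the discussion above judged acceptable for the file sizes seen in practice.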
I just merged #12403 which has the compiler read files in full before parsing them. This should fix the issue. The fix will be included in OCaml 5.2 (not the imminent 5.1 release). Thanks for the report! |
I'm using OCaml 4.11.1 on Debian/amd64.
If a (text-based) preprocessor heads its output with a #line directive, then compiler error messages show the correct line and char numbers, but the echo of the source text shown next to them is wrong. For example, try compiling the attached file foo.ml using the trivial preprocessor trivpp, a shell script that just prepends a #line directive. The result is the following error message:
The line number is correct, but the echo of the source is garbled.
Interestingly, deleting all the input text after the line with the error results in a correct display of the relevant source code.
(I am not using this trivial preprocessor, but another one that implements certain constructs by textual expansion. For files that don't contain instances of the constructs, its effect is the same as the trivial preprocessor, however.)
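For readers without the attachments, a preprocessor of this kind can be sketched in OCaml. This is a hypothetical stand-in, not the attached shell script: the function name `with_line_directive` and the exact `# 1 "file"` directive form are assumptions, and `In_channel` needs OCaml 4.14 or later.

```ocaml
(* Hypothetical stand-in for the attached script, not the script itself:
   prefix the source text with a line directive naming the original file,
   so that the first real line is still reported as line 1. *)
let with_line_directive ~filename source =
  Printf.sprintf "# 1 %S\n%s" filename source

(* When run with exactly one argument, act as a preprocessor: read that
   file and write the directive followed by its contents to stdout. *)
let () =
  if Array.length Sys.argv = 2 then begin
    let filename = Sys.argv.(1) in
    let source = In_channel.with_open_bin filename In_channel.input_all in
    print_string (with_line_directive ~filename source)
  end
```

Compiled to an executable, something like this could presumably be passed to the compiler's `-pp` option, which runs the command on the source file and parses its standard output.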