Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read source files in one go before lexing #12403

Merged
merged 7 commits into from
Sep 4, 2023

Conversation

gasche
Copy link
Member

@gasche gasche commented Jul 21, 2023

This is another attempt at fixing #12238 after #12396 was found to be too much of a hack.

Context: some of our error printing styles print the source code at the location of the error. This source is found either in the lexing buffer when available (it has not been discarded to make room for more source code), orelse by trying to re-open the source file. Re-opening the source file is not so reliable, in particular it fails in presence of preprocessor directives (the user-facing locations we have do not necessarily refer to real locations in the input file). See #12238 for an example.

The fix proposed in the present PR is to read source files entirely, and use Lexing.from_string to build the lexing buffer. This guarantees that the lexing buffer has the whole source.

We discussed whether this approach is reasonable in terms of memory consumption in #12396 (comment), and our conclusion was that it is indeed reasonable: the intermediate representations used by the compiler have an in-memory size that is (noticeably) larger than the source code itself, so the increase in total live memory usage should never be a big issue.

The PR removes the code for the "re-open the source file" fallback that is not useful anymore.

The PR contains several commits and it is best reviewed commit-by-commit.

Copy link
Contributor

@nojb nojb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't spot any problems, the patch LGTM.

@gasche
Copy link
Member Author

gasche commented Jul 24, 2023

This feels like a non-trivial change. I am planning to wait a couple more days before merging so that people can have another look if they want.

@gasche
Copy link
Member Author

gasche commented Sep 3, 2023

I had completely forgotten about this PR (the perils of "waiting for a few days"), I am planning to fix conflicts and merge. Thanks @dra27 for the reminder!

Some of our error printing styles print the source code at the
location of the error. This source is found either in the lexing
buffer when available (it has not been discarded to make room for more
source code), orelse by trying to re-open the source file. Re-opening
the source file is not so reliable, in particular it fails in presence
of preprocessor directives (the user-facing locations we have do not
necessarily refer to real locations in the input file), see ocaml#12238.

This commit fixes the issue by reading the whole source file and then
using Lexing.from_string, which preserves all the input in the buffer.
This logic is not robust in presence of lexer directives and will
silently do the wrong file. We don't need it in the compiler anymore
now that we read the entire file at once -- we do not need a fallback
strategy after lines_around_from_lexbuf anymore.

In theory this might make a difference for compiler-libs users that
would set Location.input_name but not Location.input_lexbuf, and rely
on the read-from-file fallback logic. Those users can fix their
code (in a backward-compatible way) by setting Location.input_lexbuf
themselves.
@gasche
Copy link
Member Author

gasche commented Sep 4, 2023

AppVeyor failed with a flaky test (#12425):

 ... testing 'forbidden.ml' with 1.1.1 (bytecode) => failed (Running program C:\projects\🐫реализация-mingw64\testsuite\_ocamltest\tests/memory-model\forbidden\ocamlc.byte\forbidden.byte.exe without any argument: command
C:\projects\🐫реализация-mingw64\testsuite\_ocamltest\tests/memory-model\forbidden\ocamlc.byte\forbidden.byte.exe 
failed with exit code -1073741819)

This is unrelated to the present PR, so merging.

@anmonteiro
Copy link
Contributor

anmonteiro commented Feb 25, 2024

This PR introduces a regression when preprocessing code in the style of Dune + ppxlib (where all the transformations are bundled into a single executable, running standalone). Note that there's no issue when running a PPX with the compiler via -ppx ...

I'm unsure whether the fix needs to happen here or in ppxlib, but the gist is that Location.input_lexbuf is set to None when dune executes the generated ppxlib executable.

Here's a specific example from a project of mine:

5.1

         ppx src/models/app.pp.ml (exit 1)
File "src/models/app.ml", lines 53-55, characters 15-19:
53 | ...............[%rapper get_one {sql|
54 |   invalid sql
55 |   |sql} record_out]...
Error: Error in ppx_rapper: Syntax error in SQL: 'syntax error at or near "invalid"'

5.2

         ppx src/models/app.pp.ml (exit 1)
File "src/models/app.ml", lines 53-55, characters 15-19:
Error: Error in ppx_rapper: Syntax error in SQL: 'syntax error at or near "invalid"'

@gasche
Copy link
Member Author

gasche commented Feb 25, 2024

@anmonteiro
Copy link
Contributor

@gasche thanks. Your patch fixes the issue for me.

@gasche
Copy link
Member Author

gasche commented Feb 25, 2024

Thanks! I submitted the patch as a PR at #12991.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants