Skip to content

Conversation

@gasche
Copy link
Member

@gasche gasche commented Oct 22, 2023

I was reading the code of Pparse.file_aux to review #12654 and I found a bug in the choice of filename used in AST locations. (cc @malekbr @Octachron)

$ echo "()" > test.ml

# expected behavior
$ ocamlc -dparsetree -stop-after parsing test.ml
[
  structure_item (test.ml[1,0+0]..[1,0+2])
    Pstr_eval
    expression (test.ml[1,0+0]..[1,0+2])
      Pexp_construct "()" (test.ml[1,0+0]..[1,0+2])
      None
]

# the bug: adding a '-pp' flag should not change the locations
$ ocamlc -dparsetree -stop-after parsing -pp cat test.ml
[
  structure_item (/tmp/ocamlppb3505d[1,0+0]..[1,0+2])
    Pstr_eval
    expression (/tmp/ocamlppb3505d[1,0+0]..[1,0+2])
      Pexp_construct "()" (/tmp/ocamlppb3505d[1,0+0]..[1,0+2])
      None
]

When using the -pp option, the locations in the AST mention the temporary file created to implement -pp, instead of mentioning the source file. The present PR fixes this bug.

Test:

    $ echo "let x = 1" > test.ml
    $ ocamlc -dparsetree -pp cat test.ml

Result before:

    [
      structure_item (/tmp/ocamlpp1775ad[1,0+0]..[1,0+9])
        Pstr_value Nonrec
        [
          <def>
            pattern (/tmp/ocamlpp1775ad[1,0+4]..[1,0+5])
              Ppat_var "x" (/tmp/ocamlpp1775ad[1,0+4]..[1,0+5])
            expression (/tmp/ocamlpp1775ad[1,0+8]..[1,0+9])
              Pexp_constant PConst_int (1,None)
        ]
    ]

Problem: the locations in the AST refer to the temporary
file created for preprocessing, which has since been removed.

Result after:

    [
      structure_item (test.ml[1,0+0]..[1,0+9])
        Pstr_value Nonrec
        [
          <def>
            pattern (test.ml[1,0+4]..[1,0+5])
              Ppat_var "x" (test.ml[1,0+4]..[1,0+5])
            expression (test.ml[1,0+8]..[1,0+9])
              Pexp_constant PConst_int (1,None)
        ]
    ]
| Signature -> Parse.interface lexbuf

let file_aux ~tool_name inputfile (type a) parse_fun invariant_fun
let file_aux ~tool_name ~sourcefile inputfile (type a) parse_fun invariant_fun
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the naming of these arguments could be changed a little?
We read source from the inputfile below, not the sourcefile.

So maybe source_for_locations? or something like that

Copy link
Member

@Octachron Octachron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we want to keep using the source file name in location, since the temporary file is always removed after parsing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants