Skip to content

Attach location to constants #12968

Merged
gasche merged 4 commits into
ocaml:trunkfrom
Julow:parsing_constant_loc
Mar 11, 2024
Merged

Attach location to constants #12968
gasche merged 4 commits into
ocaml:trunkfrom
Julow:parsing_constant_loc

Conversation

@Julow

@Julow Julow commented Feb 9, 2024

Copy link
Copy Markdown
Contributor

This is a retry of #10544

In OCamlformat, we use a patched version of the parser as we need locations on every nodes among other things.
We would like to propose some of our patches in the hope that they are useful to the compiler and others and to help OCamlformat maintenance.

The second commit finds a use for the new location information and improves the unsupported interval error:

File "./test.ml", line 3, characters 9-10:
3 |   | '1'..2 -> true
             ^
Error: Only character intervals are supported in patterns.

@Julow Julow force-pushed the parsing_constant_loc branch from d532d97 to 57f715c Compare March 4, 2024 18:03
Julow and others added 3 commits March 5, 2024 16:09
Constants don't have a location in interval patterns. This adds the
location information on every constants.

Co-authored-by: Guillaume Petiot <guillaume@tarides.com>
This uses a more precise location when one of the bounds is not a
character:

    File "./test.ml", line 3, characters 9-10:
    3 |   | '1'..2 -> true
                 ^
    Error: Only character intervals are supported in patterns.

The entire pattern is highlighted when both are not characters:

    File "./test.ml", line 3, characters 4-12:
    3 |   | "1".."2" -> true
            ^^^^^^^^
    Error: Only character intervals are supported in patterns.
@Julow Julow force-pushed the parsing_constant_loc branch from 57f715c to 381b664 Compare March 5, 2024 15:09
@gasche gasche added the parsetree-change Track changes to the parsetree for that affects ppxs label Mar 5, 2024

@gasche gasche left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable to me, but it will cause a lot of churn in the syntax-processing ecosystem (ppxlib and others).

Comment thread typing/typecore.ml Outdated
| {pconst_loc; _}, {pconst_desc = Pconst_char _; _} ->
pconst_loc
| _, _ -> loc
in

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this approach a bit meh. I would define a function get_bound that takes a constant, returns the character if it is a Pconst_char, and fails with Invalid_interval otherwise.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better like this!

Comment thread parsing/parsetree.mli Outdated
type constant = {
pconst_desc : constant_desc;
pconst_loc : Location.t;
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The house style is to define the record first and the _desc type second.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to house style.

Comment thread parsing/parser.mly
| "-", Pexp_constant({pconst_desc = Pconst_integer (n,m); _}) ->
Pexp_constant(mkconst ~loc:sloc (Pconst_integer(neg_string n, m)))
| ("-" | "-."), Pexp_constant({pconst_desc = Pconst_float (f, m); _}) ->
Pexp_constant(mkconst ~loc:sloc (Pconst_float(neg_string f, m)))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are with mkuplus below, I think that the record pattern on the constant record should list all fields. Otherwise if we add new fields later (say to support attributes) the new fields will be silently dropped.

In fact there is already a bug related to dropping information in these functions today, -(1[@foo]) will drop the attribute.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed! I can have a look to the dropped attribute in an other PR.

@Julow

Julow commented Mar 6, 2024

Copy link
Copy Markdown
Contributor Author

Thanks for the review!

but it will cause a lot of churn in the syntax-processing ecosystem (ppxlib and others).

I plan to work with a ppxlib maintainer into updating ppxlib. Unfortunately, that wouldn't help other users of the parsetree, though it changed a lot recently and didn't seem to have caused problems ?

More readable interval typing code and styles changes.
@Julow Julow force-pushed the parsing_constant_loc branch from 91bd338 to dcb5b46 Compare March 6, 2024 10:09

@gasche gasche left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with the current code and found the justification for the change reasonable -- this causes churn, but it brings the AST in the right direction, and is useful for ocamlformat.

I am planning to wait a few days to let other people chime in if they want to. @Julow, feel free to ping me again after that as I will inevitably have forgotten.

Julow added a commit to Julow/ocaml that referenced this pull request Mar 6, 2024
The special prefix operators [-], [-.], [+] and [+.] could rewrite:

    -(1 [@foo])

into the constant:

    -1

which drops the attribute. This adds the attribute check, the above
example is now parsed to the correct AST:

    ~-(1 [@foo])

An alternative implementation would be to drop the rewrite from the
parser and instead to perform it during a later phase.

Spotted by @gasche in ocaml#12968 (comment)
@shindere

shindere commented Mar 11, 2024 via email

Copy link
Copy Markdown
Contributor

@gasche gasche merged commit b02f003 into ocaml:trunk Mar 11, 2024
@gasche

gasche commented Mar 11, 2024

Copy link
Copy Markdown
Member

Good idea, done.

@pitag-ha

Copy link
Copy Markdown
Member

I'd like to chime in with two different comments/questions. I hope that's ok!

The first one is a clarification: @Julow, about

I plan to work with a ppxlib maintainer into updating ppxlib.

Thank you! Having the author of a parsing change write the parsetree migration (or help writing it), is highly appreciated. I'd like to clarify one thing, though: Writing the migration is only one of the two ways in which a parsetree change impacts the PPX ecosystem. The second one is the following: Some/many PPXs will break when ppxlib bumps the exposed versioned parsetree to a version containing the change. As you've mentioned, @Julow, the parsing has changed a lot in 5.1 and 5.2 already. However, we haven't bumped the ppxlib parsetree, yet (for concrete reasons I could explain). So we can only hope that all the authors of the PPXs that will be affected once we do so, will adapt to the changes without too much resistance/complaints. That being said, I know that these kinds of parsing changes are important, so I'm not arguing against them. I just hope it's clear what we can expect about the impact of the sum of all these kinds of changes on the PPX ecosystem.

And talking about "the sum of all these kinds of changes" brings me to my second point. You've mentioned

In OCamlformat, we use a patched version of the parser (...) We would like to propose some of our patches in the hope that they are useful to the compiler and others

I expect that most of those patches bring the parsetree/AST closer to the original syntax, right? I remember at some point hearing ideas around adding a concrete syntax tree (CST) as a first abstraction to OCaml. So, we'd have: Syntax -> CST -> AST -> expanded AST (via PPXs) -> Typedtree. From what I understand, the CST would be very useful in different contexts handling syntax, such as formatting, refactoring etc. And for ppxlib and the PPX ecosystem, adding that extra first layer of abstraction would have the advantage that the necessary parsing changes would be absorbed by the CST, leaving the AST as is. @gasche , do you happen to have any info about that? Is that something that has been / might be discussed among the compiler maintainers, or is it just something I heard somewhere around? Or is the idea to make the AST itself more and more concrete step by step?

(Ok, the PR has been merged 1 min ago, hope this comment still gets read :))

@gasche

gasche commented Mar 11, 2024

Copy link
Copy Markdown
Member

Indeed, it's a bit unfortunate that I merged at the same time you provided your (much-appreciated) feedback, sorry about that.

Personally I am not at all opposed to the idea of having a CST/AST separation in the compiler, but I don't know any of the current heavy contributors who is planning to work on this in the medium term. If this effort were to be started, it would have to be by someone doing the heavy lifting. Also, I suspect that this would result in a substantial overhaul of the AST definition (larger than the present change) to regularize it, at least if the goal is to get a pleasant final design.

@Julow

Julow commented Mar 11, 2024

Copy link
Copy Markdown
Contributor Author

Thanks for the clarification. I'll help as much as I can but if these changes are too annoying to the PPX maintainers, I'll stop proposing more.

The CST idea was indeed a way to make changes to the parser that couldn't make sense to upstream into the AST (eg. arrow types as lists, begin end node, etc..). Though it was never planned to use it as an intermediate pass in the compiler. Instead, it would be a different parser that we would maintain separately.
Making the CST an intermediate pass would be a nice way to make it more maintainable, though it might add some confusion and frustration (do we need an other ppxlib for the CST?).

But we also have changes that make total sense in the AST too and would benefit to other users: added locations, removal of elaborated nodes.

I can publish these changes into a list (with the most useful/less intrusive first) into an issue.

@shindere

shindere commented Mar 11, 2024 via email

Copy link
Copy Markdown
Contributor

@pitag-ha

Copy link
Copy Markdown
Member

My bad.

Not at all, @shindere. Thanks for mentioning the PR to me, I would have missed it!


I'll help as much as I can but if these changes are too annoying to the PPX maintainers, I'll stop proposing more.

As mentioned, we (the PPX maintainers) really want changes, that are important for other tools or the compiler itself, to happen. The PPX situation shouldn't be blocking for an improving parsetree and surrounding tooling etc. We just need to be aware that those changes have more impact beyond writing the migration. And we'll need to observe how the PPX authors will react.

Btw, as mentioned, we haven't bumped the ppxlib parsetree since 5.0 yet. IIUC, it seems very likely that 5.3 will contain a syntax addition reflected in the parsetree (effects syntax). If that's the case, for 5.3 we'll need to bump it for sure. So if you're planning to add more parsetree changes similar to this one, doing them before 5.3 would be better for the PPX ecosystem than after 5.3.

I can publish these changes into a list (with the most useful/less intrusive first) into an issue.

That sounds great. Thank you!


Thank you, both @gasche and @Julow , for clarifying about the CST/AST separation!

Also, I suspect that this would result in a substantial overhaul of the AST definition (larger than the present change) to regularize it, at least if the goal is to get a pleasant final design.

Right, @gasche , this is a very good point that I had missed!

Though it was never planned to use it as an intermediate pass in the compiler. Instead, it would be a different parser that we would maintain separately.

Interesting. @Julow , we seem to have different understandings about this. It makes sense what you're saying. If the CST ever gets approached, let's discuss it. If we end up having a CST, it'd be cool if it can be beneficial for as many things as possible.

Julow added a commit to Julow/ocaml that referenced this pull request Mar 13, 2024
The special prefix operators [-], [-.], [+] and [+.] could rewrite:

    -(1 [@foo])

into the constant:

    -1

which drops the attribute. This adds the attribute check, the above
example is now parsed to the correct AST:

    ~-(1 [@foo])

An alternative implementation would be to drop the rewrite from the
parser and instead to perform it during a later phase.

Spotted by @gasche in ocaml#12968 (comment)
Julow added a commit to Julow/ocaml that referenced this pull request Mar 14, 2024
The special prefix operators [-], [-.], [+] and [+.] could rewrite:

    -(1 [@foo])

into the constant:

    -1

which drops the attribute. This adds the attribute check, the above
example is now parsed to the correct AST:

    ~-(1 [@foo])

An alternative implementation would be to drop the rewrite from the
parser and instead to perform it during a later phase.

Spotted by @gasche in ocaml#12968 (comment)
tmcgilchrist added a commit to tmcgilchrist/ocaml-h2 that referenced this pull request Jul 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parsetree-change Track changes to the parsetree for that affects ppxs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants