Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep previous location when relocating ast node in the parsers. #1960

Closed
wants to merge 2 commits into from

Conversation

Projects
None yet
8 participants
@hhugo
Copy link
Contributor

commented Aug 2, 2018

Node relocation is usefull to get better error messages for users but currently painful to use by tools caring about precise locations (like ppx_expect, ocamlformat [1]).

This PR propose to keep the list of all locations during relocation.

Note that a bootstrap is needed.

[1] https://github.com/ocaml-ppx/ocamlformat/blob/master/src/Source.ml#L55

@gasche

This comment has been minimized.

Copy link
Member

commented Aug 2, 2018

I have bad feelings about this PR and I think it should start with a solid explanation of the motivations.

I came to look at whether it would be a pain to rebase #292 against this -- it turns out I think it would be manageable. But here are other weaknesses with the approach:

  • we add more information to AST nodes in a less-structured way, it is unclear that this is the right way to do it (what about, for example, having a Pexp_parens of exp node in the AST, is that a better or worse approach?)

  • I believe the code is currently wrong (the location stacks are unreliable) because of the way reloc_exp is used in the semantic actions, which makes assumptions violated by the current patch. Sometimes it is a "real relocation" and the proposed behavior of pushing on the loc stack is sensible, but sometimes it is because the helper functions used to build the initial AST fragments generate a garbage location by default, and then the old location must be thrown away. Look at reloc_exp (mktailexp ...) for example.

I'm very happy that ocamlformat exists and I would like to have time to look at it more deeply; making it easier for it to do a better job is a perfectly good reason to improve the compiler frontend. But I think it would be useful to have more in-depth discussions for changes, making sure that there are the right way to solve the problems that the tool is facing.

@hhugo

This comment has been minimized.

Copy link
Contributor Author

commented Aug 2, 2018

Ocamlformat and Ppx_expect both face the same problem. They want to extract the textual content of a node from its location stored in the parsetree. They are currently forced to perform additional scanning/lexing/parsing to get rid of the decoration surrounding the node.

I don't think having Pexp_parens make us goes in the right direction. It would be way too invasive compared to what it would bring.

As discussed with many people in the past, and with @trefis earlier today, the ideal solution would be to implement a surface ast (compile syntactic sugar at a later pass). I could also imagine the surface ast be able to handle some parsing errors gracefully. But we're talking long term here.

the location stacks are unreliable

This is not specific to location stacks. Locations in general are unreliable, I've filtered ghost location out.

The parser currently consider multiple locations for a given node and keep the "latest" (bigger?) one.
This PR does not change the current behavior. It just keeps (a subset of ) all considered locations for a given node in an additional field.

@hhugo hhugo force-pushed the hhugo:relocs branch from 7f9044f to 6133147 Aug 6, 2018

@gasche

This comment has been minimized.

Copy link
Member

commented Aug 9, 2018

I'm inclined to go forward with the change: I don't like it very much, but it makes sense as an incremental/pragmatic step. But I would like to get the opinion of more people that care about the parsetree, such as @trefis, @let-def and @alainfrisch first.

@hhugo

This comment has been minimized.

Copy link
Contributor Author

commented Aug 9, 2018

\cc @diml

@let-def

This comment has been minimized.

Copy link
Contributor

commented Aug 9, 2018

I don't like this change either (maybe because I don't understand the problem well enough...).

Note: in Merlin there are a few places where we need to store more specific locations. We do so by adding attributes with the given location. E.g your patch would be encoded as @[ocaml.loc_stack] attribute. This let Merlin access the right information without affecting the rest of the toolchain.

@diml

This comment has been minimized.

Copy link
Member

commented Aug 9, 2018

I think it's important to have a way to compute the exact location of things. For instance it is currently painful when you want the exact location of an extension point, and we have to resort to manual and fragile text analysis.

I initially thought that we could just add a location in Parsetree.extension, but after talking with @hhugo I realized that the same problems exist for other part of the parsetree. So overall I'm favor of this change.

@diml

This comment has been minimized.

Copy link
Member

commented Aug 9, 2018

@let-def, one instance of this problem exist in ppx_expect: due to the way it works it needs to know exactly the location of [%expect ...] nodes, i.e. the positions of the opening and closing square brackets. These positions are currently not available in the parsetree.

@let-def

This comment has been minimized.

Copy link
Contributor

commented Aug 9, 2018

Then why not adding and specifying the exact location for those cases rather than introducing a fuzzy concept of "location stacks"? (as @gasche said, reloc are sometimes used for different purposes).

Btw for Merlin, the related-issue is locating the in in let ... = ... in ....

@hhugo

This comment has been minimized.

Copy link
Contributor Author

commented Aug 10, 2018

I don't think the concept is that fuzzy. We are just storing additional valid location for a given node. The name might not be the good one.

The only other use case for reloc was list literal which can easily be rewritten and is now done in this PR.

@diml

This comment has been minimized.

Copy link
Member

commented Aug 13, 2018

Then why not adding and specifying the exact location for those cases rather than introducing a fuzzy concept of "location stacks"?

AFAIU, this would work fine for extension points, but not for other constructions as there is no right place to add this location in the AST. Additionally, for tools that require all the intermediate locations such as ocamlformat, storing a single location might not be enough.

Personally, I find the current PR fine: the complete information is present for tools that need it, but doesn't disturb other tools that don't care about it, which is the majority of tools using the AST.

@diml

This comment has been minimized.

Copy link
Member

commented Aug 20, 2018

In the end it seems that the concerns are more about the implementation than the feature itself. Given that we don't have strong backward compatibility requirements for compiler-libs, it seems fine to merge this now. Unless someone has a better proposal now, I'll review and merge this PR.

@diml diml self-assigned this Aug 20, 2018

let nil = { txt = Lident "[]"; loc = loc } in
Exp.mk ~loc (Pexp_construct (nil, None))
in
let rec loop tail =

This comment has been minimized.

Copy link
@diml

diml Aug 20, 2018

Member

It seems to me that we can use List.fold_left directly here

Exp.mk ~loc (Pexp_construct (nil, None))
| e1 :: el ->
let exp_el = mktailexp nilloc el in
let mktailexp full_loc nilloc rev_list =

This comment has been minimized.

Copy link
@diml

diml Aug 20, 2018

Member

I suggest to make rev_list a labelled argument. At this will be clear at the call site that we expect the list in reverse order

@@ -0,0 +1,9 @@
(* TEST
include ocamlcommon
* toplevel

This comment has been minimized.

Copy link
@diml

diml Aug 22, 2018

Member

Could you turn this into an expect test? It seems to me that we are generally turning tests into expectation tests

This comment has been minimized.

Copy link
@hhugo

hhugo Sep 3, 2018

Author Contributor

I'm not sure how this work. I'll have a look

This comment has been minimized.

Copy link
@hhugo

hhugo Sep 6, 2018

Author Contributor

This is now done

@hhugo hhugo force-pushed the hhugo:relocs branch from c6c60f6 to 1023ece Aug 22, 2018

@hhugo hhugo force-pushed the hhugo:relocs branch from 1023ece to 63f312c Sep 3, 2018

@@ -10,3 +10,4 @@ pr6604.ml
pr6865.ml
pr7165.ml
shortcut_ext_attr.ml
reloc.ml

This comment has been minimized.

Copy link
@pmetzger

pmetzger Sep 5, 2018

Member

You're missing a trailing newline.

This comment has been minimized.

Copy link
@hhugo

hhugo Sep 5, 2018

Author Contributor

Thanks, I've been struggling with that many times.
We should at least have proper error message in that case.

This comment has been minimized.

Copy link
@damiendoligez

damiendoligez Nov 23, 2018

Member

check-typo outputs a warning in this case. If you don't have the check-typo hook in your git config, you should install it now.

@hhugo hhugo force-pushed the hhugo:relocs branch 2 times, most recently from 4b34d1a to f1dde79 Sep 5, 2018

@hhugo

This comment has been minimized.

Copy link
Contributor Author

commented Sep 6, 2018

The feature is now rebased top of the menhir parser. That made some diff to disappear.
The test has been turned into an expect test.
@diml, can you take another look ?

@diml

This comment has been minimized.

Copy link
Member

commented Sep 6, 2018

Looks good. I reworked a bit the test I was finding a bit hard to check it was correct. I let you check that you are happy with the result and then it's good to merge once it's rebased

@diml

diml approved these changes Sep 6, 2018

@hhugo hhugo force-pushed the hhugo:relocs branch from 7db1b7e to e0422f7 Sep 6, 2018

@hhugo hhugo force-pushed the hhugo:relocs branch from e0422f7 to 5cd3926 Sep 6, 2018

@hhugo

This comment has been minimized.

Copy link
Contributor Author

commented Sep 6, 2018

Rebased.
Remember that you will need merge manually and redo the bootstrap

@diml

This comment has been minimized.

Copy link
Member

commented Sep 6, 2018

We only need a bootstrap of the parser right?

@diml

This comment has been minimized.

Copy link
Member

commented Sep 6, 2018

Hmm, I guess we need to bootstrap because of attributes stored in cmi files. I bootstrapped everything, rerun the testsuite and merged manually

@diml diml closed this Sep 6, 2018

@hhugo hhugo deleted the hhugo:relocs branch Sep 6, 2018

@Drup

This comment has been minimized.

Copy link
Contributor

commented Sep 7, 2018

Huh, what that supposed to be closed without merging ?

@trefis

This comment has been minimized.

Copy link
Contributor

commented Sep 7, 2018

I believe @diml merged manually (at least the changes are on trunk), but didn't include the GPR number in the merge message so github didn't do its magic. That's why a "close" was needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.