Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsetree, typedtree: add locations to all nodes carrying attributes #1903

Merged
merged 3 commits into from Aug 22, 2018

Conversation

Projects
None yet
4 participants
@gasche
Copy link
Member

commented Jul 14, 2018

Currently, there are nodes in the AST that carry attributes, but don't store their own location. This is a problem for various kind of preprocessing where one would want to have access to the locations of the nodes concerned by a particular attribute.

We ran into this issue with @Octachron when trying to add an [@ellipsis] attribute to elide parts of a program for the manual examples (#1765).

This PR changes the parsetree and typedtree definitions, so it will break third-party consumers of those. I tried to keep the changes relatively minimal. (In particular, I started with a version that put Rtag and Otag payloads into records, but that proved too invasive and I went for a simpler approach here.)

@gasche

This comment has been minimized.

Copy link
Member Author

commented Jul 14, 2018

cc @alainfrisch and @diml and @let-def, who care about parsetree changes

@trefis

This comment has been minimized.

Copy link
Contributor

commented Jul 14, 2018

This looks all fine.

Recently some people at janestreet, who work on ocamlformat and other related tools, also complained about a bunch of nodes not having locations. The only example I can remember right now are toplevel_phrases but there might be other and I can give you a more complete list on monday if you want to extend the present PR.
Otherwise, feel free to merge this one and I'll open another one at some point.

@gasche gasche force-pushed the gasche:add-locations-to-all-attributed-nodes branch from 54408d5 to 87b5680 Jul 14, 2018

@gasche

This comment has been minimized.

Copy link
Member Author

commented Jul 14, 2018

Thanks for the review! I would be in favor of merging (if approved, CI passes, etc.), because I don't think I will have much time to work on this myself during the next weeks.

@trefis

This comment has been minimized.

Copy link
Contributor

commented Jul 15, 2018

I'll give this a second (more thorough) look tomorrow then.

@trefis
Copy link
Contributor

left a comment

I'm not so fond of the use of Location.loc. I think I'd rather see row_field (resp. object_field) turned into a record, and a row_field_desc (resp. object_field_desc) being introduced. Potentially also moving the attributes out of the desc and into the record (added bonus: one could then put attributes on Rinherit nodes), though I understand that might cause a bit more pain to update to.

This would be more regular with the rest of the AST and easier to extend if we ever need to (and I don't think it would cause noticeably more breakage than the current state of the PR).

I'm not approving in the hope you'll agree to the change, but I recognize it's unlikely we'll want to extend these records in the foreseeable future, so feel free to push back.

PS: I'm surprised to see you touch the parser now, considering the parallel work that is going on to switch to menhir.

@gasche

This comment has been minimized.

Copy link
Member Author

commented Jul 16, 2018

Pre-scriptum: I'm a bit surprised as well, but we have no ETA for when the Menhir work will be merged or even decided, so I decided to go scratch the other itches as well (I think the present change, if people like it, should be done early in 4.08 as it gives more time to parsetree-consumers to adapt).

For a record, there are two potential design choices. I started a first patch (that I backtracked as I found it too invasive, but I can change again) with the following structure:

 and row_field =
  | Rtag of rtag
  | Rinherit of core_type 
 and object_field =
  | Otag of otag
  | Oinherit of core_type
and rtag =
    {
     rtag_label: label loc;
     rtag_constant: bool;
     rtag_types: core_type list;
     rtag_loc: Location.t;
     rtag_attributes: attributes;
   }
and otag =
    {
     otag_label: label loc;
     otag_type: core_type;
     otag_loc: Location.t;
     otag_attributes: attributes;
   }

but I could also respect the desc pattern and use for example

 and row_field =
  | Rtag of rtag
  | Rinherit of core_type 
 and object_field =
  | Otag of otag
  | Oinherit of core_type
and rtag =
    {
     rtag_desc: rtag_desc;
     rtag_loc: Location.t;
     rtag_attributes: attributes;
   }
and rtag_desc = label loc * bool * core_type list
and otag =
    {
     otag_desc: otag_desc;
     otag_loc: Location.t;
     otag_attributes: attributes;
   }
and otag_desc = label loc * core_type

Do you have a preference between one or the other?

(The second one makes it really tempting to add a parametric
'desc node structure and use it in all the places where we
currently have hungarian-notation desc,loc,attributes fields,
but that is total parsetree armaggedon and I would rather wait
a few more OCaml versions to do that, to ensure that we have the
ocaml-migrate-parsetree in place in a large portion of the
ecosystem.)

@trefis

This comment has been minimized.

Copy link
Contributor

commented Jul 16, 2018

You misunderstood, I was only suggesting:

and row_field_desc =
  | Rtag of label loc * attributes * bool * core_type list
  | Rinherit of core_type

and row_field =
  {
    rf_desc: row_field_desc;
    rf_loc: Location.t;
  }

which could then be changed to:

and row_field_desc =
  | Rtag of label loc * bool * core_type list
  | Rinherit of core_type

and row_field =
  {
    rf_desc: row_field_desc;
    rf_loc: Location.t;
    rf_attributes: attributes;
  }

What do you think?

@trefis

This comment has been minimized.

Copy link
Contributor

commented Jul 16, 2018

Also, I'm not fond of the change you just proposed: the current content of the PR keeps a location for every kind of row_field, and so does my proposition.
Your latest one doesn't, and that feels like a mistake.

@gasche

This comment has been minimized.

Copy link
Member Author

commented Jul 16, 2018

I see, I'm fine with your proposal as well. So should we move the attributes field into the record or not? (In general I prefer to have the attributes close to the location, so I would say yes.)

@trefis

This comment has been minimized.

Copy link
Contributor

commented Jul 16, 2018

Yes, having the attributes in the record seems better.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Jul 16, 2018

What would be the syntax for attributes on Rinherit? Would you change the syntax to interpret:

 type t = [ s [@foo] ]

as having the attribute attached to Rinherit (and not to the inner core_type), and require extra parentheses to put an attribute on the core_type?

@trefis

This comment has been minimized.

Copy link
Contributor

commented Jul 16, 2018

One already requires extra parentheses to put an attribute on s in your example, without them one gets a syntax error.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Jul 16, 2018

Ah, yes, good point! Btw, I realize type [ t ] is not valid syntax, one needs [ |t ]. I wonder if there is a technical reason for this restriction.

@gasche

This comment has been minimized.

Copy link
Member Author

commented Jul 16, 2018

To be clear, I am not planning to adding parsing rules for Rinherit or Oinherit attributes as part of this PR. I think that it is fine if the AST structure allows for more attributes than the concrete syntax.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Jul 16, 2018

I think that it is fine if the AST structure allows for more attributes than the concrete syntax.

Do we have precedents for similar situations where the Parsetree accepts forms without a concrete syntax? I can see a (smallish) risk that alternative syntaxes such as Reason start supporting those forms, making the syntaxes less interoperable. At least I suggest to explicitly reject such cases in Ast_invariants (which could be used, at some point, for quickcheck-style testing of printing/parsing roundtripping).

@gasche

This comment has been minimized.

Copy link
Member Author

commented Jul 16, 2018

It does not seem wrong, to me, to have the ability to place attributes at arbitrary nodes of the AST, and in particular people producing code from ppx preprocessors may want to add warning-disabling attributes locally, without first checking whether they are actually allowed to do so there or not by the concrete syntax. Are you sure we should forbid this? I trust your perspective on the matter, so I'll add the Ast_invariants check if you don't change your mind.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Jul 16, 2018

Indeed, I believe that it's better to disallow some use cases in order to preserve stronger invariants (such as "valid parsetrees can be printed to valid source syntax"), in the same way that we reject "tuples" of arity 0 and 1. We can relax invariants later if the syntax is provided.

(But perhaps it's simple enough to support the new syntax?)

@Octachron

This comment has been minimized.

Copy link
Contributor

commented Jul 16, 2018

Do we have precedents for similar situations where the Parsetree accepts forms without a concrete syntax?

There are also empty polymorphic variant types. For instance, type t = [ ] where [ ] would stand for an empty list of polymorphic variants and not the list constructor [].

@gasche gasche force-pushed the gasche:add-locations-to-all-attributed-nodes branch from 87b5680 to a7b0741 Aug 16, 2018

@gasche

This comment has been minimized.

Copy link
Member Author

commented Aug 16, 2018

I just updated the PR to use a desc/loc record instead of a 'a Location.t.

@gasche gasche force-pushed the gasche:add-locations-to-all-attributed-nodes branch from a7b0741 to d7f3e08 Aug 18, 2018

@gasche

This comment has been minimized.

Copy link
Member Author

commented Aug 18, 2018

I just changed this PR to follow the recommandations of @trefis and @alainfrisch. I think that adding parsing of attributes on inherited subtypes would be a nice next step, but for a later PR.

@gasche gasche force-pushed the gasche:add-locations-to-all-attributed-nodes branch from d7f3e08 to 4d9b59b Aug 19, 2018

@gasche gasche force-pushed the gasche:add-locations-to-all-attributed-nodes branch from 4d9b59b to 1e9b1b3 Aug 20, 2018

@gasche gasche force-pushed the gasche:add-locations-to-all-attributed-nodes branch from 1e9b1b3 to e348103 Aug 20, 2018

@gasche

This comment has been minimized.

Copy link
Member Author

commented Aug 20, 2018

I just rebased to take into account @jberdine's comment.

gasche added some commits Jul 14, 2018

parsetree: make sure that all nodes that store attributes also store …
…a location

Florian Angeletti and myself ran into a problem when trying to use attributes
for ellision of parts of manual example. We wanted to be turn any ast-node
marked with the [@ellipsis] attribute into "..." in the rendering of the
corresponding code block, but for this we need the location of the
attributed node, and it turns out that some constructions supported
attributes without carrying a location:
- Rtag in row_field
- Otag in object_field
- type_exception record
- type_extension record

We added locations in all those positions, guaranteeing the invariant
that all nodes to which attributes can be attached have a precise
position.
ast_{iter,mapper}: always traverse the attributes right after the loc…
…ation

Now that each node that supports attributes also has a location,
we want to make the attributed node's location robustly available
to AST traversal functions.

The best way to do this would be to change the "attributes" traverser
to take the node's location as extra parameter. However, doing this
changes the type of the traverser interface, with a risk of breaking
user code. This may be the right long-term change, but for now
we go with something weaker: we ensure that the attributes of a node
are always traversed right after the node's location, which lets
user track attributed location (if they wish) through a side-effect.
parsetree.{row,object}_field: move attributes in the wrapper record
The concrete syntax only allows attributes on tags/constructors/fields
(Rtag, Otag), not on inherited subtypes (Rinherit, Oinherit); we add
this as new enforced invariant in ast_invariants.
@trefis

trefis approved these changes Aug 22, 2018

Copy link
Contributor

left a comment

I like the new AST much better, thanks! :)

@gasche gasche merged commit 4a95b0b into ocaml:trunk Aug 22, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.