Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get rid of mutable syntax node usage #15710

Open
Veykril opened this issue Oct 4, 2023 · 7 comments
Open

Get rid of mutable syntax node usage #15710

Veykril opened this issue Oct 4, 2023 · 7 comments
Labels
A-assists Broken Window Bugs / technical debt to be addressed immediately C-Architecture Big architectural things which we need to figure up-front (or suggestions for rewrites :0) ) E-hard

Comments

@Veykril
Copy link
Member

Veykril commented Oct 4, 2023

Somewhat meant as a means of discussion (though ultimately just expressing my opinion):

After having used and reviewed a bunch of PRs using the mutable syntax node API I am personally not too happy about it. It can be awkward to use at times and has some pitfalls that one has to be aware of. A major problem with that API is that it introduces iterator invalidation (you can iterate while mutating the tree), giving way to easily end up panicking if one does not delay the edits after iteration via collecting into a vec or similar. Likewise it makes it harder to replace our version of rowan to one that uses proper slots which is something I am in favor of, as the slot API is unlikely to have the same feature (unless we implement it back into rome's fork of rowan which would be the current slot based rowan solution).

The current benefit of it is that we have the ted api that allows automagically fixing up whitespace on edits, but that's really not necessary in the end game when we have our own syntax formatter which would allow us to format edited nodes on the fly. It is also a rather powerful api in that you can patch up syntax trees on the fly.

The main question then arises, how do we do edits if not via mutable syntax nodes? The current approach was using the handwritten make module to stitch nodes together (which is still somewhat used). But that approach is really annoying to use, inefficient and looks horrid as well. One nice looking approach would be having a quasi quote api as described in #2227, that could also do some static checks ideally (like checking well typedness of the tree at comp time maybe). With quoting one could just reconstruct the patched tree / construct a the biggest tree encompassing the changes of interested and then ask the assist building api to replace the node pointed to by a specific node ptr with the constructed tree. And alternative to mutation is #9649 though I haven't looked too much into it.

In general, assists are all over the place currently, some do text based editing, some do make api + stringified text edits and finally we have some that use the mutable node api. Obviously not ideal if we mix and match however we like to.

(Issue spawned of me really not enjoying reviewing assist PRs nowadays anymore, as they tend to be messy in general, though also because I really want us to switch to the trailing trivia model)

@Veykril Veykril added E-hard C-Architecture Big architectural things which we need to figure up-front (or suggestions for rewrites :0) ) A-assists labels Oct 4, 2023
@Veykril
Copy link
Member Author

Veykril commented Oct 4, 2023

cc @matklad feels like an issue you'd have an opinion about or two :)

@matklad
Copy link
Member

matklad commented Oct 5, 2023

Yeah, I wouldn’t say that mutable syntax API is a success :-)

That being said, I think it barks at the right tree: the fundamental capability there is “tracking nodes across edits”, and I think it would be important to have that in one way or another. But I am not too confident here: maybe we don’t need cursor API at all?

One curious solution here is Roslyn’s SyntaxAnnotations:

https://joshvarty.com/2015/09/18/learn-roslyn-now-part-13-keeping-track-of-syntax-nodes-with-syntax-annotations/

my understanding is that this mechanism powers all “mutable” editing APIs, which allow you to incrementally mutate the tree.

I looked at how that’s implemented under the hood, and I think it actually is pretty horrible —- there’s a global static weak map of annotated nodes (no link to source, so I might be misremembering)

I think another related issue is how our semantics API can panic if you feed it with a “wrong” Syntax Node. I was thinking that maybe the solution to both problems is biting the bullet and ensuring that every single bit of syntax has a globally unique id?

Definitely need to change something here, though I still am not 100% positive about the ideal end state :P

@Veykril
Copy link
Member Author

Veykril commented Dec 19, 2023

Another thing I just realized, this API makes it near impossible to map up edits back into macro input token trees

@Veykril Veykril added the Broken Window Bugs / technical debt to be addressed immediately label Dec 19, 2023
@DropDemBits
Copy link
Contributor

I feel like using both approaches of using SyntaxEditor and using SyntaxAnnotations could work as a replacement to the mutable syntax node approach. I mention SyntaxAnnotations because they could, I dunno, be used to figure out where to place LSP snippets 🙂

Since a SyntaxEditor would store all of the edit actions (insert, replace, delete, etc.) to transform the original tree into a final tree, we retain information on how nodes are transformed. This also means that a SyntaxEditor can propagate SyntaxAnnotations forward to the final tree based on this information, so long as the SyntaxEditor knows what nodes are tracked.

A caveat of letting SyntaxEditor propagate SyntaxAnnotations is that the make constructors will need to inform the SyntaxEditor of where to propagate SyntaxAnnotations to, otherwise we lose track of nodes. Possible approaches of resolving this could be:

  • Require passing in a SyntaxEditor to a make constructor (or potential quasi-quoting macro) and then let it add mappings from the old nodes to the new nodes. This would be fine for the make constructors, but would be a bit cumbersome for the quasi-quoting macro since you'd need to separate the SyntaxEditor input from the actual syntax to construct
  • Have the make constructors/quasi-quoting return a SyntaxResult which holds a mapping from old to new nodes and have an explicit into_syntax(&mut SyntaxEditor) which adds the mappings to SyntaxEditor.
  • Give up and use a static global concurrent map. It's certainly an option 🙃

One interesting situation to handle is when trying to change both a parent node and its children's nodes, e.g. PathTransform trying to transform

SomeTrait<Item = [u8; SOME_CONSTANT]>;

into

m1::SomeTrait<Item = [u8; m2::SOME_CONSTANT]>;

(based on #17321)

Roslyn's approach is to have a ReplaceNode that takes in a lambda to compute the replacement node. This has a downside of serializing transformation (see dotnet/roslyn#28378), but maybe partitioning the changes list by each computed replacement change may help?

Other idle thoughts:

  • Do we want to expose the most recent version of the tree? This would imply applying the transformations to a tree at some point (either after adding any change, or whenever we request for the current tree). Not sure if there's anywhere where we do need to inspect the current version of the tree, and the current semantic model will panic on any newly created nodes (see The Semantics API is prone to panics #17367)
  • Do we want to allow combining SyntaxEditor changes? Would be useful for e.g. PathTransform, since it could produce a SyntaxEditor with all of the changes instead of needing to take in the current SyntaxEditor.
  • Is tracking all node transformations enough to up-map changes back to the original macro inputs? I feel like this helps somewhat, but I'm not sure of the specifics of what is required for that.
  • Moving away from mutable syntax nodes will necessarily affect the edit_in_place APIs, since they assume that they'll refer to the same node. Passing in a SyntaxEditor will help with creating a new version of the node, but we'll still have a reference to the old node. Maybe just use the quasi-quote macro instead?

@Veykril
Copy link
Member Author

Veykril commented Jun 9, 2024

Require passing in a SyntaxEditor to a make constructor (or potential quasi-quoting macro) and then let it add mappings from the old nodes to the new nodes. This would be fine for the make constructors, but would be a bit cumbersome for the quasi-quoting macro since you'd need to separate the SyntaxEditor input from the actual syntax to construct

That seems fine to me, make functions could be exposed as methods on a SyntaxEditor I guess which would make it a bit more convenient. As for quasi quoting, sure it'll be a bit more annoying but at the same time it feels bit a like how the quote macro has an extra optional input for re-spanning the tokens. Feels like we can do it the same way, where having no extra input just creates a standalone node as well if that's useful. Overall this kind of API seems a lot nicer still than mutable syntax trees as with those you have to actually fight at times when iterating and mutating at the same time.

Have the make constructors/quasi-quoting return a SyntaxResult which holds a mapping from old to new nodes and have an explicit into_syntax(&mut SyntaxEditor) which adds the mappings to SyntaxEditor.

This sounds a lot worse usability wise than option 1 to me.

Give up and use a static global concurrent map. It's certainly an option 🙃

I don't think anything needs to be said here 😬

Do we want to expose the most recent version of the tree? This would imply applying the transformations to a tree at some point (either after adding any change, or whenever we request for the current tree). Not sure if there's anywhere where we do need to inspect the current version of the tree, and the current semantic model will panic on any newly created nodes (see
#17367)

I feel like we don't? At least I can't tell an immediate use case (we have one place where we kind of do this right now which is in adjustment inlay hints, but that is really just a hack we need to replace generally)

Do we want to allow combining SyntaxEditor changes? Would be useful for e.g. PathTransform, since it could produce a SyntaxEditor with all of the changes instead of needing to take in the current SyntaxEditor.

Feels like something that should be naturally supported.

Is tracking all node transformations enough to up-map changes back to the original macro inputs? I feel like this helps somewhat, but I'm not sure of the specifics of what is required for that.

I'd expect us to only touch real file trees here. That is if we are inside a macro call we ought to first upmap that out and then attempt any edits. Anything else gets hairy for no benefit (in my eyes at least).

@DropDemBits
Copy link
Contributor

That is if we are inside a macro call we ought to first upmap that out and then attempt any edits.

For assists like add_missing_match_arms, it'd be nice if there was a way to edit a synthetic version of the macro input for declarative macros (i.e. the "parsed" macro input) and re-serialize that rather than plain token trees, but that's another issue (do we even have an issue for this? I know it's something we maybe wanted to have).

@Veykril
Copy link
Member Author

Veykril commented Jun 9, 2024

Ah right, yes that should be in theory possible

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-assists Broken Window Bugs / technical debt to be addressed immediately C-Architecture Big architectural things which we need to figure up-front (or suggestions for rewrites :0) ) E-hard
Projects
None yet
Development

No branches or pull requests

3 participants