De-Gc the AST #7929

huonw · 2013-07-20T09:07:56Z

It would be really nice to have an AST that was a tree with ~. This would have countless advantages, like the ast-fold could consume the AST and only allocate if it is actually creating a new node, and we'd generally not have the problems associated with Gc (the GC might be able to land!).

However, there are a few areas that are very tricky; mainly the AST map, which maps node_id to actual AST nodes of various types, the only safe way to do this with an ~ tree is with & references... which freezes the AST for the remainder of the program, so no transformations like constant evaluation, and no consuming of the AST by trans.

The text was updated successfully, but these errors were encountered:

Aatch · 2013-07-20T09:45:21Z

There are several side-issues with making the AST map work "properly". For one, since it contains & it would have lifetime, which means that every piece of code that uses it needs lifetimes and that causes more lifetime requirements and so on and so forth.

Much of the current code isn't designed to deal with these kinds of structures, happily copying around @-pointers from the map and using @fn closures that can't have borrowed pointers in them. Furthermore, much of the code only needs minor use of the map, but the rest of the code pays for it's design decisions.

While unpalatable, a map based on unsafe pointers may help us, at least as a bridging solution until the rest of the code is brought up to a place where a safe-code map can be used instead.

My idea is to integrate the map into a wrapper for the AST, to limit the potential for, uh, bad things to happen. A simple struct like this:

struct ASTData {
    crate: ast::crate,
    priv map: ast_map::Map
}

With methods like this:

fn find<'a>(&'a self, node: node_id) -> Option<&'a node>;

The only place (as far as I am aware) the AST undergoes structural changes is during folding. Thus, folding can maintain the AST map. There is very little code that absolutely requires @, instead it is used as the "easy" option.

The current AST map is HashMap<node_id, node>, where node is an enum variant for the various indexed nodes. The variants mostly use @, but they could use * instead. ASTData has map as a private field to ensure no direct access, instead most access should be done via a safe API that returns a region pointer that causes ASTData to be frozen for it's length. An unsafe API can also be written to deal with cases where such freezing is unwanted, but the memory usage from copying is unacceptable.

The design above should be fairly safe as the pointers held in the map are "internal" pointers to the map's owner, meaning it's unlikely end up with dangling pointers.

nikomatsakis · 2013-07-20T11:16:25Z

I actually don't know that ~ trees are the way to model the AST. I was thinking that using an arena might be the right approach (it's certainly the traditional approach). To make that work, though, we need to finish up the work on lifetime syntax.

Aatch · 2013-07-20T12:08:44Z

@nikomatsakis as long as it can be semantically owned (so folding is memory efficient, it can be sent across task boundaries, etc) then I don't really care how it's implemented.

Finishing the lifetime syntax would be very nice too 😄

dobkeratops · 2013-09-10T10:24:16Z

Probably not enough code that uses it for it to be an issue yet ... but would you forsee a need for seperate compiler & tools AST structure/interfaces - similar to how clang provides a stable C interface and a more fully featured , but more rapidly changing internal representation.

r.e. tools, are there any ideas on how incremental build might go - with C++ although the header situation is pretty dire, you can at least translate individual sources - wheras at present you must compile an entire crate. Perhaps a tools oriented AST could be more suited to incremental builds? (for ide support).

(perhaps there are other issues more suitable for this question)

emberian · 2013-11-08T08:18:41Z

@nikomatsakis given that said work seems to be being finished, is it time to reapproach this? Probably not a 1.0 priority, as it certainly has no bearing on the language itself, but it seems it will be necessary to get rustc memory usage reasonable.

nikomatsakis · 2013-11-08T10:30:42Z

Well, PR #10153 has not yet landed (still under review) but yes it might be reasonable to experiment once it does.

nikomatsakis · 2013-11-08T10:57:04Z

On Fri, Nov 08, 2013 at 12:18:45AM -0800, Corey Richardson wrote:

it seems it will be necessary to get rustc memory usage reasonable.

...though I'm not sure I agree with this particular statement. It
would reduce the usage in box headers, but then improving the GC
would do so as well.

Nonetheless, I don't think the AST is correctly structured today, just
from first principles. I can imagine two alternative designs, both of
which will save memory and make it obvious how to trash the AST in a
deterministic fashion etc:

Store all AST nodes in arrays, divided by type. Something like:
```
struct Ast {
    items: ~[Item],
    exprs: ~[Expr],
    ...
}

...

enum Expr {
     ...
     ExprBinop(Binop, ExprId, ExprId),
     ...
}

struct ItemId { index: u32 }
struct ExprId { index: u32 }
```
Note that @ast::Expr pointers are replaced with ExprId, etc.
This requires some refactoring to Segregate node ids into distinct
categories. In some cases we pun within a single table, mixing node
ids for items and exprs. We'll have to stop doing that.

For those code paths that take a single node id, we could create
an enum like:
```
enum NodeId {
     ItemId(ItemId),
     ExprId(ExprId),
     ...
}
```
which would allow us to keep a single code path.

Pro: Maximally memory layout; pointers can be 32 bit.
Pro: Subsumes the current AST Map, which can just go away
entirely, saving even more memory.
Pro: Trivial to trash the AST Node after folding.
Pro: We can use TLS to thread the AST array around with minimal
effort; though we will have to be careful in trans
when growing the array.
Pro: Refactoring away from NodeId into different categories
will improve readability and correctness, imo.
Con: Annoying to match nested nodes.
Con: The potential for "dangling pointers" in the form of outdated
indices exists when folding.
Use an arena to allocate AST nodes. Make ASTNodes have the lifetime
of this arena. Replace any @ast::Node with Node<'ast>, defined
as follows:

type Node<'ast> = &'a Node<'ast>;

Pro: Relatively efficient memory layout.
Pro: Easy to trash, no need for dangling pointers.
Pro: Correctness of AST fold becomes type checked, because it
will have to have two lifetime parameters, 'in and 'out,
and it will be producing Node<'out>.
Con: Must thread lifetimes around, lots of manual editing.
Con: We still need the AST Map. That makes me very sad.

Looking forward, I would (maybe) like to eventually convert from one
monolithic AST into an AST per item. The same would hold for all the
side tables in trans. The idea is to enable us to parallelize by
shipping items to different tasks. Either (1) or (2) could be adapted
to this approach, though using an array is somewhat easier but less
safe. Easier because indices can be sent between tasks. Less safe
because the type of an ExprId would not identify the item it belongs
to, and so we could confusingly use an ExprId from one fn in the
context of another. To convert approach (2) we'd have to use something
like the Message abstraction I outlined on the mailing list.

That said, what I'd REALLY like to do is to desugar from the AST into
a CFG after type checking, and do most of our processing on the CFG.
But all of this discussion is still relevant. As a babystep in that
direction I opened #6298 and I even have a lot of work on that lying
around on a branch, waiting for me to finish up with #2202 before it
can be merged. Got to get on that!

pnkfelix · 2013-11-29T15:46:39Z

cc me

eddyb · 2013-12-04T14:51:56Z

I would like to replace all uses of @T/P<T> in Visitor with &T.
The only Visitor implementation I know of that needs the original pointer is in ast_map, and I have a few ideas on how to deal with it:

provide a helper P::from_ref(&T) -> P<T> function that returns the original @T/P<T> pointer from a reference; this will work in most cases, but it's not safe if the provided &T wasn't borrowed from a @T/P<T>.
use a Folder instead of a Visitor; if all the methods are implemented to return the original @T/P<T>, no new allocations should be made behind the scenes, but it could be slower (experimentation required)
write a specialized version of the current Visitor, which works with @T/P<T> and is used only by ast_map; apart from the code duplication, it's the cleanest solution

nikomatsakis · 2013-12-04T16:28:36Z

@eddyb what is the motivation to make this change? The fact that some users want to take ownership of what they are passed seems like a good reason for keeping it the way it is.

eddyb · 2013-12-04T16:56:57Z

@nikomatsakis it's an extra cost (ref-counting right now) for all visitors and changing @T with anything else means touching every Visitor implementation.

huonw · 2014-06-28T07:56:15Z

(The AST now has Gc instead of @, but the semantics haven't changed.)

Replaces Gc<T> in the AST with a custom owned smart pointer, P<T>. Fixes #7929. ## Benefits * **Identity** (affinity?): sharing AST nodes is bad for the various analysis passes (e.g. one could bypass borrowck with a shared `ExprAddrOf` node taking a mutable borrow), the only reason we haven't hit any serious issues with it is because of inefficient folding passes which will always deduplicate any such shared nodes. Even if we were to switch to an arena, this would still hold, i.e. we wouldn't just use `&'a T` in the AST, but rather an wrapper (`P<'a, T>`?). * **Immutability**: `P<T>` disallows mutating its inner `T` (unless that contains an `Unsafe` interior, which won't happen in the AST), unlike `~T`. * **Efficiency**: folding can reuse allocation space for `P<T>` and `Vec<T>`, the latter even when the input and output types differ (as it would be the case with arenas or an AST with type parameters to toggle macro support). Also, various algorithms have been changed from copying `Gc<T>` to using `&T` and iterators. * **Maintainability**: there is another reason I didn't just replace `Gc<T>` with `~T`: `P<T>` provides a fixed interface (`Deref`, `and_then` and `map`) which can remain fully functional even if the implementation changes (using a special thread-local heap, for example). Moreover, switching to, e.g. `P<'a, T>` (for a contextual arena) is easy and mostly automated.

Rustup r? `@ghost` changelog: none

fix: make custom expr prefix completions to understand refs Possible fix of rust-lang#7929 While reviewing the postfix completion code I saw that while calling `add_custom_postfix_completions` we were doing it under the part where reference was not taken into consideration, but as we are only adding postfix completions with `Expr` scope ( [source](https://github.com/rust-lang/rust-analyzer/blob/ba28e19b7838e3ad4223ae82d074dc3950ef1548/crates/ide-completion/src/completions/postfix.rs#L272) ) I shifted the `add_custom_postfix_completions` call to part where references are considered I am not sure if this is the correct fix or I am understanding the problem exactly but this small move seemed to have fixed the issue :)

…hievink fix: breaking snippets on typed incomplete suggestions Possible fix for rust-lang#7929 Fix the case where if a user types `&&42.o`, snippet completion was still applying &&Ok(42). Note this was fixed previously on `&&42.` but this still remained a problem for this case Previous relevant PR: rust-lang#13517 ### Points to help in review: - The main problem why everything broke on adding an extra `o` was, earlier `dot_receiver` was `42.` which was a `LITERAL` but now `42.o` becomes a `FIELD_EXPR` - Till now `include_references` was just checking for parent of `LITERAL` and if it was a `REF_EXPR`, but now we consider `FIELD_EXPR` and traverse all of them, finally to reach `REF_EXPR`. If `REF_EXPR` is not found we just return the original `initial_element` - We are constructing a new node during `include_references` because if we rely on `dot_receiver` solely we would get `&&42.o` to be replaced with, but we want `&&42` to be replaced with ### Output Video: https://user-images.githubusercontent.com/49019259/205420166-efbdef78-5b3a-4aef-ab4b-d892dac056a0.mov Hope everything I wrote makes sense 😅 Also interestingly previous PR's number was `13517` and this PR's number is `13715`, nicee

eddyb mentioned this issue Apr 4, 2014

Replace @T in the AST with a smart pointer. #13316

Merged

huonw changed the title ~~De-@ the AST~~ De-Gc the AST Jun 28, 2014

bors closed this as completed in #13316 Sep 14, 2014

flip1995 pushed a commit to flip1995/rust that referenced this issue Nov 4, 2021

Auto merge of rust-lang#7929 - flip1995:rustup, r=flip1995

e181011

Rustup r? `@ghost` changelog: none

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

De-Gc the AST #7929

De-Gc the AST #7929

huonw commented Jul 20, 2013

Aatch commented Jul 20, 2013

nikomatsakis commented Jul 20, 2013

Aatch commented Jul 20, 2013

dobkeratops commented Sep 10, 2013

emberian commented Nov 8, 2013

nikomatsakis commented Nov 8, 2013

nikomatsakis commented Nov 8, 2013

pnkfelix commented Nov 29, 2013

eddyb commented Dec 4, 2013

nikomatsakis commented Dec 4, 2013

eddyb commented Dec 4, 2013

huonw commented Jun 28, 2014

De-Gc the AST #7929

De-Gc the AST #7929

Comments

huonw commented Jul 20, 2013

Aatch commented Jul 20, 2013

nikomatsakis commented Jul 20, 2013

Aatch commented Jul 20, 2013

dobkeratops commented Sep 10, 2013

emberian commented Nov 8, 2013

nikomatsakis commented Nov 8, 2013

nikomatsakis commented Nov 8, 2013

pnkfelix commented Nov 29, 2013

eddyb commented Dec 4, 2013

nikomatsakis commented Dec 4, 2013

eddyb commented Dec 4, 2013

huonw commented Jun 28, 2014