Skip to content

Conversation

@iliabylich
Copy link
Contributor

Per #5.

Also, I think the format of the AST is redundant in many places, for example there's no need to store actual value for constant tokens like while, or colon, because they are constant. It makse more sense to simply store their location in source input (just like whitequark/parser does).

Dynamic values should be stored of course, things like IDENTIFIER must be stored by value and by location too (whitequark/parser stores them as separate fields like name and name_l)

@iliabylich
Copy link
Contributor Author

I'll take a look at CI failure a bit later today.

@iliabylich iliabylich force-pushed the introduce-yp_string_t branch from d26b3d8 to 65da2bb Compare September 26, 2022 17:49
@iliabylich
Copy link
Contributor Author

@kddnewton I understand that you are still busy with your other work, so no rush. PR is ready but it can wait of course.

I'm pretty sure that nodes shouldn't be based on tokens in general. It does make sense in many cases (especially for things like keywords or literals), but there are cases when location should be computed dynamically, for example here:

def foo(bar:)
end

^ the token is LABEL("bar:"), but the name of the argument is just bar and there are more cases like that, I can grab some if you want. Basically the whole AST builder of whitequark/parser (and lib-ruby-parser) is written manually from scratch to solve cases like this.

On the other side I understand that it's much easier to embed tokens, and maybe we could go with this approach at first and then slowly change it to be manual. I suppose right now there are more obvious things that must be solved first (like the absence of the whole parser lol). This is simply a thing that instantly caught my eye, but I don't know if we should focus on it rn.

@kddnewton
Copy link
Collaborator

So in general I'm keen to reduce the depth of the tree, which this accomplishes. So I'm definitely open to the idea of it. However, on the Ruby side of things, I definitely don't want to lose as much information as this PR would cause it to.

As a simple example, with the Binary node if you're consuming this from the Ruby side, you no longer know where the operator was in source relative to the left and right child nodes. This has caused problems for both syntax_tree and error_highlight. I'd rather we retain that information so it can be used by consumers.

That being said, yeah, we're definitely going to run into limits with the templating at some point. I don't think it's going to be a long term solution for all of our AST building. It gets especially hairy with the location information as things become more and more optional. I'm not sure of the right solution yet, but I definitely want to leave this open for a while so we can keep talking about it.

@iliabylich
Copy link
Contributor Author

Sorry for not being clear. My goal is to keep all information about all locations as well.

This PR is somewhat more about changing rules of how nodes are constructed. Of course in its current form it introduces quality degradation, for example VariableReference no longer knows what is assigned (a local variable or an instance variable).

I've sent this PR to change templates and node constructors and I think it makes sense to replace tokens with strings + locations (similar to how other parsers do it), but not in a single PR 😄 . It's quite big as it is, and I'm not even sure it's possible to easily review it.

@kddnewton
Copy link
Collaborator

Yeah I definitely agree about tracking strings. We're going to need to do this eventually, I just don't think we can do it just yet.

@iliabylich
Copy link
Contributor Author

Ok, I'm closing it for now. Honestly I think we can iterate much faster with tokens that are not 100% precise in a few case (like those mentioned above).

Later we can re-open it once we have the parser that works 😄

@iliabylich iliabylich closed this Sep 28, 2022
@kddnewton kddnewton deleted the introduce-yp_string_t branch October 20, 2022 18:08
peterzhu2118 added a commit that referenced this pull request Oct 5, 2025
We need to free the current_block_exits in parse_program when we're done
with it to prevent memory leaks. This fixes the following memory leak detected
when running Ruby using `RUBY_FREE_AT_EXIT=1 ruby -nc -e "break"`:

    Direct leak of 32 byte(s) in 1 object(s) allocated from:
        #0 0x5bd3c5bc66c8 in realloc (miniruby+0x616c8) (BuildId: ba6a96e5a060aec6fd9f05ed7e95d9627e1dbd74)
        #1 0x5bd3c5f91fd9 in pm_node_list_grow prism/templates/src/node.c.erb:35:40
        #2 0x5bd3c5f91e9d in pm_node_list_append prism/templates/src/node.c.erb:48:9
        #3 0x5bd3c6001fa0 in parse_block_exit prism/prism.c:15788:17
        #4 0x5bd3c5fee155 in parse_expression_prefix prism/prism.c:19221:50
        #5 0x5bd3c5fe9970 in parse_expression prism/prism.c:22235:23
        #6 0x5bd3c5fe0586 in parse_statements prism/prism.c:13976:27
        #7 0x5bd3c5fd6792 in parse_program prism/prism.c:22508:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants