store strings instead of tokens in nodes #6

iliabylich · 2022-09-23T18:54:47Z

Per #5.

Also, I think the format of the AST is redundant in many places, for example there's no need to store actual value for constant tokens like while, or colon, because they are constant. It makse more sense to simply store their location in source input (just like whitequark/parser does).

Dynamic values should be stored of course, things like IDENTIFIER must be stored by value and by location too (whitequark/parser stores them as separate fields like name and name_l)

iliabylich · 2022-09-23T18:56:00Z

I'll take a look at CI failure a bit later today.

iliabylich · 2022-09-26T18:10:41Z

@kddnewton I understand that you are still busy with your other work, so no rush. PR is ready but it can wait of course.

I'm pretty sure that nodes shouldn't be based on tokens in general. It does make sense in many cases (especially for things like keywords or literals), but there are cases when location should be computed dynamically, for example here:

def foo(bar:)
end

^ the token is LABEL("bar:"), but the name of the argument is just bar and there are more cases like that, I can grab some if you want. Basically the whole AST builder of whitequark/parser (and lib-ruby-parser) is written manually from scratch to solve cases like this.

On the other side I understand that it's much easier to embed tokens, and maybe we could go with this approach at first and then slowly change it to be manual. I suppose right now there are more obvious things that must be solved first (like the absence of the whole parser lol). This is simply a thing that instantly caught my eye, but I don't know if we should focus on it rn.

kddnewton · 2022-09-27T21:45:42Z

So in general I'm keen to reduce the depth of the tree, which this accomplishes. So I'm definitely open to the idea of it. However, on the Ruby side of things, I definitely don't want to lose as much information as this PR would cause it to.

As a simple example, with the Binary node if you're consuming this from the Ruby side, you no longer know where the operator was in source relative to the left and right child nodes. This has caused problems for both syntax_tree and error_highlight. I'd rather we retain that information so it can be used by consumers.

That being said, yeah, we're definitely going to run into limits with the templating at some point. I don't think it's going to be a long term solution for all of our AST building. It gets especially hairy with the location information as things become more and more optional. I'm not sure of the right solution yet, but I definitely want to leave this open for a while so we can keep talking about it.

iliabylich · 2022-09-27T22:06:57Z

Sorry for not being clear. My goal is to keep all information about all locations as well.

This PR is somewhat more about changing rules of how nodes are constructed. Of course in its current form it introduces quality degradation, for example VariableReference no longer knows what is assigned (a local variable or an instance variable).

I've sent this PR to change templates and node constructors and I think it makes sense to replace tokens with strings + locations (similar to how other parsers do it), but not in a single PR 😄 . It's quite big as it is, and I'm not even sure it's possible to easily review it.

kddnewton · 2022-09-28T19:45:25Z

Yeah I definitely agree about tracking strings. We're going to need to do this eventually, I just don't think we can do it just yet.

iliabylich · 2022-09-28T22:43:51Z

Ok, I'm closing it for now. Honestly I think we can iterate much faster with tokens that are not 100% precise in a few case (like those mentioned above).

Later we can re-open it once we have the parser that works 😄

We need to free the current_block_exits in parse_program when we're done with it to prevent memory leaks. This fixes the following memory leak detected when running Ruby using `RUBY_FREE_AT_EXIT=1 ruby -nc -e "break"`: Direct leak of 32 byte(s) in 1 object(s) allocated from: #0 0x5bd3c5bc66c8 in realloc (miniruby+0x616c8) (BuildId: ba6a96e5a060aec6fd9f05ed7e95d9627e1dbd74) #1 0x5bd3c5f91fd9 in pm_node_list_grow prism/templates/src/node.c.erb:35:40 #2 0x5bd3c5f91e9d in pm_node_list_append prism/templates/src/node.c.erb:48:9 #3 0x5bd3c6001fa0 in parse_block_exit prism/prism.c:15788:17 #4 0x5bd3c5fee155 in parse_expression_prefix prism/prism.c:19221:50 #5 0x5bd3c5fe9970 in parse_expression prism/prism.c:22235:23 #6 0x5bd3c5fe0586 in parse_statements prism/prism.c:13976:27 #7 0x5bd3c5fd6792 in parse_program prism/prism.c:22508:40

iliabylich force-pushed the introduce-yp_string_t branch from d26b3d8 to 65da2bb Compare September 26, 2022 17:49

introduce yp_string_t, use it instead of yp_token_t as node field

7960c48

iliabylich force-pushed the introduce-yp_string_t branch from 65da2bb to 7960c48 Compare September 26, 2022 18:01

iliabylich requested a review from kddnewton September 26, 2022 18:04

iliabylich closed this Sep 28, 2022

kddnewton deleted the introduce-yp_string_t branch October 20, 2022 18:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

store strings instead of tokens in nodes #6

store strings instead of tokens in nodes #6

Uh oh!

iliabylich commented Sep 23, 2022

Uh oh!

iliabylich commented Sep 23, 2022

Uh oh!

iliabylich commented Sep 26, 2022

Uh oh!

kddnewton commented Sep 27, 2022

Uh oh!

iliabylich commented Sep 27, 2022

Uh oh!

kddnewton commented Sep 28, 2022

Uh oh!

iliabylich commented Sep 28, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

store strings instead of tokens in nodes #6

store strings instead of tokens in nodes #6

Uh oh!

Conversation

iliabylich commented Sep 23, 2022

Uh oh!

iliabylich commented Sep 23, 2022

Uh oh!

iliabylich commented Sep 26, 2022

Uh oh!

kddnewton commented Sep 27, 2022

Uh oh!

iliabylich commented Sep 27, 2022

Uh oh!

kddnewton commented Sep 28, 2022

Uh oh!

iliabylich commented Sep 28, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants