Generating statically typed syntax tree. #882

Open
4 of 9 tasks
TheVeryDarkness opened this issue Jul 10, 2023 · 16 comments

Comments

@TheVeryDarkness
Contributor

TheVeryDarkness commented Jul 10, 2023

I'm working in my fork on adding code to pest that generates a statically typed syntax tree. I plan to open a pull request once I've fixed all the bugs and, ideally, added some documentation and tests.

There are already several crates that do similar things, such as pest-ast and pest-consume, so I'm afraid you may not accept this contribution if my implementation is poor or you disagree with my design.

Below I describe what I've done so far and what I plan to do next. If you have any suggestions, please tell me.

My principles

  • Users shouldn't have to write the same thing twice, once in the pest grammar file and again in Rust code. (This is also why pest-ast and pest-consume don't fully satisfy me: with them I typically have to write a lot of attribute-annotated code.)
    Otherwise, forgetting to update both at the same time can cause panics or errors.
    In my code, the proc macro TypedParser creates structs for sequences, enums for choices, and many generic structs for strings, peek, etc.
  • Reuse existing pest code without too many changes.
    I added several new traits, several helper structs, and one proc macro. I did, however, modify generate in pest-generator to avoid repetition.
  • Generate essential doc comments.
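
To illustrate the first principle, here is a minimal hand-written sketch of the mapping "sequence becomes a struct, choice becomes an enum". The toy grammar, the type names (`Item`, `Pair`) and the helper functions are all hypothetical and written by hand; they are not output of the TypedParser macro, whose real generated code may look quite different.

```rust
// Hypothetical grammar, shown only as a comment:
//   number = { ASCII_DIGIT+ }
//   word   = { ASCII_ALPHA+ }
//   item   = { number | word }
//   pair   = { item ~ "," ~ item }

/// A choice (`number | word`) maps to an enum with one variant per branch.
#[derive(Debug, PartialEq)]
pub enum Item<'i> {
    Number(&'i str),
    Word(&'i str),
}

/// A sequence (`item ~ "," ~ item`) maps to a struct with one field per item.
#[derive(Debug, PartialEq)]
pub struct Pair<'i> {
    pub first: Item<'i>,
    pub second: Item<'i>,
}

/// Splits off the longest non-empty prefix whose chars all satisfy `pred`.
fn take_while1(input: &str, pred: fn(char) -> bool) -> Option<(&str, &str)> {
    let end = input.find(|c: char| !pred(c)).unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some(input.split_at(end))
    }
}

/// Tries each branch in order, as a PEG ordered choice does.
pub fn parse_item(input: &str) -> Option<(Item<'_>, &str)> {
    if let Some((m, rest)) = take_while1(input, |c: char| c.is_ascii_digit()) {
        return Some((Item::Number(m), rest));
    }
    let (m, rest) = take_while1(input, |c: char| c.is_ascii_alphabetic())?;
    Some((Item::Word(m), rest))
}

/// Parses the whole input as a `pair`; fails if anything is left over.
pub fn parse_pair(input: &str) -> Option<Pair<'_>> {
    let (first, rest) = parse_item(input)?;
    let rest = rest.strip_prefix(',')?;
    let (second, rest) = parse_item(rest)?;
    if rest.is_empty() {
        Some(Pair { first, second })
    } else {
        None
    }
}
```

With types like these, a caller matches on `Item` and reads `Pair` fields directly instead of walking untyped pairs and checking rule names at run time.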

Something to be discussed

  • As far as I know, enum variants in Rust must be named, so I currently name them var_i, where i is the index of the variant among the choices. For clarity, I don't use tuple structs for sequences; instead I name struct fields field_i, where i is the index.
    This may not be the best choice, so please tell me if there is a better design :)
    Maybe node tags could be used?
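
A small sketch of the two naming options being weighed here. The types are hypothetical and hand-written; `TaggedPair` assumes the grammar author supplied node tags (something like `#key = item ~ "," ~ #value = item`) that a generator could reuse as field names.

```rust
/// Index-based naming for a choice: variants are `var_i`.
#[allow(non_camel_case_types)]
#[derive(Debug, PartialEq)]
pub enum Item {
    var_0(u32),    // first branch, e.g. a number
    var_1(String), // second branch, e.g. a word
}

/// Index-based naming for a sequence: fields are `field_i`.
#[derive(Debug, PartialEq)]
pub struct Pair {
    pub field_0: Item,
    pub field_1: Item,
}

/// Alternative: field names taken from (assumed) node tags in the grammar.
#[derive(Debug, PartialEq)]
pub struct TaggedPair {
    pub key: Item,
    pub value: Item,
}

impl From<Pair> for TaggedPair {
    /// Shows that both layouts carry the same data; only the names differ.
    fn from(p: Pair) -> Self {
        TaggedPair { key: p.field_0, value: p.field_1 }
    }
}
```

Index-based names are always available but shift whenever a branch or element is inserted into the grammar; tag-based names are stable but exist only where the grammar author wrote a tag.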

TODO list

pest-typed.

pest3.

(Some of these may be done in pest3 instead of pest-typed, as pest-typed still needs to stay compatible with pest2.)

@tomtau
Contributor

tomtau commented Jul 10, 2023

Would that be possible to add to pest-ast instead for the moment?

@tomtau added this to the v3.0 milestone Jul 10, 2023
@TheVeryDarkness
Contributor Author

> Would that be possible to add to pest-ast instead for the moment?

Yes, but it would take some extra work, and I wouldn't be able to reuse some of the code in pest because it is currently private.
What's more, some of the code belongs in pest itself. For example, I added an implementation of Display for OptimizedExpr so that I can show the structure of the grammar in an easy-to-read format.
And thanks for your reply :)
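
As an illustration of what a `Display` implementation for a grammar expression tree can look like, here is a minimal sketch. The `Expr` enum is a simplified stand-in defined only for this example; it is not pest's actual `OptimizedExpr`, and the output format is an assumption rather than the one used in the fork.

```rust
use std::fmt;

/// Simplified stand-in for a grammar expression tree (NOT pest's `OptimizedExpr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Str(String),
    Ident(String),
    Seq(Box<Expr>, Box<Expr>),
    Choice(Box<Expr>, Box<Expr>),
    Opt(Box<Expr>),
    Rep(Box<Expr>),
}

impl fmt::Display for Expr {
    /// Prints the tree in a grammar-like notation, recursing into children.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // `{:?}` adds the surrounding quotes and escapes the string.
            Expr::Str(s) => write!(f, "{:?}", s),
            Expr::Ident(name) => write!(f, "{}", name),
            Expr::Seq(lhs, rhs) => write!(f, "({} ~ {})", lhs, rhs),
            Expr::Choice(lhs, rhs) => write!(f, "({} | {})", lhs, rhs),
            Expr::Opt(inner) => write!(f, "{}?", inner),
            Expr::Rep(inner) => write!(f, "{}*", inner),
        }
    }
}
```

Adding a trait implementation like this to an existing public type is normally an additive change, which is why the question of whether it counts as breaking comes up later in the thread.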

@tomtau
Contributor

tomtau commented Jul 11, 2023

That's fine (if you don't mind the extra work 🫣) -- one thing is that for the current pest 2.X, the changes should be semver-compatible (I haven't looked into that fork to see if that's the case), while for pest-ast, it's pre-1.0 and breaking changes are expected (as long as they are documented).

We can later look into incorporating pest-ast parts into pest, but it's better to start with pest-ast first and refine it there.

@TheVeryDarkness
Contributor Author

TheVeryDarkness commented Jul 11, 2023

> That's fine (if you don't mind the extra work 🫣) -- one thing is that for the current pest 2.X, the changes should be semver-compatible (I haven't looked into that fork to see if that's the case), while for pest-ast, it's pre-1.0 and breaking changes are expected (as long as they are documented).
>
> We can later look into incorporating pest-ast parts into pest, but it's better to start with pest-ast first and refine it there.

Since I didn't change any public interface that already exists in previous versions (generate in pest-generator is wrapped and private), I don't think this is a breaking change that requires a major version bump. So I don't think we need to move this code into pest-ast, but I can do that once all the bugs are fixed if you think it's needed.
What's more, I did add some public interfaces, so I think a review will be needed once the fork is finished.
Thanks for your reply :)

@tomtau
Contributor

tomtau commented Jul 11, 2023

semver-breaking changes can be sneaky in Rust, e.g. if some implicit autoderives disappear, but we can see how it goes
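
One way such a change can sneak in, assuming "autoderives" here refers to auto traits such as `Send`: adding a private field can silently remove an auto trait from a public type. The sketch below uses two hypothetical types, `TokenV1` and `TokenV2`, standing for the same type before and after a change, and a small helper macro that reports whether a type is `Send`.

```rust
use std::rc::Rc;

/// "Version 1" of a public type: every field is `Send`, so the type is too.
#[allow(dead_code)]
pub struct TokenV1 {
    text: String,
}

/// "Version 2" adds a private `Rc` field. No public signature changed,
/// yet the compiler no longer implements `Send` for the type.
#[allow(dead_code)]
pub struct TokenV2 {
    text: String,
    cache: Rc<String>,
}

/// Evaluates to `true` if the given type is `Send`, `false` otherwise.
/// The inherent constant on `Probe<T>` exists only when `T: Send`;
/// otherwise name resolution falls back to the blanket trait constant.
macro_rules! is_send {
    ($t:ty) => {{
        trait Fallback {
            const IS_SEND: bool = false;
        }
        impl<T: ?Sized> Fallback for T {}
        struct Probe<T: ?Sized>(::std::marker::PhantomData<T>);
        #[allow(dead_code)]
        impl<T: ?Sized + Send> Probe<T> {
            const IS_SEND: bool = true;
        }
        <Probe<$t>>::IS_SEND
    }};
}
```

Here `is_send!(TokenV1)` evaluates to `true` and `is_send!(TokenV2)` to `false`, so downstream code that moved the type across threads would stop compiling after the change even though the public API looks identical.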

@TheVeryDarkness
Contributor Author

Well, I see. Just give me some time.
I think I'm being careful, but we can never be too careful.

@0nyr
Contributor

0nyr commented Jul 15, 2023

Having the option to build an AST automatically from pest out-of-the-box would be a really great and time-saving enhancement. Currently, working with Pest means having to write both the grammar and then again the AST parsing, which is a lot of work.

@tomtau
Contributor

tomtau commented Jul 15, 2023

> Having the option to build an AST automatically from pest out-of-the-box would be a really great and time-saving enhancement. Currently, working with Pest means having to write both the grammar and then again the AST parsing, which is a lot of work.

@0nyr you can check out pest-ast in the meantime https://github.com/pest-parser/ast/blob/master/examples/csv.rs

@TheVeryDarkness
Contributor Author

> Having the option to build an AST automatically from pest out-of-the-box would be a really great and time-saving enhancement. Currently, working with Pest means having to write both the grammar and then again the AST parsing, which is a lot of work.

@0nyr I'm glad that you might like it, but I'm not sure how to name the fields in those generated structs and enums. Do you have any ideas?

@TheVeryDarkness
Contributor Author

@tomtau Can I add an implementation of Display for OptimizedExpr? Will it be a breaking change?

@TheVeryDarkness
Contributor Author

> @tomtau Can I add an implementation of Display for OptimizedExpr? Will it be a breaking change?

I've created a pull request, #889, for that.

@0nyr
Contributor

0nyr commented Jul 16, 2023

> > Having the option to build an AST automatically from pest out-of-the-box would be a really great and time-saving enhancement. Currently, working with Pest means having to write both the grammar and then again the AST parsing, which is a lot of work.
>
> @0nyr I'm glad that you might like it, but I'm not sure how to name the fields in those generated structs and enums. Do you have any ideas?

Hi @TheVeryDarkness. I'm not sure. May I have a look at your code?

In my personal project, I built a tree out of a generic Node struct, like so:

use pest::Span; // assumption: Span here is pest's span type

#[derive(Debug, PartialEq)]
pub struct Node<'a, T> {
    pub sp: Span<'a>, // the node's position, used to map it back to the source text
    pub data: T,      // the node's payload, wrapped in an inner type
}

I have defined a type AST for the root Node:

pub type AST<'a> = Node<'a, TranslationUnit<'a>>;

and my inner node types are like so:

// AST nodes
#[derive(Debug, PartialEq)]
pub struct TranslationUnit<'a> {
    pub functions: Option<Vec<Node<'a, Function<'a>>>>,
    pub main_function: Node<'a, Function<'a>>,
}

#[derive(Debug, PartialEq)]
pub struct Function<'a> {
    pub name: Node<'a, Identifier>,
    pub return_type: TypeSpecifier,
    pub params: Option<Vec<Node<'a, Declaration<'a>>>>,
    pub body: Node<'a, Block<'a>>,
}

Depending on what you want to do, maybe you could reuse the Node idea by having several kinds of nodes for a generic AST, or even use macros to build nodes with names extracted from the PEG files. You could then name those inner fields like so:

#[derive(Debug, PartialEq)]
pub struct SomeNode<'a> {
    pub optional_function_nodes: Option<Vec<Node<'a, Function<'a>>>>,
    pub function_node: Node<'a, Function<'a>>,
}

These are just ideas; it depends on how you plan to tackle generating the AST structs and enums.
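
A rough sketch of the "macros that build nodes with named fields" idea. Everything here is hypothetical: `Span` is a minimal byte-offset stand-in (a pest-based project would more likely use pest's own span type), and `ast_node!` is a declarative macro whose field names are typed in by hand, whereas a real generator would have to extract them from the grammar file, most likely with a proc macro.

```rust
/// Minimal stand-in for a source span, stored as byte offsets.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Generic wrapper pairing a payload with its position in the source.
#[derive(Debug, PartialEq)]
pub struct Node<T> {
    pub sp: Span,
    pub data: T,
}

impl<T> Node<T> {
    /// Returns the slice of `source` covered by this node.
    pub fn text<'s>(&self, source: &'s str) -> &'s str {
        &source[self.sp.start..self.sp.end]
    }
}

/// Hypothetical macro: declares a struct whose fields are each wrapped in
/// `Node`, using field names supplied by the caller.
macro_rules! ast_node {
    ($name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        #[derive(Debug, PartialEq)]
        pub struct $name {
            $(pub $field: Node<$ty>,)*
        }
    };
}

#[derive(Debug, PartialEq)]
pub struct Identifier(pub String);

#[derive(Debug, PartialEq)]
pub struct Block(pub Vec<String>);

// Expands to `pub struct Function { pub name: Node<Identifier>, pub body: Node<Block> }`.
ast_node!(Function { name: Identifier, body: Block });
```

The generated `Function` can then be built and queried like any hand-written struct, for example `f.name.text(src)` to recover the identifier's source text.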

@TheVeryDarkness
Contributor Author

Thanks a lot.

I've used generics for almost all cases, but proc macros are also required to make them work.
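
A minimal sketch of how generics and generated code might divide the work, under the assumption that each string literal in the grammar gets its own marker type. The names `StringLiteral`, `Str`, `Seq2`, `Let` and `Space` are invented for this example and are not taken from the fork; the marker types at the bottom stand for the part a proc macro would have to emit, since stable Rust cannot take a `&str` as a const generic parameter.

```rust
use std::marker::PhantomData;

/// Implemented by zero-sized marker types, one per string literal.
pub trait StringLiteral {
    const CONTENT: &'static str;
}

/// Generic node for a matched string literal; `T` says which literal.
#[derive(Debug, PartialEq)]
pub struct Str<T: StringLiteral> {
    _marker: PhantomData<T>,
}

impl<T: StringLiteral> Str<T> {
    /// Matches `T::CONTENT` at the start of `input`; returns the node and the rest.
    pub fn parse(input: &str) -> Option<(Self, &str)> {
        let rest = input.strip_prefix(T::CONTENT)?;
        Some((Str { _marker: PhantomData }, rest))
    }
}

/// Generic node for a sequence of two sub-nodes.
#[derive(Debug, PartialEq)]
pub struct Seq2<A, B> {
    pub field_0: A,
    pub field_1: B,
}

// What a proc macro would need to generate: one marker type per literal.
#[derive(Debug, PartialEq)]
pub struct Let;
impl StringLiteral for Let {
    const CONTENT: &'static str = "let";
}

#[derive(Debug, PartialEq)]
pub struct Space;
impl StringLiteral for Space {
    const CONTENT: &'static str = " ";
}

/// The typed node for the hypothetical rule `"let" ~ " "`.
pub type LetKeyword = Seq2<Str<Let>, Str<Space>>;

pub fn parse_let_keyword(input: &str) -> Option<(LetKeyword, &str)> {
    let (field_0, rest) = Str::<Let>::parse(input)?;
    let (field_1, rest) = Str::<Space>::parse(rest)?;
    Some((Seq2 { field_0, field_1 }, rest))
}
```

The generic pieces (`Str`, `Seq2`) are written once in a library, while the per-grammar pieces (markers and type aliases) are small enough to generate mechanically.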

By the way, we may need hooks on rules to convert CST to AST.
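
One possible shape for such a hook, sketched under assumptions: a trait that each generated CST node type could implement to describe how it lowers into an AST value. The trait name `ToAst` and all node types below are hypothetical and hand-written for illustration.

```rust
/// Hypothetical hook: a rule's CST node type implements this to describe
/// how it is lowered into an AST value.
pub trait ToAst {
    type Ast;
    fn to_ast(&self) -> Self::Ast;
}

/// CST nodes keep every matched piece, including punctuation.
pub struct NumberCst<'i> {
    pub text: &'i str,
}

pub struct AddCst<'i> {
    pub lhs: NumberCst<'i>,
    pub plus: &'i str,
    pub rhs: NumberCst<'i>,
}

/// The AST keeps only what later stages need.
#[derive(Debug, PartialEq)]
pub enum Expr {
    Number(i64),
    Add(Box<Expr>, Box<Expr>),
}

impl ToAst for NumberCst<'_> {
    type Ast = Expr;
    fn to_ast(&self) -> Expr {
        // Assumes the grammar already guaranteed a valid integer literal.
        Expr::Number(self.text.parse().expect("grammar guarantees an integer"))
    }
}

impl ToAst for AddCst<'_> {
    type Ast = Expr;
    fn to_ast(&self) -> Expr {
        // The `plus` token carries no information for the AST and is dropped.
        Expr::Add(Box::new(self.lhs.to_ast()), Box::new(self.rhs.to_ast()))
    }
}
```

A per-rule trait like this keeps the conversion logic next to the rule it belongs to, at the cost of one implementation per rule that needs custom lowering.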

@TheVeryDarkness
Contributor Author

Hello! I've fixed all known bugs in my fork and then moved my code into a separate repository.
Should I submit my code to pest-ast now, or after I finish most of my work? I hope it can become part of pest in the future.
I'm writing documentation for the interfaces this code provides, and we can discuss them then.

@tomtau
Contributor

tomtau commented Jul 19, 2023

@TheVeryDarkness you can open a PR on pest-ast if you'd like some preliminary feedback

@tomtau
Contributor

tomtau commented Nov 23, 2023
