Silent atomic rule #520

Open
Volker-Weissmann opened this issue Jun 13, 2021 · 16 comments

@Volker-Weissmann

Hello,

The book says that @{...} creates an atomic rule and _{...} creates a silent rule.
But (afaik) there is currently no way to create a rule that is both silent and atomic (neither @_{...} nor _@{...} compiles ).
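
For concreteness, here is a minimal sketch of the situation. The rule names and the WHITESPACE definition are made up for illustration; the two commented-out lines are the spellings reported above as not compiling.

```pest
WHITESPACE = _{ " " }

// Silent: produces no pair of its own, but implicit whitespace
// is still allowed between "a" and "b".
silent_rule = _{ "a" ~ "b" }

// Atomic: no implicit whitespace between "a" and "b",
// but the rule does produce a pair.
atomic_rule = @{ "a" ~ "b" }

// Desired: both properties at once. Neither spelling is accepted.
// silent_atomic = _@{ "a" ~ "b" }
// silent_atomic = @_{ "a" ~ "b" }
```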

@enthal

enthal commented Jun 30, 2021

+1

@process0

+1

@timfayz

timfayz commented Feb 27, 2023

Could someone kindly explain what an atomic rule means? I tried to understand it from https://pest.rs/book/grammars/syntax.html#atomic but failed :) The silent one was easy to grasp, but atomic seems like a totally new thing to me in parsing.

@Volker-Weissmann
Author

> Could someone kindly explain what an atomic rule means? I tried to understand it from https://pest.rs/book/grammars/syntax.html#atomic but failed :) The silent one was easy to grasp, but atomic seems like a totally new thing to me in parsing.

Go to the editor on pest.rs and paste this into the grammar field:

WHITESPACE = _{ " " }
COMMENT = _{ "/*" ~ (!"*/" ~ ANY)* ~ "*/" }
my_nonatomic_rule = { "a" ~ "b" }
my_atomic_rule = @{ "a" ~ "b" }

You will find that "ab" matches both my_nonatomic_rule and my_atomic_rule, but "a b" and "a/*123*/b" match only my_nonatomic_rule. my_nonatomic_rule is equivalent to this:

my_manual = @{ "a" ~ (WHITESPACE | COMMENT)* ~ "b" }

In other words, a non-atomic rule skips over anything that matches WHITESPACE or COMMENT between its parts, while an atomic rule does not.

Hope that helps.

@WHMHammer

I also wish silent atomic to be implemented. In my opinion, silentness and atomicity should be 2 separate properties of the rules. My suggestion is to implement the following behaviors:

| | Normal (without @, $, nor ! modifier) | Atomic (@) | Compound Atomic ($) | Non-Atomic (!) |
|---|---|---|---|---|
| Not Silent (without _) | no modifier, the same as the current Normal | @, the same as the current Atomic | $, the same as the current Compound Atomic | !, the same as the current Non-Atomic |
| Silent (_) | _, the same as the current Silent | _@, the same as the current Atomic, but do not produce pairs nor tokens | _$, the same as the current Compound Atomic, but do not produce pairs nor tokens | _!, the same as the current Non-Atomic, but do not produce pairs nor tokens |

And the corresponding change in grammar.pest is simple. We only need to change the following current rules:

grammar_rule = {
    identifier ~ assignment_operator ~ modifier? ~ opening_brace ~ expression ~ closing_brace
  | line_doc
}

modifier = _{
    silent_modifier
  | atomic_modifier
  | compound_atomic_modifier
  | non_atomic_modifier
}

to

grammar_rule = {
    identifier ~ assignment_operator ~ silent_modifier? ~ modifier? ~ opening_brace ~ expression ~ closing_brace
  | line_doc
}

modifier = _{
    atomic_modifier
  | compound_atomic_modifier
  | non_atomic_modifier
}
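
If that change were adopted, a user grammar could presumably combine the modifiers as sketched below. This is hypothetical syntax that follows the table above; the rule names are illustrative only, and the pest releases discussed in this thread reject these definitions.

```pest
WHITESPACE = _{ " " }

// Hypothetical: silent + atomic. No pair is produced for `arrow`,
// and no implicit whitespace is allowed between "-" and ">".
arrow = _@{ "-" ~ ">" }

// Hypothetical: silent + compound atomic. No pair is produced for
// `quoted`, but the inner rule would still produce its own pair.
quoted = _${ "\"" ~ inner ~ "\"" }
inner = { (!"\"" ~ ANY)* }

// Hypothetical: silent + non-atomic. No pair is produced for `spaced`,
// and implicit whitespace applies even when called from an atomic rule.
spaced = _!{ "a" ~ "b" }
```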

@WHMHammer

WHMHammer commented Jun 8, 2023

I've looked into the code that generates parsers and rules from pest files, and successfully separated the silent modifier from the other modifiers (though _@, _$, and _! all currently behave the same as _). Here is the commit in my fork: WHMHammer@a868acc. The remaining work is to change the generator and vm code so that _@, _$, and _! behave as described in my previous comment.

However, I have difficulty understanding the current code that implements the modifiers. I've posted a question in discussion #864. I would really appreciate it if someone could answer it.

@tomtau
Contributor

tomtau commented Jun 17, 2023

@WHMHammer BTW you can include this commit in your fork: 867da2e

Then it should be possible to use pest as a git dependency (if other people want to use the fork at the moment and don't want to clone it locally):

pest = { git = "https://github.com/WHMHammer/pest.git", rev = "..." }
pest_derive = { git = "https://github.com/WHMHammer/pest.git", rev = "...", features = ["not-bootstrap-in-src"]}

but note that the compilation time will be higher due to the cargo library dependency

@WHMHammer

@tomtau pulled. Thank you!

@tomtau
Contributor

tomtau commented Jul 1, 2023

@WHMHammer FYI for the upstream, you can add this as feature-guarded under "grammar-extras", so that the semver backwards compatibility with 2.X is preserved (see #871 and https://github.com/pest-parser/pest/pull/878/files )

@lthoerner

This is something I am struggling with greatly in my own project. I am also not sure if the reason I am getting unexpected results is because non-silent rules inside of silent rules do produce pairs. Definitely a +1 from me on this issue.

@WHMHammer

> This is something I am struggling with greatly in my own project. I am also not sure if the reason I am getting unexpected results is because non-silent rules inside of silent rules do produce pairs. Definitely a +1 from me on this issue.

@lthoerner non-silent rules inside silent rules producing pairs is an expected behavior. You need to explicitly make them silent as well if you do not want them to produce pairs.
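
A small sketch of that behavior, with rule names invented for illustration:

```pest
// `outer` is silent, so it produces no pair of its own,
// but `inner` is not silent and still produces a pair.
inner = { ASCII_DIGIT+ }
outer = _{ "(" ~ inner ~ ")" }

// To get no pairs at all, the nested rule has to be silent as well.
inner_silent = _{ ASCII_DIGIT+ }
outer_silent = _{ "(" ~ inner_silent ~ ")" }
```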

@lthoerner

> @lthoerner non-silent rules inside silent rules producing pairs is an expected behavior. You need to explicitly make them silent as well if you do not want them to produce pairs.

How do I apply both modifiers to one rule? If I try to make a silent rule, it includes whitespace, which I need it to exclude, and if I make it atomic, it isn't silent.

@lthoerner

lthoerner commented Sep 16, 2023

@WHMHammer Sorry for the ping, I forgot to add it in the original reply.

@pamburus

pamburus commented Oct 8, 2023

Currently, I have found a workaround, but it is not very pleasant. Do not define the special WHITESPACE rule; instead, declare your own ws rule and add it explicitly to every other rule, except atomic rules and those that need to be silent + atomic.
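
A minimal sketch of what this workaround might look like. All rule names here (ws, signed_digits, identifier, number, assignment) are invented for illustration, and it assumes the grammar defines neither WHITESPACE nor COMMENT, so nothing is skipped implicitly.

```pest
// Deliberately no WHITESPACE or COMMENT rule, so `~` and repetitions
// never skip anything implicitly.
ws = _{ (" " | "\t" | NEWLINE)* }

// Silent and effectively atomic: produces no pair, and nothing may
// appear between the sign and the digits.
signed_digits = _{ "-"? ~ ASCII_DIGIT+ }

// Atomic rules stay as they are and need no `ws`.
identifier = @{ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }
number = @{ signed_digits }

// Everywhere else, optional whitespace is spelled out by hand.
assignment = { identifier ~ ws ~ "=" ~ ws ~ number }
```

The cost is that every rule that should tolerate whitespace has to mention ws at each position where it is allowed.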

@NoahTheDuke
Member

It might feel cumbersome but I think you've found the best solution, @pamburus. Relying on WHITESPACE allows for ease of writing and iteration early, but once a grammar gets complex enough, one should be explicit with each choice.

@dochy-ksti

Atomic means the repetition can't be backtracked, so I guess non-atomic expressions have performance penalties. Is that correct?
