[BUG] Pest.rs hangs indefinitely with this grammar #571

Closed
matthew-dean opened this issue Dec 12, 2021 · 6 comments

matthew-dean commented Dec 12, 2021

// Less extends https://www.w3.org/TR/css-syntax-3/

ws = { " " | "\t" | NEWLINE }
WHITESPACE = _{ ws* }
COMMENT = _{ multi_comment | line_comment }

multi_comment = @{ "/*" ~ (!"*/" ~ ANY)* ~ "*/" }
line_comment = @{ "//" ~ (!"\n" ~ ANY)* }

escape = @{ "\\" ~ escape_text }
escape_text = { !("\n" | ASCII_HEX_DIGIT) ~ ANY | ASCII_HEX_DIGIT{1,6} ~ ws? }

nmstart = @{ "-"? ~ (ASCII_ALPHA | "_" | NON_ASCII | escape) }
nmchar = @{ ASCII_DIGIT | ASCII_ALPHA | "_" | "-" | escape }
ident_token = @{
    ( "--" | nmstart ) ~ nmchar*
}

function_token = @{ ident_token ~ "(" }
at_keyword_token = @{ "@" ~ ident_token }
hash_token = @{ "#" ~ (nmchar | escape)+ }

string1 = @{
    "\"" ~ (
        !("\"" | "\\" | "\n") ~ ANY
        | escape
        | "\\" ~ NEWLINE 
    )* ~ "\""
}
string2 = @{
    "'" ~ (
        !("'" | "\\" | "\n") ~ ANY
        | escape
        | "\\" ~ NEWLINE 
    )* ~ "'"
}
string_token = { string1 | string2 }

url_token = @{
    "url("
    ~ ws*
    ~ (
        !("\"" | "'" | "(" | ")" | "\\" | ws | NON_PRINTABLE) ~ ANY
        | escape
        | string_token
    )*
    ~ ws*
    ~ ")"
}

NON_PRINTABLE = {
	'\u{0000}'..'\u{0008}'
	| "\u{000B}"
    | '\u{000E}'..'\u{001F}'
    | "\u{007F}"
}

number_token = @{
    ("+" | "-")?
    ~ (
        ASCII_DIGIT+ ~ ("." ~ ASCII_DIGIT+)?
        | "." ~ ASCII_DIGIT+
    )
    ~ (
        ^"e"
        ~ ("+" | "-")?
        ~ ASCII_DIGIT+
    )?
}

// or percentage_token
dimension_token = @{ number_token ~ (ident_token | "%") }
CDO_token = { "<!--" }
CDC_token = { "-->" }

NON_ASCII = { '\u{0080}'..'\u{10FFFF}' }

at_rule = {
    at_keyword_token
}

qualified_rule = {
	ident_token ~ "{" ~ "}"
}

rule_list = {
    (
        qualified_rule
        | at_rule
    )*
}

root = {
    (
        CDO_token
        | CDC_token
        | qualified_rule
        | at_rule
    )*
}

Having an additional match after ident_token causes qualified_rule to hang when typing, along with any rule that includes qualified_rule, such as root. Selecting ident_token on its own from the drop-down on Pest.rs does not hang when typing, though. I haven't tried the Rust integration yet, as I'm still crafting the grammar. Is there any reason to expect the Rust part to work when Pest.rs fails / hangs like this?

Note: I tried to reduce this to just qualified_rule, and only the rules referenced. But, when I did that, the grammar actually succeeded and didn't freeze the site. So, somehow there's an invisible interaction with other rules that are not referenced? 🤔

@ancientstraits

@matthew-dean I think it is because of this:

WHITESPACE = _{ ws* }

In Pest, the WHITESPACE rule is supposed to match just a single whitespace character, not a sequence of them.
For example:

WHITESPACE = _{ " " | "\n" | "\t" }

I think removing the * from your WHITESPACE rule would stop the hanging, especially since I tried adding a sequence with * to the WHITESPACE rule in the Pest playground, and the page froze.

@matthew-dean
Author

@ancientstraits Oh, it auto-consumes multiples of that token between other tokens? 🤔

@CAD97
Contributor

CAD97 commented May 5, 2022

Yes. When you define WHITESPACE, ~ effectively becomes ~ WHITESPACE* ~, so you're getting (ws*)*, which just repeats the empty string forever. WHITESPACE always needs to consume at least one character.
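
To make the (ws*)* problem concrete, here is a minimal toy model in plain Rust. It is not pest's actual implementation: the matcher type, the function names, and the `max_iters` cap are all hypothetical and exist only for illustration, and `ws` is approximated as a single byte rather than pest's NEWLINE. It sketches why a repetition over something that can succeed on the empty string never stops, and how a progress check would end the loop.

```rust
/// Toy matcher: returns how many bytes it consumed at `pos`, or None on failure.
/// (Hypothetical type for this sketch; pest's internals look different.)
type Matcher = fn(&[u8], usize) -> Option<usize>;

/// Models `ws`: exactly one whitespace byte (a rough stand-in for the grammar's rule).
fn ws(input: &[u8], pos: usize) -> Option<usize> {
    match input.get(pos) {
        Some(b' ') | Some(b'\t') | Some(b'\n') | Some(b'\r') => Some(1),
        _ => None,
    }
}

/// Models `ws*`: zero or more whitespace bytes. It never fails; it may consume nothing.
fn ws_star(input: &[u8], pos: usize) -> Option<usize> {
    let mut len = 0;
    while ws(input, pos + len).is_some() {
        len += 1;
    }
    Some(len)
}

/// Models `inner*` naively: loop until `inner` fails.
/// `max_iters` is an artificial cap so the demo terminates.
/// Returns (end position, whether the cap was hit).
fn repeat_naive(inner: Matcher, input: &[u8], mut pos: usize, max_iters: usize) -> (usize, bool) {
    for _ in 0..max_iters {
        match inner(input, pos) {
            Some(n) => pos += n,
            None => return (pos, false),
        }
    }
    (pos, true)
}

/// Same loop, but it also stops as soon as an iteration consumes nothing.
fn repeat_guarded(inner: Matcher, input: &[u8], mut pos: usize) -> usize {
    while let Some(n) = inner(input, pos) {
        if n == 0 {
            break;
        }
        pos += n;
    }
    pos
}

fn main() {
    let input = b"  a";

    // WHITESPACE = ws: the implicit repetition ends when `ws` fails at 'a'.
    let (end, capped) = repeat_naive(ws, input, 0, 1_000);
    println!("ws:      end={end}, hit cap={capped}");

    // WHITESPACE = ws*: the inner rule never fails, so only the cap stops the loop.
    let (end, capped) = repeat_naive(ws_star, input, 0, 1_000);
    println!("ws*:     end={end}, hit cap={capped}");

    // With a progress check, the same rule terminates normally.
    println!("guarded: end={}", repeat_guarded(ws_star, input, 0));
}
```

In this model, both variants end at position 2, but the `ws*` variant only stops because of the artificial cap: after consuming the two spaces, each further iteration succeeds while consuming nothing.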

IIRC we have a safeguard against this for normal rules, but apparently WHITESPACE isn't handled.

@ancientstraits

ancientstraits commented May 5, 2022

@CAD97 Guess we need to add that safeguard then. I'll try to find where it would need to go.

@ancientstraits

ancientstraits commented May 5, 2022

It's at this line.

"expression inside repetition cannot fail and will repeat \
Wonder why it never triggers for WHITESPACE.
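
One possible explanation, sketched below as a toy model: a check that only looks for repetitions written out in the grammar would not see the repetition that is added implicitly around WHITESPACE. This is an assumption for illustration, not a reading of pest's validator. The `Expr` enum and the functions `cannot_fail` and `has_unbounded_repetition` are hypothetical names that do not come from pest.

```rust
/// Hypothetical, heavily simplified grammar expression (not pest's real AST).
enum Expr {
    Str(&'static str),            // "abc"
    Seq(Box<Expr>, Box<Expr>),    // a ~ b
    Choice(Box<Expr>, Box<Expr>), // a | b
    Opt(Box<Expr>),               // a?
    Rep(Box<Expr>),               // a*
}

/// True when the expression always succeeds, possibly without consuming input.
fn cannot_fail(e: &Expr) -> bool {
    match e {
        Expr::Str(s) => s.is_empty(),
        Expr::Seq(a, b) => cannot_fail(a) && cannot_fail(b),
        Expr::Choice(a, b) => cannot_fail(a) || cannot_fail(b),
        Expr::Opt(_) | Expr::Rep(_) => true,
    }
}

/// True when some explicitly written repetition wraps an expression that cannot fail.
fn has_unbounded_repetition(e: &Expr) -> bool {
    match e {
        Expr::Str(_) => false,
        Expr::Seq(a, b) | Expr::Choice(a, b) => {
            has_unbounded_repetition(a) || has_unbounded_repetition(b)
        }
        Expr::Opt(a) => has_unbounded_repetition(a),
        Expr::Rep(a) => cannot_fail(a) || has_unbounded_repetition(a),
    }
}

fn main() {
    // Simplified stand-in for `ws`: " " | "\t"
    let ws = || Expr::Choice(Box::new(Expr::Str(" ")), Box::new(Expr::Str("\t")));

    // Written out explicitly, (ws*)* is flagged.
    let explicit = Expr::Rep(Box::new(Expr::Rep(Box::new(ws()))));
    println!("explicit (ws*)* flagged: {}", has_unbounded_repetition(&explicit));

    // The body of WHITESPACE = _{ ws* } contains no bad repetition on its own...
    let whitespace_body = Expr::Rep(Box::new(ws()));
    println!("WHITESPACE body flagged: {}", has_unbounded_repetition(&whitespace_body));

    // ...but it cannot fail, so an implicit repetition around it would never end.
    println!("WHITESPACE body cannot fail: {}", cannot_fail(&whitespace_body));
}
```

Under this model, a safeguard for WHITESPACE would amount to also running the `cannot_fail` test on the rule body itself, since the surrounding repetition never appears in the grammar text.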

@Tartasprint
Contributor

This specific example of a grammar causing the parser to hang indefinitely was fixed in #848, although there are still cases not covered.

tomtau closed this as completed Apr 28, 2023