Parsing f32 in a single rule #218

Closed
khionu opened this issue Feb 12, 2020 · 9 comments

Comments

khionu commented Feb 12, 2020

I'm trying to describe this regex \d+(\.\d+)? within a single rule, so I can support floats. I can't figure out how to express this without creating an entirely separate rule.

The primary issue seems to be the lack of expression groups, which would enable this: integer() ("." integer())?

Without expression groups, I thought maybe I could use delimited repetition restricted to a single delimiter (integer() ** "." but limited to two elements).

If there's an existing solution, that would be ideal.

khionu (Author) commented Feb 12, 2020

Just found #97.

khionu closed this as completed Feb 12, 2020

khionu (Author) commented Feb 12, 2020

Actually, can I leave this open till it's documented?

khionu reopened this Feb 12, 2020

kevinmehall (Owner) commented Feb 12, 2020

I'm not sure if this is what you mean by "expression groups", but the $() operator captures the source &str of the expression matched inside it. So you want something like:

rule float() -> f32
    = quiet!{ text:$("-"? ['0'..='9']+ ("." ['0'..='9']*)?) { text.parse().unwrap() } }
    / expected!("float")

khionu (Author) commented Feb 15, 2020

I meant à la regex, where you can apply modifiers to a group. This is useful for repeating more complicated patterns: ((alpha\!?|beta\!?|gamma\!?)\s?)+ could match alpha alpha! beta! alpha gamma. $() works in this case because the whole match is bound to a variable, but when we don't need to bind it, capturing is a bit wasteful.

kevinmehall (Owner) commented

That basically works the same in PEG: (("alpha" "!"? / "beta" "!"? / "gamma" "!"?) space()?)+.

When converting regex to PEG, the main difference is that / is ordered choice and only backtracks if its left argument fails, not if a failure occurs later. For example, regex (a|abc)d matches "abcd", but PEG ("a" / "abc") "d" would match the first "a" and then fail to match "d" without trying the "abc" literal.
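To make the ordered-choice behaviour concrete, here is a minimal sketch in plain Rust. It does not use the peg crate: `lit`, `choice`, and `choice_then_d` are hypothetical helpers written only for this illustration, and they model a choice between string literals that commits to the first alternative that matches.

```rust
/// Match the literal `s` at the start of `input`; return the remaining input.
fn lit<'a>(input: &'a str, s: &str) -> Option<&'a str> {
    input.strip_prefix(s)
}

/// PEG-style ordered choice over literals: the first alternative that
/// matches wins, and later alternatives are never revisited.
fn choice<'a>(input: &'a str, alts: &[&str]) -> Option<&'a str> {
    alts.iter().find_map(|alt| lit(input, alt))
}

/// Models `(alt1 / alt2 / ...) "d"` followed by end of input.
fn choice_then_d(input: &str, alts: &[&str]) -> bool {
    choice(input, alts)
        .and_then(|rest| lit(rest, "d"))
        .map_or(false, |rest| rest.is_empty())
}

fn main() {
    // ("a" / "abc") "d": "a" matches first, then "d" fails against "bcd".
    println!("{}", choice_then_d("abcd", &["a", "abc"])); // false
    // ("abc" / "a") "d": putting the longer literal first lets it match.
    println!("{}", choice_then_d("abcd", &["abc", "a"])); // true
    // The shorter alternative still handles "ad".
    println!("{}", choice_then_d("ad", &["a", "abc"])); // true
}
```

The practical consequence when porting a regex is that alternatives sharing a prefix usually need to be listed longest first.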

kevinmehall (Owner) commented

Oh, I noticed that the documentation doesn't explicitly call out that parentheses work for grouping / overriding precedence. Fixed in d06b88c.

khionu (Author) commented Feb 15, 2020

Great, thank you!

BenoitRanque commented

For anyone coming here looking for the answer, I believe it would look like this. The bit that escaped me was the meaning of "action blocks", and the fact that I could use them inside parenthesized expressions. Please ignore the terrible code that actually parses into a float.

peg::parser! {
    grammar float_parser() for str {
        rule integer() -> u32
            = i:$(['0'..='9']+) {? i.parse().or(Err("u32")) }
        rule float() -> f64
            = i:integer() d:("." d:integer() { d })? {?
                if let Some(d) = d {
                    // Caveat: d is a u32, so leading zeros in the fraction are lost ("1.05" is rebuilt as "1.5").
                    format!("{i}.{d}").parse().or(Err("f64"))
                } else {
                    Ok(i as f64)
                }
            }
    }
}

Note how d is captured inside the parenthesized group and must be returned from that group's action block so it can be bound again outside the group, where we finally use it. This wasn't self-evident to me.

kevinmehall (Owner) commented

@BenoitRanque That technique to choose which value to return from a parenthesized sequence is useful in other cases, but it's unnecessary here, as is putting the string back together with format!() -- you can use $ to capture a single slice containing both parts: v:$(integer() ("." integer())?) {? v.parse().or(Err("f64")) }

I also wouldn't use an integer rule that parses and returns u32 in this since it would perform unnecessary integer parsing, and would reject numbers that are out of range for u32, but valid f64. You could have a "lexer-style" rule number() = ['0'..='9']+ without a return type and use it from float and integer, if you want, though.
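As a rough illustration of why capturing one slice and parsing it once is preferable, here is a sketch in plain Rust with no peg dependency. The helpers `digits`, `float_slice`, and `parse_float` are hypothetical names for this example: `digits` plays the role of the lexer-style `number()` rule, and `float_slice` is a hand-written stand-in for `$(number() ("." number())?)`.

```rust
/// Byte length of the leading run of ASCII digits in `input`.
fn digits(input: &str) -> usize {
    input.bytes().take_while(|b| b.is_ascii_digit()).count()
}

/// Hand-written stand-in for `$(number() ("." number())?)`:
/// returns the matched prefix as one slice, without parsing its parts.
fn float_slice(input: &str) -> Option<&str> {
    let int_len = digits(input);
    if int_len == 0 {
        return None;
    }
    let mut end = int_len;
    // Optional group: a "." followed by at least one digit.
    if let Some(rest) = input[int_len..].strip_prefix('.') {
        let frac_len = digits(rest);
        if frac_len > 0 {
            end += 1 + frac_len;
        }
    }
    Some(&input[..end])
}

/// Parse the whole input as a float by parsing the captured slice once.
fn parse_float(input: &str) -> Option<f64> {
    let slice = float_slice(input)?;
    if slice.len() != input.len() {
        return None; // trailing input that the pattern did not match
    }
    slice.parse().ok()
}

fn main() {
    // Out of range for u32, but a perfectly valid f64.
    println!("{}", "4294967296".parse::<u32>().is_err()); // true
    println!("{:?}", parse_float("4294967296"));
    // Digits are never converted to integers, so "05" keeps its leading zero.
    println!("{:?}", parse_float("1.05"));
}
```

Because the digits are only scanned and never converted to integers along the way, the range of `u32` does not constrain the input and the fractional part reaches the float parser unchanged.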
