Invalid token doesn't consume input from match_ in non-initial rule #48
Thanks for reporting this. The fix is easy, but I'm afraid it won't work the way you might expect. Basically the rule is that when the lexer fails, it also consumes the character. Here's an example to demonstrate:

```rust
use lexgen::lexer;

lexer! {
    Lexer -> &'input str;

    rule Init {
        "aaa" = "a",
        "b" = "b",
    }
}

fn main() {
    let input = "aab";
    let mut lexer = Lexer::new(input);
    println!("{:?}", lexer.next());
    println!("{:?}", lexer.next());
}
```

Output:
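The "a failed match also consumes input" behavior described above can be illustrated with a hand-rolled sketch. This is not lexgen's implementation, and the names (`Tok`, `next_token`) are hypothetical; it just shows one possible policy: on failure, skip a single character, report it as invalid, and keep lexing.

```rust
// Toy lexer over the literal patterns "aaa" and "b".
// Sketch only -- not lexgen's actual matching algorithm.

#[derive(Debug, PartialEq)]
enum Tok<'a> {
    Ok(&'a str),
    Invalid(char),
}

fn next_token<'a>(input: &'a str, pos: &mut usize) -> Option<Tok<'a>> {
    let rest = &input[*pos..];
    if rest.is_empty() {
        return None;
    }
    // Try each pattern in order at the current position.
    for pat in ["aaa", "b"] {
        if rest.starts_with(pat) {
            *pos += pat.len();
            return Some(Tok::Ok(pat));
        }
    }
    // No pattern matched: consume the offending character instead of
    // leaving the cursor in place, then report it as invalid.
    let c = rest.chars().next().unwrap();
    *pos += c.len_utf8();
    Some(Tok::Invalid(c))
}

fn main() {
    let mut pos = 0;
    while let Some(tok) = next_token("aab", &mut pos) {
        println!("{:?}", tok);
    }
}
```

Because the invalid character is consumed, the loop always makes progress and never gets stuck on bad input.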
Since the lexer realizes that the input is invalid when it sees the `b`, that character is consumed as well.

With the bug that caused the inconsistent behavior in your example fixed, this is the output for your program:

```rust
lexer! {
    Lexer -> &'input str;

    rule Init {
        's' => |lexer| lexer.switch_and_return(LexerRule::InString, "s"),
    }

    rule InString {
        "a" => |lexer| lexer.return_(lexer.match_()),
    }
}

fn main() {
    let input = "sxa";
    let mut lexer = Lexer::new(input);
    println!("{:?}", lexer.next());
    println!("{:?}", lexer.next());
    println!("{:?}", lexer.next());
    println!("{:?}", lexer.next());
}
```

Output:
We could make the lexers not consume the character when they return an error, but I'm not sure it would be any more useful than the current behavior. In general, recovery from a lexing error will depend on the current lexer state (i.e. the current rule) and the lexer's user state. I'm not sure what would be a good interface to provide so that users will be able to recover from lexing errors.

In my use case, I'm trying not to fail in the lexer, but rather to mark returned tokens/lexemes with error information. For example, if I'm lexing character literals, an unterminated literal like
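The "mark tokens with error information instead of failing" idea above can be sketched in plain Rust. This is not lexgen API; the names (`Token`, `lex_char_lit`, `terminated`) are hypothetical:

```rust
// Sketch: a token that carries error information, so an unterminated
// character literal is returned as a token rather than a lexer error.

#[derive(Debug, PartialEq)]
enum Token<'a> {
    CharLit { text: &'a str, terminated: bool },
}

// Lex a character literal starting at the opening quote. If no closing
// quote is found, return the rest of the input flagged as unterminated.
fn lex_char_lit(input: &str) -> Token<'_> {
    debug_assert!(input.starts_with('\''));
    match input[1..].find('\'') {
        // +2: skip the opening quote's byte and include the closing quote.
        Some(i) => Token::CharLit { text: &input[..i + 2], terminated: true },
        None => Token::CharLit { text: input, terminated: false },
    }
}

fn main() {
    println!("{:?}", lex_char_lit("'a'"));
    println!("{:?}", lex_char_lit("'a"));
}
```

A parser consuming these tokens can then decide how (or whether) to report the error, which keeps recovery policy out of the lexer.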
Thank you for the swift response and fix!
Sorry that I didn't include this in the first place, but that'd work fine for me! Actually I'm not trying to recover from lexer errors (at least for now); my problem was that semantic actions were being invoked with leftover characters in `match_`:

```rust
"\\u" $hex_digit $hex_digit $hex_digit $hex_digit =? |lexer| {
    // Oops, unwrap causes a panic if there were characters left over from InvalidToken!
    let value = u32::from_str_radix(&lexer.match_()[2..], 16).unwrap();

    // This works. (But what to do if the rule didn't have a fixed length like this example?)
    let value = u32::from_str_radix(&lexer.match_()[lexer.match_().len() - 4..], 16).unwrap();

    // do something with value and return
    todo!()
}
```

As for error recovery, I have a question: I somehow assumed that the rule with the longest match is preferred, and that ties are broken by the order of appearance in the rule:

```rust
rule Init {
    's' => |lexer| lexer.switch_and_return(LexerRule::InString, lexer.match_()),
}

rule InString {
    'a' => |lexer| lexer.return_(lexer.match_()),
    _ =? |lexer| lexer.return_(Err(lexer.match_())),
}
```

(or enter to new
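The matching semantics assumed above (longest match wins, ties broken by pattern order) can be sketched for plain string patterns. This is a hand-rolled illustration, not lexgen's implementation; `pick` is a hypothetical name:

```rust
// Pick the pattern to apply at the start of `input`: prefer the
// longest match; on equal length, the pattern listed first wins.

fn pick<'a>(input: &str, patterns: &[&'a str]) -> Option<&'a str> {
    let mut best: Option<&str> = None;
    for pat in patterns {
        if input.starts_with(pat) {
            // Only a strictly longer match replaces the current best,
            // so an earlier pattern of equal length is kept.
            if best.map_or(true, |b| pat.len() > b.len()) {
                best = Some(pat);
            }
        }
    }
    best
}

fn main() {
    // Longest match wins regardless of order.
    println!("{:?}", pick("abc", &["a", "ab"]));
    // No pattern matches at the start.
    println!("{:?}", pick("abc", &["b"]));
}
```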
Have you seen the
Correct.
I haven't experimented with this idea myself, but it looks like a good way to handle errors.
Yes, I knew the function existed, but in my case the leftover was coming from
Maybe we should reset the match on
Sorry for the confusion. This is already what we do in #49.
Yeah, that would be ideal for me!
Thank you for the swift fix!!
This works as I expected, though if I move `"a"` to a different rule and let `"s"` enter that rule, it starts behaving differently, while I expected it to return exactly the same as the one above. It'd be great if `InvalidToken` always consumed the invalid part, and if the current `LexerError` behavior is expected, to have it documented in README.md (and perhaps return a different error kind than `InvalidToken`?).