Compose non-exclusive token with regex w/ diff priorities #397

Open
pinkforest opened this issue Jun 9, 2024 · 1 comment
Labels: duplicate, help wanted

pinkforest commented Jun 9, 2024

I know the regex support is limited, but it would be nice to be able to make regexes exclusive of tokens without having to repeat the token's characters inside the regex.

For example, in the code below I would like to match the token '(' and stop there, without even trying the regex below it.

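use logos::Logos;

#[derive(Debug, Logos, PartialEq)]
pub enum Tokens<'hdr> {
    // Literal token with the higher priority: expected to win at "(".
    #[token("(", priority = 20)]
    CommentStart,

    // Any run of non-whitespace, non-';' characters; this class also matches "(".
    #[regex(r#"[^\s\r\n\t;]+"#, |lex| lex.slice(), priority = 2)]
    MaybeValue(&'hdr str),
}

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn comments() {
        let mut lexer = Tokens::lexer("(comment) value");
        // Fails: the lexer returns MaybeValue("(comment)") instead.
        assert_eq!(lexer.next(), Some(Ok(Tokens::CommentStart)));
    }
}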

This results in the regex with priority = 2 overriding the token with priority = 20:

assertion `left == right` failed
  left: Some(Ok(MaybeValue("(comment)")))
 right: Some(Ok(CommentStart))

This happens regardless of which of the two has the higher priority. For example, swapping them so that CommentStart has 2 instead of 20 and MaybeValue has 20 instead of 2 gives the same result, effectively ignoring the priority:

    #[token("(", priority = 2)]
    CommentStart,

    #[regex(r#"[^\s\r\n\t;]+"#, |lex| lex.slice(), priority = 20)]
    MaybeValue(&'hdr str),

If I add ( to the negated [^..] character class of Tokens::MaybeValue, it works, but it would be nice if priority could be used to compose regular expressions with tokens whose matches may overlap.

Looking at the codegen, both seem to be treated as regexes, but that doesn't explain why the different priorities have no effect.

That said, if this is not supported, perhaps the documentation could describe the limitation; I'm happy to help with the docs or send PRs.

What is curious is that if the priorities are the same, you at least get a warning that both match the same input, but since the priorities here are different, it seems they could be composed.
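A possible explanation, assuming Logos resolves conflicts the way most lexer generators do (the longest match wins, and `priority` only breaks ties between rules that match the same number of bytes): at `(comment) value` the token matches 1 byte while the regex matches all 9 bytes of `(comment)`, so the regex wins whichever priority is higher. Adding ( to the character class helps because the regex then no longer matches at that position at all. The std-only Rust sketch below models that assumed rule; `Rule`, `rule_set`, `next_token` and the matcher functions are hypothetical illustrations, not Logos's API or its generated code.

```rust
// Simplified, std-only model of lexer conflict resolution; NOT Logos's code.
// Assumed rule: the longest match wins, and `priority` only breaks ties
// between rules that match the same number of bytes.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    CommentStart,
    MaybeValue,
}

struct Rule {
    kind: Kind,
    priority: u32,
    // Byte length this rule matches at the start of the input, if any.
    matcher: fn(&str) -> Option<usize>,
}

// Models #[token("(")].
fn match_paren(input: &str) -> Option<usize> {
    input.starts_with('(').then_some(1)
}

// Byte length of the longest prefix of chars that are neither whitespace nor in `excluded`.
fn negated_class_len(input: &str, excluded: &[char]) -> Option<usize> {
    let len: usize = input
        .chars()
        .take_while(|c| !c.is_whitespace() && !excluded.contains(c))
        .map(char::len_utf8)
        .sum();
    (len > 0).then_some(len)
}

// Models [^\s\r\n\t;]+ (this class also matches "(").
fn match_value(input: &str) -> Option<usize> {
    negated_class_len(input, &[';'])
}

// Models the workaround [^()\s\r\n\t;]+ with parentheses excluded.
fn match_value_no_parens(input: &str) -> Option<usize> {
    negated_class_len(input, &[';', '(', ')'])
}

// Hypothetical helper: the two rules from the issue with the given priorities.
fn rule_set(token_prio: u32, value_prio: u32, value: fn(&str) -> Option<usize>) -> [Rule; 2] {
    [
        Rule { kind: Kind::CommentStart, priority: token_prio, matcher: match_paren },
        Rule { kind: Kind::MaybeValue, priority: value_prio, matcher: value },
    ]
}

// Picks the winning rule at the start of `input`. Returns Err when two rules
// tie on both length and priority (the case the issue gets a warning for).
fn next_token(rules: &[Rule], input: &str) -> Result<Option<(Kind, usize)>, String> {
    let mut best: Option<(Kind, usize, u32)> = None; // (kind, length, priority)
    let mut ambiguous = false;
    for rule in rules {
        let Some(len) = (rule.matcher)(input) else { continue };
        match best {
            // Shorter, or same length with lower priority: keep the current best.
            Some((_, b_len, b_prio)) if len < b_len || (len == b_len && rule.priority < b_prio) => {}
            // Same length and same priority: cannot decide.
            Some((_, b_len, b_prio)) if len == b_len && rule.priority == b_prio => ambiguous = true,
            // Longer match, or same length with higher priority: new best.
            _ => {
                best = Some((rule.kind, len, rule.priority));
                ambiguous = false;
            }
        }
    }
    if ambiguous {
        return Err(String::from("equal length and equal priority"));
    }
    Ok(best.map(|(kind, len, _)| (kind, len)))
}

fn main() {
    let input = "(comment) value";
    // Priorities from the issue (token 20, regex 2), then swapped.
    println!("{:?}", next_token(&rule_set(20, 2, match_value), input)); // Ok(Some((MaybeValue, 9)))
    println!("{:?}", next_token(&rule_set(2, 20, match_value), input)); // Ok(Some((MaybeValue, 9)))
    // Workaround: '(' excluded from the class, so only the token matches here.
    println!("{:?}", next_token(&rule_set(20, 2, match_value_no_parens), input)); // Ok(Some((CommentStart, 1)))
    // Both rules match exactly "(" with equal priority.
    println!("{:?}", next_token(&rule_set(2, 2, match_value), "(")); // Err("equal length and equal priority")
}
```

Under this model, priority never gets a say at `(comment) value`, because the two candidates never have the same length there; it only matters for an input like a lone `(`.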

Related:

Sidenote

If I write everything as regexes it also works, but it would be nice to compose tokens with regexes that have different priorities.

For example, this works:

use logos::Logos;

#[derive(Debug, Logos, PartialEq)]
pub enum Tokens<'hdr> {
    #[regex(r#"\([a-z0-9\s]+\)"#, |lex| lex.slice())]
    WholeComment(&'hdr str),

    #[regex(r#"[^()\s\r\n\t;]+"#, |lex| lex.slice())]
    MaybeValue(&'hdr str),

    #[regex(r"[\s\r\n\t]+", |lex| lex.slice())]
    WHS(&'hdr str),
}

#[cfg(test)]
mod test {

    use super::*;

    #[test]
    fn whole_thing() {
        let mut lexer = Tokens::lexer("(comment) value");
        assert_eq!(lexer.next(), Some(Ok(Tokens::WholeComment("(comment)"))));
        assert_eq!(lexer.next(), Some(Ok(Tokens::WHS(" "))));
        assert_eq!(lexer.next(), Some(Ok(Tokens::MaybeValue("value"))));
    }
}

But my preference would be to use tokens where I can and leave regexes where I can't use tokens.
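For comparison, the behaviour asked for at the top of this issue (match the token '(' and stop there, without trying the regex) resembles ordered choice rather than longest match: literal tokens are tried first, and patterns are only consulted when no literal matches. A minimal std-only sketch of that dispatch follows; `Tok`, `next_tok` and `prefix_len` are hypothetical names, not Logos API, and the closures only approximate the regexes from the issue.

```rust
// Hypothetical token type for this sketch only.
#[derive(Debug, PartialEq)]
enum Tok<'a> {
    CommentStart,
    MaybeValue(&'a str),
    Whitespace(&'a str),
}

// Byte length of the longest prefix whose chars all satisfy `pred`.
fn prefix_len(input: &str, pred: impl Fn(char) -> bool) -> usize {
    input.chars().take_while(|&c| pred(c)).map(char::len_utf8).sum()
}

// Returns the next token and the remaining input, or None when nothing matches.
fn next_tok(input: &str) -> Option<(Tok<'_>, &str)> {
    // 1. Literal tokens are tried first; if one matches, stop there.
    if let Some(rest) = input.strip_prefix('(') {
        return Some((Tok::CommentStart, rest));
    }
    // 2. Otherwise fall back to the patterns.
    let ws_len = prefix_len(input, |c| c.is_whitespace());
    if ws_len > 0 {
        let (ws, rest) = input.split_at(ws_len);
        return Some((Tok::Whitespace(ws), rest));
    }
    // Approximates [^\s\r\n\t;]+ ; '(' is not excluded, yet it cannot win at a '('.
    let v_len = prefix_len(input, |c| !c.is_whitespace() && c != ';');
    if v_len > 0 {
        let (value, rest) = input.split_at(v_len);
        return Some((Tok::MaybeValue(value), rest));
    }
    None // end of input, or a ';' that no rule in this sketch covers
}

fn main() {
    let mut rest = "(comment) value";
    while let Some((tok, next)) = next_tok(rest) {
        println!("{:?}", tok);
        rest = next;
    }
    // Prints: CommentStart, MaybeValue("comment)"), Whitespace(" "), MaybeValue("value")
}
```

With ordered choice the ')' ends up inside the following MaybeValue, so a closing-parenthesis token would need to be listed as a literal as well.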

I could always split this into a different lexer, but having to construct and morph into a different lexer is time-consuming.

pinkforest changed the title from "regex does not honor priority" to "Compose non-exclusive token with regex w/ diff priorities" on Jun 9, 2024
jeertmans added the duplicate and help wanted labels on Jun 10, 2024
jeertmans (Collaborator) commented

Hello, thanks for creating this issue!

I think this is related to all the bugs with priorities; see also #265 and other related issues.

Sadly, I currently don't have time to invest in this problem, but I hope someone smarter than me (and with more free time) can address this in the near future! That would greatly help the project!
