Unicode support #71

Closed
Yamakaky opened this issue Oct 24, 2016 · 3 comments

@Yamakaky

Yamakaky commented Oct 24, 2016

Does your crate support Unicode character classes and normalization?

@Marwes
Owner

Marwes commented Oct 25, 2016

No, you will have to look to another crate for that. You can then use `satisfy(|token| predicate(token)).expected("Expected value")` to create a custom parser for whatever predicate you want (`digit`, `letter`, etc. are all implemented this way).
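The shape of such a predicate parser can be sketched without combine. The `satisfy` function below is a hypothetical stand-in written for illustration, not combine's API: it consumes one `char` if the predicate accepts it and otherwise reports what was expected. Rust's own `char::is_alphabetic` and `char::is_numeric` follow Unicode properties, so many character classes need no extra crate.

```rust
/// Hypothetical stand-in for a `satisfy`-style combinator (not combine's API):
/// consumes one `char` if `pred` accepts it, otherwise reports `expected`.
fn satisfy<'a, F>(input: &'a str, pred: F, expected: &'static str) -> Result<(char, &'a str), String>
where
    F: Fn(char) -> bool,
{
    let mut chars = input.chars();
    match chars.next() {
        // Accepted: return the token and the rest of the input.
        Some(c) if pred(c) => Ok((c, chars.as_str())),
        Some(c) => Err(format!("Unexpected {:?}, expected {}", c, expected)),
        None => Err(format!("Unexpected end of input, expected {}", expected)),
    }
}

fn main() {
    // `char::is_alphabetic` uses the Unicode `Alphabetic` property, so U+00DF 'ß' is a letter.
    assert_eq!(satisfy("\u{df}x", char::is_alphabetic, "letter"), Ok(('\u{df}', "x")));
    assert!(satisfy("1x", char::is_alphabetic, "letter").is_err());
    // `char::is_numeric` accepts Unicode numerics such as U+0663 ARABIC-INDIC DIGIT THREE.
    assert_eq!(satisfy("\u{663}", char::is_numeric, "digit"), Ok(('\u{663}', "")));
}
```

In combine itself the same predicates would be passed to `satisfy`, as described above.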

@jan-hudec

jan-hudec commented Feb 15, 2017

For character classes, yes. For normalisation, no: one would have to write a new parser using the unicode-normalization crate, because a token here is a single code point (`char`), but normalisation needs to see sequences of several code points.
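A small std-only illustration of why single-`char` tokens are not enough: the same visible "é" can be one precomposed code point or a base letter plus a combining accent. The helper `code_points` is hypothetical and only exists to show what a char-at-a-time parser would receive.

```rust
/// Hypothetical helper: the sequence of tokens a char-at-a-time parser would see.
fn code_points(s: &str) -> Vec<char> {
    s.chars().collect()
}

fn main() {
    // Precomposed: U+00E9 LATIN SMALL LETTER E WITH ACUTE (one code point).
    let composed = "\u{e9}";
    // Decomposed: 'e' followed by U+0301 COMBINING ACUTE ACCENT (two code points).
    let decomposed = "e\u{301}";

    // Canonically equivalent, but different strings and different token streams.
    assert_ne!(composed, decomposed);
    assert_eq!(code_points(composed), vec!['\u{e9}']);
    assert_eq!(code_points(decomposed), vec!['e', '\u{301}']);

    // A per-char predicate for 'é' matches only the precomposed form.
    let is_e_acute = |c: char| c == '\u{e9}';
    assert!(composed.chars().all(is_e_acute));
    assert!(!decomposed.chars().any(is_e_acute));
}
```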

In fact, I tried (and failed, see #85) to do a similar thing with graphemes and unicode-segmentation yesterday.

It also depends on how generic the parser needs to be. unicode-normalization works on `&str`, so if your input is a `&str` you can use it directly; with some other kind of stream you have to collect the input first and then try normalising.
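A minimal sketch of the "collect, then normalise" approach for a generic stream of `char`s. Both functions are hypothetical. `normalize` is a pluggable `&str -> String` step; with the unicode-normalization crate it could presumably be `|s: &str| s.nfc().collect()`. `toy_nfc` is a deliberately tiny stand-in so the example runs on std alone, and it is not a real NFC implementation.

```rust
/// Hypothetical: buffer a generic char stream into a `String` so that a
/// `&str`-based normaliser can see multi-code-point sequences.
fn collect_and_normalize<I, F>(stream: I, normalize: F) -> String
where
    I: IntoIterator<Item = char>,
    F: Fn(&str) -> String,
{
    let buffered: String = stream.into_iter().collect();
    normalize(&buffered)
}

/// Toy stand-in that only composes 'e' + U+0301 into U+00E9.
/// NOT a real NFC implementation.
fn toy_nfc(s: &str) -> String {
    s.replace("e\u{301}", "\u{e9}")
}

fn main() {
    // Decomposed "café" arriving one char at a time from some non-&str stream.
    let stream = vec!['c', 'a', 'f', 'e', '\u{301}'];
    assert_eq!(collect_and_normalize(stream, toy_nfc), "caf\u{e9}");
}
```

The trade-off is that the whole input (or at least a chunk of it) must be buffered before normalisation, which is why a `&str` input is the convenient case.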

@Marwes
Owner

Marwes commented Aug 7, 2017

To get regex parsers working, I added `FullRangeStream`, which lets one retrieve a view into the parser's entire remaining input.
Since the Unicode crates require `&str`, this can be used to write Unicode-aware parsers that are generic over the input stream. Here is a translation of the example in #85 (comment):

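```rust
// `graphemes` comes from the unicode-segmentation crate's UnicodeSegmentation trait.
// The combine items used below (FullRangeStream, ParseResult, ParseError, take and
// the Parser trait) also need importing; their paths depend on the combine version.
use unicode_segmentation::UnicodeSegmentation;

pub fn grapheme<'a, I>(input: I) -> ParseResult<&'a str, I>
    where I: FullRangeStream<Range = &'a str>
{
    // Iterate extended grapheme clusters over a view of the remaining input.
    let mut iter = input.range().graphemes(true);
    match iter.next() {
        Some(_) => {
            // Bytes in the first grapheme = total length minus what the iterator has left.
            let len = input.range().len() - iter.as_str().len();
            take(len).parse_stream(input)
        }
        None => Err(ParseError::end_of_input()),
    }
}
```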
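The key trick in `grapheme` is measuring how many bytes the first item occupied by comparing the input length with the iterator's remaining `as_str()`. The std-only sketch below applies the same arithmetic using `str::Chars::as_str`; `split_first_unit` is a hypothetical name. Swapping `chars()` for unicode-segmentation's `graphemes(true)` should keep "e" + U+0301 together as one unit, whereas `chars()` splits them, as the second case shows.

```rust
/// Hypothetical helper: split off the first `char` of `input` using the same
/// "total length minus remaining length" arithmetic as the `grapheme` parser.
fn split_first_unit(input: &str) -> Option<(&str, &str)> {
    let mut iter = input.chars();
    iter.next()?; // empty input -> None (the parser's end-of-input case)
    let len = input.len() - iter.as_str().len(); // bytes consumed by the first item
    Some(input.split_at(len))
}

fn main() {
    // U+00E9 is two bytes in UTF-8, and the split lands on the char boundary.
    assert_eq!(split_first_unit("\u{e9}bc"), Some(("\u{e9}", "bc")));
    // Per-char splitting separates a base letter from its combining accent.
    assert_eq!(split_first_unit("e\u{301}x"), Some(("e", "\u{301}x")));
    assert_eq!(split_first_unit(""), None);
}
```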

Marwes closed this as completed Nov 27, 2018