It was a good experience, but two small things would have helped a lot.
Support for a parse() method that takes a character iterator rather than a string. This would allow me to NFC-normalize the input text without allocating an extra string.
Optional support for more detailed character offsets in Pair (UTF-16 and UTF-32). Finding these offsets requires iterating over the input string with str.char_indices after parsing, but I bet Pest could provide them.
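For anyone hitting the same need, here is a minimal sketch of the post-parse workaround described above: one pass over the input with str.char_indices that turns byte offsets (for example a span's start and end) into UTF-16 and UTF-32 offsets. The `Offsets` struct and `convert_offsets` function are hypothetical names for illustration, not part of Pest.

```rust
/// One position in the input, expressed in three different units.
/// (Hypothetical helper type, not a Pest type.)
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct Offsets {
    pub byte: usize,
    pub utf16: usize,
    pub utf32: usize,
}

/// Convert byte offsets into UTF-16 and UTF-32 offsets in a single pass.
/// Assumes `byte_offsets` is sorted ascending and that every offset lies
/// on a char boundary (or equals `input.len()`).
pub fn convert_offsets(input: &str, byte_offsets: &[usize]) -> Vec<Offsets> {
    let mut out = Vec::with_capacity(byte_offsets.len());
    let mut pending = byte_offsets.iter().copied().peekable();
    let mut utf16 = 0;
    let mut utf32 = 0;

    for (byte, ch) in input.char_indices() {
        // Emit every requested offset that starts at this character.
        while pending.peek() == Some(&byte) {
            out.push(Offsets { byte, utf16, utf32 });
            pending.next();
        }
        utf16 += ch.len_utf16(); // 1 or 2 code units
        utf32 += 1; // one code point
    }

    // Offsets pointing at the very end of the input.
    while pending.peek() == Some(&input.len()) {
        out.push(Offsets { byte: input.len(), utf16, utf32 });
        pending.next();
    }
    out
}
```

Because the offsets are sorted, the whole conversion costs one scan of the input regardless of how many pairs were produced, which is roughly the work a built-in option would have to do as well.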
Also, a more difficult request: compile long sequences of literal choices to tries. At the bottom of the tweet grammar, this was done manually for TLDs. I didn't test this yet, but I did look at the generated code, and it seemed to be called for.
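To make the trie idea concrete, here is a rough sketch of what such a structure could look like for a list of literals like TLDs. It is not what Pest generates; `Trie`, `from_literals`, and `longest_match` are hypothetical names, and the lookup returns the longest matching literal, which is not the same as PEG ordered choice unless the grammar lists longer literals first.

```rust
use std::collections::HashMap;

/// A byte-level trie over a fixed set of literals.
/// (Illustrative only; not generated by or part of Pest.)
#[derive(Default)]
pub struct Trie {
    children: HashMap<u8, Trie>,
    terminal: bool, // true if a literal ends at this node
}

impl Trie {
    /// Build a trie from a list of literal strings.
    pub fn from_literals(literals: &[&str]) -> Self {
        let mut root = Trie::default();
        for lit in literals {
            let mut node = &mut root;
            for &b in lit.as_bytes() {
                node = node.children.entry(b).or_default();
            }
            node.terminal = true;
        }
        root
    }

    /// Length in bytes of the longest literal that is a prefix of `input`.
    pub fn longest_match(&self, input: &str) -> Option<usize> {
        let mut node = self;
        let mut best = None;
        for (i, b) in input.bytes().enumerate() {
            match node.children.get(&b) {
                Some(next) => node = next,
                None => break,
            }
            if node.terminal {
                best = Some(i + 1);
            }
        }
        best
    }
}
```

Each input byte is examined at most once, so the cost depends on the length of the match rather than on the number of alternatives, which is the appeal over trying each literal in turn.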
The way that pest is currently set up, the returned parse tree borrows the original input string, so a streaming API isn't really feasible. The string has to be collected in either case, so making this obvious in the external API seems ideal.
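A toy illustration of the borrowing constraint, assuming nothing about pest's internals: the `Token` type below stands in for a parse-tree node that holds only a slice of the input, so the input has to exist as one contiguous string that outlives the tree. `Token`, `split_words`, and `collect_input` are all made-up names for this sketch.

```rust
/// Stand-in for a parse-tree node: it owns no text, only a borrowed
/// slice of the input plus its starting byte offset.
#[derive(Debug, PartialEq)]
pub struct Token<'i> {
    pub text: &'i str,
    pub start: usize,
}

/// A trivial "parser" that splits on whitespace. Every token borrows
/// from `input`, so `input` must outlive the returned tokens.
pub fn split_words<'i>(input: &'i str) -> Vec<Token<'i>> {
    let mut tokens = Vec::new();
    let mut start = None;
    for (i, ch) in input.char_indices() {
        if ch.is_whitespace() {
            if let Some(s) = start.take() {
                tokens.push(Token { text: &input[s..i], start: s });
            }
        } else if start.is_none() {
            start = Some(i);
        }
    }
    if let Some(s) = start {
        tokens.push(Token { text: &input[s..], start: s });
    }
    tokens
}

/// With a borrowing tree, a character iterator has to be collected
/// into an owned String before parsing; this is the extra allocation.
pub fn collect_input(chars: impl Iterator<Item = char>) -> String {
    chars.collect()
}
```

The caller ends up writing `let owned = collect_input(iter); let tokens = split_words(&owned);`, which keeps the allocation visible at the call site instead of hiding it inside the parser.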
It might be possible to support a streaming API in the future with pest 3.0 or otherwise, but streaming introduces a lot of issues. As I understand it, pest is optimized for full-file processing, as you see with a programming language.
As for the literals, I believe the intent is to utilize logos's lexing plumbing superpowers, which will give us O(1) bytewise lexing for "free" so long as we can get ordered-choice semantics instead of longest-match.
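The difference between the two semantics is easy to see on a pair of overlapping literals. This is only a sketch with plain string prefix checks, not how pest or logos implement matching; `ordered_choice` and `longest_match` are hypothetical helper names.

```rust
/// PEG-style ordered choice: the first alternative that matches wins,
/// even if a later alternative would match more input.
pub fn ordered_choice<'a>(alternatives: &[&'a str], input: &str) -> Option<&'a str> {
    alternatives
        .iter()
        .copied()
        .find(|alt| input.starts_with(*alt))
}

/// Lexer-style longest match: among all alternatives that match,
/// the one consuming the most input wins, regardless of order.
pub fn longest_match<'a>(alternatives: &[&'a str], input: &str) -> Option<&'a str> {
    alternatives
        .iter()
        .copied()
        .filter(|alt| input.starts_with(*alt))
        .max_by_key(|alt| alt.len())
}
```

With the alternatives "co" then "com" and the input "com", ordered choice yields "co" while longest match yields "com", so a grammar that depends on ordering would change meaning if the backend silently switched semantics.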
Thanks for the reply. I agree logos should take care of my trie request (excellent!). It's also understandable that streaming might take a while or never happen.
I wrote a tweet parser with Pest here: https://github.com/sayrer/twitter-text/blob/master/parser/src/twitter_text.pest