Feature requests: feedback from parsing tweets #370

sayrer · 2019-02-15T16:25:10Z

I wrote a tweet parser with Pest here: https://github.com/sayrer/twitter-text/blob/master/parser/src/twitter_text.pest

It was a good experience, but two small things would have helped a lot.

1. Support for a parse() method that takes a character iterator rather than a string. This would let me NFC-normalize the input text without allocating an extra string.
2. Optional support for more detailed character offsets in Pair (UTF-16 and UTF-32). Finding these offsets currently requires iterating over the input string with str::char_indices after parsing, but I bet Pest could provide them.
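
For reference, the post-parse workaround described in the second item might look roughly like the sketch below. It assumes the byte offsets were already gathered from each pair's span (for example via `pair.as_span().start()` and `.end()`) and sorted in ascending order. The names `Offsets` and `convert_offsets` are made up for illustration and are not part of pest.

```rust
/// One input position expressed in three units.
/// (Hypothetical type for illustration; pest itself only reports byte offsets.)
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct Offsets {
    pub utf8: usize,  // byte offset into the &str
    pub utf16: usize, // offset in UTF-16 code units
    pub utf32: usize, // offset in Unicode scalar values (chars)
}

/// Convert byte offsets into UTF-16 and UTF-32 offsets in one pass.
///
/// Assumption: `byte_offsets` is sorted ascending and every entry lies on a
/// char boundary (or equals `input.len()`). Entries that violate this are
/// silently skipped, along with everything after them.
pub fn convert_offsets(input: &str, byte_offsets: &[usize]) -> Vec<Offsets> {
    let mut out = Vec::with_capacity(byte_offsets.len());
    let mut pending = byte_offsets.iter().copied().peekable();
    let mut utf16 = 0;
    let mut utf32 = 0;

    for (byte_idx, ch) in input.char_indices() {
        // Emit every requested offset that starts at this char.
        while pending.peek() == Some(&byte_idx) {
            out.push(Offsets { utf8: byte_idx, utf16, utf32 });
            pending.next();
        }
        utf16 += ch.len_utf16(); // 1 for BMP chars, 2 for surrogate pairs
        utf32 += 1;
    }

    // Span ends can point one past the last char.
    while pending.peek() == Some(&input.len()) {
        out.push(Offsets { utf8: input.len(), utf16, utf32 });
        pending.next();
    }
    out
}
```

This is a single O(n) walk over the input regardless of how many pairs there are, which is presumably the bookkeeping pest could do internally while it parses.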

sayrer · 2019-02-15T16:47:25Z

Also, a more difficult request: compile long sequences of literal choices to tries. At the bottom of the tweet grammar, this was done manually for TLDs. I didn't test this yet, but I did look at the generated code, and it seemed to be called for.
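
To make the trie request concrete, here is a minimal sketch of what matching a long list of literals through a byte trie could look like. It is not what pest generates; `Trie`, `from_literals`, and `longest_prefix` are hypothetical names, and the sketch returns the longest matching literal, which only agrees with a grammar's choice order if the alternatives are listed longest-first.

```rust
use std::collections::HashMap;

/// A byte-wise trie over a fixed set of literals (illustrative only).
#[derive(Default)]
pub struct Trie {
    children: HashMap<u8, Trie>,
    terminal: bool, // true if a literal ends at this node
}

impl Trie {
    /// Build a trie from a list of literals such as TLDs.
    pub fn from_literals(literals: &[&str]) -> Self {
        let mut root = Trie::default();
        for lit in literals {
            let mut node = &mut root;
            for &b in lit.as_bytes() {
                node = node.children.entry(b).or_default();
            }
            node.terminal = true;
        }
        root
    }

    /// Length in bytes of the longest literal that is a prefix of `input`,
    /// or `None` if no literal matches.
    pub fn longest_prefix(&self, input: &str) -> Option<usize> {
        let mut node = self;
        let mut best = if node.terminal { Some(0) } else { None };
        for (i, &b) in input.as_bytes().iter().enumerate() {
            match node.children.get(&b) {
                Some(next) => node = next,
                None => break,
            }
            if node.terminal {
                best = Some(i + 1);
            }
        }
        best
    }
}
```

The point of the structure is that the input is walked once, instead of being compared against each alternative in turn as a plain sequence of literal choices would do.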

CAD97 · 2019-02-15T18:48:22Z

With pest's current design, the returned parse tree borrows the original input string, so a streaming API isn't really feasible. The input has to be collected into a string either way, so making that explicit in the API seems ideal.
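
A toy illustration of that constraint, not pest's actual types: any result that hands out `&str` slices needs the whole input to live somewhere as a contiguous string, so a character iterator has to be collected first. `Word`, `words`, and `collect_input` are invented for this sketch.

```rust
/// Stand-in for a parse result that borrows from the input,
/// in the same spirit as pest's pairs borrowing the source string.
pub struct Word<'i> {
    pub text: &'i str, // slice of the original input
    pub start: usize,  // byte offset of the slice
}

/// Split on whitespace, returning slices that borrow from `input`.
pub fn words<'i>(input: &'i str) -> Vec<Word<'i>> {
    let mut out = Vec::new();
    let mut start = None;
    for (i, ch) in input.char_indices() {
        match (ch.is_whitespace(), start) {
            // First non-whitespace char opens a word.
            (false, None) => start = Some(i),
            // Whitespace closes the word that is currently open.
            (true, Some(s)) => {
                out.push(Word { text: &input[s..i], start: s });
                start = None;
            }
            _ => {}
        }
    }
    if let Some(s) = start {
        out.push(Word { text: &input[s..], start: s });
    }
    out
}

/// A char iterator (e.g. the output of a normalization pass) must be
/// materialized before anything can borrow slices from it.
pub fn collect_input<I: Iterator<Item = char>>(chars: I) -> String {
    chars.collect()
}
```

The caller would keep the `String` returned by `collect_input` alive and pass `&owned` to `words`; the returned `Word` values cannot outlive that string, which is the same lifetime relationship the parse tree has with its input.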

It might be possible to support a streaming API in the future, with pest 3.0 or otherwise, but streaming introduces a lot of issues. As I understand it, pest is optimized for full-file processing, as you would see in a programming language.

As for the literals, I believe the intent is to utilize logos's lexing plumbing superpowers, which will give us O(1) bytewise lexing for "free" so long as we can get ordered-choice semantics instead of longest-match.
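
The difference between the two semantics is easy to see on a pair of overlapping literals. The following sketch uses two hypothetical helper functions, neither of which comes from pest or logos, to show why a longest-match lexer cannot simply be dropped in for a PEG ordered choice.

```rust
/// PEG-style ordered choice: the first alternative that matches wins.
/// Returns the matched length in bytes.
pub fn ordered_choice(alternatives: &[&str], input: &str) -> Option<usize> {
    alternatives
        .iter()
        .find(|lit| input.starts_with(**lit))
        .map(|lit| lit.len())
}

/// Lexer-style longest match: the longest matching alternative wins,
/// regardless of the order in which alternatives are listed.
pub fn longest_match(alternatives: &[&str], input: &str) -> Option<usize> {
    alternatives
        .iter()
        .filter(|lit| input.starts_with(**lit))
        .map(|lit| lit.len())
        .max()
}
```

With the alternatives `["co", "com"]` and the input `"com"`, ordered choice stops at `"co"` (2 bytes) while longest match takes `"com"` (3 bytes). The two agree only when no earlier alternative is a prefix of a later one, for example when the list is ordered longest-first.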

sayrer · 2019-02-15T18:52:45Z

Thanks for the reply. I agree logos should take care of my trie request (excellent!). It's also understandable that streaming might take a while or never happen.

But what about getting UTF-16 and UTF-32 offsets?

tomtau added the enhancement and pest labels on Jul 14, 2022
