Support for streaming input #153

Open
flip111 opened this issue Sep 8, 2017 · 6 comments
Comments

@flip111

flip111 commented Sep 8, 2017

It would be nice to be able to parse streams such as stdin and network sockets. At the moment I can only find implementations for files and strings: https://docs.rs/pest/1.0.0-beta.9/pest/inputs/trait.Input.html#implementors

@dragostis
Contributor

I completely agree. Is there any particular stream structure that would make sense to target? Also, I'd be more than happy to mentor this as a PR if you're willing to go for the implementation.

@jstnlef
Contributor

jstnlef commented Apr 3, 2018

Is this issue still relevant with the change to work with &str?

@dragostis
Contributor

I would say so. Currently, I have no particular design for this feature, but streamed parsing is something desirable for pest. Maybe one solution would be to have this in a separate crate that takes pest as a dependency.

@dbrgn
Contributor

dbrgn commented Nov 17, 2018

It would also be fantastic if the stream did not have to be consumed completely, and pest instead emitted tokens along the way (basically acting as a stream-transforming function).

That would allow it to parse huge amounts of data that would not otherwise fit in memory.
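
As a rough illustration of the "emit tokens along the way" idea, here is a minimal sketch that uses only the Rust standard library. It is not pest's API: `Token` and `TokenStream` are hypothetical names, and the sketch assumes a line-delimited, whitespace-separated input so that no token ever spans two lines. Only one line is buffered at a time, so the input can be arbitrarily large (or endless).

```rust
use std::collections::VecDeque;
use std::io::BufRead;

/// Hypothetical token type for this sketch; not part of pest.
#[derive(Debug, PartialEq)]
pub enum Token {
    Number(u64),
    Word(String),
}

/// Lazily turns any buffered reader (stdin, a socket, a file) into tokens.
/// Assumption: tokens are whitespace-separated and never span a line break.
pub struct TokenStream<R: BufRead> {
    reader: R,
    /// Tokens of the current line that have not been handed out yet.
    pending: VecDeque<Token>,
}

impl<R: BufRead> TokenStream<R> {
    pub fn new(reader: R) -> Self {
        TokenStream { reader, pending: VecDeque::new() }
    }
}

impl<R: BufRead> Iterator for TokenStream<R> {
    type Item = std::io::Result<Token>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // Hand out tokens from the current line first.
            if let Some(token) = self.pending.pop_front() {
                return Some(Ok(token));
            }
            // Otherwise pull exactly one more line from the underlying stream.
            let mut line = String::new();
            match self.reader.read_line(&mut line) {
                Ok(0) => return None, // end of stream
                Ok(_) => {
                    for piece in line.split_whitespace() {
                        let token = match piece.parse::<u64>() {
                            Ok(n) => Token::Number(n),
                            Err(_) => Token::Word(piece.to_string()),
                        };
                        self.pending.push_back(token);
                    }
                }
                Err(e) => return Some(Err(e)),
            }
        }
    }
}
```

A caller could wrap `std::io::stdin().lock()` in `TokenStream::new` and iterate with a `for` loop, processing each token as soon as its line arrives. A real grammar would need a smarter buffering rule than "one line", which is where most of the design work lies.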

@flip111
Author

flip111 commented Nov 17, 2018

@dbrgn of course :) you more or less just wrote down the definition of a stream (parser) :p If you need to load all the data into memory first, the data can come from a stream, but after that it's no longer a stream. Also, a stream can potentially be endless or infinitely big (a network parser on a backbone, for example).

@iago-lito

I think the problem is closely related to applying regexes on streams, which I understand is quite challenging, as it involves rewriting much of the engine at the cost of losing some optimizations. Hyperscan took that step in C/C++ back in the day, but I am aware of very few other initiatives like this.

In particular, I think it also implies that the API has to be revisited for the streaming case, as partial matches / alternative not-yet-matching paths / dropped hypotheses may be requested by the user on each timestep.
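
To make the API question above concrete, here is a minimal sketch of a resumable matcher that reports a status after every chunk. It is an illustration only, not pest's or Hyperscan's API: `Step` and `LiteralMatcher` are hypothetical names, and the "grammar" is reduced to a single literal so that the three outcomes (matched, partial match, dropped hypothesis) are easy to see.

```rust
/// Outcome reported after each chunk; hypothetical, not part of pest.
#[derive(Debug, PartialEq)]
pub enum Step {
    /// The literal is fully matched; `consumed` bytes of the latest chunk were used.
    Matched { consumed: usize },
    /// Everything seen so far is a proper prefix of the literal; more input is needed.
    Partial,
    /// The input diverged from the literal; this hypothesis is dropped.
    Failed,
}

/// Matches one literal across arbitrarily split chunks of input.
pub struct LiteralMatcher {
    literal: Vec<u8>,
    /// Number of literal bytes matched so far, carried over between chunks.
    matched: usize,
    failed: bool,
}

impl LiteralMatcher {
    pub fn new(literal: &str) -> Self {
        LiteralMatcher {
            literal: literal.as_bytes().to_vec(),
            matched: 0,
            failed: false,
        }
    }

    /// Feeds the next chunk and reports the state of the match.
    pub fn feed(&mut self, chunk: &[u8]) -> Step {
        if self.failed {
            return Step::Failed; // a dropped hypothesis stays dropped
        }
        for (i, &byte) in chunk.iter().enumerate() {
            if self.matched == self.literal.len() {
                // Done before the chunk ended; the rest belongs to the caller.
                return Step::Matched { consumed: i };
            }
            if byte != self.literal[self.matched] {
                self.failed = true;
                return Step::Failed;
            }
            self.matched += 1;
        }
        if self.matched == self.literal.len() {
            Step::Matched { consumed: chunk.len() }
        } else {
            Step::Partial
        }
    }
}
```

Feeding `b"he"` and then `b"llo world"` to a matcher for `"hello"` yields `Step::Partial` followed by `Step::Matched { consumed: 3 }`. A streaming parser for a full grammar would have to keep one such state per live alternative, which hints at why the engine and its optimizations would need rework.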

This is a lot of work, but it sure would be awesome :)

tomtau added this to the v3.0 milestone Jul 23, 2022