Streaming Re #59
Comments
Preferably the interface would be something that works for the following scenario:
Bonus: If the system doesn't store the contents of the substring somewhere (perhaps just by partial matches referring to each fragment they are composed of), then there should be a way for the user of the library to do so. For instance, for very long matches the client could choose to forget parts of them or move them to storage other than memory. Or is this too rare a requirement? For 99.9% of cases the matches are going to be short.
That would require a specific API for partial matches, not just the current API slightly augmented. I don't understand the bonus.
I meant "Bonus" as in a feature that is probably not often useful, but for which use cases could be found. For example: I could write a pattern that optimistically matches a certain kind of network traffic from an unframed network capture. The matches could be of unbounded length if the input stream is infinite. I would still be able to find the substrings, which may span multiple units of processing, in a capture file, even if I cannot hold the whole capture in memory.
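To make the use case above concrete, here is a minimal sketch of a matcher that never stores match content and only reports which fragments of which chunks a match is made of. It does not use Re at all: the pattern is hard-coded (maximal runs of ASCII digits), and every name in it (`fragment`, `event`, `scan`, `finish`, `emit`) is hypothetical, chosen only for illustration.

```ocaml
(* Hypothetical sketch, not part of Re: find maximal runs of digits across
   chunk boundaries without the matcher keeping any match content. *)

(* A piece of a match: [len] bytes starting at [off] in chunk number [chunk]. *)
type fragment = { chunk : int; off : int; len : int }

type event =
  | Fragment of fragment  (* one more piece of the current match *)
  | End_of_match          (* the current match is complete *)

let is_digit c = c >= '0' && c <= '9'

(* Scan one chunk. [in_match] says whether the previous chunk ended inside a
   match; the returned boolean carries the same information to the next call.
   The client decides in [emit] whether to copy, drop or spill each piece. *)
let scan ~emit ~chunk_index ~in_match chunk =
  let n = String.length chunk in
  let flush start stop =
    if stop > start then
      emit (Fragment { chunk = chunk_index; off = start; len = stop - start })
  in
  let rec go i start inside =
    if i = n then begin
      (* Chunk exhausted: report what we have, the match may continue. *)
      if inside then flush start i;
      inside
    end else if is_digit chunk.[i] then
      go (i + 1) (if inside then start else i) true
    else begin
      if inside then begin
        flush start i;
        emit End_of_match
      end;
      go (i + 1) 0 false
    end
  in
  go 0 0 in_match

(* Tell the matcher that the stream is over. *)
let finish ~emit ~in_match = if in_match then emit End_of_match
```

With the chunks `"ab12"` and `"34c5"`, the match `1234` is reported as two fragments, one per chunk, followed by `End_of_match`. The client is free to write each fragment to disk and discard the chunk before reading the next one.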
To summarize: you want manual control over the internal buffer.
Do you also need to mark the beginning of a stream, so that bol, bos and start match as well? How will a group capture that spans chunks work? Will it be possible at all?
This is a (set of) notes after a discussion with @vouillon on how to make re able to stream.
Take `pos` and `last` out of the `info` record and pass them around explicitly, in particular in `loop`. Important: check spilling in the `loop` function.
`Partial` would give an abstract type `partial` containing `Re.state`.
A function would take a `partial` and start the matching again. This would be implemented using the `loop` function to match more things and then the `Re_automaton.status` function.
It should also be possible to say "The streaming is finished, you can match eol/eos/stop".
There are delicate questions of content copying when initializing and refilling the buffer. In particular, copying the matched string to initialize the buffer is clearly not acceptable.
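As a rough illustration of the shape these notes point at, here is a toy resumable matcher. It is only a sketch under strong assumptions: it does not touch Re's internals, the automaton is hand-written for the single pattern `ab+` followed by either `c` or the end of the stream, anchored at the start of the stream, and the names `outcome`, `feed`, `finish` and `initial` are invented for this example. The `partial` record would be abstract in a real interface. The copying question is sidestepped entirely, since nothing is buffered.

```ocaml
(* Toy resumable matcher, all names hypothetical and unrelated to Re's code.
   Pattern: "ab+" followed by 'c' or by the end of the stream. *)

type state = Start | SeenA | SeenB | Accept | Dead

(* Would be abstract in a real interface: the automaton state plus the
   absolute position reached so far in the stream. *)
type partial = { st : state; pos : int }

type outcome =
  | Match of int        (* absolute end offset of the match *)
  | Partial of partial  (* chunk exhausted, more input is needed *)
  | Fail

let step st c =
  match st, c with
  | Start, 'a' -> SeenA
  | SeenA, 'b' -> SeenB
  | SeenB, 'b' -> SeenB
  | SeenB, 'c' -> Accept
  | _ -> Dead

let initial = { st = Start; pos = 0 }

(* The state and the position are passed explicitly to [loop] instead of
   living in a shared record, so matching can restart from any [partial]. *)
let feed p chunk =
  let len = String.length chunk in
  let rec loop st i =
    match st with
    | Accept -> Match (p.pos + i)
    | Dead -> Fail
    | _ when i = len -> Partial { st; pos = p.pos + i }
    | _ -> loop (step st chunk.[i]) (i + 1)
  in
  loop p.st 0

(* "The streaming is finished": only now may the end-of-stream branch of the
   pattern succeed. *)
let finish p =
  match p.st with
  | SeenB -> Match p.pos
  | _ -> Fail
```

Feeding `"ab"` yields a `Partial`; feeding `"bc"` to that value yields `Match 4`, while calling `finish` on it instead yields `Match 2`. Group captures spanning chunks are not modelled here.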