html5ever pull tokenizer? #208
Comments
@nox and @SimonSapin, any opinions here? |
A pull-based model would also allow for abandoning a parse midway through. The push behavior could still be recovered with a simple loop:

    for token in tokenize(input) {
        sink.process(token)
    }
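A minimal sketch of that idea. It assumes a toy token type and a hypothetical `tokenize` function that returns an `Iterator`; neither is an html5ever API, and the tag rules are far simpler than real HTML tokenization. Because the iterator is lazy, stopping early means the rest of the input is never scanned, and the push style is just the loop above:

```rust
/// Toy token type (hypothetical; real HTML tokens carry attributes, etc.).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    StartTag(String),
    EndTag(String),
    Text(String),
}

/// Hypothetical pull tokenizer over a borrowed string. Each call to `next`
/// scans only as far as needed to produce one token.
struct Tokenizer<'a> {
    rest: &'a str,
}

fn tokenize(input: &str) -> Tokenizer<'_> {
    Tokenizer { rest: input }
}

impl<'a> Iterator for Tokenizer<'a> {
    type Item = Token;

    fn next(&mut self) -> Option<Token> {
        if self.rest.is_empty() {
            return None;
        }
        if let Some(after_lt) = self.rest.strip_prefix('<') {
            // Toy rule: a tag runs to the next '>' (or to the end of input).
            let end = after_lt.find('>').unwrap_or(after_lt.len());
            let body = &after_lt[..end];
            self.rest = after_lt.get(end + 1..).unwrap_or("");
            return Some(match body.strip_prefix('/') {
                Some(name) => Token::EndTag(name.to_string()),
                None => Token::StartTag(body.to_string()),
            });
        }
        // Text runs up to the next '<'.
        let end = self.rest.find('<').unwrap_or(self.rest.len());
        let (text, rest) = self.rest.split_at(end);
        self.rest = rest;
        Some(Token::Text(text.to_string()))
    }
}

/// Simplified push-style sink (hypothetical, not html5ever's TokenSink).
trait Sink {
    fn process(&mut self, token: Token);
}

/// The push style falls out of the pull style with a plain loop.
fn drive<S: Sink>(input: &str, sink: &mut S) {
    for token in tokenize(input) {
        sink.process(token);
    }
}

/// Example sink that just records every token it is given.
struct Collect(Vec<Token>);

impl Sink for Collect {
    fn process(&mut self, token: Token) {
        self.0.push(token);
    }
}
```

For example, calling `find(|t| matches!(t, Token::EndTag(_)))` on `tokenize("<a>b</a><c>never scanned")` stops right after `</a>`; the trailing input is left untouched in `rest`.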
What about the tokenizer’s own input, when it’s not available all at once?
My example was highly simplified. I would expect that you could have a PullParser struct that kept parse state and allowed more source to be added over time. This PullParser would either be an iterator, or have an
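One way such a PullParser could look, as a sketch under stated assumptions: the names `PullParser`, `feed`, `end` and `next_token` are hypothetical, a plain `String` stands in for a Tendril, and `None` from `next_token` means "feed me more input" until `end` has been called. Unconsumed input stays buffered, so a tag split across chunks is still recognized:

```rust
/// Toy token type (hypothetical, not html5ever's).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    StartTag(String),
    EndTag(String),
    Text(String),
}

/// Hypothetical incremental pull parser. Input arrives in chunks via `feed`;
/// unconsumed input stays in `buf` until a complete token can be formed.
#[derive(Default)]
struct PullParser {
    buf: String,
    eof: bool,
}

impl PullParser {
    /// Appends more source; a String stands in for a Tendril here.
    fn feed(&mut self, chunk: &str) {
        self.buf.push_str(chunk);
    }

    /// Signals that no more input will arrive.
    fn end(&mut self) {
        self.eof = true;
    }

    /// Returns the next complete token, or `None` if more input is needed
    /// (or, after `end`, if the input is exhausted).
    fn next_token(&mut self) -> Option<Token> {
        if self.buf.is_empty() {
            return None;
        }
        let (len, token) = if self.buf.starts_with('<') {
            match self.buf.find('>') {
                Some(close) => {
                    let body = &self.buf[1..close];
                    let tok = match body.strip_prefix('/') {
                        Some(name) => Token::EndTag(name.to_string()),
                        None => Token::StartTag(body.to_string()),
                    };
                    (close + 1, tok)
                }
                // Unterminated tag at EOF: toy rule treats it as text.
                None if self.eof => (self.buf.len(), Token::Text(self.buf.clone())),
                // Incomplete tag: wait for the next chunk.
                None => return None,
            }
        } else {
            match self.buf.find('<') {
                Some(open) => (open, Token::Text(self.buf[..open].to_string())),
                None if self.eof => (self.buf.len(), Token::Text(self.buf.clone())),
                // The text might continue in the next chunk.
                None => return None,
            }
        };
        // Drop the consumed prefix; the rest stays buffered.
        self.buf.drain(..len);
        Some(token)
    }
}
```

Feeding `"<di"` and then `"v>hello"` yields no token after the first chunk, `StartTag("div")` after the second, and `Text("hello")` only once `end` is called, since the text could otherwise continue in a later chunk.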
@TyOverby So, would this require a major rewrite of the tokenizer's and tree builder's rule macros? Do the current macros make sense under a PullParser approach?
@Ygg01: I doubt it, but since I don't see anyone vehemently opposed, I'll prototype this and see how it turns out.
@TyOverby Don't forget that tree building alters tokenisation, though.
@TyOverby I don't have a vote, but one really great thing about html5ever is that its macros closely follow the format of the spec, so when comparing the code with the spec it's easy to see where they diverge.
In what way can the tree builder change tokenisation?
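For reference, the HTML tree construction rules do change tokenizer state: on start tags such as `<script>`, `<style>` or `<title>`, the tree builder switches the tokenizer into the script data, RAWTEXT or RCDATA state, so the element's contents are not parsed as markup. In a pull model the consumer would have to apply that switch before pulling the next token. A toy sketch, where `switch_to_raw_text` is a hypothetical hook and a single raw-text mode stands in for the spec's several states:

```rust
/// Toy token type (hypothetical, not html5ever's).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    StartTag(String),
    EndTag(String),
    Text(String),
}

/// Toy pull tokenizer with one piece of mode state that the consumer
/// (standing in for a tree builder) may change between pulls.
struct Tokenizer {
    input: String,
    pos: usize,
    /// When `Some(name)`, text runs until `</name>`; markup inside is not parsed.
    raw_text_until: Option<String>,
}

impl Tokenizer {
    fn new(input: &str) -> Self {
        Tokenizer { input: input.to_string(), pos: 0, raw_text_until: None }
    }

    /// Hypothetical hook a tree builder would call after e.g. `<script>`.
    fn switch_to_raw_text(&mut self, end_tag_name: &str) {
        self.raw_text_until = Some(end_tag_name.to_string());
    }

    fn next_token(&mut self) -> Option<Token> {
        let rest = &self.input[self.pos..];
        if rest.is_empty() {
            return None;
        }
        // Raw-text mode: everything before the matching end tag is one Text
        // token. The mode is cleared here, so that end tag is tokenized normally.
        if let Some(name) = self.raw_text_until.take() {
            let close = format!("</{}>", name);
            let end = rest.find(close.as_str()).unwrap_or(rest.len());
            if end > 0 {
                let text = rest[..end].to_string();
                self.pos += end;
                return Some(Token::Text(text));
            }
        }
        if let Some(after_lt) = rest.strip_prefix('<') {
            // Toy rule: a tag runs to the next '>' (or to the end of input).
            let end = after_lt.find('>').unwrap_or(after_lt.len());
            let body = &after_lt[..end];
            // Consume '<', the tag body and the closing '>' (if present).
            self.pos += (end + 2).min(rest.len());
            return Some(match body.strip_prefix('/') {
                Some(name) => Token::EndTag(name.to_string()),
                None => Token::StartTag(body.to_string()),
            });
        }
        let end = rest.find('<').unwrap_or(rest.len());
        let text = rest[..end].to_string();
        self.pos += end;
        Some(Token::Text(text))
    }
}

/// Stand-in for a tree builder: when `feedback` is on, it reacts to
/// `<script>` by switching the tokenizer state before the next pull.
fn collect(input: &str, feedback: bool) -> Vec<Token> {
    let mut tokenizer = Tokenizer::new(input);
    let mut out = Vec::new();
    while let Some(token) = tokenizer.next_token() {
        if feedback && token == Token::StartTag("script".to_string()) {
            tokenizer.switch_to_raw_text("script");
        }
        out.push(token);
    }
    out
}
```

With the feedback, `<script>if (a<b) x()</script>` yields the script body as a single Text token; without it, `<b) x()</script>` is misread as a start tag. This is also why a pull API cannot freely tokenize ahead of its consumer.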
My opinion is that if you manage to add a new pull API without affecting the current push API, go for it. (It’d also be nice to avoid significantly rewriting the tokenizer or tree builder code, as @Ygg01 mentioned.) But I’m skeptical that it’s possible without coroutines (which Rust doesn’t have) or threads (which would add synchronization overhead).
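One coroutine-free, thread-free way to offer pull on top of an unchanged push tokenizer is to give it a sink that only queues tokens, and let an iterator drain that queue. A sketch with a toy whitespace-splitting tokenizer; the `TokenSink` trait, `PushTokenizer` and `PullTokenizer` here are simplified, hypothetical stand-ins, not html5ever's real types:

```rust
use std::collections::VecDeque;

/// Toy token type (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Word(String),
}

/// Simplified push-style sink; html5ever's real TokenSink differs.
trait TokenSink {
    fn process_token(&mut self, token: Token);
}

/// Toy push tokenizer: splits on whitespace, keeping a partial word
/// across chunk boundaries.
#[derive(Default)]
struct PushTokenizer {
    partial: String,
}

impl PushTokenizer {
    fn feed<S: TokenSink>(&mut self, chunk: &str, sink: &mut S) {
        for c in chunk.chars() {
            if c.is_whitespace() {
                self.flush(sink);
            } else {
                self.partial.push(c);
            }
        }
    }

    fn end<S: TokenSink>(&mut self, sink: &mut S) {
        self.flush(sink);
    }

    fn flush<S: TokenSink>(&mut self, sink: &mut S) {
        if !self.partial.is_empty() {
            sink.process_token(Token::Word(std::mem::take(&mut self.partial)));
        }
    }
}

/// Sink that only queues tokens for later pulling.
#[derive(Default)]
struct QueueSink {
    queue: VecDeque<Token>,
}

impl TokenSink for QueueSink {
    fn process_token(&mut self, token: Token) {
        self.queue.push_back(token);
    }
}

/// Pull wrapper around the unchanged push tokenizer: `feed` runs the push
/// tokenizer over one chunk, and `next` drains whatever it queued.
#[derive(Default)]
struct PullTokenizer {
    inner: PushTokenizer,
    sink: QueueSink,
}

impl PullTokenizer {
    fn feed(&mut self, chunk: &str) {
        self.inner.feed(chunk, &mut self.sink);
    }

    fn end(&mut self) {
        self.inner.end(&mut self.sink);
    }
}

impl Iterator for PullTokenizer {
    type Item = Token;

    fn next(&mut self) -> Option<Token> {
        self.sink.queue.pop_front()
    }
}
```

The trade-off is that a whole chunk is tokenized before the consumer sees any of its tokens, so the consumer cannot change tokenizer state between tokens within a chunk, which is exactly the tree-builder feedback mentioned above. The iterator can also return `None` and later yield more tokens after another `feed`, so it is deliberately not a `FusedIterator`.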
The tokeniser is already broken up into discrete steps with the
@TyOverby @SimonSapin We may want to discuss that again now that h5e doesn't own its input stream anymore.
Closing.
I'm working on an application where the push tokenizer built into html5ever is not very ergonomic. Instead of making a sink and having process_token called on it, I would rather enqueue a Tendril and get an Iterator of Tokens that are parsed and returned on demand.

My question is: would this be an acceptable option for html5ever? I wouldn't mind doing the implementation.