Extract selector matching as a library #3669

Closed
kmcallister opened this issue Oct 13, 2014 · 2 comments

@kmcallister kmcallister commented Oct 13, 2014

For example, a web scraping tool might use CSS selectors to navigate a static parse tree from html5ever. The selector matching code would need to be generic over the DOM representation, like html5ever is.
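A minimal sketch of what "generic over the DOM representation" could look like, assuming a hypothetical `Element` trait that exposes only what the matcher needs. None of the names below (`Element`, `Compound`, `matches`, `Node`, `NodeRef`) are taken from Servo or html5ever; they are illustrative only, and the matcher handles just type names, classes, and the descendant combinator.

```rust
/// Hypothetical trait: the only view of the DOM that the matcher is given.
pub trait Element: Sized {
    fn local_name(&self) -> &str;
    fn has_class(&self, name: &str) -> bool;
    fn parent(&self) -> Option<Self>;
}

/// A compound selector such as `li.item`; a `name` of `None` means "any element".
pub struct Compound {
    pub name: Option<String>,
    pub classes: Vec<String>,
}

fn matches_compound<E: Element>(c: &Compound, e: &E) -> bool {
    c.name.as_deref().map_or(true, |n| e.local_name() == n)
        && c.classes.iter().all(|k| e.has_class(k))
}

/// Match a descendant-combinator chain (e.g. `ul li.item`, stored left to
/// right) against `element`, walking up through its ancestors.
pub fn matches<E: Element>(chain: &[Compound], element: &E) -> bool {
    let (last, rest) = match chain.split_last() {
        Some(parts) => parts,
        None => return false, // an empty chain matches nothing
    };
    if !matches_compound(last, element) {
        return false;
    }
    let mut ancestor = element.parent();
    for compound in rest.iter().rev() {
        // Take the nearest ancestor that satisfies this compound.
        loop {
            let a = match ancestor.take() {
                Some(a) => a,
                None => return false, // ran out of ancestors
            };
            ancestor = a.parent();
            if matches_compound(compound, &a) {
                break;
            }
        }
    }
    true
}

/// One possible DOM (an assumption for this sketch): a flat arena of nodes.
pub struct Node {
    pub name: String,
    pub classes: Vec<String>,
    pub parent: Option<usize>,
}

/// A cheap handle to a node inside the arena.
#[derive(Clone, Copy)]
pub struct NodeRef<'a> {
    pub arena: &'a [Node],
    pub index: usize,
}

impl<'a> Element for NodeRef<'a> {
    fn local_name(&self) -> &str {
        &self.arena[self.index].name
    }
    fn has_class(&self, name: &str) -> bool {
        self.arena[self.index].classes.iter().any(|c| c == name)
    }
    fn parent(&self) -> Option<Self> {
        self.arena[self.index]
            .parent
            .map(|index| NodeRef { arena: self.arena, index })
    }
}
```

The matcher never names `Node` or `NodeRef`, so a scraping tool could implement the same trait for its own tree type and reuse `matches` unchanged.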

@SimonSapin SimonSapin commented Oct 17, 2014

At a very high level, the reason this is not the case so far is that there is a tension between providing a stable API that other projects can rely on, and making all kinds of structural changes to the matching code in order to add optimizations. So far we’ve given up on the former in order to enable the latter.

However, it might still be possible to find something in between; I just haven’t thought a lot about it. The matching code is already generic over the DOM representation, although that was only introduced as a kind of forward declaration to break a dependency cycle between libstyle and libscript.

Also, in order to be useful, there should be some kind of default DOM provided. Is html5ever’s src/sink intended for production use, or is it closer to a proof of concept?

@kmcallister kmcallister commented Oct 18, 2014

> Is html5ever’s src/sink intended for production use, or is it closer to a proof of concept?

I won't really know until someone tries to write a production-quality web scraping library with html5ever :)

There are no glaring omissions that I can think of. It's sufficient for running the html5lib tests. All the same, I think a scraping library would want its own tree type, because it's just not that much work, and it could contain consumer-specific fields (caches, etc).
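As a rough illustration of a consumer-owned tree type with a consumer-specific cache, here is a minimal sketch. `ScrapeDocument`, `ScrapeNode`, `append`, and `by_id` are hypothetical names, not part of html5ever, and the id index stands in for whatever caches a real scraping library might want.

```rust
use std::collections::HashMap;

/// Hypothetical node type owned by the scraping library itself.
pub struct ScrapeNode {
    pub name: String,
    pub parent: Option<usize>,
    pub children: Vec<usize>,
}

/// Hypothetical document: an arena of nodes plus a consumer-specific cache.
pub struct ScrapeDocument {
    nodes: Vec<ScrapeNode>,
    // Cache from `id` attribute value to node index, filled while building.
    id_index: HashMap<String, usize>,
}

impl ScrapeDocument {
    pub fn new() -> Self {
        ScrapeDocument { nodes: Vec::new(), id_index: HashMap::new() }
    }

    /// Append an element and return its index. `parent`, when given, is
    /// assumed to be an index returned by an earlier `append` call.
    pub fn append(&mut self, parent: Option<usize>, name: &str, id: Option<&str>) -> usize {
        let index = self.nodes.len();
        self.nodes.push(ScrapeNode {
            name: name.to_string(),
            parent,
            children: Vec::new(),
        });
        if let Some(p) = parent {
            self.nodes[p].children.push(index);
        }
        if let Some(id) = id {
            // Keep the first element seen for a duplicated id.
            self.id_index.entry(id.to_string()).or_insert(index);
        }
        index
    }

    /// Look up an element by id through the cache, without walking the tree.
    pub fn by_id(&self, id: &str) -> Option<&ScrapeNode> {
        self.id_index.get(id).map(|&i| &self.nodes[i])
    }
}
```

Because the cache lives next to the nodes, it can be kept up to date in the same place the tree is built, which is harder to arrange when the tree type belongs to another crate.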

It might be fun (although I have no idea how much work) to implement the fastest possible document tree for Servo, completely discarding the ability to run JavaScript, and see how static page performance compares to the current DOM.

bors-servo pushed a commit that referenced this issue Feb 23, 2015
The new library is https://github.com/servo/rust-selectors. It’s not quite ready for other users (the API needs work), but this is a first step.

servo/rust-selectors#2 should also be reviewed.

Fixes #3669.