Extract selector matching as a library #3669
At a very high level, the reason this is not the case so far is that there is a tension between providing a stable API that other projects can rely on and making all kinds of structural changes to the matching code in order to add optimizations. So far we’ve given up on the former in order to enable the latter. However, it might still be possible to find something in between; I just haven’t thought a lot about it. The matching code is already generic over the DOM representation, although that was only introduced as a kind of forward declaration to break a dependency cycle between libstyle and libscript. Also, in order to be useful, there should be some kind of default DOM provided. Is html5ever’s
I won't really know until someone tries to write a production-quality web scraping library with html5ever :) There are no glaring omissions that I can think of. It's sufficient for running the html5lib tests. All the same, I think a scraping library would want its own tree type, because it's just not that much work, and it could contain consumer-specific fields (caches, etc.). It might be fun (although I have no idea how much work) to implement the fastest possible document tree for Servo, completely discarding the ability to run JavaScript, and see how static page performance compares to the current DOM.
The new library is https://github.com/servo/rust-selectors. It’s not quite ready for other users (the API needs work), but this is a first step. servo/rust-selectors#2 should also be reviewed. Fixes #3669.
For example, a web scraping tool might use CSS selectors to navigate a static parse tree from html5ever. The selector matching code would need to be generic over the DOM representation, like html5ever is.
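
To make "generic over the DOM representation" concrete, here is a minimal sketch of what such a design could look like. Everything in it is hypothetical and for illustration only: the `Element` trait, the `Simple`/`Compound`/`Selector` types, the `matches` function, and the toy `Node`/`NodeRef` tree are assumptions of this sketch, not the API of Servo or rust-selectors. It supports only type, id, and class selectors joined by the descendant combinator, and it skips selector parsing entirely.

```rust
// Hypothetical trait: the minimal view of an element that the matcher needs.
// Any tree type (Servo's DOM, a scraper's static tree, ...) could implement it.
trait Element: Sized {
    fn local_name(&self) -> &str;
    fn id(&self) -> Option<&str>;
    fn has_class(&self, name: &str) -> bool;
    fn parent(&self) -> Option<Self>;
}

// A tiny selector model (illustrative only, no parser).
enum Simple {
    Type(String),
    Id(String),
    Class(String),
}

// A compound selector such as `a.external`: all parts must match one element.
struct Compound(Vec<Simple>);

// Compounds joined by the descendant combinator; the rightmost one is last.
struct Selector(Vec<Compound>);

fn matches_simple<E: Element>(simple: &Simple, element: &E) -> bool {
    match simple {
        Simple::Type(name) => element.local_name() == name.as_str(),
        Simple::Id(id) => element.id() == Some(id.as_str()),
        Simple::Class(class) => element.has_class(class.as_str()),
    }
}

fn matches_compound<E: Element>(compound: &Compound, element: &E) -> bool {
    compound.0.iter().all(|simple| matches_simple(simple, element))
}

// Match right to left: the last compound must match `element` itself, and each
// earlier compound must match some ancestor further up the tree.
fn matches<E: Element>(selector: &Selector, element: &E) -> bool {
    let mut compounds = selector.0.iter().rev();
    let last = match compounds.next() {
        Some(compound) => compound,
        None => return false, // an empty selector matches nothing
    };
    if !matches_compound(last, element) {
        return false;
    }
    let mut current = element.parent();
    for compound in compounds {
        loop {
            let ancestor = match current {
                Some(ancestor) => ancestor,
                None => return false, // ran out of ancestors
            };
            current = ancestor.parent();
            if matches_compound(compound, &ancestor) {
                break;
            }
        }
    }
    true
}

// A toy static tree (hypothetical): nodes in a flat list, parents by index.
struct Node {
    name: &'static str,
    id: Option<&'static str>,
    classes: Vec<&'static str>,
    parent: Option<usize>,
}

#[derive(Clone, Copy)]
struct NodeRef<'a> {
    tree: &'a [Node],
    index: usize,
}

impl<'a> Element for NodeRef<'a> {
    fn local_name(&self) -> &str {
        self.tree[self.index].name
    }
    fn id(&self) -> Option<&str> {
        self.tree[self.index].id
    }
    fn has_class(&self, name: &str) -> bool {
        self.tree[self.index].classes.iter().any(|class| *class == name)
    }
    fn parent(&self) -> Option<Self> {
        let tree = self.tree;
        self.tree[self.index].parent.map(|index| NodeRef { tree, index })
    }
}

// Roughly: <html><div id="nav"><a class="external"></a></div></html>
fn sample_tree() -> Vec<Node> {
    vec![
        Node { name: "html", id: None, classes: vec![], parent: None },
        Node { name: "div", id: Some("nav"), classes: vec![], parent: Some(0) },
        Node { name: "a", id: None, classes: vec!["external"], parent: Some(1) },
    ]
}
```

With this in place, a scraper would build a `NodeRef` for the `a` element in `sample_tree()` and ask whether a selector equivalent to `#nav a.external` matches it. The matcher never touches the tree type directly, so a consumer-specific tree with extra fields (caches and so on) only needs to implement the trait.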