
List of things that would need to be done to remove the warning? #56

Open
alecnotthompson opened this issue Aug 5, 2021 · 4 comments

@alecnotthompson

Out of curiosity, what would need to be done to remove the warning in the readme? With regards to removing unsafe code, is that even possible with this sort of project? I'd love to hear more from someone who knows the compiler rules better than me. It seems like something that would matter to anyone who wants this level of memory-safety guarantees when working with bytes.

@SimonSapin
Member

Indeed, removing all unsafe code could be very difficult or impossible for a custom data structure library that wants to do low-level heap allocations, but it could be reduced by a lot by building internal safe(r) abstractions. At the moment inside Tendril there’s a large amount of code that needs to carefully maintain various invariants in order for the unsafe code to stay sound.
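To make the "safe(r) abstractions" idea concrete, here is a minimal sketch (not Tendril's code; `Utf8Buf` and its methods are hypothetical names). A single invariant, "the bytes are valid UTF-8", is established in one constructor, the field is private so nothing else can break it, and the only `unsafe` block relies on that one local argument rather than on invariants maintained across the whole crate.

```rust
/// Hypothetical UTF-8 buffer: the invariant "bytes are valid UTF-8" is
/// checked once at construction and the field is private, so code outside
/// this type cannot break it.
pub struct Utf8Buf {
    bytes: Vec<u8>,
}

impl Utf8Buf {
    /// Validates once; hands the input back unchanged on failure.
    pub fn from_bytes(bytes: Vec<u8>) -> Result<Self, Vec<u8>> {
        if std::str::from_utf8(&bytes).is_ok() {
            Ok(Utf8Buf { bytes })
        } else {
            Err(bytes)
        }
    }

    /// Appending a `&str` cannot break the invariant: `&str` is always UTF-8.
    pub fn push_str(&mut self, s: &str) {
        self.bytes.extend_from_slice(s.as_bytes());
    }

    pub fn as_str(&self) -> &str {
        // SAFETY: `bytes` is only created by `from_bytes` (validated) and only
        // extended by `push_str` (already-valid UTF-8), so it is valid UTF-8.
        unsafe { std::str::from_utf8_unchecked(&self.bytes) }
    }
}

fn main() {
    let mut buf = Utf8Buf::from_bytes(b"caf\xc3\xa9".to_vec()).unwrap();
    buf.push_str(" ok");
    assert_eq!(buf.as_str(), "café ok");
    // Invalid UTF-8 is rejected at the boundary instead of reaching `as_str`.
    assert!(Utf8Buf::from_bytes(vec![0xff]).is_err());
}
```

The soundness argument for the `unsafe` block now fits in one comment and can be checked by reading a single `impl`, which is the kind of reduction a rewrite could aim for.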

Personally I feel that a rewrite would be the way to go, doing things much more simply, not only with respect to unsafe code but also in functionality. For example, the whole generic "format" idea is neat but never turned out to be very useful. I made an attempt at https://github.com/servo/html5ever/tree/zbuf/zbuf but never pushed it over the finish line in terms of polish and integration into html5ever.

@SimonSapin
Member

Another approach worth considering is whether using https://crates.io/crates/bytes (with a Unicode wrapper, the same way that String wraps Vec<u8>) would be a good fit for html5ever instead of Tendril.
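A rough sketch of the "Unicode wrapper over `bytes`" idea, assuming the main things wanted from `bytes::Bytes` are cheap clones and O(1) sub-slicing that share one allocation. To stay dependency-free it uses a std `Arc<[u8]>` plus a byte range as a stand-in for `Bytes`; `SharedBytes` and `SharedStr` are hypothetical names, and this is not the real `bytes` API.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical stand-in for `bytes::Bytes`: a shared immutable buffer plus
/// a window into it. Cloning and slicing bump a refcount; no data is copied.
#[derive(Clone)]
struct SharedBytes {
    buf: Arc<[u8]>,
    range: Range<usize>,
}

impl SharedBytes {
    fn new(data: &[u8]) -> Self {
        SharedBytes { buf: Arc::from(data), range: 0..data.len() }
    }

    fn as_slice(&self) -> &[u8] {
        &self.buf[self.range.clone()]
    }
}

/// The "Unicode wrapper": adds a UTF-8 invariant on top of the byte buffer,
/// the same way `String` does on top of `Vec<u8>`.
#[derive(Clone)]
struct SharedStr(SharedBytes);

impl SharedStr {
    fn from_utf8(b: SharedBytes) -> Option<Self> {
        std::str::from_utf8(b.as_slice()).ok()?;
        Some(SharedStr(b))
    }

    fn as_str(&self) -> &str {
        // Re-validates to avoid `unsafe` in this sketch (see note below).
        std::str::from_utf8(self.0.as_slice()).expect("invariant: valid UTF-8")
    }

    /// O(1) substring sharing the same allocation. `str::get` returns None
    /// if the range is out of bounds or not on char boundaries.
    fn slice(&self, r: Range<usize>) -> Option<Self> {
        self.as_str().get(r.clone())?;
        let base = self.0.range.start;
        Some(SharedStr(SharedBytes {
            buf: Arc::clone(&self.0.buf),
            range: base + r.start..base + r.end,
        }))
    }
}

fn main() {
    let doc = SharedStr::from_utf8(SharedBytes::new("<p>héllo</p>".as_bytes())).unwrap();
    let inner = doc.slice(3..9).unwrap();
    assert_eq!(inner.as_str(), "héllo");
    assert!(Arc::ptr_eq(&doc.0.buf, &inner.0.buf)); // same allocation
    assert!(doc.slice(4..5).is_none()); // would split 'é'
}
```

`as_str` re-validates on every call only to keep the sketch free of `unsafe`; a real wrapper would validate once and then rely on its invariant, much as `String` does internally.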

@alecnotthompson
Author

Thanks for the response! Zbuf looks cool. Poked around a bit. Didn't think about bytes being a potential option. It might be useful to share a generic, widely used library like that.

Just looked through some of bytes's code and saw some places where it will panic if you don't do proper checks around your usage, which is normal.

Another library I've come across while looking at parsing stuff is https://crates.io/crates/untrusted. I'm not sure if it's exactly useful for html5ever because it explicitly says the following in its documentation:

Languages that require more lookahead and/or backtracking require some significant contortions to parse using this framework. It would not be realistic to use it for parsing programming language code, for example.

It just seems intriguing to me because it's used in the author's cryptography library and it is "branded" as "safe" and won't ever panic. The glaring difference in the key types I see is that untrusted::Input requires a lifetime which might not be as ergonomic as Tendril.
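For comparison, a minimal sketch of the never-panic, borrow-with-a-lifetime style described above. This is a hypothetical `Reader<'a>` written for illustration, not `untrusted`'s actual API: every read returns a `Result` instead of indexing directly, and `read_bytes` hands back a slice borrowed from the original input.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ReadError {
    EndOfInput,
}

/// Hypothetical reader (not `untrusted`'s API): borrows the input for 'a
/// and never panics; every operation is bounds-checked and returns Result.
struct Reader<'a> {
    input: &'a [u8],
    pos: usize,
}

impl<'a> Reader<'a> {
    fn new(input: &'a [u8]) -> Self {
        Reader { input, pos: 0 }
    }

    fn read_byte(&mut self) -> Result<u8, ReadError> {
        let b = *self.input.get(self.pos).ok_or(ReadError::EndOfInput)?;
        self.pos += 1;
        Ok(b)
    }

    /// Returns a sub-slice borrowed from the original input (no copy).
    fn read_bytes(&mut self, n: usize) -> Result<&'a [u8], ReadError> {
        let input: &'a [u8] = self.input;
        let end = self.pos.checked_add(n).ok_or(ReadError::EndOfInput)?;
        let out = input.get(self.pos..end).ok_or(ReadError::EndOfInput)?;
        self.pos = end;
        Ok(out)
    }
}

fn main() {
    // A length-prefixed field: 3, then three bytes.
    let data = [3u8, b'a', b'b', b'c'];
    let mut r = Reader::new(&data);
    let len = r.read_byte().unwrap() as usize;
    assert_eq!(r.read_bytes(len), Ok(&b"abc"[..]));
    assert_eq!(r.read_byte(), Err(ReadError::EndOfInput));
}
```

The lifetime is what makes this zero-copy, but it also means the whole input must stay alive, in one contiguous slice, for as long as any returned slice is in use.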

Feel free to close this issue if it's not useful/worth tracking in this repository.

@SimonSapin
Member

Right, one of the design goals of html5ever is to support incremental parsing (which enables incremental rendering in Servo) while an HTML document is still being downloaded. Contrast this, for example, with https://crates.io/crates/cssparser, which requires its input to be an entire stylesheet at once in contiguous memory and yields Tokens that borrow from it with a lifetime parameter.
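A small sketch of why incremental input pushes toward owned tokens. `IncrementalTokenizer` is hypothetical and far simpler than html5ever's tokenizer (it only splits on whitespace), but it shows the key constraint: a token may straddle two chunks, and the caller may drop a chunk as soon as `feed` returns, so tokens cannot borrow from it. The sketch copies into `String` for simplicity; Tendril's reference-counted buffers are meant to make such owned tokens cheap. `tokens_borrowed` shows the cssparser-style alternative, which needs the whole input up front.

```rust
/// Hypothetical incremental tokenizer (not html5ever's API): splits on
/// whitespace and accepts input in arbitrary chunks as it arrives.
struct IncrementalTokenizer {
    pending: String, // partial token carried over from earlier chunks
}

impl IncrementalTokenizer {
    fn new() -> Self {
        IncrementalTokenizer { pending: String::new() }
    }

    /// Returns owned tokens: they cannot borrow from `chunk`, which the
    /// caller is free to drop once this call returns.
    fn feed(&mut self, chunk: &str) -> Vec<String> {
        let mut out = Vec::new();
        for c in chunk.chars() {
            if c.is_whitespace() {
                if !self.pending.is_empty() {
                    out.push(std::mem::take(&mut self.pending));
                }
            } else {
                self.pending.push(c);
            }
        }
        out
    }

    /// Called at end of input to flush a final partial token.
    fn finish(&mut self) -> Option<String> {
        if self.pending.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.pending))
        }
    }
}

/// Contrast: borrowed tokens need the whole input, contiguous, up front.
fn tokens_borrowed(input: &str) -> Vec<&str> {
    input.split_whitespace().collect()
}

fn main() {
    let mut t = IncrementalTokenizer::new();
    let mut all = Vec::new();
    // "hello world" arrives split mid-token, as network reads might deliver it.
    for chunk in ["hel", "lo wo", "rld"] {
        all.extend(t.feed(chunk));
    }
    all.extend(t.finish());
    assert_eq!(all, vec!["hello", "world"]);
    assert_eq!(tokens_borrowed("hello world"), vec!["hello", "world"]);
}
```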
