HTML entities #75
Comments
Hm, is this safe to do? For all document encodings/lack of encoding?
For pages with
Then we can have it as a non-default option to see how it behaves in other cases
So I'll need some pointers here: which non-ASCII characters are valid (enough) that they do not need to be escaped in an HTML document? I am thinking of tackling this alongside all those weird and wonderful JS/CSS attributes which have escaped characters beyond quotation marks (which we already handle today). As for encoding, I think given the constraints of JavaScript we are probably better off just treating all input as Unicode. I haven't had to use the "encoding" section of my web browser for years now.
Must remain escaped: See https://mathiasbynens.be/notes/ambiguous-ampersands for some hardcore optimizations you could enable.
E.g. `&copy;` → `©` or `&#xA9;` → `©`.
You could use he’s `he.decode()` (combined with `he.escape()`, perhaps) for this. It implements the character reference decoding algorithm described in the HTML spec and supports the special “attribute value mode” as well.
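
To make the decode-then-escape idea concrete, here is a minimal, self-contained sketch. It does not use he and is not he's implementation: the function names (`decodeEntities`, `escapeText`, `minifyEntities`) and the tiny entity table are hypothetical, and it assumes a text context where only `&`, `<` and `>` are re-escaped. A real minifier would rely on a full decoder such as the one suggested above.

```javascript
// Illustrative subset only; the HTML spec defines over two thousand named references.
const NAMED = { copy: "\u00A9", amp: "&", lt: "<", gt: ">", quot: '"' };

// Decode &name;, &#169; and &#xA9; style references known to this sketch.
function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave NUL, surrogates and out-of-range code points untouched.
      if (code === 0 || code > 0x10ffff || (code >= 0xd800 && code <= 0xdfff)) {
        return match;
      }
      return String.fromCodePoint(code);
    }
    // Unknown names are left exactly as written.
    return Object.prototype.hasOwnProperty.call(NAMED, body) ? NAMED[body] : match;
  });
}

// Re-escape the characters assumed to be unsafe as raw text.
function escapeText(text) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;" };
  return text.replace(/[&<>]/g, (ch) => map[ch]);
}

// Decode everything first, then escape only what has to stay escaped.
function minifyEntities(text) {
  return escapeText(decodeEntities(text));
}

console.log(minifyEntities("&copy; &#xA9; &#169; a &lt; b &amp; c"));
// prints "© © © a &lt; b &amp; c"
```

Note that this sketch escapes every raw `&`, so `AT&T` grows to `AT&amp;T`. Leaving unambiguous ampersands alone is the kind of further optimization the ambiguous-ampersands note linked above describes.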