HTML entities #75

Closed
mathiasbynens opened this issue Jul 4, 2013 · 5 comments

Comments

@mathiasbynens
Collaborator

E.g. &copy; → © or &#00000000000000000000000000000169; → ©

You could use he’s he.decode() (combined with he.escape(), perhaps) for this. It implements the character reference decoding algorithm described in the HTML spec and supports the special “attribute value mode” as well.
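To make the idea concrete without pulling in a dependency, here is a rough, self-contained sketch of the numeric half of that decoding. It is not he's implementation and not the full spec algorithm: `decodeNumericReferences` is a hypothetical helper, it only handles references terminated by a semicolon, and it deliberately leaves anything below U+00A0 alone so that characters such as `<` and `&` stay escaped. It also assumes the output is saved as UTF-8.

```javascript
// Hypothetical helper, not part of he: decodes numeric character references
// for non-ASCII code points only, so markup-significant characters stay escaped.
function decodeNumericReferences(html) {
  return html.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    // parseInt ignores leading zeros, so "&#000169;" and "&#169;" are equivalent.
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);

    const isSurrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
    // Assumption: skip ASCII and C1 controls (< U+00A0), surrogates, and
    // out-of-range values instead of applying the spec's special cases.
    if (codePoint < 0xA0 || codePoint > 0x10FFFF || isSurrogate) {
      return match;
    }
    return String.fromCodePoint(codePoint);
  });
}

// decodeNumericReferences('&#00000000000000000000000000000169;') → '©'
// decodeNumericReferences('&#60;') → '&#60;' (left escaped on purpose)
```

Named references such as `&copy;` need the full entity table from the spec, which is the main reason to reach for a library here rather than a regular expression.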

@kangax
Owner

kangax commented Jul 4, 2013

Hm, is this safe to do? For all document encodings/lack of encoding?

@mathiasbynens
Collaborator Author

For pages with <meta charset=utf-8> or similar that are saved using UTF-8 encoding this would be safe.

@kangax
Owner

kangax commented Jul 4, 2013

Then we can have it as a non-default option to see how it behaves in other cases

@alexlamsl
Collaborator

So I'll need some pointers here - what are the valid (enough) non-ASCII characters that do not need to be escaped in an HTML document?

I am thinking of tackling this alongside all those weird and wonderful JS/CSS attributes which have escaped characters beyond quotation marks (which we already handle today).

As for encoding, I think given the constraints of JavaScript we are probably better off just treating all input as Unicode. I haven't had to use the "encoding" section of my web browser for years now.

@mathiasbynens
Collaborator Author

Must remain escaped: <, & in some cases (see below), " and ' in attribute values. And the backtick character too, if you care about old IE. > doesn’t need to be escaped unless it’s part of an unquoted attribute value.
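As a minimal sketch of those rules (conservative, and not what he.escape() or the minifier actually does): the hypothetical helpers below always escape `&` rather than attempting the ambiguous-ampersand optimization, leave `>` alone, and assume attribute values are always written quoted.

```javascript
// Hypothetical helpers illustrating the escaping rules above.
const TEXT_ESCAPES = { '&': '&amp;', '<': '&lt;' };
const ATTRIBUTE_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '"': '&quot;',
  "'": '&#x27;',
  '`': '&#x60;' // only matters for old IE
};

// Text content: only & and < are escaped; > is left as-is.
function escapeText(text) {
  return text.replace(/[&<]/g, (ch) => TEXT_ESCAPES[ch]);
}

// Quoted attribute values: additionally escape both quote styles and the backtick.
function escapeAttributeValue(value) {
  return value.replace(/[&<"'`]/g, (ch) => ATTRIBUTE_ESCAPES[ch]);
}

// escapeText('a < b && c > d') → 'a &lt; b &amp;&amp; c > d'
```

Unquoted attribute values would additionally need `>` and whitespace handled, which this sketch does not attempt.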

See https://mathiasbynens.be/notes/ambiguous-ampersands for some hardcore optimizations you could enable.
