doc.outerHTML is not properly (entity) encoding attribute values #641
Comments
I'm not sure; do browsers behave differently?
Not a bug. Whether a character was originally represented by an entity is intentionally not stored anywhere -- just like every other HTML/XML parser I've worked with. Here's what Chrome does:
See also #323. Is this actually causing problems? Do you need the reserialized output to be pure ASCII?
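To illustrate the point above, here is a minimal sketch of attribute-value escaping in the style of the HTML fragment serialization algorithm. It is not jsdom's actual source; the function name `escapeAttributeValue` is hypothetical, and the set of escaped characters is a simplification (an assumption, since spec revisions may escape more). It shows why a decoded "»" comes back out as a literal character rather than as an entity.

```javascript
// Hypothetical helper, not part of jsdom: escapes only the characters
// that must be escaped inside a double-quoted attribute value.
function escapeAttributeValue(value) {
  return value
    .replace(/&/g, '&amp;')       // ampersand first, so later output is not double-escaped
    .replace(/\u00A0/g, '&nbsp;') // non-breaking space
    .replace(/"/g, '&quot;');     // the attribute delimiter
}

// The DOM holds the decoded character. Whether the source said
// "&raquo;" or a literal "»" is no longer known at serialization time.
const decoded = 'Submit Code \u00BB';
console.log(escapeAttributeValue(decoded));
```

Under these assumptions "»" passes through unchanged, which matches the behaviour reported in this issue.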
Thanks @papandreou for testing this for us.
The issue is that the page doesn't declare its encoding, so presumably it is ASCII. Are we to assume that the serialized output will be UTF-8, and therefore override the document's encoding to be UTF-8? I would assume you'd want a way to preserve the original encoding, so you can re-serialize it as ASCII.
@thehesiod I believe the default encoding for documents without a |
(Actually, I think the |
I do actually have proof :) If you load http://www.w3schools.com/js/tryit.asp?filename=tryjs_write into jsdom, then serialize it back to the browser, you'll note that the "Submit Code »" button does not render correctly. However, if you do something like this: response.headers['content-type'] += '; charset=utf-8'; then it renders correctly. Thus, as I suspected, the default charset is not UTF-8, but probably ISO Latin-1.
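The header tweak above appends blindly, which could produce a duplicate parameter if the upstream response already declares a charset. A slightly more careful sketch follows; the helper name `withCharset` is hypothetical and not part of jsdom or any HTTP library.

```javascript
// Hypothetical helper: add a charset parameter to a Content-Type
// value only when one is not already present.
function withCharset(contentType, charset = 'utf-8') {
  if (/;\s*charset=/i.test(contentType)) {
    return contentType; // keep whatever the upstream response declared
  }
  return `${contentType}; charset=${charset}`;
}

console.log(withCharset('text/html'));
```

This only tells the browser how to decode the bytes; the serialized string still has to be written out in that same encoding.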
I think in any scenario there's an issue:
|
actually, here's the real proof. On that page, if you execute the following you'll get the charset the browser thinks it is: |
Right, but serializing something back to the browser isn't jsdom's job. jsdom's job is to emulate an HTML DOM. Re-serializing is apparently something you're trying to use jsdom for, yes, but it's not a built-in feature. A simple example would be trying to deserialize and re-serialize a badly-formed document like

If you try this in the console of your browser on that page: document.getElementById("submitBTN").outerHTML you'll get back exactly what jsdom gives, namely

I'm still not seeing anything that jsdom does differently from a browser. Can you give me a line of JavaScript I can run which produces different results in jsdom and in the browser?
Interesting, indeed outerHTML is decoded in the browser, but view source is encoded... I suppose it makes sense. We'll just do our fixups then. Thanks for the patience and time!
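One possible shape for such a fixup, as a rough sketch rather than anything jsdom provides: rewrite every non-ASCII character in the serialized markup as a numeric character reference, so the output is pure ASCII and renders the same whatever charset the browser assumes. The function name `toAsciiHtml` is hypothetical.

```javascript
// Hypothetical post-processing step applied to a serialized HTML string.
// Replaces each non-ASCII code point with a hexadecimal character reference.
function toAsciiHtml(html) {
  // The "u" flag makes the pattern match whole code points, so
  // characters outside the Basic Multilingual Plane are handled too.
  return html.replace(/[^\x00-\x7F]/gu, (ch) =>
    `&#x${ch.codePointAt(0).toString(16).toUpperCase()};`
  );
}

console.log(toAsciiHtml('<input type="button" value="Submit Code \u00BB">'));
```

Note the caveat: character references are not decoded inside script or style elements, so a blanket replacement like this is only safe when those elements contain ASCII.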
Hi,
I was trying to use jsdom on http://www.w3schools.com/js/tryit.asp?filename=tryjs_write, building a DOM out of the HTML, making minor changes, and then serializing the DOM.
However, the serialized HTML doesn't have its attribute values properly HTML-encoded. In particular, there's an attribute value 'Submit Code »' which gets HTML-decoded during DOM creation, but is not re-encoded during serialization.
Is this a bug?
Thanks, Sunil