MDN URL
https://developer.mozilla.org/en-US/docs/Web/API/Element/getAttribute
What specific section or headline is this issue about?
No response
What information was incorrect, unhelpful, or incomplete?
I only recently discovered that HTML character entities (such as `&lt;`) in a tag's attribute are decoded (e.g. back into `<`) when you read them with `.getAttribute()`. I searched MDN for insight into this behavior but couldn't find it mentioned anywhere, and this page seems like the best place to mention it.
This is worth knowing because assuming the entities are still encoded can lead to XSS vulnerabilities.
I ran into this in code where user-supplied data was escaped with PHP's `htmlentities()` and then written into an HTML element's data attribute. JavaScript later read the value from that attribute and inserted it into the page as the contents of a modal using `.innerHTML`. The developer believed that escaping the user-supplied content made this safe, but, unbeknownst to them, the value was already unescaped by the time JavaScript read it from the data attribute.
As far as I can tell from searching online, this decoding of character entities in attribute values is intended behavior, defined by the HTML parsing rules (though I couldn't find exactly where).
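The decoding happens when the HTML parser reads the markup, not inside `getAttribute()` itself: the tokenizer consumes character references in attribute values, so the DOM only ever stores the decoded string, and `getAttribute()` (like `element.dataset`) simply returns it. Below is a minimal sketch of the modal scenario described above, next to a safer variant. The function names, `trigger`, `modalBody` and the `.modal-body` selector are hypothetical; only the `data-mydata` attribute comes from this report.

```javascript
// Sketch of the modal pattern from this report. `trigger` is the element that
// carries data-mydata and `modalBody` is the modal's content area (both
// hypothetical; any Element works).

// Unsafe: by the time JavaScript sees the value, the HTML parser has already
// turned `&lt;b&gt;` into "<b>", so assigning it to innerHTML parses it as
// markup again. A payload such as <img src=x onerror="..."> would run its handler.
function showModalUnsafe(trigger, modalBody) {
  const value = trigger.getAttribute("data-mydata");
  modalBody.innerHTML = value;
}

// Safer: textContent inserts the string as plain text; nothing is parsed as HTML,
// so "<b>test</b>" is displayed literally instead of becoming a <b> element.
function showModalSafe(trigger, modalBody) {
  const value = trigger.getAttribute("data-mydata");
  modalBody.textContent = value;
}

// Browser usage (the ".modal-body" selector is hypothetical):
// const trigger = document.getElementById("myElement");
// const modalBody = document.querySelector(".modal-body");
// showModalSafe(trigger, modalBody);
```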
What did you expect to see?
A security note mentioning that `element.getAttribute()` returns the decoded form of any HTML entities in the attribute, so caution is needed when taking a data attribute's value and inserting it into the page as HTML. For example, if you have this HTML:
```html
<div id="myElement" data-mydata="&lt;b&gt;test&lt;/b&gt;"></div>
```
and you run this JavaScript:
```javascript
const elem = document.getElementById('myElement');
const mydata = elem.getAttribute('data-mydata');
```
you'll receive the string `"<b>test</b>"` rather than `"&lt;b&gt;test&lt;/b&gt;"`.
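If the value really has to be combined with markup, it must be escaped again at the point where it is written into HTML, because the server-side escaping was consumed by the parser. A minimal hand-rolled escaper for the five HTML-significant characters is sketched below; `escapeHtml` is a hypothetical helper name, and it behaves roughly like PHP's `htmlspecialchars()` with `ENT_QUOTES` rather than `htmlentities()`, which also converts many other characters.

```javascript
// Hypothetical helper: re-escapes a decoded attribute value before it is
// concatenated into HTML. Handles the five characters that are significant in
// HTML text and quoted attribute values. `&` is replaced first so the entities
// produced by the later replacements are not escaped a second time.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}

// Browser usage with the markup from the example above:
// const mydata = document.getElementById("myElement").getAttribute("data-mydata");
// mydata is "<b>test</b>" (already decoded by the parser);
// escapeHtml(mydata) is "&lt;b&gt;test&lt;/b&gt;", safe to place inside HTML:
// modalBody.innerHTML = "<p>" + escapeHtml(mydata) + "</p>";
```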
Do you have any supporting links, references, or citations?
No response
Do you have anything more you want to share?
No response
MDN metadata
Page report details
en-us/web/api/element/getattribute