String#unescapeHTML() decodes entities after stripping tags, reintroducing markup #371

`String#unescapeHTML()` calls `stripTags()` first and then decodes entities. Entity-encoded markup such as `&lt;img ...&gt;` contains no literal `<` or `>`, so `stripTags()` does not treat it as a tag and leaves it untouched; the decode step then turns it into live markup. Any caller that assumes the return value is tag-free is wrong.

Current implementation (around line 439 of `src/prototype/lang/string.js`):

```js
function unescapeHTML() {
  return this.stripTags().replace(/&lt;/g,'<').replace(/&gt;/g,'>').replace(/&amp;/g,'&');
}
```

Reproduction:

```js
'&lt;img src=x onerror=alert(1)&gt;'.unescapeHTML();
// stripTags() leaves the entity text alone (there is no real tag yet),
// then the decode step produces a live tag:
// => '<img src=x onerror=alert(1)>'
```

If a developer relies on `unescapeHTML()` to produce safe, tag-free text and then inserts the result into the page as HTML, the decode step has already reintroduced executable markup. That is a direct path to XSS.

Suggested fix: decode entities first and then strip, so that any markup produced by decoding still passes through `stripTags()`, or use a single normalization pass that does not leave decoded markup behind. Regardless of the ordering, the documentation should state that the result is not safe to insert into the DOM as HTML.
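
To make the ordering difference concrete, here is a minimal standalone sketch that compares the two orders. It does not touch `String.prototype`; the function names (`stripTagsSketch`, `decodeEntitiesSketch`, `unescapeStripFirst`, `unescapeDecodeFirst`) are hypothetical, and the tag-stripping regex is a simplified stand-in assumed for illustration, not Prototype's actual `stripTags()` pattern.

```javascript
// Hypothetical helpers for illustration only; not part of Prototype's API.

// Simplified stand-in for stripTags(): removes anything shaped like <...>.
// Assumption: Prototype's real pattern is stricter than this one.
function stripTagsSketch(str) {
  return str.replace(/<\/?[^>]+>/g, '');
}

// Same three replacements, in the same order, as the current implementation.
// Decoding &amp; last avoids double-decoding input such as '&amp;lt;'.
function decodeEntitiesSketch(str) {
  return str.replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&amp;/g, '&');
}

// Current order: strip, then decode. Encoded markup survives the strip.
function unescapeStripFirst(str) {
  return decodeEntitiesSketch(stripTagsSketch(str));
}

// Proposed order: decode, then strip. Decoded markup is removed.
function unescapeDecodeFirst(str) {
  return stripTagsSketch(decodeEntitiesSketch(str));
}

console.log(unescapeStripFirst('&lt;img src=x onerror=alert(1)&gt;'));
// => '<img src=x onerror=alert(1)>'
console.log(unescapeDecodeFirst('&lt;img src=x onerror=alert(1)&gt;'));
// => ''
```

One trade-off of the decode-first order is that text which was escaped on purpose (for example a literal `&lt;b&gt;` meant to be displayed) is stripped as well, so this changes behavior for existing callers. A regex-based strip is also not a full HTML sanitizer, so the output should still be inserted as text rather than as HTML.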

Refs: CWE-79, CWE-116, OWASP ASVS V5.3.3.

