html_safe encoding bug #151

hyperknot · 2022-03-09T10:27:28Z

The html_safe encoding doesn't always work. For example:

<script> -> \u003Cscript>
</script> -> \u003C\/script>

The browser parses this as: \x3Cscript> or \x3C/script>

My reference implementation is htmlsafe_json_dumps from Python's Jinja2:
https://github.com/pallets/jinja/blob/4bbb1fb5fe5ec141d302c5baff95165887fb7338/src/jinja2/utils.py#L626

    return markupsafe.Markup(
        dumps(obj, **kwargs)
        .replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
        .replace("'", "\\u0027")
    )

The Python implementation encodes:
<script> -> \u003cscript\u003e
</script> -> \u003c/script\u003e

The browser correctly parses these.
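For anyone who wants to try the Jinja2 approach without installing Jinja2, here is a minimal standalone sketch. It assumes the standard-library json.dumps as the serializer and skips the markupsafe.Markup wrapper, so it returns a plain str; it is an approximation of the excerpt above, not Jinja2's actual code.

```python
import json


def htmlsafe_json_dumps(obj, **kwargs):
    """Serialize obj to JSON, then escape characters that are risky inside HTML.

    Standalone approximation of the Jinja2 excerpt quoted above: it uses the
    stdlib json.dumps and returns a plain str instead of markupsafe.Markup.
    """
    return (
        json.dumps(obj, **kwargs)
        .replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
        .replace("'", "\\u0027")
    )


encoded = htmlsafe_json_dumps("</script>")
print(encoded)

# The escaped form is still valid JSON and decodes back to the original string.
assert json.loads(encoded) == "</script>"
```

The replacements run on the already serialized text, so they only ever touch characters inside JSON strings: none of the four characters can appear in JSON structure outside a string literal.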

It might be as simple as a lowercase vs. uppercase C, but the implementation looks quite complex, so I couldn't figure out the bug. I like the simplicity of the Python implementation: it's just four string replacements.

michalmuskala · 2022-09-12T19:41:23Z

Sorry for the late reply.

In general, in JavaScript syntax "\u003C" and "\x3C" represent the exact same string (this is not true of JSON where only the \u forms are supported).
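A quick way to check the JSON side of this is with Python's standard json module, used here only as a convenient strict JSON parser (it is not the library this issue is about). The helper name is_valid_json is made up for the illustration.

```python
import json

# Hex digits in a \u escape are case-insensitive, so both spellings decode
# to the same string.
upper = json.loads('"\\u003Cscript>"')
lower = json.loads('"\\u003cscript>"')
assert upper == lower == "<script>"

# "\/" is a valid JSON escape for "/", so the mixed form from the report
# also round-trips.
assert json.loads('"\\u003C\\/script>"') == "</script>"


def is_valid_json(text):
    """Return True if text parses as JSON (hypothetical helper for this demo)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# The \x form is JavaScript-only; a strict JSON parser rejects it.
assert not is_valid_json('"\\x3Cscript>"')
```

If the case of the hex digit made a difference, the first assertion would fail, so a difference in how the browser handles the two outputs would have to come from somewhere else, such as the unescaped ">".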

I can't really reproduce the issue. If you have a way for me to reproduce, feel free to reopen.

michalmuskala closed this as completed Sep 12, 2022
