html_safe encoding bug #151

Closed
hyperknot opened this issue Mar 9, 2022 · 1 comment

@hyperknot
hyperknot commented Mar 9, 2022

The html_safe encoding doesn't always work. For example:

<script> -> \u003Cscript>
</script> -> \u003C\/script>

The browser parses this as: \x3Cscript> or \x3C/script>

My reference implementation is htmlsafe_json_dumps from Python's Jinja2:
https://github.com/pallets/jinja/blob/4bbb1fb5fe5ec141d302c5baff95165887fb7338/src/jinja2/utils.py#L626

    return markupsafe.Markup(
        dumps(obj, **kwargs)
        .replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
        .replace("'", "\\u0027")
    )

The Python implementation encodes:
<script> -> \u003cscript\u003e
</script> -> \u003c/script\u003e

The browser correctly parses these.
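To make the comparison concrete, here is a minimal sketch of the same four replacements using only the standard library. The function name htmlsafe_dumps_sketch is hypothetical, and it returns a plain str instead of markupsafe.Markup, so it is an illustration of the approach quoted above rather than Jinja2's actual function.

```python
import json


def htmlsafe_dumps_sketch(obj, **kwargs):
    """Hypothetical stand-in for the Jinja2 snippet quoted above.

    Serializes obj with json.dumps, then replaces the four characters
    that matter inside HTML with lowercase \\uXXXX escapes.
    Returns a plain str (the original wraps the result in markupsafe.Markup).
    """
    return (
        json.dumps(obj, **kwargs)
        .replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
        .replace("'", "\\u0027")
    )


open_tag = htmlsafe_dumps_sketch("<script>")
close_tag = htmlsafe_dumps_sketch("</script>")
print(open_tag)   # the JSON string literal, including its surrounding quotes
print(close_tag)

# The escaped output is still valid JSON and decodes to the original text.
assert json.loads(open_tag) == "<script>"
assert json.loads(close_tag) == "</script>"
```

Because the replacements run on the already-serialized text, they only ever touch characters inside JSON string literals, which is why four plain string replacements are enough here.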

It might be as simple as a lowercase vs. uppercase C, but the implementation looks quite complex, so I couldn't track down the bug. I like the simplicity of the Python implementation: it's just four string replacements.

@michalmuskala
Owner

I'm sorry for the late reply.

In general, in JavaScript syntax, "\u003C" and "\x3C" represent the exact same string (this is not true of JSON, where only the \u forms are supported).
[Image: Screenshot 2022-09-12 at 21 40 37]
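The JSON side of this can be checked with a short sketch. It uses Python's json module as a stand-in for a strict JSON parser (an assumption; it is not a browser and does not reproduce the reported behavior), and the helper is_valid_json is hypothetical. It shows that the uppercase and lowercase \u forms decode to the same string, that the \u003C\/script> output from the report decodes to </script>, and that the \x form is rejected as JSON.

```python
import json

# Uppercase and lowercase hex digits in a \u escape decode identically.
upper = json.loads('"\\u003Cscript>"')
lower = json.loads('"\\u003cscript>"')
assert upper == lower == "<script>"

# The output quoted in the report, \u003C\/script>, is valid JSON:
# "\/" is a permitted escape for "/".
closing = json.loads('"\\u003C\\/script>"')
assert closing == "</script>"


def is_valid_json(text):
    """Hypothetical helper: True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# The \x3C form is JavaScript string syntax only; JSON has no \x escape.
x_form_ok = is_valid_json('"\\x3Cscript>"')
print(upper, closing, x_form_ok)
```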

I can't really reproduce the issue. If you have a way for me to reproduce it, feel free to reopen.
