Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safe JSON #4196

Merged
merged 4 commits into from
May 10, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
3 changes: 2 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,8 @@ This project adheres to [Semantic Versioning](http://semver.org/).

### Fixed
- Fixed another compatibility issue with Pandas 2.0, just affecting `px.*(line_close=True)` [[#4190](https://github.com/plotly/plotly.py/pull/4190)]
- Added some rounding to the `make_subplots` function to handle situations where the user-input specs cause the domain to exceed 1 by small amounts https://github.com/plotly/plotly.py/pull/4153
- Added some rounding to the `make_subplots` function to handle situations where the user-input specs cause the domain to exceed 1 by small amounts [[#4153](https://github.com/plotly/plotly.py/pull/4153)]
- Sanitize JSON output to prevent an XSS vector when graphs are inserted directly into HTML [[#4196](https://github.com/plotly/plotly.py/pull/4196)]

## [5.14.1] - 2023-04-05

Expand Down
29 changes: 26 additions & 3 deletions packages/python/plotly/plotly/io/_json.py
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,25 @@ def coerce_to_strict(const):
return const


_swap_json = (
("<", "\\u003c"),
(">", "\\u003e"),
("/", "\\u002f"),
)
_swap_orjson = _swap_json + (
("\u2028", "\\u2028"),
("\u2029", "\\u2029"),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are apparently JavaScript line terminator characters. The JS parser will see this end-of-line, see that the data structure is incomplete, and throw an error. AFAICT this isn't in itself a security issue, but it's a bug - if you ever wanted these characters in your figure for real it wouldn't work (when inserted in HTML).

The standard library json converts at least these unicode characters into escape sequences anyway, so we don't need to worry about them, but orjson sends them as unicode characters.

)


def _safe(json_str, _swap):
out = json_str
for unsafe_char, safe_char in _swap:
if unsafe_char in out:
out = out.replace(unsafe_char, safe_char)
Comment on lines +74 to +75
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing first whether the character is present in the string is substantially faster than .replace when it finds even a single instance of the character - so given that much of the time you won't have some of these characters it's worthwhile including the if first. And looping over each character in turn is a lot faster than any solution I could find that only searches the string once, ie a regexp.

return out


def to_json_plotly(plotly_object, pretty=False, engine=None):
"""
Convert a plotly/Dash object to a JSON string representation
Expand Down Expand Up @@ -120,7 +139,9 @@ def to_json_plotly(plotly_object, pretty=False, engine=None):

from _plotly_utils.utils import PlotlyJSONEncoder

return json.dumps(plotly_object, cls=PlotlyJSONEncoder, **opts)
return _safe(
json.dumps(plotly_object, cls=PlotlyJSONEncoder, **opts), _swap_json
)
elif engine == "orjson":
JsonConfig.validate_orjson()
opts = orjson.OPT_NON_STR_KEYS | orjson.OPT_SERIALIZE_NUMPY
Expand All @@ -136,7 +157,9 @@ def to_json_plotly(plotly_object, pretty=False, engine=None):

# Try without cleaning
try:
return orjson.dumps(plotly_object, option=opts).decode("utf8")
return _safe(
orjson.dumps(plotly_object, option=opts).decode("utf8"), _swap_orjson
)
except TypeError:
pass

Expand All @@ -146,7 +169,7 @@ def to_json_plotly(plotly_object, pretty=False, engine=None):
datetime_allowed=True,
modules=modules,
)
return orjson.dumps(cleaned, option=opts).decode("utf8")
return _safe(orjson.dumps(cleaned, option=opts).decode("utf8"), _swap_orjson)


def to_json(fig, validate=True, pretty=False, remove_uids=True, engine=None):
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
import pandas as pd
import json
import datetime
import re
import sys
from pytz import timezone
from _plotly_utils.optional_imports import get_module
Expand Down Expand Up @@ -201,6 +202,14 @@ def to_str(v):

array_str = to_json_test(dt_values)
expected = build_test_dict_string(array_str)
if orjson:
# orjson always serializes datetime64 to ns, but json will return either
# full seconds or microseconds, if the rest is zeros.
# we don't care about any trailing zeros
trailing_zeros = re.compile(r'[.]?0+"')
result = trailing_zeros.sub('"', result)
expected = trailing_zeros.sub('"', expected)

assert result == expected
check_roundtrip(result, engine=engine, pretty=pretty)

Expand All @@ -221,3 +230,25 @@ def test_mixed_string_nonstring_key(engine, pretty):
value = build_test_dict({0: 1, "a": 2})
result = pio.to_json_plotly(value, engine=engine)
check_roundtrip(result, engine=engine, pretty=pretty)


def test_sanitize_json(engine):
layout = {"title": {"text": "</script>\u2028\u2029"}}
fig = go.Figure(layout=layout)
fig_json = pio.to_json_plotly(fig, engine=engine)
layout_2 = json.loads(fig_json)["layout"]
del layout_2["template"]

assert layout == layout_2

replacements = {
"<": "\\u003c",
">": "\\u003e",
"/": "\\u002f",
"\u2028": "\\u2028",
"\u2029": "\\u2029",
}

for bad, good in replacements.items():
assert bad not in fig_json
assert good in fig_json
Original file line number Diff line number Diff line change
Expand Up @@ -19,3 +19,4 @@ matplotlib==2.2.3
scikit-image==0.14.4
psutil==5.7.0
kaleido
orjson==3.8.12
Original file line number Diff line number Diff line change
Expand Up @@ -19,3 +19,4 @@ matplotlib==2.2.3
scikit-image==0.18.1
psutil==5.7.0
kaleido
orjson==3.8.12
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added orjson only to py3.7 and py3.9 optional tests, so that we'd run a bunch of the other tests using json. Recent orjson doesn't support py3.6 anyway.