Safe JSON #4196

alexcjohnson · 2023-05-09T02:27:21Z

Fixing a possible XSS issue when our JSON is inserted into an HTML string. We could apply this just to fig.to_html, but to be even safer I'm applying it to all JSON serialization, in case users use a different mechanism to insert this into HTML.
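To illustrate the failure mode, here is a minimal sketch (not the code from this PR). The `embed` and `make_safe` helpers and the three-entry `SWAP` table are hypothetical names chosen for illustration. A figure title containing `</script>` closes the wrapping tag early unless the JSON is escaped first.

```python
import json

# Hypothetical escape table for illustration; the PR's actual table may differ.
SWAP = (("<", "\\u003c"), (">", "\\u003e"), ("&", "\\u0026"))


def embed(json_str):
    """Naively place a JSON string inside an inline script tag."""
    return "<script>var fig = " + json_str + ";</script>"


def make_safe(json_str):
    """Replace HTML-significant characters with JSON unicode escapes."""
    for unsafe_char, safe_char in SWAP:
        if unsafe_char in json_str:
            json_str = json_str.replace(unsafe_char, safe_char)
    return json_str


fig = {"layout": {"title": "</script><script>alert(1)</script>"}}
raw = json.dumps(fig)
safe = make_safe(raw)

unsafe_html = embed(raw)  # the data contributes extra closing tags
safe_html = embed(safe)   # only the wrapper's own closing tag remains

# The escaped string is still valid JSON and decodes to the same object.
assert json.loads(safe) == fig
```

Because `\u003c` and similar sequences are ordinary JSON string escapes, the browser's JSON or JavaScript parser restores the original characters, while the HTML parser never sees a literal `<`.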

Performance is a concern here, but in my testing, except in really pathological situations, these substitutions are a good deal faster than the serialization itself, even with orjson.

I have read through the contributing notes and understand the structure of the package. In particular, if my PR modifies code of plotly.graph_objects, my modifications concern the codegen files and not generated files.
I have added tests (if submitting a new feature or correcting a bug) or modified existing tests.
I have added a CHANGELOG entry if fixing/changing/adding anything substantial.

for insertion in HTML, to avoid XSS

alexcjohnson · 2023-05-09T02:36:19Z

packages/python/plotly/plotly/io/_json.py

+)
+_swap_orjson = _swap_json + (
+    ("\u2028", "\\u2028"),
+    ("\u2029", "\\u2029"),


These are apparently JavaScript line terminator characters (U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR). The JS parser will see this end-of-line, see that the data structure is incomplete, and throw an error. AFAICT this isn't in itself a security issue, but it's a bug: if you ever genuinely wanted these characters in your figure, it wouldn't work once inserted in HTML.

The standard library json converts at least these unicode characters into escape sequences anyway, so we don't need to worry about them, but orjson sends them as unicode characters.
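The difference can be sketched with the standard library alone. This is an illustration under an assumption: `ensure_ascii=False` is used only as a stand-in for a serializer that emits raw unicode, and orjson itself is not called.

```python
import json

text = "line one\u2028line two\u2029end"

# Default json.dumps uses ensure_ascii=True, so non-ASCII characters
# (including U+2028 and U+2029) come out as \uXXXX escapes.
escaped = json.dumps(text)

# ensure_ascii=False stands in for a serializer that emits raw unicode;
# it is only a proxy here, not orjson itself.
raw = json.dumps(text, ensure_ascii=False)

# Patch the raw output with the two substitutions shown in the diff.
patched = raw.replace("\u2028", "\\u2028").replace("\u2029", "\\u2029")

# The patched string is valid JSON and decodes to the original text.
assert json.loads(patched) == text
```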

alexcjohnson · 2023-05-09T02:41:24Z

packages/python/plotly/plotly/io/_json.py

+        if unsafe_char in out:
+            out = out.replace(unsafe_char, safe_char)


Testing first whether the character is present in the string is substantially faster than .replace when it finds even a single instance of the character, so given that much of the time some of these characters won't be present, it's worthwhile including the if first. Looping over each character in turn is also a lot faster than any solution I could find that searches the string only once, i.e. a regexp.
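For anyone who wants to reproduce the comparison, the two approaches could be set side by side roughly as follows. This is a sketch, not the benchmark used for the PR: the `SWAP` table, function names, and sample string are all hypothetical, and timings will vary with the input and the machine.

```python
import re
import timeit

# Hypothetical escape table for illustration only.
SWAP = (("<", "\\u003c"), (">", "\\u003e"), ("&", "\\u0026"))


def escape_loop(s):
    """One scan per character, skipping .replace when the character is absent."""
    for unsafe_char, safe_char in SWAP:
        if unsafe_char in s:
            s = s.replace(unsafe_char, safe_char)
    return s


_TABLE = dict(SWAP)
_PATTERN = re.compile("[<>&]")


def escape_regex(s):
    """Single scan over the string using a regular expression."""
    return _PATTERN.sub(lambda m: _TABLE[m.group(0)], s)


# Made-up sample; real figure JSON will have a different mix of characters.
sample = '{"x": [1, 2, 3], "name": "a < b & c > d"}' * 1000

if __name__ == "__main__":
    for fn in (escape_loop, escape_regex):
        seconds = timeit.timeit(lambda: fn(sample), number=100)
        print(fn.__name__, round(seconds, 4))
```

Both functions produce the same output, so the choice between them is purely about speed on representative inputs.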

alexcjohnson · 2023-05-09T02:44:28Z

packages/python/plotly/test_requirements/requirements_39_optional.txt

@@ -19,3 +19,4 @@ matplotlib==2.2.3
 scikit-image==0.18.1
 psutil==5.7.0
 kaleido
+orjson==3.8.12


I added orjson only to py3.7 and py3.9 optional tests, so that we'd run a bunch of the other tests using json. Recent orjson doesn't support py3.6 anyway.

LiamConnors

💃

alexcjohnson added 3 commits May 8, 2023 18:48

escape unsafe chars in JSON

9543afe

for insertion in HTML, to avoid XSS

test for json sanitization

115d166

adjust old test for orjson datetime64 fixed precision

fabd54d

alexcjohnson commented May 9, 2023


changelog for JSON sanitizer

b955f35

LiamConnors self-requested a review May 9, 2023 22:04

LiamConnors approved these changes May 9, 2023


alexcjohnson merged commit fc3ef00 into master May 10, 2023
5 checks passed

alexcjohnson deleted the safe-json branch May 10, 2023 13:08
