Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safe JSON #4196

Merged
merged 4 commits into from May 10, 2023
Merged

Safe JSON #4196

merged 4 commits into from May 10, 2023

Conversation

alexcjohnson
Copy link
Contributor

@alexcjohnson alexcjohnson commented May 9, 2023

Fixing a possible XSS issue when our JSON is inserted into an HTML string. We could apply this just to fig.to_html, but to be even safer I'm applying it to all JSON serialization, in case users use a different mechanism to insert this into HTML.

Performance is a concern here, but in my testing except in really pathological situations these substitutions are a good deal faster than serialization even by orjson.

  • I have read through the contributing notes and understand the structure of the package. In particular, if my PR modifies code of plotly.graph_objects, my modifications concern the codegen files and not generated files.
  • I have added tests (if submitting a new feature or correcting a bug) or
    modified existing tests.
  • I have added a CHANGELOG entry if fixing/changing/adding anything substantial.

)
_swap_orjson = _swap_json + (
("\u2028", "\\u2028"),
("\u2029", "\\u2029"),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are apparently JavaScript line terminator characters. The JS parser will see this end-of-line, see that the data structure is incomplete, and throw an error. AFAICT this isn't in itself a security issue, but it's a bug - if you ever wanted these characters in your figure for real it wouldn't work (when inserted in HTML).

The standard library json converts at least these unicode characters into escape sequences anyway, so we don't need to worry about them, but orjson sends them as unicode characters.

Comment on lines +74 to +75
if unsafe_char in out:
out = out.replace(unsafe_char, safe_char)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing first whether the character is present in the string is substantially faster than .replace when it finds even a single instance of the character - so given that much of the time you won't have some of these characters it's worthwhile including the if first. And looping over each character in turn is a lot faster than any solution I could find that only searches the string once, ie a regexp.

@@ -19,3 +19,4 @@ matplotlib==2.2.3
scikit-image==0.18.1
psutil==5.7.0
kaleido
orjson==3.8.12
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added orjson only to py3.7 and py3.9 optional tests, so that we'd run a bunch of the other tests using json. Recent orjson doesn't support py3.6 anyway.

@LiamConnors LiamConnors self-requested a review May 9, 2023 22:04
Copy link
Contributor

@LiamConnors LiamConnors left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💃

@alexcjohnson alexcjohnson merged commit fc3ef00 into master May 10, 2023
5 checks passed
@alexcjohnson alexcjohnson deleted the safe-json branch May 10, 2023 13:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants