Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow safe inline CSS styles in Markdown #5012

Merged
merged 12 commits into from Aug 29, 2018

Conversation

@biermeester
Copy link
Member

@biermeester biermeester commented Jul 31, 2018

Fixes #1812 and #5183

This pull request makes it possible to add style attributes to HTML tags embedded in Markdown cells.

The CSS properties and their values inside the style attribute will be sanitized: if they are not allowed, or malformed, they will get stripped silently.

This update also allows for the embedding of encoded image data inside the src attribute of image tags.

Most CSS properties and their values are supported. The missing (safe) ones will be added in the near future.

There's bound to be some errors in the (great number of) regular expressions used to sanitize the CSS values. Don't hesitate to open an issue and mention me if some CSS you think is valid is not working as expected.

@blink1073
Copy link
Member

@blink1073 blink1073 commented Jul 31, 2018

This looks great, thanks! Can we also add line-height, text-align, and color? For reference, here are the allowed attributes in caja.

@blink1073
Copy link
Member

@blink1073 blink1073 commented Jul 31, 2018

Also rowspan and colspan as previously requested.

@biermeester
Copy link
Member Author

@biermeester biermeester commented Aug 1, 2018

Thanks, I'll see if I can rework the caja configuration.

@biermeester biermeester force-pushed the biermeester:sanitize_html_augmentation branch from 661b86e to 47beb4d Aug 2, 2018
@biermeester
Copy link
Member Author

@biermeester biermeester commented Aug 4, 2018

I've been looking at caja's allowed css properties, but to convert all those to regular expressions for sanitize is quite a bit of work. Are there any css properties that we definitely don't want people to use?

@blink1073
Copy link
Member

@blink1073 blink1073 commented Aug 4, 2018

@biermeester, I would suggest either writing a script to scrape the caja fields or stick with what we have and whitelist more as they are requested and are individually evaluated.

@blink1073
Copy link
Member

@blink1073 blink1073 commented Aug 6, 2018

Looks like the rebase caught a bunch of other commits.

@biermeester
Copy link
Member Author

@biermeester biermeester commented Aug 6, 2018

Oops, will do a little cleanup today.

@biermeester biermeester force-pushed the biermeester:sanitize_html_augmentation branch 2 times, most recently from 47beb4d to 352cf73 Aug 6, 2018
@blink1073
Copy link
Member

@blink1073 blink1073 commented Aug 7, 2018

Bravo, @biermeester, thank you! Can you share the script(s) you used for extraction as a gist perhaps for posterity in case we need to do this again for HTML 6.0 😉?

@biermeester
Copy link
Member Author

@biermeester biermeester commented Aug 7, 2018

Extracting the tags and attributes was pretty trivial. I'm still working on converting the CSS property value syntax into Regular Expressions, which is a lot less trivial :)

I'll make sure to share the code snippets that I used to help extract/transform the data when I'm done.

@blink1073
Copy link
Member

@blink1073 blink1073 commented Aug 7, 2018

Great, thank you!

@ellisonbg
Copy link
Contributor

@ellisonbg ellisonbg commented Aug 12, 2018

I haven't followed the PR closely - the only thing I want us to make sure of is that we are not allowing any tags or attribute/styles that have security vulnerabilities. @biermeester can you comment on that aspect of this PR?

@biermeester
Copy link
Member Author

@biermeester biermeester commented Aug 12, 2018

I've been working off of Google Caja to make sure css properties (or property values) that are deemed unsafe/disruptive will get filtered out by sanitize-html.

There's only a few of such properties/values, and since sanitize-html requires me to white-list everything that is allowed, I've been working on quite some regular expressions to do so.

I'm not even sure that the end result will be workable, performance wise. Some testing will need to be done.

@blink1073 blink1073 modified the milestones: 0.34, 0.35 Aug 13, 2018
@biermeester biermeester force-pushed the biermeester:sanitize_html_augmentation branch from 352cf73 to 4f9d0a4 Aug 13, 2018
@biermeester biermeester changed the title Allow certain inline CSS styles in Markdown Allow safe inline CSS styles in Markdown Aug 13, 2018
@biermeester biermeester force-pushed the biermeester:sanitize_html_augmentation branch from 4f9d0a4 to a84ffd9 Aug 13, 2018
@biermeester
Copy link
Member Author

@biermeester biermeester commented Aug 13, 2018

I've added CSS value validation for most of the allowed CSS properties. I've removed some properties for now (e.g. animation properties) and simplified others, but I will probably add/extend those in the near future.

I was a bit worried about how the performance would be, but until now I don't see any ill effects.

I used the following script to extract information from Caja's Json definitions: https://gist.github.com/biermeester/9437f13d3735e65ba013c947ea59021f

A demo notebook can be found here: https://gist.github.com/biermeester/83802c33fb8483d25bcff6b59ce1353d

Next up is going to a be a test case extension, that will make sure that harmful CSS gets filtered out.

@blink1073
Copy link
Member

@blink1073 blink1073 commented Aug 13, 2018

I think a test in test-rendermime that reads from your example notebook and verifies stripped properties would work nicely.

@blink1073
Copy link
Member

@blink1073 blink1073 commented Aug 13, 2018

Or at the sanitizer test level, whichever is easier.

@biermeester biermeester force-pushed the biermeester:sanitize_html_augmentation branch from 3e7b006 to 10d33ad Aug 13, 2018
@blink1073
Copy link
Member

@blink1073 blink1073 commented Aug 14, 2018

This looks like a legitimate failure:

lerna ERR! ERROR in bundle.js from UglifyJs
lerna ERR! Legacy octal escape sequences are not allowed in strict mode [/home/travis/build/jupyterlab/jupyterlab/packages/apputils/lib/sanitizer.js:41,0][bundle.js:105459,22]
@biermeester
Copy link
Member Author

@biermeester biermeester commented Aug 14, 2018

Weird, local tests seemed to run fine. Thanks for the heads-up, I'll have a look.

@biermeester biermeester force-pushed the biermeester:sanitize_html_augmentation branch from 924f7e5 to 570bdbc Aug 21, 2018
@biermeester biermeester force-pushed the biermeester:sanitize_html_augmentation branch from 570bdbc to 631e365 Aug 28, 2018
@biermeester biermeester force-pushed the biermeester:sanitize_html_augmentation branch from 631e365 to 9eaf35f Aug 28, 2018
@blink1073
Copy link
Member

@blink1073 blink1073 commented Aug 29, 2018

@biermeester, this is great! Do you mind updating the top level comment for posterity?

@biermeester
Copy link
Member Author

@biermeester biermeester commented Aug 29, 2018

Done. Is there also some piece of the documentation you would like me to update?

@blink1073
Copy link
Member

@blink1073 blink1073 commented Aug 29, 2018

Nope, this is perfect, thank you!

@blink1073 blink1073 merged commit 649fc5d into jupyterlab:master Aug 29, 2018
2 checks passed
2 checks passed
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@biermeester biermeester deleted the biermeester:sanitize_html_augmentation branch Aug 29, 2018
@blink1073 blink1073 modified the milestones: 0.35, 1.0 Sep 5, 2018
@jasongrout jasongrout modified the milestones: 1.0, 0.35 Oct 1, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Aug 8, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

4 participants
You can’t perform that action at this time.