New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow safe inline CSS styles in Markdown #5012

Merged
merged 12 commits into from Aug 29, 2018

Conversation

Projects
None yet
3 participants
@biermeester
Member

biermeester commented Jul 31, 2018

Fixes #1812 and #5183

This pull request makes it possible to add style attributes to HTML tags embedded in Markdown cells.

The CSS properties and their values inside the style attribute will be sanitized: if they are not allowed, or malformed, they will get stripped silently.

This update also allows for the embedding of encoded image data inside the src attribute of image tags.

Most CSS properties and their values are supported. The missing (safe) ones will be added in the near future.

There's bound to be some errors in the (great number of) regular expressions used to sanitize the CSS values. Don't hesitate to open an issue and mention me if some CSS you think is valid is not working as expected.

@blink1073

This comment has been minimized.

Show comment
Hide comment
@blink1073

blink1073 Jul 31, 2018

Member

This looks great, thanks! Can we also add line-height, text-align, and color? For reference, here are the allowed attributes in caja.

Member

blink1073 commented Jul 31, 2018

This looks great, thanks! Can we also add line-height, text-align, and color? For reference, here are the allowed attributes in caja.

@blink1073

This comment has been minimized.

Show comment
Hide comment
@blink1073

blink1073 Jul 31, 2018

Member

Also rowspan and colspan as previously requested.

Member

blink1073 commented Jul 31, 2018

Also rowspan and colspan as previously requested.

@biermeester

This comment has been minimized.

Show comment
Hide comment
@biermeester

biermeester Aug 1, 2018

Member

Thanks, I'll see if I can rework the caja configuration.

Member

biermeester commented Aug 1, 2018

Thanks, I'll see if I can rework the caja configuration.

@biermeester

This comment has been minimized.

Show comment
Hide comment
@biermeester

biermeester Aug 4, 2018

Member

I've been looking at caja's allowed css properties, but to convert all those to regular expressions for sanitize is quite a bit of work. Are there any css properties that we definitely don't want people to use?

Member

biermeester commented Aug 4, 2018

I've been looking at caja's allowed css properties, but to convert all those to regular expressions for sanitize is quite a bit of work. Are there any css properties that we definitely don't want people to use?

@blink1073

This comment has been minimized.

Show comment
Hide comment
@blink1073

blink1073 Aug 4, 2018

Member

@biermeester, I would suggest either writing a script to scrape the caja fields or stick with what we have and whitelist more as they are requested and are individually evaluated.

Member

blink1073 commented Aug 4, 2018

@biermeester, I would suggest either writing a script to scrape the caja fields or stick with what we have and whitelist more as they are requested and are individually evaluated.

@blink1073

This comment has been minimized.

Show comment
Hide comment
@blink1073

blink1073 Aug 6, 2018

Member

Looks like the rebase caught a bunch of other commits.

Member

blink1073 commented Aug 6, 2018

Looks like the rebase caught a bunch of other commits.

@biermeester

This comment has been minimized.

Show comment
Hide comment
@biermeester

biermeester Aug 6, 2018

Member

Oops, will do a little cleanup today.

Member

biermeester commented Aug 6, 2018

Oops, will do a little cleanup today.

@blink1073

This comment has been minimized.

Show comment
Hide comment
@blink1073

blink1073 Aug 7, 2018

Member

Bravo, @biermeester, thank you! Can you share the script(s) you used for extraction as a gist perhaps for posterity in case we need to do this again for HTML 6.0 😉?

Member

blink1073 commented Aug 7, 2018

Bravo, @biermeester, thank you! Can you share the script(s) you used for extraction as a gist perhaps for posterity in case we need to do this again for HTML 6.0 😉?

@biermeester

This comment has been minimized.

Show comment
Hide comment
@biermeester

biermeester Aug 7, 2018

Member

Extracting the tags and attributes was pretty trivial. I'm still working on converting the CSS property value syntax into Regular Expressions, which is a lot less trivial :)

I'll make sure to share the code snippets that I used to help extract/transform the data when I'm done.

Member

biermeester commented Aug 7, 2018

Extracting the tags and attributes was pretty trivial. I'm still working on converting the CSS property value syntax into Regular Expressions, which is a lot less trivial :)

I'll make sure to share the code snippets that I used to help extract/transform the data when I'm done.

@blink1073

This comment has been minimized.

Show comment
Hide comment
@blink1073

blink1073 Aug 7, 2018

Member

Great, thank you!

Member

blink1073 commented Aug 7, 2018

Great, thank you!

@ellisonbg

This comment has been minimized.

Show comment
Hide comment
@ellisonbg

ellisonbg Aug 12, 2018

Contributor

I haven't followed the PR closely - the only thing I want us to make sure of is that we are not allowing any tags or attribute/styles that have security vulnerabilities. @biermeester can you comment on that aspect of this PR?

Contributor

ellisonbg commented Aug 12, 2018

I haven't followed the PR closely - the only thing I want us to make sure of is that we are not allowing any tags or attribute/styles that have security vulnerabilities. @biermeester can you comment on that aspect of this PR?

@biermeester

This comment has been minimized.

Show comment
Hide comment
@biermeester

biermeester Aug 12, 2018

Member

I've been working off of Google Caja to make sure css properties (or property values) that are deemed unsafe/disruptive will get filtered out by sanitize-html.

There's only a few of such properties/values, and since sanitize-html requires me to white-list everything that is allowed, I've been working on quite some regular expressions to do so.

I'm not even sure that the end result will be workable, performance wise. Some testing will need to be done.

Member

biermeester commented Aug 12, 2018

I've been working off of Google Caja to make sure css properties (or property values) that are deemed unsafe/disruptive will get filtered out by sanitize-html.

There's only a few of such properties/values, and since sanitize-html requires me to white-list everything that is allowed, I've been working on quite some regular expressions to do so.

I'm not even sure that the end result will be workable, performance wise. Some testing will need to be done.

@blink1073 blink1073 modified the milestones: 0.34, 0.35 Aug 13, 2018

@biermeester biermeester changed the title from Allow certain inline CSS styles in Markdown to Allow safe inline CSS styles in Markdown Aug 13, 2018

@biermeester

This comment has been minimized.

Show comment
Hide comment
@biermeester

biermeester Aug 13, 2018

Member

I've added CSS value validation for most of the allowed CSS properties. I've removed some properties for now (e.g. animation properties) and simplified others, but I will probably add/extend those in the near future.

I was a bit worried about how the performance would be, but until now I don't see any ill effects.

I used the following script to extract information from Caja's Json definitions: https://gist.github.com/biermeester/9437f13d3735e65ba013c947ea59021f

A demo notebook can be found here: https://gist.github.com/biermeester/83802c33fb8483d25bcff6b59ce1353d

Next up is going to a be a test case extension, that will make sure that harmful CSS gets filtered out.

Member

biermeester commented Aug 13, 2018

I've added CSS value validation for most of the allowed CSS properties. I've removed some properties for now (e.g. animation properties) and simplified others, but I will probably add/extend those in the near future.

I was a bit worried about how the performance would be, but until now I don't see any ill effects.

I used the following script to extract information from Caja's Json definitions: https://gist.github.com/biermeester/9437f13d3735e65ba013c947ea59021f

A demo notebook can be found here: https://gist.github.com/biermeester/83802c33fb8483d25bcff6b59ce1353d

Next up is going to a be a test case extension, that will make sure that harmful CSS gets filtered out.

@blink1073

This comment has been minimized.

Show comment
Hide comment
@blink1073

blink1073 Aug 13, 2018

Member

I think a test in test-rendermime that reads from your example notebook and verifies stripped properties would work nicely.

Member

blink1073 commented Aug 13, 2018

I think a test in test-rendermime that reads from your example notebook and verifies stripped properties would work nicely.

@blink1073

This comment has been minimized.

Show comment
Hide comment
@blink1073

blink1073 Aug 13, 2018

Member

Or at the sanitizer test level, whichever is easier.

Member

blink1073 commented Aug 13, 2018

Or at the sanitizer test level, whichever is easier.

@blink1073

This comment has been minimized.

Show comment
Hide comment
@blink1073

blink1073 Aug 14, 2018

Member

This looks like a legitimate failure:

lerna ERR! ERROR in bundle.js from UglifyJs
lerna ERR! Legacy octal escape sequences are not allowed in strict mode [/home/travis/build/jupyterlab/jupyterlab/packages/apputils/lib/sanitizer.js:41,0][bundle.js:105459,22]
Member

blink1073 commented Aug 14, 2018

This looks like a legitimate failure:

lerna ERR! ERROR in bundle.js from UglifyJs
lerna ERR! Legacy octal escape sequences are not allowed in strict mode [/home/travis/build/jupyterlab/jupyterlab/packages/apputils/lib/sanitizer.js:41,0][bundle.js:105459,22]
@biermeester

This comment has been minimized.

Show comment
Hide comment
@biermeester

biermeester Aug 14, 2018

Member

Weird, local tests seemed to run fine. Thanks for the heads-up, I'll have a look.

Member

biermeester commented Aug 14, 2018

Weird, local tests seemed to run fine. Thanks for the heads-up, I'll have a look.

@blink1073

This comment has been minimized.

Show comment
Hide comment
@blink1073

blink1073 Aug 14, 2018

Member

It shows up when you run "build:prod".

Member

blink1073 commented Aug 14, 2018

It shows up when you run "build:prod".

@blink1073

This comment has been minimized.

Show comment
Hide comment
@blink1073

blink1073 Aug 16, 2018

Member

Thanks again, @biermeester! I'll come back to reviewing this after the 0.34 release process.

Member

blink1073 commented Aug 16, 2018

Thanks again, @biermeester! I'll come back to reviewing this after the 0.34 release process.

@blink1073

This comment has been minimized.

Show comment
Hide comment
@blink1073

blink1073 Aug 29, 2018

Member

@biermeester, this is great! Do you mind updating the top level comment for posterity?

Member

blink1073 commented Aug 29, 2018

@biermeester, this is great! Do you mind updating the top level comment for posterity?

@biermeester

This comment has been minimized.

Show comment
Hide comment
@biermeester

biermeester Aug 29, 2018

Member

Done. Is there also some piece of the documentation you would like me to update?

Member

biermeester commented Aug 29, 2018

Done. Is there also some piece of the documentation you would like me to update?

@blink1073

This comment has been minimized.

Show comment
Hide comment
@blink1073

blink1073 Aug 29, 2018

Member

Nope, this is perfect, thank you!

Member

blink1073 commented Aug 29, 2018

Nope, this is perfect, thank you!

@blink1073 blink1073 merged commit 649fc5d into jupyterlab:master Aug 29, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@biermeester biermeester deleted the biermeester:sanitize_html_augmentation branch Aug 29, 2018

@blink1073 blink1073 modified the milestones: 0.35, 1.0 Sep 5, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment