Add `:strict` context for stricter sanitization #46

yuku · 2017-06-15T08:30:25Z

What

This patch introduces the :strict context. When it is given, the UserSanitizer filter sanitizes malicious HTML tags right after the core Markdown renderer and before any other filters.
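The ordering described above can be pictured as a filter chain. The sketch below is a hypothetical illustration, not qiita-markdown's actual processor code: the symbols stand in for the real filter classes, and `:other_filters` is an assumed placeholder for whatever runs between the two sanitizers.

```ruby
# Hypothetical sketch of how a :strict context could change the filter chain.
# Symbols are stand-ins for real filter classes; :other_filters is a placeholder.
def filter_chain(context)
  chain = [:greenmat]                                 # core Markdown renderer
  chain << :user_input_sanitizer if context[:strict]  # strict rules, before other filters add markup
  chain << :other_filters                             # filters that may emit <div>, class, etc.
  chain << :final_sanitizer                           # relaxed rules that tolerate filter output
  chain
end

filter_chain({})                # => [:greenmat, :other_filters, :final_sanitizer]
filter_chain({ strict: true })  # => [:greenmat, :user_input_sanitizer, :other_filters, :final_sanitizer]
```

Placing the strict pass before the other filters means it only sees markup that came from the user, so it can reject things like `<div>` and `class` without breaking markup that later filters legitimately add.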

Why

Since the Sanitize filter (renamed to FinalSanitizer) is applied at the end of the html-pipeline, its rules are intentionally weakened to allow elements and attributes generated by other filters.

Intentionally weakened?

For example, the <div> element and the class attribute were previously allowed, so users could emulate almost all of Qiita's UI components in their articles. Since articles are hosted on qiita.com, this creates a potential risk of the site being abused for phishing.

With the :strict context, they are not permitted.
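To make the "weakened vs. strict" distinction concrete, here is a toy whitelist check. The rule sets are assumptions chosen to mirror the example in this description (`<div>` with `class`); they are not the real FinalSanitizer or UserSanitizer configurations.

```ruby
# Toy whitelists (assumed for illustration; not the real sanitizer rules).
# Each maps an element name to the attributes it may carry.
FINAL_RULES  = { "p" => [], "div" => %w[class], "i" => %w[class] }.freeze
STRICT_RULES = { "p" => [] }.freeze

# True if `element` (and, optionally, `attribute` on it) passes the whitelist.
def permitted?(rules, element, attribute = nil)
  allowed_attributes = rules[element]
  return false if allowed_attributes.nil?
  attribute.nil? || allowed_attributes.include?(attribute)
end

permitted?(FINAL_RULES, "div", "class")   # => true  (user markup can mimic UI components)
permitted?(STRICT_RULES, "div", "class")  # => false (rejected under :strict)
```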

yuku · 2017-06-15T08:42:04Z

@yujinakayama review please

yujinakayama · 2017-06-15T09:17:58Z

The name FinalSanitize and UserSanitize look strange as they consist of [adjective + verb] or [noun + verb]. How about renaming them to FinalSanitizer and UserSanitizer (or maybe UserInputSanitizer would be better since we're not sanitizing users themselves)?

yujinakayama · 2017-06-15T09:18:47Z

Typo in the commit message: "Rename Filters::Sanitize as Filters::FianalSanitize"

yujinakayama · 2017-06-15T09:24:03Z

spec/qiita/markdown/processor_spec.rb

+          '<i class="fa fa-user"></i>user'
+        end
+
+        it "sanitized them" do


This description should be "does not sanitize them" or "allows them".

yujinakayama · 2017-06-15T09:24:10Z

spec/qiita/markdown/processor_spec.rb

+          '<i class="fa fa-user"></i>user'
+        end
+
+        it "sanitized them" do


yujinakayama · 2017-06-15T09:26:48Z

Could you extract the RuboCop and CI commits into another PR? 🙏

yuku · 2017-06-15T10:46:46Z

@yujinakayama Updated

yujinakayama

There's a regression where the UserInputSanitizer breaks Markdown blockquotes (> foo) by mistaking their angle brackets for HTML tags, since the sanitizer has no knowledge of Markdown and runs against the raw Markdown source before the core Markdown renderer (greenmat). I've added a pending spec for it.

  1) Qiita::Markdown::Processor#call with strict context with blockquote syntax does not confuse it with HTML tag angle brackets
     # No reason given
     Failure/Error: should eq "<blockquote>\n<p>foo</p>\n</blockquote>\n"

       expected: "<blockquote>\n<p>foo</p>\n</blockquote>\n"
            got: "<p>&gt; foo</p>\n"

       (compared using ==)

       Diff:
       @@ -1,4 +1,2 @@
       -<blockquote>
       -<p>foo</p>
       -</blockquote>
       +<p>&gt; foo</p>

     # ./spec/qiita/markdown/processor_spec.rb:1048:in `block (5 levels) in <top (required)>'
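The failure can be reproduced in miniature. In this sketch, `toy_render` is an assumed stand-in for greenmat that understands only `> ` blockquote lines, and `CGI.escapeHTML` stands in for the way an HTML sanitizer re-serializes a stray `>` in text as `&gt;`. The two results match the spec's "expected" and "got" strings.

```ruby
require "cgi"

# Toy renderer (assumed stand-in for greenmat): understands only "> " blockquote
# lines; every other line becomes a paragraph. Content is not escaped.
def toy_render(markdown)
  markdown.lines.map(&:chomp).map do |line|
    if line.start_with?("> ")
      "<blockquote>\n<p>#{line[2..-1]}</p>\n</blockquote>\n"
    else
      "<p>#{line}</p>\n"
    end
  end.join
end

source = "> foo"

# Render first: the blockquote marker is still a literal ">".
toy_render(source)                  # => "<blockquote>\n<p>foo</p>\n</blockquote>\n"

# Sanitize the raw source first: ">" becomes "&gt;", so the renderer
# no longer sees a blockquote and emits a plain paragraph.
toy_render(CGI.escapeHTML(source))  # => "<p>&gt; foo</p>\n"
```

This is why the sanitizer has to see rendered HTML rather than Markdown source.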

yujinakayama · 2017-06-16T02:01:23Z

Maybe we should merge the :strict and :script contexts into something like a sanitization mode, because the context { strict: true, script: true } doesn't make sense. Also, I think the name strict is a bit ambiguous since it does not indicate what it makes strict: sanitization, Markdown syntax compliance, or restriction of some features.

Edit: We don't necessarily need to handle it in this PR.
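A single mode value makes the nonsensical combination unrepresentable. The helper below is a hypothetical sketch of collapsing the two boolean contexts into one mode; the mode names `:strict` and `:allow_script` follow the naming floated later in this thread, and `:default` is an assumed name for "neither flag set".

```ruby
# Hypothetical helper: derive one sanitization mode from the two boolean contexts,
# rejecting the combination that doesn't make sense. :default is an assumed name.
def sanitization_mode(context)
  strict = context[:strict]
  script = context[:script]
  if strict && script
    raise ArgumentError, "the :strict and :script contexts cannot both be enabled"
  end
  return :strict if strict
  return :allow_script if script
  :default
end

sanitization_mode({ strict: true })  # => :strict
sanitization_mode({ script: true })  # => :allow_script
sanitization_mode({})                # => :default
```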

yuku · 2017-06-16T05:09:04Z

Maybe we should merge the :strict and :script contexts into something like sanitization mode, because context { strict: true, script: true } doesn't make sense.

~~How about sanitization_mode = :qiita | :qiita_team | :custom, or :strict | :allow_script | :custom? (When :custom is given, FinalSanitizer takes the :rules context into account.)~~

Edit: We decided not to handle the problem in this PR, for simplicity.

yuku · 2017-06-16T08:08:38Z

The Greenmat filter inserts a class attribute in two cases:

anchors for headings (fixed by Extract heading decoration logic from Greenmat renderer to Toc filter #48)
code block with characters (fixed by Follow change of code block metadata attribute in greenmat #49)

We have to fix them first. 🏃

yujinakayama · 2017-06-16T09:01:30Z

lib/qiita/markdown/filters/user_input_sanitizer.rb

+              "rev" => %w[footnote],
+            },
+            "sup" => {
+              "id" => /^fnref\d+$/,


We always want to use \A instead of ^, and \z instead of $ when possible.
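The reason is that in Ruby `^` and `$` are line anchors, not string anchors, and attribute values can contain newlines. This small comparison uses the `fnref` pattern from the diff; the constant names are made up for the example.

```ruby
# Line anchors vs. string anchors in Ruby regexps.
LOOSE_ID  = /^fnref\d+$/    # ^ and $ match at the start/end of ANY line
STRICT_ID = /\Afnref\d+\z/  # \A and \z match only at the start/end of the whole string

LOOSE_ID.match?("fnref1")             # => true
STRICT_ID.match?("fnref1")            # => true

# A value containing a newline lets extra content ride along with a valid line:
LOOSE_ID.match?("evil\nfnref1")       # => true  (the second line satisfies ^...$)
STRICT_ID.match?("evil\nfnref1")      # => false
LOOSE_ID.match?("fnref1\nanything")   # => true  ($ matches before the newline)
STRICT_ID.match?("fnref1\nanything")  # => false
```

Note that `\Z` (capital) would still accept a single trailing newline, so `\z` is the stricter choice.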

yujinakayama · 2017-06-16T09:06:03Z

lib/qiita/markdown/filters/user_input_sanitizer.rb

+          private
+
+          def transform_attribute(attr, pattern)
+            node.attributes[attr].value = node.attributes[attr].value.split.select do |value|


You can write these as:

node.attributes[attr].value -> node[attr]

node.attributes[attr].value = -> node[attr] =

http://www.rubydoc.info/gems/nokogiri/1.8.0/Nokogiri/XML/Node#[]-instance_method

Omitting the argument of String#split might be dangerous, since it then depends on the special global variable $;:

http://ruby-doc.org/core-2.4.1/String.html#method-i-split

If pattern is nil, the value of $; is used. If $; is nil (which is the default), str is split on whitespace as if ‘ ’ were specified.

Since we're doing security things here, it's better to always be defensive 😉
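Putting both suggestions together, a defensive version of the helper might look like the sketch below. It is hypothetical: unlike the method in the diff, it takes `node` as an explicit parameter, and it only relies on `#[]` and `#[]=`, which `Nokogiri::XML::Node` provides; a plain Hash is used in the example so it runs without Nokogiri.

```ruby
# Hypothetical standalone version of the transform_attribute helper under review.
# `node` only needs #[] and #[]= (Nokogiri::XML::Node has both; a Hash is used below).
def transform_attribute(node, attr, pattern)
  # Pass the separator explicitly: split(" ") does awk-style whitespace splitting,
  # while a bare split would fall back to $; whenever $; is non-nil.
  tokens = node[attr].to_s.split(" ")
  # Keep only the space-separated values that match the whitelist pattern.
  node[attr] = tokens.select { |token| pattern.match?(token) }.join(" ")
end

node = { "class" => "footnote  qiita-fake-banner\tfootnote-ref" }
transform_attribute(node, "class", /\Afootnote\z/)
node["class"]  # => "footnote"
```

Whether an attribute left empty should be removed entirely is left open here.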

yuku · 2017-06-16T09:47:53Z

@yujinakayama Updated 🙇

yujinakayama

LGTM 👍

yuku requested a review from yujinakayama June 15, 2017 08:43

yujinakayama reviewed Jun 15, 2017

spec/qiita/markdown/processor_spec.rb Outdated

'<i class="fa fa-user"></i>user'

end

it "sanitized them" do

yujinakayama Jun 15, 2017

sanitizes

yuku force-pushed the strict branch 3 times, most recently from 4ec6839 to 961668f June 15, 2017 10:45

yuku force-pushed the strict branch from 961668f to c809325 June 15, 2017 11:56

yuku mentioned this pull request Jun 15, 2017

Update Rubocop and test ruby versions #47

Merged

yujinakayama force-pushed the strict branch from 83f7414 to e6f7b93 June 16, 2017 01:34

yujinakayama suggested changes Jun 16, 2017

yujinakayama mentioned this pull request Jun 16, 2017

Extract heading decoration logic from Greenmat renderer to Toc filter #48

Merged

yujinakayama added a commit to increments/greenmat that referenced this pull request Jun 16, 2017

Avoid using class attribute for code block metadata

d4dcec6

... so that we can sanitize `class` attribute inputted by user in post-process. Ref: increments/qiita-markdown#46

yujinakayama mentioned this pull request Jun 16, 2017

Avoid using class attribute for code block metadata increments/greenmat#5

Merged

yujinakayama added a commit to increments/greenmat that referenced this pull request Jun 16, 2017

Avoid using class attribute for code block metadata

4c53a84

... so that we can sanitize `class` attribute inputted by user in post-process. Ref: increments/qiita-markdown#46

yujinakayama added a commit to increments/greenmat that referenced this pull request Jun 16, 2017

Avoid using class attribute for code block metadata

88677c6

... so that we can sanitize `class` attribute inputted by user in post-process. Ref: increments/qiita-markdown#46

yujinakayama reviewed Jun 16, 2017

yuku force-pushed the strict branch from 52bec36 to 8332d00 June 16, 2017 09:10

Rename Filters::Sanitize as Filters::FinalSanitizer

8fe01b5

yuku force-pushed the strict branch from ae3f659 to 4aa1d76 June 16, 2017 09:27

Refactor specs by extracting shared examples for DRY

88e9944

yuku force-pushed the strict branch from 4aa1d76 to b73df71 June 16, 2017 09:29

yujinakayama approved these changes Jun 16, 2017

yuku force-pushed the strict branch from 865da43 to 2ac0682 June 16, 2017 10:00

yujinakayama changed the title ~~Add :strict context for stricter stanitization~~ Add :strict context for stricter sanitization Jun 16, 2017

Add :strict context for stricter sanitization

b1fbe72

yuku force-pushed the strict branch from 763e6be to b1fbe72 June 16, 2017 12:35

yuku merged commit 9c0dec2 into master Jun 16, 2017

yuku deleted the strict branch June 16, 2017 12:40

yujinakayama mentioned this pull request Jun 16, 2017

Add missing sanitization for <div> class attribute #51

Merged

yuku mentioned this pull request Jun 21, 2017

Merge :strict and :script contexts #56

Closed
