Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add :strict context for stricter sanitization #46

Merged
merged 3 commits into from Jun 16, 2017
Merged

Add :strict context for stricter sanitization #46

merged 3 commits into from Jun 16, 2017

Conversation

yuku
Copy link
Contributor

@yuku yuku commented Jun 15, 2017

What

This patch introduces :strict context. When it is given, UserSanitizer filter sanitizes malicious HTML tags right after core Markdown renderer and before any other filters.

Why

Since Sanitize (renamed as FinalSanitizer) filter is applied at the end of html-pipeline, its rules are intentionally weakened to allow elements and attributes which are generated by other filters.

Intentionally weakened?

For example, in previous, <div> element and class attribute was allowed so that users can emulate almost all UI components of Qiita in their articles. Since articles are hosted on qiita.com, there is a potential risk of being abused as a phishing site.

With :strict context, they are not permitted.

@yuku
Copy link
Contributor Author

yuku commented Jun 15, 2017

@yujinakayama review please

@yuku yuku requested a review from yujinakayama June 15, 2017 08:43
@yujinakayama
Copy link
Contributor

The name FinalSanitize and UserSanitize look strange as they consist of [adjective + verb] or [noun + verb]. How about renaming them to FinalSanitizer and UserSanitizer (or maybe UserInputSanitizer would be better since we're not sanitizing users themselves)?

@yujinakayama
Copy link
Contributor

Typos in the commit message Rename Filters::Sanitize as Filters::FianalSanitize

'<i class="fa fa-user"></i>user'
end

it "sanitized them" do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description should be does not sanitize them or allows them

'<i class="fa fa-user"></i>user'
end

it "sanitized them" do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sanitizes

@yujinakayama
Copy link
Contributor

Could you extract the RuboCop and CI commits into another PR? 🙏

@yuku yuku force-pushed the strict branch 3 times, most recently from 4ec6839 to 961668f Compare June 15, 2017 10:45
@yuku
Copy link
Contributor Author

yuku commented Jun 15, 2017

@yujinakayama Updated

Copy link
Contributor

@yujinakayama yujinakayama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a regression where the UserInputSanitizer breaks Markdown blockquotes (> foo) by confusing the angle brackets as HTML tags, since the sanitizer has no knowledge of Markdown and is run against raw Markdown source before the core Markdown renderer (greenmat). I've added a pending spec for it.

  1) Qiita::Markdown::Processor#call with strict context with blockquote syntax does not confuse it with HTML tag angle brackets
     # No reason given
     Failure/Error: should eq "<blockquote>\n<p>foo</p>\n</blockquote>\n"

       expected: "<blockquote>\n<p>foo</p>\n</blockquote>\n"
            got: "<p>&gt; foo</p>\n"

       (compared using ==)

       Diff:
       @@ -1,4 +1,2 @@
       -<blockquote>
       -<p>foo</p>
       -</blockquote>
       +<p>&gt; foo</p>

     # ./spec/qiita/markdown/processor_spec.rb:1048:in `block (5 levels) in <top (required)>'

@yujinakayama
Copy link
Contributor

yujinakayama commented Jun 16, 2017

Maybe we should merge the :strict and :script contexts into something like sanitization mode, because context { strict: true, script: true } doesn't make sense. Also, I think the name strict is a bit ambiguous since it does not indicate what it makes strict: sanitization, Markdown syntax compliance, or restriction of some features?

Edit: We need not necessarily to handle it right now in this PR.

@yuku
Copy link
Contributor Author

yuku commented Jun 16, 2017

Maybe we should merge the :strict and :script contexts into something like sanitization mode, because context { strict: true, script: true } doesn't make sense.

How about, sanitization_mode = :qiita | :qiita_team | :custom or :strict | :allow_script | :custom? (When :custom is given, FinalSanitizer takes :rules context into account.)

Edit: We decided to not handle the problem in this PR for simplicity.

@yuku
Copy link
Contributor Author

yuku commented Jun 16, 2017

Greenmat filter inserts class attribute in two cases:

  1. anchor for headings. (fixed by Extract heading decoration logic from Greenmat renderer to Toc filter #48)
  2. code block with characters. (fixed by Follow change of code block metadata attribute in greenmat #49)

We have to fix them ahead. 🏃

yujinakayama added a commit to increments/greenmat that referenced this pull request Jun 16, 2017
... so that we can sanitize `class` attribute inputted by user in
post-process.

Ref: increments/qiita-markdown#46
yujinakayama added a commit to increments/greenmat that referenced this pull request Jun 16, 2017
... so that we can sanitize `class` attribute inputted by user in
post-process.

Ref: increments/qiita-markdown#46
yujinakayama added a commit to increments/greenmat that referenced this pull request Jun 16, 2017
... so that we can sanitize `class` attribute inputted by user in
post-process.

Ref: increments/qiita-markdown#46
"rev" => %w[footnote],
},
"sup" => {
"id" => /^fnref\d+$/,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We always want to use \A instead of ^, and \z instead of $ when possible.

private

def transform_attribute(attr, pattern)
node.attributes[attr].value = node.attributes[attr].value.split.select do |value|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can write as:

  • node.attributes[attr].value -> node[attr]
  • node.attributes[attr].value = -> node[attr] =

http://www.rubydoc.info/gems/nokogiri/1.8.0/Nokogiri/XML/Node#[]-instance_method

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Omitting argument of String#split might be dangerous, since it depends on a special global variable $;:

http://ruby-doc.org/core-2.4.1/String.html#method-i-split

If pattern is nil, the value of $; is used. If $; is nil (which is the default), str is split on whitespace as if ‘ ’ were specified.

Since we're doing security things here, it's better to always be defensive 😉

@yuku
Copy link
Contributor Author

yuku commented Jun 16, 2017

@yujinakayama Updated 🙇

Copy link
Contributor

@yujinakayama yujinakayama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@yujinakayama yujinakayama changed the title Add :strict context for stricter stanitization Add :strict context for stricter sanitization Jun 16, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants