New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add :strict
context for stricter sanitization
#46
Conversation
@yujinakayama review please |
The name |
Typos in the commit message |
'<i class="fa fa-user"></i>user' | ||
end | ||
|
||
it "sanitized them" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description should be does not sanitize them
or allows them
'<i class="fa fa-user"></i>user' | ||
end | ||
|
||
it "sanitized them" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sanitizes
Could you extract the RuboCop and CI commits into another PR? 🙏 |
4ec6839
to
961668f
Compare
@yujinakayama Updated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a regression where the UserInputSanitizer breaks Markdown blockquotes (> foo
) by confusing the angle brackets as HTML tags, since the sanitizer has no knowledge of Markdown and is run against raw Markdown source before the core Markdown renderer (greenmat). I've added a pending spec for it.
1) Qiita::Markdown::Processor#call with strict context with blockquote syntax does not confuse it with HTML tag angle brackets
# No reason given
Failure/Error: should eq "<blockquote>\n<p>foo</p>\n</blockquote>\n"
expected: "<blockquote>\n<p>foo</p>\n</blockquote>\n"
got: "<p>> foo</p>\n"
(compared using ==)
Diff:
@@ -1,4 +1,2 @@
-<blockquote>
-<p>foo</p>
-</blockquote>
+<p>> foo</p>
# ./spec/qiita/markdown/processor_spec.rb:1048:in `block (5 levels) in <top (required)>'
Maybe we should merge the Edit: We need not necessarily to handle it right now in this PR. |
Edit: We decided to not handle the problem in this PR for simplicity. |
Greenmat filter inserts
We have to fix them ahead. 🏃 |
... so that we can sanitize `class` attribute inputted by user in post-process. Ref: increments/qiita-markdown#46
... so that we can sanitize `class` attribute inputted by user in post-process. Ref: increments/qiita-markdown#46
... so that we can sanitize `class` attribute inputted by user in post-process. Ref: increments/qiita-markdown#46
"rev" => %w[footnote], | ||
}, | ||
"sup" => { | ||
"id" => /^fnref\d+$/, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We always want to use \A
instead of ^
, and \z
instead of $
when possible.
private | ||
|
||
def transform_attribute(attr, pattern) | ||
node.attributes[attr].value = node.attributes[attr].value.split.select do |value| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can write as:
node.attributes[attr].value
->node[attr]
node.attributes[attr].value =
->node[attr] =
http://www.rubydoc.info/gems/nokogiri/1.8.0/Nokogiri/XML/Node#[]-instance_method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Omitting argument of String#split
might be dangerous, since it depends on a special global variable $;
:
http://ruby-doc.org/core-2.4.1/String.html#method-i-split
If pattern is nil, the value of
$; is used. If $ ; is nil (which is the default), str is split on whitespace as if ‘ ’ were specified.
Since we're doing security things here, it's better to always be defensive 😉
@yujinakayama Updated 🙇 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
:strict
context for stricter stanitization:strict
context for stricter sanitization
What
This patch introduces
:strict
context. When it is given,UserSanitizer
filter sanitizes malicious HTML tags right after core Markdown renderer and before any other filters.Why
Since
Sanitize
(renamed asFinalSanitizer
) filter is applied at the end of html-pipeline, its rules are intentionally weakened to allow elements and attributes which are generated by other filters.Intentionally weakened?
For example, in previous,
<div>
element andclass
attribute was allowed so that users can emulate almost all UI components of Qiita in their articles. Since articles are hosted on qiita.com, there is a potential risk of being abused as a phishing site.With
:strict
context, they are not permitted.