Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace whitelist with more neutral language #339

Merged
merged 1 commit into from
Dec 2, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -164,7 +164,7 @@ EmojiPipeline = Pipeline.new [
* `ImageMaxWidthFilter` - link to full size image for large images
* `MarkdownFilter` - convert markdown to html
* `PlainTextInputFilter` - html escape text and wrap the result in a div
* `SanitizationFilter` - whitelist sanitize user markup
* `SanitizationFilter` - allow sanitize user markup
* `SyntaxHighlightFilter` - code syntax highlighter
* `TextileFilter` - convert textile to html
* `TableOfContentsFilter` - anchor headings with name attributes and generate Table of Contents html unordered list linking headings
Expand Down Expand Up @@ -330,9 +330,9 @@ html_fragment = "This is outside of an html element, but <strong>this isn't. :+1
EmojiPipeline.call("<div>#{html_fragment}</div>") # <- Wrap your own html fragments to avoid escaping
```

### 2. How do I customize a whitelist for `SanitizationFilter`s?
### 2. How do I customize an allowlist for `SanitizationFilter`s?

`SanitizationFilter::WHITELIST` is the default whitelist used if no `:whitelist`
`SanitizationFilter::ALLOWLIST` is the default allowlist used if no `:allowlist`
argument is given in the context. The default is a good starting template for
you to add additional elements. You can either modify the constant's value, or
re-define your own constant and pass that in via the context.
Expand Down
18 changes: 14 additions & 4 deletions lib/html/pipeline/camo_filter.rb
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ class Pipeline
# Context options:
# :asset_proxy (required) - Base URL for constructed asset proxy URLs.
# :asset_proxy_secret_key (required) - The shared secret used to encode URLs.
# :asset_proxy_whitelist - Array of host Strings or Regexps to skip
# :asset_proxy_allowlist - Array of host Strings or Regexps to skip
# src rewriting.
#
# This filter does not write additional information to the context.
Expand All @@ -37,7 +37,7 @@ def call
end

next if uri.host.nil?
next if asset_host_whitelisted?(uri.host)
next if asset_host_allowed?(uri.host)

element['src'] = asset_proxy_url(original_src)
element['data-canonical-src'] = original_src
Expand Down Expand Up @@ -76,11 +76,21 @@ def asset_proxy_secret_key
end

def asset_proxy_whitelist
context[:asset_proxy_whitelist] || []
warn "[DEPRECATION] 'asset_proxy_whitelist' is deprecated. Please use 'asset_proxy_allowlist' instead."
asset_proxy_allowlist
end

def asset_proxy_allowlist
context[:asset_proxy_allowlist] || context[:asset_proxy_whitelist] || []
end

def asset_host_whitelisted?(host)
asset_proxy_whitelist.any? do |test|
warn "[DEPRECATION] 'asset_host_whitelisted?' is deprecated. Please use 'asset_host_allowed?' instead."
asset_host_allowed?(host)
end

def asset_host_allowed?(host)
asset_proxy_allowlist.any? do |test|
test.is_a?(String) ? host == test : test.match(host)
end
end
Expand Down
39 changes: 22 additions & 17 deletions lib/html/pipeline/sanitization_filter.rb
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@

module HTML
class Pipeline
# HTML filter with sanization routines and whitelists. This module defines
# HTML filter with sanization routines and allowlists. This module defines
# what HTML is allowed in user provided content and fixes up issues with
# unbalanced tags and whatnot.
#
Expand All @@ -13,13 +13,13 @@ class Pipeline
# https://github.com/rgrove/sanitize/#readme
#
# Context options:
# :whitelist - The sanitizer whitelist configuration to use. This
# :allowlist - The sanitizer allowlist configuration to use. This
# can be one of the options constants defined in this
# class or a custom sanitize options hash.
# :anchor_schemes - The URL schemes to allow in <a href> attributes. The
# default set is provided in the ANCHOR_SCHEMES
# constant in this class. If passed, this overrides any
# schemes specified in the whitelist configuration.
# schemes specified in the allowlist configuration.
#
# This filter does not write additional information to the context.
class SanitizationFilter < Filter
Expand All @@ -37,9 +37,9 @@ class SanitizationFilter < Filter
# These schemes are the only ones allowed in <a href> attributes by default.
ANCHOR_SCHEMES = ['http', 'https', 'mailto', 'xmpp', :relative, 'github-windows', 'github-mac', 'irc', 'ircs'].freeze

# The main sanitization whitelist. Only these elements and attributes are
# The main sanitization allowlist. Only these elements and attributes are
# allowed through by default.
WHITELIST = {
ALLOWLIST = {
elements: %w[
h1 h2 h3 h4 h5 h6 h7 h8 br b i strong em a pre code img tt
div ins del sup sub p ol ul table thead tbody tfoot blockquote
Expand Down Expand Up @@ -108,10 +108,10 @@ class SanitizationFilter < Filter
].freeze
}.freeze

# A more limited sanitization whitelist. This includes all attributes,
# protocols, and transformers from WHITELIST but with a more locked down
# A more limited sanitization allowlist. This includes all attributes,
# protocols, and transformers from ALLOWLIST but with a more locked down
# set of allowed elements.
LIMITED = WHITELIST.merge(
LIMITED = ALLOWLIST.merge(
elements: %w[b i strong em a pre code img ins del sup sub mark abbr p ol ul li]
)

Expand All @@ -120,19 +120,24 @@ class SanitizationFilter < Filter

# Sanitize markup using the Sanitize library.
def call
Sanitize.clean_node!(doc, whitelist)
Sanitize.clean_node!(doc, allowlist)
end

# The whitelist to use when sanitizing. This can be passed in the context
# hash to the filter but defaults to WHITELIST constant value above.
def whitelist
whitelist = context[:whitelist] || WHITELIST
warn "[DEPRECATION] 'whitelist' is deprecated. Please use 'allowlist' instead."
allowlist
end

# The allowlist to use when sanitizing. This can be passed in the context
# hash to the filter but defaults to ALLOWLIST constant value above.
def allowlist
allowlist = context[:allowlist] || context[:whitelist] || ALLOWLIST
anchor_schemes = context[:anchor_schemes]
return whitelist unless anchor_schemes
whitelist = whitelist.dup
whitelist[:protocols] = (whitelist[:protocols] || {}).dup
whitelist[:protocols]['a'] = (whitelist[:protocols]['a'] || {}).merge('href' => anchor_schemes)
whitelist
return allowlist unless anchor_schemes
allowlist = allowlist.dup
allowlist[:protocols] = (allowlist[:protocols] || {}).dup
allowlist[:protocols]['a'] = (allowlist[:protocols]['a'] || {}).merge('href' => anchor_schemes)
allowlist
end
end
end
Expand Down
29 changes: 28 additions & 1 deletion test/html/pipeline/camo_filter_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ def setup
@options = {
asset_proxy: @asset_proxy_url,
asset_proxy_secret_key: @asset_proxy_secret_key,
asset_proxy_whitelist: [/(^|\.)github\.com$/]
asset_proxy_allowlist: [/(^|\.)github\.com$/]
}
end

Expand Down Expand Up @@ -76,4 +76,31 @@ def test_required_context_validation
assert_match /:asset_proxy[^_]/, exception.message
assert_match /:asset_proxy_secret_key/, exception.message
end

def test_deprecated_asset_proxy_whitelist_context
orig = %(<p><img></p>)
deprecated_options = {
asset_proxy: @asset_proxy_url,
asset_proxy_secret_key: @asset_proxy_secret_key,
asset_proxy_whitelist: [/(^|\.)github\.com$/]
}
assert_equal [/(^|\.)github\.com$/], CamoFilter.new(orig, deprecated_options).asset_proxy_allowlist
end

def test_deprecation_warning_asset_proxy_whitelist
orig = %(<p><img></p>)
_stdout, stderror = capture_io do
CamoFilter.new(orig, @options).asset_proxy_whitelist
end
assert_match "[DEPRECATION] 'asset_proxy_whitelist' is deprecated. Please use 'asset_proxy_allowlist' instead.", stderror
end

def test_deprecation_warning_asset_host_whitelisted
orig = %(<p><img></p>)
host = 'https://facebook.com'
_stdout, stderror = capture_io do
CamoFilter.new(orig, @options).asset_host_whitelisted?(host)
end
assert_match "[DEPRECATION] 'asset_host_whitelisted?' is deprecated. Please use 'asset_host_allowed?' instead.", stderror
end
end
35 changes: 25 additions & 10 deletions test/html/pipeline/sanitization_filter_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@ def test_unknown_schemes_are_removed
assert_equal '<a>Wat</a> is this', html
end

def test_whitelisted_longdesc_schemes_are_allowed
def test_allowlisted_longdesc_schemes_are_allowed
stuff = '<img src="./foo.jpg" longdesc="http://longdesc.com">'
html = SanitizationFilter.call(stuff).to_s
assert_equal '<img src="./foo.jpg" longdesc="http://longdesc.com">', html
Expand Down Expand Up @@ -81,35 +81,35 @@ def test_custom_anchor_schemes_are_not_removed

def test_anchor_schemes_are_merged_with_other_anchor_restrictions
stuff = '<a href="something-weird://heyyy" ping="more-weird://hiii">Wat</a> is this'
whitelist = {
allowlist = {
elements: ['a'],
attributes: { 'a' => %w[href ping] },
protocols: { 'a' => { 'ping' => ['http'] } }
}
filter = SanitizationFilter.new(stuff, whitelist: whitelist, anchor_schemes: ['something-weird'])
filter = SanitizationFilter.new(stuff, allowlist: allowlist, anchor_schemes: ['something-weird'])
html = filter.call.to_s
assert_equal '<a href="something-weird://heyyy">Wat</a> is this', html
end

def test_uses_anchor_schemes_from_whitelist_when_not_separately_specified
def test_uses_anchor_schemes_from_allowlist_when_not_separately_specified
stuff = '<a href="something-weird://heyyy">Wat</a> is this'
whitelist = {
allowlist = {
elements: ['a'],
attributes: { 'a' => ['href'] },
protocols: { 'a' => { 'href' => ['something-weird'] } }
}
filter = SanitizationFilter.new(stuff, whitelist: whitelist)
filter = SanitizationFilter.new(stuff, allowlist: allowlist)
html = filter.call.to_s
assert_equal stuff, html
end

def test_whitelist_contains_default_anchor_schemes
assert_equal SanitizationFilter::WHITELIST[:protocols]['a']['href'], ['http', 'https', 'mailto', 'xmpp', :relative, 'github-windows', 'github-mac', 'irc', 'ircs']
def test_allowlist_contains_default_anchor_schemes
assert_equal SanitizationFilter::ALLOWLIST[:protocols]['a']['href'], ['http', 'https', 'mailto', 'xmpp', :relative, 'github-windows', 'github-mac', 'irc', 'ircs']
end

def test_whitelist_from_full_constant
def test_allowlist_from_full_constant
stuff = '<a href="something-weird://heyyy" ping="more-weird://hiii">Wat</a> is this'
filter = SanitizationFilter.new(stuff, whitelist: SanitizationFilter::FULL)
filter = SanitizationFilter.new(stuff, allowlist: SanitizationFilter::FULL)
html = filter.call.to_s
assert_equal 'Wat is this', html
end
Expand Down Expand Up @@ -165,4 +165,19 @@ def test_nested_details_tag_are_not_removed
NESTED
assert_equal orig, SanitizationFilter.call(orig).to_s
end

def test_deprecated_whitelist_context
orig = %(<p><style>hey now</style></p>)
context = { whitelist: ['table'] }

assert_equal ['table'], SanitizationFilter.new(orig, context).allowlist
end

def test_deprecation_warning_whitelist
orig = %(<p><style>hey now</style></p>)
_stdout, stderror = capture_io do
SanitizationFilter.new(orig).whitelist
end
assert_match "[DEPRECATION] 'whitelist' is deprecated. Please use 'allowlist' instead.", stderror
end
end