Secure #sanitize, #strip_tags, and #strip_links helpers against xss attacks. Closes #8877. [Rick, lifofifo, Jacques Distler]

git-svn-id: http://svn-commit.rubyonrails.org/rails/trunk@7589 5ecf4fe2-1ee6-0310-87b1-e25e094e27de
technoweenie committed Sep 23, 2007
1 parent 4965b1b commit 2d02199
Showing 4 changed files with 423 additions and 53 deletions.
5 changes: 5 additions & 0 deletions actionpack/CHANGELOG
@@ -1,5 +1,10 @@
*SVN*

* Secure #sanitize, #strip_tags, and #strip_links helpers against xss attacks. Closes #8877. [Rick, lifofifo, Jacques Distler]

This merges and renames the popular white_list helper (along with some css sanitizing from Jacques Distler's version of the same plugin).
Also applied updated versions of #strip_tags and #strip_links from #8877.

* Remove use of & logic operator. Closes #8114. [watson]

* Fixed JavaScriptHelper#escape_javascript to also escape closing tags #8023 [rubyruy]
129 changes: 129 additions & 0 deletions actionpack/lib/action_view/base.rb
@@ -198,6 +198,135 @@ class Base

@@erb_variable = '_erbout'
cattr_accessor :erb_variable

# A regular expression of the valid characters used to separate protocols like
# the ':' in 'http://foo.com'
@@sanitized_protocol_separator = /:|(&#0*58)|(&#x70)|(%|&#37;)3A/
cattr_accessor :sanitized_protocol_separator

# Specifies a Set of HTML attributes that can have URIs.
@@sanitized_uri_attributes = Set.new(%w(href src cite action longdesc xlink:href lowsrc))
cattr_reader :sanitized_uri_attributes

# Adds valid HTML attributes that the #sanitize helper checks for URIs.
#
# Rails::Initializer.run do |config|
# config.action_view.sanitized_uri_attributes = 'lowsrc', 'target'
# end
#
def self.sanitized_uri_attributes=(attributes)
@@sanitized_uri_attributes.merge(attributes)
end

# Specifies a Set of 'bad' tags that the #sanitize helper will remove completely, as opposed
# to just escaping harmless tags like <font>
@@sanitized_bad_tags = Set.new('script')
cattr_reader :sanitized_bad_tags

# Adds to the Set of 'bad' tags for the #sanitize helper.
#
# Rails::Initializer.run do |config|
# config.action_view.sanitized_bad_tags = 'embed', 'object'
# end
#
def self.sanitized_bad_tags=(attributes)
@@sanitized_bad_tags.merge(attributes)
end

# Specifies the default Set of tags that the #sanitize helper will allow unscathed.
@@sanitized_allowed_tags = Set.new(%w(strong em b i p code pre tt output samp kbd var sub
sup dfn cite big small address hr br div span h1 h2 h3 h4 h5 h6 ul ol li dt dd abbr
acronym a img blockquote del ins fieldset legend))
cattr_reader :sanitized_allowed_tags

# Adds to the Set of allowed tags for the #sanitize helper.
#
# Rails::Initializer.run do |config|
# config.action_view.sanitized_allowed_tags = 'table', 'tr', 'td'
# end
#
def self.sanitized_allowed_tags=(attributes)
@@sanitized_allowed_tags.merge(attributes)
end

# Specifies the default Set of html attributes that the #sanitize helper will leave
# in the allowed tag.
@@sanitized_allowed_attributes = Set.new(%w(href src width height alt cite datetime title class name xml:lang abbr))
cattr_reader :sanitized_allowed_attributes

# Adds to the Set of allowed html attributes for the #sanitize helper.
#
# Rails::Initializer.run do |config|
# config.action_view.sanitized_allowed_attributes = 'onclick', 'longdesc'
# end
#
def self.sanitized_allowed_attributes=(attributes)
@@sanitized_allowed_attributes.merge(attributes)
end

# Specifies the default Set of acceptable css properties that #sanitize and #sanitize_css will accept.
@@sanitized_allowed_css_properties = Set.new(%w(azimuth background-color border-bottom-color border-collapse
border-color border-left-color border-right-color border-top-color clear color cursor direction display
elevation float font font-family font-size font-style font-variant font-weight height letter-spacing line-height
overflow pause pause-after pause-before pitch pitch-range richness speak speak-header speak-numeral speak-punctuation
speech-rate stress text-align text-decoration text-indent unicode-bidi vertical-align voice-family volume white-space
width))
cattr_reader :sanitized_allowed_css_properties

# Adds to the Set of allowed css properties for the #sanitize and #sanitize_css helpers.
#
# Rails::Initializer.run do |config|
# config.action_view.sanitized_allowed_css_properties = 'expression'
# end
#
def self.sanitized_allowed_css_properties=(attributes)
@@sanitized_allowed_css_properties.merge(attributes)
end

# Specifies the default Set of acceptable css keywords that #sanitize and #sanitize_css will accept.
@@sanitized_allowed_css_keywords = Set.new(%w(auto aqua black block blue bold both bottom brown center
collapse dashed dotted fuchsia gray green !important italic left lime maroon medium none navy normal
nowrap olive pointer purple red right solid silver teal top transparent underline white yellow))
cattr_reader :sanitized_allowed_css_keywords

# Adds to the Set of allowed css keywords for the #sanitize and #sanitize_css helpers.
#
# Rails::Initializer.run do |config|
# config.action_view.sanitized_allowed_css_keywords = 'expression'
# end
#
def self.sanitized_allowed_css_keywords=(attributes)
@@sanitized_allowed_css_keywords.merge(attributes)
end

# Specifies the default Set of allowed shorthand css properties for the #sanitize and #sanitize_css helpers.
@@sanitized_shorthand_css_properties = Set.new(%w(background border margin padding))
cattr_reader :sanitized_shorthand_css_properties

# Adds to the Set of allowed shorthand css properties for the #sanitize and #sanitize_css helpers.
#
# Rails::Initializer.run do |config|
# config.action_view.sanitized_shorthand_css_properties = 'expression'
# end
#
def self.sanitized_shorthand_css_properties=(attributes)
@@sanitized_shorthand_css_properties.merge(attributes)
end

# Specifies the default Set of protocols that the #sanitize helper will leave in
# protocol attributes.
@@sanitized_allowed_protocols = Set.new(%w(ed2k ftp http https irc mailto news gopher nntp telnet webcal xmpp callto feed svn urn aim rsync tag ssh sftp rtsp afs))
cattr_reader :sanitized_allowed_protocols

# Adds to the Set of allowed protocols for the #sanitize helper.
#
# Rails::Initializer.run do |config|
# config.action_view.sanitized_allowed_protocols = 'ssh', 'feed'
# end
#
def self.sanitized_allowed_protocols=(attributes)
@@sanitized_allowed_protocols.merge(attributes)
end

@@template_handlers = HashWithIndifferentAccess.new
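
Note on the setters above: each `sanitized_*=` writer calls `merge` on the existing Set, so assigning through the config adds to the defaults instead of replacing them. A minimal sketch of that pattern in plain Ruby follows. `SanitizerConfig` and `allowed_tags` are hypothetical stand-ins, not names from this commit, and the default list is shortened.

```ruby
require 'set'

# Hypothetical stand-in for ActionView::Base; it only mirrors the
# merge-on-assign behaviour of the sanitized_* writers in this commit.
class SanitizerConfig
  @allowed_tags = Set.new(%w(strong em b i p))

  class << self
    attr_reader :allowed_tags

    # Assignment adds to the default Set rather than replacing it.
    def allowed_tags=(tags)
      @allowed_tags.merge(Array(tags))
    end
  end
end

# A comma-separated right-hand side arrives as a single Array.
SanitizerConfig.allowed_tags = 'table', 'tr', 'td'

SanitizerConfig.allowed_tags.include?('table')   # => true
SanitizerConfig.allowed_tags.include?('strong')  # => true, defaults survive

# Removing a default needs an explicit delete on the Set itself.
SanitizerConfig.allowed_tags.delete('em')
```

This is why the documentation in text_helper.rb removes tags with `sanitized_allowed_tags.delete` rather than by assignment.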

136 changes: 98 additions & 38 deletions actionpack/lib/action_view/helpers/text_helper.rb
@@ -324,63 +324,118 @@ def auto_link(text, link = :all, href_options = {}, &block)
#
# strip_links('Blog: <a href="http://www.myblog.com/" class="nav" target=\"_blank\">Visit</a>.')
# # => Blog: Visit
def strip_links(text)
text.gsub(/<a\b.*?>(.*?)<\/a>/mi, '\1')
def strip_links(html)
# Stupid firefox treats '<href="http://whatever.com" onClick="alert()">something' as link!
if html.index("<a") || html.index("<href")
tokenizer = HTML::Tokenizer.new(html)
result = ''
while token = tokenizer.next
node = HTML::Node.parse(nil, 0, 0, token, false)
result << node.to_s unless node.is_a?(HTML::Tag) && ["a", "href"].include?(node.name)
end
strip_links(result) # Recurse - handle all dirty nested links
else
html
end
end
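
Note on the recursion in the new strip_links: removing one tag can splice the surrounding text into a fresh tag, so a single pass is not enough. The sketch below illustrates the idea with a regex in place of HTML::Tokenizer, and it stops when a pass changes nothing rather than when `<a` is absent. Both are assumptions made to keep the example self-contained, and `strip_until_stable` is a hypothetical name.

```ruby
# Regex-based stand-in for the tokenizer loop: strip <a ...> and </a>,
# then repeat until a pass leaves the string unchanged.
def strip_until_stable(html)
  stripped = html.gsub(/<\/?a\b[^>]*>/i, '')
  stripped == html ? html : strip_until_stable(stripped)
end

dirty = '<<a>a href="x">link</a>'

# One pass removes the inner <a> and the </a>, which leaves a new link behind.
dirty.gsub(/<\/?a\b[^>]*>/i, '')  # => "<a href=\"x\">link"

# Repeating until stable removes it as well.
strip_until_stable(dirty)         # => "link"
```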

VERBOTEN_TAGS = %w(form script plaintext) unless defined?(VERBOTEN_TAGS)
VERBOTEN_ATTRS = /^on/i unless defined?(VERBOTEN_ATTRS)

# Sanitizes the +html+ by converting <form> and <script> tags into regular
# text, and removing all "on*" (e.g., onClick) attributes so that arbitrary Javascript
# cannot be executed. It also removes <tt>href</tt> and <tt>src</tt> attributes that start with
# "javascript:". You can modify what gets sanitized by defining VERBOTEN_TAGS
# and VERBOTEN_ATTRS before this Module is loaded.
# This #sanitize helper will html encode all tags and strip all attributes that aren't specifically allowed.
# It also strips href/src attributes with invalid protocols, such as javascript:. It does its best to counter any
# tricks that hackers may use, like throwing in unicode/ascii/hex values to get past the javascript: filters. Check out
# the extensive test suite.
#
# ==== Examples
# sanitize('<script> do_nasty_stuff() </script>')
# # => &lt;script> do_nasty_stuff() &lt;/script>
# <%= sanitize @article.body %>
#
# You can add or remove tags/attributes if you want to customize it a bit. See ActionView::Base for full docs on the
# available options. You can add tags/attributes for single uses of #sanitize by passing either the :attributes or :tags options:
#
# sanitize('<a href="javascript: sucker();">Click here for $100</a>')
# # => <a>Click here for $100</a>
# Normal Use
#
# sanitize('<a href="#" onClick="kill_all_humans();">Click here!!!</a>')
# # => <a href="#">Click here!!!</a>
# <%= sanitize @article.body %>
#
# sanitize('<img src="javascript:suckers_run_this();" />')
# # => <img />
def sanitize(html)
# only do this if absolutely necessary
if html.index("<")
# Custom Use
#
# <%= sanitize @article.body, :tags => %w(table tr td), :attributes => %w(id class style) %>
#
# Add table tags
#
# Rails::Initializer.run do |config|
# config.action_view.sanitized_allowed_tags = 'table', 'tr', 'td'
# end
#
# Remove tags
#
# Rails::Initializer.run do |config|
# config.after_initialize do
# ActionView::Base.sanitized_allowed_tags.delete 'div'
# end
# end
#
# Change allowed attributes
#
# Rails::Initializer.run do |config|
# config.action_view.sanitized_allowed_attributes = 'id', 'class', 'style'
# end
#
def sanitize(html, options = {})
return html if html.blank? || !html.include?('<')
attrs = options.key?(:attributes) ? Set.new(options[:attributes]).merge(sanitized_allowed_attributes) : sanitized_allowed_attributes
tags = options.key?(:tags) ? Set.new(options[:tags] ).merge(sanitized_allowed_tags) : sanitized_allowed_tags
returning [] do |new_text|
tokenizer = HTML::Tokenizer.new(html)
new_text = ""

parent = []
while token = tokenizer.next
node = HTML::Node.parse(nil, 0, 0, token, false)
new_text << case node
when HTML::Tag
if VERBOTEN_TAGS.include?(node.name)
node.to_s.gsub(/</, "&lt;")
if node.closing == :close
parent.shift
else
if node.closing != :close
node.attributes.delete_if { |attr,v| attr =~ VERBOTEN_ATTRS }
%w(href src).each do |attr|
node.attributes.delete attr if node.attributes[attr] =~ /^javascript:/i
end
end
node.to_s
parent.unshift node.name
end
node.attributes.keys.each do |attr_name|
value = node.attributes[attr_name].to_s
if !attrs.include?(attr_name) || contains_bad_protocols?(attr_name, value)
node.attributes.delete(attr_name)
else
node.attributes[attr_name] = attr_name == 'style' ? sanitize_css(value) : CGI::escapeHTML(value)
end
end if node.attributes
tags.include?(node.name) ? node : nil
else
node.to_s.gsub(/</, "&lt;")
sanitized_bad_tags.include?(parent.first) ? nil : node.to_s.gsub(/</, "&lt;")
end
end
end.join
end

html = new_text
# Sanitizes a block of css code. Used by #sanitize when it comes across a style attribute
def sanitize_css(style)
# disallow urls
style = style.to_s.gsub(/url\s*\(\s*[^\s)]+?\s*\)\s*/, ' ')

# gauntlet
if style !~ /^([:,;#%.\sa-zA-Z0-9!]|\w-\w|\'[\s\w]+\'|\"[\s\w]+\"|\([\d,\s]+\))*$/ ||
style !~ /^(\s*[-\w]+\s*:\s*[^:;]*(;|$))*$/
return ''
end

html
returning [] do |clean|
style.scan(/([-\w]+)\s*:\s*([^:;]*)/) do |prop,val|
if sanitized_allowed_css_properties.include?(prop.downcase)
clean << prop + ': ' + val + ';'
elsif sanitized_shorthand_css_properties.include?(prop.split('-')[0].downcase)
unless val.split().any? do |keyword|
!sanitized_allowed_css_keywords.include?(keyword) &&
keyword !~ /^(#[0-9a-f]+|rgb\(\d+%?,\d*%?,?\d*%?\)?|\d{0,2}\.?\d{0,2}(cm|em|ex|in|mm|pc|pt|px|%|,|\))?)$/
end
clean << prop + ': ' + val + ';'
end
end
end
end.join(' ')
end
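
Note on sanitize_css: after the url stripping and the two "gauntlet" regexes, the method scans `property: value` pairs and keeps only those whose property is in the allowed Set. The sketch below covers that last step only. It leaves out the url removal, the gauntlet, and the shorthand/keyword branch, and `ALLOWED_PROPERTIES` and `filter_css` are hypothetical names with a shortened list.

```ruby
require 'set'

# Shortened, hypothetical allow-list; the commit's default Set is much longer.
ALLOWED_PROPERTIES = Set.new(%w(color background-color font-size text-align width))

# Keep only declarations whose property name is on the allow-list.
def filter_css(style)
  clean = []
  style.to_s.scan(/([-\w]+)\s*:\s*([^:;]*)/) do |prop, val|
    clean << "#{prop}: #{val.strip};" if ALLOWED_PROPERTIES.include?(prop.downcase)
  end
  clean.join(' ')
end

filter_css('color: red; position: fixed; width: 10px')
# => "color: red; width: 10px;"
```

Anything not explicitly listed, such as `position` here, is dropped without comment, which is the same allow-list stance the rest of the commit takes for tags and attributes.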

# Strips all HTML tags from the +html+, including comments. This uses the
# html-scanner tokenizer and so its HTML parsing ability is limited by
# that of html-scanner.
@@ -407,7 +462,7 @@ def strip_tags(html)
end
# strip any comments, and if they have a newline at the end (ie. line with
# only a comment) strip that too
text.gsub(/<!--(.*?)-->[\n]?/m, "")
strip_tags(text.gsub(/<!--(.*?)-->[\n]?/m, "")) # Recurse - handle all dirty nested tags
else
html # already plain text
end
@@ -574,6 +629,11 @@ def auto_link_email_addresses(text)
end
end
end

def contains_bad_protocols?(attr_name, value)
sanitized_uri_attributes.include?(attr_name) &&
(value =~ /(^[^\/:]*):|(&#0*58)|(&#x70)|(%|&#37;)3A/ && !sanitized_allowed_protocols.include?(value.split(sanitized_protocol_separator).first))
end
end
end
end
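
Note on contains_bad_protocols?: the check applies only to URI-bearing attributes. It looks for a protocol separator (a literal colon before any slash, or an encoded colon), then takes everything before the first separator and rejects the value if that prefix is not an allowed protocol. A minimal standalone sketch follows, assuming shortened lists and hypothetical constant and method names. The regexes are simplified from the commit: groups are non-capturing, the `&#x70` alternative is left out, and the match is anchored with `\A`.

```ruby
require 'set'

# Hypothetical, shortened versions of the Sets defined in ActionView::Base.
URI_ATTRIBUTES     = Set.new(%w(href src))
ALLOWED_PROTOCOLS  = Set.new(%w(http https mailto ftp))
PROTOCOL_SEPARATOR = /:|&#0*58|(?:%|&#37;)3A/
HAS_PROTOCOL       = /\A[^\/:]*:|&#0*58|(?:%|&#37;)3A/

def bad_protocol?(attr_name, value)
  return false unless URI_ATTRIBUTES.include?(attr_name)
  return false unless value =~ HAS_PROTOCOL
  # Everything before the first separator is treated as the protocol.
  scheme = value.split(PROTOCOL_SEPARATOR).first.to_s
  !ALLOWED_PROTOCOLS.include?(scheme)
end

bad_protocol?('href', 'javascript:alert(1)')      # => true
bad_protocol?('href', 'javascript&#58;alert(1)')  # => true, encoded colon
bad_protocol?('href', 'http://example.com/')      # => false
bad_protocol?('href', '/articles/1')              # => false, no protocol
bad_protocol?('title', 'javascript:alert(1)')     # => false, not a URI attribute
```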
