Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReDOS possible in the sanitize_html function #73

Closed
merbinr opened this issue Apr 12, 2023 · 13 comments
Closed

ReDOS possible in the sanitize_html function #73

merbinr opened this issue Apr 12, 2023 · 13 comments

Comments

@merbinr
Copy link

merbinr commented Apr 12, 2023

There is a security vulnerability in this gem. I tried to communicate with the maintainers in an email, but still haven't got a response, I'm raising the issue here.

Vulnerable Code:
https://github.com/jgarber/redcloth/blob/v4.3.2/lib/redcloth/formatters/html.rb#L327

image

The above code has /<(/)([A-Za-z]\w*)([^>]?)(\s?/?)>/ regex and it matches user-provided input with the regex pattern, the above regex is vulnerable to ReDOS

image

  • Run the test.rb (provided below) with the time ruby test.rb command and observe the time and CPU taken to process it
  • RedCloth.new(text, [:sanitize_html]).to_html function will trigger the same function, so run the below code and observe the processing time
require 'RedCloth'
text = '<A' + 'A' * (54773)
text = RedCloth.new(text, [:sanitize_html]).to_html

Test Output:
image

Fix:

  • Rebuild the regex used as not vulnerable to ReDOS
  • You can use Recheck to check if a regex is vulnerable to ReDOS
@carnil
Copy link

carnil commented Jun 9, 2023

Apparently CVE-2023-31606 is assigned for this issue.

@korny
Copy link
Contributor

korny commented Jun 28, 2023

@e23e Thanks so much for investigating this! We're using it on our platform, and would like to stay with Textile if possible. Do you have a fix for the issue already? If not, I think I can help with the RegExp, and then we could at least create a fork 🤔

@korny
Copy link
Contributor

korny commented Jun 28, 2023

Changing the regex to:

    text.gsub!( /<(\/*)(?>[A-Za-z]\w*)([^>]*?)(\s?\/?)>/ ) do |m|

already speeds this up by 30x.


Edit: The version below doesn't work! Use this one instead: #73 (comment).

Allowing open tags (without a >) fixes the issue completely, but this might change the behavior in unexpected ways:

    text.gsub!( /<(\/*)(?>[A-Za-z]\w*)([^>]*?)(\s?\/?>?)/ ) do |m|

        "<#{raw[1]}#{pcs.join " "}#{raw[4]}"

@merbinr
Copy link
Author

merbinr commented Jun 28, 2023

Hi @korny
Thank you for looking into it. I appreciate your thoughts on fixing this. The solution you provided above fixes the issue partially, but I would like to see if we can fix it completely,

 def clean_html( text, allowed_tags = BASIC_TAGS )
     text.gsub!( /<!\[CDATA\[/, '' )
     text.gsub!( /<(\/*)([A-Za-z]\w*)([^>]*?)(\s?\/?)>/ ) do |m|

The clean_html function is used to clean the not allowed tags. I'm not an expert in building regex. I guess this regex detects tags in a string/paragraph. So if you could build a regex without backtracking that detects the tags, we can completely fix this.

Could you please let me know your thoughts on above?

@korny
Copy link
Contributor

korny commented Jun 28, 2023

After a bit of testing, I'm positive that my first version already fixes the problem for good. The use of Atomic Grouping effectively prevents backtracking in the second group (which matches the tag). The same can be achieved with a possessive quantifier:

    #                              v-- This "+" does the trick
    text.gsub!( /<(\/*)([A-Za-z]\w*+)([^>]*?)(\s?\/?)>/ ) do |m|

In this case, the \w*+ part (and therefore, the whole group ([A-Za-z]\w*+)) will not backtrack. The rest of this regex is not subject to ReDoS (it's linear). So, we can use this version; it should work in Ruby 1.9 and up.

By the way: Ruby 3.2 doesn't even need this patch because it uses some smart caching.

@merbinr
Copy link
Author

merbinr commented Jul 2, 2023

Hi @korny,
I did some testing, and as you said above, it is fixing the ReDOS issue, But the output for the old regex(present one in the gem) and the new regex differ,

I gave below as input.

"<a href=https://example.com> Example </a>"

With the old regex, it returns the below output.

<a href="https://example.com"> Example </a>

But with the new regex, it returns the below output.

<a >href=https://example.com> Example </a>>

> character is added after the tag. Could you please look into it?

I have added attached below the test code I ran

BASIC_TAGS = {
    'a' => ['href', 'title'],
    'img' => ['src', 'alt', 'title'],
    'br' => [],
    'i' => nil,
    'u' => nil, 
    'b' => nil,
    'pre' => nil,
    'kbd' => nil,
    'code' => ['lang'],
    'cite' => nil,
    'strong' => nil,
    'em' => nil,
    'ins' => nil,
    'sup' => nil,
    'sub' => nil,
    'del' => nil,
    'table' => nil,
    'tr' => nil,
    'td' => ['colspan', 'rowspan'],
    'th' => nil,
    'ol' => ['start'],
    'ul' => nil,
    'li' => nil,
    'p' => nil,
    'h1' => nil,
    'h2' => nil,
    'h3' => nil,
    'h4' => nil,
    'h5' => nil,
    'h6' => nil, 
    'blockquote' => ['cite'],
    'notextile' => nil
}

def clean_html( text, regex, allowed_tags = BASIC_TAGS )
    text.gsub!( /<!\[CDATA\[/, '' )
    text.gsub!( regex ) do |m|
      raw = $~
      tag = raw[2].downcase
      if allowed_tags.has_key? tag
        pcs = [tag]
        allowed_tags[tag].each do |prop|
          ['"', "'", ''].each do |q|
            q2 = ( q != '' ? q : '\s' )
            if raw[3] =~ /#{prop}\s*=\s*#{q}([^#{q2}]+)#{q}/i
              attrv = $1
              next if (prop == 'src' or prop == 'href') and not attrv =~ %r{^(http|https|ftp):}
              pcs << "#{prop}=\"#{attrv.gsub('"', '\\"')}\""
              break
            end
          end
        end if allowed_tags[tag]
        "<#{raw[1]}#{pcs.join " "}#{raw[4]}>"
      else # Unauthorized tag
        if block_given?
          yield m
        else
          ''
        end
      end
    end
end



text = "<a href=https://example.com> Example </a>"
old_regex = /<(\/*)([A-Za-z]\w*)([^>]*?)(\s?\/?)>/
new_regex = /<(\/*)([A-Za-z]\w*+)([^>]*?)(\s?\/?>?)/
output = clean_html(text,old_regex)
puts "Output for the currently used regex: ",output
output = clean_html(text,new_regex)
puts "Output for the new proposed regex: ",output

@korny
Copy link
Contributor

korny commented Jul 2, 2023

I'm sorry, it seems I had a copy/paste error in my comment. The only change I propose is the + in the tag name group. The change at the end (including the > in the last group) was nonsense. I updated the comments.

Here's the PR: https://github.com/jgarber/redcloth/pull/75/files.

Thanks for testing this!

@merbinr
Copy link
Author

merbinr commented Jul 3, 2023

Hi @korny

It works fine and fixes the ReDOS issue,
Thank you for looking into this.

@heliocola
Copy link
Collaborator

Thank you @korny and @e23e .
I will try to add some tests to cover the scenarios above, merge #75, and release a new version to Rubygems!

@heliocola
Copy link
Collaborator

@korny : while working on adding tests to ensure the regexp will not take a long time, I've noticed your proposed regexp fails on Recheck playgound (https://makenowjust-labs.github.io/recheck/playground/).

Here is what I get:
Screenshot 2023-11-01 at 11 10 22 PM

However, on my development laptop, I was able to replicate the time spent between the old regexp and the one you provided in #75 .

Do you have any thoughts?

@korny
Copy link
Contributor

korny commented Nov 2, 2023

Recheck can’t really read Ruby regexps. It doesn’t support all of the syntax that Ruby provides. Don’t use it to verify Ruby regexps.

@heliocola
Copy link
Collaborator

Thank you @korny .
I will release a new version to Rubygems soon! 🤞

@heliocola
Copy link
Collaborator

RedCloth v4.3.3 has been release to RubyGems! 🎉
THANK YOU @korny and @e23e !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants