ReDOS possible in the sanitize_html function #73

merbinr · 2023-04-12T17:14:02Z

There is a security vulnerability in this gem. I tried to communicate with the maintainers in an email, but still haven't got a response, I'm raising the issue here.

Vulnerable Code:
https://github.com/jgarber/redcloth/blob/v4.3.2/lib/redcloth/formatters/html.rb#L327

The above code has /<(/)([A-Za-z]\w*)([^>]?)(\s?/?)>/ regex and it matches user-provided input with the regex pattern, the above regex is vulnerable to ReDOS

Run the test.rb (provided below) with the time ruby test.rb command and observe the time and CPU taken to process it
RedCloth.new(text, [:sanitize_html]).to_html function will trigger the same function, so run the below code and observe the processing time

require 'RedCloth'
text = '<A' + 'A' * (54773)
text = RedCloth.new(text, [:sanitize_html]).to_html

Test Output:

Fix:

Rebuild the regex used as not vulnerable to ReDOS
You can use Recheck to check if a regex is vulnerable to ReDOS

The text was updated successfully, but these errors were encountered:

carnil · 2023-06-09T18:44:39Z

Apparently CVE-2023-31606 is assigned for this issue.

korny · 2023-06-28T14:47:32Z

@e23e Thanks so much for investigating this! We're using it on our platform, and would like to stay with Textile if possible. Do you have a fix for the issue already? If not, I think I can help with the RegExp, and then we could at least create a fork 🤔

korny · 2023-06-28T15:10:56Z

Changing the regex to:

    text.gsub!( /<(\/*)(?>[A-Za-z]\w*)([^>]*?)(\s?\/?)>/ ) do |m|

already speeds this up by 30x.

Edit: The version below doesn't work! Use this one instead: #73 (comment).

Allowing open tags (without a >) fixes the issue completely, but this might change the behavior in unexpected ways:

    text.gsub!( /<(\/*)(?>[A-Za-z]\w*)([^>]*?)(\s?\/?>?)/ ) do |m|
…
        "<#{raw[1]}#{pcs.join " "}#{raw[4]}"

merbinr · 2023-06-28T16:28:10Z

Hi @korny
Thank you for looking into it. I appreciate your thoughts on fixing this. The solution you provided above fixes the issue partially, but I would like to see if we can fix it completely,

 def clean_html( text, allowed_tags = BASIC_TAGS )
     text.gsub!( /<!\[CDATA\[/, '' )
     text.gsub!( /<(\/*)([A-Za-z]\w*)([^>]*?)(\s?\/?)>/ ) do |m|

The clean_html function is used to clean the not allowed tags. I'm not an expert in building regex. I guess this regex detects tags in a string/paragraph. So if you could build a regex without backtracking that detects the tags, we can completely fix this.

Could you please let me know your thoughts on above?

korny · 2023-06-28T22:29:42Z

After a bit of testing, I'm positive that my first version already fixes the problem for good. The use of Atomic Grouping effectively prevents backtracking in the second group (which matches the tag). The same can be achieved with a possessive quantifier:

    #                              v-- This "+" does the trick
    text.gsub!( /<(\/*)([A-Za-z]\w*+)([^>]*?)(\s?\/?)>/ ) do |m|

In this case, the \w*+ part (and therefore, the whole group ([A-Za-z]\w*+)) will not backtrack. The rest of this regex is not subject to ReDoS (it's linear). So, we can use this version; it should work in Ruby 1.9 and up.

By the way: Ruby 3.2 doesn't even need this patch because it uses some smart caching.

merbinr · 2023-07-02T06:54:13Z

Hi @korny,
I did some testing, and as you said above, it is fixing the ReDOS issue, But the output for the old regex(present one in the gem) and the new regex differ,

I gave below as input.

"<a href=https://example.com> Example </a>"

With the old regex, it returns the below output.

<a href="https://example.com"> Example </a>

But with the new regex, it returns the below output.

<a >href=https://example.com> Example </a>>

> character is added after the tag. Could you please look into it?

I have added attached below the test code I ran

BASIC_TAGS = {
    'a' => ['href', 'title'],
    'img' => ['src', 'alt', 'title'],
    'br' => [],
    'i' => nil,
    'u' => nil, 
    'b' => nil,
    'pre' => nil,
    'kbd' => nil,
    'code' => ['lang'],
    'cite' => nil,
    'strong' => nil,
    'em' => nil,
    'ins' => nil,
    'sup' => nil,
    'sub' => nil,
    'del' => nil,
    'table' => nil,
    'tr' => nil,
    'td' => ['colspan', 'rowspan'],
    'th' => nil,
    'ol' => ['start'],
    'ul' => nil,
    'li' => nil,
    'p' => nil,
    'h1' => nil,
    'h2' => nil,
    'h3' => nil,
    'h4' => nil,
    'h5' => nil,
    'h6' => nil, 
    'blockquote' => ['cite'],
    'notextile' => nil
}

def clean_html( text, regex, allowed_tags = BASIC_TAGS )
    text.gsub!( /<!\[CDATA\[/, '' )
    text.gsub!( regex ) do |m|
      raw = $~
      tag = raw[2].downcase
      if allowed_tags.has_key? tag
        pcs = [tag]
        allowed_tags[tag].each do |prop|
          ['"', "'", ''].each do |q|
            q2 = ( q != '' ? q : '\s' )
            if raw[3] =~ /#{prop}\s*=\s*#{q}([^#{q2}]+)#{q}/i
              attrv = $1
              next if (prop == 'src' or prop == 'href') and not attrv =~ %r{^(http|https|ftp):}
              pcs << "#{prop}=\"#{attrv.gsub('"', '\\"')}\""
              break
            end
          end
        end if allowed_tags[tag]
        "<#{raw[1]}#{pcs.join " "}#{raw[4]}>"
      else # Unauthorized tag
        if block_given?
          yield m
        else
          ''
        end
      end
    end
end



text = "<a href=https://example.com> Example </a>"
old_regex = /<(\/*)([A-Za-z]\w*)([^>]*?)(\s?\/?)>/
new_regex = /<(\/*)([A-Za-z]\w*+)([^>]*?)(\s?\/?>?)/
output = clean_html(text,old_regex)
puts "Output for the currently used regex: ",output
output = clean_html(text,new_regex)
puts "Output for the new proposed regex: ",output

korny · 2023-07-02T15:12:26Z

I'm sorry, it seems I had a copy/paste error in my comment. The only change I propose is the + in the tag name group. The change at the end (including the > in the last group) was nonsense. I updated the comments.

Here's the PR: https://github.com/jgarber/redcloth/pull/75/files.

Thanks for testing this!

merbinr · 2023-07-03T18:15:19Z

Hi @korny

It works fine and fixes the ReDOS issue,
Thank you for looking into this.

heliocola · 2023-11-02T01:02:16Z

Thank you @korny and @e23e .
I will try to add some tests to cover the scenarios above, merge #75, and release a new version to Rubygems!

heliocola · 2023-11-02T03:12:17Z

@korny : while working on adding tests to ensure the regexp will not take a long time, I've noticed your proposed regexp fails on Recheck playgound (https://makenowjust-labs.github.io/recheck/playground/).

Here is what I get:

However, on my development laptop, I was able to replicate the time spent between the old regexp and the one you provided in #75 .

Do you have any thoughts?

korny · 2023-11-02T10:59:24Z

Recheck can’t really read Ruby regexps. It doesn’t support all of the syntax that Ruby provides. Don’t use it to verify Ruby regexps.

heliocola · 2023-11-03T01:31:43Z

Thank you @korny .
I will release a new version to Rubygems soon! 🤞

heliocola · 2023-11-03T01:52:26Z

RedCloth v4.3.3 has been release to RubyGems! 🎉
THANK YOU @korny and @e23e !

korny mentioned this issue Jun 28, 2023

Fix CVE-2023-31606 (ReDOS possible in the sanitize_html function) #75

Merged

heliocola mentioned this issue Nov 2, 2023

Add tests for CVE-2023-31606 #80

Merged

heliocola closed this as completed Nov 3, 2023

CloCkWeRX mentioned this issue Feb 22, 2024

Address issue with incorrect open CVE-2019-10226 fatfreecrm/fat_free_crm#1235

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ReDOS possible in the sanitize_html function #73

ReDOS possible in the sanitize_html function #73

merbinr commented Apr 12, 2023

carnil commented Jun 9, 2023

korny commented Jun 28, 2023

korny commented Jun 28, 2023 •

edited

Loading

merbinr commented Jun 28, 2023

korny commented Jun 28, 2023 •

edited

Loading

merbinr commented Jul 2, 2023

korny commented Jul 2, 2023 •

edited

Loading

merbinr commented Jul 3, 2023

heliocola commented Nov 2, 2023

heliocola commented Nov 2, 2023

korny commented Nov 2, 2023 •

edited

Loading

heliocola commented Nov 3, 2023

heliocola commented Nov 3, 2023

ReDOS possible in the sanitize_html function #73

ReDOS possible in the sanitize_html function #73

Comments

merbinr commented Apr 12, 2023

carnil commented Jun 9, 2023

korny commented Jun 28, 2023

korny commented Jun 28, 2023 • edited Loading

merbinr commented Jun 28, 2023

korny commented Jun 28, 2023 • edited Loading

merbinr commented Jul 2, 2023

korny commented Jul 2, 2023 • edited Loading

merbinr commented Jul 3, 2023

heliocola commented Nov 2, 2023

heliocola commented Nov 2, 2023

korny commented Nov 2, 2023 • edited Loading

heliocola commented Nov 3, 2023

heliocola commented Nov 3, 2023

korny commented Jun 28, 2023 •

edited

Loading

korny commented Jun 28, 2023 •

edited

Loading

korny commented Jul 2, 2023 •

edited

Loading

korny commented Nov 2, 2023 •

edited

Loading