Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

warning: regexp match /.../n against to UTF-8 string #3927

Closed
dmarczal opened this issue Dec 10, 2011 · 33 comments
Closed

warning: regexp match /.../n against to UTF-8 string #3927

dmarczal opened this issue Dec 10, 2011 · 33 comments

Comments

@dmarczal
Copy link

I updated the rails to 3.1.3 and I got the follow warning:

bundler/gems/rails-d06c3b3cd22c/activesupport/lib/active_support/core_ext/string/output_safety.rb:23: warning: regexp match /.../n against to UTF-8 string

To not show this warning I changed the output_safety.rb line 23

from
s.gsub(/[&"><]/n) { |special| HTML_ESCAPE[special] }.html_safe
to
s.gsub(/[&"><]/) { |special| HTML_ESCAPE[special] }.html_safe

removing the ''n'' option of the Regex.

This is a problem with output_safety? Is there another way to fix this warning?

@kennyj
Copy link
Contributor

kennyj commented Dec 10, 2011

If we use n option for regex, I think that a string should be binary.
I'll test a little more.

@kennyj
Copy link
Contributor

kennyj commented Dec 10, 2011

relate to 0e17cf1

@kennyj
Copy link
Contributor

kennyj commented Dec 10, 2011

I fixed it. but It seems that my fix is not so cool.
I think that we have 3 plan.

1: silence_warning
2: my fix
3: revert 0e17cf1 (but will we have security problem?)

What do you think?

/cc @josevalim

@dmarczal
Copy link
Author

I don't now, but if there is a problem with security the first option is better. But There is any problem with silence the warning?

@kennyj
Copy link
Contributor

kennyj commented Dec 11, 2011

I tested a performance for each fix.
$ vim lib/foo.rb

# encoding: utf-8
require 'active_support/core_ext/string/output_safety'
require 'benchmark'
include ERB::Util

def html_escape1(s)
  s = s.to_s
  if s.html_safe?
    s
  else
    s = s.gsub(/[&"><]/n) { |special| HTML_ESCAPE[special] }.html_safe
  end
end

def html_escape2(s)
  silence_warnings {
    s = s.to_s
    if s.html_safe?
      s
    else
      s = s.gsub(/[&"><]/n) { |special| HTML_ESCAPE[special] }.html_safe
    end
  }
end

def runner(title, method, string)
  GC.start
  puts title
  puts Benchmark.measure {
    10000.times do
      send(method, string)
    end
  }
end

string = '1&2"<3>あ' * 10
string.freeze

puts Benchmark::CAPTION

runner "original",        :html_escape1, string
runner "silence warning", :html_escape2, string
runner "my fix",          :html_escape , string

II got the following test results.

$ bundle exec rails runner 'require "foo"' 2> /dev/null
      user     system      total        real
original
  1.230000   0.100000   1.330000 (  1.333781)
silence warning
  0.680000   0.000000   0.680000 (  0.679740)
my fix
  0.970000   0.000000   0.970000 (  0.964449)

It seems that silence_warnings is fastest.
I'll fix it, and send PR.

kennyj added a commit to kennyj/rails that referenced this issue Dec 11, 2011
@kennyj
Copy link
Contributor

kennyj commented Dec 11, 2011

I send PR
#3937

@jenseng
Copy link
Contributor

jenseng commented Dec 11, 2011

@dmarczal, @kennyj: i think we can change the regexp to /&|"|>|</ ... that should get rid of the warning, while still protecting against XSS (and keeping it fast)

@jenseng
Copy link
Contributor

jenseng commented Dec 11, 2011

never mind, that doesn't actually match invalid utf-8 bytes :(

@jenseng
Copy link
Contributor

jenseng commented Dec 11, 2011

interesting ... so /&|"|>|</ actually does work in 1.9.2 (no XSS), and it doesn't issue any warnings. it does not work in 1.8.7, but the current regexp does (and doesn't issue warnings)

so an alternative solution would be to use different regexes for 1.8.x and 1.9.x

@josevalim
Copy link
Contributor

+1 for different solutions for 1.8 and 1.9.x. Could someone please submit a pull request? Tks.

@kennyj
Copy link
Contributor

kennyj commented Dec 12, 2011

@josevalim

@jeremy commited 63cd943 today.
It seem that the above commit remove the place that we should fix :-)

@jeremy
Copy link
Member

jeremy commented Dec 12, 2011

Aha! Thanks for looking into this :)

@jeremy jeremy closed this as completed Dec 12, 2011
@rurounijones
Copy link

I have just run across this issue in Rails 3.2.0 while using jruby in 1.9 mode (Doesn't occur in 1.8 mode)

Do you guys know if this is something that should be fixed in rails or jruby? so I know where to raise a ticket.

@bundacia
Copy link

bundacia commented Mar 2, 2012

Will this fix be ported back to rails 3.1? I'm hitting the same issue there with ruby 1.9.

@agibralter
Copy link
Contributor

Same here.

@agibralter
Copy link
Contributor

Yeah I get this when upgrading to Rails 3.1.4 from 3.1.3 with Ruby 1.9.3:

v3.1.3...v3.1.4#L73L23

@kei-s
Copy link
Contributor

kei-s commented Mar 7, 2012

+1 for backport to 3.1.

@kennyj
Copy link
Contributor

kennyj commented Mar 7, 2012

Anyway, I send PR. (cherry-picked @jeremy's commit)
#5312

@rurounijones
Copy link

Have upgraded to ActiveSupport 3.2.3 and this error is still occurring for me at least:

11:43:52,556 ERROR [stderr] (http--127.0.0.1-8080-1) /home/jjones/.rvm/gems/jruby-1.6.7@gemini/gems/activesupport-3.2.3/lib/active_support/core_ext/string/output_safety.rb:34 warning: regexp match /.../n against to UTF-8 string

@kennyj
Copy link
Contributor

kennyj commented May 24, 2012

@rurounijones Are you using ruby 1.8 mode (or 1.9 mode) ?

@rurounijones
Copy link

Ah sorry, as in my previous comment, 1.8 mode works without issue, 1.9 mode causes this issue to appear.

@kennyj
Copy link
Contributor

kennyj commented May 24, 2012

You used 1.9 mode, but according to your stacktrace 1.8 mode was executed.
See also https://github.com/rails/rails/blob/3-2-stable/activesupport/lib/active_support/core_ext/string/output_safety.rb

I tested it. It's strangeness...

cruby 1.9.3

$ rvm use 1.9.3
Using /home/kennyj/.rvm/gems/ruby-1.9.3-p194
$ irb
1.9.3p194 :001 > '"&gt;&lt;&amp;&quot;"' == ('><&"'.encode('utf-8', :xml => :attr) rescue false)
 => true ★
1.9.3p194 :002 > '><&"'.encode('utf-8', :xml => :attr)
 => "\"&gt;&lt;&amp;&quot;\""

jruby 1.6.7 (1.9 mode)

$ rvm use jruby
Using /home/kennyj/.rvm/gems/jruby-1.6.7
$ ruby -v
jruby 1.6.7 (ruby-1.9.2-p312) (2012-02-22 3e82bc8) (OpenJDK 64-Bit Server VM 1.6.0_22) [linux-amd64-java]
$ irb
jruby-1.6.7 :001 > '"&gt;&lt;&amp;&quot;"' == ('><&"'.encode('utf-8', :xml => :attr) rescue false)
 => false ★
jruby-1.6.7 :002 > '><&"'.encode('utf-8', :xml => :attr)
 => "><&\""

FYI: jruby-1.7.0.preview1

jruby-1.7.0.preview1 :001 > '"&gt;&lt;&amp;&quot;"' == ('><&"'.encode('utf-8', :xml => :attr) rescue false)
 => false ★
jruby-1.7.0.preview1 :002 > '><&"'.encode('utf-8', :xml => :attr)
 => "><&\""

jruby's problem ??

cc/ @jeremy

@kennyj
Copy link
Contributor

kennyj commented May 24, 2012

This is also major issue for master, because master don't support 1.8 mode code ;-)

@kennyj kennyj reopened this May 24, 2012
@jeremy
Copy link
Member

jeremy commented May 24, 2012

Looks like jruby doesn't match cruby there!

See the :xml option at http://www.ruby-doc.org/core-1.9.3/String.html#method-i-encode

@jeremy jeremy closed this as completed May 24, 2012
@rurounijones
Copy link

I forgot to mention that the JRuby ticket for this issue can be found here: https://jira.codehaus.org/browse/JRUBY-6723

@betelgeuse
Copy link

Just to clarify I run mri 1.9.3 so this ticket is not jruby specific.

On 15.8.2012, at 7.19, Jeffrey Jones notifications@github.com wrote:

I forgot to mention that the JRuby ticket for this issue can be found here: https://jira.codehaus.org/browse/JRUBY-6723


Reply to this email directly or view it on GitHub.

@betelgeuse
Copy link

I thought I was responding to #7323. Sorry about the noise.

On 15.8.2012, at 7.19, Jeffrey Jones notifications@github.com wrote:

I forgot to mention that the JRuby ticket for this issue can be found here: https://jira.codehaus.org/browse/JRUBY-6723


Reply to this email directly or view it on GitHub.

@astathopoulos
Copy link

The change

from:
s.to_s.gsub(/&/, "&").gsub(/"/, """).gsub(/>/, ">").gsub(/</, "<").html_safe

to:
s.gsub(/[&"'><]/n) { |special| HTML_ESCAPE[special] }.html_safe

in Rails 3.0.17 introduced the same warning problem when using ruby 1.9 which is solved at the latest 3-1 stable.

Please backport the fix to 3.0

@steveklabnik
Copy link
Member

Please backport the fix to 3.0

Bugfixes do not get backported to anything but 3.2. 3.0 and 3.1 are security fix only nowadays.

@vanderhoorn
Copy link
Contributor

@steveklabnik your argument is totally valid, if a point release doesn't introduce a problem. If a point release does introduce a problem (i.e. regression), your argument is wrong. Regression introduced in point releases should always be fixed.

@rafaelfranca
Copy link
Member

I don't know if you guys saw but I fixed this issue in the 3-0-stable branch

@vanderhoorn
Copy link
Contributor

Ahh, I see. Many thanks!

@pedroadame
Copy link
Contributor

pedroadame commented Jun 16, 2016

I'm getting this warning in Rails 4.2.6 with MRI 2.3.0

config/initializers/string_replace_rare_chars.rb
    class String
      def replace_rare_chars
        self.mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/n,'').downcase.to_s
      end
    end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests