Non-ASCII character escape sequence in string_content results in SyntaxError #360

plentz opened this Issue Oct 25, 2012 · 17 comments


plentz commented Oct 25, 2012

First of all, calm down guys, this is a pre-1.7 bug(?); I just forgot to report it. It happens when using the delocalize gem.

This is the code that causes the error:

  Time.parse_localized "201101011200"

the output is something like this:

  1) Order from_sync imports order from mobile data
     Failure/Error: Order.apply sync_data, :on => order
       invalid multibyte character: /Março/
     # ./app/models/xpto.rb:86:in `extract_datetime'

The invalid multibyte character error is probably caused by a string that contains the pt-BR translation for March, 'março'.

With MRI 1.9.x it works. Putting # encoding: utf-8 at the top of the file solves the problem.


Does the same issue manifest with 1.7.0? We are not maintaining 1.6.x.

plentz commented Oct 29, 2012

@BanzaiMan yup, Hiro. It's still there in 1.7.

headius commented Oct 30, 2012

With both 1.9.3 and JRuby I get "uninitialized constant Date::MONTHNAMES" when I try to do -rdelocalize -e "your code". Something's not right.

plentz commented Oct 30, 2012

@headius maybe because delocalize is intended to be used with Rails. Maybe this helps:

$ irb
1.9.3p286 :001 > Date::MONTHNAMES
NameError: uninitialized constant Date
    from (irb):1
    from /Users/plentz/.rvm/rubies/ruby-1.9.3-p286/bin/irb:16:in `<main>'
1.9.3p286 :002 > require 'date'
 => true 
1.9.3p286 :003 > Date::MONTHNAMES
 => [nil, "January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"] 
1.9.3p286 :004 >

headius commented Oct 31, 2012

I'm having trouble reproducing this. Do you think it's possible for you to add some logging to the suspect lines of code and reduce this to something we can run without all of delocalize? I tried to run their tests, but it depends on the sqlite C extension and other stuff...

plentz commented Dec 25, 2012

@headius I think I got it. It's really weird, but here we go (foo.rb):

# encoding: utf-8
"ééa".gsub!(/\bmar\ç\b/, "março")

and then...

$ ruby -v
ruby 1.9.3p327 (2012-11-10 revision 37606) [x86_64-darwin12.2.0]
$ ruby foo.rb
$ rvm jruby
$ ruby -v
jruby 1.7.1 (1.9.3p327) 2012-12-03 30a153b on Java HotSpot(TM) 64-Bit Server VM 1.6.0_37-b06-434-11M3909 [darwin-x86_64]
$ ruby foo.rb
SyntaxError: foo.rb:2: invalid multibyte char (UTF-8)

Looks like escaping a non-ASCII character with a backslash is problematic in a String or a Regexp.

$ jruby -v; jruby -e '"\ç"'
jruby (1.9.3p327) 2012-12-23 761e54c on Java HotSpot(TM) 64-Bit Server VM 1.7.0_10-b18 [darwin-x86_64]
SyntaxError: -e:2: invalid multibyte char (UTF-8)
$ jruby -e '/\ç/'
SyntaxError: -e:2: invalid multibyte char (UTF-8)

This does not happen in MRI (trunk or 1.9.3).
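For comparison, here is a minimal sketch of the behaviour MRI appears to give for the same literals, assuming a UTF-8 source file. The constant names are made up for illustration; the expectation is that a backslash before a non-ASCII character is simply ignored, so the escaped and unescaped forms behave the same.

```ruby
# encoding: utf-8
# Sketch only: ESCAPED_STRING and ESCAPED_REGEXP are hypothetical names,
# not taken from delocalize or JRuby.

# A backslash before a character that is not a recognized escape
# should leave just the character in a double-quoted string.
ESCAPED_STRING = "\ç"

# The same escape inside a Regexp literal should match the plain character.
ESCAPED_REGEXP = /mar\ç/

puts ESCAPED_STRING == "ç"
puts !(ESCAPED_REGEXP =~ "março").nil?
```

On an affected JRuby build the file would presumably fail to parse at all, so neither line would print.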

plentz commented Dec 27, 2012

@BanzaiMan Hiro, thinking about it now, I believe we actually found another problem, not the one I originally posted in this issue, since adding or removing # encoding: utf-8 doesn't change the behaviour.


@plentz I believe the problem is with the parser, and it happens no matter what the encoding is. I edited the description of this ticket to reflect that. "string_content" refers to the parser token where I believe the error is happening.

enebo commented Jan 3, 2013

Fix on the way

enebo GH #360: Non-ASCII character escape sequence in string_content results in SyntaxError

enebo commented Jan 3, 2013

Fixed in commit f0a6484.

plentz commented Jan 9, 2013

I've just updated to JRuby 1.7.2 and my fear was confirmed: we actually found another problem while trying to create a test case for the initial problem of this issue. I still get the error

       invalid multibyte character: /Março/

if I remove # encoding: utf-8 from some of my tests. What should I do? Create another issue?


@plentz Hmm. So you uncovered a separate issue (which @enebo fixed) during the course of this ticket. Perhaps somehow you are trying to create a Regexp with non-ASCII bytes. If this is indeed what's happening, isn't it the correct behavior? Without a reliable reproduction, this is a little hard to say definitively.
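To illustrate the kind of situation described above, here is a hypothetical sketch (not a confirmed reproduction of the delocalize failure) of how building a Regexp from a string whose bytes are not valid in its declared encoding produces an "invalid multibyte character" error. The helper `regexp_error_for` and both constants are invented for this example.

```ruby
# Hypothetical helper (not from delocalize or JRuby): returns the RegexpError
# message raised while compiling `source`, or nil if it compiles cleanly.
def regexp_error_for(source)
  Regexp.new(source)
  nil
rescue RegexpError => e
  e.message
end

# "Março" as proper UTF-8 (ç written as a Unicode escape).
VALID_MONTH = "Mar\u00E7o"

# The same word with ç as the single Latin-1 byte 0xE7, mislabeled as UTF-8.
# This input is an assumption for illustration only.
INVALID_MONTH = [0x4D, 0x61, 0x72, 0xE7, 0x6F].pack("C*").force_encoding("UTF-8")

puts regexp_error_for(VALID_MONTH).inspect
puts regexp_error_for(INVALID_MONTH).inspect
```

If the month names reach the regexp compiler with the wrong encoding label, raising here would be the expected behaviour; whether that is what happens in the failing spec is not established in this thread.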

plentz commented Jan 9, 2013

@BanzaiMan yup, the problem that @enebo fixed isn't the same issue (he fixed the one we found while trying to isolate the original problem). As for what the correct behaviour is, I really don't know, but it works with MRI (which I assume is "the correct behaviour").

$ ruby -v
ruby 1.9.3p327 (2012-11-10) [x86_64-darwin12.2.0]
$ rspec spec/models/sync/order_log_sync_spec.rb 
Finished in 8.71 seconds
1 examples, 0 failures
$ rvm jruby
$ ruby -v
jruby 1.7.2 (1.9.3p327) 2013-01-04 302c706 on Java HotSpot(TM) 64-Bit Server VM 1.6.0_37-b06-434-11M3909 [darwin-x86_64]
$ rspec spec/models/sync/order_log_sync_spec.rb


  1) OrderLog#import 
     Failure/Error: sync_result[:log_timestamp] = Time.parse_localized "#{date}#{time}"
       invalid multibyte character: /Março/
     # ./spec/models/sync/order_log_sync_spec.rb:39:in `(root)'
     # ./spec/models/sync/order_log_sync_spec.rb:42:in `(root)'

Finished in 13.67 seconds
1 examples, 1 failures

Failed examples:

rspec ./spec/models/sync/order_log_sync_spec.rb:42 # OrderLog#import 

enebo commented Jan 9, 2013

Create another issue please. My last fix did not correct this, and any fix we make will land in 1.7.3, so it would be nice to have another issue to show progress.

plentz commented Jan 9, 2013

Done. Thanks @enebo

