Non-ASCII character escape sequence in string_content results in SyntaxError #360

Closed
plentz opened this Issue Oct 25, 2012 · 17 comments

@plentz
plentz commented Oct 25, 2012

First of all, calm down guys: this is a pre-1.7 bug(?) that I just forgot to report. It happens when using the delocalize gem.

This is the code that causes the error:

  Time.parse_localized "201101011200"

The output is something like this:

  1) Order from_sync imports order from mobile data
     Failure/Error: Order.apply sync_data, :on => order
     RegexpError:
       invalid multibyte character: /Março/
     # ./app/models/xpto.rb:86:in `extract_datetime'

The invalid multibyte character error is probably caused by a string that contains the pt-BR translation of March, 'março'.

With MRI 1.9.x it works. Putting # encoding: utf-8 at the top of the file solves the problem.

@BanzaiMan
Member

Does the same issue manifest with 1.7.0? We are not maintaining 1.6.x.

@plentz
plentz commented Oct 29, 2012

@BanzaiMan yup Hiro, it's still there in 1.7.

@headius
Member
headius commented Oct 30, 2012

With both 1.9.3 and JRuby I get "uninitialized constant Date::MONTHNAMES" when I try to do -rdelocalize -e "your code". Something's not right.

@plentz
plentz commented Oct 30, 2012

@headius maybe that's because delocalize is intended to be used with Rails. Maybe this helps:

$ irb
1.9.3p286 :001 > Date::MONTHNAMES
NameError: uninitialized constant Date
    from (irb):1
    from /Users/plentz/.rvm/rubies/ruby-1.9.3-p286/bin/irb:16:in `<main>'
1.9.3p286 :002 > require 'date'
 => true 
1.9.3p286 :003 > Date::MONTHNAMES
 => [nil, "January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"] 
1.9.3p286 :004 >
@headius
Member
headius commented Oct 31, 2012

I'm having trouble reproducing this. Do you think it's possible for you to add some logging to the suspect lines of code and reduce this to something we can run without all of delocalize? I tried to run their tests, but they depend on the sqlite C extension and other stuff...

@plentz
plentz commented Dec 25, 2012

@headius I think I got it. It's really weird, but here we go (foo.rb):

# encoding: utf-8
"ééa".gsub!(/\bmar\ç\b/, "março")

and then...

$ ruby -v
ruby 1.9.3p327 (2012-11-10 revision 37606) [x86_64-darwin12.2.0]
$ ruby foo.rb
$ rvm jruby
$ ruby -v
jruby 1.7.1 (1.9.3p327) 2012-12-03 30a153b on Java HotSpot(TM) 64-Bit Server VM 1.6.0_37-b06-434-11M3909 [darwin-x86_64]
$ ruby foo.rb
SyntaxError: foo.rb:2: invalid multibyte char (UTF-8)
@BanzaiMan
Member

Looks like escaping a non-ASCII character with a backslash is problematic in a String or a Regexp.

$ jruby -v; jruby -e '"\ç"'
jruby 1.7.2.dev (1.9.3p327) 2012-12-23 761e54c on Java HotSpot(TM) 64-Bit Server VM 1.7.0_10-b18 [darwin-x86_64]
SyntaxError: -e:2: invalid multibyte char (UTF-8)
$ jruby -e '/\ç/'
SyntaxError: -e:2: invalid multibyte char (UTF-8)

This does not happen in MRI (trunk or 1.9.3).
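
For reference, here is a minimal sketch of the behaviour MRI shows for the two literals above, assuming a recent MRI and a UTF-8 source file. The variable names are only illustrative, and this has not been checked against the JRuby versions discussed in this thread.

```ruby
# encoding: utf-8
# Sketch (assumes MRI): a backslash in front of a non-ASCII character is not a
# recognised escape sequence, so the character is taken literally.

escaped_string = "\ç"   # backslash followed by a non-ASCII character
plain_string   = "ç"

# In a double-quoted string the backslash is simply dropped.
puts escaped_string == plain_string

# In a Regexp literal the escaped character still matches itself.
escaped_regexp = /\ç/
puts escaped_regexp =~ "março"   # character index of "ç" in "março"
```

On the affected JRuby versions the same file fails at parse time with the SyntaxError shown above, before any of these lines run.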

@plentz
plentz commented Dec 27, 2012

@BanzaiMan Hiro, thinking about it now... I think we actually found another problem, not the same one I posted in the issue, since adding or removing # encoding: utf-8 doesn't change the behaviour.

@BanzaiMan
Member

@plentz I believe the problem is with the parser, and it happens no matter what the encoding is. I edited the description of this ticket to reflect that. "string_content" refers to the parser token where I believe the error is happening.

@plentz plentz added a commit to plentz/jruby_report that referenced this issue Dec 28, 2012
@plentz plentz adding test for jruby/jruby#360 b38cce2
@enebo enebo was assigned Dec 29, 2012
@enebo
Member
enebo commented Jan 3, 2013

Fix on the way

@enebo enebo added a commit that referenced this issue Jan 3, 2013
@enebo enebo GH #360: Non-ASCII character escape sequence in string_content results in SyntaxError
f0a6484
@enebo
Member
enebo commented Jan 3, 2013

Fixed in commit f0a6484.

@enebo enebo closed this Jan 3, 2013
@plentz
plentz commented Jan 9, 2013

I've just updated to jruby 1.7.2 and my fear was confirmed: we actually found another problem while trying to create a test case for the initial problem of this issue. I still have the error

RegexpError:
       invalid multibyte character: /Março/

if I remove # encoding: utf-8 from some of my tests. What should I do? Create another issue?

@BanzaiMan
Member

@plentz Hmm. So you uncovered a separate issue (which @enebo fixed) during the course of this ticket. Perhaps somehow you are trying to create a Regexp with non-ASCII bytes. If this is indeed what's happening, isn't it the correct behavior? Without a reliable reproduction, this is a little hard to say definitively.
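
One way a Regexp can end up with invalid non-ASCII bytes is sketched below. It is only an illustration under a loud assumption: that the month name reaches Regexp.new as ISO-8859-1 bytes labelled as UTF-8. Nothing in this thread confirms that this is what delocalize or the failing spec does, and the variable names are hypothetical.

```ruby
# Assumption (not confirmed in this thread): "Março" arrives as ISO-8859-1
# bytes but is tagged as UTF-8 before being turned into a Regexp.
latin1_bytes = "Mar\xE7o".b   # 0xE7 is "ç" in ISO-8859-1
mislabelled  = latin1_bytes.dup.force_encoding(Encoding::UTF_8)

puts mislabelled.valid_encoding?   # the bytes are not valid UTF-8

# Building a Regexp from the mislabelled string raises RegexpError.
error = begin
  Regexp.new(mislabelled)
  nil
rescue RegexpError => e
  e
end
puts error.class
puts error.message

# Transcoding to UTF-8 first yields a usable pattern.
utf8    = latin1_bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
pattern = Regexp.new(utf8)
puts pattern.match?("1 de Mar\u00E7o")
```

If something like this is happening, raising the error would be the expected behaviour, and the fix would belong in the code that produces the string rather than in the Regexp engine.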

@plentz
plentz commented Jan 9, 2013

@BanzaiMan yup, the problem that @enebo fixed isn't the same issue (he fixed the one we found while trying to isolate the original problem). About what's the correct behaviour, I really don't know, but it works with MRI(which I assume is "the correct behaviour").

$ ruby -v
ruby 1.9.3p327 (2012-11-10) [x86_64-darwin12.2.0]
$ rspec spec/models/sync/order_log_sync_spec.rb 
.
Finished in 8.71 seconds
1 examples, 0 failures
$ rvm jruby
$ ruby -v
jruby 1.7.2 (1.9.3p327) 2013-01-04 302c706 on Java HotSpot(TM) 64-Bit Server VM 1.6.0_37-b06-434-11M3909 [darwin-x86_64]
$ rspec spec/models/sync/order_log_sync_spec.rb
F

Failures:

  1) OrderLog#import 
     Failure/Error: sync_result[:log_timestamp] = Time.parse_localized "#{date}#{time}"
     RegexpError:
       invalid multibyte character: /Março/
     # ./spec/models/sync/order_log_sync_spec.rb:39:in `(root)'
     # ./spec/models/sync/order_log_sync_spec.rb:42:in `(root)'

Finished in 13.67 seconds
1 examples, 1 failures

Failed examples:

rspec ./spec/models/sync/order_log_sync_spec.rb:42 # OrderLog#import 
@enebo
Member
enebo commented Jan 9, 2013

Create another issue please. My last fix did not correct this and any fix we do make will land for 1.7.3; so it would be nice to have another issue to show progress.

@plentz
plentz commented Jan 9, 2013

Done. Thanks @enebo
