Parser seems to choke on Japanese-encoded text #3679

Closed
headius opened this Issue Feb 18, 2016 · 3 comments

headius commented Feb 18, 2016

Environment

JRuby on ruby-2.3 branch

Expected Behavior

The following script should parse and execute:

eval "C\u{30a8 30e9 30fc} = 1".encode("EUC-JP")

Actual Behavior

The parser gets an error from the jcodings library. This could mean the incoming text is improperly encoded, but it seems more likely that the parser is not walking characters correctly.

$ jruby -e 'eval "C\u{30a8 30e9 30fc} = 1".encode("EUC-JP")'
Unhandled Java exception: org.jcodings.exception.EncodingException: invalid code point value
org.jcodings.exception.EncodingException: invalid code point value
      codeToMbcLength at org/jcodings/specific/BaseEUCJPEncoding.java:57
      codeToMbcLength at org/jcodings/specific/EUCJPEncoding.java:24
      isMultiByteChar at org/jruby/lexer/LexingCommon.java:253
     isIdentifierChar at org/jruby/lexer/LexingCommon.java:243
           identifier at org/jruby/lexer/yacc/RubyLexer.java:1446
                yylex at org/jruby/lexer/yacc/RubyLexer.java:1048
            nextToken at org/jruby/lexer/yacc/RubyLexer.java:336
              yyparse at org/jruby/parser/RubyParser.java:1618
              yyparse at org/jruby/parser/RubyParser.java:1569
                parse at org/jruby/parser/RubyParser.java:5359
                parse at org/jruby/parser/Parser.java:121
                parse at org/jruby/parser/Parser.java:77
            parseEval at org/jruby/Ruby.java:2793
            prepareIC at org/jruby/ir/interpreter/Interpreter.java:213
           evalCommon at org/jruby/ir/interpreter/Interpreter.java:168
      evalWithBinding at org/jruby/ir/interpreter/Interpreter.java:202
           evalCommon at org/jruby/RubyKernel.java:1006
               eval19 at org/jruby/RubyKernel.java:973
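
One way to check the guess above that the input itself is well-formed is to inspect the encoded string on a stock Ruby before it ever reaches `eval`. This is only a sketch: the helper names `euc_jp_source` and `describe` are made up for illustration, and it says nothing about what JRuby's lexer does internally.

```ruby
# Hypothetical helpers; only core String methods are used.

# The same source text as in the report, transcoded to EUC-JP.
def euc_jp_source
  "C\u{30a8 30e9 30fc} = 1".encode("EUC-JP")
end

# Summarize the string so character and byte counts can be compared.
def describe(str)
  {
    encoding: str.encoding.name,
    valid: str.valid_encoding?,
    chars: str.chars.length,
    bytes: str.bytesize,
    # Bytes of the identifier only: "C" plus three two-byte characters.
    identifier_bytes: str.byteslice(0, 7).bytes.map { |b| format("%02x", b) }
  }
end

p describe(euc_jp_source)
```

If `valid` comes back true, the string is well-formed EUC-JP, which would point at the lexer's handling rather than at the input. The three katakana each occupy two bytes, so a lexer that feeds a single byte where a full character code is expected could plausibly hit an "invalid code point" error like the one in the trace.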

A different but possibly related issue occurs with ISO-2022-JP:

$ jruby -e 'eval "class C\u{30a8 30e9 30fc} < RuntimeError; self; end".encode("ISO-2022-JP")'
SyntaxError: (eval):1: Invalid char `\33' (') in expression
class CB%(%i!< < RuntimeError; self; end
   eval at org/jruby/RubyKernel.java:973
  <top> at -e:1
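
The `\33` in that message is octal for the ESC byte (0x1b), which ISO-2022-JP uses to switch character sets. A small sketch on a stock Ruby makes the escape sequences visible; the helper names `iso_2022_jp_bytes` and `escape_positions` are hypothetical, and this only inspects the input rather than diagnosing the lexer.

```ruby
# Hypothetical helpers; only core String/Encoding/Array methods are used.

# Bytes of the identifier from the report, transcoded to ISO-2022-JP.
def iso_2022_jp_bytes
  "C\u{30a8 30e9 30fc}".encode("ISO-2022-JP").bytes
end

# Indexes at which an ESC byte (0x1b, octal 33) appears.
def escape_positions(bytes)
  bytes.each_index.select { |i| bytes[i] == 0x1b }
end

bytes = iso_2022_jp_bytes
puts bytes.map { |b| format("%02x", b) }.join(" ")
puts "ESC at byte offsets: #{escape_positions(bytes).inspect}"
# Ruby marks stateful encodings such as ISO-2022-JP as "dummy" encodings.
puts "dummy encoding? #{Encoding::ISO_2022_JP.dummy?}"
```

The printable bytes between the two escape sequences are the `%(%i!<` visible in the error output above, so the lexer appears to be reading the raw escape-delimited bytes as ordinary source characters.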

This affects the Ruby 2.3 test TestException#test_errinfo_encoding_in_debug.

enebo commented Mar 1, 2016

This was fixed as part of the ruby-2.3 branch, and I am too lazy to figure out which commit (the parser had many, many changes recently).

enebo closed this Mar 1, 2016

headius commented Mar 3, 2016

I attempted to untag the specs I tagged, but there are other issues that prevent them from passing (exception messages only supporting Unicode, for one).

enebo commented Mar 3, 2016

@headius OK. So things parse, but we likely snag later when we need a proper Java string.
