CSV parse fails when string with multibyte character terminates with CR-LF #1222

Closed
crawlik opened this Issue Nov 13, 2013 · 2 comments

@crawlik
crawlik commented Nov 13, 2013
jruby -v -e 'require "csv"; CSV.parse("®\r\n")'
jruby 1.7.6 (1.9.3p392) 2013-11-11 fffffff on Java HotSpot(TM) 64-Bit Server VM 1.7.0_21-b11 [linux-amd64]
RubyStringIO.java:688:in `bm_search': java.lang.ArrayIndexOutOfBoundsException: -82
    from RubyStringIO.java:644:in `internalGets19'
    from RubyStringIO.java:798:in `getsOnly'
    from RubyStringIO.java:788:in `gets19'
    from RubyStringIO$INVOKER$i$0$2$gets19.gen:-1:in `call'
    from JavaMethod.java:665:in `call'
    from DynamicMethod.java:206:in `call'
    from CachingCallSite.java:326:in `cacheAndCall'
    from CachingCallSite.java:170:in `call'
    from CallOneArgNode.java:57:in `interpret'
    from DAsgnNode.java:110:in `interpret'
    from IfNode.java:110:in `interpret'
    from NewlineNode.java:105:in `interpret'
    from BlockNode.java:71:in `interpret'
    from ASTInterpreter.java:112:in `INTERPRET_BLOCK'
    from Interpreted19Block.java:206:in `evalBlockBody'
    from Interpreted19Block.java:157:in `yield'
    from Interpreted19Block.java:130:in `yieldSpecific'
    from Block.java:111:in `yieldSpecific'
    from RubyKernel.java:1517:in `loop'
    from RubyKernel$INVOKER$s$0$0$loop.gen:-1:in `call'
    from CachingCallSite.java:316:in `cacheAndCall'
    from CachingCallSite.java:145:in `callBlock'
    from CachingCallSite.java:154:in `callIter'
    from FCallNoArgBlockNode.java:32:in `interpret'
    from NewlineNode.java:105:in `interpret'
    from BlockNode.java:71:in `interpret'
    from ASTInterpreter.java:74:in `INTERPRET_METHOD'
    from InterpretedMethod.java:139:in `call'
    from DefaultMethod.java:182:in `call'
    from CachingCallSite.java:306:in `cacheAndCall'
    from CachingCallSite.java:136:in `call'
    from VCallNode.java:88:in `interpret'
    from LocalAsgnNode.java:123:in `interpret'
    from WhileNode.java:127:in `interpret'
    from NewlineNode.java:105:in `interpret'
    from IfNode.java:116:in `interpret'
    from NewlineNode.java:105:in `interpret'
    from ASTInterpreter.java:74:in `INTERPRET_METHOD'
    from InterpretedMethod.java:161:in `call'
    from DefaultMethod.java:190:in `call'
    from RubyClass.java:527:in `finvoke'
    from Helpers.java:479:in `invoke'
    from RubyEnumerable.java:97:in `callEach'
    from RubyEnumerable.java:392:in `to_a19'
    from RubyEnumerable$INVOKER$s$to_a19.gen:-1:in `call'
    from CachingCallSite.java:306:in `cacheAndCall'
    from CachingCallSite.java:136:in `call'
    from VCallNode.java:88:in `interpret'
    from LocalAsgnNode.java:123:in `interpret'
    from NewlineNode.java:105:in `interpret'
    from BlockNode.java:71:in `interpret'
    from ASTInterpreter.java:74:in `INTERPRET_METHOD'
    from InterpretedMethod.java:139:in `call'
    from DefaultMethod.java:182:in `call'
    from CachingCallSite.java:306:in `cacheAndCall'
    from CachingCallSite.java:136:in `call'
    from CallNoArgNode.java:60:in `interpret'
    from NewlineNode.java:105:in `interpret'
    from EnsureNode.java:96:in `interpret'
    from BeginNode.java:83:in `interpret'
    from NewlineNode.java:105:in `interpret'
    from IfNode.java:116:in `interpret'
    from NewlineNode.java:105:in `interpret'
    from BlockNode.java:71:in `interpret'
    from ASTInterpreter.java:74:in `INTERPRET_METHOD'
    from InterpretedMethod.java:182:in `call'
    from DefaultMethod.java:198:in `call'
    from CachingCallSite.java:326:in `cacheAndCall'
    from CachingCallSite.java:170:in `call'
    from -e:1:in `__file__'
    from -e:-1:in `load'
    from Ruby.java:810:in `runScript'
    from Ruby.java:803:in `runScript'
    from Ruby.java:672:in `runNormally'
    from Ruby.java:521:in `runFromMain'
    from Main.java:395:in `doRunFromMain'
    from Main.java:290:in `internalRun'
    from Main.java:217:in `run'
    from Main.java:197:in `main'

Removing either the CR or the LF fixes the problem:

$ jruby -v -e 'require "csv"; p CSV.parse("®\n")'
jruby 1.7.6 (1.9.3p392) 2013-11-11 fffffff on Java HotSpot(TM) 64-Bit Server VM 1.7.0_21-b11 [linux-amd64]
[["®"]]
jruby -v -e 'require "csv"; p CSV.parse("®\r")'
jruby 1.7.6 (1.9.3p392) 2013-11-11 fffffff on Java HotSpot(TM) 64-Bit Server VM 1.7.0_21-b11 [linux-amd64]
[["®"]]

Using an ASCII character in the string does not fail either:

jruby -v -e 'require "csv"; p CSV.parse("1\r\n")'
jruby 1.7.6 (1.9.3p392) 2013-11-11 fffffff on Java HotSpot(TM) 64-Bit Server VM 1.7.0_21-b11 [linux-amd64]
[["1"]]

BTW, compare it with MRI ruby. It works there.

ruby -v -e 'require "csv"; p CSV.parse("¢\r\n")'
ruby 2.0.0p247 (2013-06-27 revision 41674) [x86_64-linux]
[["¢"]]
@headius
Member
headius commented Nov 13, 2013

Nice find...we must be counting the non-ascii character's bytes wrong and walking off the end of some array.

@dragonsinth

It's a signed/unsigned bug in bm_search.

Original:

i += skip[big[i + bstart]];

Should be:

i += skip[big[i + bstart] & 0xFF];

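big is an array of signed bytes (-128 to 127), so using the value directly as the index into skip produces a negative index for any byte above 0x7F. The second UTF-8 byte of ® is 0xAE, which reads as -82 when signed, matching the ArrayIndexOutOfBoundsException: -82 in the report.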
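
For anyone who wants to see the effect in isolation, here is a minimal sketch. It is not the actual RubyStringIO code: the class SignedByteIndex, the helpers rawIndex and maskedIndex, and the all-zero 256-entry skip table are assumptions made up for illustration. It only shows how a Java byte behaves as an array index with and without the & 0xFF mask.

```java
import java.nio.charset.StandardCharsets;

public class SignedByteIndex {
    // Hypothetical helper: the index as the original line would compute it.
    public static int rawIndex(byte b) {
        return b; // widening to int keeps the sign, so 0xAE becomes -82
    }

    // Hypothetical helper: the index after masking, as in the proposed fix.
    public static int maskedIndex(byte b) {
        return b & 0xFF; // always in 0..255
    }

    public static void main(String[] args) {
        // "\u00AE" is the registered sign; its UTF-8 encoding is C2 AE.
        byte[] big = "\u00AE\r\n".getBytes(StandardCharsets.UTF_8);
        int[] skip = new int[256]; // stand-in table, one slot per byte value

        for (byte b : big) {
            int raw = rawIndex(b);
            int masked = maskedIndex(b);
            String status;
            try {
                int unused = skip[raw]; // throws for bytes above 0x7F
                status = "ok";
            } catch (ArrayIndexOutOfBoundsException e) {
                status = "out of bounds";
            }
            System.out.println("raw=" + raw + " (" + status + "), masked=" + masked);
        }
    }
}
```

The masked form works because b & 0xFF promotes the byte to int and clears the sign-extended upper bits, so every byte value maps to a slot in a 256-entry table.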

@enebo enebo closed this in 1ab5c65 Dec 4, 2013