CSV parse fails when string with multibyte character terminates with CR-LF #1222

Closed
crawlik opened this Issue November 13, 2013 · 2 comments

Alex Vinnik
jruby -v -e 'require "csv"; CSV.parse("®\r\n")'
jruby 1.7.6 (1.9.3p392) 2013-11-11 fffffff on Java HotSpot(TM) 64-Bit Server VM 1.7.0_21-b11 [linux-amd64]
RubyStringIO.java:688:in `bm_search': java.lang.ArrayIndexOutOfBoundsException: -82
    from RubyStringIO.java:644:in `internalGets19'
    from RubyStringIO.java:798:in `getsOnly'
    from RubyStringIO.java:788:in `gets19'
    from RubyStringIO$INVOKER$i$0$2$gets19.gen:-1:in `call'
    from JavaMethod.java:665:in `call'
    from DynamicMethod.java:206:in `call'
    from CachingCallSite.java:326:in `cacheAndCall'
    from CachingCallSite.java:170:in `call'
    from CallOneArgNode.java:57:in `interpret'
    from DAsgnNode.java:110:in `interpret'
    from IfNode.java:110:in `interpret'
    from NewlineNode.java:105:in `interpret'
    from BlockNode.java:71:in `interpret'
    from ASTInterpreter.java:112:in `INTERPRET_BLOCK'
    from Interpreted19Block.java:206:in `evalBlockBody'
    from Interpreted19Block.java:157:in `yield'
    from Interpreted19Block.java:130:in `yieldSpecific'
    from Block.java:111:in `yieldSpecific'
    from RubyKernel.java:1517:in `loop'
    from RubyKernel$INVOKER$s$0$0$loop.gen:-1:in `call'
    from CachingCallSite.java:316:in `cacheAndCall'
    from CachingCallSite.java:145:in `callBlock'
    from CachingCallSite.java:154:in `callIter'
    from FCallNoArgBlockNode.java:32:in `interpret'
    from NewlineNode.java:105:in `interpret'
    from BlockNode.java:71:in `interpret'
    from ASTInterpreter.java:74:in `INTERPRET_METHOD'
    from InterpretedMethod.java:139:in `call'
    from DefaultMethod.java:182:in `call'
    from CachingCallSite.java:306:in `cacheAndCall'
    from CachingCallSite.java:136:in `call'
    from VCallNode.java:88:in `interpret'
    from LocalAsgnNode.java:123:in `interpret'
    from WhileNode.java:127:in `interpret'
    from NewlineNode.java:105:in `interpret'
    from IfNode.java:116:in `interpret'
    from NewlineNode.java:105:in `interpret'
    from ASTInterpreter.java:74:in `INTERPRET_METHOD'
    from InterpretedMethod.java:161:in `call'
    from DefaultMethod.java:190:in `call'
    from RubyClass.java:527:in `finvoke'
    from Helpers.java:479:in `invoke'
    from RubyEnumerable.java:97:in `callEach'
    from RubyEnumerable.java:392:in `to_a19'
    from RubyEnumerable$INVOKER$s$to_a19.gen:-1:in `call'
    from CachingCallSite.java:306:in `cacheAndCall'
    from CachingCallSite.java:136:in `call'
    from VCallNode.java:88:in `interpret'
    from LocalAsgnNode.java:123:in `interpret'
    from NewlineNode.java:105:in `interpret'
    from BlockNode.java:71:in `interpret'
    from ASTInterpreter.java:74:in `INTERPRET_METHOD'
    from InterpretedMethod.java:139:in `call'
    from DefaultMethod.java:182:in `call'
    from CachingCallSite.java:306:in `cacheAndCall'
    from CachingCallSite.java:136:in `call'
    from CallNoArgNode.java:60:in `interpret'
    from NewlineNode.java:105:in `interpret'
    from EnsureNode.java:96:in `interpret'
    from BeginNode.java:83:in `interpret'
    from NewlineNode.java:105:in `interpret'
    from IfNode.java:116:in `interpret'
    from NewlineNode.java:105:in `interpret'
    from BlockNode.java:71:in `interpret'
    from ASTInterpreter.java:74:in `INTERPRET_METHOD'
    from InterpretedMethod.java:182:in `call'
    from DefaultMethod.java:198:in `call'
    from CachingCallSite.java:326:in `cacheAndCall'
    from CachingCallSite.java:170:in `call'
    from -e:1:in `__file__'
    from -e:-1:in `load'
    from Ruby.java:810:in `runScript'
    from Ruby.java:803:in `runScript'
    from Ruby.java:672:in `runNormally'
    from Ruby.java:521:in `runFromMain'
    from Main.java:395:in `doRunFromMain'
    from Main.java:290:in `internalRun'
    from Main.java:217:in `run'
    from Main.java:197:in `main'

Removing either the CR or the LF fixes the problem:

$ jruby -v -e 'require "csv"; p CSV.parse("®\n")'
jruby 1.7.6 (1.9.3p392) 2013-11-11 fffffff on Java HotSpot(TM) 64-Bit Server VM 1.7.0_21-b11 [linux-amd64]
[["®"]]
jruby -v -e 'require "csv"; p CSV.parse("®\r")'
jruby 1.7.6 (1.9.3p392) 2013-11-11 fffffff on Java HotSpot(TM) 64-Bit Server VM 1.7.0_21-b11 [linux-amd64]
[["®"]]

Using an ASCII character in the string also parses without error:

jruby -v -e 'require "csv"; p CSV.parse("1\r\n")'
jruby 1.7.6 (1.9.3p392) 2013-11-11 fffffff on Java HotSpot(TM) 64-Bit Server VM 1.7.0_21-b11 [linux-amd64]
[["1"]]

BTW, compare it with MRI Ruby. A multibyte character followed by CR-LF works there:

ruby -v -e 'require "csv"; p CSV.parse("¢\r\n")'
ruby 2.0.0p247 (2013-06-27 revision 41674) [x86_64-linux]
[["¢"]]
Charles Oliver Nutter
Owner

Nice find...we must be counting the non-ascii character's bytes wrong and walking off the end of some array.

Scott Blum

It's a signed/unsigned bug in bm_search.

Original:

i += skip[big[i + bstart]];

Should be:

i += skip[big[i + bstart] & 0xFF];

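big is an array of signed bytes (-128 to 127), so using the value directly as the index into skip produces a negative index for any byte >= 0x80. Here the second UTF-8 byte of ® is 0xAE, which reads as -82 when signed, and that is the index reported in the exception.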
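
To illustrate the signed/unsigned point, here is a minimal standalone sketch. It is not JRuby's actual bm_search: the class name SignedByteIndexDemo, the helper buildSkip, and the mask flag are hypothetical, and the search is a generic Horspool-style loop assumed to resemble the real one only in how it indexes the skip table.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SignedByteIndexDemo {
    // Hypothetical helper: 256-entry bad-character skip table (Horspool style).
    static int[] buildSkip(byte[] little) {
        int[] skip = new int[256];
        Arrays.fill(skip, little.length);
        for (int k = 0; k < little.length - 1; k++) {
            // Mask here too, so pattern bytes >= 0x80 land in 128..255.
            skip[little[k] & 0xFF] = little.length - 1 - k;
        }
        return skip;
    }

    // Returns the byte offset of little in big, or -1 if absent.
    // mask == false reproduces the buggy indexing; mask == true applies & 0xFF.
    static int search(byte[] big, byte[] little, boolean mask) {
        if (little.length == 0) return 0;
        int[] skip = buildSkip(little);
        int i = little.length - 1;
        while (i < big.length) {
            int k = i;
            int j = little.length - 1;
            // Compare the pattern right to left against the current window.
            while (j >= 0 && big[k] == little[j]) { k--; j--; }
            if (j < 0) return k + 1;
            // A Java byte is signed, so big[i] is negative for bytes >= 0x80.
            int index = mask ? (big[i] & 0xFF) : big[i];
            i += skip[index];
        }
        return -1;
    }

    public static void main(String[] args) {
        // "\u00AE" is the registered sign; in UTF-8 it is the bytes 0xC2 0xAE.
        byte[] big = "\u00AE\r\n".getBytes(StandardCharsets.UTF_8);
        byte[] little = "\r\n".getBytes(StandardCharsets.UTF_8);

        System.out.println("signed value:  " + big[1]);
        System.out.println("masked value:  " + (big[1] & 0xFF));
        System.out.println("masked search: " + search(big, little, true));
        try {
            search(big, little, false);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unmasked search threw: " + e);
        }
    }
}
```

An ASCII-only string such as "1\r\n" never hits the bad path in this sketch, because every byte is below 0x80 and is therefore already a valid non-negative index.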

Thomas E Enebo enebo closed this in 1ab5c65 December 04, 2013