CSV parse fails when string with multibyte character terminates with CR-LF #1222

Closed
crawlik opened this Issue Nov 13, 2013 · 2 comments

crawlik commented Nov 13, 2013

jruby -v -e 'require "csv"; CSV.parse("®\r\n")'
jruby 1.7.6 (1.9.3p392) 2013-11-11 fffffff on Java HotSpot(TM) 64-Bit Server VM 1.7.0_21-b11 [linux-amd64]
RubyStringIO.java:688:in `bm_search': java.lang.ArrayIndexOutOfBoundsException: -82
    from RubyStringIO.java:644:in `internalGets19'
    from RubyStringIO.java:798:in `getsOnly'
    from RubyStringIO.java:788:in `gets19'
    from RubyStringIO$INVOKER$i$0$2$gets19.gen:-1:in `call'
    from JavaMethod.java:665:in `call'
    from DynamicMethod.java:206:in `call'
    from CachingCallSite.java:326:in `cacheAndCall'
    from CachingCallSite.java:170:in `call'
    from CallOneArgNode.java:57:in `interpret'
    from DAsgnNode.java:110:in `interpret'
    from IfNode.java:110:in `interpret'
    from NewlineNode.java:105:in `interpret'
    from BlockNode.java:71:in `interpret'
    from ASTInterpreter.java:112:in `INTERPRET_BLOCK'
    from Interpreted19Block.java:206:in `evalBlockBody'
    from Interpreted19Block.java:157:in `yield'
    from Interpreted19Block.java:130:in `yieldSpecific'
    from Block.java:111:in `yieldSpecific'
    from RubyKernel.java:1517:in `loop'
    from RubyKernel$INVOKER$s$0$0$loop.gen:-1:in `call'
    from CachingCallSite.java:316:in `cacheAndCall'
    from CachingCallSite.java:145:in `callBlock'
    from CachingCallSite.java:154:in `callIter'
    from FCallNoArgBlockNode.java:32:in `interpret'
    from NewlineNode.java:105:in `interpret'
    from BlockNode.java:71:in `interpret'
    from ASTInterpreter.java:74:in `INTERPRET_METHOD'
    from InterpretedMethod.java:139:in `call'
    from DefaultMethod.java:182:in `call'
    from CachingCallSite.java:306:in `cacheAndCall'
    from CachingCallSite.java:136:in `call'
    from VCallNode.java:88:in `interpret'
    from LocalAsgnNode.java:123:in `interpret'
    from WhileNode.java:127:in `interpret'
    from NewlineNode.java:105:in `interpret'
    from IfNode.java:116:in `interpret'
    from NewlineNode.java:105:in `interpret'
    from ASTInterpreter.java:74:in `INTERPRET_METHOD'
    from InterpretedMethod.java:161:in `call'
    from DefaultMethod.java:190:in `call'
    from RubyClass.java:527:in `finvoke'
    from Helpers.java:479:in `invoke'
    from RubyEnumerable.java:97:in `callEach'
    from RubyEnumerable.java:392:in `to_a19'
    from RubyEnumerable$INVOKER$s$to_a19.gen:-1:in `call'
    from CachingCallSite.java:306:in `cacheAndCall'
    from CachingCallSite.java:136:in `call'
    from VCallNode.java:88:in `interpret'
    from LocalAsgnNode.java:123:in `interpret'
    from NewlineNode.java:105:in `interpret'
    from BlockNode.java:71:in `interpret'
    from ASTInterpreter.java:74:in `INTERPRET_METHOD'
    from InterpretedMethod.java:139:in `call'
    from DefaultMethod.java:182:in `call'
    from CachingCallSite.java:306:in `cacheAndCall'
    from CachingCallSite.java:136:in `call'
    from CallNoArgNode.java:60:in `interpret'
    from NewlineNode.java:105:in `interpret'
    from EnsureNode.java:96:in `interpret'
    from BeginNode.java:83:in `interpret'
    from NewlineNode.java:105:in `interpret'
    from IfNode.java:116:in `interpret'
    from NewlineNode.java:105:in `interpret'
    from BlockNode.java:71:in `interpret'
    from ASTInterpreter.java:74:in `INTERPRET_METHOD'
    from InterpretedMethod.java:182:in `call'
    from DefaultMethod.java:198:in `call'
    from CachingCallSite.java:326:in `cacheAndCall'
    from CachingCallSite.java:170:in `call'
    from -e:1:in `__file__'
    from -e:-1:in `load'
    from Ruby.java:810:in `runScript'
    from Ruby.java:803:in `runScript'
    from Ruby.java:672:in `runNormally'
    from Ruby.java:521:in `runFromMain'
    from Main.java:395:in `doRunFromMain'
    from Main.java:290:in `internalRun'
    from Main.java:217:in `run'
    from Main.java:197:in `main'

Removing either the CR or the LF avoids the failure:

$ jruby -v -e 'require "csv"; p CSV.parse("®\n")'
jruby 1.7.6 (1.9.3p392) 2013-11-11 fffffff on Java HotSpot(TM) 64-Bit Server VM 1.7.0_21-b11 [linux-amd64]
[["®"]]
jruby -v -e 'require "csv"; p CSV.parse("®\r")'
jruby 1.7.6 (1.9.3p392) 2013-11-11 fffffff on Java HotSpot(TM) 64-Bit Server VM 1.7.0_21-b11 [linux-amd64]
[["®"]]

Using an ASCII character instead of the multibyte one also parses without error:

jruby -v -e 'require "csv"; p CSV.parse("1\r\n")'
jruby 1.7.6 (1.9.3p392) 2013-11-11 fffffff on Java HotSpot(TM) 64-Bit Server VM 1.7.0_21-b11 [linux-amd64]
[["1"]]

BTW, compare it with MRI Ruby. A multibyte character followed by CR-LF parses fine there:

ruby -v -e 'require "csv"; p CSV.parse("¢\r\n")'
ruby 2.0.0p247 (2013-06-27 revision 41674) [x86_64-linux]
[["¢"]]

headius commented Nov 13, 2013

Nice find...we must be counting the non-ascii character's bytes wrong and walking off the end of some array.

dragonsinth commented Dec 4, 2013

It's a signed/unsigned bug in bm_search.

Original:

i += skip[big[i + bstart]];

Should be:

i += skip[big[i + bstart] & 0xFF];

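big is an array of signed bytes (-128 to 127), so using a value directly as the index into skip gives a negative index for any byte of 0x80 or above. That matches the trace: "®" is 0xC2 0xAE in UTF-8, and 0xAE read as a signed byte is -82, the index reported in the ArrayIndexOutOfBoundsException.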
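
To make the effect of the mask concrete, here is a minimal, self-contained sketch of a Horspool-style byte search with a 256-entry skip table. It is not JRuby's actual bm_search; the class and method names (MaskedSkipSearch, buildSkip, search) are made up for illustration, and the only thing taken from the fix above is the `& 0xFF` on the table index.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: a simplified Horspool search, NOT the JRuby implementation.
final class MaskedSkipSearch {
    // Builds a skip table with one slot per unsigned byte value (0..255).
    static int[] buildSkip(byte[] little) {
        int[] skip = new int[256];
        Arrays.fill(skip, little.length);
        for (int k = 0; k < little.length - 1; k++) {
            skip[little[k] & 0xFF] = little.length - 1 - k;
        }
        return skip;
    }

    // Returns the offset of the first occurrence of little in big, or -1.
    static int search(byte[] big, byte[] little) {
        if (little.length == 0) {
            return 0;
        }
        int[] skip = buildSkip(little);
        int i = 0;
        while (i + little.length <= big.length) {
            int j = little.length - 1;
            while (j >= 0 && big[i + j] == little[j]) {
                j--;
            }
            if (j < 0) {
                return i;
            }
            // Without "& 0xFF" a byte such as 0xAE would index skip[-82].
            i += skip[big[i + little.length - 1] & 0xFF];
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] big = "\u00AE\r\n".getBytes(StandardCharsets.UTF_8); // C2 AE 0D 0A
        byte[] little = "\r\n".getBytes(StandardCharsets.UTF_8);    // 0D 0A
        System.out.println("unmasked index: " + (int) big[1]);      // prints -82
        System.out.println("masked index: " + (big[1] & 0xFF));     // prints 174
        System.out.println("separator at offset " + search(big, little)); // prints 2
    }
}
```

With the input from the report, the first window ends on the byte 0xAE, so that byte is the one used to look up the shift. Masked, it selects slot 174 and the search moves on to find CR-LF at offset 2; unmasked, the same lookup would ask for slot -82.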

enebo closed this in 1ab5c65 Dec 4, 2013

enebo added a commit that referenced this issue Dec 4, 2013
