script with long DATA block hangs #873

tlossen opened this Issue Jul 10, 2013 · 4 comments


my script has 10 lines above the __END__ marker, and 750.000 lines below. it runs fine in ruby-1.9.2-p290, but when i start it in jruby-1.7.4, it just hangs and never even gets to read the first line of DATA (at least not in the first 5 minutes).

JRuby Team member

How big is this file, size-wise? We do not have the same impl as MRI (which uses FILE*); instead we end up allocating a big ByteArrayInputStream out of that section, 1k at a time. Assuming memory is not an issue, we can probably bump this size up to a larger number like 32k, since not many people use __END__ and you are not the first person to hit this with a large data set.

If you could make a script to generate a representative __END__ dataset, we can probably poke at this and improve our impl. Ultimately, we want a read/write __END__ data section, preferably on top of NIO, but I know we looked at that in the past and there were some issues.
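
A generator along the lines requested above could look like the following. This is only a sketch: the helper name `write_data_script`, the three-line program above the marker, and the line format are all made up for illustration, chosen so that each data line is 25 bytes including the newline.

```ruby
# Hypothetical helper: writes a Ruby script at `path` whose DATA section
# holds `line_count` lines of 24 characters each (25 bytes with newline).
def write_data_script(path, line_count)
  File.open(path, "w") do |f|
    # A tiny program above the marker that just counts the DATA lines.
    f.puts "count = 0"
    f.puts "DATA.each_line { count += 1 }"
    f.puts "puts count"
    # Everything after this marker line becomes the DATA section.
    f.puts "__END__"
    line_count.times do |i|
      # Assumed line shape: 7-digit counter, a space, a fixed timestamp.
      f.puts format("%07d 2013-07-10 10:16", i)
    end
  end
  path
end

# Example (not run here): write_data_script("big_data.rb", 750_000)
```

Running the generated file under both ruby and jruby should then give a rough comparison of how long each takes to reach the first line of DATA.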


the file is 17MB, with roughly 25 chars per line:

$ ls -l by_started_at
-rw-r--r--@ 1 tim  staff  18224247 Jul 10 10:16 by_started_at
$ wc -l by_started_at
742919 by_started_at

JRuby Team member

So yeah, @enebo was right about the cause. We currently read the DATA contents all into memory, 1k at a time. Those bytes go into a slowly-growing array, so a file takes roughly DATA.size / 1024 read + resize + copy operations, and larger files get hit hardest. It just ends up doing too much work.

I'm going to do a short-term fix to increase the buffer size. For a 10MB file, a 64k buffer loads DATA almost immediately.
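
To get a feel for why the buffer size matters so much, here is a back-of-the-envelope cost model. It is an assumption made for illustration, not JRuby's actual code: it supposes the array grows by exactly one chunk per read and that each resize copies everything read so far. The name `load_cost` is hypothetical.

```ruby
# Hypothetical cost model: the buffer grows by `chunk` bytes per read and
# every resize copies all bytes read so far (an assumed growth strategy).
def load_cost(total_bytes, chunk)
  # Number of reads needed, rounding up for a partial final chunk.
  reads = (total_bytes + chunk - 1) / chunk
  # Before read k (0-based), k * chunk bytes are already buffered and get
  # copied, so the total is chunk * (0 + 1 + ... + (reads - 1)).
  copied = chunk * reads * (reads - 1) / 2
  { reads: reads, bytes_copied: copied }
end

# Example (not run here), using the file size reported in this thread:
#   load_cost(18_224_247, 1024)
#   load_cost(18_224_247, 64 * 1024)
```

Under this model the bytes copied grow with the square of the file size and shrink roughly in proportion to the chunk size, which fits the observation that a 64k buffer makes the load almost immediate.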

We are also talking about the longer-term fix to actually pass the real stream/channel for DATA rather than reading into memory.
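
The idea behind the longer-term fix can be sketched in plain Ruby. This is not how JRuby does or will do it internally; `open_data_section` is a made-up helper that simply hands back the real file handle positioned after the marker, so lines are read on demand instead of being copied into memory first.

```ruby
# Hypothetical helper: returns an open File positioned just past the first
# line that is exactly the end marker, or nil if no such line exists.
def open_data_section(path)
  io = File.open(path, "rb")
  io.each_line do |line|
    # Stop scanning at the marker; the handle now points at the data.
    return io if line.chomp == "__END__"
  end
  # No marker found: release the handle and signal absence.
  io.close
  nil
end

# Example (not run here):
#   data = open_data_section("big_data.rb")
#   first = data.gets   # only this line is read from the data section
#   data.close
```

With this shape, the cost of getting the first line no longer depends on how much data follows the marker.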

@headius closed this in fead88b Jul 10, 2013

cool, that was quick!
