
script with long DATA block hangs #873

Closed
tlossen opened this Issue Jul 10, 2013 · 4 comments

tlossen commented Jul 10, 2013

My script has 10 lines above the __END__ marker and 750,000 lines below. It runs fine in ruby-1.9.2-p290, but when I start it in jruby-1.7.4 it just hangs and never even gets to reading the first line of DATA (at least not within the first 5 minutes).
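For context, a minimal sketch of the kind of script being described (the actual script contents are not shown in this issue): a handful of lines of code above the `__END__` marker, a very large data section below it, and the data read back through the `DATA` constant.

```ruby
#!/usr/bin/env ruby
# Sketch only: a few lines of code above __END__, a very large
# data section below it, accessed via the DATA IO object.
DATA.each_line do |line|
  # the real script's processing is unknown; just echo each record here
  puts line.chomp
end

__END__
first record
second record
```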

enebo commented Jul 10, 2013

How big is this file, size-wise? We do not have the same implementation as MRI (which uses a FILE*), and we end up allocating a big ByteArrayInputStream out of that section, 1k at a time. Assuming memory is not an issue, we can probably bump this size up to something larger like 32k, since not many people use __END__ and you are not the first person with a large data set.

If you could write a script to generate a representative __END__ dataset, we can probably poke at this and improve our implementation. Ultimately we want a read/write __END__ data section, preferably on top of NIO, but I know we looked at that in the past and there were some issues.
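A rough generator along these lines (a sketch only, not the reporter's actual data; the line length and count are assumptions based on the figures quoted later in this thread, and the hang_test.rb filename is made up):

```ruby
#!/usr/bin/env ruby
# Sketch: writes a self-contained test script with ~750,000 lines of
# ~25 characters each (~18 MB) below __END__, then times reading DATA.
LINES = 750_000

File.open("hang_test.rb", "w") do |out|
  out.puts "start = Time.now"
  out.puts "count = DATA.each_line.count"
  out.puts 'puts "read #{count} lines in #{Time.now - start}s"'
  out.puts "__END__"
  LINES.times do |i|
    # 24 characters plus newline, roughly matching the reported file
    out.puts format("%012d,%011d", i, i * 7)
  end
end
```

Running the generated file with `ruby hang_test.rb` and then `jruby hang_test.rb` should reproduce the difference.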

tlossen commented Jul 10, 2013

The file is 17 MB, with roughly 25 characters per line:

$ ls -l by_started_at
-rw-r--r--@ 1 tim  staff  18224247 Jul 10 10:16 by_started_at
$ wc -l by_started_at
742919 by_started_at

headius commented Jul 10, 2013

So yeah, @enebo was right about the cause. We currently read the DATA contents entirely into memory, 1k at a time. Those bytes go into a slowly-growing array, so a larger file takes roughly DATA.size / 1024 read + resize + copy operations (about 17,800 for an 18 MB file). It just ends up doing too much work.

I'm going to do a short-term fix to increase the buffer size. For a 10MB file, a 64k buffer loads DATA almost immediately.
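A back-of-the-envelope illustration of why the buffer size matters so much (a Ruby simulation of the growth pattern described above, not the actual JRuby code, which is Java; it assumes the whole buffer is re-copied on every chunk, as described):

```ruby
# Simulates total bytes copied when reading `total_bytes` into an array
# that grows by `chunk_size` and is re-copied on every growth step.
def copy_cost(total_bytes, chunk_size)
  copied = 0
  buffered = 0
  while buffered < total_bytes
    buffered += chunk_size
    copied += buffered  # each resize re-copies everything read so far
  end
  copied
end

total = 18_224_247                # size of the file reported above
puts copy_cost(total, 1024)       # ~162 GB of copying with a 1k buffer
puts copy_cost(total, 64 * 1024)  # ~2.5 GB with a 64k buffer, ~64x less
```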

We are also talking about the longer-term fix to actually pass the real stream/channel for DATA rather than reading into memory.

headius closed this in fead88b on Jul 10, 2013

headius added a commit that referenced this issue Jul 10, 2013

tlossen commented Jul 11, 2013

Cool, that was quick!
