Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streaming bug #160

Closed
eapache opened this issue Jun 9, 2014 · 12 comments
Closed

Streaming bug #160

eapache opened this issue Jun 9, 2014 · 12 comments

Comments

@eapache
Copy link

eapache commented Jun 9, 2014

I started investigating a case where I was getting unexpected character at line 1, column 1 [sparse.c:688] even though the next character appeared to be a { however I have not been able to get that down to a small test case. I did, however, find this one in the process:

Script:

#!/usr/bin/env ruby

require 'oj'

File.open('tst', 'r') do |file|
  puts file.gets
  Oj.load(file) do |val|
    puts val
  end
end

And file tst:

5
{"a": 2}

And the only output is 5, no json from oj.

Hopefully this has the same root cause as the real one I'm trying to track down.

@ohler55
Copy link
Owner

ohler55 commented Jun 10, 2014

Good test case. I'll will get it fixed over the next couple of days. Assuming it is an Oj bug. :-)

@ohler55
Copy link
Owner

ohler55 commented Jun 10, 2014

Faster than I though. The problem is that the file implementation preloads the file for efficiency when using gets. Oj also tried to optimize by using the file descriptor but Ruby has already loaded the file and moved the position of the file descriptor to the end of the file. If you use file.read(1) instead the file is not preloaded and Oj works as expected.

As for the fix I'm tempted to document the behavior and move on but I'll run some performance tests and see how big a hit it is to use the cached file object. There is another work around as well if you use streams instead. Anyway, stay tuned.

@eapache
Copy link
Author

eapache commented Jun 10, 2014

Huh. So if I call gets on the file first, then streaming isn't going to win me any performance anyways because the entire file will be preloaded? Interesting...

@ohler55
Copy link
Owner

ohler55 commented Jun 10, 2014

I suspect there is a limit to the preloaded size. I have some ideas though. Give me a day or two.

@ohler55
Copy link
Owner

ohler55 commented Jun 16, 2014

I haven't forgotten about this. Just been busy with work.

@ohler55
Copy link
Owner

ohler55 commented Jun 21, 2014

I pushed a version that I believe fixes the problem. Can you try it? It on github but the gem is not released yet.

@eapache
Copy link
Author

eapache commented Jun 23, 2014

Yes this appears to fix the problem. Thanks!

Regarding timing, I'm still finding it much faster to call Oj.load(file.gets) than to use the streaming functionality. For reference:

Using var = Oj.load(file):

bundle exec bin/starscope -f tst --no-update  4.60s user 0.24s system 99% cpu 4.851 total
bundle exec bin/starscope -f tst --no-update  4.53s user 0.24s system 99% cpu 4.779 total
bundle exec bin/starscope -f tst --no-update  4.57s user 0.24s system 99% cpu 4.826 total

Using var = Oj.load(file.gets)

bundle exec bin/starscope -f tst --no-update  2.88s user 0.27s system 99% cpu 3.163 total
bundle exec bin/starscope -f tst --no-update  2.89s user 0.27s system 99% cpu 3.165 total
bundle exec bin/starscope -f tst --no-update  2.94s user 0.27s system 99% cpu 3.224 total

Using Oj.load(file) {|x| var = x}

bundle exec bin/starscope -f tst --no-update  4.54s user 0.25s system 99% cpu 4.802 total
bundle exec bin/starscope -f tst --no-update  4.55s user 0.24s system 99% cpu 4.799 total
bundle exec bin/starscope -f tst --no-update  4.56s user 0.24s system 99% cpu 4.813 total

Where the file is ~43MB in size (not compressed).

@eapache
Copy link
Author

eapache commented Jun 23, 2014

Ah, if I read the commit correctly then you skip streaming mode when not at the front of the file, so my streaming-mode times weren't actually using streaming mode (since I was calling file.gets before invoking Oj).

@ohler55
Copy link
Owner

ohler55 commented Jun 23, 2014

That would make a difference. Good to close?

@eapache
Copy link
Author

eapache commented Jun 23, 2014

If I take the file.gets out to actually use streaming mode, for var = Oj.load(file) I get

bundle exec bin/starscope -f tst2 --no-update  4.03s user 0.24s system 99% cpu 4.288 total
bundle exec bin/starscope -f tst2 --no-update  4.05s user 0.25s system 99% cpu 4.306 total
bundle exec bin/starscope -f tst2 --no-update  4.03s user 0.24s system 99% cpu 4.284 total

which is much better than without streaming mode, but still much slower than var = Oj.load(file.gets).

This ticket can be closed though.

@eapache eapache closed this as completed Jun 23, 2014
@eapache
Copy link
Author

eapache commented Jun 23, 2014

I guess Oj can't know that the current json object to parse is exactly and only the current line, since it has to handle pretty-printed json as well.

@ohler55
Copy link
Owner

ohler55 commented Jun 23, 2014

I'm afraid that is true. JSON often spans multi lines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants