Streaming bug #160
Comments
Good test case. I'll get it fixed over the next couple of days. Assuming it is an Oj bug. :-)
Faster than I thought. The problem is that Ruby's file implementation preloads the file for efficiency when using gets. Oj also tries to optimize by using the file descriptor directly, but by then Ruby has already loaded the file and moved the position of the file descriptor to the end of the file. If you use file.read(1) instead, the file is not preloaded and Oj works as expected. As for the fix, I'm tempted to document the behavior and move on, but I'll run some performance tests and see how big a hit it is to use the cached file object. There is another workaround as well if you use streams instead. Anyway, stay tuned.
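A minimal sketch of the buffering effect described above, assuming a POSIX system and MRI's buffered IO. After a single gets, the Ruby-level position (File#pos) sits just past the first line, while the offset of the underlying file descriptor has already advanced by however much Ruby read into its buffer. The helper name positions_after_gets and the temp file contents are hypothetical, and only standard Ruby IO calls are used here, not Oj itself.

```ruby
require "tempfile"

# Hypothetical helper (not part of Oj): reads one line with gets, then reports
# both the Ruby-level position and the offset of the underlying descriptor.
def positions_after_gets(path)
  File.open(path, "r") do |f|
    f.gets
    # Wrap the same descriptor in a second IO object that has no read buffer
    # of its own, so the OS-level offset can be queried with sysseek.
    # autoclose: false keeps this wrapper from closing f's descriptor.
    raw = IO.for_fd(f.fileno, "r", autoclose: false)
    descriptor_offset = raw.sysseek(0, IO::SEEK_CUR)
    [f.pos, descriptor_offset]
  end
end

Tempfile.create("oj-demo") do |tmp|
  tmp.write("5\n{\"a\":1}\n{\"b\":2}\n")
  tmp.flush
  logical, descriptor = positions_after_gets(tmp.path)
  puts "Ruby position: #{logical}, descriptor offset: #{descriptor}"
end
```

For a small file like this one, the descriptor offset should come out larger than the Ruby position, which is the mismatch a parser reading straight from the descriptor would run into.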
Huh. So if I call gets on the file first, then streaming isn't going to win me any performance anyways because the entire file will be preloaded? Interesting...
I suspect there is a limit to the preloaded size. I have some ideas though. Give me a day or two.
I haven't forgotten about this. Just been busy with work.
I pushed a version that I believe fixes the problem. Can you try it? It is on GitHub but the gem is not released yet.
Yes this appears to fix the problem. Thanks! Regarding timing, I'm still finding it much faster to call Using
Using
Using
Where the file is ~43MB in size (not compressed).
Ah, if I read the commit correctly then you skip streaming mode when not at the front of the file, so my streaming-mode times weren't actually using streaming mode (since I was calling |
That would make a difference. Good to close?
If I take the
which is much better than without streaming mode, but still much slower than This ticket can be closed though. |
I guess Oj can't know that the current JSON object to parse is exactly and only the current line, since it has to handle pretty-printed JSON as well.
I'm afraid that is true. JSON often spans multiple lines.
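When the caller does know that every line holds exactly one complete JSON document, that knowledge can be applied outside the parser. The sketch below illustrates the idea with the standard library's JSON.parse so it runs without extra gems; Oj.load(line) could presumably be swapped in. The helper name each_json_line is hypothetical and is not part of Oj.

```ruby
require "json"
require "stringio"

# Hypothetical helper: parses input that is known to hold one JSON document
# per line. It would fail on pretty-printed JSON that spans several lines.
def each_json_line(io)
  return enum_for(:each_json_line, io) unless block_given?

  io.each_line do |line|
    next if line.strip.empty? # skip blank lines between documents
    yield JSON.parse(line)
  end
end

input = StringIO.new("{\"a\":1}\n{\"b\":[2,3]}\n")
each_json_line(input) { |doc| p doc }
```

The trade-off is the one discussed above: splitting on newlines is cheap, but only correct when the producer guarantees one document per line.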
I started investigating a case where I was getting
unexpected character at line 1, column 1 [sparse.c:688]
even though the next character appeared to be a {. However, I have not been able to get that down to a small test case. I did, however, find this one in the process:
Script:
And file tst:
And the only output is 5, with no JSON from Oj.
Hopefully this has the same root cause as the real one I'm trying to track down.