Add 'readText()' method in JsonParser #15
I need to parse JSON with huge text fields (up to 500 MB). Using the readText(Writer) method still needs a lot of memory, because it reads the whole text field into memory first. Is there any plan to make this more efficient? From reading the code, I would assume that passing the Writer down to _finishString() could help here. The string finisher could then use only one segment (or a few), writing it to the Writer whenever it is full and then reusing it.
No one is working on this currently as far as I know; I do not have time to work on this now and probably will not for a while (unless I need it myself for some reason). But anyone who wants to work on it would be more than welcome to do so! And yes, lazy initial handling (only decoding the opening quote) is intended to allow a more efficient read+write operation like the one you suggest. There are multiple backends (byte-based UTF-8, character/Reader-based, async) to consider, but the implementation could be relatively simple if it addresses just the 2 common ones (Reader-based and byte-based; maybe the DataInput one too; async could not be supported anyway, I suspect). Put another way: the reason this has not been tackled is not necessarily inherent complexity of implementing support, given that the API already exists.
Looks like I re-filed this as #1288. I could close that one, but in this case I'll do the opposite and close this older issue.
Current JsonParser.getText() requires reading the whole JSON String value into memory as a single String.
While convenient, this may not be optimal when processing large payloads.
As an alternative method, there should be something like:
boolean readText(Writer w);
which would read a JSON String value and pass it to the given Writer, possibly in separate chunks, without aggregating it. This allows the caller to do incremental processing and avoid potentially large temporary memory usage.
In addition, for non-blocking parser implementations, this method could do partial decoding, meaning that it would parse only part of the textual value, with the return value indicating whether the full contents (true) or only partial content (false) was processed.
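To make the proposed contract concrete, here is a minimal standalone sketch of chunked String copying with a single reused buffer. It is not Jackson code and does not reflect how JsonParser is implemented: the class name ChunkedTextCopy, the chunkSize parameter, and the decodeEscape helper are all hypothetical, and the sketch assumes the Reader is positioned just after the opening quote of the String value. The boolean result follows the semantics described above (true for full contents, false for partial content).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

// Hypothetical illustration only; not part of Jackson.
final class ChunkedTextCopy {
    private ChunkedTextCopy() {}

    /**
     * Copies one JSON String value from 'in' to 'out' using a single reused
     * buffer of 'chunkSize' chars, so memory use does not grow with the value.
     * Assumption: the opening quote has already been consumed from 'in'.
     * For speed, callers would normally pass a buffered Reader.
     *
     * @return true if the closing quote was reached (full contents written),
     *         false if the input ended first (only partial content written)
     */
    static boolean readText(Reader in, Writer out, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        char[] buf = new char[chunkSize];
        int len = 0;
        int c;
        while ((c = in.read()) != -1) {
            if (c == '"') {               // unescaped quote ends the value
                out.write(buf, 0, len);
                return true;
            }
            if (c == '\\') {
                c = decodeEscape(in);
                if (c == -1) {
                    break;                // input ended in the middle of an escape
                }
            }
            buf[len++] = (char) c;
            if (len == buf.length) {      // buffer full: hand off the chunk, then reuse it
                out.write(buf, 0, len);
                len = 0;
            }
        }
        out.write(buf, 0, len);           // flush what was decoded before input ended
        return false;
    }

    // Decodes the character(s) following a backslash; returns -1 if input ends.
    private static int decodeEscape(Reader in) throws IOException {
        int c = in.read();
        switch (c) {
            case -1:  return -1;
            case 'b': return '\b';
            case 'f': return '\f';
            case 'n': return '\n';
            case 'r': return '\r';
            case 't': return '\t';
            case '"': case '\\': case '/': return c;
            case 'u': {
                int value = 0;
                for (int i = 0; i < 4; i++) {
                    int r = in.read();
                    if (r == -1) {
                        return -1;
                    }
                    int digit = Character.digit(r, 16);
                    if (digit < 0) {
                        throw new IOException("Invalid hex digit in \\u escape");
                    }
                    value = (value << 4) | digit;
                }
                return value;
            }
            default:
                throw new IOException("Invalid escape character: " + (char) c);
        }
    }
}
```

A caller could pass any Writer (for example one backed by a file) and peak memory would stay near chunkSize chars regardless of the value's length. In a real non-blocking parser a false result would mean "call again when more input is available"; this sketch keeps no state between calls, so it only illustrates the return-value convention.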