Add 'readText()' method in JsonParser #15

Closed
cowtowncoder opened this issue May 27, 2012 · 3 comments

Comments

@cowtowncoder
Member

Currently, JsonParser.getText() requires reading the whole JSON String value into a single String.
While convenient, this may not be optimal when processing large payloads.

As an alternative method, there should be something like:

boolean readText(Writer w);

which would read the JSON String value and pass it to the given Writer, possibly in separate chunks, without aggregating it. This allows the caller to do incremental processing and avoid potentially large temporary memory usage.

In addition, for non-blocking parser implementations, this method could do partial decoding, meaning that it would only parse part of the textual value; the return value would indicate whether the full contents (true) or only partial content (false) were processed.
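
To make the proposed return-value contract concrete, here is a minimal sketch in plain Java. It is not Jackson code: `TextSource`, `PiecewiseSource`, `CountingWriter` and `drain` are hypothetical names invented for illustration, and only the `boolean readText(Writer w)` signature comes from the proposal above. The toy source hands over one piece per call and returns `true` only once the whole value has been passed on, while the Writer processes each piece without storing it.

```java
import java.io.IOException;
import java.io.Writer;

public class ReadTextContractSketch {
    /** Hypothetical interface; only the method signature is taken from the proposal. */
    interface TextSource {
        boolean readText(Writer w) throws IOException;
    }

    /** Toy source that hands out one pre-split piece per call, as a non-blocking parser might. */
    static final class PiecewiseSource implements TextSource {
        private final String[] pieces;
        private int next = 0;

        PiecewiseSource(String... pieces) {
            this.pieces = pieces;
        }

        @Override
        public boolean readText(Writer w) throws IOException {
            if (next < pieces.length) {
                w.write(pieces[next++]);
            }
            // true once the full value has been passed on, false while content remains
            return next >= pieces.length;
        }
    }

    /** Writer that processes text incrementally: it keeps a running length, never the text. */
    static final class CountingWriter extends Writer {
        long chars = 0;
        int writes = 0;

        @Override
        public void write(char[] buf, int off, int len) {
            chars += len;
            writes++;
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    /** Calls readText until it reports that the full value was processed. */
    static CountingWriter drain(TextSource source) throws IOException {
        CountingWriter w = new CountingWriter();
        while (!source.readText(w)) {
            // a real non-blocking caller would feed more input here before retrying
        }
        return w;
    }

    public static void main(String[] args) throws IOException {
        CountingWriter w = drain(new PiecewiseSource("Hello, ", "chunked ", "world"));
        System.out.println(w.chars + " chars in " + w.writes + " writes");
    }
}
```

The caller never holds more than one piece at a time, which is the point of passing a Writer instead of returning a String.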

LokeshN added a commit to LokeshN/jackson-core that referenced this issue May 15, 2016
LokeshN added a commit to LokeshN/jackson-core that referenced this issue May 16, 2016
LokeshN added a commit to LokeshN/jackson-core that referenced this issue May 16, 2016
LokeshN added a commit to LokeshN/jackson-core that referenced this issue May 18, 2016
LokeshN added a commit to LokeshN/jackson-core that referenced this issue May 18, 2016
cowtowncoder added a commit that referenced this issue May 18, 2016
@MikePieperSer

I need to parse JSON with huge text fields (up to 500 MB). Using the readText(Writer) method still needs a lot of memory, because it reads the whole text field into memory.

Is there any plan to make this more efficient?

From reading the code, I would assume that passing the Writer down to _finishString() could help here. The string finisher could then use only one segment (or a few), writing it to the Writer whenever it is full and then reusing it.
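
The segment-reuse idea suggested above can be sketched roughly as follows. This is a standalone illustration, not the actual `_finishString()` code: `SegmentReuseSketch`, `copyString` and `unescape` are hypothetical names, the input is assumed to be positioned just after the opening quote, and `\uXXXX` escapes are left out for brevity. Memory use is bounded by the segment size regardless of how long the String value is.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class SegmentReuseSketch {
    /**
     * Copies a JSON String value to the Writer through one fixed-size segment.
     * Assumes the Reader is positioned just after the opening quote.
     * Returns the number of decoded characters written.
     */
    static long copyString(Reader in, Writer out, int segmentSize) throws IOException {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be positive");
        }
        char[] segment = new char[segmentSize];
        int used = 0;
        long total = 0;
        while (true) {
            int c = in.read();
            if (c == -1) {
                throw new IOException("Unexpected end of input inside String value");
            }
            if (c == '"') {
                break; // closing quote: value is complete
            }
            if (c == '\\') {
                c = unescape(in.read());
            }
            segment[used++] = (char) c;
            if (used == segment.length) {
                // segment is full: hand it to the Writer, then reuse the same array
                out.write(segment, 0, used);
                total += used;
                used = 0;
            }
        }
        out.write(segment, 0, used); // flush whatever is left in the last segment
        return total + used;
    }

    /** Decodes simple two-character escapes only; \\uXXXX is not handled in this sketch. */
    private static char unescape(int c) throws IOException {
        switch (c) {
            case '"': return '"';
            case '\\': return '\\';
            case '/': return '/';
            case 'b': return '\b';
            case 'f': return '\f';
            case 'n': return '\n';
            case 'r': return '\r';
            case 't': return '\t';
            default: throw new IOException("Unsupported escape in this sketch");
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        // Input as it appears after the opening quote; the segment holds only 4 chars.
        long n = copyString(new StringReader("large \\\"text\\\" value\" , \"next\""), out, 4);
        System.out.println(n + " chars: " + out);
    }
}
```

A `StringWriter` is used here only to show the result; with a 500 MB field the Writer would typically stream to a file or another consumer so that nothing accumulates in memory.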

@cowtowncoder
Member Author

cowtowncoder commented Apr 29, 2021

No one is working on this currently as far as I know; I do not have time to work on this now and probably not for a while (unless I'd need it myself for some reason). But anyone who wants to work on it would be more than welcome to do so!

And yes, lazy initial handling (only decoding the opening quote) is intended to allow a more efficient read+write operation like the one you suggest. There are multiple backends (byte-based UTF-8, character/Reader-based, async) to consider, but the implementation could be relatively simple if it just addresses the 2 common ones (Reader- and byte-based, maybe the DataInput one); async could not be supported anyway, I suspect.

Put another way: the reason this one has not been tackled is not necessarily the inherent complexity of implementing support, given that the API already exists.

@cowtowncoder
Member Author

Looks like I re-filed this as #1288; I could close that one, but in this case I'll do the opposite and close this older issue.
