Streaming parsing of JSON with WebClient [SPR-17328] #21862

Closed
spring-issuemaster opened this issue Oct 3, 2018 · 3 comments

@spring-issuemaster spring-issuemaster commented Oct 3, 2018

Julian Orth opened SPR-17328 and commented

Hi,

AsyncRestTemplate was deprecated in #19962 in favor of WebClient. However, WebClient does not seem to support all of the use cases that AsyncRestTemplate supports (and that RestTemplate does not).

Example

Consider the following JSON:

{
    "a": [
        {
            "x": 2,
            "y": 1
        }
    ],
    "b": [
        {
            "x": 3,
            "y": 1
        }
    ]
} 

where both arrays (a and b) have 1,000,000,000 elements each. The goal is to calculate the sum of all x - y over both arrays. (E.g. (2 - 1) + (3 - 1) = 3 in the example above.) 

Solution with AsyncRestTemplate

With AsyncRestTemplate, this is easy: Call AsyncRestTemplate#execute with a ResponseExtractor, plug the InputStream into a Jackson JsonParser, use ObjectMapper to deserialize each array element ad-hoc into

class V {
   int x;
   int y;
},

update the sum, proceed to the next element. Since only one Object of type V needs to be in memory at a time, the memory requirements are constant and low.

Overall, performing this streaming processing of the JSON can probably be done in 25 lines of code using Jackson and AsyncRestTemplate.
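
The approach described above could look roughly like the following. This is a minimal sketch, not code from the original report: it assumes jackson-databind is on the classpath, the class name StreamingSum and the method sumOfDifferences are hypothetical, and V is given public fields so that Jackson can bind it without setters.

```java
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: sums x - y over every element of every array in
// a body shaped like {"a":[{...},...],"b":[{...},...]}.
final class StreamingSum {

    // Element type from the example above; public fields are an assumption
    // made so that Jackson's default field detection can populate them.
    public static final class V {
        public int x;
        public int y;
    }

    private static final ObjectMapper MAPPER = new ObjectMapper();

    static long sumOfDifferences(InputStream body) {
        long sum = 0;
        // Closing the parser also closes the underlying stream (Jackson default).
        try (JsonParser parser = MAPPER.getFactory().createParser(body)) {
            if (parser.nextToken() != JsonToken.START_OBJECT) {
                throw new IOException("expected a JSON object");
            }
            // One iteration per top-level field ("a", then "b").
            while (parser.nextToken() == JsonToken.FIELD_NAME) {
                if (parser.nextToken() != JsonToken.START_ARRAY) {
                    throw new IOException("expected an array value");
                }
                // Bind one element at a time; only one V is alive at once.
                while (parser.nextToken() == JsonToken.START_OBJECT) {
                    V v = MAPPER.readValue(parser, V.class);
                    sum += v.x - v.y;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sum;
    }
}
```

With AsyncRestTemplate, a ResponseExtractor would presumably hand the response body's InputStream to a method like this one. The parser reads from the stream in blocking fashion, which is exactly the part that has no direct counterpart in WebClient.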

The Problem with WebClient

With WebClient, this kind of processing seems to be practically impossible. Jackson appears to only support async parsing at the token level. Anything at a higher level (e.g. ObjectMapper) needs to have all tokens available in a blocking way to parse them.

Therefore, to implement the kind of streaming processing described above, I would have to manually keep track of the JSON tokens parsed and then plug them into an ObjectMapper all at once when I've detected the end of an array element. This is basically what Spring currently does to support streaming of top-level arrays:

WebClient.create().get().exchange().flatMapMany(r -> r.bodyToFlux(V.class)) 

However, even to support only this very limited streaming of top-level array elements, Spring had to re-implement about 200 lines of Jackson logic to keep track of the current depth in the token stream (Jackson2Tokenizer).
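
To illustrate what tracking the current depth involves, here is a simplified sketch using only the JDK. It is not Spring's Jackson2Tokenizer and does not use Jackson at all: the class NestedArrayElementSplitter is hypothetical, it works on raw characters rather than parser tokens, and it assumes the document shape from the example (a top-level object whose values are arrays of objects). Input may arrive in arbitrary chunks, as it would from a non-blocking source.

```java
import java.util.function.Consumer;

// Hypothetical sketch: emits the raw JSON text of each array element in
// {"a":[{...},...],"b":[{...},...]}, fed chunk by chunk.
final class NestedArrayElementSplitter {

    private final Consumer<String> onElement;
    private final StringBuilder current = new StringBuilder();
    private int depth = 0;            // 1 = top-level object, 2 = array, 3+ = element
    private boolean inString = false; // brackets inside strings must be ignored
    private boolean escaped = false;  // previous character was a backslash

    NestedArrayElementSplitter(Consumer<String> onElement) {
        this.onElement = onElement;
    }

    // Accepts the next chunk; state carries over between calls.
    void feed(CharSequence chunk) {
        for (int i = 0; i < chunk.length(); i++) {
            accept(chunk.charAt(i));
        }
    }

    private void accept(char c) {
        if (inString) {
            if (depth >= 3) current.append(c);
            if (escaped) escaped = false;
            else if (c == '\\') escaped = true;
            else if (c == '"') inString = false;
            return;
        }
        if (c == '"') {
            inString = true;
            if (depth >= 3) current.append(c);
        } else if (c == '{' || c == '[') {
            depth++;
            if (depth >= 3) current.append(c);
        } else if (c == '}' || c == ']') {
            if (depth >= 3) current.append(c);
            depth--;
            // Dropping back to array level means one element is complete.
            if (depth == 2) {
                onElement.accept(current.toString());
                current.setLength(0);
            }
        } else if (depth >= 3) {
            current.append(c);
        }
    }
}
```

Each emitted string could then be handed to a blocking ObjectMapper call, since a single element is small. Even this reduced version needs its own string and escape handling, which hints at why a general-purpose variant grows quickly.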

Question

Since AsyncRestTemplate is deprecated, there no longer seems to be an encouraged and practical way in Spring 5 to do asynchronous streaming of JSON data. There are several ways to improve this situation:

  1. Un-deprecate AsyncRestTemplate
  2. Upstream complete async support in Jackson
  3. Provide a much expanded version of Jackson2Tokenizer to the public that handles more complicated cases such as the one described above

What are your thoughts on the matter and do you have plans to address this problem in a future release?

Thanks
Julian

PS: A similar problem exists on the server side. With web-mvc, an object returned from a REST endpoint would be streamed into the output stream via Jackson, keeping the memory requirements low. With webflux, a Mono<Object> returned from a REST endpoint will first be serialized into a String before it is written to the output stream.


Affects: 5.0.9

@spring-issuemaster spring-issuemaster commented Oct 4, 2018

Rossen Stoyanchev commented

Jackson2JsonDecoder supports streaming from top-level JSON arrays, as well as streaming line-delimited JSON when the content type is "application/stream+json", with a wire format such as:

{"x": 2, "y": 1}
{"x": 3, "y": 1}
...

Both of which translate naturally to Flux<V>.

For something like your case, we could allow decoding to Flux<Map.Entry<K,V>> to support streaming from a map. The same applies on the encoding side: for "application/json" and a given Flux<Map.Entry<K,V>>, we'd have to write out a similar map to produce valid JSON. By the way, Spring MVC also supports Flux and Mono as return values, so you can stream there too.

Are you in control of both client and server side? In other words do you care about both encoding and decoding, and likewise are you able to even consider alternative output on the wire? 

One extra consideration on the encoding side: for streaming media types like "text/event-stream" and "application/stream+json", we write and flush on every element, since elements (e.g. events) may arrive with time gaps in between. Your case sounds more like streaming lots of data without loading it all into memory, so explicit flushing on every element might not be required. We'd have to consider some extra options for this, such as passing hints to the EncoderHttpMessageWriter that wraps the Jackson2JsonEncoder.

 

@spring-issuemaster spring-issuemaster commented Feb 4, 2019

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

@spring-issuemaster spring-issuemaster commented Feb 11, 2019

Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open the issue.
