
AbstractJackson2Decoder.decodeToMono should use non-blocking parser #25838

Open
philsttr opened this issue Sep 29, 2020 · 4 comments
Labels
in: web Issues in web modules (web, webmvc, webflux, websocket) status: pending-design-work Needs design work before any code can be developed type: enhancement A general enhancement
Comments

@philsttr

AbstractJackson2Decoder.decodeToMono currently uses DataBufferUtils.join to join all DataBuffers, converts the result to an input stream, and then uses Jackson's blocking parser. This approach reads all DataBuffers for the entire payload into memory at the same time, which is not memory efficient for large payloads.

AbstractJackson2Decoder.decode, on the other hand, uses Jackson's non-blocking parser. This approach does not require reading all buffers into memory at the same time: each buffer can be processed individually, and released as it is parsed.

For larger payloads, I think it would be better if decodeToMono also used Jackson's non-blocking parser. This would allow each data buffer to be processed individually and released, rather than having to read the entire payload into memory.
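To illustrate the feeding model being proposed (this is a minimal standalone sketch, not the Spring code; it assumes jackson-core 2.9+ on the classpath, and the two-chunk split in main is arbitrary):

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.core.async.ByteArrayFeeder;
import com.fasterxml.jackson.core.json.async.NonBlockingJsonParser;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class NonBlockingParseDemo {

    // Feeds JSON to Jackson's non-blocking parser one chunk at a time and
    // collects the token names. Each chunk plays the role of a DataBuffer
    // that could be released as soon as it has been fed and tokenized.
    public static List<String> tokenNames(byte[][] chunks) throws Exception {
        NonBlockingJsonParser parser = (NonBlockingJsonParser)
                new JsonFactory().createNonBlockingByteArrayParser();
        ByteArrayFeeder feeder = parser.getNonBlockingInputFeeder();
        List<String> tokens = new ArrayList<>();
        for (byte[] chunk : chunks) {
            feeder.feedInput(chunk, 0, chunk.length);
            drain(parser, tokens);
            // <-- at this point the chunk (DataBuffer) could be released
        }
        feeder.endOfInput();
        drain(parser, tokens);
        return tokens;
    }

    // Pulls tokens until the parser needs more input (NOT_AVAILABLE) or is done.
    private static void drain(NonBlockingJsonParser parser, List<String> tokens)
            throws Exception {
        JsonToken t;
        while ((t = parser.nextToken()) != null && t != JsonToken.NOT_AVAILABLE) {
            tokens.add(t.name());
        }
    }

    public static void main(String[] args) throws Exception {
        byte[][] chunks = {
                "{\"a\":".getBytes(StandardCharsets.UTF_8),
                "1}".getBytes(StandardCharsets.UTF_8)
        };
        System.out.println(tokenNames(chunks));
    }
}
```

The key point is the comment inside the loop: with this model the raw chunk is no longer needed once it has been fed, whereas decodeToMono today must hold every chunk until the join completes.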

Using the non-blocking parser would allow the maxInMemorySize to be increased to accommodate occasional large payloads, and allow the server to be more memory efficient when reading them.

It would also bring the behavior of JSON parsing in WebFlux more in line with the behavior of JSON parsing in WebMVC. WebMVC's AbstractJackson2HttpMessageConverter does not read the entire payload into memory and does not enforce a maxInMemorySize.

FWIW, I started looking into this when I hit a DataBufferLimitException for some occasional large payloads. I'd like to increase the maxInMemorySize, but then I noticed the consequence of loading everything into memory at the same time, which was a bit surprising since WebMVC doesn't have this limit.

What are your thoughts?

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Sep 29, 2020
@rstoyanchev
Contributor

rstoyanchev commented Sep 30, 2020

The non-blocking parser would parse and release buffers but only to build a parsed representation that will accumulate until the complete value for the Mono is fully aggregated. So I don't think it will make a difference since we end up buffering either the raw bytes or the parsed representation.

In fact, if you look at Jackson2Tokenizer, where we do async parsing for a stream of objects, we still count the number of bytes parsed until the parsed Object is published (e.g. through a Flux). That means you'll still be impacted by the maxInMemorySize limit when a single Object within a stream is large enough.

The main difference is that with buffering of the raw bytes, once they are fully accumulated, we briefly hold both the raw data and the parsed representation during parsing, but immediately after the parsing the raw bytes are released. Aside from that, maxInMemorySize would either way need to accommodate the size of the expected data for a given object.

@rstoyanchev rstoyanchev added the status: waiting-for-feedback We need additional information before we can continue label Sep 30, 2020
@philsttr
Author

I still think it will be somewhat beneficial to not require all of the raw bytes from the payload to be in memory at the same time. I understand that the parsed representation is required to be in memory, there's no way around that.

I'd like WebFlux to be more like WebMVC in this regard. Since WebMVC uses an InputStream, WebMVC does not require all bytes to be in memory while parsing. Tangentially, WebMVC does not have a concept of maxInMemorySize either. I'm ok with having maxInMemorySize, but I would like the freedom to increase it, without needing both the full raw bytes and the parsed objects in memory at the same time.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Sep 30, 2020
@rstoyanchev
Contributor

rstoyanchev commented Oct 1, 2020

I think there are additional performance considerations to keep in mind with async parsing; see for example FasterXML/jackson-core#478. We could create a benchmark to understand those trade-offs as things currently stand.

For memory use, suppose we have a 1 MB body. For WebFlux it will grow to 1 MB of raw data, then briefly spike during parsing, and then return to around 1 MB after the raw data is released. For WebMvc it will grow up to 1 MB but will have smaller spikes along the way, depending on the sizes of the chunks while they are parsed. Furthermore, the longer it takes to accumulate data, depending on network latency, the less of an impact parse spikes will have on overall memory.
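The two memory profiles can be put into back-of-the-envelope arithmetic. A sketch under stated assumptions: the 1 MB body comes from the example above, while the parsed-representation and chunk sizes below are purely illustrative numbers, not measurements:

```java
public class PeakMemorySketch {

    // Buffered model (decodeToMono today): during the parse spike, the full
    // raw body and the parsed representation coexist briefly.
    static long peakBuffered(long rawBytes, long parsedBytes) {
        return rawBytes + parsedBytes;
    }

    // Streaming model (InputStream or non-blocking feeding): only one chunk
    // of raw data is held alongside the (growing) parsed representation.
    static long peakStreaming(long chunkBytes, long parsedBytes) {
        return chunkBytes + parsedBytes;
    }

    public static void main(String[] args) {
        long raw = 1_000_000;   // 1 MB body, as in the example above
        long parsed = 600_000;  // assumed size of the parsed representation
        long chunk = 8_192;     // assumed network chunk size
        System.out.println("buffered peak:  " + peakBuffered(raw, parsed));
        System.out.println("streaming peak: " + peakStreaming(chunk, parsed));
    }
}
```

Under these assumed numbers the buffered spike is roughly the raw size plus the parsed size, while the streaming spike is dominated by the parsed size alone; how much that matters in practice is exactly what a benchmark would need to show.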

So on balance I'm not sure it's clear which would be better, but this is worth investigating in order to motivate either a change, remaining as is, or perhaps something in between that provides choice.

Tangentially, WebMVC does not have a concept of maxInMemorySize either

Spring MVC doesn't have this for an unrelated reason. It's more to do with the fact that in Spring MVC it comes down to counting bytes on the InputStream. In WebFlux there is a lot more knowledge in the codecs, especially with streaming and async parsing, and it's harder to know how to count. This is why we decided to expose such settings in WebFlux.

@rstoyanchev rstoyanchev added in: web Issues in web modules (web, webmvc, webflux, websocket) status: pending-design-work Needs design work before any code can be developed type: enhancement A general enhancement and removed status: feedback-provided Feedback has been provided status: pending-design-work Needs design work before any code can be developed status: waiting-for-triage An issue we've not yet triaged or decided on labels Oct 1, 2020
@rstoyanchev rstoyanchev added this to the 5.x Backlog milestone Oct 1, 2020
@philsttr
Author

philsttr commented Oct 3, 2020

Thanks for considering this. I definitely agree it requires more investigation to determine the right balance.

For memory use, suppose we have a 1 MB body. For WebFlux it will grow to 1MB of raw data, then briefly spike during parsing, and then return to around 1 MB after raw data is released.

That brief spike is what I'm worried about, because that spike contains all the raw data, plus the parsed representation.

WebMVC would never spike that high, since it would only ever contain a single chunk of the raw data (not all of the raw data), plus the parsed representation.

For larger payloads, this could be a significant difference, especially if the raw data is character data (e.g. strings) and the parsed data is not (e.g. integers, booleans, byte arrays, which are much more compact).
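That compactness point can be illustrated with a tiny example (the payload is made up, and the "parsed" size uses nominal primitive widths; real JVM heap layout adds object headers and padding):

```java
import java.nio.charset.StandardCharsets;

public class RawVsParsedSize {
    public static void main(String[] args) {
        // Numeric-heavy JSON arrives as UTF-8 characters...
        String json = "{\"id\":1234567890123456789,\"active\":true}";
        int rawBytes = json.getBytes(StandardCharsets.UTF_8).length;

        // ...but parses down to compact primitives: one long and one boolean.
        int parsedBytes = Long.BYTES + 1;

        System.out.println("raw: " + rawBytes + " bytes, parsed (approx): "
                + parsedBytes + " bytes");
    }
}
```

For payloads dominated by numbers and booleans, the raw character form can be several times larger than the parsed representation, which is why holding both at once during the spike is the worst case.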

Projects: None yet
Development: No branches or pull requests
3 participants