Check memory usage before decoding async response #74594
Pinging @elastic/es-search (Team:Search)
switch (fieldName) {
    case RESULT_FIELD:
        ensureExpectedToken(XContentParser.Token.VALUE_STRING, parser.currentToken(), parser);
        final CharBuffer encodedBuffer = parser.charBuffer();
Do you think we can use parser.binaryValue() instead? It would give us the array already decoded from Base64.
I looked at the code of getBinaryValue and readBinaryValue in JsonParser. Both use two intermediate buffers while decoding a Base64 string, whereas the current approach doesn't require extra memory. I know we generally shouldn't rely on optimizations like this, but they help when encoding and decoding huge responses.
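To illustrate the idea of decoding Base64 text straight from a character buffer without first copying it into another full-size array, here is a minimal sketch. It is not the code from this PR: the class `CharBufferBase64` and its methods are hypothetical names, and it relies only on the JDK's `Base64.Decoder.wrap(InputStream)`, which decodes as the stream is read.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.CharBuffer;
import java.util.Base64;

// Hypothetical helper, not part of Elasticsearch.
final class CharBufferBase64 {

    /** Exposes the chars of a buffer as bytes; valid because Base64 text is pure ASCII. */
    static InputStream asAsciiStream(CharBuffer chars) {
        // duplicate() shares the underlying content but keeps an independent position,
        // so the caller's buffer is left untouched and nothing is copied.
        final CharBuffer view = chars.duplicate();
        return new InputStream() {
            @Override
            public int read() throws IOException {
                if (view.hasRemaining() == false) {
                    return -1; // end of stream
                }
                char c = view.get();
                if (c > 0x7F) {
                    throw new IOException("non-ASCII character in Base64 input");
                }
                return c;
            }
        };
    }

    /** Returns a stream that yields the decoded bytes lazily as they are read. */
    static InputStream decodingStream(CharBuffer encoded) {
        return Base64.getDecoder().wrap(asAsciiStream(encoded));
    }
}
```

A consumer would read the returned stream directly (for example, deserializing the response from it), so only the decoder's small internal state is held on top of the encoded characters.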
Thanks for the explanation, Nhat.
I've checked the parser.charBuffer implementation. It uses String::toCharArray(), which allocates a new character array whose length is the length of the string. Does that mean we first allocate 2x the response size in memory, and only use the circuit breaker for another 1x?
I had some other ideas. What do you think of them?
- You said we can record the size of the search response. What if we record the size of just the encoded Base64 string (which we should know)?
- Another very simple way is to use the setting search.max_async_search_response_size that "Set max allowed size for stored async response" #74455 introduces. We can just check whether the maximum possible size is available in memory.
> I've checked parser.charBuffer implementation, it uses String::toCharArray() which will allocate a newly character array whose length is the length of this string.
Great catch, thanks Mayya! I have updated this PR to account for the extra buffer used by the XContentParser. Unlike the memory reserved for the response, we can release this memory immediately after parsing.
> You said we can record the size of search response, what if we record the size of just encoded Base64 string (which we should know)
We already have it when parsing the xContent (i.e., encodedBuffer.length()).
> Another very simple way is to use setting search.max_async_search_response_size that Set max allowed size for stored async response #74455 introduces. We can just check for max possible size available in memory.
With that approach we could reserve far more memory than needed, especially when users set a large value for this setting.
Thanks Mayya for your review. I have addressed your feedback. Would you please take another look?
        boolean restoreResponseHeaders, boolean checkAuthentication,
        Counter reservedBytes) {
    // Reserve an extra buffer as XContentParser can use it to hold parsed values.
    circuitBreaker.addEstimateBytesAndMaybeBreak(source.length(), "parse xContent of async response");
What happens if this circuitBreaker raises an exception? Should it be inside a try statement to make sure we release the allocated bytes?
If this circuitBreaker raises an exception, then we haven't reserved the memory yet. Therefore, we shouldn't release it.
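The reserve-then-release ordering described here can be sketched as follows. This is only an illustration under stated assumptions: `SimpleBreaker` and `ParseWithBreaker` are hypothetical stand-ins, not the Elasticsearch `CircuitBreaker` API, and the "parsing" step is a placeholder. The reservation sits outside the `try` because a failed reservation leaves nothing to release; once it succeeds, `finally` guarantees the release.

```java
// Hypothetical stand-in for a memory circuit breaker; not the Elasticsearch class.
final class SimpleBreaker {
    private final long limitBytes;
    private long usedBytes;

    SimpleBreaker(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Reserves bytes, or throws without reserving anything if the limit would be exceeded. */
    synchronized void addEstimateBytesAndMaybeBreak(long bytes, String label) {
        long newUsed = usedBytes + bytes;
        if (newUsed > limitBytes) {
            throw new IllegalStateException(
                "[" + label + "] would use " + newUsed + " bytes, limit is " + limitBytes);
        }
        usedBytes = newUsed;
    }

    synchronized void release(long bytes) {
        usedBytes -= bytes;
    }

    synchronized long used() {
        return usedBytes;
    }
}

final class ParseWithBreaker {
    /** Reserves a temporary buffer for parsing and always gives it back afterwards. */
    static int parse(SimpleBreaker breaker, String source) {
        long extraBytes = source.length();
        // Outside the try block: if this throws, no bytes were reserved,
        // so there is nothing to release.
        breaker.addEstimateBytesAndMaybeBreak(extraBytes, "parse xContent of async response");
        try {
            return source.trim().length(); // placeholder for the real parsing work
        } finally {
            // The temporary parser buffer can be released as soon as parsing ends.
            breaker.release(extraBytes);
        }
    }
}
```

If the reservation were placed inside the `try`, a tripped breaker would reach the `finally` block and subtract bytes that were never added, leaving the breaker's accounting too low.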
@dnhatn Nhat, thanks for iterating. The PR overall LGTM. Or is there some extra benefit of this modified code? Reduced memory?
I would prefer not to introduce these optimizations either, but the old implementation can use up to four times the source length in memory: the internal buffer of the parser, the Base64-encoded string, the Base64-decoded buffer, and the response. I will try to simplify the code.
@mayya-sharipova I pushed cc66b09 to simplify the code. Would you mind taking another look? Thank you!
@dnhatn Thanks for iterating Nhat! I like how the code looks now! Great job!
@mayya-sharipova Thanks so much for your reviews.
This change makes sure the system has enough memory before decoding an async search response, as a large response can lead to an out-of-memory (OOM) error.