JSON Marshaling of large responses is excessively expensive #3601
See #3536, but in general if you request something that takes 1GB of memory you're likely to have problems.
Also, we have protections against this type of large query, so querying 3d of data at 15s resolution should immediately return an error. This is to encourage such requests to be broken up into smaller requests.
Sorry if this is too far out of context, but what do we generally think of optionally marshaling as protobuf instead of json? (something I recently had a discussion with @DirectXMan12 about)
Let me respond to the 2 separate responses.
1. Lots of data == bad
The problem isn't that the data is so large (2G of data on a box with 256G
RAM is nothing); the problem is the size amplification (1-10G). This is
solely due to the JSON marshaling doing it all before writing anything. This
is an easy fix that just requires streaming the marshaled bytes to the wire.
2. Protections in place
This is true for the query_range API, but if you send a request (like the
issue states) of `somemetricname[3d]`, there are no protections in place in
the current implementation (if there were, I wouldn't have seen the problem,
nor filed the issue).
The question is: if I submit a PR (1) speeding up marshaling and (2) moving
API marshaling to streaming, will you accept it?
IMO that's orthogonal and presumably you'd see the same issues (since it's
a streaming problem, not the serialization itself).
It doesn't make a difference, we still need to build up all the data in RAM before we can start writing it out.
I've spent some time working on this over the last day, and I've made significant progress. I have a PR open (prometheus/common#112) which speeds up marshaling using the current Using this PR'd version of
In discussing a related PR on For completeness I replace
That is quite different from the benchmarks jsoniter have published themselves, so I'm a bit suspicious. I'd really like to see an end-to-end test. One way or the other, this is all getting assembled in memory before being marshalled, so smaller queries are needed for this volume of data.
I'm not sure what you are suspicious of? Neither of the benchmarks is black-boxed; they are open source, meaning you can look at them and determine if there is something wrong. If you don't want to bother looking into it, I've added end-to-end data (hitting the
#3536 improved this. Your use case continues to not be sane.
What did you do?
Sent a query looking like
metricname[3d]
to prometheus.
What did you expect to see?
metricname has ~5k labelsets, and from my math I'm expecting to get a large data response (~2G).
What did you see instead? Under which circumstances?
Instead, I seemingly never get a response from prometheus. More interestingly, I see a large increase in memory utilization to the point that prometheus stops scraping and eventually OOMs. From digging in more, I found that the issue is all to do with how prometheus is marshaling out the response on the HTTP API. With my below test script, we generate 3d worth of data (at a 15s period) for 5k timeseries. We generate that data (~500ms and ~1.2G RAM) and then json marshal that data (~2m -- and consumes ~11G of RAM).
Issues
Suggestions
Suggestion 1
In an effort to alleviate both problems I suggest the json marshaling is made to stream the data to the wire. There's no need to make a copy of it all in memory first, especially in the API case where we literally just write to the buffer. A terrible-hacky example would be something like:
In this example we would spin over every entry in the response and marshal that out. This means that each samplestream (in this example) would need to be in memory, but we'd then write it to the wire and no longer need it in memory. In addition, this means that the request is "cancelable" at each encode step (if the client disconnects, you get a stream-closed error). Of course, the "correct" implementation of this would require a bit of type switching.
Suggestion 2
Change the marshaling of the various structs to be codegen'd. Most of them are partially there, but there are some minor improvements that can be made that would give you ~2x boost in performance (mostly copying less, and reflecting less).
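As a rough illustration of the kind of reflection-free marshaling codegen would produce (hand-written here; `SamplePair` is an illustrative type modeled on Prometheus's timestamp/value pair, not real generated code):

```go
package main

import (
	"fmt"
	"strconv"
)

// SamplePair mirrors the shape of a Prometheus sample: a millisecond
// timestamp plus a float value. Illustrative only.
type SamplePair struct {
	TimestampMs int64
	Value       float64
}

// MarshalJSON builds the `[ts, "value"]` pair by appending directly to a
// byte slice with strconv, avoiding reflection and intermediate copies.
func (p SamplePair) MarshalJSON() ([]byte, error) {
	b := make([]byte, 0, 32)
	b = append(b, '[')
	b = strconv.AppendFloat(b, float64(p.TimestampMs)/1000, 'f', 3, 64)
	b = append(b, ',', '"')
	b = strconv.AppendFloat(b, p.Value, 'f', -1, 64)
	b = append(b, '"', ']')
	return b, nil
}

func main() {
	out, _ := SamplePair{TimestampMs: 1513700000000, Value: 1.5}.MarshalJSON()
	fmt.Println(string(out)) // [1513700000.000,"1.5"]
}
```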
For both of these I'd be more than happy to implement it (it's not that bad), but I wanted to get some feedback prior to implementation.
Repro Script