Streaming APIs #3690
I'm not sure about this one; large responses are something we should avoid. Anything too big to fit in RAM is going to take out a web browser too. #3601 is outside of sane usage, for example. This would also prevent us from ever having a limit on the number of time series returned, as we'd only find that out after we've already returned a 200.
PromQL data expansion has the same problem, and although I'm not sure of a good solution yet, I think it would be nice if we could reduce the memory footprint.
> Anything too big to fit in RAM is going to take out a web browser too

That's making a pretty big assumption about who the client is. You'll probably suggest that clients chunk up their requests themselves, but that does not really scale well, neither UX- nor performance-wise.

> This would also prevent us from ever having a limit on the number of time series returned

Not really. You can get that number either approximated or trivially precisely based on index lookups.

Please consider that there may be other valid client use cases outside of web browsers that care about decent performance and flexible result sizes. That generally implies some sort of chunked and streaming API to me as well.
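To make the "clients chunk up their requests themselves" workaround concrete, here is a hypothetical sketch of what a client would have to do: split the time range into fixed-size windows and issue one range-query-style call per window. The `fetch` callback and function names are illustrative, not Prometheus's actual client API.

```python
def time_windows(start, end, window):
    """Yield (window_start, window_end) pairs covering [start, end)."""
    t = start
    while t < end:
        yield t, min(t + window, end)
        t += window

def chunked_range_query(fetch, query, start, end, window):
    """Issue one range-query call per window and stitch the results.

    `fetch` stands in for an HTTP call such as
    GET /api/v1/query_range?query=...&start=...&end=...
    """
    results = []
    for ws, we in time_windows(start, end, window):
        results.extend(fetch(query, ws, we))
    return results
```

Each window's response must still fit in memory on both sides, and the client pays one round trip per window, which is the scaling concern raised above.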
Hi,
this could be a very long topic, but in general we would need to change PromQL. It is quite complex and I am still getting my head around it, but I'm willing to team up with someone willing to spend some time on this.
PromQL is where the wins are. Web is much more involved, as that would involve creating brand-new APIs for use cases which are unclear.
@krasi-georgiev This sounds pretty cool to me. There seems to be a lot of complexity under the hood, and I would love to have some discussion about it.
I can't give you any specific advice as I haven't spent that much time in that part of the code myself, but in general get familiar with the relevant parts of the codebase; this should be enough to get us started.
I am just thinking that we could store compacted data instead of the original data to reduce memory usage. There is a paper named "Optimal Quantile Approximation in Streams" which proposes a new data structure called a compactor to store data. If we don't need an exact answer, it can use less space to maintain statistics over streams. You can then use this data structure to answer queries about the quantile of a new data point. The data structure is also mergeable, so we could use this method to compact the data in every chunk and merge all chunks when we want to use them. However, using this data structure causes some loss of precision, and it may not suit some metric types, such as counters. I am not sure whether this notion will help reduce memory usage; I am looking forward to suggestions. By the way, I want to be helpful with this part.
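For readers unfamiliar with the paper, here is a toy sketch in the spirit of its compactor idea: each level keeps a bounded buffer, and when a buffer fills, it is sorted and every other item is promoted to the next level with doubled weight. This is a simplified illustration only; it does not reproduce the paper's exact algorithm or its error bounds, and all names are hypothetical.

```python
import random

class QuantileSketch:
    """Toy mergeable quantile sketch loosely modeled on KLL compactors."""

    def __init__(self, k=128):
        self.k = k            # capacity of each compactor level
        self.levels = [[]]    # items at level i carry weight 2**i

    def add(self, value):
        self.levels[0].append(value)
        self._compact()

    def merge(self, other):
        # Mergeable: concatenate the levels, then re-compact.
        for i, buf in enumerate(other.levels):
            while len(self.levels) <= i:
                self.levels.append([])
            self.levels[i].extend(buf)
        self._compact()

    def _compact(self):
        changed = True
        while changed:
            changed = False
            for i in range(len(self.levels)):
                if len(self.levels[i]) >= self.k:
                    buf = sorted(self.levels[i])
                    # Keep every other item; survivors move up a level,
                    # which doubles their weight.
                    survivors = buf[random.randint(0, 1)::2]
                    if i + 1 == len(self.levels):
                        self.levels.append([])
                    self.levels[i + 1].extend(survivors)
                    self.levels[i] = []
                    changed = True

    def quantile(self, q):
        """Approximate the q-quantile from the weighted survivors."""
        weighted = sorted(
            (v, 2 ** i) for i, buf in enumerate(self.levels) for v in buf)
        total = sum(w for _, w in weighted)
        cum = 0
        for v, w in weighted:
            cum += w
            if cum >= q * total:
                return v
        return weighted[-1][0]
```

The mergeability is what would matter for the per-chunk idea above: each chunk could carry its own small sketch, and sketches are combined at query time instead of expanding raw samples.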
@Titanssword That doesn't really help in PromQL; there are only two functions that calculate quantiles like that, and they're not very commonly used.
I studied a bit about creating a streaming API, but there aren't many resources on the topic. An alternative would be WebSockets, which sometimes cause problems with proxies and firewalls. Here's a proof of concept for a simple streaming API.
Gist > https://gist.github.com/himanshub16/98f7c00a39256d58de838394a55682ff I went through some code in web and found protobuf in use, so this should require special care.
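The core idea of such a proof of concept can be sketched in a few lines (this is not the gist's actual code): in a WSGI application, returning a generator as the body lets the server hand each chunk to the client as it is produced, instead of buffering the whole response.

```python
import json
from wsgiref.util import setup_testing_defaults

def streaming_app(environ, start_response):
    """Minimal WSGI sketch of a streaming endpoint: the body is a
    generator, so each chunk leaves as soon as it is yielded."""
    start_response("200 OK", [("Content-Type", "application/x-ndjson")])

    def generate():
        for i in range(3):  # stand-in for iterating over query results
            yield (json.dumps({"value": i}) + "\n").encode()

    return generate()

# Drive the app directly, the way a WSGI server would:
environ = {}
setup_testing_defaults(environ)
chunks = list(streaming_app(environ, lambda status, headers: None))
```

Note that `start_response` fires before the first chunk is produced, i.e. the status code is committed up front; that constraint comes up again below in the discussion of why streaming is hard to retrofit.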
@himanshub16 Maybe you should hang around the IRC dev channel and try to get some feedback there. I am definitely interested in getting involved, but my focus is on tsdb for the next few weeks.
Based on work I've been doing on PromQL in the past while, streaming results is even less practical than I first thought. We still need to have the whole result in memory before we start to send anything, so changing our HTTP APIs from what they are today isn't going to buy us anything.
@brian-brazil can you share your thoughts?
Right now we need to evaluate the entire query to know whether it succeeded in order to determine the status code, and the status code is the first thing we send via HTTP. PromQL currently evaluates range queries step by step; to stream that, you'd have to resend the metric names on every step, which would blow up response sizes by at least 5x. Making that better would require a non-trivial protocol, making ad-hoc usage of the APIs much harder. A more efficient PromQL would evaluate each node fully for all steps at once, which offers no scope for streaming responses, as you would still need to be building up the entire response first. To me this seems like a solution in search of a problem. Everything I've seen talked about is already doable with our current API.
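The status-code constraint described here can be illustrated with a toy sketch (all names hypothetical): with a buffered response, a mid-evaluation failure is caught before any status is chosen; with a streamed response, the 200 is already on the wire when the failure happens, so the client sees a "successful" response with a truncated body.

```python
import json

def evaluate(n, fail_at=None):
    """Hypothetical evaluator yielding one series at a time; it may fail
    partway through, e.g. when a resource limit is hit."""
    for i in range(n):
        if i == fail_at:
            raise RuntimeError("query exceeded limits")
        yield {"series": f"s{i}", "value": i}

def buffered_response(n, fail_at=None):
    # Buffered: the failure is seen before a status code is committed.
    try:
        return 200, json.dumps(list(evaluate(n, fail_at)))
    except RuntimeError as e:
        return 422, json.dumps({"error": str(e)})

def streamed_response(n, fail_at=None):
    # Streamed: "HTTP 200" has already been sent when the evaluator
    # fails, so all we can do is truncate the body.
    wire = ["HTTP 200"]
    try:
        for series in evaluate(n, fail_at):
            wire.append(json.dumps(series))
    except RuntimeError:
        wire.append("<connection aborted mid-body>")
    return wire
```

This is why switching an existing request/response API to streaming is more than a transport change: error reporting has to move into the body via a protocol both sides understand.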
After the dev discussions at PromCon, the consensus is that there would be no benefit from streaming in remote read, and that it is too complicated to be worth it in PromQL.
Wise discussions and not getting too excited saved us from technical debt. |
a short version of the discussion will be published soon. #3443 (comment)
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Just FYI, there is movement in this direction, follow up discussion and announcement on #4517 |
Various of our endpoints would benefit from being streaming, meaning that we send results in chunks rather than collecting all results and then rendering them in a single response. This is useful for large responses; foreshadowing, it might also be useful for streaming new samples to a front end, although I would see that as a separate issue.
For this path to be optimal, all the underlying parts must also be streaming, meaning TSDB (to my knowledge this is already the case) as well as the query engine. My gut feeling also tells me that, depending on the implementation, isolation in TSDB would be required.
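The "all underlying parts must be streaming" requirement can be pictured as a chain of lazy iterators, one per layer; if any layer materializes its full input, streaming at the edge buys nothing. A minimal sketch, with hypothetical names standing in for the storage, query, and API layers:

```python
def tsdb_samples():
    """Storage layer: lazily iterate samples (a stand-in for TSDB's
    chunk iterators)."""
    for t in range(5):
        yield t, float(t)

def engine(samples):
    """Query layer: transform sample by sample without materializing
    the whole series in memory."""
    for t, v in samples:
        yield t, v * 2

def encode(results):
    """API layer: serialize each result as soon as it is ready."""
    for t, v in results:
        yield f'{{"t":{t},"v":{v}}}\n'

# Nothing is computed until the consumer pulls; memory use stays at a
# constant number of in-flight items rather than the full result set.
chunks = list(encode(engine(tsdb_samples())))
```

The isolation point follows from the same picture: a query that pulls lazily from storage holds its read position open for the whole response, so concurrent writes must not change what the iterator observes mid-query.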
(Somewhat) related: #3691 #3443 #3601 https://github.com/prometheus/tsdb/issues/260
@fabxc @gouthamve @brian-brazil