Streaming APIs #3690

Closed
brancz opened this issue Jan 16, 2018 · 20 comments

Comments

@brancz
Member

brancz commented Jan 16, 2018

Several of our endpoints would benefit from streaming, meaning that we send results in chunks rather than collecting all results and then rendering them in a single response. This is useful for large responses; foreshadowing a bit, it might also be useful for streaming new samples to a front end, although I would treat that as a separate issue.

For this path to be optimal, all the underlying parts need to be streaming as well, meaning the TSDB (to my knowledge this is already the case) as well as the query engine. My gut feeling also tells me that, depending on the implementation, isolation in TSDB would also be required.
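
For context, "streaming" at the storage layer means results are pulled through iterators rather than materialized as full slices. Below is a simplified sketch of what such iterator shapes look like; the names and signatures are approximate, loosely modeled on the prometheus/tsdb interfaces, not an exact copy:

package sketch

// A set of series, walked one series at a time rather than returned as a slice.
type SeriesSet interface {
	Next() bool // advance to the next series; false when exhausted
	At() Series // the series currently pointed at
	Err() error
}

// A single series whose samples are themselves pulled lazily.
type Series interface {
	Labels() map[string]string // simplified; the real code uses a dedicated labels type
	Iterator() SeriesIterator
}

// Samples of one series, yielded one (timestamp, value) pair at a time,
// so nothing forces the whole matrix into memory at once.
type SeriesIterator interface {
	Next() bool
	At() (t int64, v float64)
	Err() error
}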

(Somewhat) related: #3691 #3443 #3601 https://github.com/prometheus/tsdb/issues/260

@fabxc @gouthamve @brian-brazil

@brian-brazil
Contributor

I'm not sure about this one; large responses are something we should avoid. Anything too big to fit in RAM is going to take out a web browser too. #3601 is outside of sane usage, for example.

This would also prevent us from ever having a limit on the number of time series returned, as we'd only find that out after we've already returned a 200.
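
To make that ordering problem concrete: with Go's net/http the status line is committed on the first WriteHeader or body write, so once a handler has started streaming it can no longer turn the response into an error. A minimal, self-contained sketch; the endpoint and the series limit here are made up purely for illustration:

package main

import (
	"fmt"
	"net/http"
)

func handler(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK) // the 200 goes out on the wire right here

	for i := 0; i < 1000; i++ {
		// Imagine a limit of 500 series that is only discovered mid-stream.
		if i == 500 {
			// Too late to send a 4xx/5xx: the status is already sent,
			// so all we can do is truncate or abort the body.
			return
		}
		fmt.Fprintf(w, "series %d\n", i)
	}
}

func main() {
	http.HandleFunc("/stream", handler)
	http.ListenAndServe(":8081", nil)
}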

@krasi-georgiev
Contributor

PromQL data expansion has the same problem, and although I am not sure of a good solution yet, I think it would be nice if we could reduce the memory footprint.

@fabxc
Contributor

fabxc commented Jan 16, 2018 via email

@himanshub16

Hi,
I read about this on the GSoC'18 ideas page and would like to contribute to it.
I would like some guidance on which components would require changes for this to work.
Thanks.

@krasi-georgiev
Contributor

This would be a very long topic, but in general we would need to change
https://github.com/prometheus/prometheus/tree/master/promql
https://github.com/prometheus/prometheus/tree/master/web

promql is quite complex and I am still getting my head around it, but I'm willing to team up with someone who is willing to spend some time on this.

@brian-brazil
Contributor

PromQL is where the wins are. Web is much more involved, as that would mean creating brand new APIs for use cases which are unclear.

@himanshub16

@krasi-georgiev This sounds pretty cool to me. There seems to be a lot of complexity under the hood, and I would love to have some discussion about it.
Meanwhile, my exams end on Monday, and I will try to get my hands dirty. I would like suggestions on how to uncover the layers without getting overwhelmed.

@krasi-georgiev
Contributor

I can't give you any specific advice as I haven't spent that much time in that part of the code myself, but in general get familiar with

  • the code execution of promql (VS Code with breakpoints, fmt.Println(...));
  • Go profiling: run different queries and collect profiles to see where most of the CPU and memory is spent in promql (a minimal sketch follows below).

This should be enough to get us started.
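
To expand on the profiling point: here is a small, self-contained sketch of wrapping some work in a CPU profile with the standard runtime/pprof package; the loop at the end is just a stand-in for whatever query code you want to measure.

package main

import (
	"log"
	"os"
	"runtime/pprof"
)

func main() {
	f, err := os.Create("cpu.prof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Everything between Start and Stop ends up in the profile; in practice
	// this would wrap the PromQL query execution you want to inspect.
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	sum := 0 // stand-in workload
	for i := 0; i < 100000000; i++ {
		sum += i
	}
	log.Println(sum)
}

The resulting cpu.prof can then be inspected with go tool pprof. A running Prometheus server also serves the standard /debug/pprof endpoints, which go tool pprof can read directly if profiling a live instance is easier.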

@Titanssword

I am just thinking that we could store compacted data rather than the original data to reduce memory usage. There is a paper named Optimal Quantile Approximation in Streams.

The paper proposes a new data structure called a compactor to store data. If we don't need an exact answer, it can use less space to maintain statistics over a stream. You can then use this data structure to answer queries about the quantile of a new data point. The data structure is also mergeable, so we could use this method to compact the data in every chunk and merge all chunks when we want to use them.

However, using this data structure loses some information, and it may not suit some metric types, such as counters. I am not sure whether this idea will help reduce memory usage.

I am looking forward to suggestions. By the way, I would like to help with this part.
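
For readers unfamiliar with the paper, the idea is roughly a KLL-style sketch: small buffers per level that, when full, sort themselves and promote every other element to the next level, where each element stands for twice as many original samples. Below is a toy, heavily simplified Go sketch of the concept; it is not the paper's actual algorithm or error guarantees, just an illustration of compaction and merging.

package main

import (
	"fmt"
	"math"
	"math/rand"
	"sort"
)

// sketch is a toy multi-level compactor: level h holds values that each
// stand in for 2^h original samples.
type sketch struct {
	k      int // capacity of each level before it compacts
	levels [][]float64
}

func (s *sketch) Add(v float64) {
	if len(s.levels) == 0 {
		s.levels = append(s.levels, nil)
	}
	s.levels[0] = append(s.levels[0], v)
	s.compact()
}

// compact sorts any over-full level and promotes every other element to the
// next level, discarding the rest; this is the lossy, space-saving step.
func (s *sketch) compact() {
	for h := 0; h < len(s.levels); h++ {
		if len(s.levels[h]) <= s.k {
			continue
		}
		sort.Float64s(s.levels[h])
		if h+1 == len(s.levels) {
			s.levels = append(s.levels, nil)
		}
		for i := rand.Intn(2); i < len(s.levels[h]); i += 2 {
			s.levels[h+1] = append(s.levels[h+1], s.levels[h][i])
		}
		s.levels[h] = s.levels[h][:0]
	}
}

// Merge folds another sketch in level by level; mergeability is what would
// let per-chunk sketches be combined at query time.
func (s *sketch) Merge(o *sketch) {
	for h, lvl := range o.levels {
		for len(s.levels) <= h {
			s.levels = append(s.levels, nil)
		}
		s.levels[h] = append(s.levels[h], lvl...)
	}
	s.compact()
}

// Quantile answers approximately: weight every stored value by 2^level and
// walk the sorted values until the target rank is covered.
func (s *sketch) Quantile(q float64) float64 {
	type wv struct {
		v float64
		w int
	}
	var items []wv
	total := 0
	for h, lvl := range s.levels {
		for _, v := range lvl {
			items = append(items, wv{v, 1 << h})
			total += 1 << h
		}
	}
	if total == 0 {
		return math.NaN()
	}
	sort.Slice(items, func(i, j int) bool { return items[i].v < items[j].v })
	target, cum := int(q*float64(total)), 0
	for _, it := range items {
		cum += it.w
		if cum > target {
			return it.v
		}
	}
	return items[len(items)-1].v
}

func main() {
	s := &sketch{k: 32}
	for i := 0; i < 10000; i++ {
		s.Add(float64(i))
	}
	fmt.Println("approximate median:", s.Quantile(0.5)) // roughly 5000
}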

@brian-brazil
Contributor

@Titanssword That doesn't really help in PromQL; there are only two functions that calculate quantiles like that, and they're not very commonly used.

@himanshub16

I studied a bit about creating streaming APIs, but there aren't many resources on the topic.
Having used Twitter's streaming API recently, I tried to figure out how it works from this code.
Twitter's streaming API keeps the connection initiated by the client open and keeps writing gzipped data to the stream.

An alternative to this would be WebSockets, which sometimes cause problems with proxies and firewalls.
The data can be gzipped or plain, and the decoding counterpart has to be done in the browser.

Here's a proof of concept for a simple streaming API.

package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	http.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
		// Flush after every write so each chunk reaches the client immediately.
		flusher := w.(http.Flusher)
		for i := 0; i < 10; i++ {
			fmt.Fprintln(w, "Hello world", i+1, time.Now().Unix())
			flusher.Flush()
			time.Sleep(2 * time.Second)
		}
	})
	http.ListenAndServe(":8080", nil)
}

Gist > https://gist.github.com/himanshub16/98f7c00a39256d58de838394a55682ff


I went through some code in web and found protobuf in use, so this will require special care.
Am I heading in the right direction?
I would like feedback on this.

@krasi-georgiev
Contributor

@himanshub16 maybe you should hang around the IRC dev channel and try to get some feedback there.

I am definitely interested in getting involved but my focus is on tsdb for the next few weeks.

@brian-brazil
Contributor

Based on work I've been doing on PromQL in the past while, streaming results is even less practical than I first thought. We still need to have the whole result in memory before we start to send anything, so changing our HTTP APIs from what they are today isn't going to buy us anything.

@brancz
Member Author

brancz commented Mar 7, 2018

@brian-brazil can you share your thoughts?

@brian-brazil
Contributor

Right now we need to evaluate the entire query to know whether it succeeded to determine the status code, and the status code is the first thing we send via HTTP.

PromQL currently evaluates range queries step by step; to stream that, you'd have to resend the metric names on every step, which would blow up response sizes by at least 5x. Making that better would require a non-trivial protocol, making ad-hoc usage of the APIs much harder.

A more efficient PromQL would evaluate each node fully for all steps at once, which offers no scope for streaming responses, as you would still need to build up the entire response first.

To me this seems like a solution in search of a problem. Everything I've seen talked about is already doable with our current API.
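
To illustrate the point about resending metric names: today's /api/v1/query_range response sends each series' label set once, with all of its per-step values grouped under it, roughly like this (abridged example data):

{
  "status": "success",
  "data": {
    "resultType": "matrix",
    "result": [
      {
        "metric": { "__name__": "up", "job": "prometheus", "instance": "localhost:9090" },
        "values": [ [ 1516112340, "1" ], [ 1516112355, "1" ], [ 1516112370, "1" ] ]
      }
    ]
  }
}

Streaming the evaluation step by step would instead emit one record per step, so the "metric" object would have to be repeated for every step of every series, which is where the response-size blow-up mentioned above comes from.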

@krasi-georgiev
Contributor

After the dev discussions at PromCon, there were no objections to the conclusion that there would be no benefit from streaming in remote read, and that it is too complicated to be worth it in promql.

@himanshub16

Wise discussions and not getting too excited saved us from technical debt.

@krasi-georgiev
Contributor

A short version of the discussion will be published soon.

#3443 (comment)
@brancz and @brian-brazil just corrected me, so it seems I missed some of the main points.

We did conclude that streaming in the remote read endpoint would be good, but we're unclear how/if that's practical to mix into the current implementation.

That was not the conclusion; rather, that streaming similar to how the Thanos store API does it would be more beneficial.

@lock

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 22, 2019
@prometheus prometheus unlocked this conversation Jun 7, 2019
@bwplotka
Member

bwplotka commented Jun 7, 2019

Just FYI, there is movement in this direction; follow-up discussion and announcement in #4517.

@lock lock bot locked and limited conversation to collaborators Dec 4, 2019