Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement line simplification algorithm #5311

Closed
fredwangwang opened this Issue Mar 5, 2019 · 6 comments

Comments

Projects
None yet
2 participants
@fredwangwang
Copy link

fredwangwang commented Mar 5, 2019

Proposal

Problem description
Although large query is not recommanded in the prometheus documentation, but it is extremely useful to do analysis sometimes (applying arima on the given data for forecasting for example). Currently the only way to get the data over the long time span without losing detail is to get every single datapoints. But that data is usually more than enough to run the analysis and the payload can be very big. Feels like prometheus can do some post-processing before returning the data.

Potential Solution
Douglas Peucker algorithm is a widly used algorithm to simplify the complex lines without losing much details, and the computation overhead is small enough. Here is an online demo showing what the algorithm can do.
To implement the api, it could expose an query paramenter named something like douglas_peucker: tolerance. If the parameter is given, run douglas peucker with defined tolerance before returning the data.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Mar 5, 2019

That looks like it's take more CPU than simply processing the data, and you'd have to read all the data too. It'd also not really fit with the PromQL computational model.

@fredwangwang

This comment has been minimized.

Copy link
Author

fredwangwang commented Mar 6, 2019

@brian-brazil I am quite new to Prometheus so I am not too familiar with PromQL computational model. But I do see functions like holt_winters which does some data "reshaping" before returning the data. So does it make sense to implement a douglas_peucker PromQL function then?

The typical complexity is O(n lg n), so it shouldn't take too much CPU power as it is essentially as expensive as calling sort.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Mar 6, 2019

@fredwangwang

This comment has been minimized.

Copy link
Author

fredwangwang commented Mar 6, 2019

PromQL only returns one value from a given time series when working over a
range

I am not sure I follow... But using a query like total_request{job="router",ip="1.2.3.4"} ranging from 2018-6-1 to 2019-1-1 would return a Instant vectors right? What I understand (might be wrong), the instant vectors is just a 2d line (time as the x-axis), so douglas_peucker could apply.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Mar 6, 2019

No, each vertical slice is an instant vector calculated independently. If you wanted to do this processing, you'd want to do it after getting the PromQL result over HTTP.

@fredwangwang

This comment has been minimized.

Copy link
Author

fredwangwang commented Mar 6, 2019

@brian-brazil Thanks for the chat! I am going to close this for now and implement it on the consumer side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.