What about less accurate but more efficient percentile calculation? #513

tisba · 2014-05-07T16:23:12Z

I was wondering what your opinion is about offering a less accurate but much more efficient method for percentile calculation.

AFAIK influx currently calculates percentiles in a naive way. There are more efficient (and much faster) approaches to do this (e.g. t-digest, which is used in Elasticsearch). If you have many events distributed over your cluster, it might be nice to offer the user a way to trade accuracy for speed.

I have to deal with millions of events that I'd like to run percentile calculations over (actually I calculate entire percentile ranges). The performance is okay-ish right now, but not very good for an interactive use case.
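
For reference, the "naive" approach mentioned above can be sketched roughly as follows. This is only an illustration, assuming the nearest-rank definition of a percentile; `naive_percentile` is a hypothetical helper, not InfluxDB's actual code. The point is that every value has to be collected and sorted before an answer is available, which gets expensive with millions of events spread over a cluster.

```python
import math


def naive_percentile(values, p):
    """Nearest-rank percentile: sort all values, then pick one by index.

    Hypothetical helper for illustration only.
    """
    if not values:
        raise ValueError("no values")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)  # O(n log n) time and O(n) memory
    # 1-based nearest rank; multiply before dividing to limit rounding error.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]
```

Computing a whole range of percentiles can reuse the sorted list, but the full data set still has to be held in one place first.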

jvshahid · 2014-05-07T16:26:20Z

Sounds interesting. We've been talking about this for a while. I'll read the paper and try to get something out there soon for you to try.

tisba · 2014-10-08T14:22:54Z

Did you have any time to take a look at this, @jvshahid? Performance of percentile calculation is starting to give me a little headache.

tisba · 2014-10-27T14:35:20Z

Just FYI

I've read the t-digest paper multiple times during my vacation and I think it's a great fit for InfluxDB. I will have to take a closer look at this algorithm for a project (most probably in Erlang), but I'm happy to share my remarks regarding a possible implementation in Go.

Why I think t-digest is a good fit:

- it's easily (and efficiently) parallelizable
- precision (error) can be easily configured, and there are very few parameters for configuring tradeoffs (sample amount, compression ratio and size of the digest data structures)
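
To make the two points above concrete, here is a minimal sketch of a t-digest-style structure. It is a simplified illustration, not the reference implementation from the paper: `TinyDigest` and its method names are made up, the buffer size of `10 * compression` is an arbitrary choice, and the quantile lookup interpolates between centroid means without tracking the exact minimum and maximum. The single `compression` parameter bounds how much weight a centroid may absorb (via the arcsine scale function), and `merge` shows why digests built independently on different nodes can be combined.

```python
import math


class TinyDigest:
    """Simplified t-digest-style sketch (hypothetical, for illustration)."""

    def __init__(self, compression=100):
        self.compression = compression
        self.centroids = []  # (mean, weight) pairs, sorted by mean
        self.buffer = []     # points not yet folded into centroids
        self.total = 0       # total weight held in self.centroids

    def _k(self, q):
        # Scale function: steep near q=0 and q=1, so tail centroids stay small.
        q = min(1.0, max(0.0, q))
        return self.compression / (2 * math.pi) * math.asin(2 * q - 1)

    def add(self, x, w=1):
        self.buffer.append((x, w))
        if len(self.buffer) >= 10 * self.compression:
            self._compress()

    def merge(self, other):
        # Centroids are just weighted points, so merging is re-compression.
        self.buffer.extend(other.centroids + other.buffer)
        self._compress()

    def _compress(self):
        points = sorted(self.centroids + self.buffer)
        self.buffer = []
        if not points:
            return
        total = sum(w for _, w in points)
        merged = []
        mean, weight = points[0]
        done = 0  # weight already emitted in finished centroids
        k_left = self._k(0.0)
        for m, w in points[1:]:
            q_right = (done + weight + w) / total
            if self._k(q_right) - k_left <= 1:
                # Absorb the point: update the weighted mean incrementally.
                weight += w
                mean += (m - mean) * w / weight
            else:
                merged.append((mean, weight))
                done += weight
                k_left = self._k(done / total)
                mean, weight = m, w
        merged.append((mean, weight))
        self.centroids = merged
        self.total = total

    def quantile(self, q):
        self._compress()
        if not self.centroids:
            raise ValueError("empty digest")
        target = q * self.total
        cum = 0
        prev_pos = prev_mean = None
        for mean, weight in self.centroids:
            pos = cum + weight / 2  # treat the centroid as sitting mid-weight
            if target <= pos:
                if prev_mean is None:
                    return mean
                frac = (target - prev_pos) / (pos - prev_pos)
                return prev_mean + frac * (mean - prev_mean)
            prev_pos, prev_mean = pos, mean
            cum += weight
        return self.centroids[-1][0]
```

In a cluster, each node would build its own digest with `add`, ship only the centroids, and one node would call `merge` before answering `quantile(0.99)` and friends. A larger `compression` keeps more centroids and gives a smaller error at the cost of memory.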

jvshahid · 2014-10-27T14:58:35Z

@tisba that would be great

seiflotfy · 2016-02-05T15:30:53Z

I am looking for the right place in the code to implement it.
Also, I am considering adding a new SQL statement like the one in Redshift: SELECT APPROXIMATE COUNT(DISTINCT ...) ..., implemented with HyperLogLog++. It should be faster.
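
As a rough idea of what such an approximate distinct count involves, here is a minimal sketch of plain HyperLogLog. Note the assumptions: this is the basic algorithm with a 64-bit hash and the linear-counting correction for small cardinalities, not HyperLogLog++ (which adds empirical bias correction and a sparse representation), and the class and method names are made up for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Plain HyperLogLog sketch (hypothetical names, not HyperLogLog++)."""

    def __init__(self, p=12):
        if p < 7:
            raise ValueError("the alpha formula below assumes p >= 7")
        self.p = p
        self.m = 1 << p  # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")   # use 64 bits of the hash
        width = 64 - self.p
        idx = h >> width                        # first p bits pick a register
        rest = h & ((1 << width) - 1)           # remaining bits
        rank = width - rest.bit_length() + 1    # leading zeros + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # Register-wise maximum yields the sketch of the union.
        if other.p != self.p:
            raise ValueError("precision mismatch")
        self.registers = [max(a, b) for a, b in
                          zip(self.registers, other.registers)]

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)      # linear counting
        return raw
```

With `p=12` the sketch uses 4096 small registers and has a typical relative error of about 1.6%, regardless of how many distinct values are added, and sketches from different shards can be combined with `merge`.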

beckettsean · 2016-03-18T22:35:43Z

As mentioned in my post to the mailing list, we are experimenting with simplifying our open GitHub Issues. This feature request has been rolled into an aggregate issue for all function requests, so that we can close this issue until we are ready to work on it.

You may continue to make comments here. Closing the issue does not mean we are rejecting this idea.

jvshahid added this to the Next release milestone May 12, 2014

jvshahid modified the milestones: 0.8.4, Next release Oct 9, 2014

jvshahid added feature labels Oct 9, 2014

jvshahid modified the milestone: 0.8.4 Oct 21, 2014

toddboom added the idea label Nov 25, 2014

beckettsean removed idea labels Apr 21, 2015

beckettsean added this to the Next Point Release milestone May 15, 2015

beckettsean modified the milestones: Next Point Release, Longer term Aug 6, 2015

beckettsean added area/performance area/functions labels Sep 17, 2015

beckettsean added the kind/feature-request label Jan 5, 2016

beckettsean mentioned this issue Mar 7, 2016

[[feature collection]] requested Functions and query operators #5930

Open

32 tasks

beckettsean closed this as completed Mar 18, 2016

adamperlin mentioned this issue Dec 10, 2018

Implement Functions from InfluxQL and TICKscript influxdata/flux#430

Closed

45 tasks
