benchmark: CKMS slow + excessive memory #32

ljw1004 · 2021-11-10T14:13:17Z

I wrote a benchmark to test the performance of CKMS and GK.

Findings: CKMS with error=0.0001 delivers better and faster results than the error=0.001 suggested in its doc-comment. However, CKMS doesn't have any "sweet spot": over the entire range where CKMS is feasible, it's slower and more space-intensive than GK, and even than just blindly storing every single value. This is at odds with what I expected from the paper, and also with the claimed memory bounds, so I wonder if there's an implementation bug? (Also, if we can live with just P99, then "store the top 1% of values in a priority queue" is competitive up to 10M values!!)

Method: The benchmark calls ckms.insert(value) / gk.insert(value) a number of times, then obtains quantiles. I measured wall-time using std::time::Instant::now() / .elapsed(), and I measured heap memory with stats_alloc::Region::new(&GLOBAL) / .change(), computed as bytes_allocated - bytes_deallocated + bytes_reallocated. I ran it with cargo run --release on my MacBook. I tried a normal distribution (mean 0.5, stdev 0.2, so samples fall roughly in the range -0.5 to 1.5) and a Pareto distribution (scale 5, shape 10, so samples fall roughly in the range 5.0 to 20.0). As a baseline, I added another algorithm "ALL" which keeps every single value in memory. This gives the "perfect" expected values of min/P50/P99/max against which to judge how accurate GK/CKMS are, and there's no justification for taking more memory than this!
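
For readers without the stats_alloc crate, the measurement idea can be sketched with the standard library alone. This is only an illustration under stated assumptions: `CountingAlloc`, `NET_BYTES` and `measure` are hypothetical names, not part of the benchmark or of stats_alloc, and the wrapper tracks just the net number of live bytes.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicIsize, Ordering};
use std::time::Instant;

/// Net number of heap bytes currently live, as seen by the wrapper below.
static NET_BYTES: AtomicIsize = AtomicIsize::new(0);

/// Hypothetical std-only stand-in for stats_alloc: forwards every request to
/// the system allocator and keeps a running total of live bytes.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            NET_BYTES.fetch_add(layout.size() as isize, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        NET_BYTES.fetch_sub(layout.size() as isize, Ordering::Relaxed);
    }

    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        let new_ptr = unsafe { System.realloc(ptr, layout, new_size) };
        if !new_ptr.is_null() {
            // Only the size difference changes the number of live bytes.
            let delta = new_size as isize - layout.size() as isize;
            NET_BYTES.fetch_add(delta, Ordering::Relaxed);
        }
        new_ptr
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `work` and returns its result, the elapsed wall time in seconds, and
/// the net heap bytes still allocated when `work` returns.
fn measure<T>(work: impl FnOnce() -> T) -> (T, f64, isize) {
    let before = NET_BYTES.load(Ordering::Relaxed);
    let start = Instant::now();
    let result = work();
    let seconds = start.elapsed().as_secs_f64();
    let net_bytes = NET_BYTES.load(Ordering::Relaxed) - before;
    (result, seconds, net_bytes)
}
```

Calling `measure(|| vec![0u8; 1 << 20])` should report at least the 1 MiB held by the returned vector. Like the formula above, this is a net figure taken while the result is still alive, so temporary allocations that were already freed do not show up; it is not a peak measurement.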

VARYING "ERROR" PARAMETER... (1M values)
- error=0.1: ALL 4mb/0.01s, GK 1k/0.1s, CKMS 390mb/6.5s <- ckms and gk are inaccurate
- error=0.01: ALL 4mb/0.01s, GK 11k/0.1s, CKMS 1.1gb/5s <- ckms and gk are inaccurate
- error=0.001: ALL 4mb/0.01s, GK 95k/0.3s, CKMS 750mb/2s <- ckms p99 weak and max inaccurate
- error=0.0001: ALL 4mb/0.01s, GK 770k/3s, CKMS 240mb/2s <- ckms max inaccurate
- error=0.000_01: ALL 4mb/0.01s, GK 12mb/65s, CKMS 66mb/23s
- error=0.000_001: ALL 4mb/0.01s, GK too slow, CKMS 94mb/250s <-- gk too slow
VARYING NUMBER OF VALUES... (error_gk=0.001, error_ckms=0.0001)
- count=10k: ALL 40k/0s, GK 95k/0.006s, CKMS 748k/0.02s
- count=100k: ALL 400k/0.001s, GK 95k/0.04s, CKMS 8mb/0.2s
- count=1M: ALL 4mb/0.01s, GK 95k/0.3s, CKMS 240mb/2.5s
- count=10M: ALL 40mb/0.1s, GK 95k/3s, CKMS 8gb/40s
- count=100M: ALL 400mb/1s, GK 95k/30s, CKMS too slow <-- ckms too slow
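
When reading the "inaccurate" labels above, it may help to compare estimates in rank space as well as by value. The sketch below assumes the usual definition of an epsilon-approximate quantile summary (the returned value's rank is within epsilon * n of the requested rank), which is how GK's guarantee is normally stated; CKMS's guarantee depends on how it is configured, so treat this as a yardstick only. `rank_error` and `allowed_rank_error` are hypothetical helpers, not part of the benchmark.

```rust
/// Distance, in ranks, between where `estimate` sits in `sorted` and the
/// index the exact answer is read from (for example `len * 99 / 100` for P99).
/// `sorted` must be in ascending order.
fn rank_error(sorted: &[f64], target_index: usize, estimate: f64) -> usize {
    // `estimate` occupies indices lo..hi of `sorted` (an empty range if absent).
    let lo = sorted.partition_point(|&v| v < estimate);
    let hi = sorted.partition_point(|&v| v <= estimate);
    if target_index < lo {
        lo - target_index
    } else if target_index >= hi {
        target_index + 1 - hi
    } else {
        0
    }
}

/// Rank error tolerated by an epsilon-approximate summary over `count` values.
fn allowed_rank_error(epsilon: f64, count: usize) -> usize {
    (epsilon * count as f64).ceil() as usize
}
```

Under that assumption, with 1M values and error=0.01 the tolerated rank error is 10,000, which is about the whole distance between the P99 index (990,000) and the maximum (999,999). A summary that reports the maximum as P99 can therefore still be within its stated bound while looking badly wrong by value.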


ljw1004 · 2021-11-10T14:13:34Z

Here's the benchmark source code: first the Cargo.toml manifest, then the Rust program.

[package]
name = "rf"
version = "0.1.0"
edition = "2018"

[dependencies]
quantiles = "0.7.1"
rand = "0.8.4"
rand_distr = "0.4.2"
ordered-float = "2.8.0"
rand_pcg = "0.3.1"
stats_alloc = "0.1.8"
tdigest = "0.2.2"
thousands = "0.2.0"

use thousands::Separable; // provides separate_with_underscores()

fn main() {
    test_counts();
    println!("\n\nHERE'S HOW WE SETTLED ON PARAMETERS");
    test_gk_and_cksm_params();
    test_digest_params();
}

#[allow(dead_code)]
fn test_counts() {
    let counts = vec![10_000, 100_000, 1_000_000, 10_000_000, 100_000_000];
    let ckms_error = 0.0001;
    let gk_error = 0.001;
    let tdigest_batch = 20_000;
    let tdigest_max_size = 200;
    let dn = rand_distr::Normal::new(0.5f64, 0.2f64).unwrap();
    let dp = rand_distr::Pareto::new(5f64, 10f64).unwrap();
    for count in counts {
        let mut a0 = NoAggregate::new();
        let mut am = MeanAggregate::new();
        let mut av = AllValues::new(count);
        let mut at = TopValues::new(count);
        let mut aq = QuantilesCKMS::new(ckms_error);
        let mut ag = QuantilesGK::new(gk_error);
        let mut ad = TDigestAg::new(tdigest_batch, tdigest_max_size);

        println!("\nCOUNT={}, GK_ERROR={}, CKMS_ERROR={}, TDIGEST_BATCH={}, TDIGEST_MAX_SIZE={}", count.separate_with_underscores(), gk_error, ckms_error, tdigest_batch.separate_with_underscores(), tdigest_max_size);
        println!("    NORMAL DISTRIBUTION");
        test(count, &mut a0, dn);
        test(count, &mut am, dn);
        test(count, &mut av, dn);
        test(count, &mut at, dn);
        test(count, &mut ag, dn);
        if count < 50_000_000 {test(count, &mut aq, dn);}
        test(count, &mut ad, dn);
        println!("    PARETO DISTRIBUTION");
        test(count, &mut a0, dp);
        test(count, &mut am, dp);
        test(count, &mut av, dp);
        test(count, &mut at, dp);
        test(count, &mut ag, dp);
        if count < 50_000_000 {test(count, &mut aq, dp);}
        test(count, &mut ad, dp);
    }
}

#[allow(dead_code)]
fn test_digest_params() {
    let count = 10_000_000;
    let dp = rand_distr::Pareto::new(5f64, 10f64).unwrap();
    let dn = rand_distr::Normal::new(0.5f64, 0.2f64).unwrap();
    for max_size in [10, 100, 500, 1000, 5000] {
        let batch = 20_000;
        let mut av = AllValues::new(count);
        let mut at = TDigestAg::new(batch, max_size);
        println!("\nMAX_SIZE={}, BATCH={}, COUNT={}", max_size, batch.separate_with_underscores(), count.separate_with_underscores());
        println!("    NORMAL DISTRIBUTION");
        test(count, &mut av, dn);
        test(count, &mut at, dn);
        println!("    PARETO DISTRIBUTION");
        test(count, &mut av, dp);
        test(count, &mut at, dp);
    }
    println!("");
    for batch in [100, 1000, 5000, 10_000, 20_000, 50_000, 100_000] {
        let max_size = 200;
        let mut av = AllValues::new(count);
        let mut at = TDigestAg::new(batch, max_size);
        println!("\nBATCH={}, MAX_SIZE={}, COUNT={}", batch.separate_with_underscores(), max_size, count.separate_with_underscores());
        println!("    NORMAL DISTRIBUTION");
        test(count, &mut av, dn);
        test(count, &mut at, dn);
        println!("    PARETO DISTRIBUTION");
        test(count, &mut av, dp);
        test(count, &mut at, dp);
    }
}

#[allow(dead_code)]
fn test_gk_and_cksm_params() {
    let count = 1_000_000;
    let dp = rand_distr::Pareto::new(5f64, 10f64).unwrap();
    let dn = rand_distr::Normal::new(0.5f64, 0.2f64).unwrap();
    for error in [0.1, 0.01, 0.001, 0.0001, 0.000_01, 0.000_001] {
        let mut av = AllValues::new(count);
        let mut ag = QuantilesGK::new(error);
        let mut aq = QuantilesCKMS::new(error);
        println!("\nERROR={}, COUNT={}", error, count.separate_with_underscores());
        println!("    NORMAL DISTRIBUTION");
        test(count, &mut av, dn);
        if error > 0.000005 {test(count, &mut ag, dn);}
        test(count, &mut aq, dn);
        println!("    PARETO DISTRIBUTION");
        test(count, &mut av, dp);
        if error > 0.000005 {test(count, &mut ag, dp);}
        test(count, &mut aq, dp);
    }
}

trait Aggregate {
    fn anew(&self) -> Self;
    fn insert(&mut self, value: f64);
    fn render(&mut self) -> String;
}

// INSTRUMENTED_SYSTEM is an instrumented instance of the system allocator
#[global_allocator]
static GLOBAL: &stats_alloc::StatsAlloc<std::alloc::System> = &stats_alloc::INSTRUMENTED_SYSTEM;

fn test<A: Aggregate, D: rand::distributions::Distribution<f64>>(
    count: usize,
    aggregate: &mut A,
    distribution: D,
) {
    let mut rng = rand_pcg::Pcg64::new(0xcafef00dd15ea5e5, 0xa02bdbf7bb3c0a7ac28fa16a64abf96);
    let start = std::time::Instant::now();
    let startmem = stats_alloc::Region::new(&GLOBAL);
    let mut aggregate = aggregate.anew();
    for _ in 0..count {
        let value = distribution.sample(&mut rng);
        aggregate.insert(value);
    }
    let insert_elapsed = start.elapsed().as_secs_f64();
    let start = std::time::Instant::now();
    let fmt = aggregate.render();
    let fmt_elapsed = start.elapsed().as_secs_f64();
    let mem = startmem.change();
    let bytes_change = mem.bytes_allocated as isize - mem.bytes_deallocated as isize + mem.bytes_reallocated;
    println!(
        "        {:.3}s+{:.3}s, {}k heap, {}k stack, {}",
        insert_elapsed,
        fmt_elapsed,
        bytes_change / 1024,
        std::mem::size_of_val(&aggregate) / 1024,
        fmt,
    );
}

struct NoAggregate {}
impl NoAggregate {
    fn new() -> Self {
        Self {}
    }
}
impl Aggregate for NoAggregate {
    fn insert(&mut self, _value: f64) {}
    fn anew(&self) -> NoAggregate { Self::new() }
    fn render(&mut self) -> String {
        "NoAggregate".to_string()
    }
}

#[derive(Default)]
struct MeanAggregate {
    min: Option<f64>,
    max: Option<f64>,
    mean: f64,
    variance_sum: f64,
    count: usize,
}
impl MeanAggregate {
    fn new() -> Self {
        std::default::Default::default()
    }
}
impl Aggregate for MeanAggregate {
    fn render(&mut self) -> String {
        format!(
            "MeanAggregate, min={:.4}, mean={:.4} (stdev {:.4}),  max={:.4}",
            self.min.unwrap(),
            self.mean,
            (self.variance_sum / self.count as f64).sqrt(),
            self.max.unwrap(),
        )
    }

    fn anew(&self) -> Self { MeanAggregate::new() }
    fn insert(&mut self, value: f64) {
        match self.min {
            None => self.min = Some(value),
            Some(min) if value < min => self.min = Some(value),
            _ => {}
        }
        match self.max {
            None => self.max = Some(value),
            Some(max) if value > max => self.max = Some(value),
            _ => {}
        }
        self.count += 1;
        let new_mean = self.mean + (value - self.mean) / self.count as f64;
        self.variance_sum += (value - self.mean) * (value - new_mean);
        self.mean = new_mean;
    }
}

struct AllValues {
    values: Vec<f32>,
}
impl AllValues {
    fn new(count: usize) -> Self {
        let values = Vec::with_capacity(count);
        AllValues { values }
    }
}
impl Aggregate for AllValues {
    fn render(&mut self) -> String {
        self.values.sort_by(|a, b| a.partial_cmp(b).unwrap());
        let len = self.values.len();
        format!(
            "AllValues, min={:.4}, P50={:.4}, P99={:.4}, max={:.4}",
            self.values[0],
            self.values[len / 2],
            self.values[len * 99 / 100],
            self.values[len - 1],
        )
    }

    fn anew(&self) -> Self { Self::new(self.values.capacity()) }
    fn insert(&mut self, value: f64) {
        self.values.push(value as f32);
    }
}

struct TopValues {
    count: usize,
    values: std::collections::BinaryHeap<std::cmp::Reverse<ordered_float::NotNan<f32>>>,
}
impl TopValues {
    fn new(count: usize) -> Self {
        let capacity = std::cmp::max(count / 100, 1);
        let values = std::collections::BinaryHeap::with_capacity(capacity);
        TopValues { count, values }
    }
}
impl Aggregate for TopValues {
    fn render(&mut self) -> String {
        let p99 = self.values.peek().unwrap().0;
        let max = self.values.drain().min().unwrap().0;
        format!("TopValues, p99={:.4}, max={:.4}", p99, max)
    }
    fn anew(&self) -> Self { Self::new(self.count) }
    fn insert(&mut self, value: f64) {
        let value = value as f32;
        // SAFETY (assumed): samples from the Normal and Pareto distributions used here are never NaN.
        let value = std::cmp::Reverse(unsafe { ordered_float::NotNan::new_unchecked(value) });

        if self.values.len() < self.values.capacity() {
            self.values.push(value);
        } else if self.values.peek().unwrap().0 < value.0 {
            self.values.pop();
            self.values.push(value);
        }
    }
}

struct QuantilesCKMS {
    error: f64,
    q: quantiles::ckms::CKMS<f64>,
}
impl QuantilesCKMS {
    fn new(error: f64) -> Self {
        let q = quantiles::ckms::CKMS::new(error);
        QuantilesCKMS { error, q }
    }
}
impl Aggregate for QuantilesCKMS {
    fn render(&mut self) -> String {
        format!(
            "QuantilesCKMS, min={:.4}, mean={:.4}, p50={:.4}, P99={:.4}, max={:.4}",
            self.q.query(0.0).unwrap().1,
            self.q.cma().unwrap(),
            self.q.query(0.5).unwrap().1,
            self.q.query(0.99).unwrap().1,
            self.q.query(1.0).unwrap().1,
        )
    }
    fn anew(&self) -> Self { Self::new(self.error) }
    fn insert(&mut self, value: f64) {
        self.q.insert(value);
    }
}

struct QuantilesGK {
    error: f64,
    q: quantiles::greenwald_khanna::Stream<ordered_float::NotNan<f64>>,
}
impl QuantilesGK {
    fn new(error: f64) -> Self {
        let q = quantiles::greenwald_khanna::Stream::new(error);
        QuantilesGK { q, error }
    }
}
impl Aggregate for QuantilesGK {
    fn render(&mut self) -> String {
        format!(
            "QuantilesGK, min={:.4}, p50={:.4}, P99={:.4}, max={:.4}",
            self.q.quantile(0.0),
            self.q.quantile(0.5),
            self.q.quantile(0.99),
            self.q.quantile(1.0),
        )
    }
    fn anew(&self) -> Self {
        Self::new(self.error)
    }
    fn insert(&mut self, value: f64) {
        // SAFETY (assumed): samples from the Normal and Pareto distributions used here are never NaN.
        let value = unsafe { ordered_float::NotNan::new_unchecked(value) };
        self.q.insert(value);
    }
}

struct TDigestAg {
    batch: Vec<f64>,
    t: tdigest::TDigest,
}
impl TDigestAg {
    fn new(batch: usize, max_size: usize) -> Self {
        let batch = Vec::with_capacity(batch);
        let t = tdigest::TDigest::new_with_size(max_size);
        TDigestAg { batch, t }
    }
    fn merge(&mut self) {
        let capacity = self.batch.capacity();
        let prev = std::mem::replace(&mut self.batch, Vec::with_capacity(capacity));
        self.t = self.t.merge_unsorted(prev);
        self.batch.clear();
    }
}
impl Aggregate for TDigestAg {
    fn render(&mut self) -> String {
        self.merge();
        format!(
            "TDigest, min={:.4}, mean={:.4}, P50={:.4}, P99={:.4}, max={:.4}",
            self.t.min(),
            self.t.mean(),
            self.t.estimate_quantile(0.5),
            self.t.estimate_quantile(0.99),
            self.t.max(),
        )
    }
    fn anew(&self) -> Self { Self::new(self.batch.capacity(), self.t.max_size()) }
    fn insert(&mut self, value: f64) {
        if self.batch.len() == self.batch.capacity() {
            self.merge();
        }
        self.batch.push(value);
    }
}
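
One detail of TopValues above: it uses the heap's capacity() as its size limit, while BinaryHeap::with_capacity is documented as reserving at least the requested capacity. A variant that stores the limit explicitly avoids depending on that. The sketch below is a std-only illustration, not the benchmark's code: `TotalF32` and `TopK` are hypothetical names, and ordering by `f32::total_cmp` is a swapped-in replacement for ordered_float::NotNan.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

/// f32 wrapper ordered by `f32::total_cmp`, so it can live in a `BinaryHeap`
/// without an external crate. NaN sorts above every number under this order.
#[derive(Clone, Copy, Debug)]
struct TotalF32(f32);

impl PartialEq for TotalF32 {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for TotalF32 {}
impl PartialOrd for TotalF32 {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for TotalF32 {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.total_cmp(&other.0)
    }
}

/// Keeps only the largest `limit` values seen so far, in a min-heap.
struct TopK {
    limit: usize,
    heap: BinaryHeap<Reverse<TotalF32>>,
}

impl TopK {
    /// Sizes the heap for P99 of `expected_count` values (top 1%, at least 1).
    fn for_p99(expected_count: usize) -> Self {
        let limit = std::cmp::max(expected_count / 100, 1);
        TopK { limit, heap: BinaryHeap::with_capacity(limit) }
    }

    fn insert(&mut self, value: f32) {
        let value = TotalF32(value);
        if self.heap.len() < self.limit {
            self.heap.push(Reverse(value));
        } else if self.heap.peek().map_or(false, |smallest| smallest.0 < value) {
            // The new value beats the smallest retained one: swap it in.
            self.heap.pop();
            self.heap.push(Reverse(value));
        }
    }

    /// Smallest retained value, i.e. the P99 estimate once all values are in.
    fn p99(&self) -> Option<f32> {
        self.heap.peek().map(|smallest| (smallest.0).0)
    }

    /// Largest retained value, found by scanning the heap without draining it.
    fn max(&self) -> Option<f32> {
        self.heap.iter().map(|entry| entry.0).max().map(|largest| largest.0)
    }
}
```

The P99 read from the heap's root lines up with the `len * 99 / 100` index used by AllValues only when the number of inserted values matches `expected_count`, which is the same assumption TopValues makes.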

ljw1004 · 2021-11-10T14:24:40Z

Here are the raw results from the benchmark on my MacBook.

How do the algorithms scale with number of values?

COUNT=10_000, GK_ERROR=0.001, CKMS_ERROR=0.0001, TDIGEST_BATCH=20_000, TDIGEST_MAX_SIZE=200
- NORMAL DISTRIBUTION
  - 0.000s+0.000s, 0k heap, 0k stack, NoAggregate
  - 0.000s+0.000s, 0k heap, 0k stack, MeanAggregate, min=-0.2296, mean=0.5025 (stdev 0.2014), max=1.1950
  - 0.000s+0.001s, 39k heap, 0k stack, AllValues, min=-0.2296, P50=0.5025, P99=0.9703, max=1.1950
  - 0.000s+0.000s, 0k heap, 0k stack, TopValues, p99=0.9703, max=1.1950
  - 0.004s+0.000s, 95k heap, 0k stack, QuantilesGK, min=-0.2148, p50=0.5028, P99=0.9684, max=1.1950
  - 0.017s+0.000s, 748k heap, 0k stack, QuantilesCKMS, min=-0.2296, mean=0.5025, p50=0.5025, P99=0.9692, max=1.1950
  - 0.000s+0.001s, 159k heap, 0k stack, TDigest, min=-0.2296, mean=0.5025, P50=0.5023, P99=0.9690, max=1.1950
- PARETO DISTRIBUTION
  - 0.000s+0.000s, 0k heap, 0k stack, NoAggregate
  - 0.000s+0.000s, 0k heap, 0k stack, MeanAggregate, min=5.0001, mean=5.5513 (stdev 0.6114), max=12.6260
  - 0.000s+0.001s, 39k heap, 0k stack, AllValues, min=5.0001, P50=5.3559, P99=7.9406, max=12.6260
  - 0.000s+0.000s, 0k heap, 0k stack, TopValues, p99=7.9406, max=12.6260
  - 0.005s+0.000s, 95k heap, 0k stack, QuantilesGK, min=5.0005, p50=5.3568, P99=7.9528, max=12.6260
  - 0.019s+0.000s, 771k heap, 0k stack, QuantilesCKMS, min=5.0001, mean=5.5513, p50=5.3558, P99=7.9353, max=12.6260
  - 0.000s+0.001s, 159k heap, 0k stack, TDigest, min=5.0001, mean=5.5513, P50=5.3560, P99=7.9415, max=12.6260
COUNT=100_000, GK_ERROR=0.001, CKMS_ERROR=0.0001, TDIGEST_BATCH=20_000, TDIGEST_MAX_SIZE=200
- NORMAL DISTRIBUTION
  - 0.001s+0.000s, 0k heap, 0k stack, NoAggregate
  - 0.001s+0.000s, 0k heap, 0k stack, MeanAggregate, min=-0.4538, mean=0.5010 (stdev 0.1999), max=1.3948
  - 0.001s+0.010s, 390k heap, 0k stack, AllValues, min=-0.4538, P50=0.5011, P99=0.9645, max=1.3948
  - 0.001s+0.000s, 3k heap, 0k stack, TopValues, p99=0.9645, max=1.3948
  - 0.039s+0.000s, 95k heap, 0k stack, QuantilesGK, min=-0.4538, p50=0.5009, P99=0.9684, max=1.3948
  - 0.247s+0.002s, 8520k heap, 0k stack, QuantilesCKMS, min=-0.4538, mean=0.5010, p50=0.5011, P99=0.9639, max=1.3948
  - 0.006s+0.001s, 160k heap, 0k stack, TDigest, min=-0.4538, mean=0.5010, P50=0.5012, P99=0.9631, max=1.3948
- PARETO DISTRIBUTION
  - 0.000s+0.000s, 0k heap, 0k stack, NoAggregate
  - 0.003s+0.000s, 0k heap, 0k stack, MeanAggregate, min=5.0000, mean=5.5536 (stdev 0.6211), max=15.3146
  - 0.002s+0.009s, 390k heap, 0k stack, AllValues, min=5.0000, P50=5.3564, P99=7.9297, max=15.3146
  - 0.004s+0.000s, 3k heap, 0k stack, TopValues, p99=7.9297, max=15.3146
  - 0.043s+0.000s, 95k heap, 0k stack, QuantilesGK, min=5.0005, p50=5.3568, P99=7.9528, max=15.3146
  - 0.256s+0.002s, 7967k heap, 0k stack, QuantilesCKMS, min=5.0000, mean=5.5536, p50=5.3565, P99=7.9353, max=15.3146
  - 0.007s+0.001s, 160k heap, 0k stack, TDigest, min=5.0000, mean=5.5536, P50=5.3563, P99=7.9369, max=15.3146
COUNT=1_000_000, GK_ERROR=0.001, CKMS_ERROR=0.0001, TDIGEST_BATCH=20_000, TDIGEST_MAX_SIZE=200
- NORMAL DISTRIBUTION
  - 0.006s+0.000s, 0k heap, 0k stack, NoAggregate
  - 0.009s+0.000s, 0k heap, 0k stack, MeanAggregate, min=-0.4963, mean=0.5002 (stdev 0.2000), max=1.4952
  - 0.008s+0.111s, 3906k heap, 0k stack, AllValues, min=-0.4963, P50=0.5001, P99=0.9653, max=1.4952
  - 0.012s+0.000s, 39k heap, 0k stack, TopValues, p99=0.9653, max=1.4952
  - 0.334s+0.000s, 95k heap, 0k stack, QuantilesGK, min=-0.4963, p50=0.5004, P99=0.9684, max=1.4952
  - 2.469s+0.049s, 240590k heap, 0k stack, QuantilesCKMS, min=-0.4963, mean=0.5002, p50=0.5001, P99=0.9651, max=1.2733
  - 0.063s+0.001s, 168k heap, 0k stack, TDigest, min=-0.4963, mean=0.5002, P50=0.5002, P99=0.9645, max=1.4952
- PARETO DISTRIBUTION
  - 0.000s+0.000s, 0k heap, 0k stack, NoAggregate
  - 0.025s+0.000s, 0k heap, 0k stack, MeanAggregate, min=5.0000, mean=5.5552 (stdev 0.6208), max=18.0523
  - 0.024s+0.104s, 3906k heap, 0k stack, AllValues, min=5.0000, P50=5.3585, P99=7.9284, max=18.0523
  - 0.030s+0.000s, 39k heap, 0k stack, TopValues, p99=7.9284, max=18.0523
  - 0.336s+0.000s, 95k heap, 0k stack, QuantilesGK, min=5.0005, p50=5.3592, P99=7.9528, max=18.0523
  - 2.654s+0.055s, 251135k heap, 0k stack, QuantilesCKMS, min=5.0000, mean=5.5552, p50=5.3585, P99=7.9254, max=13.0034
  - 0.080s+0.001s, 168k heap, 0k stack, TDigest, min=5.0000, mean=5.5552, P50=5.3584, P99=7.9292, max=18.0523
COUNT=10_000_000, GK_ERROR=0.001, CKMS_ERROR=0.0001, TDIGEST_BATCH=20_000, TDIGEST_MAX_SIZE=200
- NORMAL DISTRIBUTION
  - 0.050s+0.000s, 0k heap, 0k stack, NoAggregate
  - 0.083s+0.000s, 0k heap, 0k stack, MeanAggregate, min=-0.6524, mean=0.5000 (stdev 0.2000), max=1.4952
  - 0.078s+1.301s, 39063k heap, 0k stack, AllValues, min=-0.6524, P50=0.5000, P99=0.9655, max=1.4952
  - 0.132s+0.000s, 390k heap, 0k stack, TopValues, p99=0.9655, max=1.4952
  - 3.117s+0.000s, 95k heap, 0k stack, QuantilesGK, min=-0.6524, p50=0.5004, P99=0.9684, max=1.4952
  - 43.448s+1.681s, 8418141k heap, 0k stack, QuantilesCKMS, min=-0.6524, mean=0.5000, p50=0.5000, P99=0.9649, max=1.2744
  - 0.616s+0.001s, 253k heap, 0k stack, TDigest, min=-0.6524, mean=0.5000, P50=0.4997, P99=0.9655, max=1.4952
- PARETO DISTRIBUTION
  - 0.000s+0.000s, 0k heap, 0k stack, NoAggregate
  - 0.261s+0.000s, 0k heap, 0k stack, MeanAggregate, min=5.0000, mean=5.5554 (stdev 0.6207), max=26.5894
  - 0.237s+1.287s, 39063k heap, 0k stack, AllValues, min=5.0000, P50=5.3588, P99=7.9237, max=26.5894
  - 0.378s+0.000s, 390k heap, 0k stack, TopValues, p99=7.9237, max=26.5894
  - 3.604s+0.000s, 95k heap, 0k stack, QuantilesGK, min=5.0005, p50=5.3592, P99=7.9528, max=26.5894
  - 43.045s+1.739s, 8170118k heap, 0k stack, QuantilesCKMS, min=5.0000, mean=5.5554, p50=5.3588, P99=7.9187, max=12.9397
  - 0.771s+0.001s, 253k heap, 0k stack, TDigest, min=5.0000, mean=5.5554, P50=5.3582, P99=7.9199, max=26.5894
COUNT=100_000_000, GK_ERROR=0.001, CKMS_ERROR=0.0001, TDIGEST_BATCH=20_000, TDIGEST_MAX_SIZE=200
- NORMAL DISTRIBUTION
  - 0.461s+0.000s, 0k heap, 0k stack, NoAggregate
  - 0.785s+0.000s, 0k heap, 0k stack, MeanAggregate, min=-0.6524, mean=0.5000 (stdev 0.2000), max=1.6590
  - 0.668s+13.481s, 390625k heap, 0k stack, AllValues, min=-0.6524, P50=0.5000, P99=0.9653, max=1.6590
  - 1.540s+0.002s, 3906k heap, 0k stack, TopValues, p99=0.9653, max=1.6590
  - 30.659s+0.000s, 95k heap, 0k stack, QuantilesGK, min=-0.6524, p50=0.5004, P99=0.9684, max=1.6590
  - 6.526s+0.001s, 1096k heap, 0k stack, TDigest, min=-0.6524, mean=0.5000, P50=0.4997, P99=0.9642, max=1.6590
- PARETO DISTRIBUTION
  - 0.000s+0.000s, 0k heap, 0k stack, NoAggregate
  - 2.462s+0.000s, 0k heap, 0k stack, MeanAggregate, min=5.0000, mean=5.5556 (stdev 0.6212), max=28.6652
  - 2.075s+13.934s, 390625k heap, 0k stack, AllValues, min=5.0000, P50=5.3589, P99=7.9248, max=28.6652
  - 3.160s+0.001s, 3906k heap, 0k stack, TopValues, p99=7.9248, max=28.6652
  - 33.288s+0.000s, 95k heap, 0k stack, QuantilesGK, min=5.0005, p50=5.3592, P99=7.9528, max=28.6652
  - 7.380s+0.001s, 1096k heap, 0k stack, TDigest, min=5.0000, mean=5.5556, P50=5.3596, P99=7.9236, max=28.6652
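
To make the scaling in the rows above easier to compare, the "k heap" figures can be converted into average heap bytes per inserted value. The helper below is a hypothetical convenience, not part of the benchmark; it assumes "k" means bytes / 1024, as in the benchmark's println!.

```rust
/// Converts a "k heap" figure (bytes / 1024, as printed by the benchmark)
/// into average heap bytes per inserted value.
fn bytes_per_value(heap_kib: u64, count: u64) -> f64 {
    (heap_kib * 1024) as f64 / count as f64
}

/// CKMS rows for the normal distribution, copied from the results above as
/// (count, heap in k).
const CKMS_NORMAL: [(u64, u64); 4] = [
    (10_000, 748),
    (100_000, 8_520),
    (1_000_000, 240_590),
    (10_000_000, 8_418_141),
];

/// Per-value heap cost for each CKMS row.
fn ckms_bytes_per_value() -> Vec<(u64, f64)> {
    CKMS_NORMAL
        .iter()
        .map(|&(count, heap_kib)| (count, bytes_per_value(heap_kib, count)))
        .collect()
}
```

For the CKMS rows this works out to roughly 77, 87, 246 and 862 bytes per value at 10k, 100k, 1M and 10M values, so the per-value cost itself grows with the count. AllValues stays at about 4 bytes per value because it stores each sample as an f32.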

How did we settle on "error" parameter for GK/CKMS?

ERROR=0.1, COUNT=1_000_000
- NORMAL DISTRIBUTION
  - 0.006s+0.106s, 3906k heap, 0k stack, AllValues, min=-0.4963, P50=0.5001, P99=0.9653, max=1.4952
  - 0.118s+0.000s, 1k heap, 0k stack, QuantilesGK, min=0.2029, p50=0.5300, P99=1.4952, max=1.4952
  - 7.334s+0.000s, 390787k heap, 0k stack, QuantilesCKMS, min=-0.4963, mean=0.5002, p50=0.4784, P99=0.7855, max=0.9930
- PARETO DISTRIBUTION
  - 0.023s+0.098s, 3906k heap, 0k stack, AllValues, min=5.0000, P50=5.3585, P99=7.9284, max=18.0523
  - 0.128s+0.000s, 1k heap, 0k stack, QuantilesGK, min=5.0000, p50=5.3925, P99=18.0523, max=18.0523
  - 6.658s+0.000s, 318529k heap, 0k stack, QuantilesCKMS, min=5.0000, mean=5.5552, p50=5.3134, P99=6.6236, max=6.6236
ERROR=0.01, COUNT=1_000_000
- NORMAL DISTRIBUTION
  - 0.007s+0.103s, 3906k heap, 0k stack, AllValues, min=-0.4963, P50=0.5001, P99=0.9653, max=1.4952
  - 0.148s+0.000s, 11k heap, 0k stack, QuantilesGK, min=-0.4963, p50=0.4996, P99=1.4952, max=1.4952
  - 5.554s+0.006s, 1152568k heap, 0k stack, QuantilesCKMS, min=-0.4963, mean=0.5002, p50=0.4979, P99=0.9232, max=0.9689
- PARETO DISTRIBUTION
  - 0.025s+0.106s, 3906k heap, 0k stack, AllValues, min=5.0000, P50=5.3585, P99=7.9284, max=18.0523
  - 0.179s+0.000s, 11k heap, 0k stack, QuantilesGK, min=5.0000, p50=5.3610, P99=18.0523, max=18.0523
  - 5.661s+0.005s, 1229721k heap, 0k stack, QuantilesCKMS, min=5.0000, mean=5.5552, p50=5.3564, P99=7.4251, max=7.9711
ERROR=0.001, COUNT=1_000_000
- NORMAL DISTRIBUTION
  - 0.008s+0.106s, 3906k heap, 0k stack, AllValues, min=-0.4963, P50=0.5001, P99=0.9653, max=1.4952
  - 0.310s+0.000s, 95k heap, 0k stack, QuantilesGK, min=-0.4963, p50=0.5004, P99=0.9684, max=1.4952
  - 2.133s+0.018s, 798287k heap, 0k stack, QuantilesCKMS, min=-0.4963, mean=0.5002, p50=0.5001, P99=0.9599, max=1.1207
- PARETO DISTRIBUTION
  - 0.023s+0.099s, 3906k heap, 0k stack, AllValues, min=5.0000, P50=5.3585, P99=7.9284, max=18.0523
  - 0.327s+0.000s, 95k heap, 0k stack, QuantilesGK, min=5.0005, p50=5.3592, P99=7.9528, max=18.0523
  - 2.314s+0.020s, 719869k heap, 0k stack, QuantilesCKMS, min=5.0000, mean=5.5552, p50=5.3586, P99=7.8787, max=10.7186
ERROR=0.0001, COUNT=1_000_000
- NORMAL DISTRIBUTION
  - 0.007s+0.103s, 3906k heap, 0k stack, AllValues, min=-0.4963, P50=0.5001, P99=0.9653, max=1.4952
  - 3.497s+0.000s, 767k heap, 0k stack, QuantilesGK, min=-0.2888, p50=0.5002, P99=0.9656, max=1.4952
  - 2.586s+0.052s, 240590k heap, 0k stack, QuantilesCKMS, min=-0.4963, mean=0.5002, p50=0.5001, P99=0.9651, max=1.2733
- PARETO DISTRIBUTION
  - 0.025s+0.107s, 3906k heap, 0k stack, AllValues, min=5.0000, P50=5.3585, P99=7.9284, max=18.0523
  - 3.548s+0.000s, 767k heap, 0k stack, QuantilesGK, min=5.0000, p50=5.3585, P99=7.9353, max=18.0523
  - 2.680s+0.050s, 251135k heap, 0k stack, QuantilesCKMS, min=5.0000, mean=5.5552, p50=5.3585, P99=7.9254, max=13.0034
ERROR=0.00001, COUNT=1_000_000
- NORMAL DISTRIBUTION
  - 0.007s+0.102s, 3906k heap, 0k stack, AllValues, min=-0.4963, P50=0.5001, P99=0.9653, max=1.4952
  - 65.390s+0.000s, 12287k heap, 0k stack, QuantilesGK, min=-0.4963, p50=0.5001, P99=0.9653, max=1.4952
  - 23.766s+0.173s, 66786k heap, 0k stack, QuantilesCKMS, min=-0.4963, mean=0.5002, p50=0.5001, P99=0.9652, max=1.4952
- PARETO DISTRIBUTION
  - 0.025s+0.111s, 3906k heap, 0k stack, AllValues, min=5.0000, P50=5.3585, P99=7.9284, max=18.0523
  - 65.391s+0.000s, 12287k heap, 0k stack, QuantilesGK, min=5.0000, p50=5.3585, P99=7.9293, max=18.0523
  - 24.265s+0.188s, 67604k heap, 0k stack, QuantilesCKMS, min=5.0000, mean=5.5552, p50=5.3585, P99=7.9293, max=18.0523
ERROR=0.000001, COUNT=1_000_000
- NORMAL DISTRIBUTION
  - 0.007s+0.104s, 3906k heap, 0k stack, AllValues, min=-0.4963, P50=0.5001, P99=0.9653, max=1.4952
  - 252.283s+0.566s, 94079k heap, 0k stack, QuantilesCKMS, min=-0.4963, mean=0.5002, p50=0.5001, P99=0.9653, max=1.4952
- PARETO DISTRIBUTION
  - 0.023s+0.105s, 3906k heap, 0k stack, AllValues, min=5.0000, P50=5.3585, P99=7.9284, max=18.0523
  - 254.217s+0.551s, 94016k heap, 0k stack, QuantilesCKMS, min=5.0000, mean=5.5552, p50=5.3585, P99=7.9284, max=18.0523

How did we settle on "batch" and "max-size" parameters for TDigest?

MAX_SIZE=10, BATCH=20_000, COUNT=10_000_000
- NORMAL DISTRIBUTION
  - 0.060s+1.390s, 39063k heap, 0k stack, AllValues, min=-0.6524, P50=0.5000, P99=0.9655, max=1.4952
  - 0.652s+0.001s, 250k heap, 0k stack, TDigest, min=-0.6524, mean=0.5000, P50=0.5004, P99=0.9844, max=1.4952
- PARETO DISTRIBUTION
  - 0.242s+1.294s, 39063k heap, 0k stack, AllValues, min=5.0000, P50=5.3588, P99=7.9237, max=26.5894
  - 0.786s+0.001s, 250k heap, 0k stack, TDigest, min=5.0000, mean=5.5554, P50=5.3515, P99=8.2127, max=26.5894
MAX_SIZE=100, BATCH=20_000, COUNT=10_000_000
- NORMAL DISTRIBUTION
  - 0.063s+1.322s, 39063k heap, 0k stack, AllValues, min=-0.6524, P50=0.5000, P99=0.9655, max=1.4952
  - 0.689s+0.001s, 251k heap, 0k stack, TDigest, min=-0.6524, mean=0.5000, P50=0.5000, P99=0.9649, max=1.4952
- PARETO DISTRIBUTION
  - 0.225s+1.191s, 39063k heap, 0k stack, AllValues, min=5.0000, P50=5.3588, P99=7.9237, max=26.5894
  - 0.721s+0.001s, 251k heap, 0k stack, TDigest, min=5.0000, mean=5.5554, P50=5.3588, P99=7.9328, max=26.5894
MAX_SIZE=500, BATCH=20_000, COUNT=10_000_000
- NORMAL DISTRIBUTION
  - 0.057s+1.229s, 39063k heap, 0k stack, AllValues, min=-0.6524, P50=0.5000, P99=0.9655, max=1.4952
  - 0.616s+0.001s, 257k heap, 0k stack, TDigest, min=-0.6524, mean=0.5000, P50=0.5000, P99=0.9657, max=1.4952
- PARETO DISTRIBUTION
  - 0.204s+1.179s, 39063k heap, 0k stack, AllValues, min=5.0000, P50=5.3588, P99=7.9237, max=26.5894
  - 0.732s+0.001s, 257k heap, 0k stack, TDigest, min=5.0000, mean=5.5554, P50=5.3589, P99=7.9221, max=26.5894
MAX_SIZE=1000, BATCH=20_000, COUNT=10_000_000
- NORMAL DISTRIBUTION
  - 0.057s+1.208s, 39063k heap, 0k stack, AllValues, min=-0.6524, P50=0.5000, P99=0.9655, max=1.4952
  - 0.601s+0.002s, 265k heap, 0k stack, TDigest, min=-0.6524, mean=0.5000, P50=0.5000, P99=0.9655, max=1.4952
- PARETO DISTRIBUTION
  - 0.210s+1.181s, 39063k heap, 0k stack, AllValues, min=5.0000, P50=5.3588, P99=7.9237, max=26.5894
  - 0.751s+0.001s, 265k heap, 0k stack, TDigest, min=5.0000, mean=5.5554, P50=5.3588, P99=7.9230, max=26.5894
MAX_SIZE=5000, BATCH=20_000, COUNT=10_000_000
- NORMAL DISTRIBUTION
  - 0.062s+1.201s, 39063k heap, 0k stack, AllValues, min=-0.6524, P50=0.5000, P99=0.9655, max=1.4952
  - 0.646s+0.001s, 314k heap, 0k stack, TDigest, min=-0.6524, mean=0.5000, P50=0.5000, P99=0.9656, max=1.4952
- PARETO DISTRIBUTION
  - 0.209s+1.214s, 39063k heap, 0k stack, AllValues, min=5.0000, P50=5.3588, P99=7.9237, max=26.5894
  - 0.828s+0.001s, 315k heap, 0k stack, TDigest, min=5.0000, mean=5.5554, P50=5.3588, P99=7.9235, max=26.5894
BATCH=100, MAX_SIZE=200, COUNT=10_000_000
- NORMAL DISTRIBUTION
  - 0.062s+1.204s, 39063k heap, 0k stack, AllValues, min=-0.6524, P50=0.5000, P99=0.9655, max=1.4952
  - 0.638s+0.000s, 0k heap, 0k stack, TDigest, min=-0.6524, mean=0.5000, P50=0.4996, P99=0.9657, max=1.4952
- PARETO DISTRIBUTION
  - 0.213s+1.199s, 39063k heap, 0k stack, AllValues, min=5.0000, P50=5.3588, P99=7.9237, max=26.5894
  - 0.776s+0.000s, 0k heap, 0k stack, TDigest, min=5.0000, mean=5.5554, P50=5.3588, P99=7.9290, max=26.5894
BATCH=1_000, MAX_SIZE=200, COUNT=10_000_000
- NORMAL DISTRIBUTION
  - 0.057s+1.315s, 39063k heap, 0k stack, AllValues, min=-0.6524, P50=0.5000, P99=0.9655, max=1.4952
  - 0.509s+0.000s, 635k heap, 0k stack, TDigest, min=-0.6524, mean=0.5000, P50=0.5003, P99=0.9658, max=1.4952
- PARETO DISTRIBUTION
  - 0.214s+1.272s, 39063k heap, 0k stack, AllValues, min=5.0000, P50=5.3588, P99=7.9237, max=26.5894
  - 0.652s+0.000s, 635k heap, 0k stack, TDigest, min=5.0000, mean=5.5554, P50=5.3583, P99=7.9290, max=26.5894
BATCH=5_000, MAX_SIZE=200, COUNT=10_000_000
- NORMAL DISTRIBUTION
  - 0.059s+1.145s, 39063k heap, 0k stack, AllValues, min=-0.6524, P50=0.5000, P99=0.9655, max=1.4952
  - 0.515s+0.000s, 417k heap, 0k stack, TDigest, min=-0.6524, mean=0.5000, P50=0.5000, P99=0.9646, max=1.4952
- PARETO DISTRIBUTION
  - 0.261s+1.167s, 39063k heap, 0k stack, AllValues, min=5.0000, P50=5.3588, P99=7.9237, max=26.5894
  - 0.671s+0.000s, 417k heap, 0k stack, TDigest, min=5.0000, mean=5.5554, P50=5.3588, P99=7.9242, max=26.5894
BATCH=10_000, MAX_SIZE=200, COUNT=10_000_000
- NORMAL DISTRIBUTION
  - 0.061s+1.166s, 39063k heap, 0k stack, AllValues, min=-0.6524, P50=0.5000, P99=0.9655, max=1.4952
  - 0.602s+0.001s, 268k heap, 0k stack, TDigest, min=-0.6524, mean=0.5000, P50=0.5001, P99=0.9647, max=1.4952
- PARETO DISTRIBUTION
  - 0.231s+1.297s, 39063k heap, 0k stack, AllValues, min=5.0000, P50=5.3588, P99=7.9237, max=26.5894
  - 0.810s+0.001s, 268k heap, 0k stack, TDigest, min=5.0000, mean=5.5554, P50=5.3591, P99=7.9114, max=26.5894
BATCH=20_000, MAX_SIZE=200, COUNT=10_000_000
- NORMAL DISTRIBUTION
  - 0.063s+1.334s, 39063k heap, 0k stack, AllValues, min=-0.6524, P50=0.5000, P99=0.9655, max=1.4952
  - 0.683s+0.001s, 253k heap, 0k stack, TDigest, min=-0.6524, mean=0.5000, P50=0.4997, P99=0.9655, max=1.4952
- PARETO DISTRIBUTION
  - 0.218s+1.243s, 39063k heap, 0k stack, AllValues, min=5.0000, P50=5.3588, P99=7.9237, max=26.5894
  - 0.804s+0.001s, 253k heap, 0k stack, TDigest, min=5.0000, mean=5.5554, P50=5.3582, P99=7.9199, max=26.5894
BATCH=50_000, MAX_SIZE=200, COUNT=10_000_000
- NORMAL DISTRIBUTION
  - 0.062s+1.191s, 39063k heap, 0k stack, AllValues, min=-0.6524, P50=0.5000, P99=0.9655, max=1.4952
  - 0.630s+0.003s, 431k heap, 0k stack, TDigest, min=-0.6524, mean=0.5000, P50=0.4998, P99=0.9659, max=1.4952
- PARETO DISTRIBUTION
  - 0.202s+1.141s, 39063k heap, 0k stack, AllValues, min=5.0000, P50=5.3588, P99=7.9237, max=26.5894
  - 0.772s+0.003s, 431k heap, 0k stack, TDigest, min=5.0000, mean=5.5554, P50=5.3578, P99=7.9251, max=26.5894
BATCH=100_000, MAX_SIZE=200, COUNT=10_000_000
- NORMAL DISTRIBUTION
  - 0.057s+1.153s, 39063k heap, 0k stack, AllValues, min=-0.6524, P50=0.5000, P99=0.9655, max=1.4952
  - 0.667s+0.006s, 803k heap, 0k stack, TDigest, min=-0.6524, mean=0.5000, P50=0.5000, P99=0.9655, max=1.4952
- PARETO DISTRIBUTION
  - 0.202s+1.148s, 39063k heap, 0k stack, AllValues, min=5.0000, P50=5.3588, P99=7.9237, max=26.5894
  - 0.810s+0.006s, 803k heap, 0k stack, TDigest, min=5.0000, mean=5.5554, P50=5.3588, P99=7.8829, max=26.5894

ljw1004 mentioned this issue Nov 10, 2021

Proposal: literature review suggests ZhangWang algorithm #33

Open

PSeitz mentioned this issue Apr 3, 2023

percentiles metric aggregation quickwit-oss/tantivy#1763

Closed
