calculating percentile is expensive #136

komuw · 2022-09-26T12:38:45Z

The way we calculate percentile is expensive;

Lines 142 to 171 in 1c6a5fc

    
           // todo: refactor this and its dependents. 
        
           // currently they consume 9.04MB and 80ms as measured by the `BenchmarkAllMiddlewares` benchmark. 
        
           func (lq *latencyQueue) getP99(now time.Time, samplingPeriod time.Duration, minSampleSize int) (p99latency time.Duration) { 
        
           	lq.mu.Lock() 
        
           	defer lq.mu.Unlock() 
        
           	_hold := []latency{} 
        
           	for _, lat := range lq.sl { 
        
           		at := time.Unix(lat.at, 0).UTC() 
        
           		elapsed := now.Sub(at) 
        
           		if elapsed < 0 { 
        
           			// `at` is in the future. Ignore those values 
        
           			break 
        
           		} 
        
           		if elapsed <= samplingPeriod { 
        
           			// is the elapsed time within the samplingPeriod? 
        
           			_hold = append(_hold, lat) 
        
           		} 
        
           	} 
        
           	if len(_hold) < minSampleSize { 
        
           		// the number of requests in the last `samplingPeriod` seconds is less than 
        
           		// is neccessary to make a decision 
        
           		return 0 * time.Millisecond 
        
           	} 
        
           	return percentile(_hold, 99) 
        
           } 
        
           func percentile(N []latency, pctl float64) time.Duration {

currently they consume 9.04MB and 80ms as measured by the BenchmarkAllMiddlewares benchmark.

from statistics import quantiles

x=[1,2,3,4,5,6,7,8,9,10]
# NB: For meaningful results, the number of data points in data should be larger than n
nine_nine_percentile = quantiles(x, n=100)[98] # 99th percentile
10.89

# alternative
index = int( (99/100) * len(x) )
alternative_nine_nine_percentile = sorted(x)[index]
10

I'm making the assertion that the alternative will be faster. And since this are just statistics; not that inaccurate for our purposes.

The text was updated successfully, but these errors were encountered:

komuw · 2022-09-26T12:57:57Z

import random 
import statistics

def one(data):
    nine_nine = statistics.quantiles(data, n=100)[98]
    return nine_nine

def two(data):
    x = sorted(data)
    index = int( (99/100) * len(x) )
    nine_nine = x[index]
    return  nine_nine

data = range(1000)

one(data) # 989.99
two(data) # 990


def get_data():
    data = []
    for i in range(10_000):
        a = random.randint(1, 1000 + i )
        data.append(a)
    return data

data = get_data()
one(data) # 9508.96
two(data) # 9509

What: - Simplify loadshedding implementation Why: - Fixes: #136 go test -run=XXXX -bench=BenchmarkLoadShedder -count=1 github.com/komuw/ong/middleware benchstat old.txt new.txt name old time/op new time/op delta LoadShedder-8 5.13s ± 9% 5.04s ± 7% ~ (p=0.409 n=20+18) name old alloc/op new alloc/op delta LoadShedder-8 297kB ± 1% 61kB ± 1% -79.55% (p=0.000 n=20+16) name old allocs/op new allocs/op delta LoadShedder-8 1.73k ± 1% 0.91k ± 1% -47.41% (p=0.000 n=20+16)

komuw mentioned this issue Sep 26, 2022

issues/136: simplify loadshedding implementation #138

Merged

komuw closed this as completed in #138 Sep 26, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

calculating percentile is expensive #136

calculating percentile is expensive #136

komuw commented Sep 26, 2022 •

edited

Loading

komuw commented Sep 26, 2022 •

edited

Loading

calculating percentile is expensive #136

calculating percentile is expensive #136

Comments

komuw commented Sep 26, 2022 • edited Loading

komuw commented Sep 26, 2022 • edited Loading

komuw commented Sep 26, 2022 •

edited

Loading

komuw commented Sep 26, 2022 •

edited

Loading