Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

calculating percentile is expensive #136

Closed
komuw opened this issue Sep 26, 2022 · 1 comment · Fixed by #138
Closed

calculating percentile is expensive #136

komuw opened this issue Sep 26, 2022 · 1 comment · Fixed by #138

Comments

@komuw
Copy link
Owner

komuw commented Sep 26, 2022

The way we calculate percentile is expensive;

ong/middleware/loadshed.go

Lines 142 to 171 in 1c6a5fc

// todo: refactor this and its dependents.
// currently they consume 9.04MB and 80ms as measured by the `BenchmarkAllMiddlewares` benchmark.
func (lq *latencyQueue) getP99(now time.Time, samplingPeriod time.Duration, minSampleSize int) (p99latency time.Duration) {
lq.mu.Lock()
defer lq.mu.Unlock()
_hold := []latency{}
for _, lat := range lq.sl {
at := time.Unix(lat.at, 0).UTC()
elapsed := now.Sub(at)
if elapsed < 0 {
// `at` is in the future. Ignore those values
break
}
if elapsed <= samplingPeriod {
// is the elapsed time within the samplingPeriod?
_hold = append(_hold, lat)
}
}
if len(_hold) < minSampleSize {
// the number of requests in the last `samplingPeriod` seconds is less than
// is neccessary to make a decision
return 0 * time.Millisecond
}
return percentile(_hold, 99)
}
func percentile(N []latency, pctl float64) time.Duration {

currently they consume 9.04MB and 80ms as measured by the BenchmarkAllMiddlewares benchmark.

from statistics import quantiles

x=[1,2,3,4,5,6,7,8,9,10]
# NB: For meaningful results, the number of data points in data should be larger than n
nine_nine_percentile = quantiles(x, n=100)[98] # 99th percentile
10.89

# alternative
index = int( (99/100) * len(x) )
alternative_nine_nine_percentile = sorted(x)[index]
10

I'm making the assertion that the alternative will be faster. And since this are just statistics; not that inaccurate for our purposes.

@komuw
Copy link
Owner Author

komuw commented Sep 26, 2022

import random 
import statistics

def one(data):
    nine_nine = statistics.quantiles(data, n=100)[98]
    return nine_nine

def two(data):
    x = sorted(data)
    index = int( (99/100) * len(x) )
    nine_nine = x[index]
    return  nine_nine

data = range(1000)

one(data) # 989.99
two(data) # 990


def get_data():
    data = []
    for i in range(10_000):
        a = random.randint(1, 1000 + i )
        data.append(a)
    return data

data = get_data()
one(data) # 9508.96
two(data) # 9509

komuw added a commit that referenced this issue Sep 26, 2022
What:
-  Simplify loadshedding implementation

Why:
- Fixes: #136

go test -run=XXXX -bench=BenchmarkLoadShedder -count=1 github.com/komuw/ong/middleware
benchstat old.txt new.txt         
                                                
name           old time/op    new time/op    delta
LoadShedder-8     5.13s ± 9%     5.04s ± 7%     ~     (p=0.409 n=20+18)

name           old alloc/op   new alloc/op   delta
LoadShedder-8     297kB ± 1%      61kB ± 1%  -79.55%  (p=0.000 n=20+16)

name           old allocs/op  new allocs/op  delta
LoadShedder-8     1.73k ± 1%     0.91k ± 1%  -47.41%  (p=0.000 n=20+16)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant