feat: PeakEWMA load balancer with response success rate #2274

jizhuozhi · 2023-03-29T18:10:03Z

Issues associated with this PR

Solutions

Forked from #2253, with two differences.

Duration with EWMA

The first difference is a two-level EWMA for the response duration. At the first level, samples are grouped by one-second interval and reduced to their arithmetic mean; at the second level, those per-second means are decayed with an exponentially weighted moving average (EWMA).

Success Rate with EWMA

The second difference is an additional success rate metric, intended for fail-fast scenarios.

For example, in a mixed deployment, servers with 1C, 2C, 4C, etc. exist at the same time, and a slot-based concurrency control algorithm manages the remaining available cores. When a 1C server is overloaded and fails fast, a load balancing algorithm based on response time alone may mistake it for the best host and ignore the 2C or 4C servers (since they have many active connections). Factoring the response success rate into the calculation avoids this, because the balancer can tell that the host is failing fast rather than genuinely responding fast.

The success rate also uses a two-level EWMA so that old results decay, avoiding the low sensitivity of an arithmetic mean over very large samples.
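To illustrate the fail-fast case, here is a minimal sketch of a cost function that divides by the success rate. The formula `latency * (active + 1) / (successRate * weight)`, the `hostStats` struct, and the `minRate` floor are all assumptions made for this example; the exact scoring used in this PR may differ.

```go
package main

import "fmt"

// hostStats holds hypothetical per-host metrics; the field names are illustrative.
type hostStats struct {
	name        string
	latencyMs   float64 // EWMA of response time
	successRate float64 // EWMA of the success ratio, in [0, 1]
	active      int     // in-flight requests
	weight      float64 // configured weight, e.g. from Host.Weight()
}

// score returns a cost, lower is better. Assumed formula:
// latency * (active + 1) / (successRate * weight).
func score(h hostStats) float64 {
	const minRate = 1e-3 // floor so a host with zero success rate keeps a finite cost
	rate := h.successRate
	if rate < minRate {
		rate = minRate
	}
	return h.latencyMs * float64(h.active+1) / (rate * h.weight)
}

// pick returns the host with the lowest cost; hosts must be non-empty.
func pick(hosts []hostStats) hostStats {
	best := hosts[0]
	for _, h := range hosts[1:] {
		if score(h) < score(best) {
			best = h
		}
	}
	return best
}

func main() {
	hosts := []hostStats{
		// Fails fast: very low latency, but almost every response is an error.
		{name: "1C-overloaded", latencyMs: 2, successRate: 0.05, active: 0, weight: 1},
		// Slower and busier, but healthy.
		{name: "4C-healthy", latencyMs: 10, successRate: 1.0, active: 2, weight: 1},
	}
	fmt.Println(pick(hosts).name)
}
```

With latency and active requests alone, the overloaded host would cost 2 against 30 for the healthy one and would win. Dividing by the success rate raises its cost to about 40, so the healthy host is chosen.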

Benchmark

goos: linux
goarch: amd64
pkg: mosn.io/mosn/pkg/upstream/cluster
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkShortestResponseLoadBalancer_ChooseHost
BenchmarkShortestResponseLoadBalancer_ChooseHost-4   	 4357245	       287.4 ns/op
PASS

Code Style

Make sure Goimports has run
Show Golint result

Use supermonkey to mock `time.Now` instead of a function variable

Use `Host.Weight()` as factor

Make random chosen at most once for each host (cherry picked from commit d8f4988)

# Conflicts: # pkg/upstream/cluster/host.go # pkg/upstream/cluster/mock_test.go # pkg/upstream/cluster/stats.go

codecov · 2023-03-30T06:16:24Z

Codecov Report

Patch coverage: 73.71% and project coverage change: -0.01 ⚠️

Comparison is base (3525891) 60.30% compared to head (62fd37c) 60.29%.

❗ Current head 62fd37c differs from pull request most recent head 8e5db43. Consider uploading reports for the commit 8e5db43 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2274      +/-   ##
==========================================
- Coverage   60.30%   60.29%   -0.01%     
==========================================
  Files         422      423       +1     
  Lines       37220    37410     +190     
==========================================
+ Hits        22446    22558     +112     
- Misses      12540    12611      +71     
- Partials     2234     2241       +7

| Impacted Files | Coverage Δ | |
|---|---|---|
| pkg/metrics/store.go | `76.61% <0.00%> (-6.00%)` | ⬇️ |
| pkg/metrics/store_lazy.go | `33.64% <0.00%> (-7.74%)` | ⬇️ |
| pkg/metrics/upstream.go | `0.00% <ø> (ø)` | |
| pkg/types/upstream.go | `60.71% <ø> (ø)` | |
| pkg/proxy/downstream.go | `57.70% <25.00%> (-0.33%)` | ⬇️ |
| pkg/upstream/cluster/loadbalancer.go | `79.84% <86.59%> (+2.18%)` | ⬆️ |
| pkg/metrics/ewma/ewma.go | `100.00% <100.00%> (ø)` | |
| pkg/proxy/upstream.go | `48.59% <100.00%> (+0.97%)` | ⬆️ |
| pkg/upstream/cluster/stats.go | `100.00% <100.00%> (ø)` | |

... and 6 files with indirect coverage changes


pkg/upstream/cluster/loadbalancer.go

Make linter happy again

jizhuozhi · 2023-04-01T08:42:40Z

Here is the formula for EWMA

$$ S_t = \alpha * i + (1 - \alpha) * S_{t-1} $$

A constant $\alpha$ appears in this formula. Many implementations require $\alpha$ to be configured to control sensitivity to degraded upstreams, or simply ship a default value, but give no guidance on how to choose it.

This PR gives a concrete way to compute it: choose the $\alpha$ that makes the average decay from 1 to a sufficiently small value $\beta$ within a fixed time $t$ when every new input is 0. Let $S_0 = 1$, then

$$ S_t = \alpha * 0 + (1 - \alpha) * S_{t-1} = (1 - \alpha) * S_{t-1} = (1 - \alpha)^t * S_0 = (1 - \alpha)^t = \beta $$

This gives $\alpha = 1 - \beta^{1/t}$. Users no longer need to work out how to configure $\alpha$; they only need to know the expected $\beta$ and $t$.

jizhuozhi added 5 commits March 29, 2023 17:28

feat: intelli load balancer with the shortest response

c3e7c8b

feat: intelli load balancer with the shortest response

296aee5

Use supermonkey to mock `time.Now` instead of a function variable

feat: intelli load balancer with the shortest response

f3c5eb7

Use `Host.Weight()` as factor

feat: shortest response loadbalancer

ddc7dfe

Make random chosen at most once for each host (cherry picked from commit d8f4988)

Merge remote-tracking branch 'origin/master' into ewma_balancer

0e7efa4

# Conflicts: # pkg/upstream/cluster/host.go # pkg/upstream/cluster/mock_test.go # pkg/upstream/cluster/stats.go

mosn-community-bot bot added cla:yes size/XL labels Mar 29, 2023

jizhuozhi marked this pull request as draft March 29, 2023 18:12

jizhuozhi marked this pull request as ready for review March 30, 2023 02:35

jizhuozhi marked this pull request as draft March 30, 2023 02:45

jizhuozhi marked this pull request as ready for review March 30, 2023 06:02

muyuan0 reviewed Mar 30, 2023

pkg/upstream/cluster/loadbalancer.go (outdated, resolved)

jizhuozhi mentioned this pull request Mar 30, 2023

feat: PeakEWMA load balancer #2253

Merged

jizhuozhi marked this pull request as draft March 31, 2023 10:14

jizhuozhi changed the title ~~feat: shortest response loadbalancer with success rate~~ feat: an intelli load balancer combining multiple metrics Mar 31, 2023

jizhuozhi marked this pull request as ready for review March 31, 2023 11:57

jizhuozhi marked this pull request as draft March 31, 2023 13:12

jizhuozhi marked this pull request as ready for review March 31, 2023 13:12

feat: an intelli load balancer combining multiple metrics

f6c6a48

jizhuozhi marked this pull request as draft March 31, 2023 17:01

jizhuozhi added 2 commits April 1, 2023 03:03

feat: an intelli load balancer combining multiple metrics

a1694e2

Make linter happy again

feat: an intelli load balancer combining multiple metrics

fbd9bdb

jizhuozhi marked this pull request as ready for review March 31, 2023 19:42

jizhuozhi added 3 commits April 2, 2023 02:45

feat: an intelli load balancer combining multiple metrics

b965087

feat: an intelli load balancer combining multiple metrics

074f547

feat: an intelli load balancer combining multiple metrics

a037510

jizhuozhi marked this pull request as draft April 2, 2023 05:03

feat: an intelli load balancer combining multiple metrics

f52a48d

jizhuozhi added 2 commits April 2, 2023 17:44

feat: an intelli load balancer combining multiple metrics

c3d3dfc

feat: an intelli load balancer combining multiple metrics

302b7b7

jizhuozhi changed the title ~~feat: an intelli load balancer combining multiple metrics~~ feat: PeakEWMA load balancer with response success rate Apr 2, 2023

jizhuozhi added 5 commits April 3, 2023 01:20

feat: PeakEWMA load balancer with response success rate

bae12fc

feat: PeakEWMA load balancer with response success rate

dae84ef

feat: PeakEWMA load balancer with response success rate

810a00f

feat: PeakEWMA load balancer with response success rate

eb63909

feat: PeakEWMA load balancer with response success rate

8e5db43

jizhuozhi closed this Apr 12, 2023
