feat: PeakEWMA load balancer with response success rate #2274

Closed
wants to merge 19 commits into from

jizhuozhi
Contributor

@jizhuozhi jizhuozhi commented Mar 29, 2023

Issues associated with this PR

#2252

Solutions

Forked from #2253, with two differences.

Duration with EWMA

The first difference is a two-level EWMA implementation for computing response duration. The first level aggregates the samples that fall within each one-second interval into an arithmetic mean; the second level then decays those per-second means with an exponentially weighted moving average (EWMA).
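The two levels can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `TwoLevelEWMA` type, its methods, and the choice to pass the Unix second explicitly are all hypothetical, the sketch is not safe for concurrent use, and it does not apply extra decay for seconds that received no samples.

```go
package main

import "fmt"

// TwoLevelEWMA is a hypothetical type for illustration only.
// Level 1: samples in the same second are averaged arithmetically.
// Level 2: each completed per-second mean is folded into an EWMA.
type TwoLevelEWMA struct {
	alpha  float64 // smoothing factor in (0, 1]
	bucket int64   // Unix second of the bucket currently being filled
	sum    float64 // sum of samples in the current bucket
	count  int     // number of samples in the current bucket
	value  float64 // EWMA over completed buckets
	primed bool    // whether value holds at least one bucket mean
}

func NewTwoLevelEWMA(alpha float64) *TwoLevelEWMA {
	return &TwoLevelEWMA{alpha: alpha}
}

// Observe records one sample taken at the given Unix second.
// When the second changes, the previous bucket is flushed into the EWMA.
func (e *TwoLevelEWMA) Observe(sec int64, sample float64) {
	if e.count > 0 && sec != e.bucket {
		e.flush()
	}
	e.bucket = sec
	e.sum += sample
	e.count++
}

// flush folds the arithmetic mean of the current bucket into the EWMA.
func (e *TwoLevelEWMA) flush() {
	mean := e.sum / float64(e.count)
	if !e.primed {
		e.value, e.primed = mean, true // the first bucket seeds the average
	} else {
		e.value = e.alpha*mean + (1-e.alpha)*e.value
	}
	e.sum, e.count = 0, 0
}

// Value returns the EWMA over all completed one-second buckets.
func (e *TwoLevelEWMA) Value() float64 { return e.value }

func main() {
	e := NewTwoLevelEWMA(0.5)
	e.Observe(1, 10) // second 1: mean of 10 and 20 is 15
	e.Observe(1, 20)
	e.Observe(2, 40) // flushes second 1
	e.Observe(3, 0)  // flushes second 2: 0.5*40 + 0.5*15
	fmt.Println(e.Value())
}
```

Averaging within the second first means a burst of requests in one second counts as a single observation, so the decay depends on elapsed time rather than on request volume.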

Success Rate with EWMA

The second difference is an additional success-rate metric, intended for fail-fast scenarios.

For example, consider a mixed deployment in which 1C, 2C, and 4C servers coexist, and a slot-based concurrency control algorithm manages each server's remaining available cores. When a 1C server is overloaded and starts failing fast, a load-balancing algorithm based only on response time may mistake it for the best host and pass over the 2C and 4C servers (which have many active connections). Factoring the response success rate into the calculation avoids this, because the balancer can tell that the short response times come from fast failures rather than from a genuinely fast host.

The success rate also uses a two-level EWMA so that it decays over time, which avoids the low sensitivity of an arithmetic mean over very large samples.
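To make the fail-fast scenario concrete, here is a minimal sketch of how a success rate could enter a host score. The scoring formula (duration times pending requests, divided by success rate), the `HostStats` type, and the `minSuccessRate` floor are assumptions chosen for illustration; the formula this PR actually uses is not described here.

```go
package main

import "fmt"

// HostStats is a hypothetical snapshot of per-host metrics.
type HostStats struct {
	Name        string
	DurationMs  float64 // decayed response duration in milliseconds
	SuccessRate float64 // decayed success rate in [0, 1]
	Active      int     // in-flight requests
}

// minSuccessRate is an assumed floor that avoids division by zero.
const minSuccessRate = 0.001

// Score returns a cost for the host; lower is better.
// Assumed formula: duration * (active + 1) / successRate.
func Score(h HostStats) float64 {
	rate := h.SuccessRate
	if rate < minSuccessRate {
		rate = minSuccessRate
	}
	return h.DurationMs * float64(h.Active+1) / rate
}

// Pick returns the host with the lowest score, or false if hosts is empty.
func Pick(hosts []HostStats) (HostStats, bool) {
	if len(hosts) == 0 {
		return HostStats{}, false
	}
	best := hosts[0]
	for _, h := range hosts[1:] {
		if Score(h) < Score(best) {
			best = h
		}
	}
	return best, true
}

func main() {
	hosts := []HostStats{
		// Overloaded 1C host: answers quickly, but mostly with failures.
		{Name: "1C", DurationMs: 1, SuccessRate: 0.1, Active: 0},
		// Healthy 4C host: slower per request and busier, but succeeding.
		{Name: "4C", DurationMs: 2, SuccessRate: 1.0, Active: 1},
	}
	best, _ := Pick(hosts)
	fmt.Println(best.Name)
}
```

Without the division by the success rate, the 1C host would score 1 against 4 for the 4C host and would be chosen despite failing most requests.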

Benchmark

goos: linux
goarch: amd64
pkg: mosn.io/mosn/pkg/upstream/cluster
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkShortestResponseLoadBalancer_ChooseHost
BenchmarkShortestResponseLoadBalancer_ChooseHost-4   	 4357245	       287.4 ns/op
PASS

Code Style

  • Make sure Goimports has run
  • Show Golint result

Use supermonkey to mock `time.Now` instead of a function variable
Make random chosen at most once for each host

(cherry picked from commit d8f4988)
# Conflicts:
#	pkg/upstream/cluster/host.go
#	pkg/upstream/cluster/mock_test.go
#	pkg/upstream/cluster/stats.go
@jizhuozhi jizhuozhi marked this pull request as draft March 29, 2023 18:12
@jizhuozhi jizhuozhi marked this pull request as ready for review March 30, 2023 02:35
@jizhuozhi jizhuozhi marked this pull request as draft March 30, 2023 02:45
@jizhuozhi jizhuozhi marked this pull request as ready for review March 30, 2023 06:02
@codecov

codecov bot commented Mar 30, 2023

Codecov Report

Patch coverage: 73.71% and project coverage change: -0.01 ⚠️

Comparison is base (3525891) 60.30% compared to head (62fd37c) 60.29%.

❗ Current head 62fd37c differs from pull request most recent head 8e5db43. Consider uploading reports for the commit 8e5db43 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2274      +/-   ##
==========================================
- Coverage   60.30%   60.29%   -0.01%     
==========================================
  Files         422      423       +1     
  Lines       37220    37410     +190     
==========================================
+ Hits        22446    22558     +112     
- Misses      12540    12611      +71     
- Partials     2234     2241       +7     
| Impacted Files | Coverage Δ |
|---|---|
| pkg/metrics/store.go | 76.61% <0.00%> (-6.00%) ⬇️ |
| pkg/metrics/store_lazy.go | 33.64% <0.00%> (-7.74%) ⬇️ |
| pkg/metrics/upstream.go | 0.00% <ø> (ø) |
| pkg/types/upstream.go | 60.71% <ø> (ø) |
| pkg/proxy/downstream.go | 57.70% <25.00%> (-0.33%) ⬇️ |
| pkg/upstream/cluster/loadbalancer.go | 79.84% <86.59%> (+2.18%) ⬆️ |
| pkg/metrics/ewma/ewma.go | 100.00% <100.00%> (ø) |
| pkg/proxy/upstream.go | 48.59% <100.00%> (+0.97%) ⬆️ |
| pkg/upstream/cluster/stats.go | 100.00% <100.00%> (ø) |

... and 6 files with indirect coverage changes


@jizhuozhi jizhuozhi marked this pull request as draft March 31, 2023 10:14
@jizhuozhi jizhuozhi changed the title from "feat: shortest response loadbalancer with success rate" to "feat: an intelli load balancer combining multiple metrics" Mar 31, 2023
@jizhuozhi jizhuozhi marked this pull request as ready for review March 31, 2023 11:57
@jizhuozhi jizhuozhi marked this pull request as draft March 31, 2023 13:12
@jizhuozhi jizhuozhi marked this pull request as ready for review March 31, 2023 13:12
@jizhuozhi jizhuozhi marked this pull request as draft March 31, 2023 17:01
@jizhuozhi jizhuozhi marked this pull request as ready for review March 31, 2023 19:42
@jizhuozhi
Contributor Author

jizhuozhi commented Apr 1, 2023

Here is the formula for EWMA

$$ S_t = \alpha * i + (1 - \alpha) * S_{t-1} $$

A constant $\alpha$ appears in this formula, and $i$ is the newest sample. Many implementations either require $\alpha$ to be configured, to control sensitivity to degraded upstreams, or only provide a default value, but give no guidance on how to choose it.

This PR gives a concrete way to compute it: choose the $\alpha$ that decays the average from 1 to a sufficiently small value $\beta$ within a fixed time $t$ when every new sample is 0. Let $S_0 = 1$ and $i = 0$; then

$$ S_t = \alpha * 0 + (1 - \alpha) * S_{t-1} = (1 - \alpha) * S_{t-1} = (1 - \alpha) ^ t = \beta $$

Solving for $\alpha$ gives $\alpha = 1 - \beta^{1/t}$. Users therefore no longer need to work out how to configure $\alpha$; they only need to know the expected $\beta$ and $t$.

@jizhuozhi jizhuozhi marked this pull request as draft April 2, 2023 05:03
@jizhuozhi jizhuozhi changed the title from "feat: an intelli load balancer combining multiple metrics" to "feat: PeakEWMA load balancer with response success rate" Apr 2, 2023
@jizhuozhi jizhuozhi closed this Apr 12, 2023