Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Balancing algorithm (Peak) EWMA #1570

Open
git001 opened this issue Feb 24, 2022 · 5 comments
Open

New Balancing algorithm (Peak) EWMA #1570

git001 opened this issue Feb 24, 2022 · 5 comments
Labels
type: feature This issue describes a feature request / wishlist.

Comments

@git001
Copy link
Contributor

git001 commented Feb 24, 2022

Your Feature Request

It would be nice to have a balancing Algorithm which uses the backend server round-trip time (RTT) for weighting of the backend servers.
There is the peak exponentially-weighted moving average (“peak EWMA”) algorithm which uses the RTT from health checks or normal requests to measure a "fast" backend.

Cite from https://twitter.github.io/finagle/guide/Clients.html#power-of-two-choices-p2c-peak-ewma

Power of Two Choices (P2C) + Peak EWMA [2]

Backed by the P2C distributor, Peak EWMA uses a moving average over an endpoint’s round-trip time (RTT) that is highly sensitive to peaks. This average is then weighted by the number of outstanding requests, effectively increasing our resolution per-request. It is designed to react to slow endpoints more quickly than least loaded by penalizing them when they exhibit slow response times. This load metric operates under the assumption that a loaded endpoint takes time to recover and so it is generally safe for the advertised load to incorporate an endpoint’s history. However, this assumption breaks down in the presence of long polling clients.

Here also the link of linkerd' description https://linkerd.io/2016/03/16/beyond-round-robin-load-balancing-for-latency/

As far as I know the haproxy code I think there will be several steps to implement this feature.

  1. optional: add bc_rtt or server_rtt similar to fc_rtt https://github.com/haproxy/haproxy/blob/master/src/tcp_sample.c#L478
  2. add a rtt to the server struct? or use the already created calc from https://github.com/haproxy/haproxy/blob/master/src/log.c#L2452 "%Td"
  3. add to tcp-check that the rtt from the check request is set to the server.
  4. calculate the weight based on rtt and requests in queue for the backend servers, I think this is the trickiest part.
  5. add balance ewma or balance pwma

Any opinions on this?

What are you trying to do?

I try to keep the latency for the clients low even when some servers handle "slow" requests.

Output of haproxy -vv

Feature haproxy versions
@git001 git001 added the type: feature This issue describes a feature request / wishlist. label Feb 24, 2022
@wtarreau
Copy link
Member

I didn't know about peak EWMA, and it may actually solve this old desire to automatically compute weights based on response time and load without creating an oscillator. But at first glance it seems to me that we don't need to create a new LB algo for this, but instead use this to constantly adjust weights so that it could be used with any LB algo (e.g. even consistent hash, thus allowing cache farms to smoothly adjust to the load). An in-depth lecture of the document is needed to get the whole picture, though. Thanks Alex!

@git001
Copy link
Contributor Author

git001 commented Dec 9, 2022

That's a quite good hint in case the rtt is not available.

https://github.com/tower-rs/tower/blob/master/tower/src/load/peak_ewma.rs#L32

/// When no latency information has been measured for an endpoint, an arbitrary default
/// RTT of 1 second is used to prevent the endpoint from being overloaded before a
/// meaningful baseline can be established..

Let me write some questions for implementation here so that we have something like a brainstorming for the implementation phase.

  • Should be there any default value for "rtt" something like latency default 1ms
  • What value should be used when no rtt is available
    • Health check duration?
    • Time of first byte or first body?
  • How often should the weight be calculated (update rate)?

Let me try to create a simple picture for the flow.

flowchart TD;
  subgraph backends [backend]

  subgraph be_servers [servers]
      direction LR
      id3([Server 1]) -..- id4([Server N]);
  end

   subgraph requests [requests]
      id3([Server 1])
      id4([Server N]);
  end

  subgraph health [health_check]
      id3([Server 1])
      id4([Server N]);
  end

  requests --- |get rrt or duration and set weight| be_servers
  health --- |get rrt or duration and set weight| be_servers

end

@git001
Copy link
Contributor Author

git001 commented May 3, 2023

added bc_rtt with this commit 5529c99

@jaroslawr
Copy link
Contributor

Should this really be based on TCP-level RTT? I think the linkerd/finagle docs use "round-trip time" to mean "RPC round-trip time" so the time between:
A) when a RPC request started being sent, and
B) when RPC response has been completely received
This should be highly correlated with overall server load because the time it takes for CPUs to compute the RPC response is nearly always much much higher than the network latency and this time increases as CPUs become more busy. Meanwhile I would expect the TCP-level RTT exposed by the kernel to be dominated by network latency - the time the backend server spends processing the TCP segments using its CPU should be a negligible part of the overall measured RTT unless the server is under really really heavy load, so I would expect this to not vary that much with changing server load and hence be much worse for load balancing. I think this TCP RTT is updated e.g. when you receive an ACK after sending a TCP segment, so that it is updated several times even just in the process of sending the request.

@git001
Copy link
Contributor Author

git001 commented May 30, 2023

@jaroslawr Well you are right, there should be more the one factor for the algorithm to decide which server should be used as you can see on the picture. The TCP-level RTT is just one factor, but this feature was not in HAProxy so I contributed this feature 😃 . I think the #1977 is a per-requirement before the (Peak) EWMA algorithm could be implemented, but that's just my opinion. 🤷‍♂️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: feature This issue describes a feature request / wishlist.
Projects
None yet
Development

No branches or pull requests

3 participants