Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a constant-time load balancer #266

Merged
merged 23 commits into from
Jul 11, 2019
Merged

Use a constant-time load balancer #266

merged 23 commits into from
Jul 11, 2019

Conversation

olix0r
Copy link
Member

@olix0r olix0r commented Jun 5, 2019

tower-rs/tower#288 changed the Balance implementation substantially so
that it is no longer responsible for driving services to readiness.
Instead, a new layer, spawn_ready is inserted around endpoint stack to
ensure that readiness is driven on a background task.

In support of this change, the pending layer was removed from the
enpdoint stack and, instead, the discovery system is now responsible for
driving pending services to be materialized.

tower-rs/tower#288 changed the Balance implementation substantially so
that it is no longer responsible for driving services to readiness.
Instead, a new layer, `spawn_ready` is inserted around endpoint stack to
ensure that readiness is driven on a background task.

In support of this change, the `pending` layer was removed from the
enpdoint stack and, instead, the discovery system is now responsible for
driving pending services to be materialized.
@olix0r olix0r self-assigned this Jun 5, 2019
@hawkw
Copy link
Member

hawkw commented Jun 5, 2019

Looks like CI is angry for rustfmt reasons.

Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay for actually being constant-time! I'd love to see some benchmarks of the performance with the new balancer...

This branch looks mostly looks good. The one thing I was unsure about was the interaction between the FuturesUnordered of pending services in Discover and the processing of Remove and NoEndpoints updates --- it's a little bit hard to follow. I'm not currently sure what the correct behaviour is when we receive a removal update for a pending service...

src/app/main.rs Outdated Show resolved Hide resolved
src/proxy/resolve.rs Outdated Show resolved Hide resolved
src/proxy/resolve.rs Outdated Show resolved Hide resolved
src/proxy/resolve.rs Outdated Show resolved Hide resolved
src/proxy/resolve.rs Outdated Show resolved Hide resolved
src/proxy/resolve.rs Outdated Show resolved Hide resolved
In doing so, I identified some raciness in the Discover implementation.
Now, additions will cancel any previous make for that address; and we
ensure that resolution is polled before the stream of pending services
to accomodate for this.
src/svc.rs Show resolved Hide resolved
src/proxy/resolve.rs Show resolved Hide resolved
src/proxy/resolve.rs Outdated Show resolved Hide resolved
Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixes to the race conditions in the cancelation code look good --- thanks! This looks about ready to merge to me. I commented on some non-blocking nits.

src/proxy/resolve.rs Show resolved Hide resolved
src/proxy/resolve.rs Outdated Show resolved Hide resolved
src/proxy/resolve.rs Outdated Show resolved Hide resolved
src/proxy/resolve.rs Outdated Show resolved Hide resolved
@olix0r olix0r requested a review from adleong July 9, 2019 19:49
@olix0r
Copy link
Member Author

olix0r commented Jul 9, 2019

Docker images pushed as config.linkerd.io/proxy-version: ver-update-tower-balance-3

@olix0r
Copy link
Member Author

olix0r commented Jul 10, 2019

:; linkerd stat -n strest-new-balance po --to po -t 600s                   
NAME                                                  MESHED   SUCCESS         RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN
strest-client-edge-19-7-3-655bccb976-swzgc               1/1   100.00%   1867.8rps           1ms           3ms          32ms          6
strest-client-update-tower-balance-5f84fd9ff7-cz882      1/1   100.00%   2052.4rps           1ms           3ms          19ms          6

strest-pr-266.yml

@olix0r
Copy link
Member Author

olix0r commented Jul 10, 2019

I've also tried this without the spawn-ready layer for comparison. Throughput/latency is roughly the same, but spawn-ready does appear to reduce CPU utilization:

:; kubectl top po --containers -n strest-new-balance |awk '$1 ~ /^strest-client-/ && $2 ~ /^linkerd-proxy$/ { print $0 }'
strest-client-edge-19-7-3-655bccb976-nq8zl            linkerd-proxy   68m          2Mi             
strest-client-no-spawn-ready-79ff7b98fd-9hhp7         linkerd-proxy   75m          3Mi             
strest-client-update-tower-balance-5f84fd9ff7-ktpmn   linkerd-proxy   59m          2Mi
:; linkerd stat -n strest-new-balance po --to po
NAME                                                  MESHED   SUCCESS        RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN
strest-client-edge-19-7-3-655bccb976-nq8zl               1/1   100.00%   138.9rps           1ms         464ms         888ms          6
strest-client-no-spawn-ready-79ff7b98fd-9hhp7            1/1   100.00%   161.2rps           1ms         407ms         816ms          6
strest-client-update-tower-balance-5f84fd9ff7-ktpmn      1/1   100.00%   160.1rps           1ms         403ms         855ms          6

Copy link
Member

@adleong adleong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⭐ Makes sense to me

@olix0r olix0r merged commit 3a3ec3b into master Jul 11, 2019
@olix0r olix0r deleted the ver/update-tower-balance branch July 11, 2019 00:13
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Jul 11, 2019
Each time we update the proxy from the linkerd2-proxy repo, we make the
change slightly differently. The bin/update-proxy-version does all the
steps needed to update the proxy version up to and including making a
commit to this repo.

The proxy version is now stored in a .proxy-version file and is
consumed directly by Dockerfile-proxy, which both simplifies the
Dockerfile and the update process.

This script formats commit messages and emits output as follows:

```
commit c05198a851f69bdc7007974a0ef1f4c01c98d0ce (HEAD -> ver/proxy-update)
Author: Oliver Gould <ver@buoyant.io>
Date:   Thu Jul 11 17:23:05 2019 +0000

    proxy: Update to linkerd/linkerd2-proxy#3a3ec3b

    * linkerd/linker2-proxy#0cc58cd fallback: Clarify fallback layering (linkerd/linkerd2-proxy#288)
    * linkerd/linker2-proxy#b71349a Replace `log` and `env-logger` with `tracing` and `tracing-fmt` (linkerd/linkerd2-proxy#277)
    * linkerd/linker2-proxy#3a3ec3b Use a constant-time load balancer (linkerd/linkerd2-proxy#266)

diff --git a/.proxy-version b/.proxy-version
index f81f40de..d7faa12d 100644
--- a/.proxy-version
+++ b/.proxy-version
@@ -1 +1 @@
-05b012d
+3a3ec3b
```
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Jul 11, 2019
Each time we update the proxy from the linkerd2-proxy repo, we make the
change slightly differently. The bin/git-commit-proxy-version does all the
steps needed to update the proxy version up to and including making a
commit to this repo.

The proxy version is now stored in a .proxy-version file and is
consumed directly by Dockerfile-proxy, which both simplifies the
Dockerfile and the update process.

This script formats commit messages and emits output as follows:

```
commit c05198a851f69bdc7007974a0ef1f4c01c98d0ce (HEAD -> ver/proxy-update)
Author: Oliver Gould <ver@buoyant.io>
Date:   Thu Jul 11 17:23:05 2019 +0000

    proxy: Update to linkerd/linkerd2-proxy#3a3ec3b

    * linkerd/linkerd2-proxy#0cc58cd fallback: Clarify fallback layering (linkerd/linkerd2-proxy#288)
    * linkerd/linkerd2-proxy#b71349a Replace `log` and `env-logger` with `tracing` and `tracing-fmt` (linkerd/linkerd2-proxy#277)
    * linkerd/linkerd2-proxy#3a3ec3b Use a constant-time load balancer (linkerd/linkerd2-proxy#266)

diff --git a/.proxy-version b/.proxy-version
index f81f40de..d7faa12d 100644
--- a/.proxy-version
+++ b/.proxy-version
@@ -1 +1 @@
-05b012d
+3a3ec3b
```
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Jul 11, 2019
* linkerd/linkerd2-proxy#0cc58cd fallback: Clarify fallback layering (linkerd/linkerd2-proxy#288)
* linkerd/linkerd2-proxy#b71349a Replace `log` and `env-logger` with `tracing` and `tracing-fmt` (linkerd/linkerd2-proxy#277)
* linkerd/linkerd2-proxy#3a3ec3b Use a constant-time load balancer (linkerd/linkerd2-proxy#266)
* linkerd/linkerd2-proxy#f0ef775 Add `/proxy-log-level` endpoint to admin server (linkerd/linkerd2-proxy#279)
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Jul 11, 2019
* fallback: Clarify fallback layering (linkerd/linkerd2-proxy#288)
* Replace `log` and `env-logger` with `tracing` and `tracing-fmt` (linkerd/linkerd2-proxy#277)
* Use a constant-time load balancer (linkerd/linkerd2-proxy#266)
* Add `/proxy-log-level` endpoint to admin server (linkerd/linkerd2-proxy#279)
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Jul 11, 2019
* fallback: Clarify fallback layering (linkerd/linkerd2-proxy#288)
* Replace `log` and `env-logger` with `tracing` and `tracing-fmt` (linkerd/linkerd2-proxy#277)
* Use a constant-time load balancer (linkerd/linkerd2-proxy#266)
* Add `/proxy-log-level` endpoint to admin server (linkerd/linkerd2-proxy#279)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants