clientv3: Upgrade to round robin balancer based on gRPC 1.12 balancer API #9860

Merged
merged 39 commits into from Jun 16, 2018

Conversation

@jpbetz
Contributor

commented Jun 15, 2018

To simplify the balancer failover logic, leverage gRPC's new load balancer API, and ease gRPC dependency upgrades, we've rewritten the etcd clientv3 load balancer implementation. This PR merges the new load balancer development branch into master.

Design: docs/client-architecture.rst

Benchmark: https://github.com/coreos/dbtester/tree/master/test-results/2018Q2-02-etcd-client-balancer

Key changes:

  • Round Robin load balancing
  • Use gRPC's new load balancer API
  • Interceptor based retries
gyuho and others added 30 commits Mar 19, 2018
vendor: upgrade "grpc/grpc-go" to v1.11.1
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
clientv3/balancer: initial commit
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
*: introduce mock server for testing load balancing and add a simple happy-path load balancer test

Author:    Joe Betz <jpbetz@google.com>
Date:      Wed Mar 28 15:51:33 2018 -0700
pkg/mock/mockserver: support restart
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
clientv3/balancer: add more failover tests with resolver
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
clientv3/balancer: use new mock server in tests
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
clientv3/balancer: add "TestRoundRobinBalancedPassthrough" (WIP)
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
clientv3: dial with context when creating authenticator
Otherwise, "grpc.Dial" blocks when "grpc.WithTimeout" dial
option gets deprecated.

Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
clientv3: pass "grpc.WithBlock" on "TestDialTimeout"
Otherwise, grpc.DialContext would just return before
connection is up.

Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
clientv3: deprecate "grpc.WithTimeout" in favor of "grpc.DialContext"
"grpc.WithTimeout" dial option is being deprecated.

Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
clientv3: Fix auth client to use endpoints instead of host when dialing, fix tests to block on dial when required.
clientv3: remove unused "dialerrc"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
vendor: add "go-grpc-middleware/util/backoffutils"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
*: fix fmt tests, reenable "testEmbedEtcdGracefulStop"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>

@jpbetz jpbetz added this to the etcd-v3.4 milestone Jun 15, 2018

@gyuho

Member

commented Jun 15, 2018

We have already tested this branch extensively, and benchmarks against the current master branch show no regression, with slightly better read throughput.

# Write 1M keys, 256-byte key, 1KB value, Best Throughput (etcd 1K clients with 100 conns)
+---------------------------------------+-----------------------------+---------------------------------+
|                                       | etcd-v3.4-b241e383-go1.10.3 | etcd-v3.4-balancer0615-go1.10.3 |
+---------------------------------------+-----------------------------+---------------------------------+
|                         TOTAL-SECONDS |                 31.1256 sec |                     31.2477 sec |
|                  TOTAL-REQUEST-NUMBER |                   1,000,000 |                       1,000,000 |
|                        MAX-THROUGHPUT |              33,760 req/sec |                  34,587 req/sec |
|                        AVG-THROUGHPUT |              32,127 req/sec |                  32,002 req/sec |
|                        MIN-THROUGHPUT |               4,965 req/sec |                  10,454 req/sec |
|                       FASTEST-LATENCY |                   4.6587 ms |                       2.4888 ms |
|                           AVG-LATENCY |                  31.0604 ms |                      31.2033 ms |
|                       SLOWEST-LATENCY |                 117.5620 ms |                     114.9492 ms |
|                           Latency p10 |                13.431526 ms |                    14.691959 ms |
|                           Latency p25 |                17.993337 ms |                    19.586467 ms |
|                           Latency p50 |                24.734914 ms |                    25.571253 ms |
|                           Latency p75 |                42.801499 ms |                    42.138762 ms |
|                           Latency p90 |                57.777309 ms |                    55.289961 ms |
|                           Latency p95 |                65.311487 ms |                    60.855029 ms |
|                           Latency p99 |                78.819013 ms |                    75.192049 ms |
|                         Latency p99.9 |                97.808156 ms |                    92.254135 ms |
|      SERVER-TOTAL-NETWORK-RX-DATA-SUM |                      5.2 GB |                          5.2 GB |
|      SERVER-TOTAL-NETWORK-TX-DATA-SUM |                      3.9 GB |                          4.0 GB |
|           CLIENT-TOTAL-NETWORK-RX-SUM |                      258 MB |                          324 MB |
|           CLIENT-TOTAL-NETWORK-TX-SUM |                      1.5 GB |                          1.6 GB |
|                  SERVER-MAX-CPU-USAGE |                    440.30 % |                        537.67 % |
|               SERVER-MAX-MEMORY-USAGE |                      1.2 GB |                          1.2 GB |
|                  CLIENT-MAX-CPU-USAGE |                    570.00 % |                        593.00 % |
|               CLIENT-MAX-MEMORY-USAGE |                       95 MB |                          171 MB |
|                    CLIENT-ERROR-COUNT |                           0 |                               0 |
|  SERVER-AVG-READS-COMPLETED-DELTA-SUM |                           0 |                              73 |
|    SERVER-AVG-SECTORS-READS-DELTA-SUM |                           0 |                               0 |
| SERVER-AVG-WRITES-COMPLETED-DELTA-SUM |                     103,846 |                         109,864 |
|  SERVER-AVG-SECTORS-WRITTEN-DELTA-SUM |                  23,873,928 |                      20,586,688 |
|           SERVER-AVG-DISK-SPACE-USAGE |                      2.7 GB |                          2.7 GB |
+---------------------------------------+-----------------------------+---------------------------------+
# Read 3M same keys, 256-byte key, 1KB value, Best Throughput (etcd 1K clients with 100 conns)
+---------------------------------------+-----------------------------+---------------------------------+
|                                       | etcd-v3.4-b241e383-go1.10.3 | etcd-v3.4-balancer0615-go1.10.3 |
+---------------------------------------+-----------------------------+---------------------------------+
|                         TOTAL-SECONDS |                 17.8744 sec |                     17.8226 sec |
|                  TOTAL-REQUEST-NUMBER |                   3,000,000 |                       3,000,000 |
|                        MAX-THROUGHPUT |             176,763 req/sec |                 172,164 req/sec |
|                        AVG-THROUGHPUT |             167,837 req/sec |                 168,325 req/sec |
|                        MIN-THROUGHPUT |              38,290 req/sec |                   7,453 req/sec |
|                       FASTEST-LATENCY |                   0.5131 ms |                       0.5025 ms |
|                           AVG-LATENCY |                   4.6043 ms |                       4.6358 ms |
|                       SLOWEST-LATENCY |                  37.8623 ms |                      29.7872 ms |
|                           Latency p10 |                 1.729814 ms |                     2.372096 ms |
|                           Latency p25 |                 2.383698 ms |                     3.036887 ms |
|                           Latency p50 |                 3.961112 ms |                     4.055946 ms |
|                           Latency p75 |                 6.137971 ms |                     5.684766 ms |
|                           Latency p90 |                 8.458589 ms |                     7.767217 ms |
|                           Latency p95 |                10.006860 ms |                     9.068512 ms |
|                           Latency p99 |                13.232563 ms |                    12.085174 ms |
|                         Latency p99.9 |                18.042299 ms |                    16.128133 ms |
|      SERVER-TOTAL-NETWORK-RX-DATA-SUM |                      1.2 GB |                          1.3 GB |
|      SERVER-TOTAL-NETWORK-TX-DATA-SUM |                      4.5 GB |                          4.6 GB |
|           CLIENT-TOTAL-NETWORK-RX-SUM |                      4.4 GB |                          4.8 GB |
|           CLIENT-TOTAL-NETWORK-TX-SUM |                      1.2 GB |                          1.3 GB |
|                  SERVER-MAX-CPU-USAGE |                    891.33 % |                        867.33 % |
|               SERVER-MAX-MEMORY-USAGE |                       58 MB |                           68 MB |
|                  CLIENT-MAX-CPU-USAGE |                   1453.00 % |                       1510.00 % |
|               CLIENT-MAX-MEMORY-USAGE |                      158 MB |                          255 MB |
|                    CLIENT-ERROR-COUNT |                           0 |                               0 |
|  SERVER-AVG-READS-COMPLETED-DELTA-SUM |                           0 |                               0 |
|    SERVER-AVG-SECTORS-READS-DELTA-SUM |                           0 |                               0 |
| SERVER-AVG-WRITES-COMPLETED-DELTA-SUM |                          51 |                              96 |
|  SERVER-AVG-SECTORS-WRITTEN-DELTA-SUM |                         448 |                           1,112 |
|           SERVER-AVG-DISK-SPACE-USAGE |                       64 MB |                           64 MB |
+---------------------------------------+-----------------------------+---------------------------------+
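The "no regression" reading can be checked with simple arithmetic on the two tables above (left column: master at `b241e383`; right column: the balancer branch). This small program only computes relative change from a few copied rows; `pctChange` is a hypothetical helper and the row selection is illustrative.

```go
package main

import "fmt"

// pctChange returns the relative change from before to after, in percent.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Values copied from the write and read benchmark tables (master vs. balancer branch).
	rows := []struct {
		name          string
		before, after float64
	}{
		{"write AVG-THROUGHPUT (req/sec)", 32127, 32002},
		{"write Latency p99 (ms)", 78.819013, 75.192049},
		{"read AVG-THROUGHPUT (req/sec)", 167837, 168325},
		{"read Latency p99 (ms)", 13.232563, 12.085174},
	}
	for _, r := range rows {
		fmt.Printf("%-32s %+.2f%%\n", r.name, pctChange(r.before, r.after))
	}
}
```

On these rows it reports about -0.39% and +0.29% for write and read average throughput, and about -4.60% and -8.67% for p99 latency (lower is better): throughput is essentially flat in both directions, and tail latency improves.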

ref. https://github.com/coreos/dbtester/tree/master/test-results/2018Q2-02-etcd-client-balancer

Once CI passes, it should be safe to merge, and we will keep testing after the merge.

The design doc will be served at https://etcd.readthedocs.io/en/latest.

Thanks a lot @jpbetz!

@gyuho gyuho merged commit d866cf8 into etcd-io:master Jun 16, 2018

2 of 3 checks passed

semaphoreci The build failed on Semaphore.
continuous-integration/travis-ci/pr The Travis CI build passed
jenkins-ppc64le Build finished.
@WIZARD-CXY

Contributor

commented Nov 30, 2018

nice

@jingyih jingyih referenced this pull request Mar 14, 2019
26 of 33 tasks complete
hexfusion added a commit to hexfusion/etcd that referenced this pull request Aug 5, 2019
backport etcd-io#9860 watch_test.go
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
hexfusion added a commit to hexfusion/etcd that referenced this pull request Aug 5, 2019
backport etcd-io#9860 watch_test.go
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
hexfusion added a commit to hexfusion/etcd that referenced this pull request Aug 5, 2019
backports from etcd-io#9860
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
hexfusion added a commit to hexfusion/etcd that referenced this pull request Aug 5, 2019
backports from etcd-io#9860
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
hexfusion added a commit to hexfusion/etcd that referenced this pull request Aug 5, 2019
backports from etcd-io#9860
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
hexfusion added a commit to hexfusion/etcd that referenced this pull request Aug 5, 2019
backports from etcd-io#9860
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
hexfusion added a commit to hexfusion/etcd that referenced this pull request Aug 5, 2019
backports from etcd-io#9860
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
hexfusion added a commit to hexfusion/etcd that referenced this pull request Aug 5, 2019
backports from etcd-io#9860
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
hexfusion added a commit to hexfusion/etcd that referenced this pull request Aug 5, 2019
backports from etcd-io#9860
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
hexfusion added a commit to hexfusion/etcd that referenced this pull request Aug 5, 2019
backports from etcd-io#9860
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
hexfusion added a commit to hexfusion/etcd that referenced this pull request Aug 5, 2019
clientv3/integration: clientv3: Fix TLS test failures from gRPC bump
These changes were originally fixed in etcd-io#9860 commit 9304d1a

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
@hexfusion hexfusion referenced this pull request Aug 6, 2019
0 of 4 tasks complete
hexfusion added a commit to hexfusion/etcd that referenced this pull request Aug 6, 2019
clientv3/integration: clientv3: Fix TLS test failures from gRPC bump
These changes were originally fixed in etcd-io#9860 commit 9304d1a

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
3 participants