Bad performance after enabling etcd authentication. #5840

Closed
gradywang opened this Issue Jul 1, 2016 · 3 comments

gradywang commented Jul 1, 2016

I found that etcd performs badly after enabling authentication. My test steps are below:

Install an etcd (v2.3.6) cluster with three nodes using the following steps:

export ETCD_NODE1="172.29.2.61"
export ETCD_NODE2="172.29.2.62"
export ETCD_NODE3="172.29.2.63"

Run on ETCD_NODE1:
$ nohup ./etcd --name infra0 --initial-advertise-peer-urls http://${ETCD_NODE1}:2380 \
  --listen-peer-urls http://0.0.0.0:2380 \
  --listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
  --advertise-client-urls http://${ETCD_NODE1}:2379,http://${ETCD_NODE1}:4001 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster infra0=http://${ETCD_NODE1}:2380,infra1=http://${ETCD_NODE2}:2380,infra2=http://${ETCD_NODE3}:2380 \
  --initial-cluster-state new > /opt/etcd.log 2>&1 &

Run on ETCD_NODE2:
$ nohup ./etcd --name infra1 --initial-advertise-peer-urls http://${ETCD_NODE2}:2380 \
  --listen-peer-urls http://0.0.0.0:2380 \
  --listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
  --advertise-client-urls http://${ETCD_NODE2}:2379,http://${ETCD_NODE2}:4001 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster infra0=http://${ETCD_NODE1}:2380,infra1=http://${ETCD_NODE2}:2380,infra2=http://${ETCD_NODE3}:2380 \
  --initial-cluster-state new > /opt/etcd.log 2>&1 &

Run on ETCD_NODE3:
$ nohup ./etcd --name infra2 --initial-advertise-peer-urls http://${ETCD_NODE3}:2380 \
  --listen-peer-urls http://0.0.0.0:2380 \
  --listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
  --advertise-client-urls http://${ETCD_NODE3}:2379,http://${ETCD_NODE3}:4001 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster infra0=http://${ETCD_NODE1}:2380,infra1=http://${ETCD_NODE2}:2380,infra2=http://${ETCD_NODE3}:2380 \
  --initial-cluster-state new > /opt/etcd.log 2>&1 &
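The three launch commands differ only in the member name and node IP, so for readability they can be generated from one template. This is a sketch of my own (the `etcd_cmd` helper is hypothetical, not part of the original steps); it only prints each node's command:

```shell
# Node IPs as in the steps above.
ETCD_NODE1="172.29.2.61"; ETCD_NODE2="172.29.2.62"; ETCD_NODE3="172.29.2.63"
CLUSTER="infra0=http://${ETCD_NODE1}:2380,infra1=http://${ETCD_NODE2}:2380,infra2=http://${ETCD_NODE3}:2380"

# Print the launch command for one member: $1 = member name, $2 = node IP.
etcd_cmd() {
    echo "./etcd --name $1" \
         "--initial-advertise-peer-urls http://$2:2380" \
         "--listen-peer-urls http://0.0.0.0:2380" \
         "--listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001" \
         "--advertise-client-urls http://$2:2379,http://$2:4001" \
         "--initial-cluster-token etcd-cluster-1" \
         "--initial-cluster ${CLUSTER}" \
         "--initial-cluster-state new"
}

etcd_cmd infra0 "$ETCD_NODE1"   # run the printed command (with nohup/redirection) on node 1, etc.
```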

Run etcdctl on ETCD_NODE1 to check the cluster status:
# ./etcdctl cluster-health
member 3d4f49268b4bad6f is healthy: got healthy result from http://172.29.2.61:2379
member 4c4996a252de1673 is healthy: got healthy result from http://172.29.2.62:2379
member 86aa40dcb9e42f47 is healthy: got healthy result from http://172.29.2.63:2379
cluster is healthy
root@dcos-01:/opt/etcd-v3.0.0-linux-amd64# ./etcdctl member list
3d4f49268b4bad6f: name=infra0 peerURLs=http://172.29.2.61:2380 clientURLs=http://172.29.2.61:2379,http://172.29.2.61:4001 isLeader=true
4c4996a252de1673: name=infra1 peerURLs=http://172.29.2.62:2380 clientURLs=http://172.29.2.62:2379,http://172.29.2.62:4001 isLeader=false
86aa40dcb9e42f47: name=infra2 peerURLs=http://172.29.2.63:2380 clientURLs=http://172.29.2.63:2379,http://172.29.2.63:4001 isLeader=false

Test steps:

Monitor the CPU usage on the etcd leader node (ETCD_NODE1) and run the test script to insert 10k key-value pairs into etcd:

# /bin/bash initETCD.sh http://172.29.2.61:2379
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10.7M    0 10.7M    0     0  65.1M      0 --:--:-- --:--:-- --:--:-- 65.3M
Insert 10000 key-value(1KB) pairs into etcd "http://172.29.2.61:2379" takes about "69(s)".

# top
top - 10:02:34 up 140 days, 21:37,  2 users,  load average: 0.41, 0.25, 0.86
Tasks: 124 total,   2 running, 122 sleeping,   0 stopped,   0 zombie
%Cpu(s): 10.3 us, 12.1 sy,  0.0 ni, 74.8 id,  0.2 wa,  0.0 hi,  0.7 si,  0.0 st
KiB Mem:  16433996 total,  9784204 used,  6649792 free,   587496 buffers
KiB Swap: 16777212 total,    27336 used, 16749876 free.  8105512 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 3998 root      20   0 10.069g  59060  15208 S  19.6  0.4   0:08.57 etcd
......
......

Then enable etcd authentication and test again:

# etcdctl user add root
New password:
User root created
# etcdctl -u root:root auth enable
Authentication Enabled
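Once auth is enabled, every v2 API request must carry HTTP Basic credentials; curl's `-u root:root` (or the `user:pass@` form in the URL used below) just adds this header:

```shell
# The Authorization header that curl -u root:root (or http://root:root@...) sends.
# Basic auth is the base64 encoding of "user:password".
auth_header="Authorization: Basic $(printf 'root:root' | base64)"
echo "$auth_header"   # Authorization: Basic cm9vdDpyb290
```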

# /bin/bash initETCD.sh http://root:root@172.29.2.61:2379
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10.8M    0 10.8M    0     0  38.1M      0 --:--:-- --:--:-- --:--:-- 38.2M
Insert 10000 key-value(1KB) pairs into etcd "http://root:root@172.29.2.61:2379" takes about "1174(s)".

# top
top - 10:12:39 up 140 days, 21:47,  2 users,  load average: 0.39, 0.16, 0.51
Tasks: 123 total,   1 running, 122 sleeping,   0 stopped,   0 zombie
%Cpu(s): 26.1 us,  1.1 sy,  0.0 ni, 74.7 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  16433996 total,  9858000 used,  6575996 free,   587496 buffers
KiB Swap: 16777212 total,    27336 used, 16749876 free.  8125924 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 3998 root      20   0 10.171g 114532  15272 S  96.4  0.7   0:58.66 etcd
24180 root      20   0   12444   3204   2960 S   0.7  0.0   0:00.18 bash

Test results:

As the results above show, before enabling etcd authentication, inserting 10k key-value pairs (1KB per value) into etcd takes about 69s with an average CPU usage of 10.3%; after enabling auth, the same insert takes about 1174s and the average CPU usage reaches 26.1%. Could anyone tell me why etcd performs so differently after enabling auth? Is there anything I can do to optimize it?
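A back-of-envelope calculation from the two runs above gives the per-request overhead; the v2 auth path reportedly verifies the password with an expensive hash (bcrypt, per the discussion in #3223) on every request, which is consistent with a fixed cost of this size per call:

```shell
# Extra latency per PUT after enabling auth, using the timings from the two runs above.
no_auth=69       # seconds for 10000 PUTs without auth
with_auth=1174   # seconds for 10000 PUTs with auth
n=10000
extra_ms=$(( (with_auth - no_auth) * 1000 / n ))
echo "~${extra_ms} ms extra per request"   # ~110 ms of fixed overhead per authenticated call
```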

The test script initETCD.sh is appended below:

#!/bin/bash
#-----------------------------------------
# Name: initETCD.sh
# Description:
#    This script pre-creates some key-value pairs in an etcd cluster.
# Parameters:
#    $1: The etcd cluster URL, e.g. "http://172.29.2.61:2379".
#    $2: The number of key-value pairs to insert (default: 10000).
# Return Value:
#    None
#------------------------------------------------

ETCD_URL=$1
NUM=$2

# Check the specified etcd cluster URLs.
_prefix=`echo ${ETCD_URL:0:4}`
if [ "x${_prefix}" != "xhttp" ] ; then
    echo "Please specify the first parameter correctly, such as \"http://9.111.254.41:2379\" exit ... "
    exit 1
fi

if [ x"$NUM" = "x" ]; then
    NUM=10000
fi

# clear up
curl ${ETCD_URL}/v2/keys/status?recursive=true -XDELETE > /dev/null 2>&1

text=`openssl rand -hex 512`
index=0
spent_time=0

start_time=`date +%s`
while((${index}<${NUM}))
do
    # Generate a random string with 1K
    # text=`openssl rand -hex 512`
    UUID=`cat /proc/sys/kernel/random/uuid`
    curl ${ETCD_URL}/v2/keys/status/$UUID -XPUT -d value=$text > /dev/null 2>&1
    let "index = $index + 1"
done
end_time=`date +%s`

let "spent_time = $end_time - $start_time"
# Count the keys that actually landed, as a sanity check.
insert_num=`curl ${ETCD_URL}/v2/keys/status?recursive=true | jq '. | .node.nodes[].key' | wc -l`
echo "Insert $NUM key-value(1KB) pairs into etcd \"$ETCD_URL\" takes about \"${spent_time}(s)\". "

gradywang commented Jul 1, 2016

In addition, before enabling etcd auth I could start 10k etcd clients watching a key in etcd, but after enabling auth I cannot: once the number of watching clients reaches about 300, the etcd leader's CPU usage climbs to 99.6%, all client operations stop working, and I see errors like these in the etcd log:

......
2016-07-01 09:16:34.120567 W | etcdserver: server is likely overloaded
2016-07-01 09:16:35.257437 W | etcdserver: failed to send out heartbeat on time (deadline exceeded for 936.918092ms)
2016-07-01 09:16:35.257471 W | etcdserver: server is likely overloaded
2016-07-01 09:16:35.257477 W | etcdserver: failed to send out heartbeat on time (deadline exceeded for 936.914383ms)
2016-07-01 09:16:35.257482 W | etcdserver: server is likely overloaded
2016-07-01 09:16:36.082682 W | etcdserver: failed to send out heartbeat on time (deadline exceeded for 625.246459ms)
2016-07-01 09:16:36.082711 W | etcdserver: server is likely overloaded
2016-07-01 09:16:36.082717 W | etcdserver: failed to send out heartbeat on time (deadline exceeded for 625.239833ms)
2016-07-01 09:16:36.082721 W | etcdserver: server is likely overloaded
2016-07-01 09:16:36.655453 W | etcdserver: failed to send out heartbeat on time (deadline exceeded for 372.770696ms)
2016-07-01 09:16:36.655484 W | etcdserver: server is likely overloaded
2016-07-01 09:16:36.655490 W | etcdserver: failed to send out heartbeat on time (deadline exceeded for 372.773088ms)
2016-07-01 09:16:36.655495 W | etcdserver: server is likely overloaded
......
......
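The heartbeat warnings above indicate the leader is too CPU-starved to meet its raft deadlines. As a mitigation (not a fix for the per-request auth cost), etcd's timing can be relaxed via its standard `--heartbeat-interval` and `--election-timeout` flags so a busy leader misses fewer deadlines; the values below are illustrative only:

```
# Relax raft timing on each member (mitigation only; values are illustrative).
./etcd --name infra0 \
  --heartbeat-interval 300 \
  --election-timeout 3000 \
  ... # remaining flags as in the setup above
```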

Contributor

xiang90 commented Jul 1, 2016

This is a long-standing issue and a duplicate of #3223; you can check that issue for more information. We will probably get this "fixed" by 3.1.

xiang90 closed this Jul 1, 2016

gradywang commented Jul 1, 2016

@xiang90 Thanks for your reply.
