
etcdserver: read-only range request took too long with etcd 3.2.24 #70082

Open · ArchiFleKs opened this issue Oct 22, 2018 · 40 comments

@ArchiFleKs commented Oct 22, 2018:

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

/kind feature

What happened:

When deploying a brand new cluster with either t2.medium (4 GB RAM) or t3.large (8 GB RAM) instances, I get errors in the etcd logs:

2018-10-22 11:22:14.706081 W | etcdserver: read-only range request "key:\"foo\" " with result "range_response_count:0 size:5" took too long (1.310910984s) to execute
2018-10-22 11:22:16.247803 W | etcdserver: read-only range request "key:\"/registry/persistentvolumeclaims/\" range_end:\"/registry/persistentvolumeclaims0\" " with result "range_response_count:0 size:5" took too long (128.414553ms) to execute
2018-10-22 11:22:16.361601 W | etcdserver: read-only range request "key:\"/registry/persistentvolumeclaims\" range_end:\"/registry/persistentvolumeclaimt\" count_only:true " with result "range_response_count:0 size:5" took too long (242.137365ms) to execute
2018-10-22 11:22:16.363943 W | etcdserver: read-only range request "key:\"/registry/configmaps\" range_end:\"/registry/configmapt\" count_only:true " with result "range_response_count:0 size:7" took too long (115.223655ms) to execute
2018-10-22 11:22:16.364641 W | etcdserver: read-only range request "key:\"/registry/configmaps/\" range_end:\"/registry/configmaps0\" " with result "range_response_count:1 size:2564" took too long (117.026891ms) to execute
2018-10-22 11:22:18.298297 W | wal: sync duration of 1.84846442s, expected less than 1s
2018-10-22 11:22:20.375327 W | wal: sync duration of 2.067539108s, expected less than 1s
proto: no coders for int
proto: no encoder for ValueSize int [GetProperties]
2018-10-22 11:22:20.892736 W | etcdserver: request "header:<ID:16312995093338348449 username:\"kube-apiserver-etcd-client\" auth_revision:1 > txn:<compare:<target:MOD key:\"/registry/serviceaccounts/kube-system/node-controller\" mod_revision:283 > success:<request_put:<key:\"/registry/serviceaccounts/kube-system/node-controller\" value_size:166 >> failure:<>>" with result "size:16" took too long (516.492646ms) to execute

What you expected to happen:

I expect the logs to be free of errors.

How to reproduce it (as minimally and precisely as possible):

Launch a kubeadm cluster with Kubernetes version v1.12.1

Anything else we need to know?:

When downgrading to Kubernetes v1.11.3 the errors disappear. Also, staying on v1.12.1 and manually downgrading etcd to v3.2.18 (the version shipped with Kubernetes v1.11.3) works around the issue.

Environment:

  • Kubernetes version (use kubectl version): v1.12.1
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): CoreOS 1855.4.0
  • Kernel (e.g. uname -a): Linux ip-10-0-3-11.eu-west-1.compute.internal 4.14.67-coreos #1 SMP Mon Sep 10 23:14:26 UTC 2018 x86_64 Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz GenuineIntel GNU/Linux
  • Install tools: kubeadm
  • Others: etcd version 3.2.24, as shipped with Kubernetes 1.12.1

@ArchiFleKs (Author) commented Oct 22, 2018:

/kind bug

@ArchiFleKs (Author) commented Oct 22, 2018:

/sig api-machinery

@ArchiFleKs (Author) commented Oct 22, 2018:

@dims (Member) commented Oct 23, 2018:

/area etcd

@jennybuckley (Contributor) commented Oct 25, 2018:

@jingyih (Contributor) commented Oct 25, 2018:

Are you using SSD or HDD? (I think t2 and t3 instances could come with either SSD or HDD?) Do you see 'wal: sync duration of' warning message with etcd v3.2.18?
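
For anyone wanting to check this on their own nodes, a quick sketch (the instance ID below is hypothetical, and the commands assume a configured AWS CLI):

# Query the EBS volume type backing an instance (gp2 = SSD, standard = magnetic HDD):
aws ec2 describe-volumes \
  --filters Name=attachment.instance-id,Values=i-0123456789abcdef0 \
  --query 'Volumes[].{ID:VolumeId,Type:VolumeType}' --output table

# Inside the instance, lsblk reports whether the kernel sees the device as rotational
# (ROTA 1 = spinning disk, 0 = SSD); note that virtualized volumes may report 0 either way:
lsblk -d -o NAME,ROTA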

@ArchiFleKs (Author) commented Oct 26, 2018:

@jingyih you might be right; it turns out my launch templates are using HDD (standard, not gp2).

I'll retry with etcd 3.2.24 and report back here.

To answer your question, I see no such errors with etcd 3.2.18.

@lixianyang commented Oct 27, 2018:

kubernetes version: 1.12.1
etcd version: 3.2.24
os: CentOS Linux release 7.5.1804
kernel: 3.10.0-862.2.3.el7.x86_64
VM on OpenStack, Ceph RBD as storage, 16 cores, 32 GB memory
load average: 0.45, 0.36, 0.41

free -h
              total        used        free      shared  buff/cache   available
Mem:            31G        5.5G        361M        1.6G         25G         23G
Swap:            0B          0B          0B
Oct 27 16:19:45 k8s-web3. etcd[24400]: finished scheduled compaction at 1435982 (took 1.77165ms)
Oct 27 16:24:45 k8s-web3. etcd[24400]: store.index: compact 1436519
Oct 27 16:24:45 k8s-web3. etcd[24400]: finished scheduled compaction at 1436519 (took 1.682657ms)
Oct 27 16:27:14 k8s-web3. etcd[24400]: read-only range request "key:\"/registry/initializerconfigurations/\" range_end:\"/registry/initializerconfigurations0\" " with result "range_response_count:0 size:6" took too long (103.929647ms) to execute
Oct 27 16:27:14 k8s-web3. etcd[24400]: read-only range request "key:\"/registry/initializerconfigurations/\" range_end:\"/registry/initializerconfigurations0\" " with result "range_response_count:0 size:6" took too long (104.75876ms) to execute
Oct 27 16:27:14 k8s-web3. etcd[24400]: read-only range request "key:\"/registry/initializerconfigurations/\" range_end:\"/registry/initializerconfigurations0\" " with result "range_response_count:0 size:6" took too long (104.778056ms) to execute
Oct 27 16:29:45 k8s-web3. etcd[24400]: store.index: compact 1437057
Oct 27 16:29:45 k8s-web3. etcd[24400]: finished scheduled compaction at 1437057 (took 2.345841ms)
Oct 27 16:30:10 k8s-web3. etcd[24400]: read-only range request "key:\"/registry/secrets\" range_end:\"/registry/secrett\" count_only:true " with result "range_response_count:0 size:8" took too long (530.546591ms) to execute
Oct 27 16:30:10 k8s-web3. etcd[24400]: read-only range request "key:\"/registry/priorityclasses\" range_end:\"/registry/priorityclasset\" count_only:true " with result "range_response_count:0 size:8" took too long (317.166687ms) to execute

@lixianyang commented Oct 27, 2018:

Probably the disk is too slow:

https://github.com/etcd-io/etcd/blob/master/Documentation/faq.md#what-does-the-etcd-warning-apply-entries-took-too-long-mean

https://github.com/etcd-io/etcd/blob/master/Documentation/metrics.md#disk

Etcd metrics in my cluster:

# HELP etcd_disk_wal_fsync_duration_seconds The latency distributions of fsync called by wal.
# TYPE etcd_disk_wal_fsync_duration_seconds histogram
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.001"} 0
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.002"} 0
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.004"} 98806
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.008"} 630289
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.016"} 770350
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.032"} 775197
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.064"} 776834
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.128"} 777738
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.256"} 778567
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.512"} 779201
etcd_disk_wal_fsync_duration_seconds_bucket{le="1.024"} 779595
etcd_disk_wal_fsync_duration_seconds_bucket{le="2.048"} 779788
etcd_disk_wal_fsync_duration_seconds_bucket{le="4.096"} 779875
etcd_disk_wal_fsync_duration_seconds_bucket{le="8.192"} 779897
etcd_disk_wal_fsync_duration_seconds_bucket{le="+Inf"} 779902
etcd_disk_wal_fsync_duration_seconds_sum 6224.105096278966
etcd_disk_wal_fsync_duration_seconds_count 779902

@ArchiFleKs (Author) commented Oct 27, 2018:

Hi, I tested with gp2 SSD on AWS and I have the same issue. I don't get the wal fsync duration warnings, though.

A colleague of mine has the same issue with Rancher and the same etcd version, also on SSD.

@ArchiFleKs (Author) commented Oct 27, 2018:

I'll try with an EBS-optimized instance and a dedicated disk to rule out disk latency.

The cluster seems to function normally even with etcd 3.2.24.

@jpbetz (Contributor) commented Oct 30, 2018:

Please check the backend_commit_duration_seconds metric and report back (per https://github.com/etcd-io/etcd/blob/master/Documentation/metrics.md#disk). If the value is high we can attribute the issue to disk latency; if it's low, we know we should investigate further.
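
For reference, one way to pull this metric straight from a local etcd member (a sketch; the certificate paths assume a kubeadm-provisioned control plane and may differ in your deployment):

# Scrape etcd's Prometheus endpoint and filter the backend commit duration histogram:
curl -s --cacert /etc/kubernetes/pki/etcd/ca.crt \
     --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt \
     --key /etc/kubernetes/pki/etcd/healthcheck-client.key \
     https://127.0.0.1:2379/metrics \
  | grep etcd_disk_backend_commit_duration_seconds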

@ArchiFleKs (Author) commented Oct 31, 2018:

@jpbetz here is what I have:

etcd_disk_backend_commit_duration_seconds_bucket{le="0.001"} 1
etcd_disk_backend_commit_duration_seconds_bucket{le="0.002"} 228127
etcd_disk_backend_commit_duration_seconds_bucket{le="0.004"} 348658
etcd_disk_backend_commit_duration_seconds_bucket{le="0.008"} 352308
etcd_disk_backend_commit_duration_seconds_bucket{le="0.016"} 354316
etcd_disk_backend_commit_duration_seconds_bucket{le="0.032"} 354594
etcd_disk_backend_commit_duration_seconds_bucket{le="0.064"} 354672
etcd_disk_backend_commit_duration_seconds_bucket{le="0.128"} 354757
etcd_disk_backend_commit_duration_seconds_bucket{le="0.256"} 354838
etcd_disk_backend_commit_duration_seconds_bucket{le="0.512"} 354841
etcd_disk_backend_commit_duration_seconds_bucket{le="1.024"} 354842
etcd_disk_backend_commit_duration_seconds_bucket{le="2.048"} 354842
etcd_disk_backend_commit_duration_seconds_bucket{le="4.096"} 354842
etcd_disk_backend_commit_duration_seconds_bucket{le="8.192"} 354842
etcd_disk_backend_commit_duration_seconds_bucket{le="+Inf"} 354842
etcd_disk_backend_commit_duration_seconds_sum 752.9406572860102
etcd_disk_backend_commit_duration_seconds_count 354842

I'm not sure how to read this.

@RiceBowlJr commented Oct 31, 2018:

I have the same issue with my cluster; I didn't notice it before I saw this issue opened.
kubernetes version: 1.12.2
etcd version: 3.2.24
os: Ubuntu 16.04.5 LTS
kernel: Linux 4.9.58-xxxx-std-ipv6-64 #1 SMP Mon Oct 23 11:35:59 CEST 2017 x86_64 x86_64 x86_64 GNU/Linux
VM on kimsufi.com (Serveur KS-4C - i5-2300 - 16GB - 1x2TB)
load average: 0.45, 0.36, 0.41

free -h
              total        used        free      shared  buff/cache   available
Mem:          7,7Gi       4,1Gi       1,1Gi       529Mi       2,5Gi       2,8Gi
Swap:         7,9Gi       1,7Gi       6,1Gi

Errors:

2018-10-29 13:47:59.656160 W | etcdserver: read-only range request "key:\"/registry/services/endpoints/kube-system/kube-scheduler\" " with result "range_response_count:1 size:440" took too long (105.398841ms) to execute
2018-10-29 13:48:15.855931 W | etcdserver: read-only range request "key:\"/registry/services/endpoints/kube-system/kube-controller-manager\" " with result "range_response_count:1 size:458" took too long (107.508688ms) to execute
2018-10-29 13:49:16.905572 W | etcdserver: read-only range request "key:\"/registry/events/kube-system/coredns-576cbf47c7-hzgxm.15621870fab0aa01\" " with result "range_response_count:1 size:495" took too long (125.232824ms) to execute
2018-10-29 13:49:16.905630 W | etcdserver: read-only range request "key:\"/registry/pods/kube-system/coredns-576cbf47c7-hzgxm\" " with result "range_response_count:1 size:1337" took too long (125.512723ms) to execute
2018-10-29 13:49:16.905667 W | etcdserver: read-only range request "key:\"/registry/pods/kube-system/coredns-576cbf47c7-hzgxm\" " with result "range_response_count:1 size:1337" took too long (125.429248ms) to execute
2018-10-29 13:49:26.897178 W | etcdserver: read-only range request "key:\"/registry/pods/kube-system/coredns-576cbf47c7-hzgxm\" " with result "range_response_count:1 size:1337" took too long (108.176701ms) to execute
2018-10-29 13:49:26.897266 W | etcdserver: read-only range request "key:\"/registry/events/kube-system/coredns-576cbf47c7-hzgxm.15621870fab0aa01\" " with result "range_response_count:1 size:495" took too long (107.939493ms) to execute
2018-10-29 13:49:26.897479 W | etcdserver: read-only range request "key:\"/registry/pods/kube-system/coredns-576cbf47c7-hzgxm\" " with result "range_response_count:1 size:1337" took too long (107.949024ms) to execute
2018-10-29 13:52:50.820939 W | etcdserver: read-only range request "key:\"/registry/services/endpoints/kube-system/kube-controller-manager\" " with result "range_response_count:1 size:459" took too long (106.079668ms) to execute

@jpbetz (Contributor) commented Oct 31, 2018:

@ArchiFleKs It says that of 354842 total operations, 228127 took less than or equal to 0.002 seconds and 348658 took less than or equal to 0.004 seconds, where the 348658 number also includes those that took less than 0.002 seconds. A very small portion (85, to be exact) of disk backend commits took over 128ms. I'm not well enough calibrated to say for sure whether those numbers are outside the healthy range, but they don't look particularly alarming. The "wal: sync duration of 2.067539108s, expected less than 1s" line from the original report does, however, look quite alarming and suggests high disk IO latency. How about your etcd_disk_wal_fsync_duration_seconds_bucket metric, @ArchiFleKs? Does that show anything?
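
To make the bucket arithmetic explicit (the buckets are cumulative, so a tail count is just the difference between two buckets):

# Commits slower than 128ms: +Inf bucket minus the le="0.128" bucket
echo $((354842 - 354757))                          # 85
# Mean commit latency: _sum divided by _count
echo "scale=4; 752.9406572860102 / 354842" | bc    # ~0.0021 s, i.e. about 2.1 ms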

@ArchiFleKs (Author) commented Nov 2, 2018:

Yes, I agree about the original report. Just to clarify, this is an empty cluster with nothing on it, except a fresh kubeadm 1.12 install and CoreDNS.

What I find very weird is that I have no issue with the same config and etcd 3.2.18. I'll check etcd_disk_wal_fsync_duration_seconds_bucket and report back ASAP.

@ArchiFleKs (Author) commented Nov 4, 2018:

Hi,

etcd_disk_wal_fsync_duration_seconds_bucket{le="0.001"} 926288
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.002"} 1.269401e+06
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.004"} 1.327218e+06
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.008"} 1.332407e+06
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.016"} 1.334417e+06
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.032"} 1.334858e+06
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.064"} 1.335046e+06
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.128"} 1.335151e+06
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.256"} 1.335183e+06
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.512"} 1.335185e+06
etcd_disk_wal_fsync_duration_seconds_bucket{le="1.024"} 1.335187e+06
etcd_disk_wal_fsync_duration_seconds_bucket{le="2.048"} 1.335187e+06
etcd_disk_wal_fsync_duration_seconds_bucket{le="4.096"} 1.335188e+06
etcd_disk_wal_fsync_duration_seconds_bucket{le="8.192"} 1.335188e+06
etcd_disk_wal_fsync_duration_seconds_bucket{le="+Inf"} 1.335188e+06
etcd_disk_wal_fsync_duration_seconds_sum 1336.26308381703
etcd_disk_wal_fsync_duration_seconds_count 1.335188e+06

@jpbetz (Contributor) commented Nov 5, 2018:

@ArchiFleKs looks like there was one fsync that took between 2.048 and 4.096 seconds and two that took between 0.512 and 1.024 seconds. This would result in messages like the one you saw ("wal: sync duration of 2.067539108s, expected less than 1s"). https://github.com/etcd-io/etcd/blob/e8b940f268a80c14f7082589f60cbfd3de531d12/wal/wal.go#L572 both tallies this metric and logs the message.

If you're seeing the log message at higher rates than the metric suggests, that might require further investigation, but the log message does appear to be correctly telling us that there was excessively high disk latency.

@lixianyang commented Nov 29, 2018:

So, what's the recommended etcd version for 1.12.x? 3.2.24?

@jpbetz (Contributor) commented Nov 29, 2018:

@lixianyang Yes, etcd 3.2.24 is recommended for k8s 1.12.x.

@kovetskiy commented Dec 10, 2018:

Is there any possible workaround? I'm experiencing the same issue on Azure. I also tried running etcd on a separate SSD disk; it didn't help.

@kovetskiy commented Dec 12, 2018:

@lixianyang if you're still wondering, I tried 3.2.18 and it works better than 3.2.24; I don't see these "read-only range request took too long" messages anymore.

@lostick (Contributor) commented Dec 28, 2018:

FYI, having the same issue with 3.2.25 on 1.12.3.

@jcrowthe commented Mar 14, 2019:

As an update here, I'm seeing this same read-only range request ... took too long error in Azure k8s clusters with ~200 nodes, but notably the etcd data is mounted on a ramdisk. This means that our 99th percentile backend commit duration is ~6ms, indicating this is not due to slowness of disk.

Have we identified whether this is due to the version of etcd? Currently 3.2.24 is the recommended etcd version for 1.12 and 1.13, with 1.14 updating to 3.3.10 (source).

@lixianyang commented Apr 19, 2019:

Same issue in etcd 3.3.10

@dotbalo commented Apr 29, 2019:

My Kubernetes cluster is v1.13.5 and etcd is v3.3.10.
I have the same error, but my etcd cluster uses an SSD drive. The read and write speeds are as follows:

[k8s-user@k8s-master01 etcd]$ sudo dd if=/dev/zero of=testspeed bs=5M count=500
500+0 records in
500+0 records out
2621440000 bytes (2.6 GB) copied, 2.14125 s, 1.2 GB/s
[k8s-user@k8s-master01 etcd]$ sudo dd if=/dev/zero of=testspeed bs=5M count=500
500+0 records in
500+0 records out
2621440000 bytes (2.6 GB) copied, 3.47247 s, 755 MB/s
[k8s-user@k8s-master01 etcd]$ sudo dd if=/dev/zero of=testspeed bs=5M count=500
500+0 records in
500+0 records out
2621440000 bytes (2.6 GB) copied, 3.5717 s, 734 MB/s
[k8s-user@k8s-master01 etcd]$ sudo dd if=/dev/zero of=testspeed bs=5M count=500
500+0 records in
500+0 records out
2621440000 bytes (2.6 GB) copied, 3.58915 s, 730 MB/s
[k8s-user@k8s-master01 etcd]$ sudo dd of=/dev/zero if=testspeed bs=5M count=500
500+0 records in
500+0 records out
2621440000 bytes (2.6 GB) copied, 0.503606 s, 5.2 GB/s
[k8s-user@k8s-master01 etcd]$ sudo dd of=/dev/zero if=testspeed bs=5M count=500
500+0 records in
500+0 records out
2621440000 bytes (2.6 GB) copied, 0.614005 s, 4.3 GB/s
[k8s-user@k8s-master01 etcd]$ sudo dd of=/dev/zero if=testspeed bs=5M count=500
500+0 records in
500+0 records out
2621440000 bytes (2.6 GB) copied, 0.522527 s, 5.0 GB/s

error logs:

Apr 29 12:11:10 k8s-master01 etcd: read-only range request "key:\"/registry/daemonsets\" range_end:\"/registry/daemonsett\" count_only:true " with result "range_response_count:0 size:8" took too long (333.471477ms) to execute
Apr 29 12:11:14 k8s-master01 etcd: read-only range request "key:\"/registry/configmaps\" range_end:\"/registry/configmapt\" count_only:true " with result "range_response_count:0 size:8" took too long (221.436047ms) to execute
Apr 29 12:11:17 k8s-master01 etcd: read-only range request "key:\"/registry/configmaps\" range_end:\"/registry/configmapt\" count_only:true " with result "range_response_count:0 size:8" took too long (640.252868ms) to execute
Apr 29 12:11:21 k8s-master01 etcd: read-only range request "key:\"/registry/deployments\" range_end:\"/registry/deploymentt\" count_only:true " with result "range_response_count:0 size:8" took too long (179.700872ms) to execute
Apr 29 12:11:21 k8s-master01 etcd: read-only range request "key:\"/registry/secrets/kube-system/metrics-server-token-sccgm\" " with result "range_response_count:1 size:2761" took too long (184.019088ms) to execute
Apr 29 12:11:21 k8s-master01 etcd: read-only range request "key:\"/registry/services/specs\" range_end:\"/registry/services/spect\" count_only:true " with result "range_response_count:0 size:8" took too long (120.62712ms) to execute
Apr 29 12:11:21 k8s-master01 etcd: read-only range request "key:\"/registry/statefulsets\" range_end:\"/registry/statefulsett\" count_only:true " with result "range_response_count:0 size:8" took too long (158.092446ms) to execute

Is my disk too slow?

@Nefelim4ag commented Apr 29, 2019:

@dotbalo, yep, a very slow SSD for this workload.
(Your tests are not meaningful: at the very least you must use oflag/iflag with direct/sync. Better to use fio or ioping, because you need to care about IOPS and latency, not throughput.)
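
For example, the fio invocation usually suggested for etcd measures fdatasync latency rather than throughput (a sketch; point --directory at a scratch folder on the same device as the etcd data dir, assumed here to be under /var/lib/etcd):

mkdir -p /var/lib/etcd/fio-test
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=/var/lib/etcd/fio-test --size=22m --bs=2300 --name=etcd-fsync-test
# In the output, look at the fsync/fdatasync latency percentiles; the usual guidance
# is a 99th percentile well under 10ms for a healthy etcd disk.
rm -rf /var/lib/etcd/fio-test

# ioping gives a quick per-request latency spot check on the same directory:
ioping -c 10 /var/lib/etcd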

@dotbalo commented Apr 30, 2019:

@Nefelim4ag Thanks for your answer. I tested the bandwidth, IOPS, and latency of this SSD separately. The test results are as follows:

IOPS Test:

Sequential Write
-----------------
[root@k8s-master03 temp]# fio --name=sequential_write_iops_test --filename=testfile --filesize=10G --time_based --ramp_time=2s --runtime=1m --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 --bs=4k --iodepth=4096 --rw=write --group_reporting


sequential_write_iops_test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=4096
fio-3.1
Starting 1 process
sequential_write_iops_test: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [W(1)][71.6%][r=0KiB/s,w=98.5MiB/s][r=0,w=25.2k IOPS][eta 00m:25s]
sequential_write_iops_test: (groupid=0, jobs=1): err= 0: pid=10884: Tue Apr 30 10:24:39 2019
  write: IOPS=30.7k, BW=120MiB/s (126MB/s)(7212MiB/60005msec)
    slat (usec): min=4, max=90212, avg=28.25, stdev=293.83
    clat (msec): min=3, max=2266, avg=133.20, stdev=103.51
     lat (msec): min=3, max=2266, avg=133.23, stdev=103.53
    clat percentiles (msec):
     |  1.00th=[   57],  5.00th=[   67], 10.00th=[   72], 20.00th=[   82],
     | 30.00th=[   92], 40.00th=[  106], 50.00th=[  116], 60.00th=[  128],
     | 70.00th=[  144], 80.00th=[  165], 90.00th=[  201], 95.00th=[  236],
     | 99.00th=[  384], 99.50th=[  558], 99.90th=[ 1854], 99.95th=[ 2022],
     | 99.99th=[ 2265]
   bw (  KiB/s): min= 4672, max=255016, per=99.76%, avg=122775.19, stdev=48923.02, samples=120
   iops        : min= 1168, max=63754, avg=30693.76, stdev=12230.82, samples=120
  lat (msec)   : 4=0.01%, 10=0.01%, 20=0.02%, 50=0.06%, 100=35.66%
  lat (msec)   : 250=60.46%, 500=3.40%, 750=0.30%, 1000=0.04%, 2000=0.21%
  lat (msec)   : >=2000=0.05%
  cpu          : usr=9.00%, sys=40.71%, ctx=44121, majf=0, minf=33
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=105.3%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwt: total=0,1842133,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4096

Run status group 0 (all jobs):
  WRITE: bw=120MiB/s (126MB/s), 120MiB/s-120MiB/s (126MB/s-126MB/s), io=7212MiB (7562MB), run=60005-60005msec

Disk stats (read/write):
    dm-1: ios=0/1941738, merge=0/0, ticks=0/7197168, in_queue=7200241, util=99.88%, aggrios=0/1941216, aggrmerge=0/522, aggrticks=0/6178305, aggrin_queue=7143170, aggrutil=99.43%
  vdc: ios=0/1941216, merge=0/522, ticks=0/6178305, in_queue=7143170, util=99.43%

I think the performance of this SSD is not very good and its latency is too high, but my k8s cluster is a new cluster and I estimate that it will not exceed 100 nodes. Will this disk performance affect the stability of the cluster?

@Nefelim4ag commented Apr 30, 2019:

@dotbalo,
Do you actually have any problems with your cluster?
I think not; and if not, which problem are you trying to solve here?

If you just see a few "slow request" records, that is not a problem.
If you see them many times per day, see the cluster freeze, or see slow requests of 1s+, then you have a real problem.

What I mean is:
You haven't read up on how to measure drive performance, and you come here asking "Is my disk too slow?".
You have a normal SOHO SSD; it's enough for etcd.

This issue does not have a direct relationship with disk IO.

It's an internal etcd problem and/or a K8s key-value design issue; it can cause slow range requests, and slow write IO can also trigger it.

I also see these messages sometimes, but they cause no harm and will not.

@ghost commented May 4, 2019:

I have met the same problem in a k8s cluster. This warning occurs 10+ times per second every time the controller-manager elects a leader on a new master, and this lasts 30+ minutes. In that time range most request latencies increase dramatically, as does etcd backend commit latency.

Cluster info:
Kubernetes version: v1.13.4
etcd version: 3.2.24
Scale: 5000 nodes, 3 masters, 3 etcd instances, running 55000+ pods in total.

rand-write: (groupid=3, jobs=1): err= 0: pid=5325: Sat May  4 10:56:03 2019
  write: IOPS=12.2k, BW=47.8MiB/s (50.2MB/s)(2870MiB/60001msec)
    slat (usec): min=2, max=413, avg= 5.06, stdev= 2.46
    clat (usec): min=12, max=20147, avg=319.02, stdev=589.52
     lat (usec): min=53, max=20152, avg=324.60, stdev=589.61
    clat percentiles (usec):
     |  1.00th=[   64],  5.00th=[   76], 10.00th=[   85], 20.00th=[   91],
     | 30.00th=[   95], 40.00th=[   99], 50.00th=[  104], 60.00th=[  113],
     | 70.00th=[  126], 80.00th=[  281], 90.00th=[  775], 95.00th=[ 2040],
     | 99.00th=[ 2606], 99.50th=[ 2671], 99.90th=[ 4555], 99.95th=[ 5800],                                                              
     | 99.99th=[10945]                                              
   bw (  KiB/s): min=39832, max=135936, per=99.94%, avg=48946.31, stdev=8554.08, samples=119                                            
   iops        : min= 9958, max=33984, avg=12236.56, stdev=2138.52, samples=119                                                         
  lat (usec)   : 20=0.01%, 50=0.01%, 100=43.32%, 250=36.04%, 500=3.87%                                                                  
  lat (usec)   : 750=6.23%, 1000=3.48%                              
  lat (msec)   : 2=1.84%, 4=5.09%, 10=0.10%, 20=0.01%, 50=0.01%     
  cpu          : usr=2.60%, sys=8.78%, ctx=290994, majf=0, minf=9   
  IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%                                                          
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%                                                         
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%                                                         
     issued rwt: total=0,734693,0, short=0,0,0, dropped=0,0,0       
     latency   : target=0, window=0, percentile=100.00%, depth=4   

@jingyih (Contributor) commented May 6, 2019:

As discussed in another thread, the warning message was added in v3.2.23. Between v3.2.18 and v3.2.24 there are no changes related to performance (based on the etcd changelogs for each release). Is there an actual performance regression when using etcd v3.2.24 in K8s 1.12?

@jordanrinke commented Jun 12, 2019:

I am using 3.3.10 with k8s 1.14 and seeing the same issue. My SSD volume gets around 600 MB/s. I was able to move the etcd data dir to an NVMe drive that gets around 2980 MB/s, and everything seems to be working fine now, so there is definitely a performance issue. This is an empty k8s cluster; it certainly shouldn't need NVMe speeds. I can reproduce this by flipping the data volume between the SSD and the NVMe drive, so if there are any commands I could run or outputs I could provide that would be useful, let me know. When going back to the SSD drive, the "took too long" errors start popping up in the log within a minute.

For me, while this is happening CoreDNS fails because it can't look up any service records, so it is in fact an impacting issue, not just a warning.
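
For anyone who wants to try the same experiment, a rough sketch of relocating the data dir on a kubeadm-style node (the device name and mount point are hypothetical; back up /var/lib/etcd first and adapt to your setup):

# Temporarily remove the static pod manifest so the kubelet stops etcd:
mv /etc/kubernetes/manifests/etcd.yaml /root/etcd.yaml
mkfs.ext4 /dev/nvme1n1                      # hypothetical NVMe device
mkdir -p /mnt/nvme && mount /dev/nvme1n1 /mnt/nvme
rsync -a /var/lib/etcd/ /mnt/nvme/etcd/
# Either edit the hostPath volume backing --data-dir in the manifest to point at
# /mnt/nvme/etcd, or bind-mount the faster location over the existing path:
mount --bind /mnt/nvme/etcd /var/lib/etcd
# Restore the manifest so the kubelet starts etcd again:
mv /root/etcd.yaml /etc/kubernetes/manifests/etcd.yaml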

@dotbalo commented Jun 12, 2019:

@jordanrinke The "took too long" warning may not be a problem with SSD disk performance; it may also be caused by high network latency. If it is not very frequent, it has no effect on the cluster. I saw "took too long" when I first created a new cluster; later I migrated the etcd data directory to other storage and the warnings went away, just like for you.

@jordanrinke commented Jun 12, 2019:

This is all local, single node, single master, so it certainly isn't network latency. It is constant: I can't perform any service lookups, they error out saying the request took too long. In my case I discovered this because CoreDNS was not working, because it could not list any services from Kubernetes. That goes away when I move the data dir to the NVMe; move it back to the SSD and the errors start again.

@bbotte commented Aug 2, 2019:

Installing etcd 3.3.13 or later solved the problem for me; I have verified it.
No relationship with disk or network.

@llussy commented Aug 17, 2019:

I am using 3.3.13 with k8s 1.13.9 and seeing the same issue.

# etcd -version
etcd Version: 3.3.13
Git SHA: 98d3084
Go Version: go1.10.8
Go OS/Arch: linux/amd64

# kubectl  version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.9", GitCommit:"3e4f6a92de5f259ef313ad876bb008897f6a98f0", GitTreeState:"clean", BuildDate:"2019-08-05T09:22:00Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.9", GitCommit:"3e4f6a92de5f259ef313ad876bb008897f6a98f0", GitTreeState:"clean", BuildDate:"2019-08-05T09:14:29Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

# /var/log/messages    a lot of timeout
Aug 17 17:37:30 k8s-master kube-apiserver: Trace[30044365]: [751.474113ms] [751.310476ms] Object stored in database
Aug 17 17:37:30 k8s-master etcd: read-only range request "key:\"/registry/services/endpoints/kube-system/kube-controller-manager\" " with result "range_response_count:1 size:464" took too long (468.584252ms) to execute
Aug 17 17:37:32 k8s-master etcd: read-only range request "key:\"/registry/events/\" range_end:\"/registry/events0\" limit:500 " with result "range_response_count:1 size:471" took too long (462.97586ms) to execute

# pidstat -d 1 10
Linux 4.20.2-1.el7.elrepo.x86_64 (k8s-master)         08/17/2019      _x86_64_        (24 CPU)

05:38:02 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:38:03 PM     0      8911      0.00      3.92      3.92  rsyslogd
05:38:03 PM     0     17571      0.00     27.45      0.00  etcd
05:38:03 PM   167     32227      0.00      3.92      0.00  ceph-mon

05:38:03 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:38:04 PM     0     17571      0.00     44.00      0.00  etcd

05:38:04 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:38:05 PM     0      8911      0.00      8.00      4.00  rsyslogd
05:38:05 PM     0     17571      0.00     56.00      0.00  etcd
05:38:05 PM     0     25082      0.00      8.00      8.00  confd
05:38:05 PM   167     32227      0.00      8.00      0.00  ceph-mon
05:38:05 PM     0    667331      0.00     16.00      0.00  monitor-agent

05:38:05 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:38:06 PM     0      8911      0.00      4.00      4.00  rsyslogd
05:38:06 PM     0     17571      0.00     76.00      0.00  etcd
05:38:06 PM     0     25082      0.00     16.00     16.00  confd
05:38:06 PM   167     32227      0.00      4.00      0.00  ceph-mon

etcd write throughput is not high

@bbotte commented Aug 19, 2019:

[quoting @llussy's comment above in full]

Watch it for a few more days and compare with before, to see whether it still affects the cluster. Occasional errors can be ignored.

@dotbalo commented Aug 19, 2019:

I think this is caused by a slow disk or high network latency. I also had this warning before. Then I moved the etcd data directory to high-performance SSD storage, and there were almost no more "took too long" warnings.
Here is the monitoring diagram for my etcd:

[screenshot: etcd monitoring dashboard]

And my cluster version:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:26:52Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

etcd version:

$ etcd -version
etcd Version: 3.3.9
Git SHA: fca8add78
Go Version: go1.10.3
Go OS/Arch: linux/amd64

So I think it is caused by a slow disk, not the etcd version.

@llussy commented Aug 19, 2019:

[quoting @llussy's comment and @bbotte's reply above in full]

It's OK now; I did not do anything, and disk IO is now normal. I don't know what was affecting it before.
IO diagram (IO was high for 33 hours):
[screenshot: disk IO graph]

@bbotte commented Aug 20, 2019:

[quoting @dotbalo's comment above in full]

Take a look at etcd-io/etcd#10808 and etcd-io/bbolt#141; I hope they can help you.
