etcdv3 revision glitch #11643

Closed
johscheuer opened this issue Feb 21, 2020 · 1 comment

@johscheuer

In our current setup we see very strange behaviour from etcd v3. We write several keys programmatically within a very short interval and then read the values back with other tooling. etcdctl and tools such as confd report different values at different revisions, which is pretty strange (I dropped the actual values). All I did was run etcdctl get ? --prefix:

{"header":{"cluster_id":8332270229614305663,"member_id":4452025200488535725,"revision":98324,"raft_term":140},"kvs":[],"count":42}
{"header":{"cluster_id":8332270229614305663,"member_id":17822275587060487155,"revision":60820,"raft_term":140},"kvs":[],"count":32}
{"header":{"cluster_id":8332270229614305663,"member_id":4452025200488535725,"revision":98324,"raft_term":140},"kvs":[],"count":42}
{"header":{"cluster_id":8332270229614305663,"member_id":17822275587060487155,"revision":60820,"raft_term":140},"kvs":[],"count":32}
{"header":{"cluster_id":8332270229614305663,"member_id":4452025200488535725,"revision":98324,"raft_term":140},"kvs":[],"count":42}
{"header":{"cluster_id":8332270229614305663,"member_id":17822275587060487155,"revision":60820,"raft_term":140},"kvs":[],"count":32}
{"header":{"cluster_id":8332270229614305663,"member_id":17822275587060487155,"revision":60820,"raft_term":140},"kvs":[],"count":32}

Even if we wait over an hour, the revision glitch is still there (it seems to depend on which node answers):

{"header":{"cluster_id":8332270229614305663,"member_id":17822275587060487155,"revision":60886,"raft_term":140},"kvs":[],"count":38}
{"header":{"cluster_id":8332270229614305663,"member_id":4452025200488535725,"revision":99223,"raft_term":140},"kvs":[],"count":42}
{"header":{"cluster_id":8332270229614305663,"member_id":17822275587060487155,"revision":60886,"raft_term":140},"kvs":[],"count":38}
{"header":{"cluster_id":8332270229614305663,"member_id":17822275587060487155,"revision":60886,"raft_term":140},"kvs":[],"count":38}
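To make the pattern easier to see, here is a minimal Python sketch that groups the response headers by `member_id` and reports whether the members disagree on the revision. The sample lines are copied from the output above; the helper names `group_by_member` and `members_disagree` are made up for this illustration and are not part of etcd or etcdctl. It only flags a difference; it does not tell a lagging member from a diverged one.

```python
import json
from collections import defaultdict

# Response lines copied from the second etcdctl sample above
# ("kvs" was already emptied in the report).
SAMPLE = """\
{"header":{"cluster_id":8332270229614305663,"member_id":17822275587060487155,"revision":60886,"raft_term":140},"kvs":[],"count":38}
{"header":{"cluster_id":8332270229614305663,"member_id":4452025200488535725,"revision":99223,"raft_term":140},"kvs":[],"count":42}
{"header":{"cluster_id":8332270229614305663,"member_id":17822275587060487155,"revision":60886,"raft_term":140},"kvs":[],"count":38}
{"header":{"cluster_id":8332270229614305663,"member_id":17822275587060487155,"revision":60886,"raft_term":140},"kvs":[],"count":38}
"""


def group_by_member(lines):
    """Map member_id -> set of (revision, count) pairs reported by that member."""
    seen = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        resp = json.loads(line)
        header = resp["header"]
        seen[header["member_id"]].add((header["revision"], resp["count"]))
    return dict(seen)


def members_disagree(seen):
    """Return True if the highest revision reported differs between members."""
    highest = {
        member: max(rev for rev, _ in pairs) for member, pairs in seen.items()
    }
    return len(set(highest.values())) > 1


seen = group_by_member(SAMPLE.splitlines())
for member, pairs in sorted(seen.items()):
    print(member, sorted(pairs))
print("members disagree:", members_disagree(seen))
```

With the sample above, each member reports a single (revision, count) pair, and the two members differ in both revision and key count.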

The funny thing is that there are no error messages in the logs (we also tried defragmenting the nodes, without success).

This is the member status (one thing that confuses me is the difference in the database size):

etcdctl --endpoints=... endpoint status  --cluster -w table
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://10.76.184.242:2379 | 20256066780ab59a |   3.4.3 |   52 MB |      true |       140 |   10918428 |
| https://10.76.184.241:2379 | 3dc8c554fe34d6ad |   3.4.3 |   44 MB |     false |       140 |   10918433 |
| https://10.76.184.243:2379 | f75571384fdb53f3 |   3.4.3 |   26 MB |     false |       140 |   10918434 |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
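The JSON headers print `member_id` in decimal, while the status table prints the ID in hexadecimal, so matching a response to an endpoint needs a conversion. A small Python sketch of that conversion follows; the `ENDPOINT_BY_HEX_ID` mapping is copied from the table above, and the function name `member_id_to_hex` is just an illustrative helper.

```python
# Hex member IDs and endpoints copied from the "endpoint status" table above.
ENDPOINT_BY_HEX_ID = {
    "20256066780ab59a": "https://10.76.184.242:2379",
    "3dc8c554fe34d6ad": "https://10.76.184.241:2379",
    "f75571384fdb53f3": "https://10.76.184.243:2379",
}


def member_id_to_hex(member_id: int) -> str:
    """Format a decimal member_id from a JSON header as lowercase hex."""
    return format(member_id, "x")


# The two member IDs that appear in the JSON responses earlier in this report.
for member_id in (4452025200488535725, 17822275587060487155):
    hex_id = member_id_to_hex(member_id)
    endpoint = ENDPOINT_BY_HEX_ID.get(hex_id, "unknown")
    print(member_id, "->", hex_id, "->", endpoint)
# 4452025200488535725 -> 3dc8c554fe34d6ad -> https://10.76.184.241:2379
# 17822275587060487155 -> f75571384fdb53f3 -> https://10.76.184.243:2379
```

So the responses at revision 98324/99223 come from 10.76.184.241 (44 MB) and the ones at revision 60820/60886 from 10.76.184.243 (26 MB); the leader, 10.76.184.242, does not appear in the samples.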

Our version:

etcd --version
etcd Version: 3.4.3
Git SHA: 3cf2f69b5
Go Version: go1.12.12
Go OS/Arch: linux/amd64

Config:

sudo systemctl cat etcd
# /etc/systemd/system/etcd.service
[Unit]
Description=etcd
Documentation=https://github.com/coreos

[Service]
ExecStart=/usr/local/bin/etcd \
  --advertise-client-urls="https://10.76.184.242:2379" \
  --auto-compaction-retention="0" \
  --cert-file="/etc/etcd/cert.pem" \
  --client-cert-auth="false" \
  --data-dir="/var/lib/etcd" \
  --election-timeout="1000" \
  --enable-v2="true" \
  --heartbeat-interval="100" \
  --initial-advertise-peer-urls="https://10.76.184.242:2380" \
  --initial-cluster="node01=https://10.76.184.241:2380,node02=https://10.76.184.242:2380,node03=https://10.76.184.243:2380" \
  --initial-cluster-state="new" \
  --initial-cluster-token="etcd-cluster-0" \
  --key-file="/etc/etcd/cert-key.pem" \
  --listen-client-urls="https://10.76.184.242:2379,https://127.0.0.1:2379" \
  --listen-peer-urls="https://10.76.184.242:2380" \
  --log-output="default" \
  --log-package-levels="" \
  --max-snapshots="5" \
  --max-wals="5" \
  --name="node02" \
  --peer-cert-file="/etc/etcd/cert.pem" \
  --peer-client-cert-auth="true" \
  --peer-key-file="/etc/etcd/cert-key.pem" \
  --peer-trusted-ca-file="/etc/etcd/ca.pem" \
  --quota-backend-bytes="0" \
  --snapshot-count="100000" \
  --trusted-ca-file="" \
  --wal-dir="" 
Restart=on-failure
RestartSec=5
Type=notify

[Install]
WantedBy=multi-user.target

and the OS:

cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
@johscheuer

Since #11651 has been merged, I am closing this one too (it was the same error).
