Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Watch dropping an event when compacting on delete #18089

Open
4 tasks
serathius opened this issue May 28, 2024 · 37 comments
Open
4 tasks

Watch dropping an event when compacting on delete #18089

serathius opened this issue May 28, 2024 · 37 comments
Assignees
Labels
area/robustness-testing priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release/v3.4 release/v3.5 type/bug

Comments

@serathius
Copy link
Member

Bug report criteria

What happened?

When testing #17833 I encountered watch breaking reliable guarantee.

Report included
1716928418720891814.zip

What did you expect to happen?

Watch should not break guarantee

How can we reproduce it (as minimally and precisely as possible)?

TODO

Anything else we need to know?

No response

Etcd version (please run commands below)

v3.5

Etcd configuration (command line flags or environment variables)

paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output

logger.go:146: 2024-05-28T22:33:31.699+0200 INFO    starting proxy...       {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2", "from": "https://localhost:20013", "to": "https://localhost:20011"}
    logger.go:146: 2024-05-28T22:33:31.699+0200 INFO    starting proxy...       {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0", "from": "https://localhost:20003", "to": "https://localhost:20001"}
    logger.go:146: 2024-05-28T22:33:31.699+0200 INFO    starting proxy...       {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1", "from": "https://localhost:20008", "to": "https://localhost:20006"}
    logger.go:146: 2024-05-28T22:33:31.699+0200 INFO    starting server...      {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2"}
    logger.go:146: 2024-05-28T22:33:31.699+0200 INFO    starting server...      {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0"}
    logger.go:146: 2024-05-28T22:33:31.699+0200 INFO    spawning process        {"args": ["/tmp/etcd-release-3.5-failpoints/bin/etcd", "--name=TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2", "--listen-client-urls=http://localhost:20010", "--advertise-client-urls=http://localhost:20010", "--listen-peer-urls=https://localhost:20011", "--initial-advertise-peer-urls=https://localhost:20013", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdHighTrafficClusterOfSize34219590533/003", "--snapshot-count=1000", "--peer-cert-file", "/home/serathius/src/go.etcd.io/etcd/tests/fixtures/server.crt", "--peer-key-file", "/home/serathius/src/go.etcd.io/etcd/tests/fixtures/server.key.insecure", "--peer-trusted-ca-file", "/home/serathius/src/go.etcd.io/etcd/tests/fixtures/ca.crt", "--initial-cluster-token=new", "--experimental-watch-progress-notify-interval=100ms", "--snapshot-count=1000", "--experimental-snapshot-catchup-entries=100", "--experimental-compaction-batch-limit=1000", "--initial-cluster=TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0=https://localhost:20003,TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1=https://localhost:20008,TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2=https://localhost:20013", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory1597671579/001", "name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:32381", "ETCD_VERIFY=all", "PATH=/home/serathius/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.3.linux-amd64/bin:/home/serathius/.gvm/gos/go1.22/bin:/home/serathius/bin:/home/serathius//bin/:/home/serathius/.gvm/pkgsets/go1.22/global/bin:/home/serathius/.gvm/gos/go1.22/bin:/home/serathius/.gvm/pkgsets/go1.22/global/overlay/bin:/home/serathius/.gvm/bin:/home/serathius/.gvm/bin:/home/serathius/.cabal/bin/:/home/serathius/.local/bin:/home/serathius/.local/share/umake/bin:/home/serathius/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
    logger.go:146: 2024-05-28T22:33:31.699+0200 INFO    spawning process        {"args": ["/tmp/etcd-release-3.5-failpoints/bin/etcd", "--name=TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=https://localhost:20001", "--initial-advertise-peer-urls=https://localhost:20003", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdHighTrafficClusterOfSize34219590533/001", "--snapshot-count=1000", "--peer-cert-file", "/home/serathius/src/go.etcd.io/etcd/tests/fixtures/server.crt", "--peer-key-file", "/home/serathius/src/go.etcd.io/etcd/tests/fixtures/server.key.insecure", "--peer-trusted-ca-file", "/home/serathius/src/go.etcd.io/etcd/tests/fixtures/ca.crt", "--experimental-watch-progress-notify-interval=100ms", "--initial-cluster-token=new", "--experimental-snapshot-catchup-entries=100", "--experimental-compaction-batch-limit=1000", "--snapshot-count=1000", "--initial-cluster=TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0=https://localhost:20003,TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1=https://localhost:20008,TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2=https://localhost:20013", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory1597671579/001", "name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "ETCD_VERIFY=all", "PATH=/home/serathius/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.3.linux-amd64/bin:/home/serathius/.gvm/gos/go1.22/bin:/home/serathius/bin:/home/serathius//bin/:/home/serathius/.gvm/pkgsets/go1.22/global/bin:/home/serathius/.gvm/gos/go1.22/bin:/home/serathius/.gvm/pkgsets/go1.22/global/overlay/bin:/home/serathius/.gvm/bin:/home/serathius/.gvm/bin:/home/serathius/.cabal/bin/:/home/serathius/.local/bin:/home/serathius/.local/share/umake/bin:/home/serathius/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
    logger.go:146: 2024-05-28T22:33:31.699+0200 INFO    starting server...      {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1"}
    logger.go:146: 2024-05-28T22:33:31.699+0200 INFO    spawning process        {"args": ["/tmp/etcd-release-3.5-failpoints/bin/etcd", "--name=TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1", "--listen-client-urls=http://localhost:20005", "--advertise-client-urls=http://localhost:20005", "--listen-peer-urls=https://localhost:20006", "--initial-advertise-peer-urls=https://localhost:20008", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdHighTrafficClusterOfSize34219590533/002", "--snapshot-count=1000", "--peer-cert-file", "/home/serathius/src/go.etcd.io/etcd/tests/fixtures/server.crt", "--peer-key-file", "/home/serathius/src/go.etcd.io/etcd/tests/fixtures/server.key.insecure", "--peer-trusted-ca-file", "/home/serathius/src/go.etcd.io/etcd/tests/fixtures/ca.crt", "--experimental-snapshot-catchup-entries=100", "--initial-cluster-token=new", "--experimental-watch-progress-notify-interval=100ms", "--snapshot-count=1000", "--experimental-compaction-batch-limit=1000", "--initial-cluster=TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0=https://localhost:20003,TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1=https://localhost:20008,TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2=https://localhost:20013", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory1597671579/001", "name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:22381", "ETCD_VERIFY=all", "PATH=/home/serathius/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.3.linux-amd64/bin:/home/serathius/.gvm/gos/go1.22/bin:/home/serathius/bin:/home/serathius//bin/:/home/serathius/.gvm/pkgsets/go1.22/global/bin:/home/serathius/.gvm/gos/go1.22/bin:/home/serathius/.gvm/pkgsets/go1.22/global/overlay/bin:/home/serathius/.gvm/bin:/home/serathius/.gvm/bin:/home/serathius/.cabal/bin/:/home/serathius/.local/bin:/home/serathius/.local/share/umake/bin:/home/serathius/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
    logger.go:146: 2024-05-28T22:33:33.927+0200 INFO    started server. {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1", "pid": 191460}
    logger.go:146: 2024-05-28T22:33:33.927+0200 INFO    started server. {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0", "pid": 191459}
    logger.go:146: 2024-05-28T22:33:33.927+0200 INFO    started server. {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2", "pid": 191458}
    logger.go:146: 2024-05-28T22:33:33.931+0200 INFO    Start traffic
    logger.go:146: 2024-05-28T22:33:34.969+0200 INFO    Triggering failpoint    {"failpoint": "raftBeforeLeaderSend=panic()"}
    cluster.go:1066: members agree on a leader, members:map[12416079282240904009:1 13770228943176794332:2 16914881897345358027:0] , leader:13770228943176794332
    logger.go:146: 2024-05-28T22:33:35.012+0200 INFO    Setting up gofailpoint  {"failpoint": "raftBeforeLeaderSend=panic()"}
    logger.go:146: 2024-05-28T22:33:35.012+0200 INFO    Waiting for member to exit      {"member": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2"}
    logger.go:146: 2024-05-28T22:33:35.019+0200 INFO    server exited   {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2", "code": 2}
    logger.go:146: 2024-05-28T22:33:35.019+0200 INFO    Member exited as expected       {"member": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2"}
    logger.go:146: 2024-05-28T22:33:35.019+0200 INFO    starting server...      {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2"}
    logger.go:146: 2024-05-28T22:33:35.019+0200 INFO    spawning process        {"args": ["/tmp/etcd-release-3.5-failpoints/bin/etcd", "--name=TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2", "--listen-client-urls=http://localhost:20010", "--advertise-client-urls=http://localhost:20010", "--listen-peer-urls=https://localhost:20011", "--initial-advertise-peer-urls=https://localhost:20013", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdHighTrafficClusterOfSize34219590533/003", "--snapshot-count=1000", "--peer-cert-file", "/home/serathius/src/go.etcd.io/etcd/tests/fixtures/server.crt", "--peer-key-file", "/home/serathius/src/go.etcd.io/etcd/tests/fixtures/server.key.insecure", "--peer-trusted-ca-file", "/home/serathius/src/go.etcd.io/etcd/tests/fixtures/ca.crt", "--initial-cluster-token=new", "--experimental-watch-progress-notify-interval=100ms", "--snapshot-count=1000", "--experimental-snapshot-catchup-entries=100", "--experimental-compaction-batch-limit=1000", "--initial-cluster=TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0=https://localhost:20003,TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1=https://localhost:20008,TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2=https://localhost:20013", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory1597671579/001", "name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:32381", "ETCD_VERIFY=all", "PATH=/home/serathius/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.3.linux-amd64/bin:/home/serathius/.gvm/gos/go1.22/bin:/home/serathius/bin:/home/serathius//bin/:/home/serathius/.gvm/pkgsets/go1.22/global/bin:/home/serathius/.gvm/gos/go1.22/bin:/home/serathius/.gvm/pkgsets/go1.22/global/overlay/bin:/home/serathius/.gvm/bin:/home/serathius/.gvm/bin:/home/serathius/.cabal/bin/:/home/serathius/.local/bin:/home/serathius/.local/share/umake/bin:/home/serathius/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
    logger.go:146: 2024-05-28T22:33:36.129+0200 INFO    started server. {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2", "pid": 191560}
    logger.go:146: 2024-05-28T22:33:36.130+0200 INFO    Finished triggering failpoint   {"failpoint": "raftBeforeLeaderSend=panic()"}
    logger.go:146: 2024-05-28T22:33:37.138+0200 INFO    Finished traffic
    logger.go:146: 2024-05-28T22:33:38.141+0200 INFO    Reporting complete traffic      {"successes": 2340, "failures": 74, "successRate": 0.9693454846727423, "period": "3.207153256s", "qps": 729.6190151257306}
    logger.go:146: 2024-05-28T22:33:38.141+0200 INFO    Reporting traffic before failure injection      {"successes": 1250, "failures": 1, "successRate": 0.9992006394884093, "period": "1.038661462s", "qps": 1203.4720125198983}
    logger.go:146: 2024-05-28T22:33:38.141+0200 INFO    Reporting traffic during failure injection      {"successes": 64, "failures": 41, "successRate": 0.6095238095238096, "period": "1.161063679s", "qps": 55.12186898751485}
    logger.go:146: 2024-05-28T22:33:38.141+0200 INFO    Reporting traffic after failure injection       {"successes": 1026, "failures": 32, "successRate": 0.9697542533081286, "period": "1.007428115s", "qps": 1018.4349480856011}
    logger.go:146: 2024-05-28T22:33:38.141+0200 INFO    Finished simulating traffic     {"max-revision": 1107}
    logger.go:146: 2024-05-28T22:33:38.202+0200 INFO    Validating linearizable operations      {"timeout": "5m0s"}
    logger.go:146: 2024-05-28T22:33:38.697+0200 INFO    Linearization success   {"duration": "494.602505ms"}
    logger.go:146: 2024-05-28T22:33:38.698+0200 INFO    Validating watch
    logger.go:146: 2024-05-28T22:33:38.715+0200 ERROR   Broke watch guarantee   {"guarantee": "reliable", "client": 4, "diff": "  []model.PersistedEvent{\n  \t... // 725 identical elements\n  \t{Event: {Type: \"put-operation\", Key: \"key4\", Value: {Value: \"632\"}}, Revision: 627},\n  \t{Event: {Type: \"put-operation\", Key: \"key7\", Value: {Value: \"641\"}}, Revision: 628},\n- \t{Event: model.Event{Type: \"delete-operation\", Key: \"key0\"}, Revision: 628},\n  \t{Event: {Type: \"put-operation\", Key: \"key5\", Value: {Value: \"642\"}}, Revision: 628},\n  \t{Event: {Type: \"put-operation\", Key: \"key0\", Value: {Value: \"643\"}}, Revision: 629, ...},\n  \t... // 555 identical elements\n  }\n"}
    validate.go:49: Failed validating watch history, err: broke Reliable - a sequence of events will never drop any subsequence of events; if there are events ordered in time as a < b < c, then if the watch receives events a and c, it is guaranteed to receive b
    logger.go:146: 2024-05-28T22:33:38.715+0200 INFO    Validating serializable operations
    logger.go:146: 2024-05-28T22:33:38.721+0200 INFO    Saving robustness test report   {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814"}
    logger.go:146: 2024-05-28T22:33:38.721+0200 INFO    Saving member data dir  {"member": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0", "path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/server-TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0"}
    logger.go:146: 2024-05-28T22:33:38.721+0200 INFO    Saving member data dir  {"member": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1", "path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/server-TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1"}
    logger.go:146: 2024-05-28T22:33:38.721+0200 INFO    Saving member data dir  {"member": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2", "path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/server-TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2"}
    logger.go:146: 2024-05-28T22:33:38.721+0200 INFO    Saving watch operations {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-1/watch.json"}
    logger.go:146: 2024-05-28T22:33:38.721+0200 INFO    no KV operations for client, skip persisting    {"client-id": 1}
    logger.go:146: 2024-05-28T22:33:38.721+0200 INFO    no watch operations for client, skip persisting {"client-id": 2}
    logger.go:146: 2024-05-28T22:33:38.721+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-2/operations.json"}
    logger.go:146: 2024-05-28T22:33:38.721+0200 INFO    Saving watch operations {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-3/watch.json"}
    logger.go:146: 2024-05-28T22:33:38.722+0200 INFO    no KV operations for client, skip persisting    {"client-id": 3}
    logger.go:146: 2024-05-28T22:33:38.722+0200 INFO    Saving watch operations {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-4/watch.json"}
    logger.go:146: 2024-05-28T22:33:38.722+0200 INFO    no KV operations for client, skip persisting    {"client-id": 4}
    logger.go:146: 2024-05-28T22:33:38.722+0200 INFO    no watch operations for client, skip persisting {"client-id": 5}
    logger.go:146: 2024-05-28T22:33:38.722+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-5/operations.json"}
    logger.go:146: 2024-05-28T22:33:38.723+0200 INFO    no watch operations for client, skip persisting {"client-id": 6}
    logger.go:146: 2024-05-28T22:33:38.723+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-6/operations.json"}
    logger.go:146: 2024-05-28T22:33:38.723+0200 INFO    no watch operations for client, skip persisting {"client-id": 7}
    logger.go:146: 2024-05-28T22:33:38.723+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-7/operations.json"}
    logger.go:146: 2024-05-28T22:33:38.724+0200 INFO    no watch operations for client, skip persisting {"client-id": 8}
    logger.go:146: 2024-05-28T22:33:38.724+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-8/operations.json"}
    logger.go:146: 2024-05-28T22:33:38.724+0200 INFO    no watch operations for client, skip persisting {"client-id": 9}
    logger.go:146: 2024-05-28T22:33:38.725+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-9/operations.json"}
    logger.go:146: 2024-05-28T22:33:38.725+0200 INFO    no watch operations for client, skip persisting {"client-id": 10}
    logger.go:146: 2024-05-28T22:33:38.725+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-10/operations.json"}
    logger.go:146: 2024-05-28T22:33:38.726+0200 INFO    no watch operations for client, skip persisting {"client-id": 11}
    logger.go:146: 2024-05-28T22:33:38.726+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-11/operations.json"}
    logger.go:146: 2024-05-28T22:33:38.726+0200 INFO    no watch operations for client, skip persisting {"client-id": 12}
    logger.go:146: 2024-05-28T22:33:38.726+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-12/operations.json"}
    logger.go:146: 2024-05-28T22:33:38.727+0200 INFO    no watch operations for client, skip persisting {"client-id": 13}
    logger.go:146: 2024-05-28T22:33:38.727+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-13/operations.json"}
    logger.go:146: 2024-05-28T22:33:38.727+0200 INFO    no watch operations for client, skip persisting {"client-id": 14}
    logger.go:146: 2024-05-28T22:33:38.727+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-14/operations.json"}
    logger.go:146: 2024-05-28T22:33:38.728+0200 INFO    no watch operations for client, skip persisting {"client-id": 15}
    logger.go:146: 2024-05-28T22:33:38.728+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-15/operations.json"}
    logger.go:146: 2024-05-28T22:33:38.728+0200 INFO    no watch operations for client, skip persisting {"client-id": 16}
    logger.go:146: 2024-05-28T22:33:38.728+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-16/operations.json"}
    logger.go:146: 2024-05-28T22:33:38.729+0200 INFO    Saving visualization    {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/history.html"}
    logger.go:146: 2024-05-28T22:33:38.772+0200 INFO    killing server...       {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0"}
    logger.go:146: 2024-05-28T22:33:38.772+0200 INFO    killing server...       {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1"}
    logger.go:146: 2024-05-28T22:33:38.772+0200 INFO    killing server...       {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2"}
    logger.go:146: 2024-05-28T22:33:38.772+0200 INFO    stopping server...      {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2"}
    logger.go:146: 2024-05-28T22:33:38.772+0200 INFO    stopping server...      {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0"}
    logger.go:146: 2024-05-28T22:33:38.772+0200 INFO    stopping server...      {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1"}
    logger.go:146: 2024-05-28T22:33:38.777+0200 INFO    stopped server. {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0"}
    logger.go:146: 2024-05-28T22:33:38.777+0200 INFO    stopping proxy...       {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0"}
    logger.go:146: 2024-05-28T22:33:38.777+0200 INFO    stopped server. {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1"}
    logger.go:146: 2024-05-28T22:33:38.777+0200 INFO    stopping proxy...       {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1"}
    logger.go:146: 2024-05-28T22:33:38.777+0200 INFO    stopped server. {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2"}
    logger.go:146: 2024-05-28T22:33:38.777+0200 INFO    stopping proxy...       {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2"}
@serathius serathius changed the title Watch missed a delete event from multi TXN [robustness tests] Watch missed a delete event from multi TXN May 28, 2024
@serathius
Copy link
Member Author

serathius commented May 29, 2024

cc @MadhavJivrajani @siyuanfoundation @fuweid

Can someone take a look and double check my findings?

@fuweid
Copy link
Member

fuweid commented Jun 4, 2024

/assign

@serathius
Copy link
Member Author

I rerun the report on latest version of #17833 and it found that linearization was broken. Looks like a false positive due to some bug.

@serathius serathius reopened this Jun 7, 2024
@serathius
Copy link
Member Author

serathius commented Jun 7, 2024

Ok, ok I encountered another case.

Failpoint used: raftBeforeLeaderSend=panic()
Scenario: TestRobustnessExploratory/Etcd/LowTraffic/ClusterOfSize1
Etcd version:

./bin/etcd --version
etcd Version: 3.6.0-alpha.0
Git SHA: 6f916d428
Go Version: go1.22.4
Go OS/Arch: linux/amd64

Report: 1717749910517706368.tar.gz

Found when adding compaction to operations serathius@5959110

Logs showing broken watch guarantee due to missing log:

=== RUN   TestDataReports/1717749910517706368
    logger.go:146: 2024-06-07T11:31:24.546+0200 INFO    Validating linearizable operations      {"timeout": "5m0s"}
    logger.go:146: 2024-06-07T11:31:24.950+0200 INFO    Linearization success   {"duration": "404.080944ms"}
    logger.go:146: 2024-06-07T11:31:24.951+0200 INFO    Validating watch
    logger.go:146: 2024-06-07T11:31:24.960+0200 ERROR   Broke watch guarantee   {"guarantee": "reliable", "client": 2}
  []model.PersistedEvent{
        ... // 154 identical elements
        {Event: {Type: "put-operation", Key: "key3", Value: {Hash: 3071434148}}, Revision: 139},
        {Event: {Type: "put-operation", Key: "key5", Value: {Hash: 2150768673}}, Revision: 140},
-       {Event: model.Event{Type: "delete-operation", Key: "key6"}, Revision: 141},
        {Event: {Type: "put-operation", Key: "key8", Value: {Value: "121"}}, Revision: 141, ...},
        {Event: {Type: "put-operation", Key: "key9", Value: {Value: "122"}}, Revision: 141},
        ... // 134 identical elements
  }

    validate.go:49: Failed validating watch history, err: broke Reliable - a sequence of events will never drop any subsequence of events; if there are events ordered in time as a < b < c, then if the watch receives events a and c, it is guaranteed to receive b
    logger.go:146: 2024-06-07T11:31:24.960+0200 INFO    Validating serializable operations
    logger.go:146: 2024-06-07T11:31:24.967+0200 INFO    Saving visualization    {"path": "./src/go.etcd.io/etcd/tests/robustness/testdata/1717749910517706368/history.html"}

Screenshot showing a transaction for revision 141 and fact that key "key6" was present so delete really happened.
image

Watch response showing missing delete event
image

Complete log:

=== RUN   TestRobustnessExploratory/Etcd/LowTraffic/ClusterOfSize1
(./etcd_--version) (1897882): etcd Version: 3.6.0-alpha.0
(./etcd_--version) (1897882): Git SHA: 6f916d428
(./etcd_--version) (1897882): Go Version: go1.22.4
(./etcd_--version) (1897882): Go OS/Arch: linux/amd64
(./etcd_--version) (1897894): etcd Version: 3.6.0-alpha.0
(./etcd_--version) (1897894): Git SHA: 6f916d428
(./etcd_--version) (1897894): Go Version: go1.22.4
(./etcd_--version) (1897894): Go OS/Arch: linux/amd64
    logger.go:146: 2024-06-07T10:45:06.314+0200 INFO    starting server...      {"name": "member-0"}
    logger.go:146: 2024-06-07T10:45:06.314+0200 INFO    spawning process        {"args": ["./etcd", "--name=member-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=http://localhost:20001", "--initial-advertise-peer-urls=http://localhost:20001", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001", "--snapshot-count=1000", "--experimental-compaction-batch-limit=100", "--initial-cluster-token=new", "--election-timeout=521", "--experimental-snapshot-catchup-entries=100", "--experimental-watch-progress-notify-interval=100ms", "--snapshot-count=1000", "--heartbeat-interval=101", "--initial-cluster=member-0=http://localhost:20001", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory2482998486/001", "name": "member-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "ETCD_VERIFY=all", "EXPECT_DEBUG=true", "PATH=./pkg/mod/golang.org/toolchain@v0.0.1-go1.22.4.linux-amd64/bin:./.gvm/gos/go1.22/bin:./bin:./bin/:./.gvm/pkgsets/go1.22/global/bin:./.gvm/gos/go1.22/bin:./.gvm/pkgsets/go1.22/global/overlay/bin:./.gvm/bin:./.gvm/bin:./bin:/usr/lib/google-golang/bin:/usr/local/buildtools/java/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:./.local/bin:./bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
member-0 (1897904): {"level":"warn","ts":"2024-06-07T10:45:06.3237+0200","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_VERIFY=all"}
member-0 (1897904): {"level":"warn","ts":"2024-06-07T10:45:06.323787+0200","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_UNSUPPORTED_ARCH=amd64"}
member-0 (1897904): {"level":"warn","ts":"2024-06-07T10:45:06.32395+0200","caller":"embed/config.go:913","msg":"Running http and grpc server on single port. This is not recommended for production."}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.323991+0200","caller":"etcdmain/etcd.go:63","msg":"Running: ","args":["./etcd","--name=member-0","--listen-client-urls=http://localhost:20000","--advertise-client-urls=http://localhost:20000","--listen-peer-urls=http://localhost:20001","--initial-advertise-peer-urls=http://localhost:20001","--initial-cluster-token=new","--data-dir","/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001","--snapshot-count=1000","--experimental-compaction-batch-limit=100","--initial-cluster-token=new","--election-timeout=521","--experimental-snapshot-catchup-entries=100","--experimental-watch-progress-notify-interval=100ms","--snapshot-count=1000","--heartbeat-interval=101","--initial-cluster=member-0=http://localhost:20001","--initial-cluster-state=new"]}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.3244+0200","caller":"etcdmain/etcd.go:123","msg":"Initialize and start etcd server","data-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001","dir-type":"empty"}
member-0 (1897904): {"level":"warn","ts":"2024-06-07T10:45:06.324433+0200","caller":"embed/config.go:913","msg":"Running http and grpc server on single port. This is not recommended for production."}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.324445+0200","caller":"embed/etcd.go:125","msg":"configuring peer listeners","listen-peer-urls":["http://localhost:20001"]}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.324785+0200","caller":"embed/etcd.go:133","msg":"configuring client listeners","listen-client-urls":["http://localhost:20000"]}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.324963+0200","caller":"embed/etcd.go:314","msg":"starting an etcd server","etcd-version":"3.6.0-alpha.0","git-sha":"6f916d428","go-version":"go1.22.4","go-os":"linux","go-arch":"amd64","max-cpu-set":24,"max-cpu-available":24,"member-initialized":false,"name":"member-0","data-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001","wal-dir":"","wal-dir-dedicated":"","member-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member","force-new-cluster":false,"heartbeat-interval":"101ms","election-timeout":"505ms","initial-election-tick-advance":true,"snapshot-count":1000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":100,"initial-advertise-peer-urls":["http://localhost:20001"],"listen-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[],"experimental-set-member-localaddr":false,"experimental-local-address":"","cors":["*"],"host-whitelist":["*"],"initial-cluster":"member-0=http://localhost:20001","initial-cluster-state":"new","initial-cluster-token":"new","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","discovery-token":"","discovery-endpoints":"","discovery-dial-timeout":"2s","discovery-request-timeout":"5s","discovery-keepalive-time":"2s","discovery-keepalive-timeout":"6s","discovery-insecure-transport":true,"discovery-insecure-skip-tls-verify":false,"discovery-cert":"","discovery-key":"","discovery-cacert":"","discovery-user":"","downgrade-check-interval":"5s","max-learners":1}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.325529+0200","logger":"bbolt","caller":"backend/backend.go:197","msg":"Opening db file (/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap/db) with mode 2d72772d2d2d2d2d2d2d and with options: {Timeout: 0s, NoGrowSync: false, NoFreelistSync: true, PreLoadFreelist: false, FreelistType: hashmap, ReadOnly: false, MmapFlags: 8000, InitialMmapSize: 10737418240, PageSize: 0, NoSync: false, OpenFile: 0x0, Mlock: false, Logger: 0xc0003943b0}"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.332264+0200","logger":"bbolt","caller":"bbolt@v1.4.0-alpha.1/db.go:321","msg":"Opening bbolt db (/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap/db) successfully"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.3323+0200","caller":"storage/backend.go:80","msg":"opened backend db","path":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap/db","took":"6.84944ms"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.332419+0200","caller":"etcdserver/bootstrap.go:93","msg":"bootstrapping cluster"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.332446+0200","caller":"etcdserver/bootstrap.go:100","msg":"bootstrapping storage"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.345644+0200","caller":"etcdserver/bootstrap.go:108","msg":"bootstrapping raft"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.345682+0200","caller":"etcdserver/bootstrap.go:493","msg":"starting local member","local-member-id":"ca50e9357181d758","cluster-id":"34f27e83b3bc2ff"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.345701+0200","caller":"etcdserver/server.go:306","msg":"bootstrap successfully"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.345758+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:1920","msg":"ca50e9357181d758 switched to configuration voters=()"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.34579+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:870","msg":"ca50e9357181d758 became follower at term 0"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.345798+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:483","msg":"newRaft ca50e9357181d758 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.345809+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:870","msg":"ca50e9357181d758 became follower at term 1"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.34584+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:1920","msg":"ca50e9357181d758 switched to configuration voters=(14578408409545168728)"}
member-0 (1897904): {"level":"warn","ts":"2024-06-07T10:45:06.360383+0200","caller":"auth/store.go:1134","msg":"simple token is not cryptographically signed"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.364937+0200","caller":"mvcc/kvstore.go:404","msg":"kvstore restored","current-rev":1}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372639+0200","caller":"storage/quota.go:93","msg":"enabled backend quota with default value","quota-name":"v3-applier","quota-size-bytes":2147483648,"quota-size":"2.1 GB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372684+0200","caller":"etcdserver/server.go:585","msg":"starting etcd server","local-member-id":"ca50e9357181d758","local-server-version":"3.6.0-alpha.0","cluster-version":"to_be_decided"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372729+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap","suffix":"snap.db","max":5,"interval":"30s"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372781+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap","suffix":"snap","max":5,"interval":"30s"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372793+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/wal","suffix":"wal","max":5,"interval":"30s"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372768+0200","caller":"etcdserver/server.go:470","msg":"started as single-node; fast-forwarding election ticks","local-member-id":"ca50e9357181d758","forward-ticks":4,"forward-duration":"404ms","election-ticks":5,"election-timeout":"505ms"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372839+0200","caller":"embed/etcd.go:610","msg":"serving peer traffic","address":"127.0.0.1:20001"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372854+0200","caller":"embed/etcd.go:582","msg":"cmux::serve","address":"127.0.0.1:20001"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.37288+0200","caller":"embed/etcd.go:283","msg":"now serving peer/client/metrics","local-member-id":"ca50e9357181d758","initial-advertise-peer-urls":["http://localhost:20001"],"listen-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[]}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372974+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:1920","msg":"ca50e9357181d758 switched to configuration voters=(14578408409545168728)"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.3731+0200","caller":"membership/cluster.go:411","msg":"added member","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","added-peer-id":"ca50e9357181d758","added-peer-peer-urls":["http://localhost:20001"],"added-peer-is-learner":false}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.548218+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:956","msg":"ca50e9357181d758 is starting a new election at term 1"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.548313+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:899","msg":"ca50e9357181d758 became pre-candidate at term 1"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.548384+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:1043","msg":"ca50e9357181d758 received MsgPreVoteResp from ca50e9357181d758 at term 1"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.548402+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:1649","msg":"ca50e9357181d758 has received 1 MsgPreVoteResp votes and 0 vote rejections"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.548425+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:883","msg":"ca50e9357181d758 became candidate at term 2"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.559268+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:1043","msg":"ca50e9357181d758 received MsgVoteResp from ca50e9357181d758 at term 2"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.559325+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:1649","msg":"ca50e9357181d758 has received 1 MsgVoteResp votes and 0 vote rejections"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.559374+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:938","msg":"ca50e9357181d758 became leader at term 2"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.559391+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/node.go:370","msg":"raft.node: ca50e9357181d758 elected leader ca50e9357181d758 at term 2"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.561842+0200","caller":"etcdserver/server.go:2318","msg":"setting up initial cluster version using v3 API","cluster-version":"3.6"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.56395+0200","caller":"etcdserver/server.go:1774","msg":"published local member to cluster through raft","local-member-id":"ca50e9357181d758","local-member-attributes":"{Name:member-0 ClientURLs:[http://localhost:20000]}","cluster-id":"34f27e83b3bc2ff","publish-timeout":"6.01s"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.56395+0200","caller":"embed/serve.go:104","msg":"ready to serve client requests"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.564116+0200","caller":"etcdmain/main.go:44","msg":"notifying init daemon"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.564142+0200","caller":"etcdmain/main.go:50","msg":"successfully notified init daemon"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.564779+0200","caller":"v3rpc/health.go:62","msg":"grpc service status changed","service":"","status":"SERVING"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.565461+0200","caller":"membership/cluster.go:594","msg":"set initial cluster version","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","cluster-version":"3.6"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.565615+0200","caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.6"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.565717+0200","caller":"version/monitor.go:116","msg":"cluster version differs from storage version.","cluster-version":"3.6.0","storage-version":"3.5.0"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.565719+0200","caller":"etcdserver/server.go:2338","msg":"cluster version is updated","cluster-version":"3.6"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.56587+0200","caller":"schema/migration.go:65","msg":"updated storage version","new-storage-version":"3.6.0"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.567317+0200","caller":"embed/serve.go:188","msg":"serving client traffic insecurely; this is strongly discouraged!","traffic":"grpc+http","address":"127.0.0.1:20000"}
    logger.go:146: 2024-06-07T10:45:06.570+0200 INFO    started server. {"name": "member-0", "pid": 1897904}
    logger.go:146: 2024-06-07T10:45:06.577+0200 INFO    Start traffic
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.592935+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":3}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.593112+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":3,"took":"120.206µs","hash":1734659494,"current-db-size-bytes":20480,"current-db-size":"20 kB","current-db-size-in-use-bytes":16384,"current-db-size-in-use":"16 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.593139+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":1734659494,"revision":3,"compact-revision":-1}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.694305+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":37}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.700835+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":37,"took":"6.440584ms","hash":2304164051,"current-db-size-bytes":303104,"current-db-size":"303 kB","current-db-size-in-use-bytes":274432,"current-db-size-in-use":"274 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.700886+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":2304164051,"revision":37,"compact-revision":3}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.742881+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":58}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.749877+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":58,"took":"6.926064ms","hash":1646957282,"current-db-size-bytes":376832,"current-db-size":"377 kB","current-db-size-in-use-bytes":335872,"current-db-size-in-use":"336 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.749911+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":1646957282,"revision":58,"compact-revision":37}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.802825+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":77}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.813113+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":77,"took":"10.227838ms","hash":539014635,"current-db-size-bytes":450560,"current-db-size":"451 kB","current-db-size-in-use-bytes":417792,"current-db-size-in-use":"418 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.813147+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":539014635,"revision":77,"compact-revision":58}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.85343+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":85}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.857414+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":85,"took":"3.90014ms","hash":1467298815,"current-db-size-bytes":450560,"current-db-size":"451 kB","current-db-size-in-use-bytes":229376,"current-db-size-in-use":"229 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.857438+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":1467298815,"revision":85,"compact-revision":77}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.074676+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":97}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.079062+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":97,"took":"4.321109ms","hash":4113805127,"current-db-size-bytes":450560,"current-db-size":"451 kB","current-db-size-in-use-bytes":114688,"current-db-size-in-use":"115 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.079099+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":4113805127,"revision":97,"compact-revision":85}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.148722+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":99}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.153407+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":99,"took":"4.623808ms","hash":1249150654,"current-db-size-bytes":450560,"current-db-size":"451 kB","current-db-size-in-use-bytes":110592,"current-db-size-in-use":"111 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.153465+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":1249150654,"revision":99,"compact-revision":97}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.198642+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":104}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.202576+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":104,"took":"3.889339ms","hash":1699712947,"current-db-size-bytes":450560,"current-db-size":"451 kB","current-db-size-in-use-bytes":114688,"current-db-size-in-use":"115 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.202591+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":1699712947,"revision":104,"compact-revision":99}
    logger.go:146: 2024-06-07T10:45:07.573+0200 INFO    Triggering failpoint    {"failpoint": "raftBeforeLeaderSend=panic()"}
./etcdctl (./etcdctl_--endpoints=http://localhost:20000_--command-timeout=1.1s_get_0_-w_json_-w_json_-w_json) (1897931): {"header":{"cluster_id":238453183653593855,"member_id":14578408409545168728,"revision":139,"raft_term":2}}
./etcdctl (./etcdctl_--endpoints=http://localhost:20000_endpoint_status_-w_json) (1897941): [{"Endpoint":"http://localhost:20000","Status":{"header":{"cluster_id":238453183653593855,"member_id":14578408409545168728,"revision":140,"raft_term":2},"version":"3.6.0-alpha.0","dbSize":450560,"leader":14578408409545168728,"raftIndex":178,"raftTerm":2,"raftAppliedIndex":177,"dbSizeInUse":294912,"storageVersion":"3.6.0"}}]
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.601459+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":137}
    cluster.go:1066: members agree on a leader, members:map[14578408409545168728:0] , leader:14578408409545168728
    logger.go:146: 2024-06-07T10:45:07.608+0200 INFO    Setting up gofailpoint  {"failpoint": "raftBeforeLeaderSend=panic()"}
    logger.go:146: 2024-06-07T10:45:07.608+0200 INFO    Waiting for member to exit      {"member": "member-0"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.610769+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":137,"took":"9.25467ms","hash":583134963,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":368640,"current-db-size-in-use":"369 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.610806+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":583134963,"revision":137,"compact-revision":104}
member-0 (1897904): panic: failpoint panic: {}
member-0 (1897904): 
member-0 (1897904): goroutine 154 [running]:
member-0 (1897904): go.etcd.io/gofail/runtime.actPanic(0x10100000000?)
member-0 (1897904):        go.etcd.io/gofail@v0.1.0/runtime/terms.go:318 +0x65
member-0 (1897904): go.etcd.io/gofail/runtime.(*term).do(...)
member-0 (1897904):        go.etcd.io/gofail@v0.1.0/runtime/terms.go:290
member-0 (1897904): go.etcd.io/gofail/runtime.(*terms).eval(0xc00016fbc0?)
member-0 (1897904):        go.etcd.io/gofail@v0.1.0/runtime/terms.go:105 +0xe3
member-0 (1897904): go.etcd.io/gofail/runtime.(*Failpoint).Acquire(0xc0001f9c90?)
member-0 (1897904):        go.etcd.io/gofail@v0.1.0/runtime/failpoint.go:38 +0x98
member-0 (1897904): go.etcd.io/etcd/server/v3/etcdserver.(*raftNode).start.func1()
member-0 (1897904):        go.etcd.io/etcd/server/v3/etcdserver/raft.go:229 +0x716
member-0 (1897904): created by go.etcd.io/etcd/server/v3/etcdserver.(*raftNode).start in goroutine 168
member-0 (1897904):        go.etcd.io/etcd/server/v3/etcdserver/raft.go:166 +0x6f
    logger.go:146: 2024-06-07T10:45:07.617+0200 INFO    server exited   {"name": "member-0", "code": 2}
    logger.go:146: 2024-06-07T10:45:07.617+0200 INFO    Member exited as expected       {"member": "member-0"}
    logger.go:146: 2024-06-07T10:45:07.617+0200 INFO    starting server...      {"name": "member-0"}
    logger.go:146: 2024-06-07T10:45:07.617+0200 INFO    spawning process        {"args": ["./etcd", "--name=member-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=http://localhost:20001", "--initial-advertise-peer-urls=http://localhost:20001", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001", "--snapshot-count=1000", "--experimental-compaction-batch-limit=100", "--initial-cluster-token=new", "--election-timeout=521", "--experimental-snapshot-catchup-entries=100", "--experimental-watch-progress-notify-interval=100ms", "--snapshot-count=1000", "--heartbeat-interval=101", "--initial-cluster=member-0=http://localhost:20001", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory2482998486/001", "name": "member-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "ETCD_VERIFY=all", "EXPECT_DEBUG=true", "PATH=./pkg/mod/golang.org/toolchain@v0.0.1-go1.22.4.linux-amd64/bin:./.gvm/gos/go1.22/bin:./bin:./bin/:./.gvm/pkgsets/go1.22/global/bin:./.gvm/gos/go1.22/bin:./.gvm/pkgsets/go1.22/global/overlay/bin:./.gvm/bin:./.gvm/bin:./bin:/usr/lib/google-golang/bin:/usr/local/buildtools/java/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:./.local/bin:./bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
member-0 (1897950): {"level":"warn","ts":"2024-06-07T10:45:07.627613+0200","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_VERIFY=all"}
member-0 (1897950): {"level":"warn","ts":"2024-06-07T10:45:07.627706+0200","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_UNSUPPORTED_ARCH=amd64"}
member-0 (1897950): {"level":"warn","ts":"2024-06-07T10:45:07.627725+0200","caller":"embed/config.go:913","msg":"Running http and grpc server on single port. This is not recommended for production."}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.62775+0200","caller":"etcdmain/etcd.go:63","msg":"Running: ","args":["./etcd","--name=member-0","--listen-client-urls=http://localhost:20000","--advertise-client-urls=http://localhost:20000","--listen-peer-urls=http://localhost:20001","--initial-advertise-peer-urls=http://localhost:20001","--initial-cluster-token=new","--data-dir","/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001","--snapshot-count=1000","--experimental-compaction-batch-limit=100","--initial-cluster-token=new","--election-timeout=521","--experimental-snapshot-catchup-entries=100","--experimental-watch-progress-notify-interval=100ms","--snapshot-count=1000","--heartbeat-interval=101","--initial-cluster=member-0=http://localhost:20001","--initial-cluster-state=new"]}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.627828+0200","caller":"etcdmain/etcd.go:106","msg":"server has already been initialized","data-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001","dir-type":"member"}
member-0 (1897950): {"level":"warn","ts":"2024-06-07T10:45:07.627848+0200","caller":"embed/config.go:913","msg":"Running http and grpc server on single port. This is not recommended for production."}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.627856+0200","caller":"embed/etcd.go:125","msg":"configuring peer listeners","listen-peer-urls":["http://localhost:20001"]}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.628737+0200","caller":"embed/etcd.go:133","msg":"configuring client listeners","listen-client-urls":["http://localhost:20000"]}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.628956+0200","caller":"embed/etcd.go:314","msg":"starting an etcd server","etcd-version":"3.6.0-alpha.0","git-sha":"6f916d428","go-version":"go1.22.4","go-os":"linux","go-arch":"amd64","max-cpu-set":24,"max-cpu-available":24,"member-initialized":true,"name":"member-0","data-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001","wal-dir":"","wal-dir-dedicated":"","member-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member","force-new-cluster":false,"heartbeat-interval":"101ms","election-timeout":"505ms","initial-election-tick-advance":true,"snapshot-count":1000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":100,"initial-advertise-peer-urls":["http://localhost:20001"],"listen-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[],"experimental-set-member-localaddr":false,"experimental-local-address":"","cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","discovery-token":"","discovery-endpoints":"","discovery-dial-timeout":"2s","discovery-request-timeout":"5s","discovery-keepalive-time":"2s","discovery-keepalive-timeout":"6s","discovery-insecure-transport":true,"discovery-insecure-skip-tls-verify":false,"discovery-cert":"","discovery-key":"","discovery-cacert":"","discovery-user":"","downgrade-check-interval":"5s","max-learners":1}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.62947+0200","logger":"bbolt","caller":"backend/backend.go:197","msg":"Opening db file (/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap/db) with mode 2d72772d2d2d2d2d2d2d and with options: {Timeout: 0s, NoGrowSync: false, NoFreelistSync: true, PreLoadFreelist: false, FreelistType: hashmap, ReadOnly: false, MmapFlags: 8000, InitialMmapSize: 10737418240, PageSize: 0, NoSync: false, OpenFile: 0x0, Mlock: false, Logger: 0xc000126248}"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.636941+0200","logger":"bbolt","caller":"bbolt@v1.4.0-alpha.1/db.go:321","msg":"Opening bbolt db (/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap/db) successfully"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.637029+0200","caller":"storage/backend.go:80","msg":"opened backend db","path":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap/db","took":"7.640918ms"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.637902+0200","caller":"etcdserver/bootstrap.go:440","msg":"No snapshot found. Recovering WAL from scratch!"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.637928+0200","caller":"etcdserver/bootstrap.go:231","msg":"recovered v3 backend","backend-size-bytes":520192,"backend-size":"520 kB","backend-size-in-use-bytes":126976,"backend-size-in-use":"127 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.639787+0200","caller":"etcdserver/bootstrap.go:579","msg":"restarting local member","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","commit-index":178}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.639855+0200","caller":"etcdserver/bootstrap.go:93","msg":"bootstrapping cluster"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640008+0200","caller":"etcdserver/bootstrap.go:100","msg":"bootstrapping storage"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640301+0200","caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.6"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640316+0200","caller":"membership/cluster.go:291","msg":"recovered/added member from store","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","recovered-remote-peer-id":"ca50e9357181d758","recovered-remote-peer-urls":["http://localhost:20001"],"recovered-remote-peer-is-learner":false}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640327+0200","caller":"membership/cluster.go:301","msg":"set cluster version from store","cluster-version":"3.6"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640337+0200","caller":"etcdserver/bootstrap.go:108","msg":"bootstrapping raft"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640351+0200","caller":"etcdserver/server.go:306","msg":"bootstrap successfully"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640396+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:1920","msg":"ca50e9357181d758 switched to configuration voters=()"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640423+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:870","msg":"ca50e9357181d758 became follower at term 2"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640438+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:483","msg":"newRaft ca50e9357181d758 [peers: [], term: 2, commit: 178, applied: 0, lastindex: 179, lastterm: 2]"}
member-0 (1897950): {"level":"warn","ts":"2024-06-07T10:45:07.644606+0200","caller":"auth/store.go:1134","msg":"simple token is not cryptographically signed"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.648367+0200","caller":"mvcc/kvstore.go:330","msg":"restored last compact revision","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":137}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.648847+0200","caller":"mvcc/kvstore.go:404","msg":"kvstore restored","current-rev":140}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660157+0200","caller":"storage/quota.go:93","msg":"enabled backend quota with default value","quota-name":"v3-applier","quota-size-bytes":2147483648,"quota-size":"2.1 GB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660204+0200","caller":"etcdserver/server.go:576","msg":"starting etcd server","local-member-id":"ca50e9357181d758","local-server-version":"3.6.0-alpha.0","cluster-id":"34f27e83b3bc2ff","cluster-version":"3.6"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660292+0200","caller":"etcdserver/server.go:470","msg":"started as single-node; fast-forwarding election ticks","local-member-id":"ca50e9357181d758","forward-ticks":4,"forward-duration":"404ms","election-ticks":5,"election-timeout":"505ms"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660307+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap","suffix":"snap.db","max":5,"interval":"30s"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660366+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap","suffix":"snap","max":5,"interval":"30s"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660347+0200","caller":"embed/etcd.go:610","msg":"serving peer traffic","address":"127.0.0.1:20001"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660385+0200","caller":"embed/etcd.go:283","msg":"now serving peer/client/metrics","local-member-id":"ca50e9357181d758","initial-advertise-peer-urls":["http://localhost:20001"],"listen-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[]}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660397+0200","caller":"embed/etcd.go:582","msg":"cmux::serve","address":"127.0.0.1:20001"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660382+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/wal","suffix":"wal","max":5,"interval":"30s"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660464+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:1920","msg":"ca50e9357181d758 switched to configuration voters=(14578408409545168728)"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.66053+0200","caller":"membership/cluster.go:420","msg":"ignore already added member","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","added-peer-id":"ca50e9357181d758","added-peer-peer-urls":["http://localhost:20001"],"added-peer-is-learner":false}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660578+0200","caller":"membership/cluster.go:586","msg":"updated cluster version","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","from":"3.6","to":"3.6"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.944302+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:956","msg":"ca50e9357181d758 is starting a new election at term 2"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.944353+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:899","msg":"ca50e9357181d758 became pre-candidate at term 2"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.944416+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:1043","msg":"ca50e9357181d758 received MsgPreVoteResp from ca50e9357181d758 at term 2"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.94444+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:1649","msg":"ca50e9357181d758 has received 1 MsgPreVoteResp votes and 0 vote rejections"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.944485+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:883","msg":"ca50e9357181d758 became candidate at term 3"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.95921+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:1043","msg":"ca50e9357181d758 received MsgVoteResp from ca50e9357181d758 at term 3"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.959254+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:1649","msg":"ca50e9357181d758 has received 1 MsgVoteResp votes and 0 vote rejections"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.959286+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:938","msg":"ca50e9357181d758 became leader at term 3"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.9593+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/node.go:370","msg":"raft.node: ca50e9357181d758 elected leader ca50e9357181d758 at term 3"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.964178+0200","caller":"etcdserver/server.go:1774","msg":"published local member to cluster through raft","local-member-id":"ca50e9357181d758","local-member-attributes":"{Name:member-0 ClientURLs:[http://localhost:20000]}","cluster-id":"34f27e83b3bc2ff","publish-timeout":"6.01s"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.964176+0200","caller":"embed/serve.go:104","msg":"ready to serve client requests"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.964348+0200","caller":"etcdmain/main.go:44","msg":"notifying init daemon"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.96437+0200","caller":"etcdmain/main.go:50","msg":"successfully notified init daemon"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.964837+0200","caller":"v3rpc/health.go:62","msg":"grpc service status changed","service":"","status":"SERVING"}
    logger.go:146: 2024-06-07T10:45:07.966+0200 INFO    started server. {"name": "member-0", "pid": 1897950}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.966818+0200","caller":"embed/serve.go:188","msg":"serving client traffic insecurely; this is strongly discouraged!","traffic":"grpc+http","address":"127.0.0.1:20000"}
    logger.go:146: 2024-06-07T10:45:07.967+0200 INFO    Finished triggering failpoint   {"failpoint": "raftBeforeLeaderSend=panic()"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.640086+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":141}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.647928+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":141,"took":"7.766192ms","hash":361777187,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":258048,"current-db-size-in-use":"258 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.647968+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":361777187,"revision":141,"compact-revision":137}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.660836+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":149}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.668671+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":149,"took":"7.779358ms","hash":3929730471,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":262144,"current-db-size-in-use":"262 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.668707+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":3929730471,"revision":149,"compact-revision":141}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.679649+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":158}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.691774+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":158,"took":"12.045594ms","hash":1641924706,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":225280,"current-db-size-in-use":"225 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.691827+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":1641924706,"revision":158,"compact-revision":149}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.691851+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":159}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.699572+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":159,"took":"7.690631ms","hash":1830603898,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":147456,"current-db-size-in-use":"148 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.6996+0200","caller":"mvcc/kvstore.go:248","msg":"previous compaction was interrupted, skip storing compaction hash value"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.69961+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":161}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.703539+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":161,"took":"3.907723ms","hash":545936533,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":188416,"current-db-size-in-use":"188 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.703569+0200","caller":"mvcc/kvstore.go:248","msg":"previous compaction was interrupted, skip storing compaction hash value"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.726023+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":168}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.730396+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":168,"took":"4.324136ms","hash":4120988724,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":233472,"current-db-size-in-use":"234 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.730417+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":4120988724,"revision":168,"compact-revision":161}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.797393+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":213}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.801495+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":213,"took":"4.014924ms","hash":2299796790,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":319488,"current-db-size-in-use":"320 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.801512+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":2299796790,"revision":213,"compact-revision":168}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.822852+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":225}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.828806+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":225,"took":"5.900397ms","hash":3531019373,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":65536,"current-db-size-in-use":"66 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.828848+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":3531019373,"revision":225,"compact-revision":213}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.944933+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":240}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.948934+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":240,"took":"3.945123ms","hash":2274426623,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":69632,"current-db-size-in-use":"70 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.948977+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":2274426623,"revision":240,"compact-revision":225}
    logger.go:146: 2024-06-07T10:45:09.002+0200 INFO    Finished traffic
    logger.go:146: 2024-06-07T10:45:10.014+0200 INFO    Reporting complete traffic      {"successes": 678, "failures": 31, "successRate": 0.9562764456981664, "period": "2.425517855s", "qps": 279.52793610748336}
    logger.go:146: 2024-06-07T10:45:10.014+0200 INFO    Reporting traffic before failure injection      {"successes": 402, "failures": 0, "successRate": 1, "period": "996.95569ms", "qps": 403.22754966171067}
    logger.go:146: 2024-06-07T10:45:10.014+0200 INFO    Reporting traffic during failure injection      {"successes": 7, "failures": 14, "successRate": 0.3333333333333333, "period": "393.883383ms", "qps": 17.77175758643263}
    logger.go:146: 2024-06-07T10:45:10.014+0200 INFO    Reporting traffic after failure injection       {"successes": 269, "failures": 17, "successRate": 0.9405594405594405, "period": "1.034678782s", "qps": 259.98406914272647}
    logger.go:146: 2024-06-07T10:45:10.015+0200 INFO    Finished simulating traffic     {"max-revision": 248}
    logger.go:146: 2024-06-07T10:45:10.027+0200 INFO    Validating linearizable operations      {"timeout": "5m0s"}
    logger.go:146: 2024-06-07T10:45:10.502+0200 INFO    Linearization success   {"duration": "475.746216ms"}
    logger.go:146: 2024-06-07T10:45:10.503+0200 INFO    Validating watch
    logger.go:146: 2024-06-07T10:45:10.509+0200 ERROR   Broke watch guarantee   {"guarantee": "reliable", "client": 2, "diff": "  []model.PersistedEvent{\n  \t... // 154 identical elements\n  \t{Event: {Type: \"put-operation\", Key: \"key3\", Value: {Hash: 3071434148}}, Revision: 139},\n  \t{Event: {Type: \"put-operation\", Key: \"key5\", Value: {Hash: 2150768673}}, Revision: 140},\n- \t{Event: model.Event{Type: \"delete-operation\", Key: \"key6\"}, Revision: 141},\n  \t{Event: {Type: \"put-operation\", Key: \"key8\", Value: {Value: \"121\"}}, Revision: 141, ...},\n  \t{Event: {Type: \"put-operation\", Key: \"key9\", Value: {Value: \"122\"}}, Revision: 141},\n  \t... // 134 identical elements\n  }\n"}
    validate.go:49: Failed validating watch history, err: broke Reliable - a sequence of events will never drop any subsequence of events; if there are events ordered in time as a < b < c, then if the watch receives events a and c, it is guaranteed to receive b
    logger.go:146: 2024-06-07T10:45:10.509+0200 INFO    Validating serializable operations
    logger.go:146: 2024-06-07T10:45:10.517+0200 INFO    Saving robustness test report   {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368"}
    logger.go:146: 2024-06-07T10:45:10.517+0200 INFO    Saving member data dir  {"member": "member-0", "path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/server-member-0"}
    logger.go:146: 2024-06-07T10:45:10.518+0200 INFO    no watch operations for client, skip persisting {"client-id": 1}
    logger.go:146: 2024-06-07T10:45:10.518+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-1/operations.json"}
    logger.go:146: 2024-06-07T10:45:10.519+0200 INFO    Saving watch operations {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-2/watch.json"}
    logger.go:146: 2024-06-07T10:45:10.519+0200 INFO    no KV operations for client, skip persisting    {"client-id": 2}
    logger.go:146: 2024-06-07T10:45:10.519+0200 INFO    no watch operations for client, skip persisting {"client-id": 3}
    logger.go:146: 2024-06-07T10:45:10.519+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-3/operations.json"}
    logger.go:146: 2024-06-07T10:45:10.520+0200 INFO    no watch operations for client, skip persisting {"client-id": 4}
    logger.go:146: 2024-06-07T10:45:10.520+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-4/operations.json"}
    logger.go:146: 2024-06-07T10:45:10.522+0200 INFO    no watch operations for client, skip persisting {"client-id": 5}
    logger.go:146: 2024-06-07T10:45:10.522+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-5/operations.json"}
    logger.go:146: 2024-06-07T10:45:10.523+0200 INFO    no watch operations for client, skip persisting {"client-id": 6}
    logger.go:146: 2024-06-07T10:45:10.523+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-6/operations.json"}
    logger.go:146: 2024-06-07T10:45:10.524+0200 INFO    no watch operations for client, skip persisting {"client-id": 7}
    logger.go:146: 2024-06-07T10:45:10.524+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-7/operations.json"}
    logger.go:146: 2024-06-07T10:45:10.525+0200 INFO    no watch operations for client, skip persisting {"client-id": 8}
    logger.go:146: 2024-06-07T10:45:10.525+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-8/operations.json"}
    logger.go:146: 2024-06-07T10:45:10.526+0200 INFO    no watch operations for client, skip persisting {"client-id": 9}
    logger.go:146: 2024-06-07T10:45:10.526+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-9/operations.json"}
    logger.go:146: 2024-06-07T10:45:10.527+0200 INFO    no watch operations for client, skip persisting {"client-id": 10}
    logger.go:146: 2024-06-07T10:45:10.527+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-10/operations.json"}
    logger.go:146: 2024-06-07T10:45:10.528+0200 INFO    Saving visualization    {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/history.html"}
    logger.go:146: 2024-06-07T10:45:10.583+0200 INFO    killing server...       {"name": "member-0"}
    logger.go:146: 2024-06-07T10:45:10.583+0200 INFO    stopping server...      {"name": "member-0"}
    logger.go:146: 2024-06-07T10:45:10.588+0200 INFO    stopped server. {"name": "member-0"}

@serathius serathius added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Jun 7, 2024
@serathius
Copy link
Member Author

@serathius
Copy link
Member Author

serathius commented Jun 7, 2024

@fuweid Think this is related to #17780

See the following logs:


member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.648367+0200","caller":"mvcc/kvstore.go:330","msg":"restored last compact revision","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":137}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.648847+0200","caller":"mvcc/kvstore.go:404","msg":"kvstore restored","current-rev":140}
...
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.966818+0200","caller":"embed/serve.go:188","msg":"serving client traffic insecurely; this is strongly discouraged!","traffic":"grpc+http","address":"127.0.0.1:20000"}
    logger.go:146: 2024-06-07T10:45:07.967+0200 INFO    Finished triggering failpoint   {"failpoint": "raftBeforeLeaderSend=panic()"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.640086+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":141}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.647928+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":141,"took":"7.766192ms","hash":361777187,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":258048,"current-db-size-in-use":"258 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.647968+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":361777187,"revision":141,"compact-revision":137}

Looks like we lost first delete after restoring from restoring from "last compact revision".
Off by one error on tombstone?

@fuweid
Copy link
Member

fuweid commented Jun 7, 2024

Yes. Let me try to reproduce it in my local. Update it later.

@serathius serathius changed the title [robustness tests] Watch missed a delete event from multi TXN [robustness tests] Watch dropping a delete event when resuming from crash Jun 7, 2024
@serathius
Copy link
Member Author

serathius commented Jun 7, 2024

I managed to reproduce it without multi TXN, just a delete request. This extends the impact to Kubernetes.

Again the common thing is that etcd misses a watch even if it restores kvstore on revision before a delete.

In last repro:

    logger.go:146: 2024-06-07T14:46:59.319+0200 ERROR   Broke watch guarantee   {"guarantee": "reliable", "client": 1}
  []model.PersistedEvent{
        ... // 160 identical elements
        {Event: {Type: "delete-operation", Key: "key6"}, Revision: 140},
        {Event: {Type: "put-operation", Key: "key9", Value: {Value: "138"}}, Revision: 141, ...},
-       {Event: model.Event{Type: "delete-operation", Key: "key7"}, Revision: 142},
        {Event: {Type: "put-operation", Key: "key3", Value: {Value: "147"}}, Revision: 143},
        {Event: {Type: "put-operation", Key: "key3", Value: {Value: "148"}}, Revision: 144},
        ... // 79 identical elements
  }

etcd bootstrap log:

{"level":"info","ts":"2024-06-07T14:46:56.832687+0200","caller":"mvcc/kvstore.go:404","msg":"kvstore restored","current-rev":141}

@serathius
Copy link
Member Author

serathius commented Jun 7, 2024

Something similar reproduced on v3.4, however this time it was long after crash, and it flagged broken resumable properly. Which just means it provided incorrect event on a newly opened watch which provided revision.

=== RUN   TestRobustnessExploratory/Etcd/LowTraffic/ClusterOfSize1
    logger.go:146: 2024-06-07T15:10:05.793+0200 INFO    starting server...      {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
    logger.go:146: 2024-06-07T15:10:05.793+0200 INFO    spawning process        {"args": ["/tmp/etcd-release-3.4-failpoints/bin/etcd", "--name=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=http://localhost:20001", "--initial-advertise-peer-urls=http://localhost:20001", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001", "--snapshot-count=1000", "--snapshot-count=1000", "--initial-cluster-token=new", "--experimental-compaction-batch-limit=100", "--experimental-watch-progress-notify-interval=100ms", "--initial-cluster=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0=http://localhost:20001", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory1937017586/001", "name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "ETCD_VERIFY=all", "EXPECT_DEBUG=true", "PATH=./pkg/mod/golang.org/toolchain@v0.0.1-go1.22.4.linux-amd64/bin:./.gvm/gos/go1.22/bin:./bin:./bin/:./.gvm/pkgsets/go1.22/global/bin:./.gvm/gos/go1.22/bin:./.gvm/pkgsets/go1.22/global/overlay/bin:./.gvm/bin:./.gvm/bin:./bin:/usr/lib/google-golang/bin:/usr/local/buildtools/java/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:./.local/bin:./bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
member-0 (2063001): WARNING: Package "github.com/golang/protobuf/protoc-gen-go/generator" is deprecated.
member-0 (2063001):     A future release of golang/protobuf will delete this package,
member-0 (2063001):     which has long been excluded from the compatibility promise.
member-0 (2063001): 
member-0 (2063001): 2024-06-07 15:10:05.801650 W | pkg/flags: unrecognized environment variable ETCD_VERIFY=all
member-0 (2063001): 2024-06-07 15:10:05.801710 W | pkg/flags: unrecognized environment variable ETCD_UNSUPPORTED_ARCH=amd64
member-0 (2063001): [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
member-0 (2063001): 2024-06-07 15:10:05.801731 W | embed: Running http and grpc server on single port. This is not recommended for production.
member-0 (2063001): 2024-06-07 15:10:05.801737 I | etcdmain: etcd Version: 3.4.32
member-0 (2063001): 2024-06-07 15:10:05.801741 I | etcdmain: Git SHA: 88ff600-FAILPOINTS
member-0 (2063001): 2024-06-07 15:10:05.801745 I | etcdmain: Go Version: go1.22.0
member-0 (2063001): 2024-06-07 15:10:05.801750 I | etcdmain: Go OS/Arch: linux/amd64
member-0 (2063001): 2024-06-07 15:10:05.801755 I | etcdmain: setting maximum number of CPUs to 24, total number of available CPUs is 24
member-0 (2063001): [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
member-0 (2063001): 2024-06-07 15:10:05.802103 W | embed: Running http and grpc server on single port. This is not recommended for production.
member-0 (2063001): 2024-06-07 15:10:05.802482 I | embed: name = TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0
member-0 (2063001): 2024-06-07 15:10:05.802489 I | embed: data dir = /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001
member-0 (2063001): 2024-06-07 15:10:05.802494 I | embed: member dir = /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member
member-0 (2063001): 2024-06-07 15:10:05.802500 I | embed: heartbeat = 100ms
member-0 (2063001): 2024-06-07 15:10:05.802505 I | embed: election = 1000ms
member-0 (2063001): 2024-06-07 15:10:05.802510 I | embed: snapshot count = 1000
member-0 (2063001): 2024-06-07 15:10:05.802521 I | embed: advertise client URLs = http://localhost:20000
member-0 (2063001): 2024-06-07 15:10:05.823121 I | etcdserver: starting member ca50e9357181d758 in cluster 34f27e83b3bc2ff
member-0 (2063001): raft2024/06/07 15:10:05 INFO: ca50e9357181d758 switched to configuration voters=()
member-0 (2063001): raft2024/06/07 15:10:05 INFO: ca50e9357181d758 became follower at term 0
member-0 (2063001): raft2024/06/07 15:10:05 INFO: newRaft ca50e9357181d758 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
member-0 (2063001): raft2024/06/07 15:10:05 INFO: ca50e9357181d758 became follower at term 1
member-0 (2063001): raft2024/06/07 15:10:05 INFO: ca50e9357181d758 switched to configuration voters=(14578408409545168728)
member-0 (2063001): 2024-06-07 15:10:05.837990 W | auth: simple token is not cryptographically signed
member-0 (2063001): 2024-06-07 15:10:05.849603 I | etcdserver: starting server... [version: 3.4.32, cluster version: to_be_decided]
member-0 (2063001): 2024-06-07 15:10:05.849682 I | etcdserver: ca50e9357181d758 as single-node; fast-forwarding 9 ticks (election ticks 10)
member-0 (2063001): 2024-06-07 15:10:05.849979 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member/snap, suffix: snap.db, max: 5, interval: 30s
member-0 (2063001): 2024-06-07 15:10:05.850015 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member/snap, suffix: snap, max: 5, interval: 30s
member-0 (2063001): 2024-06-07 15:10:05.850029 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member/wal, suffix: wal, max: 5, interval: 30s
member-0 (2063001): 2024-06-07 15:10:05.851460 I | embed: listening for peers on 127.0.0.1:20001
member-0 (2063001): raft2024/06/07 15:10:05 INFO: ca50e9357181d758 switched to configuration voters=(14578408409545168728)
member-0 (2063001): 2024-06-07 15:10:05.852075 I | etcdserver/membership: added member ca50e9357181d758 [http://localhost:20001] to cluster 34f27e83b3bc2ff
member-0 (2063001): raft2024/06/07 15:10:06 INFO: ca50e9357181d758 is starting a new election at term 1
member-0 (2063001): raft2024/06/07 15:10:06 INFO: ca50e9357181d758 became candidate at term 2
member-0 (2063001): raft2024/06/07 15:10:06 INFO: ca50e9357181d758 received MsgVoteResp from ca50e9357181d758 at term 2
member-0 (2063001): raft2024/06/07 15:10:06 INFO: ca50e9357181d758 became leader at term 2
member-0 (2063001): raft2024/06/07 15:10:06 INFO: raft.node: ca50e9357181d758 elected leader ca50e9357181d758 at term 2
member-0 (2063001): 2024-06-07 15:10:06.425669 I | etcdserver: setting up the initial cluster version to 3.4
member-0 (2063001): 2024-06-07 15:10:06.427632 I | embed: ready to serve client requests
member-0 (2063001): 2024-06-07 15:10:06.427648 I | etcdserver: published {Name:TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0 ClientURLs:[http://localhost:20000]} to cluster 34f27e83b3bc2ff
member-0 (2063001): 2024-06-07 15:10:06.428428 N | embed: serving insecure client requests on 127.0.0.1:20000, this is strongly discouraged!
member-0 (2063001): 2024-06-07 15:10:06.429334 N | etcdserver/membership: set the initial cluster version to 3.4
member-0 (2063001): 2024-06-07 15:10:06.429387 I | etcdserver/api: enabled capabilities for version 3.4
    logger.go:146: 2024-06-07T15:10:06.436+0200 INFO    started server. {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "pid": 2063001}
    logger.go:146: 2024-06-07T15:10:06.443+0200 INFO    Start traffic
member-0 (2063001): 2024-06-07 15:10:06.463482 I | mvcc: store.index: compact 10
member-0 (2063001): 2024-06-07 15:10:06.467626 I | mvcc: finished scheduled compaction at 10 (took 4.101548ms)
member-0 (2063001): 2024-06-07 15:10:06.485280 I | mvcc: store.index: compact 21
member-0 (2063001): 2024-06-07 15:10:06.491005 I | mvcc: finished scheduled compaction at 21 (took 5.681245ms)
member-0 (2063001): 2024-06-07 15:10:06.499354 I | mvcc: store.index: compact 30
member-0 (2063001): 2024-06-07 15:10:06.503435 I | mvcc: finished scheduled compaction at 30 (took 4.021316ms)
member-0 (2063001): 2024-06-07 15:10:06.516624 I | mvcc: store.index: compact 36
member-0 (2063001): 2024-06-07 15:10:06.522273 I | mvcc: finished scheduled compaction at 36 (took 5.604531ms)
member-0 (2063001): 2024-06-07 15:10:06.531697 I | mvcc: store.index: compact 43
member-0 (2063001): 2024-06-07 15:10:06.544126 I | mvcc: finished scheduled compaction at 43 (took 12.389592ms)
member-0 (2063001): 2024-06-07 15:10:06.544147 I | mvcc: store.index: compact 45
member-0 (2063001): 2024-06-07 15:10:06.549828 I | mvcc: finished scheduled compaction at 45 (took 5.663001ms)
member-0 (2063001): 2024-06-07 15:10:06.565709 I | mvcc: store.index: compact 52
member-0 (2063001): 2024-06-07 15:10:06.569921 I | mvcc: finished scheduled compaction at 52 (took 4.110444ms)
member-0 (2063001): 2024-06-07 15:10:06.576837 I | mvcc: store.index: compact 58
member-0 (2063001): 2024-06-07 15:10:06.580559 I | mvcc: finished scheduled compaction at 58 (took 3.700063ms)
member-0 (2063001): 2024-06-07 15:10:06.593001 I | mvcc: store.index: compact 63
member-0 (2063001): 2024-06-07 15:10:06.598580 I | mvcc: finished scheduled compaction at 63 (took 5.512127ms)
member-0 (2063001): 2024-06-07 15:10:06.610192 I | mvcc: store.index: compact 74
member-0 (2063001): 2024-06-07 15:10:06.610655 W | etcdserver: failed to apply request "compaction:<revision:74 > header:<ID:15517310789174624164 > " with response "" took (2.796µs) to execute, err is mvcc: required revision has been compacted
member-0 (2063001): 2024-06-07 15:10:06.614154 I | mvcc: finished scheduled compaction at 74 (took 3.943601ms)
member-0 (2063001): 2024-06-07 15:10:06.628541 I | mvcc: store.index: compact 76
member-0 (2063001): 2024-06-07 15:10:06.632295 I | mvcc: finished scheduled compaction at 76 (took 3.736361ms)
member-0 (2063001): 2024-06-07 15:10:06.638518 I | mvcc: store.index: compact 79
member-0 (2063001): 2024-06-07 15:10:06.642166 I | mvcc: finished scheduled compaction at 79 (took 3.625864ms)
member-0 (2063001): 2024-06-07 15:10:06.695912 I | mvcc: store.index: compact 81
member-0 (2063001): 2024-06-07 15:10:06.699766 I | mvcc: finished scheduled compaction at 81 (took 3.833413ms)
member-0 (2063001): 2024-06-07 15:10:06.811286 I | mvcc: store.index: compact 88
member-0 (2063001): 2024-06-07 15:10:06.815069 I | mvcc: finished scheduled compaction at 88 (took 3.751339ms)
member-0 (2063001): 2024-06-07 15:10:06.926055 I | mvcc: store.index: compact 98
member-0 (2063001): 2024-06-07 15:10:06.929943 I | mvcc: finished scheduled compaction at 98 (took 3.865914ms)
member-0 (2063001): 2024-06-07 15:10:07.006149 I | mvcc: store.index: compact 104
member-0 (2063001): 2024-06-07 15:10:07.009806 I | mvcc: finished scheduled compaction at 104 (took 3.640982ms)
member-0 (2063001): 2024-06-07 15:10:07.047942 I | mvcc: store.index: compact 105
member-0 (2063001): 2024-06-07 15:10:07.051740 I | mvcc: finished scheduled compaction at 105 (took 3.764455ms)
member-0 (2063001): 2024-06-07 15:10:07.161575 I | mvcc: store.index: compact 113
member-0 (2063001): 2024-06-07 15:10:07.165496 I | mvcc: finished scheduled compaction at 113 (took 3.891012ms)
member-0 (2063001): 2024-06-07 15:10:07.169287 I | mvcc: store.index: compact 114
member-0 (2063001): 2024-06-07 15:10:07.173334 I | mvcc: finished scheduled compaction at 114 (took 4.008702ms)
member-0 (2063001): 2024-06-07 15:10:07.177192 I | mvcc: store.index: compact 115
member-0 (2063001): 2024-06-07 15:10:07.177233 I | mvcc: finished scheduled compaction at 115 (took 15.309µs)
member-0 (2063001): 2024-06-07 15:10:07.336646 I | mvcc: store.index: compact 125
member-0 (2063001): 2024-06-07 15:10:07.340373 I | mvcc: finished scheduled compaction at 125 (took 3.708237ms)
member-0 (2063001): 2024-06-07 15:10:07.396406 I | mvcc: store.index: compact 129
member-0 (2063001): 2024-06-07 15:10:07.400353 I | mvcc: finished scheduled compaction at 129 (took 3.92806ms)
member-0 (2063001): 2024-06-07 15:10:07.421834 I | mvcc: store.index: compact 131
member-0 (2063001): 2024-06-07 15:10:07.426046 I | mvcc: finished scheduled compaction at 131 (took 4.182771ms)
member-0 (2063001): 2024-06-07 15:10:07.432261 W | etcdserver: failed to apply request "compaction:<revision:131 > header:<ID:15517310789174624338 > " with response "" took (9.297µs) to execute, err is mvcc: required revision has been compacted
    logger.go:146: 2024-06-07T15:10:07.439+0200 INFO    Triggering failpoint    {"failpoint": "raftBeforeLeaderSend=panic()"}
/tmp/etcd-release-3.4-failpoints/bin/etcdctl (/tmp/etcd-release-3.4-failpoints/bin/etcdctl_--endpoints=http://localhost:20000_--command-timeout=1.1s_get_0_-w_json_-w_json_-w_json) (2063029): {"header":{"cluster_id":238453183653593855,"member_id":14578408409545168728,"revision":136,"raft_term":2}}
member-0 (2063001): 2024-06-07 15:10:07.470978 I | mvcc: store.index: compact 135
/tmp/etcd-release-3.4-failpoints/bin/etcdctl (/tmp/etcd-release-3.4-failpoints/bin/etcdctl_--endpoints=http://localhost:20000_endpoint_status_-w_json) (2063037): [{"Endpoint":"http://localhost:20000","Status":{"header":{"cluster_id":238453183653593855,"member_id":14578408409545168728,"revision":137,"raft_term":2},"version":"3.4.32","dbSize":24576,"leader":14578408409545168728,"raftIndex":171,"raftTerm":2,"raftAppliedIndex":171,"dbSizeInUse":16384}}]
    cluster.go:1066: members agree on a leader, members:map[14578408409545168728:0] , leader:14578408409545168728
    logger.go:146: 2024-06-07T15:10:07.472+0200 INFO    Setting up gofailpoint  {"failpoint": "raftBeforeLeaderSend=panic()"}
    logger.go:146: 2024-06-07T15:10:07.472+0200 INFO    Waiting for member to exit      {"member": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
member-0 (2063001): 2024-06-07 15:10:07.475266 I | mvcc: finished scheduled compaction at 135 (took 4.25666ms)
member-0 (2063001): panic: failpoint panic: {}
member-0 (2063001): 
member-0 (2063001): goroutine 120 [running]:
member-0 (2063001): go.etcd.io/gofail/runtime.actPanic(0x0?)
member-0 (2063001):     ./pkg/mod/go.etcd.io/gofail@v0.1.0/runtime/terms.go:318 +0x65
member-0 (2063001): go.etcd.io/gofail/runtime.(*term).do(...)
member-0 (2063001):     ./pkg/mod/go.etcd.io/gofail@v0.1.0/runtime/terms.go:290
member-0 (2063001): go.etcd.io/gofail/runtime.(*terms).eval(0xc0000bcde0?)
member-0 (2063001):     ./pkg/mod/go.etcd.io/gofail@v0.1.0/runtime/terms.go:105 +0xe3
member-0 (2063001): go.etcd.io/gofail/runtime.(*Failpoint).Acquire(0xc00051bba0?)
member-0 (2063001):     ./pkg/mod/go.etcd.io/gofail@v0.1.0/runtime/failpoint.go:38 +0x98
member-0 (2063001): go.etcd.io/etcd/etcdserver.(*raftNode).start.func1()
member-0 (2063001):     /tmp/etcd-release-3.4-failpoints/etcdserver/raft.go:242 +0xae5
member-0 (2063001): created by go.etcd.io/etcd/etcdserver.(*raftNode).start in goroutine 109
member-0 (2063001):     /tmp/etcd-release-3.4-failpoints/etcdserver/raft.go:165 +0x6f
    logger.go:146: 2024-06-07T15:10:07.479+0200 INFO    server exited   {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "code": 2}
    logger.go:146: 2024-06-07T15:10:07.479+0200 INFO    Member exited as expected       {"member": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
    logger.go:146: 2024-06-07T15:10:07.479+0200 INFO    starting server...      {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
    logger.go:146: 2024-06-07T15:10:07.479+0200 INFO    spawning process        {"args": ["/tmp/etcd-release-3.4-failpoints/bin/etcd", "--name=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=http://localhost:20001", "--initial-advertise-peer-urls=http://localhost:20001", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001", "--snapshot-count=1000", "--snapshot-count=1000", "--initial-cluster-token=new", "--experimental-compaction-batch-limit=100", "--experimental-watch-progress-notify-interval=100ms", "--initial-cluster=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0=http://localhost:20001", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory1937017586/001", "name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "ETCD_VERIFY=all", "EXPECT_DEBUG=true", "PATH=./pkg/mod/golang.org/toolchain@v0.0.1-go1.22.4.linux-amd64/bin:./.gvm/gos/go1.22/bin:./bin:./bin/:./.gvm/pkgsets/go1.22/global/bin:./.gvm/gos/go1.22/bin:./.gvm/pkgsets/go1.22/global/overlay/bin:./.gvm/bin:./.gvm/bin:./bin:/usr/lib/google-golang/bin:/usr/local/buildtools/java/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:./.local/bin:./bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
member-0 (2063048): WARNING: Package "github.com/golang/protobuf/protoc-gen-go/generator" is deprecated.
member-0 (2063048):     A future release of golang/protobuf will delete this package,
member-0 (2063048):     which has long been excluded from the compatibility promise.
member-0 (2063048): 
member-0 (2063048): 2024-06-07 15:10:07.488960 W | pkg/flags: unrecognized environment variable ETCD_VERIFY=all
member-0 (2063048): 2024-06-07 15:10:07.489016 W | pkg/flags: unrecognized environment variable ETCD_UNSUPPORTED_ARCH=amd64
member-0 (2063048): [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
member-0 (2063048): 2024-06-07 15:10:07.489034 W | embed: Running http and grpc server on single port. This is not recommended for production.
member-0 (2063048): 2024-06-07 15:10:07.489040 I | etcdmain: etcd Version: 3.4.32
member-0 (2063048): 2024-06-07 15:10:07.489044 I | etcdmain: Git SHA: 88ff600-FAILPOINTS
member-0 (2063048): 2024-06-07 15:10:07.489048 I | etcdmain: Go Version: go1.22.0
member-0 (2063048): 2024-06-07 15:10:07.489052 I | etcdmain: Go OS/Arch: linux/amd64
member-0 (2063048): 2024-06-07 15:10:07.489056 I | etcdmain: setting maximum number of CPUs to 24, total number of available CPUs is 24
member-0 (2063048): 2024-06-07 15:10:07.489120 N | etcdmain: the server is already initialized as member before, starting as etcd member...
member-0 (2063048): [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
member-0 (2063048): 2024-06-07 15:10:07.489130 W | embed: Running http and grpc server on single port. This is not recommended for production.
member-0 (2063048): 2024-06-07 15:10:07.489546 I | embed: name = TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0
member-0 (2063048): 2024-06-07 15:10:07.489553 I | embed: data dir = /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001
member-0 (2063048): 2024-06-07 15:10:07.489558 I | embed: member dir = /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member
member-0 (2063048): 2024-06-07 15:10:07.489562 I | embed: heartbeat = 100ms
member-0 (2063048): 2024-06-07 15:10:07.489566 I | embed: election = 1000ms
member-0 (2063048): 2024-06-07 15:10:07.489571 I | embed: snapshot count = 1000
member-0 (2063048): 2024-06-07 15:10:07.489581 I | embed: advertise client URLs = http://localhost:20000
member-0 (2063048): 2024-06-07 15:10:07.489586 I | embed: initial advertise peer URLs = http://localhost:20001
member-0 (2063048): 2024-06-07 15:10:07.489591 I | embed: initial cluster = 
member-0 (2063048): 2024-06-07 15:10:07.498379 I | etcdserver: restarting member ca50e9357181d758 in cluster 34f27e83b3bc2ff at commit index 172
member-0 (2063048): raft2024/06/07 15:10:07 INFO: ca50e9357181d758 switched to configuration voters=()
member-0 (2063048): raft2024/06/07 15:10:07 INFO: ca50e9357181d758 became follower at term 2
member-0 (2063048): raft2024/06/07 15:10:07 INFO: newRaft ca50e9357181d758 [peers: [], term: 2, commit: 172, applied: 0, lastindex: 172, lastterm: 2]
member-0 (2063048): 2024-06-07 15:10:07.501601 W | auth: simple token is not cryptographically signed
member-0 (2063048): 2024-06-07 15:10:07.505904 I | mvcc: restore compact to 135
member-0 (2063048): 2024-06-07 15:10:07.508923 I | etcdserver: starting server... [version: 3.4.32, cluster version: to_be_decided]
member-0 (2063048): 2024-06-07 15:10:07.509090 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member/snap, suffix: snap.db, max: 5, interval: 30s
member-0 (2063048): 2024-06-07 15:10:07.509121 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member/snap, suffix: snap, max: 5, interval: 30s
member-0 (2063048): 2024-06-07 15:10:07.509492 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member/wal, suffix: wal, max: 5, interval: 30s
member-0 (2063048): raft2024/06/07 15:10:07 INFO: ca50e9357181d758 switched to configuration voters=(14578408409545168728)
member-0 (2063048): 2024-06-07 15:10:07.509806 I | etcdserver/membership: added member ca50e9357181d758 [http://localhost:20001] to cluster 34f27e83b3bc2ff
member-0 (2063048): 2024-06-07 15:10:07.509871 N | etcdserver/membership: set the initial cluster version to 3.4
member-0 (2063048): 2024-06-07 15:10:07.509893 I | etcdserver/api: enabled capabilities for version 3.4
member-0 (2063048): 2024-06-07 15:10:07.510650 W | etcdserver: failed to apply request "compaction:<revision:135 > header:<ID:15517310789174624347 > " with response "" took (8.226µs) to execute, err is mvcc: required revision has been compacted
member-0 (2063048): 2024-06-07 15:10:07.511244 I | embed: listening for peers on 127.0.0.1:20001
member-0 (2063048): raft2024/06/07 15:10:08 INFO: ca50e9357181d758 is starting a new election at term 2
member-0 (2063048): raft2024/06/07 15:10:08 INFO: ca50e9357181d758 became candidate at term 3
member-0 (2063048): raft2024/06/07 15:10:08 INFO: ca50e9357181d758 received MsgVoteResp from ca50e9357181d758 at term 3
member-0 (2063048): raft2024/06/07 15:10:08 INFO: ca50e9357181d758 became leader at term 3
member-0 (2063048): raft2024/06/07 15:10:08 INFO: raft.node: ca50e9357181d758 elected leader ca50e9357181d758 at term 3
member-0 (2063048): 2024-06-07 15:10:08.716725 I | embed: ready to serve client requests
member-0 (2063048): 2024-06-07 15:10:08.716809 I | etcdserver: published {Name:TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0 ClientURLs:[http://localhost:20000]} to cluster 34f27e83b3bc2ff
member-0 (2063048): 2024-06-07 15:10:08.718833 N | embed: serving insecure client requests on 127.0.0.1:20000, this is strongly discouraged!
    logger.go:146: 2024-06-07T15:10:08.723+0200 INFO    started server. {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "pid": 2063048}
    logger.go:146: 2024-06-07T15:10:08.724+0200 INFO    Finished triggering failpoint   {"failpoint": "raftBeforeLeaderSend=panic()"}
member-0 (2063048): 2024-06-07 15:10:08.734129 I | mvcc: store.index: compact 138
member-0 (2063048): 2024-06-07 15:10:08.734171 W | etcdserver: failed to apply request "compaction:<revision:138 > header:<ID:15517310789175053071 > " with response "" took (1.643µs) to execute, err is mvcc: required revision has been compacted
member-0 (2063048): 2024-06-07 15:10:08.737837 I | mvcc: finished scheduled compaction at 138 (took 3.684213ms)
member-0 (2063048): 2024-06-07 15:10:08.749556 I | mvcc: store.index: compact 145
member-0 (2063048): 2024-06-07 15:10:08.755487 I | mvcc: finished scheduled compaction at 145 (took 5.872415ms)
member-0 (2063048): 2024-06-07 15:10:08.785454 I | mvcc: store.index: compact 180
member-0 (2063048): 2024-06-07 15:10:08.793336 I | mvcc: finished scheduled compaction at 180 (took 7.56219ms)
member-0 (2063048): 2024-06-07 15:10:08.801507 I | mvcc: store.index: compact 183
member-0 (2063048): 2024-06-07 15:10:08.807138 I | mvcc: finished scheduled compaction at 183 (took 5.584654ms)
member-0 (2063048): 2024-06-07 15:10:08.811544 I | mvcc: store.index: compact 188
member-0 (2063048): 2024-06-07 15:10:08.817315 I | mvcc: finished scheduled compaction at 188 (took 5.727472ms)
member-0 (2063048): 2024-06-07 15:10:08.823976 I | mvcc: store.index: compact 195
member-0 (2063048): 2024-06-07 15:10:08.823985 W | etcdserver: failed to apply request "compaction:<revision:195 > header:<ID:15517310789175053189 > " with response "" took (2.235µs) to execute, err is mvcc: required revision has been compacted
member-0 (2063048): 2024-06-07 15:10:08.827999 I | mvcc: finished scheduled compaction at 195 (took 3.994016ms)
member-0 (2063048): 2024-06-07 15:10:08.907484 I | mvcc: store.index: compact 219
member-0 (2063048): 2024-06-07 15:10:08.911307 I | mvcc: finished scheduled compaction at 219 (took 3.790332ms)
member-0 (2063048): 2024-06-07 15:10:08.938127 I | mvcc: store.index: compact 220
member-0 (2063048): 2024-06-07 15:10:08.941826 I | mvcc: finished scheduled compaction at 220 (took 3.665007ms)
member-0 (2063048): 2024-06-07 15:10:09.027783 I | mvcc: store.index: compact 230
member-0 (2063048): 2024-06-07 15:10:09.031653 I | mvcc: finished scheduled compaction at 230 (took 3.827542ms)
member-0 (2063048): 2024-06-07 15:10:09.157620 I | mvcc: store.index: compact 239
member-0 (2063048): 2024-06-07 15:10:09.161540 I | mvcc: finished scheduled compaction at 239 (took 3.871184ms)
member-0 (2063048): 2024-06-07 15:10:09.232656 I | mvcc: store.index: compact 246
member-0 (2063048): 2024-06-07 15:10:09.236458 I | mvcc: finished scheduled compaction at 246 (took 3.767421ms)
member-0 (2063048): 2024-06-07 15:10:09.413328 I | mvcc: store.index: compact 256
member-0 (2063048): 2024-06-07 15:10:09.417222 I | mvcc: finished scheduled compaction at 256 (took 3.859343ms)
member-0 (2063048): 2024-06-07 15:10:09.438400 I | mvcc: store.index: compact 257
member-0 (2063048): 2024-06-07 15:10:09.442288 I | mvcc: finished scheduled compaction at 257 (took 3.864001ms)
member-0 (2063048): 2024-06-07 15:10:09.508357 I | mvcc: store.index: compact 264
member-0 (2063048): 2024-06-07 15:10:09.513601 I | mvcc: finished scheduled compaction at 264 (took 5.151462ms)
member-0 (2063048): 2024-06-07 15:10:09.657334 I | mvcc: store.index: compact 277
member-0 (2063048): 2024-06-07 15:10:09.661195 I | mvcc: finished scheduled compaction at 277 (took 3.822834ms)
member-0 (2063048): 2024-06-07 15:10:09.678618 I | mvcc: store.index: compact 280
member-0 (2063048): 2024-06-07 15:10:09.682446 I | mvcc: finished scheduled compaction at 280 (took 3.801793ms)
member-0 (2063048): 2024-06-07 15:10:09.692338 I | mvcc: store.index: compact 282
member-0 (2063048): 2024-06-07 15:10:09.696220 I | mvcc: finished scheduled compaction at 282 (took 3.859612ms)
    logger.go:146: 2024-06-07T15:10:09.762+0200 INFO    Finished traffic
    logger.go:146: 2024-06-07T15:10:10.773+0200 INFO    Reporting complete traffic      {"successes": 824, "failures": 44, "successRate": 0.9493087557603687, "period": "3.319285557s", "qps": 248.24619209464416}
    logger.go:146: 2024-06-07T15:10:10.773+0200 INFO    Reporting traffic before failure injection      {"successes": 408, "failures": 0, "successRate": 1, "period": "996.375923ms", "qps": 409.48400155189216}
    logger.go:146: 2024-06-07T15:10:10.773+0200 INFO    Reporting traffic during failure injection      {"successes": 25, "failures": 43, "successRate": 0.36764705882352944, "period": "1.284883155s", "qps": 19.45702214455446}
    logger.go:146: 2024-06-07T15:10:10.774+0200 INFO    Reporting traffic after failure injection       {"successes": 391, "failures": 1, "successRate": 0.9974489795918368, "period": "1.038026479s", "qps": 376.6763256142332}
    logger.go:146: 2024-06-07T15:10:10.774+0200 INFO    Finished simulating traffic     {"max-revision": 288}
    logger.go:146: 2024-06-07T15:10:10.780+0200 INFO    Validating linearizable operations      {"timeout": "5m0s"}
    logger.go:146: 2024-06-07T15:10:10.925+0200 INFO    Linearization success   {"duration": "145.013099ms"}
    logger.go:146: 2024-06-07T15:10:10.926+0200 INFO    Validating watch
    logger.go:146: 2024-06-07T15:10:10.926+0200 ERROR   Broke watch guarantee   {"guarantee": "resumable", "client": 2, "request": {"Key":"","Revision":220,"WithPrefix":true,"WithProgressNotify":true,"WithPrevKV":false}, "got-event": {"Type":"put-operation","Key":"key6","Value":{"Value":"235","Hash":0},"Revision":221,"IsCreate":false,"PrevValue":null}, "want-event": {"Type":"delete-operation","Key":"key7","Value":{"Value":"","Hash":0},"Revision":220,"IsCreate":false}}
    validate.go:49: Failed validating watch history, err: broke Resumable - A broken watch can be resumed by establishing a new watch starting after the last revision received in a watch event before the break, so long as the revision is in the history window
    logger.go:146: 2024-06-07T15:10:10.926+0200 INFO    Validating serializable operations
    logger.go:146: 2024-06-07T15:10:10.936+0200 INFO    Saving robustness test report   {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979"}
    logger.go:146: 2024-06-07T15:10:10.936+0200 INFO    Saving member data dir  {"member": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/server-TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
    logger.go:146: 2024-06-07T15:10:10.936+0200 INFO    no watch operations for client, skip persisting {"client-id": 1}
    logger.go:146: 2024-06-07T15:10:10.936+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-1/operations.json"}
    logger.go:146: 2024-06-07T15:10:10.937+0200 INFO    Saving watch operations {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-2/watch.json"}
    logger.go:146: 2024-06-07T15:10:10.937+0200 INFO    no KV operations for client, skip persisting    {"client-id": 2}
    logger.go:146: 2024-06-07T15:10:10.938+0200 INFO    no watch operations for client, skip persisting {"client-id": 3}
    logger.go:146: 2024-06-07T15:10:10.938+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-3/operations.json"}
    logger.go:146: 2024-06-07T15:10:10.939+0200 INFO    no watch operations for client, skip persisting {"client-id": 4}
    logger.go:146: 2024-06-07T15:10:10.939+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-4/operations.json"}
    logger.go:146: 2024-06-07T15:10:10.940+0200 INFO    no watch operations for client, skip persisting {"client-id": 5}
    logger.go:146: 2024-06-07T15:10:10.940+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-5/operations.json"}
    logger.go:146: 2024-06-07T15:10:10.941+0200 INFO    no watch operations for client, skip persisting {"client-id": 6}
    logger.go:146: 2024-06-07T15:10:10.941+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-6/operations.json"}
    logger.go:146: 2024-06-07T15:10:10.943+0200 INFO    no watch operations for client, skip persisting {"client-id": 7}
    logger.go:146: 2024-06-07T15:10:10.943+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-7/operations.json"}
    logger.go:146: 2024-06-07T15:10:10.944+0200 INFO    no watch operations for client, skip persisting {"client-id": 8}
    logger.go:146: 2024-06-07T15:10:10.944+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-8/operations.json"}
    logger.go:146: 2024-06-07T15:10:10.945+0200 INFO    no watch operations for client, skip persisting {"client-id": 9}
    logger.go:146: 2024-06-07T15:10:10.945+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-9/operations.json"}
    logger.go:146: 2024-06-07T15:10:10.946+0200 INFO    no watch operations for client, skip persisting {"client-id": 10}
    logger.go:146: 2024-06-07T15:10:10.946+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-10/operations.json"}
    logger.go:146: 2024-06-07T15:10:10.947+0200 INFO    Saving visualization    {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/history.html"}
    logger.go:146: 2024-06-07T15:10:11.016+0200 INFO    killing server...       {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
    logger.go:146: 2024-06-07T15:10:11.016+0200 INFO    stopping server...      {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
    logger.go:146: 2024-06-07T15:10:11.021+0200 INFO    stopped server. {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
--- FAIL: TestRobustnessExploratory (5.24s)
    --- FAIL: TestRobustnessExploratory/Etcd/LowTraffic/ClusterOfSize1 (5.23s)

@fuweid
Copy link
Member

fuweid commented Jun 9, 2024

Report: 1717749910517706368.tar.gz

For this one, the tombstone of key6 would be deleted by the following compact request compaction:<revision:141 >.
Not sure when the watcher setups watch request. If it setups after compactor deletes that key6, it won't get the event.
However, for the replay, it assumes that etcd process was running. There is mismatch between real events and replay.

   2           178      norm    header:<ID:15517310785104357973 > compaction:<revision:137 >
   2           179      norm    header:<ID:15517310785104357976 > txn:<success:<request_range:<key:"key1" > > success:<request_delete_range:<key:"key6" > > success:<request_put:<key:"key8" value:"121" > > success:<request_put:<key:"key9" value:"122" > > >
   3           180      norm
   3           181      norm    header:<ID:15517310785104689154 > cluster_member_attr_set:<member_ID:14578408409545168728 member_attributes:<name:"TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0" client_urls:"http://localhost:20000" > >
   3           182      norm    header:<ID:15517310785104689159 > txn:<success:<request_put:<key:"key2" value:"125" > > success:<request_put:<key:"key6" value:"126" > > success:<request_put:<key:"key1" value:"127" > > success:<request_put:<key:"key0" value:"128" > > >
   3           183      norm    header:<ID:15517310785104689162 > put:<key:"key7" value:"129" >
   3           184      norm    header:<ID:15517310785104689163 > compaction:<revision:141 >

@fuweid
Copy link
Member

fuweid commented Jun 9, 2024

I managed to reproduce it without multi TXN, just a delete request. This extends the impact to Kubernetes.

Again the common thing is that etcd misses a watch even if it restores kvstore on revision before a delete.

Correct. :( Currently, watcher isn't able to send out all the possible deleted events.

@ahrtr
Copy link
Member

ahrtr commented Jun 9, 2024

Did the client side receive an ErrCompacted? If yes, then it should be expected behaviour.

  • Since the real time events are cached, so as long as etcdserver doesn't restart, then the events won't be lost.
  • But if the etcdserver restarts for whatever reason after compaction but before sending out the watch response, then the client side may miss some compacted events. But the client will definitely get the latest value against any watched key.
    • As long as users don't update a key too frequent, it isn't a big problem

@serathius
Copy link
Member Author

serathius commented Jun 9, 2024

Did the client side receive an ErrCompacted? If yes,

No, it would be recorded in the report.

@serathius
Copy link
Member Author

serathius commented Jun 10, 2024

However, for the replay, it assumes that etcd process was running. There is mismatch between real events and replay.

@fuweid, don't understand the statement. Distributed system needs to uphold it's guarantees even if a node is down. Events on WAL used for replay should match what users observes. For watch client should observe either full history or or get ErrCompacted, as @ahrtr mentioned.

@serathius
Copy link
Member Author

serathius commented Jun 10, 2024

One thing I noticed, that the WatchChan is not broken during etcd downtime, so the time it's down is short enough that the etcd client code retries the request in a transparent way. There might be a bug there too.

Still there is a possibility of a bug in recording client responses, it is a little more complicated to record history of watch than KV requests.

func (c *RecordingClient) watch(ctx context.Context, request model.WatchRequest) clientv3.WatchChan {
ops := []clientv3.OpOption{}
if request.WithPrefix {
ops = append(ops, clientv3.WithPrefix())
}
if request.Revision != 0 {
ops = append(ops, clientv3.WithRev(request.Revision))
}
if request.WithProgressNotify {
ops = append(ops, clientv3.WithProgressNotify())
}
if request.WithPrevKV {
ops = append(ops, clientv3.WithPrevKV())
}
respCh := make(chan clientv3.WatchResponse)
c.watchMux.Lock()
c.watchOperations = append(c.watchOperations, model.WatchOperation{
Request: request,
Responses: []model.WatchResponse{},
})
index := len(c.watchOperations) - 1
c.watchMux.Unlock()
go func() {
defer close(respCh)
for r := range c.client.Watch(ctx, request.Key, ops...) {
c.watchOperations[index].Responses = append(c.watchOperations[index].Responses, ToWatchResponse(r, c.baseTime))
select {
case respCh <- r:
case <-ctx.Done():
return
}
}
}()
return respCh
}

@ahrtr
Copy link
Member

ahrtr commented Jun 10, 2024

@fuweid Please feel free to ping me on slack if you want a quick discussion on this issue.

@ahrtr
Copy link
Member

ahrtr commented Jun 10, 2024

One thing I noticed, that the WatchChan is not broken during etcd downtime, so the time it's down is short enough that the etcd client code retries the request in a transparent way.

It's expected behaviour and by design as long as etcdserver doesn't return any error (e.g ErrCompacted).

@serathius
Copy link
Member Author

serathius commented Jun 10, 2024

Added logs to ensure that issue is not one robustness test, got:

They clearly show that revision 125 was not provided to client.

Adding new event response to watch, revs: 124...124 
./bin/etcdctl (./bin/etcdctl_--endpoints=http://localhost:20000_endpoint_status_-w_json) (3237740): [{"Endpoint":"http://localhost:20000","Status":{"header":{"cluster_id":238453183653593855,"member_id":14578408409545168728,"revision":123,"raft_term":2},"version":"3.6.0-alpha.0","dbSize":20480,"leader":14578408409545168728,"raftIndex":190,"raftTerm":2,"raftAppliedIndex":189,"dbSizeInUse":16384,"storageVersion":"3.6.0"}}]
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): {"level":"info","ts":"2024-06-10T14:40:21.080206+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":120}
    cluster.go:1066: members agree on a leader, members:map[14578408409545168728:0] , leader:14578408409545168728
    logger.go:146: 2024-06-10T14:40:21.080+0200 INFO    Setting up gofailpoint  {"failpoint": "raftBeforeLeaderSend=panic()"}
    logger.go:146: 2024-06-10T14:40:21.080+0200 INFO    Waiting for member to exit      {"member": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): panic: failpoint panic: {}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): 
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): goroutine 140 [running]:
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/gofail/runtime.actPanic(0x10100409d13?)
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701):        go.etcd.io/gofail@v0.1.0/runtime/terms.go:318 +0x65
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/gofail/runtime.(*term).do(...)
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701):        go.etcd.io/gofail@v0.1.0/runtime/terms.go:290
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/gofail/runtime.(*terms).eval(0xc0000bdce0?)
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701):        go.etcd.io/gofail@v0.1.0/runtime/terms.go:105 +0xe3
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/gofail/runtime.(*Failpoint).Acquire(0xc000257c90?)
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701):        go.etcd.io/gofail@v0.1.0/runtime/failpoint.go:38 +0x98
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/etcd/server/v3/etcdserver.(*raftNode).start.func1()
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701):        go.etcd.io/etcd/server/v3/etcdserver/raft.go:229 +0x716
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): created by go.etcd.io/etcd/server/v3/etcdserver.(*raftNode).start in goroutine 171
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701):        go.etcd.io/etcd/server/v3/etcdserver/raft.go:166 +0x6f
    logger.go:146: 2024-06-10T14:40:21.090+0200 INFO    server exited   {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "code": 2}
    logger.go:146: 2024-06-10T14:40:21.090+0200 INFO    Member exited as expected       {"member": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
    logger.go:146: 2024-06-10T14:40:21.090+0200 INFO    starting server...      {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
Adding new error response to watch, err: rpc error: code = Unavailable desc = error reading from server: EOF 
    logger.go:146: 2024-06-10T14:40:21.090+0200 INFO    spawning process        {"args": ["./bin/etcd", "--name=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=http://localhost:20001", "--initial-advertise-peer-urls=http://localhost:20001", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001", "--snapshot-count=100", "--experimental-watch-progress-notify-interval=100ms", "--snapshot-count=100", "--heartbeat-interval=101", "--experimental-compaction-batch-limit=10", "--election-timeout=521", "--experimental-snapshot-catchup-entries=100", "--initial-cluster-token=new", "--initial-cluster=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0=http://localhost:20001", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory2003573077/001", "name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "ETCD_VERIFY=all", "EXPECT_DEBUG=true", "PATH=./pkg/mod/golang.org/toolchain@v0.0.1-go1.22.4.linux-amd64/bin:./.gvm/gos/go1.22/bin:./bin:./bin/:./.gvm/pkgsets/go1.22/global/bin:./.gvm/gos/go1.22/bin:./.gvm/pkgsets/go1.22/global/overlay/bin:./.gvm/bin:./.gvm/bin:./bin:/usr/lib/google-golang/bin:/usr/local/buildtools/java/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:./.local/bin:./bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"warn","ts":"2024-06-10T14:40:21.100443+0200","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_VERIFY=all"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"warn","ts":"2024-06-10T14:40:21.100561+0200","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_UNSUPPORTED_ARCH=amd64"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"warn","ts":"2024-06-10T14:40:21.100609+0200","caller":"embed/config.go:913","msg":"Running http and grpc server on single port. This is not recommended for production."}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.100628+0200","caller":"etcdmain/etcd.go:63","msg":"Running: ","args":["./bin/etcd","--name=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0","--listen-client-urls=http://localhost:20000","--advertise-client-urls=http://localhost:20000","--listen-peer-urls=http://localhost:20001","--initial-advertise-peer-urls=http://localhost:20001","--initial-cluster-token=new","--data-dir","/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001","--snapshot-count=100","--experimental-watch-progress-notify-interval=100ms","--snapshot-count=100","--heartbeat-interval=101","--experimental-compaction-batch-limit=10","--election-timeout=521","--experimental-snapshot-catchup-entries=100","--initial-cluster-token=new","--initial-cluster=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0=http://localhost:20001","--initial-cluster-state=new"]}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.100701+0200","caller":"etcdmain/etcd.go:106","msg":"server has already been initialized","data-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001","dir-type":"member"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"warn","ts":"2024-06-10T14:40:21.100719+0200","caller":"embed/config.go:913","msg":"Running http and grpc server on single port. This is not recommended for production."}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.100733+0200","caller":"embed/etcd.go:125","msg":"configuring peer listeners","listen-peer-urls":["http://localhost:20001"]}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.101260+0200","caller":"embed/etcd.go:133","msg":"configuring client listeners","listen-client-urls":["http://localhost:20000"]}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.101669+0200","caller":"embed/etcd.go:314","msg":"starting an etcd server","etcd-version":"3.6.0-alpha.0","git-sha":"8a0054fec","go-version":"go1.22.4","go-os":"linux","go-arch":"amd64","max-cpu-set":24,"max-cpu-available":24,"member-initialized":true,"name":"TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0","data-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001","wal-dir":"","wal-dir-dedicated":"","member-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member","force-new-cluster":false,"heartbeat-interval":"101ms","election-timeout":"505ms","initial-election-tick-advance":true,"snapshot-count":100,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":100,"initial-advertise-peer-urls":["http://localhost:20001"],"listen-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[],"experimental-set-member-localaddr":false,"experimental-local-address":"","cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","discovery-token":"","discovery-endpoints":"","discovery-dial-timeout":"2s","discovery-request-timeout":"5s","discovery-keepalive-time":"2s","discovery-keepalive-timeout":"6s","discovery-insecure-transport":true,"discovery-insecure-skip-tls-verify":false,"discovery-cert":"","discovery-key":"","discovery-cacert":"","discovery-user":"","downgrade-check-interval":"5s","max-learners":1}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.102192+0200","logger":"bbolt","caller":"backend/backend.go:197","msg":"Opening db file (/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member/snap/db) with mode 2d72772d2d2d2d2d2d2d and with options: {Timeout: 0s, NoGrowSync: false, NoFreelistSync: true, PreLoadFreelist: false, FreelistType: hashmap, ReadOnly: false, MmapFlags: 8000, InitialMmapSize: 10737418240, PageSize: 0, NoSync: false, OpenFile: 0x0, Mlock: false, Logger: 0xc0000924a0}"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.109701+0200","logger":"bbolt","caller":"bbolt@v1.4.0-alpha.1/db.go:321","msg":"Opening bbolt db (/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member/snap/db) successfully"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.109759+0200","caller":"storage/backend.go:80","msg":"opened backend db","path":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member/snap/db","took":"7.653391ms"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.110518+0200","caller":"etcdserver/bootstrap.go:412","msg":"recovered v2 store from snapshot","snapshot-index":102,"snapshot-size":"7.2 kB"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.110534+0200","caller":"storage/backend.go:108","msg":"Skipping snapshot backend","consistent-index":191,"snapshot-index":102}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.110547+0200","caller":"etcdserver/bootstrap.go:231","msg":"recovered v3 backend","backend-size-bytes":20480,"backend-size":"20 kB","backend-size-in-use-bytes":12288,"backend-size-in-use":"12 kB"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111143+0200","caller":"etcdserver/bootstrap.go:579","msg":"restarting local member","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","commit-index":191}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111158+0200","caller":"etcdserver/bootstrap.go:93","msg":"bootstrapping cluster"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111260+0200","caller":"etcdserver/bootstrap.go:100","msg":"bootstrapping storage"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111357+0200","caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.6"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111368+0200","caller":"membership/cluster.go:291","msg":"recovered/added member from store","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","recovered-remote-peer-id":"ca50e9357181d758","recovered-remote-peer-urls":["http://localhost:20001"],"recovered-remote-peer-is-learner":false}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111377+0200","caller":"membership/cluster.go:301","msg":"set cluster version from store","cluster-version":"3.6"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111391+0200","caller":"etcdserver/bootstrap.go:108","msg":"bootstrapping raft"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111403+0200","caller":"etcdserver/server.go:306","msg":"bootstrap successfully"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111447+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:1920","msg":"ca50e9357181d758 switched to configuration voters=(14578408409545168728)"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111480+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:870","msg":"ca50e9357181d758 became follower at term 2"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111494+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:483","msg":"newRaft ca50e9357181d758 [peers: [ca50e9357181d758], term: 2, commit: 191, applied: 102, lastindex: 192, lastterm: 2]"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"warn","ts":"2024-06-10T14:40:21.115179+0200","caller":"auth/store.go:1134","msg":"simple token is not cryptographically signed"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.118827+0200","caller":"mvcc/kvstore.go:330","msg":"restored last compact revision","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":113}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.119624+0200","caller":"mvcc/kvstore.go:404","msg":"kvstore restored","current-rev":124}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.123574+0200","caller":"mvcc/kvstore.go:413","msg":"resume scheduled compaction","scheduled-compact-revision":120}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.123661+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":120}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141549+0200","caller":"storage/quota.go:93","msg":"enabled backend quota with default value","quota-name":"v3-applier","quota-size-bytes":2147483648,"quota-size":"2.1 GB"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141594+0200","caller":"etcdserver/server.go:576","msg":"starting etcd server","local-member-id":"ca50e9357181d758","local-server-version":"3.6.0-alpha.0","cluster-id":"34f27e83b3bc2ff","cluster-version":"3.6"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141643+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":120,"took":"17.934179ms","hash":1204195312,"current-db-size-bytes":20480,"current-db-size":"20 kB","current-db-size-in-use-bytes":16384,"current-db-size-in-use":"16 kB"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141667+0200","caller":"mvcc/kvstore.go:248","msg":"previous compaction was interrupted, skip storing compaction hash value"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141660+0200","caller":"etcdserver/server.go:470","msg":"started as single-node; fast-forwarding election ticks","local-member-id":"ca50e9357181d758","forward-ticks":4,"forward-duration":"404ms","election-ticks":5,"election-timeout":"505ms"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141708+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member/snap","suffix":"snap.db","max":5,"interval":"30s"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141769+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member/snap","suffix":"snap","max":5,"interval":"30s"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141786+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member/wal","suffix":"wal","max":5,"interval":"30s"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141776+0200","caller":"embed/etcd.go:610","msg":"serving peer traffic","address":"127.0.0.1:20001"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141804+0200","caller":"embed/etcd.go:283","msg":"now serving peer/client/metrics","local-member-id":"ca50e9357181d758","initial-advertise-peer-urls":["http://localhost:20001"],"listen-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[]}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141807+0200","caller":"embed/etcd.go:582","msg":"cmux::serve","address":"127.0.0.1:20001"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.616829+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:956","msg":"ca50e9357181d758 is starting a new election at term 2"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.617106+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:899","msg":"ca50e9357181d758 became pre-candidate at term 2"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.617166+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:1043","msg":"ca50e9357181d758 received MsgPreVoteResp from ca50e9357181d758 at term 2"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.617184+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:1649","msg":"ca50e9357181d758 has received 1 MsgPreVoteResp votes and 0 vote rejections"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.617211+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:883","msg":"ca50e9357181d758 became candidate at term 3"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.623853+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:1043","msg":"ca50e9357181d758 received MsgVoteResp from ca50e9357181d758 at term 3"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.623883+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:1649","msg":"ca50e9357181d758 has received 1 MsgVoteResp votes and 0 vote rejections"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.623904+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/raft.go:938","msg":"ca50e9357181d758 became leader at term 3"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.623916+0200","logger":"raft","caller":"v3@v3.6.0-alpha.0/node.go:370","msg":"raft.node: ca50e9357181d758 elected leader ca50e9357181d758 at term 3"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.628269+0200","caller":"embed/serve.go:104","msg":"ready to serve client requests"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.628273+0200","caller":"etcdserver/server.go:1774","msg":"published local member to cluster through raft","local-member-id":"ca50e9357181d758","local-member-attributes":"{Name:TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0 ClientURLs:[http://localhost:20000]}","cluster-id":"34f27e83b3bc2ff","publish-timeout":"6.01s"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.628459+0200","caller":"etcdmain/main.go:44","msg":"notifying init daemon"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.628566+0200","caller":"etcdmain/main.go:50","msg":"successfully notified init daemon"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.628867+0200","caller":"v3rpc/health.go:62","msg":"grpc service status changed","service":"","status":"SERVING"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.630495+0200","caller":"embed/serve.go:188","msg":"serving client traffic insecurely; this is strongly discouraged!","traffic":"grpc+http","address":"127.0.0.1:20000"}
    logger.go:146: 2024-06-10T14:40:21.633+0200 INFO    started server. {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "pid": 3237749}
    logger.go:146: 2024-06-10T14:40:21.635+0200 INFO    Finished triggering failpoint   {"failpoint": "raftBeforeLeaderSend=panic()"}
Adding response watch: header:<cluster_id:238453183653593855 member_id:14578408409545168728 revision:125 raft_term:3 > created:true 
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:22.114627+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":125}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:22.131637+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":125,"took":"16.956782ms","hash":981787830,"current-db-size-bytes":20480,"current-db-size":"20 kB","current-db-size-in-use-bytes":16384,"current-db-size-in-use":"16 kB"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:22.131665+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":981787830,"revision":125,"compact-revision":120}
Adding new event response to watch, revs: 126...132 

@ahrtr
Copy link
Member

ahrtr commented Jun 10, 2024

Added logs to ensure that issue is not one robustness test, got:

Where did you add the log? Also suggest to provide a clear summary on the exact steps you did.

serathius added a commit to serathius/etcd that referenced this issue Jun 10, 2024
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
@serathius
Copy link
Member Author

Code #18145

@ahrtr
Copy link
Member

ahrtr commented Jun 10, 2024

TL; DR

Delete events may be missing in watch response. This issue can only be reproduced when the compact revision happens to be a delete operation.

How to reproduce this issue?

  • Start an one member etcd cluster in one terminal
  • Execute commands below in another terminal
$ etcdctl  put k1 v2 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":2,"raft_term":2}}
$ etcdctl  put k1 v3 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":3,"raft_term":2}}
$ etcdctl  put k1 v4 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":4,"raft_term":2}}
$ etcdctl del k1 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":5,"raft_term":2},"deleted":1}
$ etcdctl  put k1 v6 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":6,"raft_term":2}}
  • Compact revision 5
$ etcdctl compact 5
compacted revision 5
  • Watch starting from rev 5
$ etcdctl watch k1 --rev 5
PUT
k1
v6

Obviously the deletion operation with revision 5 is missing.

Alternatively, If you want to try the robustness test, then run command below on #18145,

$ go test -run TestRobustnessExploratory/Etcd/LowTraffic/ClusterOfSize1 -v --count 100 --timeout 1h -failfast

Expected behaviour

As mentioned in #18089 (comment), some revisions may be compacted, and it may impact the watch clients. But it shouldn't never drop any events quietly.

Expected behaviour:

  • Either the client side receives all events: no any events are dropped
  • or the client side gets an ErrCompacted response.

Root cause

Usually etcd keeps the compact revision (see code below). For example, when you compact revision 5, actually etcd only compacts revisions <=4.

available[rev] = struct{}{}

But if the compact revision is a delete operation, then etcd doesn't keep it (in other words, etcd will compact/remove it) (see code below),

// remove any tombstone
if revIndex == len(g.revs)-1 && genIdx != len(ki.generations)-1 {
delete(available, g.revs[revIndex])
}

But the problem is that etcdserver doesn't return ErrCompacted if the startWatchRevision happens to be the last compact revision. Because the if condition is if w.minRev < compactRev instead of if w.minRev <= compactRev.

if w.minRev < compactRev {

Note this isn't a regression. It's an old issue.

Potential solution

It's open to discussion.

When executing compaction operation, we should only keep the latest revision for each key. For example, if the latest revision for "k1" is 10, and user compacts 10, then we should keep 10; otherwise if user compacts any revision <=9, we just compact it, there is no point to keep it. Note that different keys may have different latest revision.

The available is the revisions to keep,

func (ki *keyIndex) compact(lg *zap.Logger, atRev int64, available map[Revision]struct{}) {

We also need to change the if condition to if w.minRev <= compactRev,

if w.minRev < compactRev {

So the watch client will get an ErrCompacted response.

Impact

All versions (including 3.4, 3.5 and main) are impacted.

Major, but it might not be easy to reproduce.

  • As mentioned above, it can only be reproduced when the compact revision happens to be a delete operation.
  • Also usually compaction operation isn't that frequent in production environment. Normally the watch delay won't be greater than compaction interval.

Immediate action

Create a simple e2e test to reliably reproduce this issue. Please refer to the section "How to reproduce this issue?" above.

Brainstorm the solution.

@fuweid
Copy link
Member

fuweid commented Jun 11, 2024

don't understand the statement. Distributed system needs to uphold it's guarantees even if a node is down. Events on WAL used for replay should match what users observes. For watch client should observe either full history or or get ErrCompacted, as @ahrtr mentioned.

Hi @serathius sorry for late reply. In this case 1717749910517706368.tar.gz, both key8 and key9's latest revision is 141. It won't be compacted. But the revision 141 for key6 is tombstone. It could be deleted before sending delete-operation event to client.(Maybe we should log watch-operations in robustness).

IMO, the robustness replay assumes that the delete-operation event is just sent out after applying to db. Even if the compact request is following that delete request, the robustness replay always assumes that the client should receive that delete-operation event.

However, in real world, delete-operation event depends on compactor which is handled in background and we can't control in replay. That's why I said that watch can't guarantee the client can see all the events in current design.

   2           178      norm    header:<ID:15517310785104357973 > compaction:<revision:137 >
   2           179      norm    header:<ID:15517310785104357976 > txn:<success:<request_range:<key:"key1" > > success:<request_delete_range:<key:"key6" > > success:<request_put:<key:"key8" value:"121" > > success:<request_put:<key:"key9" value:"122" > > >
   3           180      norm
   3           181      norm    header:<ID:15517310785104689154 > cluster_member_attr_set:<member_ID:14578408409545168728 member_attributes:<name:"TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0" client_urls:"http://localhost:20000" > >
   3           182      norm    header:<ID:15517310785104689159 > txn:<success:<request_put:<key:"key2" value:"125" > > success:<request_put:<key:"key6" value:"126" > > success:<request_put:<key:"key1" value:"127" > > success:<request_put:<key:"key0" value:"128" > > >
   3           183      norm    header:<ID:15517310785104689162 > put:<key:"key7" value:"129" >
   3           184      norm    header:<ID:15517310785104689163 > compaction:<revision:141 >

Please feel free to ping me on slack if you want a quick discussion on this issue.

@ahrtr will ping you tomorrow. Thanks!

@serathius serathius changed the title [robustness tests] Watch dropping a delete event when resuming from crash Watch dropping an event when compacting on delete Jun 11, 2024
@serathius
Copy link
Member Author

As mentioned above, it can only be reproduced when
the compact revision happens to be a delete operation.
and the delete operation is the last operation for the key. If there is any operations following the delete operation on the key (i.e. adding the key back), then the issue can't be reproduced.

This should be pretty common, delete operations are pretty frequent, for example pods are immutable resource, meaning they are frequently created and then deleted and never resurrected.

Also usually compaction operation isn't that frequent in production environment.

It's once every 5 minutes, ask anyone if they want to build a distributed system that makes a mistake every 5 minutes.

I think that the only saving grace is that this requires a slow watch that is reading events from before compaction window. This should be pretty rare for watch to be 5 minutes behind, however recently I have seen case where lock contention on watch cache caused it to be behind even 2 hours.

@serathius
Copy link
Member Author

@ahrtr confirmed the bug being present on both release-3.4 and release-3.5 branches

Can someone take a look if the bug was introduced recently (introduced in one of the patch versions) or is an old one (tracing back to before v3.4)?

It's important to know if this is a regression in some newer version, or it was always there. If this is a new bug we might need to consider an announcement.

@serathius serathius pinned this issue Jun 11, 2024
@ahrtr
Copy link
Member

ahrtr commented Jun 11, 2024

Can someone take a look if the bug was introduced recently (introduced in one of the patch versions) or is an old one (tracing back to before v3.4)?

Obviously this isn't a regression. It's an old issue. Also updated my above comment.

This should be pretty common, delete operations are pretty frequent, for example pods are immutable resource, meaning they are frequently created and then deleted and never resurrected.

Pls see updated section "Impact" above. added and the delete operation is the last operation for the key

It's once every 5 minutes, ask anyone if they want to build a distributed system that makes a mistake every 5 minutes.

I think that the only saving grace is that this requires a slow watch that is reading events from before compaction window.

  • 5 minutes isn't frequent as compare to watch delay.
  • If you read the reproducing steps in my above comment, there is also very low possibility to see this issue on first watch response.

@jmhbnz
Copy link
Member

jmhbnz commented Jun 11, 2024

Can someone take a look if the bug was introduced recently (introduced in one of the patch versions) or is an old one (tracing back to before v3.4)?

Following the clear manual etcdctl reproduce steps provided by @ahrtr I reproduced the issue on v3.4.0 so it looks like this has been present for at least ~5 years.

@serathius
Copy link
Member Author

Thanks, @jmhbnz, for confirming this on v3.4.0!

I appreciate everyone's insights so far. To clarify my earlier question, it's sometimes valuable to confirm even seemingly obvious details, just to ensure we're all on the same page and can make the most informed decisions going forward.

Now that we know this isn't a recent regression, we can focus on understanding the underlying cause and potential solutions.

@ahrtr
Copy link
Member

ahrtr commented Jun 11, 2024

Just had a quick talk with @serathius , we need to clearly define the behaviour of compaction, and also the impact on watch, Please anyone feel free to comment in https://docs.google.com/document/d/1APJE38DxENsRLLVapRz1CQ6UD4-uYaFutO12Y01VcNw/edit

Eventually we need to document the behaviour in etcd's official document

@henrybear327
Copy link
Contributor

Can someone take a look if the bug was introduced recently (introduced in one of the patch versions) or is an old one (tracing back to before v3.4)?

Following the clear manual etcdctl reproduce steps provided by @ahrtr I reproduced the issue on v3.4.0 so it looks like this has been present for at least ~5 years.

+1 on successfully reproducing the issue

@ahrtr
Copy link
Member

ahrtr commented Jun 11, 2024

Actions

@MadhavJivrajani
Copy link
Contributor

cc @shyamjvs @chaochn47 @dims

@siyuanfoundation
Copy link
Contributor

The code to handle compacted version in watch was written 8 years ago, and it never handled the w.minRev == compactRev case

if w.minRev < compactRev {
select {
case w.ch <- WatchResponse{WatchID: w.id, CompactRevision: compactRev}:
w.compacted = true
wg.delete(w)
default:
// retry next time
}
continue
}

@chaochn47
Copy link
Member

chaochn47 commented Jun 12, 2024

Just as a thought exercise, how can admin of etcd cluster be alerted about the dropped watch events issue in general?

In kubernetes/kubernetes#123072, apiserver_watch_cache_events_received_total{resource="pods"} is useful to detect watch event loss as the diff between two watch client (apiserver) diverges.

However, this issue seems difficult to catch by alarms because just one delete event is missed. Diff by 1 could be caused by different times that client restarted.

Trying to monitor in production and measure how many kubernetes clusters are impacted. Any insights are much appreciated !!

@serathius
Copy link
Member Author

Just as a thought exercise, how can admin of etcd cluster be alerted about the dropped watch events issue in general?

Excellent question! Unfortunately, there's no simple way for etcd cluster admins to be directly alerted about dropped watch events in a production environment. The robustness tests achieve this by collecting comprehensive data – requests from all clients, WAL entries, etc. – and performing in-depth analysis. This level of data collection and analysis isn't feasible in production due to the sheer volume of information involved.

This highlights the importance of rigorous, continuous robustness testing with diverse traffic types and failure injection. It allows us to thoroughly analyze smaller chunks of data in a controlled environment, identifying issues that might otherwise go unnoticed in the vastness of production data.

On Kubernetes side, I have some plans to improve watch cache observability. Stay tuned!

@MadhavJivrajani
Copy link
Contributor

Tried replicating #18089 (comment) in an e2e here: #18201

@ahrtr
Copy link
Member

ahrtr commented Jun 18, 2024

Always keep the compact revision, and delay compact tombstone revision #18162

Thanks @MadhavJivrajani . We need to add more tests ((or update existing tests)), please see all the proposed test cases as mentioned in #18162 (comment). Note I also added the test cases in the doc (under the option 2).

@fuweid @MadhavJivrajani Please consider a whole plan to resolve this issue. Please feel free to ping on slack if you want any discussion.

serathius added a commit to serathius/etcd that referenced this issue Jun 22, 2024
serathius added a commit to serathius/etcd that referenced this issue Jun 22, 2024
serathius added a commit to serathius/etcd that referenced this issue Jun 22, 2024
… other issues

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
serathius added a commit that referenced this issue Jun 24, 2024
Disable robustness test detection of #18089 to allow detecting other issues
ah8ad3 pushed a commit to ah8ad3/etcd that referenced this issue Jul 23, 2024
… other issues

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
fuweid added a commit to fuweid/etcd that referenced this issue Aug 1, 2024
…etecting other issues"

This reverts commit 4fe227c.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/robustness-testing priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release/v3.4 release/v3.5 type/bug
Development

No branches or pull requests

8 participants