deflake TestTracing #15501

Merged
merged 1 commit into from
Mar 17, 2023

@chaochn47 chaochn47 commented Mar 17, 2023

TestTracing has failed 4 times in the past 2 months on the main branch: once in the Tests workflow and 3 times in the Tests-arm64 workflow.

The default randomized election timeout is 1 to 2s, and a single-node cluster fast-forwards 900ms of election ticks at startup.

This change raises the test's startup timeout from 1 to 5 seconds to deflake it.

cc @fuweid @ahrtr @serathius

    logger.go:130: 2023-03-16T02:01:05.478Z	INFO	default.raft	8e9e05c52164694d is starting a new election at term 1
    logger.go:130: 2023-03-16T02:01:05.478Z	INFO	default.raft	8e9e05c52164694d became pre-candidate at term 1
    logger.go:130: 2023-03-16T02:01:05.478Z	INFO	default.raft	8e9e05c52164694d received MsgPreVoteResp from 8e9e05c52164694d at term 1
    logger.go:130: 2023-03-16T02:01:05.478Z	INFO	default.raft	8e9e05c52164694d became candidate at term 2
    logger.go:130: 2023-03-16T02:01:05.478Z	INFO	default.raft	8e9e05c52164694d received MsgVoteResp from 8e9e05c52164694d at term 2
    logger.go:130: 2023-03-16T02:01:05.478Z	INFO	default.raft	8e9e05c52164694d became leader at term 2
    logger.go:130: 2023-03-16T02:01:05.478Z	INFO	default.raft	raft.node: 8e9e05c52164694d elected leader 8e9e05c52164694d at term 2
    tracing_test.go:74: failed to start embed.Etcd for test
    logger.go:130: 2023-03-16T02:01:05.478Z	INFO	default	closing etcd server	{"name": "default", "data-dir": "/tmp/TestTracing3718644992/001", "advertise-peer-urls": ["http://localhost:2380"], "advertise-client-urls": ["http://localhost:2379"]}
    logger.go:130: 2023-03-16T02:01:05.479Z	INFO	default	setting up initial cluster version using v3 API	{"cluster-version": "3.6"}
    logger.go:130: 2023-03-16T02:01:05.479Z	INFO	default	ready to serve client requests
    logger.go:130: 2023-03-16T02:01:05.479Z	INFO	default	published local member to cluster through raft	{"local-member-id": "8e9e05c52164694d", "local-member-attributes": "{Name:default ClientURLs:[http://localhost:2379]}", "cluster-id": "cdf818194e3a8c32", "publish-timeout": "7s"}

https://github.com/etcd-io/etcd/actions/runs/4432753985/jobs/7777120960

dev-dsk-chaochn-2c-a26acd76 % etcd-test-analyzer run -token $GITHUB_TOKEN -max-age=1440h -workflow Tests -branch main
Fetching workflow runs from GitHub
Analyzing workflow "Tests"
    [Tests]: Finding workflow runs
    [Tests]: Downloading artifacts
    [Tests]: Downloading artifacts for Tests to "/tmp/test-results-370924858"
    [Tests]: no artifacts found
    [Tests]: no artifacts found
    [Tests]: no artifacts found
    [Tests]: no artifacts found
    [Tests]: no artifacts found
    [Tests]: no artifacts found
    [Tests]: no artifacts found
    [Tests]: no artifacts found
    [Tests]: no artifacts found
Runs: 149, Pass: 125, Fail: 24, Pcnt: 0.838926
│──────────────────────────────────────────────────────────│────────────────────────────────────────────────────────│───────────────│─────────────────────────────────────────────────────────│
│ TEST NAME (15)                                           │ PACKAGE                                                │ FAILURE COUNT │ RECENT FAILURE                                          │
│──────────────────────────────────────────────────────────│────────────────────────────────────────────────────────│───────────────│─────────────────────────────────────────────────────────│
│ TestLeaseDeleteRangeContendDel                           │ go.etcd.io/etcd/tests/v3/integration/clientv3/lease    │             4 │ https://github.com/etcd-io/etcd/actions/runs/4210662140 │
│ TestIssue2904                                            │ go.etcd.io/etcd/tests/v3/integration                   │             4 │ https://github.com/etcd-io/etcd/actions/runs/4340865333 │
│ ExampleCluster_memberUpdate                              │ go.etcd.io/etcd/tests/v3/integration/clientv3/examples │             4 │ https://github.com/etcd-io/etcd/actions/runs/4434917453 │
│ TestLeasingDeleteRangeContendTxn                         │ go.etcd.io/etcd/tests/v3/integration/clientv3/lease    │             3 │ https://github.com/etcd-io/etcd/actions/runs/4168814972 │
│ TestAuthMemberRemove                                     │ go.etcd.io/etcd/tests/v3/common                        │             2 │ https://github.com/etcd-io/etcd/actions/runs/4442362367 │
│ TestMain                                                 │                                                        │             2 │ https://github.com/etcd-io/etcd/actions/runs/4312405975 │
│ TestTracing                                              │ go.etcd.io/etcd/tests/v3/integration                   │             1 │ https://github.com/etcd-io/etcd/actions/runs/4114566574 │
│ TestPeriodicCheckDetectsCorruption                       │ go.etcd.io/etcd/tests/v3/integration                   │             1 │ https://github.com/etcd-io/etcd/actions/runs/4219887092 │
│ TestMemberList                                           │ go.etcd.io/etcd/tests/v3/common                        │             1 │ https://github.com/etcd-io/etcd/actions/runs/4082111151 │
│ TestEtcdVersionFromWAL                                   │ go.etcd.io/etcd/tests/v3/integration                   │             1 │ https://github.com/etcd-io/etcd/actions/runs/4340931587 │
│ TestAuthority                                            │ go.etcd.io/etcd/tests/v3/integration                   │             1 │ https://github.com/etcd-io/etcd/actions/runs/4044117995 │
│ TestMemberList/PeerAutoTLS                               │ go.etcd.io/etcd/tests/v3/common                        │             1 │ https://github.com/etcd-io/etcd/actions/runs/4082111151 │
│ TestAuthority/Size:_3,_Scenario:_"unixs://absolute_path" │ go.etcd.io/etcd/tests/v3/integration                   │             1 │ https://github.com/etcd-io/etcd/actions/runs/4044117995 │
│ TestV3AuthWithLeaseRevokeWithRootJWT                     │ go.etcd.io/etcd/tests/v3/integration                   │             1 │ https://github.com/etcd-io/etcd/actions/runs/3991720228 │
│ TestMaxLearnerInCluster                                  │ go.etcd.io/etcd/tests/v3/integration/clientv3          │             1 │ https://github.com/etcd-io/etcd/actions/runs/4326783729 │
│──────────────────────────────────────────────────────────│────────────────────────────────────────────────────────│───────────────│─────────────────────────────────────────────────────────│
Analyzing workflow "Tests-arm64"
    [Tests-arm64]: Finding workflow runs
    [Tests-arm64]: Downloading artifacts
    [Tests-arm64]: Downloading artifacts for Tests-arm64 to "/tmp/test-results-3975549114"
Runs: 42, Pass: 38, Fail: 4, Pcnt: 0.904762
│────────────────────────│──────────────────────────────────────│───────────────│─────────────────────────────────────────────────────────│
│ TEST NAME              │ PACKAGE                              │ FAILURE COUNT │ RECENT FAILURE                                          │
│────────────────────────│──────────────────────────────────────│───────────────│─────────────────────────────────────────────────────────│
│ TestTracing            │ go.etcd.io/etcd/tests/v3/integration │             3 │ https://github.com/etcd-io/etcd/actions/runs/4432753985 │
│ TestMemberList         │ go.etcd.io/etcd/tests/v3/common      │             1 │ https://github.com/etcd-io/etcd/actions/runs/4149168604 │
│ TestMemberList/PeerTLS │ go.etcd.io/etcd/tests/v3/common      │             1 │ https://github.com/etcd-io/etcd/actions/runs/4149168604 │
│────────────────────────│──────────────────────────────────────│───────────────│─────────────────────────────────────────────────────────│

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

Signed-off-by: Chao Chen <chaochn@amazon.com>
    - case <-time.After(1 * time.Second):
    + case <-time.After(5 * time.Second):
    +     // default randomized election timeout is 1 to 2s, single node will fast-forward 900ms
    +     // change the timeout from 1 to 5 seconds to ensure de-flaking this test

The change makes sense to me. But could you please clarify what "single node will fast-forward 900ms" means?

@chaochn47 chaochn47 Mar 17, 2023

    logger.go:130: 2023-03-16T02:01:04.478Z	INFO	default	started as single-node; fast-forwarding election ticks	{"local-member-id": "8e9e05c52164694d", "forward-ticks": 9, "forward-duration": "900ms", "election-ticks": 10, "election-timeout": "1s"}

In theory, the maximum wait for a single-node cluster to elect its leader is 2s - 900ms = 1.1s, not counting the MsgVote and response round trip.

thx for the clarification.

@ahrtr ahrtr left a comment

LGTM

Thanks @chaochn47

@ahrtr ahrtr merged commit 6cfe4bc into etcd-io:main Mar 17, 2023
@chaochn47 chaochn47 deleted the defake_TestTracing branch March 17, 2023 23:26

fuweid commented Mar 18, 2023

@chaochn47 Thanks! BTW, is etcd-test-analyzer an open-source tool? It looks awesome! I built ghacli to fetch the logs, but it doesn't give this level of detail.

@chaochn47

Hi @fuweid

Kudos to @endocrimes

I was inspired by this comment: #13167 (comment)

@serathius

Great to see the tool in use!


fuweid commented Mar 19, 2023

@chaochn47 Thanks! It looks very good.
