[Bug]: TestHAKeeperCanBootstrapAndRepairShards failed #8438

Closed
w-zr opened this issue Mar 14, 2023 · 10 comments
Labels: bug/ut, kind/bug (Something isn't working), resolved/v1.1.1, severity/s0 (Extreme impact: cause the application to break down and seriously affect the use)
Milestone: 1.2.0

Comments

w-zr (Contributor) commented Mar 14, 2023

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Environment

- Version or commit-id (e.g. v0.1.0 or 8b23a93):
- Hardware parameters:
- OS type:
- Others:

Actual Behavior

[screenshot of the failing unit test output; attachment e4805a81b78c89ccdd5b423ac86dc2f]

Expected Behavior

UT should pass.

Steps to Reproduce

No response

Additional information

No response

@w-zr w-zr added kind/bug Something isn't working needs-triage labels Mar 14, 2023
volgariver6 (Contributor) commented:

The log from when this issue was first created is incomplete. The failure happened again while running the UTs, so I am uploading the log:
fail.log

gouhongshen (Contributor) commented Jun 16, 2023

The error seems to be caused by the 2s timeout when invoking state, err := store1.getCheckerState().

The log details also show communication problems between the raft nodes (and the gossip nodes).
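
A minimal sketch of the timeout-bounded polling involved, assuming a 2s deadline; pollUntil and the inline callback are hypothetical stand-ins, not the actual matrixone test code around getCheckerState():

package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// pollUntil retries fn every 50ms until it succeeds or the deadline expires.
func pollUntil(d time.Duration, fn func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()
	for {
		if err := fn(ctx); err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return errors.New("timed out waiting for checker state")
		case <-time.After(50 * time.Millisecond):
		}
	}
}

func main() {
	// Stand-in for state, err := store1.getCheckerState(): slow raft/gossip
	// exchanges keep this failing until the 2s deadline hits.
	err := pollUntil(2*time.Second, func(ctx context.Context) error {
		return errors.New("HAKeeper state not ready")
	})
	fmt.Println(err) // prints the timeout error, mirroring the UT failure
}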

After 1000+ runs of go test -v -run TestHAKeeperCanBootstrapAndRepairShards and a dozen runs of make ut without the error showing up, I tried traffic control by executing these shell commands:

# presumably applied in separate runs: a second 'add' on an existing root
# qdisc fails; use 'tc qdisc change' to modify one already installed
tc qdisc add dev lo root netem delay 50ms 10ms
tc qdisc add dev lo root netem loss 10%

Alternatively, I added some random sleep in dragonboat/internal/raft/raft.go::handleHeartbeatMessage().
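
A minimal sketch of that fault injection, assuming a wrapper around the real handler; handleHeartbeatWithJitter is a hypothetical name, not dragonboat's actual code:

package faultinject

import (
	"math/rand"
	"time"
)

// handleHeartbeatWithJitter sleeps a random 0-50ms before delegating to the
// real handler, emulating a slow or jittery loopback link.
func handleHeartbeatWithJitter(handle func()) {
	time.Sleep(time.Duration(rand.Intn(50)) * time.Millisecond)
	handle()
}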

And I got a similar timeout error, though not exactly the same.

The same:

1. the leader lost quorum
2. HAKeeper could not finish bootstrapping before the timeout
3. the same number of terms elapsed

The difference:

1. no gossip nodes were marked as failed

So I am not sure it is necessarily a network issue.

gouhongshen (Contributor) commented Jul 7, 2023

The "memberlist:" errors may be related to these issues:

  1. How to detect and react to TCP only failures hashicorp/memberlist#264
  2. Document TCP-only operation hashicorp/memberlist#226

Both mention k8s environments.
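
For reference, a sketch of the kind of tuning those issues discuss, using hashicorp/memberlist's public API; the probe values here are illustrative assumptions, not settings taken from this issue:

package main

import (
	"fmt"
	"time"

	"github.com/hashicorp/memberlist"
)

func main() {
	cfg := memberlist.DefaultLANConfig()
	// Lengthen failure-detection probes so transient UDP loss (as in the tc
	// experiments above) is less likely to mark a live node as failed.
	cfg.ProbeTimeout = 3 * time.Second  // LAN default is 500ms
	cfg.ProbeInterval = 5 * time.Second // LAN default is 1s
	list, err := memberlist.Create(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Println("local node:", list.LocalNode().Name)
}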

gouhongshen (Contributor) commented:
It's been a long time since this last happened; it may be closed.

@gouhongshen gouhongshen assigned w-zr and unassigned gouhongshen Aug 24, 2023
@w-zr w-zr closed this as completed Aug 24, 2023
@YANGGMM YANGGMM reopened this Nov 29, 2023
YANGGMM (Contributor) commented Nov 29, 2023

@YANGGMM YANGGMM assigned gouhongshen and unassigned w-zr Nov 29, 2023

@sukki37 sukki37 added bug/ut severity/s0 Extreme impact: Cause the application to break down and seriously affect the use and removed needs-triage labels Jan 2, 2024
@sukki37 sukki37 added this to the 1.2.0 milestone Jan 2, 2024
gouhongshen (Contributor) commented:
not working on it


gouhongshen (Contributor) commented:
Increasing the hakeeperDefaultTimeout config may work.
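
A sketch of what that suggestion might look like, assuming hakeeperDefaultTimeout is a package-level constant as the name suggests; the 10s value and package name are illustrative, not matrixone's actual code:

package logservice

import "time"

// hakeeperDefaultTimeout bounds calls such as store.getCheckerState().
// Raising it from 2s gives slow raft/gossip exchanges in CI room to finish.
const hakeeperDefaultTimeout = 10 * time.Second // was 2 * time.Second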

heni02 (Contributor) commented Jan 23, 2024

@heni02 heni02 closed this as completed Jan 23, 2024
@matrix-meow matrix-meow reopened this Jan 23, 2024
@heni02 heni02 assigned w-zr and unassigned heni02 Jan 23, 2024
@w-zr w-zr closed this as completed Jan 23, 2024