Richgo hangs after the test finish with a failure #30

Closed
nmiculinic opened this issue Jun 16, 2020 · 4 comments

nmiculinic commented Jun 16, 2020

In the https://github.com/kubermatic/kubecarrier project, we're using richgo to parse the test output:

https://github.com/kubermatic/kubecarrier/blob/e2e-explorations/hack/.e2e-test.sh

kubectl kubecarrier e2e-test run --test.v --test.failfast --test-id=${TEST_ID} | richgo testfilter

Sometimes, after a test fails, richgo hangs. Here's the output from stdout/stderr:

...
     |     --- FAIL: Integration/apiserver (65.18s)
     |         --- PASS: Integration/apiserver/account-service (1.39s)
     |         --- PASS: Integration/apiserver/region-service (2.20s)
     |         --- PASS: Integration/apiserver/provider-service (2.58s)
     |         --- FAIL: Integration/apiserver/offering-service (60.02s)
     |         --- FAIL: Integration/apiserver/instance-service (137.03s)
FAIL

After running ps axf, I see that only richgo is still running, so the test binary that produced the output has already exited.

 486497 pts/6    Ss     0:06              \_ /usr/bin/zsh -i
 558511 pts/6    S+     0:00              |   \_ make e2e-test
 569591 pts/6    S+     0:00              |       \_ /bin/bash ./hack/.e2e-test.sh
 569593 pts/6    Sl+    0:00              |           \_ richgo testfilter
 569602 pts/6    S+     0:00              |               \_ cat -

After stracing it:

 ▲ ~/Desktop/kubecarrier sudo strace -fp 569593
strace: Process 569593 attached with 5 threads
[pid 569601] futex(0xc000074148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 569600] futex(0xc00004e848, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 569599] epoll_pwait(5,  <unfinished ...>
[pid 569598] restart_syscall(<... resuming interrupted read ...> <unfinished ...>
[pid 569593] waitid(P_PID, 569602, 

Now I have two questions:

  • Why does it hang?
  • Why is it executing the cat - command?

Is there anything more I could do to debug this issue?

kyoh86 self-assigned this Jun 17, 2020
kyoh86 (Owner) commented Jun 28, 2020

I cannot run your tests (kubecarrier).
Please give me the raw output so I can find the bug.

kubectl kubecarrier e2e-test run --test.v --test.failfast --test-id=${TEST_ID}  > raw.txt

kyoh86 (Owner) commented Jun 28, 2020

Why is it executing the cat - command?

It executes cat - so that it can use the factoryFunc and editor.Editor interfaces.
It might be possible to refactor this, but I have no idea how yet.

nmiculinic (Author) commented Jun 30, 2020

Currently, the situation is as follows:

  • This is highly non-deterministic and hard to reproduce.
  • Our e2e tests have become more stable recently, with the 0.2.0 release. Even so, they take 5+ minutes on average.
  • We're no longer using richgo, since we only used it for coloring and prow (our CI) doesn't support colored output. We also implemented a small post-processing step that aggregates and sorts test lines, since we run many tests in parallel and their outputs are interleaved. (P.S. go tool test2json is buggy with parallel tests.)

So I no longer have the time to reproduce the issue. If you need help running the e2e tests, I'll gladly help. To get started, try running them in the privileged Docker container quay.io/kubecarrier/test, which is how the .prow.yaml file runs them in CI.

EDIT: I've checked the exact commit hash: 99fde20f836c1a9c20bfafadd636941cb6deb762 is where the e2e-exploring branch currently points locally (it doesn't appear to be present in the upstream repo).

https://github.com/kubermatic/kubecarrier/blob/99fde20f836c1a9c20bfafadd636941cb6deb762/hack/.e2e-test.sh

kyoh86 (Owner) commented Jul 1, 2020

I see.
Since nobody is worried about this issue anymore, nobody is likely to look into it.

kyoh86 closed this as completed Jul 1, 2020