Kubevirt e2e stabilizing refactor #4145

Closed


qinqon
Contributor

qinqon commented Feb 8, 2024

- What this PR does and why is it needed
The kubevirt e2e tests are flaky: sometimes the persistent connection between the tests and the VM gets broken, and other times the curl to ovn.org that checks north/south traffic fails.

To fix those the following changes are introduced:

  • Replace the agnhost HTTP server with a simple TCP server and configure the TCP client with keepalive
  • Replace systemd-resolved in the VM
  • Replace ovn.org with kubernetes.default.svc.cluster.local to verify north/south traffic without depending on an external domain.

Closes #3986

- Special notes for reviewers
This PR has been tested by running 12 jobs in parallel, twice, on the author's fork:
https://github.com/qinqon/ovn-kubernetes/actions/runs/7830706778/job/21366399247?pr=10

- Description for the changelog
Stabilize kubevirt e2e tests

coveralls commented Feb 8, 2024

Coverage Status

Changes unknown when pulling a7aa34d on qinqon:kubevirt-e2e-stabilizing-refactor into ovn-org:master.

qinqon force-pushed the kubevirt-e2e-stabilizing-refactor branch from e13179a to ea76570 on February 8, 2024 18:31
Signed-off-by: Enrique Llorente <ellorent@redhat.com>
Using the "Reuse" Go HTTP client connection attribute to check that
the connection is not broken is a little hard to debug. This change
replaces the HTTP client and the agnhost HTTP server with a simple Go TCP
server injected using Fedora CoreOS Ignition; it is simpler and the test
is faster.

Signed-off-by: Enrique Llorente <ellorent@redhat.com>
qinqon force-pushed the kubevirt-e2e-stabilizing-refactor branch from ea76570 to a7aa34d on February 9, 2024 06:40
qinqon
Contributor Author

qinqon commented Feb 9, 2024

/hold
It is in a better state, but it has failed again; I will debug it.

qinqon
Contributor Author

qinqon commented Feb 9, 2024

This time there is no "reset by peer" error at the server, and the server log shows that it is handling the connection.

Failing at

after live migration for the second time to node not owning subnet: Check connectivity is restored after delete deny all network policy
2024-02-09T07:20:48.1190075Z   [FAILED] worker1: after live migration for the second time to node not owning subnet: Check connectivity is restored after delete deny all network policy
2024-02-09T07:20:48.1191653Z   Expected success, but got an error:
2024-02-09T07:20:48.1192369Z       <*fmt.wrapError | 0xc000763d20>:
2024-02-09T07:20:48.1193746Z       failed Write to server: write tcp 172.18.0.1:41446->172.18.0.3:31702: write: broken pipe
2024-02-09T07:20:48.1194723Z       {
2024-02-09T07:20:48.1196036Z           msg: "failed Write to server: write tcp 172.18.0.1:41446->172.18.0.3:31702: write: broken pipe",
2024-02-09T07:20:48.1197218Z           err: <*net.OpError | 0xc00123f9f0>{
2024-02-09T07:20:48.1197908Z               Op: "write",
2024-02-09T07:20:48.1198471Z               Net: "tcp",
2024-02-09T07:20:48.1199748Z               Source: <*net.TCPAddr | 0xc0006ed9b0>{IP: [172, 18, 0, 1], Port: 41446, Zone: ""},
2024-02-09T07:20:48.1201354Z               Addr: <*net.TCPAddr | 0xc0006edaa0>{IP: [172, 18, 0, 3], Port: 31702, Zone: ""},
2024-02-09T07:20:48.1202601Z               Err: <*os.SyscallError | 0xc000763d00>{
2024-02-09T07:20:48.1203335Z                   Syscall: "write",
2024-02-09T07:20:48.1203895Z                   Err: <syscall.Errno>0x20,
2024-02-09T07:20:48.1204307Z               },
2024-02-09T07:20:48.1204642Z           },
2024-02-09T07:20:48.1204937Z       }
2024-02-09T07:20:48.1206276Z   In [It] at: /home/runner/work/ovn-kubernetes/ovn-kubernetes/test/e2e/kubevirt.go:401 @ 02/09/24 07:20:46.845

2024-02-09T07:17:45.5892685Z 2024/02/09 07:16:01 Handling connection 100.64.0.3:37592
2024-02-09T07:17:45.5894256Z 2024/02/09 07:16:02 Handling connection [fd98::3]:46496

qinqon closed this Mar 1, 2024
Development

Successfully merging this pull request may close these issues.

[FAIL] Kubevirt Virtual Machines when live migrated [It] with pre-copy should keep connectivity