Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests: Spawn poststart / prestop pods on the same node as the http pod #101063

Merged
merged 1 commit into from Aug 12, 2021

Conversation

claudiubelu
Copy link
Contributor

What type of PR is this?

/kind flake
/sig testing
/sig networking

/priority important-soon

What this PR does / why we need it:

In the case of multinode clusters, the http server pod and the test cluster can spawn on different nodes, which can be problematic for poststart / prestop hooks, as they are executed by the kubelet itself, and the cross-node lifecycle hook might
fail (according to the Kubernetes network model, it is not mandatory for kubelet to be able to access pods on a different node).

This PR ensures that the test pod spawns on the same node as the http server pod.

Which issue(s) this PR fixes:

Fixes #101062

Special notes for your reviewer:

Flakes:

https://prow.k8s.io/view/gs/k8s-ovn/logs/k8s-e2e-ltsc2019-containerd-flannel-sdnoverlay-master/1380264207156514816
https://prow.k8s.io/view/gs/k8s-ovn/logs/k8s-e2e-ltsc2019-containerd-flannel-sdnoverlay-master/1380083012074475520

Not flaking:

https://testgrid.k8s.io/sig-windows-networking#sac1909-containerd-flannel-sdnbridge-stable

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/flake Categorizes issue or PR as related to a flaky test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Apr 13, 2021
@k8s-ci-robot
Copy link
Contributor

@claudiubelu: The label(s) sig/networking cannot be applied, because the repository doesn't have them.

In response to this:

What type of PR is this?

/kind flake
/sig testing
/sig networking

/priority important-soon

What this PR does / why we need it:

In the case of multinode clusters, the http server pod and the test cluster can spawn on different nodes, which can be problematic for poststart / prestop hooks, as they are executed by the kubelet itself, and the cross-node lifecycle hook might
fail (according to the Kubernetes network model, it is not mandatory for kubelet to be able to access pods on a different node).

This PR ensures that the test pod spawns on the same node as the http server pod.

Which issue(s) this PR fixes:

Fixes #101062

Special notes for your reviewer:

Flakes:

https://prow.k8s.io/view/gs/k8s-ovn/logs/k8s-e2e-ltsc2019-containerd-flannel-sdnoverlay-master/1380264207156514816
https://prow.k8s.io/view/gs/k8s-ovn/logs/k8s-e2e-ltsc2019-containerd-flannel-sdnoverlay-master/1380083012074475520

Not flaking:

https://testgrid.k8s.io/sig-windows-networking#sac1909-containerd-flannel-sdnbridge-stable

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 13, 2021
@k8s-ci-robot k8s-ci-robot added area/test sig/node Categorizes an issue or PR as relevant to SIG Node. labels Apr 13, 2021
@yangjunmyfm192085
Copy link
Contributor

/retest

@SergeyKanzhelev
Copy link
Member

/triage accepted

/retest

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 14, 2021
@SergeyKanzhelev SergeyKanzhelev moved this from Triage to PRs - Needs Reviewer in SIG Node CI/Test Board Apr 14, 2021
@SergeyKanzhelev
Copy link
Member

/assign @fromanirh

@claudiubelu
Copy link
Contributor Author

/test pull-kubernetes-integration

@jsturtevant jsturtevant added this to InProgress (v1.22) in SIG-Windows Apr 22, 2021
@jsturtevant
Copy link
Contributor

Only in Windows Overlay networks (ie Flannel where these tests are flaking) does the pod have to be on the same node. These tests are not flaky on l2bridge setup. I would be more inclined to disable this test for Windows Overlay test setups only and add a sig-windows test for the overlay functionality that requires it to be on the same node (but not sure if that is useful in an meaningful way?)

@claudiubelu claudiubelu force-pushed the tests/lifecycle-hooks branch 3 times, most recently from 62359bd to 90cd3d6 Compare May 18, 2021 14:23
@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 8, 2021
@ehashman ehashman moved this from PRs - Needs Reviewer to PRs Waiting on Author in SIG Node CI/Test Board Jun 10, 2021
In the case of multinode clusters, the http server pod and the test cluster can
spawn on different nodes, which can be problematic for  poststart / prestop hooks,
as they are executed by the kubelet itself, and the cross-node lifecycle hook might
fail (according to the Kubernetes network model, it is not mandatory for kubelet to
be able to access pods on a different node).

This commit ensures that the test pod spawns on the same node as the http server pod.
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 23, 2021
@aojea
Copy link
Member

aojea commented Jun 23, 2021

/hold cancel
/lgtm
Thanks for your patience and for bringing up this topic 😅
/assign @thockin @johnbelamaric
#101063 (comment)

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jun 23, 2021
@aojea
Copy link
Member

aojea commented Jun 23, 2021

what is this conformance pod doing?
the job timesout because it fails

I0623 13:03:39.428] Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
I0623 13:03:39.428]                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
I0623 13:03:39.428] Events:
I0623 13:03:39.428]   Type    Reason   Age   From     Message
I0623 13:03:39.429]   ----    ------   ----  ----     -------
I0623 13:03:39.429]   Normal  Killing  3s    kubelet  Stopping container conformance-container
I0623 13:03:39.613] unable to retrieve container logs for containerd://f46f826857231d4c95cb452518456a4aed2d4f28d73d220ca5a5381090ae587e/workspace/_artifacts/logs
W0623 13:03:39.714] + cleanup

@aojea
Copy link
Member

aojea commented Jun 23, 2021

/test pull-kubernetes-conformance-image-test

this job is very cryptic, I can't see any meaningful log, however, the kind jobs passed so I assume it should be a problem with that specific job

@SergeyKanzhelev SergeyKanzhelev moved this from PRs Waiting on Author to PRs - Needs Approver in SIG Node CI/Test Board Jul 13, 2021
@ehashman ehashman moved this from Triage to Needs Approver in SIG Node PR Triage Jul 22, 2021
@marosset marosset moved this from InProgress (v1.22) to In Progress (v1.23) in SIG-Windows Jul 29, 2021
@marosset marosset moved this from In Progress (v1.23) to Reviewed - Needs Approval From Other SIGs in SIG-Windows Jul 29, 2021
@claudiubelu
Copy link
Contributor Author

Needs milestone.

Copy link
Member

@spiffxp spiffxp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
/lgtm
/milestone v1.23

@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Aug 12, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: claudiubelu, spiffxp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 12, 2021
@k8s-ci-robot k8s-ci-robot merged commit e6c7837 into kubernetes:master Aug 12, 2021
SIG-Windows automation moved this from Reviewed - Needs Approval From Other SIGs to Done (v1.23) Aug 12, 2021
SIG Node CI/Test Board automation moved this from PRs - Needs Approver to Done Aug 12, 2021
SIG Node PR Triage automation moved this from Needs Approver to Done Aug 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/conformance Issues or PRs related to kubernetes conformance tests area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/flake Categorizes issue or PR as related to a flaky test. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Archived in project
SIG-Windows
  
Done (v1.23)
Development

Successfully merging this pull request may close these issues.

HTTP Lifecycle hook tests flake on multinode clusters
10 participants