
Fix GetPodLogs failures in NetworkPolicy e2e tests #85897

Merged
merged 1 commit into kubernetes:master from tnqn:e2e-container-log on Jan 13, 2020

Conversation

@tnqn (Member) commented Dec 4, 2019

What type of PR is this?
/kind bug

What this PR does / why we need it:
GetPodLogs always fails when the tests fail because the tests specify the wrong container name when getting logs.

When creating a client Pod, the test specifies <podName>-container as the container name and <podName>- as the Pod's GenerateName. For instance, podName client-a results in client-a-container as the container name and client-a-vx5sv as the actual Pod name. However, when getting logs, the test constructs the container name from the actual Pod name, e.g. client-a-vx5sv-container, which does not exist.

This patch fixes it by using the same static container name both when creating the Pod and when getting logs.
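
For illustration only, here is a minimal Go sketch of the naming mismatch. The variable names are hypothetical; the real logic lives in test/e2e/network/network_policy.go and talks to the API server, but the string handling is what matters:

```go
package main

import "fmt"

func main() {
	podName := "client-a"                    // name the test passes in
	containerName := podName + "-container"  // container name used in the Pod spec
	// The Pod spec uses GenerateName: "client-a-", so the API server assigns
	// a random suffix to the actual Pod name.
	actualPodName := "client-a-vx5sv"

	// Buggy log lookup: derives the container name from the generated Pod
	// name, which never matches the container that was created.
	buggyLookup := actualPodName + "-container"

	// Fixed log lookup: reuses the same static container name that was used
	// when the Pod was created.
	fixedLookup := containerName

	fmt.Println("container created as:", containerName) // client-a-container
	fmt.Println("buggy GetPodLogs asks:", buggyLookup)  // client-a-vx5sv-container (does not exist)
	fmt.Println("fixed GetPodLogs asks:", fixedLookup)  // client-a-container
}
```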

Which issue(s) this PR fixes:

Fixes #85896

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Dec 4, 2019
@k8s-ci-robot (Contributor)

Hi @tnqn. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 4, 2019
@k8s-ci-robot k8s-ci-robot added area/test sig/network Categorizes an issue or PR as relevant to SIG Network. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 4, 2019
@tnqn (Member Author) commented Dec 4, 2019

/assign @danwinship

@danwinship (Contributor)

/ok-to-test
can you push a temporary patch to break one of the NetworkPolicy tests so that we can see that the new log-gathering code works?

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 4, 2019
@danwinship (Contributor)

oh wait, do they even get run in e2e-gce?

@tnqn (Member Author) commented Dec 4, 2019

oh wait, do they even get run in e2e-gce?

I don't think they run in CI, because I also found a test case that fails regardless of which CNI is enabled. I'm about to file an issue for that one.

@tnqn (Member Author) commented Dec 4, 2019

@danwinship I have created issue #85908 for the failing test I mentioned and PR #85909 to fix it. Could you also take a look when you are available? Thanks for reviewing!

Since the tests are not running in K8s CI, do you have suggestions on how I can prove the PRs are working correctly? I have my own e2e logs, not sure if it helps.

@tnqn (Member Author) commented Dec 4, 2019

/retest

1 similar comment
@tnqn (Member Author) commented Dec 5, 2019

/retest

@danwinship (Contributor)

Since the tests are not running in K8s CI, do you have suggestions on how I can prove the PRs are working correctly? I have my own e2e logs, not sure if it helps.

oh, I know, you can just remove the [Feature:NetworkPolicy] from the name of whichever test you modify to be broken. Then it will get run, fail, and test out the newly-fixed log-gathering code.
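
For context (an assumption on my part, not stated in this PR): tests whose names carry a [Feature:...] tag are excluded from the default presubmit e2e jobs via a ginkgo skip regex, which is why dropping the tag makes the test run. A minimal sketch of that filtering, with a hypothetical skip pattern:

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Hypothetical skip pattern resembling what CI passes via --ginkgo.skip;
	// the exact regex used by the pull-kubernetes-e2e jobs may differ.
	skip := regexp.MustCompile(`\[Feature:.+\]`)

	tagged := "NetworkPolicy between server and client should allow ingress access from updated pod [Feature:NetworkPolicy]"
	untagged := "NetworkPolicy between server and client should allow ingress access from updated pod"

	fmt.Println("tagged test skipped by CI:  ", skip.MatchString(tagged))   // true
	fmt.Println("untagged test skipped by CI:", skip.MatchString(untagged)) // false
}
```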

@tnqn tnqn force-pushed the e2e-container-log branch 2 times, most recently from f89f1a8 to c9f085d on January 13, 2020 10:20
@tnqn (Member Author) commented Jan 13, 2020

Since the tests are not running in K8s CI, do you have suggestions on how I can prove the PRs are working correctly? I have my own e2e logs, not sure if it helps.

oh, I know, you can just remove the [Feature:NetworkPolicy] from the name of whichever test you modify to be broken. Then it will get run, fail, and test out the newly-fixed log-gathering code.

Thanks a lot @danwinship! Following your suggestion, I reproduced the pod log collection failure by enabling the test "NetworkPolicy between server and client should allow ingress access from updated pod"; see https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/85897/pull-kubernetes-e2e-kind/1216666418167156737:

test/e2e/network/network_policy.go:823
Jan 13 10:33:01.088: Error getting container logs: the server rejected our request for an unknown reason (get pods client-a-v8c88)
test/e2e/network/network_policy.go:1457

With this PR, the pod logs are collected correctly; see https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/85897/pull-kubernetes-e2e-kind/1216656351183245313:

test/e2e/network/network_policy.go:823
Jan 13 10:01:59.531: Pod client-a-znwtb should not be able to connect to service svc-server, but was able to connect.
Pod logs:
svc-server.network-policy-9459 (10.96.51.130:80) open

 Current NetworkPolicies:
	[{{ } {allow-pod-b-via-pod-selector  network-policy-9459 /apis/networking.k8s.io/v1/namespaces/network-policy-9459/networkpolicies/allow-pod-b-via-pod-selector d7f332a5-d341-4ea2-9c59-69cc30a40f12 29350 1 2020-01-13 10:01:51 +0000 UTC <nil> <nil> map[] map[] [] []  []} {{map[pod-name:server] []} [{[] [{&LabelSelector{MatchLabels:map[string]string{},MatchExpressions:[]LabelSelectorRequirement{LabelSelectorRequirement{Key:pod-name,Operator:DoesNotExist,Values:[],},},} nil nil}]}] [] [Ingress]}}]

 Pods:
	 [Pod: server-x7qtq, Status: &PodStatus{Phase:Running,Conditions:[]PodCondition{PodCondition{Type:Initialized,Status:True,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2020-01-13 10:01:13 +0000 UTC,Reason:,Message:,},PodCondition{Type:Ready,Status:True,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2020-01-13 10:01:24 +0000 UTC,Reason:,Message:,},PodCondition{Type:ContainersReady,Status:True,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2020-01-13 10:01:24 +0000 UTC,Reason:,Message:,},PodCondition{Type:PodScheduled,Status:True,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2020-01-13 10:01:13 +0000 UTC,Reason:,Message:,},},Message:,Reason:,HostIP:172.17.0.2,PodIP:10.244.2.125,StartTime:2020-01-13 10:01:13 +0000 UTC,ContainerStatuses:[]ContainerStatus{ContainerStatus{Name:server-container-80,State:ContainerState{Waiting:nil,Running:&ContainerStateRunning{StartedAt:2020-01-13 10:01:17 +0000 UTC,},Terminated:nil,},LastTerminationState:ContainerState{Waiting:nil,Running:nil,Terminated:nil,},Ready:true,RestartCount:0,Image:gcr.io/kubernetes-e2e-test-images/agnhost:2.8,ImageID:gcr.io/kubernetes-e2e-test-images/agnhost@sha256:daf5332100521b1256d0e3c56d697a238eaec3af48897ed9167cbadd426773b5,ContainerID:containerd://eb4b4365cac60c4fcfc1bdf21e2bc1505292359ba91a3c67934b1a833ab9cf3b,Started:*true,},ContainerStatus{Name:server-container-81,State:ContainerState{Waiting:nil,Running:&ContainerStateRunning{StartedAt:2020-01-13 10:01:18 +0000 UTC,},Terminated:nil,},LastTerminationState:ContainerState{Waiting:nil,Running:nil,Terminated:nil,},Ready:true,RestartCount:0,Image:gcr.io/kubernetes-e2e-test-images/agnhost:2.8,ImageID:gcr.io/kubernetes-e2e-test-images/agnhost@sha256:daf5332100521b1256d0e3c56d697a238eaec3af48897ed9167cbadd426773b5,ContainerID:containerd://f675d9accd15dee0246f852745b7525eb03ed1cd9e98522cdd2faac4355b2b3e,Started:*true,},},QOSClass:BestEffort,InitContainerStatuses:[]ContainerStatus{},NominatedNodeName:,PodIPs:[]PodIP{PodIP{IP:10.244.2.125,},},EphemeralContainerStatuses:[]ContainerStatus{},}
 Pod: server-x7qtq, Status: &PodStatus{Phase:Running,Conditions:[]PodCondition{PodCondition{Type:Initialized,Status:True,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2020-01-13 10:01:13 +0000 UTC,Reason:,Message:,},PodCondition{Type:Ready,Status:True,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2020-01-13 10:01:24 +0000 UTC,Reason:,Message:,},PodCondition{Type:ContainersReady,Status:True,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2020-01-13 10:01:24 +0000 UTC,Reason:,Message:,},PodCondition{Type:PodScheduled,Status:True,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2020-01-13 10:01:13 +0000 UTC,Reason:,Message:,},},Message:,Reason:,HostIP:172.17.0.2,PodIP:10.244.2.125,StartTime:2020-01-13 10:01:13 +0000 UTC,ContainerStatuses:[]ContainerStatus{ContainerStatus{Name:server-container-80,State:ContainerState{Waiting:nil,Running:&ContainerStateRunning{StartedAt:2020-01-13 10:01:17 +0000 UTC,},Terminated:nil,},LastTerminationState:ContainerState{Waiting:nil,Running:nil,Terminated:nil,},Ready:true,RestartCount:0,Image:gcr.io/kubernetes-e2e-test-images/agnhost:2.8,ImageID:gcr.io/kubernetes-e2e-test-images/agnhost@sha256:daf5332100521b1256d0e3c56d697a238eaec3af48897ed9167cbadd426773b5,ContainerID:containerd://eb4b4365cac60c4fcfc1bdf21e2bc1505292359ba91a3c67934b1a833ab9cf3b,Started:*true,},ContainerStatus{Name:server-container-81,State:ContainerState{Waiting:nil,Running:&ContainerStateRunning{StartedAt:2020-01-13 10:01:18 +0000 UTC,},Terminated:nil,},LastTerminationState:ContainerState{Waiting:nil,Running:nil,Terminated:nil,},Ready:true,RestartCount:0,Image:gcr.io/kubernetes-e2e-test-images/agnhost:2.8,ImageID:gcr.io/kubernetes-e2e-test-images/agnhost@sha256:daf5332100521b1256d0e3c56d697a238eaec3af48897ed9167cbadd426773b5,ContainerID:containerd://f675d9accd15dee0246f852745b7525eb03ed1cd9e98522cdd2faac4355b2b3e,Started:*true,},},QOSClass:BestEffort,InitContainerStatuses:[]ContainerStatus{},NominatedNodeName:,PodIPs:[]PodIP{PodIP{IP:10.244.2.125,},},EphemeralContainerStatuses:[]ContainerStatus{},}
]

test/e2e/network/network_policy.go:1477

@tnqn (Member Author) commented Jan 13, 2020

/retest

1 similar comment
@tnqn (Member Author) commented Jan 13, 2020

/retest

@danwinship (Contributor)

/lgtm
/approve
thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 13, 2020
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, tnqn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 13, 2020
@k8s-ci-robot k8s-ci-robot merged commit 1c51c44 into kubernetes:master Jan 13, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Jan 13, 2020
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.
area/test
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
kind/bug Categorizes issue or PR as related to a bug.
lgtm "Looks good to me", indicates that a PR is ready to be merged.
needs-priority Indicates a PR lacks a `priority/foo` label and requires one.
ok-to-test Indicates a non-member PR verified by an org member that is safe to test.
release-note-none Denotes a PR that doesn't merit a release note.
sig/network Categorizes an issue or PR as relevant to SIG Network.
sig/testing Categorizes an issue or PR as relevant to SIG Testing.
size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

NetworkPolicy e2e tests fail to get container logs
3 participants