[Flaky Test] PodZoneAffinity tests #6671

Closed
shafeeqes opened this issue Sep 13, 2022 · 7 comments · Fixed by #6724
Assignees
Labels
area/testing Testing related kind/flake Tracking or fixing a flaky test

Comments

@shafeeqes
Contributor

How to categorize this issue?

/area testing
/kind flake

Which test(s)/suite(s) are flaking:

PodZoneAffinity tests
/home/prow/go/src/github.com/gardener/gardener/test/integration/resourcemanager/podzoneaffinity/podzoneaffinity_test.go:26
  when namespace has zone enforcement label with value
  /home/prow/go/src/github.com/gardener/gardener/test/integration/resourcemanager/podzoneaffinity/podzoneaffinity_test.go:79
    [It] should add nodeAffinity
    /home/prow/go/src/github.com/gardener/gardener/test/integration/resourcemanager/podzoneaffinity/podzoneaffinity_test.go:92
  Expected
      <*v1.Affinity | 0x0>: nil
  not to be nil
  In [It] at: /home/prow/go/src/github.com/gardener/gardener/test/integration/resourcemanager/podzoneaffinity/podzoneaffinity_test.go:96

CI link:
https://prow.gardener.cloud/view/gs/gardener-prow/pr-logs/pull/gardener_gardener/6668/pull-gardener-integration/1569616104525926400

Reason for failure:
Mentioned above.

Anything else we need to know:

@rfranzke
Member

cc @timuthy

@timuthy
Contributor

timuthy commented Sep 15, 2022

I tried to find out more about the flaky test and did some stress testing with more logging:

•! [PANICKED] [0.030 seconds]
PodZoneAffinity tests
/Users/tim/go/src/github.com/gardener/gardener/test/integration/resourcemanager/podzoneaffinity/podzoneaffinity_test.go:28
  when namespace has zone enforcement label with value
  /Users/tim/go/src/github.com/gardener/gardener/test/integration/resourcemanager/podzoneaffinity/podzoneaffinity_test.go:81
    [It] should add nodeAffinity
    /Users/tim/go/src/github.com/gardener/gardener/test/integration/resourcemanager/podzoneaffinity/podzoneaffinity_test.go:96

  Begin Captured GinkgoWriter Output >>
    {"level":"info","ts":"2022-09-14T18:09:30.160+0200","logger":"podzoneaffinity-webhook-test","msg":"Pod","pod":{"namespace":"podzoneaffinity-webhook-test-e000d401","name":"test-m5bqq"}}
  << End Captured GinkgoWriter Output

  Test Panicked
  In [It] at: /Users/tim/go/src/github.com/gardener/gardener/test/integration/resourcemanager/podzoneaffinity/podzoneaffinity_test.go:103

  Affinity is nil for pod podzoneaffinity-webhook-test-e000d401/test-m5bqq

  Full Stack Trace
    github.com/gardener/gardener/test/integration/resourcemanager/podzoneaffinity_test.glob..func2.3.2()
    	/Users/tim/go/src/github.com/gardener/gardener/test/integration/resourcemanager/podzoneaffinity/podzoneaffinity_test.go:103 +0x6e8

--> podzoneaffinity-webhook-test-e000d401/test-m5bqq is the pod that lacks the expected affinity configuration.

However, the webhook sends the anticipated patch to the Kube-Apiserver:

{"level":"info","ts":"2022-09-14T18:09:30.182+0200","logger":"webhook.pod-zone-affinity","msg":"Calling webhook","pod":"podzoneaffinity-webhook-test-e000d401/"}
{"level":"info","ts":"2022-09-14T18:09:30.182+0200","logger":"webhook.pod-zone-affinity","msg":"handleNodeAffinity","pod":"podzoneaffinity-webhook-test-e000d401/"}
{"level":"info","ts":"2022-09-14T18:09:30.184+0200","logger":"webhook.pod-zone-affinity","msg":"Namespace Label","pod":"podzoneaffinity-webhook-test-e000d401/","control-plane.shoot.gardener.cloud/enforce-zone":"zone-a"}
{"level":"info","ts":"2022-09-14T18:09:30.184+0200","logger":"webhook.pod-zone-affinity","msg":"Setting","pod":"podzoneaffinity-webhook-test-e000d401/","Affinity":"&Affinity{NodeAffinity:&NodeAffinity{RequiredDuringSchedulingIgnoredDuringExecution:&NodeSelector{NodeSelectorTerms:[]NodeSelectorTerm{NodeSelectorTerm{MatchExpressions:[]NodeSelectorRequirement{NodeSelectorRequirement{Key:topology.kubernetes.io/zone,Operator:In,Values:[zone-a],},},MatchFields:[]NodeSelectorRequirement{},},},},PreferredDuringSchedulingIgnoredDuringExecution:[]PreferredSchedulingTerm{},},PodAffinity:&PodAffinity{RequiredDuringSchedulingIgnoredDuringExecution:[]PodAffinityTerm{PodAffinityTerm{LabelSelector:&v1.LabelSelector{MatchLabels:map[string]string{},MatchExpressions:[]LabelSelectorRequirement{},},Namespaces:[],TopologyKey:topology.kubernetes.io/zone,NamespaceSelector:nil,},},PreferredDuringSchedulingIgnoredDuringExecution:[]WeightedPodAffinityTerm{},},PodAntiAffinity:nil,}"}
{"level":"info","ts":"2022-09-14T18:09:30.184+0200","logger":"webhook.pod-zone-affinity","msg":"Response","pod":"podzoneaffinity-webhook-test-e000d401/","patches":[{"op":"add","path":"/spec/affinity","value":{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"topology.kubernetes.io/zone","operator":"In","values":["zone-a"]}]}]}},"podAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":[{"labelSelector":{},"topologyKey":"topology.kubernetes.io/zone"}]}}}]}
{"level":"info","ts":"2022-09-14T18:09:30.220+0200","logger":"webhook.pod-zone-affinity","msg":"Calling webhook","pod":"podzoneaffinity-webhook-test-44bc1f69/"}

The Kube-Apiserver logs suspicious messages whenever a test run fails:

W0915 07:43:41.203992       1 dispatcher.go:195] Failed calling webhook, failing closed pod-zone-affinity.resources.gardener.cloud: failed calling webhook "pod-zone-affinity.resources.gardener.cloud": failed to call webhook: Post "https://host.docker.internal:9449/webhooks/pod-zone-affinity?timeout=10s": context canceled
E0915 07:43:41.205446       1 finisher.go:175] FinishRequest: post-timeout activity - time-elapsed: 2.091875ms, panicked: false, err: Internal error occurred: failed calling webhook "pod-zone-affinity.resources.gardener.cloud": failed to call webhook: Post "https://host.docker.internal:9449/webhooks/pod-zone-affinity?timeout=10s": context canceled, panic-reason: <nil>

However, it's not yet clear why:

  • The context is cancelled (according to the logs), given that we pass a context.Background() to the request and the 10s timeout in the MutatingWebhookConfiguration was not exceeded.
  • The client.Create call does not fail even though the Kube-Apiserver has issues contacting the webhook.

@timuthy
Contributor

timuthy commented Sep 20, 2022

I will have another look ASAP.

@timuthy
Contributor

timuthy commented Sep 20, 2022

/assign

@rfranzke
Member

@shafeeqes
Contributor Author

A different test:

PodZoneAffinity tests
/home/prow/go/src/github.com/gardener/gardener/test/integration/resourcemanager/podzoneaffinity/podzoneaffinity_test.go:26
  when namespace hasn't zone enforcement label
  /home/prow/go/src/github.com/gardener/gardener/test/integration/resourcemanager/podzoneaffinity/podzoneaffinity_test.go:109
    [It] should not add podAffinity
    /home/prow/go/src/github.com/gardener/gardener/test/integration/resourcemanager/podzoneaffinity/podzoneaffinity_test.go:110
  Begin Captured GinkgoWriter Output >>
    {"level":"debug","ts":"2022-09-21T07:34:09.858Z","logger":"controller-runtime.webhook.webhooks","msg":"received request","webhook":"/webhooks/pod-zone-affinity","UID":"18de0c6c-e779-4d95-8a0d-9fa04c76c7cb","kind":"/v1, Kind=Pod","resource":{"group":"","version":"v1","resource":"pods"}}
    {"level":"debug","ts":"2022-09-21T07:34:09.860Z","logger":"controller-runtime.webhook.webhooks","msg":"wrote response","webhook":"/webhooks/pod-zone-affinity","code":200,"reason":"","UID":"18de0c6c-e779-4d95-8a0d-9fa04c76c7cb","allowed":true}
  << End Captured GinkgoWriter Output
  Expected
      <*v1.Affinity | 0xc0009ae5a0>: {
          NodeAffinity: nil,
          PodAffinity: {
              RequiredDuringSchedulingIgnoredDuringExecution: [
                  {
                      LabelSelector: {
                          MatchLabels: nil,
                          MatchExpressions: nil,
                      },
                      Namespaces: nil,
                      TopologyKey: "topology.kubernetes.io/zone",
                      NamespaceSelector: nil,
                  },
              ],
              PreferredDuringSchedulingIgnoredDuringExecution: nil,
          },
          PodAntiAffinity: nil,
      }
  to be nil
  In [It] at: /home/prow/go/src/github.com/gardener/gardener/test/integration/resourcemanager/podzoneaffinity/podzoneaffinity_test.go:114
------------------------------

Ref: https://prow.gardener.cloud/view/gs/gardener-prow/pr-logs/pull/gardener_gardener/6700/pull-gardener-integration/1572488601646665728


3 participants