
💚 add debug output for e2e service and job failures #860

Merged
merged 1 commit on Aug 17, 2020

Conversation

mboersma
Contributor

@mboersma mboersma commented Aug 4, 2020

What this PR does / why we need it:

Prints a JSON dump of the service or job that failed a WaitFor func in e2e tests. If this seems like a reasonable approach, I'll follow with:

  • a similar change to WaitForDeploymentsAvailable in CAPI
  • printing relevant k8s events from the DescribeFailedX funcs

Which issue(s) this PR fixes:

Refs #807

Special notes for your reviewer:

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

💚 add debug output for e2e service and job failures

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 4, 2020
@k8s-ci-robot k8s-ci-robot added area/provider/azure Issues or PRs related to azure provider needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 4, 2020
@k8s-ci-robot
Contributor

Hi @mboersma. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 4, 2020
@mboersma
Contributor Author

mboersma commented Aug 4, 2020

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test

Ok, ok, I'll open a kubernetes/org issue for sig membership now.

@nader-ziada
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 4, 2020
@CecileRobertMichon CecileRobertMichon added this to the v0.4.8 milestone Aug 4, 2020
@nader-ziada
Contributor

What are you expecting the output to look like? I tried to make the test fail and got a big blob of HTML.

@mboersma
Contributor Author

mboersma commented Aug 6, 2020

@nader-ziada my thought was to dump the Service or Job to stdout so we could at least examine what its status and selectors are. The next step would be to dump events related to the object that failed, but I planned to do that as a second change after something like this is in CAPI too for deployments.

I meant to prefix the JSON dump with at least Service:, so I'll add that for context. Maybe that's clear enough?

STEP: waiting for cluster capz-e2e-q4nyjd to be deleted


• Failure [2288.132 seconds]
Workload cluster creation
/Users/matt/Projects/src/sigs.k8s.io/cluster-api-provider-azure/test/e2e/azure_test.go:36
  Creating highly available control-plane cluster
  /Users/matt/Projects/src/sigs.k8s.io/cluster-api-provider-azure/test/e2e/azure_test.go:95
    With 3 control-plane nodes and 2 worker nodes [It]
    /Users/matt/Projects/src/sigs.k8s.io/cluster-api-provider-azure/test/e2e/azure_test.go:96

    Timed out after 10.004s.
    Service default/ingress-nginx-ilb failed to get an IP for LoadBalancer.Ingress
    Service:
    {
      "metadata": {
        "name": "ingress-nginx-ilb",
        "namespace": "default",
        "selfLink": "/api/v1/namespaces/default/services/ingress-nginx-ilb",
        "uid": "d7cdfa43-525a-43b7-a53a-165ac558a6bc",
        "resourceVersion": "1106",
        "creationTimestamp": "2020-08-05T21:32:20Z",
        "annotations": {
          "service.beta.kubernetes.io/azure-load-balancer-internal": "true"
        },
        "finalizers": [
          "service.kubernetes.io/load-balancer-cleanup"
        ],
        "managedFields": [
          {
            "manager": "cluster-api-e2e",
            "operation": "Update",
            "apiVersion": "v1",
            "time": "2020-08-05T21:32:20Z",
            "fieldsType": "FieldsV1",
            "fieldsV1": {
              "f:metadata": {
                "f:annotations": {
                  ".": {},
                  "f:service.beta.kubernetes.io/azure-load-balancer-internal": {}
                }
              },
              "f:spec": {
                "f:externalTrafficPolicy": {},
                "f:ports": {
                  ".": {},
                  "k:{\"port\":80,\"protocol\":\"TCP\"}": {
                    ".": {},
                    "f:name": {},
                    "f:port": {},
                    "f:protocol": {},
                    "f:targetPort": {}
                  },
                  "k:{\"port\":443,\"protocol\":\"TCP\"}": {
                    ".": {},
                    "f:name": {},
                    "f:port": {},
                    "f:protocol": {},
                    "f:targetPort": {}
                  }
                },
                "f:selector": {
                  ".": {},
                  "f:app": {}
                },
                "f:sessionAffinity": {},
                "f:type": {}
              }
            }
          },
          {
            "manager": "kube-controller-manager",
            "operation": "Update",
            "apiVersion": "v1",
            "time": "2020-08-05T21:32:20Z",
            "fieldsType": "FieldsV1",
            "fieldsV1": {
              "f:metadata": {
                "f:finalizers": {
                  ".": {},
                  "v:\"service.kubernetes.io/load-balancer-cleanup\"": {}
                }
              }
            }
          }
        ]
      },
      "spec": {
        "ports": [
          {
            "name": "http",
            "protocol": "TCP",
            "port": 80,
            "targetPort": 80,
            "nodePort": 31535
          },
          {
            "name": "https",
            "protocol": "TCP",
            "port": 443,
            "targetPort": 443,
            "nodePort": 30891
          }
        ],
        "selector": {
          "app": "ingress-nginx"
        },
        "clusterIP": "10.106.132.19",
        "type": "LoadBalancer",
        "sessionAffinity": "None",
        "externalTrafficPolicy": "Cluster"
      },
      "status": {
        "loadBalancer": {}
      }
    }
    
    Expected
        <bool>: false
    to be true

    /Users/matt/Projects/src/sigs.k8s.io/cluster-api-provider-azure/test/e2e/helpers.go:98

    Full Stack Trace
    sigs.k8s.io/cluster-api-provider-azure/test/e2e.WaitForServiceAvailable(0x2ae21e0, 0xc0000460f0, 0x2a9bd00, 0xc000984750, 0xc000b0d680, 0xc00098aa00, 0x2, 0x2)
    	/Users/matt/Projects/src/sigs.k8s.io
...

@mboersma
Contributor Author

mboersma commented Aug 7, 2020

I opened kubernetes-sigs/cluster-api#3468 to give Deployments the same treatment.

@mboersma
Contributor Author

kubernetes-sigs/cluster-api#3468 did get merged, so we should benefit from that the next time we vendor in CAPI.

@CecileRobertMichon
Contributor

do we still want this one @mboersma ?

@mboersma
Contributor Author

do we still want this one?

Yes, I think so. While Deployments are what we have seen failing in e2e runs, there's no reason a Service or Job couldn't fail similarly, in which case we would be glad to have this debug logging. Also, our CAPZ WaitFor[Service|Job] funcs are based on CAPI's e2e framework helper funcs, so I think we should keep the implementations similar.

Contributor

@CecileRobertMichon CecileRobertMichon left a comment


/lgtm
/assign @nader-ziada

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 13, 2020
@nader-ziada
Contributor

/approve

I agree this is helpful in debugging

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nader-ziada

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 13, 2020
@mboersma
Contributor Author

/retest

2 similar comments
@mboersma
Contributor Author

/retest

@mboersma
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot merged commit 01aa442 into kubernetes-sigs:master Aug 17, 2020
@mboersma mboersma deleted the fix-e2e-adaptors branch August 17, 2020 17:35