OCPBUGS-17359: test/e2e: Don't use openshift/origin-node #970

Conversation

@Miciah (Contributor) commented Aug 4, 2023

test/e2e: Don't use "openshift/origin-node" image

Use the "openshift/tools" image from the cluster image registry instead of using the "openshift/origin-node" image pullspec in E2E tests.

Before this change, the E2E tests were inadvertently pulling the "openshift/origin-node" image from Docker Hub and getting rate-limited.

The choice to use "openshift/tools" is based on a similar change here: openshift/origin@4cbb844

Follow-up to #410 and #451.

  • test/e2e/util_test.go (buildEchoPod, buildSlowHTTPDPod): Replace the "openshift/origin-node" image pullspec with "image-registry.openshift-image-registry.svc:5000/openshift/tools:latest".
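
As a concrete illustration of the change, a minimal sketch of the updated helper follows. Only the image pullspec is taken from this PR; the function shape, container name, and other fields are assumptions, not the repository's exact code.

package e2e

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// toolsImage is the in-cluster pullspec that replaces "openshift/origin-node",
// whose bare pullspec the container runtime resolved to Docker Hub, where
// pulls were being rate-limited.
const toolsImage = "image-registry.openshift-image-registry.svc:5000/openshift/tools:latest"

// buildEchoPod is sketched with an illustrative shape; only the Image value
// reflects this PR.
func buildEchoPod(name, namespace string) *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "echo",
				Image: toolsImage,
			}},
		},
	}
}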

TestHstsPolicyWorks: Dump events if test fails

  • test/e2e/hsts_policy_test.go (TestHstsPolicyWorks): Dump events in case of test failure, using the new dumpEventsInNamespace helper.
  • test/e2e/util_test.go (dumpEventsInNamespace): New helper function to log all events in a namespace.
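
For a sense of the helper's shape, here is a hedged sketch; the actual implementation in util_test.go may differ, for example by using the operator's controller-runtime client rather than the client-go interface assumed here.

package e2e

import (
	"context"
	"testing"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// dumpEventsInNamespace logs every event in the given namespace so that a
// failed run records, for example, why an image pull was denied.
func dumpEventsInNamespace(t *testing.T, kc kubernetes.Interface, namespace string) {
	t.Helper()
	events, err := kc.CoreV1().Events(namespace).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		t.Logf("failed to list events in namespace %q: %v", namespace, err)
		return
	}
	for _, e := range events.Items {
		t.Logf("%s %s %s/%s: %s", e.LastTimestamp, e.Reason, e.InvolvedObject.Kind, e.InvolvedObject.Name, e.Message)
	}
}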

TestHstsPolicyWorks: Wait for namespace to be provisioned

When creating a new namespace for the TestHstsPolicyWorks test, wait for the "default" ServiceAccount and the "system:image-pullers" RoleBinding to be provisioned in the newly created namespace before proceeding with the test. Make a similar change for the TestMTLSWithCRLsCerts test.

Before this change, TestHstsPolicyWorks sometimes failed because it tried to create a pod before the ServiceAccount had been provisioned and granted access to pull images. As a result, the test would randomly fail with the following error:

Failed to pull image "image-registry.openshift-image-registry.svc:5000/openshift/tools:latest": rpc error: code = Unknown desc = reading manifest

This change should prevent such failures.

Because TestMTLSWithCRLsCerts also creates a namespace and then creates pods in this namespace, this PR makes the same change to this test as well. Some other tests create namespaces but do not create pods in those namespaces; those tests do not necessarily need to wait for the ServiceAccount and RoleBinding.

Inspired by openshift/origin@877c652.

  • test/e2e/client_tls_test.go (TestMTLSWithCRLs):
  • test/e2e/hsts_policy_test.go (TestHstsPolicyWorks): Use the new createNamespace helper.
  • test/e2e/util_test.go (createNamespace): New helper function. Create a namespace with the specified name, register a cleanup handler to delete the namespace when the test finishes, wait for the "default" ServiceAccount and "system:image-pullers" RoleBinding to be created, and return the namespace.
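
A sketch of what such a helper could look like follows; the client-go interface, polling interval, and one-minute timeout are illustrative assumptions, and the real helper's signature may differ.

package e2e

import (
	"context"
	"testing"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// createNamespace creates the named namespace, registers a cleanup handler to
// delete it when the test finishes, and blocks until the "default"
// ServiceAccount and the "system:image-pullers" RoleBinding exist, so that
// pods created immediately afterward can pull from the internal registry.
func createNamespace(t *testing.T, kc kubernetes.Interface, name string) *corev1.Namespace {
	t.Helper()
	ns := &corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: name}}
	ns, err := kc.CoreV1().Namespaces().Create(context.TODO(), ns, metav1.CreateOptions{})
	if err != nil {
		t.Fatalf("failed to create namespace %q: %v", name, err)
	}
	t.Cleanup(func() {
		_ = kc.CoreV1().Namespaces().Delete(context.TODO(), name, metav1.DeleteOptions{})
	})
	err = wait.PollImmediate(time.Second, time.Minute, func() (bool, error) {
		if _, err := kc.CoreV1().ServiceAccounts(name).Get(context.TODO(), "default", metav1.GetOptions{}); err != nil {
			return false, nil // not provisioned yet; keep polling
		}
		if _, err := kc.RbacV1().RoleBindings(name).Get(context.TODO(), "system:image-pullers", metav1.GetOptions{}); err != nil {
			return false, nil
		}
		return true, nil
	})
	if err != nil {
		t.Fatalf("timed out waiting for namespace %q to be provisioned: %v", name, err)
	}
	return ns
}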

@openshift-ci-robot added the jira/severity-important (referenced Jira bug's severity is important for the branch this PR is targeting), jira/valid-reference (this PR references a valid Jira ticket of any type), and jira/valid-bug (the referenced Jira bug is valid for the branch this PR is targeting) labels on Aug 4, 2023.
@openshift-ci-robot (Contributor)

@Miciah: This pull request references Jira Issue OCPBUGS-17359, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.14.0) matches configured target version for branch (4.14.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @lihongan

The bug has been updated to refer to the pull request using the external bug tracker.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@candita (Contributor) commented Aug 4, 2023

/retest-required

@frobware (Contributor) commented Aug 4, 2023

/lgtm
/approve

@openshift-ci bot added the lgtm (PR is ready to be merged) and approved (PR has been approved by an approver from all required OWNERS files) labels on Aug 4, 2023.
@openshift-ci-robot (Contributor)

/retest-required

Remaining retests: 0 against base HEAD 4e7b2da and 2 for PR HEAD 01c2d8b in total

@candita (Contributor) commented Aug 4, 2023

hsts_policy_test.go:147: failed to find header [max-age=0;preload;includesubdomains]: timed out waiting for the condition

/test e2e-azure-operator

@openshift-ci-robot (Contributor)

/retest-required

Remaining retests: 0 against base HEAD 833bc28 and 1 for PR HEAD 01c2d8b in total

@Miciah (Contributor, Author) commented Aug 4, 2023

Using a Cluster Bot cluster to run TestHstsPolicyWorks, I see that the kubelet is failing to pull "image-registry.openshift-image-registry.svc:5000/openshift/tools:latest" for the "hsts-policy-echo" pod:

Events:
  Type     Reason          Age                From               Message
  ----     ------          ----               ----               -------
  Normal   Scheduled       48s                default-scheduler  Successfully assigned hsts-policy-namespace/hsts-policy-echo to ip-10-0-26-86.ec2.internal
  Normal   AddedInterface  48s                multus             Add eth0 [10.129.2.3/23] from ovn-kubernetes
  Normal   BackOff         19s (x3 over 47s)  kubelet            Back-off pulling image "image-registry.openshift-image-registry.svc:5000/openshift/tools:latest"
  Warning  Failed          19s (x3 over 47s)  kubelet            Error: ImagePullBackOff
  Normal   Pulling         5s (x3 over 48s)   kubelet            Pulling image "image-registry.openshift-image-registry.svc:5000/openshift/tools:latest"
  Warning  Failed          5s (x3 over 48s)   kubelet            Failed to pull image "image-registry.openshift-image-registry.svc:5000/openshift/tools:latest": rpc error: code = Unknown desc = reading manifest
latest in image-registry.openshift-image-registry.svc:5000/openshift/tools: authentication required
  Warning  Failed          5s (x3 over 48s)   kubelet            Error: ErrImagePull

@Miciah (Contributor, Author) commented Aug 5, 2023

Hm, sometimes the test is able to pull the image, and sometimes it gets that auth error, during repeated tests on the same cluster.

@Miciah (Contributor, Author) commented Aug 5, 2023

Watching oc -n openshift-image-registry logs -ldocker-registry=default --tail=0, the image registry is sometimes denying the image pull during one test run:

time="2023-08-05T00:26:31.931724609Z" level=error msg="OpenShift access denied: no opinion" go.version="go1.20.5 X:strictfipsruntime" http.request.host="image-registry.openshift-image-registry.svc:5000" http.request.id=403cb81a-1e08-42bf-b3c3-c65f57d2b619 http.request.method=GET http.request.remoteaddr="100.64.0.7:39038" http.request.uri=/v2/openshift/tools/manifests/latest http.request.useragent="cri-o/1.27.1-4.rhaos4.14.gitab7845e.el9 go/go1.20.5 os/linux arch/amd64" openshift.auth.user=anonymous vars.name=openshift/tools vars.reference=latest
time="2023-08-05T00:26:31.93178121Z" level=warning msg="error authorizing context: access denied" go.version="go1.20.5 X:strictfipsruntime" http.request.host="image-registry.openshift-image-registry.svc:5000" http.request.id=403cb81a-1e08-42bf-b3c3-c65f57d2b619 http.request.method=GET http.request.remoteaddr="100.64.0.7:39038" http.request.uri=/v2/openshift/tools/manifests/latest http.request.useragent="cri-o/1.27.1-4.rhaos4.14.gitab7845e.el9 go/go1.20.5 os/linux arch/amd64" vars.name=openshift/tools vars.reference=latest
time="2023-08-05T00:26:31.931848511Z" level=info msg=response go.version="go1.20.5 X:strictfipsruntime" http.request.host="image-registry.openshift-image-registry.svc:5000" http.request.id=20c2a054-f06d-4f62-8a0c-b4e5185a6284 http.request.method=GET http.request.remoteaddr="100.64.0.7:39038" http.request.uri=/v2/openshift/tools/manifests/latest http.request.useragent="cri-o/1.27.1-4.rhaos4.14.gitab7845e.el9 go/go1.20.5 os/linux arch/amd64" http.response.contenttype=application/json http.response.duration=2.437939ms http.response.status=401 http.response.written=158
time="2023-08-05T00:26:48.31509818Z" level=error msg="OpenShift access denied: no opinion" go.version="go1.20.5 X:strictfipsruntime" http.request.host="image-registry.openshift-image-registry.svc:5000" http.request.id=c775a56a-3125-4211-9a5a-a8493e2ac309 http.request.method=GET http.request.remoteaddr="100.64.0.7:48090" http.request.uri=/v2/openshift/tools/manifests/latest http.request.useragent="cri-o/1.27.1-4.rhaos4.14.gitab7845e.el9 go/go1.20.5 os/linux arch/amd64" openshift.auth.user=anonymous vars.name=openshift/tools vars.reference=latest
time="2023-08-05T00:26:48.315145541Z" level=warning msg="error authorizing context: access denied" go.version="go1.20.5 X:strictfipsruntime" http.request.host="image-registry.openshift-image-registry.svc:5000" http.request.id=c775a56a-3125-4211-9a5a-a8493e2ac309 http.request.method=GET http.request.remoteaddr="100.64.0.7:48090" http.request.uri=/v2/openshift/tools/manifests/latest http.request.useragent="cri-o/1.27.1-4.rhaos4.14.gitab7845e.el9 go/go1.20.5 os/linux arch/amd64" vars.name=openshift/tools vars.reference=latest
time="2023-08-05T00:26:48.315177572Z" level=info msg=response go.version="go1.20.5 X:strictfipsruntime" http.request.host="image-registry.openshift-image-registry.svc:5000" http.request.id=1d4f1da9-9f01-40ea-9b21-571b7d5d691d http.request.method=GET http.request.remoteaddr="100.64.0.7:48090" http.request.uri=/v2/openshift/tools/manifests/latest http.request.useragent="cri-o/1.27.1-4.rhaos4.14.gitab7845e.el9 go/go1.20.5 os/linux arch/amd64" http.response.contenttype=application/json http.response.duration=2.396133ms http.response.status=401 http.response.written=158

And then the image registry is allowing the pull in the next run:

time="2023-08-05T00:27:05.807299045Z" level=info msg="authorized request" go.version="go1.20.5 X:strictfipsruntime" http.request.host="image-registry.openshift-image-registry.svc:5000" http.request.id=a2a449f4-ce12-4f30-80a9-b9f050782826 http.request.method=GET http.request.remoteaddr="100.64.0.7:58848" http.request.uri=/v2/openshift/tools/manifests/latest http.request.useragent="cri-o/1.27.1-4.rhaos4.14.gitab7845e.el9 go/go1.20.5 os/linux arch/amd64" openshift.auth.user="system:serviceaccount:hsts-policy-namespace:default" vars.name=openshift/tools vars.reference=latest
time="2023-08-05T00:27:05.926022483Z" level=info msg="response completed" go.version="go1.20.5 X:strictfipsruntime" http.request.host="image-registry.openshift-image-registry.svc:5000" http.request.id=a2a449f4-ce12-4f30-80a9-b9f050782826 http.request.method=GET http.request.remoteaddr="100.64.0.7:58848" http.request.uri=/v2/openshift/tools/manifests/latest http.request.useragent="cri-o/1.27.1-4.rhaos4.14.gitab7845e.el9 go/go1.20.5 os/linux arch/amd64" http.response.contenttype=application/vnd.docker.distribution.manifest.v2+json http.response.duration=123.8248ms http.response.status=200 http.response.written=1252 openshift.auth.user="system:serviceaccount:hsts-policy-namespace:default" vars.name=openshift/tools vars.reference=latest
time="2023-08-05T00:27:05.926060873Z" level=info msg=response go.version="go1.20.5 X:strictfipsruntime" http.request.host="image-registry.openshift-image-registry.svc:5000" http.request.id=8bfe698c-5476-48bc-b740-0c763cce1b30 http.request.method=GET http.request.remoteaddr="100.64.0.7:58848" http.request.uri=/v2/openshift/tools/manifests/latest http.request.useragent="cri-o/1.27.1-4.rhaos4.14.gitab7845e.el9 go/go1.20.5 os/linux arch/amd64" http.response.contenttype=application/vnd.docker.distribution.manifest.v2+json http.response.duration=123.883201ms http.response.status=200 http.response.written=1252

@Miciah (Contributor, Author) commented Aug 5, 2023

The successful request has openshift.auth.user="system:serviceaccount:hsts-policy-namespace:default", and the failing request has openshift.auth.user=anonymous. 🤔...

@openshift-ci bot removed the lgtm label on Aug 5, 2023.
@openshift-ci-robot (Contributor)

@Miciah: This pull request references Jira Issue OCPBUGS-17359, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.14.0) matches configured target version for branch (4.14.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @melvinjoseph86


@Miciah (Contributor, Author) commented Aug 5, 2023

The two new commits seem to prevent the "authentication required" errors. In manual testing with these changes, I have seen the test pass over 20 times and fail 0 times.

@Miciah (Contributor, Author) commented Aug 5, 2023

e2e-aws-ovn-serial failed because the "Undiagnosed panic detected in pod" test failed:

{  pods/openshift-cloud-network-config-controller_cloud-network-config-controller-c5f776f49-j76xk_controller_previous.log.gz:E0805 03:33:08.727528       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)}

I believe this is a known issue: OCPBUGS-17151.

Also, the "[sig-storage] PersistentVolumes-local Stress with local volumes [Serial] should be able to process many pods and reuse local volumes" test failed:

{  fail [test/e2e/storage/persistent_volumes-local.go:522]: persistentvolumes "local-pvkbdlx" not found
Error: exit with code 1
Ginkgo exit error 1: exit with code 1}

This is also a known issue: OCPBUGS-14930.
/test e2e-aws-ovn-serial

e2e-aws-operator failed because must-gather failed.
/test e2e-aws-operator

Use the "openshift/tools" image from the cluster image registry instead of
using the "openshift/origin-node" image pullspec in E2E tests.

Before this commit, the E2E tests were inadvertently pulling the
"openshift/origin-node" image from Docker Hub and getting rate-limited.

The choice to use "openshift/tools" is based on a similar change here:
openshift/origin@4cbb844

Follow-up to commit 167bcc2
and commit a635566.

This commit fixes OCPBUGS-17359.

https://issues.redhat.com/browse/OCPBUGS-17359

* test/e2e/util_test.go (buildEchoPod, buildSlowHTTPDPod): Replace the
"openshift/origin-node" image pullspec with
"image-registry.openshift-image-registry.svc:5000/openshift/tools:latest".
@Miciah force-pushed the OCPBUGS-17359-test-slash-e2e-don't-use-openshift-slash-origin-node branch from 839d1b2 to 4216df0 on August 9, 2023 15:15.
@Miciah (Contributor, Author) commented Aug 9, 2023

e2e-aws-ovn-serial failed with the same failure for the "[sig-storage] PersistentVolumes-local Stress with local volumes [Serial] should be able to process many pods and reuse local volumes" test:

{  fail [test/e2e/storage/persistent_volumes-local.go:522]: persistentvolumes "local-pvd2fz8" not found
Error: exit with code 1
Ginkgo exit error 1: exit with code 1}

@Miciah (Contributor, Author) commented Aug 10, 2023

e2e-aws-operator failed because must-gather failed.
/test e2e-aws-operator

e2e-gcp-operator failed because TestAllowedSourceRanges, TestAllowedSourceRangesStatus, TestInternalLoadBalancer, and TestUserDefinedIngressController failed. It appears that these tests failed because it took too long for the LB that each test creates to get created or updated. For example, the TestInternalLoadBalancer test creates an IngressController named "testinternalloadbalancer" that requests an LB, and the test then runs a polling loop with a 5-minute timeout waiting for the LB to be provisioned, but the LB is actually taking more than 5 minutes to become ready, counting from the time the pod is scheduled to the time the LB is reported ready:

% jq < events.json -c '.items|sort_by(.metadata.creationTimestamp)|.[]|select(.involvedObject.namespace=="openshift-ingress" and (.reason=="Scheduled" or .reason=="EnsuredLoadBalancer"))|[.metadata.creationTimestamp,.involvedObject.kind,.involvedObject.name,.reason]' | grep -w -e testinternalloadbalancer    
["2023-08-09T16:14:03Z","Pod","router-testinternalloadbalancer-84f74cb78b-qx9xq","Scheduled"]
["2023-08-09T16:19:29Z","Service","router-testinternalloadbalancer","EnsuredLoadBalancer"]

Similarly, TestUserDefinedIngressController creates an IngressController named "testuserdefinedingresscontroller" and waits 5 minutes for it to be ready, but it takes more than 5 minutes to become ready:

% jq < events.json -c '.items|sort_by(.metadata.creationTimestamp)|.[]|select(.involvedObject.namespace=="openshift-ingress" and (.reason=="Scheduled" or .reason=="EnsuredLoadBalancer"))|[.metadata.creationTimestamp,.involvedObject.kind,.involvedObject.name,.reason]' | grep -w -e testuserdefinedingresscontroller
["2023-08-09T16:16:50Z","Pod","router-testuserdefinedingresscontroller-78bb97f5b6-675cl","Scheduled"]
["2023-08-09T16:22:13Z","Service","router-testuserdefinedingresscontroller","EnsuredLoadBalancer"]

The TestAllowedSourceRangesStatus test likewise creates an IngressController named "sourcerangesstatus", waits 5 minutes, then modifies configuration related to the LB and waits for the update:

% jq < events.json -c '.items|sort_by(.metadata.creationTimestamp)|.[]|select(.involvedObject.namespace=="openshift-ingress" and (.reason=="Scheduled" or .reason=="EnsuredLoadBalancer"))|[.metadata.creationTimestamp,.involvedObject.kind,.involvedObject.name,.reason]' | grep -w -e sourcerangesstatus
["2023-08-09T16:11:00Z","Pod","router-sourcerangesstatus-9cd78d588-dqbsv","Scheduled"]
["2023-08-09T16:16:11Z","Service","router-sourcerangesstatus","EnsuredLoadBalancer"]
["2023-08-09T16:32:03Z","Pod","router-sourcerangesstatus-9cd78d588-z88pw","Scheduled"]
["2023-08-09T16:32:53Z","Service","router-sourcerangesstatus","EnsuredLoadBalancer"]

TestAllowedSourceRanges, in contrast, creates a deliberately misconfigured IngressController named "sourcerange", so the IngressController doesn't get provisioned at first; the test then updates the IngressController with valid configuration so that the LB can be provisioned and waits 1 minute for it, but the LB takes over a minute to become ready:

% jq < events.json -c '.items|sort_by(.metadata.creationTimestamp)|.[]|select(.involvedObject.namespace=="openshift-ingress" and (.reason=="Scheduled" or .reason=="EnsuredLoadBalancer"))|[.metadata.creationTimestamp,.involvedObject.kind,.involvedObject.name,.reason]' | grep -w -e sourcerange
["2023-08-09T16:11:01Z","Pod","router-sourcerange-7b886595c8-prkt6","Scheduled"]
["2023-08-09T16:32:04Z","Pod","router-sourcerange-7b886595c8-cqhhz","Scheduled"]
["2023-08-09T16:33:44Z","Service","router-sourcerange","EnsuredLoadBalancer"]

We might need to adjust these timeouts, but if LBs are consistently taking longer to provision than they used to, we should also look into that (a sketch of a more lenient polling loop follows the logs below). Looking at the kube-controller-manager logs, there is a rather conspicuous 3m39s gap from 16:14:07 to 16:17:46, between 'Error syncing endpoint slices for service "openshift-ingress/router-testinternalloadbalancer", retrying. Error: EndpointSlice informer cache is out of date' and 'Ensuring load balancer for service openshift-ingress/router-testinternalloadbalancer', and then a somewhat conspicuous 1m43s gap from 16:17:46 to 16:19:29 before 'Ensured load balancer':

% grep -h -e router-testinternalloadbalancer -- namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-*/kube-controller-manager/kube-controller-manager/logs/*.log 
2023-08-09T16:14:03.482683918Z I0809 16:14:03.482609       1 replica_set.go:571] "Too few replicas" replicaSet="openshift-ingress/router-testinternalloadbalancer-84f74cb78b" need=1 creating=1
2023-08-09T16:14:03.483178036Z I0809 16:14:03.483086       1 event.go:307] "Event occurred" object="openshift-ingress/router-testinternalloadbalancer" fieldPath="" kind="Deployment" apiVersion="apps/v1" type="Normal" reason="ScalingReplicaSet" message="Scaled up replica set router-testinternalloadbalancer-84f74cb78b to 1"
2023-08-09T16:14:03.502208701Z I0809 16:14:03.502132       1 event.go:307] "Event occurred" object="openshift-ingress/router-testinternalloadbalancer-84f74cb78b" fieldPath="" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: router-testinternalloadbalancer-84f74cb78b-qx9xq"
2023-08-09T16:14:03.512974851Z I0809 16:14:03.512911       1 deployment_controller.go:503] "Error syncing deployment" deployment="openshift-ingress/router-testinternalloadbalancer" err="Operation cannot be fulfilled on deployments.apps \"router-testinternalloadbalancer\": the object has been modified; please apply your changes to the latest version and try again"
2023-08-09T16:14:07.057662221Z I0809 16:14:07.057585       1 replica_set.go:461] ReplicaSet "router-testinternalloadbalancer-84f74cb78b" will be enqueued after 30s for availability check
2023-08-09T16:14:07.082018038Z W0809 16:14:07.081951       1 endpointslice_controller.go:297] Error syncing endpoint slices for service "openshift-ingress/router-testinternalloadbalancer", retrying. Error: EndpointSlice informer cache is out of date
2023-08-09T16:17:46.872430243Z I0809 16:17:46.871676       1 controller.go:388] Ensuring load balancer for service openshift-ingress/router-testinternalloadbalancer
2023-08-09T16:17:46.872430243Z I0809 16:17:46.871722       1 controller.go:887] Adding finalizer to service openshift-ingress/router-testinternalloadbalancer
2023-08-09T16:17:46.872511683Z I0809 16:17:46.872423       1 event.go:307] "Event occurred" object="openshift-ingress/router-testinternalloadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
2023-08-09T16:19:03.442314693Z I0809 16:19:03.442087       1 deployment_controller.go:597] "Deployment has been deleted" deployment="openshift-ingress/router-testinternalloadbalancer"
2023-08-09T16:19:03.442314693Z I0809 16:19:03.442161       1 garbagecollector.go:533] "Processing item" item="[monitoring.coreos.com/v1/ServiceMonitor, namespace: openshift-ingress, name: router-testinternalloadbalancer, uid: 56f210f1-9a74-46ba-8780-666834feae3b]" virtual=false
2023-08-09T16:19:03.442314693Z I0809 16:19:03.442249       1 garbagecollector.go:533] "Processing item" item="[apps/v1/ReplicaSet, namespace: openshift-ingress, name: router-testinternalloadbalancer-84f74cb78b, uid: 9ce1e076-3b17-4be1-a1d7-e1e40ba638dd]" virtual=false
2023-08-09T16:19:03.442314693Z I0809 16:19:03.442270       1 garbagecollector.go:533] "Processing item" item="[v1/Service, namespace: openshift-ingress, name: router-testinternalloadbalancer, uid: b4b97aca-b006-4ca4-a6aa-464896304086]" virtual=false
2023-08-09T16:19:03.465034758Z I0809 16:19:03.464471       1 garbagecollector.go:672] "Deleting item" item="[monitoring.coreos.com/v1/ServiceMonitor, namespace: openshift-ingress, name: router-testinternalloadbalancer, uid: 56f210f1-9a74-46ba-8780-666834feae3b]" propagationPolicy=Background
2023-08-09T16:19:03.465368730Z I0809 16:19:03.464575       1 garbagecollector.go:672] "Deleting item" item="[v1/Service, namespace: openshift-ingress, name: router-testinternalloadbalancer, uid: b4b97aca-b006-4ca4-a6aa-464896304086]" propagationPolicy=Background
2023-08-09T16:19:03.467089357Z I0809 16:19:03.466716       1 garbagecollector.go:672] "Deleting item" item="[apps/v1/ReplicaSet, namespace: openshift-ingress, name: router-testinternalloadbalancer-84f74cb78b, uid: 9ce1e076-3b17-4be1-a1d7-e1e40ba638dd]" propagationPolicy=Background
2023-08-09T16:19:03.486778335Z I0809 16:19:03.486567       1 garbagecollector.go:533] "Processing item" item="[v1/Pod, namespace: openshift-ingress, name: router-testinternalloadbalancer-84f74cb78b-qx9xq, uid: 2cdf06a1-aff2-45f4-ace4-9a07a829afa1]" virtual=false
2023-08-09T16:19:03.519219754Z I0809 16:19:03.508957       1 garbagecollector.go:672] "Deleting item" item="[v1/Pod, namespace: openshift-ingress, name: router-testinternalloadbalancer-84f74cb78b-qx9xq, uid: 2cdf06a1-aff2-45f4-ace4-9a07a829afa1]" propagationPolicy=Background
2023-08-09T16:19:04.660860221Z I0809 16:19:04.660794       1 garbagecollector.go:533] "Processing item" item="[apps/v1/Deployment, namespace: openshift-ingress, name: router-testinternalloadbalancer, uid: ef10ff41-d3b2-4e94-a64e-e3691694f65e]" virtual=true
2023-08-09T16:19:29.346434716Z I0809 16:19:29.346309       1 controller.go:928] Patching status for service openshift-ingress/router-testinternalloadbalancer
2023-08-09T16:19:29.346838911Z I0809 16:19:29.346792       1 event.go:307] "Event occurred" object="openshift-ingress/router-testinternalloadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuredLoadBalancer" message="Ensured load balancer"
2023-08-09T16:23:14.022928645Z I0809 16:23:14.022042       1 controller.go:369] Deleting existing load balancer for service openshift-ingress/router-testinternalloadbalancer
2023-08-09T16:23:14.022928645Z I0809 16:23:14.022684       1 event.go:307] "Event occurred" object="openshift-ingress/router-testinternalloadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="DeletingLoadBalancer" message="Deleting load balancer"
2023-08-09T16:24:04.000418556Z I0809 16:24:04.000364       1 deployment_controller.go:597] "Deployment has been deleted" deployment="openshift-ingress/router-testinternalloadbalancer"
2023-08-09T16:24:20.865370064Z I0809 16:24:20.865241       1 controller.go:902] Removing finalizer from service openshift-ingress/router-testinternalloadbalancer
2023-08-09T16:24:20.885177033Z I0809 16:24:20.885124       1 controller.go:928] Patching status for service openshift-ingress/router-testinternalloadbalancer
2023-08-09T16:24:20.886273090Z I0809 16:24:20.886221       1 event.go:307] "Event occurred" object="openshift-ingress/router-testinternalloadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="DeletedLoadBalancer" message="Deleted load balancer"
2023-08-09T16:24:20.886709000Z I0809 16:24:20.886664       1 garbagecollector.go:533] "Processing item" item="[discovery.k8s.io/v1/EndpointSlice, namespace: openshift-ingress, name: router-testinternalloadbalancer-klw88, uid: 6caae1a3-1b8a-4ac1-b596-d0f45c546cc7]" virtual=false
2023-08-09T16:24:20.936934069Z I0809 16:24:20.936882       1 garbagecollector.go:672] "Deleting item" item="[discovery.k8s.io/v1/EndpointSlice, namespace: openshift-ingress, name: router-testinternalloadbalancer-klw88, uid: 6caae1a3-1b8a-4ac1-b596-d0f45c546cc7]" propagationPolicy=Background
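
If raising the timeouts turns out to be the right mitigation, the waiting logic would look something like this sketch; the function name, 10-second interval, and client-go usage are assumptions for illustration, not the operator's actual code.

package e2e

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForLoadBalancer polls until the LB service reports an ingress point,
// tolerating transient errors until the deadline. Callers would pass a
// timeout longer than the current 5 minutes, e.g. 10*time.Minute.
func waitForLoadBalancer(kc kubernetes.Interface, namespace, name string, timeout time.Duration) error {
	return wait.PollImmediate(10*time.Second, timeout, func() (bool, error) {
		svc, err := kc.CoreV1().Services(namespace).Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			return false, nil // keep polling on transient errors
		}
		return len(svc.Status.LoadBalancer.Ingress) > 0, nil
	})
}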

/test e2e-gcp-operator

@@ -712,3 +712,19 @@ func getRouteHost(t *testing.T, route *routev1.Route, router string) string {
t.Fatalf("failed to find host name for default router in route: %#v", route)
return ""
}

// dumpEventsInNamespace gets the namespaces in the specified namespace and logs
Is this supposed to be "dumpEventsInNamespace gets the events"?

* test/e2e/hsts_policy_test.go (TestHstsPolicyWorks): Dump events in case
of test failure, using the new dumpEventsInNamespace helper.
* test/e2e/util_test.go (dumpEventsInNamespace): New helper function to
log all events in a namespace.
When creating a new namespace for the TestHstsPolicyWorks test, wait for
the "default" ServiceAccount and the "system:image-pullers" RoleBinding to
be provisioned in the newly created namespace before proceeding with the
test.  Make a similar change for the TestMTLSWithCRLsCerts test.

Before this commit, TestHstsPolicyWorks sometimes failed because it tried
to create a pod before the ServiceAccount had been provisioned and granted
access to pull images.  As a result, the test would randomly fail with the
following error:

    Failed to pull image "image-registry.openshift-image-registry.svc:5000/openshift/tools:latest": rpc error: code = Unknown desc = reading manifest

This change should prevent such failures.

Because TestMTLSWithCRLsCerts also creates a namespace and then creates
pods in this namespace, this commit makes the same change to this test as
well.  Some other tests create namespaces but do not create pods in those
namespaces; those tests do not necessarily need to wait for the
ServiceAccount and RoleBinding.

Inspired by openshift/origin@877c652.

* test/e2e/client_tls_test.go (TestMTLSWithCRLs):
* test/e2e/hsts_policy_test.go (TestHstsPolicyWorks): Use the new
createNamespace helper.
* test/e2e/util_test.go (createNamespace): New helper function.  Create a
namespace with the specified name, register a cleanup handler to delete the
namespace when the test finishes, wait for the "default" ServiceAccount and
"system:image-pullers" RoleBinding to be created, and return the namespace.
@Miciah force-pushed the OCPBUGS-17359-test-slash-e2e-don't-use-openshift-slash-origin-node branch from 4216df0 to 42c4f82 on August 10, 2023 15:00.
@frobware (Contributor)

/lgtm
/approve

@openshift-ci bot added the lgtm label on Aug 10, 2023.
@openshift-ci bot commented Aug 10, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: frobware


@openshift-ci-robot (Contributor)

/retest-required

Remaining retests: 0 against base HEAD 8f2f035 and 2 for PR HEAD 42c4f82 in total

@Miciah (Contributor, Author) commented Aug 10, 2023

e2e-gcp-operator failed because TestUserDefinedIngressController and TestUnmanagedDNSToManagedDNSInternalIngressController failed. From the test output, this looks like the issue described in #970 (comment). (To do: File a bug report for this issue.)
/test e2e-gcp-operator

e2e-hypershift failed because TestNodePool/NodePool_Tests_Group/TestNodepoolMachineconfigGetsRolledout/EnsureNoCrashingPods and TestNodePool/NodePool_Tests_Group/TestNTOMachineConfigGetsRolledOut/EnsureNoCrashingPods failed.
/test e2e-hypershift

e2e-aws-ovn-single-node failed because 338 of 3821 tests failed. The first failure in the list was an "Undiagnosed panic detected in pod" failure:

{  pods/openshift-controller-manager_controller-manager-7f55874d69-twwz8_controller-manager.log.gz:E0810 16:47:11.628946       1 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*runtime._type)(0x38e6da0), concrete:(*runtime._type)(0x3aa81e0), asserted:(*runtime._type)(0x3d79a20), missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.BuildConfig)
pods/openshift-controller-manager_controller-manager-7f55874d69-twwz8_controller-manager.log.gz:E0810 16:47:12.630035       1 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*runtime._type)(0x38e6da0), concrete:(*runtime._type)(0x3aa81e0), asserted:(*runtime._type)(0x3d79a20), missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.BuildConfig)}

I filed OCPBUGS-17632 for this panic. Let's see what happens if we retry the job.
/test e2e-aws-ovn-single-node

e2e-gcp-ovn failed because "[sig-network] pods should successfully create sandboxes by adding pod to network" failed. Let's see if that happens again.
/test e2e-gcp-ovn

@Miciah (Contributor, Author) commented Aug 11, 2023

e2e-gcp-operator failed because TestUnmanagedDNSToManagedDNSInternalIngressController failed again. I see this test failing for #872 too.
/test e2e-gcp-operator

@openshift-ci-robot (Contributor)

/retest-required

Remaining retests: 0 against base HEAD 56a00a7 and 1 for PR HEAD 42c4f82 in total

@Miciah (Contributor, Author) commented Aug 11, 2023

e2e-gcp-operator failed because TestInternalLoadBalancer timed out while waiting for the LB to be provisioned.

@Miciah (Contributor, Author) commented Aug 11, 2023

I filed OCPBUGS-17670 for the issue that LBs can take over 5 minutes to provision on GCP.

@openshift-ci bot commented Aug 11, 2023

@Miciah: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/e2e-aws-ovn-single-node
Commit: 42c4f82
Required: false
Rerun command: /test e2e-aws-ovn-single-node


@Miciah (Contributor, Author) commented Aug 11, 2023

e2e-azure-operator failed because TestManagedDNSToUnmanagedDNSIngressController failed. I filed OCPBUGS-17671 for this issue.
/test e2e-azure-operator

@openshift-merge-robot merged commit be01a22 into openshift:master on Aug 12, 2023.
13 of 14 checks passed
@openshift-ci-robot (Contributor)

@Miciah: Jira Issue OCPBUGS-17359: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-17359 has been moved to the MODIFIED state.


@Miciah (Contributor, Author) commented Oct 25, 2023

/cherry-pick release-4.13

@openshift-cherrypick-robot

@Miciah: #970 failed to apply on top of branch "release-4.13":

Applying: test/e2e: Don't use openshift/origin-node
Applying: TestHstsPolicyWorks: Dump events if test fails
Using index info to reconstruct a base tree...
M	test/e2e/util_test.go
Falling back to patching base and 3-way merge...
Auto-merging test/e2e/util_test.go
CONFLICT (content): Merge conflict in test/e2e/util_test.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 TestHstsPolicyWorks: Dump events if test fails
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

