ppc64le libvirt ci job #8110

Closed
wants to merge 2 commits into from


@openshift-ci-robot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Apr 2, 2020
@jaypoulz
Contributor

jaypoulz commented Apr 6, 2020

To keep all the madness straight - this now depends on:

This drops the dependency on #7161

@clnperez
Contributor Author

clnperez commented Apr 6, 2020

Thanks @jaypoulz. We do have a local workaround for the bootstrap memory issue, so if you're okay with that, I'd call it a non-issue for this PR (though it still needs to be resolved ASAP).

@clnperez force-pushed the ppc64le-libvirt-ci-job branch 3 times, most recently from c10109d to 204cc4c, on April 6, 2020 21:56
@mkumatag
Member

mkumatag commented Apr 7, 2020

/retest

@clnperez force-pushed the ppc64le-libvirt-ci-job branch 8 times, most recently from fce29b0 to 2d9387d, on April 7, 2020 19:39
@clnperez
Contributor Author

clnperez commented Apr 7, 2020

/retest

@clnperez
Contributor Author

clnperez commented Apr 7, 2020

@jaypoulz that list is from @mkumatag

@clnperez force-pushed the ppc64le-libvirt-ci-job branch 2 times, most recently from 630777a to f85c995, on April 7, 2020 23:19
@clnperez force-pushed the ppc64le-libvirt-ci-job branch 5 times, most recently from b0531e2 to cc5493b, on April 8, 2020 19:30
@clnperez changed the title from "WIP: ppc64le libvirt ci job" to "ppc64le libvirt ci job" on Apr 8, 2020
@openshift-ci-robot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Apr 8, 2020
@clnperez
Contributor Author

clnperez commented Apr 8, 2020

Still need to change the image name to the :4.3 tag, but otherwise this is ready for review.

@openshift-ci-robot removed the lgtm label (indicates that a PR is ready to be merged) on Apr 13, 2020
@openshift-ci-robot
Contributor

New changes are detected. LGTM label has been removed.

@mkumatag
Member

/test pj-rehearse

@mkumatag
Member

Run no. 40
link: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_release/8110/rehearse-8110-release-openshift-origin-installer-e2e-remote-libvirt-ppc64le-4.3/40

Failing tests:

  1. [k8s.io] [sig-node] Pods Extended [k8s.io] Pod Container Status should never report success for a pending container [Suite:openshift/conformance/parallel] [Suite:k8s]
  2. [sig-storage] PersistentVolumes-local [Volume type: block] Two pods mounting a local volume at the same time should be able to write from pod1 and read from pod2 [Suite:openshift/conformance/parallel] [Suite:k8s]

Writing JUnit report to /tmp/artifacts/junit/junit_e2e_20200414-061017.xml

error: 2 fail, 732 pass, 1265 skip (35m7s)

1 - Seems to be a flaky test seen on many platforms: https://search.svc.ci.openshift.org/?search=Pod+Container+Status+should+never+report+success+for+a+pending+container&maxAge=336h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520
2 - Seems to be a flaky test; it failed while cleaning up the resource during test teardown.
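
To compare rehearsal runs more easily, the summary line in the report above can be parsed mechanically. The following is a minimal Python sketch and not part of the CI tooling; `E2ESummary` and `parse_summary` are hypothetical names, and it assumes the summary always has the shape `N fail, N pass, N skip (XmYs)` seen in this thread.

```python
import re
from typing import NamedTuple


class E2ESummary(NamedTuple):
    """Counts taken from one e2e summary line (hypothetical helper type)."""
    failed: int
    passed: int
    skipped: int
    duration_s: int


# Matches lines such as "error: 2 fail, 732 pass, 1265 skip (35m7s)".
# Assumption: the duration uses whole hours/minutes/seconds, e.g. "35m7s".
_SUMMARY_RE = re.compile(
    r"(?P<failed>\d+) fail, (?P<passed>\d+) pass, (?P<skipped>\d+) skip "
    r"\((?:(?P<h>\d+)h)?(?:(?P<m>\d+)m)?(?:(?P<s>\d+)s)?\)"
)


def parse_summary(line: str) -> E2ESummary:
    """Extract fail/pass/skip counts and the run time in seconds."""
    match = _SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not an e2e summary line: {line!r}")
    duration = (
        int(match["h"] or 0) * 3600
        + int(match["m"] or 0) * 60
        + int(match["s"] or 0)
    )
    return E2ESummary(
        failed=int(match["failed"]),
        passed=int(match["passed"]),
        skipped=int(match["skipped"]),
        duration_s=duration,
    )


if __name__ == "__main__":
    print(parse_summary("error: 2 fail, 732 pass, 1265 skip (35m7s)"))
```

Running the parser over the summary line of each run gives a quick view of whether the failure count is stable from one rehearsal to the next.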

@mkumatag
Member

/test pj-rehearse

@mkumatag
Member

Run no. 41
link: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_release/8110/rehearse-8110-release-openshift-origin-installer-e2e-remote-libvirt-ppc64le-4.3/41

Failing tests:

  1. [k8s.io] [sig-node] Pods Extended [k8s.io] Pod Container Status should never report success for a pending container [Suite:openshift/conformance/parallel] [Suite:k8s]
  2. [sig-network] Networking Granular Checks: Services should function for client IP based session affinity: http [Suite:openshift/conformance/parallel] [Suite:k8s] [Skipped:Network/OVNKubernetes]
  3. [sig-network] Networking Granular Checks: Services should function for client IP based session affinity: udp [Suite:openshift/conformance/parallel] [Suite:k8s] [Skipped:Network/OVNKubernetes]
  4. [sig-network] Networking Granular Checks: Services should function for endpoint-Service: http [Suite:openshift/conformance/parallel] [Suite:k8s] [Skipped:Network/OVNKubernetes]
  5. [sig-network] Networking Granular Checks: Services should function for endpoint-Service: udp [Suite:openshift/conformance/parallel] [Suite:k8s] [Skipped:Network/OVNKubernetes]
  6. [sig-network] Networking Granular Checks: Services should function for node-Service: http [Suite:openshift/conformance/parallel] [Suite:k8s]
  7. [sig-network] Networking Granular Checks: Services should function for node-Service: udp [Suite:openshift/conformance/parallel] [Suite:k8s]
  8. [sig-network] Networking Granular Checks: Services should function for pod-Service: http [Suite:openshift/conformance/parallel] [Suite:k8s]
  9. [sig-network] Networking Granular Checks: Services should function for pod-Service: udp [Suite:openshift/conformance/parallel] [Suite:k8s]
  10. [sig-network] Networking Granular Checks: Services should update endpoints: http [Suite:openshift/conformance/parallel] [Suite:k8s]
  11. [sig-network] Networking Granular Checks: Services should update endpoints: udp [Suite:openshift/conformance/parallel] [Suite:k8s]
  12. [sig-network] Networking should provide Internet connection for containers [Feature:Networking-IPv4] [Suite:openshift/conformance/parallel] [Suite:k8s] [Skipped:azure]

Writing JUnit report to /tmp/artifacts/junit/junit_e2e_20200414-091117.xml

error: 12 fail, 732 pass, 1255 skip (38m18s)


1: This failure persists from run 40; see #8110 (comment) for the reason.

2-12: These failed because the cluster could not reach the internet: FAIL: Unable to connect/talk to the internet: Get http://google.com: dial tcp: i/o timeout

@mkumatag
Member

/test pj-rehearse

@mkumatag
Member

Run no. 42
link: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_release/8110/rehearse-8110-release-openshift-origin-installer-e2e-remote-libvirt-ppc64le-4.3/42

Failing tests:

  1. [k8s.io] [sig-node] Pods Extended [k8s.io] Pod Container Status should never report success for a pending container [Suite:openshift/conformance/parallel] [Suite:k8s]

Writing JUnit report to /tmp/artifacts/junit/junit_e2e_20200414-121416.xml

error: 1 fail, 733 pass, 1265 skip (36m25s)

1: This failure persists from run 40; see #8110 (comment) for the reason.

@clnperez
Contributor Author

/retest

@mkumatag
Member

Failing tests:

  1. [k8s.io] [sig-node] Pods Extended [k8s.io] Pod Container Status should never report success for a pending container [Suite:openshift/conformance/parallel] [Suite:k8s]

This is failing because we are running the test that was merged in openshift/origin#24841 against an older version of the OpenShift origin build (registry.svc.ci.openshift.org/ocp-ppc64le/release-ppc64le:4.3.0-0.nightly-ppc64le-2020-04-07-135751). @jaypoulz, can someone help us trigger a new 4.3 release build for ppc64le?

@clnperez
Contributor Author

New failures this run:

failed: (37.8s) 2020-04-14T13:43:44 "[Feature:Prometheus][Conformance] Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present [Suite:openshift/conformance/parallel/minimal]"
failed: (2m9s) 2020-04-14T13:45:22 "[Feature:ImageLayers][registry] Image layer subresource should return layers from tagged images [Suite:openshift/conformance/parallel]"
failed: (25.6s) 2020-04-14T14:10:49 "[Feature:Prometheus][Conformance] Prometheus when installed on the cluster when using openshift-sdn should be able to get the sdn ovs flows [Suite:openshift/conformance/parallel/minimal]"
failed: (2m2s) 2020-04-14T14:16:48 "[Feature:Platform] Managed cluster should have no crashlooping pods in core namespaces over two minutes [Suite:openshift/conformance/parallel]"
failed: (31.9s) 2020-04-14T14:18:58 "[Conformance][Area:Networking][Feature:Router] The HAProxy router should enable openshift-monitoring to pull metrics [Suite:openshift/conformance/parallel/minimal]"

Seen again:

failed: (1m51s) 2020-04-14T14:06:30 "[k8s.io] [sig-node] Pods Extended [k8s.io] Pod Container Status should never report success for a pending container [Suite:openshift/conformance/parallel] [Suite:k8s]"

So a new build would be great to get rid of the pending-container failure. The crashlooping one is most likely not a flake but a resource issue. Not sure about the others.
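
The "new failures" versus "seen again" split above can also be computed from two test logs instead of by eye. This is a rough Python sketch under stated assumptions: `failed_tests` and `compare_runs` are hypothetical helpers, and the `failed: (duration) timestamp "test name"` line format is inferred only from the lines quoted in this comment.

```python
import re
from typing import Dict, Set

# Matches lines such as:
#   failed: (37.8s) 2020-04-14T13:43:44 "[Feature:Prometheus]... [Suite:...]"
# Assumption: every failure line follows this shape.
_FAILED_RE = re.compile(r'^failed: \([^)]*\) \S+ "(?P<name>.*)"$')


def failed_tests(log_text: str) -> Set[str]:
    """Collect the names of failed tests from a block of log text."""
    names = set()
    for line in log_text.splitlines():
        match = _FAILED_RE.match(line.strip())
        if match:
            names.add(match["name"])
    return names


def compare_runs(previous: Set[str], current: Set[str]) -> Dict[str, Set[str]]:
    """Split failures into new, repeated, and no-longer-failing tests."""
    return {
        "new": current - previous,
        "seen_again": current & previous,
        "fixed": previous - current,
    }


if __name__ == "__main__":
    earlier = 'failed: (1m51s) 2020-04-14T12:00:00 "test A"\n'
    later = (
        'failed: (1m51s) 2020-04-14T14:06:30 "test A"\n'
        'failed: (2m2s) 2020-04-14T14:16:48 "test B"\n'
    )
    for kind, names in compare_runs(failed_tests(earlier), failed_tests(later)).items():
        print(kind, sorted(names))
```

A test that lands in `seen_again` across several runs, like the pending-container test here, is more likely a real incompatibility than a flake.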

Signed-off-by: Christy Norman <christy@linux.vnet.ibm.com>
@manojnkumar
Contributor

/test pj-rehearse

@clnperez force-pushed the ppc64le-libvirt-ci-job branch 4 times, most recently from 62ee30b to e14a0a4, on April 14, 2020 22:39
@clnperez
Contributor Author

Not sure about this one but putting it here for easier comparison:

level=error msg="Cluster operator authentication Degraded is True with MultipleConditionsMatching: RouterCertsDegraded: secret/v4-0-config-system-router-certs -n openshift-authentication: could not be retrieved: secret \"v4-0-config-system-router-certs\" not found\nRouteStatusDegraded: route is not available at canonical host oauth-openshift.apps.ci-op-1sh1xzrt-4f122.libvirt-ppc64le-00: []"
level=info msg="Cluster operator authentication Progressing is Unknown with NoData: "
level=info msg="Cluster operator authentication Available is Unknown with NoData: "
level=error msg="Cluster operator kube-apiserver Degraded is True with InstallerPodContainerWaiting_ContainerCreating: InstallerPodContainerWaitingDegraded: Pod \"installer-3-ci-op-1sh1xzrt-4f122-lkj86-master-2\" on node \"ci-op-1sh1xzrt-4f122-lkj86-master-2\" container \"installer\" is waiting for 8m29.045200879s because \"\""

/retest
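
For easier comparison between runs, the degraded operators can be pulled out of installer output like the lines quoted above. The snippet below is only a sketch: `degraded_operators` is a hypothetical helper, and the `Cluster operator <name> <condition> is <status> with <reason>:` pattern is an assumption based solely on the four log lines in this comment.

```python
import re
from typing import Dict

# Matches messages such as:
#   level=error msg="Cluster operator authentication Degraded is True with Reason: ..."
# Assumption: the message layout is the one quoted in this thread.
_OPERATOR_RE = re.compile(
    r'level=(?P<level>\w+) msg="Cluster operator (?P<operator>\S+) '
    r"(?P<condition>\w+) is (?P<status>\w+) with (?P<reason>[^:]*):"
)


def degraded_operators(log_text: str) -> Dict[str, str]:
    """Map each operator reporting Degraded=True to its stated reason."""
    degraded = {}
    for line in log_text.splitlines():
        match = _OPERATOR_RE.search(line)
        if match is None:
            continue
        if match["condition"] == "Degraded" and match["status"] == "True":
            degraded[match["operator"]] = match["reason"]
    return degraded


if __name__ == "__main__":
    sample = (
        'level=error msg="Cluster operator kube-apiserver Degraded is True '
        'with InstallerPodContainerWaiting_ContainerCreating: waiting"\n'
        'level=info msg="Cluster operator authentication Available is Unknown '
        'with NoData: "\n'
    )
    for operator, reason in degraded_operators(sample).items():
        print(operator, reason)
```

Comparing the resulting mapping across two installs shows whether the same operators degrade for the same reasons each time.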

@mkumatag
Member

/retest

@openshift-ci-robot
Contributor

@clnperez: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/rehearse/release-openshift-origin-installer-e2e-remote-libvirt-s390x-4.3 | bfe21a70b27dab65a4c93bb1baa2f95c7d2935d9 | link | /test pj-rehearse |
| ci/rehearse/release-openshift-origin-installer-e2e-remote-libvirt-s390x-4.2 | bfe21a70b27dab65a4c93bb1baa2f95c7d2935d9 | link | /test pj-rehearse |
| ci/rehearse/release-openshift-origin-installer-e2e-remote-libvirt-jenkins-e2e-s390x-4.2 | bfe21a70b27dab65a4c93bb1baa2f95c7d2935d9 | link | /test pj-rehearse |
| ci/rehearse/release-openshift-origin-installer-e2e-remote-libvirt-image-ecosystem-s390x-4.2 | bfe21a70b27dab65a4c93bb1baa2f95c7d2935d9 | link | /test pj-rehearse |
| ci/rehearse/release-openshift-ocp-installer-e2e-openstack-serial-4.3 | 8f363d946df298c3ae550d83911f8190bc9bc25f | link | /test pj-rehearse |
| ci/rehearse/release-openshift-origin-installer-e2e-remote-libvirt-ppc64le-4.3 | 88e48e1 | link | /test pj-rehearse |
| ci/prow/pj-rehearse | 88e48e1 | link | /test pj-rehearse |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@mkumatag mentioned this pull request on Apr 15, 2020
Signed-off-by: Christy Norman <christy@linux.vnet.ibm.com>
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: clnperez, jaypoulz, manojnkumar
To complete the pull request process, please assign stevekuznetsov
You can assign the PR to them by writing /assign @stevekuznetsov in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mkumatag
Member

/close

@openshift-ci-robot
Contributor

@mkumatag: Closed this PR.

In response to this:

/close


6 participants