status.total counter is not correct for openshift/conformance suite #27350

Closed
mtulio opened this issue Aug 9, 2022 · 5 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@mtulio
Contributor

mtulio commented Aug 9, 2022

The total field of status is not correct in the openshift/conformance suite (default, parallel).

The problem was found when running OPCT on the latest release. OPCT is built on top of the openshift-tests binary and consumes that counter to report execution progress to the user while the tool is running. More details are available here: https://issues.redhat.com/browse/SPLAT-696
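
For reference, the counter in question is the parenthesized (failed/index/total) prefix that openshift-tests prints on each started:/passed: line. A minimal sketch of how a consumer could read it from a sample line (plain bash; the variable names are illustrative and this is not the actual OPCT code):

# Hypothetical sketch: parse the (failed/index/total) status prefix from one line of output.
line='started: (0/1126/1127) "[sig-storage] PersistentVolumes-expansion loopback local block volume should support online expansion on node"'
re='^started: \(([0-9]+)/([0-9]+)/([0-9]+)\)'
if [[ $line =~ $re ]]; then
  echo "failed=${BASH_REMATCH[1]} index=${BASH_REMATCH[2]} total=${BASH_REMATCH[3]}"
fi
# prints: failed=0 index=1126 total=1127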

Version
$ oc version
Client Version: 4.10.10
Server Version: 4.11.0
Kubernetes Version: v1.24.0+9546431
Steps To Reproduce
  1. openshift-tests run openshift/conformance
  2. Wait for the 1127th test
  3. Check whether the total keeps increasing along with the index (the second field of the (failed/index/total) status prefix); see the sketch after this list
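
A quick way to check step 3 (a hypothetical helper, assuming the status prefix format shown under "Current Result" below): print a message every time the total field changes during a run.

# Hypothetical sketch: flag every change of the total field while the suite runs.
openshift-tests run openshift/conformance 2>&1 | \
  awk -F'[(/)]' '/^started: \(/ {
    if (prev != "" && $4 != prev)
      printf "total changed: %s -> %s at index %s\n", prev, $4, $3
    prev = $4
  }'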
Current Result

After the 1127th test, the total counter keeps increasing along with the index:

openshift-tests version: 4.11.0-202208020706.p0.gb860532.assembly.stream-b860532
Starting SimultaneousPodIPControllerI0809 16:31:15.790490    3733 shared_informer.go:255] Waiting for caches to sync for SimultaneousPodIPController
started: (0/1/1127) "[sig-scheduling][Early] The openshift-monitoring pods should be scheduled on different nodes [Suite:openshift/conformance/parallel]"

(...)

started: (0/1126/1127) "[sig-storage] PersistentVolumes-expansion  loopback local block volume should support online expansion on node [Suite:openshift/conformance/parallel] [Suite:k8s]"

passed: (38s) 2022-08-09T17:12:21 "[sig-storage] In-tree Volumes [Driver: nfs] [Testpattern: Dynamic PV (default fs)] provisioning should provision storage with mount options [Suite:openshift/conformance/parallel] [Suite:k8s]"

started: (0/1127/1127) "[sig-storage] In-tree Volumes [Driver: local][LocalVolumeType: tmpfs] [Testpattern: Generic Ephemeral-volume (block volmode) (late-binding)] ephemeral should support two pods which have the same volume definition [Suite:openshift/conformance/parallel] [Suite:k8s]"

passed: (6.6s) 2022-08-09T17:12:21 "[sig-storage] Downward API volume should provide container's memory request [NodeConformance] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]"

started: (0/1128/1128) "[sig-storage] In-tree Volumes [Driver: cinder] [Testpattern: Dynamic PV (immediate binding)] topology should fail to schedule a pod which has topologies that conflict with AllowedTopologies [Suite:openshift/conformance/parallel] [Suite:k8s]"

skip [k8s.io/kubernetes@v1.24.0/test/e2e/storage/framework/testsuite.go:116]: Driver local doesn't support GenericEphemeralVolume -- skipping
Ginkgo exit error 3: exit with code 3

skipped: (400ms) 2022-08-09T17:12:21 "[sig-storage] In-tree Volumes [Driver: local][LocalVolumeType: tmpfs] [Testpattern: Generic Ephemeral-volume (block volmode) (late-binding)] ephemeral should support two pods which have the same volume definition [Suite:openshift/conformance/parallel] [Suite:k8s]"

started: (0/1129/1129) "[sig-storage] In-tree Volumes [Driver: emptydir] [Testpattern: Dynamic PV (default fs)] capacity provides storage capacity information [Suite:openshift/conformance/parallel] [Suite:k8s]" 

After that, it keeps increasing until the last test (3475th):

started: (30/3474/3474) "[sig-arch][bz-etcd][Late] Alerts alert/etcdGRPCRequestsSlow should not be at or above pending [Suite:openshift/conformance/parallel]"

passed: (4.5s) 2022-08-09T18:26:40 "[sig-arch][bz-Unknown][Late] Alerts alert/KubePodNotReady should not be at or above info in all the other namespaces [Suite:openshift/conformance/parallel]"

started: (30/3475/3475) "[sig-arch][bz-Unknown][Late] Alerts alert/KubePodNotReady should not be at or above pending in ns/default [Suite:openshift/conformance/parallel]"
Expected Result
started: (0/1/3475)   (....)
Additional Information

Extracting openshift-tests from the same release the cluster is running, I got a different count:

$ ./.local/bin/openshift-install-linux-4.11.0 version
./.local/bin/openshift-install-linux-4.11.0 4.11.0
built from commit 37684309bcb598757c99d3ea9fbc0758343d64a5
release image quay.io/openshift-release-dev/ocp-release@sha256:300bce8246cf880e792e106607925de0a404484637627edf5f517375517d54a4
release architecture amd64

$ RELEASE_IMAGE=$(./.local/bin/openshift-install-linux-4.11.0 version | awk '/release image/ {print $3}')
$ TESTS_IMAGE=$(oc adm release info --image-for='tests' $RELEASE_IMAGE)

$ oc image extract $TESTS_IMAGE --file="/usr/bin/openshift-tests" -a ~/.openshift/pull-secret-latest.json

$ chmod u+x openshift-tests
$ ./openshift-tests run --dry-run openshift/conformance |wc -l
3487
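
Since --dry-run prints one test per line (which is what the wc -l count above relies on), a consumer could, as a hypothetical workaround, take the expected total from a dry run up front instead of relying on the in-flight counter:

# Hypothetical workaround sketch: compute the expected total before the real run.
EXPECTED_TOTAL=$(./openshift-tests run --dry-run openshift/conformance | wc -l)
echo "expected total: ${EXPECTED_TOTAL}"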

@elmiko
Contributor

elmiko commented Aug 16, 2022

We talked about this issue during the install flex sync meeting today. We don't think it is overly concerning, but it will be a problem for people who want to monitor the count as the run happens, since it will be difficult to determine when the tests will end.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 15, 2022
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 15, 2022
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this as completed Jan 15, 2023
@openshift-ci
Contributor

openshift-ci bot commented Jan 15, 2023

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
