OCPBUGS-37837: vertical scaling test should not rely on CPMS replicas #28969

Merged
1 commit merged into openshift:master on Aug 6, 2024

Contributor

@hasbro17 hasbro17 commented Aug 2, 2024

The vertical scaling test currently times out waiting for the CPMS status.readyReplicas to scale up to 4, whereas in practice that may not happen. Ensuring that the cluster membership is 3 and checking that the old member has been removed is enough of a signal that vertical scaling has completed successfully.
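The completion signal described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Member` type, the `scalingComplete` function, and the `master-N` member names are all hypothetical, and a real test would obtain the member list from the etcd cluster rather than from literals.

```go
package main

import "fmt"

// Member is a hypothetical, simplified view of an etcd cluster member.
type Member struct {
	Name string
}

// scalingComplete reports whether vertical scaling looks finished:
// the old member is gone and the membership is back at the wanted size.
// It deliberately does not consult any CPMS replica counts.
func scalingComplete(members []Member, oldMember string, wantSize int) bool {
	for _, m := range members {
		if m.Name == oldMember {
			return false // the old member has not been removed yet
		}
	}
	return len(members) == wantSize
}

func main() {
	// Mid-scaling: the new member has joined, the old one is still present.
	during := []Member{{Name: "master-0"}, {Name: "master-1"}, {Name: "master-2"}, {Name: "master-3"}}
	// After scaling: the old member (master-2) has been removed.
	after := []Member{{Name: "master-0"}, {Name: "master-1"}, {Name: "master-3"}}

	fmt.Println(scalingComplete(during, "master-2", 3))
	fmt.Println(scalingComplete(after, "master-2", 3))
}
```

In a test, a check like this would typically be polled until it returns true or a timeout expires, so the test no longer depends on readyReplicas ever reaching 4.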

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label Aug 2, 2024
@openshift-ci-robot openshift-ci-robot added the jira/severity-important, jira/valid-reference, and jira/invalid-bug labels Aug 2, 2024
@openshift-ci-robot

@hasbro17: This pull request references Jira Issue OCPBUGS-37837, which is invalid:

  • expected the bug to target the "4.17.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

The vertical scaling test currently times out waiting for the CPMS status.readyReplicas to scale up to 4, whereas in practice that may not happen. Ensuring that the cluster membership is 3 and checking that the old member has been removed is enough of a signal that vertical scaling has completed successfully.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@hasbro17
Contributor Author

hasbro17 commented Aug 2, 2024

/test e2e-aws-ovn-etcd-scaling
/test e2e-gcp-ovn-etcd-scaling
/test e2e-azure-ovn-etcd-scaling
/test e2e-vsphere-ovn-etcd-scaling

@hasbro17
Contributor Author

hasbro17 commented Aug 2, 2024

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug label and removed the jira/invalid-bug label Aug 2, 2024
@openshift-ci-robot

@hasbro17: This pull request references Jira Issue OCPBUGS-37837, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.17.0) matches configured target version for branch (4.17.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @geliu2016

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from dusk125 and Elbehery August 2, 2024 00:35
@hasbro17
Contributor Author

hasbro17 commented Aug 2, 2024

/payload-aggregate ?

Checking if I can aggregate the scaling job somehow (probably not).

Contributor

openshift-ci bot commented Aug 2, 2024

@hasbro17: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.

@openshift-ci openshift-ci bot requested a review from geliu2016 August 2, 2024 00:36
@openshift-ci openshift-ci bot added the approved label Aug 2, 2024
@hasbro17
Contributor Author

hasbro17 commented Aug 2, 2024

/payload-aggregate e2e-aws-ovn-etcd-scaling

Contributor

openshift-ci bot commented Aug 2, 2024

@hasbro17: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.

@hasbro17
Contributor Author

hasbro17 commented Aug 2, 2024

/payload-aggregate e2e-aws-ovn-etcd-scaling 10

Contributor

openshift-ci bot commented Aug 2, 2024

@hasbro17: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

@geliu2016 geliu2016 left a comment

/label cherry-pick-approved

Contributor

openshift-ci bot commented Aug 2, 2024

@geliu2016: Can not set label cherry-pick-approved: Must be member in one of these teams: [openshift-staff-engineers]

In response to this:

/label cherry-pick-approved

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@hasbro17
Contributor Author

hasbro17 commented Aug 2, 2024

/test e2e-aws-ovn-etcd-scaling
/test e2e-gcp-ovn-etcd-scaling
/test e2e-azure-ovn-etcd-scaling
/test e2e-vsphere-ovn-etcd-scaling

@hasbro17
Contributor Author

hasbro17 commented Aug 2, 2024

Okay, all the scaling test runs have passed.
The test below is failing across the board, but I've seen it fail elsewhere, so I'm not sure it's related.

 [bz-kube-storage-version-migrator] clusteroperator/kube-storage-version-migrator should not change condition/Available 
 
2 unexpected clusteroperator state transitions during e2e test run.  These did not match any known exceptions, so they cause this test-case to fail:

Aug 02 18:33:00.646 E clusteroperator/kube-storage-version-migrator condition/Available reason/KubeStorageVersionMigrator_Deploying status/False KubeStorageVersionMigratorAvailable: Waiting for Deployment
Aug 02 18:33:00.646 - 4s    E clusteroperator/kube-storage-version-migrator condition/Available reason/KubeStorageVersionMigrator_Deploying status/False KubeStorageVersionMigratorAvailable: Waiting for Deployment

1 unwelcome but acceptable clusteroperator state transitions during e2e test run.  These should not happen, but because they are tied to exceptions, the fact that they did happen is not sufficient to cause this test-case to fail:

Aug 02 18:33:05.083 W clusteroperator/kube-storage-version-migrator condition/Available reason/AsExpected status/True All is well (exception: Available=True is the happy case)

Running again to see if the scaling test flakes:

/test e2e-aws-ovn-etcd-scaling
/test e2e-gcp-ovn-etcd-scaling
/test e2e-azure-ovn-etcd-scaling
/test e2e-vsphere-ovn-etcd-scaling

@hasbro17
Contributor Author

hasbro17 commented Aug 2, 2024

/test e2e-gcp-ovn-etcd-scaling
/test e2e-azure-ovn-etcd-scaling
/test e2e-vsphere-ovn-etcd-scaling

@openshift-trt-bot

Job Failure Risk Analysis for sha: 52738a5

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-azure-ovn-etcd-scaling Medium
[sig-network] pods should successfully create sandboxes by adding pod to network
This test has passed 94.77% of 5827 runs on release 4.17 [Overall] in the last week.

Open Bugs
High rate of pod sandbox errors detected on metal
s390x: [sig-network] pods should successfully create sandboxes by adding pod to network fails with error adding pod to CNI network

@hasbro17
Contributor Author

hasbro17 commented Aug 3, 2024

Good enough for me. The scaling test itself is passing consistently now. The jobs still trip on the storage version migrator test, but I don't think that's related to the scaling test.

@tjungblu
Contributor

tjungblu commented Aug 5, 2024

Good enough for me.

for me as well. Thanks @hasbro17

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Aug 5, 2024
@tjungblu
Contributor

tjungblu commented Aug 5, 2024

/hold

in case you still want to adjust anything

@openshift-ci openshift-ci bot added the do-not-merge/hold label Aug 5, 2024
The vertical scaling test currently times out waiting for the CPMS
status.readyReplicas to scale up to 4, whereas in practice that may not happen.
Ensuring that the cluster membership is 3 and checking that the old member has
been removed is enough of a signal that vertical scaling has completed successfully.
@openshift-ci openshift-ci bot removed the lgtm label Aug 5, 2024
@hasbro17
Contributor Author

hasbro17 commented Aug 5, 2024

Updated to fix a typo and remove an extraneous comment. Should be the same otherwise.
/cc @dusk125

@hasbro17 hasbro17 changed the title from "WIP: OCPBUGS-37837: vertical scaling test should not rely on CPMS replicas" to "OCPBUGS-37837: vertical scaling test should not rely on CPMS replicas" Aug 5, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label Aug 5, 2024
@hasbro17
Contributor Author

hasbro17 commented Aug 5, 2024

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold label Aug 5, 2024
@dusk125
Contributor

dusk125 commented Aug 5, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Aug 5, 2024
Contributor

openshift-ci bot commented Aug 5, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dusk125, geliu2016, hasbro17, tjungblu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hasbro17
Contributor Author

hasbro17 commented Aug 5, 2024

/cherrypick release-4.16 release-4.15 release-4.14

@openshift-cherrypick-robot

@hasbro17: once the present PR merges, I will cherry-pick it on top of release-4.16 in a new PR and assign it to you.

In response to this:

/cherrypick release-4.16 release-4.15 release-4.14

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD a1fbfad and 2 for PR HEAD f2e7297 in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD ed8c948 and 1 for PR HEAD f2e7297 in total

@tjungblu
Contributor

tjungblu commented Aug 6, 2024

/override ci/prow/e2e-metal-ipi-ovn-ipv6

unrelated installation issue

Contributor

openshift-ci bot commented Aug 6, 2024

@tjungblu: tjungblu unauthorized: /override is restricted to Repo administrators, approvers in top level OWNERS file, and the following github teams:openshift: openshift-release-oversight openshift-staff-engineers.

In response to this:

/override ci/prow/e2e-metal-ipi-ovn-ipv6

unrelated installation issue

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@tjungblu
Contributor

tjungblu commented Aug 6, 2024

then let's
/retest-required

Contributor

openshift-ci bot commented Aug 6, 2024

@hasbro17: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-vsphere-ovn-etcd-scaling | 52738a5 | link | false | /test e2e-vsphere-ovn-etcd-scaling |
| ci/prow/e2e-azure-ovn-etcd-scaling | 52738a5 | link | false | /test e2e-azure-ovn-etcd-scaling |
| ci/prow/e2e-agnostic-ovn-cmd | f2e7297 | link | false | /test e2e-agnostic-ovn-cmd |
| ci/prow/e2e-aws-ovn-single-node-upgrade | f2e7297 | link | false | /test e2e-aws-ovn-single-node-upgrade |
| ci/prow/e2e-openstack-ovn | f2e7297 | link | false | /test e2e-openstack-ovn |
| ci/prow/e2e-aws-ovn-single-node | f2e7297 | link | false | /test e2e-aws-ovn-single-node |
| ci/prow/e2e-aws-ovn-ipsec-serial | f2e7297 | link | false | /test e2e-aws-ovn-ipsec-serial |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit ba9ac0a into openshift:master Aug 6, 2024
19 of 24 checks passed
@openshift-ci-robot

@hasbro17: Jira Issue OCPBUGS-37837: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-37837 has been moved to the MODIFIED state.

In response to this:

The vertical scaling test currently times out waiting for the CPMS status.readyReplicas to scale up to 4, whereas in practice that may not happen. Ensuring that the cluster membership is 3 and checking that the old member has been removed is enough of a signal that vertical scaling has completed successfully.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-cherrypick-robot

@hasbro17: new pull request created: #28981

In response to this:

/cherrypick release-4.16 release-4.15 release-4.14

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: openshift-enterprise-tests
This PR has been included in build openshift-enterprise-tests-container-v4.17.0-202408061013.p0.gba9ac0a.assembly.stream.el9.
All builds following this will include this PR.

Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/severity-important: Referenced Jira bug's severity is important for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
8 participants