ARMOCP-567: AWS and Azure test configs for clusters heterogeneous in arch and kernel pagesize #45768

Merged

aleskandro
Member

This commit adds AWS connected and Azure disconnected test configs to run openshift-tests-private against clusters that are heterogeneous in CPU architecture and kernel pagesize.

The env consists of:

  • 3 arm64 masters
  • 2 arm64 workers with 4k pagesize kernel
  • 1 arm64 worker with 64k pagesize kernel
  • 1 amd64 worker (only 4k pagesize kernel is supported)

Frequency is set to f28 for both test configs. Upgrade jobs will be added once the config file for upgrades from 4.15 stable to 4.15 nightly is available.
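
The environment listed above can be written down as data, which makes the two axes of heterogeneity (CPU architecture and kernel page size) explicit. This is only an illustrative sketch: the `NodePool` class and its field names are hypothetical and are not taken from the test configs in this PR, and the 4k page size for the masters is an assumption, since the description does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePool:
    """Hypothetical description of a group of identical nodes."""
    role: str           # "master" or "worker"
    arch: str           # CPU architecture, e.g. "arm64" or "amd64"
    page_size_kib: int  # kernel page size in KiB (4 or 64)
    count: int          # number of nodes in the group


# Node groups as listed in the PR description.
# Assumption: masters run the default 4k kernel (not stated in the description).
CLUSTER = [
    NodePool("master", "arm64", 4, 3),
    NodePool("worker", "arm64", 4, 2),
    NodePool("worker", "arm64", 64, 1),
    NodePool("worker", "amd64", 4, 1),
]


def validate(pools):
    """Reject the combination the description rules out: amd64 with a non-4k kernel."""
    for pool in pools:
        if pool.arch == "amd64" and pool.page_size_kib != 4:
            raise ValueError("amd64 nodes only support the 4k pagesize kernel")


def is_heterogeneous(pools):
    """Return True when the cluster mixes architectures and kernel page sizes."""
    archs = {pool.arch for pool in pools}
    page_sizes = {pool.page_size_kib for pool in pools}
    return len(archs) > 1 and len(page_sizes) > 1


validate(CLUSTER)
total_nodes = sum(pool.count for pool in CLUSTER)
print(total_nodes, is_heterogeneous(CLUSTER))
```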

/cc @lwan-wanglin

@openshift-ci-robot added the jira/valid-reference label (Indicates that this PR references a valid Jira ticket of any type.) on Nov 16, 2023
@openshift-ci-robot
Contributor

openshift-ci-robot commented Nov 16, 2023

@aleskandro: This pull request references ARMOCP-567 which is a valid jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@aleskandro
Member Author

/pj-rehearse

@aleskandro
Member Author

aleskandro commented Nov 16, 2023

Also refers to MCO-800

@aleskandro
Member Author

/hold

cc @sergiordlr for visibility.

@openshift-ci bot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) on Nov 16, 2023
@aleskandro
Member Author

/pj-rehearse

@aleskandro
Member Author

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.15-multi-nightly-azure-ipi-arm-mixarch-kerneltype-f28-day2-64k-pages

@openshift-ci-robot
Contributor

@aleskandro: job(s): periodic-ci-openshift-openshift-tests-private-release-4.15-multi-nightly-azure-ipi-arm-mixarch-kerneltype-f28-day2-64k-pages either don't exist or were not found to be affected, and cannot be rehearsed

@aleskandro
Member Author

/pj-rehearse

@aleskandro
Member Author

aleskandro commented Nov 17, 2023

Hi @lwan-wanglin, thanks for the comments; I'm going to address them and continue here. In the meantime, the Azure fullypriv-disc job seems to be permafailing at the mirror-by-oc-adm step. Do you have any info about it? Perhaps the bastion's storage is getting full?

@lwan-wanglin
Contributor

I checked the Azure disconnected jobs in ReportPortal and found no such failures. It might be a flake; we can rebuild the job.

@aleskandro
Member Author

/pj-rehearse

@aleskandro
Member Author

/pj-rehearse

@aleskandro
Member Author

/pj-rehearse

@aleskandro
Member Author

/pj-rehearse

@openshift-ci-robot added the rehearsals-ack label (Signifies that rehearsal jobs have been acknowledged) on Nov 23, 2023
@openshift-ci-robot
Contributor

@aleskandro: no rehearsable tests are affected by this change

@aleskandro
Member Author

/pj-rehearse

@openshift-ci-robot removed the rehearsals-ack label (Signifies that rehearsal jobs have been acknowledged) on Nov 23, 2023
@aleskandro
Member Author

/pj-rehearse

- Do not double quote the label selector as oc would interpret it as unique
- Run set_proxy() before validate_params()
- Minor harmless improvements in the code style
…arch and kernel pagesize

@openshift-ci-robot
Contributor

[REHEARSALNOTIFIER]
@aleskandro: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| periodic-ci-openshift-openshift-tests-private-release-4.15-multi-nightly-azure-ipi-arm-mixarch-disc-fullypriv-f28-day2-64k-pagesize | N/A | periodic | Periodic changed |
| periodic-ci-openshift-openshift-tests-private-release-4.15-multi-nightly-aws-ipi-arm-mixarch-f28-day2-64k-pagesize | N/A | periodic | Periodic changed |

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 10 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 20 rehearsals
Comment: /pj-rehearse max to run up to 35 rehearsals
Comment: /pj-rehearse auto-ack to run up to 10 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse abort to abort all active rehearsals

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.
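
The command set described above amounts to a small protocol, which can be sketched as a parser. This is not the pj-rehearse plugin's actual implementation; `RehearseRequest`, `parse_comment`, and the way explicit test names are counted are hypothetical, and only the subcommand names and limits come from the usage text.

```python
from dataclasses import dataclass, field

# Rehearsal limits per subcommand, as listed in the usage text ("" = bare command).
LIMITS = {"": 10, "more": 20, "max": 35, "auto-ack": 10}
# Subcommands that do not start rehearsals.
CONTROL = {"skip", "ack", "reject", "abort"}


@dataclass
class RehearseRequest:
    action: str                                # "run" or one of CONTROL
    limit: int = 0                             # max rehearsals to start (0 for control actions)
    tests: list = field(default_factory=list)  # explicit test names, if any
    auto_ack: bool = False                     # add rehearsals-ack on success


def parse_comment(line):
    """Parse one PR comment line; return None if it is not a /pj-rehearse command."""
    parts = line.split()
    if not parts or parts[0] != "/pj-rehearse":
        return None
    args = parts[1:]
    if not args:
        return RehearseRequest("run", LIMITS[""])
    if len(args) == 1 and args[0] in CONTROL:
        return RehearseRequest(args[0])
    if len(args) == 1 and args[0] in LIMITS:
        return RehearseRequest("run", LIMITS[args[0]], auto_ack=(args[0] == "auto-ack"))
    # Anything else is treated as a space-separated list of test names.
    return RehearseRequest("run", len(args), tests=args)


for comment in ("/pj-rehearse", "/pj-rehearse max", "/pj-rehearse ack", "/test core-valid"):
    print(comment, "->", parse_comment(comment))
```

Comments that are not `/pj-rehearse` commands, such as `/test core-valid`, yield `None` and would be left to other plugins.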

@aleskandro
Member Author

aleskandro commented Nov 23, 2023

/unhold

Hi @sergiordlr, any objections to the added test configs? We want to be as minimal as possible for now, so the idea is to add an AWS connected scenario and an Azure fully disconnected one. Both are multi-arch and include arm64 nodes with 64k and 4k pagesize kernels as well as amd64 nodes. BM tests will come later, as they are partially blocked by the new infra configuration we are performing.

@openshift-ci bot removed the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) on Nov 23, 2023
@aleskandro
Member Author

/test core-valid

Contributor

openshift-ci bot commented Nov 23, 2023

@aleskandro: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/rehearse/periodic-ci-openshift-openshift-tests-private-release-4.15-multi-nightly-azure-ipi-arm-mixarch-kerneltype-f28-day2-64k-pages | f736faedc79980dd86943057c4b5341982a5f36a | link | unknown | /pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.15-multi-nightly-azure-ipi-arm-mixarch-kerneltype-f28-day2-64k-pages |
| ci/rehearse/periodic-ci-openshift-openshift-tests-private-release-4.15-multi-nightly-aws-ipi-amd-mixarch-f28-day2-64k-pagesize | 1d52e88775a84fd0c6da4e33e3484986a1b23e97 | link | unknown | /pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.15-multi-nightly-aws-ipi-amd-mixarch-f28-day2-64k-pagesize |
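
In the rows above, each rerun command is the test name with its `ci/rehearse/` prefix dropped and `/pj-rehearse` put in front. A minimal sketch of that mapping follows; the helper name `rerun_command` is hypothetical, and joining several jobs into one comment relies on the space-separated form described in the pj-rehearse usage text.

```python
REHEARSE_PREFIX = "ci/rehearse/"


def rerun_command(test_names):
    """Build one /pj-rehearse comment that reruns the given failed rehearsals."""
    jobs = []
    for name in test_names:
        if not name.startswith(REHEARSE_PREFIX):
            raise ValueError(f"not a rehearsal test name: {name}")
        # The job name is the test name without the rehearsal prefix.
        jobs.append(name[len(REHEARSE_PREFIX):])
    return "/pj-rehearse " + " ".join(jobs)


failed = [
    "ci/rehearse/periodic-ci-openshift-openshift-tests-private-release-4.15-"
    "multi-nightly-aws-ipi-amd-mixarch-f28-day2-64k-pagesize",
]
print(rerun_command(failed))
```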


@aleskandro
Member Author

/test core-valid

@aleskandro
Member Author

/pj-rehearse ack

@openshift-ci-robot added the rehearsals-ack label (Signifies that rehearsal jobs have been acknowledged) on Nov 23, 2023
@sergiordlr
Contributor

sergiordlr commented Nov 24, 2023

Hello! It looks good to me.

The only thing I would say is that the day2 kerneltype config does not exercise the kerneltype bootstrap code, so adding a workflow that boots the cluster directly with a day1 64k kernel would increase the coverage. But since we are trying to keep things minimal, imho it is acceptable to skip this scenario, especially since that code path is already tested via the realtime kernel.

/lgtm

@aleskandro
Member Author

Yes, I only did day2 because multiarch support for the day1 conf has not been implemented, and we would rather avoid additional single-arch configs, especially for this scenario, which is inherently a "heterogeneous" cluster (in kernel pagesize first). Moreover, covering the bootstrap node seems superfluous for this feature, as users shouldn't need 64k pagesize kernels on the bootstrap and control plane nodes.
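
For context, a day2 kernel type change is usually expressed as a MachineConfig that sets `spec.kernelType`. The sketch below builds such a manifest; it assumes the day2 step applies a MachineConfig of this shape, which this thread does not show. The helper name, the manifest name, and the `worker-64k` pool name are hypothetical (targeting only some workers would need a matching custom MachineConfigPool).

```python
import json


def kernel_type_machine_config(role, kernel_type="64k-pages"):
    """Return a MachineConfig manifest (as a dict) that sets the kernel type for a pool."""
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfig",
        "metadata": {
            # Hypothetical manifest name; any unique name would do.
            "name": f"99-{role}-kerneltype",
            # This label ties the config to the MachineConfigPool of the same role.
            "labels": {"machineconfiguration.openshift.io/role": role},
        },
        "spec": {"kernelType": kernel_type},
    }


# "worker-64k" is a hypothetical custom pool holding the arm64 worker that gets the 64k kernel.
manifest = kernel_type_machine_config("worker-64k")
print(json.dumps(manifest, indent=2))
```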

Thanks @sergiordlr

/assign @liangxia

@liangxia
Member

/lgtm

@openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) on Nov 27, 2023
Contributor

openshift-ci bot commented Nov 27, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aleskandro, liangxia, sergiordlr

@openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Nov 27, 2023
@openshift-merge-bot merged commit e6a9416 into openshift:master on Nov 27, 2023
20 checks passed