Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenStack: add CI coverage for SR-IOV #25388

Closed
wants to merge 15 commits into from
Closed

OpenStack: add CI coverage for SR-IOV #25388

wants to merge 15 commits into from

Conversation

EmilienM
Copy link
Member

@EmilienM EmilienM commented Jan 18, 2022

This PR is combining the following commits:

openstack/performanceprofile: deploy PAO from source on 4.10

Until 4.10 (and future versions under development), we need to install
the PAO from source because it's not present in the Marketplace.

openstack/generateconfig: allow to change # of workers

The default will remain 3 but we can now change the number of workers
that we want to deploy.

openstack: add new step for config-drive

This step registry is meant to be used for the workers so they have
/var/config mounted with metadata infos from OpenStack.

openstack: add new step for sriov-worker

This new step registry will do the following:

  • Scale down the workers to 0
  • Download the current worker MachineSet
  • Apply a patch to the MachineSet which includes the necessary bits for
    SR-IOV: configDrive enabled, direct type port connected to the SR-IOV
    network
  • Deploy the new MachineSet and wait for the worker to be up and running.

openstack: add new step for sriov-networknodepolicy

  • Disable the webhook on 4.9, until a PR has landed
  • Create a SriovNetworkNodePolicy on the SR-IOV workers using
    the SR-IOV provider network.

openstack: make DPDK tests its own

  • Rename OPENSTACK_PERFORMANCE_NETWORK to OPENSTACK_DPDK_NETWORK
  • Rename NFV tests to DPDK, so later we can extend it to add SR-IOV

optional-operators/subscribe: add support for proxy

When a deployment runs behind a proxy, we need this snippet, like it's
already used by many step registries.

openstack: add new step for testing SR-IOV

Like we did for DPDK, this step will create a Pod that will use a VF and
consume the SR-IOV network operator.
If we run testpmd successfuly and see the NIC, the test will be considered as passing.

openstack: update openstack-nfv workflow to add SR-IOV

  • Add necessary steps to the openstack-nfv workflow for testing SR-IOV,
    and its parameters.
  • Remove the openstack-nfv job from shiftstack-ci, as it will move under
    openshift-sriov-network-operator.

sriov-network-operator: add CI jobs for OpenStack

Add the OpenStack NFV CI job that test the SR-IOV network operator in
the context of OpenShift deployed on top of OpenStack (virtualized
worker nodes), for both 4.10 and 4.9. The jobs are non-voting, so won't
block anything to merge in the operator.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 18, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: EmilienM
To complete the pull request process, please assign bn222, samvarankashyap after the PR has been reviewed.
You can assign the PR to them by writing /assign @bn222 @samvarankashyap in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot requested review from bn222 and dougbtv January 18, 2022 17:34
@EmilienM
Copy link
Member Author

/retest

3 similar comments
@EmilienM
Copy link
Member Author

/retest

@EmilienM
Copy link
Member Author

/retest

@EmilienM
Copy link
Member Author

/retest

Until 4.10 (and future versions under development), we need to install
the PAO from source because it's not present in the Marketplace.
The default will remain 3 but we can now change the number of workers
that we want to deploy.
This step registry is meant to be used for the workers so they have
/var/config mounted with metadata infos from OpenStack.

This step will be useful when deploying OpenShift on OpenStack with
SR-IOV enabled, the SR-IOV network operator needs config-drive on the
workers.
This new step registry will do the following:

* Scale down the workers to 0
* Download the current worker MachineSet
* Apply a patch to the MachineSet which includes the necessary bits for
  SR-IOV: configDrive enabled, direct type port connected to the SR-IOV
network.
* Deploy the new MachineSet and wait for the worker to be up and
  running.

This step will be useful later when testing SR-IOV.
* Disable the webhook on 4.9, until a PR has landed
* Create a SriovNetworkNodePolicy on the SR-IOV workers using
  the SR-IOV provider network.

This step will be required later when we'll test SR-IOV.
* Rename OPENSTACK_PERFORMANCE_NETWORK to OPENSTACK_DPDK_NETWORK
* Rename NFV tests to DPDK, so later we can extend it to add SR-IOV
When a deployment runs behind a proxy, we need this snippet, like it's
already used by many step registries.

This will help the OpenStack team to deploy Operators in our CI that
deploys disconnected OpenShift clusters on OpenStack platform.
Like we did for DPDK, this step will create a Pod that will use a VF and
consume the SR-IOV network operator.
If we run testpmd successfuly and see the NIC, the test will be considered as passing.
* Add necessary steps to the openstack-nfv workflow for testing SR-IOV,
  and its parameters.
* Remove the openstack-nfv job from shiftstack-ci, as it will move under
  openshift-sriov-network-operator.
Add the OpenStack NFV CI job that test the SR-IOV network operator in
the context of OpenShift deployed on top of OpenStack (virtualized
worker nodes), for both 4.10 and 4.9. The jobs are non-voting, so won't
block anything to merge in the operator.
The operator has a webhook that can take time to start and be ready, the
sleep will avoid errors like we saw here:

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.9-e2e-openstack-nfv/1483816862163668992

```
Error from server (InternalError): error when creating "STDIN": Internal error occurred: conversion webhook for performance.openshift.io/v1, Kind=PerformanceProfile failed: Post "https://performance-operator-service.openshift-performance-addon-operator.svc:443/convert?timeout=30s": dial tcp 10.130.0.42:4343: connect: connection refused
```
Copy link
Contributor

@MaysaMacedo MaysaMacedo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, Emilien. Just took an initial look:

# Remove this workaround when https://github.com/openshift/sriov-network-operator/pull/600
# is merged and released in the operator
if [[ "${oc_version}" == *"4.9"* ]]; then
oc patch sriovoperatorconfig default --type=merge -n openshift-sriov-network-operator --patch '{ "spec": { "enableOperatorWebhook": false } }'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this patch can be removed since the PR got merged.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not yet, because we need a new zstream release that will contain the fix

oc create -f - -o jsonpath='{.metadata.name}' <<EOF
oc_version=$(oc version -o json | jq -r '.openshiftVersion')
# Once 4.10 is GA, we need to bump it to 4.11 and so on.
if [[ "${oc_version}" == *"4.10"* ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't the same apply for previous versions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, in previous versions we want to install the operator from the Marketplace to test what customers have (I think, for now). We can probably change it later if we want to catch problems earlier than that, by installing it from source even on 4.9 etc. WDYT?

echo "${SHARED_DIR}/sriov-worker-node file not found, no worker node for SR-IOV was deployed"
exit 1
fi
#WORKER_NODE=$(cat "${SHARED_DIR}/sriov-worker-node")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah

@@ -7,10 +7,6 @@ base_images:
name: builder
namespace: ocp
tag: rhel-8-golang-1.16-openshift-4.9
openstack-installer:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it being disabled just to reduce the load?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i was doing it for testing only. Ignore this change.

@EmilienM
Copy link
Member Author

FYI I've (again) started a new PR where we focus on OpenStack related changes: #25608
After this one merges, I'll add the job to the SRIOV network operator project.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 25, 2022

@EmilienM: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 25, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 2, 2022

@EmilienM: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/rehearse/openshift/aws-efs-csi-driver-operator/release-4.9/operator-e2e c0da101efc5c3864e1422716f7d2a58117917d2c link unknown /test pj-rehearse
ci/rehearse/kubevirt/hyperconverged-cluster-operator/release-4.11/hco-e2e-image-index-sno-aws e262721879f75ea7dbb77ef326333e6103e28de0 link unknown /test pj-rehearse
ci/rehearse/kubevirt/hyperconverged-cluster-operator/release-4.11/hco-e2e-image-index-sno-azure e262721879f75ea7dbb77ef326333e6103e28de0 link unknown /test pj-rehearse
ci/rehearse/red-hat-data-services/opendatahub-operator/master/48-modh-operator-e2e-48 13bd12afa79ff4b102c25d1c2d4d30668277674f link unknown /test pj-rehearse
ci/rehearse/openshift/cluster-cloud-controller-manager-operator/release-4.9/e2e-openstack-ccm f071074 link unknown /test pj-rehearse
ci/rehearse/periodic-ci-redhat-openshift-ecosystem-cvp-ocp-4.7-cvp-common-ibmz-eco f071074 link unknown /test pj-rehearse
ci/prow/pj-rehearse f071074 link false /test pj-rehearse
ci/rehearse/periodic-ci-redhat-openshift-ecosystem-cvp-ocp-4.7-cvp-common-powervs f071074 link unknown /test pj-rehearse
ci/rehearse/red-hat-storage/odf-operator/release-4.9/odf-operator-bundle-e2e-aws 6ce3a97 link unknown /test pj-rehearse
ci/build-farm/build03-dry f071074 link true /test build03-dry
ci/build-farm/build01-dry f071074 link true /test build01-dry
ci/build-farm/build04-dry f071074 link true /test build04-dry
ci/build-farm/hive-dry f071074 link true /test hive-dry
ci/prow/ci-operator-config f071074 link true /test ci-operator-config
ci/prow/ci-secret-bootstrap-config-validation f071074 link true /test ci-secret-bootstrap-config-validation
ci/prow/boskos-config-generation f071074 link true /test boskos-config-generation
ci/prow/boskos-config f071074 link true /test boskos-config
ci/build-farm/build02-dry f071074 link true /test build02-dry
ci/prow/build-clusters f071074 link true /test build-clusters

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@EmilienM
Copy link
Member Author

EmilienM commented Feb 4, 2022

/close

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 4, 2022

@EmilienM: Closed this PR.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot closed this Feb 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Projects
None yet
2 participants