
KubeVirt CSI Driver Integration #1733

Merged: 1 commit, Oct 18, 2022

Conversation

@davidvossel (Contributor) commented Sep 7, 2022

A CSI driver has two components: the "controller" deployment, which works at the cluster level, and the "node" DaemonSet, which works at the individual node level. In general, this driver works by mirroring usage of the infra cluster's storage classes into the guest cluster.

The "controller" deployment is laid down by the CPO within the hosted control plane namespace. We put it in the hosted control plane namespace because the "controller" needs access to both the guest and infra clusters in order to mirror PVCs.

The "node" DaemonSet portion is laid down within the guest cluster by the HCCO. This component requires no access to the infra cluster and works purely within the guest cluster.

It's possible we will transition the KubeVirt CSI Driver's installation to the cluster-storage-operator (CSO) in the future, but given our unique scenario where infra access is required for our driver to work, we have included the integration into the CPO and HCCO for now.

This work supersedes the WIP PR #1733.
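For readers new to this layout, here is a minimal, illustrative Go sketch of the mirroring idea, not the driver's actual code; the helper name mirrorGuestPVC, the infraClient, and hcpNamespace are assumptions for illustration:

    package csidriver

    import (
    	"context"

    	corev1 "k8s.io/api/core/v1"
    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	crclient "sigs.k8s.io/controller-runtime/pkg/client"
    )

    // mirrorGuestPVC creates, on the infra cluster, the PVC that backs a
    // claim observed in the guest cluster. The mirrored claim lives in the
    // hosted control plane namespace so the "controller" can manage it there.
    func mirrorGuestPVC(ctx context.Context, infraClient crclient.Client, guestPVC *corev1.PersistentVolumeClaim, hcpNamespace string) error {
    	infraPVC := &corev1.PersistentVolumeClaim{
    		ObjectMeta: metav1.ObjectMeta{
    			// Derive a unique infra-side name from the guest claim's UID.
    			Name:      "guest-" + string(guestPVC.UID),
    			Namespace: hcpNamespace,
    		},
    		// A real controller would also map the guest storage class to the
    		// corresponding infra storage class here.
    		Spec: *guestPVC.Spec.DeepCopy(),
    	}
    	return infraClient.Create(ctx, infraPVC)
    }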

@davidvossel (Contributor Author):

This should be good to go now that the kubevirt-csi-driver is in the 4.12 release payload. The KubeVirt test lanes pass.
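(For reference, a hedged sketch of how the payload image lookup works; the component key below is an assumption, not necessarily the real payload tag name:)

    // ComponentImages maps release payload component names to pullspecs.
    componentImages := releaseImage.ComponentImages()
    // Hypothetical key for illustration; the real component name may differ.
    csiDriverImage := componentImages["kubevirt-csi-driver"]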

metadata:
  name: kubevirt-csi-controller-cr
rules:
  # Allow listing and creating CRDs
Member:

Which new CRDs are needed?

Contributor Author:

They aren't anymore. I removed this and fixed it upstream as well.

	crclient "sigs.k8s.io/controller-runtime/pkg/client"
)

//go:embed files/*
Member:

Can we put this into support/ as in https://github.com/openshift/hypershift/pull/1698/files#diff-091a3abbf9e278c6f5cd74ca5fb7094a370f2a57c8c13c3f6568a6cc049ed8ea?

Also, is there a particular reason for using YAML here vs. golang?

Contributor Author:

These YAMLs are a slightly modified version of the deploy YAMLs sourced from the kubevirt-csi repo. I'll be modifying the kubevirt-csi repo YAMLs to match what we had to do for hypershift, so eventually they should be a 1:1 reflection of one another. The idea is that when we test kubevirt-csi upstream, we'd be using the same YAMLs that get sourced into hypershift... but we're not 100% there yet.

In the medium term, we're looking to converge with the CSO which will involve sourcing yaml similar to what we're doing here.

@enxebre (Member) commented Oct 5, 2022:

If we use YAMLs, the must* functions already exist in olm/assets; please move them to support/assets and reuse them.

And you need to make sure to call SetDefaults() over it, see https://github.com/openshift/hypershift/pull/1698/files#r987870941

Also, we'd preferably want to follow the same approach for all components (at the moment that's golang), though I can see the benefit for this case since we are mirroring.
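To ground that suggestion, here is a hedged sketch of the must*-style asset pattern being discussed, embedding the deploy YAMLs and decoding one into a typed object; the package and names (assets, content, mustDeployment) are assumptions, not the actual hypershift helpers:

    package assets

    import (
    	"embed"

    	appsv1 "k8s.io/api/apps/v1"
    	"sigs.k8s.io/yaml"
    )

    //go:embed files/*
    var content embed.FS

    // mustDeployment reads an embedded YAML manifest and decodes it into a
    // Deployment, panicking on error: a bad embedded asset is a build-time
    // bug, not a runtime condition.
    func mustDeployment(file string) *appsv1.Deployment {
    	b, err := content.ReadFile(file)
    	if err != nil {
    		panic(err)
    	}
    	deployment := &appsv1.Deployment{}
    	if err := yaml.Unmarshal(b, deployment); err != nil {
    		panic(err)
    	}
    	return deployment
    }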

func (r *HostedControlPlaneReconciler) reconcileCSIDriver(ctx context.Context, hcp *hyperv1.HostedControlPlane, releaseImage *releaseinfo.ReleaseImage, createOrUpdate upsert.CreateOrUpdateFN) error {
	switch hcp.Spec.Platform.Type {
	case hyperv1.KubevirtPlatform:
		err := kubevirtcsi.ReconcileInfra(r.Client, hcp, ctx, createOrUpdate, releaseImage.ComponentImages())
Member:

Are other platforms expected to reconcile here too, or should the storage operator do that?
Does this need to be clarified in https://github.com/openshift/enhancements/blob/master/enhancements/storage/storage-hypershift.md?

How does this PR interact with #1698?

Contributor Author:

> Are other platforms expected to reconcile here too, or should the storage operator do that? Does this need to be clarified here?

Most likely other drivers should be laid down by the CSO and not the HCP or HCCO. KubeVirt is a little different at the moment, though.

> How does this PR interact with #1698?

We started planning KubeVirt CSI driver integration before the CSO started their reference implementation with AWS. Essentially, both our KubeVirt CSI integration effort and the CSO's effort occurred in parallel. We understood this was happening and plan to (attempt to) converge with the CSO in 4.13 once they have an established pattern for deploying into the HCP that we can follow.

Converging with the CSO might be slightly challenging, though, which is something we'll be investigating soon. Due to the nested nature of the driver (requiring API access to both the mgmt and guest clusters), KubeVirt CSI has some unique requirements that other CSI drivers (like AWS) do not have.

For now, we'd like to be able to begin exercising the KubeVirt CSI driver using the integration method we have here. A decision about how to handle this long term will be made before KubeVirt goes GA (4.13+).

@enxebre (Member) commented Sep 26, 2022

Can we please squash the commits into a single one?

@enxebre (Member) commented Sep 26, 2022

Thanks @davidvossel, looks great; dropped some questions.

Signed-off-by: David Vossel <davidvossel@gmail.com>
Co-authored-by: isaacdorfman <isaac.i.dorfman@gmail.com>
@davidvossel (Contributor Author):

> Can we please squash the commits into a single one?

Yes.

> Thanks @davidvossel, looks great; dropped some questions.

@enxebre, thanks for the review. All your comments should be addressed now.

containers:
- name: csi-driver
  imagePullPolicy: Always
  image: quay.io/dvossel/kubevirt-csi-driver:latest
@enxebre (Member) commented Oct 5, 2022:

Can we please update this image to come from the payload?

Contributor Author:

FYI for anyone reviewing this: the images in these YAML files are all overwritten using the release payload.

    privileged: true
    allowPrivilegeEscalation: true
  imagePullPolicy: Always
  image: quay.io/dvossel/kubevirt-csi-driver:latest
Member:

Can we please update this image to come from the payload?

- name: csi-node-driver-registrar
  securityContext:
    privileged: true
  image: quay.io/openshift/origin-csi-node-driver-registrar:latest
Member:

We cannot use latest here; since the HCP is versioned, we must use tagged versions.

Comment on lines +148 to +158
for i, container := range controller.Spec.Template.Spec.Containers {
	switch container.Name {
	case "csi-driver":
		controller.Spec.Template.Spec.Containers[i].Image = csiDriverImage
	case "csi-provisioner":
		controller.Spec.Template.Spec.Containers[i].Image = csiProvisionerImage
	case "csi-attacher":
		controller.Spec.Template.Spec.Containers[i].Image = csiAttacherImage
	case "csi-liveness-probe":
		controller.Spec.Template.Spec.Containers[i].Image = csiLivenessProbeImage
	}
Contributor Author:

@enxebre the images in the YAML are overwritten here, using images from the release payload.

Comment on lines +247 to +255
for i, container := range ds.Spec.Template.Spec.Containers {
	switch container.Name {
	case "csi-driver":
		ds.Spec.Template.Spec.Containers[i].Image = csiDriverImage
	case "csi-node-driver-registrar":
		ds.Spec.Template.Spec.Containers[i].Image = csiNodeDriverRegistrarImage
	case "csi-liveness-probe":
		ds.Spec.Template.Spec.Containers[i].Image = csiLivenessProbeImage
	}
Contributor Author:

And the DaemonSet images are overwritten here.

@enxebre (Member) commented Oct 10, 2022

@davidvossel

I think this still needs to be addressed?

> And you need to make sure to call SetDefaults() over it, see https://github.com/openshift/hypershift/pull/1698/files#r987870941

@davidvossel (Contributor Author):

> @davidvossel
> I think this still needs to be addressed?
> And you need to make sure to call SetDefaults() over it, see https://github.com/openshift/hypershift/pull/1698/files#r987870941

@enxebre That was already done. Look at the control-plane-operator/controllers/hostedcontrolplane/csi/kubevirt/kubevirt.go file and grep for deploymentConfig. Example below.

        deploymentConfig := &config.DeploymentConfig{}
        deploymentConfig.Scheduling.PriorityClass = config.DefaultPriorityClass
        deploymentConfig.SetRestartAnnotation(hcp.ObjectMeta)
        deploymentConfig.SetDefaults(hcp, nil, utilpointer.IntPtr(1))

Then ApplyTo is called when the controller deployment is created.
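For context, a hedged sketch of what that call-site wiring could look like; the reconcileController helper and exact arguments are assumptions, so grep kubevirt.go in this PR for the real call sites:

    // Reconcile the controller Deployment sourced from the embedded asset,
    // then let deploymentConfig layer on scheduling, defaults, and the
    // restart annotation set above.
    if _, err := createOrUpdate(ctx, r.Client, controllerDeployment, func() error {
    	// Hypothetical helper: sets container images from the release payload.
    	reconcileController(controllerDeployment, releaseImage.ComponentImages())
    	deploymentConfig.ApplyTo(controllerDeployment)
    	return nil
    }); err != nil {
    	return fmt.Errorf("failed to reconcile kubevirt csi controller: %w", err)
    }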

@enxebre (Member) commented Oct 14, 2022

/test e2e-aws

1 similar comment
@davidvossel (Contributor Author):

/test e2e-aws

@sjenning (Contributor) commented Oct 17, 2022

/lgtm
/approve
/retest-required

Code paths are isolated to the KubeVirt platform, and this PR is a blocker for getting e2e signal on that platform. If there is an undiscovered issue with the reconciliation in this PR, it will also block the KubeVirt e2e from passing and will have to be fixed at that time.

openshift-ci bot added the lgtm label on Oct 17, 2022.
openshift-ci bot commented Oct 17, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidvossel, sjenning


openshift-ci bot added the approved label on Oct 17, 2022.
openshift-ci bot commented Oct 17, 2022

@davidvossel: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                            Commit    Required  Rerun command
ci/prow/e2e-kubevirt-gcp-ovn         6e3475d   false     /test e2e-kubevirt-gcp-ovn
ci/prow/capi-provider-agent-sanity   420b758   false     /test capi-provider-agent-sanity


@openshift-ci-robot:

/retest-required

Remaining retests: 0 against base HEAD 1f70f40 and 2 for PR HEAD 420b758 in total

openshift-merge-robot merged commit bb64fb1 into openshift:main on Oct 18, 2022.