@einsteinXue

Hi @bn222 @wizhaoredhat @thom311
I have opened this pull request to get the design reviewed. Could you help assess whether the approach is sound?

Certain implementation details, including the Dockerfile, Makefile, and SynaXG plugin logic, are not yet finalized. Please disregard them for now and focus on the design. Thank you.

First, I would like to provide some background:
The SynaXG card will not have Kubernetes or OpenShift Container Platform (OCP) installed, so the SynaXG VSP will run exclusively on the host side. The VSP has two core functions:
1) Reboot the DPU by unbinding the target PCI address.
2) Upgrade firmware via gRPC: a gRPC server runs on the SynaXG card itself, and the corresponding gRPC client runs inside the VSP pod. (This gRPC channel is used only for firmware upgrade and is unrelated to dpu-operator.)
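
As a rough illustration of the first function, the sketch below detaches a PCI device from its driver through the standard Linux sysfs `unbind` file. It is a minimal sketch, not the plugin's actual code: the function names, the `sysfsRoot` parameter, and the example address `0000:03:00.0` are all assumptions, and the PR does not describe how the card reacts once it is unbound.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// unbindPath returns the sysfs file used to detach a PCI device from the
// driver it is currently bound to. sysfsRoot is normally "/sys"; it is a
// parameter (an assumption of this sketch) so the logic can be pointed at a
// temporary directory instead of the real sysfs tree.
func unbindPath(sysfsRoot, pciAddr string) string {
	return filepath.Join(sysfsRoot, "bus", "pci", "devices", pciAddr, "driver", "unbind")
}

// unbindDevice writes the PCI address into the driver's unbind file, which
// asks the kernel to detach the device from that driver.
func unbindDevice(sysfsRoot, pciAddr string) error {
	path := unbindPath(sysfsRoot, pciAddr)
	if err := os.WriteFile(path, []byte(pciAddr), 0o200); err != nil {
		return fmt.Errorf("unbind %s: %w", pciAddr, err)
	}
	return nil
}

func main() {
	// Dry run with a hypothetical address: only print the file that
	// unbindDevice would write to, without touching the system.
	fmt.Println(unbindPath("/sys", "0000:03:00.0"))
}
```

Calling `unbindDevice("/sys", addr)` requires root privileges and a device that is currently bound to a driver; otherwise the write returns an error.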

Let me briefly introduce my idea:
A "DataProcessingUnitConfig" custom resource (CR) will be created for each DPU.
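
To make the shape of such an object concrete, here is a minimal sketch of what the spec might carry. Only "nodeName" and the reboot request are mentioned in this description; the Go field names, the `firmwareImage` field, and the validation rule are assumptions for illustration and may not match `api/v1/dataprocessingunitconfig_types.go`.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// DataProcessingUnitConfigSpec is a hypothetical, simplified spec.
// The real API types in the PR may use different fields.
type DataProcessingUnitConfigSpec struct {
	// NodeName selects the node whose DPU should be managed.
	NodeName string `json:"nodeName"`
	// Reboot requests a DPU reboot (assumed field).
	Reboot bool `json:"reboot,omitempty"`
	// FirmwareImage names the firmware to install (assumed field).
	FirmwareImage string `json:"firmwareImage,omitempty"`
}

// validateSpec rejects specs that cannot be acted on: a target node is
// required, and at least one operation must be requested.
func validateSpec(s DataProcessingUnitConfigSpec) error {
	if s.NodeName == "" {
		return errors.New("nodeName must be set")
	}
	if !s.Reboot && s.FirmwareImage == "" {
		return errors.New("no operation requested")
	}
	return nil
}

func main() {
	spec := DataProcessingUnitConfigSpec{NodeName: "worker-1", Reboot: true}
	if err := validateSpec(spec); err != nil {
		fmt.Println("invalid spec:", err)
		return
	}
	out, err := json.MarshalIndent(spec, "", "  ")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(out))
}
```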

Daemon pods will be deployed on all DPU nodes in the cluster, so a "HostSideManager" will run on every DPU node.

The "DataProcessingUnitConfigReconciler" is set up inside the "HostSideManager", so every DPU node watches for changes to "DataProcessingUnitConfig" resources.

When a user sets "nodeName" and requests a DPU reboot in the resource, each "DataProcessingUnitConfigReconciler" checks whether its own node is the target node. If it is, that "DataProcessingUnitConfigReconciler" calls the gRPC method that performs the reboot.
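
Because every node runs its own reconciler, the same object is seen by all of them and only one should act. The sketch below shows that fan-out with a plain predicate; the `rebootRequest` type, the `isTarget` function, and the node names are hypothetical and stand in for whatever the real reconciler reads from the resource.

```go
package main

import "fmt"

// rebootRequest is a hypothetical, reduced view of the resource: just the
// target node and whether a reboot was requested.
type rebootRequest struct {
	NodeName string
	Reboot   bool
}

// isTarget reports whether the reconciler running on localNode should act.
// An empty NodeName never matches, so an incomplete request is ignored by
// every node rather than executed by all of them.
func isTarget(req rebootRequest, localNode string) bool {
	return req.Reboot && req.NodeName != "" && req.NodeName == localNode
}

func main() {
	req := rebootRequest{NodeName: "worker-2", Reboot: true}
	// Each DPU node evaluates the same request independently.
	for _, node := range []string{"worker-1", "worker-2", "worker-3"} {
		fmt.Printf("%s acts: %v\n", node, isTarget(req, node))
	}
}
```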

As I understand it, the gRPC connection is already established by the "HostSideManager", so "vsp" is passed in as an input parameter when the "DataProcessingUnitConfigReconciler" is initialized.
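
Passing "vsp" in at construction time could look roughly like the following. This is a sketch under assumptions: the `VendorPlugin` interface, its `DpuReboot` signature, the `reconciler` type, and the fake plugin are invented here for illustration and are not taken from `internal/daemon/plugin/vendorplugin.go`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// VendorPlugin is a hypothetical stand-in for the VSP client. The method
// name follows the PR description; the signature is an assumption.
type VendorPlugin interface {
	DpuReboot(ctx context.Context, pciAddr string) error
}

// reconciler holds the already-connected plugin instead of dialing itself.
type reconciler struct {
	vsp      VendorPlugin
	nodeName string
}

// newReconciler receives the plugin from its owner (the HostSideManager in
// the proposed design) and refuses to start without one.
func newReconciler(vsp VendorPlugin, nodeName string) (*reconciler, error) {
	if vsp == nil {
		return nil, errors.New("vsp must not be nil")
	}
	return &reconciler{vsp: vsp, nodeName: nodeName}, nil
}

// handleReboot calls the plugin only when this node is the target and
// reports whether a reboot was issued.
func (r *reconciler) handleReboot(ctx context.Context, targetNode, pciAddr string) (bool, error) {
	if targetNode != r.nodeName {
		return false, nil
	}
	if err := r.vsp.DpuReboot(ctx, pciAddr); err != nil {
		return false, fmt.Errorf("reboot %s: %w", pciAddr, err)
	}
	return true, nil
}

// fakePlugin records calls so the wiring can be exercised without gRPC.
type fakePlugin struct{ rebooted []string }

func (f *fakePlugin) DpuReboot(_ context.Context, pciAddr string) error {
	f.rebooted = append(f.rebooted, pciAddr)
	return nil
}

// runDemo sends requests for two nodes to a reconciler on worker-2 and
// returns how many reboots reached the plugin.
func runDemo() (int, error) {
	fake := &fakePlugin{}
	r, err := newReconciler(fake, "worker-2")
	if err != nil {
		return 0, err
	}
	for _, target := range []string{"worker-1", "worker-2"} {
		if _, err := r.handleReboot(context.Background(), target, "0000:03:00.0"); err != nil {
			return 0, err
		}
	}
	return len(fake.rebooted), nil
}

func main() {
	n, err := runDemo()
	fmt.Println("reboots issued:", n, "error:", err)
}
```

Injecting the plugin this way keeps the reconciler free of connection handling and lets it be tested with a fake such as `fakePlugin`.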

  1. [api/v1/dataprocessingunitconfig_types.go]: Added CRD API definitions
  2. [cmd/main.go]: Commented out the setup of dataProcessingUnitConfigReconciler (to be initialized in HostSideManager instead)
  3. [dpu-api/api.proto]: Added DataProcessingUnitManagementService to the gRPC API to support DPU reboot and firmware upgrade operations
  4. [internal/controller/dataprocessingunitconfig_controller.go]: Implemented reconciliation logic for the DataProcessingUnitConfig CRD, which executes "reboot" and "firmware upgrade" operations by invoking gRPC methods on the VSP
  5. [internal/daemon/hostsidemanager.go]: Added setup logic for dataProcessingUnitConfigReconciler to monitor changes to the DataProcessingUnitConfig CRD
  6. [internal/daemon/plugin/vendorplugin.go]: Added interfaces for DpuReboot and FirmwareUpgrade operations
  7. [internal/platform/synaxg-dpu.go]: Added a detector for the SynaXG DPU platform
  8. [internal/platform/vendordetector.go]: Integrated the SynaXG platform detector

@openshift-ci

openshift-ci bot commented Jan 9, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: einsteinXue
Once this PR has been reviewed and has the lgtm label, please assign bn222 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 9, 2026
@openshift-ci

openshift-ci bot commented Jan 9, 2026

Hi @einsteinXue. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@einsteinXue einsteinXue marked this pull request as draft January 9, 2026 07:05
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 9, 2026
@bn222
Contributor

bn222 commented Jan 14, 2026

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 14, 2026
@openshift-ci

openshift-ci bot commented Jan 14, 2026

@einsteinXue: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/make-prow-ci-manifests-check | 3ef8145 | link | true | /test make-prow-ci-manifests-check |
| ci/prow/make-generate-check | 3ef8145 | link | true | /test make-generate-check |
| ci/prow/verify-deps | 3ef8145 | link | true | /test verify-deps |
| ci/prow/make-vendor-check | 3ef8145 | link | true | /test make-vendor-check |
| ci/prow/make-test | 3ef8145 | link | true | /test make-test |
| ci/prow/make-e2e-test | 3ef8145 | link | true | /test make-e2e-test |
| ci/prow/images | 3ef8145 | link | true | /test images |
| ci/prow/make-e2e-test-ptl | 3ef8145 | link | true | /test make-e2e-test-ptl |
| ci/prow/make-fmt-check | 3ef8145 | link | true | /test make-fmt-check |
| ci/prow/make-e2e-test-marvell | 3ef8145 | link | true | /test make-e2e-test-marvell |

Full PR test history. Your PR dashboard.

