SynaXG plugin dev scheme review #626
Hi @bn222 @wizhaoredhat @thom311
I have submitted this pull request for review of the design scheme. Could you help assess whether the design is reasonable?
Certain implementation details, including the Dockerfile, Makefile, and SynaXG Plugin logic, have not been finalized yet. Please disregard these items for now and focus solely on the design scheme. Thank you.
First, I would like to provide some background:
The SynaXG card will not have Kubernetes or OpenShift Container Platform (OCP) installed. As such, the SynaXG VSP will run exclusively on the host side. The VSP’s core functions are twofold:
1) To perform a DPU reboot by unbinding the device at the target PCI address (a minimal sketch of this mechanism follows the list).
2) To implement firmware upgrades via gRPC: a gRPC server runs on the SynaXG card itself, while the corresponding gRPC client is deployed within the VSP pod. (This gRPC interface is used only for firmware upgrades and has nothing to do with the dpu-operator.)
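For reference, unbinding a PCI device on Linux is typically done by writing its address to the driver's sysfs `unbind` file. Below is a minimal Go sketch of that mechanism only; the function name is mine, the PCI address would be discovered or configured by the VSP, and the real reboot sequence for the SynaXG card may involve additional steps (e.g. rebinding afterwards).

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// unbindPCIDevice detaches the driver from the device at the given PCI
// address by writing the address to the driver's sysfs "unbind" file.
// This is the generic Linux mechanism; the SynaXG-specific sequence may differ.
func unbindPCIDevice(pciAddr string) error {
	unbindPath := filepath.Join("/sys/bus/pci/devices", pciAddr, "driver", "unbind")
	f, err := os.OpenFile(unbindPath, os.O_WRONLY, 0)
	if err != nil {
		return fmt.Errorf("open %s: %w", unbindPath, err)
	}
	defer f.Close()
	if _, err := f.WriteString(pciAddr); err != nil {
		return fmt.Errorf("unbind %s: %w", pciAddr, err)
	}
	return nil
}
```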
Let me briefly introduce my idea:
A "DataProcessingUnitConfig" CRD will be created for each DPU.
Daemon pods will be deployed on all DPU nodes in the cluster, meaning "HostSideManagers" will run on every DPU node.
The "DataProcessingUnitConfigReconciler" is configured within the "HostSideManager", so each DPU node can monitor changes to the "DataProcessingUnitConfig" CRD.
When a user sets "nodeName" and requests a DPU reboot in the custom resource, each "DataProcessingUnitConfigReconciler" checks whether its own node is the target node. If the node names match, the "DataProcessingUnitConfigReconciler" calls the gRPC method to execute the reboot operation.
From my understanding, the gRPC connection is already established by the "HostSideManager"; thus, when the "DataProcessingUnitConfigReconciler" is initialized, "vsp" is passed in as an input parameter.
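To illustrate that flow, here is a minimal controller-runtime sketch under those assumptions: the reconciler is constructed with the node name and the already-established "vsp" handle, and only the instance whose node matches `spec.nodeName` issues the reboot call. The `VSPClient` interface, the `RebootDPU` method, and the import path of the API package are placeholders, not the actual VSP plugin API.

```go
// Rough sketch, not final code.
package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	// Hypothetical module path for the API type sketched above.
	synaxgv1 "example.com/synaxg-vsp/api/v1alpha1"
)

// VSPClient stands in for the gRPC connection the HostSideManager already holds.
type VSPClient interface {
	RebootDPU(ctx context.Context) error
}

type DataProcessingUnitConfigReconciler struct {
	client.Client
	NodeName string    // name of the node this HostSideManager runs on
	Vsp      VSPClient // passed in when the reconciler is created
}

// NewDataProcessingUnitConfigReconciler receives the VSP handle at construction time.
func NewDataProcessingUnitConfigReconciler(c client.Client, nodeName string, vsp VSPClient) *DataProcessingUnitConfigReconciler {
	return &DataProcessingUnitConfigReconciler{Client: c, NodeName: nodeName, Vsp: vsp}
}

func (r *DataProcessingUnitConfigReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var cfg synaxgv1.DataProcessingUnitConfig
	if err := r.Get(ctx, req.NamespacedName, &cfg); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Only the reconciler running on the targeted node acts on the request.
	if cfg.Spec.NodeName != r.NodeName {
		return ctrl.Result{}, nil
	}

	if cfg.Spec.Reboot {
		if err := r.Vsp.RebootDPU(ctx); err != nil {
			return ctrl.Result{}, err
		}
	}
	return ctrl.Result{}, nil
}
```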