HLD for SmartSwitch DPU graceful shutdown #1991
Merged
37 commits
01c62c2
Initial version for dpu-graceful-shutdown HLD
rameshraghupathy 2c8b83b
Did some minor improvement
rameshraghupathy 5020c9f
Addressed review comments
rameshraghupathy 5fa9881
Adding two approaches
rameshraghupathy cf02ac4
Did some cleanup
rameshraghupathy 8003e40
Did some cleanup
rameshraghupathy 1d96db3
Did some cleanup
rameshraghupathy f97db46
Did some cleanup
rameshraghupathy db03b8b
Did some cleanup
rameshraghupathy 6b9fe9f
Did some cleanup
rameshraghupathy 29585aa
Did some cleanup
rameshraghupathy 07afb46
Did some cleanup
rameshraghupathy a27f575
Did some cleanup
rameshraghupathy 0e4b549
Did some cleanup
rameshraghupathy e9582fd
Did some cleanup
rameshraghupathy d280d2d
Fixed the sequence diagram
rameshraghupathy 563b5c0
Fixed the sequence diagram
rameshraghupathy efc9530
Fixed the sequence flow
rameshraghupathy 60a0ad3
Addressed review comments
rameshraghupathy 6e7729a
Did some cleanup
rameshraghupathy 4545f1c
Called out that the response read happens in a 5 sec loop
rameshraghupathy ae70f24
Added a section for interaoperability
rameshraghupathy 3882f88
Did some cleanup
rameshraghupathy 9d6c8ba
Did some cleanup
rameshraghupathy 2eb24c0
Did some cleanup
rameshraghupathy cb3cd8d
Enhanced the reboot-interoperability.svg diagram
rameshraghupathy d677530
Enhanced the reboot-interoperability description
rameshraghupathy 1af23de
Addressed review comments
rameshraghupathy f8d3fd7
Addressed review comments
rameshraghupathy a67d16a
Addressed review comments
rameshraghupathy a695a67
Addressed some review comments
rameshraghupathy eb73698
Addressed some review comments
rameshraghupathy 9aaef84
modified reboot-interoperability diagram
rameshraghupathy 35a790f
addressed review comments
rameshraghupathy 4dbf08f
Addressed some review comments
rameshraghupathy 490357a
Addressed some review comments
rameshraghupathy ef08c8e
Updated the image
rameshraghupathy
199 changes: 199 additions & 0 deletions
doc/smart-switch/graceful-shutdown/graceful-shutdown.md
@@ -0,0 +1,199 @@ | ||
# SmartSwitch DPU Graceful Shutdown | ||
|
||
| Rev | Date | Author | Change Description |
| --- | ---- | ------ | ------------------ |
| 0.1 | 12/05/2025 | Ramesh Raghupathy | Initial version |
|
||
|
||
## Definitions / Abbreviations | ||
|
||
| Term | Meaning |
| --- | ---- |
| PMON | Platform Monitor |
| DPU | Data Processing Unit |
| gRPC | gRPC Remote Procedure Calls |
| gNOI | gRPC Network Operations Interface |
| gNMI | gRPC Network Management Interface |
|
||
## Introduction | ||
SmartSwitch supports graceful reboot of the DPUs, so it is quite natural to also support graceful shutdown of the DPUs. Although graceful shutdown may sound like the first half of a graceful reboot, it is not: it is invoked differently and follows a different code path, which makes the implementation a little more complex. In addition, the absence of docker in the PMON container, the container separation, and the need for a platform-agnostic implementation add to the challenge of invoking the gNOI call from this code path. Graceful shutdown of the DPUs happens in parallel.
|
||
## DPU Graceful Shutdown Sequence | ||
|
||
The following sequence diagram illustrates the detailed steps involved in the graceful shutdown of a DPU: | ||
|
||
<p align="center"><img src="./images/dpu-graceful-shutdown.svg"></p> | ||
|
||
## Sequence of Operations | ||
|
||
1. **Daemon Initialization:** | ||
|
||
* Upon startup, `gnoi_reboot_daemon.py` subscribes to the `CHASSIS_MODULE_INFO_TABLE` to monitor incoming shutdown/reboot requests. The state transition will be no-op for startup requests. | ||
|
||
2. **CLI Command Execution:** | ||
|
||
* The user executes the command `config chassis module shutdown DPUx` via the CLI or a config load. | ||
|
||
3. **Chassis Daemon Processing:** | ||
|
||
* `chassisd` receives the shutdown command and invokes `set_admin_state(down)` on `module_base.py`.
|
||
* Within `module_base.py`, the system checks if the device `subtype` is `"SmartSwitch"` and `switch_type` is not `dpu`. | ||
|
||
* If both conditions are met, it proceeds with the graceful shutdown process; otherwise it calls `set_admin_state(down)` in `module.py` directly.
|
||
4. **Graceful Shutdown Handler Invocation:** | ||
|
||
* `module_base.py` calls the `graceful_shutdown_handler()` method to initiate the graceful shutdown sequence. | ||
|
||
5. **Reboot Request Creation:** | ||
|
||
* Within `graceful_shutdown_handler()`, `state_transition_in_progress` is set to `True` in the `CHASSIS_MODULE_INFO_TABLE` in Redis STATE_DB for DPUx, along with `transition_type`.
|
||
6. **Daemon Notification and Processing:** | ||
|
||
* `gnoi_reboot_daemon.py` detects the `state_transition_in_progress` turning `True` in `CHASSIS_MODULE_INFO_TABLE` and sends a gNOI Reboot RPC with the method `HALT` to the sysmgr in DPUx, which in turn issues a DBUS request to execute `reboot -p` on DPUx. | ||
|
||
7. **Reboot Request:**
|
||
* The daemon forwards the reboot request to DPUx.
|
||
8. **Reboot Status Monitoring:** | ||
|
||
* The daemon sends `gnoi_client -rpc RebootStatus` to monitor the reboot status of DPUx. | ||
|
||
9. **DPUx Returns Status:** | ||
|
||
* DPUx returns the reboot status response to the daemon. | ||
|
||
10. **Reboot Result Update in DB:** | ||
|
||
* The daemon writes the reboot result to the `CHASSIS_MODULE_INFO_TABLE` in Redis STATE_DB by setting `state_transition_in_progress` to `False` after the platform API completes the power-down operation of the module, as shown in step 13.
|
||
* If the reboot fails, the result is updated after the timeout expires.
|
||
11. **Read the Result:** | ||
|
||
* `module_base.py` polls `state_transition_in_progress` in `CHASSIS_MODULE_INFO_TABLE` every 5 seconds until it turns `False`.
|
||
12. **Log the Result:** | ||
|
||
* `module_base.py` logs the reboot result accordingly. | ||
|
||
13. **Final State Transition:** | ||
|
||
* `module_base.py` invokes `set_admin_state(down)` on `module.py`. | ||
|
||
* `module.py` calls the platform API to power down the module once DPUx completes its kernel shutdown.
|
||
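The `module_base.py` side of the sequence (steps 3 to 5 and 11 to 13) can be sketched as follows. This is a minimal illustration, not the actual implementation: a plain dictionary stands in for STATE_DB, and the function signatures, the `timeout_s` value, and the `power_down` callback are assumptions made for the example. Only the field names, the `SmartSwitch` / `dpu` check, and the 5-second polling interval come from the steps above.

```python
import time
from typing import Callable, Dict

TABLE = "CHASSIS_MODULE_INFO_TABLE"
POLL_INTERVAL_S = 5  # polling interval stated in step 11


def needs_graceful_shutdown(subtype: str, switch_type: str) -> bool:
    """Step 3: take the graceful path only on a SmartSwitch that is not a DPU."""
    return subtype == "SmartSwitch" and switch_type != "dpu"


def graceful_shutdown_handler(
    state_db: Dict[str, Dict[str, str]],
    module: str,
    timeout_s: int = 60,  # hypothetical value; the HLD does not state one
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Mark the transition (step 5), then poll until it is cleared (step 11).

    Returns True if the flag left the "True" state before the timeout.
    """
    key = f"{TABLE}|{module}"
    entry = state_db.setdefault(key, {})
    entry["state_transition_in_progress"] = "True"
    entry["transition_type"] = "shutdown"
    entry["transition_start_time"] = time.strftime(
        "%a %b %d %H:%M:%S UTC %Y", time.gmtime()
    )

    waited = 0
    while waited < timeout_s:
        if state_db[key].get("state_transition_in_progress") != "True":
            return True
        sleep(POLL_INTERVAL_S)
        waited += POLL_INTERVAL_S
    return False


def set_admin_state_down(
    state_db: Dict[str, Dict[str, str]],
    module: str,
    subtype: str,
    switch_type: str,
    power_down: Callable[[str], None],  # stand-in for module.py (assumption)
    timeout_s: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Steps 3, 4, 12 and 13 in order."""
    if needs_graceful_shutdown(subtype, switch_type):
        ok = graceful_shutdown_handler(state_db, module, timeout_s, sleep)
        print(f"{module}: graceful shutdown {'completed' if ok else 'timed out'}")
    power_down(module)
```

Passing `sleep` as a parameter is only a convenience for exercising the loop without waiting in real time.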
## Objective | ||
|
||
This design enables the `chassisd` process running in the PMON container to invoke a **gNOI-based reboot** when it triggers the `set_admin_state(down)` API of a DPU module, without relying on `docker`, `bash`, or `hostexec` within the container.
|
||
## Constraints | ||
|
||
- The PMON container is highly restricted: no `docker`, `hostexec`, or `bash`. | ||
- gNOI reboot requires executing a command using `docker exec` on the host. | ||
- Communication must be initiated from PMON and executed by the host. | ||
|
||
--- | ||
|
||
## Design Overview | ||
|
||
In the Redis STATE_DB IPC approach, SONiC uses Redis's publish-subscribe mechanism for inter-process communication: the requester writes the transition state to STATE_DB, and `gnoi_reboot_daemon.py`, which subscribes to the table, is notified and acts on it. This event-driven design keeps the communication between components decoupled and reliable.
|
||
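To illustrate the event-driven flow, the sketch below models the daemon side with a tiny in-memory stand-in for STATE_DB that calls its subscribers on every write. It is not the real `gnoi_reboot_daemon.py`: the `FakeStateDb` class, the `send_halt` and `reboot_status_done` callbacks (placeholders for the gNOI Reboot and RebootStatus calls), and the `max_polls` count are all hypothetical. For brevity the flag is cleared right after status polling; the timeout handling and the wait for the platform API are left out.

```python
from typing import Callable, Dict, List

TABLE = "CHASSIS_MODULE_INFO_TABLE"


class FakeStateDb:
    """In-memory stand-in for STATE_DB with write notifications (assumption)."""

    def __init__(self) -> None:
        self.data: Dict[str, Dict[str, str]] = {}
        self._subscribers: List[Callable[[str, Dict[str, str]], None]] = []

    def subscribe(self, callback: Callable[[str, Dict[str, str]], None]) -> None:
        self._subscribers.append(callback)

    def hset(self, key: str, fields: Dict[str, str]) -> None:
        self.data.setdefault(key, {}).update(fields)
        # Notify every subscriber with a copy of the updated entry.
        for callback in list(self._subscribers):
            callback(key, dict(self.data[key]))


def make_daemon(
    db: FakeStateDb,
    send_halt: Callable[[str], None],
    reboot_status_done: Callable[[str], bool],
    max_polls: int = 3,  # hypothetical bound on status polls
) -> Callable[[str, Dict[str, str]], None]:
    """Return a subscriber that reacts to shutdown requests."""

    def on_change(key: str, entry: Dict[str, str]) -> None:
        if not key.startswith(TABLE + "|"):
            return
        if entry.get("state_transition_in_progress") != "True":
            return
        if entry.get("transition_type") != "shutdown":
            return  # startup and reboot requests are a no-op for this daemon
        module = key.split("|", 1)[1]
        send_halt(module)  # placeholder for the gNOI Reboot RPC with method HALT
        for _ in range(max_polls):
            if reboot_status_done(module):  # placeholder for RebootStatus
                break
        # Clear the flag so that the waiting requester can proceed.
        db.hset(key, {"state_transition_in_progress": "False"})

    return on_change
```

The requester and the daemon never call each other directly; they only share the table entry, which is the decoupling the design relies on.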
### CHASSIS_MODULE_INFO_TABLE Schema (STATE_DB) | ||
|
||
KEY: `CHASSIS_MODULE_INFO_TABLE|<MODULE_NAME>`. | ||
|
||
| Field | Description |
| ------------------------------ | ------------------------------------------------------------------------------------------------ |
| `state_transition_in_progress` | `"True"` indicates that a transition is ongoing; `"False"` or absence implies no transition. |
| `transition_start_time` | Timestamp in human-readable UTC format representing the start of the transition. |
| `transition_type` | Nature of the transition: `"shutdown"` or `"none"`. `"none"` is the default and is used for reboot and startup. |
|
||
**Example:**
```
CHASSIS_MODULE_INFO_TABLE|DPU0
{
  "state_transition_in_progress": "True",
  "transition_start_time": "Tue Jun 17 08:32:10 UTC 2025",
  "transition_type": "shutdown"
}
```
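
As a small illustration of the schema, the helpers below build an entry in the format of the example and interpret the flag as described in the field table (`"False"` or a missing field both mean no transition). The function names are hypothetical, and the timestamp format string is inferred from the example rather than taken from the implementation.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def make_transition_entry(
    transition_type: str, now: Optional[datetime] = None
) -> Dict[str, str]:
    """Build the field/value pairs for one CHASSIS_MODULE_INFO_TABLE entry."""
    now = now or datetime.now(timezone.utc)
    return {
        "state_transition_in_progress": "True",
        # Human-readable UTC timestamp; format inferred from the example.
        "transition_start_time": now.strftime("%a %b %d %H:%M:%S UTC %Y"),
        "transition_type": transition_type,
    }


def transition_in_progress(entry: Dict[str, str]) -> bool:
    """Only the literal string "True" counts as an ongoing transition."""
    return entry.get("state_transition_in_progress") == "True"
```

Note that the values are strings, not booleans, so the check compares against `"True"` explicitly.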
|
||
| Transition Type | Who Sets the Field | How It's Cleared |
| --------------- | --------------------------- | ----------------------------------------------------------------------- |
| **Startup** | CLI or config load | Once module reaches online state |
| **Shutdown** | CLI or config load | `gnoi-reboot-daemon` upon completing the platform API (module shutdown) |
| **Reboot** | `smartswitch_reboot_helper` | Cleared by `smartswitch_reboot_helper` upon completing the platform API |
|
||
## Parallel Execution | ||
|
||
The following sequence diagram illustrates the parallel execution of graceful shutdown of multiple DPUs: | ||
|
||
<p align="center"><img src="./images/parallel-execution.svg"></p> | ||
|
||
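One way to picture the parallel behaviour is a thread pool with one task per DPU. This is only a sketch under that assumption: the HLD does not state which concurrency mechanism is used, and `shutdown_one` is a hypothetical placeholder for the per-module graceful shutdown.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def shutdown_all(
    modules: Iterable[str], shutdown_one: Callable[[str], bool]
) -> Dict[str, bool]:
    """Run `shutdown_one` for every module concurrently and collect the results."""
    names = list(modules)
    if not names:
        return {}
    # One worker per module so that no DPU waits for another to finish.
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        results = pool.map(shutdown_one, names)  # results keep the input order
        return dict(zip(names, results))
```

Because each module has its own `CHASSIS_MODULE_INFO_TABLE` entry, the per-module tasks do not share state with each other.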
## Interoperability between DPU Graceful Shutdown & gNOI Reboot | ||
|
||
<p align="center"><img src="./images/reboot-interoperability.svg"></p> | ||
|
||
The diagram above illustrates scenarios where `module_base.py` and `smartswitch_reboot_helper` attempt to initiate a shutdown, startup, or reboot at the same time. When there is a race, the one that first writes the `state_transition_in_progress` field in `CHASSIS_MODULE_INFO_TABLE` wins. If `state_transition_in_progress` is already `True` because a DPU startup is in progress, both reboot and shutdown will fail. It is up to the requesting module to re-issue the operation if needed. When a module-level reboot and a switch-level reboot happen simultaneously and the module-level reboot has already set `state_transition_in_progress` to `True`, the switch-level reboot needs to be reissued. If the switch-level reboot happens first, it sets `state_transition_in_progress` to `True` for all modules as its first step and runs to completion.
|
||
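The first-writer-wins rule can be sketched as a check-and-set on the flag. The example below uses an in-memory dictionary guarded by a lock. How the real implementation makes this check-and-set atomic on STATE_DB is not described in this document, so the lock and the names `try_begin_transition` and `end_transition` are assumptions.

```python
import threading
from typing import Dict

_table_lock = threading.Lock()


def try_begin_transition(
    table: Dict[str, Dict[str, str]], module: str, transition_type: str
) -> bool:
    """Claim the module if no transition is in progress; return False otherwise."""
    with _table_lock:
        entry = table.setdefault(module, {})
        if entry.get("state_transition_in_progress") == "True":
            return False  # another requester owns the transition; retry later
        entry["state_transition_in_progress"] = "True"
        entry["transition_type"] = transition_type
        return True


def end_transition(table: Dict[str, Dict[str, str]], module: str) -> None:
    """Release the module so that a later request can succeed."""
    with _table_lock:
        table.setdefault(module, {})["state_transition_in_progress"] = "False"
```

A requester that gets `False` back does not queue; as the text says, it is expected to re-issue the operation later if it is still needed.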
**Scenario 1:** module_base issues a startup or shutdown when smartswitch_reboot_helper module reboot is in progress for the same module. | ||
|
||
The same scenario applies if "config reload" happens when reboot is in progress. | ||
|
||
* smartswitch_reboot_helper sets `state_transition_in_progress` to `True` in `CHASSIS_MODULE_INFO_TABLE`.
|
||
* If module_base.py attempts to set `state_transition_in_progress` to `True` in `CHASSIS_MODULE_INFO_TABLE` during this process, the operation will fail. The user has to retry the operation later.
|
||
* When the reboot is complete, `state_transition_in_progress` in `CHASSIS_MODULE_INFO_TABLE` is set to `False`. module_base.py can then retry the shutdown/startup operation as needed.
|
||
**Scenario 2:** smartswitch_reboot_helper module issues a reboot when module_base graceful shutdown is in progress. | ||
|
||
* module_base.py writes to `CHASSIS_MODULE_INFO_TABLE` with `state_transition_in_progress` `True` and sets the `"transition_type": "shutdown"`. | ||
|
||
* gnoi_reboot_daemon.py is notified of the new entry and proceeds to send a gNOI Reboot RPC with the method HALT to the sysmgr in DPUx. | ||
|
||
* The daemon writes the reboot result to the `CHASSIS_MODULE_INFO_TABLE` by toggling `state_transition_in_progress` to `False`. | ||
|
||
* If smartswitch_reboot_helper also attempts to write to `CHASSIS_MODULE_INFO_TABLE` with `state_transition_in_progress` `True` during this process, the operation will fail. | ||
|
||
* The graceful shutdown completes as planned. So, there is no need for the reboot in this situation. | ||
|
||
**Scenario 3:** smartswitch_reboot_helper module issues a reboot when module_base startup is in progress. | ||
|
||
* module_base.py writes to `CHASSIS_MODULE_INFO_TABLE` with `state_transition_in_progress` `True`. | ||
|
||
* If smartswitch_reboot_helper also attempts to write to `CHASSIS_MODULE_INFO_TABLE` with `state_transition_in_progress` `True` during this process, the operation will fail. | ||
|
||
* The module startup completes as planned. So, the reboot may not be needed in this situation. | ||
|
||
**Scenario 4:** module_base issues a graceful shutdown when the module startup is in progress or vice versa. | ||
|
||
* module_base.py writes to `CHASSIS_MODULE_INFO_TABLE` with `state_transition_in_progress` `True`, indicating that a startup or shutdown is in progress.
|
||
* If module_base.py issues another startup or shutdown to the same module, it will fail, and the user has to issue it again once the previous operation is complete.
|
||
**Scenario 5:** Switch level reboot is issued when a module level reboot, startup, or shutdown is in progress.
|
||
* In this situation the switch level reboot logic first checks `state_transition_in_progress` for all the modules and sets every entry that is `False` to `True`. If one or more modules are already undergoing a reboot, shutdown, or startup, it ignores those modules and completes the rest. This leaves the system in the expected state. Until the switch level reboot is complete, `state_transition_in_progress` is kept `True` for all modules, irrespective of the type of operation.
|
||
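The switch-level behaviour in Scenario 5 could look like the following sketch: claim every module whose flag is not `True`, and skip the ones that are already in a transition. The function name and the in-memory table are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def claim_idle_modules(
    table: Dict[str, Dict[str, str]], modules: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (claimed, skipped) module names for a switch-level reboot."""
    claimed: List[str] = []
    skipped: List[str] = []
    for module in modules:
        entry = table.setdefault(module, {})
        if entry.get("state_transition_in_progress") == "True":
            # Already rebooting, starting up, or shutting down: leave it alone.
            skipped.append(module)
        else:
            entry["state_transition_in_progress"] = "True"
            claimed.append(module)
    return claimed, skipped
```

After this step every listed module has the flag set to `True`, which is also why module-level requests fail while the switch-level reboot is running.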
**Scenario 6:** Module level reboot or startup or shutdown is issued when switch level reboot is in progress. | ||
|
||
* The module level requests will fail as the switch level reboot has already set all the module level `state_transition_in_progress` to `True`. | ||
* The user needs to redo the module level operation after the switch level reboot if needed. | ||
|
||
This design ensures that only one reboot process is initiated, regardless of which component triggers it first, thereby preventing race conditions and ensuring system stability. | ||
|
||
--- | ||
|
||
## References | ||
|
||
- [PMON HLD](https://github.com/sonic-net/SONiC/blob/master/doc/smart-switch/pmon/smartswitch-pmon.md) | ||
- [Smart Switch Reboot HLD](https://github.com/sonic-net/SONiC/blob/master/doc/smart-switch/reboot/reboot-hld.md) | ||
|
||
--- |
1 change: 1 addition & 0 deletions
doc/smart-switch/graceful-shutdown/images/dpu-graceful-shutdown.svg