Commit
Add error handling and exception scenarios
vvolam committed May 17, 2024
1 parent 0c9c284 commit 47fa43b
Showing 1 changed file with 9 additions and 0 deletions.
9 changes: 9 additions & 0 deletions doc/smart-switch/reboot/reboot-hld.md
@@ -18,6 +18,7 @@
- [NPU platform.json](#npu-platformjson)
- [GNOI API implementation](#gnoi-api-implementation)
- [reboot.py script modifications](#rebootpy-script-modifications)
- [Error handling and exception scenarios](#error-handling-and-exception-scenarios)
- [Test plan](#test-plan)

## Revision ##
@@ -233,6 +234,14 @@ a complete switch reboot or targeting a specific DPU.
* Add a new reboot_smartswitch() function that reboots either the entire switch or a particular DPU; it takes the ID of the DPU
to reboot as an argument.

### Error handling and exception scenarios ###

* If the GNMI service on the DPU is not operational for any reason, detach the PCI and proceed with the reboot upon receiving an acknowledgment or after a timeout.

* After the DPU reboots, if the DPU PCI fails to reconnect for any reason, an error-handling mechanism should be in place to restore the DPU.

* In the event of a power failure, a power cycle caused by a kernel panic, or any other unknown reason, both the DPU and the NPU will undergo an ungraceful reboot.
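
The first two scenarios above can be sketched in Python as follows. This is only an illustration under stated assumptions: `DpuRebootHooks`, `reboot_dpu_with_fallback()`, and every hook name are hypothetical and are not part of this HLD or of any SONiC API. The sketch reads the first scenario as "detach the PCI, then reboot once an acknowledgment arrives or the timeout expires", and it models "restore the DPU" as a bounded number of PCI rescan retries, which is one possible mechanism rather than the one the design prescribes.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DpuRebootHooks:
    """Hypothetical platform hooks; none of these names come from the HLD."""
    gnmi_is_up: Callable[[], bool]        # is the GNMI service reachable on the DPU?
    graceful_prepare: Callable[[], None]  # stands in for the normal pre-reboot flow
    ack_received: Callable[[], bool]      # has an acknowledgment arrived?
    detach_pci: Callable[[], None]        # detach the DPU PCI
    trigger_reboot: Callable[[], None]    # start the DPU reboot
    rescan_pci: Callable[[], bool]        # True once the DPU PCI is reconnected


def wait_until(predicate: Callable[[], bool], timeout_s: float,
               poll_s: float = 0.01) -> bool:
    """Poll predicate until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(poll_s)
    return predicate()


def reboot_dpu_with_fallback(hooks: DpuRebootHooks,
                             ack_timeout_s: float = 5.0,
                             rescan_attempts: int = 3,
                             retry_delay_s: float = 1.0) -> List[str]:
    """Reboot one DPU and return a list of events for logging."""
    events: List[str] = []

    if hooks.gnmi_is_up():
        # Normal path, described elsewhere in the HLD; kept opaque here.
        hooks.graceful_prepare()
        events.append("graceful")
    else:
        # Scenario 1: GNMI is down, so detach the PCI and wait for an
        # acknowledgment, giving up when the timeout expires.
        events.append("gnmi-down")
        hooks.detach_pci()
        acked = wait_until(hooks.ack_received, ack_timeout_s)
        events.append("ack" if acked else "ack-timeout")

    hooks.trigger_reboot()

    # Scenario 2: the PCI may fail to reconnect after the reboot.
    # Assumption: recovery is a bounded number of rescan retries.
    for attempt in range(1, rescan_attempts + 1):
        if hooks.rescan_pci():
            events.append(f"pci-restored:{attempt}")
            return events
        time.sleep(retry_delay_s)

    # Recovery failed; the caller decides how to escalate.
    events.append("pci-restore-failed")
    return events
```

Passing the platform operations in as callables keeps the control flow testable without hardware. The third scenario (power failure or kernel panic) is ungraceful by definition, so it has no counterpart in this sketch.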

## Test plan ##

Presented below is the test plan within the ```sonic-mgmt``` framework for the smart switch reboot.