From 6419c78a6789b69d0a2a656d67d33842d9a509ff Mon Sep 17 00:00:00 2001
From: Pau Capdevila
Date: Fri, 5 Dec 2025 11:12:55 +0100
Subject: [PATCH 1/3] Document DS5000 breakout transceiver insert limitations

Signed-off-by: Pau Capdevila
---
 docs/known-limitations/known-limitations.md | 25 +++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/docs/known-limitations/known-limitations.md b/docs/known-limitations/known-limitations.md
index 0dde9592..511e2eaf 100644
--- a/docs/known-limitations/known-limitations.md
+++ b/docs/known-limitations/known-limitations.md
@@ -90,3 +90,28 @@ root cause and possible workarounds.
 None. We recommend avoiding mesh topologies on TH5-based devices for the time being, with the exception of 2-node topologies without gateway, where the above issues would not apply.
+
+### Breakout and CMIS transceiver initialization issues on DS5000
+
+On Celestica DS5000 devices, certain transceivers using the Common Management Interface Specification (CMIS) fail to initialize properly under specific conditions.
+
+CMIS is an open standard for managing high-speed pluggable transceivers, providing a uniform way for the network operating system to interact with and monitor them.
+
+#### Diagnosing the issue
+
+If you break out a port (for example, changing from 1x800G to 2x400G or 8x100G) while no transceiver is present, and then insert a transceiver afterward, initialization may fail and the transceiver may be missing or appear as failed in SONiC.
+
+This occurs because SONiC did not always correctly reinitialize hardware abstraction for the port after breakout and re-insertion in this scenario, especially affecting CMIS modules.
+
+#### Resolution
+
+- The Hedgehog Fabric agent now automatically patches `/usr/share/sonic/platform/pddf/pddf-device.json` as needed after NOS installation (the patch is indicated by `-hh1` in the description). No user action is required to apply this workaround.
+- A full switch reboot is still required after agent deployment for the patch to take effect.
+- The `REBOOTREQ` column for the agent object in `kubectl` or `k9s` will indicate if a reboot is needed.
+- If you encounter existing transceiver failures (such as after an upgrade), a full power cycle of the switch may still be required in addition to the reboot.
+
+#### Additional guidance
+
+- Prefer inserting transceivers before breaking out ports to avoid the issue altogether, if possible.
+- Always follow any REBOOTREQ status after upgrades or configuration changes.
+- If problems persist, perform a full power cycle as a last resort.

From 6a02d57433b7d24a981e34236076489303416df4 Mon Sep 17 00:00:00 2001
From: Pau Capdevila
Date: Fri, 5 Dec 2025 11:17:07 +0100
Subject: [PATCH 2/3] Add DS5000 breakout limitation to page TOC

Signed-off-by: Pau Capdevila
---
 docs/known-limitations/known-limitations.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/known-limitations/known-limitations.md b/docs/known-limitations/known-limitations.md
index 511e2eaf..23770c5e 100644
--- a/docs/known-limitations/known-limitations.md
+++ b/docs/known-limitations/known-limitations.md
@@ -7,6 +7,7 @@ working hard to address:
 * [Configuration not allowed when port is member of PortChannel](#configuration-not-allowed-when-port-is-member-of-portchannel)
 * [External peering over a connection originating from an MCLAG switch can fail](#external-peering-over-a-connection-originating-from-an-mclag-switch-can-fail)
 * [Mesh limitations on TH5-based devices](#mesh-limitations-on-th5-based-devices)
+* [Breakout and CMIS transceiver initialization issues on DS5000](#breakout-and-cmis-transceiver-initialization-issues-on-ds5000)
 
 ### Deleting a VPC and creating a new one right away can cause the agent to fail
 
From d136acfc826301bf02d48fb34be32d34a93563bd Mon Sep 17 00:00:00 2001
From: Pau Capdevila
Date: Tue, 9 Dec 2025 15:11:22 +0100
Subject: [PATCH 3/3] Clarify power-cycle aka cold boot workaround

Signed-off-by: Pau Capdevila
---
 docs/known-limitations/known-limitations.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/known-limitations/known-limitations.md b/docs/known-limitations/known-limitations.md
index 23770c5e..c339814a 100644
--- a/docs/known-limitations/known-limitations.md
+++ b/docs/known-limitations/known-limitations.md
@@ -109,7 +109,7 @@ This occurs because SONiC did not always correctly reinitialize hardware abstrac
 - The Hedgehog Fabric agent now automatically patches `/usr/share/sonic/platform/pddf/pddf-device.json` as needed after NOS installation (the patch is indicated by `-hh1` in the description). No user action is required to apply this workaround.
 - A full switch reboot is still required after agent deployment for the patch to take effect.
 - The `REBOOTREQ` column for the agent object in `kubectl` or `k9s` will indicate if a reboot is needed.
-- If you encounter existing transceiver failures (such as after an upgrade), a full power cycle of the switch may still be required in addition to the reboot.
+- If you encounter existing transceiver failures (such as after an upgrade), a full power cycle of the switch, sometimes referred to as a cold boot, may still be required in addition to the reboot.
 
 #### Additional guidance