Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add intelligence to xcvrd to understand process restart and config reload #377

Closed
wants to merge 5 commits into from

Conversation

mihirpat1
Copy link
Contributor

@mihirpat1 mihirpat1 commented Jun 13, 2023

Description

Fixes #356
sonic-net/sonic-buildimage/pull/15453 is also required along with the current changes so that XCVRD is successfully able to receive DEL notification when PROC_INFO table is deleted

Upon xcvrd/pmon docker restart set the media settings in app DB which sends spurious SI programming from OA upon receiving. xcvrd should not set the settings upon process restart/pmon restart.

So add intelligence to xcvrd to understand the process restart and config reload/OA crash.

Motivation and Context

Summary of the changes:

  1. XCVRD main thread creates “MEDIA_NOTIFY_REQUIRED” and "CMIS_REINIT_REQUIRED" with value as True in APPL_DB PROC_INFO:XCVRD table for the relevant namespace while coming up if both the keys are not present and sets the port_reinit_request_tbl to True for the relevant ports in the namespace.
    a. port_reinit_request_tbl is a dictionary which helps the child thread in deciding if media settings renotify and CMIS reinit are required
    1. If both the keys “MEDIA_NOTIFY_REQUIRED” and "CMIS_REINIT_REQUIRED" are set to False, port_reinit_request_tbl will be set as False. In all other cases, port_reinit_request_tbl will be set as True and both the keys will be set to True as well.
    b. “MEDIA_NOTIFY_REQUIRED” is a boolean which states if media notify is required per namespace upon xcvrd boot-up. The SfpStateUpdateTask thread will set this to False after all ports for the corresponding namespace have been initialized
    c. "CMIS_REINIT_REQUIRED" is a boolean which states if CMIS reinit is required per namespace upon xcvrd boot-up. The CmisManagerTask thread will set this to False after all ports for the corresponding namespace have been processed (CMIS SM reaches to a steady state for a port)
    1. Why not bitmap of ports for the keys and boolean instead?
    For creating bitmap, I will need to translate the logical port name (eg Ethernet8) to a number which represents a bit. This might become more difficult to manage on systems with multiple ASICs and/or systems which have multiple speeds supported (such as 400G + 10G on some ports)
  2. If only xcvrd crashes, then “MEDIA_NOTIFY_REQUIRED” and "CMIS_REINIT_REQUIRED" are already set as False, so xcvrd will not renotify media_settings and will not CMIS reinit the modules
  3. XCVRD will subscribe to PROC_INFO:XCVRD table and trigger self-restart if the PROC_INFO:XCVRD table is deleted for the namespace. All threads will be gracefully terminated and xcvrd deinit will be performed followed by issuing a SIGKILL (or some other signal) to ensure XCVRD is restarted automatically by supervisord. After respawn, CMIS re-init and media_settings notified is triggered for the ports belonging to the affected namespace
    1. The drawback here is xcvrd will restart even if OA crashes for just 1 ASIC.
  4. Based on the port_reinit_request_tbl, the CMIS FSM will perform DP_DEACTIVATE and DP reinitialization.
  5. syncd/swss restart clears the entire APP-DB, including the “MEDIA_NOTIFY_REQUIRED” and "CMIS_REINIT_REQUIRED" in PROC_INFO:XCVRD table in the namespace context.

How Has This Been Tested?

Following is the summary of Unit-test performed

<style> </style>
Event APPL_DB cleared Xcvrd restarted Media renotify CMIS re-init triggered Link flap
Xcvrd restart N Y N N N
Pmon restart N Y N N N
Swss restart Y Y Y Y Y
Syncd restart Y Y Y Y Y
config reload Y Y – Restarted through config reload and not due to DEL event handling Y Y Y
Cold reboot Y Y Y Y Y
Config shut N N N N Y
Config no shut N N N N Y

Additional Information (Optional)

…load

Signed-off-by: Mihir Patel <patelmi@microsoft.com>
@mihirpat1 mihirpat1 marked this pull request as ready for review June 14, 2023 03:20
@mihirpat1
Copy link
Contributor Author

@amnsinghal @shyam77git - It will be great if you can help in reviewing this

@Junchao-Mellanox
Copy link
Collaborator

What is the use case to restart xcvrd only?

@mihirpat1
Copy link
Contributor Author

What is the use case to restart xcvrd only?

Restarting xcvrd after OA crash will now allow us with the below 2 things:

  1. CmisManagerTask thread can initiate CMIS re-init only for the ports belonging to the affected namespace with init routine itself
  2. Media setting can be renotified for the ports belonging to the affected namespace

Following alternate options were evaluated before concluding on the xcvrd restart part.

  1. Perform TX Disable upon APPL_DB PORT_DEL event in CMIS thread

    1. Taking this approach would mean we will need to decide on the thread responsible for populating the APPL_DB PROC_INFO:XCVRD table entry? We would need to narrow down between Xcvrd main thread or CMIS manager since initially, xcvrd populated the PROC_INFO:XCVRD table entry
    2. Also, performing TX Disable upon receiving APPL_DB DEL event in CMIS thread will now require SfpStateUpdateTask thread to re-notify the media settings.
  2. If XCVRD main thread handles APPL_DB DEL event handling, we would need a way to notify other threads of this action. If xcvrd main thread puts the port in Disabled and DeInit state, we would need to modify the CMIS SM from the main thread as well.

Overall, both of the above approaches would increase dependency between the threads and hence, we chose to proceed with the xcvrd restart option

@snider-nokia
Copy link

XCVRD will subscribe to PROC_INFO:XCVRD table and trigger self-restart if the XCVRD_COLD_RESTART is deleted for the namespace. XCVRD will be killed using SIGKILL. After respawn, CMIS re-init and media_settings notified is triggered for the ports belonging to the affected namespace

Please don't use SIGKILL. SIGTERM is more appropriate, just in case there is platform specific cleanup that needs to done.

@mihirpat1
Copy link
Contributor Author

XCVRD will subscribe to PROC_INFO:XCVRD table and trigger self-restart if the XCVRD_COLD_RESTART is deleted for the namespace. XCVRD will be killed using SIGKILL. After respawn, CMIS re-init and media_settings notified is triggered for the ports belonging to the affected namespace

Please don't use SIGKILL. SIGTERM is more appropriate, just in case there is platform specific cleanup that needs to done.

Since the default "stopsignal" setting in supervisord config is TERM for xcvrd, we don't see xcvrd restarting after receiving a SIGTERM signal. Hence, I opted to do SIGKILL.

@snider-nokia
Copy link

XCVRD will subscribe to PROC_INFO:XCVRD table and trigger self-restart if the XCVRD_COLD_RESTART is deleted for the namespace. XCVRD will be killed using SIGKILL. After respawn, CMIS re-init and media_settings notified is triggered for the ports belonging to the affected namespace

Please don't use SIGKILL. SIGTERM is more appropriate, just in case there is platform specific cleanup that needs to done.

Since the default "stopsignal" setting in supervisord config is TERM for xcvrd, we don't see xcvrd restarting after receiving a SIGTERM signal. Hence, I opted to do SIGKILL.

If SIGKILL is used then the shutdown will (obviously) be graceless and our platform will be deprived of doing the graceful shutdown related things that are warranted. I implore you to reconsider the use of SIGKILL, as standard practice dictates that it really should only be used as a last resort. SIGABRT would work too.

@mihirpat1
Copy link
Contributor Author

XCVRD will subscribe to PROC_INFO:XCVRD table and trigger self-restart if the XCVRD_COLD_RESTART is deleted for the namespace. XCVRD will be killed using SIGKILL. After respawn, CMIS re-init and media_settings notified is triggered for the ports belonging to the affected namespace

Please don't use SIGKILL. SIGTERM is more appropriate, just in case there is platform specific cleanup that needs to done.

Since the default "stopsignal" setting in supervisord config is TERM for xcvrd, we don't see xcvrd restarting after receiving a SIGTERM signal. Hence, I opted to do SIGKILL.

If SIGKILL is used then the shutdown will (obviously) be graceless and our platform will be deprived of doing the graceful shutdown related things that are warranted. I implore you to reconsider the use of SIGKILL, as standard practice dictates that it really should only be used as a last resort. SIGABRT would work too.

Sure, will explore into this and try to handle graceful shutdown.

@snider-nokia
Copy link

If SIGKILL is used then the shutdown will (obviously) be graceless and our platform will be deprived of doing the graceful shutdown related things that are warranted. I implore you to reconsider the use of SIGKILL, as standard practice dictates that it really should only be used as a last resort. SIGABRT would work too.

Sure, will explore into this and try to handle graceful shutdown.

Thanks. Along the same lines, hope you will also make the following change (or something similar) to force the main Xcvrd thread to cause underlying platform initialization of SFP (transceiver) subsystem prior to the other Xcvrd threads being spun up. The reason this is requested is because Python does not allow signal handlers to be installed/set from anywhere except main thread.

For platforms with meaningful signal handling implemented (like ours), something of this sort is a requirement, otherwise signal handler(s) will not be allowed by Python to be installed/setup.

image

image

@Junchao-Mellanox
Copy link
Collaborator

I just have another thought: why not fix the issue from portsorch.cpp side? Here is my proposal:

In portsorch.cpp, we shall cache all the serdes attributes. We shall compare the cached serdes attributes values with incoming values while doPortTask is called. If the value equals, no need apply the value to SAI.

Any comment? @prgeor @mihirpat1

@prgeor
Copy link
Collaborator

prgeor commented Jun 19, 2023

I just have another thought: why not fix the issue from portsorch.cpp side? Here is my proposal:

In portsorch.cpp, we shall cache all the serdes attributes. We shall compare the cached serdes attributes values with incoming values while doPortTask is called. If the value equals, no need apply the value to SAI.

Any comment? @prgeor @mihirpat1

I just have another thought: why not fix the issue from portsorch.cpp side? Here is my proposal:

In portsorch.cpp, we shall cache all the serdes attributes. We shall compare the cached serdes attributes values with incoming values while doPortTask is called. If the value equals, no need apply the value to SAI.

Any comment? @prgeor @mihirpat1

@Junchao-Mellanox its not a good idea for OA to decide when to apply the media setting because consider a case where same cable is removed and plugged in, it should re-apply same media setting (as previous values) to keep various scenarios where Xcvrd need to init the CMIS state machine during various cases like config reload, swss/syncd crash

@Junchao-Mellanox
Copy link
Collaborator

I just have another thought: why not fix the issue from portsorch.cpp side? Here is my proposal:
In portsorch.cpp, we shall cache all the serdes attributes. We shall compare the cached serdes attributes values with incoming values while doPortTask is called. If the value equals, no need apply the value to SAI.
Any comment? @prgeor @mihirpat1

I just have another thought: why not fix the issue from portsorch.cpp side? Here is my proposal:
In portsorch.cpp, we shall cache all the serdes attributes. We shall compare the cached serdes attributes values with incoming values while doPortTask is called. If the value equals, no need apply the value to SAI.
Any comment? @prgeor @mihirpat1

@Junchao-Mellanox its not a good idea for OA to decide when to apply the media setting because consider a case where same cable is removed and plugged in, it should re-apply same media setting (as previous values) to keep various scenarios where Xcvrd need to init the CMIS state machine during various cases like config reload, swss/syncd crash

Thanks for the reply. For cable re-plug case, I suppose xcvrd should remove the media settings from APPL DB and re-apply it explicitly. The current solution looks a litter bit complicated to me.

…NIT_REQUIRED keys in the PROC_INFO:XCVRD table in APPL_DB

- Printing Module state while handling Insertion event in CMIS SM
- Allowing xcvrd graceful shutdown followed by generating SIGABRT signal while handling DEL event for PROC_INFO:XCVRD table
@mihirpat1
Copy link
Contributor Author

If SIGKILL is used then the shutdown will (obviously) be graceless and our platform will be deprived of doing the graceful shutdown related things that are warranted. I implore you to reconsider the use of SIGKILL, as standard practice dictates that it really should only be used as a last resort. SIGABRT would work too.

Sure, will explore into this and try to handle graceful shutdown.

Thanks. Along the same lines, hope you will also make the following change (or something similar) to force the main Xcvrd thread to cause underlying platform initialization of SFP (transceiver) subsystem prior to the other Xcvrd threads being spun up. The reason this is requested is because Python does not allow signal handlers to be installed/set from anywhere except main thread.

For platforms with meaningful signal handling implemented (like ours), something of this sort is a requirement, otherwise signal handler(s) will not be allowed by Python to be installed/setup.

image

image

I have now addressed this

@mihirpat1
Copy link
Contributor Author

mihirpat1 commented Jun 20, 2023

I just have another thought: why not fix the issue from portsorch.cpp side? Here is my proposal:
In portsorch.cpp, we shall cache all the serdes attributes. We shall compare the cached serdes attributes values with incoming values while doPortTask is called. If the value equals, no need apply the value to SAI.
Any comment? @prgeor @mihirpat1

I just have another thought: why not fix the issue from portsorch.cpp side? Here is my proposal:
In portsorch.cpp, we shall cache all the serdes attributes. We shall compare the cached serdes attributes values with incoming values while doPortTask is called. If the value equals, no need apply the value to SAI.
Any comment? @prgeor @mihirpat1

@Junchao-Mellanox its not a good idea for OA to decide when to apply the media setting because consider a case where same cable is removed and plugged in, it should re-apply same media setting (as previous values) to keep various scenarios where Xcvrd need to init the CMIS state machine during various cases like config reload, swss/syncd crash

Thanks for the reply. For cable re-plug case, I suppose xcvrd should remove the media settings from APPL DB and re-apply it explicitly. The current solution looks a litter bit complicated to me.

@Junchao-Mellanox - The current solution will update media settings only in the case of switch boot-up or OA crash. For the cable re-plug case, the below existing code will handle the corresponding scenario

notify_media_setting(logical_port, transceiver_dict, self.xcvr_table_helper.get_app_port_tbl(asic_index), self.port_mapping)

sonic-xcvrd/xcvrd/xcvrd.py Outdated Show resolved Hide resolved
@@ -134,6 +136,17 @@ def get_physical_port_name_dict(logical_port_name, port_mapping):

return port_name_dict

# Get dictionary of asic to set of corresponding logical ports
def get_asic_to_logical_ports_dict(port_mapping):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we move this function to src/sonic-platform-daemons/sonic-xcvrd/xcvrd/xcvrd_utilities/port_mapping.py and make it a member of PortMapping class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not put it in PortMapping class since this class is more of tied to a single port and the corresponding mapping for it such as its physical/logical port number or asic ID.
However, the function get_asic_to_logical_ports_dict is tied to a ASIC and it maps to a set of ports. Hence, I added this in xcvrd and not as a member of PortMapping class.

proc_info_xcvrd_fvs_dict = dict(proc_info_xcvrd_fvs)
media_notify_rqd = proc_info_xcvrd_fvs_dict.get('MEDIA_NOTIFY_REQUIRED', 'True')
cmis_reinit_rqd = proc_info_xcvrd_fvs_dict.get('CMIS_REINIT_REQUIRED', 'True')
if media_notify_rqd == 'False' and cmis_reinit_rqd == 'False':
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that media_notify_rqd is "True" but cmis_reinit_rqd is "False" or vice versa?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - that is a possibility if either CmisManagerTask or SfpStateUpdateTask thread crashes while performing the initialization.
In such scenario, xcvrd would perform the CMIS reinit and media setting notify for all ports in the affected namespace (irrespective of the completion status of media settings and CMIS reinit of ports before the thread got killed)

@@ -1385,6 +1453,14 @@ def task_worker(self):
self.CMIS_STATE_FAILED,
self.CMIS_STATE_READY,
self.CMIS_STATE_REMOVED]:
if self.port_reinit_request_tbl[lport] is True:
self.log_info("{}: Setting port_reinit_request_tbl to True".format(lport))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo? True -> False?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Will correct it in the new diff.

@@ -2524,12 +2656,15 @@ def run(self):

if self.sfp_error_event.is_set():
sys.exit(SFP_SYSTEM_ERROR)
elif generate_sigabrt is True:
self.log_notice("Sending SIGABRT to xcvrd!")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we send a SIGABRT here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was done for adhering to the below comment
#377 (comment)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a normal reboot/config reload flow, is it possible to run into this case:

  1. xcvrd has started and called self.init (xcvrd has initialized PROC_INFO table)
  2. swss starts after and clears PROC_INFO table
  3. xcvrd runs into while loop to select PROC_INFO table later, but it does not get the delete event

In such case, PROC_INFO table is empty, but xcvrd has already called notify_media_settings. The state is incorrect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, starting of swss will not clear PROC_INFO table. Only if swss crashes, PROC_INFO table will be cleared.

Also, xcvrd will initialize PROC_INFO table after port config is done. The CmisManagerTask and SfpStateUpdateTask threads also check for port config done before performing their respective initialization.
Following is the function call from xcvrd.init() to ensure port config done is set prior to performing further actions.

self.wait_for_port_config_done(namespace)

@Junchao-Mellanox
Copy link
Collaborator

One question, maybe not relevant: how could CmisManagerTask re-initialize relevant logical port if a CMIS cable is replaced by another CMIS cable (plug out old one and plug in new one)?

@mihirpat1
Copy link
Contributor Author

One question, maybe not relevant: how could CmisManagerTask re-initialize relevant logical port if a CMIS cable is replaced by another CMIS cable (plug out old one and plug in new one)?

In such case, CmisManagerTask will be receiving Sfp insertion/removal event due to a change in TRANSCEIVER_INFO table. Following is the code flow for it.

or a XCVR insertion/removal in STATE_DB

@Junchao-Mellanox
Copy link
Collaborator

One question, maybe not relevant: how could CmisManagerTask re-initialize relevant logical port if a CMIS cable is replaced by another CMIS cable (plug out old one and plug in new one)?

In such case, CmisManagerTask will be receiving Sfp insertion/removal event due to a change in TRANSCEIVER_INFO table. Following is the code flow for it.

or a XCVR insertion/removal in STATE_DB

Thanks

@prgeor
Copy link
Collaborator

prgeor commented Jun 23, 2023

I just have another thought: why not fix the issue from portsorch.cpp side? Here is my proposal:
In portsorch.cpp, we shall cache all the serdes attributes. We shall compare the cached serdes attributes values with incoming values while doPortTask is called. If the value equals, no need apply the value to SAI.
Any comment? @prgeor @mihirpat1

I just have another thought: why not fix the issue from portsorch.cpp side? Here is my proposal:
In portsorch.cpp, we shall cache all the serdes attributes. We shall compare the cached serdes attributes values with incoming values while doPortTask is called. If the value equals, no need apply the value to SAI.
Any comment? @prgeor @mihirpat1

@Junchao-Mellanox its not a good idea for OA to decide when to apply the media setting because consider a case where same cable is removed and plugged in, it should re-apply same media setting (as previous values) to keep various scenarios where Xcvrd need to init the CMIS state machine during various cases like config reload, swss/syncd crash

Thanks for the reply. For cable re-plug case, I suppose xcvrd should remove the media settings from APPL DB and re-apply it explicitly. The current solution looks a litter bit complicated to me.

@Junchao-Mellanox agreed. @mihirpat1 ?

@mihirpat1
Copy link
Contributor Author

I just have another thought: why not fix the issue from portsorch.cpp side? Here is my proposal:
In portsorch.cpp, we shall cache all the serdes attributes. We shall compare the cached serdes attributes values with incoming values while doPortTask is called. If the value equals, no need apply the value to SAI.
Any comment? @prgeor @mihirpat1

I just have another thought: why not fix the issue from portsorch.cpp side? Here is my proposal:
In portsorch.cpp, we shall cache all the serdes attributes. We shall compare the cached serdes attributes values with incoming values while doPortTask is called. If the value equals, no need apply the value to SAI.
Any comment? @prgeor @mihirpat1

@Junchao-Mellanox its not a good idea for OA to decide when to apply the media setting because consider a case where same cable is removed and plugged in, it should re-apply same media setting (as previous values) to keep various scenarios where Xcvrd need to init the CMIS state machine during various cases like config reload, swss/syncd crash

Thanks for the reply. For cable re-plug case, I suppose xcvrd should remove the media settings from APPL DB and re-apply it explicitly. The current solution looks a litter bit complicated to me.

@Junchao-Mellanox agreed. @mihirpat1 ?

Agreed

@mihirpat1
Copy link
Contributor Author

Closing the current PR since planning to open a new PR once the changes inline to the HLD sonic-net/SONiC#1432 are completed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add intelligence to xcvrd to understand process restart and config reload
5 participants