Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Mellanox] make sure shared storage with syncd is cleared on restarts #14547

Merged
merged 1 commit into from
Jun 28, 2023

Conversation

stepanblyschak
Copy link
Collaborator

@stepanblyschak stepanblyschak commented Apr 6, 2023

Why I did it

Sharing the storage of syncd with other proprietary application extensions allows them to communicate with syncd in differnt ways.
If one container wants to pass some information to syncd then shared storage can be used. However, today the shared storage isn't cleaned on restarts making it possible for syncd to read out-of-date information generated in the past.

NOTE: No plans to use it for standard SONIC dockers and we are working on removing the SDK dependency from PMON docker

How I did it

Implemented new service to clean the shared storage.

How to verify it

Do reboot/fast-reboot/warm-reboot/config-reload/systemctl restart swss and verify /tmp/ is cleaned after each restart in syncd container.

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211

Description for the changelog

Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
@liat-grozovik
Copy link
Collaborator

@stepanblyschak does it have any impact of upgrades between different version on the same branch and between branches?

@liat-grozovik
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@liat-grozovik liat-grozovik changed the title [nvidia] make sure shared storage with syncd is cleared on restarts [Mellanox] make sure shared storage with syncd is cleared on restarts Apr 13, 2023
@liat-grozovik
Copy link
Collaborator

@lguohan would you like to review or assign someone to review? LGTM

@lguohan
Copy link
Collaborator

lguohan commented May 10, 2023

frankly, i do not like the idea of having additional channel for communication between dockers. there are lots of challenges of dependency between dockers and very hard to debug. I would really like to understand why this is needed, what is the data shared between syncd docker and other dockers.

@lguohan
Copy link
Collaborator

lguohan commented May 10, 2023

@vaibhavhd and @yxieca for more comments.

@stepanblyschak
Copy link
Collaborator Author

@lguohan This is not new communication channel. It exists for quite some time and used by pmon, what-just-happened to communicate with SDK directly as not all features are now supported by SAI. This PR is not about adding new communication channel but making the shared location is cleared on restarts.

@yxieca
Copy link
Contributor

yxieca commented May 11, 2023

This change has an risk: since the shared folder is used by syncd and other dockers:

  1. all services uses the share folder should start after the service creates / removes the share folder.
  2. all service uses the shared folder have to stop access the shared folder in order for the service to be able to remove the shared folder.

@stepanblyschak
Copy link
Collaborator Author

@yxieca This is guernteed by systemd. What is the risk?

@stepanblyschak
Copy link
Collaborator Author

@yxieca @liat-grozovik @lguohan Clarified the purpose in PR description.

@yxieca
Copy link
Contributor

yxieca commented May 20, 2023

@yxieca This is guernteed by systemd. What is the risk?

I am not sure that is fully guaranteed. e.g. if syncd is stopping while pmon is accessing the shared memory. Then the syncd won't be able to remove the shared memory folder. As result, it could see stale information after next start. Or the service stop/start could fail?

@stepanblyschak
Copy link
Collaborator Author

stepanblyschak commented May 22, 2023

@yxieca Pmon is instructed to start After=syncd -

.

That means, on shutdown the reverse order is used. -

When two units with an ordering dependency between them are shut down, the inverse of the start-up order is applied. I.e. if a unit is configured with After= on another unit, the former is stopped before the latter if both are shut down

If syncd service is about to stop (no matter if gracefully by request or abnormally) systemd first stops pmon service.

@liat-grozovik
Copy link
Collaborator

@lguohan could you please help to review/approve?

@lguohan
Copy link
Collaborator

lguohan commented Jun 21, 2023

@yxieca , i see this note.

NOTE: No plans to use it for standard SONIC dockers and we are working on removing the SDK dependency from PMON docker

looks like it is only used between mellanox syncd docker and pmon docker, and they are working on removing that dependency, so that will be removed in the future. is this ok?

@yxieca
Copy link
Contributor

yxieca commented Jun 22, 2023

@yxieca , i see this note.

NOTE: No plans to use it for standard SONIC dockers and we are working on removing the SDK dependency from PMON docker

looks like it is only used between mellanox syncd docker and pmon docker, and they are working on removing that dependency, so that will be removed in the future. is this ok?

I believe @stepanblyschak is checking warm reboot scenario, we can merge on his signal.

@stepanblyschak
Copy link
Collaborator Author

@yxieca We are good to merge it.

@yxieca yxieca merged commit 1ebdcda into sonic-net:master Jun 28, 2023
1 check passed
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Aug 6, 2023
…onic-net#14547)

Why I did it
Sharing the storage of syncd with other proprietary application extensions allows them to communicate with syncd in differnt ways.
If one container wants to pass some information to syncd then shared storage can be used. However, today the shared storage isn't cleaned on restarts making it possible for syncd to read out-of-date information generated in the past.

NOTE: No plans to use it for standard SONIC dockers and we are working on removing the SDK dependency from PMON docker

How I did it
Implemented new service to clean the shared storage.

How to verify it
Do reboot/fast-reboot/warm-reboot/config-reload/systemctl restart swss and verify /tmp/ is cleaned after each restart in syncd container.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202305: #16046

StormLiangMS pushed a commit that referenced this pull request Aug 7, 2023
…14547) (#16046)

Why I did it
Sharing the storage of syncd with other proprietary application extensions allows them to communicate with syncd in differnt ways.
If one container wants to pass some information to syncd then shared storage can be used. However, today the shared storage isn't cleaned on restarts making it possible for syncd to read out-of-date information generated in the past.

NOTE: No plans to use it for standard SONIC dockers and we are working on removing the SDK dependency from PMON docker

How I did it
Implemented new service to clean the shared storage.

How to verify it
Do reboot/fast-reboot/warm-reboot/config-reload/systemctl restart swss and verify /tmp/ is cleaned after each restart in syncd container.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Co-authored-by: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com>
sonic-otn pushed a commit to sonic-otn/sonic-buildimage that referenced this pull request Sep 20, 2023
…onic-net#14547)

Why I did it
Sharing the storage of syncd with other proprietary application extensions allows them to communicate with syncd in differnt ways.
If one container wants to pass some information to syncd then shared storage can be used. However, today the shared storage isn't cleaned on restarts making it possible for syncd to read out-of-date information generated in the past.

NOTE: No plans to use it for standard SONIC dockers and we are working on removing the SDK dependency from PMON docker

How I did it
Implemented new service to clean the shared storage.

How to verify it
Do reboot/fast-reboot/warm-reboot/config-reload/systemctl restart swss and verify /tmp/ is cleaned after each restart in syncd container.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants