Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[config] Adding sanity checks for config reload #1664

Merged
merged 4 commits into from
Jun 21, 2021

Conversation

dgsudharsan
Copy link
Collaborator

DO NOT MERGE UNTIL sonic-net/sonic-buildimage#7846 is merged.

What I did

Fixed config reload to add some system sanity checks
1) To check if the system is in running state
2) Check if the services which are grouped under delayed target up
3) Check if swss is running for at least 120 seconds
To force config reload and to avoid these checks an extra option -f/--force is added

How I did it

Added the above mentioned checks in config reload flow

How to verify it

Perform config reload after reboot, between system up and delayed services up, immediately after another config reload and running two config reload in parallel. In all the cases the config reload should not execute and throw appropriate error messages.

Previous command output (if the output of a command-line utility has changed)

New command output (if the output of a command-line utility has changed)

When system is not up
sonic:~#config reload
System is not up

When delayed services are not up
sonic:~#config reload
Services are not up

When swss is not up for at least 120 seconds
sonic:~# config reload -y
SWSS is not ready

@dgsudharsan
Copy link
Collaborator Author

This PR should fix sonic-net/sonic-buildimage#7132

config/main.py Outdated Show resolved Hide resolved
jleveque
jleveque previously approved these changes Jun 14, 2021
@liat-grozovik
Copy link
Collaborator

/azp run

@jleveque jleveque changed the title Adding sanity checks for config reload [config] Adding sanity checks for config reload Jun 21, 2021
@jleveque jleveque merged commit 9041ba0 into sonic-net:master Jun 21, 2021

def _system_running():
out = clicommon.run_command("sudo systemctl is-system-running", return_cmd=True)
return out.strip() == "running"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While verifying PR sonic-net/sonic-buildimage#8003. I hit this condition where I was I was not able to do "config reload". Even after system is up for a while the systemctl was showing running state as degraded

admin@str-s6000-acs-9:~$ sudo systemctl is-system-running
degraded
admin@str-s6000-acs-9:~$ sudo config reload -y
System is not up. Retry later or use -f to avoid system checks
admin@str-s6000-acs-9:~$ docker ps
CONTAINER ID        IMAGE                                COMMAND                  CREATED             STATUS              PORTS               NAMES
8baf489477e6        docker-snmp:latest                   "/usr/local/bin/supe…"   18 minutes ago      Up 2 minutes                            snmp
ad835c8b42ba        docker-sonic-mgmt-framework:latest   "/usr/local/bin/supe…"   18 minutes ago      Up 2 minutes                            mgmt-framework
930f188f20a7        docker-router-advertiser:latest      "/usr/bin/docker-ini…"   20 minutes ago      Up 5 minutes                            radv
15fd0337e6ea        docker-dhcp-relay:latest             "/usr/bin/docker_ini…"   20 minutes ago      Up 5 minutes                            dhcp_relay
c37a6752f57f        docker-lldp:latest                   "/usr/bin/docker-lld…"   20 minutes ago      Up 5 minutes                            lldp
79cb8660e7d2        docker-syncd-brcm:latest             "/usr/local/bin/supe…"   20 minutes ago      Up 5 minutes                            syncd
77e8ca15b220        docker-teamd:latest                  "/usr/local/bin/supe…"   20 minutes ago      Up 5 minutes                            teamd
ce63284a0f0a        docker-orchagent:latest              "/usr/bin/docker-ini…"   20 minutes ago      Up 5 minutes                            swss
277e43234314        docker-fpm-frr:latest                "/usr/bin/docker_ini…"   20 minutes ago      Up 5 minutes                            bgp
d9c63106202a        docker-platform-monitor:latest       "/usr/bin/docker_ini…"   21 minutes ago      Up 5 minutes                            pmon
34129d689ebd        docker-database:latest               "/usr/local/bin/dock…"   21 minutes ago      Up 6 minutes                            database

I have seen cases where system remains in degraded state as may be one of the systemd service is not started. @dgsudharsan do share your observations.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@judyjoseph I believe we should try to fix the service that is in the degraded state. The whole point of introducing these checks were to avoid issues around config reload which happen when the system was degraded. Until the faulty services are fixed we can expect config reload without force not to work. However to give a handle we introduced '-f' option which lets user to take own risk and perform config reload when the system may not be in a stable state.

raphaelt-nvidia pushed a commit to raphaelt-nvidia/sonic-utilities that referenced this pull request Aug 10, 2021
Fixed config reload to add some system sanity checks

1) To check if the system is in running state
2) Check if the services which are grouped under delayed target up
3) Check if swss is running for at least 120 seconds
To force config reload and to avoid these checks an extra option -f/--force is added
qiluo-msft pushed a commit that referenced this pull request Nov 23, 2021
Updating command reference reflecting the latest config reload behavior

#### What I did
Update command reference for the config reload behavior. These changes were introduced by #1664
abdosi pushed a commit that referenced this pull request Dec 8, 2021
Updating command reference reflecting the latest config reload behavior

#### What I did
Update command reference for the config reload behavior. These changes were introduced by #1664
@dgsudharsan dgsudharsan deleted the config_reload_fix branch March 9, 2023 02:05
malletvapid23 added a commit to malletvapid23/Sonic-Utility that referenced this pull request Aug 3, 2023
Updating command reference reflecting the latest config reload behavior

#### What I did
Update command reference for the config reload behavior. These changes were introduced by sonic-net/sonic-utilities#1664
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants