Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loading startup-config shows OK despite error detected #429

Closed
troglobit opened this issue May 8, 2024 · 1 comment · Fixed by #431
Closed

Loading startup-config shows OK despite error detected #429

troglobit opened this issue May 8, 2024 · 1 comment · Fixed by #431
Assignees
Labels
bug Something isn't working
Milestone

Comments

@troglobit
Copy link
Contributor

troglobit commented May 8, 2024

While debugging issue #428 the root cause turned out to be a system error which was not reported back to the console while loading startup-config.

The expected output:

[ OK ] Bootstrapping YANG datastore
[ OK ] Starting Configuration daemon
[ OK ] Starting DHCP/DNS proxy
[ OK ] Starting LED daemon
[ OK ] Verifying SSH host keys
[ OK ] Starting System watchdog daemon
[FAIL] Loading startup-config
[ OK ] Loading failure-config
[ OK ] Update DNS configuration
@troglobit
Copy link
Contributor Author

troglobit commented May 8, 2024

Initial analysis shows that sysrepocfg, from the sysrepo project does not return a non-zero exit code when callbacks return a status != SR_ERR_OK in the SR_EV_DONE event.

The documentation states:

SR_EV_DONE: Occurs just after the changes have been successfully committed to the datastore, the subscriber can apply the changes now, but it cannot deny the changes in this phase anymore (any returned errors are just logged and ignored).

We understand this, and agree that it's too late to deny any changes, yet we firmly believe that calling sysrepocfg should return a non-zero exit code if any callback returns non-zero. Because when the rubber hits the road, e.g., calling iproute2 commands, or in this case actually calling adduser, we can run into all sorts of errors/bugs/issues that we cannot prepare for in SR_EV_CHANGE.

A patch to sysrepo has been added to the v2.2.105-kkit branch of sysrepo in kernelkit/sysrepo@49b5580. This will be included in the next patch release of Infix, v24.04.2.

@troglobit troglobit self-assigned this May 8, 2024
@troglobit troglobit added the bug Something isn't working label May 8, 2024
@jovatn jovatn changed the title Loading startup-config shows succeess/OK despite error detected Loading startup-config shows success/OK despite error detected May 8, 2024
@troglobit troglobit added this to the Infix v24.05 milestone May 8, 2024
@troglobit troglobit changed the title Loading startup-config shows success/OK despite error detected Loading startup-config shows OK despite error detected May 8, 2024
troglobit added a commit that referenced this issue May 9, 2024
This local patch of sysrepo adds support for detecting non-zero return
code from sysrepo callbacks during SR_EV_DONE, which upstream sysrepo
currently, as of v2.2.105, discards.

With this change failure to load startup-config results in the system
detecting the failure applying failure-config instead.

Fixes #429

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
troglobit added a commit that referenced this issue May 9, 2024
With the sysrepo patch from the previous commit, we can now properly
detect if a callback failed to apply its changes in SR_EV_DONE.  When
this occurs the system may be in an undefined state, so we must try
to recover it before loading failure-config.

This patch tries to perform a factory-default RPC, which is an Infix
specifc RPC that does "copy factory-config running-config".  We give
sysrepocfg some time to clean up any stale SHM connections before we
do a hard scratch of the db state and restart sysrepo-plugind.

Fixes #429

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
@troglobit troglobit linked a pull request May 9, 2024 that will close this issue
13 tasks
troglobit added a commit that referenced this issue May 13, 2024
This local patch of sysrepo adds support for detecting non-zero return
code from sysrepo callbacks during SR_EV_DONE, which upstream sysrepo
currently, as of v2.2.105, discards.

With this change failure to load startup-config results in the system
detecting the failure applying failure-config instead.

Fixes #429

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
troglobit added a commit that referenced this issue May 13, 2024
With the sysrepo patch from the previous commit, we can now properly
detect if a callback failed to apply its changes in SR_EV_DONE.  When
this occurs the system may be in an undefined state, so we must try
to recover it before loading failure-config.

This patch tries to perform a factory-default RPC, which is an Infix
specifc RPC that does "copy factory-config running-config".  We give
sysrepocfg some time to clean up any stale SHM connections before we
do a hard scratch of the db state and restart sysrepo-plugind.

Fixes #429

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
@wkz wkz closed this as completed in 9df97d9 May 13, 2024
@wkz wkz closed this as completed in #431 May 13, 2024
wkz pushed a commit that referenced this issue May 13, 2024
With the sysrepo patch from the previous commit, we can now properly
detect if a callback failed to apply its changes in SR_EV_DONE.  When
this occurs the system may be in an undefined state, so we must try
to recover it before loading failure-config.

This patch tries to perform a factory-default RPC, which is an Infix
specifc RPC that does "copy factory-config running-config".  We give
sysrepocfg some time to clean up any stale SHM connections before we
do a hard scratch of the db state and restart sysrepo-plugind.

Fixes #429

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant