Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Checkpoint is not reverted when role aborts with log_error() #154

Closed
tyll opened this issue Feb 21, 2020 · 7 comments · Fixed by #193
Closed

Checkpoint is not reverted when role aborts with log_error() #154

tyll opened this issue Feb 21, 2020 · 7 comments · Fixed by #193
Assignees
Labels
blocker Issue to fix for next release bug

Comments

@tyll
Copy link
Member

tyll commented Feb 21, 2020

log_error() does not only log but also exists the module. In this case, the role does not yet revert the checkpoint. This needs to be changed.

@tyll tyll added bug blocker Issue to fix for next release labels Feb 21, 2020
@pcahyna
Copy link
Member

pcahyna commented Feb 22, 2020

does it mean that #119 introduced a regression? Or did it just expose a bug present before?

@pcahyna
Copy link
Member

pcahyna commented Feb 22, 2020

Instead of fixing the module so that it always reverts the checkpoint, wouldn't it be better to let NM itself revert the checkpoint (after a timeout)? There can always be unforeseen circumstances where the module would abort, so letting NM revert the checkpoint would be more robust.

@tyll
Copy link
Member Author

tyll commented Feb 24, 2020

does it mean that #119 introduced a regression? Or did it just expose a bug present before?

Previously the role did not use checkpoints and the new implementation is faulty.

Instead of fixing the module so that it always reverts the checkpoint, wouldn't it be better to let NM itself revert the checkpoint (after a timeout)? There can always be unforeseen circumstances where the module would abort, so letting NM revert the checkpoint would be more robust.

There is also a timeout that will revert the checkpoint eventually. But in the meantime a possibly partial configuration might be active and the role will fail until the checkpoint is taken care of. Reference:
https://github.com/linux-system-roles/network/blob/master/library/network_connections.py#L1895

@pcahyna
Copy link
Member

pcahyna commented Apr 15, 2020

Symptom: when the role fails (for whatever reason), subsequent invocation fails with a message like

msg: 'fatal error: failure to checkpoint_create checkpoint: nm-manager-error-quark: device ''lo'' is already included in checkpoint /org/freedesktop/NetworkManager/Checkpoint/268 (12): {''error'': "nm-manager-error-quark: device ''lo'' is already included in checkpoint /org/freedesktop/NetworkManager/Checkpoint/268 (12)", ''success'': None}'

Workaround: wait 90 seconds for each profile one specified when the role failed originally.
(from #190)

@tyll
Copy link
Member Author

tyll commented Apr 16, 2020

@9minds thank you for closing #190. Could you maybe share how you triggered the failure? It might help to add a test case.

@tyll
Copy link
Member Author

tyll commented Apr 16, 2020

Related bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1816140

@tyll
Copy link
Member Author

tyll commented Apr 16, 2020

Workaround: wait 90 seconds for each profile one specified when the role failed originally.
(from #190)

Another workaround:

  1. Get the name/ID of the checkpoint:

gdbus introspect --system --dest org.freedesktop.NetworkManager --object-path /org/freedesktop/NetworkManager/Checkpoint --recurse | awk '/^ *node /{print $2}'

  1. Revert it:

gdbus call --system --dest org.freedesktop.NetworkManager --object-path /org/freedesktop/NetworkManager --method org.freedesktop.NetworkManager.CheckpointRollback /org/freedesktop/NetworkManager/Checkpoint/66

@tyll tyll self-assigned this Apr 16, 2020
@tyll tyll closed this as completed in #193 Apr 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker Issue to fix for next release bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants