Race condition between rollback checks and Node state #474

Closed
qinqon opened this issue Mar 31, 2020 · 0 comments · Fixed by #480

qinqon commented Mar 31, 2020

What happened:
When applying a policy that changes the primary NIC in order to test rollback, the policy reported a success state while the enactment was still progressing, and the enactment then ended in failure (since a rollback was expected). The issue is related to how nodes are counted for policy conditions: only Ready nodes are counted, and changing the primary NIC can temporarily leave a node NotReady, so the comparison between the number of nodes and the number of not-matching enactments was passing.
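A minimal sketch of why counting only Ready nodes can let the check pass too early. This is not the kubernetes-nmstate code: `Node`, `Enactment`, `looks_finished`, and the state names are hypothetical, and the comparison is simplified to "counted nodes equals enactments that are no longer progressing".

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    ready: bool


@dataclass
class Enactment:
    node: str
    state: str  # hypothetical states: "progressing", "success", "failing"


def looks_finished(nodes, enactments, only_ready):
    """Simplified, assumed policy check: compare node count to finished enactments."""
    counted = [n for n in nodes if n.ready or not only_ready]
    finished = [e for e in enactments if e.state != "progressing"]
    return len(counted) == len(finished)


# node03 lost connectivity while its primary NIC was being changed,
# so it is NotReady and its enactment is still progressing.
nodes = [Node("node01", True), Node("node02", True), Node("node03", False)]
enactments = [
    Enactment("node01", "success"),
    Enactment("node02", "success"),
    Enactment("node03", "progressing"),
]

racy = looks_finished(nodes, enactments, only_ready=True)   # 2 nodes vs 2 finished
safe = looks_finished(nodes, enactments, only_ready=False)  # 3 nodes vs 2 finished
print(racy, safe)
```

In this sketch the NotReady node drops out of the node count at exactly the moment its enactment is unfinished, which is the window in which the policy can be reported as successful.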

To fix this, we have to add another probe, run after apply and after rollback, that checks that the Node is in the Ready state, so that we block there until the node is OK again.
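The blocking behaviour of such a probe could look roughly like the following. This is a sketch under assumptions: `wait_until_ready`, its parameters, and the default timeout are illustrative names and values, and `is_ready` stands in for whatever call reads the Node's Ready condition from the API server.

```python
import time


def wait_until_ready(is_ready, timeout=120.0, interval=1.0, sleep=time.sleep):
    """Poll is_ready() until it returns True or the timeout expires.

    Returns True if the node became Ready in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


# Usage with a fake readiness check: NotReady twice, then Ready.
answers = iter([False, False, True])
became_ready = wait_until_ready(
    lambda: next(answers), timeout=5.0, interval=0.0, sleep=lambda _: None
)
print(became_ready)
```

Running this probe after apply and again after rollback would keep the handler from moving on, and the policy conditions from being recalculated, while the node is still NotReady.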

What you expected to happen:
The policy to be in a Failure state after a rollback from a bad primary NIC change.

How to reproduce it (as minimally and precisely as possible):
Apply a bad primary NIC policy in a multi-NIC environment. It has to be exercised multiple times until the race appears.

Anything else we need to know?:

Environment:

  • NodeNetworkState on affected nodes (use kubectl get nodenetworkstate <node_name> -o yaml):
  • Problematic NodeNetworkConfigurationPolicy:
  • kubernetes-nmstate image (use kubectl get pods --all-namespaces -l app=kubernetes-nmstate -o jsonpath='{.items[0].spec.containers[0].image}'):
  • NetworkManager version (use nmcli --version)
  • Kubernetes version (use kubectl version):
  • OS (e.g. from /etc/os-release):
  • Others:
qinqon added a commit to qinqon/kubernetes-nmstate that referenced this issue Mar 31, 2020
Closes nmstate#474

Signed-off-by: Quique Llorente <ellorent@redhat.com>
kubevirt-bot pushed a commit that referenced this issue Mar 31, 2020
* Move probes to their own module

Signed-off-by: Quique Llorente <ellorent@redhat.com>

* Add Node Readiness probe

Closes #474

Signed-off-by: Quique Llorente <ellorent@redhat.com>
kubevirt-bot pushed a commit to kubevirt-bot/kubernetes-nmstate that referenced this issue Mar 31, 2020
Closes nmstate#474

Signed-off-by: Quique Llorente <ellorent@redhat.com>
kubevirt-bot pushed a commit that referenced this issue Apr 1, 2020
* Move probes to their own module

Signed-off-by: Quique Llorente <ellorent@redhat.com>

* Add Node Readiness probe

Closes #474

Signed-off-by: Quique Llorente <ellorent@redhat.com>

Co-authored-by: Quique Llorente <ellorent@redhat.com>