sync: Delete the chassis records during deleteNode from sbdb #1113
@abhat @dcbw this will not work, since when the deleteNode event handler gets called in the master there is no guarantee that the ovnkube-node Pod has been deleted on that node. I initially had this code and was planning to submit the PR. However, during testing I found that if the deleteNode() gets called first before ovn-controller.log-20200304.gz:2020-03-03T17:19:10.624Z|00012|chassis|WARN|Could not find Chassis : stored (11c0eead-da11-402c-9973-1c37f62bb4f6) ovs (11c0eead-da11-402c-9973-1c37f62bb4f6) |
Understood, but then when the next time syncNodes() is called, it should still have the chassis in the |
@abhat syncNodes() runs only once, when the ovnkube-master daemon starts. Thereafter it never runs. So, we cannot depend on ovnkube-master to restart to delete the chassis, right? BTW, your code in syncNodes() is still valid, as a separate PR, to handle the case of deleting the |
@girishmg adding some more context here...we need to handle the case when the cluster is scaled down. Therefore we need a goroutine that continues to check to remove the stale entry. If syncNodes only runs once we either need to make it run more often or use a different syncChassis thread or something. |
I see what you are saying. I didn't realize syncNodes was only called to process existing nodes during start up. 👍 to @trozet's comments. Alternatively, if we can ensure that the ovn-controller can check for some flag we can set before creating the chassis record, that will work too. |
Ok so yeah, we need a sync function since we don't know when ovn-controller will actually exit. So 3 things:
|
@dcbw yes to all 3 of them. |
e547d5f to 26eb3c4
4ac7e75 to ffc3b1a
In some cases, the chassis may get created in the sbdb, but the node logical switch may not exist and we can get the node deleted from the kube API. This commit handles cleaning the chassis for such a corner case.
Currently, the sync mechanism runs a sync nodes job every 5 minutes, which removes stale chassis from the sbdb.
This pull request is to delete the chassis record from the sbdb during sync when the nodes get deleted from the ovnkube-master's perspective.
Signed-off-by: Aniket Bhat <anbhat@redhat.com>