Commit
Rework how gratuitous advertisements (ARP/NDP) work
With our current approach, we only send gratuitous advertisements on the first SetBalancer() call and do not resend them on subsequent calls, to reduce the amount of "spam" on the network. This worked well when we used the K8S API to make all decisions.

Now that we also use MemberList status, our decisions are based on eventually consistent information, and there are at least 2 cases where we need to resend gratuitous advertisements even though the information we have suggests that ownership did not change:

1) Split brain with no ownership changes:
3 nodes A, B, C; A owns the LoadBalancer IP I; cluster is clean.
Now for some reason C can't talk to A and B anymore, and our algorithm in ShouldAnnounce() continues to pick A as the owner of I. As nothing changed for A, A doesn't send any gratuitous advertisement. As C thinks it is alone, it thinks it owns I and sends gratuitous advertisements.
Some seconds later C rejoins A and B. C stops sending gratuitous advertisements, but A is still the owner and doesn't send any gratuitous advertisement. Depending on the switches' inner workings, traffic might continue to go to C for a long time.

2) Race condition on ForceSync:
3 nodes A, B, C; A owns the LoadBalancer IP I; cluster is clean.
A becomes really slow (CPU limits or ...) and memberlist on B and C decides that A is not part of the memberlist cluster anymore. B and C each start a ForceSync(); one of them becomes the owner of I and starts sending gratuitous advertisements for I.
A starts responding to memberlist again and rejoins the cluster while doing its first ForceSync(). A thinks it was always the owner of I and doesn't send any gratuitous advertisement.

The idea of this patch is to send gratuitous advertisements for 5 seconds from the last SetBalancer() call, instead of from the last time we think we became the owner.

To ensure there is only 1 sender for each IP, we use a single goroutine for all gratuitous advertisement calls. As gratuitous() was using Lock() (i.e. an exclusive lock), we were already sending at most 1 gratuitous advertisement at a time, so we know this is fine performance-wise, but it might be burstier than before.

This fixes #584

Signed-off-by: Etienne Champetier <echampetier@anevia.com>