-
Notifications
You must be signed in to change notification settings - Fork 1k
FRR configuration retries #1463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
If the reload fails, most likely due to the reloader container not ready yet, we wait and try to reload the configuration again. Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
If we detect a failure in the configuration processing, we ask the debouncer to retry using the latest configuration. Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
| if newCfg.useOld && config == nil { | ||
| continue // just ignore the event | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I understand this :/ why sending something to the chan in the first place only to do nothing with it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should never happen, because it means that we received a "reload old config" event but the old config is nil (meaning, we already tried a reload that failed).
Nonethless, I'd prefer not having the speaker crash but ignore the event in such corner case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I misread it as newCfg.config == nil 😅
| } | ||
| updated = <-result | ||
| if updated.Hostname != "1" { | ||
| t.Fatal("Config was not updated") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be "Config was updated"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, because what writes to result is the action to update the config. So in this case, if we don't get the value, it means that the retry mechanism did not trigger and that the update function was not called
| err := body(config) | ||
| if err != nil { | ||
| timeOut = time.After(failureRetryInterval) | ||
| timerSet = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this mean that after a fail the next config update won't happen until the failureretryinterval finishes? shouldn't we let new updates override this timeout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is correct, and I thought that it makes sense to go slow after a failure happened (think about the case of the slow container startup, after a failure we want to give it time to recover). 5 seconds is still a bearable timeout for applying configuration changes.
|
Tested also manually by putting a sleep in the reloader script first, and in the frr entry point after, and the speaker recovered. |
oribon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Adding a note about metallb#1463 Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
Adding a note about #1463 Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
Adding a note about metallb#1463 Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
Adding a note about metallb/metallb#1463 Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
Making the interaction with FRR more robust introducing retries after failures in both the call of the reloader (if the pid is not there yet) and if we find that the interaction between the reloader and the container fails.
Fixes #1462