New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a liveness probe to the frr container #1732
Conversation
Because of the proliferation of the frr related tools that are starting to pullute the root folder, we move all the frr addons under frr-tools. Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
config/manifests/metallb-frr.yaml
Outdated
exec: | ||
command: | ||
- /etc/frr_liveness/liveness.sh | ||
initialDelaySeconds: 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems a bit short ( as a guess ), is there a reason you went with 5 seconds?
also a startupprobe might be more fitting, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it makes sense. Will change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
startupProbe: | ||
exec: | ||
command: ["/etc/frr_liveness/liveness.sh"] | ||
failureThreshold: 30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it 30 on purpose here? or should it be 3? i.e. like in the livenessProbe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's on purpouse
initialDelaySeconds: 5 | ||
periodSeconds: 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be:
periodSeconds: 5
failureThreshold: 3
like in different places in the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that's the old version, good catch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There are corner case, after node reboots, where the speaker is still running but the frr processes are not. Here we add a liveness probe that verifies all the daemons are running so the container gets restarted if one of them died. Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
LGTM, also tested on my Kind cluster and works great. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
In order to make sure that frr is up and running, we add a liveness probe to check that the daemons are available.