Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a liveness probe to the frr container #1732

Merged
merged 4 commits into from Dec 8, 2022

Conversation

fedepaol
Copy link
Member

@fedepaol fedepaol commented Dec 2, 2022

In order to make sure that frr is up and running, we add a liveness probe to check that the daemons are available.

Because of the proliferation of the frr related tools that are starting
to pullute the root folder, we move all the frr addons under frr-tools.

Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
exec:
command:
- /etc/frr_liveness/liveness.sh
initialDelaySeconds: 5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems a bit short ( as a guess ), is there a reason you went with 5 seconds?
also a startupprobe might be more fitting, wdyt?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it makes sense. Will change

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

startupProbe:
exec:
command: ["/etc/frr_liveness/liveness.sh"]
failureThreshold: 30
Copy link
Contributor

@liornoy liornoy Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it 30 on purpose here? or should it be 3? i.e. like in the livenessProbe

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's on purpouse

initialDelaySeconds: 5
periodSeconds: 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be:
periodSeconds: 5
failureThreshold: 3
like in different places in the code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that's the old version, good catch

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

There are corner case, after node reboots, where the speaker is still
running but the frr processes are not. Here we add a liveness probe that
verifies all the daemons are running so the container gets restarted if
one of them died.

Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
@liornoy
Copy link
Contributor

liornoy commented Dec 6, 2022

LGTM, also tested on my Kind cluster and works great.

Copy link
Member

@oribon oribon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants