-
Notifications
You must be signed in to change notification settings - Fork 592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(agw): all AGW containeres have health checks #13918
Conversation
Thanks for opening a PR! 💯
Howto
More infoPlease take a moment to read through the Magma project's
If this is your first Magma PR, also consider reading
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have this for all the components, should we abstract this in x-generic-service:
? At least the timeout and retries part, such what we only configure the target for the components?
@@ -178,6 +182,10 @@ services: | |||
container_name: sctpd | |||
ulimits: | |||
core: -1 | |||
healthcheck: | |||
test: ["CMD", "bash", "-c", "[ -S /tmp/sctpd_downstream.sock ]"] | |||
timeout: "5s" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a different timeout as for everything else configured here. Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It failed with the 4s timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the test setup that made it fail? On my machine it worked with a timeout of one second. To be fair I only started sctpd and not the other containers, but then again a simple bash test command should usually take way less than a second I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It worked for me as well with 4s (I tested starting all containers).
afd16f3
to
e3f7c83
Compare
Changed. |
command: /usr/bin/env python3 -m magma.ctraced.main | ||
|
||
sctpd: | ||
<<: *ltecservice | ||
container_name: sctpd | ||
ulimits: | ||
core: -1 | ||
healthcheck: | ||
test: ["CMD", "bash", "-c", "[ -S /tmp/sctpd_downstream.sock ]"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: We could also write test: "test -S /tmp/sctpd_downstream.sock"
. Personally I find that more readable, especially the combination of square brackets for Yaml lists and for the bash test
command are a bit hard to parse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And as a compromise if we want to stay consistent with the ["CMD", ...]
syntax used in this file:
test: ["CMD", "test", "-S", "/tmp/sctpd_downstream.sock"]
disclaimer: This one I didn't test, but I think it should work.
@@ -25,6 +25,9 @@ x-generic-service: &service | |||
logging: *logging_anchor | |||
restart: always | |||
network_mode: host | |||
healthcheck: | |||
timeout: "4s" | |||
retries: 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we should deduplicate this. Currently deduplicating is possible because the parameters are not fine-tuned to the individual checks. Once we start to fine-tune the timeout (like further down in this PR with sctpd) then we actually have two places where we define the timeout: One default and one overwrite. Defaults + overwrites are more complex than just having one definite place to look. Also, by deduplicating, we now split the healthcheck
blocks into two: This part that defines the timeouts and retries, and the part in the respective services that defines the tests. Without de-duplicating you can easily see the whole healthcheck definition for a service at a glance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The classic discussion on
code abstraction vs duplication
. I am sure there are heaps of internet resources arguing for both sides. Once we start
: But we do not at the moment. Would it not be sensible to focus on the current state, on the here and now, instead of anticipating what we might or might not want to do in future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we do not at the moment
Yes we do, in this very PR, with sctpd. That's what I wrote in paranthesis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me it works if we set all timeouts to 4s, so maybe there is no need for fine-tuning at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if it's possible to use the same values everywhere, I wouldn't say this is a good candidate for deduplication. I think de-duplicating those values would be important if those values need to be the same for all checks, and if we want to prevent ourselves from accidentally modifying the setting for only one service. I don't see the need for that here. So I think this would basically save us some lines of code, and I don't think that justifies having to look up the healthcheck definitions in two different places.
…gma#13918 Signed-off-by: Fritz Lehnert <13189449+Neudrino@users.noreply.github.com>
After merge of #13852 a rebase with appropriate changes to align would probably be good. |
…gma#13918 Signed-off-by: Fritz Lehnert <13189449+Neudrino@users.noreply.github.com>
…gma#13918 Signed-off-by: Fritz Lehnert <13189449+Neudrino@users.noreply.github.com>
e3f7c83
to
9059b9a
Compare
…gma#13918 Signed-off-by: Fritz Lehnert <13189449+Neudrino@users.noreply.github.com>
Signed-off-by: Marco Pfirrmann <marco.pfirrmann@tngtech.com>
9059b9a
to
4565ec9
Compare
…gma#13918 Signed-off-by: Fritz Lehnert <13189449+Neudrino@users.noreply.github.com>
… from magma#13918" This reverts commit 063a426.
Signed-off-by: Marco Pfirrmann <marco.pfirrmann@tngtech.com>
Signed-off-by: Marco Pfirrmann marco.pfirrmann@tngtech.com
Summary
This PR is embedded into the larger issue #13684.
magmad
andsctpd
containers are addedinterval
, is reduced from its default 30s to 4s.Test Plan
Follow the README to build the containers and check their status with
docker ps
.Additional Information