New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
local exec healthchecks #703
Conversation
Going to put this in the queue -- focusing on releasing v0.3 today, and this feature was planned for v0.4. Nice work getting ahead of the curve, @coryschwartz! |
In order to run the tests with the local:exec runner, there are a few things that must be taken care of first. | ||
1. install required test software | ||
* redis (redis.io) | ||
* prometheus (prometheus.io) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are prometheus
and prometheus-pushgateway
required components of testground
now? I thought that monitoring components (specially when ran with local:exec, local:docker, etc.) are mostly optional.
If prometheus
crashes for some reason, what is the expected outcome of a testplan (if we assume that the testplan is correct and passing) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is a completely valid concern. I'm not particularly a fan of having these as required infrastructure, but having them opportunistically available would probably be a good thing.
What do you think about this? -- if the pushgateway is unavailable and can't start it, don't fail the healthcheck but don't give a healthy OK message either.
This way it starts automatically if it can, and doesn't prevent you from running without it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I regard them as required. We offer a metric API that wires directly onto Prometheus. Plans using that API will just end up sending stuff to /dev/null if those components are optional, and the environment doesn’t have them. Why would there be any benefit in making them optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general you don't always care about metrics. For example in tests. Would we have to set up prometheus as part of our unit and integration tests for modules that are instrumented?
Most previous projects I've work on have always had a way to disable metrics, so that they don't incur performance cost, when you don't need them - generally this is done for tracing, but there is no reason to not do it for metrics as well.
pkg/runner/local_exec.go
Outdated
} | ||
// Checker failed, try to fix. | ||
err := hcp.Fixer() | ||
if err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick, but probably a good idea to switch err == nil
to err != nil
- make it easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, but I think we should consider metrics functionality as optional
, and/or have a way to disable it (for example for the purpose of tests).
Also if we start having problems with Prometheus (not that I think this will happen), I don't think this should have any effect on the actual runs of testplans - we should just be losing the measurements.
* local exec healthchecks * swap == nil for != nil * fix lint
Final fix for #660
healthchecks, with fixes for the prometheus and pushgateway added to the local:exec runner.
I tried to lay this out so it would be less repetitive to add more checks if needed in the future.
First time running (so they all need to be fixed):
Second time running, now that they have all started:
Additionally, the process context is canceled and the processes all are killed as expected when the daemon closes.