Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

local exec healthchecks #703

Merged
merged 5 commits into from Mar 23, 2020
Merged

local exec healthchecks #703

merged 5 commits into from Mar 23, 2020

Conversation

coryschwartz
Copy link

Final fix for #660

healthchecks, with fixes for the prometheus and pushgateway added to the local:exec runner.

I tried to lay this out so it would be less repetitive to add more checks if needed in the future.

First time running (so they all need to be fixed):

↬ ./testground healthcheck --runner local:exec
Mar 18 05:28:34.409635	INFO	testground client initialized	{"addr": "localhost:8080"}
checking runner local:exec
finished checking runner local:exec
Checks:
- local-redis: ok; local-redis instance check: OK
- local-prometheus: ok; local-prometheus instance check: OK
- local-pushgateway: ok; local-pushgateway instance check: OK
Fixes:
- local-redis: ok; local-redis instance check: OK
- local-prometheus: ok; local-prometheus instance check: OK
- local-pushgateway: ok; local-pushgateway instance check: OK

Second time running, now that they have all started:

↬ ./testground healthcheck --runner local:exec
Mar 18 05:28:36.364931	INFO	testground client initialized	{"addr": "localhost:8080"}
checking runner local:exec
finished checking runner local:exec
Checks:
- local-redis: ok; local-redis instance check: OK
- local-prometheus: ok; local-prometheus instance check: OK
- local-pushgateway: ok; local-pushgateway instance check: OK
No fixes applied.

Additionally, the process context is canceled and the processes all are killed as expected when the daemon closes.

@raulk
Copy link
Contributor

raulk commented Mar 18, 2020

Going to put this in the queue -- focusing on releasing v0.3 today, and this feature was planned for v0.4. Nice work getting ahead of the curve, @coryschwartz!

In order to run the tests with the local:exec runner, there are a few things that must be taken care of first.
1. install required test software
* redis (redis.io)
* prometheus (prometheus.io)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are prometheus and prometheus-pushgateway required components of testground now? I thought that monitoring components (specially when ran with local:exec, local:docker, etc.) are mostly optional.

If prometheus crashes for some reason, what is the expected outcome of a testplan (if we assume that the testplan is correct and passing) ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is a completely valid concern. I'm not particularly a fan of having these as required infrastructure, but having them opportunistically available would probably be a good thing.

What do you think about this? -- if the pushgateway is unavailable and can't start it, don't fail the healthcheck but don't give a healthy OK message either.

This way it starts automatically if it can, and doesn't prevent you from running without it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I regard them as required. We offer a metric API that wires directly onto Prometheus. Plans using that API will just end up sending stuff to /dev/null if those components are optional, and the environment doesn’t have them. Why would there be any benefit in making them optional?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general you don't always care about metrics. For example in tests. Would we have to set up prometheus as part of our unit and integration tests for modules that are instrumented?

Most previous projects I've work on have always had a way to disable metrics, so that they don't incur performance cost, when you don't need them - generally this is done for tracing, but there is no reason to not do it for metrics as well.

}
// Checker failed, try to fix.
err := hcp.Fixer()
if err == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick, but probably a good idea to switch err == nil to err != nil - make it easier to read.

Copy link
Member

@nonsense nonsense left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, but I think we should consider metrics functionality as optional, and/or have a way to disable it (for example for the purpose of tests).

Also if we start having problems with Prometheus (not that I think this will happen), I don't think this should have any effect on the actual runs of testplans - we should just be losing the measurements.

@coryschwartz coryschwartz merged commit d295689 into master Mar 23, 2020
@coryschwartz coryschwartz deleted the feat/local-exec-healthchecks branch March 23, 2020 04:15
aschmahmann pushed a commit that referenced this pull request Mar 24, 2020
* local exec healthchecks

* swap == nil for != nil

* fix lint
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants