Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

local exec healthchecks #703

Merged
merged 5 commits into from Mar 23, 2020
Merged

local exec healthchecks #703

merged 5 commits into from Mar 23, 2020

Conversation

@coryschwartz
Copy link
Collaborator

coryschwartz commented Mar 18, 2020

Final fix for #660

healthchecks, with fixes for the prometheus and pushgateway added to the local:exec runner.

I tried to lay this out so it would be less repetitive to add more checks if needed in the future.

First time running (so they all need to be fixed):

↬ ./testground healthcheck --runner local:exec
Mar 18 05:28:34.409635	INFO	testground client initialized	{"addr": "localhost:8080"}
checking runner local:exec
finished checking runner local:exec
Checks:
- local-redis: ok; local-redis instance check: OK
- local-prometheus: ok; local-prometheus instance check: OK
- local-pushgateway: ok; local-pushgateway instance check: OK
Fixes:
- local-redis: ok; local-redis instance check: OK
- local-prometheus: ok; local-prometheus instance check: OK
- local-pushgateway: ok; local-pushgateway instance check: OK

Second time running, now that they have all started:

↬ ./testground healthcheck --runner local:exec
Mar 18 05:28:36.364931	INFO	testground client initialized	{"addr": "localhost:8080"}
checking runner local:exec
finished checking runner local:exec
Checks:
- local-redis: ok; local-redis instance check: OK
- local-prometheus: ok; local-prometheus instance check: OK
- local-pushgateway: ok; local-pushgateway instance check: OK
No fixes applied.

Additionally, the process context is canceled and the processes all are killed as expected when the daemon closes.

@raulk

This comment has been minimized.

Copy link
Member

raulk commented Mar 18, 2020

Going to put this in the queue -- focusing on releasing v0.3 today, and this feature was planned for v0.4. Nice work getting ahead of the curve, @coryschwartz!

@raulk raulk self-requested a review Mar 18, 2020
@Robmat05 Robmat05 added this to the Testground v0.4 milestone Mar 18, 2020
@coryschwartz coryschwartz requested a review from nonsense Mar 18, 2020
In order to run the tests with the local:exec runner, there are a few things that must be taken care of first.
1. install required test software
* redis (redis.io)
* prometheus (prometheus.io)

This comment has been minimized.

Copy link
@nonsense

nonsense Mar 19, 2020

Member

Are prometheus and prometheus-pushgateway required components of testground now? I thought that monitoring components (specially when ran with local:exec, local:docker, etc.) are mostly optional.

If prometheus crashes for some reason, what is the expected outcome of a testplan (if we assume that the testplan is correct and passing) ?

This comment has been minimized.

Copy link
@coryschwartz

coryschwartz Mar 19, 2020

Author Collaborator

I think that is a completely valid concern. I'm not particularly a fan of having these as required infrastructure, but having them opportunistically available would probably be a good thing.

What do you think about this? -- if the pushgateway is unavailable and can't start it, don't fail the healthcheck but don't give a healthy OK message either.

This way it starts automatically if it can, and doesn't prevent you from running without it.

This comment has been minimized.

Copy link
@raulk

raulk Mar 19, 2020

Member

I regard them as required. We offer a metric API that wires directly onto Prometheus. Plans using that API will just end up sending stuff to /dev/null if those components are optional, and the environment doesn’t have them. Why would there be any benefit in making them optional?

This comment has been minimized.

Copy link
@nonsense

nonsense Mar 20, 2020

Member

In general you don't always care about metrics. For example in tests. Would we have to set up prometheus as part of our unit and integration tests for modules that are instrumented?

Most previous projects I've work on have always had a way to disable metrics, so that they don't incur performance cost, when you don't need them - generally this is done for tracing, but there is no reason to not do it for metrics as well.

@coryschwartz coryschwartz mentioned this pull request Mar 19, 2020
0 of 2 tasks complete
}
// Checker failed, try to fix.
err := hcp.Fixer()
if err == nil {

This comment has been minimized.

Copy link
@nonsense

nonsense Mar 20, 2020

Member

Nitpick, but probably a good idea to switch err == nil to err != nil - make it easier to read.

Copy link
Member

nonsense left a comment

Overall LGTM, but I think we should consider metrics functionality as optional, and/or have a way to disable it (for example for the purpose of tests).

Also if we start having problems with Prometheus (not that I think this will happen), I don't think this should have any effect on the actual runs of testplans - we should just be losing the measurements.

@coryschwartz coryschwartz merged commit d295689 into master Mar 23, 2020
2 checks passed
2 checks passed
Travis CI - Branch Build Passed
Details
Travis CI - Pull Request Build Passed
Details
@coryschwartz coryschwartz deleted the feat/local-exec-healthchecks branch Mar 23, 2020
@Robmat05 Robmat05 added status/done and removed status/waiting labels Mar 23, 2020
aschmahmann added a commit that referenced this pull request Mar 24, 2020
* local exec healthchecks

* swap == nil for != nil

* fix lint
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants
You can’t perform that action at this time.