-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prometheus monitoring, forever loop + sleep and some refactoring #14
Conversation
This is in preparation to adding Prometheus instrumentation. Make sure that when Rebot is terminated by SIGINT or SIGTERM the history file is written to disk.
TODO: make sure rebootMany() gets the site label.
…d tests. With this change it's possible to expose the "site" field in Prometheus metrics as well.
TODO: make it testable again.
Quick feedback: please add a .travis.yml file to test the code in Travis-CI. You can copy from github.com/m-lab/pusher if your tests are complicated and need authentication or you can copy from github.com/m-lab/index2ip if your tests are simple. I have added this repo to Travis and Coveralls.io, so once you add a .travis.yml file, everything should "just work". |
Please put prometheus.go, history.go, and reboot.go (and their tests) each in their own subpackages. I suggest naming the packages healthcheck, history, and reboot. The moving of that code will cause build breakages, but those breakages will help expose the parts of the code where the separation of concerns has broken down a little, and suggest further cleanups. Once the movement and subsequent fixes are done, please let me know and I will start my next round of review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See previous comments.
.travis.yml
Outdated
- go get github.com/mattn/goveralls | ||
|
||
script: | ||
- $GOPATH/bin/goveralls -service=travis-ci |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
goveralls is only reporting the coverage for the main
package. Try $GOPATH/bin/goveralls -service=travis-ci -package=./...
instead.
and if that doesn't work, try the script pusher uses: https://github.com/m-lab/pusher/blob/master/.travis.yml#L19
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the go team fixed things!
Now you should make your script be:
- go test -covermode=count -coverprofile=profile.cov ./...
- goveralls -coverprofile=profile.cov -service=travis-ci
This works for Go 1.10 and later!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Verified to work in m-lab/pusher#36
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That worked. Thanks!
Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small changes suggested, but this is definitely the last round of review - things look good!
Please get coverage up to 100% in each subpackage - it looks like there are just a few easy error cases and then deleting some code which became unused after refactoring.
Please move some functionality out of cheackAndReboot
and into the healthcheck
module - doing so will allow the healthcheck
module to export only one function but still have all the functionality it needs.
Please normalize your flag usage - no need to start variable names with f
, no periods in flag names, and please consider using flagx.ArgsFromEnv
, as it may simplify your life.
This looks good!
Reviewed 1 of 10 files at r3.
Reviewable status: 1 of 12 files reviewed, 11 unresolved discussions (waiting on @pboothe, @evfirerob, and @stephen-soltesz)
main.go, line 49 at r3 (raw file):
oneshot bool fDryRun bool
no need to name variables manipulated by flags with a preceding f
. You can if you want, but it's definitely optional and in some cases can impair readability.
main.go, line 119 at r3 (raw file):
} if fOneshot {
fOneshot
and oneshot
should be the same variable. Don't use one for the flag and the other for the action.
main.go, line 124 at r3 (raw file):
} func readEnv() {
Unnecessary if you use flagx.ArgsFromEnv
main.go, line 207 at r3 (raw file):
flag.BoolVar(&fOneshot, "oneshot", false, "Execute just once, do not loop.") flag.StringVar(&fListenAddr, "web.listen-address", ":9999",
please don't put .
in flag names.
main.go, line 214 at r3 (raw file):
func main() { readEnv() parseFlags()
rather than parseFlags
, maybe use flagx.ArgsFromEnv()
- https://github.com/m-lab/go/blob/master/flagx/argsfromenv.go
main.go, line 222 at r3 (raw file):
// and make sure we always write it back on exit. candidateHistory := history.Read(historyPath)
This logic is pretty complicated, and it seems like what you really want is just defer cleanup(candidateHistory)
main_test.go, line 185 at r3 (raw file):
func Test_main(t *testing.T) { restore := osx.MustSetenv("REBOT_ONESHOT", "1")
recommend defer restore()
healthcheck/prometheus.go, line 51 at r3 (raw file):
// GetOfflineNodes checks for offline nodes in the last N minutes. // It returns a Vector of samples. func GetOfflineNodes(prom promtest.PromClient, minutes int) ([]node.Node, error) {
Please make this also perform the filtering of offline sites. Then, only export GetOfflineNodes
and make all other functions internal to the healthcheck
package.
node/node.go, line 19 at r3 (raw file):
} func NewNode(name string, site string) Node {
just New
for this one. When people use it, they will type node.New
and that is pretty self-documenting and avoids stutter, as requested in https://blog.golang.org/package-names
history/history.go, line 49 at r3 (raw file):
// the nodes slice. If a candidate did not previously exist, it creates a // new one. func Upsert(candidates []node.Node, history map[string]node.History) {
Update
is a perfectly fine name for this - upsert
is a term usually restricted to more-obscure database ops.
Also, in the spirit of nitpicking, |
- Do not intercept SIGINT/SIGKILL anymore - Write history file on each run
Random:
I really like this suggestion :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 13 files reviewed, 1 unresolved discussion (waiting on @evfirerob and @stephen-soltesz)
Woohoo! |
👍 !!! |
Summary of changes:
rebot_last_rebooted
metric, exposing machine+site that were rebooted on the last run (WIP)This change is