Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various improvements following real-world usage. #4

Merged
merged 5 commits into from
Feb 4, 2018
Merged

Conversation

josh-padnick
Copy link
Contributor

  • Health-checker now outputs to stdout, not stderr.
  • Health checks are now handled in parallel, not serial.
  • Additional errors that should have been caught are now explicitly logged where applicable.
  • Introduced an env var, HEALTH_CHECKER_DEBUG. When set to true, health-checker will show the full stack trace on most error messages.

@josh-padnick
Copy link
Contributor Author

Merging now so I can use in package-kafka. Feedback welcome as always.

@josh-padnick josh-padnick merged commit f909446 into master Feb 4, 2018
@@ -48,7 +48,8 @@ func runHealthChecker(cliContext *cli.Context) error {
}

opts, err := parseOptions(cliContext)
if isSimpleError(err) {
if isDebugMode() {
opts.Logger.Infof("Note: To enable debug mode, set %s to \"true\"", ENV_VAR_NAME_DEBUG_MODE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by this. Why do you return err right away in debug mode? And tell the user to enable debug mode?

err := writeHttpResponse(w, resp)
if err != nil {
opts.Logger.Error("Failed to send HTTP response. Exiting.")
panic(err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why exit the health check entirely because of this?

err := attemptTcpConnection(port, opts)
if err != nil {
logger.Warnf("TCP connection to port %d FAILED: %s", port, err)
allPortsValid = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not safe to do in a concurrent environment! You're modifying shared state from multiple different goroutines at the same time.

Use an integer (initialized to 0) instead with an atomic increment operation on errors. If it ends up greater than 0, at least one health check failed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, there should be tests for all of this.

@@ -45,21 +72,23 @@ func attemptTcpConnection(port int, opts *options.Options) error {
logger := opts.Logger
logger.Infof("Attempting to connect to port %d via TCP...", port)

_, err := net.Dial("tcp", fmt.Sprintf("0.0.0.0:%d", port))
defaultTimeout := time.Second * 5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be less than the total configured timeout for all the health checks?

w.Write([]byte("OK"))
}
func writeHttpResponse(w http.ResponseWriter, resp *httpResponse) error {
w.WriteHeader(resp.StatusCode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't this return an error?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants