x/build: alert on "bad" logs #32311

Open
bradfitz opened this issue May 29, 2019 · 3 comments

Comments

@bradfitz
Member

commented May 29, 2019

We should get alerts if we see new/many "bad" log messages from our various services.

For some definition of new, many, and bad.

Maybe bad could mean it has "error" in it. Or a dozen other phrases.
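One way to picture the "new, many, and bad" idea is the minimal Go sketch below. It is only an illustration under stated assumptions, not anything that exists in x/build: the phrase list (beyond "error"), the `Tracker` type, and the threshold are all hypothetical names and values.

```go
package main

import (
	"fmt"
	"strings"
)

// badPhrases is a hypothetical starting list; the issue itself only names "error".
var badPhrases = []string{"error", "panic", "timeout"}

// isBad reports whether a log line contains any bad phrase, ignoring case.
func isBad(line string) bool {
	lower := strings.ToLower(line)
	for _, p := range badPhrases {
		if strings.Contains(lower, p) {
			return true
		}
	}
	return false
}

// Tracker counts bad lines by exact message text.
type Tracker struct {
	seen      map[string]int
	threshold int // hypothetical cutoff for "many"
}

// NewTracker returns a Tracker that reports "many" at the given count.
func NewTracker(threshold int) *Tracker {
	return &Tracker{seen: make(map[string]int), threshold: threshold}
}

// Observe records a line and returns "new" the first time a bad message is
// seen, "many" when its count reaches the threshold, and "" otherwise.
func (t *Tracker) Observe(line string) string {
	if !isBad(line) {
		return ""
	}
	t.seen[line]++
	switch n := t.seen[line]; {
	case n == 1:
		return "new"
	case n == t.threshold:
		return "many"
	}
	return ""
}

func main() {
	t := NewTracker(3)
	lines := []string{
		"starting coordinator",
		"error creating VM",
		"error creating VM",
		"error creating VM",
	}
	for _, line := range lines {
		if reason := t.Observe(line); reason != "" {
			fmt.Printf("ALERT (%s): %s\n", reason, line)
		}
	}
}
```

Keying on the exact line is a simplification. Real log lines usually carry timestamps or IDs, so a practical version would need to normalize messages before counting them.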

(forking from https://go-review.googlesource.com/c/build/+/179419/1/cmd/coordinator/gce.go#b193 )

/cc @bcmills @dmitshur

@gopherbot gopherbot added this to the Unreleased milestone May 29, 2019
@gopherbot gopherbot added the Builders label May 29, 2019
@bcmills

Member

commented May 30, 2019

As a temporary workaround in the meantime, we could write the non-critical services (build.golang.org and dev.golang.org, not golang.org in general) in “crash-only” style, and just make sure that we'll notice if any given service is down.

@bcmills bcmills added the NeedsFix label May 30, 2019
@bradfitz

Member Author

commented May 30, 2019

For non-critical things that'll likely recover on their own, I can just add items (perhaps at WARN level where appropriate) at https://farmer.golang.org/#health. Each of those can easily be hooked up to monitoring too.

I'd prefer not to crash if a non-critical service we depend on is having temporary issues. We have a lot of them.

@dmitshur

Member

commented May 30, 2019

build.golang.org and dev.golang.org are not non-critical services. If they're down, trybots and builders don't run, gopherbot won't assign reviewers to CLs, etc. People rely on those things working, so I don't think it's a good idea to solve this issue at the cost of reducing Go contributor productivity. We should find a non-disruptive way to find "bad" entries in logs.

4 participants