[Bug] Cloudprober stops working #144
@Daxten Thanks for the report. Can you tell me a little bit more about your setup?
Wow, thanks for coming back to me so fast!
I think I'll wait for the config before commenting further. It does sound like a bug in cloudprober that is being surfaced by something in your environment. Also, which version are you running: the latest cloudprober image (built from source) or the last release (0.9.3)?
Hi, I sent you the config (via email). With 0.9.3, the problem persisted until I restarted. I think I switched to latest about a week ago, and with this version it seems to recover on its own after a few hours.
@Daxten I got the config, thanks! I'd certainly recommend using the latest cloudprober image instead of 0.9.3; there have been some bug fixes since that version. I'm not sure why cloudprober would stop working. Thinking aloud about a few possibilities:
(Also, you should be able to exclude this using logs.)
Also, you said you didn't see anything in the logs. Can you try mapping /tmp as a volume ("-v /tmp:/tmp") and see whether it generates any logs? I think cloudprober will try to log to /tmp if it's not running on GCE (on GCE, logs go to Stackdriver Logging).
Regarding my last comment about logging, I verified that our Docker image's default command line is set to log to stderr (line 24 at commit a397582).
So unless you're overriding the Docker image's entrypoint, cloudprober should be logging to stderr rather than to a file under /tmp.
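If you do map /tmp as a volume, a quick way to check whether any log files actually landed there is a small helper like the Python sketch below. It is only an illustration: `find_log_files` is a hypothetical helper (not part of cloudprober), and it assumes glog-style log file names that begin with the program name (`cloudprober`). With the default stderr logging described above, it should find nothing, and `docker logs <container>` is the place to look instead.

```python
import os


def find_log_files(log_dir: str, prefix: str = "cloudprober") -> list[str]:
    """Return the sorted names of regular files in log_dir that start with prefix.

    Assumption: log files are named after the program (glog convention),
    e.g. "cloudprober.INFO". Symlinks to files count, since os.path.isfile
    follows them. A missing directory yields an empty list.
    """
    try:
        entries = os.listdir(log_dir)
    except FileNotFoundError:
        return []
    return sorted(
        name
        for name in entries
        if name.startswith(prefix) and os.path.isfile(os.path.join(log_dir, name))
    )


if __name__ == "__main__":
    files = find_log_files("/tmp")
    if files:
        print("\n".join(files))
    else:
        print("no cloudprober log files in /tmp (probably logging to stderr)")
```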
Hey,
Hi @Daxten, I got the logs. I also responded over email, but to close the loop here: just to collect some more info:
I improved the logging in the last couple of changes. Can you retry with the "latest" container?
@Daxten, I was wondering whether you're still experiencing this issue. If not, can we close it? Thanks,
Closing this due to inactivity. Please feel free to reopen it if it's still a problem; I'll be more than happy to debug this with you. Cheers.
We are currently using Cloudprober to ping ~20 hosts. From time to time it stops working without crashing the container: the HTTP endpoint still responds, but no new results are generated.
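Because the container stays up and the HTTP endpoint keeps answering, a liveness check alone won't catch this state. One way to detect it is to scrape the metrics endpoint twice and check whether the probe counters advanced. The Python sketch below assumes the Prometheus surfacer is enabled at `http://localhost:9313/metrics` and that probes export a cumulative counter named `total`; both are assumptions to adjust to your config, and `sum_metric`/`is_stalled` are hypothetical helpers. The parser is deliberately minimal (it handles ordinary `name{labels} value [timestamp]` sample lines, not every corner of the exposition format).

```python
import re
import time
import urllib.request

# A sample line: metric name, optional {labels}, whitespace, value (timestamp ignored).
_SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)")


def sum_metric(exposition: str, name: str) -> float:
    """Sum every sample of metric `name` in Prometheus text exposition output."""
    total = 0.0
    for line in exposition.splitlines():
        m = _SAMPLE.match(line.strip())  # '#' comment lines never match
        if m and m.group(1) == name:
            total += float(m.group(3))
    return total


def is_stalled(before: str, after: str, name: str = "total") -> bool:
    """True if the summed counter did not increase between two scrapes.

    Note: a process restart resets counters, which also reads as "stalled".
    """
    return sum_metric(after, name) <= sum_metric(before, name)


def scrape(url: str) -> str:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    url = "http://localhost:9313/metrics"  # assumed address; adjust as needed
    first = scrape(url)
    time.sleep(60)  # should be longer than the probe interval
    second = scrape(url)
    print("stalled" if is_stalled(first, second) else "probes are producing new results")
```

Running this periodically (for example, from a cron job or a sidecar) would at least record when the stall starts, which can then be lined up against the logs.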