Micrometer client not reconnecting to statsd after server is restarted #3563
Comments
I tried to reproduce this locally and was unable to. I started the Datadog agent on my MacBook, started an application using Micrometer's StatsD registry with the Datadog flavor, waited for some metrics to be sent, restarted the Datadog agent, waited for more metrics to be sent, and could see the metrics from after the restart in Datadog. Could you help us figure out how to reproduce the issue you're seeing?
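One way to tell a Micrometer problem apart from an agent or network problem is to send DogStatsD lines directly while restarting the agent. The sketch below is plain JDK code, not Micrometer's API. The class name, the metric name `repro.heartbeat`, the tag `source:repro` and the `localhost:8125` default are assumptions for illustration; the line format follows the DogStatsD `name:value|type|#tags` datagram shape.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical standalone sender (not Micrometer) for checking whether
// datagrams keep reaching a DogStatsD listener across an agent restart.
final class DogStatsdHeartbeat {

    // Builds a DogStatsD counter line: <name>:<value>|c[|#tag1,tag2]
    static String counterLine(String name, long value, String... tags) {
        StringBuilder sb = new StringBuilder(name).append(':').append(value).append("|c");
        if (tags.length > 0) {
            sb.append("|#").append(String.join(",", tags));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String host = args.length > 0 ? args[0] : "localhost"; // assumed agent host
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8125; // default DogStatsD port
        int seconds = args.length > 2 ? Integer.parseInt(args[2]) : 120;
        InetAddress target = InetAddress.getByName(host); // resolved once, like a long-lived client
        try (DatagramSocket socket = new DatagramSocket()) {
            for (int i = 0; i < seconds; i++) {
                byte[] payload = counterLine("repro.heartbeat", 1, "source:repro")
                        .getBytes(StandardCharsets.UTF_8);
                socket.send(new DatagramPacket(payload, payload.length, target, port));
                Thread.sleep(1000); // one datagram per second; restart the agent mid-run
            }
        }
    }
}
```

Run it, restart the agent partway through, and check whether `repro.heartbeat` keeps arriving in Datadog. If it does while the Micrometer metrics stop, the client side is the likelier suspect; if both stop, the network path between the pod and the agent is worth examining first.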
Thanks for the response, @shakuzen. For context, the Datadog agent version is 7.38.0-jmx.
We really just do a rolling restart of the agent and the issue occurs. Perhaps turning on DEBUG logging for the client will help gather more info?
If you turn on debug logging for classes under the
We ran into similar issues lately. Our Datadog agent (
@raymondchen625 thanks for the additional data point. Unfortunately, since I haven't been able to reproduce it locally or get logs from anyone that show more about what is happening, we haven't made any progress on this. If you could provide steps to reproduce this (ideally locally, or with as few other components involved as possible) or some debug logs, it may help get to the bottom of this.
I could not reproduce this locally. Unfortunately, I didn't capture IP packets before restarting the pods to see whether Micrometer was sending traffic anywhere. I did check the application logs and didn't see any errors. For custom metrics sent over UDP to port 8125, I believe there are no error logs if the agent is down. But we also had APM metrics sent over TCP to port 8126, which should print error logs if the service is down. That makes me think Micrometer might have stopped sending metrics altogether, in which case a tcpdump would probably have captured nothing anyway.
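The asymmetry described above (silence on UDP 8125, errors on TCP 8126) comes from the transports themselves. A minimal JDK-only sketch, assuming a loopback port with no listener stands in for a stopped agent: an unconnected `DatagramSocket.send` returns normally, while a TCP connect is refused with `ConnectException`. This is not Micrometer code, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical check: what does a sender observe when nothing is listening?
final class SilentUdpFailure {

    // Returns a local TCP port that has no listener right now
    // (binds port 0, then closes it; a small race is acceptable for a demo).
    static int unusedPort() throws IOException {
        try (ServerSocket s = new ServerSocket(0)) {
            return s.getLocalPort();
        }
    }

    // Sends one datagram; returns true if send() completed without throwing.
    static boolean udpSendSucceeds(int port) throws IOException {
        try (DatagramSocket socket = new DatagramSocket()) {
            byte[] payload = "probe:1|c".getBytes(StandardCharsets.UTF_8);
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getLoopbackAddress(), port));
            return true; // unconnected UDP: the sender is not told nobody received it
        }
    }

    // Attempts a TCP connection; returns true if it was actively refused.
    static boolean tcpConnectRefused(int port) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 1000);
            return false;
        } catch (ConnectException e) {
            return true; // TCP surfaces the missing listener as an exception
        } catch (IOException e) {
            return false; // e.g. a timeout, which is not a clean refusal
        }
    }

    public static void main(String[] args) throws IOException {
        int port = unusedPort();
        System.out.println("UDP send completed without error: " + udpSendSucceeds(port));
        System.out.println("TCP connect refused: " + tcpConnectRefused(port));
    }
}
```

So the absence of UDP errors in the application logs says little about whether the agent was reachable; only the TCP side would be expected to complain.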
I suspect (but could be wrong) that there will be some logs at debug level when the issue occurs, but most people don't have debug logging enabled on their production services.
We found the same issue in our staging environment. I killed one of our two pods, and now it's clear from the Datadog dashboard that only the new pod is able to send metrics. I used tcpdump to capture IP packets on both containers: both are sending TCP packets to port 8126 and UDP datagrams to port 8125.
Hi @shakuzen, I finally found a scenario that reproduces the issue.
To reproduce the issue:
I'm not sure whether this is exactly the same scenario @dunnk2022 had, but it points to an issue with the current implementation: it doesn't resolve the hostname again after
@raymondchen625 We added an initContainer to our Datadog host agent pods that deletes conntrack entries before the Datadog agent starts. This change was based on comments in containernetworking/plugins#123. We have not had a report of missing metrics sent over StatsD since.
After looking at the latest comments, can we assume that the problem is fixed?
@raymondchen625 What you've described sounds like #1252. If that is still an issue in the latest versions, please comment there so we know it still needs to be fixed. I'm not sure whether that was the original problem here. If the server was restarted with the same IP, I guess it's a different problem.
Yes, that's probably related to a series of kube-proxy issues:
IMO, this issue can be closed. The DNS change detection issue can be addressed by #1252.
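The DNS change detection mentioned here boils down to re-resolving the agent hostname periodically rather than once at startup. A minimal sketch of that idea, assuming a fixed refresh interval; `RefreshingAddress`, its `Resolver` hook and the injectable clock are hypothetical names, and this is not taken from Micrometer or #1252. Note that the JVM also caches lookups (the `networkaddress.cache.ttl` security property), so re-resolving more often than that TTL just returns the cached answer.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.time.Duration;
import java.util.function.LongSupplier;

// Hypothetical helper (not Micrometer's API): caches a resolved agent address
// and resolves the hostname again once the cached entry is older than refreshInterval.
final class RefreshingAddress {

    @FunctionalInterface
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    private final String host;
    private final int port;
    private final long refreshNanos;
    private final Resolver resolver;
    private final LongSupplier nanoClock; // System::nanoTime in production, fake in tests

    private InetSocketAddress cached;
    private long resolvedAt;

    RefreshingAddress(String host, int port, Duration refreshInterval,
                      Resolver resolver, LongSupplier nanoClock) {
        this.host = host;
        this.port = port;
        this.refreshNanos = refreshInterval.toNanos();
        this.resolver = resolver;
        this.nanoClock = nanoClock;
    }

    RefreshingAddress(String host, int port, Duration refreshInterval) {
        this(host, port, refreshInterval, InetAddress::getByName, System::nanoTime);
    }

    // Returns the address to send to, re-resolving when the cache is empty or stale.
    synchronized InetSocketAddress current() throws UnknownHostException {
        long now = nanoClock.getAsLong();
        if (cached == null || now - resolvedAt >= refreshNanos) {
            try {
                cached = new InetSocketAddress(resolver.resolve(host), port);
                resolvedAt = now;
            } catch (UnknownHostException e) {
                if (cached == null) {
                    throw e; // nothing to fall back to yet
                }
                // keep the last known address and retry on the next call
            }
        }
        return cached;
    }
}
```

A sender would call `current()` before each flush and use the result as the packet destination; a socket that was `connect()`ed to the old address would also need reconnecting when the address changes.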
Closing in favour of #1252.
I would say that my reported issue is fixed. Thanks, all!
@dunnk2022 how is it fixed? I'm asking because #1252 still seems to be open.
@marcingrzejszczak My issue was not on the Micrometer end.
Describe the bug
The Micrometer client does not reconnect to the StatsD server (Datadog agent) when the agent is restarted.
Environment
To Reproduce
How to reproduce the bug:
Restart the statsd server (datadog-agent)
Expected behavior
The Micrometer client re-establishes a connection to the StatsD server on port 8125 when the daemon is back up.
Additional context