Idea: DNS "diagnosis" program #45934
Comments
@thockin Where should the diagnosis program live? The contrib repo? |
I think we should consider putting it in dns or creating a new repo. I find the contrib repo hard to navigate, and I believe there was a motion to split out most components into separate repos. |
+1 DNS
contrib is dead.
|
I'd like to work on this, where should I begin? I've been reading through https://github.com/kubernetes/dns, AIUI the Dockerfile of dns-diagnoser should live there and the build process should be changed to build it, is that ok? Also, about the initial suggestion, what should dns.yaml define? Thanks! |
Seems appropriate as a home. I would start with a rough spec of the tests
you want to run. Report number of DNS endpoints. Report number of
restarts. Measure latency and dropped requests to each endpoint in the
service. Report cache ratios for each replica. Etc.
|
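The per-endpoint report suggested above (latency and dropped requests for each DNS replica) could be summarized roughly as follows. This is only a sketch: `ProbeResult`, `summarize`, and the field names are hypothetical, and it assumes a dropped or timed-out request is recorded with a latency of `None`. Collecting the samples is out of scope here.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ProbeResult:
    """One DNS query sent to one replica (hypothetical record shape)."""
    endpoint: str                 # IP of the DNS endpoint that was queried
    latency_ms: Optional[float]   # None means the request was dropped or timed out


def summarize(results: List[ProbeResult]) -> Dict[str, dict]:
    """Group samples by endpoint and report counts plus basic latency stats."""
    by_endpoint: Dict[str, List[Optional[float]]] = {}
    for r in results:
        by_endpoint.setdefault(r.endpoint, []).append(r.latency_ms)

    report = {}
    for endpoint, samples in by_endpoint.items():
        answered = [s for s in samples if s is not None]
        report[endpoint] = {
            "sent": len(samples),
            "dropped": len(samples) - len(answered),
            # Latency stats only make sense if at least one query was answered.
            "median_ms": statistics.median(answered) if answered else None,
            "max_ms": max(answered) if answered else None,
        }
    return report
```

Keeping the summary separate from the probing code means the same report can be produced from a one-shot run or from a long-running collector.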
@thockin great thx, on it |
Based on my personal experience with cluster DNS issues, I find them to be mostly short-lived, in the 1-10 minute range, and they stop as abruptly as they started with nobody fixing anything. Would it be too heavyweight to have the proposed diagnostic tool running continuously to catch intermittent issues? In my situation, by the time I run the diagnostic tool the mysterious problem may have subsided and I've missed the event. An alternative to running the diagnostic tool continuously would be a document describing what data users should gather all the time to support post-incident review: things like UDP packet loss, conntrack drops, dnsmasq/kube-dns metrics, CPU/memory consumption of the kube-dns pod, etc. |
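As one illustration of the "gather data all the time" idea, UDP error counters on a Linux node can be read from `/proc/net/snmp`, which lists a header line and a value line for each protocol. The sketch below only parses that text; the function name `parse_udp_counters` is hypothetical, and how often to sample it or where to ship the numbers is left open.

```python
def parse_udp_counters(snmp_text: str) -> dict:
    """Parse the 'Udp:' header/value line pair from /proc/net/snmp content."""
    header = None
    values = None
    for line in snmp_text.splitlines():
        # "Udp:" does not match "UdpLite:", so only the plain UDP rows are read.
        if line.startswith("Udp:"):
            fields = line.split()[1:]
            if header is None:
                header = fields                      # first row: counter names
            else:
                values = [int(v) for v in fields]    # second row: counter values
                break
    if header is None or values is None:
        raise ValueError("no Udp section found")
    return dict(zip(header, values))
```

On a node this could be called as `parse_udp_counters(open("/proc/net/snmp").read())`; growth in counters such as `InErrors` or `RcvbufErrors` between two samples is one sign of UDP packets being lost locally.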
@someword Some of this could be integrated into the node problem detector? (https://github.com/kubernetes/node-problem-detector) |
I think it would be interesting to have a long running DNS prober that did
lookups every couple seconds. It could be a Job that runs for 3 hours and
then exits with failure, so the Job controller restarts it possibly
elsewhere. Need to collect useful results, rotate them, serve them, and
defeat the DNS caches.
|
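A minimal sketch of the long-running prober described above, under several assumptions: all function names are hypothetical, caches are defeated by querying a fresh random name on every iteration (so a failed lookup such as NXDOMAIN is the expected outcome and only the timing is of interest), and "rotation" is approximated by a bounded in-memory buffer. Serving the results is not covered. Note that `socket.gethostbyname` offers no per-call timeout, so a real tool would likely need a proper DNS client library.

```python
import collections
import socket
import time
import uuid


def unique_name(zone: str) -> str:
    """Return a fresh random name under zone so that no cache can answer it."""
    return f"probe-{uuid.uuid4().hex}.{zone}"


def probe_once(name, resolve=socket.gethostbyname, clock=time.monotonic):
    """Resolve one name and record the outcome and elapsed time."""
    start = clock()
    try:
        resolve(name)
        outcome = "ok"
    except OSError:
        # socket.gaierror is a subclass of OSError; random names usually end up here.
        outcome = "error"
    return {"name": name, "outcome": outcome,
            "latency_ms": (clock() - start) * 1000.0}


def run_prober(zone, interval_s=2.0, duration_s=3 * 60 * 60, keep=10000,
               resolve=socket.gethostbyname, clock=time.monotonic,
               sleep=time.sleep):
    """Probe every interval_s seconds for duration_s, keeping the last `keep` results."""
    results = collections.deque(maxlen=keep)  # oldest entries are discarded first
    deadline = clock() + duration_s
    while clock() < deadline:
        results.append(probe_once(unique_name(zone), resolve, clock))
        sleep(interval_s)
    return results
```

To get the restart behavior suggested above, the container entry point could call `run_prober(...)` and then exit with a non-zero status, so that the Job controller starts a new pod, possibly on another node.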
@thockin cool thx a lot, I can add that description to an initial spec proposal for the process to be performed by the diagnose tool to discuss the implementation further, WDYT? Also, in terms of the checks themselves (not for managing the results), do you think DNSPerf could be useful? |
We still have a lot of reports of DNS issues. It would be super cool to have a diagnoser program that could run as a pod in your cluster, and would gather information about DNS - how many replicas are running, how many restarts they have, do a bunch of DNS lookups of various kinds - in-cluster, out of cluster, A, PTR, SRV - and collect the latencies and dropped requests.
Something like
kubectl apply -f http://kubernetes.io/diagnose/dns.yaml && kubectl attach -ti dns-diagnoser | tee dns.out
or similar.
I'm filing this as help-wanted - it seems like something that a newcomer could tackle to learn how to use kubernetes and produce a valuable result!
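The "DNS lookups of various kinds" could start from a small query plan like the one sketched below. Everything here is an assumption for illustration: the function name `build_query_plan`, the cluster domain `cluster.local`, the service IP `10.96.0.1`, and the `https` port name used for the SRV record would all need to be discovered from the actual cluster. The plan only lists what to ask; sending A, SRV, and PTR queries would need a DNS client library.

```python
import ipaddress


def build_query_plan(cluster_domain="cluster.local",
                     service_ip="10.96.0.1",
                     external_name="kubernetes.io"):
    """Return the list of lookups to perform, tagged by scope and record type."""
    svc = f"kubernetes.default.svc.{cluster_domain}"
    return [
        {"scope": "in-cluster", "type": "A", "name": svc},
        # SRV names follow _port-name._protocol.service.namespace.svc.<domain>.
        {"scope": "in-cluster", "type": "SRV", "name": f"_https._tcp.{svc}"},
        # reverse_pointer builds the in-addr.arpa name used for PTR lookups.
        {"scope": "in-cluster", "type": "PTR",
         "name": ipaddress.ip_address(service_ip).reverse_pointer},
        {"scope": "out-of-cluster", "type": "A", "name": external_name},
    ]
```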