
Idea: DNS "diagnosis" program #45934

Closed
thockin opened this issue May 17, 2017 · 13 comments
Labels
area/dns · help wanted · lifecycle/rotten · sig/network

Comments

@thockin
Member

thockin commented May 17, 2017

We still have a lot of reports of DNS issues. It would be super cool to have a diagnoser program that could run as a pod in your cluster and gather information about DNS: how many replicas are running, how many restarts they have had, and the results of a bunch of DNS lookups of various kinds (in-cluster and out-of-cluster; A, PTR, SRV), collecting the latencies and dropped requests.

Something like `kubectl apply -f http://kubernetes.io/diagnose/dns.yaml && kubectl attach -ti dns-diagnoser | tee dns.out` or similar.

I'm filing this as help-wanted; it seems like something that a newcomer could tackle to learn how to use Kubernetes and produce a valuable result!
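
A minimal sketch, in Go, of the kind of query battery the diagnoser might run; the in-cluster names and the DNS service IP below are hypothetical examples, not part of any agreed design:

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// lookup runs one query with a timeout and reports its latency --
// roughly what a diagnoser pod could record per query type.
func lookup(label string, fn func(ctx context.Context) error) {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	start := time.Now()
	err := fn(ctx)
	fmt.Printf("%-50s %-8v err=%v\n", label, time.Since(start).Round(time.Millisecond), err)
}

func main() {
	r := &net.Resolver{}
	// In-cluster A record (hypothetical service name).
	lookup("A kubernetes.default.svc.cluster.local", func(ctx context.Context) error {
		_, err := r.LookupHost(ctx, "kubernetes.default.svc.cluster.local")
		return err
	})
	// Out-of-cluster A record.
	lookup("A kubernetes.io", func(ctx context.Context) error {
		_, err := r.LookupHost(ctx, "kubernetes.io")
		return err
	})
	// SRV record (empty service/proto makes LookupSRV query the name directly).
	lookup("SRV _https._tcp.kubernetes.default.svc.cluster.local", func(ctx context.Context) error {
		_, _, err := r.LookupSRV(ctx, "", "", "_https._tcp.kubernetes.default.svc.cluster.local")
		return err
	})
	// PTR lookup of the cluster DNS service IP (hypothetical address).
	lookup("PTR 10.96.0.10", func(ctx context.Context) error {
		_, err := r.LookupAddr(ctx, "10.96.0.10")
		return err
	})
}
```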

@thockin added the area/dns, help wanted, and sig/network labels on May 17, 2017
@resouer
Contributor

resouer commented May 19, 2017

@thockin Where should the diagnose program live? The contrib repo?

@cmluciano

I think we should consider putting it in dns or creating a new repo. I find the contrib repo hard to navigate, and I believe there was a motion to split most components out into separate repos.

@thockin
Member Author

thockin commented May 24, 2017 via email

@fgimenez
Contributor

I'd like to work on this; where should I begin? I've been reading through https://github.com/kubernetes/dns. AIUI the Dockerfile of dns-diagnoser should live there and the build process should be changed to build it; is that OK? Also, about the initial suggestion, what should dns.yaml define?

Thanks!
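
No spec existed at this point; purely as a hypothetical illustration, dns.yaml might define little more than a one-shot pod running the diagnoser image, so that the `kubectl apply && kubectl attach` flow from the original proposal works:

```yaml
# Hypothetical manifest -- the image name and pod spec are illustrative,
# not an agreed design.
apiVersion: v1
kind: Pod
metadata:
  name: dns-diagnoser
spec:
  restartPolicy: Never        # run once, report, exit
  containers:
  - name: dns-diagnoser
    image: gcr.io/google-containers/dns-diagnoser:latest  # hypothetical image
    stdin: true               # so `kubectl attach -ti` can stream output
    tty: true
```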

@thockin
Member Author

thockin commented May 28, 2017 via email

@fgimenez
Contributor

@thockin great thx, on it

@someword

Based on my personal experience with cluster DNS issues, they are mostly short-lived, in the 1-10 minute range, and stop as abruptly as they started with nobody fixing anything. Would it be too heavyweight to have the proposed diagnostic tool running continuously to catch intermittent issues? In my situation, by the time I run the diagnostic tool the mysterious problem may have subsided and I've missed the event. An alternative to running the diagnostic tool continuously would be a document describing what data users should be gathering all the time to support post-incident review: things like UDP packet loss, conntrack drops, dnsmasq/kube-dns metrics, CPU/memory consumption of the kube-dns pod, etc. (see the sketch below for one example).
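
As one concrete example of that "gather all the time" data, a sketch in Go that dumps the kernel's UDP counters (InErrors, RcvbufErrors, and friends) from /proc/net/snmp. This is Linux-only and the field layout is kernel-dependent, so treat it as illustrative:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// Print the kernel's UDP counters from /proc/net/snmp. The file holds
// two "Udp:" lines: the first with column names, the second with values.
func main() {
	f, err := os.Open("/proc/net/snmp")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	var header []string
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 || fields[0] != "Udp:" {
			continue
		}
		if header == nil {
			header = fields[1:] // first Udp: line: column names
			continue
		}
		for i, v := range fields[1:] { // second Udp: line: values
			fmt.Printf("%-15s %s\n", header[i], v)
		}
	}
}
```

Scraped periodically (and paired with conntrack statistics and kube-dns metrics), counters like these would give a post-incident baseline even when no diagnoser pod was running at the time.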

@bowei
Member

bowei commented Jun 1, 2017

@someword Some of this could be integrated into the node problem detector? (https://github.com/kubernetes/node-problem-detector)

@thockin
Member Author

thockin commented Jun 2, 2017 via email

@fgimenez
Contributor

fgimenez commented Jun 7, 2017

@thockin Cool, thx a lot. I can add that description to an initial spec proposal for the process the diagnose tool should perform, so we can discuss the implementation further, WDYT? Also, in terms of the checks themselves (not managing the results), do you think DNSPerf could be useful?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with a /lifecycle frozen comment.

If this issue is safe to close now, please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Dec 26, 2017
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now, please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jan 25, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/close
