in swarm, sometimes DNS resolve to container id and not name #34882
I am also seeing this issue and was about to raise a defect before I found this one. I would like to add my voice to this issue since it creates a problem with one of the few mechanisms that are available within the service containers that allow them to introspect their environment (other than, of course, crawling out to the docker API). Consider the following. I need to monitor my services and provide some kind of reasonable and consistent statistics on usage and behavior on a per instance basis. As such I need to contact each individual service instance (task). I can identify the IP addresses of the tasks for a service using the "tasks." DNS:
So I can see the IPs of the two containers within the lab1_das service. But what are their instance numbers? I was hoping to retrieve them using reverse DNS, but I hit the same problem described above.
In about 9 out of 10 queries I get the PTR record that includes the task number (lab1_das.1.ftnllsan0awehevz3hltdcilb.lab1), which is useful for me. The environment is 17.06.2-ce running on CentOS 7. Details:
|
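The lookup flow described in the comment above (forward lookup of the "tasks." name, then a reverse lookup per IP) could be sketched roughly as follows. This is only an illustration: the function names `task_slot_from_ptr` and `slots_for_service` are hypothetical, and the parsing assumes PTR names shaped like the one quoted above (service, slot, task ID, network).

```python
import socket
from typing import Dict, Optional


def task_slot_from_ptr(ptr_name: str) -> Optional[int]:
    """Return the task slot encoded in a PTR name, or None if it is absent.

    Assumes names shaped like 'lab1_das.1.ftnllsan0awehevz3hltdcilb.lab1'.
    A container-ID answer such as 'e0ccacea03de.test_default.' yields None.
    """
    labels = ptr_name.rstrip(".").split(".")
    if len(labels) >= 3 and labels[1].isdecimal():
        return int(labels[1])
    return None


def slots_for_service(service: str) -> Dict[str, Optional[int]]:
    """Map each task IP of `service` to its slot (None when the PTR is unusable).

    Only meaningful when run inside a container attached to the service's
    overlay network, where the embedded DNS server answers 'tasks.<service>'.
    """
    _, _, addresses = socket.gethostbyname_ex(f"tasks.{service}")
    result: Dict[str, Optional[int]] = {}
    for ip in addresses:
        try:
            ptr_name, _, _ = socket.gethostbyaddr(ip)
        except OSError:
            # No PTR record (or resolver error): treat the slot as unknown.
            ptr_name = ""
        result[ip] = task_slot_from_ptr(ptr_name)
    return result
```

With the bug described in this issue, some entries returned by `slots_for_service` would come back as `None`, because the reverse lookup answered with the container ID instead of the task name.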
We are seeing the same behavior: at seemingly random times, reverse DNS resolution in Swarm returns the container ID and not the name. We track network traffic and use reverse DNS to avoid monitoring some specific overloaded nodes. When the name changes, the tracking system fails because of the overload.
|
I got the same issue on Docker for Mac 18.02.0-ce-rc1. |
I can reliably reproduce this as well, and it causes HBase to fail. Please let me know what additional information / logs would help. |
I can reproduce this too on version 18.05.0-ce, build f150324. Is there a fix or a workaround for this? |
FYI, I did some investigation into this one and found out that:
a) About 100-150 out of 1000 tests fail. b) I can see from the daemon debug log that DNS records are first added for the container name and then for its ID:
c) If I disable the code lines at https://github.com/docker/libnetwork/blob/master/service_common.go#L65-L68, the problem disappears. So to fix this, we need to figure out why and where the container ID is added as an alias, and/or make sure that it is not used as a PTR record. EDIT: Created a PR that disables adding PTR records for aliases: moby/libnetwork#2299 |
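To make the finding above easier to follow, here is a toy model of the effect. It is not libnetwork's code and does not mirror its data structures; `ToyNameTable` and `register_task` are made-up names, and the example IP and names are borrowed from the dig output elsewhere in this thread. It only shows why registering a PTR record for an alias (the container ID) leaves more than one candidate answer for a reverse lookup, and why skipping PTR records for aliases leaves exactly one.

```python
from collections import defaultdict
from typing import Dict, List


class ToyNameTable:
    """Toy forward/reverse name table (illustrative only, not libnetwork)."""

    def __init__(self, ptr_for_aliases: bool) -> None:
        self.ptr_for_aliases = ptr_for_aliases
        self.forward: Dict[str, str] = {}
        self.reverse: Dict[str, List[str]] = defaultdict(list)

    def add(self, name: str, ip: str, is_alias: bool = False) -> None:
        # Forward (name -> IP) records are always added.
        self.forward[name] = ip
        # Reverse (IP -> name) records for aliases depend on the switch.
        if not is_alias or self.ptr_for_aliases:
            self.reverse[ip].append(name)

    def ptr_candidates(self, ip: str) -> List[str]:
        """All names a reverse lookup for `ip` could return."""
        return list(self.reverse.get(ip, []))


def register_task(table: ToyNameTable) -> None:
    """Register the task name first, then the container ID as an alias."""
    ip = "10.0.11.199"
    table.add("test_test.9.95ubli8xvaxijix5s8m0qjfem", ip)
    table.add("e0ccacea03de", ip, is_alias=True)


before = ToyNameTable(ptr_for_aliases=True)
register_task(before)

after = ToyNameTable(ptr_for_aliases=False)
register_task(after)
```

In this model `before.ptr_candidates("10.0.11.199")` holds two names, so a resolver may answer with either, while `after` holds only the task name. Forward resolution of the container ID still works in both cases.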
@cjbearman |
@fcrisciani Certainly using the .Service.ID template within the target container allows the container itself to know its instance number and enables it to report that in a communications stream, for purposes such as monitoring / logging per my original comment. Specifically, my monitoring code will now resolve the hosts using forward DNS resolution as described in my comment, but forgo the reverse resolution and allow the target container instance to report its instance number as part of the data stream I receive from it. |
As mentioned in the linked issue, I'm capturing network communication between the containers in the Swarm to analyse and troubleshoot system behavior.
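As a rough sketch of that workaround: the service definition could pass a template value into the container as an environment variable, and the container could attach it to whatever it reports. Everything named here is an assumption for illustration: the variable `TASK_SLOT` (imagined as set to `{{.Task.Slot}}` in the service's environment), `SERVICE_NAME`, and the function `build_report` are not from this thread's original comments.

```python
import os
from typing import Any, Dict, Mapping


def build_report(metrics: Dict[str, Any],
                 environ: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Attach the task's own identity to a metrics payload.

    Assumes the service was created with environment entries such as
    TASK_SLOT={{.Task.Slot}} and SERVICE_NAME=<name> (hypothetical names).
    """
    slot = environ.get("TASK_SLOT")
    return {
        "service": environ.get("SERVICE_NAME", "unknown"),
        # Slot stays None if the variable is missing or not a number.
        "slot": int(slot) if slot and slot.isdecimal() else None,
        "metrics": metrics,
    }
```

The monitoring side then only needs forward resolution to find the tasks; the instance number arrives inside the payload, so the unreliable PTR answer is never consulted.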
I would dearly appreciate resolution of this issue :) |
Also for identifying the tasks on the same node. A fix would be much appreciated. |
@TincaTibo just to clarify: moby/libnetwork#2299 is the fix for this issue, not for the linked issue ;) But now the Docker maintainers need to make sure that it does not create other issues. |
@olljanat Thanks, got it! |
@fcrisciani / @thaJeztah FYI this can be closed as moby/libnetwork#2299 is merged. |
Is this change supposed to be in Docker 18.09.1-ce? I'm still getting multiple PTR records. Docker logs around container creation:
Dockerd logs for resolution:
docker info:
Minimal reproduction on a swarm-enabled Docker instance:
docker-compose.yaml:
version: "3.5"
services:
  test:
    image: test
    hostname: "{{.Task.Name}}.test_default"
    environment:
      SERVICE_NAME: test
    deploy:
      replicas: 10
Dockerfile for 'test' image:
FROM alpine:3.9
ADD log.sh /
CMD ["/bin/sh", "/log.sh"]
log.sh:
#!/bin/sh
while true; do
  echo "Resolving..."
  nslookup tasks.$SERVICE_NAME
  sleep 1
done
Dig also agrees with nslookup:
/ # dig -x 10.0.11.199
; <<>> DiG 9.12.3-P4 <<>> -x 10.0.11.199
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34865
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;199.11.0.10.in-addr.arpa. IN PTR
;; ANSWER SECTION:
199.11.0.10.in-addr.arpa. 600 IN PTR test_test.9.95ubli8xvaxijix5s8m0qjfem.test_default.
;; Query time: 0 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Fri Apr 26 14:58:19 UTC 2019
;; MSG SIZE rcvd: 130
/ # dig -x 10.0.11.199
; <<>> DiG 9.12.3-P4 <<>> -x 10.0.11.199
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 148
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;199.11.0.10.in-addr.arpa. IN PTR
;; ANSWER SECTION:
199.11.0.10.in-addr.arpa. 600 IN PTR e0ccacea03de.test_default.
;; Query time: 0 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Fri Apr 26 14:58:20 UTC 2019
;; MSG SIZE rcvd: 105 |
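For anyone who wants to quantify how often the two kinds of answer appear, a small tally along these lines might help. It is a sketch under one stated assumption: a PTR answer whose first label is exactly 12 lowercase hex characters is treated as a short container ID, as in the second dig answer above. A service whose name happens to look like that would be misclassified. The names `ptr_kind` and `tally` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: short container IDs are 12 lowercase hex characters.
_CONTAINER_ID_LABEL = re.compile(r"^[0-9a-f]{12}$")


def ptr_kind(ptr_name: str) -> str:
    """Classify a PTR answer as 'container_id' or 'name' by its first label."""
    first_label = ptr_name.rstrip(".").split(".")[0]
    return "container_id" if _CONTAINER_ID_LABEL.match(first_label) else "name"


def tally(ptr_names: Iterable[str]) -> Counter:
    """Count how many answers were container IDs versus names."""
    return Counter(ptr_kind(name) for name in ptr_names)


# The two answers from the dig runs above.
answers = [
    "test_test.9.95ubli8xvaxijix5s8m0qjfem.test_default.",
    "e0ccacea03de.test_default.",
]
counts = tally(answers)
```

Feeding it the answers from repeated reverse lookups of the same IP gives a failure ratio that can be compared before and after upgrading.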
@AHelper 18.09 means the code freeze was in September 2018. moby/libnetwork#2299 was merged in November, so it will be in the next version, 19.03. You can see its target dates at https://github.com/docker/docker-ce/milestone/32 |
Description
Reverse IP lookup randomly returns the container ID and not the name.
Steps to reproduce the issue:
From inside a container, run dig against the IP a few times.
Describe the results you received:
Describe the results you expected:
I expect it to always resolve to the container name.
Output of docker version: