Querying DNS SRV record returns wrong host #1125

Closed
seal-ss opened this issue Jul 21, 2015 · 9 comments

seal-ss commented Jul 21, 2015

We are trying to register a service on a remote Consul node, so the service itself and the Consul process run on different hosts (Docker containers).

  • The service is running on host f3632a6214bc with IP 172.17.0.96.
  • Consul is running on host bda62c8a3961 (with alias consul) with IP 172.17.0.95.

We provide the address of the service in the ServiceAddress field.

Here you can see the data stored for the service:

$ curl http://consul:8500/v1/catalog/service/checkin | jq .
[
  {
    "Node": "bda62c8a3961",
    "Address": "172.17.0.95",
    "ServiceID": "checkin@f3632a6214bc:515",
    "ServiceName": "checkin",
    "ServiceTags": [
      "lpr"
    ],
    "ServiceAddress": "172.17.0.96",
    "ServicePort": 515
  }
]

Querying the A record for the service returns the correct IP address:

$ host checkin.service.consul consul                
Using domain server:
Name: consul
Address: 172.17.0.95#53
Aliases: 

checkin.service.consul has address 172.17.0.96

But querying the SRV record returns the wrong host name bda62c8a3961.

$ host -t srv checkin.service.consul consul         
Using domain server:
Name: consul
Address: 172.17.0.95#53
Aliases: 

checkin.service.consul has SRV record 1 1 515 bda62c8a3961.node.dc1.consul.

Is there any other way to get the correct host and port of the service by using DNS?

@ryanuber
Member

This looks like a subtle bug in the DNS system. Since we don't have any node for that service address, we can't formulate a proper SRV record response (from what I understand, the target must be a name, and not an address directly). There currently is no way around this while using the DNS interface, so unfortunately for now you would need to either use well-known port numbers or the HTTP API to grab the correct info.

I'll mark this as a thinking ticket, since I'm not sure off hand what the best solution to this would be.
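
As a rough illustration of the HTTP API workaround mentioned above, here is a minimal Python sketch that reads the catalog response shown earlier in this thread and returns address/port pairs. The function names `service_endpoints` and `lookup` are made up for this example, and the base URL simply reuses the `consul:8500` host from the original report. The fallback to the node `Address` when `ServiceAddress` is empty is an assumption about how a client would want to behave, not something stated in this thread.

```python
import json
from urllib.request import urlopen


def service_endpoints(entries):
    """Return (address, port) pairs from /v1/catalog/service/<name> entries.

    Prefers ServiceAddress; falls back to the node Address when the
    service did not register an address of its own (assumed behavior).
    """
    endpoints = []
    for entry in entries:
        address = entry.get("ServiceAddress") or entry["Address"]
        endpoints.append((address, entry["ServicePort"]))
    return endpoints


def lookup(service, base_url="http://consul:8500"):
    """Fetch the catalog entries for `service` and extract its endpoints."""
    with urlopen(f"{base_url}/v1/catalog/service/{service}") as response:
        return service_endpoints(json.load(response))


if __name__ == "__main__":
    # Requires a reachable Consul agent at base_url.
    print(lookup("checkin"))
```

With the catalog data from the report, `service_endpoints` would yield the service address 172.17.0.96 and port 515 rather than the node address.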

@ryanuber ryanuber added the thinking More time is needed to research by the Consul Contributors label Jul 21, 2015
@gdiazlo

gdiazlo commented Sep 22, 2015

we need this :)
tl;dr
In our world of "everything is a container", there are cases in which the use of the consul REST API is not an option, for example with third party binaries.

Also, in this world of ephemeral containers, making a container a Consul node makes the Serf network suffer and complicates the work of the consistency process. We've already seen false positives and negatives for services running this way. Given that automatic deregistration of a service takes 72 hours, and that our typical dev environment sees dozens of redeployments per day, we end up with a Consul full of ghost nodes.

We believe that using Consul in this scenario requires a service abstraction to represent those services in containers, with their IPs, etc., and using the container's host as the Consul node that participates in the Serf network.

We think Consul development is going this way, which is why we chose it instead of implementing our own tool. That said, we'd love to see the DNS API support having services as nodes. One option could be to treat services as nodes, separating Serf nodes from service nodes in the Consul internals.

We don't have any PR yet, but we would like to hear your opinion about it.

thanks

@slackpad slackpad self-assigned this Sep 23, 2015
@slackpad
Contributor

I'll take a look into this post HashiConf.

@gdiazlo

gdiazlo commented Sep 29, 2015

fyi, we're working on a solution that issues a consul leave on container termination. We'll update this when the tests are done.

@mfischer-zd
Contributor

mfischer-zd commented Feb 6, 2017

@ryanuber @slackpad Any thoughts on this? We're now in need of a solution here as well.

Would it make sense to create a DNS A record in, say, the .private.consul domain to represent the service IP, then make the SRV answer point to that A record?

@slackpad slackpad added this to the Triaged milestone Feb 6, 2017
@mfischer-zd
Contributor

Hmm, commit 2a26597 may have fixed this already. Is there a released Consul version containing this fix?

@slackpad
Contributor

slackpad commented Feb 6, 2017

@mfischer-zd that went out in 0.7.1, but it had an issue that we just fixed in 0.7.4, which was released today (#2695). I think you might be right that this is a dup of #1228 (at least solution-wise).

@jonmoter

jonmoter commented Feb 7, 2017

I'm a coworker of @mfischer-zd. I downloaded Consul 0.7.4 and tried to reproduce the problem. I registered a service like:

{
    "Datacenter": "dc1", 
    "Node": "some.random.node",
    "Address": "127.0.0.1",
    "Service": {
            "ID": "myservice-123",
            "Service": "myservice", 
            "Address": "1.2.3.4",
            "Port": 80
    }
}

and then did a SRV lookup. As desired, the SRV lookup returned a record that resolved to the service IP, rather than the Node IP.

$ dig @127.0.0.1 -p 8600 myservice.service.consul SRV

; <<>> DiG 9.8.3-P1 <<>> @127.0.0.1 -p 8600 myservice.service.consul SRV
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1807
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;myservice.service.consul.	IN	SRV

;; ANSWER SECTION:
myservice.service.consul. 0	IN	SRV	1 1 80 01020304.addr.dc1.consul.

;; ADDITIONAL SECTION:
01020304.addr.dc1.consul. 0	IN	A	1.2.3.4

;; Query time: 1 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Mon Feb  6 14:14:19 2017
;; MSG SIZE  rcvd: 96

So as far as I can tell, this issue is fixed in 0.7.4.
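
For readers wondering about the SRV target in the output above: the label `01020304` looks like the service IP 1.2.3.4 written as four hex bytes. The Python sketch below converts between the two forms. The naming scheme is inferred only from this single dig response, so treat it as an assumption rather than a documented format; the helper names are invented for this example, and only IPv4 is handled.

```python
import ipaddress


def addr_name_to_ip(name):
    """Decode the first label of '<hex>.addr.<dc>.consul.' into an IPv4 string."""
    hex_label = name.split(".", 1)[0]
    if len(hex_label) != 8:
        raise ValueError(f"expected 8 hex digits, got {hex_label!r}")
    # bytes.fromhex raises ValueError if the label is not valid hex.
    return str(ipaddress.IPv4Address(bytes.fromhex(hex_label)))


def ip_to_addr_name(ip, datacenter="dc1", domain="consul"):
    """Build the inferred '<hex>.addr.<dc>.<domain>.' name for an IPv4 address."""
    packed = ipaddress.IPv4Address(ip).packed
    return f"{packed.hex()}.addr.{datacenter}.{domain}."


if __name__ == "__main__":
    print(addr_name_to_ip("01020304.addr.dc1.consul."))  # 1.2.3.4
    print(ip_to_addr_name("1.2.3.4"))  # 01020304.addr.dc1.consul.
```

In practice a resolver would just follow the A record in the ADDITIONAL section; the decoding is only useful for reading the names by eye or in logs.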

@slackpad
Contributor

slackpad commented Feb 7, 2017

@jonmoter thanks for the confirmation!
