
Verify correctness of service discovery for `metadata`, `metadata.google.internal`, etc. on GKE and AWS #392

Closed
briansmith opened this issue Feb 19, 2018 · 7 comments
Labels: area/security, priority/P1 Planned for Release

@briansmith (Contributor)

See kubernetes/kubernetes#8512 (comment):

The metadata service is the de facto standard for distributing short-lived creds to apps running on EC2 (IAM roles) or GCE (scoped compute service accounts), and SDKs from both support this very well.

If that is still true, then we can't safely do DNS resolution for these hostnames from the controller's Destination service's pods, because the metadata returned would be intended for the Destination service's node, not for the node that the proxied pod is running on.

See also kubernetes/kubernetes#8867.

/cc @olix0r @adleong

@briansmith (Contributor, Author)

This, #62, #366, and #384 all make me think that we should do DNS resolution within the proxy, and not within the Destination service.

Specifically, I think we need to do something like this: when we see a partially-qualified name like "metadata" or "metadata.$namespace", we should use DNS to resolve both that name and "metadata.$namespace.svc.$zone". If and only if they resolve to the same IP addresses, subscribe to the name using the control plane's Destination service. Otherwise, use the IP addresses from the DNS response directly (and probably skip our internal load balancing).
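
A minimal sketch of that check, assuming a blocking system resolver and simplifying to the single-label case ("metadata.$namespace" would need the same comparison against its fully-qualified form). The `Route` enum and both function names here are hypothetical, not the proxy's actual types:

```rust
use std::collections::HashSet;
use std::io;
use std::net::{IpAddr, ToSocketAddrs};

// Resolve a host to its set of IP addresses using the system resolver,
// which applies the search path from /etc/resolv.conf just as the proxy's
// environment would. The port is a placeholder required by ToSocketAddrs.
fn resolve_ips(host: &str) -> io::Result<HashSet<IpAddr>> {
    Ok((host, 80u16).to_socket_addrs()?.map(|sa| sa.ip()).collect())
}

// Hypothetical routing decision, not the proxy's real type.
enum Route {
    Destination(String), // subscribe via the control plane's Destination service
    Direct(Vec<IpAddr>), // dial the DNS answer directly; skip load balancing
}

fn route_for(name: &str, namespace: &str, zone: &str) -> io::Result<Route> {
    let fqdn = format!("{}.{}.svc.{}", name, namespace, zone);
    let direct = resolve_ips(name)?;
    let in_cluster = resolve_ips(&fqdn).unwrap_or_default();
    if !direct.is_empty() && direct == in_cluster {
        // Both names resolve identically, so this really is the in-cluster
        // service; subscribing to the Destination service is safe.
        Ok(Route::Destination(fqdn))
    } else {
        // The search path resolved the name to something else entirely
        // (e.g. metadata -> metadata.google.internal); use the DNS answer
        // as-is and bypass internal load balancing.
        Ok(Route::Direct(direct.into_iter().collect()))
    }
}
```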

@briansmith (Contributor, Author)

The GCE documentation for the metadata service is https://cloud.google.com/compute/docs/storing-retrieving-metadata#querying.

The documentation for AWS is at https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html#instancedata-data-retrieval. The stack overflow answer at https://stackoverflow.com/a/42315582 is very helpful to understand that our service discovery logic probably needs to consider link-local addresses specially.
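
For illustration, a tiny sketch of the special case that answer points at: both clouds serve instance metadata from the link-local address 169.254.169.254, and link-local addresses are only meaningful on the local node, so the proxy should presumably dial them directly rather than running them through cluster service discovery. `is_node_local` is a hypothetical helper, not existing proxy code:

```rust
use std::net::IpAddr;

// Hypothetical helper: link-local destinations (169.254.0.0/16, notably the
// EC2/GCE metadata endpoint 169.254.169.254) are only meaningful on the local
// node, so the proxy should never route them through cluster service
// discovery or internal load balancing.
fn is_node_local(ip: &IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => v4.is_link_local(),
        // fe80::/10 is the IPv6 link-local unicast range; std has no stable
        // helper for this check, so match the prefix by hand.
        IpAddr::V6(v6) => (v6.segments()[0] & 0xffc0) == 0xfe80,
    }
}
```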

@briansmith (Contributor, Author)

In Conduit 0.3, "metadata" won't resolve to the right thing (it will resolve as if it were "metadata.$namespace.svc"), but "metadata.google.internal" should work correctly.

I'll tentatively propose that we make "metadata" (IIUC, the deprecated legacy name for "metadata.google.internal") work in 0.4.

@olix0r olix0r modified the milestones: Conduit 0.4, Conduit 0.3.1 Feb 21, 2018
@olix0r olix0r added the priority/P1 Planned for Release label Feb 23, 2018
@seanmonstar seanmonstar removed their assignment Feb 26, 2018
@briansmith (Contributor, Author)

It would be good to get the following information from a shell in a conduit-proxy container in a GKE cluster:

$ cat /etc/resolv.conf
$ dig +showsearch metadata

@olix0r (Member) commented Mar 7, 2018

Running in the myorg Google Cloud organization:

:; kubectl -n myns exec -it -c conduit-proxy mypod-1084968931-fprts bash
I have no name!@mypod-1084968931-fprts:/$ cat /etc/resolv.conf
nameserver 10.51.240.10
search myns.svc.cluster.local svc.cluster.local cluster.local c.myorg.internal google.internal
options ndots:5
I have no name!@mypod-1084968931-fprts:/$ dig +showsearch metadata

; <<>> DiG 9.9.5-9+deb8u15-Debian <<>> +showsearch metadata
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 2244
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;metadata.myns.svc.cluster.local.	IN A

;; AUTHORITY SECTION:
cluster.local.		60	IN	SOA	ns.dns.cluster.local. hostmaster.cluster.local. 1520460000 28800 7200 604800 60

;; Query time: 2 msec
;; SERVER: 10.51.240.10#53(10.51.240.10)
;; WHEN: Wed Mar 07 22:48:20 UTC 2018
;; MSG SIZE  rcvd: 157


; <<>> DiG 9.9.5-9+deb8u15-Debian <<>> +showsearch metadata
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 64665
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;metadata.svc.cluster.local.	IN	A

;; AUTHORITY SECTION:
cluster.local.		60	IN	SOA	ns.dns.cluster.local. hostmaster.cluster.local. 1520460000 28800 7200 604800 60

;; Query time: 0 msec
;; SERVER: 10.51.240.10#53(10.51.240.10)
;; WHEN: Wed Mar 07 22:48:20 UTC 2018
;; MSG SIZE  rcvd: 137


; <<>> DiG 9.9.5-9+deb8u15-Debian <<>> +showsearch metadata
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 10359
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;metadata.cluster.local.		IN	A

;; AUTHORITY SECTION:
cluster.local.		60	IN	SOA	ns.dns.cluster.local. hostmaster.cluster.local. 1520460000 28800 7200 604800 60

;; Query time: 0 msec
;; SERVER: 10.51.240.10#53(10.51.240.10)
;; WHEN: Wed Mar 07 22:48:20 UTC 2018
;; MSG SIZE  rcvd: 133


; <<>> DiG 9.9.5-9+deb8u15-Debian <<>> +showsearch metadata
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 47597
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;metadata.c.myorg.internal.	IN	A

;; AUTHORITY SECTION:
internal.		30	IN	SOA	ns.global.gcedns-prod.internal. cloud-dns-hostmaster.google.com. 2015030600 7200 3600 24796800 5

;; Query time: 6 msec
;; SERVER: 10.51.240.10#53(10.51.240.10)
;; WHEN: Wed Mar 07 22:48:20 UTC 2018
;; MSG SIZE  rcvd: 148


; <<>> DiG 9.9.5-9+deb8u15-Debian <<>> +showsearch metadata
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39487
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;metadata.google.internal.	IN	A

;; ANSWER SECTION:
metadata.google.internal. 3600	IN	A	169.254.169.254

;; Query time: 0 msec
;; SERVER: 10.51.240.10#53(10.51.240.10)
;; WHEN: Wed Mar 07 22:48:20 UTC 2018
;; MSG SIZE  rcvd: 69
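
Those five queries are the resolv.conf search path at work: "metadata" has fewer dots than ndots:5, so the resolver appends each search suffix in order and stops at the first NOERROR answer, which here is metadata.google.internal. A rough sketch of that (glibc-style) expansion, for illustration only:

```rust
// Rough sketch of glibc-style search-path expansion: a name with fewer dots
// than ndots is tried with each search suffix first, then as an absolute
// name; actual resolution stops at the first query that returns NOERROR.
fn expand_search(name: &str, search: &[&str], ndots: usize) -> Vec<String> {
    let dots = name.matches('.').count();
    let mut candidates = Vec::new();
    if dots >= ndots {
        candidates.push(format!("{}.", name)); // absolute name tried first
    }
    for suffix in search {
        candidates.push(format!("{}.{}", name, suffix));
    }
    if dots < ndots {
        candidates.push(format!("{}.", name)); // absolute as a last resort
    }
    candidates
}

fn main() {
    let search = [
        "myns.svc.cluster.local",
        "svc.cluster.local",
        "cluster.local",
        "c.myorg.internal",
        "google.internal",
    ];
    // Prints the five suffixed names in the order dig tried them above,
    // followed by the bare "metadata." that was never reached because
    // metadata.google.internal answered NOERROR.
    for candidate in expand_search("metadata", &search, 5) {
        println!("{}", candidate);
    }
}
```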

@seanmonstar seanmonstar added this to Triage in Proxy Transparency Mar 12, 2018
@seanmonstar seanmonstar moved this from Triage to Ready in Proxy Transparency Mar 12, 2018
@wmorgan wmorgan modified the milestones: 0.3.1, 0.4.0 Mar 19, 2018
@olix0r (Member) commented Apr 13, 2018

For what it's worth, this is not fixed on master (37434d0) yet.

The following request fails (it returns a 500 rather than the metadata):

root@web-569cc78964-dph9c:/# curl -v "http://metadata/computeMetadata/v1/instance/tags" -H "Metadata-Flavor: Google"
* Hostname was NOT found in DNS cache
*   Trying 169.254.169.254...
* Connected to metadata (169.254.169.254) port 80 (#0)
> GET /computeMetadata/v1/instance/tags HTTP/1.1
> User-Agent: curl/7.38.0
> Host: metadata
> Accept: */*
> Metadata-Flavor: Google
> 
< HTTP/1.1 500 Internal Server Error
< content-length: 0
< Date: Fri, 13 Apr 2018 15:11:47 GMT
< 
* Connection #0 to host metadata left intact

whereas the FQDN works properly:

root@web-569cc78964-dph9c:/# curl -v "http://metadata.google.internal/computeMetadata/v1/instance/attributes/" -H "Metadata-Flavor: Google"
* Hostname was NOT found in DNS cache
*   Trying 169.254.169.254...
* Connected to metadata.google.internal (169.254.169.254) port 80 (#0)
> GET /computeMetadata/v1/instance/attributes/ HTTP/1.1
> User-Agent: curl/7.38.0
> Host: metadata.google.internal
> Accept: */*
> Metadata-Flavor: Google
> 
< HTTP/1.1 200 OK
< metadata-flavor: Google
< content-type: application/text
< etag: d241baa20f34b132
< date: Fri, 13 Apr 2018 15:14:09 GMT
* Server Metadata Server for VM is not blacklisted
< server: Metadata Server for VM
< content-length: 160
< x-xss-protection: 1; mode=block
< x-frame-options: SAMEORIGIN
< 
cluster-location
cluster-name
configure-sh
created-by
gci-ensure-gke-docker
gci-update-strategy
google-compute-enable-pcid
instance-template
kube-env
user-data

I think we basically understand how we want to do name resolution/discovery to resolve this sort of issue. I'll open a new issue to start documenting exactly what needs to change.

@olix0r olix0r modified the milestones: 0.4.0, 0.5.0 Apr 13, 2018
@olix0r (Member) commented Apr 26, 2018

As of 0.4.1:

root@voter-74cdc6648c-t87ql:/# curl -v http://metadata
* Rebuilt URL to: http://metadata/
* Hostname was NOT found in DNS cache
*   Trying 169.254.169.254...
* Connected to metadata (169.254.169.254) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.38.0
> Host: metadata
> Accept: */*
> 
< HTTP/1.1 200 OK
< metadata-flavor: Google
< content-type: application/text
< date: Thu, 26 Apr 2018 17:21:22 GMT
* Server Metadata Server for VM is not blacklisted
< server: Metadata Server for VM
< content-length: 22
< x-xss-protection: 1; mode=block
< x-frame-options: SAMEORIGIN
< 
0.1/
computeMetadata/
* Connection #0 to host metadata left intact

I believe this is resolved now?

@olix0r olix0r modified the milestones: 0.5.0, 0.4.1 Apr 26, 2018
@olix0r olix0r closed this as completed Apr 26, 2018
Proxy Transparency automation moved this from Ready to Done Apr 26, 2018
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 18, 2021