
GCE metadata name doesn't resolve in container #8512

Closed
evandbrown opened this issue May 19, 2015 · 33 comments

@evandbrown

In v0.15.0 on GKE, containers had access to the GCE metadata service at http://metadata. On a v0.17.0 cluster, metadata does not resolve:

$ curl "http://metadata/computeMetadata/v1/instance/service-accounts/default/token" -H "Metadata-Flavor: Google"
curl: (6) Could not resolve host: metadata

The link-local address works fine in the container:

$ curl "http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token" -H "Metadata-Flavor: Google"
"access_token":...

A number of tools rely on the metadata name resolving to access service accounts, etc.

The name resolves fine on the GCE host.

@yujuhong yujuhong added the sig/cluster-lifecycle label May 19, 2015
@evandbrown
Author

Qualifying the name also works:

curl "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token" -H "Metadata-Flavor: Google"
"access_token";...

@roberthbailey
Contributor

/cc @thockin

@roberthbailey roberthbailey added the priority/important-soon label May 19, 2015
@evandbrown
Author

FWIW http://metadata/ is hardcoded into the Google API client for Java (haven't checked other languages), so any app using that and relying on service accounts is impacted.

@thockin
Member

thockin commented May 20, 2015

I can get to metadata from my current cluster (which is about 5 days old)

# docker exec -ti bcb88c390bfa sh
/ # cat /etc/resolv.conf
nameserver 10.0.0.10
nameserver 169.254.169.254
nameserver 10.240.0.1
search default.cluster.local default.svc.cluster.local svc.cluster.local cluster.local c.thockin-dev.internal. 844790599918.google.internal. google.internal.
/ # ping metadata
PING metadata (127.0.53.53): 56 data bytes
64 bytes from 127.0.53.53: seq=0 ttl=64 time=0.038 ms
64 bytes from 127.0.53.53: seq=1 ttl=64 time=0.042 ms
^C
--- metadata ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.038/0.040/0.042 ms
/ #


@evandbrown
Author

@thockin this is a GKE cluster I created today. Just created another ~2 minutes ago, same result:

Creating the GKE Cluster:

CLUSTER_NAME=mdtester
NUM_NODES=1
MACHINE_TYPE=g1-small
API_VERSION=0.17.0

gcloud alpha container clusters create ${CLUSTER_NAME} \
  --num-nodes ${NUM_NODES} \
  --machine-type ${MACHINE_TYPE} \
  --cluster-api-version ${API_VERSION} \
  --scopes "https://www.googleapis.com/auth/devstorage.full_control" \
           "https://www.googleapis.com/auth/projecthosting"

Tools on my workstation:

$ gcloud components update preview
All components are up to date.

$ kubectl version
Client Version: version.Info{Major:"0", Minor:"16", GitVersion:"v0.16.1", GitCommit:"b933dda5369043161fa7c7330dfcfbc4624d40e6", GitTreeState:"clean"}
Server Version: version.Info{Major:"0", Minor:"17", GitVersion:"v0.17.0", GitCommit:"82f8bdac06ddfacf493a9ed0fedc85f5ea62ebd5", GitTreeState:"clean"}

On node:

evanbrown@k8s-mdtester-node-1:~$ sudo bash
root@k8s-mdtester-node-1:/home/evanbrown# docker ps
CONTAINER ID        IMAGE                                                  COMMAND                CREATED              STATUS              PORTS               NAMES
d1de71de8b9a        gcr.io/cloud-solutions-images/nginx-ssl-proxy:latest   "/bin/sh -c ./start.   50 seconds ago       Up 47 seconds                           k8s_nginx-ssl-proxy.22441003_nginx-ssl-proxy-7u7uh_default_9c82b564-fe8f-11e4-845f-42010af01d9a_b174af7a
f6a34bf6e2ee        gcr.io/google_containers/pause:0.8.0                   "/pause"               About a minute ago   Up About a minute                       k8s_POD.e4cc795_jenkins-agent-czbes_default_9daa9258-fe8f-11e4-845f-42010af01d9a_a96641ed
8324df574b26        gcr.io/google_containers/pause:0.8.0                   "/pause"               About a minute ago   Up About a minute                       k8s_POD.b72e170c_jenkins-leader-gnebq_default_9d16d691-fe8f-11e4-845f-42010af01d9a_123c0387
10d8e2e6e6a8        gcr.io/google_containers/pause:0.8.0                   "/pause"               About a minute ago   Up About a minute                       k8s_POD.605b1ae5_nginx-ssl-proxy-7u7uh_default_9c82b564-fe8f-11e4-845f-42010af01d9a_3d84c97c
b5e81c0e3a3a        gcr.io/google_containers/skydns:2015-03-11-001         "/skydns -machines=h   About a minute ago   Up About a minute                       k8s_skydns.cd47cc92_kube-dns-lmn6f_default_776bdb99-fe8f-11e4-82e2-42010af01d9a_52f85030
c7124b4dac50        gcr.io/google_containers/kube2sky:1.4                  "/kube2sky -domain=k   About a minute ago   Up About a minute                       k8s_kube2sky.87b54c53_kube-dns-lmn6f_default_776bdb99-fe8f-11e4-82e2-42010af01d9a_71c23a7f
bbf319e174d0        gcr.io/google_containers/etcd:2.0.9                    "/usr/local/bin/etcd   About a minute ago   Up About a minute                       k8s_etcd.13594ee7_kube-dns-lmn6f_default_776bdb99-fe8f-11e4-82e2-42010af01d9a_ba9cfac8
22ce1c0b0c8c        gcr.io/google_containers/pause:0.8.0                   "/pause"               About a minute ago   Up About a minute                       k8s_POD.8fdb0e41_kube-dns-lmn6f_default_776bdb99-fe8f-11e4-82e2-42010af01d9a_11a56c63
26949dfb38f7        gcr.io/google_containers/pause:0.8.0                   "/pause"               2 minutes ago        Up 2 minutes                            k8s_POD.e4cc795_fluentd-cloud-logging-k8s-mdtester-node-1_default_e7a73ce4931dc175ddc463501188a765_00ec948d

root@k8s-mdtester-node-1:/home/evanbrown# docker exec -ti d1de71de8b9a sh
# ping metadata
ping: unknown host

# ping metadata.google.internal
PING metadata.google.internal (169.254.169.254): 48 data bytes
56 bytes from 169.254.169.254: icmp_seq=0 ttl=254 time=0.335 ms
56 bytes from 169.254.169.254: icmp_seq=1 ttl=254 time=1.369 ms

@roberthbailey
Contributor

I tried this on a GKE cluster that I created today.

robertbailey@k8s-robertbailey-testing-node-3:~$ sudo docker run -it ubuntu
Unable to find image 'ubuntu:latest' locally
Pulling repository ubuntu
07f8e8c5e660: Download complete 
e9e06b06e14c: Download complete 
a82efea989f9: Download complete 
37bea4ee0c81: Download complete 
Status: Downloaded newer image for ubuntu:latest
root@b41a2b672f3b:/# ping metadata
PING metadata.google.internal (169.254.169.254) 56(84) bytes of data.
64 bytes from metadata.google.internal (169.254.169.254): icmp_seq=1 ttl=254 time=0.200 ms
64 bytes from metadata.google.internal (169.254.169.254): icmp_seq=2 ttl=254 time=0.359 ms
64 bytes from metadata.google.internal (169.254.169.254): icmp_seq=3 ttl=254 time=0.221 ms
64 bytes from metadata.google.internal (169.254.169.254): icmp_seq=4 ttl=254 time=0.232 ms
64 bytes from metadata.google.internal (169.254.169.254): icmp_seq=5 ttl=254 time=0.136 ms
64 bytes from metadata.google.internal (169.254.169.254): icmp_seq=6 ttl=254 time=0.200 ms
64 bytes from metadata.google.internal (169.254.169.254): icmp_seq=7 ttl=254 time=0.263 ms
^C
--- metadata.google.internal ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, time 5997ms
rtt min/avg/max/mdev = 0.136/0.230/0.359/0.064 ms

Have you tried this from within an image other than gcr.io/cloud-solutions-images/nginx-ssl-proxy:latest?

@thockin
Member

thockin commented May 20, 2015

cat /etc/resolv.conf from within the container?


@evandbrown
Author

Ahh, ubuntu and busybox work. Two images I've been using on GKE 0.15.0 are breaking. Frak. I'll spin up a 0.15.0 cluster locally and confirm they work there (they were working on 0.15.0 on GKE early last week.)

/etc/resolv.conf is:

nameserver 10.195.240.10
nameserver 169.254.169.254
nameserver 10.240.0.1
search default.kubernetes.local default.svc.kubernetes.local svc.kubernetes.local kubernetes.local c.your-new-project.internal. 297725298471.google.internal. google.internal.

@thockin
Member

thockin commented May 20, 2015

Is one of those your actual DNS service? The search path looks plausible. I don't know why it would be failing.


@evandbrown
Author

I haven't touched /etc/resolv.conf in either image. Here's output of docker inspect:

"HostConfig": {
        ...
        "Dns": [
            "10.195.240.10",
            "169.254.169.254",
            "10.240.0.1"
        ],
        "DnsSearch": [
            "default.kubernetes.local",
            "default.svc.kubernetes.local",
            "svc.kubernetes.local",
            "kubernetes.local",
            "c.your-new-project.internal.",
            "297725298471.google.internal.",
            "google.internal."
        ],

This is what I used to create the controller (the image is public):

{
  "kind": "ReplicationController",
  "apiVersion": "v1beta3",
  "metadata": {
    "name": "nginx-ssl-proxy",
    "labels": {
      "name": "nginx", "role": "ssl-proxy" 
    }
  },
  "spec": {
    "replicas": 1,
    "selector": {
      "name": "nginx", "role": "ssl-proxy"
    },
    "template": {
      "metadata": {
        "name": "nginx-ssl-proxy",
        "labels": {
          "name": "nginx", "role": "ssl-proxy"
        }
      },
      "spec": {
        "containers": [
          {
            "name": "nginx-ssl-proxy",
            "image": "gcr.io/cloud-solutions-images/nginx-ssl-proxy:latest",
            "env": [
                { 
                    "name": "SERVICE_HOST_ENV_NAME",
                    "value": "JENKINS_SERVICE_HOST"
                },
                { 
                    "name": "SERVICE_PORT_ENV_NAME",
                    "value": "JENKINS_SERVICE_PORT_UI"
                },
                { 
                    "name": "ENABLE_SSL",
                    "value": "false"
                },
                { 
                    "name": "ENABLE_BASIC_AUTH",
                    "value": "true"
                }
            ],
            "ports": [
                {
                  "name": "nginx-ssl-proxy-http",
                  "containerPort": 80
                },
                {
                  "name": "nginx-ssl-proxy-https",
                  "containerPort": 443 
                }
            ],
            "volumeMounts": [{
              "name": "secrets",
              "mountPath": "/etc/secrets",
              "readOnly": true
            }]
          }
        ],
        "volumes": [{
          "name": "secrets",
          "secret": {
            "secretName": "ssl-proxy-secret"
          }
        }]
      }
    }
  }
}

@thockin
Member

thockin commented May 20, 2015

I'm not sure what to say. Something about that image is not using
/etc/resolv.conf.


@evandbrown
Author

Cool, I'll keep diggin'. Thanks for the help thus far.

@evandbrown
Author

On GKE cluster API version 0.16.0 everything works fine. But not 0.17.0. I've repro'd on multiple clusters in the last hour, and I haven't even taken a crazy pill. Something's up with 0.17.0 on GKE.

Easy to repro:

  1. Clone https://github.com/evandbrown/kube-jenkins-imager
  2. cluster_up.sh
  3. gcloud compute ssh k8s-imager-node-1
  4. exec into the nginx-ssl-proxy or jenkins-gcp-leader containers and ping metadata
  5. cluster_down.sh
  6. Change API version in cluster_up.sh to 0.16.0 and repeat above steps. ping metadata works now.

@ArtfulCoder
Contributor

@evandbrown
Thank you for raising this issue!

@thockin @roberthbailey

This was indeed introduced in v0.17.0.

resolv.conf only allows a fixed number of search paths.
You can find this upper limit by looking up MAXDNSRCH in /usr/include/resolv.h

It is generally set to 6.

In v0.17.0, we introduced two new search domains, "default.svc.kubernetes.local" and "svc.kubernetes.local".

This pushed the google.internal. search path into the 7th position, causing it to be silently ignored.
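
(Illustrative sketch, not part of the original comment: with the resolv.conf pasted earlier in this thread, the overflow is easy to see; the resolv.h check assumes a machine with the glibc headers installed.)

$ awk '/^search/ {print NF - 1}' /etc/resolv.conf     # count the search domains handed to the container
7
$ grep MAXDNSRCH /usr/include/resolv.h                # glibc's compile-time cap on the search list
# define MAXDNSRCH   6   /* max # domains in search path */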

Observations

  • This means that any relative (unqualified) name that relies on the google.internal. suffix will not resolve.
  • The limit doesn't apply to FQDNs, since those don't need the search-path suffixes.
    As a result, ping worked with the FQDN.
  • The ubuntu and busybox containers worked either because they weren't wrapped in pods, Kubernetes somehow didn't inject the DNS search paths (so the 6-entry limit wasn't exceeded), or google.internal. was among the first 6 search paths. The real reason would have to be investigated.

In terms of fixes, we would probably have to find a way to reduce the number of search domains.

@ArtfulCoder ArtfulCoder added this to the v1.0 milestone May 20, 2015
@brendandburns
Contributor

This seems bad. Can we get this fixed asap?

@ArtfulCoder
Contributor

This can be fixed immediately if we roll back #8089.

We will have to find a better fix after the rollback.
I am OOO in the morning.
Could someone else help with reverting and running e2e before checking in (assuming the code freeze is off)?
Maybe @thockin has other ideas to work around this.

@thockin
Member

thockin commented May 20, 2015

I'll discuss options with @vishh today. This will make the DNS conversion
bumpier. Six? That should be enough for anyone. I'll clear a rollback
with Quinton re: PR freeze.

@smarterclayton


@brendandburns
Contributor

and can we add a simple e2e so we don't break this again in the future?

(and @roberthbailey this is also prob. worth cutting an 0.17.1 for)

@thockin
Member

thockin commented May 20, 2015

Yeah, we can add 'metadata' to the list of URLs we resolve in e2e - literally 1 line.


@roberthbailey
Contributor

Agreed that this is worth 0.17.1. Bummer.

@mbforbes

@thockin
Member

thockin commented May 20, 2015

Q OK'ed a rollback. I'm on it.


@thockin
Member

thockin commented May 20, 2015

#8568 - can I get a quick review?


@ghost

ghost commented May 20, 2015

And I beg for a couple hours' grace to see if we can fix it before cutting 0.17.1.


@dchen1107
Member

cc/ @dchen1107

@vishh
Contributor

vishh commented May 20, 2015

I am working on extending e2e to cover this failure.

@thockin
Member

thockin commented May 20, 2015

we can get the e2e fix in without the rest, good idea


@ArtfulCoder
Contributor

Closing this issue.
It is resolved now with #8568, and we have an e2e test to cover this as well.

@bgrant0607
Member

@evandbrown Which Google API client?

We don't want containers talking to the host, in general.

@evandbrown
Author

@bgrant0607 The Java API client in my case, but it generally affected any client that tried to resolve the metadata name. All fixed and working great in 0.17.1.

What about container->host communication, though?

@bgrant0607
Member

Filed #8990. Container->host communication should require special permission.

Is this the client?
https://developers.google.com/api-client-library/java/

@bgrant0607
Member

It has to be possible to call Google APIs from outside of GCE. Presumably, you just need to provide the credentials.

https://github.com/google/google-api-java-client/blob/86fd2a574487aaff9b4dbdd0396a98bb2741828f/google-api-client/src/main/java/com/google/api/client/googleapis/auth/oauth2/OAuth2Utils.java#L56

  static boolean runningOnComputeEngine(HttpTransport transport) {
    try {
      GenericUrl tokenUrl = new GenericUrl(METADATA_SERVER_URL);
      HttpRequest request = transport.createRequestFactory().buildGetRequest(tokenUrl);
      HttpResponse response = request.execute();
      HttpHeaders headers = response.getHeaders();
      if (headersContainValue(headers, "Metadata-Flavor", "Google")) {
        return true;
      }
    } catch (IOException expected) {
    }
    return false;
  }

https://github.com/google/google-api-java-client/blob/86fd2a574487aaff9b4dbdd0396a98bb2741828f/google-api-client/src/main/java/com/google/api/client/googleapis/auth/oauth2/DefaultCredentialProvider.java#L287
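
(Illustrative sketch, not part of the original comment: when the metadata name can't be resolved, the DefaultCredentialProvider linked above can be pointed at a service-account key file through the GOOGLE_APPLICATION_CREDENTIALS environment variable instead; the key path and jar name here are hypothetical.)

$ export GOOGLE_APPLICATION_CREDENTIALS=/etc/secrets/service-account.json
$ java -jar my-app.jar    # application default credentials fall back to the key file instead of the metadata server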

@evandbrown
Author

It's definitely possible, but creating a service account (or AWS IAM access/secret key) and distributing/revoking/rotating that credential is a huge pain. The metadata service is the de facto standard for distributing short-lived creds to apps running on EC2 (IAM roles) or GCE (scoped compute service accounts), and SDKs from both support this very well.

The GCE and EC2 metadata services would need to add something similar to namespaces to help here.

@thockin
Member

thockin commented May 29, 2015

It's a real balancing act. On the one hand we don't want people to couple
to the cloud environment. On the other hand it's actually useful (as
here). On the third hand, the VM almost certainly has privs we don't want
containers to have by default.

And yet, we can't really shut down access to things outside your k8s
cluster - project-scoped resources are a big part of the cloud experience
and we're not going to be perpetually inventing new ways to import them
into kubernetes.

We probably need a way to offer GCE's metadata at reduced scope to
containers.

