kubectl logs POD broken when using amazon-vpc-cni-k8s (kubelet registers the wrong IP) #4218

dezmodue · 2018-01-08T00:00:11Z

What kops version are you running? The command kops version will display this information.
kops and nodeup built from master 2fdf834 - golang 1.8.5
What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.4", GitCommit:"9befc2b8928a9426501d3bf62f72849d5cbcd5a3", GitTreeState:"clean", BuildDate:"2017-11-20T05:17:43Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
What cloud provider are you using?
AWS
What commands did you run? What is the simplest way to reproduce this issue?
kubectl logs POD -n NAMESPACE
What happened after the commands executed?

Error from server: Get https://10.103.20.110:10250/containerLogs/monitoring/grafana-0/grafana: dial tcp 10.103.20.110:10250: getsockopt: connection timed out

What did you expect to happen?
The container logs to be displayed
Please provide your cluster manifest. Execute kops get --name my.example.com -oyaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2018-01-03T13:20:31Z
  name: megamind.mycompany.io
spec:
  additionalPolicies:
    master: |
      [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateNetworkInterface",
                "ec2:AttachNetworkInterface",
                "ec2:DeleteNetworkInterface",
                "ec2:DetachNetworkInterface",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeInstances",
                "ec2:ModifyNetworkInterfaceAttribute",
                "ec2:AssignPrivateIpAddresses"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": "tag:TagResources",
            "Resource": "*"
        }
      ]
    node: |
      [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateNetworkInterface",
                "ec2:AttachNetworkInterface",
                "ec2:DeleteNetworkInterface",
                "ec2:DetachNetworkInterface",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeInstances",
                "ec2:ModifyNetworkInterfaceAttribute",
                "ec2:AssignPrivateIpAddresses"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": "tag:TagResources",
            "Resource": "*"
        }
      ]
  api:
    dns: {}
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://megamind-dev-kops-state/megamind.mycompany.io
  etcdClusters:
  - enableEtcdTLS: true
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-eu-west-1a
      name: a
    - encryptedVolume: true
      instanceGroup: master-eu-west-1b
      name: b
    - encryptedVolume: true
      instanceGroup: master-eu-west-1c
      name: c
    name: main
    version: 3.1.11
  - enableEtcdTLS: true
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-eu-west-1a
      name: a
    - encryptedVolume: true
      instanceGroup: master-eu-west-1b
      name: b
    - encryptedVolume: true
      instanceGroup: master-eu-west-1c
      name: c
    name: events
    version: 3.1.11
  iam:
    allowContainerRegistry: true
    legacy: false
  kubernetesApiAccess:
  - 1.2.3.4/32
  kubernetesVersion: 1.8.4
  masterInternalName: api.internal.megamind.mycompany.io
  masterPublicName: api.megamind.mycompany.io
  networkCIDR: 10.103.0.0/16
  networkID: vpc-01234567
  networking:
    amazonvpc: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 1.2.3.4/32
  subnets:
  - cidr: 10.103.16.0/21
    name: eu-west-1a
    type: Private
    zone: eu-west-1a
  - cidr: 10.103.32.0/21
    name: utility-eu-west-1a
    type: Utility
    zone: eu-west-1a
  - cidr: 10.103.48.0/21
    name: eu-west-1b
    type: Private
    zone: eu-west-1b
  - cidr: 10.103.64.0/21
    name: utility-eu-west-1b
    type: Utility
    zone: eu-west-1b
  - cidr: 10.103.80.0/21
    name: eu-west-1c
    type: Private
    zone: eu-west-1c
  - cidr: 10.103.112.0/21
    name: utility-eu-west-1c
    type: Utility
    zone: eu-west-1c
  topology:
    bastion:
      bastionPublicName: bastion.megamind.mycompany.io
    dns:
      type: Public
    masters: private
    nodes: private

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-01-03T13:20:34Z
  labels:
    kops.k8s.io/cluster: megamind.mycompany.io
  name: bastions
spec:
  associatePublicIp: true
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2017-12-02
  machineType: t2.micro
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: bastion-eu-west-1a
  role: Bastion
  subnets:
  - utility-eu-west-1a

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-01-03T13:20:32Z
  labels:
    kops.k8s.io/cluster: megamind.mycompany.io
  name: master-eu-west-1a
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2017-12-02
  machineType: m3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-eu-west-1a
  role: Master
  subnets:
  - eu-west-1a

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-01-03T13:20:32Z
  labels:
    kops.k8s.io/cluster: megamind.mycompany.io
  name: master-eu-west-1b
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2017-12-02
  machineType: m3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-eu-west-1b
  role: Master
  subnets:
  - eu-west-1b

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-01-03T13:20:32Z
  labels:
    kops.k8s.io/cluster: megamind.mycompany.io
  name: master-eu-west-1c
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2017-12-02
  machineType: m3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-eu-west-1c
  role: Master
  subnets:
  - eu-west-1c

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-01-03T13:20:34Z
  labels:
    kops.k8s.io/cluster: megamind.mycompany.io
  name: nodes-eu-west-1
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2017-12-02
  machineType: r4.2xlarge
  maxSize: 3
  minSize: 3
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-eu-west-1
  role: Node
  subnets:
  - eu-west-1a
  - eu-west-1b
  - eu-west-1c

Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.
Anything else we need to know?
This error seems to be caused by the kubelet registering the wrong IP: specifically, the kubelet appears to report one of the secondary private IPs instead of the primary one.
In the example error reported:

Error from server: Get https://10.103.20.110:10250/containerLogs/monitoring/grafana-0/grafana: dial tcp 10.103.20.110:10250: getsockopt: connection timed out

10.103.20.110 is a secondary private IP, and it is indeed the InternalIP shown by kubectl describe node:

Addresses:
  InternalIP:   10.103.20.110
  InternalDNS:  ip-10-103-21-40.megamind.internal
  Hostname:     ip-10-103-21-40.megamind.internal
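The mismatch is visible in this output itself: the hostname ip-10-103-21-40 encodes 10.103.21.40, yet the registered InternalIP is 10.103.20.110. A minimal sketch for spotting this on other nodes, assuming the hostname follows the EC2 ip-A-B-C-D naming convention and that the encoded address is the instance's primary private IP; check_node_ip is a hypothetical helper name.

```shell
# Hypothetical helper: reads `kubectl describe node` output on stdin and
# reports whether InternalIP matches the IP encoded in the Hostname field.
# Assumption: the hostname has the form ip-A-B-C-D.<domain> and encodes
# the instance's primary private IP.
check_node_ip() {
  awk '
    $1 == "InternalIP:" { internal = $2 }
    $1 == "Hostname:"   { host = $2 }
    END {
      # Keep the first DNS label, drop the "ip-" prefix, turn dashes into dots.
      split(host, parts, ".")
      encoded = parts[1]
      sub(/^ip-/, "", encoded)
      gsub(/-/, ".", encoded)
      if (internal == encoded) {
        print "OK " internal
      } else {
        print "MISMATCH internal=" internal " hostname-ip=" encoded
        exit 1
      }
    }'
}
```

Usage: kubectl describe node ip-10-103-21-40.megamind.internal | check_node_ip. A non-zero exit status flags a node whose kubelet registered a different address than the one its hostname suggests.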

Locally, curl works against the primary IPs of both eth0 and eth1.

The problem also occurs on the master nodes, where it manifests as new nodes being unable to join the cluster because the kubelet cannot contact the API (same cause: wrong IPs).

As a PoC, I built a modified nodeup that passes the flag --node-ip=LOCAL-IPV4 to the kubelet (where LOCAL-IPV4 is the result of curl http://169.254.169.254/latest/meta-data/local-ipv4).

With that in place, the masters and nodes build correctly and kubectl logs works as expected.


chrislovecnm · 2018-01-08T00:38:35Z

Can you please file this issue with https://github.com/aws/amazon-vpc-cni-k8s?

dezmodue · 2018-01-09T21:27:15Z

Hi @chrislovecnm, I will do so, as I cannot reproduce the same issue when running a private kops cluster with flannel-vxlan by simply adding ENIs and IPs to the nodes.

liwenwu-amazon · 2018-01-12T18:30:11Z

Hi @dezmodue @chrislovecnm, the CNI plugin back end (L-IPAM) allocates IP addresses on the primary ENI right after it is initialized. So by the time the kubelet reports its node addresses, the primary ENI already has multiple secondary IP addresses assigned.
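To see this on a node, one can list the private IPv4 addresses on the primary ENI via the EC2 instance metadata service and set the primary address aside; whatever remains are the secondary IPs that L-IPAM has assigned. The sketch assumes the metadata paths meta-data/mac and meta-data/network/interfaces/macs/MAC/local-ipv4s (the latter returning one address per line) and that the metadata service accepts plain unauthenticated GET requests; secondary_ips is a hypothetical helper name.

```shell
# Hypothetical helper: print every IPv4 address read from stdin (one per
# line) except the primary address passed as $1. -x matches whole lines and
# -F treats the dots literally, so 10.0.0.4 does not filter out 10.0.0.40.
# grep exits 1 when nothing but the primary address is present.
secondary_ips() {
  grep -v -x -F -e "$1"
}

# Example, run on the node (assumed metadata paths, see above):
# md=http://169.254.169.254/latest/meta-data
# mac=$(curl -fsS "$md/mac")
# curl -fsS "$md/network/interfaces/macs/$mac/local-ipv4s" \
#   | secondary_ips "$(curl -fsS "$md/local-ipv4")"
```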

I like the idea from your PoC of passing --node-ip=LOCAL-IPV4 to the kubelet. Can we use that to fix this issue?

dezmodue · 2018-01-16T10:02:47Z

@chrislovecnm, if that is a satisfactory solution, I could try to work on it (with some guidance on how it should be implemented).

chrislovecnm · 2018-02-08T00:27:03Z

@liwenwu-amazon what is the recommended solution?

liwenwu-amazon · 2018-02-08T00:36:30Z

The recommended solution is that the kubelet must explicitly specify the primary IPv4 address of the primary ENI as its node-ip, for example:

--node-ip=$(curl http://169.254.169.254/latest/meta-data/local-ipv4)
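The one-liner above can produce an empty or non-IP value (for example an HTTP error page) if the metadata request fails, and the kubelet would then start with a bad --node-ip. A slightly more defensive sketch, assuming the same metadata endpoint and plain unauthenticated GET access, fails fast instead; node_ip_flag is a hypothetical name, and how the flag is wired into the kubelet (in kops, through nodeup) is not shown.

```shell
# Hypothetical helper: print "--node-ip=<primary private IPv4>" from the EC2
# instance metadata service, or print an error on stderr and return 1.
node_ip_flag() {
  # -f: fail on HTTP errors; -sS: quiet but still report errors;
  # --max-time: do not hang node startup if the metadata service is unreachable.
  ip=$(curl -fsS --max-time 5 http://169.254.169.254/latest/meta-data/local-ipv4) || {
    echo "node_ip_flag: metadata request failed" >&2
    return 1
  }
  # Reject empty values and anything other than digits and dots.
  case $ip in
    ''|*[!0-9.]*)
      echo "node_ip_flag: unexpected metadata value: $ip" >&2
      return 1
      ;;
  esac
  printf '%s\n' "--node-ip=$ip"
}
```

On a healthy node this prints a single flag such as --node-ip=10.103.21.40, which can be appended to the kubelet arguments.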

chrislovecnm · 2018-02-08T00:44:04Z

If someone wants to contribute this, I can provide more details on how to do this in nodeup.

dezmodue · 2018-02-09T17:22:56Z

@chrislovecnm I would like to contribute

dezmodue · 2018-02-09T23:42:42Z

@chrislovecnm I gave it a try - #4417 - let me know

dezmodue · 2018-04-17T08:25:22Z

@chrislovecnm, is it OK to close this issue, since #4417 was released with 1.9.0?

fejta-bot · 2018-07-16T09:16:11Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2018-08-15T10:02:37Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot · 2018-09-14T10:49:40Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot · 2018-09-14T10:49:47Z

@fejta-bot: Closing this issue.


dezmodue mentioned this issue Jan 10, 2018: kubectl logs breaks when using the cni plugin aws/amazon-vpc-cni-k8s#21 (Closed)

dezmodue mentioned this issue Mar 1, 2018: Bind the kubelet to the local ipv4 address #4417 (Merged)

micahhausler mentioned this issue Mar 30, 2018: Kubelet reports secondary InternalIP in AWS with multiple ENIs kubernetes/kubernetes#61921 (Closed)

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 16, 2018

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 15, 2018

k8s-ci-robot closed this as completed Sep 14, 2018
