Intermittent kube-dns errors after upgrading kops to v1.9 #5103

Closed
erstaples opened this issue May 3, 2018 · 3 comments

  1. What kops version are you running? The command kops version will display
    this information.

Version 1.9.0 (git-cccd71e67)

  2. What Kubernetes version are you running? kubectl version will print the
    version if a cluster is running or provide the Kubernetes version specified as
    a kops flag.

Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.0", GitCommit:"fc32d2f3698e36b93322a3465f63a14e9f0eaead", GitTreeState:"clean", BuildDate:"2018-03-26T16:55:54Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/386"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T11:55:20Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
  3. What cloud provider are you using?

AWS

  4. What commands did you run? What is the simplest way to reproduce this issue?

I followed the instructions here to update kube-dns from 1.14.9 to 1.14.10. Then I deployed several apps that use ExternalName services.
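
For illustration, the ExternalName services in question look roughly like the sketch below; the Service name and namespace are placeholders rather than our real manifests, and the externalName is the RDS endpoint the apps connect to:

apiVersion: v1
kind: Service
metadata:
  name: my-rds          # placeholder name
  namespace: default    # placeholder namespace
spec:
  type: ExternalName
  # kube-dns answers queries for my-rds.default.svc.cluster.local with a
  # CNAME pointing at this external hostname
  externalName: my-rds-instance.us-west-1.rds.amazonaws.com

The PHP pods then connect to my-rds (or my-rds.default.svc.cluster.local), and kube-dns is expected to follow the CNAME out to the RDS endpoint.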

  5. What happened after the commands executed?

I noticed intermittent errors in our PHP apps: PDO::__construct(): php_network_getaddresses: getaddrinfo failed: Name or service not known. Suspecting a kube-dns issue, I looked at the kube-dns logs and saw the following repeated over and over:

I0503 19:12:44.971393       1 logs.go:41] skydns: incomplete CNAME chain from "my-rds-instance.us-west-1.rds.amazonaws.com.": failure to lookup name
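
As a sanity check that the failures are at the DNS layer rather than in the app, a throwaway pod can run the same lookup through the cluster DNS. This is only a debugging sketch (the pod name is arbitrary; busybox:1.28 is used because its bundled nslookup behaves well), not part of our deployment:

apiVersion: v1
kind: Pod
metadata:
  name: dns-debug            # hypothetical debugging pod
  namespace: default
spec:
  restartPolicy: Never
  containers:
  - name: dns-debug
    image: busybox:1.28
    # resolve the ExternalName target via the cluster DNS; intermittent
    # failures here point at kube-dns rather than the application
    command:
    - nslookup
    - my-rds-instance.us-west-1.rds.amazonaws.com

Re-creating this pod a few times and checking its logs should show whether resolution fails intermittently at the cluster DNS level, matching the getaddrinfo errors above.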

  6. What did you expect to happen?

I expected ExternalName DNS queries to resolve every time.

  7. Please provide your cluster manifest. Execute
    kops get --name my.example.com -oyaml to display your cluster manifest.
    You may want to remove your cluster name and other sensitive information.

apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2018-04-24T16:19:39Z
  name: prod-cluster.example.io
spec:
  api:
    dns: {}
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: <redacted>
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-us-west-1a
      name: a
    name: main
  - etcdMembers:
    - instanceGroup: master-us-west-1a
      name: a
    name: events
  fileAssets:
  - content: |
      yes
    name: docker-1.12
    path: /etc/coreos/docker-1.12
    roles:
    - Node
    - Master
  - content: |
      #cloud-config
      coreos:
        update:
          reboot-strategy: "etcd-lock"
        locksmith:
          window-start: Tue 09:00
          window-length: 2h
    name: cloud-config.yaml
    path: /home/core/cloud-config.yaml
    roles:
    - Node
    - Master
  hooks:
  - manifest: |
      <redacted>
    roles:
    - Node
    - Master
  - manifest: |
      Type=oneshot
      ExecStart=/usr/bin/coreos-cloudinit --from-file=/home/core/cloud-config.yaml
    name: reboot-window.service
    roles:
    - Node
    - Master
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    authorizationRbacSuperUser: admin
    featureGates:
      TaintBasedEvictions: "true"
      TaintNodesByCondition: "true"
  kubeControllerManager:
    featureGates:
      TaintBasedEvictions: "true"
      TaintNodesByCondition: "true"
  kubeScheduler:
    featureGates:
      TaintBasedEvictions: "true"
      TaintNodesByCondition: "true"
  kubelet:
    featureGates:
      TaintBasedEvictions: "true"
      TaintNodesByCondition: "true"
  kubernetesApiAccess:
  - <redacted>
  kubernetesVersion: 1.9.3
  masterPublicName: api.<redacted>
  networkCIDR: <redacted> 
  networkID: <redacted>
  networking:
    canal: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - <redacted>
  subnets:
  - cidr: <redacted>
    name: us-west-1a
    type: Public
    zone: us-west-1a
  topology:
    dns:
      type: Public
    masters: public
    nodes: public

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-04-24T16:19:44Z
  labels:
    kops.k8s.io/cluster: <redacted> 
  name: <redacted>
spec:
  additionalSecurityGroups:
  - <redacted>
  image: ami-a9c2d0c9
  machineType: c5.large
  maxSize: 10
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: <redacted> 
  role: Node
  subnets:
  - us-west-1a

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-04-24T16:19:45Z
  labels:
    kops.k8s.io/cluster: <redacted> 
  name: dedicated-memory
spec:
  additionalSecurityGroups:
  - <redacted>
  image: ami-a9c2d0c9
  machineType: m5.xlarge
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: dedicated-memory
  role: Node
  subnets:
  - us-west-1a
  taints:
  - dedicated=memory:NoSchedule

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-04-24T16:19:40Z
  labels:
    kops.k8s.io/cluster: <redacted>
  name: master-us-west-1a
spec:
  additionalSecurityGroups:
  - <redacted>
  image: ami-a9c2d0c9
  machineType: m5.2xlarge
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-west-1a
  role: Master
  subnets:
  - us-west-1a

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-04-24T16:19:40Z
  labels:
    kops.k8s.io/cluster: <redacted> 
  name: nodes
spec:
  additionalSecurityGroups:
  - <redacted>
  image: ami-a9c2d0c9
  machineType: c5.2xlarge
  maxSize: 4
  minSize: 4
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  subnets:
  - us-west-1a

  8. Please run the commands with the most verbose logging by adding the -v 10 flag.
    Paste the logs into this report, or in a gist and provide the gist link here.

N/A. Not a kops CLI-related issue.

  9. Anything else we need to know?

Please let me know if this is not appropriate for this repo.

erstaples commented May 15, 2018

Reverting kube-dns to 1.14.5 fixed it for me. No more intermittent errors.
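
For anyone else landing here, the revert is just a matter of pointing the kube-dns Deployment in kube-system back at the 1.14.5 images. A rough sketch of a strategic-merge patch, assuming the standard upstream container names (check your own Deployment before applying):

# hypothetical patch file, e.g. kube-dns-revert.yaml
spec:
  template:
    spec:
      containers:
      - name: kubedns
        image: k8s.gcr.io/k8s-dns-kube-dns-amd64:1.14.5
      - name: dnsmasq
        image: k8s.gcr.io/k8s-dns-dnsmasq-nanny-amd64:1.14.5
      - name: sidecar
        image: k8s.gcr.io/k8s-dns-sidecar-amd64:1.14.5

The same three image tags can also be changed by hand with kubectl -n kube-system edit deployment kube-dns. Keep in mind that kops manages the kube-dns add-on, so a manual change like this may be reconciled away on a later kops update.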

mootpt commented Jun 13, 2018

I was also seeing high latency when attempting to resolve RDS hostnames in some cases. Worth noting that two of my three clusters were not exhibiting this behavior; all clusters are running kube-dns:1.14.9. @erstaples did you attempt upgrading to 1.14.9 once more to see if the behavior reappeared? This issue is perplexing. I'd say reopen this issue until we get official word on the culprit.

ruudk commented Aug 30, 2019

Seeing the same flood of errors with k8s.gcr.io/k8s-dns-kube-dns-amd64:1.15.4. Does anybody have an idea how to solve this?
