MongoDB - Update stuck on peer-finder cannot unmarshal DNS message #449

Closed
endrec opened this issue Mar 7, 2019 · 17 comments

@endrec

endrec commented Mar 7, 2019

I'm trying to upgrade to kubeDB v0.10.0, but I'm unable to update my mongoDB version.

Versions:
k8s: v1.11.5
kubeDB: 0.10.0
mongo: 3.6

Mongo runs in a custom namespace, kubeDB operator is in kube-system.

I successfully updated the kubeDB operator and catalog (followed instructions here), and as mongo 3.6-v1 is now deprecated, I edited my MongoDB resource and replaced the version with 3.6-v2.
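
For reference, the edit described above comes down to a single field on the MongoDB object. A minimal sketch, assuming the field layout of the full manifest posted later in this thread; everything else in the object stays unchanged:

```yaml
# Fragment of the MongoDB object (kubedb.com/v1alpha1), not a complete manifest.
spec:
  # Previously 3.6-v1, reported above as deprecated after the 0.10.0 upgrade.
  version: 3.6-v2
```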

The operator started the RollingUpdate, shut down one of the replicas, and started a new one with the new version as it should, but the new pod got stuck in the Init status.
After checking the logs, I found that the bootstrap container is unable to run to completion, because peer-finder is stuck in a "cannot unmarshal DNS message" loop:

2019/03/07 14:49:53 lookup mongo-gvr on 172.20.0.10:53: cannot unmarshal DNS message
2019/03/07 14:49:54 lookup mongo-gvr on 172.20.0.10:53: cannot unmarshal DNS message
2019/03/07 14:49:55 lookup mongo-gvr on 172.20.0.10:53: cannot unmarshal DNS message

Any idea how I can upgrade mongo to the non-deprecated version?

@the-redback
Contributor

Hi @endrec. Can you share your yaml?

the-redback self-assigned this Mar 8, 2019
@the-redback
Contributor

Can you also provide the Docker version on this Kubernetes node?

@endrec
Author

endrec commented Mar 8, 2019

Hi @the-redback,

The node is an AWS EC2 instance (we are using EKS), with the amazon-eks-node-1.11-v20190211 image:

  • Amazon Linux 2
  • Kernel: 4.14.94-89.73.amzn2.x86_64
  • Docker: 17.6.2

I suppose you meant the MongoDB yaml, here it is:

apiVersion: kubedb.com/v1alpha1
kind: MongoDB
metadata:
  creationTimestamp: 2019-03-07T14:24:51Z
  finalizers:
  - kubedb.com
  generation: 4
  labels:
    chart: mongo-0.1.0
    heritage: Tiller
    mongo: mongo
    release: mongo
  name: mongo
  namespace: rungway
  resourceVersion: "1630937"
  selfLink: REDACTED
  uid: REDACTED
spec:
  backupSchedule:
    cronExpression: '@every 6h'
    podTemplate:
      controller: {}
      metadata: {}
      spec:
        resources: {}
    s3:
      bucket: REDACTED
      endpoint: s3.amazonaws.com
      prefix: REDACTED
    storageSecretName: REDACTED
  databaseSecret:
    secretName: mongo-auth
  monitor:
    agent: prometheus.io/coreos-operator
    prometheus:
      interval: 10s
      labels:
        app: kubedb-mongo
        prometheus: monitoring
      namespace: monitoring
      port: 56790
    resources: {}
  podTemplate:
    controller: {}
    metadata: {}
    spec:
      livenessProbe: {}
      readinessProbe: {}
      resources:
        requests:
          cpu: 16m
          memory: 512Mi
  replicaSet:
    keyFile:
      secretName: mongo-keyfile
    name: rs0
  replicas: 3
  serviceTemplate:
    metadata: {}
    spec: {}
  storage:
    accessModes:
    - ReadWriteOnce
    dataSource: null
    resources:
      requests:
        storage: 50Mi
    storageClassName: standard
  storageType: Durable
  terminationPolicy: Pause
  updateStrategy:
    type: RollingUpdate
  version: 3.6-v2
status:
  observedGeneration: 4$12823691698954692130
  phase: Failed
  reason: timed out waiting for the condition

@the-redback
Contributor

the-redback commented Mar 8, 2019

It's working for me. Can you check /etc/resolv.conf inside a running pod?
Also, what does nslookup mongo-gvr return? For example (adjust the namespace and service name to yours):

kubectl run -it -n demo --rm --restart=Never dnsutils --image=tutum/dnsutils  --command -- bash -c 'nslookup mgo-replicaset-gvr; cat /etc/resolv.conf'

@endrec
Author

endrec commented Mar 8, 2019

$ kubectl run -it -n rungway --rm --restart=Never dnsutils --image=tutum/dnsutils  --command -- bash -c 'nslookup mongo-gvr; cat /etc/resolv.conf'
Server:		172.20.0.10
Address:	172.20.0.10#53

Non-authoritative answer:
Name:	mongo-gvr.rungway.svc.cluster.local
Address: 10.0.77.31

nameserver 172.20.0.10
search rungway.svc.cluster.local svc.cluster.local cluster.local eu-west-1.compute.internal us-west-2.compute.internal
options ndots:5
pod "dnsutils" deleted

@the-redback
Contributor

I am still unable to reproduce the issue. Are you using any extra plugins that may affect DNS queries?

@endrec
Author

endrec commented Mar 8, 2019

Not that I'm aware of.

@endrec
Author

endrec commented Mar 8, 2019

I found a lot of these in my kube-dns log:

I0308 11:46:53.282402       1 dns.go:601] Could not find endpoints for service "kubedb" in namespace "rungway". DNS records will be created once endpoints show up.

I do have a headless service called kubedb, is this something created by the operator?

apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2019-02-26T15:35:21Z
  name: kubedb
  namespace: rungway
  resourceVersion: "35254"
spec:
  clusterIP: None
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

@endrec
Author

endrec commented Mar 8, 2019

I think this is the underlying issue: golang/go#27546
Unfortunately EKS runs kube-dns 1.14.10.

@the-redback
Contributor

😮

I used eksctl to create the cluster in EKS. That used CoreDNS instead of kube-dns (AFAIK), which is why I didn't face that problem, I guess.

@endrec
Author

endrec commented Mar 11, 2019

You created a new 1.11 cluster, where the default is CoreDNS, while our cluster was upgraded from 1.10, where the default was kube-dns... A bit confusing...
Source: https://docs.aws.amazon.com/eks/latest/userguide/coredns.html

@the-redback
Contributor

the-redback commented Mar 11, 2019

I see.

I have pushed new mongodb images to my Docker Hub registry, where peer-finder is built with Go 1.10.

Can you create this new MongoDBVersion and deploy a new replica set using version 3.6-dev?

apiVersion: catalog.kubedb.com/v1alpha1
kind: MongoDBVersion
metadata:
  name: "3.6-dev"
  labels:
    app: kubedb
spec:
  version: "3.6"
  db:
    image: "maruftuhin/mongo:3.6-v2"
  exporter:
    image: "maruftuhin/mongodb_exporter:v1.0.0"
  tools:
    image: "maruftuhin/mongo-tools:3.6-v2"
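
A new replica set would then select this version by name through `spec.version`. A minimal sketch, assuming the field layout of the manifest posted earlier in this thread; the object name `mongo-dev` is hypothetical, the remaining values are copied from that manifest, and the result has not been validated against the operator:

```yaml
# Sketch only: a new MongoDB object that selects the 3.6-dev MongoDBVersion.
apiVersion: kubedb.com/v1alpha1
kind: MongoDB
metadata:
  name: mongo-dev            # hypothetical name for the test deployment
  namespace: rungway
spec:
  version: "3.6-dev"         # refers to the MongoDBVersion named "3.6-dev"
  replicas: 3
  replicaSet:
    name: rs0
    keyFile:
      secretName: mongo-keyfile
  storageType: Durable
  storage:
    storageClassName: standard
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 50Mi
  updateStrategy:
    type: RollingUpdate
  terminationPolicy: Pause
```

Deploying under a separate name keeps the stuck replica set untouched while the rebuilt images are checked.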

@endrec
Author

endrec commented Mar 11, 2019

Yep, that version is working fine, thanks.

@the-redback
Contributor

I am updating the kubedb mongodb images. 👍

@the-redback
Contributor

I have updated the kubedb/mongo images (non-deprecated versions). You need to pull the kubedb/mongo images on all of your nodes. Let me know if everything is working.

@endrec
Author

endrec commented Mar 11, 2019

Looks fine, thanks for the quick solution!

@the-redback
Contributor

No problem. 🙂 👍

the-redback added 4 commits to kmodules/peer-finder that referenced this issue Apr 2, 2019
the-redback added a commit to kmodules/peer-finder that referenced this issue Apr 9, 2019