StatefulSet DNS not functioning well after upgrade to 1.7.3 #50227

Closed
weiwei04 opened this issue Aug 7, 2017 · 11 comments
Labels
area/dns kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/network Categorizes an issue or PR as relevant to SIG Network.

Comments

@weiwei04
Contributor

weiwei04 commented Aug 7, 2017

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

After I upgraded from 1.6.6 to 1.7.3, for the StatefulSet drone-server with its headless service drone-server,
nslookup drone-server-0.drone-server returns: nslookup: can't resolve 'drone-server-0.drone-server': Name does not resolve

What you expected to happen:

nslookup should return a valid DNS record for drone-server-0.drone-server.
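
For reference, a minimal in-cluster check of the record I expect (a sketch; the FQDN is assembled from the Service/StatefulSet manifests below, and the short name in the nslookup above relies on the pod's DNS search path):

```go
package main

import (
	"fmt"
	"net"
	"os"
)

func main() {
	// Run inside a pod in the cluster; expected to print the pod IP of
	// drone-server-0. After the upgrade this fails with "no such host".
	addrs, err := net.LookupHost("drone-server-0.drone-server.spock.svc.cluster.local")
	if err != nil {
		fmt.Fprintln(os.Stderr, "lookup failed:", err)
		os.Exit(1)
	}
	fmt.Println(addrs)
}
```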

How to reproduce it (as minimally and precisely as possible):

kubectl apply the YAML from https://github.com/yaoshipu/aslan-platform/tree/spock/reaper/kubernetes/http on a k8s 1.6.6 cluster

I believe my StatefulSet YAML file for drone-server is valid:

---
apiVersion: v1
kind: Service
metadata:
  name: drone-server
  namespace: spock
  labels:
    ke-app: drone
    ke-svc: server
spec:
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 80
  selector:
    ke-app: drone
    ke-svc: server
  clusterIP: None
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: drone-server
  namespace: spock
  labels:
    ke-app: drone
    ke-svc: server
spec:
  serviceName: drone-server
  replicas: 1
  template:
    metadata:
      labels:
        ke-app: drone
        ke-svc: server
    spec:
      imagePullSecrets:
        - name: drone-secrets
      containers:
        - image: drone/drone:0.7.3
          name: drone-server
          env:
            - name: DOCKER_API_VERSION
              valueFrom:
                configMapKeyRef:
                  name: drone-config
                  key: docker.api.version
            - name: DRONE_SERVER_HOST
              valueFrom:
                configMapKeyRef:
                  name: drone-config
                  key: server.host
            - name: DRONE_SERVER_ADDR
              value: ":80"
            - name: DRONE_DEBUG
              valueFrom:
                configMapKeyRef:
                  name: drone-config
                  key: server.debug.is.enabled
            - name: DRONE_SECRET
              valueFrom:
                secretKeyRef:
                  name: drone-secrets
                  key: server.secret
            - name: DRONE_DATABASE_DRIVER
              valueFrom:
                configMapKeyRef:
                  name: drone-config
                  key: server.database.driver
            - name: DRONE_DATABASE_DATASOURCE
              valueFrom:
                configMapKeyRef:
                  name: drone-config
                  key: server.database.config
            - name: DRONE_OPEN
              valueFrom:
                configMapKeyRef:
                  name: drone-config
                  key: server.is.open
            - name: DRONE_ORGS
              valueFrom:
                configMapKeyRef:
                  name: drone-config
                  key: server.orgs.list
            - name: DRONE_ADMIN
              valueFrom:
                configMapKeyRef:
                  name: drone-config
                  key: server.admin.list
            - name: DRONE_ADMIN_ALL
              valueFrom:
                configMapKeyRef:
                  name: drone-config
                  key: server.admin.everyone.is.admin
            - name: DRONE_GITHUB
              valueFrom:
                configMapKeyRef:
                  name: drone-config
                  key: server.remote.github.is.enabled
            - name: DRONE_GITHUB_CLIENT
              valueFrom:
                configMapKeyRef:
                  name: drone-config
                  key: server.remote.github.client.id
            - name: DRONE_GITHUB_SECRET
              valueFrom:
                secretKeyRef:
                  name: drone-secrets
                  key: server.remote.github.secret
            - name: DRONE_GITHUB_GIT_USERNAME
              valueFrom:
                secretKeyRef:
                  name: drone-secrets
                  key: server.remote.github.username
          ports:
            - containerPort: 80
              protocol: TCP
          resources:
            limits:
              cpu: 2000m
              memory: 4Gi
            requests:
              cpu: 1000m
              memory: 2Gi

Anything else we need to know?:

This happened after I upgraded k8s from 1.6.6 to 1.7.3:

  1. I created this StatefulSet on a k8s cluster at version 1.6.6 (apiserver, scheduler, controller-manager), and it worked as I expected.

  2. I upgraded the cluster to 1.7.3 by upgrading the apiserver, scheduler, controller-manager, and kubelet step by step.

  3. kube-dns version: 1.14.1 (gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.1)

  4. kube-dns logs:

Before the upgrade:

I0729 06:42:17.697058       1 dns.go:264] New service: drone-server
I0729 06:42:17.697068       1 dns.go:548] Could not find endpoints for service "drone-server" in namespace "spock". DNS records will be created once endpoints show up.
I0729 06:42:17.697406       1 dns.go:264] New service: drone-server
I0729 06:42:17.697418       1 dns.go:548] Could not find endpoints for service "drone-server" in namespace "default". DNS records will be created once endpoints show up.
I0729 06:42:17.704426       1 dns.go:494] Added SRV record &{Host:drone-server-0.drone-server.default.svc.cluster.local. Port:80 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0729 06:42:17.705562       1 dns.go:494] Added SRV record &{Host:drone-server-0.drone-server.spock.svc.cluster.local. Port:80 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0729 06:47:17.634425       1 dns.go:494] Added SRV record &{Host:drone-server-0.drone-server.spock.svc.cluster.local. Port:80 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0729 06:47:17.634713       1 dns.go:494] Added SRV record &{Host:drone-server-0.drone-server.default.svc.cluster.local. Port:80 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}

After the upgrade:

I0807 03:42:19.069513       1 dns.go:494] Added SRV record &{Host:drone-server-0.drone-server.spock.svc.cluster.local. Port:80 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0807 03:50:05.578527       1 dns.go:494] Added SRV record &{Host:drone-server-0.drone-server.default.svc.cluster.local. Port:80 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0807 03:50:05.579140       1 dns.go:494] Added SRV record &{Host:drone-server-0.drone-server.spock.svc.cluster.local. Port:80 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0807 03:50:05.580987       1 dns.go:264] New service: drone-server
I0807 03:50:05.581099       1 dns.go:494] Added SRV record &{Host:drone-server-0.drone-server.spock.svc.cluster.local. Port:80 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0807 03:50:05.583750       1 dns.go:264] New service: drone-server
I0807 03:50:05.583793       1 dns.go:494] Added SRV record &{Host:drone-server-0.drone-server.default.svc.cluster.local. Port:80 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0807 03:52:36.021686       1 dns.go:494] Added SRV record &{Host:3465346530386466.drone-server.default.svc.cluster.local. Port:80 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0807 03:52:36.575082       1 dns.go:494] Added SRV record &{Host:3238653332616432.drone-server.spock.svc.cluster.local. Port:80 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0807 03:55:05.527063       1 dns.go:264] New service: drone-server
I0807 03:55:05.527005       1 dns.go:494] Added SRV record &{Host:3238653332616432.drone-server.spock.svc.cluster.local. Port:80 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0807 03:55:05.527128       1 dns.go:494] Added SRV record &{Host:3238653332616432.drone-server.spock.svc.cluster.local. Port:80 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0807 03:55:05.527611       1 dns.go:494] Added SRV record &{Host:3465346530386466.drone-server.default.svc.cluster.local. Port:80 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0807 03:55:05.529048       1 dns.go:264] New service: drone-server
I0807 03:55:05.529116       1 dns.go:494] Added SRV record &{Host:3465346530386466.drone-server.default.svc.cluster.local. Port:80 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0807 04:00:05.526963       1 dns.go:494] Added SRV record &{Host:3238653332616432.drone-server.spock.svc.cluster.local. Port:80 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0807 04:00:05.527542       1 dns.go:264] New service: drone-server

It seems drone-server-0 changed to 3465346530386466, a random value?

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"6+", GitVersion:"v1.6.1-2+ed9e3d33a07093", GitCommit:"ed9e3d33a07093451cdd6fc50027235cbf249df6", GitTreeState:"clean", BuildDate:"2017-04-13T04:49:56Z", GoVersion:"go1.7.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3", GitCommit:"2c2fe6e8278a5db2d15a013987b53968c743f2a1", GitTreeState:"clean", BuildDate:"2017-08-03T06:43:48Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: Bare Metal
  • OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="16.04.2 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.2 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
  • Kernel (e.g. uname -a):
Linux cs23 4.4.0-83-generic #106-Ubuntu SMP Mon Jun 26 17:54:43 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: kubeadm
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Aug 7, 2017
@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Aug 7, 2017
@weiwei04
Contributor Author

weiwei04 commented Aug 7, 2017

/sig area/dns

@xiangpengzhao
Contributor

/sig network
/area dns

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. area/dns labels Aug 7, 2017
@k8s-github-robot k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Aug 7, 2017
@agol

agol commented Aug 12, 2017

Encountered the same issue with StatefulSets after nodes upgraded from 1.7.2 to 1.7.3 on GKE. Reverting back to 1.7.2 solved the issue.

@zaghaghi

Having the same issue with StatefulSets on Kubernetes 1.7.3 with the mongo image.

@MrHohn
Member

MrHohn commented Aug 16, 2017

Hmm... checked a bit, didn't spot relevant changes on the kube-dns side.

Seems like that random number came from HashServiceRecord(): https://github.com/kubernetes/dns/blob/1.14.4/pkg/dns/util/util.go#L61-L89 (example: https://play.golang.org/p/3CDC2Wz3tp).

// HashServiceRecord hashes the string representation of a DNS
// message.
func HashServiceRecord(msg *msg.Service) string {
	s := fmt.Sprintf("%v", msg)
	h := fnv.New32a()
	h.Write([]byte(s))
	return fmt.Sprintf("%x", h.Sum32())
}
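
For illustration, a self-contained sketch of that hashing (the Service struct below is a trimmed stand-in for the skydns msg.Service type, keeping only enough fields to feed the "%v" rendering):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Service is a trimmed stand-in for the skydns msg.Service type.
type Service struct {
	Host string
	Port int
}

// hashServiceRecord mirrors util.HashServiceRecord quoted above:
// FNV-1a over the fmt "%v" rendering of the record, printed as hex.
func hashServiceRecord(svc *Service) string {
	s := fmt.Sprintf("%v", svc)
	h := fnv.New32a()
	h.Write([]byte(s))
	return fmt.Sprintf("%x", h.Sum32())
}

func main() {
	// Deterministic for a given record, so the "random-looking" labels
	// in the logs are actually stable content hashes.
	fmt.Println(hashServiceRecord(&Service{Host: "10.2.3.4", Port: 80}))
}
```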

That hash number is returned by GetSkyMsg() as endpointName, and would usually be overwritten by the endpoint hostname:
https://github.com/kubernetes/dns/blob/1.14.4/pkg/dns/dns.go#L486-L489

			recordValue, endpointName := util.GetSkyMsg(endpointIP, 0)
			if hostLabel, exists := getHostname(address); exists {
				endpointName = hostLabel
			}

And eventually it is passed in generateSRVRecordValue(): https://github.com/kubernetes/dns/blob/1.14.4/pkg/dns/dns.go#L526-L533

So it looks like the direct cause of this issue is that the received EndpointAddress doesn't have Hostname set (which is supposed to be drone-server-0), but I'm not sure why yet...
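
To make the failure mode concrete, here is a minimal sketch of that fallback (the EndpointAddress type is a trimmed stand-in for the Kubernetes API type, and the hash input is simplified compared to GetSkyMsg):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// EndpointAddress is a trimmed stand-in for the Kubernetes API type;
// the endpoints controller is supposed to populate Hostname for
// StatefulSet pods.
type EndpointAddress struct {
	IP       string
	Hostname string
}

// dnsLabelFor sketches the naming decision quoted above: use the pod's
// hostname when present, otherwise fall back to a content hash.
func dnsLabelFor(addr EndpointAddress) string {
	if addr.Hostname != "" {
		return addr.Hostname // the expected "drone-server-0" label
	}
	h := fnv.New32a()
	h.Write([]byte(addr.IP))
	return fmt.Sprintf("%x", h.Sum32()) // the hash-named records from the logs
}

func main() {
	fmt.Println(dnsLabelFor(EndpointAddress{IP: "10.2.3.4", Hostname: "drone-server-0"}))
	fmt.Println(dnsLabelFor(EndpointAddress{IP: "10.2.3.4"})) // hash label fallback
}
```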

@CallMeFoxie
Contributor

Could be related to #48327 - specifically the bottom comments related to DNS?

@MrHohn
Member

MrHohn commented Aug 17, 2017

@CallMeFoxie Thanks, that seems to be the cause.

@imacube

imacube commented Sep 20, 2017

Encountered this problem in kube 1.7.6. Found that DNS resolution fails for the StatefulSet if the service name has a hyphen (-) in it, even though the service lists the pods as endpoints.

To reproduce use the web.yaml example from: https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 11, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 11, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/close
