
Pets refusing to be exposed via DNS in 1.4.7 #39360

Closed
etotheipi opened this issue Jan 3, 2017 · 3 comments

Comments

@etotheipi

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.):

petset
statefulset
dns

I found #38870


Is this a BUG REPORT or FEATURE REQUEST? (choose one):
Bug report

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.7", GitCommit:"92b4f971662de9d8770f8dcd2ee01ec226a6f6c0", GitTreeState:"clean", BuildDate:"2016-12-10T04:49:33Z", GoVersion:"go1.7.1", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.7", GitCommit:"92b4f971662de9d8770f8dcd2ee01ec226a6f6c0", GitTreeState:"clean", BuildDate:"2016-12-10T04:43:42Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:
AWS cluster setup with kops 1.4.4

What happened:
I have been battling PetSets in 1.4.7 for quite some time now. I have deleted the petset, and the PVs & PVCs it created, multiple times, and re-created all of it under different names in an attempt to fully decouple each attempt from the previous ones. However, things still seem to be out of whack, with no way to resolve it. At the moment I have a headless service for the pets, but it is exposed as if it were not headless: it has no clusterIP, yet its DNS name resolves to any one of the pets, and the individual pets do not have resolvable names. Further, the pets seem to have hostnames tied to a previous service, even though that service doesn't exist any more.

Here's what it looks like: 4 nodes, starting 4 citus workers as a petset:

$ k get pv,pvc,petset,pods,svc
NAME                                          CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS    CLAIM                             REASON    AGE
pv/awsebs-pv001                               500Gi      RWO           Retain          Bound     kube-system/kube-registry-pvc               11d
pv/pvc-71a4c936-cf13-11e6-8c73-0a9b3314e7e6   100Gi      RWO           Delete          Bound     default/citus-claim-citus-pet-0             2d
pv/pvc-71a6667b-cf13-11e6-8c73-0a9b3314e7e6   100Gi      RWO           Delete          Bound     default/citus-claim-citus-pet-1             2d
pv/pvc-71ad3a47-cf13-11e6-8c73-0a9b3314e7e6   100Gi      RWO           Delete          Bound     default/citus-claim-citus-pet-2             2d
pv/pvc-71ae1655-cf13-11e6-8c73-0a9b3314e7e6   100Gi      RWO           Delete          Bound     default/citus-claim-citus-pet-3             2d
NAME                          STATUS    VOLUME                                     CAPACITY   ACCESSMODES   AGE
pvc/citus-claim-citus-pet-0   Bound     pvc-71a4c936-cf13-11e6-8c73-0a9b3314e7e6   100Gi      RWO           2d
pvc/citus-claim-citus-pet-1   Bound     pvc-71a6667b-cf13-11e6-8c73-0a9b3314e7e6   100Gi      RWO           2d
pvc/citus-claim-citus-pet-2   Bound     pvc-71ad3a47-cf13-11e6-8c73-0a9b3314e7e6   100Gi      RWO           2d
pvc/citus-claim-citus-pet-3   Bound     pvc-71ae1655-cf13-11e6-8c73-0a9b3314e7e6   100Gi      RWO           2d
NAME                DESIRED   CURRENT   AGE
petsets/citus-pet   4         4         2d
NAME                                      READY     STATUS    RESTARTS   AGE
po/citus-pet-0                            1/1       Running   0          2d
po/citus-pet-1                            1/1       Running   0          2d
po/citus-pet-2                            1/1       Running   0          2d
po/citus-pet-3                            1/1       Running   0          2d
po/dns-tool                               1/1       Running   0          2d
NAME                    CLUSTER-IP      EXTERNAL-IP        PORT(S)          AGE
svc/citus-worker-pets   None            <none>             6543/TCP         2d
svc/kubernetes          100.64.0.1      <none>             443/TCP          11d

The service name is citus-worker-pets. However, the pets disagree:

[root@citus-pet-1 postgres]# hostname -f
citus-pet-1.citus-worker-service.default.svc.cluster.local

[root@citus-pet-1 postgres]# nslookup citus-pet-1.citus-worker-service.default.svc.cluster.local
...
** server can't find citus-pet-1.citus-worker-service.default.svc.cluster.local: NXDOMAIN

Looking it up under the real service name (citus-worker-pets) does not work either:

[root@citus-pet-1 postgres]# nslookup citus-pet-1.citus-worker-pets.default.svc.cluster.local
...
** server can't find citus-pet-1.citus-worker-pets.default.svc.cluster.local: NXDOMAIN

However, the service with the expected name is resolvable:

[root@citus-pet-1 postgres]# nslookup citus-worker-pets.default.svc.cluster.local
...
Name:	citus-worker-pets.default.svc.cluster.local
Address: 100.96.7.5
Name:	citus-worker-pets.default.svc.cluster.local
Address: 100.96.7.6
Name:	citus-worker-pets.default.svc.cluster.local
Address: 100.96.6.6
Name:	citus-worker-pets.default.svc.cluster.local
Address: 100.96.6.7

I have tested all the commands above from a separate pod that is unrelated to the petset, and see the same thing. The service is headless:

kind: Service
apiVersion: v1
metadata:
  name: citus-worker-pets
  labels:
    app: in-citus
    type: svc
spec:
  ports:
    - port: 6543
      name: worker-port
  clusterIP: None
  selector:
    app: in-citus
    role: worker
    type: pod
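
(If it helps anyone reproducing this: a quick way to confirm a headless service is actually selecting the pets is to list its endpoints. This is just a suggested check, I haven't pasted the output here:)

$ kubectl get endpoints citus-worker-pets
$ kubectl describe svc citus-worker-pets

Both should show one address per running pet; if they come back empty, the service selector doesn't match the pod labels.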

And the PetSet definition (recall this is 1.4.7):

kind: PetSet
apiVersion: "apps/v1alpha1"
metadata:
  name: citus-pet
spec:
  serviceName: "citus-worker-service"
  replicas: 4
  template:
    metadata:
      name: citus-worker
      labels:
        app: in-citus
        role: worker
        type: pod
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: citus-worker
          image: localhost:5000/in-citus
          env:
            - name: PGPORT
              value: "5433"
            - name: CITUS_ROLE
              value: worker
          ports:
            - containerPort: 6543
              name: pgbouncer-port
          volumeMounts:
            - name: citus-claim
              mountPath: /var/lib/pgsql/9.6
  volumeClaimTemplates:
    - metadata:
        name: citus-claim
        annotations: 
          volume.alpha.kubernetes.io/storage-class: aws-ebs-gp2
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 100Gi

What you expected to happen:
I expect the pets to be individually resolvable via ...svc.cluster.local

Anything else do we need to know:
Note that the petset has been restarted and it still has the same hostname as before, based on a service that doesn't exist any more. I'm not sure how the pet even gets its hostname (the FQDN returned by hostname -f lives in /etc/hosts). What if the petset is started after the service that exposes it? What do I have to change in a petset to make k8s treat it as a different petset and avoid any "memory" like this? I changed the names in the petset spec, but I guess that wasn't enough.

@etotheipi
Author

I forgot to tag @bprashanth, as recommended by @justinsb.

@ravilr
Contributor

ravilr commented Jan 3, 2017

@etotheipi the petset's spec.serviceName should refer to an existing headless service resource name. See http://kubernetes.io/docs/user-guide/petset/#network-identity
You may want to try changing it to 'serviceName: citus-worker-pets' instead of citus-worker-service in the petset spec.
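
Roughly, only that one line in the petset spec needs to change (a sketch, assuming the rest of your spec stays as posted):

kind: PetSet
apiVersion: "apps/v1alpha1"
metadata:
  name: citus-pet
spec:
  serviceName: "citus-worker-pets"
  replicas: 4
  ...

With that in place, the per-pet records should be published as citus-pet-0.citus-worker-pets.default.svc.cluster.local, and so on for the other pets.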

@etotheipi
Author

etotheipi commented Jan 3, 2017

Argh! I missed that, somehow. I switched it to the correct service name and it works now. That also answers my question about how the pets know how to set up /etc/hosts: it requires the service name you actually intend to use to be in the petset spec.
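
For anyone hitting the same thing later: after recreating the petset with serviceName: citus-worker-pets, the lookups that failed above succeed. Roughly (IPs elided, so take this as a sketch rather than a paste):

[root@citus-pet-1 postgres]# hostname -f
citus-pet-1.citus-worker-pets.default.svc.cluster.local

[root@citus-pet-1 postgres]# nslookup citus-pet-1.citus-worker-pets.default.svc.cluster.local
(resolves to the pod's IP now, no more NXDOMAIN)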

Not a bug, just user error. Thanks for finding that. Closing.
