
consul-inject: Issue with service registration for multiple pods of the same deployment #36

Closed
pavel-mikhalchuk opened this issue Nov 25, 2018 · 2 comments


@pavel-mikhalchuk

This applies to both GKE and local Kubernetes (docker-for-desktop).

Steps to reproduce (assuming a running Kubernetes cluster with Helm set up):

  1. Deploy consul-k8s using the Helm chart consul-helm.zip. Modifications to the original chart from the latest tag: server replica count set to 1, server pod disruption budget maxUnavailable set to 1, Consul image set to "consul:1.4.0", and imageK8S set to "hashicorp/consul-k8s:0.2.1".

  2. Deploy a simple app using this chart: modern-app.zip. The number of replicas is 2.

Workload:

[Screenshot, not preserved]

I expected two services to be registered in Consul, but:

Consul:

[Screenshot, not preserved]

Consul agent:

[Screenshot, not preserved]

The modern-app pods, as shown by kubectl describe:

kubectl describe pod modern-app-8d7cbf79-l5xxc
Name:               modern-app-8d7cbf79-l5xxc
Namespace:          default
Priority:           0
PriorityClassName:  
Node:               gke-your-first-cluster-1-pool-1-c7dcb3b3-r8wn/10.128.0.2
Start Time:         Sun, 25 Nov 2018 15:25:51 +0800
Labels:             app.kubernetes.io/instance=modern-app
                    app.kubernetes.io/name=modern-app
                    pod-template-hash=48376935
Annotations:        consul.hashicorp.com/connect-inject: true
                    consul.hashicorp.com/connect-inject-status: injected
                    kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container modern-app
Status:             Running
IP:                 10.4.0.23
Controlled By:      ReplicaSet/modern-app-8d7cbf79
Init Containers:
  consul-connect-inject-init:
    Container ID:  docker://c30e66e20007cb100644ec708c88d951e350b69efcd3bb26cb7087e047fe5a73
    Image:         consul:1.4.0
    Image ID:      docker-pullable://consul@sha256:eb39173d00ce0efe6f586042e1f9f3b24741468285210074a84b9b1a9dbe13ea
    Port:          
    Host Port:     
    Command:
      /bin/sh
      -ec
      export CONSUL_HTTP_ADDR="${HOST_IP}:8500"
      export CONSUL_GRPC_ADDR="${HOST_IP}:8502"
      # Register the service. The HCL is stored in the volume so that
      # the preStop hook can access it to deregister the service.
      cat <<EOF >/consul/connect-inject/service.hcl
      services {
        id   = "-modern-app-proxy"
        name = "modern-app-proxy"
        kind = "connect-proxy"
        address = "${POD_IP}"
        port = 20000

        proxy {
          destination_service_name = "modern-app"
          destination_service_id = "modern-app"
          local_service_address = "127.0.0.1"
          local_service_port = 8080

        }

        checks {
          name = "Proxy Public Listener"
          tcp = "${POD_IP}:20000"
          interval = "10s"
          deregister_critical_service_after = "10m"
        }

        checks {
          name = "Destination Alias"
          alias_service = "modern-app"
        }
      }
      EOF

      /bin/consul services register /consul/connect-inject/service.hcl

      # Generate the envoy bootstrap code
      /bin/consul connect envoy \
        -proxy-id="-modern-app-proxy" \
        -bootstrap > /consul/connect-inject/envoy-bootstrap.yaml

      # Copy the Consul binary
      cp /bin/consul /consul/connect-inject/consul
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 25 Nov 2018 15:25:52 +0800
      Finished:     Sun, 25 Nov 2018 15:25:53 +0800
    Ready:          True
    Restart Count:  0
    Environment:
      HOST_IP:   (v1:status.hostIP)
      POD_IP:    (v1:status.podIP)
    Mounts:
      /consul/connect-inject from consul-connect-inject-data (rw)
Containers:
  modern-app:
    Container ID:   docker://b5a1a07637b9f36a30bd0fd04340d6418b3c7d01c3c4cf096f96878400444905
    Image:          pavelmikhalchuk/modern-app:latest
    Image ID:       docker-pullable://pavelmikhalchuk/modern-app@sha256:9c91e2bb40091447f8cedc80e6626a103561f531eee6cf1e97ffc0774ba0aae2
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sun, 25 Nov 2018 15:26:04 +0800
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        100m
    Environment:  
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qdcww (ro)
  consul-connect-envoy-sidecar:
    Container ID:  docker://47bce24c31565665e2255cbcb00ef473f7df183ea8a31c58e659e88dc88bfe6b
    Image:         envoyproxy/envoy-alpine:v1.8.0
    Image ID:      docker-pullable://envoyproxy/envoy-alpine@sha256:c29cb89f1f7553fbad9c68f572ae19097e0531bb82ea55c6a5083452997b1aea
    Port:          
    Host Port:     
    Command:
      envoy
      --config-path
      /consul/connect-inject/envoy-bootstrap.yaml
    State:          Running
      Started:      Sun, 25 Nov 2018 15:26:05 +0800
    Ready:          True
    Restart Count:  0
    Environment:
      HOST_IP:   (v1:status.hostIP)
    Mounts:
      /consul/connect-inject from consul-connect-inject-data (rw)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-qdcww:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-qdcww
    Optional:    false
  consul-connect-inject-data:
    Type:        EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
QoS Class:       Burstable
Node-Selectors:  
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          

kubectl describe pod modern-app-8d7cbf79-zzdr7
Name:               modern-app-8d7cbf79-zzdr7
Namespace:          default
Priority:           0
PriorityClassName:  
Node:               gke-your-first-cluster-1-pool-2-e7b20083-fdtg/10.128.0.3
Start Time:         Sun, 25 Nov 2018 15:26:05 +0800
Labels:             app.kubernetes.io/instance=modern-app
                    app.kubernetes.io/name=modern-app
                    pod-template-hash=48376935
Annotations:        consul.hashicorp.com/connect-inject: true
                    consul.hashicorp.com/connect-inject-status: injected
                    kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container modern-app
Status:             Running
IP:                 10.4.1.7
Controlled By:      ReplicaSet/modern-app-8d7cbf79
Init Containers:
  consul-connect-inject-init:
    Container ID:  docker://ea3926157b70db1222d82d61f8bd8877d98fe671be4282448f16ebf714e51f5e
    Image:         consul:1.4.0
    Image ID:      docker-pullable://consul@sha256:eb39173d00ce0efe6f586042e1f9f3b24741468285210074a84b9b1a9dbe13ea
    Port:          
    Host Port:     
    Command:
      /bin/sh
      -ec
      export CONSUL_HTTP_ADDR="${HOST_IP}:8500"
      export CONSUL_GRPC_ADDR="${HOST_IP}:8502"
      # Register the service. The HCL is stored in the volume so that
      # the preStop hook can access it to deregister the service.
      cat <<EOF >/consul/connect-inject/service.hcl
      services {
        id   = "-modern-app-proxy"
        name = "modern-app-proxy"
        kind = "connect-proxy"
        address = "${POD_IP}"
        port = 20000

        proxy {
          destination_service_name = "modern-app"
          destination_service_id = "modern-app"
          local_service_address = "127.0.0.1"
          local_service_port = 8080

        }

        checks {
          name = "Proxy Public Listener"
          tcp = "${POD_IP}:20000"
          interval = "10s"
          deregister_critical_service_after = "10m"
        }

        checks {
          name = "Destination Alias"
          alias_service = "modern-app"
        }
      }
      EOF

      /bin/consul services register /consul/connect-inject/service.hcl

      # Generate the envoy bootstrap code
      /bin/consul connect envoy \
        -proxy-id="-modern-app-proxy" \
        -bootstrap > /consul/connect-inject/envoy-bootstrap.yaml

      # Copy the Consul binary
      cp /bin/consul /consul/connect-inject/consul
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 25 Nov 2018 15:26:07 +0800
      Finished:     Sun, 25 Nov 2018 15:26:08 +0800
    Ready:          True
    Restart Count:  0
    Environment:
      HOST_IP:   (v1:status.hostIP)
      POD_IP:    (v1:status.podIP)
    Mounts:
      /consul/connect-inject from consul-connect-inject-data (rw)
Containers:
  modern-app:
    Container ID:   docker://de8fbbcd2bef1c071d40b02f9c25d84f65d0e329d12ec92d3b14555b8e5f4e28
    Image:          pavelmikhalchuk/modern-app:latest
    Image ID:       docker-pullable://pavelmikhalchuk/modern-app@sha256:9c91e2bb40091447f8cedc80e6626a103561f531eee6cf1e97ffc0774ba0aae2
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sun, 25 Nov 2018 15:26:18 +0800
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        100m
    Environment:  
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qdcww (ro)
  consul-connect-envoy-sidecar:
    Container ID:  docker://edf75283cb3b3724012182bb40e4b16141e57216ceac5439a0aff6f347670609
    Image:         envoyproxy/envoy-alpine:v1.8.0
    Image ID:      docker-pullable://envoyproxy/envoy-alpine@sha256:c29cb89f1f7553fbad9c68f572ae19097e0531bb82ea55c6a5083452997b1aea
    Port:          
    Host Port:     
    Command:
      envoy
      --config-path
      /consul/connect-inject/envoy-bootstrap.yaml
    State:          Running
      Started:      Sun, 25 Nov 2018 15:26:18 +0800
    Ready:          True
    Restart Count:  0
    Environment:
      HOST_IP:   (v1:status.hostIP)
    Mounts:
      /consul/connect-inject from consul-connect-inject-data (rw)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-qdcww:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-qdcww
    Optional:    false
  consul-connect-inject-data:
    Type:        EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
QoS Class:       Burstable
Node-Selectors:  
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          

Both pods register the proxy service with the same ID, "-modern-app-proxy", which is suspicious: the leading hyphen suggests an empty prefix. I believe this is the root cause of the issue.

I looked at the consul-k8s code here:

# Register the service. The HCL is stored in the volume so that
# the preStop hook can access it to deregister the service.
cat <<EOF >/consul/connect-inject/service.hcl
services {
  id   = "{{ .PodName }}-{{ .ServiceName }}-proxy"
  name = "{{ .ServiceName }}-proxy"

and then here:

func (h *Handler) containerInit(pod *corev1.Pod) (corev1.Container, error) {
	data := initContainerCommandData{
		PodName:     pod.Name,
		ServiceName: pod.Annotations[annotationService],

PodName is the pod.Name field of the payload that arrives in the POST request to the /mutate endpoint.

I was fairly sure this field would contain the generated name of the pod, but it turned out to be empty (Name: is blank, while GenerateName: is modern-app-8d7cbf79-):

&Pod{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:,GenerateName:modern-app-8d7cbf79-,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:,DeletionGracePeriodSeconds:nil,Labels:map[string]string{app.kubernetes.io/instance: modern-app,app.kubernetes.io/name: modern-app,pod-template-hash: 48376935,},Annotations:map[string]string{consul.hashicorp.com/connect-inject: true,consul.hashicorp.com/connect-service: modern-app,consul.hashicorp.com/connect-service-port: http,kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container modern-app,},OwnerReferences:[{apps/v1 ReplicaSet modern-app-8d7cbf79 568dd570-f083-11e8-8652-42010a8000cf 0xc0003bfafa 0xc0003bfafb}],Finalizers:[],ClusterName:,Initializers:nil,},Spec:PodSpec{Volumes:[{default-token-qdcww {nil nil nil nil nil SecretVolumeSource{SecretName:default-token-qdcww,Items:[],DefaultMode:nil,Optional:nil,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}}],Containers:[{modern-app pavelmikhalchuk/modern-app:latest [] []  [{http 0 8080 TCP }] [] [] {map[] map[cpu:{{100 -3} {} 100m DecimalSI}]} [{default-token-qdcww true /var/run/secrets/kubernetes.io/serviceaccount  }] [] nil nil nil /dev/termination-log File Always nil false false false}],RestartPolicy:Always,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:default,DeprecatedServiceAccount:default,NodeName:,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[],},ImagePullSecrets:[],Hostname:,Subdomain:,Affinity:nil,SchedulerName:default-scheduler,InitContainers:[],AutomountServiceAccountToken:nil,Tolerations:[{node.kubernetes.io/not-ready Exists  NoExecute 0xc0001bc680} {node.kubernetes.io/unreachable Exists  NoExecute 0xc0001bc6a0}],HostAliases:[],PriorityClassName:,Priority:*0,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[],},Status:PodStatus{Phase:,Conditions:[],Message:,Reason:,HostIP:,PodIP:,StartTime:,ContainerStatuses:[],QOSClass:,InitContainerStatuses:[],NominatedNodeName:,},}
10.128.0.2 - - [25/Nov/2018:10:07:28 +0000] "POST /mutate?timeout=30s HTTP/2.0" 200 3309

I got this output by adding a little logging to container_init.go and deploying the webhook to my cluster.

I'm not sure why this is happening, so I'm looking into the Kubernetes code right now. Any pointers on where to look would be highly appreciated. I will definitely open a pull request if I find a fix. Please let me know if you have any thoughts on it; I would love to contribute to this project if I can!
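One possible workaround, offered as an assumption and not necessarily the fix that was eventually merged, is to stop relying on pod.Name at admission time and leave the pod name as a shell variable in the generated HCL. The init script's heredoc (cat <<EOF with an unquoted delimiter) already expands ${POD_IP} at runtime, and POD_IP comes from the downward API (status.podIP); a POD_NAME variable could be populated the same way from metadata.name. The sketch below imitates that two-stage expansion with os.Expand; the POD_NAME variable and the buildIDLine/expandAtRuntime helpers are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// buildIDLine is what a webhook could emit at admission time (hypothetical):
// the pod name stays a ${POD_NAME} reference instead of being read from
// pod.Name, which is empty for pods created from a GenerateName.
func buildIDLine(serviceName string) string {
	return `id = "${POD_NAME}-` + serviceName + `-proxy"`
}

// expandAtRuntime imitates the shell expansion inside the init container;
// in a real pod, env would come from the downward API (metadata.name).
func expandAtRuntime(line string, env map[string]string) string {
	return os.Expand(line, func(key string) string { return env[key] })
}

func main() {
	line := buildIDLine("modern-app")
	for _, pod := range []string{"modern-app-8d7cbf79-l5xxc", "modern-app-8d7cbf79-zzdr7"} {
		fmt.Println(expandAtRuntime(line, map[string]string{"POD_NAME": pod}))
	}
}
```

Each replica then expands the same template into its own ID (for example "modern-app-8d7cbf79-l5xxc-modern-app-proxy"), because the name is resolved after the API server has assigned it rather than during admission.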

@pavel-mikhalchuk
Author

I got help with this on the sig-api-machinery Slack channel. Here's what I was told:

[Screenshot of the Slack conversation, not preserved]

Thanks to @alvaroaleman

@adilyse
Contributor

adilyse commented Jan 24, 2019

This should be fixed with the merge of #55. Thanks for reporting the issue and submitting a fix!

@adilyse adilyse closed this as completed Jan 24, 2019
ndhanushkodi pushed a commit to ndhanushkodi/consul-k8s that referenced this issue Jul 9, 2021
Doing this PR separately to fix hashicorp#36 since it is independent of my [PR](hashicorp/consul-helm#37) for RBAC for the injector.