
Slow node registration (~20 minutes) with the symptom that the protokube image cannot be downloaded #9692

Closed
yuha0 opened this issue Aug 6, 2020 · 9 comments · Fixed by #9708


yuha0 commented Aug 6, 2020

1. What kops version are you running? The command kops version will display
this information.

1.18

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

1.18.6

3. What cloud provider are you using?
aws

4. What commands did you run? What is the simplest way to reproduce this issue?
No need to run commands; simply bring up a node and observe the slow start time.

5. What happened after the commands executed?
N/A

6. What did you expect to happen?
My previous experience with kops on the Debian AMI was that nodes registered quickly, usually within 5 minutes.

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: null
  name: use2kube.k8s.local
spec:
  api:
    loadBalancer:
      type: Internal
      additionalSecurityGroups:
      - sg-12345
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://kops-state-bucket/use2kube.k8s.local
  containerRuntime: docker
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - instanceGroup: master-us-east-2a-1
      name: "1"
    - instanceGroup: master-us-east-2a-2
      name: "2"
    - instanceGroup: master-us-east-2a-3
      name: "3"
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - instanceGroup: master-us-east-2a-1
      name: "1"
    - instanceGroup: master-us-east-2a-2
      name: "2"
    - instanceGroup: master-us-east-2a-3
      name: "3"
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    enableAdmissionPlugins:
    - PodNodeSelector
    - PodPreset
    - PodTolerationRestriction
    disableBasicAuth: true
  kubeControllerManager:
    nodeCIDRMaskSize: 26
  kubeDNS:
    provider: CoreDNS
    nodeLocalDNS:
      enabled: true
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
    kubeReserved:
        cpu: "100m"
        memory: "100Mi"
        ephemeral-storage: "1Gi"
    kubeReservedCgroup: "/kube-reserved"
    kubeletCgroups: "/kube-reserved"
    runtimeCgroups: "/kube-reserved"
    enforceNodeAllocatable: "pods,kube-reserved"
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: v1.18.6
  masterPublicName: api.use2kube.k8s.local
  networkCIDR: 10.162.0.0/16
  networkID: vpc-12345
  additionalNetworkCIDRs:
  - 10.161.7.0/24
  networking:
    calico: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 10.162.0.0/16
    name: KUBERNETES-MAIN
    type: Private
    zone: us-east-2a
    id: subnet-12345
  - cidr: 10.161.7.0/24
    name: KUBERNETES-UTILITY
    type: Utility
    zone: us-east-2a
    id: subnet-12345
  topology:
    dns:
      type: Private
    masters: private
    nodes: private

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: use2kube.k8s.local
  name: master-us-east-2a-1
spec:
  additionalSecurityGroups:
  - sg-12345
  associatePublicIp: false
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20200729
  machineType: m5.xlarge
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-2a-1
  role: Master
  rootVolumeSize: 64
  subnets:
  - KUBERNETES-MAIN

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: use2kube.k8s.local
  name: nodes
spec:
  additionalSecurityGroups:
  - sg-12345
  associatePublicIp: false
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20200729
  machineType: m5.xlarge
  maxSize: 20
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  rootVolumeSize: 128
  subnets:
  - KUBERNETES-MAIN

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

N/A

9. Anything else do we need to know?
AFAIK, protokube is not hosted on a public container registry like Docker Hub; instead, it is loaded from S3 (a GitHub kops release artifact)?
I am not sure which service is responsible for doing this, but the symptom is that protokube.service kept failing for about 20 minutes until the protokube image was somehow magically loaded. I attached the syslog from the node, which recorded all of the protokube service start attempts.

I also pasted one of the three master IG manifests and the nodes IG manifest. The weird thing is that this can only be reproduced on nodes, not on masters.

syslog.txt

hakman (Member) commented Aug 6, 2020

@yuha0 Could you check the kops-configuration service logs? Protokube is the last thing loaded when the initial setup ends, even though the service is started earlier.
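
On a standard kops node they should be available via journalctl; a minimal sketch (unit names are the usual ones created by nodeup, adjust if yours differ):

# bootstrap (nodeup) logs
journalctl -u kops-configuration.service --no-pager

# the protokube unit that keeps restarting
journalctl -u protokube.service --no-pager | tail -n 50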

yuha0 (Author) commented Aug 6, 2020

@hakman In the kops-configuration log, I see:

Aug 06 03:12:09 ip-10-162-154-92 nodeup[2041]: I0806 03:12:09.748792    2041 executor.go:103] Tasks: 71 done / 72 total; 1 can run
Aug 06 03:12:09 ip-10-162-154-92 nodeup[2041]: I0806 03:12:09.748834    2041 executor.go:174] Executing task "LoadImageTask/protokube": LoadImageTask: [https://artifacts.k8s.io/binaries/kops/1.18.0/images/protokube.tar.gz https://github.com/kubernetes/kops/releases/download/v1.18.0/images-protokube.tar.gz https://kubeupv2.s3.amazonaws.com/kops/1.18.0/images/protokube.tar.gz]
Aug 06 03:12:09 ip-10-162-154-92 nodeup[2041]: W0806 03:12:09.748969    2041 load_image.go:86] LoadImageTask checking if image present not yet implemented
Aug 06 03:12:09 ip-10-162-154-92 nodeup[2041]: I0806 03:12:09.749189    2041 http.go:78] Downloading "https://artifacts.k8s.io/binaries/kops/1.18.0/images/protokube.tar.gz"
Aug 06 03:12:11 ip-10-162-154-92 nodeup[2041]: I0806 03:12:11.147236    2041 files.go:100] Hash matched for "/var/cache/nodeup/sha256:8f09e5b87f6df65e1579c56775f7c9c6ddac315579f367129c0ed297c46840f5_https___artifacts_k8s_io_binaries_kops_1_18_0_images_protokube_tar_gz": sha256:8f09e5b87f6df65e1579c56775f7c9c6ddac315579f367129c0ed297c46840f5
Aug 06 03:12:13 ip-10-162-154-92 nodeup[2041]: I0806 03:12:13.740737    2041 load_image.go:168] running command docker load -i /tmp/loadimage617045936/https___artifacts_k8s_io_binaries_kops_1_18_0_images_protokube_tar_gz
Aug 06 03:12:14 ip-10-162-154-92 nodeup[2041]: W0806 03:12:14.181349    2041 executor.go:128] error running task "LoadImageTask/protokube" (9m55s remaining to succeed): error loading docker image with 'docker load -i /tmp/loadimage617045936/https___artifacts_k8s_io_binaries_kops_1_18_0_images_protokube_tar_gz': exit status 1: Error processing tar file(signal: killed):
Aug 06 03:12:14 ip-10-162-154-92 nodeup[2041]: I0806 03:12:14.181387    2041 executor.go:143] No progress made, sleeping before retrying 1 failed task(s)

And it printed this error message repeatedly for 20 minutes until:

Aug 06 03:32:48 ip-10-162-154-92 nodeup[33049]: I0806 03:32:48.049303   33049 executor.go:174] Executing task "LoadImageTask/protokube": LoadImageTask: [https://artifacts.k8s.io/binaries/kops/1.18.0/images/protokube.tar.gz https://github.com/kubernetes/kops/releases/download/v1.18.0/images-protokube.tar.gz https://kubeupv2.s3.amazonaws.com/kops/1.18.0/images/protokube.tar.gz]
Aug 06 03:32:48 ip-10-162-154-92 nodeup[33049]: W0806 03:32:48.049377   33049 load_image.go:86] LoadImageTask checking if image present not yet implemented
Aug 06 03:32:48 ip-10-162-154-92 nodeup[33049]: I0806 03:32:48.366008   33049 files.go:100] Hash matched for "/var/cache/nodeup/sha256:8f09e5b87f6df65e1579c56775f7c9c6ddac315579f367129c0ed297c46840f5_https___artifacts_k8s_io_binaries_kops_1_18_0_images_protokube_tar_gz": sha256:8f09e5b87f6df65e1579c56775f7c9c6ddac315579f367129c0ed297c46840f5
Aug 06 03:32:50 ip-10-162-154-92 nodeup[33049]: I0806 03:32:50.863217   33049 load_image.go:168] running command docker load -i /tmp/loadimage410493124/https___artifacts_k8s_io_binaries_kops_1_18_0_images_protokube_tar_gz
Aug 06 03:32:53 ip-10-162-154-92 nodeup[33049]: I0806 03:32:53.176292   33049 executor.go:103] Tasks: 72 done / 72 total; 0 can run
Aug 06 03:32:53 ip-10-162-154-92 nodeup[33049]: I0806 03:32:53.176318   33049 context.go:91] deleting temp dir: "/tmp/deploy453266362"
Aug 06 03:32:53 ip-10-162-154-92 nodeup[33049]: success
Aug 06 03:32:53 ip-10-162-154-92 systemd[1]: kops-configuration.service: Succeeded.
Aug 06 03:32:53 ip-10-162-154-92 systemd[1]: Finished Run kops bootstrap (nodeup).

So I think it is docker load that keeps failing.

johngmyers (Member) commented:

The next thing to find out is why the docker load command keeps getting a kill signal. Perhaps the docker log has a clue?
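
Something along these lines should show it (standard systemd unit name assumed):

# docker daemon logs around the failed load
journalctl -u docker.service --no-pager | grep -iE 'error|killed' | tail -n 50

# and the kernel side, in case the OOM killer is involved
dmesg -T | grep -i oom-killer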

yuha0 (Author) commented Aug 7, 2020

When the docker daemon started, it kept complaining about two things:

Aug 06 03:12:07 ip-10-162-154-92 systemd[1]: Starting Docker Application Container Engine...
Aug 06 03:12:07 ip-10-162-154-92 dockerd[5294]: time="2020-08-06T03:12:07.821161522Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.btrfs" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
Aug 06 03:12:07 ip-10-162-154-92 dockerd[5294]: time="2020-08-06T03:12:07.823651166Z" level=warning msg="could not use snapshotter btrfs in metadata plugin" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
Aug 06 03:12:07 ip-10-162-154-92 dockerd[5294]: time="2020-08-06T03:12:07.823670408Z" level=warning msg="could not use snapshotter zfs in metadata plugin" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin"
Aug 06 03:12:08 ip-10-162-154-92 dockerd[5213]: time="2020-08-06T03:12:08.858911989Z" level=warning msg="Your kernel does not support swap memory limit"
Aug 06 03:12:08 ip-10-162-154-92 dockerd[5213]: time="2020-08-06T03:12:08.858950746Z" level=warning msg="Your kernel does not support cgroup rt period"
Aug 06 03:12:08 ip-10-162-154-92 dockerd[5213]: time="2020-08-06T03:12:08.858957418Z" level=warning msg="Your kernel does not support cgroup rt runtime"
Aug 06 03:12:08 ip-10-162-154-92 dockerd[5213]: time="2020-08-06T03:12:08.858962781Z" level=warning msg="Your kernel does not support cgroup blkio weight"
Aug 06 03:12:08 ip-10-162-154-92 dockerd[5213]: time="2020-08-06T03:12:08.858968162Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
Aug 06 03:12:09 ip-10-162-154-92 systemd[1]: Started Docker Application Container Engine.
Aug 06 03:12:09 ip-10-162-154-92 dockerd[5213]: time="2020-08-06T03:12:09.659334384Z" level=error msg="Not continuing with pull after error: errors:\ndenied: requested access to the resource is denied\nunauthorized: authentication required\n"
# ...
# dozens of pulling failures
# ...
Aug 06 03:13:24 ip-10-162-154-92 dockerd[5213]: time="2020-08-06T03:13:24.787009226Z" level=warning msg="failed to retrieve runc version: signal: killed"
# ...
# dozens of pulling failures
# ...
Aug 06 03:15:35 ip-10-162-154-92 dockerd[5213]: time="2020-08-06T03:15:35.653276919Z" level=warning msg="failed to retrieve runc version: signal: killed"
Aug 06 03:15:35 ip-10-162-154-92 dockerd[5213]: time="2020-08-06T03:15:35.736615197Z" level=warning msg="failed to retrieve docker-init version: signal: killed"

The pulling failures, I guess, were caused by the init script attempting to run docker run protokube.... The kill signals seemed to happen on docker load. Then I checked dmesg and found a bunch of entries showing dockerd invoking the OOM killer:

# dmesg -T | grep "dockerd invoked oom-killer"
[Thu Aug  6 03:19:38 2020] dockerd invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=-999
[Thu Aug  6 03:19:52 2020] dockerd invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=-999
[Thu Aug  6 03:19:52 2020] dockerd invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=-999
[Thu Aug  6 03:20:32 2020] dockerd invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=-999
[Thu Aug  6 03:20:46 2020] dockerd invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=-999
[Thu Aug  6 03:20:59 2020] dockerd invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=-999
[Thu Aug  6 03:21:26 2020] dockerd invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=-999
[Thu Aug  6 03:23:44 2020] dockerd invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=-999
[Thu Aug  6 03:24:28 2020] dockerd invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=-999
[Thu Aug  6 03:24:57 2020] dockerd invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=-999
[Thu Aug  6 03:25:43 2020] dockerd invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=-999
[Thu Aug  6 03:25:58 2020] dockerd invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=-999
[Thu Aug  6 03:26:42 2020] dockerd invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=-999
[Thu Aug  6 03:28:23 2020] dockerd invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=-999
[Thu Aug  6 03:28:50 2020] dockerd invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=-999
[Thu Aug  6 03:29:17 2020] dockerd invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=-999
[Thu Aug  6 03:29:30 2020] dockerd invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=-999
[Thu Aug  6 03:30:11 2020] dockerd invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=-999
[Thu Aug  6 03:30:38 2020] dockerd invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=-999
[Thu Aug  6 03:31:45 2020] dockerd invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=-999
[Thu Aug  6 03:31:58 2020] dockerd invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=-999
[Thu Aug  6 03:32:11 2020] dockerd invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=-999
[Thu Aug  6 03:32:37 2020] dockerd invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=-999

The OOM kills stopped after the successful docker load.

I am really confused about why dockerd can trigger OOM kills. As you can see, this instance is an m5.xlarge with 15G of memory, and docker itself doesn't seem to have a cgroup memory limit:

# cat /sys/fs/cgroup/memory/system.slice/docker.service/memory.limit_in_bytes
9223372036854771712

I can manually reproduce this on any new m5.xlarge node in AWS:

root@ip-10-162-193-142:/tmp# docker load -i protokube.tar.gz
Error processing tar file(signal: killed):
root@ip-10-162-193-142:/tmp# free -h
              total        used        free      shared  buff/cache   available
Mem:           15Gi       365Mi        12Gi       0.0Ki       2.0Gi        14Gi
Swap:            0B          0B          0B
root@ip-10-162-193-142:/tmp# free -h
              total        used        free      shared  buff/cache   available
Mem:           15Gi       363Mi        12Gi       0.0Ki       2.1Gi        14Gi
Swap:            0B          0B          0B
root@ip-10-162-193-142:/tmp# docker load -i protokube.tar.gz
Error processing tar file(signal: killed):
root@ip-10-162-193-142:/tmp# docker load -i protokube.tar.gz
Error processing tar file(signal: killed):
root@ip-10-162-193-142:/tmp# docker load -i protokube.tar.gz
1fe8fbac6f7b: Loading layer [==================================================>]   44.6MB/44.6MB
9439713c945e: Loading layer [==================================================>]  70.25MB/70.25MB
Error processing tar file(signal: killed):

hakman (Member) commented Aug 8, 2020

The kube-reserved setting is a bit too aggressive, and this is why bad things happen. Try the following values and it should start, but you should check the kube-reserved docs and choose a proper setting: https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#kube-reserved

    kubeReserved:
      cpu: "200m"
      memory: "512Mi"
      ephemeral-storage: "1Gi"
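
If it helps, a sketch of applying that with the usual kops workflow (cluster name taken from your manifest):

kops edit cluster use2kube.k8s.local          # adjust spec.kubelet.kubeReserved
kops update cluster use2kube.k8s.local --yes
kops rolling-update cluster use2kube.k8s.local --yes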

yuha0 (Author) commented Aug 8, 2020

@hakman Thank you. I can confirm that increasing the memory did shorten the startup time (from 20+ minutes to 5 minutes). dockerd still triggered an OOM kill once, but the protokube image was successfully loaded on the second attempt.

https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#kube-reserved

kube-reserved is meant to capture resource reservation for kubernetes system daemons like the kubelet, container runtime, node problem detector, etc.

Maybe my understanding was incorrect, but I thought --kube-reserved was meant as a "reservation" (the amount of memory reserved for you, kind of like a pod's .spec.containers[].resources.requests.memory), but in addition, it acts like a limit as well?

I was not aware that docker is in the kube-reserved cgroup. In fact, on my system, the process that got OOM killed was dockerd, and it is in /system.slice:

$ systemctl status $(pgrep dockerd) | grep CGroup
     CGroup: /system.slice/docker.service

The documentation does indeed say that container runtime reservation is done via --kube-reserved, though. That also confused me...
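
For what it's worth, this is roughly how I plan to check whether the /kube-reserved cgroup really gets the 100Mi limit enforced and what ends up inside it (paths assume cgroup v1 and the kubeReservedCgroup: "/kube-reserved" value from my manifest; enforcement only applies because enforceNodeAllocatable includes kube-reserved):

# limit the kubelet enforces on the kube-reserved cgroup (100Mi would be 104857600)
cat /sys/fs/cgroup/memory/kube-reserved/memory.limit_in_bytes

# processes currently placed in that cgroup
cat /sys/fs/cgroup/memory/kube-reserved/cgroup.procs

# cgroups a given PID belongs to, e.g. dockerd
cat /proc/$(pgrep -o dockerd)/cgroup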

hakman (Member) commented Aug 8, 2020

Maybe my understanding was incorrect, but I thought --kube-reserved was meant as a "reservation" (the amount of memory reserved for you, kind of like a pod's .spec.containers[].resources.requests.memory), but in addition, it acts like a limit as well?

Seems so; I never had to dig that deep. In any case, in real life, reserving 100Mi for something will not help much.

I was not aware that docker is in the kube-reserved cgroup. In fact, on my system, the process that got OOM killed was dockerd, and it is in /system.slice:

I think the process was "runc", but anyway, you have docker-cli -> dockerd -> containerd -> runc.

I am updating the kops example to be closer to what someone should actually set, though. Hopefully that will help in the future.

yuha0 (Author) commented Aug 8, 2020

I am not very familiar with the lower-level concepts, but I thought docker load just unpacks a tarball and moves some files around, and that no container is created during this operation.

But anyway, thank you for the help. My problem is resolved.

hakman (Member) commented Aug 8, 2020

It may be interesting to find out what exactly happens, but the docker command certainly just asks dockerd to load the image rather than doing the work itself. I'm not sure either how runc plays into this, or whether it is just a side effect of the kubelet already running.
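
Just as an illustration of that split (not what nodeup actually runs), the load can be sent straight to the daemon over the Engine API with something like the following, the client only streaming the archive (assumes the image tarball has been gunzipped first):

# equivalent of `docker load`; the unpacking happens inside dockerd
curl --unix-socket /var/run/docker.sock \
     -H "Content-Type: application/x-tar" \
     --data-binary @protokube.tar \
     -X POST http://localhost/images/load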
