Containerd service split in 18.09+ #1062

Closed
whiteley opened this issue Feb 6, 2019 · 6 comments

whiteley commented Feb 6, 2019

Cookbook version

4.9.1

Chef-client version

14.8.12

Platform Details

Ubuntu 16.04

Scenario:

Managing Docker through systemctl is broken with docker-ce 18.09+ because the cookbook overwrites the systemd unit files shipped in the upstream package, so the packaged changes are lost. The cookbook manages the unit files in both /lib and /etc. After upgrading to a docker-ce version new enough to ship containerd as a separate service, important constraints between the two services are lost.

Steps to Reproduce:

Converge a docker_service resource with version 18.09.1

docker_service 'default' do
  version '18.09.1'
end

Log in to the host and run:

$ systemctl stop containerd
$ systemctl status containerd docker

Note that containerd is no longer running and normal Docker operations now fail:

$ sudo docker run -it --rm ubuntu:16.04 /bin/bash
Unable to find image 'ubuntu:16.04' locally
16.04: Pulling from library/ubuntu
7b722c1070cd: Pull complete 
5fbf74db61f1: Pull complete 
ed41cb72e5c9: Pull complete 
7ea47a67709e: Pull complete 
Digest: sha256:e4a134999bea4abb4a27bc437e6118fdddfb172e1b9d683129b74d254af51675
Status: Downloaded newer image for ubuntu:16.04
docker: Error response from daemon: all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial unix:///run/containerd/containerd.sock: timeout": unavailable.

This is obviously a contrived case that nobody would run deliberately, but the two services are expected to be bound together.

Expected Result:

Stopping containerd should stop docker
Starting docker should ensure containerd is running

Actual Result:

The systemd unit files no longer bind the containerd and docker services together.

The change to the docker.service shipped in the packages can be seen at https://gist.github.com/whiteley/71d65a1d18e35a0377e3f5ef3fcdf793/revisions#diff-79d698a60144caa9130d53c67f9586a6. I believe the most relevant change is on line 4, where BindsTo=containerd.service is added.
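
Whether a given docker.service still carries that binding can be checked mechanically. The following is a minimal sketch in plain Ruby, not part of the cookbook: `containerd_binding` and the two constants are hypothetical names, and the sample text is a short excerpt of the packaged and cookbook-rendered [Unit] sections reported in this thread.

```ruby
# Hypothetical helper (not a cookbook API): report whether a docker.service
# unit file still binds docker to containerd and orders docker after it.
def containerd_binding(unit_text)
  directives = Hash.new { |hash, key| hash[key] = [] }
  section = nil

  unit_text.each_line do |raw|
    line = raw.strip
    next if line.empty? || line.start_with?('#', ';') # skip blanks and comments

    if (header = line[/\A\[(.+)\]\z/, 1])
      section = header
    elsif section == 'Unit' && line.include?('=')
      key, value = line.split('=', 2)
      # Space-separated values, e.g. "After=a.target b.service"
      directives[key.strip].concat(value.split)
    end
  end

  {
    binds_to: directives['BindsTo'].include?('containerd.service'),
    ordered_after: directives['After'].include?('containerd.service')
  }
end

# Excerpt of the unit shipped in the docker-ce 18.09 package.
PACKAGED_UNIT = <<~UNIT
  [Unit]
  Description=Docker Application Container Engine
  BindsTo=containerd.service
  After=network-online.target firewalld.service containerd.service
  Requires=docker.socket
UNIT

# Excerpt of the unit rendered by the cookbook template.
RENDERED_UNIT = <<~UNIT
  [Unit]
  Description=Docker Application Container Engine
  After=network-online.target docker.socket firewalld.service
  Requires=docker.socket
UNIT

p containerd_binding(PACKAGED_UNIT)
p containerd_binding(RENDERED_UNIT)
```

For the packaged excerpt both values are true; for the rendered excerpt both are false. On a converged node the same helper could be fed the contents of /lib/systemd/system/docker.service.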

Contributor

gionn commented Jun 26, 2019

With 18.09.6 on ubuntu 18.04:

       * template[/lib/systemd/system/docker.service] action create
          - update content in file /lib/systemd/system/docker.service from 0a28c1 to 1eff05
          --- /lib/systemd/system/docker.service        2019-05-04 02:35:56.000000000 +0000
          +++ /lib/systemd/system/.chef-docker20190626-13611-1cvgw1.service     2019-06-26 14:56:54.726838929 +0000
          @@ -1,47 +1,34 @@
           [Unit]
           Description=Docker Application Container Engine
           Documentation=https://docs.docker.com
          -BindsTo=containerd.service
          -After=network-online.target firewalld.service containerd.service
          -Wants=network-online.target
          +After=network-online.target docker.socket firewalld.service
           Requires=docker.socket
          +Wants=network-online.target

           [Service]
           Type=notify
           # the default is not to use systemd for cgroups because the delegate issues still
           # exists and systemd currently does not support the cgroup feature set required
           # for containers run by docker
          -ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
          +ExecStart=/usr/bin/dockerd -H fd://
           ExecReload=/bin/kill -s HUP $MAINPID
          -TimeoutSec=0
          -RestartSec=2
          -Restart=always
          -
          -# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
          -# Both the old, and new location are accepted by systemd 229 and up, so using the old location
          -# to make them work for either version of systemd.
          -StartLimitBurst=3
          -
          -# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
          -# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
          -# this option work for either version of systemd.
          -StartLimitInterval=60s
          -
          +LimitNOFILE=1048576
           # Having non-zero Limit*s causes performance problems due to accounting overhead
           # in the kernel. We recommend using cgroups to do container-local accounting.
          -LimitNOFILE=infinity
           LimitNPROC=infinity
           LimitCORE=infinity
          -
          -# Comment TasksMax if your systemd version does not supports it.
          -# Only systemd 226 and above support this option.
          +# Uncomment TasksMax if your systemd version supports it.
          +# Only systemd 226 and above support this version.
           TasksMax=infinity
          -
          +TimeoutStartSec=0
           # set delegate yes so that systemd does not reset the cgroups of docker containers
           Delegate=yes
          -
           # kill only the docker process, not all processes in the cgroup
           KillMode=process
          +# restart the docker process if it exits prematurely
          +Restart=on-failure
          +StartLimitBurst=3
          +StartLimitInterval=60s
           
           [Install]
           WantedBy=multi-user.target
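
The diff above mixes real directive changes with lines that only moved. A rough way to separate the two is to compare the units directive by directive. This is a sketch in plain Ruby under stated assumptions: `parse_unit` and `directive_drift` are hypothetical helpers, a repeated key keeps only its last value, and the sample text is a subset of the lines in the diff.

```ruby
# Parse unit text into { "Section" => { "Key" => "value" } }.
# Assumption: a repeated key keeps only its last value.
def parse_unit(text)
  sections = Hash.new { |hash, key| hash[key] = {} }
  current = nil
  text.each_line do |raw|
    line = raw.strip
    next if line.empty? || line.start_with?('#', ';')

    if (header = line[/\A\[(.+)\]\z/, 1])
      current = header
    elsif current && line.include?('=')
      key, value = line.split('=', 2)
      sections[current][key.strip] = value.strip
    end
  end
  sections
end

# Directives the rendered unit drops, adds, or changes versus the packaged one.
def directive_drift(packaged, rendered)
  before = parse_unit(packaged)
  after = parse_unit(rendered)
  drift = { dropped: [], added: [], changed: [] }

  (before.keys | after.keys).each do |section|
    old_keys = before.fetch(section, {})
    new_keys = after.fetch(section, {})
    (old_keys.keys | new_keys.keys).each do |key|
      name = "#{section}/#{key}"
      if !new_keys.key?(key)
        drift[:dropped] << name
      elsif !old_keys.key?(key)
        drift[:added] << name
      elsif old_keys[key] != new_keys[key]
        drift[:changed] << name
      end
    end
  end
  drift
end

PACKAGED_EXCERPT = <<~UNIT
  [Unit]
  BindsTo=containerd.service
  After=network-online.target firewalld.service containerd.service
  Wants=network-online.target
  Requires=docker.socket
  [Service]
  ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
  TimeoutSec=0
  RestartSec=2
  Restart=always
  LimitNOFILE=infinity
UNIT

RENDERED_EXCERPT = <<~UNIT
  [Unit]
  After=network-online.target docker.socket firewalld.service
  Requires=docker.socket
  Wants=network-online.target
  [Service]
  ExecStart=/usr/bin/dockerd -H fd://
  LimitNOFILE=1048576
  TimeoutStartSec=0
  Restart=on-failure
UNIT

directive_drift(PACKAGED_EXCERPT, RENDERED_EXCERPT).each do |kind, names|
  puts "#{kind}: #{names.join(', ')}"
end
```

On this excerpt the dropped list includes Unit/BindsTo, and Service/ExecStart shows up as changed because the --containerd flag is gone, while the reordered Wants and Requires lines are not reported.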

Contributor

scalp42 commented Jul 2, 2019

Seeing the same issue here.


AVVS commented Sep 2, 2019

Is there any way this could get escalated?

Contributor

isuftin commented Jan 31, 2020

I think this is what killed a large number of my nodes when I updated to the latest cookbook. The rollback was painful, to say the least :(

Contributor

scalp42 commented Jan 31, 2020

Yes @isuftin, be careful: it's going to restart Docker (which will nuke the running containers).

Always pin your cookbook versions so you don't run into this, review the changes carefully, test on a single node first, etc.

I'd also advise moving forward rather than rolling back, if you can.

@tas50 this issue can be closed also.

Member

damacus commented Aug 24, 2021

Closed via #1080

damacus closed this as completed Aug 24, 2021