
Upgrade fails occasionally on an SELinux-enabled system with mount denied on root #120

Closed
ShylajaDevadiga opened this issue Jul 19, 2021 · 1 comment
Labels: bug (Something isn't working)

ShylajaDevadiga commented Jul 19, 2021

Version
rancher/system-upgrade-controller:v0.6.2

Platform/Architecture
CentOS 8.2, amd64, SELinux enabled
Three-node setup with two masters and one worker node (embedded etcd)

Describe the bug
Upgrade fails occasionally on an SELinux-enabled system, with the mount on / denied.

To Reproduce

  • Create a 3-node cluster on CentOS with SELinux enabled.
  • Import the cluster into Rancher.
  • Upgrade the cluster, e.g. from v1.19.12+k3s1 to v1.19.13-rc1+k3s1.
$ kubectl get nodes
NAME                                         STATUS   ROLES    AGE    VERSION
ip-172-31-1-45.us-east-2.compute.internal    Ready    master   3h3m   v1.19.12+k3s1
ip-172-31-5-202.us-east-2.compute.internal   Ready    master   172m   v1.19.12+k3s1
ip-172-31-3-156.us-east-2.compute.internal   Ready    <none>   168m   v1.19.12+k3s1

$ kubectl get pods -A 
NAMESPACE       NAME                                                              READY   STATUS      RESTARTS   AGE
cattle-system   cattle-cluster-agent-656b66744b-n9zmt                             1/1     Running     0          3h12m
fleet-system    fleet-agent-d59db746-g5vp5                                        1/1     Running     0          3h12m
cattle-system   system-upgrade-controller-688cbcd688-ft67r                        1/1     Running     0          163m
cattle-system   apply-k3s-worker-plan-on-ip-172-31-3-156.us-east-2.comput-4wshl   0/1     Init:0/2    0          162m
cattle-system   apply-k3s-master-plan-on-ip-172-31-1-45.us-east-2.compute-8qtr7   0/1     Init:0/1    0          162m
kube-system     local-path-provisioner-7ff9579c6-g66bq                            1/1     Running     0          3h28m
kube-system     metrics-server-7b4f8b595-vh97z                                    1/1     Running     0          3h28m
kube-system     coredns-66c464876b-qgwmm                                          1/1     Running     0          3h28m
kube-system     helm-install-traefik-5djvn                                        0/1     Completed   0          3h28m
kube-system     svclb-traefik-zlcgf                                               2/2     Running     0          48s
kube-system     svclb-traefik-f495c                                               2/2     Running     0          48s
kube-system     svclb-traefik-p9mvk                                               2/2     Running     0          48s
kube-system     traefik-5dd496474-ddht8                                           1/1     Running     0          48s

Disabling SELinux eventually allowed the nodes to upgrade, once the specified deadline was reached (default 15 min).

[centos@ip-172-31-1-45 ~]$ kubectl get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-172-31-1-45.us-east-2.compute.internal    Ready    master   4h23m   v1.19.13-rc1+k3s1
ip-172-31-5-202.us-east-2.compute.internal   Ready    master   4h11m   v1.19.13-rc1+k3s1
ip-172-31-3-156.us-east-2.compute.internal   Ready    <none>   4h7m    v1.19.12+k3s1

[centos@ip-172-31-1-45 ~]$ kubectl get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-172-31-1-45.us-east-2.compute.internal    Ready    master   4h31m   v1.19.13-rc1+k3s1
ip-172-31-3-156.us-east-2.compute.internal   Ready    <none>   4h15m   v1.19.13-rc1+k3s1
ip-172-31-5-202.us-east-2.compute.internal   Ready    master   4h19m   v1.19.13-rc1+k3s1
[centos@ip-172-31-1-45 ~]$ 
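A lighter-weight check than disabling SELinux outright is to flip each node to permissive mode at runtime; denials are then only logged, not enforced, so if the upgrade proceeds the policy is confirmed as the blocker. A minimal sketch, assuming the standard getenforce/setenforce tools are installed (hostname is illustrative):

[centos@node ~]$ getenforce          # prints Enforcing / Permissive / Disabled
[centos@node ~]$ sudo setenforce 0   # permissive: log denials but do not enforce
# ...re-run or wait for the upgrade plan, then restore enforcement:
[centos@node ~]$ sudo setenforce 1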

Expected behavior
All nodes should be upgraded successfully

Actual behavior
Nodes are not upgraded; the upgrade job pods are stuck in the Init state.

Additional context
System Upgrade Controller logs:

[centos@ip-172-31-1-45 ~]$ kubectl logs -n cattle-system deploy/system-upgrade-controller
Found 2 pods, using pod/system-upgrade-controller-688cbcd688-ft67r
time="2021-07-19T18:33:38Z" level=info msg="Starting upgrade.cattle.io/v1, Kind=Plan controller"
E0719 18:33:41.933993       1 controller.go:135] error syncing 'cattle-system/k3s-master-plan': handler system-upgrade-controller: failed to update cattle-system/apply-k3s-master-plan-on-ip-172-31-1-45.us-east-2.compute-23b3a batch/v1, Kind=Job for system-upgrade-controller cattle-system/k3s-master-plan: jobs.batch "apply-k3s-master-plan-on-ip-172-31-1-45.us-east-2.compute-23b3a" not found, requeuing
E0719 18:33:41.963558       1 controller.go:135] error syncing 'cattle-system/k3s-worker-plan': handler system-upgrade-controller: failed to update cattle-system/apply-k3s-worker-plan-on-ip-172-31-3-156.us-east-2.comput-9a4f2 batch/v1, Kind=Job for system-upgrade-controller cattle-system/k3s-worker-plan: jobs.batch "apply-k3s-worker-plan-on-ip-172-31-3-156.us-east-2.comput-9a4f2" not found, requeuing
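Since the job pods sit in Init, the init containers themselves are the next place to look; a generic triage sketch, using the worker pod name from the listing above:

kubectl -n cattle-system describe pod apply-k3s-worker-plan-on-ip-172-31-3-156.us-east-2.comput-4wshl
kubectl -n cattle-system logs apply-k3s-worker-plan-on-ip-172-31-3-156.us-east-2.comput-4wshl --all-containers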

Audit logs:

----
time->Mon Jul 19 21:10:39 2021
type=AVC msg=audit(1626729039.544:4883): avc:  denied  { mount } for  pid=109057 comm="mount" name="/" dev="tmpfs" ino=734623 scontext=system_u:system_r:mount_t:s0 tcontext=system_u:system_r:init_t:s0 tclass=filesystem permissive=0
----
time->Mon Jul 19 21:10:39 2021
type=AVC msg=audit(1626729039.544:4884): avc:  denied  { mount } for  pid=109057 comm="mount" name="/" dev="tmpfs" ino=734624 scontext=system_u:system_r:mount_t:s0 tcontext=system_u:system_r:init_t:s0 tclass=filesystem permissive=0
----
time->Mon Jul 19 21:10:40 2021
type=AVC msg=audit(1626729040.427:4886): avc:  denied  { mount } for  pid=5615 comm="containerd" name="/" dev="tmpfs" ino=735268 scontext=system_u:system_r:container_runtime_t:s0 tcontext=system_u:system_r:init_t:s0 tclass=filesystem permissive=0
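For reference, denials like these can be pulled from the audit log and turned into a candidate local policy module with the standard audit2allow workflow. This is a triage sketch, not a vetted fix; the module name is arbitrary and the generated rules should be reviewed before loading:

ausearch -m avc -ts recent                                  # list recent AVC denials
ausearch -m avc -ts recent | audit2allow -M suc-mount-fix   # emit suc-mount-fix.te / .pp
# Review suc-mount-fix.te first; broadly allowing mount against init_t is risky.
semodule -i suc-mount-fix.pp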
ShylajaDevadiga added the bug label on Jul 19, 2021
dweomer (Contributor) commented Aug 24, 2021

IIRC, I was unable to reproduce. WIP Vagrantfile:

# -*- mode: ruby -*-
# vi: set ft=ruby :

def install_k3s(sh,env)
  _env = {
    'INSTALL_K3S_CHANNEL': ENV['INSTALL_K3S_CHANNEL'] || "stable",
    'INSTALL_K3S_VERSION': ENV['INSTALL_K3S_VERSION'],
    'INSTALL_K3S_SKIP_START': "true",
    'K3S_SELINUX': "true",
    'K3S_TOKEN': ENV['K3S_TOKEN'] || "vagrant",
  }
  sh.env = _env.merge(env)
  sh.inline = <<~SHELL
    #!/usr/bin/env bash
    set -eux -o pipefail
    mkdir -vp /etc/systemd/system/k3s-agent.service.d
    cat << EOF > /etc/systemd/system/k3s-agent.service.d/restorecon.conf
[Service]
ExecStartPre=-/sbin/restorecon /usr/local/bin/k3s
EOF
    curl -fsSL https://get.k3s.io | sh
    chmod +x /usr/local/bin/k3s
  SHELL
  sh.upload_path = "/tmp/vagrant-install-k3s"
end

def enable_k3s_server(sh)
  sh.inline = <<~SHELL
    #!/usr/bin/env bash
    set -eux -o pipefail
    systemctl enable --now k3s || true
  SHELL
  sh.upload_path = "/tmp/vagrant-enable-k3s"
end

def enable_k3s_agent(sh)
  sh.inline = <<~SHELL
    #!/usr/bin/env bash
    set -eux -o pipefail
    systemctl enable --now k3s-agent || true
  SHELL
  sh.upload_path = "/tmp/vagrant-enable-k3s"
end

def install_suc(sh,env)
    sh.env = {
        'UPGRADE_CTL_MANIFEST_URL': ENV['UPGRADE_CTL_MANIFEST_URL'] || "https://github.com/rancher/system-upgrade-controller/releases/download/v0.6.2/system-upgrade-controller.yaml",
    }
    sh.inline = <<~SHELL
        #!/usr/bin/env bash
        set -eux -o pipefail
        while ! /usr/local/bin/kubectl get node &>/dev/null; do
            sleep 5
        done
        curl -fsSL ${UPGRADE_CTL_MANIFEST_URL} | /usr/local/bin/kubectl apply -f-
    SHELL
    sh.upload_path = "/tmp/vagrant-install-suc"
end

Vagrant.configure("2") do |config|
  config.vm.box = "centos/8.2"

  config.vm.provider :virtualbox do |v|
    v.memory = 2048
    v.cpus = 2
    config.vm.box_url = "https://cloud.centos.org/centos/8/vagrant/x86_64/images/CentOS-8-Vagrant-8.2.2004-20200611.2.x86_64.vagrant-virtualbox.box"
  end
  config.vm.provider :libvirt do |v|
    v.memory = 2048
    v.cpus = 2
    config.vm.box_url = "https://cloud.centos.org/centos/8/vagrant/x86_64/images/CentOS-8-Vagrant-8.2.2004-20200611.2.x86_64.vagrant-libvirt.box"
  end

  config.vm.provision "disable-swap", type: "shell", run: "once" do |sh|
    sh.upload_path = "/tmp/vagrant-disable-swap"
    sh.inline = <<~SHELL
        #!/usr/bin/env bash
        set -eux -o pipefail
        if [ -f /swapfile ]; then
            swapoff -a
            sed -e 's/.*swapfile.*//g' -i /etc/fstab
            rm -vf /swapfile
        fi
    SHELL
  end

  # Disabled by default. To run:
  #   vagrant up --provision-with=upgrade-packages
  # To upgrade only specific packages:
  #   UPGRADE_PACKAGES=selinux vagrant up --provision-with=upgrade-packages
  #
  config.vm.provision "upgrade-packages", type: "shell", run: "never" do |sh|
    sh.upload_path = "/tmp/vagrant-upgrade-packages"
    sh.env = {
        'UPGRADE_PACKAGES': ENV['UPGRADE_PACKAGES'],
    }
    sh.inline = <<~SHELL
        #!/usr/bin/env bash
        set -eux -o pipefail
        yum -y upgrade ${UPGRADE_PACKAGES}
    SHELL
  end

  # To re-run, installing CNI from RPM:
  #   INSTALL_PACKAGES="containernetworking-plugins" vagrant up --provision-with=install-packages
  #
  config.vm.provision "install-packages", type: "shell", run: "once" do |sh|
    sh.upload_path = "/tmp/vagrant-install-packages"
    sh.env = {
        'INSTALL_PACKAGES': ENV['INSTALL_PACKAGES'],
    }
    sh.inline = <<~SHELL
        #!/usr/bin/env bash
        set -eux -o pipefail
        yum -y install \
            container-selinux \
            curl \
            iptables \
            less \
            lsof \
            socat \
            ${INSTALL_PACKAGES}
    SHELL
  end

  # SELinux is Enforcing by default.
  # To set SELinux as Disabled on a VM that has already been provisioned:
  #   SELINUX=Disabled vagrant up --provision-with=selinux
  # To set SELinux as Permissive on a VM that has already been provisioned:
  #   SELINUX=Permissive vagrant up --provision-with=selinux
  config.vm.provision "selinux", type: "shell", run: "once" do |sh|
    sh.upload_path = "/tmp/vagrant-selinux"
    sh.env = {
        'SELINUX': ENV['SELINUX'] || "Enforcing"
    }
    sh.inline = <<~SHELL
        #!/usr/bin/env bash
        set -eux -o pipefail

        if ! type -p getenforce setenforce &>/dev/null; then
          echo SELinux is Disabled
          exit 0
        fi

        case "${SELINUX}" in
          Disabled)
            if mountpoint -q /sys/fs/selinux; then
              setenforce 0
              umount -v /sys/fs/selinux
            fi
            ;;
          Enforcing)
            mountpoint -q /sys/fs/selinux || mount -o rw,relatime -t selinuxfs selinuxfs /sys/fs/selinux
            setenforce 1
            ;;
          Permissive)
            mountpoint -q /sys/fs/selinux || mount -o rw,relatime -t selinuxfs selinuxfs /sys/fs/selinux
            setenforce 0
            ;;
          *)
            echo "SELinux mode not supported: ${SELINUX}" >&2
            exit 1
            ;;
        esac

        echo SELinux is $(getenforce)
    SHELL
  end

  config.vm.provision "install-suc-plans", type: "shell", run: "never" do |sh|
    sh.upload_path = "/tmp/vagrant-install-suc-plans"
    sh.env = {
        'UPGRADE_K3S_VERSION': ENV['UPGRADE_K3S_VERSION'],
    }
    sh.inline = <<~SHELL
        #!/usr/bin/env bash
        set -eux -o pipefail
        while ! /usr/local/bin/kubectl get node &>/dev/null; do
            sleep 5
        done
        cat << EOF > k3s-upgrade.yaml
---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3s-server
  namespace: system-upgrade
  labels:
    k3s-upgrade: server
spec:
  concurrency: 1
  version: ${UPGRADE_K3S_VERSION}
  nodeSelector:
    matchExpressions:
      - {key: k3s-upgrade, operator: Exists}
      - {key: k3s-upgrade, operator: NotIn, values: ["disabled", "false"]}
      - {key: k3s.io/hostname, operator: Exists}
      - {key: k3os.io/mode, operator: DoesNotExist}
      - {key: node-role.kubernetes.io/master, operator: In, values: ["true"]}
  serviceAccountName: system-upgrade
  cordon: true
  upgrade:
    image: rancher/k3s-upgrade
---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3s-agent
  namespace: system-upgrade
  labels:
    k3s-upgrade: agent
spec:
  concurrency: 2
  version: ${UPGRADE_K3S_VERSION}
  nodeSelector:
    matchExpressions:
      - {key: k3s-upgrade, operator: Exists}
      - {key: k3s-upgrade, operator: NotIn, values: ["disabled", "false"]}
      - {key: k3s.io/hostname, operator: Exists}
      - {key: k3os.io/mode, operator: DoesNotExist}
      - {key: node-role.kubernetes.io/master, operator: NotIn, values: ["true"]}
  serviceAccountName: system-upgrade
  prepare:
    image: rancher/k3s-upgrade
    args: ["prepare", "k3s-server"]
  drain:
    force: true
    skipWaitForDeleteTimeout: 30
  upgrade:
    image: rancher/k3s-upgrade
EOF
        while ! /usr/local/bin/kubectl apply -f k3s-upgrade.yaml; do
            sleep 5
        done
    SHELL
  end

  config.vm.provision "upgrade-cluster", type: "shell", run: "never" do |sh|
    sh.upload_path = "/tmp/vagrant-upgrade-cluster"
    sh.inline = <<~SHELL
        #!/usr/bin/env bash
        set -eux -o pipefail
        while ! /usr/local/bin/kubectl get node &>/dev/null; do
            sleep 5
        done
        /usr/local/bin/kubectl label node --all k3s-upgrade=true --overwrite=true
    SHELL
  end

  config.vm.define "node-0" do |node|
    node.vm.hostname = "node-0"
    node.vm.provider :virtualbox do |v|
        v.memory = 4096
        v.cpus = 4
    end
    node.vm.provider :libvirt do |v|
        v.memory = 4096
        v.cpus = 4
    end
    node.vm.provision "install-k3s", type: "shell", run: "once" do |sh|
      install_k3s(sh,{
        'INSTALL_K3S_EXEC': "server --disable=local-storage,traefik",
#         'INSTALL_K3S_EXEC': "server",
        'K3S_KUBECONFIG_MODE': "0644",
        'K3S_TOKEN': "rancher",
      })
    end
    node.vm.provision "enable-k3s", type: "shell", run: "once" do |sh|
      enable_k3s_server(sh)
    end
    node.vm.provision "install-helm", type: "shell", run: "once" do |sh|
        sh.env = {
            'PATH': "/usr/local/bin:/usr/local/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin",
        }
        sh.inline = <<~SHELL
            #!/usr/bin/env bash
            set -eux -o pipefail
            curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
            chmod 700 get_helm.sh
            ./get_helm.sh
        SHELL
        sh.upload_path = "/tmp/vagrant-install-helm"
    end
    node.vm.provision "wait-for-node", type: "shell", run: "once" do |sh|
        sh.env = {
            'PATH': "/usr/local/bin:/usr/local/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin",
        }
        sh.inline = <<~SHELL
            #!/usr/bin/env bash
            set -eux -o pipefail
            sleep 10
            while ! kubectl wait --for condition=ready node/$(hostname); do
                sleep 5
            done
        SHELL
        sh.upload_path = "/tmp/vagrant-wait-for-node"
    end
      node.vm.provision "install-cert-manager", type: "shell", run: "once" do |sh|
        sh.env = {
            'KUBECONFIG': "/etc/rancher/k3s/k3s.yaml",
            'PATH': "/usr/local/bin:/usr/local/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin",
        }
        sh.inline = <<~SHELL
            #!/usr/bin/env bash
            set -eux -o pipefail
            kubectl create namespace cert-manager
            helm repo add jetstack https://charts.jetstack.io
            helm repo update
            helm install cert-manager jetstack/cert-manager --namespace cert-manager --version v1.1.0 --set installCRDs=true
            kubectl -n cert-manager rollout status deploy/cert-manager
        SHELL
        sh.upload_path = "/tmp/vagrant-install-cert-manager"
      end
      node.vm.provision "install-rancher", type: "shell", run: "once" do |sh|
        sh.env = {
            'KUBECONFIG': "/etc/rancher/k3s/k3s.yaml",
            'PATH': "/usr/local/bin:/usr/local/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin",
        }
        sh.inline = <<~SHELL
            #!/usr/bin/env bash
            set -eux -o pipefail
            kubectl create namespace cattle-system
            helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
            helm repo update
            helm install rancher rancher-latest/rancher --namespace cattle-system --set hostname=rancher.local --set replicas=1 --set ingress.enabled=false || true
            kubectl -n cattle-system rollout status deploy/rancher --timeout=3m || true
        SHELL
        sh.upload_path = "/tmp/vagrant-install-rancher"
      end
  end

  config.vm.define "node-1" do |node|
    node.vm.hostname = "node-1"
    node.vm.provision "install-k3s", type: "shell", run: "once" do |sh|
      install_k3s(sh,{
        'INSTALL_K3S_EXEC': "server",
        'K3S_KUBECONFIG_MODE': "0644",
      })
    end
    node.vm.provision "enable-k3s", type: "shell", run: "once" do |sh|
      enable_k3s_server(sh)
    end
  end

  config.vm.define "node-2" do |node|
    node.vm.hostname = "node-2"
    node.vm.provision "install-k3s", type: "shell", run: "once" do |sh|
      install_k3s(sh,{
        'INSTALL_K3S_EXEC': "agent",
        'K3S_URL': ENV['K3S_URL'] || "https://node-1:6443",
      })
    end
    node.vm.provision "enable-k3s", type: "shell", run: "once" do |sh|
      enable_k3s_agent(sh)
    end
  end

end
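
A typical flow with this Vagrantfile, inferred from the provisioner names above and not verified end-to-end, might be:

# Bring up all three nodes (SELinux is Enforcing by default):
vagrant up
# Install the upgrade plans against a target release, then label nodes to trigger the upgrade:
UPGRADE_K3S_VERSION=v1.19.13-rc1+k3s1 vagrant up --provision-with=install-suc-plans
vagrant up --provision-with=upgrade-cluster
# Flip SELinux mode on already-provisioned VMs:
SELINUX=Permissive vagrant up --provision-with=selinux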
