RHEL 7.7 with selinux enabled using RHEL docker on k8s 1.16 cluster fails to deploy #23662

Closed
sowmyav27 opened this issue Oct 24, 2019 · 25 comments
Assignees
Labels
internal kind/bug-qa Issues that have not yet hit a real release. Bugs introduced by a new feature or enhancement
Milestone

Comments

@sowmyav27
Contributor

sowmyav27 commented Oct 24, 2019

What kind of request is this (question/bug/enhancement/feature request): bug

Steps to reproduce (least amount of steps as possible):
On a rancher:2.3.2-rc2 - deploy an RHEL 7.7 custom cluster - (3 etcd, 1 control plane, 3 worker nodes)

Docker - Native docker 1.13
Selinux - Enabled

Expected Result:
The cluster should be deployed successfully

Actual Result:
The cluster fails to deploy with error: Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system

Error in logs:

2019/10/24 19:16:59 [INFO] Removing container [rke-log-cleaner] on host [xxxx], try #1
2019/10/24 19:16:59 [INFO] cluster [c-xcdfw] provisioning: [cleanup] Successfully started [rke-log-cleaner] container on host [xxxx]
2019/10/24 19:16:59 [INFO] cluster [c-xcdfw] provisioning: [cleanup] Successfully started [rke-log-cleaner] container on host [xxxx]
2019/10/24 19:16:59 [INFO] Removing container [rke-log-cleaner] on host [xxxx], try #1
2019/10/24 19:16:59 [INFO] cluster [c-xcdfw] provisioning: [remove/rke-log-cleaner] Successfully removed container on host [xxxxx]
2019/10/24 19:16:59 [INFO] cluster [c-xcdfw] provisioning: [remove/rke-log-cleaner] Successfully removed container on host [xxxx]
2019/10/24 19:16:59 [INFO] cluster [c-xcdfw] provisioning: [remove/rke-log-cleaner] Successfully removed container on host [xxxx]
2019/10/24 19:16:59 [INFO] cluster [c-xcdfw] provisioning: [remove/rke-log-cleaner] Successfully removed container on host [xxxx]
2019/10/24 19:16:59 [INFO] cluster [c-xcdfw] provisioning: [sync] Syncing nodes Labels and Taints
2019/10/24 19:16:59 [INFO] cluster [c-xcdfw] provisioning: [sync] Successfully synced nodes Labels and Taints
2019/10/24 19:16:59 [INFO] cluster [c-xcdfw] provisioning: [network] Setting up network plugin: canal
2019/10/24 19:17:00 [INFO] cluster [c-xcdfw] provisioning: [addons] Saving ConfigMap for addon rke-network-plugin to Kubernetes
2019/10/24 19:17:00 [INFO] cluster [c-xcdfw] provisioning: [addons] Successfully saved ConfigMap for addon rke-network-plugin to Kubernetes
2019/10/24 19:17:00 [INFO] cluster [c-xcdfw] provisioning: [addons] Executing deploy job rke-network-plugin
2019/10/24 19:17:35 [ERROR] cluster [c-xcdfw] provisioning: Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system
2019/10/24 19:17:35 [INFO] kontainerdriver rancherkubernetesengine stopped
2019/10/24 19:17:35 [ERROR] ClusterController c-xcdfw [cluster-provisioner-controller] failed with : Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system
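
For anyone triaging a similar log, the failure is easier to spot once the `[INFO]` lines are filtered out. The following is only a minimal sketch: `extract_errors` and the file name `rancher-server.log` are hypothetical names chosen for illustration, and the function assumes the `YYYY/MM/DD HH:MM:SS` timestamp prefix shown in the excerpt above.

```shell
# Hypothetical helper: print the distinct [ERROR] lines from a Rancher server
# log read on stdin, with the leading "YYYY/MM/DD HH:MM:SS " timestamp removed.
extract_errors() {
  grep -F '[ERROR]' \
    | sed -E 's|^[0-9]{4}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} ||' \
    | sort -u
}

# Example usage (assumes the log was saved to a file first):
#   extract_errors < rancher-server.log
```

Stripping the timestamp before `sort -u` collapses repeated reports of the same failure into a single line.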

Other details that may be helpful:

Environment information

  • Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): 2.3.2-rc2
  • Installation option (single install/HA): HA

Cluster information

  • Cluster type (Hosted/Infrastructure Provider/Custom/Imported): custom - rhel 7.7
  • Kubernetes version (use kubectl version):
1.16.2-rancher1-1
  • Docker version (use docker version):
native docker - docker 1.13 (native)
@sowmyav27 sowmyav27 self-assigned this Oct 24, 2019
@sowmyav27 sowmyav27 added kind/bug-qa Issues that have not yet hit a real release. Bugs introduced by a new feature or enhancement status/blocker labels Oct 24, 2019
@sowmyav27 sowmyav27 added this to the v2.3.2 milestone Oct 24, 2019
@sowmyav27 sowmyav27 changed the title from "RHEL 7.7 , k8s 1.16 fails to deploy" to "RHEL 7.7 on k8s 1.16 cluster fails to deploy" Oct 24, 2019
@kinarashah
Member

Error from the job:

[ec2-user@ip-172-31-11-173 ~]$ kubectl --kubeconfig /tmp/kubeconfig_admin.yaml logs -f rke-network-plugin-deploy-job-xbchn -n kube-system

error: the path "/etc/config/rke-network-plugin.yaml" cannot be accessed: stat /etc/config/rke-network-plugin.yaml: permission denied
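
A quick way to tell whether a job log like this points at a permissions problem, and to record the node's SELinux mode alongside it, might look like the sketch below. Both function names are hypothetical and not part of Rancher or RKE; `getenforce` is only called when it is installed.

```shell
# Hypothetical helper: succeed if the text on stdin contains a
# "permission denied" message (case-insensitive).
has_permission_denied() {
  grep -qi 'permission denied'
}

# Hypothetical helper: print the SELinux mode reported by getenforce, or
# "unknown" when getenforce is not installed on this machine.
selinux_mode() {
  if command -v getenforce >/dev/null 2>&1; then
    getenforce
  else
    echo unknown
  fi
}

# Example usage, run on a node (the pod name is the one from the log above):
#   kubectl -n kube-system logs rke-network-plugin-deploy-job-xbchn \
#     | has_permission_denied && echo "SELinux mode: $(selinux_mode)"
```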

@sangeethah
Contributor

This issue is not seen when testing with K8s 1.15.5

On a rancher:2.3.2-rc2 - deploy an RHEL 7.7 custom cluster - (3 etcd, 1 control plane, 3 worker nodes) using K8s version 1.15.5 .
Docker - Native docker 1.13
Selinux - Enabled

@superseb
Contributor

Requested/provided more info on upstream issue: kubernetes/kubernetes#83679

@sangeethah
Contributor

@sowmyav27 Could you test RHEL 7.7 nodes with SELinux not enabled and native docker 1.13 on k8s 1.16, and see whether this issue still occurs?

@kinarashah kinarashah removed their assignment Oct 25, 2019
@sangeethah
Contributor

From @sowmyav27's testing: with RHEL 7.7 and SELinux set to "permissive", a cluster comes up and runs using k8s 1.16 and native docker 1.13 on v2.3.2-rc2.

@deniseschannon deniseschannon changed the title from "RHEL 7.7 on k8s 1.16 cluster fails to deploy" to "RHEL 7.7 with selinux enabled using RHEL docker on k8s 1.16 cluster fails to deploy" Oct 28, 2019
@deniseschannon deniseschannon modified the milestones: v2.3.2, v2.3.3 Oct 28, 2019
@permanz

permanz commented Oct 30, 2019

Testing with CentOS 7.7, SELinux disabled, docker version 18.06.3-ce, and system image k8s 1.16.2, I cannot set up a cluster. It fails with the same error: "Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system". It works fine with the k8s 1.15 image.

@deniseschannon

Should be available with latest versions of k8s introduced: 1.17.4, 1.16.8

@sangeethah
Contributor

Tested with the latest build from master:

Able to provision a K8s 1.16.8 cluster on hosts with Rhel 7.7 and native docker 1.13 with selinux enabled with following configuration:
1 control
1 etcd
3 workers

From node Details page from Rancher UI

Docker Version: 1.13.1
Kubelet Version: v1.16.8
Kube Proxy Version: v1.16.8
OS: RHEL 7.7 3.10.0-1062.el7.x86_64

From host:

# docker info | grep Registry
Registry: https://registry.access.redhat.com/v1/
# docker info | grep "Server Version"
Server Version: 1.13.1
# hostnamectl | grep "Operating System"
  Operating System: Red Hat Enterprise Linux Server 7.7 (Maipo)
# getenforce
Enforcing
# sestatus | grep "SELinux status"
SELinux status:                 enabled

Able to provision a K8s 1.17.4 cluster on hosts with Rhel 7.7 and native docker 1.13 with selinux enabled with following configuration:
1 control
1 etcd
3 workers

From node Details page from Rancher UI

Docker Version: 1.13.1
Kubelet Version: v1.17.4
Kube Proxy Version: v1.17.4
OS: RHEL 7.7 3.10.0-1062.el7.x86_64

From host:

# docker info | grep Registry
Registry: https://registry.access.redhat.com/v1/
# docker info | grep "Server Version"
Server Version: 1.13.1
# hostnamectl | grep "Operating System"
  Operating System: Red Hat Enterprise Linux Server 7.7 (Maipo)
# getenforce
Enforcing
# sestatus | grep "SELinux status"
SELinux status:                 enabled

@deniseschannon

In v2.3-head, can we specifically validate this for v1.16.8-rancher1-1 and v1.17.4-rancher1-1? Since the templates in master-head and v2.3-head are different, we need to do a separate validation for this.

@sangeethah
Contributor

Tested with rancher server - 2.3.5 pointing to branch dev-v2.3.

Not able to bring up a cluster successfully with K8s version 1.16.8 (1.16.8-rancher1) or 1.17.4 (v1.17.4-rancher1-1) using hosts with OS RHEL 7.7, native docker 1.13 and SELINUX on.

Not able to bring up "canal" successfully.
flexvol-driver stuck in "Waiting" state with this error:
cp: can't create '/host/driver/.uds': Permission denied

Screen Shot 2020-03-15 at 10 53 20 AM

@superseb
Contributor

Available in rancher/rancher:v2.3.6-rc6

@sangeethah
Contributor

Tested with rancher/rancher:v2.3.6-rc6.

Able to bring up a cluster successfully with K8s version 1.16.8(1.16.8-rancher1) using hosts with OS RHEL 7.7 , native docker 1.13 and SELINUX on.
Automation runs succeed on this cluster. No new issues reported.

Able to bring up a cluster successfully with K8s version 1.17.4 (v1.17.4-rancher1-1) using hosts with OS RHEL 7.7 , native docker 1.13 and SELINUX on.
Automation runs succeed on this cluster. No new issues reported.

Canal now uses rancher/calico-node:v3.13.0
Screen Shot 2020-03-16 at 4 17 10 PM
