
ironic containers running under the metal3-baremetal-operator pod are crash-looping #703

Closed
mcornea opened this issue Jul 29, 2019 · 8 comments

Comments


mcornea commented Jul 29, 2019

Describe the bug
The ironic containers running under the metal3-baremetal-operator pod are crash-looping. After running make:

[cloud-user@rhhi-node-worker-0 ~]$ oc --config dev-scripts/ocp/auth/kubeconfig -n openshift-machine-api status
In project openshift-machine-api on server https://api.rhhi-virt-cluster.qe.lab.redhat.com:6443

svc/cluster-autoscaler-operator - 172.30.113.127 ports 443->8443, 8080->metrics
  deployment/cluster-autoscaler-operator deploys registry.svc.ci.openshift.org/ocp/4.2-2019-07-22-025130@sha256:b95392771f7b3e00d7c9560469a312c4933d097e6b3fe320e7961d688885d6ca
    deployment #1 running for 24 minutes - 1 pod

svc/machine-api-operator - 172.30.178.199:8080 -> metrics
  deployment/machine-api-operator deploys registry.svc.ci.openshift.org/ocp/4.2-2019-07-22-025130@sha256:f2b371781c1320e161a6bc15b9fbbc9f24f3a94c34a45809c2766966f4cc74f0
    deployment #1 running for 40 minutes - 1 pod (warning: 10 restarts)

deployment/machine-api-controllers deploys registry.svc.ci.openshift.org/ocp/4.2-2019-07-22-025130@sha256:0283c4a29d70e44848ecdddffe7b366c2dac25e6a3a1e57719b1c7c5f5ec8021,registry.svc.ci.openshift.org/ocp/4.2-2019-07-22-025130@sha256:0283c4a29d70e44848ecdddffe7b366c2dac25e6a3a1e57719b1c7c5f5ec8021,registry.svc.ci.openshift.org/ocp/4.2-2019-07-22-025130@sha256:f2b371781c1320e161a6bc15b9fbbc9f24f3a94c34a45809c2766966f4cc74f0
  deployment #1 running for 39 minutes - 1 pod

deployment/metal3-baremetal-operator deploys quay.io/metal3-io/baremetal-operator:master,quay.io/metal3-io/ironic:master,quay.io/metal3-io/ironic:master,quay.io/metal3-io/ironic:master,quay.io/metal3-io/ironic:master,quay.io/metal3-io/ironic:master,quay.io/metal3-io/ironic-inspector:master,quay.io/metal3-io/static-ip-manager:latest
  deployment #1 running for 15 minutes - 0/1 pods (warning: 5 restarts)

Errors:
  * container "ironic-api" in pod/metal3-baremetal-operator-74fdb86688-qw6rg is crash-looping
  * container "ironic-conductor" in pod/metal3-baremetal-operator-74fdb86688-qw6rg is crash-looping
  * container "ironic-dnsmasq" in pod/metal3-baremetal-operator-74fdb86688-qw6rg is crash-looping
  * container "ironic-httpd" in pod/metal3-baremetal-operator-74fdb86688-qw6rg is crash-looping

4 errors, 1 warning, 4 infos identified, use 'oc status --suggest' to see details.
[cloud-user@rhhi-node-worker-0 ~]$ 
[cloud-user@rhhi-node-worker-0 ~]$ 
[cloud-user@rhhi-node-worker-0 ~]$ 
[cloud-user@rhhi-node-worker-0 ~]$ 
[cloud-user@rhhi-node-worker-0 ~]$ oc --config dev-scripts/ocp/auth/kubeconfig -n openshift-machine-api get pods
NAME                                          READY   STATUS             RESTARTS   AGE
cluster-autoscaler-operator-cf969ffc4-btz5b   1/1     Running            0          25m
machine-api-controllers-85fc67ff6c-hjg79      3/3     Running            0          40m
machine-api-operator-577f585945-vrg6m         1/1     Running            10         41m
metal3-baremetal-operator-74fdb86688-qw6rg    4/8     CrashLoopBackOff   24         16m

To Reproduce
Deploy a 3-node cluster with the following config:

[cloud-user@rhhi-node-worker-0 ~]$ cat dev-scripts/config_cloud-user.sh 
#!/bin/bash

# Get a valid pull secret (json string) from
# You can get this secret from https://cloud.openshift.com/clusters/install#pull-secret
set +x
NODES_PLATFORM=baremetal
INT_IF=eth1
PRO_IF=eth0
CLUSTER_PRO_IF=ens3
EXT_IF=
ROOT_DISK=/dev/sda
NODES_FILE=/home/cloud-user/instackenv.json
MANAGE_BR_BRIDGE=n
NUM_WORKERS=0
CLUSTER_NAME=rhhi-virt-cluster
BASE_DOMAIN=qe.lab.redhat.com

Expected/observed behavior
The deployment is expected to complete successfully, but the ironic containers running under the baremetal-operator pod are crash-looping.

Additional context

[cloud-user@rhhi-node-worker-0 ~]$ oc --config dev-scripts/ocp/auth/kubeconfig -n openshift-machine-api logs metal3-baremetal-operator-74fdb86688-qw6rg -c ironic-api
iptables v1.4.21: can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.
[cloud-user@rhhi-node-worker-0 ~]$ oc --config dev-scripts/ocp/auth/kubeconfig -n openshift-machine-api logs metal3-baremetal-operator-74fdb86688-qw6rg -c ironic-conductor
iptables v1.4.21: can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.
[cloud-user@rhhi-node-worker-0 ~]$ oc --config dev-scripts/ocp/auth/kubeconfig -n openshift-machine-api logs metal3-baremetal-operator-74fdb86688-qw6rg -c ironic-dnsmasq
iptables v1.4.21: can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.
[cloud-user@rhhi-node-worker-0 ~]$ oc --config dev-scripts/ocp/auth/kubeconfig -n openshift-machine-api logs metal3-baremetal-operator-74fdb86688-qw6rg -c ironic-httpd
iptables v1.4.21: can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.
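
The "Table does not exist (do you need to insmod?)" message generally means the ip_tables/iptable_filter kernel modules are not loaded on the host; the containers share the host kernel, so this can be checked from the node itself. A minimal diagnostic sketch, assuming root access on the node:

lsmod | grep ip_tables     # empty output means the module is not loaded
modprobe ip_tables         # loads it manually; does not persist across reboots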

mcornea commented Jul 30, 2019

I managed to get the ironic containers running on the master node after:

  • modprobe ip_tables
  • setenforce 0

AVC denials:

[root@rhhi-node-master-0 core]# modprobe ip_tables
[root@rhhi-node-master-0 core]# lsmod | grep iptable
[root@rhhi-node-master-0 core]# setenforce 0
[root@rhhi-node-master-0 core]# lsmod | grep iptable
iptable_filter         16384  1
ip_tables              28672  1 iptable_filter

[root@rhhi-node-master-0 core]# dmesg | grep denied
[ 3505.474392] audit: type=1400 audit(1564448204.432:5): avc:  denied  { module_request } for  pid=98308 comm="iptables" kmod="iptable_filter" scontext=system_u:system_r:container_t:s0:c105,c202 tcontext=system_u:system_r:kernel_t:s0 tclass=system permissive=0
[ 3505.481231] audit: type=1400 audit(1564448204.433:6): avc:  denied  { module_request } for  pid=98308 comm="iptables" kmod="iptable_filter" scontext=system_u:system_r:container_t:s0:c105,c202 tcontext=system_u:system_r:kernel_t:s0 tclass=system permissive=0
[ 3505.486755] audit: type=1400 audit(1564448204.435:7): avc:  denied  { module_request } for  pid=98309 comm="iptables" kmod="iptable_filter" scontext=system_u:system_r:container_t:s0:c105,c202 tcontext=system_u:system_r:kernel_t:s0 tclass=system permissive=0
[ 3505.491788] audit: type=1400 audit(1564448204.435:8): avc:  denied  { module_request } for  pid=98309 comm="iptables" kmod="iptable_filter" scontext=system_u:system_r:container_t:s0:c105,c202 tcontext=system_u:system_r:kernel_t:s0 tclass=system permissive=0
[ 3587.168118] audit: type=1400 audit(1564448286.124:9): avc:  denied  { module_request } for  pid=104037 comm="iptables" kmod="iptable_filter" scontext=system_u:system_r:container_t:s0:c105,c202 tcontext=system_u:system_r:kernel_t:s0 tclass=system permissive=0
[ 3587.176576] audit: type=1400 audit(1564448286.124:10): avc:  denied  { module_request } for  pid=104037 comm="iptables" kmod="iptable_filter" scontext=system_u:system_r:container_t:s0:c105,c202 tcontext=system_u:system_r:kernel_t:s0 tclass=system permissive=0
[ 3587.184230] audit: type=1400 audit(1564448286.127:11): avc:  denied  { module_request } for  pid=104038 comm="iptables" kmod="iptable_filter" scontext=system_u:system_r:container_t:s0:c105,c202 tcontext=system_u:system_r:kernel_t:s0 tclass=system permissive=0
[ 3587.191919] audit: type=1400 audit(1564448286.127:12): avc:  denied  { module_request } for  pid=104038 comm="iptables" kmod="iptable_filter" scontext=system_u:system_r:container_t:s0:c105,c202 tcontext=system_u:system_r:kernel_t:s0 tclass=system permissive=0
[ 3589.014374] audit: type=1400 audit(1564448287.971:13): avc:  denied  { module_request } for  pid=104166 comm="iptables" kmod="iptable_filter" scontext=system_u:system_r:container_t:s0:c105,c202 tcontext=system_u:system_r:kernel_t:s0 tclass=system permissive=0
[ 3589.024676] audit: type=1400 audit(1564448287.973:14): avc:  denied  { module_request } for  pid=104166 comm="iptables" kmod="iptable_filter" scontext=system_u:system_r:container_t:s0:c105,c202 tcontext=system_u:system_r:kernel_t:s0 tclass=system permissive=0
[ 3589.033340] audit: type=1400 audit(1564448287.979:15): avc:  denied  { module_request } for  pid=104167 comm="iptables" kmod="iptable_filter" scontext=system_u:system_r:container_t:s0:c105,c202 tcontext=system_u:system_r:kernel_t:s0 tclass=system permissive=0
[ 3589.042370] audit: type=1400 audit(1564448287.979:16): avc:  denied  { module_request } for  pid=104167 comm="iptables" kmod="iptable_filter" scontext=system_u:system_r:container_t:s0:c105,c202 tcontext=system_u:system_r:kernel_t:s0 tclass=system permissive=0
[ 3600.827937] audit: type=1400 audit(1564448299.783:20): avc:  denied  { module_request } for  pid=105038 comm="iptables" kmod="iptable_filter" scontext=system_u:system_r:container_t:s0:c105,c202 tcontext=system_u:system_r:kernel_t:s0 tclass=system permissive=1
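
A less invasive alternative to setenforce 0 would be to load the module persistently and generate a targeted SELinux policy from the denials above, rather than leaving the node in permissive mode. A rough sketch, assuming the standard policycoreutils tooling (audit2allow, semodule) is present on the node; the ironic_iptables module name is just a placeholder:

# load ip_tables at every boot instead of a one-off modprobe
echo ip_tables > /etc/modules-load.d/ip_tables.conf

# build and install a small policy module covering the module_request denials seen in dmesg
audit2allow -d -M ironic_iptables
semodule -i ironic_iptables.pp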

@russellb

I would prefer that we remove all iptables calls from all of the containers. I don't think they are actually necessary for Ironic running within the cluster, but we need to verify that.

@e-minguez

I'm having the same issue in a baremetal deployment (and fixed it with the modprobe and setenforce 0 workaround).
Dan Walsh disapproves of this workaround 🗡️

@dhellmann

> I would prefer that we remove all iptables calls from all of the containers. I don't think they are actually necessary for Ironic running within the cluster, but we need to verify that.

Do we need those for the podman-run containers on the provisioning host? Should we add a switch to enable/disable them instead of just removing them?
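
If a switch is preferred, one possible shape for it (the IRONIC_MANAGE_FIREWALL variable name is hypothetical, not an existing option) would be to guard the iptables calls in the image entrypoint scripts with an environment variable, defaulting to the current behaviour:

#!/bin/bash
# Hypothetical guard around the existing iptables calls in the ironic image scripts.
if [ "${IRONIC_MANAGE_FIREWALL:-true}" = "true" ]; then
    iptables -I INPUT -p tcp --dport 6385 -j ACCEPT   # illustrative rule; the real rules stay as they are today
fi

The podman-run containers on the provisioning host could then keep the default, while the in-cluster deployment sets the variable to false and skips iptables entirely.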

@russellb

Soon none of them will be running on the provisioning host, once ironic is moved into the bootstrap VM.

@yprokule

Same as metal3-io/ironic-image#82?

@dhellmann

Is this still an issue?


mcornea commented Aug 23, 2019

> Is this still an issue?

Nope, closed.

mcornea closed this as completed Aug 23, 2019