CIS scan 1.4 does not work on a multi node cluster #27652

Closed
sowmyav27 opened this issue Jun 19, 2020 · 9 comments
Labels: kind/bug-qa (Issues that have not yet hit a real release. Bugs introduced by a new feature or enhancement), status/blocker

@sowmyav27 (Contributor) commented Jun 19, 2020

What kind of request is this (question/bug/enhancement/feature request): bug

Steps to reproduce (least amount of steps as possible):

  • Deploy a cluster with 1 etcd, 1 control plane, and 2 worker nodes
  • Kubernetes version can be 1.18.4, 1.17.7, or 1.16.11
  • When the cluster is up and Active, run a CIS 1.4 Permissive scan on the cluster
  • The scan report gets stuck in the "Running" state.

Expected Result:
The scan should finish and the report should be generated successfully.

Other details that may be helpful:

  • On a 4-node 1.15.12-rancher2-3 cluster (1 etcd, 1 control plane, and 2 workers), the scan runs fine.
  • On a single-node (all roles) 1.18.4 cluster, the scan runs fine.

Environment information

  • Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): 2.4.5-rc8
  • Installation option (single install/HA): single

Cluster information

  • Cluster type (Hosted/Infrastructure Provider/Custom/Imported): custom
  • Kubernetes version (use kubectl version): 1.18.4/1.17.7/1.16.11

gz#13130
gz#13356

@sowmyav27 sowmyav27 added kind/bug-qa Issues that have not yet hit a real release. Bugs introduced by a new feature or enhancement status/blocker labels Jun 19, 2020
@sowmyav27 sowmyav27 added this to the v2.4.5 milestone Jun 19, 2020
@maggieliu maggieliu modified the milestones: v2.4.5, v2.4.6 Jun 19, 2020
@prachidamle (Member)

Analysis of the problem so far with @leodotcloud:

  • CIS scans are failing because DNS name resolution does not work when the sonobuoy containers run with hostNetwork (net: host) and dnsPolicy: ClusterFirstWithHostNet.
  • So it's not a problem with just CIS; it's a problem with a basic k8s scenario in the new versions 1.18.4/1.17.7/1.16.11.
  • This problem can be recreated by launching any deployment with hostNetwork and dnsPolicy: ClusterFirstWithHostNet on a cluster with these k8s versions.
  • CIS scans run fine on a cluster with 1.18.3/1.17.6/1.16.10 and a similar node/role setup.
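The recreation step above can be sketched with a minimal manifest (the name, image, and command below are illustrative, not the actual sonobuoy spec); on the affected k8s versions, DNS lookups from such a pod time out on nodes that do not host a kube-dns/CoreDNS replica:

```yaml
# Minimal sketch of a deployment exercising the failing scenario:
# host networking combined with the ClusterFirstWithHostNet DNS policy.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hostnet-dns-test            # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hostnet-dns-test
  template:
    metadata:
      labels:
        app: hostnet-dns-test
    spec:
      hostNetwork: true                   # the "net:host" condition
      dnsPolicy: ClusterFirstWithHostNet  # the DNS policy in question
      containers:
      - name: test
        image: busybox:1.31               # any image with nslookup works
        command: ["sh", "-c", "nslookup kubernetes.default; sleep 3600"]
```

Apply it with kubectl apply -f and check the pod logs to see whether cluster DNS resolution succeeds from the host network.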

@maggieliu commented Jun 25, 2020

This should be resolved by this upstream PR: kubernetes/kubernetes#92354. Waiting for the next k8s patch release.

@sowmyav27 (Contributor, Author) commented Jul 10, 2020

This issue is still reproducible with k8s - 1.18.5, 1.17.8, 1.16.12.

Note:
Canal and Flannel --> CIS scan does NOT work. Weave and Calico --> CIS scan works.

@Oats87 (Contributor) commented Jul 10, 2020

This issue is not caused by kubernetes/kubernetes#92354.

UDP DNS resolution from the host network to a non-local node (where kube-dns runs on 2 of 3 nodes and you try to resolve via the service IP from the third) does not work:

root@ip-172-31-13-169:~# dig rancher.com @10.43.0.10
; <<>> DiG 9.11.3-1ubuntu1.9-Ubuntu <<>> rancher.com @10.43.0.10
;; global options: +cmd
;; connection timed out; no servers could be reached

What does work is TCP resolution:

root@ip-172-31-13-169:~# dig +tcp rancher.com @10.43.0.10

; <<>> DiG 9.11.3-1ubuntu1.9-Ubuntu <<>> +tcp rancher.com @10.43.0.10
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 59809
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 46f7bf0b2f6623f9 (echoed)
;; QUESTION SECTION:
;rancher.com.			IN	A

;; ANSWER SECTION:
rancher.com.		30	IN	A	104.26.4.146
rancher.com.		30	IN	A	172.67.71.14
rancher.com.		30	IN	A	104.26.5.146

;; Query time: 37 msec
;; SERVER: 10.43.0.10#53(10.43.0.10)
;; WHEN: Fri Jul 10 04:51:09 UTC 2020
;; MSG SIZE  rcvd: 133

@prachidamle (Member)

This upstream issue seems very relevant: kubernetes/kubernetes#87852

@Oats87 (Contributor) commented Jul 10, 2020

Reverting the same cluster to v1.18.3-rancher2-2 made remote UDP 53 DNS resolution work.

@Oats87 (Contributor) commented Jul 10, 2020

Running ethtool --offload flannel.1 rx off tx off made DNS resolution start working:

root@ip-172-31-13-169:~# dig @10.43.0.10 google.com
^Croot@ip-172-31-13-169:~# ethtool --offload flannel.1 rx off tx off
Actual changes:
rx-checksumming: off
tx-checksumming: off
	tx-checksum-ip-generic: off
tcp-segmentation-offload: off
	tx-tcp-segmentation: off [requested on]
	tx-tcp-ecn-segmentation: off [requested on]
	tx-tcp-mangleid-segmentation: off [requested on]
	tx-tcp6-segmentation: off [requested on]
root@ip-172-31-13-169:~# dig @10.43.0.10 google.com

; <<>> DiG 9.11.3-1ubuntu1.9-Ubuntu <<>> @10.43.0.10 google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17227
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 3d0e2dacabf1e076 (echoed)
;; QUESTION SECTION:
;google.com.			IN	A

;; ANSWER SECTION:
google.com.		30	IN	A	172.217.5.14

;; Query time: 2 msec
;; SERVER: 10.43.0.10#53(10.43.0.10)
;; WHEN: Fri Jul 10 06:26:10 UTC 2020
;; MSG SIZE  rcvd: 77

root@ip-172-31-13-169:~#

@prachidamle (Member)

This is the exact workaround mentioned here: kubernetes/kubernetes#87852 (comment)
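For reference, the workaround as applied above boils down to the following, run as root on each affected node. This is a sketch of the manual steps from the earlier comment, not an official fix; the interface name flannel.1 is the flannel/Canal VXLAN default and the service IP 10.43.0.10 is the RKE cluster-DNS default — adjust both for your environment:

```shell
# Disable RX/TX checksum offload on the flannel VXLAN interface
# (works around the vxlan checksum issue tracked in kubernetes/kubernetes#87852).
ethtool --offload flannel.1 rx off tx off

# Confirm the offload settings actually changed.
ethtool --show-offload flannel.1 | grep -i checksumming

# Re-test UDP DNS resolution against the cluster DNS service IP.
dig rancher.com @10.43.0.10
```

Note that ethtool settings do not persist across reboots, so a persistent deployment of this workaround would need to reapply it at boot.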

@sowmyav27 (Contributor, Author) commented:

Verified with 2.4.5 and KDM pointing to dev-v2.4

  • Deployed clusters using k8s 1.18.6-rancher1-1, 1.17.9-rancher1-1, and 1.16.13-rancher1-1 with all network providers, on 4-node clusters (1 etcd, 1 control plane, and 2 worker nodes) and on 2-node clusters (1 etcd/control plane/worker node and 1 worker node)
  • The CIS scan worked fine on all the clusters
