Cannot run rancher on hardened system #3195

@rancher-max

Environmental Info:
RKE2 Version:

v1.22.12+rke2r1, v1.23.9+rke2r1, and v1.24.3+rke2r1

Node(s) CPU architecture, OS, and Version:

Any

Cluster Configuration:

3 servers. Also repro'ed on just 1 server.

Describe the bug:

After bringing up a hardened RKE2 cluster and installing Rancher, I cannot access the Rancher UI. All Rancher pods appear to be running, and the UI appears to be reachable via the ClusterIP, but it fails to load at the configured hostname.
I am testing on AWS, and the TargetGroups attached to my LB show Unhealthy on ports 443 and 80 (9345 and 6443 both show Healthy).
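One way to check the unhealthy ports independently of the load balancer is a plain TCP probe against a node. This is only a sketch: `NODE_IP` and the `probe_ports` helper are hypothetical names, `nc` is assumed to be installed, and a TCP connect is not necessarily what the TargetGroup health checks do.

```shell
# Probe the four ports registered in the TargetGroups against a single node.
# NODE_IP is a placeholder; export it before running to probe a real node.
probe_ports() {
  node="$1"
  for port in 80 443 6443 9345; do
    # -z: connect without sending data, -w 5: give up after 5 seconds
    if nc -z -w 5 "$node" "$port" 2>/dev/null; then
      echo "${port}: open"
    else
      echo "${port}: unreachable"
    fi
  done
}

if [ -n "${NODE_IP:-}" ]; then
  probe_ports "$NODE_IP"
fi
```

Running it from a host in the same network as the nodes would show whether 80 and 443 are refused at the node itself or only fail through the LB.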

Steps To Reproduce:

  1. Install rke2 using profile: cis-1.6 in the config. Also include a tls-san using a valid hostname.
  2. Ensure the nodes are all set up to recognize the hostname. For example, with AWS I register the nodes to four TargetGroups with ports 6443, 9345, 80, and 443, then attach those to a LoadBalancer. I then create a route53 record pointing to the LoadBalancer DNS name and use that record as the hostname in my tls-san in step 1 above.
  3. Install rancher:
# Update helm, setup namespaces, and install cert-manager
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest && \
helm repo add jetstack https://charts.jetstack.io && \
helm repo update && \
kubectl create namespace cattle-system && \
kubectl create namespace cert-manager && \
kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v1.8.0/cert-manager.crds.yaml && \
helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --version v1.8.0

# Ensure cert-manager pods are running:
kubectl get pods --namespace cert-manager

# Install rancher. I've confirmed this happens with both 2.6.7-rc5 and 2.6.6:
helm install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=my.redacted.hostname \
  --set rancherImageTag=v2.6.7-rc5 \
  --version=v2.6.7-rc5
  4. The above steps finish with output that explains how to access the UI. Wait until all the pods are available, then open that link:
kubectl -n cattle-system rollout status deploy/rancher
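
For reference, the config from step 1 might look roughly like the following. This is a minimal sketch, not the exact file used in the report: it writes to a scratch directory (`CONFIG_DIR` and `RANCHER_HOSTNAME` are hypothetical variables), whereas on a real server the file would normally be `/etc/rancher/rke2/config.yaml`. The host-level prerequisites from the RKE2 hardening guide are not shown.

```shell
# Write a minimal hardened RKE2 server config to a scratch directory.
CONFIG_DIR="${CONFIG_DIR:-$(mktemp -d)}"
# Placeholder hostname; substitute the route53 record from step 2.
RANCHER_HOSTNAME="${RANCHER_HOSTNAME:-my.redacted.hostname}"

cat > "${CONFIG_DIR}/config.yaml" <<EOF
profile: cis-1.6
tls-san:
  - ${RANCHER_HOSTNAME}
EOF

# Show the result for review before copying it to the node.
cat "${CONFIG_DIR}/config.yaml"
```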

Expected behavior:

The Rancher UI should be accessible.

Actual behavior:

The Rancher UI is not accessible at all.

Additional context / logs:

I believe this is related to the changes from #2206, probably combined with the PSPs and NetworkPolicies applied in a hardened setup. I think the fix should include updating those NetworkPolicies or PSPs to account for the hostNetwork change. As far as I can tell, the Rancher pods do NOT deploy with any specific hostNetwork setting, and I can't see any errors in the logs:
rancher-pod-logs.log
ingress-nginx.log
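
The following read-only checks are one way to gather more detail on the suspected interaction. It is a sketch under assumptions: the `inspect_hardened_ingress` helper is a hypothetical name, and the ingress controller is assumed to run as the `rke2-ingress-nginx-controller` DaemonSet in `kube-system`, which may differ on a given cluster.

```shell
# Read-only checks; nothing here changes the cluster.
inspect_hardened_ingress() {
  # NetworkPolicies in every namespace, including any added by the CIS profile.
  kubectl get networkpolicy --all-namespaces
  # PodSecurityPolicies (the API still exists on Kubernetes 1.22 through 1.24).
  kubectl get podsecuritypolicy
  # Whether the ingress controller runs with hostNetwork.
  # The DaemonSet name and namespace are assumptions.
  kubectl -n kube-system get daemonset rke2-ingress-nginx-controller \
    -o jsonpath='{.spec.template.spec.hostNetwork}{"\n"}'
  # The Ingress created by the Rancher chart.
  kubectl -n cattle-system get ingress
}

if command -v kubectl >/dev/null 2>&1; then
  inspect_hardened_ingress
else
  echo "kubectl not found; skipping checks" >&2
fi
```

If the hostNetwork query prints `true`, NetworkPolicies that select the ingress controller by pod label may not match its traffic as intended, which would be consistent with the symptoms above.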

I'm happy to reproduce and gather any more information necessary.

Labels

kind/bug: Something isn't working