Registry TLS configuration from registries.yaml is only honored for mirror endpoints #5658

Closed
snipking opened this issue Apr 2, 2024 · 10 comments
Labels: kind/bug (Something isn't working), kind/upstream-issue (This issue appears to be caused by an upstream bug)

snipking commented Apr 2, 2024

Environmental Info:
RKE2 Version:
v1.27.12+rke2r1

Node(s) CPU architecture, OS, and Version:
x86,CentOS7

Cluster Configuration:
Reproducible with any configuration.

Describe the bug:
Configure the containerd registry with the following in /etc/rancher/rke2/registries.yaml:

configs:
  "192.168.2.74:31443":
    tls:
      insecure_skip_verify: true

This generates the containerd configuration at /var/lib/rancher/rke2/agent/etc/containerd/certs.d/192.168.2.74:31443/hosts.toml, but skip_verify does not take effect:

# File generated by rke2. DO NOT EDIT.

server = "https://192.168.2.74:31443/v2"
capabilities = ["pull", "resolve", "push"]

skip_verify = true

It seems rke2 1.27.12+rke2r1 generates hosts.toml in the wrong format. 1.27.10+rke2r1 generates the following hosts.toml, which works fine:

# File generated by rke2. DO NOT EDIT.

[host."https://192.168.2.74:31443/v2"]
  capabilities = ["pull", "resolve"]
  skip_verify = true

Manually changing hosts.toml to the format above makes skip_verify work, but restarting rke2-server/rke2-agent overwrites the file with the wrong format again.
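
To tell the two layouts apart on a node, a small shell helper can report whether a key such as skip_verify sits at the top level of hosts.toml or inside a [host."..."] table. This is only a rough sketch based on the two files quoted above: the function name tls_setting_scope is made up for illustration, and it matches lines rather than parsing TOML properly.

```shell
# tls_setting_scope FILE KEY
# Prints "top-level" if KEY is assigned before any [table] header,
# "host-section" if it first appears after one, or "absent" if not found.
# Hypothetical helper; simple line matching, not a real TOML parser.
tls_setting_scope() {
  awk -v key="$2" '
    # Any line that starts a TOML table, e.g. [host."https://..."]
    /^[ \t]*\[/ { in_section = 1 }
    # First assignment of the requested key decides the answer
    $0 ~ ("^[ \t]*" key "[ \t]*=") {
      print (in_section ? "host-section" : "top-level")
      found = 1
      exit
    }
    END { if (!found) print "absent" }
  ' "$1"
}
```

For example, `tls_setting_scope /var/lib/rancher/rke2/agent/etc/containerd/certs.d/192.168.2.74:31443/hosts.toml skip_verify` would print top-level for the 1.27.12 file shown above and host-section for the 1.27.10 one.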

Steps To Reproduce:

  • Install rke2 1.27.12+rke2r1
  • Configure /etc/rancher/rke2/registries.yaml
  • Restart rke2-server or rke2-agent
  • Inspect /var/lib/rancher/rke2/agent/etc/containerd/certs.d/192.168.2.74:31443/hosts.toml

Bixlid commented Apr 2, 2024

Hi,
We upgraded rke2 from 1.27.11 to 1.27.12 and we have exactly the same problem.
The same workaround of changing the file format works, but as you mentioned, we can't restart the agent.

snipking (Author) commented Apr 2, 2024

As a temporary solution, download the rke2 executable for version 1.27.11+rke2r1 from the releases page, overwrite /usr/bin/rke2 with it, then restart rke2-server/rke2-agent.

snipking (Author) commented Apr 2, 2024

Upstream issue containerd/containerd#10027

brandond (Contributor) commented Apr 2, 2024

This is being tracked in k3s, as that is where the code in question lives: k3s-io/k3s#9839. This will be resolved when we pull through K3s updates for our next release cycle.

If possible, I would suggest using the workaround at k3s-io/k3s#9839 (comment). However, this will only work if your registry namespace does not already include a port:

mirrors:
 172-17-0-7.sslip.io:
   endpoint:
     - https://172-17-0-7.sslip.io:443
configs:
 "172-17-0-7.sslip.io:443":
   tls:
     ca_file: /usr/local/share/ca-certificates/registry.crt
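
The pattern in this workaround (mirror key without a port, endpoint and configs key with one) can be generated for another registry with a short shell function. This is a sketch, not part of RKE2: the name print_port_mirror_workaround and its arguments are assumptions for illustration, and it only prints YAML for you to review before placing it in registries.yaml.

```shell
# print_port_mirror_workaround HOST PORT CA_FILE
# Prints a registries.yaml fragment following the workaround above.
# Hypothetical helper; refuses hosts that already carry a port, since the
# workaround only applies when the registry namespace has none.
print_port_mirror_workaround() {
  host="$1"; port="$2"; ca_file="$3"
  case "$host" in
    *:*) echo "registry namespace already includes a port: $host" >&2
         return 1 ;;
  esac
  cat <<EOF
mirrors:
  $host:
    endpoint:
      - https://$host:$port
configs:
  "$host:$port":
    tls:
      ca_file: $ca_file
EOF
}
```

Calling `print_port_mirror_workaround 172-17-0-7.sslip.io 443 /usr/local/share/ca-certificates/registry.crt` reproduces the fragment above.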

brandond self-assigned this Apr 2, 2024
brandond added the kind/bug and kind/upstream-issue labels Apr 2, 2024
brandond added this to the v1.29.4+rke2r1 milestone Apr 2, 2024
brandond (Contributor) commented Apr 2, 2024

Using 172-17-0-7.sslip.io as an example registry, the two possible work-arounds are:

  1. If your registry namespace does not currently include a port, configure a mirror endpoint with a port:
    mirrors:
      172-17-0-7.sslip.io:
        endpoint:
          - https://172-17-0-7.sslip.io:443
    configs:
      "172-17-0-7.sslip.io:443":
        tls:
          ca_file: /usr/local/share/ca-certificates/registry.crt
  2. Manually drop the CA certificate into the registry namespace's configuration directory, and make it immutable so that RKE2 does not remove it when restarting:
    mkdir -p /var/lib/rancher/rke2/agent/etc/containerd/certs.d/172-17-0-7.sslip.io/
    cp /usr/local/share/ca-certificates/registry.crt /var/lib/rancher/rke2/agent/etc/containerd/certs.d/172-17-0-7.sslip.io/ca.crt
    chattr +i /var/lib/rancher/rke2/agent/etc/containerd/certs.d/172-17-0-7.sslip.io/ca.crt
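
The three commands in the second workaround can be wrapped in a function so they are easy to repeat for several registries. This is a sketch under assumptions: install_registry_ca, CERTS_BASE and SKIP_IMMUTABLE are names invented here, not RKE2 options. chattr +i needs root and a filesystem that supports the immutable attribute, so the sketch lets you skip that step when trying it out.

```shell
# Base directory taken from the paths quoted in this thread; override for testing.
CERTS_BASE="${CERTS_BASE:-/var/lib/rancher/rke2/agent/etc/containerd/certs.d}"

# install_registry_ca NAMESPACE CA_FILE
# Copies CA_FILE to $CERTS_BASE/NAMESPACE/ca.crt and marks it immutable.
# Hypothetical helper; set SKIP_IMMUTABLE=1 to leave out the chattr step.
install_registry_ca() {
  ns="$1"; ca="$2"
  dir="$CERTS_BASE/$ns"
  mkdir -p "$dir" || return 1
  # cp fails if ca.crt is already immutable; clear it with `chattr -i` first
  cp "$ca" "$dir/ca.crt" || return 1
  if [ "${SKIP_IMMUTABLE:-0}" != "1" ]; then
    chattr +i "$dir/ca.crt" || return 1
  fi
  # Print the installed path so callers can verify it
  printf '%s\n' "$dir/ca.crt"
}
```

Run as root, `install_registry_ca 172-17-0-7.sslip.io /usr/local/share/ca-certificates/registry.crt` performs the same steps as the commands above.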

brandond changed the title from "Containerd Registry Configuration generate wrong hosts.toml [stable v1.27.12+rke2r1]" to "Registry TLS configuration from registries.yaml is only honored for mirror endpoints" Apr 2, 2024
brandond pinned this issue Apr 2, 2024
@vincebrannon

SURE-8103

aganesh-suse self-assigned this Apr 8, 2024
@belgaied2

@brandond Can we have this in a v1.27.12+rke2r2 instead of waiting for a new upstream K8s patch version?

@brandond (Contributor)

No, we are not planning on doing an r2 for this. Upstream patches will be out next week, and there are two possible workarounds available on the current release.

@JacieChao

I ran into this problem as well: Skip TLS Verifications does not work properly when provisioning RKE2 and K3s clusters with Rancher v2.8.3. The workaround above worked for me.

For anyone who uses Rancher to provision an RKE2 or K3s cluster and needs to configure Skip TLS or pass the CA cert, follow the steps below:

  • If the registry namespace does not include a port, configure a mirror endpoint with a port:
machineSelectorConfig:
  - config:
      protect-kernel-defaults: false
      system-default-registry: harbor.jacie.work
registries:
  configs:
    harbor.jacie.work:443:
      insecureSkipVerify: true
  mirrors:
    harbor.jacie.work:
      endpoint:
        - https://harbor.jacie.work:443
  • If the registry namespace includes a port, configure the mirror hostname and endpoint differently:
machineSelectorConfig:
  - config:
      protect-kernel-defaults: false
      system-default-registry: 192.168.3.100
registries:
  configs:
    192.168.3.100:8088:
      caBundle: <your-ca-cert-here>
  mirrors:
    192.168.3.100:
      endpoint:
        - https://192.168.3.100:8088

@aganesh-suse

Validated on master branch with version v1.29.4-rc1+rke2r1

Environment Details

Infrastructure

  • Cloud
  • Hosted

Node(s) CPU architecture, OS, and Version:

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"

$ uname -m
x86_64

Cluster Configuration:

HA : 3 server / 1 agent

or

1 server/ 1 agent

Config.yaml:

token: xxxx
write-kubeconfig-mode: "0644"
node-external-ip: 1.1.1.1

registries.yaml:

$ sudo cat /etc/rancher/rke2/registries.yaml
mirrors:
  pvt-registry.com:
    endpoint:
      - pvt-registry.com
  docker.io:
    endpoint:
      - pvt-registry.com      
  k8s.gcr.io:
    endpoint:
      - pvt-registry.com      
configs:
  pvt-registry.com:
    auth:
      username: xxxx
      password: xxxx
    tls:
      ca_file: /home/user/ca.pem

test-image.yaml:

apiVersion: v1
kind: Namespace
metadata:
  name: pvt-reg-test
  labels:
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pvt-reg-test
  namespace: pvt-reg-test
spec:
  selector:
    matchLabels:
      k8s-app: nginx-app-clusterip
  replicas: 2
  template:
    metadata:
      labels:
        k8s-app: nginx-app-clusterip
    spec:
      containers:
      - name: nginx
        image: pvt-registry.com/nginx:latest
        ports:
        - containerPort: 8080

Testing Steps

  1. Copy config.yaml
$ sudo mkdir -p /etc/rancher/rke2 && sudo cp config.yaml /etc/rancher/rke2
$ sudo cp registries.yaml /etc/rancher/rke2
  2. Install RKE2
curl -sfL https://get.rke2.io | sudo INSTALL_RKE2_VERSION='v1.29.4-rc1+rke2r1' INSTALL_RKE2_TYPE='server' INSTALL_RKE2_METHOD=tar sh -
  3. Start the RKE2 service
$ sudo systemctl enable --now rke2-server
or 
$ sudo systemctl enable --now rke2-agent
  4. Verify Cluster Status:
kubectl get nodes -o wide
kubectl get pods -A
  5. Push an image onto the private registry and try to deploy a pod with said image.
    The image should get pulled and pod should come up without any tls certificate errors.
$ kubectl apply -f test-image.yaml
$ kubectl get pods -n pvt-reg-test
$ kubectl describe pod/pvt-reg-test-abcd -n pvt-reg-test
  6. Check the hosts.toml files for host section

Replication Results:

  • rke2 version used for replication:
$ rke2 -v
rke2 version v1.29.3+rke2r1 (1c82f7ed292c4ac172692bb82b13d20733909804)
go version go1.21.8 X:boringcrypto
$ kubectl get pods -A
NAMESPACE      NAME                                                   READY   STATUS             RESTARTS   AGE
kube-system    cloud-controller-manager-ip-172-31-17-31               1/1     Running            0          5m2s
kube-system    cloud-controller-manager-ip-172-31-28-77               1/1     Running            0          7m32s
kube-system    cloud-controller-manager-ip-172-31-29-95               1/1     Running            0          6m6s
kube-system    etcd-ip-172-31-17-31                                   1/1     Running            0          4m47s
kube-system    etcd-ip-172-31-28-77                                   1/1     Running            0          7m15s
kube-system    etcd-ip-172-31-29-95                                   1/1     Running            0          5m56s
kube-system    helm-install-rke2-canal-fdvd2                          0/1     Completed          0          7m26s
kube-system    helm-install-rke2-coredns-j9vfg                        0/1     Completed          0          7m26s
kube-system    helm-install-rke2-ingress-nginx-fx8lk                  0/1     Completed          0          7m26s
kube-system    helm-install-rke2-metrics-server-tdl4g                 0/1     Completed          0          7m26s
kube-system    helm-install-rke2-snapshot-controller-crd-qwwwf        0/1     Completed          0          7m25s
kube-system    helm-install-rke2-snapshot-controller-p9nm6            0/1     Completed          0          7m26s
kube-system    helm-install-rke2-snapshot-validation-webhook-bdwc7    0/1     Completed          0          7m25s
kube-system    kube-apiserver-ip-172-31-17-31                         1/1     Running            0          5m2s
kube-system    kube-apiserver-ip-172-31-28-77                         1/1     Running            0          7m33s
kube-system    kube-apiserver-ip-172-31-29-95                         1/1     Running            0          6m20s
kube-system    kube-controller-manager-ip-172-31-17-31                1/1     Running            0          5m2s
kube-system    kube-controller-manager-ip-172-31-28-77                1/1     Running            0          7m32s
kube-system    kube-controller-manager-ip-172-31-29-95                1/1     Running            0          6m18s
kube-system    kube-proxy-ip-172-31-17-31                             1/1     Running            0          4m55s
kube-system    kube-proxy-ip-172-31-25-109                            1/1     Running            0          4m29s
kube-system    kube-proxy-ip-172-31-28-77                             1/1     Running            0          7m36s
kube-system    kube-proxy-ip-172-31-29-95                             1/1     Running            0          6m15s
kube-system    kube-scheduler-ip-172-31-17-31                         1/1     Running            0          5m2s
kube-system    kube-scheduler-ip-172-31-28-77                         1/1     Running            0          7m32s
kube-system    kube-scheduler-ip-172-31-29-95                         1/1     Running            0          6m18s
kube-system    rke2-canal-29n6c                                       2/2     Running            0          7m17s
kube-system    rke2-canal-dpkjj                                       2/2     Running            0          4m30s
kube-system    rke2-canal-s92xl                                       2/2     Running            0          6m21s
kube-system    rke2-canal-z4dz6                                       2/2     Running            0          5m3s
kube-system    rke2-coredns-rke2-coredns-5b7d84d764-rwjrx             1/1     Running            0          7m18s
kube-system    rke2-coredns-rke2-coredns-5b7d84d764-tcwlc             1/1     Running            0          6m17s
kube-system    rke2-coredns-rke2-coredns-autoscaler-b49765765-7fjjp   1/1     Running            0          7m19s
kube-system    rke2-ingress-nginx-controller-b2kpl                    1/1     Running            0          6m27s
kube-system    rke2-ingress-nginx-controller-d4s6z                    1/1     Running            0          4m6s
kube-system    rke2-ingress-nginx-controller-ptvxn                    1/1     Running            0          5m57s
kube-system    rke2-ingress-nginx-controller-rxf94                    1/1     Running            0          4m5s
kube-system    rke2-metrics-server-544c8c66fc-lrcbw                   1/1     Running            0          6m41s
kube-system    rke2-snapshot-controller-59cc9cd8f4-kmfk8              1/1     Running            0          6m40s
kube-system    rke2-snapshot-validation-webhook-54c5989b65-bhmvc      1/1     Running            0          6m39s
pvt-reg-test   pvt-reg-test-58487d4cbf-2sndw                          0/1     ImagePullBackOff   0          16s
pvt-reg-test   pvt-reg-test-58487d4cbf-wm2tm                          0/1     ImagePullBackOff   0          16s

Pod Events:

Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  6m37s                  default-scheduler  Successfully assigned pvt-reg-test/pvt-reg-test-58487d4cbf-wm2tm to ip-172-31-17-31
  Normal   Pulling    5m13s (x4 over 6m37s)  kubelet            Pulling image "pvt-registry.com/nginx:latest"
  Warning  Failed     5m13s (x4 over 6m37s)  kubelet            Failed to pull image "pvt-registry.com/nginx:latest": failed to pull and unpack image "pvt-registry.com/nginx:latest": failed to resolve reference "pvt-registry.com/nginx:latest": failed to do request: Head "https://pvt-registry.com/v2/nginx/manifests/latest": tls: failed to verify certificate: x509: certificate signed by unknown authority
  Warning  Failed     5m13s (x4 over 6m37s)  kubelet            Error: ErrImagePull
  Warning  Failed     5m1s (x6 over 6m37s)   kubelet            Error: ImagePullBackOff
  Normal   BackOff    92s (x21 over 6m37s)   kubelet            Back-off pulling image "pvt-registry.com/nginx:latest"
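
When checking many pods, the failure signature in these events can be detected with a one-line filter instead of reading the output by eye. A minimal sketch, assuming the error text matches the message shown above; the function name pull_failed_on_untrusted_ca is made up for illustration.

```shell
# Reads `kubectl describe pod` output on stdin.
# Succeeds (exit 0) if an image pull failed because the registry
# certificate was not trusted, i.e. the configured CA was not honored.
# Hypothetical helper; matches the exact message quoted in this thread.
pull_failed_on_untrusted_ca() {
  grep -q 'x509: certificate signed by unknown authority'
}
```

For example: `kubectl describe pod/pvt-reg-test-abcd -n pvt-reg-test | pull_failed_on_untrusted_ca && echo "registry CA not honored"`.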

Validation Results:

  • rke2 version used for validation:
$ rke2 -v
rke2 version v1.29.4-rc1+rke2r1 (63fe2cafc55fedaa51b16eb108e73aa6a1344618)
go version go1.21.9 X:boringcrypto
$ kubectl get pods -A
NAMESPACE      NAME                                                   READY   STATUS      RESTARTS   AGE
kube-system    cloud-controller-manager-ip-172-31-22-158              1/1     Running     0          4m28s
kube-system    cloud-controller-manager-ip-172-31-22-216              1/1     Running     0          7m6s
kube-system    cloud-controller-manager-ip-172-31-24-57               1/1     Running     0          5m39s
kube-system    etcd-ip-172-31-22-158                                  1/1     Running     0          4m24s
kube-system    etcd-ip-172-31-22-216                                  1/1     Running     0          6m52s
kube-system    etcd-ip-172-31-24-57                                   1/1     Running     0          5m27s
kube-system    helm-install-rke2-canal-hnjm7                          0/1     Completed   0          6m53s
kube-system    helm-install-rke2-coredns-9xvfz                        0/1     Completed   0          6m53s
kube-system    helm-install-rke2-ingress-nginx-dxb4b                  0/1     Completed   0          6m53s
kube-system    helm-install-rke2-metrics-server-t2b7z                 0/1     Completed   0          6m53s
kube-system    helm-install-rke2-snapshot-controller-crd-nd9t7        0/1     Completed   0          6m52s
kube-system    helm-install-rke2-snapshot-controller-crht4            0/1     Completed   1          6m53s
kube-system    helm-install-rke2-snapshot-validation-webhook-2jnmq    0/1     Completed   0          6m52s
kube-system    kube-apiserver-ip-172-31-22-158                        1/1     Running     0          4m33s
kube-system    kube-apiserver-ip-172-31-22-216                        1/1     Running     0          6m57s
kube-system    kube-apiserver-ip-172-31-24-57                         1/1     Running     0          5m14s
kube-system    kube-controller-manager-ip-172-31-22-158               1/1     Running     0          4m28s
kube-system    kube-controller-manager-ip-172-31-22-216               1/1     Running     0          7m
kube-system    kube-controller-manager-ip-172-31-24-57                1/1     Running     0          5m39s
kube-system    kube-proxy-ip-172-31-20-45                             1/1     Running     0          4m23s
kube-system    kube-proxy-ip-172-31-22-158                            1/1     Running     0          4m33s
kube-system    kube-proxy-ip-172-31-22-216                            1/1     Running     0          7m6s
kube-system    kube-proxy-ip-172-31-24-57                             1/1     Running     0          5m43s
kube-system    kube-scheduler-ip-172-31-22-158                        1/1     Running     0          4m28s
kube-system    kube-scheduler-ip-172-31-22-216                        1/1     Running     0          7m
kube-system    kube-scheduler-ip-172-31-24-57                         1/1     Running     0          5m39s
kube-system    rke2-canal-6b65v                                       2/2     Running     0          4m23s
kube-system    rke2-canal-n5gjt                                       2/2     Running     0          6m41s
kube-system    rke2-canal-t9hmf                                       2/2     Running     0          4m47s
kube-system    rke2-canal-z5299                                       2/2     Running     0          6m2s
kube-system    rke2-coredns-rke2-coredns-5b7d84d764-cfdhb             1/1     Running     0          5m53s
kube-system    rke2-coredns-rke2-coredns-5b7d84d764-pqdzw             1/1     Running     0          6m43s
kube-system    rke2-coredns-rke2-coredns-autoscaler-b49765765-jmqn5   1/1     Running     0          6m43s
kube-system    rke2-ingress-nginx-controller-2s2kt                    1/1     Running     0          5m30s
kube-system    rke2-ingress-nginx-controller-7hscv                    1/1     Running     0          4m18s
kube-system    rke2-ingress-nginx-controller-8p2zr                    1/1     Running     0          4m2s
kube-system    rke2-ingress-nginx-controller-f6g45                    1/1     Running     0          5m38s
kube-system    rke2-metrics-server-5b965c548d-cvlkv                   1/1     Running     0          5m54s
kube-system    rke2-snapshot-controller-59cc9cd8f4-s2h5j              1/1     Running     0          5m50s
kube-system    rke2-snapshot-validation-webhook-54c5989b65-vnqnd      1/1     Running     0          5m53s
pvt-reg-test   pvt-reg-test-689d88767c-mfxhh                          1/1     Running     0          16s
pvt-reg-test   pvt-reg-test-689d88767c-x8xj9                          1/1     Running     0          16s

Pod Events:

Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  14s   default-scheduler  Successfully assigned pvt-reg-test/pvt-reg-test-689d88767c-mfxhh to ip-172-31-20-45
  Normal  Pulling    14s   kubelet            Pulling image "pvt-registry.com/nginx:latest"
  Normal  Pulled     8s    kubelet            Successfully pulled image "pvt-registry.com/nginx:latest" in 5.167s (5.167s including waiting)
  Normal  Created    8s    kubelet            Created container nginx
  Normal  Started    8s    kubelet            Started container nginx

Check hosts.toml file contents for host section:

$ sudo cat /var/lib/rancher/rke2/agent/etc/containerd/certs.d/pvt-registry.com/hosts.toml 
# File generated by rke2. DO NOT EDIT.

server = "https://pvt-registry.com/v2"
capabilities = ["pull", "resolve", "push"]

ca = ["/home/ubuntu/ca.pem"]


[host]
$ sudo cat /var/lib/rancher/rke2/agent/etc/containerd/certs.d/docker.io/hosts.toml 
# File generated by rke2. DO NOT EDIT.

server = "https://registry-1.docker.io/v2"
capabilities = ["pull", "resolve", "push"]


[host]
[host."https://pvt-registry.com/v2"]
  capabilities = ["pull", "resolve"]
  ca = ["/home/ubuntu/ca.pem"]

$ sudo cat /var/lib/rancher/rke2/agent/etc/containerd/certs.d/k8s.gcr.io/hosts.toml 
# File generated by rke2. DO NOT EDIT.

server = "https://k8s.gcr.io/v2"
capabilities = ["pull", "resolve", "push"]


[host]
[host."https://pvt-registry.com/v2"]
  capabilities = ["pull", "resolve"]
  ca = ["/home/ubuntu/ca.pem"]
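
Rather than running cat on each file, the TLS-related lines of every generated hosts.toml can be listed in one pass. This is a sketch only: list_tls_settings is a hypothetical name, and it looks just for the ca and skip_verify keys that appear in this thread, using line matching rather than a TOML parser.

```shell
# list_tls_settings CERTS_DIR
# For each CERTS_DIR/<namespace>/hosts.toml, prints "<namespace>: <line>"
# for every line that assigns ca or skip_verify.
# Hypothetical helper; reading the RKE2 directory may require root.
list_tls_settings() {
  for f in "$1"/*/hosts.toml; do
    [ -f "$f" ] || continue
    ns=$(basename "$(dirname "$f")")
    awk -v ns="$ns" '
      /^[ \t]*(ca|skip_verify)[ \t]*=/ {
        sub(/^[ \t]+/, "")      # drop indentation inside [host] tables
        print ns ": " $0
      }
    ' "$f"
  done
}
```

Usage: `list_tls_settings /var/lib/rancher/rke2/agent/etc/containerd/certs.d`. A namespace that prints nothing has no ca or skip_verify setting in its hosts.toml.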

manuelbuil unpinned this issue May 24, 2024