The Ziti Controller Helm Chart Hangs During Install #133

Closed
TheDarkula opened this issue Sep 1, 2023 · 26 comments

Comments

@TheDarkula
Contributor

I started playing with the helm charts, and I tried to install the ziti-controller.

Unfortunately, the initContainer never succeeds, but hangs on a configmap:

MountVolume.SetUp failed for volume "ziti-controller-ctrl-plane-cas" : configmap "ziti-controller-ctrl-plane-cas" not found

I installed the chart with this command:

helm upgrade --install ziti-controller openziti/ziti-controller --namespace ziti-controller --create-namespace --values values.yaml

@qrkourier
Member

Is it possible it is successful but slow? That configmap is one of the last things to be automatically created. It's used by a Trust Manager Bundle resource to store Ziti's trust anchors. It's created by trust-manager when all the necessary certificates are finally created by cert-manager.

If trust-manager is waiting for cert-manager then there should be a complaint in the logs from one or both of them.
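
A minimal sketch for narrowing those logs down to complaints, assuming the klog line format that cert-manager and trust-manager use (a severity letter I, W, E or F followed by MMDD and a timestamp). The function name klog_problems is hypothetical, and matching on "type"="Warning" is an assumption about how Warning events show up in the log stream.

```shell
# Keep only lines that look like complaints in klog-formatted logs on stdin:
# - lines logged at Warning, Error or Fatal severity (W, E, F prefix)
# - info-level lines that record a Kubernetes event of type Warning
klog_problems() {
  # "|| true" so an empty result is not treated as a failure by callers
  grep -E '^[WEF][0-9]{4} |"type"="Warning"' || true
}
```

It could be used as `kubectl -n <namespace> logs deploy/<name> | klog_problems`, where the namespace and deployment names depend on how cert-manager and trust-manager were installed.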

@TheDarkula
Contributor Author

@qrkourier Thank you for the insight.
I had a look at the cert-manager pod logs, and it seems to be unhappy with spec.dnsNames:

I0901 18:01:52.756295       1 conditions.go:203] Setting lastTransitionTime for Certificate "ziti-controller-admin-client-cert" condition "Ready" to 2023-09-01 18:01:52.756283847 +0000 UTC m=+339978.708668099
I0901 18:01:52.759265       1 conditions.go:203] Setting lastTransitionTime for Certificate "ziti-controller-web-root-cert" condition "Ready" to 2023-09-01 18:01:52.759257936 +0000 UTC m=+339978.711642195
I0901 18:01:52.759408       1 conditions.go:203] Setting lastTransitionTime for Certificate "ziti-controller-ctrl-plane-intermediate-cert" condition "Ready" to 2023-09-01 18:01:52.759402627 +0000 UTC m=+339978.711786887
I0901 18:01:52.759824       1 conditions.go:203] Setting lastTransitionTime for Certificate "ziti-controller-edge-root-cert" condition "Ready" to 2023-09-01 18:01:52.759818722 +0000 UTC m=+339978.712202978
I0901 18:01:52.759970       1 conditions.go:203] Setting lastTransitionTime for Certificate "ziti-controller-ctrl-plane-identity" condition "Ready" to 2023-09-01 18:01:52.75996466 +0000 UTC m=+339978.712348924
I0901 18:01:52.763396       1 trigger_controller.go:194] "cert-manager/certificates-trigger: Certificate must be re-issued" key="ziti-controller/ziti-controller-web-identity-cert" reason="SecretMismatch" message="Existing issued Secret is not up to date for spec: [spec.dnsNames]"
I0901 18:01:52.763420       1 conditions.go:203] Setting lastTransitionTime for Certificate "ziti-controller-web-identity-cert" condition "Issuing" to 2023-09-01 18:01:52.763413735 +0000 UTC m=+339978.715797998
I0901 18:01:52.766933       1 conditions.go:203] Setting lastTransitionTime for Certificate "ziti-controller-web-identity-cert" condition "Ready" to 2023-09-01 18:01:52.76692576 +0000 UTC m=+339978.719310017
I0901 18:01:52.769548       1 conditions.go:203] Setting lastTransitionTime for Certificate "ziti-controller-ctrl-plane-root-cert" condition "Ready" to 2023-09-01 18:01:52.769540619 +0000 UTC m=+339978.721924877
I0901 18:01:52.770573       1 conditions.go:203] Setting lastTransitionTime for Certificate "ziti-controller-edge-signer-cert" condition "Ready" to 2023-09-01 18:01:52.770567183 +0000 UTC m=+339978.722951440
I0901 18:01:52.773231       1 conditions.go:96] Setting lastTransitionTime for Issuer "ziti-controller-edge-root-issuer" condition "Ready" to 2023-09-01 18:01:52.773223209 +0000 UTC m=+339978.725607470
I0901 18:01:52.774914       1 conditions.go:203] Setting lastTransitionTime for Certificate "ziti-controller-web-intermediate-cert" condition "Ready" to 2023-09-01 18:01:52.774905224 +0000 UTC m=+339978.727289482
I0901 18:01:52.777098       1 conditions.go:96] Setting lastTransitionTime for Issuer "ziti-controller-selfsigned-ca-issuer" condition "Ready" to 2023-09-01 18:01:52.777052897 +0000 UTC m=+339978.729437165
I0901 18:01:52.777124       1 conditions.go:96] Setting lastTransitionTime for Issuer "ziti-controller-ctrl-plane-root-issuer" condition "Ready" to 2023-09-01 18:01:52.777118874 +0000 UTC m=+339978.729503132
I0901 18:01:52.777353       1 conditions.go:96] Setting lastTransitionTime for Issuer "ziti-controller-edge-signer-issuer" condition "Ready" to 2023-09-01 18:01:52.777348777 +0000 UTC m=+339978.729733031
I0901 18:01:52.777406       1 conditions.go:96] Setting lastTransitionTime for Issuer "ziti-controller-ctrl-plane-intermediate-issuer" condition "Ready" to 2023-09-01 18:01:52.777401055 +0000 UTC m=+339978.729785309
I0901 18:01:52.780469       1 controller.go:162] "cert-manager/certificates-readiness: re-queuing item due to optimistic locking on resource" key="ziti-controller/ziti-controller-web-identity-cert" error="Operation cannot be fulfilled on certificates.cert-manager.io \"ziti-controller-web-identity-cert\": the object has been modified; please apply your changes to the latest version and try again"
I0901 18:01:52.782319       1 conditions.go:203] Setting lastTransitionTime for Certificate "ziti-controller-web-identity-cert" condition "Ready" to 2023-09-01 18:01:52.782309829 +0000 UTC m=+339978.734694092
I0901 18:01:52.782577       1 conditions.go:96] Setting lastTransitionTime for Issuer "ziti-controller-web-root-issuer" condition "Ready" to 2023-09-01 18:01:52.782571269 +0000 UTC m=+339978.734955521
I0901 18:01:52.786030       1 conditions.go:96] Setting lastTransitionTime for Issuer "ziti-controller-web-intermediate-issuer" condition "Ready" to 2023-09-01 18:01:52.786022409 +0000 UTC m=+339978.738406662
I0901 18:01:52.796080       1 controller.go:162] "cert-manager/certificates-readiness: re-queuing item due to optimistic locking on resource" key="ziti-controller/ziti-controller-web-identity-cert" error="Operation cannot be fulfilled on certificates.cert-manager.io \"ziti-controller-web-identity-cert\": the object has been modified; please apply your changes to the latest version and try again"
I0901 18:01:52.797302       1 controller.go:162] "cert-manager/certificates-key-manager: re-queuing item due to optimistic locking on resource" key="ziti-controller/ziti-controller-web-identity-cert" error="Operation cannot be fulfilled on certificates.cert-manager.io \"ziti-controller-web-identity-cert\": the object has been modified; please apply your changes to the latest version and try again"
I0901 18:01:52.797656       1 conditions.go:203] Setting lastTransitionTime for Certificate "ziti-controller-web-identity-cert" condition "Ready" to 2023-09-01 18:01:52.797649065 +0000 UTC m=+339978.750033340
I0901 18:01:52.815492       1 conditions.go:263] Setting lastTransitionTime for CertificateRequest "ziti-controller-web-identity-cert-rk6lk" condition "Approved" to 2023-09-01 18:01:52.815482915 +0000 UTC m=+339978.767867177
I0901 18:01:52.828598       1 conditions.go:263] Setting lastTransitionTime for CertificateRequest "ziti-controller-web-identity-cert-rk6lk" condition "Ready" to 2023-09-01 18:01:52.828590302 +0000 UTC m=+339978.780974554
I0901 18:01:52.852078       1 conditions.go:192] Found status change for Certificate "ziti-controller-web-identity-cert" condition "Ready": "False" -> "True"; setting lastTransitionTime to 2023-09-01 18:01:52.852072086 +0000 UTC m=+339978.804456345
I0901 18:01:52.866373       1 controller.go:162] "cert-manager/certificates-readiness: re-queuing item due to optimistic locking on resource" key="ziti-controller/ziti-controller-web-identity-cert" error="Operation cannot be fulfilled on certificates.cert-manager.io \"ziti-controller-web-identity-cert\": the object has been modified; please apply your changes to the latest version and try again"
I0901 18:01:52.867527       1 conditions.go:192] Found status change for Certificate "ziti-controller-web-identity-cert" condition "Ready": "False" -> "True"; setting lastTransitionTime to 2023-09-01 18:01:52.867522357 +0000 UTC m=+339978.819906614
I0901 18:01:53.112338       1 controller.go:162] "cert-manager/certificates-key-manager: re-queuing item due to optimistic locking on resource" key="ziti-controller/ziti-controller-web-identity-cert" error="Operation cannot be fulfilled on certificates.cert-manager.io \"ziti-controller-web-identity-cert\": the object has been modified; please apply your changes to the latest version and try again"
I0901 18:01:53.167859       1 controller.go:162] "cert-manager/certificates-issuing: re-queuing item due to optimistic locking on resource" key="ziti-controller/ziti-controller-web-identity-cert" error="Operation cannot be fulfilled on certificates.cert-manager.io \"ziti-controller-web-identity-cert\": the object has been modified; please apply your changes to the latest version and try again"

Also, this is the end of the trust-manager pod logs:

E0901 18:01:52.713068       1 bundle.go:165] trust/bundle "msg"="bundle source was not found" "error"="failed to retrieve bundle from source: Secret \"ziti-controller-ctrl-plane-identity-secret\" not found" "bundle"="ziti-controller-ctrl-plane-cas"
I0901 18:01:52.713197       1 recorder.go:103] trust/manager/events "msg"="Bundle source was not found: failed to retrieve bundle from source: Secret \"ziti-controller-ctrl-plane-identity-secret\" not found" "object"={"kind":"Bundle","name":"ziti-controller-ctrl-plane-cas","uid":"7afa4194-1c29-4b66-af17-6ded55138070","apiVersion":"trust.cert-manager.io/v1alpha1","resourceVersion":"4905784"} "reason"="SourceNotFound" "type"="Warning"
I0901 18:01:52.713260       1 recorder.go:103] trust/manager/events "msg"="Bundle source was not found: failed to retrieve bundle from source: Secret \"ziti-controller-ctrl-plane-identity-secret\" not found" "object"={"kind":"Bundle","name":"ziti-controller-ctrl-plane-cas","uid":"7afa4194-1c29-4b66-af17-6ded55138070","apiVersion":"trust.cert-manager.io/v1alpha1","resourceVersion":"4905784"} "reason"="SourceNotFound" "type"="Warning"
E0901 18:01:52.720676       1 bundle.go:165] trust/bundle "msg"="bundle source was not found" "error"="failed to retrieve bundle from source: Secret \"ziti-controller-ctrl-plane-identity-secret\" not found" "bundle"="ziti-controller-ctrl-plane-cas"
I0901 18:01:52.721115       1 recorder.go:103] trust/manager/events "msg"="Bundle source was not found: failed to retrieve bundle from source: Secret \"ziti-controller-ctrl-plane-identity-secret\" not found" "object"={"kind":"Bundle","name":"ziti-controller-ctrl-plane-cas","uid":"7afa4194-1c29-4b66-af17-6ded55138070","apiVersion":"trust.cert-manager.io/v1alpha1","resourceVersion":"4905787"} "reason"="SourceNotFound" "type"="Warning"
I0901 18:01:52.721527       1 recorder.go:103] trust/manager/events "msg"="Bundle source was not found: failed to retrieve bundle from source: Secret \"ziti-controller-ctrl-plane-identity-secret\" not found" "object"={"kind":"Bundle","name":"ziti-controller-ctrl-plane-cas","uid":"7afa4194-1c29-4b66-af17-6ded55138070","apiVersion":"trust.cert-manager.io/v1alpha1","resourceVersion":"4905787"} "reason"="SourceNotFound" "type"="Warning"

@qrkourier
Member

@TheDarkula Is this reproducible with the latest release of the ziti-controller chart, reproducible only with an experimental ziti-controller chart, or more or less sporadic? Since the conditions seem to be related to the DNS SANs on the leaf certificate that's bound to the edge API web listeners, does it recur when you change the advertised address of one or both of those APIs?

@TheDarkula
Contributor Author

@qrkourier I am using the latest helm chart version.
I ran helm repo update before trying a helm upgrade --install.

I have this section in the values.yaml:

clientApi:
  # -- cluster service target port on the container
  containerPort: 1280
  # -- global DNS name by which routers can resolve a reachable IP for this service
  advertisedHost: ziti-controller.domain.com

I am just thinking things through: does that DNS record need to exist publicly?

@qrkourier
Member

does that DNS record need to exist publicly?

Yes, in most cases, global, public DNS records make the most sense for a Ziti network that provides an overlay spanning multiple points on the internet.

The client API's advertised address, value .clientApi.advertisedHost, is the DNS name that Ziti SDKs and routers will use to connect to the Ziti Edge Client API provided by the Ziti controller. If your Ziti SDKs and routers are not located inside the same K8s cluster but are connecting over the public internet, then the domain name probably does need to be a global record.

The Ziti controller has exactly one client API advertised address, and all SDKs and routers must use this same domain name to connect to the client API. This allows the SDKs and routers to verify the DNS SAN presented by the controller during TLS negotiation.
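
As a rough illustration of that SAN check, here is a sketch that looks for an exact DNS entry in the text form of a certificate. The function name san_has_name is hypothetical, it assumes the input is the output of `openssl x509 -noout -text`, and it does not handle wildcard SANs.

```shell
# Succeed if the certificate text on stdin lists the given DNS name as a SAN.
# $1: DNS name to look for, e.g. the value of .clientApi.advertisedHost
san_has_name() {
  want="$1"
  # SAN entries are comma-separated; put one per line, trim whitespace,
  # then require an exact whole-line match on "DNS:<name>".
  tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -Fxq "DNS:$want"
}
```

One way to feed it, with HOST and PORT standing in for the advertised address and port: `openssl s_client -connect "$HOST:$PORT" -servername "$HOST" </dev/null 2>/dev/null | openssl x509 -noout -text | san_has_name "$HOST"`.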

@TheDarkula
Contributor Author

@qrkourier Thank you for the information :)

I created a public DNS record for the ziti-controller. That moved things along a bit.

Now, the cert-manager logs show this:

I0911 14:24:05.177650       1 conditions.go:203] Setting lastTransitionTime for Certificate "ziti-controller-edge-root-cert" condition "Ready" to 2023-09-11 14:24:05.177641894 +0000 UTC m=+1190911.130026152
I0911 14:24:05.183402       1 conditions.go:203] Setting lastTransitionTime for Certificate "ziti-controller-ctrl-plane-root-cert" condition "Ready" to 2023-09-11 14:24:05.183395956 +0000 UTC m=+1190911.135780212
I0911 14:24:05.184475       1 conditions.go:203] Setting lastTransitionTime for Certificate "ziti-controller-ctrl-plane-intermediate-cert" condition "Ready" to 2023-09-11 14:24:05.184468407 +0000 UTC m=+1190911.136852661
I0911 14:24:05.184690       1 conditions.go:203] Setting lastTransitionTime for Certificate "ziti-controller-ctrl-plane-identity" condition "Ready" to 2023-09-11 14:24:05.184686675 +0000 UTC m=+1190911.137070932
I0911 14:24:05.193081       1 conditions.go:203] Setting lastTransitionTime for Certificate "ziti-controller-web-root-cert" condition "Ready" to 2023-09-11 14:24:05.193070899 +0000 UTC m=+1190911.145455163
I0911 14:24:05.193352       1 conditions.go:203] Setting lastTransitionTime for Certificate "ziti-controller-admin-client-cert" condition "Ready" to 2023-09-11 14:24:05.193347751 +0000 UTC m=+1190911.145732009
I0911 14:24:05.199165       1 conditions.go:203] Setting lastTransitionTime for Certificate "ziti-controller-web-intermediate-cert" condition "Ready" to 2023-09-11 14:24:05.199158885 +0000 UTC m=+1190911.151543137
I0911 14:24:05.201168       1 conditions.go:203] Setting lastTransitionTime for Certificate "ziti-controller-edge-signer-cert" condition "Ready" to 2023-09-11 14:24:05.201161876 +0000 UTC m=+1190911.153546129
I0911 14:24:05.244935       1 conditions.go:203] Setting lastTransitionTime for Certificate "ziti-controller-web-identity-cert" condition "Ready" to 2023-09-11 14:24:05.244928621 +0000 UTC m=+1190911.197312880
I0911 14:24:05.258627       1 conditions.go:96] Setting lastTransitionTime for Issuer "ziti-controller-ctrl-plane-intermediate-issuer" condition "Ready" to 2023-09-11 14:24:05.258619324 +0000 UTC m=+1190911.211003585
I0911 14:24:05.258916       1 conditions.go:96] Setting lastTransitionTime for Issuer "ziti-controller-edge-signer-issuer" condition "Ready" to 2023-09-11 14:24:05.258908914 +0000 UTC m=+1190911.211293168
I0911 14:24:05.259518       1 conditions.go:96] Setting lastTransitionTime for Issuer "ziti-controller-web-root-issuer" condition "Ready" to 2023-09-11 14:24:05.259513372 +0000 UTC m=+1190911.211897629
I0911 14:24:05.260024       1 conditions.go:96] Setting lastTransitionTime for Issuer "ziti-controller-selfsigned-ca-issuer" condition "Ready" to 2023-09-11 14:24:05.260019392 +0000 UTC m=+1190911.212403645
I0911 14:24:05.267891       1 conditions.go:96] Setting lastTransitionTime for Issuer "ziti-controller-web-intermediate-issuer" condition "Ready" to 2023-09-11 14:24:05.267884366 +0000 UTC m=+1190911.220268618
I0911 14:24:05.272447       1 conditions.go:96] Setting lastTransitionTime for Issuer "ziti-controller-ctrl-plane-root-issuer" condition "Ready" to 2023-09-11 14:24:05.272440879 +0000 UTC m=+1190911.224825136
I0911 14:24:05.293013       1 conditions.go:96] Setting lastTransitionTime for Issuer "ziti-controller-edge-root-issuer" condition "Ready" to 2023-09-11 14:24:05.293006998 +0000 UTC m=+1190911.245391250

Running kubectl -n ziti-controller get certificates shows this:

NAME                                           READY   SECRET                                           AGE
ziti-controller-edge-root-cert                 True    ziti-controller-edge-root-secret                 4m59s
ziti-controller-ctrl-plane-root-cert           True    ziti-controller-ctrl-plane-root-secret           4m59s
ziti-controller-web-root-cert                  True    ziti-controller-web-root-secret                  4m59s
ziti-controller-ctrl-plane-identity            True    ziti-controller-ctrl-plane-identity-secret       4m59s
ziti-controller-web-intermediate-cert          True    ziti-controller-web-intermediate-secret          4m59s
ziti-controller-admin-client-cert              True    ziti-controller-admin-client-secret              4m59s
ziti-controller-edge-signer-cert               True    ziti-controller-edge-signer-secret               4m59s
ziti-controller-ctrl-plane-intermediate-cert   True    ziti-controller-ctrl-plane-intermediate-secret   4m59s
ziti-controller-web-identity-cert              True    ziti-controller-web-identity-secret              4m59s
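
Every certificate in that listing reports READY as True, which suggests cert-manager has finished its part. A small sketch to make the same check scriptable, assuming the single-namespace column layout shown above (NAME, READY, SECRET, AGE); the function name not_ready_certs is hypothetical.

```shell
# Print the names of Certificates whose READY column is not "True".
# stdin: output of `kubectl -n <namespace> get certificates`, header included.
not_ready_certs() {
  # Skip the header row, then compare the second column.
  awk 'NR > 1 && $2 != "True" { print $1 }'
}
```

For example, `kubectl -n ziti-controller get certificates | not_ready_certs` would print nothing for the listing above.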

Running kubectl -n ziti-controller get pods shows the init container hanging:

NAME                               READY   STATUS     RESTARTS   AGE
ziti-controller-7df94d7f89-vr2d2   0/1     Init:0/1   0          19m

Running kubectl -n ziti-controller describe pods shows this at the bottom:

  Warning  FailedMount  101s (x15 over 16m)  kubelet            MountVolume.SetUp failed for volume "ziti-controller-ctrl-plane-cas" : configmap "ziti-controller-ctrl-plane-cas" not found
  Warning  FailedMount  17s (x7 over 13m)    kubelet            Unable to attach or mount volumes: unmounted volumes=[ziti-controller-ctrl-plane-cas], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition

Looking at the trust-manager logs, it seems to be unhappy as well:

E0911 14:24:05.152037       1 bundle.go:165] trust/bundle "msg"="bundle source was not found" "error"="failed to retrieve bundle from source: Secret \"ziti-controller-ctrl-plane-identity-secret\" not found" "bundle"="ziti-controller-ctrl-plane-cas"
I0911 14:24:05.152146       1 recorder.go:103] trust/manager/events "msg"="Bundle source was not found: failed to retrieve bundle from source: Secret \"ziti-controller-ctrl-plane-identity-secret\" not found" "object"={"kind":"Bundle","name":"ziti-controller-ctrl-plane-cas","uid":"ab30529f-97be-4d71-96f6-46cc37f3dedf","apiVersion":"trust.cert-manager.io/v1alpha1","resourceVersion":"6556542"} "reason"="SourceNotFound" "type"="Warning"
I0911 14:24:05.152159       1 recorder.go:103] trust/manager/events "msg"="Bundle source was not found: failed to retrieve bundle from source: Secret \"ziti-controller-ctrl-plane-identity-secret\" not found" "object"={"kind":"Bundle","name":"ziti-controller-ctrl-plane-cas","uid":"ab30529f-97be-4d71-96f6-46cc37f3dedf","apiVersion":"trust.cert-manager.io/v1alpha1","resourceVersion":"6556542"} "reason"="SourceNotFound" "type"="Warning"
E0911 14:24:05.160581       1 bundle.go:165] trust/bundle "msg"="bundle source was not found" "error"="failed to retrieve bundle from source: Secret \"ziti-controller-ctrl-plane-identity-secret\" not found" "bundle"="ziti-controller-ctrl-plane-cas"
I0911 14:24:05.160692       1 recorder.go:103] trust/manager/events "msg"="Bundle source was not found: failed to retrieve bundle from source: Secret \"ziti-controller-ctrl-plane-identity-secret\" not found" "object"={"kind":"Bundle","name":"ziti-controller-ctrl-plane-cas","uid":"ab30529f-97be-4d71-96f6-46cc37f3dedf","apiVersion":"trust.cert-manager.io/v1alpha1","resourceVersion":"6556545"} "reason"="SourceNotFound" "type"="Warning"
I0911 14:24:05.160708       1 recorder.go:103] trust/manager/events "msg"="Bundle source was not found: failed to retrieve bundle from source: Secret \"ziti-controller-ctrl-plane-identity-secret\" not found" "object"={"kind":"Bundle","name":"ziti-controller-ctrl-plane-cas","uid":"ab30529f-97be-4d71-96f6-46cc37f3dedf","apiVersion":"trust.cert-manager.io/v1alpha1","resourceVersion":"6556545"} "reason"="SourceNotFound" "type"="Warning"

@qrkourier
Member

Thanks for supplying the logs! Working backward from the controller pod waiting for its volume ziti-controller-ctrl-plane-cas, which is provided by the Bundle resource composed by trust-manager, we can see that trust-manager is failing to find the K8s secret ziti-controller-ctrl-plane-identity-secret. That's a secret resource that was already composed by cert-manager, so there's some problem with visibility by trust-manager.

I suspect the trust-manager "trust namespace" (link to doc) is not aligned perfectly. This must be set to the K8s namespace where cert-manager is composing the secrets and configmaps that are sourced by trust-manager.

The trust-manager trust namespace is configured with Helm value .trust-manager.app.trust.namespace, and the default value is "ziti". I see that you chose namespace "ziti-controller", so that's probably the issue!

This raises the question, "How can we detect this misalignment or, ideally, ensure the alignment automatically?"

@TheDarkula
Contributor Author

@qrkourier Sure thing!

I just tried re-installing the chart with the namespace set to ziti, but the pod is still failing:

  Warning  FailedMount  6m22s (x6 over 6m38s)  kubelet            MountVolume.SetUp failed for volume "cert-ctrl-plane-identity" : secret "ziti-controller-ctrl-plane-identity-secret" not found
  Warning  FailedMount  6m22s (x6 over 6m38s)  kubelet            MountVolume.SetUp failed for volume "cert-web-identity" : secret "ziti-controller-web-identity-secret" not found
  Warning  FailedMount  6m22s (x6 over 6m38s)  kubelet            MountVolume.SetUp failed for volume "cert-edge-signer" : secret "ziti-controller-edge-signer-secret" not found
  Warning  FailedMount  26s (x11 over 6m38s)   kubelet            MountVolume.SetUp failed for volume "ziti-controller-ctrl-plane-cas" : configmap "ziti-controller-ctrl-plane-cas" not found

Here is the output of kubectl -n ziti get secrets:

NAME                                             TYPE                 DATA   AGE
ziti-controller-admin-secret                     Opaque               2      9m53s
sh.helm.release.v1.ziti-controller.v1            helm.sh/release.v1   1      9m53s
ziti-controller-web-root-secret                  kubernetes.io/tls    3      9m49s
ziti-controller-ctrl-plane-root-secret           kubernetes.io/tls    3      9m49s
ziti-controller-edge-root-secret                 kubernetes.io/tls    3      9m47s
ziti-controller-ctrl-plane-intermediate-secret   kubernetes.io/tls    3      9m45s
ziti-controller-web-intermediate-secret          kubernetes.io/tls    3      9m45s
ziti-controller-ctrl-plane-identity-secret       kubernetes.io/tls    3      9m28s
ziti-controller-web-identity-secret              kubernetes.io/tls    3      9m28s
ziti-controller-edge-signer-secret               kubernetes.io/tls    3      9m28s
ziti-controller-admin-client-secret              kubernetes.io/tls    3      8m48s

The trust-manager logs say this:

E0911 18:28:35.596928       1 bundle.go:165] trust/bundle "msg"="bundle source was not found" "error"="failed to retrieve bundle from source: Secret \"ziti-controller-ctrl-plane-identity-secret\" not found" "bundle"="ziti-controller-ctrl-plane-cas"
I0911 18:28:35.597057       1 recorder.go:103] trust/manager/events "msg"="Bundle source was not found: failed to retrieve bundle from source: Secret \"ziti-controller-ctrl-plane-identity-secret\" not found" "object"={"kind":"Bundle","name":"ziti-controller-ctrl-plane-cas","uid":"ba8da9ac-27ce-4f19-b1c1-4df38097c7ac","apiVersion":"trust.cert-manager.io/v1alpha1","resourceVersion":"6584997"} "reason"="SourceNotFound" "type"="Warning"
I0911 18:28:35.597062       1 recorder.go:103] trust/manager/events "msg"="Bundle source was not found: failed to retrieve bundle from source: Secret \"ziti-controller-ctrl-plane-identity-secret\" not found" "object"={"kind":"Bundle","name":"ziti-controller-ctrl-plane-cas","uid":"ba8da9ac-27ce-4f19-b1c1-4df38097c7ac","apiVersion":"trust.cert-manager.io/v1alpha1","resourceVersion":"6584997"} "reason"="SourceNotFound" "type"="Warning"
E0911 18:28:35.606260       1 bundle.go:165] trust/bundle "msg"="bundle source was not found" "error"="failed to retrieve bundle from source: Secret \"ziti-controller-ctrl-plane-identity-secret\" not found" "bundle"="ziti-controller-ctrl-plane-cas"
I0911 18:28:35.606385       1 recorder.go:103] trust/manager/events "msg"="Bundle source was not found: failed to retrieve bundle from source: Secret \"ziti-controller-ctrl-plane-identity-secret\" not found" "object"={"kind":"Bundle","name":"ziti-controller-ctrl-plane-cas","uid":"ba8da9ac-27ce-4f19-b1c1-4df38097c7ac","apiVersion":"trust.cert-manager.io/v1alpha1","resourceVersion":"6585001"} "reason"="SourceNotFound" "type"="Warning"
I0911 18:28:35.606475       1 recorder.go:103] trust/manager/events "msg"="Bundle source was not found: failed to retrieve bundle from source: Secret \"ziti-controller-ctrl-plane-identity-secret\" not found" "object"={"kind":"Bundle","name":"ziti-controller-ctrl-plane-cas","uid":"ba8da9ac-27ce-4f19-b1c1-4df38097c7ac","apiVersion":"trust.cert-manager.io/v1alpha1","resourceVersion":"6585001"} "reason"="SourceNotFound" "type"="Warning"

@qrkourier
Member

I'm unsure why the resources that clearly do exist are not found. Barring any cluster-level security measures, I suspect an inconsistent state could have been caused by changing the namespace at the time of a Helm release upgrade.

Do you get a different result if you first delete the Helm release then install fresh with an aligned namespace?

Are you installing trust-manager and cert-manager as sub-charts of ziti-controller, or installing one or both of them separately in advance of ziti-controller?

Are you installing the trust-manager or cert-manager CRDs separately in advance, or as part of their respective chart releases?

If you're unsure, and using ziti-controller chart defaults, then I'll be able to reason it out.

@TheDarkula
Contributor Author

@qrkourier I just ran helm -n ziti uninstall ziti-controller, followed by kubectl delete ns ziti to be sure.

I then re-ran helm upgrade --install ziti-controller openziti/ziti-controller --namespace ziti --create-namespace --values values-controller.yaml

I installed cert-manager and trust-manager ahead of time, not as a sub-chart of the controller chart.

As far as the errors shown when describing the controller pod, the secrets definitely do exist, but it also complains about one configmap:

MountVolume.SetUp failed for volume "ziti-controller-ctrl-plane-cas" : configmap "ziti-controller-ctrl-plane-cas" not found

Looking at the output from kubectl -n ziti get cm shows that ziti-controller-ctrl-plane-cas is not created:

NAME                     DATA   AGE
kube-root-ca.crt         1      4m43s
ziti-controller-config   4      4m43s

I had the thought that the initContainer was hanging, and not retrying, so I deleted the pod.
Then, describing the pod shows this:

  Warning  FailedMount  60s (x8 over 2m4s)  kubelet            MountVolume.SetUp failed for volume "ziti-controller-ctrl-plane-cas" : configmap "ziti-controller-ctrl-plane-cas" not found
  Warning  FailedMount  1s                  kubelet            Unable to attach or mount volumes: unmounted volumes=[ziti-controller-ctrl-plane-cas], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition

Something tells me that the initContainer needs remedying.
Looking at the deployment.yaml, I see that it is running ziti-controller-init.bash. Where is the Dockerfile/repository for the controller?

@qrkourier
Member

ziti-controller-ctrl-plane-cas should be created by trust-manager when it is able to source the secrets from the trust namespace that contain the CA certs for each PKI.

Here's the controller's Dockerfile.

@qrkourier
Member

The controller's init container is waiting to start because that CM isn't available.

@TheDarkula
Contributor Author

I understand. With that, it seems that something is going wrong in the initContainer, since deleting the pod (and thus forcing a restart) made the errors regarding secrets go away.

Thank you for the link to the Dockerfile, although I could not find where ziti-controller-init.bash was coming from.

Where is that file?

I have a feeling that there is some sort of hard pause or a failure to wait/retry going on there.

@qrkourier
Member

Gack. I overlooked what you needed there. Yes, the controller's container image is relatively minimal. That initialization script is provided by the Helm chart here: https://github.com/openziti/helm-charts/blob/main/charts/ziti-controller/templates/configmap.yaml#L9

The logic simply checks whether the controller's BBolt DB file exists and runs the initialization procedure to create it if not.
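
A rough sketch of that pattern, not the chart's actual script: the function name init_db_if_missing, the messages, and the way the database path and initialization command are passed in are all assumptions for illustration.

```shell
# Run an initialization command only when the database file is missing.
# $1: path to the database file (hypothetical; the real script defines its own)
# remaining arguments: the command that creates the database
init_db_if_missing() {
  db_file="$1"
  shift
  if [ -e "$db_file" ]; then
    echo "database exists, skipping initialization: $db_file"
    return 0
  fi
  echo "database not found, initializing: $db_file"
  # The exit status of the initialization command becomes the exit status
  # of the function, so a failed init fails the init container.
  "$@"
}
```

Because the check is idempotent, rerunning the init container after a pod restart should leave an existing database untouched.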

The way the controller's pod is specified, the init container must succeed before the app container(s) start.

Are you saying you deleted the controller pod that was seemingly stuck at the init state and then both ran successfully?

@TheDarkula
Contributor Author

No worries :)

Yes, I deleted the controller pod when it was complaining about the secrets.
That cleared the errors about the secrets, but the pod is still stuck waiting for the ziti-controller-ctrl-plane-cas ConfigMap.

@qrkourier
Member

That means the secrets-related warning messages like this occurred during the first run of the controller pod, and were resolved before the second run that happened when you deleted the first pod.

  Warning  FailedMount  6m22s (x6 over 6m38s)  kubelet            MountVolume.SetUp failed for volume "cert-ctrl-plane-identity" : secret "ziti-controller-ctrl-plane-identity-secret" not found

This lends credibility to the hypothesis that the problem is still that trust-manager can't "see" the secrets it needs to compose the Bundle resource that creates the ziti-controller-ctrl-plane-cas ConfigMap.

Let's look closer at that.

Here's a warning from trust-manager that represents the problem:

E0911 18:28:35.606260       1 bundle.go:165] trust/bundle "msg"="bundle source was not found" "error"="failed to retrieve bundle from source: Secret \"ziti-controller-ctrl-plane-identity-secret\" not found" "bundle"="ziti-controller-ctrl-plane-cas"

Does it exist in the "ziti" namespace?

kubectl -n ziti get secret ziti-controller-ctrl-plane-identity-secret

Is "ziti" trust-manager's trust namespace?

kubectl -n ziti get pods --selector app=trust-manager --output go-template='{{range .items}}{{range .spec.containers}}{{.args}} {{end}}{{end}}'

@TheDarkula
Contributor Author

The secret does exist. kubectl -n ziti get secret ziti-controller-ctrl-plane-identity-secret:

NAME                                         TYPE                DATA   AGE
ziti-controller-ctrl-plane-identity-secret   kubernetes.io/tls   3      21h

However, the second command's output is blank because I have trust-manager deployed in the trust-manager namespace.
As for how I have it deployed, I have not modified this section of the values.yaml:

  trust:
    # -- Namespace used as trust source. Note that the namespace _must_ exist
    # before installing trust-manager.
    namespace: cert-manager

From my understanding of the link you provided, if I want to use trust-manager with multiple deployments, leaving the trust namespace set to cert-manager would be a centralised solution.

The only other way I could see it working is to create multiple trust-manager namespaces, like trust-manager-ziti, and modify the values.yaml so that the trust namespace matches each deployment that uses it.

For example:

  • Helm deployment "first" links to the trust-manager-first namespace, which has one trust manager deployed to it
  • Helm deployment "second" links to the trust-manager-second namespace, which has another trust manager deployed to it

This seems a bit redundant, but I am not sure if that is the recommended way to configure everything.

Would it be more logical for the ziti-controller chart to use the cert-manager namespace by default, or to make it configurable?

@qrkourier
Member

Does it show us the trust namespace arg if you give it the correct namespace?

kubectl -n trust-manager get pods --selector app=trust-manager --output go-template='{{range .items}}{{range .spec.containers}}{{.args}} {{end}}{{end}}'

@TheDarkula
Contributor Author

Yes, it does. kubectl -n trust-manager get pods --selector app=trust-manager --output go-template='{{range .items}}{{range .spec.containers}}{{.args}} {{end}}{{end}}' prints:

[--log-level=1 --metrics-port=9402 --readiness-probe-port=6060 --readiness-probe-path=/readyz --trust-namespace=cert-manager --webhook-host=0.0.0.0 --webhook-port=6443 --webhook-certificate-dir=/tls --default-package-location=/packages/cert-manager-package-debian.json]

@qrkourier
Member

The issue is that trust-manager can only source secrets from its trust namespace, "cert-manager", but the necessary secrets are in namespace "ziti".
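To make the mismatch concrete, here is a minimal Python sketch that pulls the `--trust-namespace` value out of the container args printed above and compares it with the namespace that holds the secret. The helper names `trust_namespace` and `can_source_secret` are made up for illustration; they are not part of trust-manager or kubectl.

```python
from typing import List, Optional


def trust_namespace(args: List[str]) -> Optional[str]:
    """Return the value of --trust-namespace from the container args, if set."""
    prefix = "--trust-namespace="
    for arg in args:
        if arg.startswith(prefix):
            return arg[len(prefix):]
    return None


def can_source_secret(args: List[str], secret_namespace: str) -> bool:
    """True only when the secret lives in trust-manager's trust namespace."""
    return trust_namespace(args) == secret_namespace


# Args copied from the kubectl output earlier in this thread.
args = [
    "--log-level=1",
    "--metrics-port=9402",
    "--readiness-probe-port=6060",
    "--readiness-probe-path=/readyz",
    "--trust-namespace=cert-manager",
    "--webhook-host=0.0.0.0",
    "--webhook-port=6443",
    "--webhook-certificate-dir=/tls",
    "--default-package-location=/packages/cert-manager-package-debian.json",
]

# The controller's secrets are in "ziti", so this prints False.
print(can_source_secret(args, "ziti"))
```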

Thanks for raising the question "How should cert/trust-manager be aligned for concurrent OpenZiti namespaces?" I haven't personally explored that deeply yet. The main constraint I've encountered is that trust-manager can only have one "trust namespace," and I've engaged the devs to express the need for multiple configurable trust namespaces. As far as I know, that's on their backlog. Meanwhile, we could have a separate discussion to explore multiple instances of trust-manager, one per OpenZiti namespace, assuming the Helm release names are unique and trust-manager's cluster-level resources adopt the Helm release name to ensure uniqueness.

@TheDarkula
Contributor Author

Indeed!

Sure thing :)
The elegant solution would be for trust-manager to have a section similar to the traefik chart:

  kubernetesIngress:
    namespaces:
      - "cert-manager"
      - "ziti"
      - "more-namespaces"

What seems most logical for the ziti-controller chart is to have two different configurations:

  1. Using the trust-manager sub-chart, which can then default to using the ziti namespace
  2. Using a pre-installed instance of trust-manager, which would then take the trust namespace as an input

Something like this:

trust-manager:
  # -- install the trust-manager subchart to provide CRD Bundle
  subchart-enabled: false
  app:
    trust:
      # -- trust-manager needs to be configured to trust the namespace in which
      # the controller is deployed so that it will create the Bundle resource
      # for the ctrl plane trust bundle
      preinstalled-namespace: cert-manager
      subchart-namespace: ziti

@qrkourier
Member

Helm is limited in its ability to check that an arbitrary set of input values expresses a sane configuration. It's still worthwhile to ensure that the default values are sane together, but things get complicated from there.

For this reason, it's necessary to have oversight or orchestration or expertise at some level that's higher than Helm. I've experimented with using Terraform to both document and automate feeding a sane set of inputs to the Helm chart. That worked pretty well, but I'm still hoping to arrive at something that feels more satisfying and acceptable to a wider audience.

Hypothetically, maintaining separate charts for each sane set of default input values would work, but it would also be hard work to rectify drift between them, so it would eventually demand some kind of nanny/parent system that renders each variant. That doesn't feel like the right direction.

A more appealing alternative is Kustomize, which can patch the rendered templates. I've fiddled with Kustomize, which is built into kubectl IIRC, but haven't delved deeply enough to be confident it's expressive enough to cross-check characteristics spanning the Helm release. A hypothetical workflow:

  1. Helm renders templates to temporary storage as K8s resource manifests
  2. Kustomize patches manifests according to target env and other arbitrary variables
  3. Patched manifests are written to a hot dir for a GitOps-driven deployment scheme
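The three steps above might be wired together roughly like this. It is a sketch under several assumptions, not a tested pipeline: the function name `render_and_patch`, the file name `rendered.yaml`, and the directory layout are all hypothetical, and the overlay directory is assumed to already contain a `kustomization.yaml` that lists `rendered.yaml` under `resources` along with whatever patches are wanted. The `run` parameter exists only so the commands can be swapped out for a stub.

```python
import subprocess
from pathlib import Path


def render_and_patch(release: str, chart: str, namespace: str, values_file: str,
                     overlay_dir: Path, hot_dir: Path, run=subprocess.run) -> Path:
    """Render a chart with Helm, patch it with Kustomize, and write the result."""
    # Step 1: Helm renders the chart (and any sub-charts) to plain manifests.
    rendered = run(
        ["helm", "template", release, chart,
         "--namespace", namespace, "--values", values_file],
        check=True, capture_output=True, text=True,
    ).stdout
    (overlay_dir / "rendered.yaml").write_text(rendered)

    # Step 2: Kustomize applies the patches declared in the overlay's
    # kustomization.yaml (assumed to exist and to reference rendered.yaml).
    patched = run(
        ["kubectl", "kustomize", str(overlay_dir)],
        check=True, capture_output=True, text=True,
    ).stdout

    # Step 3: write the patched manifests where the GitOps tooling picks them up.
    hot_dir.mkdir(parents=True, exist_ok=True)
    out = hot_dir / f"{release}.yaml"
    out.write_text(patched)
    return out
```

A call might look like `render_and_patch("ziti-controller", "openziti/ziti-controller", "ziti", "values.yaml", Path("overlay"), Path("hot"))`, with the overlay holding a patch that sets trust-manager's trust namespace to match the controller's namespace.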

@qrkourier
Member

The top-level trust-manager map in the controller chart's values.yaml is consumed by the trust-manager chart, not the controller chart. Since we don't control or modify that chart currently, there's no way to inject logic in the trust-manager chart like "let trust-manager namespace match the ziti-controller namespace" or invent new input values like subchart-namespace.

Kustomize would work well in this case when trust-manager is installed as a sub-chart. Helm could render the chart and sub-chart templates to manifests which can then be patched by Kustomize to ensure alignment.

We're currently pinning the upstream version of trust-manager, but another option would be to maintain a fork that requires a custom input value like the ziti-controller namespace, and ensures that it matches the trust namespace.

@TheDarkula
Contributor Author

Thank you for all of the information!
I think the best solution at the moment is to just change the default trust namespace to ziti.
So you know, I tried that, and the pod does come up happily :)

With that, I do have one more query. How can I specify hostPath storage for persistence?

@qrkourier
Member

Glad to hear that! I'm happy to respond here, and I also want to make sure you're aware of a rich resource, the Discourse forum: https://openziti.discourse.group/. We'd be happy to have you asking/answering over there too. A lot of those topics start out as symptoms or ideas and mature into GitHub Issues when they become sufficiently well-defined.

I'll start working on the hostPath answer. Would you be willing to post that question in the forum?

@TheDarkula
Contributor Author

I will create an account momentarily :)
