Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ziti router run error with new release 0.27.3 #993

Closed
qrkourier opened this issue Feb 3, 2023 · 12 comments
Closed

ziti router run error with new release 0.27.3 #993

qrkourier opened this issue Feb 3, 2023 · 12 comments

Comments

@qrkourier
Copy link
Member

I started seeing an error from ziti router run with the upgrade to 0.27.3. With the same config I can roll back the binary to 0.27.2 as a workaround.

It appears that in the new release ziti router run has a new behavior that attempts to open a file named endpoints in the same directory as the router's config YAML file. I couldn't find any notes about this in the changelog, and I didn't find any interesting occurrences of "open" or "endpoints" in ziti/ziti/router.go.

[   0.413]   DEBUG ziti/ziti/util.LogReleaseVersionCheck: this build of ziti-router is the latest release v0.27.3
[   0.415]    INFO ziti/ziti/router.run: {configFile=[/etc/ziti/ziti-router.yaml] go-version=[go1.19.5] os=[linux] arch=[amd64] version=[v0.27.3] build-date=[2023-02-03T19:53:13Z] routerId=[r-iy3in.bE] revision=[dd3fee35c889]} starting ziti-router
[   0.415]    INFO fabric/router/forwarder.(*Faulter).run: started
[   0.415]    INFO fabric/router/forwarder.(*Scanner).run: started
[   0.416]    INFO fabric/metrics.GoroutinesPoolMetricsConfigF.func1.1: {maxWorkers=[32] minWorkers=[0] poolType=[pool.link.dialer] idleTime=[30s] maxQueueSize=[1000]} starting goroutine pool
[   0.416]    INFO fabric/metrics.GoroutinesPoolMetricsConfigF.func1.1: {idleTime=[30s] maxQueueSize=[1000] maxWorkers=[128] poolType=[pool.route.handler] minWorkers=[0]} starting goroutine pool
[   0.416] WARNING edge/router/internal/edgerouter.(*Config).LoadConfigFromMap: Invalid heartbeat interval [0] (min: 60, max: 10), setting to default [60]
[   0.416]   DEBUG edge/router/internal/apiproxy.Start: API Proxy disabled
[   0.416]   FATAL ziti/ziti/router.run: {error=[open /etc/ziti/endpoints: permission denied]} error starting

router config

v: 3
identity:
  cert: ${ZITI_ROUTER_IDENTITY_DIR}/client.crt
  server_cert: ${ZITI_ROUTER_IDENTITY_DIR}/tls.crt
  key: ${ZITI_ROUTER_IDENTITY_DIR}/tls.key
  ca: ${ZITI_ROUTER_IDENTITY_DIR}/ca.crt
ctrl:
  endpoint: tls:ctrl1.k3s.mira.qrk.us:30262
link:
  dialers:
    - binding: transport
listeners:
  - binding: edge
    address: tls:0.0.0.0:1290
    options:
      advertise: edge1.k3s.mira.qrk.us:1290
      connectTimeoutMs: 1000
      getSessionTimeout: 60
  - binding: tunnel
    options:
      mode: host
edge:
  csr:
    sans:
      dns:
        - localhost
        - edge1.k3s.mira.qrk.us
        - edge1.k3s.mira.qrk.us
      ip:
        - 127.0.0.1
forwarder:
  latencyProbeInterval: 10
  xgressDialQueueLength: 1000
  xgressDialWorkerCount: 128
  linkDialQueueLength: 1000
  linkDialWorkerCount: 32
@qrkourier
Copy link
Member Author

qrkourier commented Feb 3, 2023

The value of env var ZITI_ROUTER_IDENTITY_DIR is /etc/ziti/identity.

@qrkourier
Copy link
Member Author

The full command line was ziti router run /etc/ziti/ziti-router.yaml.

@qrkourier
Copy link
Member Author

Providing a writeable directory seems to have appeased ziti router run.

bash-4.4$ ls -ld /etc/ziti/endpoints/
drwxrwxrwx 2 root root 4096 Feb  3 18:08 /etc/ziti/endpoints/

@plorenz
Copy link
Member

plorenz commented Feb 3, 2023

This file is how routers will keep track changing nodes in an HA cluster. We probably don't need to create it in a non-HA mode. Let me look into it

@plorenz
Copy link
Member

plorenz commented Feb 3, 2023

Is the container read-only? The config file itself can be updated if the controller is advertising a new address even in non-HA mode

@qrkourier
Copy link
Member Author

qrkourier commented Feb 3, 2023

Right, the config and identities are mounted read-only from Kubernetes resources. This allows their data to be declared by Helm. I can add a storage dependency, but it would be good to characterize the storage in terms of capacity, durability, IO. It sounds like {config dir}/endpoints is a single, regular file that stores transient state.

As for the writeable router config file, I'm not sure how to handle that. It doesn't fit into the MO of declarative config mgmt. We can brainstorm on it if you like. There's a Kubernetes way but maybe it needs to be a configurable behavior. Will the router FATAL error if it can't clobber its own config file?

@qrkourier
Copy link
Member Author

I would like to understand the purpose of the address update event you mentioned. This allows the router control plane to declare a new router advertisement, or is this to allow the router control plane to communicate a change of its endpoint address to the router? I strongly suspect it is the latter and is intended to facilitate a change of the global domain name of the router control plane endpoint.

Is this event that needs to be persisted as a router configuration change representative of a broader category of events, or is it more of an anomaly? If it's a one-off, then it might make more sense for the Kubernetes users to think about this problem in terms of tightly-coupled Ziti controller+Ziti router configs instead of handling Ziti control plane events. An example of such a coupled configuration is Helm umbrella chart that would deploy both controller and router at the same time with the same values. In that scenario, the only way to change the controller's advertised address is to change the value that is consumed by both controller and router, and allow Kubernetes to propagate the change.

Here's a sketch of a design to minimally change the router while adding a Kubernetes Controller process to handle the router's events that need to be persisted as configuration changes.

ziti-kontroller (2)

sequenceDiagram
    loop Ziti Router ConfigMap Controller control loop
        router->>router.yml: read file at startup
        ziti ctrl plane->>k8s-ctrl: event: new Ziti controller control plane endpoint
        Note right of ziti ctrl plane: ignored by the read-only router
        k8s-ctrl->>router.yml: update configmap
    end
Loading

The ConfigMap Controller could also delete the router pod to begin a new process life cycle to load the updated config, but a frequent reloading or signaling behavior would be less intrusive.

@plorenz
Copy link
Member

plorenz commented Feb 13, 2023

There's a fix for openziti/fabric#601 which will prevent the endpoints file from being written as long as there are no changes. However, for proper functioning in an HA environment where controllers can come and go, the image will need to allow the endpoints file be created and writable.

@qrkourier
Copy link
Member Author

@plorenz That's still doable, and I'd still like to specify the runtime storage needs in terms of durability and capacity, at least. Is a small ephemeral volume sufficient or will the router expect to find state in {config dir}/endpoints file in a subsequent process life cycle, and therefore require persistence across restarts?

@plorenz
Copy link
Member

plorenz commented Feb 16, 2023

@plorenz That's still doable, and I'd still like to specify the runtime storage needs in terms of durability and capacity, at least. Is a small ephemeral volume sufficient or will the router expect to find state in {config dir}/endpoints file in a subsequent process life cycle, and therefore require persistence across restarts?

Ideally it would be persistent. The file is only used to persist the information across restarts.

@qrkourier
Copy link
Member Author

What is a reasonable capacity for the storage volume where the endpoints file is written? I've written the Kubernetes deployments to default to a 50MiB persistent volume.

@plorenz
Copy link
Member

plorenz commented Feb 21, 2023

What is a reasonable capacity for the storage volume where the endpoints file is written? I've written the Kubernetes deployments to default to a 50MiB persistent volume.

Endpoints file should generally be tiny (under 1k in size). It's only listing controller endpoints. You'd need a crazy number of controllers (or very long dns names) to make that file large.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants