cni-repair controller (#306) · linkerd/linkerd2-proxy-init@67cc03d

Fixes linkerd/linkerd2#11073

This fixes the issue where injected pods cannot acquire a proper network config because `linkerd-cni` and/or the cluster's network CNI haven't fully started. Such pods are left in a permanent crash loop; once the CNI is ready, they need to be restarted externally, which is what this controller does.

The controller, `linkerd-cni-repair-controller`, watches events for pods on the current node that have been injected, are in a terminated state, and whose `linkerd-network-validator` container exited with code 95. It then deletes those pods so they can restart with a proper network config.
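As an illustration of the selection rule described above, here is a minimal sketch of the predicate. The `PodSummary` and `ContainerState` types and the `needs_repair` function are hypothetical simplifications introduced for this example; they are not the controller's actual types, and a real implementation would read these fields from the Kubernetes pod status.

```rust
/// Exit code that, per the commit message, marks a failed network validation.
const VALIDATOR_FAILURE_EXIT_CODE: i32 = 95;
/// Name of the validator container mentioned in the commit message.
const VALIDATOR_CONTAINER: &str = "linkerd-network-validator";

/// Hypothetical, simplified view of one container's status.
#[derive(Debug, Clone)]
pub struct ContainerState {
    pub name: String,
    /// `Some(code)` if the container has terminated, `None` otherwise.
    pub terminated_exit_code: Option<i32>,
}

/// Hypothetical, simplified view of a pod (an assumption for this sketch).
#[derive(Debug, Clone)]
pub struct PodSummary {
    pub node_name: String,
    /// Whether the pod was injected with the Linkerd proxy.
    pub injected: bool,
    pub containers: Vec<ContainerState>,
}

/// Returns true when the pod matches the criteria in the commit message:
/// it runs on the current node, it was injected, and its validator
/// container terminated with exit code 95.
pub fn needs_repair(pod: &PodSummary, current_node: &str) -> bool {
    pod.injected
        && pod.node_name == current_node
        && pod.containers.iter().any(|c| {
            c.name == VALIDATOR_CONTAINER
                && c.terminated_exit_code == Some(VALIDATOR_FAILURE_EXIT_CODE)
        })
}
```

Pods for which `needs_repair` returns true would be the ones handed to the deletion step; all other pods are left alone.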

The controller is to be deployed as an additional container in the `linkerd-cni` DaemonSet (addressed in linkerd/linkerd2#11699).

The controller exposes two custom counter metrics: `linkerd_cni_repair_controller_queue_overflow` (in the spirit of the destination controller's `endpoint_updates_queue_overflow`) and `linkerd_cni_repair_controller_deleted`.
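To make the intent of the two counters concrete, here is a rough sketch using only the Rust standard library. It assumes the overflow counter is incremented when a bounded work queue is full and the deleted counter is incremented once per deleted pod; the commit message does not spell this out. The `Metrics` struct and the `new_queue`, `enqueue`, and `drain` helpers are hypothetical names for this example, not the controller's real API.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical in-memory stand-ins for the two counters named above.
#[derive(Debug, Default)]
pub struct Metrics {
    /// Stand-in for `linkerd_cni_repair_controller_queue_overflow`.
    pub queue_overflow: AtomicU64,
    /// Stand-in for `linkerd_cni_repair_controller_deleted`.
    pub deleted: AtomicU64,
}

/// Creates a bounded queue of pod names awaiting deletion.
pub fn new_queue(capacity: usize) -> (SyncSender<String>, Receiver<String>) {
    sync_channel(capacity)
}

/// Tries to enqueue a pod without blocking. If the queue is full, the pod
/// is dropped and the overflow counter is incremented (assumed behavior).
pub fn enqueue(tx: &SyncSender<String>, metrics: &Metrics, pod: String) -> bool {
    match tx.try_send(pod) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) => {
            metrics.queue_overflow.fetch_add(1, Ordering::Relaxed);
            false
        }
        Err(TrySendError::Disconnected(_)) => false,
    }
}

/// Drains the queue, counting each pod as deleted. A real controller would
/// call the Kubernetes API to delete the pod here; this sketch only counts.
pub fn drain(rx: &Receiver<String>, metrics: &Metrics) -> Vec<String> {
    let mut deleted = Vec::new();
    while let Ok(pod) = rx.try_recv() {
        metrics.deleted.fetch_add(1, Ordering::Relaxed);
        deleted.push(pod);
    }
    deleted
}
```

With a queue of capacity 1, a second `enqueue` before any `drain` would be rejected and counted as an overflow, which is the kind of backpressure signal the overflow metric is meant to surface.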

alpeb committed Jan 2, 2024

1 parent 7417ddd commit 67cc03d

.dockerignore

@@ -1,5 +1,2 @@
-    Cargo.toml
-    Cargo.lock
     rust-toolchain
-    validator/
     target/
