Merged
@@ -73,7 +73,7 @@ NOTE: This step assumes that the only `IsovalentEgressGatewayPolicy` resources o
----
kubectl get isovalentegressgatewaypolicy -l argocd.argoproj.io/instance=cilium -oyaml | \
yq '.items[] |
-"kubectl --as=cluster-admin annotate namespace \(.metadata.name) cilium.syn.tools/egress-ip="
+"kubectl --as=system:admin annotate namespace \(.metadata.name) cilium.syn.tools/egress-ip="
+ .metadata.annotations["cilium.syn.tools/egress-ip"]
' | \
bash <1>
@@ -93,5 +93,5 @@ NOTE: This step assumes that the only `IsovalentEgressGatewayPolicy` resources o
+
[source,bash]
----
-kubectl --as=cluster-admin label isovalentegressgatewaypolicy -l argocd.argoproj.io/instance=cilium argocd.argoproj.io/instance-
+kubectl --as=system:admin label isovalentegressgatewaypolicy -l argocd.argoproj.io/instance=cilium argocd.argoproj.io/instance-
----
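The `yq` pipeline earlier in this file emits one `kubectl annotate` invocation per policy and pipes them into `bash`. As a minimal sketch of what one generated command looks like, with a made-up policy name and egress IP standing in for `.metadata.name` and the `cilium.syn.tools/egress-ip` annotation:

```shell
# Sample values standing in for one IsovalentEgressGatewayPolicy item;
# on a real cluster these come from the yq expression.
name="my-namespace"
egress_ip="192.0.2.10"

# The command string the pipeline generates for this item.
cmd="kubectl --as=system:admin annotate namespace ${name} cilium.syn.tools/egress-ip=${egress_ip}"
echo "$cmd"
```

Printing the generated commands (dropping the final `| bash`) before executing them is a cheap way to review the migration first.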
@@ -20,12 +20,12 @@ include::partial$runbooks/known_ebpf_maps.adoc[]
NODE=<node name of affected node> <1>
AGENT_POD=$(kubectl -n cilium get pods --field-selector=spec.nodeName=$NODE \
-l app.kubernetes.io/name=cilium-agent -oname)
-kubectl -n cilium exec -it $AGENT_POD --as=cluster-admin -- cilium status <2>
-kubectl -n cilium exec -it $AGENT_POD --as=cluster-admin -- cilium status --verbose <3>
+kubectl -n cilium exec -it $AGENT_POD --as=system:admin -- cilium status <2>
+kubectl -n cilium exec -it $AGENT_POD --as=system:admin -- cilium status --verbose <3>
kubectl -n cilium logs $AGENT_POD --tail=50 <4>
----
<1> The node indicated in the alert
-<2> `--as=cluster-admin` is required on VSHN managed clusters
+<2> `--as=system:admin` is required on VSHN managed clusters
<2> Show the agent status on the node
<3> Show verbose agent status on the node.
In this output, you may see details about eBPF sync jobs which have errors.
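The `AGENT_POD` lookup above uses `-oname`, which returns the resource-prefixed form (for example `pod/cilium-agent-abc12`); `kubectl exec` accepts that form directly. A small sketch of the string handling, with a sample value in place of the live lookup:

```shell
# Sample value standing in for `kubectl get pods ... -oname` output.
AGENT_POD="pod/cilium-agent-abc12"

# kubectl exec accepts the pod/ prefix as-is; strip it only if some
# other tool needs the bare pod name.
echo "${AGENT_POD#pod/}"
```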
@@ -31,9 +31,9 @@ First, check the source cluster's overall cluster mesh status

[source,bash]
----
-cilium -n cilium clustermesh status --as=cluster-admin <1>
+cilium -n cilium clustermesh status --as=system:admin <1>
----
-<1> `--as=cluster-admin` is required on VSHN Managed OpenShift, may need to be left out on other clusters.
+<1> `--as=system:admin` is required on VSHN Managed OpenShift, may need to be left out on other clusters.

If the output indicates that all nodes are unable to connect to the remote cluster's clustermesh API, it's likely that the issue is either on the remote cluster, or in the network between the clusters.
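When it's unclear which side is at fault, the same status check can be repeated against both clusters from one workstation. A hypothetical loop over two kubeconfig contexts (the context names are placeholders; the `cilium` call is commented out because it needs a live cluster):

```shell
# Placeholder context names; replace with your clusters' real contexts.
for ctx in source-cluster remote-cluster; do
  echo "== ${ctx} =="
  # cilium -n cilium clustermesh status --context "$ctx" --as=system:admin
done
```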

@@ -52,25 +52,25 @@ NODE=<node name of affected node> <1>
AGENT_POD=$(kubectl -n cilium get pods --field-selector=spec.nodeName=$NODE \
-l app.kubernetes.io/name=cilium-agent -oname)

-kubectl -n cilium exec -it $AGENT_POD --as=cluster-admin -- cilium status <2>
-kubectl -n cilium exec -it $AGENT_POD --as=cluster-admin -- cilium troubleshoot clustermesh <3>
+kubectl -n cilium exec -it $AGENT_POD --as=system:admin -- cilium status <2>
+kubectl -n cilium exec -it $AGENT_POD --as=system:admin -- cilium troubleshoot clustermesh <3>
----
<1> Set this to the name of an affected node's `Node` object
<2> Show a summary of the Cilium agent status.
You should see in the output of this command whether the agent can't reach one or more of the remote cluster's nodes.
<3> This command will show connection details to the remote cluster's cluster mesh API server or the local cache in case you're using KVStoreMesh.

-TIP: `--as=cluster-admin` may need to be left out on some clusters.
+TIP: `--as=system:admin` may need to be left out on some clusters.

If the output of `cilium troubleshoot clustermesh` refers to the local cluster's cluster mesh API server, it's likely that you're using KVStoreMesh.
In that case you can check the KVStoreMesh connection to the remote cluster mesh API server in the `clustermesh-apiserver` deployment:

[source,bash]
----
-kubectl -n cilium --as=cluster-admin exec -it deploy/clustermesh-apiserver -c kvstoremesh -- \
+kubectl -n cilium --as=system:admin exec -it deploy/clustermesh-apiserver -c kvstoremesh -- \
clustermesh-apiserver kvstoremesh-dbg status <1>

-kubectl exec -it -n cilium --as=cluster-admin deploy/clustermesh-apiserver -c kvstoremesh -- \
+kubectl exec -it -n cilium --as=system:admin deploy/clustermesh-apiserver -c kvstoremesh -- \
clustermesh-apiserver kvstoremesh-dbg troubleshoot <2>
----
<1> Show a connection summary of the KVStoreMesh
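The status output lists the remote clusters the KVStoreMesh is syncing from; a quick way to spot a broken one is to filter out everything that reports as connected. A sketch over sample output (the exact line format is an assumption here, so adapt the filter to what `kvstoremesh-dbg status` actually prints):

```shell
# Sample lines standing in for `kvstoremesh-dbg status` output.
status="cluster-a: connected
cluster-b: disconnected"

# Keep only remote clusters that do not report as connected.
printf '%s\n' "$status" | grep -v ': connected$'
```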
@@ -80,7 +80,7 @@ You can also run `cilium-health status --probe` in the agent pod to actively pro

[source,bash]
----
-kubectl -n cilium exec -it $AGENT_POD --as=cluster-admin -- cilium-health status --probe
+kubectl -n cilium exec -it $AGENT_POD --as=system:admin -- cilium-health status --probe
----
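The probe output reports per-node reachability; filtering it for failing nodes narrows the investigation quickly. A sketch over sample data (the real `cilium-health` output format differs, this only illustrates the filtering idea):

```shell
# Sample lines standing in for per-node probe results:
# column 1 = node name, column 2 = probe result (assumed layout).
probe="node-a reachable
node-b unreachable"

# Print only the nodes whose probe failed.
printf '%s\n' "$probe" | awk '$2 == "unreachable" { print $1 }'
```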

include::partial$runbooks/check-node-routing-tables.adoc[]
@@ -28,9 +28,9 @@ First, check the source cluster's overall cluster mesh status

[source,bash]
----
-cilium -n cilium clustermesh status --as=cluster-admin <1>
+cilium -n cilium clustermesh status --as=system:admin <1>
----
-<1> `--as=cluster-admin` is required on VSHN Managed OpenShift, may need to be left out on other clusters.
+<1> `--as=system:admin` is required on VSHN Managed OpenShift, may need to be left out on other clusters.

include::partial$runbooks/investigating-clustermesh-api.adoc[]

@@ -40,10 +40,10 @@ You can check the KVStoreMesh connection to the remote cluster mesh API server i

[source,bash]
----
-kubectl -n cilium --as=cluster-admin exec -it deploy/clustermesh-apiserver -c kvstoremesh -- \
+kubectl -n cilium --as=system:admin exec -it deploy/clustermesh-apiserver -c kvstoremesh -- \
clustermesh-apiserver kvstoremesh-dbg status <1>

-kubectl exec -it -n cilium --as=cluster-admin deploy/clustermesh-apiserver -c kvstoremesh -- \
+kubectl exec -it -n cilium --as=system:admin deploy/clustermesh-apiserver -c kvstoremesh -- \
clustermesh-apiserver kvstoremesh-dbg troubleshoot <2>
----
<1> Show a connection summary of the KVStoreMesh
@@ -7,8 +7,8 @@ For setups which use static routes to make the nodes of the clusters participati
----
NODE=<node name of affected node>
REMOTE_NODE=<ip of a node in the remote cluster>
-oc -n syn-debug-nodes debug node/${NODE} --as=cluster-admin -- chroot /host ip r
-oc -n syn-debug-nodes debug node/${NODE} --as=cluster-admin -- chroot /host ping -c4 ${REMOTE_NODE}
+oc -n syn-debug-nodes debug node/${NODE} --as=system:admin -- chroot /host ip r
+oc -n syn-debug-nodes debug node/${NODE} --as=system:admin -- chroot /host ping -c4 ${REMOTE_NODE}
----
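In the `ip r` output from the debug pod, the thing to verify is that a route covering the remote cluster's node network exists. A sketch of that check over sample routing-table output (the network prefix is a placeholder; substitute the remote cluster's actual node CIDR):

```shell
# Sample lines standing in for `ip r` output on the node.
ip_routes="default via 10.0.0.1 dev eth0
192.0.2.0/24 via 10.0.0.254 dev eth0"

# Look for a route whose destination starts with the remote node network.
if printf '%s\n' "$ip_routes" | grep -q '^192\.0\.2\.'; then
  echo "route present"
else
  echo "route missing"
fi
```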

.Other K8s
@@ -14,10 +14,10 @@ TIP: Add a section below if you're debugging a map for which there's no info yet
NODE=<node name of affected node> <1>
AGENT_POD=$(kubectl -n cilium get pods --field-selector=spec.nodeName=$NODE \
-l app.kubernetes.io/name=cilium-agent -oname)
-kubectl -n cilium exec -it $AGENT_POD --as=cluster-admin -- cilium-dbg policy selectors <2>
+kubectl -n cilium exec -it $AGENT_POD --as=system:admin -- cilium-dbg policy selectors <2>
----
<1> The node indicated in the alert
-<2> `--as=cluster-admin` is required on VSHN managed clusters
+<2> `--as=system:admin` is required on VSHN managed clusters
<2> List the Cilium policy selectors (including matched endpoint IDs) that need to be deployed on the node.

. Check output for any policies that match a large amount of endpoints and investigate if you can tune the associated network policy to reduce the amount of matched endpoints.
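Scanning the selector list by eye gets tedious on busy clusters; sorting or thresholding on the endpoint count helps surface the worst offenders. A sketch over sample data (the column layout is an assumption, adapt the field number to the real `cilium-dbg policy selectors` output):

```shell
# Sample lines standing in for selector output:
# column 1 = selector, column 2 = number of matched endpoints (assumed).
selectors="selector-a 12
selector-b 340
selector-c 7"

# Print selectors matching more than 100 endpoints.
printf '%s\n' "$selectors" | awk '$2 > 100 { print $1 }'
```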