diff --git a/content/posts/2025-02-14-cli_use_cases.md b/content/posts/2025-02-14-cli_use_cases.md
new file mode 100644
index 0000000..98149f7
--- /dev/null
+++ b/content/posts/2025-02-14-cli_use_cases.md
@@ -0,0 +1,374 @@
+---
+layout: :theme/post
+title: "Network Observability On Demand Use Cases"
+description: Command line interface usage for concrete scenarios
+tags: CLI,Monitoring,Troubleshooting
+authors: [jpinsonneau]
+---
+
+# Network Observability On Demand use cases
+
+_Thanks to: Mohamed Mahmoud, Joël Takvorian, Sara Thomas and Amogh Rameshappa Devapura for reviewing_
+
+If you haven't read it yet, take a look at the [CLI 1.8 update](../network-observability-on-demand-1-8-update/) article for an overview of the features introduced in release 1.8.
+
+For each of the following use cases, you must have the `netobserv CLI` installed and be connected to your cluster using:
+```sh
+oc login --username <username> --password <password>
+```
+
+To observe what the CLI deploys on your cluster, you can run the `oc events -n netobserv-cli -w` command to watch all the events happening in the `netobserv-cli` namespace.
+
+The result looks like:
+```sh
+LAST SEEN   TYPE     REASON             OBJECT                    MESSAGE
+0s          Normal   SuccessfulCreate   DaemonSet/netobserv-cli   Created pod: netobserv-cli-t2vlr
+0s          Normal   Scheduled          Pod/netobserv-cli-t2vlr   Successfully assigned netobserv-cli/netobserv-cli-t2vlr to ip-10-0-1-202.ec2.internal
+0s          Normal   SuccessfulCreate   DaemonSet/netobserv-cli   Created pod: netobserv-cli-hlmxx
+0s          Normal   Scheduled          Pod/netobserv-cli-hlmxx   Successfully assigned netobserv-cli/netobserv-cli-hlmxx to ip-10-0-1-220.ec2.internal
+0s          Normal   Pulling            Pod/netobserv-cli-t2vlr   Pulling image "quay.io/netobserv/netobserv-ebpf-agent:main"
+0s          Normal   Pulling            Pod/netobserv-cli-hlmxx   Pulling image "quay.io/netobserv/netobserv-ebpf-agent:main"
+0s          Normal   Pulled             Pod/netobserv-cli-hlmxx   Successfully pulled image "quay.io/netobserv/netobserv-ebpf-agent:main" in 2.049s (2.049s including waiting)
+0s          Normal   Created            Pod/netobserv-cli-hlmxx   Created container netobserv-cli
+0s          Normal   Started            Pod/netobserv-cli-hlmxx   Started container netobserv-cli
+0s          Normal   Pulled             Pod/netobserv-cli-t2vlr   Successfully pulled image "quay.io/netobserv/netobserv-ebpf-agent:main" in 5.376s (5.376s including waiting)
+0s          Normal   Created            Pod/netobserv-cli-t2vlr   Created container netobserv-cli
+0s          Normal   Started            Pod/netobserv-cli-t2vlr   Started container netobserv-cli
+0s          Normal   Scheduled          Pod/collector             Successfully assigned netobserv-cli/collector to ip-10-0-1-220.ec2.internal
+0s          Normal   AddedInterface     Pod/collector             Add eth0 [10.129.0.35/23] from ovn-kubernetes
+0s          Normal   Pulling            Pod/collector             Pulling image "quay.io/netobserv/network-observability-cli:main"
+0s          Normal   Pulled             Pod/collector             Successfully pulled image "quay.io/netobserv/network-observability-cli:main" in 1.724s (1.724s including waiting)
+0s          Normal   Created            Pod/collector             Created container collector
+0s          Normal   Started            Pod/collector             Started container collector
+```
+
+## North / South and East / West traffic
+
+The CLI is able to read configurations from `cluster-config-v1` and
+`network` to identify **Machine**, **Pods**, and **Services** subnets using the `--get-subnets` option. This automatically adds `SrcSubnetLabel` and `DstSubnetLabel` to your flows.
+
+You can see subnets being configured during the creation of the agents:
+```sh
+creating flow-capture agents:
+opt: get_subnets, value: true
+Found subnets:
+  Services: "172.30.0.0/16"
+  Pods: "10.128.0.0/14"
+  Machines: "10.0.0.0/16"
+```
+
+Once running, you can cycle through the different views using the left / right arrow keys and change the displayed enrichment columns using the page up / down keys.
+Also, to adapt to your screen height, you can increase / decrease the number of displayed flows using the up / down arrow keys.
+
+![subnets]({page.image('cli/subnets.png')})
+
+You can live-filter this capture by typing the Machines, Pods or Services keywords to display only the traffic you are looking for.
+
+However, if you want to capture only a subset of these flows, you can use the regexes filter on top, as in the following example:
+```sh
+oc netobserv flows --get-subnets --regexes=SrcSubnetLabel~Pods,DstSubnetLabel~Services
+```
+
+**WARNING: Running regex filters means that all the flows are captured and enriched before this filter stage applies in the pipeline. To avoid performance impact on your cluster, use eBPF filters such as IPs, Ports and Protocols as much as possible.**
+
+The output now shows only **Pods** to **Services** flows:
+![pods subnets]({page.image('cli/pods-subnets.png')})
+
+## Connectivity check(s) between two endpoints
+
+Let's start with a simple case where you have a pod that is not able to reach an endpoint. We are using a simple nodejs sample app deployed in the `connectivity-scenario` namespace for the demo.
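+Before deploying any capture, it can help to classify the failure from the client's point of view. The short Python sketch below is illustrative only (it is not part of the NetObserv CLI, and the host and port are placeholders): it distinguishes DNS failures, timeouts, and refused connections, which roughly maps onto the issue categories listed next.
+
+```python
+import socket
+
+def probe(host: str, port: int, timeout: float = 3.0) -> str:
+    """Classify a basic TCP reachability check against an endpoint."""
+    # A DNS failure shows up before any packet is sent to the endpoint
+    try:
+        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
+    except socket.gaierror:
+        return "dns-error"
+    addr = infos[0][4]
+    try:
+        # A successful TCP handshake rules out drops on this path/port
+        with socket.create_connection(addr[:2], timeout=timeout):
+            return "ok"
+    except TimeoutError:
+        return "timeout"      # often a silent drop (policy, firewall)
+    except ConnectionRefusedError:
+        return "refused"      # host reachable, but nothing listens on the port
+    except OSError:
+        return "unreachable"
+```
+
+A `timeout` result hints at drops (the exact cause is what the flow capture below reveals), while `refused` means the endpoint was reached but nothing was listening.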
+
+![pod]({page.image('cli/connectivity-scenario-pod.png')})
+
+This could be related to many issues such as:
+- DNS issue
+- Policy or kernel drops
+- Configuration issue (such as UDN)
+- Timeouts
+
+Since we don't know what we are looking for yet, we should enable all the features using the option:
+```sh
+--enable_all
+```
+
+By clicking on the pod name, we can see that our current pod IP is `10.129.0.48`. To capture all the traffic going in and out of this pod, we use the filter:
+```sh
+--peer_ip=10.129.0.48
+```
+
+Alternatively, you could also use the service port:
+```sh
+--port=3001
+```
+
+Finally, you can add a node selector label on top:
+```sh
+--node-selector=kubernetes.io/hostname:my-node
+```
+
+**WARNING: Running the capture without filtering is also an option, but it is not recommended as it collects all the flows of the cluster. Depending on the size of your cluster, this could be a lot of data and could make the collector pod crash.**
+
+All together, the command to run the flow capture with all the features on our pod IP is:
+```sh
+oc netobserv flows --enable_all --peer_ip=10.129.0.48
+```
+
+The script connects to your cluster and starts deploying the eBPF agents and the collector pod:
+```sh
+Checking dependencies...
+'yq' is up to date (version v4.43.1).
+'bash' is up to date (version v5.2.26).
+Setting up...
+cluster-admin
+creating netobserv-cli namespace
+namespace/netobserv-cli created
+creating service account
+serviceaccount/netobserv-cli created
+clusterrole.rbac.authorization.k8s.io/netobserv-cli unchanged
+clusterrolebinding.rbac.authorization.k8s.io/netobserv-cli unchanged
+creating collector service
+service/collector created
+creating flow-capture agents:
+opt: pkt_drop_enable, value: true
+opt: dns_enable, value: true
+opt: rtt_enable, value: true
+opt: network_events_enable, value: true
+opt: udn_enable, value: true
+opt: pkt_xlat_enable, value: true
+opt: filter_peer_ip, value: 10.129.0.48
+daemonset.apps/netobserv-cli created
+Waiting for daemon set "netobserv-cli" rollout to finish: 0 of 2 updated pods are available...
+Waiting for daemon set "netobserv-cli" rollout to finish: 1 of 2 updated pods are available...
+daemon set "netobserv-cli" successfully rolled out
+Running network-observability-cli get-flows...
+pod/collector created
+pod/collector condition met
+```
+
+Once that is done, it connects to the collector and displays its output:
+```sh
+------------------------------------------------------------------------
+ _  _     _       _                 ___  _    ___
+| \| |___| |_ ___| |__ ___ ___ _ ___ __ / __| | |_ _|
+| .' / -_) _/ _ \ '_ (_-