7 changes: 5 additions & 2 deletions _topic_maps/_topic_map.yml
@@ -1230,7 +1230,7 @@ Topics:
File: about-advertising-ipaddresspool
- Name: Configuring MetalLB BGP peers
File: metallb-configure-bgp-peers
- Name: Advertising an IP address pool using the community alias
File: metallb-configure-community-alias
- Name: Configuring MetalLB BFD profiles
File: metallb-configure-bfd-profiles
@@ -2047,7 +2047,7 @@ Topics:
- Name: Enabling features using FeatureGates
File: nodes-cluster-enabling-features
Distros: openshift-enterprise,openshift-origin
- Name: Improving cluster stability in high latency environments using worker latency profiles
File: nodes-cluster-worker-latency-profiles
Distros: openshift-enterprise,openshift-origin
- Name: Remote worker nodes on the network edge
@@ -2271,6 +2271,9 @@ Topics:
- Name: Deploying distributed units at scale in a disconnected environment
File: ztp-deploying-disconnected
Distros: openshift-origin,openshift-enterprise
- Name: Requesting CRI-O and Kubelet profiling data using the Node Observability Operator
File: node-observability-operator
Distros: openshift-origin,openshift-enterprise
---
Name: Specialized hardware and driver enablement
Dir: hardware_enablement
97 changes: 97 additions & 0 deletions modules/node-observability-create-custom-resource.adoc
@@ -0,0 +1,97 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/understanding-node-observability-operator.adoc

:_content-type: PROCEDURE
[id="creating-node-observability-custom-resource_{context}"]
= Creating the Node Observability custom resource

Before you run profiling queries, you must create a `NodeObservability` custom resource (CR).

[IMPORTANT]
====
Creating a `NodeObservability` CR reboots all the worker nodes. The reboot can take 10 minutes or more to complete.
====

When you apply the `NodeObservability` CR, it creates the machine config and machine config pool CRs that are needed to enable CRI-O profiling on the worker nodes.
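
While the worker nodes reboot, you can follow the rollout from the CLI instead of checking it by hand. The following is a minimal sketch, assuming that the profiled nodes stay in the default `worker` machine config pool; if the Operator places them in a pool with a different name, pass that name instead. The `wait_for_pool` function name and the 30-minute default timeout are hypothetical choices for illustration.

```bash
# Hypothetical helper: block until a machine config pool reports Updated=True.
# Assumption: the profiled nodes belong to the "worker" pool unless told otherwise.
wait_for_pool() {
  local pool="${1:-worker}"    # machine config pool to watch
  local timeout="${2:-30m}"    # arbitrary budget that covers the node reboots
  oc wait "machineconfigpool/${pool}" \
    --for=condition=Updated \
    --timeout="${timeout}"
}
```

For example, run `wait_for_pool` or `wait_for_pool worker 45m`. Because a pool can still report `Updated` for a short time before it picks up the new machine config, call the function after `oc get machineconfigpool` shows that the pool has started updating.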

[NOTE]
====
Kubelet profiling is enabled by default.
====

The CRI-O UNIX socket of the node is mounted on the agent pod, which allows the agent to communicate with CRI-O to run the pprof request. Similarly, the `kubelet-serving-ca` certificate chain is mounted on the agent pod, which allows secure communication between the agent and the node's kubelet endpoint.
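
To see which volumes are mounted into an agent pod, you can print the pod's volume list. This is a sketch only: `show_agent_volumes` is a hypothetical function name, the pod name is a placeholder that you look up with `oc get pods`, and the default `node-observability-operator` namespace assumes that the agents run in the Operator's namespace.

```bash
# Hypothetical helper: print each volume of an agent pod with its host path.
# Volumes that are not hostPath volumes print an empty second column.
show_agent_volumes() {
  local pod="$1"                                        # agent pod name (placeholder)
  local namespace="${2:-node-observability-operator}"   # assumed agent namespace
  oc get pod "${pod}" -n "${namespace}" \
    -o jsonpath='{range .spec.volumes[*]}{.name}{"\t"}{.hostPath.path}{"\n"}{end}'
}
```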

.Prerequisites
* You have installed the Node Observability Operator.
* You have installed the OpenShift CLI (`oc`).
* You have access to the cluster with `cluster-admin` privileges.
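
As a rough check of the last prerequisite, you can ask the API server whether the current user can perform every action on every resource in all namespaces, which is what the `cluster-admin` role grants. The `has_cluster_admin` function name is hypothetical; `oc auth can-i` prints `yes` or `no`.

```bash
# Hypothetical helper: succeed only if the current user can do everything
# in every namespace, which approximates holding the cluster-admin role.
has_cluster_admin() {
  [ "$(oc auth can-i '*' '*' --all-namespaces)" = "yes" ]
}
```

For example: `has_cluster_admin && echo "cluster-admin access confirmed"`.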

.Procedure

. Log in to the {product-title} CLI as a user with the `cluster-admin` role by running the following command:
+
[source,terminal]
----
$ oc login -u kubeadmin https://<HOSTNAME>:6443
----

. Switch back to the `node-observability-operator` namespace by running the following command:
+
[source,terminal]
----
$ oc project node-observability-operator
----

. Create a CR file named `nodeobservability.yaml` that contains the following text:
+
[source,yaml]
----
apiVersion: nodeobservability.olm.openshift.io/v1alpha1
kind: NodeObservability
metadata:
  name: cluster <1>
spec:
  labels:
    node-role.kubernetes.io/worker: ""
  type: crio-kubelet
----
<1> You must specify the name as `cluster` because there should be only one `NodeObservability` CR per cluster.

. Create the `NodeObservability` CR by running the following command:
+
[source,terminal]
----
$ oc apply -f nodeobservability.yaml
----

+
.Example output
[source,terminal]
----
nodeobservability.olm.openshift.io/cluster created
----

. Review the status of the `NodeObservability` CR by running the following command:
+
[source,terminal]
----
$ oc get nob/cluster -o yaml | yq '.status.conditions'
----

+
.Example output
[source,terminal]
----
conditions:
- lastTransitionTime: "2022-07-05T07:33:54Z"
  message: 'DaemonSet node-observability-ds ready: true NodeObservabilityMachineConfig
    ready: true'
  reason: Ready
  status: "True"
  type: Ready
----

+
The `NodeObservability` CR is ready when the reason is `Ready` and the status is `True`.
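
Instead of re-running the status command until the `Ready` condition appears, a script can block on it. This is a minimal sketch that relies on the `nob` short name and the `Ready` condition shown in the example output; the `wait_for_node_observability` function name and the 20-minute default timeout are hypothetical choices meant to cover the worker node reboots.

```bash
# Hypothetical helper: block until the NodeObservability CR named "cluster"
# reports the Ready condition.
wait_for_node_observability() {
  local timeout="${1:-20m}"   # arbitrary budget that covers the worker reboots
  oc wait nob/cluster --for=condition=Ready --timeout="${timeout}"
}
```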
11 changes: 11 additions & 0 deletions modules/node-observability-high-level-workflow.adoc
@@ -0,0 +1,11 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/understanding-node-observability-operator.adoc

:_content-type: CONCEPT
[id="workflow-node-observability-operator_{context}"]
= High-level workflow of the Node Observability Operator

After you install the Node Observability Operator in the {product-title} cluster, you must create a `NodeObservability` custom resource (CR), which creates a DaemonSet that deploys a Node Observability agent on each worker node.

To request a profiling query, you must create a `NodeObservabilityRun` resource, which instructs the deployed Node Observability agents to trigger CRI-O and Kubelet profiling. After profiling completes, the Node Observability agent stores the profiling data in the `/run/node-observability` directory of its container file system, where the data is available for retrieval.
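
To confirm that the agents from the first step of this workflow are deployed, you can wait for the DaemonSet rollout and then list the pods with the nodes that they run on. This sketch assumes that the DaemonSet is named `node-observability-ds`, as in the `NodeObservability` status message, and that it runs in the `node-observability-operator` namespace; the `check_agents` function name and the 10-minute timeout are hypothetical.

```bash
# Hypothetical helper: wait for the agent DaemonSet to finish rolling out,
# then list the pods in the namespace together with their nodes.
# Assumption: DaemonSet name and namespace as described in the lead-in.
check_agents() {
  local namespace="${1:-node-observability-operator}"
  oc rollout status daemonset/node-observability-ds -n "${namespace}" --timeout=10m &&
    oc get pods -n "${namespace}" -o wide
}
```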
119 changes: 119 additions & 0 deletions modules/node-observability-install-cli.adoc
@@ -0,0 +1,119 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/understanding-node-observability-operator.adoc

:_content-type: PROCEDURE
[id="install-node-observability-using-cli_{context}"]
= Installing the Node Observability Operator using the CLI

You can install the Node Observability Operator by using the OpenShift CLI (`oc`).

.Prerequisites

* You have installed the OpenShift CLI (`oc`).
* You have access to the cluster with `cluster-admin` privileges.
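
Before you start, you can check which update channels the catalog publishes for the Operator, to confirm that the `alpha` channel used in the `Subscription` object below is available. The `list_channels` function name is hypothetical; the command reads the `status.channels` field of the package manifest.

```bash
# Hypothetical helper: print the update channels that the catalog
# publishes for the Node Observability Operator package.
list_channels() {
  oc get packagemanifest node-observability-operator -n openshift-marketplace \
    -o jsonpath='{.status.channels[*].name}{"\n"}'
}
```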

.Procedure

. Confirm that the Node Observability Operator is available by running the following command:
+
[source,terminal]
----
$ oc get packagemanifests -n openshift-marketplace node-observability-operator
----

+
.Example output
[source,terminal]
----
NAME                          CATALOG             AGE
node-observability-operator   Red Hat Operators   9h
----

. Create the `node-observability-operator` namespace by running the following command:
+
[source,terminal]
----
$ oc new-project node-observability-operator
----

. Create an `OperatorGroup` object by running the following command:
+
[source,terminal]
----
$ cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: node-observability-operator
  namespace: node-observability-operator
spec:
  targetNamespaces:
  - node-observability-operator
EOF
----

. Create a `Subscription` object that subscribes the namespace to the Operator by running the following command:
+
[source,terminal]
----
$ cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: node-observability-operator
  namespace: node-observability-operator
spec:
  channel: alpha
  name: node-observability-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
----

.Verification

. View the install plan name by running the following command:
+
[source,terminal]
----
$ oc -n node-observability-operator get sub node-observability-operator -o yaml | yq '.status.installplan.name'
----

+
.Example output
[source,terminal]
----
install-dt54w
----

. Verify the install plan status by running the following command:
+
[source,terminal]
----
$ oc -n node-observability-operator get ip <install_plan_name> -o yaml | yq '.status.phase'
----
+
`<install_plan_name>` is the install plan name that you obtained from the output of the previous command.

+
.Example output
[source,terminal]
----
Complete
----

. Verify that the Node Observability Operator deployment is available by running the following command:
+
[source,terminal]
----
$ oc get deploy -n node-observability-operator
----

+
.Example output
[source,terminal]
----
NAME                                             READY   UP-TO-DATE   AVAILABLE   AGE
node-observability-operator-controller-manager   1/1     1            1           40h
----
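
In a script, you can block until the controller manager deployment reports `Available` instead of re-running `oc get deploy`. This is a sketch; the `wait_for_operator` function name and the 5-minute default timeout are hypothetical, while the deployment name and namespace come from the example output above.

```bash
# Hypothetical helper: block until the Operator's controller manager
# Deployment reports the Available condition.
wait_for_operator() {
  local timeout="${1:-5m}"   # arbitrary timeout
  oc wait deployment/node-observability-operator-controller-manager \
    -n node-observability-operator \
    --for=condition=Available \
    --timeout="${timeout}"
}
```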
31 changes: 31 additions & 0 deletions modules/node-observability-install-web-console.adoc
@@ -0,0 +1,31 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/understanding-node-observability-operator.adoc

:_content-type: PROCEDURE
[id="install-node-observability-using-web-console_{context}"]
= Installing the Node Observability Operator using the web console

You can install the Node Observability Operator from the {product-title} web console.

.Prerequisites

* You have access to the cluster with `cluster-admin` privileges.
* You have access to the {product-title} web console.

.Procedure

. Log in to the {product-title} web console.
. In the Administrator's navigation panel, expand *Operators* → *OperatorHub*.
. In the *All items* field, enter *Node Observability Operator* and select the *Node Observability Operator* tile.
. Click *Install*.
. On the *Install Operator* page, configure the following settings:
.. In the *Update channel* area, click *alpha*.
.. In the *Installation mode* area, click *A specific namespace on the cluster*.
.. From the *Installed Namespace* list, select *node-observability-operator*.
.. In the *Update approval* area, select *Automatic*.
.. Click *Install*.

.Verification
. In the Administrator's navigation panel, expand *Operators* → *Installed Operators*.
. Verify that the Node Observability Operator is listed in the Operators list.
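
If you also have CLI access, a comparable check is to list the ClusterServiceVersion (CSV) objects in the Operator's namespace with their install phase; a phase of `Succeeded` means that Operator Lifecycle Manager finished installing the Operator. The `show_csv_phase` function name is hypothetical.

```bash
# Hypothetical helper: print each ClusterServiceVersion in the Operator's
# namespace with its install phase (for example, Succeeded).
show_csv_phase() {
  oc get csv -n node-observability-operator \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
}
```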
8 changes: 8 additions & 0 deletions modules/node-observability-installation.adoc
@@ -0,0 +1,8 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/understanding-node-observability-operator.adoc

:_content-type: CONCEPT
[id="install-node-observability-operator_{context}"]
= Installing the Node Observability Operator
The Node Observability Operator is not installed in {product-title} by default. You can install the Node Observability Operator by using the {product-title} CLI or the web console.
83 changes: 83 additions & 0 deletions modules/node-observability-run-profiling-query.adoc
@@ -0,0 +1,83 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/understanding-node-observability-operator.adoc

:_content-type: PROCEDURE
[id="running-profiling-query_{context}"]
= Running the profiling query

The profiling query is a blocking operation that collects CRI-O and Kubelet profiling data for 30 seconds. The Node Observability Operator stores the profiling data in the `/run/node-observability` directory of the container file system. To request a profiling query, you must create a `NodeObservabilityRun` resource.
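
Because the query blocks for about 30 seconds, a script that creates the `NodeObservabilityRun` resource can wait for its `Finished` condition instead of polling the status by hand. This sketch uses the run name from the example in this procedure as its default; the `wait_for_profiling` function name and the 5-minute default timeout are hypothetical.

```bash
# Hypothetical helper: block until a NodeObservabilityRun reports the
# Finished condition.
wait_for_profiling() {
  local run="${1:-nodeobservabilityrun}"   # run name used in this procedure
  local timeout="${2:-5m}"                 # generous compared to the 30-second query
  oc wait "nodeobservabilityrun/${run}" --for=condition=Finished --timeout="${timeout}"
}
```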

[IMPORTANT]
====
You can request only one profiling query at a time.
====
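
Because only one query can run at a time, it can help to check existing `NodeObservabilityRun` resources before you request another one. This sketch prints each run with the status of its `Finished` condition, where an empty value suggests that the run has not finished yet; the `list_runs` function name is hypothetical.

```bash
# Hypothetical helper: list NodeObservabilityRun resources with the status
# of their Finished condition.
list_runs() {
  oc get nodeobservabilityrun \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Finished")].status}{"\n"}{end}'
}
```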

.Prerequisites
* You have installed the Node Observability Operator.
* You have created the `NodeObservability` custom resource (CR).
* You have access to the cluster with `cluster-admin` privileges.

.Procedure

. Create a `NodeObservabilityRun` resource file named `nodeobservabilityrun.yaml` that contains the following text:
+
[source,yaml]
----
apiVersion: nodeobservability.olm.openshift.io/v1alpha1
kind: NodeObservabilityRun
metadata:
  name: nodeobservabilityrun
spec:
  nodeObservabilityRef:
    name: cluster
----

. Trigger the profiling by creating the `NodeObservabilityRun` resource:
+
[source,terminal]
----
$ oc apply -f nodeobservabilityrun.yaml
----

. Review the status of the `NodeObservabilityRun` by running the following command:
+
[source,terminal]
----
$ oc get nodeobservabilityrun nodeobservabilityrun -o yaml | yq '.status.conditions'
----

+
.Example output
[source,terminal]
----
conditions:
- lastTransitionTime: "2022-07-07T14:57:34Z"
  message: Ready to start profiling
  reason: Ready
  status: "True"
  type: Ready
- lastTransitionTime: "2022-07-07T14:58:10Z"
  message: Profiling query done
  reason: Finished
  status: "True"
  type: Finished
----

+
The profiling query is complete when the condition of type `Finished` has the status `True`.

. Retrieve the profiling data from the `/run/node-observability` directory of each agent container by running the following bash script:
+
[source,bash]
----
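# Iterate over the agent pods that took part in the profiling run
for a in $(oc get nodeobservabilityrun nodeobservabilityrun -o yaml | yq '.status.agents[].name'); do
  echo "agent ${a}"
  mkdir -p "/tmp/${a}"
  # List the .pprof files that the agent wrote to /run/node-observability
  for p in $(oc exec "${a}" -c node-observability-agent -- bash -c "ls /run/node-observability/*.pprof"); do
    f="$(basename "${p}")"
    echo "copying ${f} to /tmp/${a}/${f}"
    # Stream the file out of the container into /tmp/<agent_name>/
    oc exec "${a}" -c node-observability-agent -- cat "${p}" > "/tmp/${a}/${f}"
  done
done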
----
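
After the files are copied, one way to get a first look at them is the `pprof` tool from the Go toolchain. This sketch assumes that Go is installed on the workstation and that the profiles were copied into per-agent directories under `/tmp`, as the script above does; the `summarize_profiles` function name is hypothetical.

```bash
# Hypothetical helper: print the top entries of every .pprof file found in
# the per-agent directories under a base directory (default /tmp).
# Assumption: a local Go toolchain provides `go tool pprof`.
summarize_profiles() {
  local base="${1:-/tmp}"
  local f
  for f in "${base}"/*/*.pprof; do
    [ -e "${f}" ] || continue   # skip the literal pattern when nothing matches
    echo "== ${f}"
    go tool pprof -top "${f}"
  done
}
```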