diff --git a/_attributes/common-attributes.adoc b/_attributes/common-attributes.adoc index fd0a073cfb53..0aa0d55145a2 100644 --- a/_attributes/common-attributes.adoc +++ b/_attributes/common-attributes.adoc @@ -38,6 +38,9 @@ :cmo-full: Cluster Monitoring Operator :cmo-short: CMO +// Observability Signal Correlation +:korrel8r: https://github.com/korrel8r/korrel8r[Korrel8r] +:sync: image:fa-sync-alt.svg[Sync,20] // Cluster Observability Operator :coo-first: Cluster Observability Operator (COO) diff --git a/images/coo-troubleshooting-experimental.png b/images/coo-troubleshooting-experimental.png deleted file mode 100644 index 051d9d92bb28..000000000000 Binary files a/images/coo-troubleshooting-experimental.png and /dev/null differ diff --git a/images/coo-troubleshooting-panel-advanced.png b/images/coo-troubleshooting-panel-advanced.png new file mode 100644 index 000000000000..b161b181cd7c Binary files /dev/null and b/images/coo-troubleshooting-panel-advanced.png differ diff --git a/images/coo-troubleshooting-panel-graph.png b/images/coo-troubleshooting-panel-graph.png index 119b1bad87c9..a4cdccce40a6 100644 Binary files a/images/coo-troubleshooting-panel-graph.png and b/images/coo-troubleshooting-panel-graph.png differ diff --git a/images/coo-troubleshooting-panel-link.png b/images/coo-troubleshooting-panel-link.png index b61da227ef84..5a7b074cc9b3 100644 Binary files a/images/coo-troubleshooting-panel-link.png and b/images/coo-troubleshooting-panel-link.png differ diff --git a/images/coo-troubleshooting-panel-log.png b/images/coo-troubleshooting-panel-log.png new file mode 100644 index 000000000000..db1e4d130d2a Binary files /dev/null and b/images/coo-troubleshooting-panel-log.png differ diff --git a/images/fa-sync-alt.svg b/images/fa-sync-alt.svg new file mode 100644 index 000000000000..99ba3155bebd --- /dev/null +++ b/images/fa-sync-alt.svg @@ -0,0 +1,2 @@ + + \ No newline at end of file diff --git a/modules/coo-troubleshooting-ui-plugin-creating-alert.adoc 
b/modules/coo-troubleshooting-ui-plugin-creating-alert.adoc deleted file mode 100644 index b8651c644fe1..000000000000 --- a/modules/coo-troubleshooting-ui-plugin-creating-alert.adoc +++ /dev/null @@ -1,46 +0,0 @@ -// Module included in the following assemblies: - -// * observability/cluster_observability_operator/ui_plugins/troubleshooting-ui-plugin.adoc - -:_mod-docs-content-type: PROCEDURE -[id="coo-troubleshooting-ui-plugin-creating-alert_{context}"] -= Creating the example alert - - -To trigger an alert as a starting point to use in the troubleshooting UI panel, you can deploy a container that is deliberately misconfigured. - -.Procedure - -. Use the following YAML, either from the command line or in the web console, to create a broken deployment in a system namespace: -+ -[source,yaml] ----- -apiVersion: apps/v1 -kind: Deployment -metadata: - name: bad-deployment - namespace: default <1> -spec: - selector: - matchLabels: - app: bad-deployment - template: - metadata: - labels: - app: bad-deployment - spec: - containers: <2> - - name: bad-deployment - image: quay.io/openshift-logging/vector:5.8 ----- -<1> The deployment must be in a system namespace (such as `default`) to cause the desired alerts. -<2> This container deliberately tries to start a `vector` server with no configuration file. The server logs a few messages, and then exits with an error. Alternatively, you can deploy any container you like that is badly configured, causing it to trigger an alert. - -. View the alerts: -.. Go to *Observe* -> *Alerting* and click *clear all filters*. View the `Pending` alerts. -+ -[IMPORTANT] -==== -Alerts first appear in the `Pending` state. They do not start `Firing` until the container has been crashing for some time. By viewing `Pending` alerts, you do not have to wait as long to see them occur. -==== -.. Choose one of the `KubeContainerWaiting`, `KubePodCrashLooping`, or `KubePodNotReady` alerts and open the troubleshooting panel by clicking on the link. 
Alternatively, if the panel is already open, click the "Focus" button to update the graph. \ No newline at end of file diff --git a/modules/coo-troubleshooting-ui-plugin-install.adoc b/modules/coo-troubleshooting-ui-plugin-install.adoc index 10944ec92985..3543155e503b 100644 --- a/modules/coo-troubleshooting-ui-plugin-install.adoc +++ b/modules/coo-troubleshooting-ui-plugin-install.adoc @@ -3,6 +3,7 @@ // * observability/cluster_observability_operator/ui_plugins/troubleshooting-ui-plugin.adoc :_mod-docs-content-type: PROCEDURE +include::_attributes/common-attributes.adoc[] [id="coo-troubleshooting-ui-plugin-install_{context}"] = Installing the {coo-full} Troubleshooting UI plugin @@ -24,4 +25,4 @@ metadata: name: troubleshooting-panel spec: type: TroubleshootingPanel ----- \ No newline at end of file +---- diff --git a/modules/coo-troubleshooting-ui-plugin-using.adoc b/modules/coo-troubleshooting-ui-plugin-using.adoc index a93f38fe91c2..910af83fcf32 100644 --- a/modules/coo-troubleshooting-ui-plugin-using.adoc +++ b/modules/coo-troubleshooting-ui-plugin-using.adoc @@ -3,83 +3,75 @@ // * observability/cluster_observability_operator/ui_plugins/troubleshooting-ui-plugin.adoc :_mod-docs-content-type: PROCEDURE -[id="coo-troubleshooting-ui-plugin-using_{context}"] -= Using the {coo-full} troubleshooting UI plugin - -include::snippets/unified-perspective-web-console.adoc[] +include::_attributes/common-attributes.adoc[] +[id="coo-troubleshooting-ui-plugin-using_{context}"] += Using the {coo-full} Troubleshooting Panel UI plugin +:icons: font .Prerequisites -* You have access to the {ocp-product-title} cluster as a user with the `cluster-admin` cluster role. If your cluster version is 4.17+, you can access the troubleshooting UI panel from the Application Launcher {launch}. +* You have access to the {ocp-product-title} cluster as a user with the `cluster-admin` cluster role. * You have logged in to the {ocp-product-title} web console. 
-* You have installed {ocp-product-title} Logging, if you want to visualize correlated logs. -* You have installed {ocp-product-title} Network Observability, if you want to visualize correlated netflows. * You have installed the {coo-full}. * You have installed the {coo-full} troubleshooting UI plugin. -+ + [NOTE] ==== -The troubleshooting panel relies on the observability signal stores installed in your cluster. -Kuberenetes resources, alerts and metrics are always available by default in an {ocp-product-title} cluster. -Other signal types require optional components to be installed: +The troubleshooting panel can display additional signals if the following Red Hat provided components are installed in their recommended namespace. + +* **Logs:** +** `{ocp-product-title} Logging` (collection) +** `{ocp-product-title} Loki Operator` (store) +* **Network events:** +** `{ocp-product-title} Network observability` (collection) +** `{ocp-product-title} Loki Operator` (store) +* **Traces:** `{ocp-product-title} Distributed Tracing` (collection and store) +==== -* **Logs:** Red Hat Openshift Logging (collection) and Loki Operator provided by Red Hat (store) -* **Network events:** Network observability provided by Red Hat (collection) and Loki Operator provided by Red Hat (store) +[NOTE] +==== +Users with `ClusterRole/view` (for example `kube:admin`) will get the most complete results. +Other users will see more restricted graphs, showing only data that they have permission to see. ==== + .Procedure +The **Troubleshooting Panel** is a side-panel that shows a graph of resources and signals related to the information displayed in the main console (in this example the `apiserver` Deployment). +You can use the troubleshooting panel on most console pages - resources, alerts, metrics, logs, traces, network observability and more. -. In the web console, go to *Observe* -> *Alerting* and then select an alert. 
If the alert has correlated items, a **Troubleshooting Panel** link will appear above the chart on the alert detail page. +. Open the Application Launcher {launch} and click the **Signal Correlation** menu item. + -image::coo-troubleshooting-panel-link.png[Troubleshooting Panel link] +image::coo-troubleshooting-panel-link.png[Open panel] + -Click on the **Troubleshooting Panel** link to display the panel. -. The panel consists of query details and a topology graph of the query results. The selected alert is converted into a Korrel8r query string and sent to the `korrel8r` service. -The results are displayed as a graph network connecting the returned signals and resources. This is a _neighbourhood_ graph, starting at the current resource and including related objects up to 3 steps away from the starting point. -Clicking on nodes in the graph takes you to the corresponding web console pages for those resouces. -. You can use the troubleshooting panel to find resources relating to the chosen alert. +. Click **Focus** to show a correlation graph starting from the resources or signals in the main console display (in this example the `apiserver` Deployment). + -[NOTE] -==== -Clicking on a node may sometimes show fewer results than indicated on the graph. This is a known issue that will be addressed in a future release. -==== +image::coo-troubleshooting-panel-graph.png[Troubleshooting Panel] + -image::coo-troubleshooting-panel-graph.png[Troubleshooting panel] [arabic] -.. **Alert (1):** This node is the starting point in the graph and represents the `KubeContainerWaiting` alert displayed in the web console. - -.. **Pod (1):** This node indicates that there is a single `Pod` resource associated with this alert. Clicking on this node will open a console search showing the related pod directly. - -.. **Event (2):** There are two Kuberenetes events associated with the pod. Click this node to see the events. - -.. 
**Logs (74):** This pod has 74 lines of logs, which you can access by clicking on this node. - -.. **Metrics (105):** There are many metrics associated with the pod. - -.. **Network (6):** There are network events, meaning the pod has communicated over the network. The remaining nodes in the graph represent the `Service`, `Deployment` and `DaemonSet` resources that the pod has communicated with. - -.. **Focus:** Clicking this button updates the graph. By default, the graph itself does not change when you click on nodes in the graph. Instead, the main web console page changes, and you can then navigate to other resources using links on the page, while the troubleshooting panel itself stays open and unchanged. To force an update to the graph in the troubleshooting panel, click **Focus**. This draws a new graph, using the current resource in the web console as the starting point. - -.. **Show Query:** Clicking this button enables some experimental features: +.. **Focus** will re-calculate the graph from the main console at any time. +.. {sync} refreshes the graph that is currently displayed. +.. The root (top) node represents the resource in the main console window. +.. First degree neighbors - directly related to the initial resource. +.. Second degree neighbors - indirectly related via first-degree neighbors. + -image::coo-troubleshooting-experimental.png[Experimental features] -[arabic] -... **Hide Query** hides the experimental features. +. Clicking on a node opens the data in the main console. For example if we click on the Log node: -... The query that identifies the starting point for the graph. -The query language, part of the link:https://korrel8r.github.io/korrel8r[Korrel8r] correlation engine used to create the graphs, is experimental and may change in future. -The query is updated by the **Focus** button to correspond to the resources in the main web console window. +image::coo-troubleshooting-panel-log.png[Viewing Logs] -... 
**Neighbourhood depth** is used to display a smaller or larger neighbourhood. -+ -[NOTE] -==== -Setting a large value in a large cluster might cause the query to fail, if the number of results is too big. -==== -... **Goal class** results in a goal directed search instead of a neighbourhood search. A goal directed search shows all paths from the starting point to the goal class, which indicates a type of resource or signal. The format of the goal class is experimental and may change. Currently, the following goals are valid: -**** `k8s:__RESOURCE[VERSION.[GROUP]]__` identifying a kind of kuberenetes resource. For example `k8s:Pod` or `k8s:Deployment.apps.v1`. -**** `alert:alert` representing any alert. +== Advanced controls -**** `metric:metric` representing any metric. +The **Advanced** button opens additional controls over the correlation search. +These controls are *experimental* and may change in a future release. +They provide access to additional features of the {korrel8r} upstream project that powers the correlation search. -**** `netflow:network` representing any network observability network event. +image::coo-troubleshooting-panel-advanced.png[Advanced Controls] -**** `log:__LOG_TYPE__` representing stored logs, where `__LOG_TYPE__` must be one of `application`, `infrastructure` or `audit`. \ No newline at end of file +. **Recent** The search is limited to recent results, by default in the last day. +. **Range** Allows searching an explicit date/time range in the past. +. **Distance** Searches up to the given distance from the starting point. + Each correlation rule (edge in the graph) is a "step"; the distance is the maximum number of steps. +. **Goal Class** Instead of a distance, select a goal: a resource or signal type. + Searches for paths from the start to the goal type. The path length is not limited. + See the {korrel8r} project documentation for more about correlation classes. +. 
**Query** The {korrel8r} query representing the starting point for the search. + See the {korrel8r} project documentation for more about correlation queries. + Clicking **Focus** will reset the query to reflect the current console page. + Clicking **Sync** will re-run the current query and update the graph. diff --git a/ui_plugins/troubleshooting-ui-plugin.adoc b/ui_plugins/troubleshooting-ui-plugin.adoc index eefb4380ac1f..c90d54bac699 100644 --- a/ui_plugins/troubleshooting-ui-plugin.adoc +++ b/ui_plugins/troubleshooting-ui-plugin.adoc @@ -1,26 +1,21 @@ :_mod-docs-content-type: ASSEMBLY [id="troubleshooting-ui-plugin"] -= Troubleshooting UI plugin += Troubleshooting Panel UI plugin include::_attributes/common-attributes.adoc[] :context: troubleshooting-ui-plugin - -toc::[] - :FeatureName: The {coo-full} troubleshooting panel UI plugin -include::snippets/technology-preview.adoc[leveloffset=+2] -The troubleshooting UI plugin for {ocp-product-title} version 4.16+ provides observability signal correlation, powered by the open source Korrel8r project. -With the troubleshooting panel that is available under *Observe* -> *Alerting*, you can easily correlate metrics, logs, alerts, netflows, and additional observability signals and resources, across different data stores. -Users of {ocp-product-title} version 4.17+ can also access the troubleshooting UI panel from the Application Launcher {launch}. +toc::[] -When you install the troubleshooting UI plugin, a link:https://github.com/korrel8r/korrel8r[Korrel8r] service named `korrel8r` is deployed in the same namespace, and it is able to locate related observability signals and Kubernetes resources from its correlation engine. +The Troubleshooting Panel UI plugin provides observability signal correlation. 
+With the troubleshooting panel, available from the Application Launcher {launch}, +you can easily navigate to resources, metrics, logs, alerts, netflows, and other observability signals +that are correlated to the current contents of the console screen. -The output of Korrel8r is displayed in the form of an interactive node graph in the {ocp-product-title} web console. +The panel displays an interactive node graph in the {ocp-product-title} web console. Nodes in the graph represent a type of resource or signal, while edges represent relationships. -When you click on a node, you are automatically redirected to the corresponding web console page with the specific information for that node, for example, metric, log, pod. +When you click on a node, you are automatically redirected to the corresponding web console page. include::modules/coo-troubleshooting-ui-plugin-install.adoc[leveloffset=+1] include::modules/coo-troubleshooting-ui-plugin-using.adoc[leveloffset=+1] - -include::modules/coo-troubleshooting-ui-plugin-creating-alert.adoc[leveloffset=+1] \ No newline at end of file
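
The **Distance** and **Goal Class** controls added in `coo-troubleshooting-ui-plugin-using.adoc` describe two different graph searches: one bounded by a maximum number of steps, the other following paths of any length to a chosen type. The Python sketch below is only a reading aid for that distinction. It assumes a toy adjacency map; `GRAPH`, `neighbourhood`, and `goal_paths` are hypothetical names invented here, and nothing in it reflects how Korrel8r itself is implemented.

```python
from collections import deque

# Hypothetical toy graph: each key is a class of resource or signal,
# each value lists the classes one correlation "step" away.
GRAPH = {
    "Deployment": ["Pod"],
    "Pod": ["Log", "Metric", "Event"],
    "Metric": ["Alert"],
}


def neighbourhood(graph, start, distance):
    """Breadth-first search returning {node: steps} for nodes within `distance` steps."""
    steps = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if steps[node] == distance:
            continue  # do not expand beyond the requested distance
        for neighbour in graph.get(node, ()):
            if neighbour not in steps:
                steps[neighbour] = steps[node] + 1
                queue.append(neighbour)
    return steps


def goal_paths(graph, start, goal):
    """Depth-first search returning every simple path from `start` to `goal`."""
    paths = []

    def walk(node, path):
        if node == goal:
            paths.append(path)
            return
        for neighbour in graph.get(node, ()):
            if neighbour not in path:  # avoid revisiting nodes on this path
                walk(neighbour, path + [neighbour])

    walk(start, [start])
    return paths
```

In this toy model, a distance of 1 from `Deployment` reaches only `Pod`, while a goal search for `Alert` follows the three-step path through `Pod` and `Metric` regardless of the distance setting.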