diff --git a/doc-Service-Telemetry-Framework/assemblies/assembly_advanced-features.adoc b/doc-Service-Telemetry-Framework/assemblies/assembly_advanced-features.adoc index 0cf2eaa3..d2006d3c 100644 --- a/doc-Service-Telemetry-Framework/assemblies/assembly_advanced-features.adoc +++ b/doc-Service-Telemetry-Framework/assemblies/assembly_advanced-features.adoc @@ -53,7 +53,6 @@ include::../modules/proc_configuring-high-availability.adoc[leveloffset=+2] //Dashboards include::../modules/con_dashboards.adoc[leveloffset=+1] include::../modules/proc_setting-up-grafana-to-host-the-dashboard.adoc[leveloffset=+2] -include::../modules/ref_the-grafana-infrastructure-dashboard.adoc[leveloffset=+2] //Monitoring the resource usage of Openstack services include::../modules/proc_monitoring-resource-usage-of-openstack-services.adoc[leveloffset=+1] diff --git a/doc-Service-Telemetry-Framework/modules/con_dashboards.adoc b/doc-Service-Telemetry-Framework/modules/con_dashboards.adoc index e1b4b9aa..a6fc6925 100644 --- a/doc-Service-Telemetry-Framework/modules/con_dashboards.adoc +++ b/doc-Service-Telemetry-Framework/modules/con_dashboards.adoc @@ -29,3 +29,13 @@ Use third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see xref:configuring-red-hat-openstack-platform-overcloud-for-stf_assembly-completing-the-stf-configuration[]. + +You can use two dashboards to monitor a cloud: + +* Infrastructure dashboard +Use the infrastructure dashboard to view metrics for a single node at a time. Select a node from the upper left corner of the dashboard. + +* Cloud view dashboard +Use the cloud view dashboard to view panels for monitoring service resource usage, API stats, and cloud events. You must enable API health monitoring and service monitoring to provide the data for this dashboard. +** To enable API health monitoring, see xref:monitoring-container-health-and-api-status_assembly-advanced-features[]. +** To enable service monitoring, see xref:monitoring-resource-usage-of-openstack-services_assembly-advanced-features[]. diff --git a/doc-Service-Telemetry-Framework/modules/ref_the-grafana-infrastructure-dashboard.adoc b/doc-Service-Telemetry-Framework/modules/ref_the-grafana-infrastructure-dashboard.adoc deleted file mode 100644 index 3d0869a6..00000000 --- a/doc-Service-Telemetry-Framework/modules/ref_the-grafana-infrastructure-dashboard.adoc +++ /dev/null @@ -1,175 +0,0 @@ -// Module included in the following assemblies: -// -// - -// This module can be included from assemblies using the following include statement: -// include::/proc_operating-the-dashboard.adoc[leveloffset=+1] - -// The file name and the ID are based on the module title. For example: -// * file name: proc_doing-procedure-a.adoc -// * ID: [id='proc_doing-procedure-a_{context}'] -// * Title: = Doing procedure A -// -// The ID is used as an anchor for linking to the module. Avoid changing -// it after the module has been published to ensure existing links are not -// broken. -// -// The `context` attribute enables module reuse. Every module's ID includes -// {context}, which ensures that the module has a unique ID even if it is -// reused multiple times in a guide. -// -// Start the title with a verb, such as Creating or Create. See also -// _Wording of headings_ in _The IBM Style Guide_. -[id="the-grafana-infrastructure-dashboard_{context}"] -= The Grafana infrastructure dashboard - -[role="_abstract"] -The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard. - -== Top panels - -|=== - -|**Title** | **Unit** | **Description** -| Current Global Alerts | - | Current alerts fired by Prometheus -| Recent Global Alerts | - | Recently fired alerts in 5m time steps -| Status Panel | - | Node status: up, down, unavailable -| Uptime | s/m/h/d/M/Y | Total operational time of node -| CPU Cores | cores | Total number of cores -| Memory | bytes | Total memory -| Disk Size | bytes | Total storage size -| Processes | processes | Total number of processes listed by type -| Load Average | processes | Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue. -|=== - - - -== Networking panels -Panels that display the network interfaces of the node. - -|=== -|**Panel** | **Unit** | **Description** -| Physical Interfaces Ingress Errors | errors | Total errors with incoming data - -| Physical Interfaces Egress Errors | -errors | -Total errors with outgoing data - -| Physical Interfaces Ingress Error Rates | -errors/s | -Rate of incoming data errors - -| Physical Interfaces egress Error Rates | -errors/s | -Rate of outgoing data errors - -| Physical Interfaces Packets Ingress -pps -Incoming packets per second - -| Physical Interfaces Packets Egress | -pps | -Outgoing packets per second - -| Physical Interfaces Data Ingress | -bytes/s | -Incoming data rates - -| Physical Interfaces Data Egress | -bytes/s | -Outgoing data rates - -| Physical Interfaces Drop Rate Ingress | -pps | -Incoming packets drop rate - -| Physical Interfaces Drop Rate Egress | -pps | -Outgoing packets drop rate -|=== - -== CPU panels -Panels that display CPU usage of the node. -|=== -|**Panel** | **Unit** | **Description** - -| Current CPU Usage | -percent | -Instantaneous usage at the time of the last query. - - -| Aggregate CPU Usage | -percent | -Average non-idle CPU activity of all cores on a node. - - -| Aggr. CPU Usage by Type | -percent | -Shows time spent for each type of thread averaged across all cores. -|=== - - -== Memory panels -Panels that display memory usage on the node. - -|=== -|**Panel** | **Unit** | **Description** - - -| Memory Used | -percent | -Amount of memory being used at time of last query. - - -| Huge Pages Used | -hugepages | -Number of hugepages being used. - - -Memory | -bytes | -Memory marked as “used” by the OS. -|=== - - -== Disk/file system -Panels that display space used on disk. - -|=== -|**Panel** | **Unit** | **Description** | **Notes** - -| Disk Space Usage | -percent | -Total disk use at time of last query. | - - -| Inode Usage | -percent | -Total inode use at time of last query. | - - -| Aggregate Disk Space Usage | -bytes | -Total disk space used and reserved. | -Because this query relies on the `df` plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive. - -| Disk Traffic | -bytes/s | -Shows rates for both reading and writing. | - - -| Disk Load | -percent | -Approximate percentage of total disk bandwidth being used. -The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd https://collectd.org/wiki/index.php/Plugin:Disk[disk plugin docs]. | - - -| Operations/s | -ops/s | -Operations done per second | - - -| Average I/O Operation Time | -seconds | -Average time each I/O operation took to complete. This average is not accurate, see the collectd https://collectd.org/wiki/index.php/Plugin:Disk[disk plugin docs]. | -|===