From 0433bd5d032a51a4cf29b18119ccd49464f3dcc5 Mon Sep 17 00:00:00 2001 From: Benjamin Jee Date: Mon, 3 Mar 2025 14:52:11 -0800 Subject: [PATCH 1/5] Update documentation on prometheus metrics --- content/ngf/how-to/monitoring/prometheus.md | 49 +- ...hboard.json => ngf-grafana-dashboard.json} | 527 ++++++++---------- 2 files changed, 259 insertions(+), 317 deletions(-) rename static/ngf/{grafana-dashboard.json => ngf-grafana-dashboard.json} (66%) diff --git a/content/ngf/how-to/monitoring/prometheus.md b/content/ngf/how-to/monitoring/prometheus.md index c50d8a8d8..8ba8f2b0a 100644 --- a/content/ngf/how-to/monitoring/prometheus.md +++ b/content/ngf/how-to/monitoring/prometheus.md @@ -83,9 +83,46 @@ NGINX Gateway Fabric provides a variety of metrics for monitoring and analyzing ### NGINX/NGINX Plus metrics -NGINX metrics cover specific NGINX operations such as the total number of accepted client connections. For a complete list of available NGINX/NGINX Plus metrics, refer to the [NGINX Prometheus Exporter developer docs](https://github.com/nginx/nginx-prometheus-exporter#exported-metrics). - -These metrics use the `nginx_gateway_fabric` namespace and include the `class` label, indicating the NGINX Gateway class. For example, `nginx_gateway_fabric_connections_accepted{class="nginx"}`. +NGINX metrics cover specific NGINX operations such as the total number of accepted client connections. These metrics are +collected through NGINX Agent and are reported by each NGINX Pod. + +NGINX Gateway Fabric currently supports a subset of all metrics available through NGINX OSS and Plus. Listed below are +the supported metrics along with a small accompanying description. + +Metrics given in NGINX OSS include: +- `nginx_http_connections`: NGINX-wide statistics describing HTTP connections. +- `nginx_http_requests`: The total number of client requests received from clients. + +Metrics given in NGINX Plus include those in NGINX OSS in addition to: +- `nginx_config_reloads`: The total number of NGINX config reloads. +- `nginx_http_response_status_responses_total`: The number of responses, grouped by status code range. +- `nginx_http_upstream_keepalive_count_connections`: The current number of idle keepalive connections per HTTP upstream. +- `nginx_http_request_discarded_requests_total`: The total number of requests completed without sending a response. +- `nginx_http_request_processing_count_requests`: The number of client requests that are currently being processed. +- `nginx_http_request_byte_io_bytes_total`: The total number of HTTP byte IO. +- `nginx_http_upstream_peer_byte_io_bytes_total`: The total number of byte IO per HTTP upstream peer. +- `nginx_http_upstream_peer_count_peers`: The current count of peers on the HTTP upstream grouped by state. +- `nginx_http_upstream_peer_fails_attempts`: The total number of unsuccessful attempts to communicate with the HTTP upstream peer. +- `nginx_http_upstream_peer_header_time_milliseconds`: The average time to get the response header from the HTTP upstream peer. +- `nginx_http_upstream_peer_health_checks_requests_total`: The total number of health check requests made to a HTTP upstream peer. +- `nginx_http_upstream_peer_requests_total`: The total number of client requests forwarded to the HTTP upstream peer. +- `nginx_http_upstream_peer_response_time_milliseconds`: The average time to get the full response from the HTTP upstream peer. +- `nginx_http_upstream_peer_responses_total`: The total number of responses obtained from the HTTP upstream peer grouped by status range. +- `nginx_http_upstream_peer_state_is_deployed`: Current state of an upstream peer in deployment. +- `nginx_http_upstream_peer_unavailables_requests_total`: Number of times the server became unavailable for client requests (“unavail”). +- `nginx_http_upstream_queue_limit_requests`: The maximum number of requests that can be in the queue at the same time. +- `nginx_http_upstream_queue_overflows_responses_total`: The total number of requests rejected due to the queue overflow. +- `nginx_http_upstream_queue_usage_requests`: The current number of requests in the queue. +- `nginx_http_upstream_zombie_count_is_deployed`: The current number of upstream peers removed from the group but still processing active client requests. +- `nginx_slab_page_free_pages`: The current number of free memory pages. +- `nginx_slab_page_usage_pages`: The current number of used memory pages. +- `nginx_slab_slot_allocations_total`: The number of attempts to allocate memory of specified size. +- `nginx_slab_slot_free_slots`: The current number of free memory slots. +- `nginx_slab_slot_usage_slots`: The current number of used memory slots. +- `nginx_ssl_certificate_verify_failures_certificates_total`: The total number of SSL certificate verification failures. +- `nginx_ssl_handshakes_total`: The total number of SSL handshakes. + +These metrics are available under the namespace where your NGINX Pods are deployed. --- @@ -93,13 +130,9 @@ These metrics use the `nginx_gateway_fabric` namespace and include the `class` l Metrics specific to NGINX Gateway Fabric include: -- `nginx_reloads_total`: Counts successful NGINX reloads. -- `nginx_reload_errors_total`: Counts NGINX reload failures. -- `nginx_stale_config`: Indicates if NGINX Gateway Fabric couldn't update NGINX with the latest configuration, resulting in a stale version. -- `nginx_reloads_milliseconds`: Time in milliseconds for NGINX reloads. - `event_batch_processing_milliseconds`: Time in milliseconds to process batches of Kubernetes events. -All these metrics are under the `nginx_gateway_fabric` namespace and include a `class` label set to the Gateway class of NGINX Gateway Fabric. For example, `nginx_gateway_fabric_nginx_reloads_total{class="nginx"}`. +All these metrics are under the `nginx-gatewy` namespace and include a `class` label set to the GatewayClass of NGINX Gateway Fabric. For example, `nginx_gateway_fabric_event_batch_processing_milliseconds_sum{class="nginx"}`. --- diff --git a/static/ngf/grafana-dashboard.json b/static/ngf/ngf-grafana-dashboard.json similarity index 66% rename from static/ngf/grafana-dashboard.json rename to static/ngf/ngf-grafana-dashboard.json index 0c3c40392..40758cdbc 100644 --- a/static/ngf/grafana-dashboard.json +++ b/static/ngf/ngf-grafana-dashboard.json @@ -19,9 +19,8 @@ "editable": true, "fiscalYearStartMonth": 0, "graphTooltip": 0, - "id": 1, + "id": 2, "links": [], - "liveNow": false, "panels": [ { "collapsed": false, @@ -31,7 +30,7 @@ "x": 0, "y": 0 }, - "id": 5, + "id": 13, "panels": [], "title": "Status", "type": "row" @@ -44,69 +43,104 @@ "fieldConfig": { "defaults": { "color": { - "mode": "thresholds" + "mode": "palette-classic" }, - "mappings": [ - { - "options": { - "0": { - "index": 0, - "text": "Down" - }, - "1": { - "index": 1, - "text": "Up" - } - }, - "type": "value" + "custom": { + "axisBorderShow": false, + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "barWidthFactor": 0.6, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "insertNulls": false, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" } - ], + }, + "mappings": [], "thresholds": { "mode": "absolute", "steps": [ { - "color": "semi-dark-red", + "color": "green", "value": null }, { - "color": "#EAB839", - "value": 1 - }, - { - "color": "semi-dark-green", - "value": 1 + "color": "red", + "value": 80 } ] - }, - "unit": "none", - "unitScale": true + } }, - "overrides": [] + "overrides": [ + { + "__systemRef": "hideSeriesFrom", + "matcher": { + "id": "byNames", + "options": { + "mode": "exclude", + "names": [ + "up{app_kubernetes_io_instance=\"my-release\", app_kubernetes_io_name=\"nginx-gateway-fabric\", instance=\"10.244.0.6:9113\", job=\"kubernetes-pods\", namespace=\"nginx-gateway\", node=\"kind-control-plane\", pod=\"my-release-nginx-gateway-fabric-bb7bcd756-clzbt\", pod_template_hash=\"bb7bcd756\"}" + ], + "prefix": "All except:", + "readOnly": true + } + }, + "properties": [ + { + "id": "custom.hideFrom", + "value": { + "legend": false, + "tooltip": false, + "viz": true + } + } + ] + } + ] }, "gridPos": { - "h": 4, - "w": 6, + "h": 8, + "w": 12, "x": 0, "y": 1 }, - "id": 3, + "id": 12, "options": { - "colorMode": "background", - "graphMode": "none", - "justifyMode": "auto", - "orientation": "horizontal", - "reduceOptions": { - "calcs": [ - "lastNotNull" - ], - "fields": "", - "values": false + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true }, - "showPercentChange": false, - "textMode": "auto", - "wideLayout": true + "tooltip": { + "hideZeros": false, + "mode": "single", + "sort": "none" + } }, - "pluginVersion": "10.3.3", + "pluginVersion": "11.5.2", "targets": [ { "datasource": { @@ -115,38 +149,23 @@ }, "disableTextWrap": false, "editorMode": "builder", - "expr": "nginx_gateway_fabric_up{instance=~\"$instance\"}", + "expr": "up{instance=~\"$ngf_instance\"}", "fullMetaSearch": false, "includeNullMetadata": true, - "instant": false, "legendFormat": "", "range": true, "refId": "A", "useBackend": false } ], - "title": "NGINX Status for $instance", - "type": "stat" - }, - { - "collapsed": false, - "gridPos": { - "h": 1, - "w": 24, - "x": 0, - "y": 5 - }, - "id": 6, - "panels": [], - "title": "Metrics", - "type": "row" + "title": "NGF Status", + "type": "timeseries" }, { "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "", "fieldConfig": { "defaults": { "color": { @@ -156,11 +175,12 @@ "axisBorderShow": false, "axisCenteredZero": false, "axisColorMode": "text", - "axisLabel": "Connections (rate)", + "axisLabel": "", "axisPlacement": "auto", "barAlignment": 0, + "barWidthFactor": 0.6, "drawStyle": "line", - "fillOpacity": 10, + "fillOpacity": 0, "gradientMode": "none", "hideFrom": { "legend": false, @@ -170,7 +190,7 @@ "insertNulls": false, "lineInterpolation": "linear", "lineWidth": 1, - "pointSize": 1, + "pointSize": 5, "scaleDistribution": { "type": "linear" }, @@ -191,21 +211,23 @@ { "color": "green", "value": null + }, + { + "color": "red", + "value": 80 } ] - }, - "unit": "reqps", - "unitScale": true + } }, "overrides": [] }, "gridPos": { - "h": 10, + "h": 8, "w": 12, - "x": 0, - "y": 6 + "x": 12, + "y": 1 }, - "id": 1, + "id": 14, "options": { "legend": { "calcs": [], @@ -214,10 +236,12 @@ "showLegend": true }, "tooltip": { + "hideZeros": false, "mode": "single", "sort": "none" } }, + "pluginVersion": "11.5.2", "targets": [ { "datasource": { @@ -225,34 +249,33 @@ "uid": "${DS_PROMETHEUS}" }, "disableTextWrap": false, - "editorMode": "code", - "expr": "irate(nginx_gateway_fabric_connections_accepted{instance=~\"$instance\"}[1m])", + "editorMode": "builder", + "expr": "up{instance=~\"$nginx_instance\"}", + "format": "time_series", "fullMetaSearch": false, - "includeNullMetadata": false, - "instant": false, - "interval": "", - "legendFormat": "{{instance}} accepted", + "includeNullMetadata": true, + "legendFormat": "", "range": true, "refId": "A", "useBackend": false - }, - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "editorMode": "code", - "expr": "irate(nginx_gateway_fabric_connections_handled{instance=~\"$instance\"}[1m])", - "hide": false, - "instant": false, - "legendFormat": "{{instance}} handled", - "range": true, - "refId": "B" } ], - "title": "Processed Connections", + "title": "NGINX Status for All", "type": "timeseries" }, + { + "collapsed": false, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 9 + }, + "id": 6, + "panels": [], + "title": "Metrics", + "type": "row" + }, { "datasource": { "type": "prometheus", @@ -271,6 +294,7 @@ "axisLabel": "Connections", "axisPlacement": "auto", "barAlignment": 0, + "barWidthFactor": 0.6, "drawStyle": "line", "fillOpacity": 10, "gradientMode": "none", @@ -306,16 +330,15 @@ } ] }, - "unit": "short", - "unitScale": true + "unit": "short" }, "overrides": [] }, "gridPos": { "h": 10, "w": 12, - "x": 12, - "y": 6 + "x": 0, + "y": 10 }, "id": 4, "options": { @@ -326,35 +349,46 @@ "showLegend": true }, "tooltip": { + "hideZeros": false, "mode": "single", "sort": "none" } }, + "pluginVersion": "11.5.2", "targets": [ { "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "editorMode": "code", - "expr": "nginx_gateway_fabric_connections_active{instance=~\"$instance\"}", + "disableTextWrap": false, + "editorMode": "builder", + "exemplar": false, + "expr": "nginx_http_connections_count{instance=~\"$nginx_instance\", nginx_connections_outcome=\"ACTIVE\"}", + "fullMetaSearch": false, + "includeNullMetadata": true, "instant": false, "legendFormat": "{{instance}} active", "range": true, - "refId": "A" + "refId": "A", + "useBackend": false }, { "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, + "disableTextWrap": false, "editorMode": "code", - "expr": "nginx_gateway_fabric_connections_reading{instance=~\"$instance\"}", + "expr": "nginx_http_connections_count{instance=~\"$nginx_instance\", nginx_connections_outcome=\"READING\"}", + "fullMetaSearch": false, "hide": false, + "includeNullMetadata": true, "instant": false, "legendFormat": "{{instance}} reading", "range": true, - "refId": "B" + "refId": "B", + "useBackend": false }, { "datasource": { @@ -362,7 +396,7 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "nginx_gateway_fabric_connections_waiting{instance=~\"$instance\"}", + "expr": "nginx_http_connections_count{instance=~\"$nginx_instance\", nginx_connections_outcome=\"WAITING\"}", "hide": false, "instant": false, "legendFormat": "{{instance}} waiting", @@ -375,7 +409,7 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "nginx_gateway_fabric_connections_writing{instance=~\"$instance\"}", + "expr": "nginx_http_connections_count{instance=~\"$nginx_instance\", nginx_connections_outcome=\"WRITING\"}", "hide": false, "instant": false, "legendFormat": "{{instance}} writing", @@ -383,7 +417,7 @@ "refId": "D" } ], - "title": "Active Connections", + "title": "NGINX Active Connections", "type": "timeseries" }, { @@ -403,8 +437,9 @@ "axisLabel": "", "axisPlacement": "auto", "barAlignment": 0, + "barWidthFactor": 0.6, "drawStyle": "line", - "fillOpacity": 10, + "fillOpacity": 0, "gradientMode": "none", "hideFrom": { "legend": false, @@ -414,7 +449,7 @@ "insertNulls": false, "lineInterpolation": "linear", "lineWidth": 1, - "pointSize": 1, + "pointSize": 5, "scaleDistribution": { "type": "linear" }, @@ -435,21 +470,23 @@ { "color": "green", "value": null + }, + { + "color": "red", + "value": 80 } ] - }, - "unit": "reqps", - "unitScale": true + } }, "overrides": [] }, "gridPos": { - "h": 8, - "w": 24, - "x": 0, - "y": 16 + "h": 10, + "w": 12, + "x": 12, + "y": 10 }, - "id": 2, + "id": 11, "options": { "legend": { "calcs": [], @@ -458,10 +495,12 @@ "showLegend": true }, "tooltip": { + "hideZeros": false, "mode": "single", "sort": "none" } }, + "pluginVersion": "11.5.2", "targets": [ { "datasource": { @@ -470,17 +509,29 @@ }, "disableTextWrap": false, "editorMode": "code", - "expr": "irate(nginx_gateway_fabric_http_requests_total{instance=~\"$instance\"}[1m])", + "expr": "irate(nginx_http_connections_total{instance=~\"$nginx_instance\", nginx_connections_outcome=\"ACCEPTED\"}[1m])", "fullMetaSearch": false, - "includeNullMetadata": false, - "instant": false, - "legendFormat": "{{instance}} total requests", + "includeNullMetadata": true, + "legendFormat": "{{instance}} accepted", "range": true, "refId": "A", "useBackend": false + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "irate(nginx_http_connections_total{instance=~\"$nginx_instance\", nginx_connections_outcome=\"HANDLED\"}[1m])", + "hide": false, + "instant": false, + "legendFormat": "{{instance}} handled", + "range": true, + "refId": "B" } ], - "title": "Total Requests", + "title": "NGINX Processed Connections", "type": "timeseries" }, { @@ -500,6 +551,7 @@ "axisLabel": "", "axisPlacement": "auto", "barAlignment": 0, + "barWidthFactor": 0.6, "drawStyle": "line", "fillOpacity": 10, "gradientMode": "none", @@ -535,17 +587,17 @@ } ] }, - "unitScale": true + "unit": "reqps" }, "overrides": [] }, "gridPos": { "h": 8, - "w": 12, + "w": 24, "x": 0, - "y": 24 + "y": 20 }, - "id": 8, + "id": 2, "options": { "legend": { "calcs": [], @@ -554,11 +606,12 @@ "showLegend": true }, "tooltip": { + "hideZeros": false, "mode": "single", "sort": "none" } }, - "pluginVersion": "10.3.3", + "pluginVersion": "11.5.2", "targets": [ { "datasource": { @@ -566,185 +619,24 @@ "uid": "${DS_PROMETHEUS}" }, "disableTextWrap": false, - "editorMode": "code", - "expr": "irate(nginx_gateway_fabric_nginx_reloads_total{instance=~\"$instance\"}[1m])", + "editorMode": "builder", + "expr": "irate(nginx_http_requests_total{instance=~\"$nginx_instance\"}[1m])", "fullMetaSearch": false, "includeNullMetadata": false, "instant": false, - "legendFormat": "{{instance}}", + "legendFormat": "{{instance}} total requests", "range": true, "refId": "A", "useBackend": false } ], - "title": "Total NGINX Reloads Rate", + "title": "Total Requests", "type": "timeseries" - }, - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "fieldConfig": { - "defaults": { - "color": { - "mode": "thresholds" - }, - "mappings": [], - "thresholds": { - "mode": "absolute", - "steps": [ - { - "color": "green", - "value": null - }, - { - "color": "red", - "value": 1 - } - ] - }, - "unitScale": true - }, - "overrides": [] - }, - "gridPos": { - "h": 8, - "w": 6, - "x": 12, - "y": 24 - }, - "id": 9, - "options": { - "colorMode": "value", - "graphMode": "area", - "justifyMode": "auto", - "orientation": "auto", - "reduceOptions": { - "calcs": [ - "lastNotNull" - ], - "fields": "", - "values": false - }, - "showPercentChange": false, - "textMode": "auto", - "wideLayout": true - }, - "pluginVersion": "10.3.3", - "targets": [ - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "disableTextWrap": false, - "editorMode": "builder", - "expr": "nginx_gateway_fabric_nginx_reload_errors_total{instance=~\"$instance\"}", - "fullMetaSearch": false, - "includeNullMetadata": true, - "instant": false, - "legendFormat": "{{instance}}", - "range": true, - "refId": "A", - "useBackend": false - } - ], - "title": "Total NGINX Reload Errors", - "type": "stat" - }, - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "fieldConfig": { - "defaults": { - "color": { - "mode": "thresholds" - }, - "mappings": [ - { - "options": { - "0": { - "color": "semi-dark-green", - "index": 0, - "text": "Up to date" - }, - "1": { - "color": "semi-dark-red", - "index": 1, - "text": "Stale" - } - }, - "type": "value" - } - ], - "thresholds": { - "mode": "absolute", - "steps": [ - { - "color": "green", - "value": null - }, - { - "color": "semi-dark-red", - "value": 1 - } - ] - }, - "unitScale": true - }, - "overrides": [] - }, - "gridPos": { - "h": 8, - "w": 6, - "x": 18, - "y": 24 - }, - "id": 10, - "options": { - "colorMode": "value", - "graphMode": "area", - "justifyMode": "auto", - "orientation": "auto", - "reduceOptions": { - "calcs": [ - "lastNotNull" - ], - "fields": "", - "values": false - }, - "showPercentChange": false, - "textMode": "auto", - "wideLayout": true - }, - "pluginVersion": "10.3.3", - "targets": [ - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "disableTextWrap": false, - "editorMode": "builder", - "expr": "nginx_gateway_fabric_nginx_stale_config{instance=~\"$instance\"}", - "fullMetaSearch": false, - "includeNullMetadata": true, - "instant": false, - "legendFormat": "__auto", - "range": true, - "refId": "A", - "useBackend": false - } - ], - "title": "NGINX Config State", - "type": "stat" } ], + "preload": false, "refresh": "5s", - "schemaVersion": 39, + "schemaVersion": 40, "tags": [ "nginx-gateway-fabric" ], @@ -752,26 +644,45 @@ "list": [ { "current": { - "selected": false, - "text": "default", - "value": "default" + "text": "prometheus", + "value": "beerh65rwdji8d" }, - "hide": 0, "includeAll": false, "label": "datasource", - "multi": false, "name": "DS_PROMETHEUS", "options": [], "query": "prometheus", - "queryValue": "", "refresh": 1, "regex": "", - "skipUrlSync": false, "type": "datasource" }, { "current": { - "selected": true, + "text": "All", + "value": [ + "$__all" + ] + }, + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "definition": "label_values(nginx_http_connections_total,instance)", + "includeAll": true, + "multi": true, + "name": "nginx_instance", + "options": [], + "query": { + "qryType": 1, + "query": "label_values(nginx_http_connections_total,instance)", + "refId": "PrometheusVariableQueryEditor-VariableQuery" + }, + "refresh": 1, + "regex": "", + "type": "query" + }, + { + "current": { "text": [ "All" ], @@ -783,21 +694,19 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "definition": "label_values(nginx_gateway_fabric_up,instance)", - "hide": 0, + "definition": "label_values(nginx_gateway_fabric_event_batch_processing_milliseconds_sum,instance)", "includeAll": true, + "label": "ngf_instance", "multi": true, - "name": "instance", + "name": "ngf_instance", "options": [], "query": { "qryType": 1, - "query": "label_values(nginx_gateway_fabric_up,instance)", + "query": "label_values(nginx_gateway_fabric_event_batch_processing_milliseconds_sum,instance)", "refId": "PrometheusVariableQueryEditor-VariableQuery" }, "refresh": 1, "regex": "", - "skipUrlSync": false, - "sort": 0, "type": "query" } ] @@ -810,6 +719,6 @@ "timezone": "", "title": "NGINX Gateway Fabric", "uid": "cdb1c6f6-7c77-4cee-a177-593f41364dbe", - "version": 1, + "version": 4, "weekStart": "" -} +} \ No newline at end of file From 7ad87fc403897107014d569ce4668337b236d661 Mon Sep 17 00:00:00 2001 From: Benjamin Jee Date: Tue, 4 Mar 2025 11:13:33 -0800 Subject: [PATCH 2/5] Update dashboard and fix small feedback --- content/ngf/how-to/monitoring/prometheus.md | 4 +-- static/ngf/ngf-grafana-dashboard.json | 35 +++------------------ 2 files changed, 6 insertions(+), 33 deletions(-) diff --git a/content/ngf/how-to/monitoring/prometheus.md b/content/ngf/how-to/monitoring/prometheus.md index 8ba8f2b0a..b1fee25e3 100644 --- a/content/ngf/how-to/monitoring/prometheus.md +++ b/content/ngf/how-to/monitoring/prometheus.md @@ -96,10 +96,10 @@ Metrics given in NGINX OSS include: Metrics given in NGINX Plus include those in NGINX OSS in addition to: - `nginx_config_reloads`: The total number of NGINX config reloads. - `nginx_http_response_status_responses_total`: The number of responses, grouped by status code range. -- `nginx_http_upstream_keepalive_count_connections`: The current number of idle keepalive connections per HTTP upstream. - `nginx_http_request_discarded_requests_total`: The total number of requests completed without sending a response. - `nginx_http_request_processing_count_requests`: The number of client requests that are currently being processed. - `nginx_http_request_byte_io_bytes_total`: The total number of HTTP byte IO. +- `nginx_http_upstream_keepalive_count_connections`: The current number of idle keepalive connections per HTTP upstream. - `nginx_http_upstream_peer_byte_io_bytes_total`: The total number of byte IO per HTTP upstream peer. - `nginx_http_upstream_peer_count_peers`: The current count of peers on the HTTP upstream grouped by state. - `nginx_http_upstream_peer_fails_attempts`: The total number of unsuccessful attempts to communicate with the HTTP upstream peer. @@ -132,7 +132,7 @@ Metrics specific to NGINX Gateway Fabric include: - `event_batch_processing_milliseconds`: Time in milliseconds to process batches of Kubernetes events. -All these metrics are under the `nginx-gatewy` namespace and include a `class` label set to the GatewayClass of NGINX Gateway Fabric. For example, `nginx_gateway_fabric_event_batch_processing_milliseconds_sum{class="nginx"}`. +All these metrics are under the `nginx-gateway` namespace and include a `class` label set to the GatewayClass of NGINX Gateway Fabric. For example, `nginx_gateway_fabric_event_batch_processing_milliseconds_sum{class="nginx"}`. --- diff --git a/static/ngf/ngf-grafana-dashboard.json b/static/ngf/ngf-grafana-dashboard.json index 40758cdbc..05cfcc94b 100644 --- a/static/ngf/ngf-grafana-dashboard.json +++ b/static/ngf/ngf-grafana-dashboard.json @@ -19,7 +19,7 @@ "editable": true, "fiscalYearStartMonth": 0, "graphTooltip": 0, - "id": 2, + "id": 1, "links": [], "panels": [ { @@ -93,32 +93,7 @@ ] } }, - "overrides": [ - { - "__systemRef": "hideSeriesFrom", - "matcher": { - "id": "byNames", - "options": { - "mode": "exclude", - "names": [ - "up{app_kubernetes_io_instance=\"my-release\", app_kubernetes_io_name=\"nginx-gateway-fabric\", instance=\"10.244.0.6:9113\", job=\"kubernetes-pods\", namespace=\"nginx-gateway\", node=\"kind-control-plane\", pod=\"my-release-nginx-gateway-fabric-bb7bcd756-clzbt\", pod_template_hash=\"bb7bcd756\"}" - ], - "prefix": "All except:", - "readOnly": true - } - }, - "properties": [ - { - "id": "custom.hideFrom", - "value": { - "legend": false, - "tooltip": false, - "viz": true - } - } - ] - } - ] + "overrides": [] }, "gridPos": { "h": 8, @@ -645,7 +620,7 @@ { "current": { "text": "prometheus", - "value": "beerh65rwdji8d" + "value": "aeeumt3huyhogd" }, "includeAll": false, "label": "datasource", @@ -683,9 +658,7 @@ }, { "current": { - "text": [ - "All" - ], + "text": "All", "value": [ "$__all" ] From e81627530a2ee009294c9046542c1c8ec48359b3 Mon Sep 17 00:00:00 2001 From: Benjamin Jee Date: Tue, 4 Mar 2025 11:31:31 -0800 Subject: [PATCH 3/5] Another update to dashboard --- static/ngf/ngf-grafana-dashboard.json | 34 +++++++++++++++++---------- 1 file changed, 22 insertions(+), 12 deletions(-) diff --git a/static/ngf/ngf-grafana-dashboard.json b/static/ngf/ngf-grafana-dashboard.json index 05cfcc94b..9af953ce1 100644 --- a/static/ngf/ngf-grafana-dashboard.json +++ b/static/ngf/ngf-grafana-dashboard.json @@ -339,7 +339,7 @@ "disableTextWrap": false, "editorMode": "builder", "exemplar": false, - "expr": "nginx_http_connections_count{instance=~\"$nginx_instance\", nginx_connections_outcome=\"ACTIVE\"}", + "expr": "irate(nginx_http_connections_count{instance=~\"$nginx_instance\", nginx_connections_outcome=\"ACTIVE\"}[1m])", "fullMetaSearch": false, "includeNullMetadata": true, "instant": false, @@ -354,8 +354,8 @@ "uid": "${DS_PROMETHEUS}" }, "disableTextWrap": false, - "editorMode": "code", - "expr": "nginx_http_connections_count{instance=~\"$nginx_instance\", nginx_connections_outcome=\"READING\"}", + "editorMode": "builder", + "expr": "irate(nginx_http_connections_count{instance=~\"$nginx_instance\", nginx_connections_outcome=\"READING\"}[1m])", "fullMetaSearch": false, "hide": false, "includeNullMetadata": true, @@ -370,26 +370,34 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "editorMode": "code", - "expr": "nginx_http_connections_count{instance=~\"$nginx_instance\", nginx_connections_outcome=\"WAITING\"}", + "disableTextWrap": false, + "editorMode": "builder", + "expr": "irate(nginx_http_connections_count{instance=~\"$nginx_instance\", nginx_connections_outcome=\"WAITING\"}[1m])", + "fullMetaSearch": false, "hide": false, + "includeNullMetadata": true, "instant": false, "legendFormat": "{{instance}} waiting", "range": true, - "refId": "C" + "refId": "C", + "useBackend": false }, { "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "editorMode": "code", - "expr": "nginx_http_connections_count{instance=~\"$nginx_instance\", nginx_connections_outcome=\"WRITING\"}", + "disableTextWrap": false, + "editorMode": "builder", + "expr": "irate(nginx_http_connections_count{instance=~\"$nginx_instance\", nginx_connections_outcome=\"WRITING\"}[1m])", + "fullMetaSearch": false, "hide": false, + "includeNullMetadata": true, "instant": false, "legendFormat": "{{instance}} writing", "range": true, - "refId": "D" + "refId": "D", + "useBackend": false } ], "title": "NGINX Active Connections", @@ -658,7 +666,9 @@ }, { "current": { - "text": "All", + "text": [ + "All" + ], "value": [ "$__all" ] @@ -685,13 +695,13 @@ ] }, "time": { - "from": "now-15m", + "from": "now-5m", "to": "now" }, "timepicker": {}, "timezone": "", "title": "NGINX Gateway Fabric", "uid": "cdb1c6f6-7c77-4cee-a177-593f41364dbe", - "version": 4, + "version": 5, "weekStart": "" } \ No newline at end of file From 5ddb9048e86767ffffc17ac65680b1927dc11094 Mon Sep 17 00:00:00 2001 From: Benjamin Jee Date: Tue, 4 Mar 2025 14:10:40 -0800 Subject: [PATCH 4/5] Correct documentation on namespace of metrics --- content/ngf/how-to/monitoring/prometheus.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/content/ngf/how-to/monitoring/prometheus.md b/content/ngf/how-to/monitoring/prometheus.md index b1fee25e3..d530c6aae 100644 --- a/content/ngf/how-to/monitoring/prometheus.md +++ b/content/ngf/how-to/monitoring/prometheus.md @@ -122,8 +122,6 @@ Metrics given in NGINX Plus include those in NGINX OSS in addition to: - `nginx_ssl_certificate_verify_failures_certificates_total`: The total number of SSL certificate verification failures. - `nginx_ssl_handshakes_total`: The total number of SSL handshakes. -These metrics are available under the namespace where your NGINX Pods are deployed. - --- ### NGINX Gateway Fabric metrics @@ -132,7 +130,7 @@ Metrics specific to NGINX Gateway Fabric include: - `event_batch_processing_milliseconds`: Time in milliseconds to process batches of Kubernetes events. -All these metrics are under the `nginx-gateway` namespace and include a `class` label set to the GatewayClass of NGINX Gateway Fabric. For example, `nginx_gateway_fabric_event_batch_processing_milliseconds_sum{class="nginx"}`. +All these metrics are under the `nginx_gateway_fabric` namespace and include a `class` label set to the GatewayClass of NGINX Gateway Fabric. For example, `nginx_gateway_fabric_event_batch_processing_milliseconds_sum{class="nginx"}`. --- From e42310abe4397103fda3bdbdbedbe295d8c79a8b Mon Sep 17 00:00:00 2001 From: Benjamin Jee Date: Wed, 5 Mar 2025 10:37:32 -0800 Subject: [PATCH 5/5] Add feedback --- content/ngf/how-to/monitoring/prometheus.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/content/ngf/how-to/monitoring/prometheus.md b/content/ngf/how-to/monitoring/prometheus.md index d530c6aae..6ac0b5146 100644 --- a/content/ngf/how-to/monitoring/prometheus.md +++ b/content/ngf/how-to/monitoring/prometheus.md @@ -83,17 +83,17 @@ NGINX Gateway Fabric provides a variety of metrics for monitoring and analyzing ### NGINX/NGINX Plus metrics -NGINX metrics cover specific NGINX operations such as the total number of accepted client connections. These metrics are +NGINX metrics include NGINX-specific data such as the total number of accepted client connections. These metrics are collected through NGINX Agent and are reported by each NGINX Pod. NGINX Gateway Fabric currently supports a subset of all metrics available through NGINX OSS and Plus. Listed below are the supported metrics along with a small accompanying description. -Metrics given in NGINX OSS include: +Metrics provided by NGINX Open Source include: - `nginx_http_connections`: NGINX-wide statistics describing HTTP connections. - `nginx_http_requests`: The total number of client requests received from clients. -Metrics given in NGINX Plus include those in NGINX OSS in addition to: +In addition to the previous metrics provided by NGINX Open Source, NGINX Plus includes: - `nginx_config_reloads`: The total number of NGINX config reloads. - `nginx_http_response_status_responses_total`: The number of responses, grouped by status code range. - `nginx_http_request_discarded_requests_total`: The total number of requests completed without sending a response.