feat: Prometheus dashboards - Phase 1 (12 dashboards + 3 new collectors) #58
Merged
Add three new collectors and register them in sql_exporter.yml under two
new jobs (mssql_msdb, mssql_xevent):
- mssql_sqlagent_jobs: per-job enabled / last_run_outcome /
  last_run_duration_seconds / last_run_end_time_utc / next_run_time_utc /
  is_running / step_failures_last_24h, sourced from
  msdb.dbo.sysjobs/sysjobhistory/sysjobactivity/sysjobschedules.
- mssql_backup_history: per-(database, backup_type) last_time_utc /
  last_duration_seconds / last_size_bytes / last_compressed_size_bytes /
  age_seconds / count_last_24h, sourced from msdb.dbo.backupset.
- mssql_xevent: aggregated events_count / cpu_time_ms_sum /
  duration_seconds_sum / logical_reads_sum / physical_reads_sum /
  writes_sum per (event_name, database_name, result, client_app_name)
  over the most recent 5 minutes. Capped at TOP 500 to bound cardinality.
  Guarded with an existence check on DBA.dbo.xevent_metrics, so the
  collector is a no-op on instances that don't run the XEvent collector
  proc.
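The mssql_xevent aggregation described above can be sketched in Python to make the grouping and the cardinality cap concrete. This is only an illustration, not the collector's SQL: the `event_time` field, the per-row field names, and the choice to rank groups by event count before applying the cap are all assumptions, since the commit message does not say how the TOP 500 is ordered.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Grouping keys and summed measures named in the commit message.
KEY_FIELDS = ("event_name", "database_name", "result", "client_app_name")
SUM_FIELDS = ("cpu_time_ms", "duration_seconds", "logical_reads",
              "physical_reads", "writes")


def aggregate_xevents(rows, now, window=timedelta(minutes=5), cap=500):
    """Aggregate rows newer than `now - window` per key; keep at most `cap` groups.

    Assumption: each row is a dict with a hypothetical `event_time` field,
    and groups are ranked by events_count before the cap is applied.
    """
    cutoff = now - window
    groups = defaultdict(
        lambda: {"events_count": 0, **{f + "_sum": 0 for f in SUM_FIELDS}}
    )
    for row in rows:
        if row["event_time"] < cutoff:
            continue  # outside the 5-minute window
        agg = groups[tuple(row[f] for f in KEY_FIELDS)]
        agg["events_count"] += 1
        for f in SUM_FIELDS:
            agg[f + "_sum"] += row[f]
    ranked = sorted(groups.items(),
                    key=lambda kv: kv[1]["events_count"], reverse=True)
    return ranked[:cap]
```

Bounding the number of groups matters because each distinct key tuple becomes a separate Prometheus series.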
Port 12 SQLMonitor Grafana dashboards to Prometheus under
sql_exporter/Prometheus-Dashboards/. Each dashboard is generated from a
small Python spec; every spec produces a *.json that imports directly
into Grafana (schemaVersion 42, __inputs-bound DS_PROMETHEUS).
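For readers unfamiliar with the import format, the following is a minimal sketch of a dashboard document that binds its datasource through `__inputs`. The `minimal_dashboard` function, the example panel, and its `up` query are illustrative assumptions and are not taken from the generated files.

```python
import json


def minimal_dashboard(uid, title):
    """Return a small dashboard dict that Grafana's import dialog can bind.

    Hypothetical helper for illustration; the real generator lives in _lib/.
    """
    return {
        # Grafana asks the user to map DS_PROMETHEUS to a datasource on import.
        "__inputs": [{
            "name": "DS_PROMETHEUS",
            "label": "Prometheus",
            "description": "",
            "type": "datasource",
            "pluginId": "prometheus",
            "pluginName": "Prometheus",
        }],
        "uid": uid,
        "title": title,
        "schemaVersion": 42,
        "panels": [{
            "id": 1,
            "type": "timeseries",
            "title": "Scrape health",  # example panel, not from the PR
            "datasource": {"type": "prometheus", "uid": "${DS_PROMETHEUS}"},
            "gridPos": {"h": 8, "w": 24, "x": 0, "y": 0},
            "targets": [{"refId": "A", "expr": "up"}],
        }],
    }


print(json.dumps(minimal_dashboard("prom_example", "Example"), indent=2))
```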
Dashboards (UID / data panels / text-link panels):
prom_core_metrics_trend 9 / 0
prom_wait_stats 4 / 0
prom_disk_space 5 / 0
prom_ag_health_state 3 / 0
prom_sql_agent_jobs 6 / 0
prom_backup_history 6 / 0
prom_xevent_trend 4 / 0
prom_database_file_io_stats 12 / 0
prom_dba_inventory 6 / 8
prom_monitoring_live_all_servers 15 / 6
prom_monitoring_live_distributed 52 / 6
prom_monitoring_perfmon_quest 51 / 4
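As a quick cross-check, the per-dashboard counts listed above can be tallied with a few lines of Python. The dictionary simply restates the list; nothing here is read from the generated files.

```python
# (data panels, text-link panels) per dashboard UID, copied from the list above.
PANEL_COUNTS = {
    "prom_core_metrics_trend": (9, 0),
    "prom_wait_stats": (4, 0),
    "prom_disk_space": (5, 0),
    "prom_ag_health_state": (3, 0),
    "prom_sql_agent_jobs": (6, 0),
    "prom_backup_history": (6, 0),
    "prom_xevent_trend": (4, 0),
    "prom_database_file_io_stats": (12, 0),
    "prom_dba_inventory": (6, 8),
    "prom_monitoring_live_all_servers": (15, 6),
    "prom_monitoring_live_distributed": (52, 6),
    "prom_monitoring_perfmon_quest": (51, 4),
}

data_total = sum(data for data, _ in PANEL_COUNTS.values())
link_total = sum(links for _, links in PANEL_COUNTS.values())
print(data_total, link_total)  # 173 24
```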
Helper library (_lib/prom_dashboard.py, _lib/build.py):
- Panel, Target, query_var, custom_var, constant_var dataclasses.
- row() and legacy_link_panel() helpers.
- build_dashboard(): per-panel-type options, thresholds, transforms.
- write_dashboard(): JSON serialization.
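A rough sketch of how dataclass-based panel specs might turn into Grafana panel JSON is shown below. `Panel` and `Target` are names from the helper library, but every field, default, and the `panel_to_json` and `layout` functions here are assumptions made for illustration; the real signatures in _lib/prom_dashboard.py may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    expr: str            # PromQL expression
    legend: str = ""     # legend format (hypothetical field)
    ref_id: str = "A"


@dataclass
class Panel:
    title: str
    type: str = "timeseries"
    targets: list[Target] = field(default_factory=list)
    width: int = 12      # Grafana grid is 24 columns wide
    height: int = 8


def panel_to_json(panel, panel_id, x, y):
    """Serialize one Panel at grid position (x, y)."""
    return {
        "id": panel_id,
        "type": panel.type,
        "title": panel.title,
        "datasource": {"type": "prometheus", "uid": "${DS_PROMETHEUS}"},
        "gridPos": {"h": panel.height, "w": panel.width, "x": x, "y": y},
        "targets": [
            {"refId": t.ref_id, "expr": t.expr, "legendFormat": t.legend}
            for t in panel.targets
        ],
    }


def layout(panels):
    """Place panels left to right, wrapping when the 24-column grid is full."""
    out = []
    x = y = row_height = 0
    for panel_id, panel in enumerate(panels, start=1):
        if x + panel.width > 24:
            x, y, row_height = 0, y + row_height, 0
        out.append(panel_to_json(panel, panel_id, x, y))
        x += panel.width
        row_height = max(row_height, panel.height)
    return out
```

Keeping layout separate from the panel description lets a spec list panels in reading order without hand-computing grid coordinates.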
Specs (_specs/*.py) use high-fidelity PromQL patterns:
- increase(metric[$__range]) for selective-duration deltas.
- @ end() offset $__range for prior-window comparison tables.
- quantile_over_time($percentile_q, (expr)[$trend_window:]) for
percentile trends.
- topk($top_n, sum by (…) (…)) for bounded series rendering.
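The four patterns above can be expressed as small string builders, which is roughly how a spec might compose them. The function names and the metric name in the usage lines are hypothetical; only the PromQL shapes come from the commit message. `$__range` is Grafana's built-in dashboard time-range variable, while `$percentile_q`, `$trend_window` and `$top_n` are dashboard variables the specs are assumed to define.

```python
def range_delta(metric):
    """Counter growth over the dashboard's selected time range."""
    return f"increase({metric}[$__range])"


def prior_window_delta(metric):
    """Same delta, shifted back one full range for comparison tables."""
    return f"increase({metric}[$__range] @ end() offset $__range)"


def percentile_trend(expr):
    """Percentile of `expr` evaluated as a subquery over the trend window."""
    return f"quantile_over_time($percentile_q, ({expr})[$trend_window:])"


def top_n(expr, by):
    """Keep only the top series after summing by the given labels."""
    return f"topk($top_n, sum by ({', '.join(by)}) ({expr}))"


# Hypothetical metric name, for illustration only.
metric = "mssql_example_wait_time_ms_total"
print(range_delta(metric))
print(prior_window_delta(metric))
print(percentile_trend(f"rate({metric}[5m])"))
print(top_n(range_delta(metric), ["wait_type"]))
```

Pairing `range_delta` with `prior_window_delta` gives a current-versus-previous comparison, because `@ end()` pins evaluation to the end of the query range and `offset $__range` moves it back by one window.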
Panels that require the SQLMonitor inventory DB (alert history,
AG-vs-nonAG backup split, LAMA config-change deltas, dm_os_memory_clerks
snapshot, tempdb/log_space cache tables, sql_server_patching) use
legacy_link_panel(...) to markdown-link back to the SQL-backed
dashboard, keeping every source section accounted for.
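A link tile of this kind is essentially a Grafana text panel in markdown mode. The sketch below shows one plausible shape; the function name, its parameters, the wording of the tile, and the example URL are assumptions, and the actual `legacy_link_panel(...)` signature is not shown in this PR description.

```python
def legacy_link_panel_sketch(title, reason, url, panel_id=0):
    """Build a text panel that points to the SQL-backed source dashboard.

    Hypothetical stand-in for legacy_link_panel(); arguments are illustrative.
    """
    content = (
        f"**{title}** requires the SQLMonitor inventory DB ({reason}).\n\n"
        f"[Open the SQL-backed dashboard]({url})"
    )
    return {
        "id": panel_id,
        "type": "text",
        "title": title,
        "gridPos": {"h": 4, "w": 12, "x": 0, "y": 0},
        "options": {"mode": "markdown", "content": content},
    }


tile = legacy_link_panel_sketch(
    "Alert history",
    "alert history table",
    "/d/example-uid/example-dashboard",  # placeholder URL
)
print(tile["options"]["content"])
```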
Developer tooling (_tools/):
- inspect_panels.py: source dashboard panel inventory.
- validate.py: structural JSON + target/expr sanity check; all 12
generated dashboards validate clean.
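To give a sense of what a structural and target/expr sanity check could involve, here is a minimal sketch. It is not the contents of validate.py; the specific checks (required top-level keys, non-empty `expr`, balanced parentheses) are assumptions about what such a validator might reasonably cover.

```python
def validate_dashboard(doc):
    """Return a list of human-readable problems; an empty list means clean.

    Illustrative only: checks top-level keys and each data panel's targets.
    Panels nested inside collapsed rows are not visited in this sketch.
    """
    problems = []
    for key in ("uid", "title", "schemaVersion", "panels"):
        if key not in doc:
            problems.append(f"missing top-level key: {key}")

    for panel in doc.get("panels", []):
        name = panel.get("title", "<untitled>")
        if panel.get("type") in ("text", "row"):
            continue  # link tiles and rows carry no queries
        targets = panel.get("targets", [])
        if not targets:
            problems.append(f"{name}: no targets")
        for target in targets:
            expr = target.get("expr", "").strip()
            if not expr:
                problems.append(f"{name}: empty expr")
            elif expr.count("(") != expr.count(")"):
                # Heuristic: parentheses inside label regexes can trip this.
                problems.append(f"{name}: unbalanced parentheses")
    return problems
```

Running it over `json.load(...)` of each generated file and failing on a non-empty result would be one way to wire such a check into a build step.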
- docs/prometheus.md: add rows for mssql_sqlagent_jobs,
  mssql_backup_history, mssql_xevent to the collectors table; add a
  'Prometheus-backed dashboard pack' section with the 12 Phase 1
  dashboards and regeneration commands.
- sql_exporter/README-sql_exporter.md: add a 'Collectors' table
  describing every mssql_*.collector.yml file, its job binding, scrape
  interval and the metric prefix it publishes.
Phase 1: Prometheus-backed Grafana dashboards
Ports 12 SQLMonitor dashboards to a new sql_exporter/Prometheus-Dashboards/ folder, and adds 3 new sql_exporter collectors to cover the metrics those dashboards depend on.

New sql_exporter collectors (3)
- mssql_sqlagent_jobs.collector.yml
- mssql_backup_history.collector.yml
- mssql_xevent.collector.yml
Registered in sql_exporter.yml under two new jobs: mssql_msdb, mssql_xevent.

Generated dashboards (12)
Each dashboard is built from a small Python spec (_specs/<name>.py) via generate.py.
- prom_core_metrics_trend
- prom_wait_stats
- prom_disk_space
- prom_ag_health_state
- prom_sql_agent_jobs
- prom_backup_history
- prom_xevent_trend
- prom_database_file_io_stats
- prom_dba_inventory
- prom_monitoring_live_all_servers
- prom_monitoring_live_distributed
- prom_monitoring_perfmon_quest
Totals: 173 Prometheus-backed data panels + 24 legacy_link_panel(...) deep-link tiles for panels that require the SQLMonitor inventory DB (alert history, AG-vs-nonAG backup split, LAMA config-change deltas, dm_os_memory_clerks snapshot, tempdb/log_space cache tables, sql_server_patching).

High-fidelity PromQL patterns
- increase(metric[$__range]): selective-duration deltas (File IO, Wait Stats).
- @ end() offset $__range: prior-window comparison tables.
- quantile_over_time($percentile_q, (expr)[$trend_window:]): Core Metrics - Trend percentile trends.
- topk($top_n, sum by (…) (…)): XEvent / wait-type / memory-consumer trends.
- time() - timestamp(up == 1): data-collection-issue detection.

Regeneration & validation
All 12 dashboards build clean and pass validation with no warnings.
Next phases
- sql_exporter/README-sql_exporter.md refresh + docs/deployment/prometheus.md cross-links.
- sqlmonitor/AgHost-1A/AgHost-1B and verify metrics on https://prometheus.ajaydwivedi.com.

Pull Request opened by Augment Code with guidance from the PR author