fix(BA-5768): Add Prometheus relabel for model-service metrics#11170
Open
seedspirit wants to merge 5 commits into main
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Adds an optional Prometheus relabel rule to rewrite model-services HTTP-SD targets to a host-accessible address (to handle Docker-internal/loopback targets), wires the behavior into the pyinfra Prometheus dashboard configuration via kernel_metrics_host, and introduces a component test to validate end-to-end scraping through the rewrite.
Changes:
- Add `relabel_configs` for the `http-sd` job to rewrite the `model-services` `__address__` to a configured host (`kernel_metrics_host`).
- Plumb `kernel_metrics_host` through pyinfra's Prometheus dashboard config and template rendering.
- Add a new component test that spins up Prometheus + mock SD/metrics endpoints and verifies scrape success after relabeling.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `tests/component/common/clients/prometheus/test_sd_relabel.py` | New component test covering HTTP-SD + relabel rewrite + scrape verification. |
| `src/ai/backend/install/pyinfra/deploy/monitor/dashboard/prometheus/templates/prometheus.yml.j2` | Adds conditional relabel rule to rewrite model-service targets when configured. |
| `src/ai/backend/install/pyinfra/deploy/monitor/dashboard/prometheus/deploy.py` | Passes `kernel_metrics_host` into the Jinja template context. |
| `src/ai/backend/install/pyinfra/configs/dashboard.py` | Adds the `kernel_metrics_host` Prometheus dashboard setting. |
| `configs/prometheus/prometheus.yaml` | Updates halfstack Prometheus config to include the relabel rewrite rule. |
| `changes/11170.fix.md` | Changelog entry for the scraping fix. |
Comment on lines +205 to +210

    max_attempts = 15
    result: PrometheusResponse | None = None

    for _ in range(max_attempts):
        time.sleep(2)
        result = await prometheus_client_with_relabel.query_instant(up_model_service_preset)
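
The loop quoted above calls the blocking `time.sleep` inside a coroutine. A minimal sketch of the same bounded-retry idea using `asyncio.sleep` is shown here. The names `poll_until`, `fetch`, and `is_ready` are hypothetical and not part of this PR; the defaults only mirror the `max_attempts = 15` and 2-second delay in the snippet, and unlike the original loop this version queries before the first sleep.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Optional, TypeVar

T = TypeVar("T")


async def poll_until(
    fetch: Callable[[], Awaitable[T]],
    is_ready: Callable[[T], bool],
    max_attempts: int = 15,
    interval: float = 2.0,
) -> Optional[T]:
    """Call fetch() up to max_attempts times; return the first ready result, else None."""
    for attempt in range(max_attempts):
        result = await fetch()
        if is_ready(result):
            return result
        if attempt < max_attempts - 1:
            # Yield to the event loop between attempts instead of blocking it.
            await asyncio.sleep(interval)
    return None
```

In a test like the one under review, `fetch` could wrap the `query_instant` call and `is_ready` could check that the response contains at least one sample; both wrappers are assumptions about how the test would be adapted.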
Comment on lines +137 to +146

    container = (
        DockerContainer("prom/prometheus:v2.53.0")
        .with_name(f"test--prom-relabel-slot-{get_parallel_slot()}-{random_id}")
        .with_exposed_ports(9090)
        .with_volume_mapping(
            str(prometheus_config_with_relabel),
            "/etc/prometheus/prometheus.yml",
            mode="ro",
        )
        .with_kwargs(

    )


    class TestKernelMetricsScrapeWithRelabel:

    # Jinja2 context (resolve host.docker.internal to actual host IP)
    http_sd_host=self.resolve_host(self.config.http_sd_host),
    http_sd_port=self.config.http_sd_port,
    kernel_metrics_host=self.config.kernel_metrics_host,
Comment on lines +31 to +38

    {%- if kernel_metrics_host %}
    relabel_configs:
      # Rewrite model-service targets from Docker-internal IPs to host-accessible address
      - source_labels: [service_group, __address__]
        separator: ;
        regex: model-services;[^:]+:(.+)
        target_label: __address__
        replacement: {{ kernel_metrics_host }}:${1}
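
To make the effect of this rule concrete, here is a small Python simulation of how a Prometheus `replace` relabel step (the default action) evaluates it: the `source_labels` values are joined with the separator, the regex must match the whole joined string, and `${1}` expands to the first capture group. The function name `relabel_address` and the sample addresses are illustrative assumptions, not code from this PR.

```python
import re

# Same pattern as the rule's `regex`; Prometheus anchors it to the full joined string.
RELABEL_REGEX = re.compile(r"model-services;[^:]+:(.+)")


def relabel_address(labels: dict[str, str], kernel_metrics_host: str) -> dict[str, str]:
    """Return a copy of labels with __address__ rewritten when the rule matches."""
    # Join source_labels [service_group, __address__] with the ';' separator.
    joined = ";".join(labels.get(name, "") for name in ("service_group", "__address__"))
    match = RELABEL_REGEX.fullmatch(joined)
    if match is None:
        # Non-matching targets pass through unchanged.
        return dict(labels)
    # Keep the captured port, swap the host part for the configured one.
    return {**labels, "__address__": f"{kernel_metrics_host}:{match.group(1)}"}
```

With `kernel_metrics_host` set to `host.docker.internal`, a target labeled `service_group="model-services"` at `172.17.0.5:8000` would be scraped at `host.docker.internal:8000`, while targets in other service groups keep their original address.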
Comment on lines +28 to +31

    "Host-accessible address for kernel metrics scraping. "
    "When set, model-service targets returned by HTTP SD will have their "
    "Docker-internal IPs rewritten to this address via relabel_configs. "
    "Leave empty to disable rewriting (use when kernel IPs are already routable)."
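
The "leave empty to disable" behavior described in this help text can be sketched in plain Python as a stand-in for the Jinja conditional in the template. `render_relabel_block` is a hypothetical helper written only for illustration; the PR itself renders the block through the `prometheus.yml.j2` template.

```python
def render_relabel_block(kernel_metrics_host: str) -> str:
    """Return the relabel_configs YAML fragment, or '' when no host is configured."""
    if not kernel_metrics_host:
        # Empty setting: emit nothing, so targets keep their discovered addresses.
        return ""
    return "\n".join(
        [
            "relabel_configs:",
            "  - source_labels: [service_group, __address__]",
            "    separator: ;",
            "    regex: model-services;[^:]+:(.+)",
            "    target_label: __address__",
            # '${{1}}' in the f-string yields the literal '${1}' that Prometheus expands.
            f"    replacement: {kernel_metrics_host}:${{1}}",
        ]
    )
```

Calling it with an empty string returns an empty fragment, which corresponds to deployments where kernel IPs are already routable from Prometheus.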
resolve #11169 (BA-5768)
Summary
- Rewrite `model-services` scrape targets to a host-accessible address
- `kernel_metrics_host`

Why
Model-service targets returned by HTTP service discovery may use Docker-internal or loopback addresses that are not directly reachable from the Prometheus container. This change adds an optional host rewrite so Prometheus can still scrape the service metrics path.
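
As a rough illustration of the problem, the sketch below parses a Prometheus HTTP SD response (a JSON list of objects with `targets` and `labels`) and flags targets whose host is a loopback address or falls inside Docker's default bridge subnet. The function name `suspect_targets`, the sample payload, and the choice of `172.17.0.0/16` as the "Docker-internal" range are assumptions for this example; real deployments may use other networks, and this is only a heuristic.

```python
import ipaddress
import json

# Assumption: Docker's default bridge network; custom networks use other ranges.
DOCKER_DEFAULT_BRIDGE = ipaddress.ip_network("172.17.0.0/16")


def suspect_targets(http_sd_payload: str) -> list[str]:
    """Return host:port targets that are likely unreachable from another container."""
    flagged: list[str] = []
    for group in json.loads(http_sd_payload):
        for target in group.get("targets", []):
            # Split on the last ':' to separate host from port (IPv4 and hostnames only).
            host, _, _port = target.rpartition(":")
            try:
                addr = ipaddress.ip_address(host)
            except ValueError:
                continue  # Hostnames are left for DNS to resolve.
            if addr.is_loopback or addr in DOCKER_DEFAULT_BRIDGE:
                flagged.append(target)
    return flagged
```

Targets flagged this way are the ones the optional host rewrite is meant to help with.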
Validation
tests/component/common/clients/prometheus/test_sd_relabel.py