[Serve] Add objref resolution latency metric #62355
abrarsheikh merged 7 commits into ray-project:master from
Conversation
Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
Code Review
This pull request introduces a new metric, serve_objref_resolution_latency_ms, to track the time spent resolving upstream ObjectRef or DeploymentResponse arguments in the Ray Serve router. The changes include the metric initialization, measurement logic within the routing process, and a corresponding test case. Feedback suggests using time.monotonic() instead of time.time() for more reliable duration measurements, as the latter is susceptible to system clock adjustments.
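The clock-choice point from the review can be illustrated with a small helper (a sketch of my own, not code from the PR):

```python
import time

def timed_ms(fn, *args, **kwargs):
    """Run fn and return (result, elapsed milliseconds)."""
    # time.monotonic() never jumps backward, unlike time.time(), which
    # follows the wall clock and can shift under NTP or manual adjustment.
    start = time.monotonic()
    result = fn(*args, **kwargs)
    return result, (time.monotonic() - start) * 1000

total, elapsed_ms = timed_ms(sum, range(1000))
```

A duration computed this way is safe even if the system clock is changed mid-measurement.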
cc: @abrarsheikh for further review and merge
```python
resolution_start = time.monotonic()
await self._resolve_request_arguments(pr)
resolution_ms = (time.monotonic() - resolution_start) * 1000
self._objref_resolution_latency_ms.observe(resolution_ms)
```
Metric emitted for all requests, not just objref resolutions
Low Severity
The serve_objref_resolution_latency_ms metric is observed for every request where pr.resolved is False, which is the default for all new PendingRequest instances. This includes requests with plain arguments (no DeploymentResponse or ObjectRef). For these, _resolve_request_arguments iterates args, finds nothing to resolve, and returns almost instantly — but the metric still records a near-zero value. This dilutes the histogram with noise, making percentile calculations misleading when a deployment receives a mix of request types. The metric name and description claim it tracks "resolving upstream ObjectRef or DeploymentResponse arguments," so it would be more accurate to only observe it when actual resolution work occurred.
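One way to address this would be to report from `_resolve_request_arguments` whether any resolution work actually happened, and observe the metric only in that case. A minimal sketch (the boolean-return convention and all helper classes here are my assumptions, not the PR's code):

```python
import asyncio
import time

class RouterSketch:
    """Illustrative sketch only, not the actual Ray Serve router."""

    def __init__(self, histogram):
        self._objref_resolution_latency_ms = histogram

    async def _resolve_request_arguments(self, pr):
        # Hypothetical convention: report whether any argument actually
        # required resolution (real code would await ObjectRefs here).
        return any(getattr(a, "needs_resolution", False) for a in pr.args)

    async def route(self, pr):
        start = time.monotonic()
        did_work = await self._resolve_request_arguments(pr)
        if did_work:
            # Record latency only when real resolution happened, keeping
            # the histogram free of near-zero samples from plain requests.
            self._objref_resolution_latency_ms.observe(
                (time.monotonic() - start) * 1000
            )

class _FakeHist:
    def __init__(self):
        self.samples = []

    def observe(self, value):
        self.samples.append(value)

class _PR:
    def __init__(self, args):
        self.args = args

class _ResolvableArg:
    needs_resolution = True

hist = _FakeHist()
router = RouterSketch(hist)
asyncio.run(router.route(_PR([{"plain": 1}])))      # plain dict: no sample
asyncio.run(router.route(_PR([_ResolvableArg()])))  # resolvable arg: one sample
```

With this guard, a deployment serving a mix of request types would only contribute histogram samples for requests that actually waited on resolution.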
Reviewed by Cursor Bugbot for commit ad96b6c.
abrarsheikh
left a comment
Update the monitoring.md file.
```python
)

self._objref_resolution_latency_ms = metrics.Histogram(
    "serve_objref_resolution_latency_ms",
```
Suggested change:

```diff
- "serve_objref_resolution_latency_ms",
+ "serve_router_args_resolution_latency_ms",
```
```python
    "deployment": deployment_id.name,
    "application": deployment_id.app_name,
    "handle": handle_id,
    "actor_id": self_actor_id if self_actor_id else "",
```
self_actor_id cannot be None right?
That's right. It is always a string. Fixed in a2a7a06
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit ab8b5e6.
```python
    "handle": handle_id,
    "actor_id": self_actor_id,
    }
)
```
Variable name inconsistent with renamed metric name
Low Severity
The attribute _objref_resolution_latency_ms was not renamed to match the metric name serve_router_args_resolution_latency_ms after the metric was renamed per reviewer feedback. The reviewer suggested changing from serve_objref_resolution_latency_ms to serve_router_args_resolution_latency_ms, and the metric name was updated, but the Python attribute name still uses the old "objref" terminology. This inconsistency between the internal variable name and the exported metric name can confuse future maintainers trying to grep for or understand the metric.
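A consistent rename might look like the sketch below (the attribute name and the stand-in histogram class are my assumptions, not the PR's code):

```python
class _FakeHistogram:
    """Minimal stand-in for a metrics Histogram with an observe() method."""

    def __init__(self, name):
        self.name = name
        self.samples = []

    def observe(self, value):
        self.samples.append(value)

class RouterSketch:
    def __init__(self):
        # The attribute mirrors the exported metric name, so grepping for
        # "args_resolution_latency" finds both the variable and the metric.
        self._args_resolution_latency_ms = _FakeHistogram(
            "serve_router_args_resolution_latency_ms"
        )

router = RouterSketch()
router._args_resolution_latency_ms.observe(12.5)
```

Keeping the internal name aligned with the exported one means a single grep surfaces every place the metric is defined, updated, or documented.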
Description
The PR adds a new Prometheus histogram metric serve_objref_resolution_latency_ms that tracks how long the Serve router spends resolving upstream DeploymentResponse arguments before a request enters the routing queue. This gives visibility into resolution wait time that was previously hidden as part of fulfillment_time_ms.
Related issues
Fixes #62286
Additional information
Used the following reproduction script that compares two cases: one passing an unresolved DeploymentResponse as an argument and another passing a plain dict, to isolate the resolution overhead.
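The actual reproduction script is not shown here. As a rough, self-contained illustration of the comparison it describes, the following asyncio sketch (all names hypothetical; an awaitable stands in for an unresolved DeploymentResponse) measures resolution time for an awaitable argument versus a plain dict:

```python
import asyncio
import time

async def slow_upstream():
    # Stands in for an upstream deployment whose DeploymentResponse
    # takes a while to resolve (the 50 ms sleep is arbitrary).
    await asyncio.sleep(0.05)
    return {"data": 1}

async def resolve_args(args):
    # Mimics the router's argument resolution: awaitables are resolved,
    # plain values pass through untouched.
    return [await a if asyncio.iscoroutine(a) else a for a in args]

async def timed_resolution_ms(args):
    start = time.monotonic()
    await resolve_args(args)
    return (time.monotonic() - start) * 1000

async def main():
    unresolved = await timed_resolution_ms([slow_upstream()])
    plain = await timed_resolution_ms([{"data": 1}])
    return unresolved, plain

unresolved_ms, plain_ms = asyncio.run(main())
```

In this toy version the unresolved case pays the full upstream delay while the plain-dict case returns almost immediately, which is the gap the new metric is meant to surface.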