
[Serve] Add objref resolution latency metric #62355

Merged

abrarsheikh merged 7 commits into ray-project:master from vaishdho1:objref-res-latency-metric on Apr 17, 2026


@vaishdho1 (Contributor)

Description

This PR adds a new Prometheus histogram metric, serve_objref_resolution_latency_ms, that tracks how long the Serve router spends resolving upstream DeploymentResponse arguments before a request enters the routing queue.
This makes the resolution wait visible on its own; previously it was hidden inside fulfillment_time_ms.
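A Prometheus histogram stores each observation in cumulative buckets plus a running sum and count, which is what lets dashboards derive percentiles of the resolution wait. The following is a minimal, self-contained sketch of that bookkeeping; the class name and the bucket boundaries are hypothetical and are not the ones Ray Serve uses.

```python
import bisect
from dataclasses import dataclass, field
from typing import List


@dataclass
class LatencyHistogram:
    """Hypothetical in-memory stand-in for a Prometheus histogram (not Ray's API)."""

    # Sorted upper bounds (ms) of each bucket; an implicit +Inf bucket follows.
    boundaries: List[float]
    bucket_counts: List[int] = field(default_factory=list)
    total_sum: float = 0.0
    total_count: int = 0

    def __post_init__(self) -> None:
        # One slot per boundary plus one for the +Inf bucket.
        self.bucket_counts = [0] * (len(self.boundaries) + 1)

    def observe(self, value_ms: float) -> None:
        # First bucket whose upper bound is >= value, matching Prometheus "le".
        idx = bisect.bisect_left(self.boundaries, value_ms)
        self.bucket_counts[idx] += 1
        self.total_sum += value_ms
        self.total_count += 1

    def cumulative(self) -> List[int]:
        # Prometheus exposes buckets cumulatively: observations <= each bound.
        out, running = [], 0
        for count in self.bucket_counts:
            running += count
            out.append(running)
        return out
```

With boundaries [1, 10, 100, 1000], a wait near 2 s (the delay the reproduction script's upstream sleep is meant to cause) falls only into the +Inf bucket, so the boundaries chosen for the real metric decide whether such waits can be told apart.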

Related issues

Fixes #62286

Additional information

I used the following reproduction script, which compares two cases to isolate the resolution overhead: one passes an unresolved DeploymentResponse as an argument, and the other passes a plain dict.

import asyncio
import time

from ray import serve


@serve.deployment(num_replicas=1)
class SlowDependency:
    async def __call__(self, request=None):
        await asyncio.sleep(2)
        return {"source": "upstream", "ts": time.time()}

@serve.deployment(num_replicas=1)
class DownstreamUnresolved:
    async def __call__(self, upstream_result):
        return {
            "downstream_received": upstream_result,
            "processed_at": time.time(),
        }


@serve.deployment(num_replicas=1)
class PipelineUnresolved:
    def __init__(self, upstream_handle, downstream_handle):
        self.upstream = upstream_handle
        self.downstream = downstream_handle

    async def __call__(self, request):
        # Not awaited: the router must resolve this DeploymentResponse before routing.
        upstream_resp = self.upstream.remote()
        return await self.downstream.remote(upstream_resp)

@serve.deployment(num_replicas=1)
class DownstreamPreresolved:
    async def __call__(self, upstream_result):
        return {
            "downstream_received": upstream_result,
            "processed_at": time.time(),
        }


@serve.deployment(num_replicas=1)
class PipelinePreresolved:
    def __init__(self, upstream_handle, downstream_handle):
        self.upstream = upstream_handle
        self.downstream = downstream_handle

    async def __call__(self, request):
        # Awaited here, so the downstream call receives a plain dict.
        upstream_result = await self.upstream.remote()
        return await self.downstream.remote(upstream_result)

# Case A: downstream receives an unresolved DeploymentResponse.
up_a = SlowDependency.bind()
down_a = DownstreamUnresolved.bind()
pipe_a = PipelineUnresolved.bind(up_a, down_a)
serve.run(pipe_a, name="pipeline_unresolved", route_prefix="/pipeline-unresolved")

# Case B: downstream receives an already-resolved plain dict.
up_b = SlowDependency.bind()
down_b = DownstreamPreresolved.bind()
pipe_b = PipelinePreresolved.bind(up_b, down_b)
serve.run(pipe_b, name="pipeline_preresolved", route_prefix="/pipeline-preresolved")
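To show why only case A should register resolution time, here is a hypothetical asyncio-only simulation that needs no Ray cluster. A toy "router" step awaits any pending argument before it would enqueue the request and times only that step; asyncio.Future stands in for DeploymentResponse, and the upstream delay is shortened. None of these functions are Ray Serve APIs.

```python
import asyncio
import time
from typing import Any, List, Tuple


async def slow_dependency(delay_s: float) -> dict:
    # Stands in for SlowDependency: an upstream call that takes delay_s seconds.
    await asyncio.sleep(delay_s)
    return {"source": "upstream"}


async def resolve_args(args: Tuple[Any, ...]) -> Tuple[List[Any], float]:
    """Toy router step: await any pending future in args; return (args, elapsed_ms)."""
    start = time.monotonic()
    resolved = []
    for arg in args:
        if isinstance(arg, asyncio.Future):  # asyncio.Task is a Future subclass
            resolved.append(await arg)
        else:
            resolved.append(arg)
    return resolved, (time.monotonic() - start) * 1000


async def compare(delay_s: float = 0.2) -> Tuple[float, float]:
    # Case A: pass the still-pending upstream result (like an unresolved DeploymentResponse).
    pending = asyncio.ensure_future(slow_dependency(delay_s))
    _, unresolved_ms = await resolve_args((pending,))

    # Case B: await upstream first, then pass the plain dict.
    plain = await slow_dependency(delay_s)
    _, preresolved_ms = await resolve_args((plain,))
    return unresolved_ms, preresolved_ms


if __name__ == "__main__":
    unresolved_ms, preresolved_ms = asyncio.run(compare())
    print(f"unresolved: {unresolved_ms:.1f} ms, pre-resolved: {preresolved_ms:.1f} ms")
```

In case B the upstream wait still happens, but inside the pipeline's own code rather than in the router step, which is why a router-side metric only surfaces it in case A.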


Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
@vaishdho1 requested a review from a team as a code owner on April 5, 2026 20:44

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a new metric, serve_objref_resolution_latency_ms, to track the time spent resolving upstream ObjectRef or DeploymentResponse arguments in the Ray Serve router. The changes include the metric initialization, measurement logic within the routing process, and a corresponding test case. Feedback suggests using time.monotonic() instead of time.time() for more reliable duration measurements, as the latter is susceptible to system clock adjustments.
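The time.monotonic() suggestion can be packaged as a small timing helper. This is a minimal sketch; the helper name observe_elapsed_ms is hypothetical, and the record callback stands in for a histogram's observe method.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def observe_elapsed_ms(record: Callable[[float], None]) -> Iterator[None]:
    """Time the enclosed block with a monotonic clock and pass the duration (ms) to record."""
    # time.monotonic() never goes backwards, whereas time.time() follows
    # wall-clock adjustments (NTP, manual changes) and can yield negative durations.
    start = time.monotonic()
    try:
        yield
    finally:
        record((time.monotonic() - start) * 1000)
```

Usage: `with observe_elapsed_ms(samples.append): await resolve()`. Because the sketch records in `finally`, a block that raises is still timed; the router snippet later in this thread observes only after the awaited resolution returns, so failed resolutions are not recorded there.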

Comment thread: python/ray/serve/_private/router.py (outdated)
@ray-gardener (bot) added the labels serve ("Ray Serve Related Issue"), observability ("Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling"), and community-contribution ("Contributed by the community") on Apr 6, 2026
Comment thread: python/ray/serve/tests/test_metrics.py
@harshit-anyscale added the go label ("add ONLY when ready to merge, run all tests") on Apr 6, 2026
@harshit-anyscale (Contributor)

cc: @abrarsheikh for further review and merge

resolution_start = time.monotonic()
await self._resolve_request_arguments(pr)
resolution_ms = (time.monotonic() - resolution_start) * 1000
self._objref_resolution_latency_ms.observe(resolution_ms)

Metric emitted for all requests, not just objref resolutions

Low Severity

The serve_objref_resolution_latency_ms metric is observed for every request where pr.resolved is False, which is the default for all new PendingRequest instances. This includes requests with plain arguments (no DeploymentResponse or ObjectRef). For these, _resolve_request_arguments iterates args, finds nothing to resolve, and returns almost instantly — but the metric still records a near-zero value. This dilutes the histogram with noise, making percentile calculations misleading when a deployment receives a mix of request types. The metric name and description claim it tracks "resolving upstream ObjectRef or DeploymentResponse arguments," so it would be more accurate to only observe it when actual resolution work occurred.
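One way to act on this finding is to observe the metric only when at least one argument actually needs resolving. The sketch below is hypothetical: needs_resolution and resolve_and_maybe_observe are illustrative names, and asyncio.Future stands in for DeploymentResponse/ObjectRef; it is not the code that was merged.

```python
import asyncio
import time
from typing import Any, Callable, Dict, List, Tuple


def needs_resolution(args: Tuple[Any, ...], kwargs: Dict[str, Any]) -> bool:
    # Hypothetical check: asyncio.Future stands in for DeploymentResponse/ObjectRef.
    return any(isinstance(v, asyncio.Future) for v in (*args, *kwargs.values()))


async def resolve_and_maybe_observe(
    args: Tuple[Any, ...],
    kwargs: Dict[str, Any],
    observe: Callable[[float], None],
) -> Tuple[List[Any], Dict[str, Any]]:
    """Resolve pending args; record latency only when something needed resolving."""
    if not needs_resolution(args, kwargs):
        # Plain arguments: skip the metric so near-zero samples don't dilute percentiles.
        return list(args), dict(kwargs)

    start = time.monotonic()
    new_args = [await a if isinstance(a, asyncio.Future) else a for a in args]
    new_kwargs = {
        k: (await v if isinstance(v, asyncio.Future) else v) for k, v in kwargs.items()
    }
    observe((time.monotonic() - start) * 1000)
    return new_args, new_kwargs
```

The trade-off is that the histogram's sample count then no longer equals the request count, so the share of requests that needed resolution would have to be derived from some other counter.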


Reviewed by Cursor Bugbot for commit ad96b6c.

@abrarsheikh (Contributor) left a comment

Update the monitoring.md file.

Comment thread: python/ray/serve/_private/router.py (outdated)
)

self._objref_resolution_latency_ms = metrics.Histogram(
"serve_objref_resolution_latency_ms",
Contributor comment:

Suggested change:
- "serve_objref_resolution_latency_ms",
+ "serve_router_args_resolution_latency_ms",

@vaishdho1 (Contributor Author):

Fixed in a2a7a06

Comment thread: python/ray/serve/_private/router.py (outdated)
"deployment": deployment_id.name,
"application": deployment_id.app_name,
"handle": handle_id,
"actor_id": self_actor_id if self_actor_id else "",
Contributor comment:

self_actor_id cannot be None, right?

@vaishdho1 (Contributor Author):

That's right. It is always a string. Fixed in a2a7a06.

Comment thread: python/ray/serve/_private/router.py (outdated)
@cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit ab8b5e6.

"handle": handle_id,
"actor_id": self_actor_id,
}
)

Variable name inconsistent with renamed metric name

Low Severity

The attribute _objref_resolution_latency_ms was not renamed to match the metric name serve_router_args_resolution_latency_ms after the metric was renamed per reviewer feedback. The reviewer suggested changing from serve_objref_resolution_latency_ms to serve_router_args_resolution_latency_ms, and the metric name was updated, but the Python attribute name still uses the old "objref" terminology. This inconsistency between the internal variable name and the exported metric name can confuse future maintainers trying to grep for or understand the metric.

Additional Locations (1)

Reviewed by Cursor Bugbot for commit ab8b5e6.

@abrarsheikh merged commit cda1a2e into ray-project:master on Apr 17, 2026
6 checks passed

Labels: community-contribution, go, observability, serve

Development: successfully merging this pull request may close the issue "[Serve] add metric for object ref resolution latency".

3 participants