[Serve] Add objref resolution latency metric by vaishdho1 · Pull Request #62355 · ray-project/ray

vaishdho1 · 2026-04-05T20:44:19Z

Description

The PR adds a new Prometheus histogram metric serve_objref_resolution_latency_ms that tracks how long the Serve router spends resolving upstream DeploymentResponse arguments before a request enters the routing queue.
This gives visibility into resolution wait time that was previously hidden as part of fulfillment_time_ms.
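As a rough illustration of what such a metric records, the sketch below times only the wait for an upstream result and stores it in a bucketed histogram. It uses only the standard library. LatencyHistogram, resolve_then_route, slow_dependency, and the bucket boundaries are hypothetical stand-ins, not the Serve router's code or Ray's metrics API.

```python
import asyncio
import bisect
import time


class LatencyHistogram:
    """Hypothetical in-memory stand-in for a Prometheus-style histogram."""

    def __init__(self, boundaries_ms):
        self.boundaries_ms = sorted(boundaries_ms)
        # One counter per bucket plus a final overflow bucket.
        # Counts here are per bucket, not cumulative as in Prometheus.
        self.counts = [0] * (len(self.boundaries_ms) + 1)

    def observe(self, value_ms):
        # bisect_left puts a value equal to a boundary into that boundary's bucket.
        self.counts[bisect.bisect_left(self.boundaries_ms, value_ms)] += 1


async def resolve_then_route(upstream, histogram):
    """Time only the wait for the upstream result, before any routing work."""
    start = time.monotonic()
    result = await upstream
    histogram.observe((time.monotonic() - start) * 1000)
    return result


async def slow_dependency(delay_s):
    # Stand-in for an upstream deployment that takes a while to respond.
    await asyncio.sleep(delay_s)
    return {"source": "upstream"}


def demo():
    histogram = LatencyHistogram([10, 100, 1000])
    result = asyncio.run(resolve_then_route(slow_dependency(0.05), histogram))
    return result, histogram.counts
```

Calling demo() returns the upstream result together with the bucket counts, with exactly one observation recorded for the single resolved argument.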

Related issues

Fixes #62286

Additional information

Used the following reproduction script that compares two cases: one passing an unresolved DeploymentResponse as an argument and another passing a plain dict, to isolate the resolution overhead.

import asyncio
import time

from ray import serve


@serve.deployment(num_replicas=1)
class SlowDependency:
    async def __call__(self, request=None):
        await asyncio.sleep(2)
        return {"source": "upstream", "ts": time.time()}

@serve.deployment(num_replicas=1)
class DownstreamUnresolved:
    async def __call__(self, upstream_result):
        return {
            "downstream_received": upstream_result,
            "processed_at": time.time(),
        }


@serve.deployment(num_replicas=1)
class PipelineUnresolved:
    def __init__(self, upstream_handle, downstream_handle):
        self.upstream = upstream_handle
        self.downstream = downstream_handle

    async def __call__(self, request):
        upstream_resp = self.upstream.remote()
        return await self.downstream.remote(upstream_resp)

@serve.deployment(num_replicas=1)
class DownstreamPreresolved:
    async def __call__(self, upstream_result):
        return {
            "downstream_received": upstream_result,
            "processed_at": time.time(),
        }


@serve.deployment(num_replicas=1)
class PipelinePreresolved:
    def __init__(self, upstream_handle, downstream_handle):
        self.upstream = upstream_handle
        self.downstream = downstream_handle

    async def __call__(self, request):
        upstream_result = await self.upstream.remote()
        return await self.downstream.remote(upstream_result)

up_a = SlowDependency.bind()
down_a = DownstreamUnresolved.bind()
pipe_a = PipelineUnresolved.bind(up_a, down_a)
serve.run(pipe_a, name="pipeline_unresolved", route_prefix="/pipeline-unresolved")

up_b = SlowDependency.bind()
down_b = DownstreamPreresolved.bind()
pipe_b = PipelinePreresolved.bind(up_b, down_b)
serve.run(pipe_b, name="pipeline_preresolved", route_prefix="/pipeline-preresolved")

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

gemini-code-assist

Code Review

This pull request introduces a new metric, serve_objref_resolution_latency_ms, to track the time spent resolving upstream ObjectRef or DeploymentResponse arguments in the Ray Serve router. The changes include the metric initialization, measurement logic within the routing process, and a corresponding test case. Feedback suggests using time.monotonic() instead of time.time() for more reliable duration measurements, as the latter is susceptible to system clock adjustments.
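The time.monotonic() suggestion can be illustrated with a small timing helper. This is a minimal sketch under assumptions: elapsed_ms and the samples list are hypothetical names introduced here, not code from the PR.

```python
import time
from contextlib import contextmanager


@contextmanager
def elapsed_ms(record):
    """Call record(duration_ms) with the monotonic duration of the with-block."""
    # time.monotonic() never goes backwards, so the difference cannot be
    # negative even if the wall clock is adjusted while the block runs.
    start = time.monotonic()
    try:
        yield
    finally:
        record((time.monotonic() - start) * 1000)


samples = []
with elapsed_ms(samples.append):
    time.sleep(0.02)  # stand-in for the work being measured
```

With time.time(), the same subtraction could yield a negative or inflated duration if the system clock were adjusted (for example by NTP) between the two readings.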


harshit-anyscale · 2026-04-06T16:23:39Z

cc: @abrarsheikh for further review and merge


cursor · 2026-04-08T16:47:26Z

+                resolution_start = time.monotonic()
                await self._resolve_request_arguments(pr)
+                resolution_ms = (time.monotonic() - resolution_start) * 1000
+                self._objref_resolution_latency_ms.observe(resolution_ms)


Metric emitted for all requests, not just objref resolutions

Low Severity

The serve_objref_resolution_latency_ms metric is observed for every request where pr.resolved is False, which is the default for all new PendingRequest instances. This includes requests with plain arguments (no DeploymentResponse or ObjectRef). For these, _resolve_request_arguments iterates args, finds nothing to resolve, and returns almost instantly — but the metric still records a near-zero value. This dilutes the histogram with noise, making percentile calculations misleading when a deployment receives a mix of request types. The metric name and description claim it tracks "resolving upstream ObjectRef or DeploymentResponse arguments," so it would be more accurate to only observe it when actual resolution work occurred.

Reviewed by Cursor Bugbot for commit ad96b6c.
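One way to read this suggestion is to record a latency sample only when at least one argument actually needs resolving. The sketch below is an assumption-laden illustration: it uses asyncio futures as stand-ins for DeploymentResponse or ObjectRef arguments, and resolve_arguments, resolve_and_maybe_observe, and upstream are hypothetical names, not the router's real methods.

```python
import asyncio
import time


async def resolve_arguments(args):
    """Await every pending argument and pass plain values through unchanged."""
    return [await a if asyncio.isfuture(a) else a for a in args]


async def resolve_and_maybe_observe(args, observations):
    """Resolve args, appending a latency sample only if something was pending."""
    if not any(asyncio.isfuture(a) for a in args):
        return list(args)  # nothing to resolve, so record nothing
    start = time.monotonic()
    resolved = await resolve_arguments(args)
    observations.append((time.monotonic() - start) * 1000)
    return resolved


async def upstream():
    # Stand-in for an upstream call whose result is not ready yet.
    await asyncio.sleep(0.01)
    return "upstream"


async def demo():
    observations = []
    # Plain arguments: no sample should be recorded.
    plain = await resolve_and_maybe_observe([{"k": 1}], observations)
    # One pending argument: exactly one sample should be recorded.
    pending = asyncio.ensure_future(upstream())
    mixed = await resolve_and_maybe_observe([pending, 2], observations)
    return plain, mixed, observations


plain, mixed, observations = asyncio.run(demo())
```

Skipping the observation for plain arguments keeps near-zero samples out of the histogram, at the cost of one extra scan over the arguments per request.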

abrarsheikh

update monitoring.md file

abrarsheikh · 2026-04-11T07:13:03Z

        )

+        self._objref_resolution_latency_ms = metrics.Histogram(
+            "serve_objref_resolution_latency_ms",


Suggested change

-            "serve_objref_resolution_latency_ms",
+            "serve_router_args_resolution_latency_ms",

Fixed in a2a7a06

abrarsheikh · 2026-04-11T07:14:28Z

+                "deployment": deployment_id.name,
+                "application": deployment_id.app_name,
+                "handle": handle_id,
+                "actor_id": self_actor_id if self_actor_id else "",


self_actor_id cannot be None, right?

That's right. It is always a string. Fixed in a2a7a06


cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit ab8b5e6.

cursor · 2026-04-11T20:40:46Z

+                "handle": handle_id,
+                "actor_id": self_actor_id,
+            }
+        )


Variable name inconsistent with renamed metric name

Low Severity

The attribute _objref_resolution_latency_ms was not renamed to match the metric name serve_router_args_resolution_latency_ms after the metric was renamed per reviewer feedback. The reviewer suggested changing from serve_objref_resolution_latency_ms to serve_router_args_resolution_latency_ms, and the metric name was updated, but the Python attribute name still uses the old "objref" terminology. This inconsistency between the internal variable name and the exported metric name can confuse future maintainers trying to grep for or understand the metric.

Additional Locations (1)

python/ray/serve/_private/router.py#L948-L949

Reviewed by Cursor Bugbot for commit ab8b5e6.

[Serve] Add objref resolution latency metric

5444a6c

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

vaishdho1 requested a review from a team as a code owner April 5, 2026 20:44

gemini-code-assist bot reviewed Apr 5, 2026


Comment thread python/ray/serve/_private/router.py Outdated

ray-gardener bot added the serve, observability, and community-contribution labels Apr 6, 2026

harshit-anyscale reviewed Apr 6, 2026


Comment thread python/ray/serve/tests/test_metrics.py

[Serve] Updated tests and modified timing mechanism

c8db071

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

harshit-anyscale added the go label (add ONLY when ready to merge, run all tests) Apr 6, 2026

harshit-anyscale approved these changes Apr 6, 2026


[serve] Fix objref_resolution_metric value test timing error

ad96b6c

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

cursor bot reviewed Apr 8, 2026


abrarsheikh reviewed Apr 11, 2026


[Serve] Removed redundant check and updated name of metric

a2a7a06

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

cursor bot reviewed Apr 11, 2026


Comment thread python/ray/serve/_private/router.py Outdated

vaishdho1 added 2 commits April 11, 2026 12:37

[Serve] Cleaned up comments

2f4d8cb

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

[Serve] Updated monitoring.md

ab8b5e6

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

cursor bot reviewed Apr 11, 2026


Merge branch 'master' into objref-res-latency-metric

7147d90

abrarsheikh approved these changes Apr 17, 2026


abrarsheikh merged commit cda1a2e into ray-project:master Apr 17, 2026
6 checks passed

abrarsheikh merged 7 commits into ray-project:master from vaishdho1:objref-res-latency-metric
