fix(gax): record fractional latency metrics #12979
Code Review
This pull request updates MetricsTracer to support fractional millisecond latency reporting by calculating elapsed time from nanoseconds instead of using the millisecond precision provided by the Stopwatch class. It also introduces Ticker injection to improve testability, allowing for more precise unit tests using FakeTicker. A review comment suggests replacing the magic number 1,000,000.0 with TimeUnit constants to improve code readability and self-documentation.
sdk-platform-java/gax-java/gax/src/main/java/com/google/api/gax/tracing/MetricsTracer.java (246)
To improve readability and avoid using a magic number, consider using the TimeUnit class for the conversion. This makes the code more self-documenting about the units involved.
return stopwatch.elapsed(TimeUnit.NANOSECONDS) / (double) TimeUnit.MILLISECONDS.toNanos(1);
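As a quick check on this suggestion, the sketch below compares the literal divisor with the TimeUnit-derived one using only the JDK. The class and method names (NanosToMillis, withMagicNumber, withTimeUnit) are hypothetical and are not part of gax; a raw nanosecond count stands in for the Stopwatch reading.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of gax: it only illustrates the suggested conversion.
final class NanosToMillis {
  // Nanoseconds in one millisecond, derived from TimeUnit instead of a hard-coded literal.
  static final double NANOS_PER_MILLI = (double) TimeUnit.MILLISECONDS.toNanos(1);

  // Conversion as written in the PR: divide by the literal 1_000_000.0.
  static double withMagicNumber(long elapsedNanos) {
    return elapsedNanos / 1_000_000.0;
  }

  // Conversion as suggested in the review: divide by the TimeUnit-derived constant.
  static double withTimeUnit(long elapsedNanos) {
    return elapsedNanos / NANOS_PER_MILLI;
  }

  public static void main(String[] args) {
    long elapsedNanos = 3_800_000L; // assumed sample reading, for illustration only
    System.out.println(withMagicNumber(elapsedNanos));
    System.out.println(withTimeUnit(elapsedNanos));
  }
}
```

Both forms divide by the same double value, so the choice is about readability rather than behavior.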

Summary
Record GAX operation and attempt latency metrics with fractional millisecond precision instead of truncating to whole milliseconds.
Previously, MetricsTracer read the elapsed time from Stopwatch at whole-millisecond precision. This floors the sub-millisecond part of each value before it is recorded into the histogram. For low-latency RPCs, this can make Cloud Monitoring percentiles look significantly lower than application-observed latency.
For example, a Spanner customer investigation showed:
The apparent ~800us P50 gap looked like extra client-side latency in Spanner, but raw metric prints showed values being recorded as whole numbers only (3.0, 4.0, 7.0). The sub-ms precision was lost before histogram bucketization, so this was a rounding/truncation artifact rather than actual Spanner client overhead.
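The effect of flooring on a percentile can be reproduced with a few made-up samples. The sketch below is illustrative only: the sample latencies are invented rather than taken from the Spanner investigation, the class name TruncationGap is hypothetical, and a plain median stands in for histogram-based percentile estimation.

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration, not gax code: compares medians of floored vs fractional values.
final class TruncationGap {
  // Median of a copy of the input, so the caller's array is left untouched.
  static double median(double[] values) {
    double[] sorted = values.clone();
    Arrays.sort(sorted);
    int mid = sorted.length / 2;
    return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
  }

  // Whole-millisecond values, as produced by a millisecond-precision reading.
  static double[] truncatedMillis(long[] nanos) {
    return Arrays.stream(nanos)
        .mapToDouble(n -> (double) TimeUnit.NANOSECONDS.toMillis(n))
        .toArray();
  }

  // Fractional-millisecond values, as produced by dividing nanoseconds by 1_000_000.0.
  static double[] fractionalMillis(long[] nanos) {
    return Arrays.stream(nanos).mapToDouble(n -> n / 1_000_000.0).toArray();
  }

  public static void main(String[] args) {
    // Invented sample latencies in nanoseconds (3.2 ms, 3.8 ms, 4.9 ms, 4.95 ms, 7.1 ms).
    long[] samples = {3_200_000L, 3_800_000L, 4_900_000L, 4_950_000L, 7_100_000L};
    System.out.println("truncated median:  " + median(truncatedMillis(samples)));
    System.out.println("fractional median: " + median(fractionalMillis(samples)));
  }
}
```

With these invented samples the floored median is 4.0 ms while the fractional median is 4.9 ms, a gap of the same order as the one described above.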
Fix
Use nanosecond elapsed time and convert to fractional milliseconds:
stopwatch.elapsed(TimeUnit.NANOSECONDS) / 1_000_000.0
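A minimal JDK-only sketch of the difference between the two readings follows. FractionalTimer is a hypothetical stand-in for a Stopwatch driven by an injectable time source (here a LongSupplier); it is not the gax or Guava API, and the 3.8 ms advance is an assumed value chosen for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical stand-in for a stopwatch with an injectable nanosecond source; not gax or Guava code.
final class FractionalTimer {
  private final LongSupplier nanoSource;
  private final long startNanos;

  FractionalTimer(LongSupplier nanoSource) {
    this.nanoSource = nanoSource;
    this.startNanos = nanoSource.getAsLong();
  }

  // Old behavior: whole milliseconds, with the sub-millisecond part discarded.
  long elapsedWholeMillis() {
    return TimeUnit.NANOSECONDS.toMillis(nanoSource.getAsLong() - startNanos);
  }

  // New behavior: nanoseconds converted to fractional milliseconds.
  double elapsedFractionalMillis() {
    return (nanoSource.getAsLong() - startNanos) / 1_000_000.0;
  }

  public static void main(String[] args) {
    long[] fakeNow = {0L}; // manually advanced clock, playing the role of a fake ticker
    FractionalTimer timer = new FractionalTimer(() -> fakeNow[0]);
    fakeNow[0] += 3_800_000L; // advance the fake clock by 3.8 ms
    System.out.println(timer.elapsedWholeMillis());
    System.out.println(timer.elapsedFractionalMillis());
  }
}
```

Because the time source is injected, a test can advance it by an exact number of nanoseconds and assert on the fractional result without sleeping.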
Applied to: