-
Notifications
You must be signed in to change notification settings - Fork 113
[tc][fc][dvc] Add client availability/latency metrics in Otel #1689
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
lluwm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @m-nagarajan! The change looks good to me. Just a few minor comments.
clients/venice-thin-client/src/main/java/com/linkedin/venice/client/stats/BasicClientStats.java
Outdated
Show resolved
Hide resolved
...da-vinci-client/src/main/java/com/linkedin/davinci/client/StatsAvroGenericDaVinciClient.java
Outdated
Show resolved
Hide resolved
...da-vinci-client/src/main/java/com/linkedin/davinci/client/StatsAvroGenericDaVinciClient.java
Outdated
Show resolved
Hide resolved
.../venice-thin-client/src/test/java/com/linkedin/venice/client/stats/BasicClientStatsTest.java
Outdated
Show resolved
Hide resolved
m-nagarajan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @lluwm for the review. Addressed the comments.
.../venice-thin-client/src/test/java/com/linkedin/venice/client/stats/BasicClientStatsTest.java
Outdated
Show resolved
Hide resolved
...da-vinci-client/src/main/java/com/linkedin/davinci/client/StatsAvroGenericDaVinciClient.java
Outdated
Show resolved
Hide resolved
.../venice-client/src/main/java/com/linkedin/venice/fastclient/StatsAvroGenericStoreClient.java
Show resolved
Hide resolved
|
Valid tags are those of the deployables and user libraries affected by the change (i.e. including all those calling into the modified venice-common code). Thanks. |
gaojieliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall and left some minor comments.
clients/venice-thin-client/src/main/java/com/linkedin/venice/client/stats/BasicClientStats.java
Show resolved
Hide resolved
clients/venice-thin-client/src/main/java/com/linkedin/venice/client/stats/BasicClientStats.java
Outdated
Show resolved
Hide resolved
...da-vinci-client/src/main/java/com/linkedin/davinci/client/StatsAvroGenericDaVinciClient.java
Show resolved
Hide resolved
...nice-thin-client/src/main/java/com/linkedin/venice/client/store/StatTrackingStoreClient.java
Show resolved
Hide resolved
m-nagarajan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gaojieliu for the review. Left some replies to your comments.
...da-vinci-client/src/main/java/com/linkedin/davinci/client/StatsAvroGenericDaVinciClient.java
Show resolved
Hide resolved
clients/venice-thin-client/src/main/java/com/linkedin/venice/client/stats/BasicClientStats.java
Show resolved
Hide resolved
...nice-thin-client/src/main/java/com/linkedin/venice/client/store/StatTrackingStoreClient.java
Show resolved
Hide resolved
...da-vinci-client/src/main/java/com/linkedin/davinci/client/StatsAvroGenericDaVinciClient.java
Outdated
Show resolved
Hide resolved
clients/venice-thin-client/src/main/java/com/linkedin/venice/client/stats/BasicClientStats.java
Outdated
Show resolved
Hide resolved
...inci-client/src/test/java/com/linkedin/davinci/client/StatsAvroGenericDaVinciClientTest.java
Outdated
Show resolved
Hide resolved
lluwm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thanks @m-nagarajan !
…in#1689) Add client availability/latency metrics in Otel on top of the existing tehuti metrics under existing Otel configs in VeniceMetricsConfig: name of the metrics be venice.thin_client.<> , venice.fast_client.<>, venice.davinci_client.<>, etc - to keep the metrics of each clients isolated and not let some clients like davinci which emits a lot of metrics (including ingestion metrics) reduce the searchability aspect of metrics emitted by other clients which emits a small amount of metrics, - to keep all metrics of the same name emit the same set of dimensions to keep data aggregated via the pre-aggregates in the underlying metrics processing frameworks to be consistent The below metrics are added - venice.thin_client.call_count - venice.thin_client.call_time - venice.fast_client.call_count - venice.fast_client.call_time - venice.davinci_client.call_count - venice.davinci_client.call_time All these metrics are currently in a single class BasicClientStats.java, this PR resuses this class by also adding unhealthy_ latency_metrics to it (from ClientStats.java). This means DVC will be emitting this metric as well from now on. Other existing tehuti metric like request, success_request_ratio will be derived metrics in OTel based by aggregating based on the dimension venice.response.status_code_category (success/fail) Added setOtelAdditionalMetricsReader in VeniceMetricsConfig to be able to pass inInMemoryMetricReader from io.opentelemetry:opentelemetry-sdk-testing to be able to test the otel metrics.
Problem Statement
Add client availability/latency metrics in Otel on top of the existing tehuti metrics under existing Otel configs in
VeniceMetricsConfigSolution
2 solutions can be considered:
venice.client.<>and add a dimensionvenice.client.typewith valuesthin_client,fast_client,da_vinci_client, etc. This requires a metric with the same name across clients (eg:venice.client.call_count) to have the same set of dimensions.venice.thin_client.<>,venice.fast_client.<>,venice.davinci_client.<>, etc with no new dimension for the type.This PR follows the second approach to
The below metrics are added
All these metrics are currently in a single class
BasicClientStats.java, this PR resuses this class by also addingunhealthy_ latency_metricsto it (fromClientStats.java). This means DVC will be emitting this metric as well from now on.Other existing tehuti metric like
request,success_request_ratiowill be derived metrics in OTel based by aggregating based on the dimensionvenice.response.status_code_category(success/fail)Added
setOtelAdditionalMetricsReaderinVeniceMetricsConfigto be able to pass inInMemoryMetricReaderfromio.opentelemetry:opentelemetry-sdk-testingto be able to test the otel metrics.Code changes
Concurrency-Specific Checks
Both reviewer and PR author to verify
synchronized,RWLock) are used where needed.ConcurrentHashMap,CopyOnWriteArrayList).How was this PR tested?
TBD
Does this PR introduce any user-facing or breaking changes?
Clients will start emitting these new metrics in OTel if enabled