[COR-177] Latency Fixes #72

gunnarsundberg · 2022-11-17T22:32:52Z

Fixes a couple of issues from the initial latency PR.

Latency Units

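Latencies were exported in microseconds, but the histogram buckets are sized for milliseconds (upper bounds from 0 to 10000). As a result, every observation fell outside all of the finite buckets and was counted only in `+Inf`. Latencies are still recorded in microseconds, but they are now exported as float64 milliseconds.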

Example of current data from prod:

# HELP latency_connector_histogram Latency from connector reception to kafka produce
# TYPE latency_connector_histogram histogram
latency_connector_histogram_bucket{Connector="sushiswap",Env="prod",job="streamserver",le="0"} 0
latency_connector_histogram_bucket{Connector="sushiswap",Env="prod",job="streamserver",le="5"} 0
latency_connector_histogram_bucket{Connector="sushiswap",Env="prod",job="streamserver",le="10"} 0
latency_connector_histogram_bucket{Connector="sushiswap",Env="prod",job="streamserver",le="25"} 0
latency_connector_histogram_bucket{Connector="sushiswap",Env="prod",job="streamserver",le="50"} 0
latency_connector_histogram_bucket{Connector="sushiswap",Env="prod",job="streamserver",le="75"} 0
latency_connector_histogram_bucket{Connector="sushiswap",Env="prod",job="streamserver",le="100"} 0
latency_connector_histogram_bucket{Connector="sushiswap",Env="prod",job="streamserver",le="250"} 0
latency_connector_histogram_bucket{Connector="sushiswap",Env="prod",job="streamserver",le="500"} 0
latency_connector_histogram_bucket{Connector="sushiswap",Env="prod",job="streamserver",le="750"} 0
latency_connector_histogram_bucket{Connector="sushiswap",Env="prod",job="streamserver",le="1000"} 0
latency_connector_histogram_bucket{Connector="sushiswap",Env="prod",job="streamserver",le="2500"} 0
latency_connector_histogram_bucket{Connector="sushiswap",Env="prod",job="streamserver",le="5000"} 0
latency_connector_histogram_bucket{Connector="sushiswap",Env="prod",job="streamserver",le="7500"} 0
latency_connector_histogram_bucket{Connector="sushiswap",Env="prod",job="streamserver",le="10000"} 0
latency_connector_histogram_bucket{Connector="sushiswap",Env="prod",job="streamserver",le="+Inf"} 54470
latency_connector_histogram_sum{Connector="sushiswap",Env="prod",job="streamserver"} 4.2127170146e+10
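As a sanity check on the units, divide the prod sample's `_sum` by its observation count. The count is taken from the `+Inf` bucket, which in a Prometheus histogram always equals the total count. This assumes, as stated above, that the prod values are in microseconds.

$$
\bar{x} = \frac{\text{sum}}{\text{count}} = \frac{4.2127170146 \times 10^{10}}{54470} \approx 7.734 \times 10^{5}\ \mu\text{s} \approx 773\ \text{ms}
$$

Read as microseconds, even the mean is roughly 77 times the largest finite bound (10000). That is consistent with every observation landing in `+Inf`. Converted to milliseconds, a value of that size would fall in the `le="1000"` bucket.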

Example from local testing with updated units:

# HELP latency_connector_histogram Latency from connector reception to kafka produce
# TYPE latency_connector_histogram histogram
latency_connector_histogram_bucket{Connector="streamserver",Env="staging",job="streamserver",le="0"} 0
latency_connector_histogram_bucket{Connector="streamserver",Env="staging",job="streamserver",le="5"} 0
latency_connector_histogram_bucket{Connector="streamserver",Env="staging",job="streamserver",le="10"} 0
latency_connector_histogram_bucket{Connector="streamserver",Env="staging",job="streamserver",le="25"} 0
latency_connector_histogram_bucket{Connector="streamserver",Env="staging",job="streamserver",le="50"} 0
latency_connector_histogram_bucket{Connector="streamserver",Env="staging",job="streamserver",le="75"} 0
latency_connector_histogram_bucket{Connector="streamserver",Env="staging",job="streamserver",le="100"} 0
latency_connector_histogram_bucket{Connector="streamserver",Env="staging",job="streamserver",le="250"} 0
latency_connector_histogram_bucket{Connector="streamserver",Env="staging",job="streamserver",le="500"} 0
latency_connector_histogram_bucket{Connector="streamserver",Env="staging",job="streamserver",le="750"} 0
latency_connector_histogram_bucket{Connector="streamserver",Env="staging",job="streamserver",le="1000"} 18
latency_connector_histogram_bucket{Connector="streamserver",Env="staging",job="streamserver",le="2500"} 31
latency_connector_histogram_bucket{Connector="streamserver",Env="staging",job="streamserver",le="5000"} 41
latency_connector_histogram_bucket{Connector="streamserver",Env="staging",job="streamserver",le="7500"} 42
latency_connector_histogram_bucket{Connector="streamserver",Env="staging",job="streamserver",le="10000"} 42
latency_connector_histogram_bucket{Connector="streamserver",Env="staging",job="streamserver",le="+Inf"} 42
latency_connector_histogram_sum{Connector="streamserver",Env="staging",job="streamserver"} 76029.403
latency_connector_histogram_count{Connector="streamserver",Env="staging",job="streamserver"} 42

After this update, a dashboard showing the distribution of latency values will be possible. Currently, we can only see the mean and the fact that all values are less than infinity.

`metric.go` Updates

`ExportLatencyMetrics` was refactored for readability. It performs the same function, but in a way that someone other than me can follow. The initial version was a confusing mess of maps (sorry). The unit tests were updated to match the changes.

cnkarz

Overall looks good. Just one comment about the function design.

monitor/metric.go

requested changes were done ✅

gunnarsundberg added 3 commits November 17, 2022 16:23

refactor and convert histograms to milliseconds

8ec79e2

update test to use ctx

1db99cf

keep latency observations backwards compatible

435baf0

gunnarsundberg requested review from azhang, cnkarz and hughiednguyen November 17, 2022 22:32

cnkarz previously requested changes Nov 18, 2022


monitor/metric.go (outdated, resolved)

update function design

0f8a14c

gunnarsundberg requested a review from cnkarz November 18, 2022 23:15

hughiednguyen approved these changes Nov 19, 2022


gunnarsundberg merged commit 3b8a5d3 into main Nov 19, 2022
