
[otlp] Fix panic in dropped count (again!) #3538

Merged
merged 1 commit from fix-otel-panics-once-and-forall into main on Nov 28, 2022

Conversation

gouthamve
Member

@gouthamve gouthamve commented Nov 28, 2022

This doesn't accurately count the dropped samples. For example, if a single metric with multiple samples is faulty, we get a single error rather than an error per sample.

But I believe it's the best best-effort measurement.

Before, we used to compute `DatapointCount() - samplesInMap()`.

The problem is the following:

  1. target_info is a synthetic metric added in Prometheus, so the final sample count could be higher.
  2. A single histogram datapoint in OTLP corresponds to many samples in Prometheus.
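
To make the arithmetic concrete, here is a minimal, self-contained Go sketch (all names and numbers are illustrative, not Mimir's actual code) of why the old subtraction can go negative and why counting conversion errors instead stays non-negative:

```go
// Hypothetical sketch of the dropped-sample accounting problem described
// above; names and numbers are illustrative, not Mimir's actual code.
package main

import "fmt"

func main() {
	// Suppose an OTLP write request carries 2 datapoints:
	// one gauge datapoint and one histogram datapoint.
	datapointCount := 2

	// After conversion to Prometheus samples:
	//   - the histogram datapoint expands into several series
	//     (sum, count, and one series per bucket), and
	//   - a synthetic target_info sample is appended.
	// So the converted sample count can exceed datapointCount.
	convertedSamples := 1 /* gauge */ + 6 /* histogram expansion */ + 1 /* target_info */

	// Old accounting: dropped = datapoints - samples. This goes negative
	// here, which is what ultimately triggered the panic.
	oldDropped := datapointCount - convertedSamples
	fmt.Println("old dropped count:", oldDropped) // -6

	// Best-effort accounting: count one drop per conversion error instead.
	// If a whole metric with many datapoints fails, this still counts a
	// single error, which is the inaccuracy acknowledged above.
	conversionErrors := 0 // e.g. incremented inside the conversion loop
	fmt.Println("new dropped count:", conversionErrors)
}
```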

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@gouthamve gouthamve requested a review from a team as a code owner November 28, 2022 12:44
@gouthamve gouthamve force-pushed the fix-otel-panics-once-and-forall branch from f3c0849 to f2fe3f4 on November 28, 2022 at 12:45
@gouthamve
Member Author

cc @pstibrany @replay

@gouthamve gouthamve force-pushed the fix-otel-panics-once-and-forall branch from f2fe3f4 to 46f0333 on November 28, 2022 at 12:50
@gouthamve gouthamve force-pushed the fix-otel-panics-once-and-forall branch from 46f0333 to 03cbd10 on November 28, 2022 at 12:57
@replay replay mentioned this pull request Nov 28, 2022
Contributor

@replay replay left a comment


🚀

Member

@pstibrany pstibrany left a comment


Code change looks good to me.

It's a little strange to me that the tenant check is only done in case of errors. Should we do it before we even try to convert metrics?

The added unit tests don't seem to be checking for dropped metrics at all. In other words, they seem unrelated to the PR. 🤔

@gouthamve
Member Author

> It's a little strange to me that the tenant check is only done in case of errors. Should we do it before we even try to convert metrics?

It is on an authenticated route, so the check essentially NEVER fails. It only exists to load the userID value.

> The added unit tests don't seem to be checking for dropped metrics at all. In other words, they seem unrelated to the PR. 🤔

I first wrote the unit tests to reproduce the panic and then proceeded to fix it, which is why the unit tests seem unrelated.

@replay replay merged commit 2ad7eb4 into main Nov 28, 2022
@replay replay deleted the fix-otel-panics-once-and-forall branch November 28, 2022 18:44
replay pushed a commit that referenced this pull request Nov 28, 2022
replay added a commit that referenced this pull request Nov 29, 2022
grafanabot pushed a commit that referenced this pull request Nov 29, 2022
replay pushed a commit that referenced this pull request Nov 29, 2022
stevesg pushed a commit that referenced this pull request Nov 30, 2022
masonmei pushed a commit to udmire/mimir that referenced this pull request Dec 16, 2022
masonmei pushed a commit to udmire/mimir that referenced this pull request Dec 16, 2022
masonmei pushed a commit to udmire/mimir that referenced this pull request Dec 16, 2022