Conversation
```python
metrics.increment(Metric.OUTPUT_MESSAGES, tags=tags)
if message_size is not None:
    metrics.increment(Metric.OUTPUT_BYTES, tags=tags, value=message_size)
```
Error counter not scaled to compensate for sampling
High Severity
The Metric.ERRORS counter in output_metrics is not scaled by 1 / sample_rate, unlike Metric.INPUT_MESSAGES in input_metrics which correctly uses value=1 / sample_rate. Since output_metrics returns early based on random.random() > sample_rate, errors are both sampled down and recorded as a value of 1, leading to systematic under-reporting by a factor of 1/sample_rate (e.g., 10x with the default 0.1 rate). This makes error counts unreliable in monitoring dashboards.
Reviewed by Cursor Bugbot for commit 6e7d58f.
```python
metrics = get_metrics()
tags = {"step": name}
if error:
    tags["error"] = error
```
Bug: The ERRORS metric is not scaled by 1/sample_rate to account for sampling, unlike INPUT_MESSAGES, leading to undercounted errors.
Severity: HIGH
Suggested Fix
In the output_metrics block, scale the ERRORS metric by 1 / sample_rate to match the handling of INPUT_MESSAGES. Change metrics.increment(Metric.ERRORS, tags=tags) to metrics.increment(Metric.ERRORS, value=1 / sample_rate, tags=tags).
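To make that one-line change concrete, here is a minimal, self-contained sketch of the scaled-increment pattern. The `_StubMetrics` class, the function signature, and the `sample_rate` default are illustrative assumptions; only the scaled `Metric.ERRORS` increment reflects the suggested fix:

```python
import enum
import random
from typing import Dict, Optional, Tuple


class Metric(enum.Enum):
    # Stub enum for illustration; the real one lives in sentry_streams.
    ERRORS = "errors"
    OUTPUT_MESSAGES = "output_messages"
    OUTPUT_BYTES = "output_bytes"


class _StubMetrics:
    # Tiny in-memory stand-in for the real Metrics wrapper.
    def __init__(self) -> None:
        self.counters: Dict[Tuple[Metric, Tuple[Tuple[str, str], ...]], float] = {}

    def increment(self, metric: Metric, tags: Dict[str, str], value: float = 1.0) -> None:
        key = (metric, tuple(sorted(tags.items())))
        self.counters[key] = self.counters.get(key, 0.0) + value


_stub = _StubMetrics()


def get_metrics() -> _StubMetrics:
    return _stub


def output_metrics(
    name: str,
    sample_rate: float = 0.1,
    message_size: Optional[int] = None,
    error: Optional[str] = None,
) -> None:
    # Only a sample_rate fraction of calls record anything at all.
    if random.random() > sample_rate:
        return

    metrics = get_metrics()
    tags = {"step": name}
    if error:
        tags["error"] = error
        # Scale by 1 / sample_rate so the sampled count still estimates the
        # true error volume, matching how INPUT_MESSAGES is handled.
        metrics.increment(Metric.ERRORS, tags=tags, value=1 / sample_rate)

    metrics.increment(Metric.OUTPUT_MESSAGES, tags=tags)
    if message_size is not None:
        metrics.increment(Metric.OUTPUT_BYTES, tags=tags, value=message_size)
```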
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid.

Location: sentry_streams/sentry_streams/adapters/arroyo/rust_arroyo.py#L141

Potential issue: The `ERRORS` metric in `output_metrics` is not scaled by `1/sample_rate`, while the `INPUT_MESSAGES` metric is. This inconsistency leads to a systematic undercounting of errors whenever sampling is active. The calculated error rate (`ERRORS`/`INPUT_MESSAGES`) will be lower than the true rate by a factor of `sample_rate`. This can cause monitoring dashboards and alerts to miss real production issues, making the system appear more reliable than it actually is. For instance, with a `sample_rate` of 0.1, only 10% of the actual errors will be reported.
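A quick way to see the factor-of-10 effect is a small simulation (self-contained, not from the PR): errors recorded without scaling come out roughly 10x low, while scaling each sampled error by `1 / sample_rate` recovers the true count in expectation.

```python
import random

random.seed(0)
SAMPLE_RATE = 0.1
TRUE_ERROR_RATE = 0.05

true_errors = 0
unscaled = 0.0   # current behaviour: +1 per sampled error
scaled = 0.0     # suggested fix: +1 / SAMPLE_RATE per sampled error

for _ in range(100_000):
    is_error = random.random() < TRUE_ERROR_RATE
    true_errors += is_error
    if random.random() > SAMPLE_RATE:   # same early return as output_metrics
        continue
    if is_error:
        unscaled += 1
        scaled += 1 / SAMPLE_RATE

print(true_errors, unscaled, scaled)
# Typically prints roughly 5000, ~500, ~5000: the unscaled counter is ~10x low.
```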
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit f2ca949.
```python
if _metrics is None:
    _metrics = Metrics(_dummy_metrics_backend)

return _metrics
```
Caching dummy metrics breaks later configuration call
Medium Severity
get_metrics() now mutates the global _metrics by caching a dummy-backed Metrics instance when none is configured. Previously it returned a fresh throwaway object without side effects. If get_metrics() is ever called before configure_metrics(), the cached dummy will cause configure_metrics(force=False) to hit the assert _metrics is None guard and crash.
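One possible shape of a fix, sketched here with stand-in types (the real module may resolve this differently, e.g. by letting `configure_metrics` overwrite a cached dummy): return a throwaway dummy-backed instance instead of caching it, so the `_metrics is None` guard in `configure_metrics` keeps working.

```python
from typing import Any, Optional


class Metrics:
    # Stand-in for the real Metrics wrapper from sentry_streams.
    def __init__(self, backend: Any) -> None:
        self.backend = backend


_dummy_metrics_backend: Any = object()  # stand-in for the real dummy backend
_metrics: Optional[Metrics] = None      # set only by configure_metrics()


def get_metrics() -> Metrics:
    # Do not cache the dummy: a fresh throwaway instance has no side effects,
    # so a later configure_metrics(force=False) still sees _metrics is None.
    if _metrics is None:
        return Metrics(_dummy_metrics_backend)
    return _metrics
```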
Reviewed by Cursor Bugbot for commit f2ca949.
```diff
 def get_metrics() -> Metrics:
-    if _metrics_backend is None:
-        return Metrics(_dummy_metrics_backend)
-    return Metrics(_metrics_backend)
+    global _metrics
+    if _metrics is None:
+        _metrics = Metrics(_dummy_metrics_backend)
+
+    return _metrics
```
Bug: The test fixture reset_metrics_backend was not updated after _metrics_backend was renamed to _metrics, causing state to leak between tests.
Severity: MEDIUM
Suggested Fix
In the reset_metrics_backend fixture located in test_metrics.py, update the line that resets the global variable to target the new name. Change metrics_module._metrics_backend = None to metrics_module._metrics = None.
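A sketch of the updated fixture, assuming it is an autouse pytest fixture (the decorator, teardown, and import path are inferred from the location above and may differ from the real test file):

```python
import pytest

from sentry_streams.metrics import metrics as metrics_module


@pytest.fixture(autouse=True)
def reset_metrics_backend():
    # Reset the renamed global before and after each test so a Metrics
    # instance cached by one test cannot leak into the next.
    metrics_module._metrics = None
    yield
    metrics_module._metrics = None
```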
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid.

Location: sentry_streams/sentry_streams/metrics/metrics.py#L474-L479

Potential issue: The global variable `_metrics_backend` was renamed to `_metrics`, but the test fixture `reset_metrics_backend` was not updated to reflect this change. The fixture continues to reset the old, non-existent `_metrics_backend` variable, causing the state of the new `_metrics` variable to leak between tests. This leads to test failures, particularly in functions like `configure_metrics()` which assert that `_metrics` is `None` before initialization. The broken test fixture compromises the integrity of the test suite, as tests will run with stale data from previous runs.
Replaced by #305


Recording metrics at every step is still having a major impact on throughput.
In this profile we ran the protobuf parsing of a message + metrics + message instantiation 1M times:
3/4 of the time is spent recording metrics in the buffered backend.
This does not include sending the metrics, just the buffering logic.
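For reference, this is the kind of buffering-only cost being measured; below is a rough, self-contained stand-in harness (hypothetical backend and metric names, not the actual profiling code from this PR):

```python
import time


class BufferedMetricsBackend:
    # Stand-in buffered backend: aggregates counters in a dict, never flushes.
    def __init__(self) -> None:
        self.buffer: dict = {}

    def increment(self, name: str, value: float, tags: dict) -> None:
        key = (name, tuple(sorted(tags.items())))
        self.buffer[key] = self.buffer.get(key, 0.0) + value


backend = BufferedMetricsBackend()
N = 1_000_000

start = time.perf_counter()
for _ in range(N):
    # Per-message metric recording, excluding any network send.
    backend.increment("output_messages", 1.0, {"step": "parse"})
    backend.increment("output_bytes", 128.0, {"step": "parse"})
elapsed = time.perf_counter() - start

print(f"{N} iterations in {elapsed:.2f}s ({elapsed / N * 1e6:.2f} µs/message)")
```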
This is not unexpected:
In this PR: