Tracking: Report user errors #7831

jon-chuang · 2023-02-10T09:36:24Z

Implements: #7824

- Simple report ExprError as compute_error in stream jobs (feat(stream): Report compute_error_count to prometheus #7832)
- Report source errors (parser) (feat(stream): source_error_count reporting to prometheus #7877)
- Suppress errors once the number of unique errors exceeds a limit. Reset the error table every hour. Add a cluster config for max_unique; set it to u64::MAX for tests.
  - Compute (feat(stream): ErrorSuppressor for user compute errors #8132)
  - Stream Source (feat(connector): Refactor metrics, source_info into source_ctx and add ErrorSuppressor for user source errors #8156)
- Warn user for batch source errors (feat(error-reporting): WARN user about batch source errors #8135)
- (Optional?) Define an e2e test that queries prometheus/the prometheus endpoint directly after triggering some user errors (maybe takes too long). Can be part of longevity testing? @lmatz (test(stream): Test reporting of stream errors in e2e setting #8037)
~~- [ ] Truncate errors if blacklisted (Stream Error Truncation Mechanism #7871)~~
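
For illustration, the suppression item above could be sketched roughly as follows. This is not the code from #8132 or #8156: the struct layout, the `suppress_error` method name, and the exact policy (known messages keep being reported, new messages are suppressed once `max_unique` distinct ones are recorded) are assumptions made for the sketch. The current time is passed in explicitly so the hourly reset can be exercised without waiting.

```rust
use std::collections::HashSet;
use std::time::{Duration, Instant};

/// Hypothetical suppressor: remembers up to `max_unique` distinct error
/// messages and forgets all of them once `reset_every` has elapsed.
pub struct ErrorSuppressor {
    max_unique: u64,
    reset_every: Duration,
    last_reset: Instant,
    seen: HashSet<String>,
}

impl ErrorSuppressor {
    pub fn new(max_unique: u64, reset_every: Duration) -> Self {
        Self {
            max_unique,
            reset_every,
            last_reset: Instant::now(),
            seen: HashSet::new(),
        }
    }

    /// Returns `true` if `msg` should be suppressed (assumed policy, see lead-in).
    pub fn suppress_error(&mut self, msg: &str, now: Instant) -> bool {
        // Periodic reset of the error table (every hour in the issue text).
        if now.saturating_duration_since(self.last_reset) >= self.reset_every {
            self.seen.clear();
            self.last_reset = now;
        }
        // Messages that are already tracked are always reported.
        if self.seen.contains(msg) {
            return false;
        }
        // The table is full: suppress any further new message.
        if self.seen.len() as u64 >= self.max_unique {
            return true;
        }
        self.seen.insert(msg.to_owned());
        false
    }
}
```

With `max_unique` set to `u64::MAX`, as suggested for tests, the table never fills and nothing is suppressed.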

Orthogonal:

Improve error messages of user errors so that they are more informative (e.g. show the expression when encountering ExprError; however, this might be quite hard, as the Debug fmt may not be clear for the nested BoxedExpression used in our executors)
Incidentally, we could also report errors that kill an actor to prometheus, and display them in grafana under the same Stream Errors panel (as suggested here: streaming: report actor error #37)


BugenZhao · 2023-02-13T05:04:36Z

> Clean ExprError to UserError

I hope I understand it right: the ExprError itself should be preserved as part of the interface of the expr crate. But for the usages in the stream crate, we can erase the specific variant and represent it as a string.
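
A minimal sketch of that idea, under assumptions: the `ExprError` variants below are stand-ins rather than the real enum from the expr crate, and `UserError` is a hypothetical stream-side type that keeps only the rendered message.

```rust
use std::fmt;

/// Stand-in for the expr crate's error type; the variants are illustrative only.
#[derive(Debug)]
pub enum ExprError {
    DivisionByZero,
    NumericOutOfRange,
    Parse(String),
}

impl fmt::Display for ExprError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ExprError::DivisionByZero => write!(f, "division by zero"),
            ExprError::NumericOutOfRange => write!(f, "numeric out of range"),
            ExprError::Parse(what) => write!(f, "parse error: {}", what),
        }
    }
}

/// Hypothetical stream-side error: the variant is erased, only the string remains.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UserError(String);

impl UserError {
    pub fn message(&self) -> &str {
        &self.0
    }
}

impl From<ExprError> for UserError {
    fn from(e: ExprError) -> Self {
        // Erase the concrete variant by rendering it through `Display`.
        UserError(e.to_string())
    }
}
```

The expr crate keeps its typed interface, while stream code only ever sees the string form via `UserError::from(err)`.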

> Simple report ExprError in stream jobs

Do we report the concrete error message or only the error count? If it's the former, is there a mechanism for persisting (or buffering) the error messages or throttling a bulk of errors (for example, wrong column descriptors lead to failure for every line in the parser)?

jon-chuang · 2023-02-13T06:34:21Z

> Do we report the concrete error message or only the error count?

We will record the concrete error message if it has not been truncated.

We will truncate only if the message is blacklisted (i.e. we do not know if it can have unbounded cardinality) and if the number of unique blacklisted messages recorded exceeds some threshold (e.g. 50).

Our hope is to recover the full, informative error message in the happy path.

EDIT: truncation is no longer planned

jon-chuang added the type/feature label Feb 10, 2023

github-actions bot added this to the release-0.1.17 milestone Feb 10, 2023

jon-chuang mentioned this issue Feb 10, 2023

feat(stream): Report compute_error_count to prometheus #7832

Merged

5 tasks

jon-chuang mentioned this issue Feb 13, 2023

feat(stream): source_error_count reporting to prometheus #7877

Merged

5 tasks

fuyufjh removed this from the release-0.1.17 milestone Feb 20, 2023

fuyufjh assigned jon-chuang Feb 20, 2023

jon-chuang mentioned this issue Feb 23, 2023

feat(stream): ErrorSuppressor for user compute errors #8132

Merged

6 tasks

jon-chuang changed the title ~~Tracking: Report user errors in stream jobs~~ Tracking: Report user errors Feb 23, 2023

jon-chuang mentioned this issue Feb 23, 2023

feat(connector): Refactor metrics, source_info into source_ctx and add ErrorSuppressor for user source errors #8156

Merged

6 tasks

jon-chuang added this to the release-0.1.18 milestone Feb 28, 2023

fuyufjh removed this from the release-0.18 milestone Mar 22, 2023

fuyufjh mentioned this issue Jun 9, 2023

Cannot handle postgres numeric/decimal type with scale unspecified in Postgres Debezium CDC #10247

Closed

fuyufjh closed this as completed Feb 22, 2024

fuyufjh mentioned this issue Feb 22, 2024

Discussion: deprecate the old user-error reporting via Prometheus #15192

Closed
