[pkg/ottl] Add ErrorMode configuration option #18187

TylerHelmuth · 2023-01-30T23:53:13Z

Description:
Adds a new configuration option to OTTL parsers that allow configuring ErrorMode.

The valid options at the moment are IgnoreError and PropagateError. When using IgnoreError, OTTL will log the error but not return it to the component executing the Statement. When using PropagateError, the error will be returned.

I have set PropagateError to be the default error mode, following the idea that OTTL should fail openly whenever possible.

Since this PR is only the configuration option and not actual enablement in any implementing components, I have set the OTTL contexts to use PropagateError so that this isn't a breaking change for transformprocessor, filterprocessor, or routingprocessor users. In future PRs I will update those components to allow for configuring this capability. At that point I will set the ottlcontexts to use IgnoreError by default.

This is the first of several PRs I'd like to submit to improve/standardize error handling when using OTTL.

Link to tracking Issue:

Works towards #16519
Related to #17563

Testing:
Added unit tests

Documentation:
I plan to do a large godocs PR once the OTTL API has stabilized.

TylerHelmuth · 2023-01-30T23:53:43Z

/cc @swiatekm-sumo @sumo-drosiek @astencel-sumo

runforesight · 2023-01-30T23:55:19Z

Foresight Summary

Major Impacts

build-and-test-windows duration(5 seconds) has decreased 40 minutes 10 seconds compared to main branch avg(40 minutes 15 seconds).

View More Details

✅ check-links workflow has finished in 43 seconds (1 minute 14 seconds less than `main` branch avg.) and finished at 1st Feb, 2023.

Job	Failed Steps	Tests
changed files	- 🔗	N/A	See Details
check-links	- 🔗	N/A	See Details

✅ telemetrygen workflow has finished in 1 minute 4 seconds (1 minute 29 seconds less than `main` branch avg.) and finished at 1st Feb, 2023.

Job	Failed Steps	Tests
build-dev	- 🔗	N/A	See Details
publish-latest	- 🔗	N/A	See Details
publish-stable	- 🔗	N/A	See Details

✅ prometheus-compliance-tests workflow has finished in 3 minutes 55 seconds (3 minutes 49 seconds less than `main` branch avg.) and finished at 1st Feb, 2023.

Job	Failed Steps	Tests
prometheus-compliance-tests	- 🔗	✅ 21 ❌ 0 ⏭ 0 🔗	See Details

✅ build-and-test workflow has finished in 41 minutes 51 seconds (11 minutes 28 seconds less than `main` branch avg.) and finished at 1st Feb, 2023.

Job	Failed Steps	Tests
unittest-matrix (1.19, internal)	- 🔗	✅ 561 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.18, internal)	- 🔗	✅ 561 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.19, processor)	- 🔗	✅ 1471 ❌ 0 ⏭ 0 🔗	See Details
correctness-metrics	- 🔗	✅ 2 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.18, processor)	- 🔗	✅ 1471 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.19, extension)	- 🔗	✅ 546 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.18, extension)	- 🔗	✅ 546 ❌ 0 ⏭ 0 🔗	See Details
correctness-traces	- 🔗	✅ 17 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.18, receiver-0)	- 🔗	✅ 2577 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.19, receiver-0)	- 🔗	✅ 2577 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.19, exporter)	- 🔗	✅ 2429 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.18, receiver-1)	- 🔗	✅ 1944 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.18, exporter)	- 🔗	✅ 2429 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.19, receiver-1)	- 🔗	✅ 1944 ❌ 0 ⏭ 0 🔗	See Details
integration-tests	- 🔗	✅ 55 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.19, other)	- 🔗	✅ 4668 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.18, other)	- 🔗	✅ 4668 ❌ 0 ⏭ 0 🔗	See Details
setup-environment	- 🔗	N/A	See Details
lint-matrix (receiver-0)	- 🔗	N/A	See Details
lint-matrix (receiver-1)	- 🔗	N/A	See Details
lint-matrix (processor)	- 🔗	N/A	See Details
lint-matrix (exporter)	- 🔗	N/A	See Details
lint-matrix (extension)	- 🔗	N/A	See Details
lint-matrix (internal)	- 🔗	N/A	See Details
lint-matrix (other)	- 🔗	N/A	See Details
check-codeowners	- 🔗	N/A	See Details
check-collector-module-version	- 🔗	N/A	See Details
checks	- 🔗	N/A	See Details
build-examples	- 🔗	N/A	See Details
lint	- 🔗	N/A	See Details
unittest (1.19)	- 🔗	N/A	See Details
unittest (1.18)	- 🔗	N/A	See Details
cross-compile (darwin, amd64)	- 🔗	N/A	See Details
cross-compile (darwin, arm64)	- 🔗	N/A	See Details
cross-compile (linux, 386)	- 🔗	N/A	See Details
cross-compile (linux, amd64)	- 🔗	N/A	See Details
cross-compile (linux, arm)	- 🔗	N/A	See Details
cross-compile (linux, arm64)	- 🔗	N/A	See Details
cross-compile (linux, ppc64le)	- 🔗	N/A	See Details
cross-compile (windows, 386)	- 🔗	N/A	See Details
cross-compile (windows, amd64)	- 🔗	N/A	See Details
build-package (deb)	- 🔗	N/A	See Details
build-package (rpm)	- 🔗	N/A	See Details
windows-msi	- 🔗	N/A	See Details
publish-check	- 🔗	N/A	See Details
publish-dev	- 🔗	N/A	See Details
publish-stable	- 🔗	N/A	See Details

✅ load-tests workflow has finished in 9 minutes 42 seconds (4 minutes 51 seconds less than `main` branch avg.) and finished at 1st Feb, 2023.

Job	Failed Steps	Tests
loadtest (TestIdleMode)	- 🔗	✅ 1 ❌ 0 ⏭ 0 🔗	See Details
loadtest (TestTraceAttributesProcessor)	- 🔗	✅ 3 ❌ 0 ⏭ 0 🔗	See Details
loadtest (TestMetricResourceProcessor\|TestTrace10kSPS)	- 🔗	✅ 12 ❌ 0 ⏭ 0 🔗	See Details
loadtest (TestTraceBallast1kSPSWithAttrs\|TestTraceBallast1kSPSAddAttrs)	- 🔗	✅ 10 ❌ 0 ⏭ 0 🔗	See Details
loadtest (TestMetric10kDPS\|TestMetricsFromFile)	- 🔗	✅ 6 ❌ 0 ⏭ 0 🔗	See Details
loadtest (TestTraceNoBackend10kSPS\|TestTrace1kSPSWithAttrs)	- 🔗	✅ 8 ❌ 0 ⏭ 0 🔗	See Details
loadtest (TestBallastMemory\|TestLog10kDPS)	- 🔗	✅ 19 ❌ 0 ⏭ 0 🔗	See Details
setup-environment	- 🔗	N/A	See Details

⭕ changelog workflow has finished in 4 seconds (2 minutes 2 seconds less than `main` branch avg.) and finished at 3rd Feb, 2023.

Job	Failed Steps	Tests
changelog	- 🔗	N/A	See Details

⭕ build-and-test-windows workflow has finished in 5 seconds (40 minutes 10 seconds less than `main` branch avg.) and finished at 3rd Feb, 2023.

Job	Failed Steps	Tests
windows-unittest-matrix	- 🔗	N/A	See Details
windows-unittest	- 🔗	N/A	See Details

🔎 See details on Foresight

^{*You can configure Foresight comments in your organization settings page.}

swiatekm-sumo · 2023-01-31T18:02:54Z

I'm not sure I like the names. My immediate assumption for ErrorMode set to Drop was that it would drop data on error, and that Send would send the data ahead.

Instead of Drop, I like Ignore, as I think it's impossible to misinterpret in this context. The other option is a more difficult, maybe something like Propagate?

TylerHelmuth · 2023-01-31T19:00:34Z

@swiatekm-sumo we could also consider naming the configuraiton on_error, which is what pkg/stanza uses.

evan-bradley · 2023-01-31T19:10:31Z

I agree that we should reconsider the names for Drop and Send, since these verbs could also apply to telemetry. I could see it as meaning "on error, drop/send telemetry", which would be the opposite behavior of "on error, drop/send the error".

One idea could be to make the terms more explicit, such as DropErrors and SendErrors. Granted, on_error: drop_errors in the configuration feels a little redundant. Other possible terms could be "fatal" and "nonfatal" or "stop execution" and "continue execution". I think "ignore" and "propagate" are good options as well.

swiatekm-sumo · 2023-01-31T21:43:12Z

One idea could be to make the terms more explicit, such as DropErrors and SendErrors. Granted, on_error: drop_errors in the configuration feels a little redundant. Other possible terms could be "fatal" and "nonfatal" or "stop execution" and "continue execution". I think "ignore" and "propagate" are good options as well.

I don't think on_error: drop_error is even that bad. Definitely better than on_error: drop, which again makes it sound like we're dropping data, and this is exactly how stanza behaves. I'd personally be fine with stanza's terminology, though I still think ignore is the less ambiguous word in this context.

TylerHelmuth · 2023-01-31T23:26:05Z

this is exactly how stanza behaves.

whoops, I shoulda looked at stanza a little closer. I'll update to ignore and propagate.

bogdandrutu · 2023-02-01T19:37:26Z

pkg/ottl/parser.go

+		if s.errorMode == PropagateError {
+			return nil, false, err
+		}
+		s.telemetrySettings.Logger.Error("error executing condition", zap.Error(err))
+		return nil, false, nil


This to me looks like unnecessary here. The caller can decide what to do with the error. If a statement fail at any point there is not like we can continue executing that statement, so we need to stop and I would say return an error.

The caller of Execute can determine what they want to do.

I fail to understand where this extra code is useful for a caller?

I would see this useful in a "type Statements struct {statements []Statement}" that executes multiple statements to ignore the error from a previous statement and continue with next one.

@bogdandrutu I believe main advantage of adding error handling like this to the OTTL is that is allows us to set some expectations when using OTTL.

By including error handling logic to OTTL it becomes much easier to enforce the concept of "open erroring" within the package itself, which then allows us to make more exact statements about how functions should handle errors. By including this type of error handling we can make a statement like "functions should always return any failed type assertions as errors", while ensuring that OTTL will gracefully handle the error.

It also has the benefit of providing a default error handling strategy for components that use OTTL. If they want to always keep processing, they don't have to do anything extra.

We don't provide a type Statements struct {statements []Statement} struct, but I can see how this error handling fits nicely with that struct. I'll play with that as part of this PR and update the implementing components to use it in a future PR.

A struct like that would help with the future problem I foresee where an implementing component, like the transformprocessor, wants to configure not only the ability to skip to the next statement on error, but also skip to the next TransformContext and halting (#16519 (comment))

@bogdandrutu add new Statements struct and moved the error handling there. In a followup PR I''l update ParseStatements to return a Statements struct. This will work for all of the components in contrib that use OTTL except for routingprocessor. For the routingprocessor, which only needs to execute a single statement and wants the results, condition, and error, I'll add a new ParseStatement function that just returns a Statement. What do you think?

Since Parser will end up with the ability to parse a list of statements and a single statement, I think ErrorMode will be more appropriate as an arg/option on ParseStatements instead of the whole Parser struct. But I could see either being ok.

If anyone is curious, I played around with the result and I like the implications: a6c2f19

pkg/ottl/parser.go

bogdandrutu

I like the direction. Some unknowns for me are how you intend to let user get access to a Statements instance.

TylerHelmuth added 2 commits January 30, 2023 16:39

Add ErrorMode configuration option to OTTL

043b84f

Use Send by default

a129402

TylerHelmuth requested review from a team, bogdandrutu and evan-bradley as code owners January 30, 2023 23:53

github-actions bot assigned Aneurysm9 Jan 30, 2023

github-actions bot added the pkg/ottl label Jan 30, 2023

github-actions bot requested a review from kentquirk January 30, 2023 23:53

TylerHelmuth added 4 commits January 30, 2023 16:55

Add changelog entry

d903680

Save reordering for separate PR

157fcf3

fix lint

30d24d0

Merge branch 'open-telemetry:main' into ottl-error-handling

6752711

Update naming

d73dc24

swiatekm-sumo approved these changes Feb 1, 2023

View reviewed changes

evan-bradley approved these changes Feb 1, 2023

View reviewed changes

bogdandrutu reviewed Feb 1, 2023

View reviewed changes

TylerHelmuth added 2 commits February 1, 2023 13:53

Move error handling to Statements

2fb56c4

Remove ErrorMode from the parser struct

7be8752

TylerHelmuth requested review from bogdandrutu, evan-bradley and swiatekm-sumo February 1, 2023 21:05

remove changelog entry

723b391

TylerHelmuth added the Skip Changelog PRs that do not require a CHANGELOG.md entry label Feb 1, 2023

github-actions bot removed the request for review from swiatekm-sumo February 1, 2023 21:06

bogdandrutu approved these changes Feb 2, 2023

View reviewed changes

TylerHelmuth added the ready to merge Code review completed; ready to merge by maintainers label Feb 3, 2023

bogdandrutu merged commit f5e42bf into open-telemetry:main Feb 3, 2023

TylerHelmuth deleted the ottl-error-handling branch February 3, 2023 19:30

TylerHelmuth mentioned this pull request Feb 6, 2023

[pkg/ottl] Add NewStatements to each context #18385

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[pkg/ottl] Add ErrorMode configuration option #18187

[pkg/ottl] Add ErrorMode configuration option #18187

TylerHelmuth commented Jan 30, 2023 •

edited

Loading

TylerHelmuth commented Jan 30, 2023

runforesight bot commented Jan 30, 2023 •

edited

Loading

`build-and-test-windows` duration(5 seconds) has decreased 40 minutes 10 seconds compared to main branch avg(40 minutes 15 seconds).

✅ check-links workflow has finished in 43 seconds (1 minute 14 seconds less than `main` branch avg.) and finished at 1st Feb, 2023.

✅ telemetrygen workflow has finished in 1 minute 4 seconds (1 minute 29 seconds less than `main` branch avg.) and finished at 1st Feb, 2023.

✅ prometheus-compliance-tests workflow has finished in 3 minutes 55 seconds (3 minutes 49 seconds less than `main` branch avg.) and finished at 1st Feb, 2023.

✅ build-and-test workflow has finished in 41 minutes 51 seconds (11 minutes 28 seconds less than `main` branch avg.) and finished at 1st Feb, 2023.

✅ load-tests workflow has finished in 9 minutes 42 seconds (4 minutes 51 seconds less than `main` branch avg.) and finished at 1st Feb, 2023.

⭕ changelog workflow has finished in 4 seconds (2 minutes 2 seconds less than `main` branch avg.) and finished at 3rd Feb, 2023.

⭕ build-and-test-windows workflow has finished in 5 seconds (40 minutes 10 seconds less than `main` branch avg.) and finished at 3rd Feb, 2023.

swiatekm-sumo commented Jan 31, 2023

TylerHelmuth commented Jan 31, 2023

evan-bradley commented Jan 31, 2023

swiatekm-sumo commented Jan 31, 2023

TylerHelmuth commented Jan 31, 2023

bogdandrutu Feb 1, 2023

bogdandrutu Feb 1, 2023

TylerHelmuth Feb 1, 2023

TylerHelmuth Feb 1, 2023

TylerHelmuth Feb 1, 2023 •

edited

Loading

TylerHelmuth Feb 1, 2023

bogdandrutu left a comment

[pkg/ottl] Add ErrorMode configuration option #18187

[pkg/ottl] Add ErrorMode configuration option #18187

Conversation

TylerHelmuth commented Jan 30, 2023 • edited Loading

TylerHelmuth commented Jan 30, 2023

runforesight bot commented Jan 30, 2023 • edited Loading

Foresight Summary

build-and-test-windows duration(5 seconds) has decreased 40 minutes 10 seconds compared to main branch avg(40 minutes 15 seconds).

✅ check-links workflow has finished in 43 seconds (1 minute 14 seconds less than main branch avg.) and finished at 1st Feb, 2023.

✅ telemetrygen workflow has finished in 1 minute 4 seconds (1 minute 29 seconds less than main branch avg.) and finished at 1st Feb, 2023.

✅ prometheus-compliance-tests workflow has finished in 3 minutes 55 seconds (3 minutes 49 seconds less than main branch avg.) and finished at 1st Feb, 2023.

✅ build-and-test workflow has finished in 41 minutes 51 seconds (11 minutes 28 seconds less than main branch avg.) and finished at 1st Feb, 2023.

✅ load-tests workflow has finished in 9 minutes 42 seconds (4 minutes 51 seconds less than main branch avg.) and finished at 1st Feb, 2023.

⭕ changelog workflow has finished in 4 seconds (2 minutes 2 seconds less than main branch avg.) and finished at 3rd Feb, 2023.

⭕ build-and-test-windows workflow has finished in 5 seconds (40 minutes 10 seconds less than main branch avg.) and finished at 3rd Feb, 2023.

swiatekm-sumo commented Jan 31, 2023

TylerHelmuth commented Jan 31, 2023

evan-bradley commented Jan 31, 2023

swiatekm-sumo commented Jan 31, 2023

TylerHelmuth commented Jan 31, 2023

bogdandrutu Feb 1, 2023

Choose a reason for hiding this comment

bogdandrutu Feb 1, 2023

Choose a reason for hiding this comment

TylerHelmuth Feb 1, 2023

Choose a reason for hiding this comment

TylerHelmuth Feb 1, 2023

Choose a reason for hiding this comment

TylerHelmuth Feb 1, 2023 • edited Loading

Choose a reason for hiding this comment

TylerHelmuth Feb 1, 2023

Choose a reason for hiding this comment

bogdandrutu left a comment

Choose a reason for hiding this comment

TylerHelmuth commented Jan 30, 2023 •

edited

Loading

runforesight bot commented Jan 30, 2023 •

edited

Loading

`build-and-test-windows` duration(5 seconds) has decreased 40 minutes 10 seconds compared to main branch avg(40 minutes 15 seconds).

✅ check-links workflow has finished in 43 seconds (1 minute 14 seconds less than `main` branch avg.) and finished at 1st Feb, 2023.

✅ telemetrygen workflow has finished in 1 minute 4 seconds (1 minute 29 seconds less than `main` branch avg.) and finished at 1st Feb, 2023.

✅ prometheus-compliance-tests workflow has finished in 3 minutes 55 seconds (3 minutes 49 seconds less than `main` branch avg.) and finished at 1st Feb, 2023.

✅ build-and-test workflow has finished in 41 minutes 51 seconds (11 minutes 28 seconds less than `main` branch avg.) and finished at 1st Feb, 2023.

✅ load-tests workflow has finished in 9 minutes 42 seconds (4 minutes 51 seconds less than `main` branch avg.) and finished at 1st Feb, 2023.

⭕ changelog workflow has finished in 4 seconds (2 minutes 2 seconds less than `main` branch avg.) and finished at 3rd Feb, 2023.

⭕ build-and-test-windows workflow has finished in 5 seconds (40 minutes 10 seconds less than `main` branch avg.) and finished at 3rd Feb, 2023.

TylerHelmuth Feb 1, 2023 •

edited

Loading