
Add new errext.AbortReason type, move and use old RunStatus type only for k6 cloud code #2810

Merged · 2 commits into master · Jan 24, 2023

Conversation

@na-- (Member) commented Dec 7, 2022

This is built on top of #2807 and is a part of #1889 and part of #2430

RunStatus is very specific to the k6 cloud and doesn't map perfectly to how k6 OSS sees the causes of prematurely stopped tests. So, this PR restricts the usage of RunStatus to the cloudapi/ package and to the k6 cloud command handling in cmd/cloud.go.

It does that by introducing a new type, errext.AbortReason, and a way to attach this type to test run errors. This allows cleaner error propagation to the outputs, and every output can map these generic values to its own internal ones however it wants. That mapping doesn't need to be exactly 1:1 and, indeed, as you can see, the errext.AbortReason:cloudapi.RunStatus mapping in cloudapi/ is not 1:1.

@olegbespalov (Collaborator) left a comment

LGTM 👍, left a minor comment

output/cloud/output.go (outdated; resolved)
@codecov-commenter commented Dec 14, 2022

Codecov Report

Merging #2810 (6d515f5) into master (41472d1) will increase coverage by 0.01%.
The diff coverage is 88.05%.

❗ Current head 6d515f5 differs from pull request most recent head 2955dcc. Consider uploading reports for the commit 2955dcc to get more accurate results

@@            Coverage Diff             @@
##           master    #2810      +/-   ##
==========================================
+ Coverage   76.87%   76.88%   +0.01%     
==========================================
  Files         217      218       +1     
  Lines       16619    16650      +31     
==========================================
+ Hits        12776    12802      +26     
- Misses       3039     3041       +2     
- Partials      804      807       +3     
Flag Coverage Δ
ubuntu 76.76% <88.05%> (-0.04%) ⬇️
windows 76.63% <88.05%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
js/timeout_error.go 52.94% <0.00%> (-3.31%) ⬇️
lib/testutils/mockoutput/mockoutput.go 41.66% <ø> (-4.49%) ⬇️
output/manager.go 79.16% <70.00%> (+4.16%) ⬆️
errext/abort_reason.go 75.00% <75.00%> (ø)
output/cloud/output.go 85.59% <91.30%> (+0.18%) ⬆️
cloudapi/api.go 52.08% <100.00%> (ø)
cmd/cloud.go 71.42% <100.00%> (ø)
cmd/run.go 72.22% <100.00%> (+0.52%) ⬆️
core/engine.go 94.33% <100.00%> (-0.18%) ⬇️
errext/interrupt_error.go 71.42% <100.00%> (+4.76%) ⬆️
... and 6 more


olegbespalov
olegbespalov previously approved these changes Dec 14, 2022
@imiric (Contributor) left a comment

Moving RunStatus to cloudapi makes sense, but I have some suggestions and questions about the other changes.

Comment on lines 314 to 332
case errext.AbortedByThresholdsAfterTestEnd:
// The test finished normally, it wasn't prematurely aborted. Such
// failures are tracked by the restult_status, not the run_status
// (so called "tainted" in some places of the API here).
return cloudapi.RunStatusFinished
@imiric (Contributor)

It's strange to have AbortedByThresholdsAfterTestEnd be an AbortReason. Like the comment says, the test wasn't prematurely aborted.

So I would remove it from AbortReason, and simply pass nil to this method, which would return RunStatusFinished. I haven't traced the calls back to see how feasible this is, but it would be more intuitive, without having to explain anything here.

This comment also raises the questions: "what's the difference between result_status and run_status?", and "what does 'tainted' mean?". Also, s/restult/result.

@na-- (Member, Author) commented Jan 23, 2023

I improved the comments in be3aa5f, which hopefully clears up some things.

The TL;DR version is that we have a mismatch between k6 OSS and the cloud. k6 OSS needs to have an error here, so it can exit with a non-zero exit code, but the k6 cloud tracks that error differently and needs the normal RunStatusFinished here 😞 Though I am also not super happy with the AbortedByThresholdsAfterTestEnd name, so if anyone has any better suggestions, please comment here:

AbortedByThresholdsAfterTestEnd // TODO: rename?

@imiric (Contributor)

Thanks, but the comment was the least of my concerns here 😅 While a long explanation of this mismatch helps, I'd prefer if we could avoid it.

My main objection is that if thresholds fail after the test ends, the test wasn't aborted. So it's not so much about the naming of AbortedByThresholdsAfterTestEnd, but that it's wrong and confusing, and that it shouldn't be an AbortReason at all. The only reason we use it is so we can easily catch it here, but we can avoid that if we detect this scenario in a different way.

Maybe use a different error altogether? I'm not sure what the best approach would be, but the current situation is confusing even with the long explanation.

@na-- (Member, Author)

but we can avoid this if we detect this scenario in a different way.

The problem is that this will require a very, very substantial refactoring, just for this one single minor mismatch... 😞 And we'd essentially end up needing two different mechanisms for reporting the error to the output, or something like that 🤔

But overall it is a very fair point 🤔 AbortReason is not that great of a name for other reasons, now that I think about it some more, especially if we want to use it for #2810 (comment) 🤔 What do you think about TestRunError as the type name and these as the values:

// TestRunError is used to attach extra information for the cause of a test run failure or interruption.
type TestRunError uint8

const (
	TestRunAbortedByUser TestRunError = iota + 1
	TestRunAbortedByBreachedThresholds
	TestRunThresholdsWereBreached // at the end of the test
	TestRunAbortedByScriptError
	TestRunAbortedByScriptAbort
	TestRunAbortedByTimeout
)

// IsTestRunError is a wrapper around an error with an attached TestRunError.
type IsTestRunError interface {
	error
	TestRunError() TestRunError
}

@imiric (Contributor) commented Jan 23, 2023

🤔 Yeah, TestRunError is a better name than AbortReason, but it seems like we're just prefixing TestRun to make it more generic so we can fit TestRunThresholdsWereBreached 😄

The cloud output already knows if thresholds were breached with this "tainted" check. It could arguably even avoid this check, since the Engine (or whatever we end up moving this to in this series of PRs) already knows this from engine.IsTainted(), and the output could just query it. In the meantime, could it infer the RunStatus from this?

I feel like we're abusing errors as a way to pass information to outputs, and making global changes that are only useful for the cloud output. This could be avoided if we had an API for outputs to query this and other information instead, which is related to the discussion below about StopWithTestError().

If you think this would be a much larger refactor, then let's use TestRunError for now, but we should discuss and improve this soon-ish.

@na-- (Member, Author) commented Jan 23, 2023

Yeah, it will be quite a big refactor, and it will be difficult to make an API that doesn't have timing issues. This querying API will only work once the test has finished and the thresholds have been crunched. But the output will only know that this has happened once its Stop method has been called. So my thinking was - why not pass that information to the Stop method directly anyway? And, thus, WithStopWithTestError was born 😅

Even my original idea from #2430, to have outputs know that they need to stop when their Samples channel is closed, is not very workable... 😞 After we finish the test and run out of samples, we first need to crunch the thresholds and only then stop the rest of the outputs... So yeah, this is complicated and I'd very much like to leave solving it for after we have gotten rid of the Engine, since that will simplify things tremendously

Comment on lines +192 to +195
err = errext.WithAbortReasonIfNone(
errext.WithExitCodeIfNone(errors.New("test run aborted by signal"), exitcodes.ExternalAbort),
errext.AbortedByUser,
)
@imiric (Contributor)

This wrapping is a bit convoluted. It seems we're only wrapping exit code errors in aborted errors because it's convenient to return a single error here. But conceptually, these errors aren't related in this way. If anything, maybe there should be a mapping from AbortReason to ExitCode, which can happen outside of core. After all, why should the Engine be concerned with process exit codes?

I'm not sure I have a concrete improvement suggestion here, but this seems fishy in several ways. 🤔

@na-- (Member, Author)

Great point! Now that this PR adds the new AbortReason type that can accurately represent the internal reason why a k6 test run failed, we can use it for external mappings. The PR does the mapping from AbortReason to cloudapi.RunStatus, but I don't see a reason why we couldn't also have a mapping to ExitCode. We'd probably need to add more AbortReason values, but that should be fine 🤔 In any case, I'd prefer to handle this in a separate PR, so I added a note in #2804 (comment)

Comment on lines +78 to +86
// WithStopWithTestError allows output to receive the error value that the test
// finished with. It could be nil, if the test finished nominally.
//
// If this interface is implemented by the output, StopWithTestError() will
// be called instead of Stop().
//
// TODO: refactor the main interface to use this method instead of Stop()? Or
// something else along the lines of https://github.com/grafana/k6/issues/2430 ?
type WithStopWithTestError interface {
@imiric (Contributor)

I think the proposal in #2430 that outputs should stop when the samples channel is closed would be cleaner, but we would still need a way for them to query the test error, as suggested here.

So WDYT if we add a method to output.Params for this purpose? This would avoid this deprecation of Stop() and eventual breaking change, and only outputs that need it can call it. Eventually, we can deprecate and remove Stop() in favor of the proposal in #2430, but that would be a single breaking change, instead of the two in this case.

An alternative to changing output.Params could be for outputs to subscribe to error events, using the event system suggested for JS extensions in #2432, but I think that's still a long ways off.

@na-- (Member, Author)

We don't have a breaking change here, this is why I added an extra optional interface. All of the existing outputs and output extensions that use the old Stop() method will continue to work. They will just continue not knowing if the test finished with an error or not, which most outputs don't care about anyway.

I am softly against adding a control surface to output.Params, as it is counter-intuitive. But I am open to discussion and willing to be convinced otherwise when we decide to tackle #2430. For now, this solution allows us to delay any breaking changes until we do so, exactly so we can make them only once and not constantly break extensions.

@imiric (Contributor)

We're avoiding a breaking change here by doing this optional expansion of the Output interface, which is IMO a code smell. If we decide to remove Stop() in #2430, then we'll also need to remove StopWithTestError(), hence two eventual breaking changes. I'd rather we avoided adding future work, and reconsider how outputs can query test errors. Essentially, doing this in a way that brings us closer to #2430, not farther.

Adding a method to output.Params was a suggestion, but I also agree not a very intuitive one. Maybe via some other public method?

@na-- (Member, Author)

which is IMO a code smell

Why? I'd consider it a fairly idiomatic Go thing, the stdlib does it all the time 😕

If we decide to remove Stop() in #2430, then we'll also need to remove StopWithTestError(), hence two eventual breaking changes.

It is likely that when we tackle #2430, we will refactor all of the existing interfaces, and that won't be 6 different breaking changes, it will be one big change 😅 And even so, it's likely it will be somewhat graceful, since we will likely have a Go shim to the old Output APIs that will work (with a warning) for a few k6 versions.

In any case, we still don't have a clear idea of how exactly we will do #2430, when exactly we will do it, or even if we will actually do it... So I don't want to block this PR and its follow-ups until we have that figured out. And, FWIW, figuring out how the new Outputs will look should come after we have removed the Engine in #2815, since that changes things in this area a lot, but it also depends on this PR first.

@imiric (Contributor)

OK, let's leave this for later then.

Why? I'd consider it a fairly idiomatic Go thing, the stdlib does it all the time

I don't mean it in the general sense. Composing interfaces is sometimes clean and convenient. But in this case it means that outputs don't just implement output.Output; some outputs also implement WithStopWithTestError, which, besides being awkwardly named, requires runtime type checks wherever this additional functionality is needed. And we're doing all this just to avoid a breaking change, which could be prevented by rethinking how this is done and getting us closer to #2430.

@na-- na-- force-pushed the fix-aborted-by-user-exit-codes branch from 0039acb to 73f67fa Compare January 11, 2023 14:01
Base automatically changed from fix-aborted-by-user-exit-codes to master January 11, 2023 14:22
@na-- na-- force-pushed the refactor-run-status branch 2 times, most recently from 8c360a1 to 4077da3 Compare January 16, 2023 13:24
@codebien (Collaborator) left a comment

test-tip is not happy with the latest push

@olegbespalov (Collaborator) commented

I guess the go-tip is unhappy with the master branch golang/go@62a9948 😢

Output
SetRunStatus(latestStatus lib.RunStatus)
StopWithTestError(testRunErr error) error // nil testRunErr means error-free test run
@codebien (Collaborator)

Suggested change
StopWithTestError(testRunErr error) error // nil testRunErr means error-free test run
StopWithTestError(ctx context.Context, testRunErr error) error // nil testRunErr means error-free test run

I would use this opportunity to introduce a graceful stop. I'm ok with just changing the API here and addressing the graceful stop from the output manager in a dedicated PR after the removal.

@na-- (Member, Author)

hmm, the convention so far has been to always wait for outputs to finish stopping 🤔 that is, unless a user hits Ctrl+C twice to trigger the hard stop with os.Exit(), k6 will always wait for outputs to finish... which makes some sense to me and I am not sure we should change it, since we then will need some way to configure this timeout

so, I can add a context here, but we probably won't ever use it... similar to what I said in #2810 (comment), my vote would be to add a note in #2430 and we can discuss it when we completely refactor the outputs 🤔

@codebien (Collaborator)

hmm, the convention so far has been to always wait for outputs to finish stopping

That doesn't sound aligned with the comment here: #2430 (comment)

This will allow us to set a soft limit for how long outputs are allowed to stop.

I'm ok with moving this to a discussion in #2430.

@na-- na-- requested a review from codebien January 23, 2023 12:03
imiric
imiric previously approved these changes Jan 23, 2023
@codebien (Collaborator) left a comment

LGTM, @na--, I'll wait for the rebase

The run_status constants are something very specific to the k6 cloud, so they shouldn't be part of the main packages.
RunStatus is very specific to the k6 cloud and doesn't map perfectly to how k6 OSS sees the causes of prematurely stopped tests. So, this commit restricts the usage of RunStatus only to the cloudapi/ package and to the `k6 cloud` command handling in `cmd/cloud.go`.

It does that by introducing a new type, `errext.AbortReason`, and a way to attach this type to test run errors. This allows cleaner error propagation to the outputs, and every output can map these generic values to its own internal ones however it wants. That mapping doesn't need to be exactly 1:1 and, indeed, the `errext.AbortReason`:`cloudapi.RunStatus` mapping in cloudapi/ is not 1:1.
@na-- (Member, Author) commented Jan 23, 2023

Rebased

@na-- na-- requested review from codebien and imiric January 23, 2023 15:12
@na-- na-- merged commit a2a5b39 into master Jan 24, 2023
@na-- na-- deleted the refactor-run-status branch January 24, 2023 07:58
5 participants