Handle k6 exit codes #75

Open
b0nete opened this issue Sep 27, 2021 · 13 comments
Labels: bug, evaluation needed, PLZ

Comments

@b0nete

b0nete commented Sep 27, 2021

Hi, I'm executing load tests in my Kubernetes cluster, but I have a problem when tests fail.

I need tests to be executed only once: whether they succeed or fail, they should not be executed again.
Currently, if a test runs OK it is not executed again, but if a test threshold fails, a starter container is automatically created and launches another pod to try to run the test again.

I leave my config files here. I tried to set abortOnFail in the thresholds and to use the abortTest() function, but the problem persists.
I think it is k6-operator behaviour; maybe you can help me.

This is my test file.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: k6-test
  namespace: k6-operator-system
data:
  test.js: |
    import http from 'k6/http';
    import { Rate } from 'k6/metrics';
    import { check, sleep, abortTest } from 'k6';

    const failRate = new Rate('failed_requests');

    export let options = {
      stages: [
        { target: 1, duration: '1s' },
        { target: 0, duration: '1s' },
      ],
      thresholds: {
        failed_requests: [{threshold: 'rate<=0', abortOnFail: true}],
        http_req_duration: [{threshold: 'p(95)<1', abortOnFail: true}],
      },
    };

    export default function () {
      const result = http.get('http://test/login/');
      check(result, {
        'http response status code is 200': result.status === 500,
      });
      failRate.add(result.status !== 200);
      sleep(1);
      abortTest();
    }
```

And this is my k6 definition.

```yaml
apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: k6-sample
  namespace: k6-operator-system
spec:
  parallelism: 1
  script:
    configMap:
      name: k6-test
      file: test.js
  arguments: --out influxdb=http://influxdb.influxdb:8086/test
  scuttle:
    enabled: "false"
```

I hope you can help me, thanks!

@knechtionscoding
Contributor

So, I think this is because k6 exits with a non-zero exit code, and so the k6-operator will try to keep it going until it succeeds.

We could probably add that to the CRD as an option, e.g. restart: never, and have k6-operator interpret that.
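A rough sketch of what that option could look like on the K6 resource (the `restart` field below is hypothetical, not part of the current CRD):

```yaml
apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: k6-sample
spec:
  parallelism: 1
  # hypothetical field: never recreate runner pods, regardless of the k6 exit code
  restart: never
  script:
    configMap:
      name: k6-test
      file: test.js
```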

@yorugac
Collaborator

yorugac commented Dec 20, 2021

@b0nete thanks for opening the issue!

I agree with @knechtionscoding that this happens because of the non-zero exit of k6 run. It seems that the number of completions for the k8s Job is 1 by default, so the operator expects at least one successful exit. Another curious thing is that I don't actually observe multiple test runs when I try this scenario: the 1st runner fails with a non-zero exit, then the 2nd runner is created and gets stuck in the "paused" state. This likely happens because the 1st starter finished successfully and the operator doesn't have any additional logic for this case: no 2nd starter is created and the 2nd runner waits indefinitely to be started.
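For reference, the retry behaviour follows from the Kubernetes Job defaults. A simplified sketch of a runner Job with those defaults spelled out (illustrative only, not the exact manifest the operator generates):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: k6-sample-1
spec:
  completions: 1    # default: the Job needs one pod to finish with exit code 0
  backoffLimit: 6   # default: failed pods are replaced up to 6 times before the Job fails
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: k6
          image: grafana/k6:latest   # illustrative image
          args: ["run", "/test/test.js"]
```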

IMO, this shouldn't be the default behavior: if thresholds fail, it is a reason for someone to look into the SUT and the script and figure out what to do with that. So k6-operator shouldn't be restarting any pods on failing thresholds 🤔

@yorugac yorugac added the bug and evaluation needed labels Dec 20, 2021
@yorugac
Collaborator

yorugac commented Dec 21, 2021

Looking at https://github.com/grafana/k6/blob/master/errext/exitcodes/codes.go:

| k6 error | exit code | meaning in k6-operator context | restart the runner? | is startup-only error? |
| --- | --- | --- | --- | --- |
| CloudTestRunFailed | 97 | this error should never happen in k6-operator | no | - |
| CloudFailedToGetProgress | 98 | this error should never happen in k6-operator | no | - |
| ThresholdsHaveFailed | 99 | regular error, action is to be determined by user | no | - |
| SetupTimeout | 100 | regular error, likely the script or configuration needs to be reviewed | no | - |
| TeardownTimeout | 101 | regular error, likely the script or configuration needs to be reviewed | no | - |
| GenericTimeout | 102 | regular error, likely the script or configuration needs to be reviewed | no | - |
| GenericEngine | 103 | something going wrong in k6 setup and must be investigated | no | |
| InvalidConfig | 104 | regular error, test config should be reviewed | no | - |
| ExternalAbort | 105 | os.Interrupt, SIGINT or SIGTERM are regular errors but everything else should never happen in k6-operator | yes* | no |
| CannotStartRESTAPI | 106 | runner cannot be started without working REST | yes | yes |
| ScriptException | 107 | regular error, script must be reviewed | no | - |
| ScriptAborted | 108 | regular error, script must be reviewed | no | - |

\* unless there is a point in a restart on SIGINT and SIGTERM specifically? Other cases of ExternalAbort happen in k6 cloud execution, which is not used in the operator. During k6 run, ExternalAbort implies interrupts, SIGINTs and SIGTERMs.

EDIT 17 Feb: updated the table with Simme's input and additional info.

@simskij
Contributor

simskij commented Dec 26, 2021

  • CannotStartRESTAPI should probably lead to a reschedule, as this is likely caused by networking issues on the cluster node.
  • ExternalAbort is also (most) likely to happen due to timing/scheduling issues because of pod eviction policies being triggered, and there is a pretty high chance that rescheduling the job would resolve that.

Do note that I use the term reschedule rather than restart, though. Restarting the exact same pod would likely lead to another failure, but allowing k8s to destroy the pod and reschedule it (preferably even to another node) might not.
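In Job terms, that distinction roughly maps to the following (an illustrative sketch, not operator output): with `restartPolicy: Never`, the kubelet never restarts the failed container in place; if the backoff limit allows it, the Job controller creates a fresh pod, which the scheduler may place on another node.

```yaml
spec:
  backoffLimit: 1          # allow one replacement pod for reschedulable failures
  template:
    spec:
      restartPolicy: Never # failed pods are replaced with new ones, not restarted in place
```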

@yorugac
Collaborator

yorugac commented Jan 5, 2022

  • CannotStartRESTAPI should probably lead to a reschedule, as this likely is caused by networking issues on the cluster node.

Good point! There should be a limit to the number of such restarts though.

@yorugac
Collaborator

yorugac commented Feb 18, 2022

In PR #86, the backoff limit for runner jobs was set to 0: that disables all restarts, no matter the exit code. It's a partial solution to this issue. Cases where there should be a restart (as noted in the comments above) should be solved separately.
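For context, that corresponds to the runner Job being created with (simplified):

```yaml
spec:
  backoffLimit: 0   # the Job fails after the first non-zero exit; no replacement pods are created
```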

@jsravn

jsravn commented Mar 30, 2022

Any progress on this? It blocks usage of the operator for me, unfortunately. I'm thinking that, as a workaround, I could patch the job after the operator creates it.
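A sketch of that workaround, assuming a runner Job named `k6-sample-1` in the `k6-operator-system` namespace (both names taken from the manifests above); note that some Job spec fields are immutable after creation, so the patch may be rejected depending on the Kubernetes version:

```yaml
# backoff-patch.yaml, applied with e.g.:
#   kubectl -n k6-operator-system patch job k6-sample-1 --patch-file backoff-patch.yaml
spec:
  backoffLimit: 0
```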

@yorugac
Collaborator

yorugac commented Mar 31, 2022

Hi @jsravn, as described in the last comment before yours, this was partially fixed in 0cdcc9d as part of PR #86. I expected that PR to be merged by now, but it's being delayed due to other issues 😞

I'll pull out this specific commit with the backoff change tomorrow so that it can be merged into the main branch independently of #86. Please watch for the updates 🙂

@mhaddon

mhaddon commented Apr 28, 2022

Was this merged up? @yorugac

@yorugac
Collaborator

yorugac commented Apr 29, 2022

@mhaddon yes, the fix is in main: 2780355
So the latest image from the main branch contains it.

@mhaddon

mhaddon commented Apr 29, 2022

What image is that? Because I tried v0.0.7rc4 (https://github.com/grafana/k6-operator/tree/v0.0.7rc4/config/default) and it doesn't have it.

ghcr.io/grafana/operator:latest

Or do I build it myself?

@yorugac
Collaborator

yorugac commented Apr 29, 2022

No, you don't need to build it; it's present with the commit as the tag:
ghcr.io/grafana/operator:278035580ffaa523b1a62f02e801fe7e35c7c5ab
You can find all the images built for the operator on this page:
https://github.com/grafana/k6-operator/pkgs/container/operator
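If you deploy via kustomize, pinning that image could look roughly like this (a sketch; the remote base reference and image override are assumptions about your setup, not documented operator instructions):

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - github.com/grafana/k6-operator/config/default
images:
  - name: ghcr.io/grafana/operator
    newTag: 278035580ffaa523b1a62f02e801fe7e35c7c5ab
```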

@yorugac
Collaborator

yorugac commented Mar 17, 2023

Connected issue in k6: grafana/k6#2804

@yorugac yorugac changed the title from "Avoid test be executed again when it fails." to "Handle k6 exit codes" Apr 25, 2023
@yorugac yorugac added the PLZ label Apr 27, 2023