
err: Unable to Get the chaosengine #2301

Closed
niebomin opened this issue Oct 29, 2020 · 15 comments

Comments

@niebomin
Contributor


My environment is pretty simple. I have an Azure k8s cluster, and I followed this guide https://istio.io/latest/docs/setup/getting-started/ to set up my environment. In other words, the namespace has istio injection enabled.

Litmus is set up, and I was trying to run the pod-delete experiment. The chaos runner pod is created, and the "pod-delete" pod is also created, but the target pod was not deleted. When I looked at the logs with k logs -f pod-delete-4y48y4-ql64c -c pod-delete-4y48y4, I saw this error:

Unable to initialise probes details from chaosengine, err: Unable to Get the chaosengine, err: Get \"https://10.0.0.1:443/apis/litmuschaos.io/v1alpha1/namespaces/default/chaosengines/bookinfo-chaos\": dial tcp 10.0.0.1:443: connect: connection refused

This is my engine yaml, https://drive.google.com/file/d/1HAaMLamHS3BZP6SDNtD_-YO1pdl46vnh/view?usp=sharing

Litmus version is 1.9.0

@ajeshbaby added the area/community-charts and project/community labels Nov 19, 2020
@ispeakc0de
Member

Hi @niebomin,

I am wondering about this failure: the experiment pod is not able to reach the kube-apiserver to get the chaosengine details. Are you facing this issue every time? Try once more, and if the issue persists, you can verify by exec'ing into the pod-delete-xxxx pod and manually running the kubectl get chaosengines bookinfo-chaos -n default command.
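
For example (a quick sketch using the pod/container names from your log snippet above; substitute your own):

# shell into the experiment container
kubectl exec -it pod-delete-4y48y4-ql64c -c pod-delete-4y48y4 -n default -- /bin/sh

# from inside the pod, try fetching the chaosengine directly
kubectl get chaosengines bookinfo-chaos -n default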

@niebomin
Contributor Author

kubectl get chaosengines bookinfo-chaos -n default works fine. I guess it's because istio is not totally ready yet (does that make sense?). Do I need to do anything specific for istio?

@niebomin
Contributor Author

And the thing is, as long as I disable istio auto-injection, it works.

@ksatchit
Member

Interesting, this seems to be reproducible locally too. It has something to do with istio. Need to check this further.

@niebomin
Contributor Author

Thanks. Please...

@ksatchit
Member

ksatchit commented Nov 20, 2020

This seems to be the cause: istio/istio#8696

There are also workarounds suggested here: istio/istio#12187 -- especially look at istio/istio#12187 (comment) (adding a label to not inject istio sidecars specifically to the chaos pods might solve this issue)

The above can be achieved by adding additional labels here: https://github.com/litmuschaos/chaos-charts/blob/12ac9fb80b0b2567657b1676de1187c562f0fdf8/charts/generic/pod-delete/experiment.yaml#L77. These labels are propagated to all experiment pods and helper pods if any.

However, the chaos-runner pod needs an explicit addition of these labels here: https://github.com/litmuschaos/chaos-operator/blob/68d050966fe073377c4e818606506dd951898803/pkg/controller/chaosengine/chaosengine_controller.go#L234

@ksatchit
Member

OK, we need to add the annotation sidecar.istio.io/inject: "false" to the chaos-runner, experiment pod & helpers to prevent sidecar injection and, in turn, avoid this issue in accessing the kube-apiserver. Litmus supports adding custom annotations to the chaos pods via instrumentation in the chaosengine. Please refer to the example below:

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos
  namespace: default
spec:
  appinfo:
    appns: 'default'
    applabel: 'app=nginx'
    appkind: 'deployment'
  annotationCheck: 'true'
  engineState: 'active'
  chaosServiceAccount: pod-delete-sa
  monitoring: false
  components:
    runner: 
      runnerannotation:
        sidecar.istio.io/inject: "false"
  jobCleanUpPolicy: 'delete'
  experiments:
    - name: pod-delete
      spec:
        components:
          experimentannotation: 
            sidecar.istio.io/inject: "false"
          env:
            - name: TOTAL_CHAOS_DURATION
              value: '30'
            - name: CHAOS_INTERVAL
              value: '10'
            - name: FORCE
              value: 'false'
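
Once applied, a quick way to verify the annotation took effect is to check that the chaos pods come up without the istio-proxy sidecar container. A sketch, assuming the usual naming where the runner pod is called <engine-name>-runner (here nginx-chaos-runner):

# list the containers of the chaos-runner pod; istio-proxy should be absent
kubectl get pod nginx-chaos-runner -n default -o jsonpath='{.spec.containers[*].name}'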

@ksatchit
Member

@niebomin can you please try this and let us know?

@niebomin
Contributor Author

This is great. It works around istio and works for me. Thanks!

@niebomin
Contributor Author

niebomin commented Nov 23, 2020

By the way, let me reuse this thread. What does ChaosResult mean? Does Pass just mean the experiment was successfully conducted? Or can I actually add something to verify the system into which the chaos has been injected?

@ksatchit
Member

ksatchit commented Dec 3, 2020

@niebomin apologies for the late response. Somehow missed this question.

The chaosresult is a resource carrying details of the state & verdict of the experiment. Pass typically means that (a) the chaos was injected and (b) the exit checks specified for the experiment passed (by default, we check whether the AUT pods, i.e., the pods of the app under test (those mentioned via the applabel), are in running state and their containers are in ready state).

Having said that, you can insert a lot of "custom" exit checks into the experiment via the Litmus Probes. The Pass verdict is then bound to the conditions you set via probes.
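
For illustration, a minimal sketch of a cmdProbe attached to the pod-delete experiment. The probe name, command, and target URL below are hypothetical placeholders, and the probe schema has evolved across Litmus versions, so please cross-check the probe docs for the version you run:

experiments:
  - name: pod-delete
    spec:
      probe:
        - name: "check-app-endpoint"   # hypothetical probe name
          type: "cmdProbe"
          cmdProbe/inputs:
            # hypothetical check: the app endpoint should still return HTTP 200
            command: "curl -s -o /dev/null -w '%{http_code}' http://my-app.default.svc:80"
            comparator:
              type: "string"
              criteria: "equal"
              value: "200"
          mode: "EOT"                  # evaluate at the end of the test
          runProperties:
            probeTimeout: 5
            interval: 2
            retry: 1

The experiment then passes only if the probe criteria are met, in addition to the default checks.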

A verdict of Fail can mean a couple of things: (a) the exit checks specified couldn't be met, i.e., your app or infra is deemed intolerant to the fault injected; or (b) the chaos injection didn't go through for some reason: (i) a misconfiguration in the CRs or (ii) a bug in the experiment business logic.

The latter, (b)(ii), is increasingly rare as experiments mature. We are also working on adding more static/dynamic schema validation to prevent (b)(i). So, in most cases, a Fail means the app is not resilient.

The chaosresults also have a field called failStep to indicate at what point the failure occurred. This is to aid debugging and help ascertain the reason.
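
For example, to inspect the verdict, probe statuses, and failStep for the run above (a sketch; chaosresult objects are conventionally named <engine-name>-<experiment-name>):

kubectl describe chaosresult bookinfo-chaos-pod-delete -n default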

@ksatchit
Member

ksatchit commented Dec 3, 2020

@niebomin let me welcome you to the litmus slack channel - we live in the Kubernetes Slack workspace and will be happy to see you there. You might get faster responses from community folks & also participate in other interesting discussions around chaos engineering.

You can use this self-service link https://slack.kubernetes.io to register & join the workspace. Once done, you can search and join the #litmus channel.

@niebomin
Contributor Author

niebomin commented Dec 7, 2020

Thanks!! I'm in the channel now.

@pritamdas1309

pritamdas1309 commented Aug 9, 2021

@ksatchit,

I have tried to disable istio inside the chaosengine, but it's not working. Kindly help me.
Version: 1.13.8

I have taken the example above as a reference, but it's not working for me:
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos
  namespace: nginx
spec:
  appinfo:
    appns: 'nginx'
    applabel: 'app=nginx'
    appkind: 'deployment'
  annotationCheck: 'true'
  engineState: 'active'
  auxiliaryAppInfo: ''
  chaosServiceAccount: pod-delete-sa
  components:
    runner:
      runnerannotation:
        sidecar.istio.io/inject: "false"
  jobCleanUpPolicy: 'delete'
  experiments:
    - name: pod-delete
      spec:
        components:
          experimentannotation:
            sidecar.istio.io/inject: "false"
          env:
            - name: TOTAL_CHAOS_DURATION
              value: '30'
            - name: CHAOS_INTERVAL
              value: '10'
            - name: FORCE
              value: 'false'
            - name: PODS_AFFECTED_PERC
              value: ''

@arodindev

Hello, sorry for commenting on this old thread, but I'm facing the exact same issue. I installed Litmus in a namespace with the label istio-injection: enabled. If I remove that label, everything works as expected. @ksatchit, if I understand you correctly, I would have to manually edit the chaosengine manifests and add the sidecar.istio.io/inject: "false" annotation? If so, is there no better way to work around this issue?
