Litmus Chaos Exporter


  • This is a custom Prometheus and CloudWatch exporter that exposes Litmus Chaos metrics. To learn more about Litmus Chaos Experiments and the Litmus Chaos Operator, see the Litmus Docs.

  • It is typically deployed alongside the chaos-operator deployment, which in turn is associated with all ChaosResults in the cluster.

  • Two types of metrics are exposed:

    • AggregateMetrics: These metrics are derived from all the ChaosResults present in WATCH_NAMESPACE. If WATCH_NAMESPACE is not defined, they are derived from the ChaosResults in all namespaces. They expose the total_passed_experiment, total_failed_experiment, total_awaited_experiment, experiment_run_count, and experiment_installed_count metrics.

    • ExperimentScoped: These metrics report the status of an individual experiment run. They expose the passed_experiment, failed_experiment, awaited_experiment, probe_success_percentage, startTime, endTime, totalDuration, and chaosInjectTime metrics.

ExperimentScoped Metrics

| Metrics Name | litmuschaos_passed_experiments |
| --- | --- |
| Description | Total number of passed experiments |
| Source | ChaosResult |
| Sample Metrics | `litmuschaos_passed_experiments{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 1` |
| Notes | The cumulative sum of passed runs for the given ChaosResult. |

| Metrics Name | litmuschaos_failed_experiments |
| --- | --- |
| Description | Total number of failed experiments |
| Source | ChaosResult |
| Sample Metrics | `litmuschaos_failed_experiments{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 0` |
| Notes | The cumulative sum of failed runs for the given ChaosResult. |

| Metrics Name | litmuschaos_awaited_experiments |
| --- | --- |
| Description | Total number of awaited experiments |
| Source | ChaosResult |
| Sample Metrics | `litmuschaos_awaited_experiments{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 1` |
| Notes | Denotes the queued experiments for each ChaosResult. Its value is 1 if the ChaosResult's verdict is Awaited, and 0 otherwise. |

| Metrics Name | litmuschaos_probe_success_percentage |
| --- | --- |
| Description | ProbeSuccessPercentage for the experiment |
| Source | ChaosResult |
| Sample Metrics | `litmuschaos_probe_success_percentage{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 100` |
| Notes | The percentage of passed probes out of the total probes defined in the ChaosEngine. |

| Metrics Name | litmuschaos_experiment_start_time |
| --- | --- |
| Description | Start time of the experiment |
| Source | ExperimentDependencyCheck event inside the ChaosEngine |
| Sample Metrics | `litmuschaos_experiment_start_time{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 1.618425155e+09` |
| Notes | The start time of the experiment, calculated from the ExperimentDependencyCheck event (created by the chaos-runner just before it launches the experiment pod). |

| Metrics Name | litmuschaos_experiment_end_time |
| --- | --- |
| Description | End time of the experiment |
| Source | Summary event inside the ChaosEngine |
| Sample Metrics | `litmuschaos_experiment_end_time{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 1.618425219e+09` |
| Notes | The end time of the experiment, calculated from the Summary event (created by the experiment pod at the end of the experiment). |

| Metrics Name | litmuschaos_experiment_chaos_injected_time |
| --- | --- |
| Description | Chaos injection time of the experiment |
| Source | ChaosInject event inside the ChaosEngine |
| Sample Metrics | `litmuschaos_experiment_chaos_injected_time{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 1.618425199e+09` |
| Notes | The time at which chaos is actually injected, calculated from the ChaosInject event (created by the experiment/helper pod just before chaos injection). |

| Metrics Name | litmuschaos_experiment_total_duration |
| --- | --- |
| Description | Total chaos duration of the experiment |
| Source | Time difference between startTime and endTime |
| Sample Metrics | `litmuschaos_experiment_total_duration{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 64` |
| Notes | The total chaos duration of the experiment: the interval, in seconds, between the start time and the end time. |

| Metrics Name | litmuschaos_experiment_verdict |
| --- | --- |
| Description | Experiment verdict details |
| Source | ChaosResult |
| Sample Metrics | `litmuschaos_experiment_verdict{app_kind="deployment",app_label="run=nginx",app_namespace="nginx",chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus",chaosresult_verdict="Pass",probe_success_percentage="100.000000"} 1` |
| Notes | The value is set from the ChaosResult verdict. For an Awaited verdict it is always 0. For any other verdict it is 1. If the same verdict is repeated for longer than TSDB_SCRAPE_INTERVAL (passed as an ENV), the value is set to 0 until the verdict changes to a different value. |
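
The relationship between the three timestamp metrics and litmuschaos_experiment_total_duration can be checked by hand. The sketch below assumes the values are Unix epoch seconds, which is consistent with the sample total duration of 64. The function and key names are hypothetical and are not part of the exporter.

```python
from datetime import datetime, timezone

# Sample values copied from the metric samples above (assumed Unix epoch seconds).
start_time = 1.618425155e+09      # litmuschaos_experiment_start_time
injected_time = 1.618425199e+09   # litmuschaos_experiment_chaos_injected_time
end_time = 1.618425219e+09        # litmuschaos_experiment_end_time


def to_utc(epoch_seconds: float) -> datetime:
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def phase_durations(start: float, injected: float, end: float) -> dict:
    """Split one experiment run into phases, all measured in seconds."""
    return {
        "pre_chaos": injected - start,    # dependency check until chaos injection
        "chaos_to_end": end - injected,   # chaos injection until the Summary event
        "total": end - start,             # matches litmuschaos_experiment_total_duration
    }


durations = phase_durations(start_time, injected_time, end_time)
print(to_utc(start_time).isoformat(), durations)
```

With the sample values, the total comes out to 64 seconds, the same number reported by litmuschaos_experiment_total_duration.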

NamespacedScoped Metrics

| Metrics Name | litmuschaos_namespace_scoped_passed_experiments |
| --- | --- |
| Description | Total passed experiments count in the WATCH_NAMESPACE |
| Source | Aggregated sum of all the litmuschaos_passed_experiments metrics derived from the ChaosResults present in the WATCH_NAMESPACE |
| Sample Metrics | `litmuschaos_namespace_scoped_passed_experiments 2` |
| Notes | The total number of passed experiments in the WATCH_NAMESPACE. It is the sum of the litmuschaos_passed_experiments metrics for every ChaosResult present in the WATCH_NAMESPACE. |

| Metrics Name | litmuschaos_namespace_scoped_failed_experiments |
| --- | --- |
| Description | Total failed experiments count in the WATCH_NAMESPACE |
| Source | Aggregated sum of all the litmuschaos_failed_experiments metrics derived from the ChaosResults present in the WATCH_NAMESPACE |
| Sample Metrics | `litmuschaos_namespace_scoped_failed_experiments 0` |
| Notes | The total number of failed experiments in the WATCH_NAMESPACE. It is the sum of the litmuschaos_failed_experiments metrics for every ChaosResult present in the WATCH_NAMESPACE. |

| Metrics Name | litmuschaos_namespace_scoped_awaited_experiments |
| --- | --- |
| Description | Total awaited experiments count in the WATCH_NAMESPACE |
| Source | Aggregated sum of all the litmuschaos_awaited_experiments metrics derived from the ChaosResults present in the WATCH_NAMESPACE |
| Sample Metrics | `litmuschaos_namespace_scoped_awaited_experiments 0` |
| Notes | The total number of awaited/queued experiments in the WATCH_NAMESPACE. It is the sum of the litmuschaos_awaited_experiments metrics for every ChaosResult present in the WATCH_NAMESPACE. |

| Metrics Name | litmuschaos_namespace_scoped_experiments_run_count |
| --- | --- |
| Description | Total experiments run count in the WATCH_NAMESPACE |
| Source | Aggregated sum of all the experiment runs in the WATCH_NAMESPACE |
| Sample Metrics | `litmuschaos_namespace_scoped_experiments_run_count 2` |
| Notes | The total experiment runs in the WATCH_NAMESPACE. It is the sum of litmuschaos_passed_experiments + litmuschaos_failed_experiments + litmuschaos_awaited_experiments for every ChaosResult present in the WATCH_NAMESPACE. |

| Metrics Name | litmuschaos_namespace_scoped_experiments_installed_count |
| --- | --- |
| Description | Total unique experiments installed/run in the WATCH_NAMESPACE |
| Source | Count of unique experiments in the WATCH_NAMESPACE |
| Sample Metrics | `litmuschaos_namespace_scoped_experiments_installed_count 1` |
| Notes | The total unique experiments installed/run in the WATCH_NAMESPACE. It is equal to the total number of ChaosResults present in the WATCH_NAMESPACE. |
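
The namespace-scoped values are plain sums over the per-ChaosResult counters, so the arithmetic can be sketched in a few lines. This is an illustration of the formulas in the Notes above, not the exporter's implementation. The ChaosResultCounts class, the namespace_scoped function, and the second ChaosResult are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChaosResultCounts:
    """Hypothetical per-ChaosResult counters mirroring the ExperimentScoped metrics."""
    name: str
    namespace: str
    passed: int
    failed: int
    awaited: int  # 1 if the verdict is Awaited, otherwise 0


def namespace_scoped(results, watch_namespace):
    """Aggregate the counters of every ChaosResult found in watch_namespace."""
    in_ns = [r for r in results if r.namespace == watch_namespace]
    passed = sum(r.passed for r in in_ns)
    failed = sum(r.failed for r in in_ns)
    awaited = sum(r.awaited for r in in_ns)
    return {
        "passed_experiments": passed,
        "failed_experiments": failed,
        "awaited_experiments": awaited,
        # run count = passed + failed + awaited over every ChaosResult
        "experiments_run_count": passed + failed + awaited,
        # installed count = number of ChaosResults in the namespace
        "experiments_installed_count": len(in_ns),
    }


results = [
    ChaosResultCounts("helloservice-pod-delete-pod-delete", "litmus", 2, 0, 0),
    ChaosResultCounts("made-up-result", "default", 1, 1, 0),  # outside WATCH_NAMESPACE
]
summary = namespace_scoped(results, "litmus")
print(summary)
```

With one ChaosResult in the litmus namespace that has passed twice, the sketch reproduces the sample values shown above: 2 passed, 0 failed, 0 awaited, a run count of 2, and an installed count of 1.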

ClusterScoped Metrics

| Metrics Name | litmuschaos_cluster_scoped_passed_experiments |
| --- | --- |
| Description | Total passed experiments count in all the namespaces |
| Source | Aggregated sum of all the litmuschaos_passed_experiments metrics derived from the ChaosResults present in all the namespaces |
| Sample Metrics | `litmuschaos_cluster_scoped_passed_experiments 2` |
| Notes | The total number of passed experiments across the cluster. It is the sum of the litmuschaos_passed_experiments metrics for every ChaosResult in all the namespaces. |

| Metrics Name | litmuschaos_cluster_scoped_failed_experiments |
| --- | --- |
| Description | Total failed experiments count in all the namespaces |
| Source | Aggregated sum of all the litmuschaos_failed_experiments metrics derived from the ChaosResults present in all the namespaces |
| Sample Metrics | `litmuschaos_cluster_scoped_failed_experiments 0` |
| Notes | The total number of failed experiments across the cluster. It is the sum of the litmuschaos_failed_experiments metrics for every ChaosResult in all the namespaces. |

| Metrics Name | litmuschaos_cluster_scoped_awaited_experiments |
| --- | --- |
| Description | Total awaited experiments count in all the namespaces |
| Source | Aggregated sum of all the litmuschaos_awaited_experiments metrics derived from the ChaosResults present in all the namespaces |
| Sample Metrics | `litmuschaos_cluster_scoped_awaited_experiments 0` |
| Notes | The total number of awaited/queued experiments across the cluster. It is the sum of the litmuschaos_awaited_experiments metrics for every ChaosResult in all the namespaces. |

| Metrics Name | litmuschaos_cluster_scoped_experiments_run_count |
| --- | --- |
| Description | Total experiments run count in all the namespaces |
| Source | Aggregated sum of all the experiment runs in all the namespaces |
| Sample Metrics | `litmuschaos_cluster_scoped_experiments_run_count 2` |
| Notes | The total experiment runs across the cluster. It is the sum of litmuschaos_passed_experiments + litmuschaos_failed_experiments + litmuschaos_awaited_experiments for every ChaosResult present in all the namespaces. |

| Metrics Name | litmuschaos_cluster_scoped_experiments_installed_count |
| --- | --- |
| Description | Total unique experiments installed/run in all the namespaces |
| Source | Count of unique experiments in all the namespaces |
| Sample Metrics | `litmuschaos_cluster_scoped_experiments_installed_count 1` |
| Notes | The total unique experiments installed/run across the cluster. It is equal to the total number of ChaosResults present in all the namespaces. |
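
Because every cluster-scoped value is a sum over the ChaosResults in all namespaces, it also equals the sum of the corresponding per-namespace aggregates. The sketch below shows that identity with made-up numbers. The per_namespace data, the key names, and the cluster_scoped function are hypothetical and only illustrate the arithmetic.

```python
# Hypothetical per-namespace aggregates; the "staging" namespace is invented.
per_namespace = {
    "litmus": {"passed": 2, "failed": 0, "awaited": 0, "run_count": 2, "installed_count": 1},
    "staging": {"passed": 1, "failed": 1, "awaited": 1, "run_count": 3, "installed_count": 2},
}


def cluster_scoped(per_ns):
    """Sum each aggregate key by key across all namespaces."""
    totals = {}
    for counts in per_ns.values():
        for key, value in counts.items():
            totals[key] = totals.get(key, 0) + value
    return totals


cluster = cluster_scoped(per_namespace)
print(cluster)
```

The run count stays consistent after aggregation: in the result, run_count still equals passed + failed + awaited.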

Steps to build & deploy:

Running Litmus Chaos Experiments in order to generate metrics

  • Follow the steps described here to start running Litmus chaos experiments and storing chaos results. The exporter uses the chaos custom resources to generate metrics.

Running Chaos Exporter on the local Machine

  • Run the exporter container (litmuschaos/chaos-exporter:ci) on the host network. You must mount the kubeconfig and override the entrypoint with ./exporter -kubeconfig <path>

  • Execute curl 127.0.0.1:8080/metrics to view metrics
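
The curl check above can also be scripted. The following is a minimal sketch, assuming the exporter is reachable at 127.0.0.1:8080 as in the step above. The helper names fetch_metrics and litmus_samples are hypothetical. It uses only the Python standard library.

```python
import urllib.request


def fetch_metrics(url="http://127.0.0.1:8080/metrics", timeout=5.0):
    """Return the raw text served by the exporter's /metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def litmus_samples(text):
    """Keep only sample lines for litmuschaos_* metrics.

    Comment lines (# HELP, # TYPE), blank lines, and metrics from other
    collectors are dropped.
    """
    return [line for line in text.splitlines() if line.startswith("litmuschaos_")]


# Usage, with the exporter running locally:
#     for line in litmus_samples(fetch_metrics()):
#         print(line)
```

Keeping the filter separate from the HTTP call means it can be exercised against saved output without a running exporter.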

Running Chaos Exporter as a deployment on the Kubernetes Cluster

  • Install the RBAC (serviceaccount, role, rolebinding) as per deploy/rbac.md

  • Deploy the chaos-exporter.yaml

  • From a cluster node, execute curl <exporter-service-ip>:8080/metrics

Example Metrics

```
# HELP litmuschaos_awaited_experiments Total number of awaited experiments
# TYPE litmuschaos_awaited_experiments gauge
litmuschaos_awaited_experiments{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 0
# HELP litmuschaos_cluster_scoped_awaited_experiments Total number of awaited experiments in all namespaces
# TYPE litmuschaos_cluster_scoped_awaited_experiments gauge
litmuschaos_cluster_scoped_awaited_experiments 0
# HELP litmuschaos_cluster_scoped_experiments_installed_count Total number of experiments in all namespaces
# TYPE litmuschaos_cluster_scoped_experiments_installed_count gauge
litmuschaos_cluster_scoped_experiments_installed_count 1
# HELP litmuschaos_cluster_scoped_experiments_run_count Total experiments run in all namespaces
# TYPE litmuschaos_cluster_scoped_experiments_run_count gauge
litmuschaos_cluster_scoped_experiments_run_count 2
# HELP litmuschaos_cluster_scoped_failed_experiments Total number of failed experiments in all namespaces
# TYPE litmuschaos_cluster_scoped_failed_experiments gauge
litmuschaos_cluster_scoped_failed_experiments 0
# HELP litmuschaos_cluster_scoped_passed_experiments Total number of passed experiments in all namespaces
# TYPE litmuschaos_cluster_scoped_passed_experiments gauge
litmuschaos_cluster_scoped_passed_experiments 2
# HELP litmuschaos_experiment_chaos_injected_time chaos injected time of the experiments
# TYPE litmuschaos_experiment_chaos_injected_time gauge
litmuschaos_experiment_chaos_injected_time{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 1.618426086e+09
# HELP litmuschaos_experiment_end_time end time of the experiments
# TYPE litmuschaos_experiment_end_time gauge
litmuschaos_experiment_end_time{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 1.618426108e+09
# HELP litmuschaos_experiment_start_time start time of the experiments
# TYPE litmuschaos_experiment_start_time gauge
litmuschaos_experiment_start_time{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 1.618426056e+09
# HELP litmuschaos_failed_experiments Total number of failed experiments
# TYPE litmuschaos_failed_experiments gauge
litmuschaos_failed_experiments{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 0
# HELP litmuschaos_passed_experiments Total number of passed experiments
# TYPE litmuschaos_passed_experiments gauge
litmuschaos_passed_experiments{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 2
# HELP litmuschaos_probe_success_percentage ProbeSuccesPercentage for the experiments
# TYPE litmuschaos_probe_success_percentage gauge
litmuschaos_probe_success_percentage{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 100
```
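
Each sample line in the output above has the shape name{labels} value, with the label block omitted for the cluster-scoped metrics. The following is a simplified sketch of splitting such a line into its parts. It is not a complete parser for the Prometheus text format: it ignores optional trailing timestamps and does not unescape label values. The names parse_sample, SAMPLE_RE, and LABEL_RE are hypothetical. For anything beyond quick inspection, a maintained Prometheus client library is likely the better choice.

```python
import re

# name, optional {label block}, whitespace, value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
# key="value" pairs; escaped characters inside the quotes are tolerated
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one sample line into (metric name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


line = (
    'litmuschaos_passed_experiments{chaosengine_context="test",'
    'chaosengine_name="helloservice-pod-delete",'
    'chaosresult_name="helloservice-pod-delete-pod-delete",'
    'chaosresult_namespace="litmus"} 2'
)
name, labels, value = parse_sample(line)
print(name, labels["chaosresult_namespace"], value)
```

Comment lines such as # HELP and # TYPE do not match the pattern and raise ValueError, so callers would filter them out first.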

License

FOSSA Status