Automate the infrastructure pieces #117
Merged
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
cerberus: | ||
distribution: openshift # Distribution can be kubernetes or openshift | ||
kubeconfig_path: ~/.kube/config # Path to kubeconfig | ||
port: 8080 # http server port where cerberus status is published | ||
watch_nodes: True # Set to True for the cerberus to monitor the cluster nodes | ||
watch_cluster_operators: True # Set to True for cerberus to monitor cluster operators | ||
watch_url_routes: # Route URLs to monitor; this is a nested array holding each URL and an optional authorization parameter | ||
watch_master_schedulable: # When enabled checks for the schedulable master nodes with given label. | ||
enabled: True | ||
label: node-role.kubernetes.io/master | ||
watch_namespaces: # List of namespaces to be monitored | ||
- openshift-etcd | ||
- openshift-apiserver | ||
- openshift-kube-apiserver | ||
- openshift-monitoring | ||
- openshift-kube-controller-manager | ||
- openshift-machine-api | ||
- openshift-kube-scheduler | ||
- openshift-ingress | ||
- openshift-sdn # When enabled, it will check for the cluster sdn and monitor that namespace | ||
cerberus_publish_status: True # When enabled, cerberus starts a light weight http server and publishes the status | ||
inspect_components: False # Enable it only when OpenShift client is supported to run | ||
# When enabled, cerberus collects logs, events and metrics of failed components | ||
|
||
prometheus_url: # The prometheus url/route is automatically obtained in case of OpenShift, please set it when the distribution is Kubernetes. | ||
prometheus_bearer_token: # The bearer token is automatically obtained in case of OpenShift, please set it when the distribution is Kubernetes. This is needed to authenticate with prometheus. | ||
# This enables Cerberus to query prometheus and alert on observing high Kube API Server latencies. | ||
|
||
slack_integration: False # When enabled, cerberus reports the failed iterations in the slack channel | ||
# The following env vars need to be set: SLACK_API_TOKEN (Bot User OAuth Access Token) and SLACK_CHANNEL (channel to send notifications to in case of failures) | ||
# When slack_integration is enabled, a watcher can be assigned for each day. The watcher of the day is tagged when failures are reported in the slack channel. Values are slack member IDs. | ||
watcher_slack_ID: # (NOTE: Defining the watcher IDs is optional. When they are not defined, the slack_team_alias tag is used if it is set; otherwise no tag is used when reporting failures in the slack channel.) | ||
Monday: | ||
Tuesday: | ||
Wednesday: | ||
Thursday: | ||
Friday: | ||
Saturday: | ||
Sunday: | ||
slack_team_alias: # The slack team alias to be tagged while reporting failures in the slack channel when no watcher is assigned | ||
|
||
custom_checks: # Relative paths of files containing additional user defined checks | ||
|
||
tunings: | ||
timeout: 60 # Number of seconds before requests fail | ||
iterations: 5 # Iterations to loop before stopping the watch, it will be replaced with infinity when the daemon mode is enabled | ||
sleep_time: 5 # Sleep duration between each iteration | ||
kube_api_request_chunk_size: 250 # Large requests will be broken into the specified chunk size to reduce the load on API server and improve responsiveness. | ||
daemon_mode: True # Iterations are set to infinity which means that the cerberus will monitor the resources forever | ||
cores_usage_percentage: 0.5 # Set the fraction of cores to be used for multiprocessing | ||
|
||
database: | ||
database_path: /tmp/cerberus.db # Path where cerberus database needs to be stored | ||
reuse_database: False # When enabled, the database is reused to store the failures |
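
The `tunings` comments above say that `iterations` is replaced with infinity when `daemon_mode` is enabled, and that `cores_usage_percentage` is a fraction of the available cores. A minimal Python sketch of that reading follows. The function names `effective_iterations` and `worker_count` are hypothetical and are not taken from the Cerberus code base; the rounding rule (floor, with a minimum of one worker) is likewise an assumption.

```python
import math


def effective_iterations(tunings):
    """Return how many iterations the watch loop would run (hypothetical helper)."""
    # Per the config comment, daemon_mode overrides iterations with infinity.
    if tunings.get("daemon_mode", False):
        return math.inf
    return int(tunings.get("iterations", 1))


def worker_count(tunings, total_cores):
    """Return the number of worker processes for a core fraction (assumed rounding)."""
    fraction = float(tunings.get("cores_usage_percentage", 0.5))
    # Assumption: round down, but always keep at least one worker.
    return max(1, int(fraction * total_cores))


# Values copied from the tunings block in the diff above.
tunings = {
    "timeout": 60,
    "iterations": 5,
    "sleep_time": 5,
    "kube_api_request_chunk_size": 250,
    "daemon_mode": True,
    "cores_usage_percentage": 0.5,
}

print(effective_iterations(tunings))
print(worker_count(tunings, total_cores=8))
```

With `daemon_mode: True`, as committed here, the `iterations: 5` value would have no effect under this reading.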
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
kraken: | ||
distribution: openshift # Distribution can be kubernetes or openshift | ||
kubeconfig_path: /root/.kube/config # Path to kubeconfig | ||
exit_on_failure: False # Exit when a post action scenario fails | ||
litmus_version: v1.10.0 # Litmus version to install | ||
litmus_uninstall: False # Set to True to uninstall litmus when a failure occurs | ||
chaos_scenarios: # List of policies/chaos scenarios to load | ||
- pod_scenarios: # List of chaos pod scenarios to load | ||
- - scenarios/etcd.yml | ||
- - scenarios/regex_openshift_pod_kill.yml | ||
- scenarios/post_action_regex.py | ||
- node_scenarios: # List of chaos node scenarios to load | ||
- scenarios/node_scenarios_example.yml | ||
- pod_scenarios: | ||
- - scenarios/openshift-apiserver.yml | ||
- - scenarios/openshift-kube-apiserver.yml | ||
- time_scenarios: # List of chaos time scenarios to load | ||
- scenarios/time_scenarios_example.yml | ||
- litmus_scenarios: # List of litmus scenarios to load | ||
- - https://hub.litmuschaos.io/api/chaos/1.10.0?file=charts/generic/node-cpu-hog/rbac.yaml | ||
- scenarios/node_hog_engine.yaml | ||
- cluster_shut_down_scenarios: | ||
- - scenarios/cluster_shut_down_scenario.yml | ||
- scenarios/post_action_shut_down.py | ||
cerberus: | ||
cerberus_enabled: True # Enable it when cerberus is previously installed | ||
cerberus_url: http://0.0.0.0:8080 # When cerberus_enabled is set to True, provide the url where cerberus publishes go/no-go signal | ||
|
||
performance_monitoring: | ||
deploy_dashboards: True # Install a mutable grafana and load the performance dashboards. Enable this only when running on OpenShift | ||
repo: "https://github.com/cloud-bulldozer/performance-dashboards.git" | ||
kube_burner_binary_url: "https://github.com/cloud-bulldozer/kube-burner/releases/download/v0.9.1/kube-burner-0.9.1-Linux-x86_64.tar.gz" | ||
capture_metrics: True | ||
config_path: config/kube_burner.yaml # Define the Elasticsearch url and index name in this config | ||
metrics_profile_path: config/metrics-aggregated.yaml | ||
prometheus_url: # The prometheus url/route is automatically obtained in case of OpenShift, please set it when the distribution is Kubernetes. | ||
prometheus_bearer_token: # The bearer token is automatically obtained in case of OpenShift, please set it when the distribution is Kubernetes. This is needed to authenticate with prometheus. | ||
uuid: # uuid for the run is generated by default if not set | ||
enable_alerts: True # Runs the queries specified in the alert profile and displays the info or exits 1 when severity=error | ||
alert_profile: config/alerts # Path to alert profile with the prometheus queries | ||
|
||
tunings: | ||
wait_duration: 60 # Duration to wait between each chaos scenario | ||
iterations: 1 # Number of times to execute the scenarios | ||
daemon_mode: False # Iterations are set to infinity which means that the kraken will cause chaos forever |
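
The `chaos_scenarios` block above appears to be a list of single-key mappings, where each entry is either a plain scenario path or a nested list of a scenario path followed by an optional post action script. Indentation is lost in this diff view, so that nesting is an assumption. The Python sketch below walks such a structure; `iter_scenarios` is a hypothetical helper, not Kraken's own loader, and the data is a hand-written excerpt of the config rather than parsed YAML.

```python
def iter_scenarios(chaos_scenarios):
    """Yield (scenario_type, scenario, post_action) tuples in config order."""
    for group in chaos_scenarios:
        for scenario_type, entries in group.items():
            for entry in entries or []:
                if isinstance(entry, str):
                    # Plain path: no post action attached.
                    yield scenario_type, entry, None
                else:
                    # Nested list: [scenario] or [scenario, post_action].
                    post_action = entry[1] if len(entry) > 1 else None
                    yield scenario_type, entry[0], post_action


# Excerpt of the config above, written as Python literals (assumed nesting).
chaos_scenarios = [
    {"pod_scenarios": [
        ["scenarios/etcd.yml"],
        ["scenarios/regex_openshift_pod_kill.yml", "scenarios/post_action_regex.py"],
    ]},
    {"node_scenarios": ["scenarios/node_scenarios_example.yml"]},
    {"cluster_shut_down_scenarios": [
        ["scenarios/cluster_shut_down_scenario.yml", "scenarios/post_action_shut_down.py"],
    ]},
]

plan = list(iter_scenarios(chaos_scenarios))
for scenario_type, scenario, post_action in plan:
    print(scenario_type, scenario, post_action)
```

Using a list of single-key mappings rather than one mapping lets the same scenario type, such as `pod_scenarios`, appear more than once while preserving execution order.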
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
version: "3" | ||
services: | ||
elastic: | ||
image: docker.elastic.co/elasticsearch/elasticsearch:7.13.2 | ||
deploy: | ||
replicas: 1 | ||
restart_policy: | ||
condition: on-failure | ||
ports: | ||
- "9200:9200" | ||
- "9300:9300" | ||
environment: | ||
discovery.type: single-node | ||
kibana: | ||
image: docker.elastic.co/kibana/kibana:7.13.2 | ||
deploy: | ||
replicas: 1 | ||
restart_policy: | ||
condition: on-failure | ||
ports: | ||
- "5601:5601" | ||
environment: | ||
ELASTICSEARCH_HOSTS: "http://0.0.0.0:9200" | ||
cerberus: | ||
image: quay.io/openshift-scale/cerberus:latest | ||
privileged: true | ||
deploy: | ||
replicas: 1 | ||
restart_policy: | ||
condition: on-failure | ||
ports: | ||
- "8080:8080" | ||
volumes: | ||
- ./config/cerberus.yaml:/root/cerberus/config/cerberus.yaml:Z # Modify the config in case of the need to monitor additional components | ||
- /root/.kube/config:/root/.kube/config:Z |
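
The compose file publishes Cerberus on port 8080, which matches the `cerberus_url` value in the Kraken config. As a rough sketch of how a client could read the go/no-go signal from that URL, the Python below assumes the endpoint returns the plain text `True` or `False`; that response format is an assumption, and `parse_signal` and `cerberus_go` are hypothetical names rather than Kraken functions.

```python
import urllib.request


def parse_signal(body):
    """Interpret a response body as go (True) or no-go (False)."""
    # Assumption: the server replies with the literal text "True" or "False".
    return body.decode("utf-8").strip() == "True"


def cerberus_go(url="http://0.0.0.0:8080", timeout=60):
    """Fetch the signal from url; treat connection errors as no-go."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return parse_signal(response.read())
    except OSError:
        # URLError and socket timeouts are OSError subclasses.
        return False


if __name__ == "__main__":
    print("go" if cerberus_go() else "no-go")
```

Treating an unreachable server as no-go is a conservative design choice for this sketch; a caller that prefers to continue when Cerberus is down would need different handling.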
It might be worth mentioning here that this command runs continuously and should not be stopped until after the kraken execution finishes.
Fixed it, thanks.