This is a small Python script that will generate separate watches for each group of pods. If you have a namespace called `test` with deployments called `nginx` and `elasticsearch`, this script will automagically create watches called `test.nginx` and `test.elasticsearch`, which will send email and/or Slack alerts if any of the pods in these deployments are not ready.
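As a rough sketch of the naming scheme (the function below is hypothetical, for illustration only, and not part of the script's API):

```python
# Hypothetical illustration: one watch is generated per group of pods,
# named "<namespace>.<group>".
def watch_name(namespace: str, group: str) -> str:
    return f'{namespace}.{group}'

assert watch_name('test', 'nginx') == 'test.nginx'
assert watch_name('test', 'elasticsearch') == 'test.elasticsearch'
```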
An alert will be sent if a single pod is not ready for 2 minutes in a 5 minute window. This can be increased or decreased by changing the `failures` setting. An alert will also be sent if a cronjob fails 2 times in a row; this behavior can be changed by modifying the `job_failures` setting.
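For example, to make alerting less sensitive, both thresholds could be raised with annotations along these lines (the values are illustrative, not recommendations):

```yaml
metadata:
  annotations:
    # Kubernetes annotation values must be strings, hence the quotes
    watcher.failures: '8'      # more failed checks per pod before alerting
    watcher.job_failures: '3'  # more consecutive cronjob failures before alerting
```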
Monitoring can be enabled at the namespace or pod level. To enable monitoring on a namespace called `infra` you can add the label `watcher=enabled`:

```sh
kubectl label namespace infra watcher=enabled
```
This can also be enabled directly inside the namespace definition:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: infra
  labels:
    watcher: enabled
```
Enabling watcher for a single deployment in a namespace that isn't enabled:

```yaml
spec:
  template:
    metadata:
      labels:
        watcher: enabled
```
Disabling watcher for a single deployment in a namespace that is enabled:

```yaml
spec:
  template:
    metadata:
      labels:
        watcher: disabled
```
All of the defaults (read from `./kuberwatcher.yml`) can be overridden at the namespace or pod level under the annotations metadata in the Kubernetes object.
Sending all Slack alerts in the `michael` namespace to the Slack user `@michael.russell`:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: michael
  labels:
    watcher: enabled
  annotations:
    watcher.alerts.slack: '@michael.russell'
```
Adding alert documentation links for pods in a deployment:

```yaml
spec:
  template:
    metadata:
      annotations:
        watcher.docs: https://github.com/elastic/kuberwatcher/blob/master/README.md
```
All of the available settings:

```yaml
watcher.alerts.email: 'username@elastic.co,team@elastic.co' # Comma separated list of email addresses
watcher.alerts.slack: '@michael.russell,infra' # Comma separated list of `@usernames` and `channelnames`
watcher.kibana_url: 'https://kibana.elastic.co' # Base URL for generating links to Kibana
watcher.docs: 'https://example.com/HALP.md' # Documentation link that will be included in the alert
watcher.failures: 12 # How many failed attempts need to have happened in the current window per pod (optional)
watcher.job_failures: 2 # How many times a cronjob needs to fail in a row before an alert is sent (optional)
watcher.interval: '30s' # How often the query is run (optional)
watcher.reply_to: 'team@elastic.co,reply@elastic.co' # Comma separated list of email addresses to use for the reply_to field in email alerts
watcher.throttle: 600000 # How often to send the alert in ms (optional)
watcher.window: '300s' # How long the pods need to be not ready before alerting (optional)
watcher.metricbeat_index_pattern: 'metricbeat-*' # The index pattern to search for Metricbeat metrics (optional)
```
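For instance, a namespace that routes alerts both to a team mailing list and to a Slack channel might combine several of these settings (the names and addresses here are illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    watcher: enabled
  annotations:
    watcher.alerts.email: 'payments-team@example.com'
    watcher.alerts.slack: '@oncall.engineer,payments'
```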
Requirements:

- Metricbeat (with kube-state-metrics) deployed on Kubernetes with the `state_pod` metricset enabled
- Elasticsearch with X-Pack and watcher enabled
- If you want to send emails you will need to configure an email account in X-Pack
- If you want to send Slack alerts you will need to configure a Slack account in X-Pack
If you just want to try out kuberwatcher by running it locally, it will use your current kubectl context to generate alerts.

- First update the local `kuberwatcher.yml`
- Set the Elasticsearch connection details:

```sh
export ES_USERNAME='elastic'
export ES_PASSWORD='changeme'
export ES_HOSTS='http://elasticsearch:9200'
```

- Run kuberwatcher in Docker:

```sh
make run
```
There is an example in `k8s/kuberwatcher.yml` for running kuberwatcher as a CronJob in Kubernetes.

- Create the secrets for kuberwatcher to connect to Elasticsearch:

```sh
kubectl create secret generic kuberwatcher --from-literal=endpoint=http://elasticsearch:9200 --from-literal=password=changeme --from-literal=username=elastic
```

- Modify the configuration in the ConfigMap defined in `k8s/kuberwatcher.yml`
- Deploy!

```sh
kubectl apply -f k8s/kuberwatcher.yml
```
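You can then confirm that the CronJob was created (adjust the namespace if you deployed it somewhere other than the current context's default):

```sh
kubectl get cronjobs
```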
Requirements:

- Docker
- Make

To run all of the tests:

```sh
make test
```
Because this project interacts with two fast-moving projects (Elasticsearch and Kubernetes), it uses vcrpy to record interactions with their APIs instead of trying to constantly update mocks for both projects.
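As a rough sketch of the idea (the cassette path and request below are illustrative, not taken from this project's test suite):

```python
import vcr
import requests

# The first run performs a real HTTP request and records it to the
# cassette file; later runs replay the recorded response instead of
# hitting the live API.
with vcr.use_cassette('cassettes/cluster_health.yml'):
    response = requests.get('http://elasticsearch:9200/_cluster/health')
    print(response.json())
```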
If you are making changes that affect API calls to Kubernetes or Elasticsearch, you will need a working development environment to record the new API transactions. To start Elasticsearch and Kibana you can run:
```sh
# if you want to test with slack you need to add your slack url too
export SLACK_URL='https://hooks.slack.com/services/SDKLSD/SDFLIJSDF323f3f'
make deps
```
To generate the test pods in a local minikube cluster you can install the fixtures by running:

```sh
make fixtures
```

This will start Metricbeat, kube-state-metrics and a bunch of test pods which were used to generate the cassette data.
If you want to refresh the recorded API data in the cassettes you can run:

```sh
make clean_cassettes
```
When you are finished you can clean up everything by running:

```sh
make clean
```