loopDelicious/kubernetes-chaos

Building resilient APIs with chaos engineering 🔥

The example that follows supports an article published on Better Practices. Learn more about Building resilient APIs with chaos engineering.

Get Started

This sample app is forked from Google's Hipster Shop: Cloud-Native Microservices Demo Application - a web-based e-commerce app called “Hipster Shop” where users can browse items, add them to the cart, and purchase them. The application works on any Kubernetes cluster (such as a local one).

Set up Gremlin and create a Kubernetes cluster on EKS

Start with this guide to Install Gremlin to use with Amazon EKS. You'll need an AWS account, a configured AWS CLI, eksctl to create the EKS cluster, and a Gremlin account.

  • Step 0 - Verify your AWS CLI installation
  • Step 1 - Create an EKS cluster using eksctl
  • Step 2 - Load up the kubeconfig for the cluster
  • Step 3 - Deploy Kubernetes Dashboard
  • Step 4 - Deploy a Microservice Demo application
  • Step 5 - Run a Shutdown Container Attack using Gremlin (skip)
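A minimal sketch of what steps 0 through 2 might look like on the command line, built as argument lists in Python so you can print and review them before running anything. The cluster name `chaos-demo`, the region `us-west-2`, and the node count are hypothetical placeholders, and the linked guide may use different options, so treat the guide as the source of truth.

```python
import shlex


def setup_commands(cluster_name, region, nodes=3):
    """Return the CLI invocations for steps 0-2 as argument lists.

    cluster_name, region, and nodes are illustrative values, not ones
    taken from the guide.
    """
    return [
        # Step 0: confirm the AWS CLI is installed and credentials resolve.
        ["aws", "sts", "get-caller-identity"],
        # Step 1: create the EKS cluster with eksctl.
        ["eksctl", "create", "cluster",
         "--name", cluster_name, "--region", region, "--nodes", str(nodes)],
        # Step 2: write the kubeconfig entry, then check that nodes are visible.
        ["aws", "eks", "update-kubeconfig",
         "--name", cluster_name, "--region", region],
        ["kubectl", "get", "nodes"],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; pass each list to
    # subprocess.run(argv, check=True) once you are happy with them.
    for argv in setup_commands("chaos-demo", "us-west-2"):
        print(shlex.join(argv))
```

Printing first is deliberate: creating a cluster costs money and takes a while, so it is worth reading the exact command before executing it.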

Set up Grafana and Prometheus

You'll need a way to observe the results of your attack. You can skip this step if you already have another way to do this.

Proceed with Monitoring Kubernetes clusters with Grafana. This particular guide starts with Google Kubernetes Engine (GKE) instead of EKS, but most of the steps are the same after you've created your cluster.

  • Step 0 - Create a GKE cluster (skip)
  • Step 1 - Lots and lots and lots of yaml configuration
  • Step 2 - Configure your cluster settings on Grafana (skip)

Instead of configuring your cluster settings on Grafana, you can simply import an existing dashboard if your monitoring tools are running on your cluster. See the gotchas below.

(Screenshot: Grafana dashboard 3131)

Use Postman to shut down a single container via the Gremlin API

In the Postman app, import the template called Chaos engineering that includes the chaosEngineering environment, and then look for the folder called Shut down a container. You will need to update the Postman environment with your gremlin_api_key and your_deployed_app_url. Read the Chaos engineering collection documentation for step-by-step instructions.

  1. Get a list of all active containers
  2. Create a shutdown attack on a specific container
  3. Verify app health
  4. Stop the attack (if you need to)

(Screenshots: attack in Postman, 500 error, collection runner)
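Step 3 (verify app health) can also be scripted outside Postman. The sketch below polls a URL and records the status codes it sees, which is one way to watch the app go from 500 back to 200 after a shutdown attack. It uses only the Python standard library; the function names, the retry count, and the delay are assumptions made for illustration and are not part of the Postman collection.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is still what we want.
        return err.code


def wait_until_healthy(url, attempts=10, delay=3.0,
                       fetch=fetch_status, sleep=time.sleep):
    """Poll url until it returns 200 or attempts run out.

    Returns the list of observed status codes; None marks an attempt
    where no HTTP response arrived (refused connection, timeout, DNS).
    """
    seen = []
    for attempt in range(attempts):
        try:
            status = fetch(url)
        except OSError:
            status = None
        seen.append(status)
        if status == 200:
            break
        if attempt < attempts - 1:
            sleep(delay)
    return seen
```

For example, `wait_until_healthy(your_deployed_app_url)` returns the sequence of codes observed, so a run during an attack shows how long the app stayed unhealthy. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a live cluster.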

A few gotchas

Managing Users or IAM Roles for your AWS EKS cluster

If you're using aws-iam-authenticator to manage your clusters from the CLI and your account requires MFA, you may need to get a session token and write it to a separate profile in your ~/.aws/credentials file before you can use the CLI or build a programmatic solution.
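A minimal sketch of that workflow, assuming boto3 is installed and your long-lived credentials are already configured: request temporary credentials from STS with an MFA code, then store them under a separate profile. The profile name `mfa` and both function names are hypothetical. Note that `configparser` rewrites the whole file, so any comments in your credentials file are lost; back it up first.

```python
import configparser
import os


def get_mfa_session(mfa_serial, token_code, duration_seconds=3600):
    """Request temporary credentials from STS using an MFA code.

    mfa_serial is the ARN of your MFA device; token_code is the current
    code it shows. Requires the third-party boto3 package.
    """
    import boto3  # imported here so the helper below works without it

    sts = boto3.client("sts")
    response = sts.get_session_token(
        SerialNumber=mfa_serial,
        TokenCode=token_code,
        DurationSeconds=duration_seconds,
    )
    return response["Credentials"]


def write_session_profile(credentials, profile="mfa", path=None):
    """Write temporary credentials to a named profile and return the path."""
    path = path or os.path.expanduser("~/.aws/credentials")
    # Disable interpolation so values are stored exactly as given.
    config = configparser.ConfigParser(interpolation=None)
    config.read(path)  # a missing file is simply treated as empty
    config[profile] = {
        "aws_access_key_id": credentials["AccessKeyId"],
        "aws_secret_access_key": credentials["SecretAccessKey"],
        "aws_session_token": credentials["SessionToken"],
    }
    with open(path, "w") as handle:
        config.write(handle)
    return path
```

After `write_session_profile(get_mfa_session(serial, code))`, setting `AWS_PROFILE=mfa` (or passing `--profile mfa` to the AWS CLI) should make subsequent commands use the temporary credentials until they expire.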

Configuring your EKS cluster in Grafana

Your configuration will depend on where you've deployed your Kubernetes cluster and Grafana. I opted to have Grafana and Prometheus running on my cluster.

You can use an ingress controller such as ingress-nginx to manage external access to the apps running inside the cluster.

  • You might also need a service such as cert-manager that automatically creates and manages TLS certs in Kubernetes (as in this tutorial).

Additionally, you can use an existing Grafana dashboard like this one for an overview of all nodes in a Kubernetes Cluster. There are other options as well.

  • Configuring your cluster in Grafana using the kubernetes-app plugin is not super well-documented, so these notes might help.
  • The plugin authenticates with TLS client certs, and EKS doesn't hand those out by default (its kubeconfig relies on IAM-based tokens instead). If you want to use the plugin, you can try a different managed service like GKE (as in this tutorial), or expect many more steps on EKS.