Eventstore on Azure Container Services - benchmarking file storage vs local pod storage
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
assets
src
.gitignore
README.adoc
docinfo.html

README.adoc

EventStore on Azure Kubernetes (ACS)

Prep - set up az

install azure xplat cli

  1. pip install azure-cli

Login and set subscription

  1. az login

  2. az account set -s <subid>

Set up K8S

create service principal

  1. Create service principal

    C:\Users\raghuramanr>az ad sp create-for-rbac --role="Contributor" --scopes="/subscriptions/<masked>"
    Retrying role assignment creation: 1/36
    {
      "appId": "30521959-8e29-4855-9176-ede965cc8432",
      "displayName": "azure-cli-2017-11-14-04-56-25",
      "name": "http://azure-cli-2017-11-14-04-56-25",
      "password": "<masked>",
      "tenant": "7a33d93c-28b7-4b2f-af94-9c3c883b8c95"
    }

Create cluster

  1. Create cluster on Azure portal

  2. download the template for later - saved as d:\downloads\azure-k8s-cluster.zip

  3. Start cluster deployment

  4. Get coffee.

Download cluster creds

  1. Download cluster creds

    C:\Users\raghuramanr>az acs kubernetes get-credentials --resource-group=kube-cluster --name=kube-cluster
    Merged "k8smgmt" as current context in C:\Users\raghuramanr\.kube\config
  2. Verify cluster is working

    C:\Users\raghuramanr>kubectl get nodes
    NAME                       STATUS    ROLES     AGE       VERSION
    k8s-agentpool-31036649-0   Ready     agent     6m        v1.7.9
    k8s-agentpool-31036649-1   Ready     agent     6m        v1.7.9
    k8s-agentpool-31036649-2   Ready     agent     6m        v1.7.9
    k8s-master-31036649-0      Ready     master    6m        v1.7.9

Connect to K8S Web UI

  1. kubectl proxy

  2. Browse to http://localhost:8001/ui

Add Persistent Storage (Azure Files)

  • Reference link: https://docs.microsoft.com/en-us/azure/aks/azure-files

  • Github Repo/branch - https://github.com/raghur/eventstore-kubernetes / aks-persistentdisk

    1. Create storage account

      C:\Users\raghuramanr>az storage account create --n persistentstorage11342 -g kube-cluster --sku Standard_LRS
    2. List keys

      C:\Users\raghuramanr>az storage account keys list -n persistentstorage11342 -g kube-cluster --output table
      KeyName    Permissions    Value
      ---------  -------------  ----------------------------------------------------------------------------------------
      key1       Full           <masked>
      key2       Full           <masked>
    3. base64 encode keys and account name

      Don’t do this on cmd.exe - do it on a real linux box or wsl

    4. Create the fileshares - I just used the web portal - esdisk-1.. esdisk-N

    5. Create an azure secret as in ref link above - template is in fileshare/azure-secret.yml

    6. Add the azure secret to your k8s cluster - kubectl apply -f fileshare/azure-secret.yml

    7. Run cd scripts && generate-deployment.sh 3 for a 3 node cluster.

    8. Create ES Pods: Run kubectl create -f .tmp/es_deployment_X.yaml files to create the ES nodes

    9. Create a service fronting the PODS: Run kubectl create -f services/eventstore.yaml

    10. Create a configmap for nginx: Run kubectl.exe create configmap nginx-es-pd-frontend-conf --from-file nginx/frontend.conf

    11. Create the Nginx front end proxy Pod: Run kubectl create -f deployments/frontend-es.yaml

    12. Create the Nginx service: Run kubectl create -f services/frontend-es.yaml

Perf Bench

Prepping k8s on ACS for monitoring with Heapster, Grafana and InfluxDB

To collect pod utilization under load, we need heapster, grafana and influx db working as described here with setup instructions here. They however require some tweaks on ACS because the default ACS deployment includes heapster but not grafana and influx db. Due to this, the heapster node is not provided a sink (and so ineffective). To fix:

  1. Clone the heapster repo - https://github.com/kubernetes/heapster

  2. Follow this step in the guide:

    $ kubectl create -f deploy/kube-config/influxdb/
    $ kubectl create -f deploy/kube-config/rbac/heapster-rbac.yaml
  3. Now fix up heapster

    1. Open heapster on kubernetes dashboard (http://localhost:8001/api/v1/namespaces/kube-system/services/kubernetes-dashboard/proxy/#!/deployment/kube-system/heapster?namespace=kube-system)

    2. Click 'Edit'

    3. Find the container with name: heapster and add a --sink=influxdb:http://monitoring-influxdb.kube-system.svc:8086

      alt
      Figure 1. Add the sink parameter to heapster
  4. Now we need to make Grafana accessible from outside the cluster.

  5. Once k8s updates the service, you should see an external IP - and browsing to http://<externalip>; should bring you to the Grafana dashboard.

Test Scenario

  • Each user creates a stream, adds 10 events, then reads the stream completely followed by reading each event individually.

  • Test run is 10 concurrent users repeating for 5 mins from a single client node (my machine)

Happy path - no node failures - 10 concurrent users

As expected, the podversion is able to serve 33% more requests though CPU utilization is a little higher since IO happens locally?

Test Results - client summary

Table 1. A 5 minute test with 10 concurrent users
PodVersion (local pod storage) Persistent Disk (Azure file share)
    ✓ is status 201
    ✓ is status 200

    checks................: 100.00%
    data_received.........: 13 MB (45 kB/s)
    data_sent.............: 1.9 MB (6.5 kB/s)
    http_req_blocked......: avg=169.85µs max=123.74ms med=0s min=0s p(90)=0s p(95)=0s
    http_req_connecting...: avg=163.3µs max=123.74ms med=0s min=0s p(90)=0s p(95)=0s
    http_req_duration.....: avg=37.31ms max=384.74ms med=25.25ms min=11.02ms p(90)=64.17ms p(95)=70.53ms
    http_req_receiving....: avg=135.21µs max=112.45ms med=0s min=0s p(90)=966.6µs p(95)=1ms
    http_req_sending......: avg=44.73µs max=18.04ms med=0s min=0s p(90)=0s p(95)=0s
    http_req_waiting......: avg=37.13ms max=383.74ms med=25.09ms min=11.02ms p(90)=64.17ms p(95)=70.31ms
    http_reqs.............: 79237 (264.12333333333333/s)
    vus...................: 10
    vus_max...............: 10
    ✓ is status 201
    ✗ is status 200
          0.02% (6/33058)

    checks................: 99.99%
    data_received.........: 11 MB (36 kB/s)
    data_sent.............: 1.5 MB (5.1 kB/s)
    http_req_blocked......: avg=192.75µs max=1.01s med=0s min=0s p(90)=0s p(95)=0s
    http_req_connecting...: avg=188.42µs max=1.01s med=0s min=0s p(90)=0s p(95)=0s
    http_req_duration.....: avg=47.04ms max=4.57s med=30.07ms min=11.01ms p(90)=83.87ms p(95)=99.23ms
    http_req_receiving....: avg=120.67µs max=72.19ms med=0s min=0s p(90)=489µs p(95)=1ms
    http_req_sending......: avg=32.91µs max=2ms med=0s min=0s p(90)=0s p(95)=0s
    http_req_waiting......: avg=46.88ms max=4.57s med=29.11ms min=10.99ms p(90)=83.39ms p(95)=98.46ms
    http_reqs.............: 63163 (210.54333333333332/s)
    vus...................: 10
    vus_max...............: 10

Test Results - CPU utilization

Table 2. A 5 minute test with 10 concurrent users
PodVersion (local pod storage) Persistent Disk (Azure file share)
alt
alt

Happy path - no node failures - 100 concurrent users

Test Results - client summary

Table 3. A 5 minute test with 100 concurrent users
PodVersion (local pod storage) Persistent Disk (Azure file share)
    ✓ is status 201
    ✓ is status 200

    checks................: 100.00%
    data_received.........: 43 MB (144 kB/s)
    data_sent.............: 6.3 MB (21 kB/s)
    http_req_blocked......: avg=885.64µs max=3.06s med=0s min=0s p(90)=0s p(95)=0s
    http_req_connecting...: avg=878.71µs max=3.06s med=0s min=0s p(90)=0s p(95)=0s
    http_req_duration.....: avg=117.77ms max=3.56s med=105.27ms min=13.03ms p(90)=158.42ms p(95)=184.48ms
    http_req_receiving....: avg=910µs max=1.82s med=0s min=0s p(90)=0s p(95)=1ms
    http_req_sending......: avg=23.87µs max=7.04ms med=0s min=0s p(90)=0s p(95)=0s
    http_req_waiting......: avg=116.84ms max=3.56s med=104.29ms min=13.03ms p(90)=157.42ms p(95)=182.48ms
    http_reqs.............: 252400 (841.3333333333334/s)
    vus...................: 100
    vus_max...............: 100
    ✓ is status 201
    ✓ is status 200

    checks................: 100.00%
    data_received.........: 33 MB (109 kB/s)
    data_sent.............: 4.6 MB (15 kB/s)
    http_req_blocked......: avg=1.05ms max=9.1s med=0s min=0s p(90)=0s p(95)=0s
    http_req_connecting...: avg=1.04ms max=9.1s med=0s min=0s p(90)=0s p(95)=0s
    http_req_duration.....: avg=149.99ms max=7.14s med=121.29ms min=12.03ms p(90)=219.57ms p(95)=281.74ms
    http_req_receiving....: avg=1.21ms max=3.67s med=0s min=0s p(90)=0s p(95)=1ms
    http_req_sending......: avg=22.99µs max=10.02ms med=0s min=0s p(90)=0s p(95)=0s
    http_req_waiting......: avg=148.75ms max=6.08s med=120.31ms min=12.03ms p(90)=218.58ms p(95)=279.74ms
    http_reqs.............: 198334 (661.1133333333333/s)
    vus...................: 100
    vus_max...............: 100

Test Results - CPU utilization

Table 4. A 5 minute test with 100 concurrent users
PodVersion (local pod storage) Persistent Disk (Azure file share)
alt
alt

Happy path - no node failures - 1000 concurrent users

Now we start seeing a bunch of errors - however, these were client timeouts so I’m not exactly sure if things broke at the server end. The pattern continues though - POD version serves more reqs/s at a slightly higher CPU utilization.

I should probably run a couple of nodes to drive traffic and do that - but that means reading more k6.io documentation which I’d rather not ATM

POD Version Persistent Disk Version
# POD version
    ✗ is status 201
          0.44% (229/52608)
    ✗ is status 200
          0.65% (332/51389)

    checks................: 99.46%
    data_received.........: 32 MB (267 kB/s)
    data_sent.............: 5.7 MB (47 kB/s)
    http_req_blocked......: avg=50.02ms max=21.13s med=0s min=0s p(90)=0s p(95)=0s
    http_req_connecting...: avg=49.89ms max=21.09s med=0s min=0s p(90)=0s p(95)=0s
    http_req_duration.....: avg=868.85ms max=1m0s med=188.52ms min=106.3ms p(90)=1.42s p(95)=2.66s
    http_req_receiving....: avg=173.69ms max=59.56s med=0s min=0s p(90)=0s p(95)=69.15ms
    http_req_sending......: avg=34.97µs max=1.2s med=0s min=0s p(90)=0s p(95)=0s
    http_req_waiting......: avg=695.13ms max=59.74s med=188.47ms min=106.3ms p(90)=1.22s p(95)=2.14s
    http_reqs.............: 103996 (866.6333333333333/s)
    vus...................: 1000
    vus_max...............: 1000
# persistentdisk version - more failures
    ✗ is status 200
          1.26% (573/45500)
    ✗ is status 201
          1.03% (482/46886)

    checks................: 98.86%
    data_received.........: 34 MB (282 kB/s)
    data_sent.............: 6.1 MB (51 kB/s)
    http_req_blocked......: avg=65.61ms max=21.03s med=0s min=0s p(90)=0s p(95)=0s
    http_req_connecting...: avg=65.33ms max=21.01s med=0s min=0s p(90)=0s p(95)=0s
    http_req_duration.....: avg=1.06s max=1m0s med=459.24ms min=95.22ms p(90)=1.73s p(95)=3.02s
    http_req_receiving....: avg=158.52ms max=59.74s med=0s min=0s p(90)=0s p(95)=1.02ms
    http_req_sending......: avg=740.5µs max=19.39s med=0s min=1ms p(90)=0s p(95)=0s
    http_req_waiting......: avg=907.79ms max=59.52s med=445.19ms min=95.22ms p(90)=1.58s p(95)=2.58s
    http_reqs.............: 92386 (769.8833333333333/s)
    vus...................: 1000
    vus_max...............: 1000
PodVersion for 1000 cusers
Persistent Disk - CPU - 1000cusers

A real-world test with node failures

So for this, I think I’m going to run a 500 user test for 5 mins on each configuration and then randomly kill pods during the test.

The POD version will get a new node which will have to catch up to the cluster master since it will start off with empty storage.

The Persistent Disk version OTOH, has data intact - so the moment a node comes up, it should just carry on. IMO, in this test, we should see the Persistent Disk version do better.

The results

Interesting to say the least. The persistent disk version did not a do a lot better as expected (or, said the other way round, the pod version recovered pretty quickly on pod failure). There are slightly more failures on the pod version, but not a whole lot - we’re talking .03% difference. The persistent disk version pulled ahead by a small factor for once (20req/s) but that’s it.

Caveat

In this case, pod failures were probably far enough to not matter - ie pod1 was deleted and pod1' came online and caught up before pod2 was deleted. If both pods went offline in quick succession, then data loss is a real possibility.

POD Version Persistent Disk Version
  1. Pod es-223* was deleted at 1m

  2. Pod es-223* was deleted at 3m18s

# POD version
    ✗ is status 201
          0.15% (184/126755)
    ✗ is status 200
          0.19% (265/136609)

    checks................: 99.83%
    data_received.........: 51 MB (170 kB/s)
    data_sent.............: 7.9 MB (26 kB/s)
    http_req_blocked......: avg=8.44ms max=21s med=0s min=0s p(90)=0s p(95)=0s
    http_req_connecting...: avg=8.43ms max=21s med=0s min=0s p(90)=0s p(95)=0s
    http_req_duration.....: avg=539.67ms max=1m0s med=198.5ms min=147.39ms p(90)=952.55ms p(95)=1.55s
    http_req_receiving....: avg=57.89ms max=59.55s med=0s min=0s p(90)=0s p(95)=1.03ms
    http_req_sending......: avg=23.15µs max=3.5ms med=0s min=0s p(90)=0s p(95)=0s
    http_req_waiting......: avg=481.75ms max=59.1s med=196.52ms min=147.39ms p(90)=891.36ms p(95)=1.47s
    http_reqs.............: 263364 (877.88/s)
    vus...................: 500
    vus_max...............: 500
  1. Pod es-1* was deleted at 1m

  2. Pod es-3* was deleted at 3m18s

# persistentdisk version
# deleted pods at 1m mark and 3m18s mark
    ✗ is status 201
          0.12% (148/123644)
    ✗ is status 200
          0.13% (173/133111)

    checks................: 99.87%
    data_received.........: 50 MB (167 kB/s)
    data_sent.............: 7.8 MB (26 kB/s)
    http_req_blocked......: avg=11.62ms max=21.02s med=0s min=0s p(90)=0s p(95)=0s
    http_req_connecting...: avg=11.56ms max=21.01s med=0s min=0s p(90)=0s p(95)=0s
    http_req_duration.....: avg=553.44ms max=1m0s med=272.74ms min=80.21ms p(90)=948.51ms p(95)=1.49s
    http_req_receiving....: avg=58.1ms max=59.76s med=0s min=0s p(90)=0s p(95)=1.02ms
    http_req_sending......: avg=562.74µs max=21.96s med=0s min=0s p(90)=0s p(95)=0s
    http_req_waiting......: avg=494.78ms max=59.37s med=267.69ms min=79.23ms p(90)=902.37ms p(95)=1.37s
    http_reqs.............: 256755 (855.85/s)
    vus...................: 500
    vus_max...............: 500
PodVersion for 1000 cusers
Figure 2. This pod was not deleted
Persistent Disk - CPU - 1000cusers
Figure 3. This pod was not deleted