- About
- Prerequisites
- Pricing model
- Deployment - Quick steps
These scripts deploy a basic Apache Pulsar cluster on a Digital Ocean managed Kubernetes cluster. The two main differences from the official scripts are :
- The use of StatefulSets rather than DaemonSets, to provide sticky identities and persistent storage and to tune pod scheduling
- The use of Persistent Volume Claims with a special Storage Class to attach Digital Ocean Block Storage to the pods
The deployment method shown has been tested on a fresh pool of 3 nodes (2 vCPUs and 4 GB RAM each).
As in the official documentation (https://pulsar.apache.org/docs/en/deploy-kubernetes/), this method deploys :
- A three-node ZooKeeper cluster
- A two-bookie BookKeeper cluster
- A three-broker Pulsar cluster
- A pod from which you can run administrative commands using the pulsar-admin CLI tool
OPTIONAL
- A monitoring stack consisting of Prometheus, Grafana, and the Pulsar dashboard
- NodePort services to expose components outside your Kubernetes cluster (that is, to the whole Internet)
Before you start, you need :
- A Digital Ocean Kubernetes cluster
- kubectl configured to interact with your cluster
Official documentation for each prerequisite :
- https://www.digitalocean.com/docs/kubernetes/how-to/create-clusters/
- https://www.digitalocean.com/docs/kubernetes/how-to/connect-to-cluster/
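Before deploying anything, it can help to confirm that kubectl really reaches the cluster. The following is a minimal sketch, not part of this repository: the helper name check_cluster is hypothetical, and it assumes the default column layout of `kubectl get nodes`, where the second column is the node status.

```shell
# Return success only if kubectl can reach the cluster and at least one node reports Ready.
# check_cluster is a hypothetical helper name, not one of the scripts in this repository.
check_cluster() {
  # Fails if the API server cannot be reached with the current kubeconfig.
  kubectl cluster-info >/dev/null || return 1
  # With --no-headers, the second column of `kubectl get nodes` is the node status.
  ready=$(kubectl get nodes --no-headers | awk '$2 == "Ready" { n++ } END { print n + 0 }')
  echo "Ready nodes: ${ready}"
  [ "${ready}" -ge 1 ]
}
```

Calling `check_cluster` prints the number of Ready nodes and returns a non-zero status if none is found.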
THIS MODEL DOES NOT TAKE INTO ACCOUNT BANDWIDTH USAGE AND BILLING !
Price per month in USD = (Nprice * N) + (Z * Zstorage * 0.10) + (B * (Bjournal + Bledger) * 0.10) + (Pstorage * 0.10)
- N : number of nodes
- Nprice : price in USD/month of a node
- Z : Number of ZooKeeper replicas
- Zstorage : Size of the storage (GiB) requested by a ZooKeeper replica
- B : Number of BookKeeper replicas
- Bjournal : Size of the storage (GiB) requested by a BookKeeper replica for the journal
- Bledger : Size of the storage (GiB) requested by a BookKeeper replica for the ledger
- Pstorage : Size of the storage (GiB) requested by the Prometheus replica
- Block storage is currently billed 0.10 USD per month per GiB
Pricing for the example
- 3 nodes billed 20 USD/month each
- 3 ZooKeeper replicas with 2 GiB storage each
- 2 BookKeeper replicas, each with 50 GiB storage for the ledger + 12 GiB storage for the journal
- 1 Prometheus replica with 10 GiB storage
Price in USD/month = (3 * 20) + (3 * 2 * 0.10) + (2 * (50 + 12) * 0.10) + (10 * 0.10) = 74 USD/month
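To try other cluster sizes, the pricing model can be sketched as a small shell function. This is only an illustration: the function name monthly_price and its argument order are assumptions, and the 0.10 USD per GiB rate is the one quoted in this guide, which may change.

```shell
# Estimate the monthly price in USD from the pricing model of this guide.
# Arguments, in order: N Nprice Z Zstorage B Bjournal Bledger Pstorage
# monthly_price is a hypothetical helper; bandwidth is not taken into account.
monthly_price() {
  awk -v n="$1" -v nprice="$2" -v z="$3" -v zstorage="$4" \
      -v b="$5" -v bjournal="$6" -v bledger="$7" -v pstorage="$8" \
      'BEGIN {
         rate = 0.10                          # USD per GiB per month (assumed current rate)
         nodes = nprice * n                   # worker nodes
         zk = z * zstorage * rate             # ZooKeeper volumes
         bk = b * (bjournal + bledger) * rate # BookKeeper journal + ledger volumes
         prom = pstorage * rate               # Prometheus volume
         printf "%.2f\n", nodes + zk + bk + prom
       }'
}

# Example values from this guide: prints 74.00
monthly_price 3 20 3 2 2 12 50 10
```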
You can use the following command to deploy your ZooKeeper cluster :
kubectl apply -f https://raw.githubusercontent.com/guillaume-braibant/unofficial-pulsar-digitalocean-k8s-deployment/master/zookeeper.yaml
This command allows you to check the state of your ZooKeeper pods :
kubectl get pods -l component=zookeeper -o wide
NOTE : The first container deployment can take several minutes because the container image to pull is large (https://hub.docker.com/r/apachepulsar/pulsar-all/tags)
Once ALL your ZooKeeper pods are in the Running state, you can deploy a job that will initialize your ZooKeeper cluster with some metadata.
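Rather than polling the pod list by hand, the wait can be scripted with `kubectl wait`. The sketch below assumes the pods carry the same component label used in the check command, and it waits for the Ready condition, which is slightly stricter than the Running phase. The helper name wait_for_component and the default timeout of 600s are illustrative choices, not part of this repository.

```shell
# Block until every pod with the given component label is Ready, or fail after the timeout.
# wait_for_component is a hypothetical helper; the 600s default timeout is an assumption.
wait_for_component() {
  component="$1"
  timeout="${2:-600s}"
  kubectl wait --for=condition=Ready pod -l "component=${component}" --timeout="${timeout}"
}
```

For example, `wait_for_component zookeeper` returns once all ZooKeeper pods are Ready, and the same helper can be reused for other components, such as `wait_for_component bookkeeper 900s`.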
This command deploys the job that initializes metadata on your ZooKeeper cluster (cluster-metadata.yaml) :
kubectl apply -f https://raw.githubusercontent.com/guillaume-braibant/unofficial-pulsar-digitalocean-k8s-deployment/master/cluster-metadata.yaml
This command deploys the BookKeeper cluster :
kubectl apply -f https://raw.githubusercontent.com/guillaume-braibant/unofficial-pulsar-digitalocean-k8s-deployment/master/bookie.yaml
This command allows you to check the state of your BookKeeper pods :
kubectl get pods -l component=bookkeeper -o wide
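Because the pods rely on Persistent Volume Claims backed by Digital Ocean Block Storage, a pod that stays Pending is often waiting for its volume. A minimal sketch for listing claims that are not yet bound follows; the helper name unbound_pvcs is hypothetical, and it assumes the default column layout of `kubectl get pvc`, where the second column is the claim status.

```shell
# Print the names of Persistent Volume Claims in the current namespace that are not Bound.
# unbound_pvcs is a hypothetical helper, not one of the scripts in this repository.
unbound_pvcs() {
  # With --no-headers, column 1 is the claim name and column 2 is its status.
  kubectl get pvc --no-headers | awk '$2 != "Bound" { print $1 }'
}
```

An empty output from `unbound_pvcs` means every claim has been attached to a volume.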
Once your bookies are up, you can deploy the brokers :
kubectl apply -f https://raw.githubusercontent.com/guillaume-braibant/unofficial-pulsar-digitalocean-k8s-deployment/master/broker.yaml
This command deploys a pod from which you can run administrative commands using the pulsar-admin CLI tool :
kubectl apply -f https://raw.githubusercontent.com/guillaume-braibant/unofficial-pulsar-digitalocean-k8s-deployment/master/admin.yaml
From here, you can run a simple test that uses the pulsar-admin pod to create a producer and consumer :
This command creates the producer :
kubectl exec pulsar-admin -it -- bin/pulsar-perf produce persistent://public/default/test-topic --rate 1000
This command, run from a second terminal while the producer is still running, creates the consumer :
kubectl exec pulsar-admin -it -- bin/pulsar-perf consume persistent://public/default/test-topic --subscriber-name test-subscription
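While the producer and consumer are running, the topic statistics give a quick confirmation that messages are flowing. The sketch below wraps the `pulsar-admin topics stats` command and reuses the pulsar-admin pod and the test topic from the commands above; the helper name topic_stats is hypothetical.

```shell
# Print the statistics of a topic by running pulsar-admin inside the pulsar-admin pod.
# topic_stats is a hypothetical helper; it defaults to the test topic used in this guide.
topic_stats() {
  topic="${1:-persistent://public/default/test-topic}"
  kubectl exec pulsar-admin -- bin/pulsar-admin topics stats "${topic}"
}
```

Running `topic_stats` from a third terminal prints a JSON document describing the topic.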
To deploy the optional monitoring stack, the first thing to do is to create a ServiceAccount that allows Prometheus to query your Kubernetes cluster in order to find the different components from which metrics should be scraped :
kubectl apply -f https://raw.githubusercontent.com/guillaume-braibant/unofficial-pulsar-digitalocean-k8s-deployment/master/prometheus-rbac.yaml
The next step is to deploy the various components of the monitoring stack :
kubectl apply -f https://raw.githubusercontent.com/guillaume-braibant/unofficial-pulsar-digitalocean-k8s-deployment/master/monitoring.yaml
CAUTION : Allowing access from outside your Digital Ocean Kubernetes cluster exposes the components to the whole Internet. Security considerations and component security configuration are beyond the scope of this guide.
At this point, the components of your Apache Pulsar cluster can be accessed only from within your Digital Ocean Kubernetes cluster.
This command deploys a NodePort service that exposes your brokers (30001 & 30002) :
kubectl apply -f https://raw.githubusercontent.com/guillaume-braibant/unofficial-pulsar-digitalocean-k8s-deployment/master/broker-proxy.yml
This command deploys a NodePort service that exposes Prometheus (30003) :
kubectl apply -f https://raw.githubusercontent.com/guillaume-braibant/unofficial-pulsar-digitalocean-k8s-deployment/master/prometheus-proxy.yml
This command deploys a NodePort service that exposes Grafana (30004) :
kubectl apply -f https://raw.githubusercontent.com/guillaume-braibant/unofficial-pulsar-digitalocean-k8s-deployment/master/grafana-proxy.yml
This command deploys a NodePort service that exposes the Pulsar Dashboard (30005) :
kubectl apply -f https://raw.githubusercontent.com/guillaume-braibant/unofficial-pulsar-digitalocean-k8s-deployment/master/dashboard-proxy.yml
You can reach a component by combining the NodePort service port with the public IP of one of the droplets that make up your Digital Ocean cluster (droplet public IP:port). More robust ways of reaching the components of your Apache Pulsar cluster from outside your Digital Ocean Kubernetes cluster, ones that do not rely on a single public IP, are also beyond the scope of this guide.
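The address to use can be built from the cluster itself. The sketch below reads the external IP of the first node with a kubectl JSONPath query and combines it with a NodePort. It assumes the node reports an address of type ExternalIP and that the component speaks HTTP on that port; the helper names node_external_ip and component_url are hypothetical.

```shell
# Print the external IP of the first node listed by the cluster.
# Assumes the node reports an address of type ExternalIP.
node_external_ip() {
  kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="ExternalIP")].address}'
}

# Print an HTTP URL for a component exposed on the given NodePort.
# component_url is a hypothetical helper, not one of the scripts in this repository.
component_url() {
  port="$1"
  echo "http://$(node_external_ip):${port}"
}
```

For example, `component_url 30004` prints the address at which Grafana should answer once its NodePort service is deployed.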