## Kafka Benchmark

### Managed Kafka on GCP

#### 1. Create a Kafka instance

In [5]:
%set_env PROJECT_ID=peace-demo
%set_env LOCATION=us-central1
%set_env SUBNET=default
%set_env KAFKA_ID=demo-etl-kafka-instance
%set_env TOPIC_ID=benchmark_1
%set_env CLIENT_PROPS=/home/client.properties
%set_env GCS_PATH=gs://peace-demo-temp-us-central1/kafka_benchmark_1_to_gcs/
%set_env JOB_NAME=kafka-benchmark1-to-gcs

env: PROJECT_ID=peace-demo
env: LOCATION=us-central1
env: SUBNET=default
env: KAFKA_ID=demo-etl-kafka-instance
env: TOPIC_ID=benchmark_1
env: CLIENT_PROPS=/home/client.properties
env: GCS_PATH=gs://peace-demo-temp-us-central1/kafka_benchmark_1_to_gcs/
env: JOB_NAME=kafka-benchmark1-to-gcs


In [2]:
!gcloud beta managed-kafka clusters create $KAFKA_ID \
    --location=$LOCATION \
    --cpu=3 \
    --memory=3221225472 \
    --subnets=projects/$PROJECT_ID/regions/$LOCATION/subnetworks/$SUBNET \
    --auto-rebalance

Create request issued for: [demo-etl-kafka-instance]
Waiting for operation [projects/peace-demo/locations/us-central1/operations/operation-1727421379920-62314a1ba41d7-eb75bec1-081078c5] to complete...done.
Created cluster [demo-etl-kafka-instance].


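
The `--memory` value in the create command is given in bytes: 3221225472 is 3 GiB. A minimal sketch of that conversion, useful when sizing a different cluster; `gib_to_bytes` is a hypothetical helper, not part of `gcloud`:

```bash
# Hypothetical helper: convert a whole number of GiB to bytes.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 3   # prints 3221225472, the value passed to --memory above
```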



#### 2. Create a Topic

In [3]:
!gcloud beta managed-kafka topics create $TOPIC_ID \
    --cluster=$KAFKA_ID --location=$LOCATION \
    --partitions=3 \
    --replication-factor=1

Created topic [benchmark_1].


#### 3. Create a Dataflow job from Kafka to GCS

In [7]:
!gcloud dataflow flex-template run $JOB_NAME \
    --template-file-gcs-location gs://dataflow-templates-$LOCATION/latest/flex/Kafka_to_Gcs_Flex \
    --region $LOCATION \
    --num-workers 1 \
    --parameters readBootstrapServerAndTopic=projects/$PROJECT_ID/locations/$LOCATION/clusters/$KAFKA_ID/topics/$TOPIC_ID \
    --parameters windowDuration=5m \
    --parameters outputDirectory=$GCS_PATH \
    --parameters outputFilenamePrefix=output- \
    --parameters numShards=0 \
    --parameters enableCommitOffsets=false \
    --parameters kafkaReadOffset=latest \
    --parameters kafkaReadAuthenticationMode=APPLICATION_DEFAULT_CREDENTIALS \
    --parameters messageFormat=JSON \
    --parameters useBigQueryDLQ=false \
    --parameters autoscalingAlgorithm=NONE

job:
  createTime: '2024-09-27T09:21:35.803677Z'
  currentStateTime: '1970-01-01T00:00:00Z'
  id: 2024-09-27_02_21_35-162507244572144438
  location: us-central1
  name: kafka-benchmark1-to-gcs
  projectId: peace-demo
  startTime: '2024-09-27T09:21:35.803677Z'
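
The command above passes each template parameter through its own `--parameters` flag. The flag also accepts a comma-separated list of `key=value` pairs, so the list can be assembled once and passed as a single value. A minimal sketch, assuming a hypothetical helper named `join_params`; only three of the parameters are shown:

```bash
# Hypothetical helper: join all arguments with commas.
# The body runs in a subshell so the IFS change does not leak out.
join_params() (
  IFS=,
  printf '%s\n' "$*"
)

PARAMS=$(join_params \
  "windowDuration=5m" \
  "outputFilenamePrefix=output-" \
  "numShards=0")
echo "$PARAMS"   # prints windowDuration=5m,outputFilenamePrefix=output-,numShards=0
```

The resulting string would then be passed as `--parameters "$PARAMS"`. This only works for values that do not themselves contain commas.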


#### 4. Install Go and run the benchmark job

- (Optional) Create a benchmark instance
```bash
gcloud compute instances create instance-kafka-benchmark \
    --project=peace-demo \
    --zone=us-central1-c \
    --machine-type=n2d-standard-8 \
    --network-interface=network-tier=PREMIUM,stack-type=IPV4_ONLY,subnet=default \
    --metadata=enable-osconfig=TRUE,enable-oslogin=true \
    --no-restart-on-failure \
    --maintenance-policy=TERMINATE \
    --provisioning-model=SPOT \
    --instance-termination-action=STOP \
    --max-run-duration=21600s \
    --service-account=642598805451-compute@developer.gserviceaccount.com \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --create-disk=auto-delete=yes,boot=yes,device-name=instance-20240927-115233,image=projects/debian-cloud/global/images/debian-12-bookworm-v20240910,mode=rw,size=100,type=pd-balanced \
    --no-shielded-secure-boot \
    --shielded-vtpm \
    --shielded-integrity-monitoring \
    --labels=goog-ops-agent-policy=v2-x86-template-1-3-0,goog-ec-src=vm_add-gcloud \
    --reservation-affinity=any
```
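- Install Go
```bash
# Become root; the commands below write to /usr/local
sudo su root
apt update -y
wget https://go.dev/dl/go1.23.1.linux-amd64.tar.gz
# Remove any previous installation before unpacking the new one
rm -rf /usr/local/go && tar -C /usr/local -xzf go1.23.1.linux-amd64.tar.gz
# This PATH change applies to the current shell session only
export PATH=$PATH:/usr/local/go/bin
go version
```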
- Initialize the Go module and fetch the libraries
```bash
cd /home/jupyter/github/peace-demo/benchmark/kafka
go mod init github.com/ping.coder/peace-demo/benchmark-kafka
go get -u github.com/confluentinc/confluent-kafka-go/kafka
go get -u cloud.google.com/go/bigquery
```
- Run benchmark
```bash
go run kafka_benchmark.go
```
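
`kafka_benchmark.go` is not shown here, but any Kafka client needs the cluster's bootstrap address. As a sketch, assuming the address follows the pattern `bootstrap.<cluster>.<region>.managedkafka.<project>.cloud.goog:9092` (verify it against the cluster's details before relying on it), it can be derived from the values set in step 1. `bootstrap_address` is a hypothetical helper:

```bash
# Hypothetical helper. The address pattern is an assumption; confirm it for your cluster.
# $1 = cluster ID, $2 = region, $3 = project ID
bootstrap_address() {
  printf 'bootstrap.%s.%s.managedkafka.%s.cloud.goog:9092\n' "$1" "$2" "$3"
}

bootstrap_address demo-etl-kafka-instance us-central1 peace-demo
# prints bootstrap.demo-etl-kafka-instance.us-central1.managedkafka.peace-demo.cloud.goog:9092
```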

## Clean up: release all resources

In [None]:
!gcloud dataflow jobs list --region=$LOCATION --status=active --format="value(JOB_ID)" --filter="name=$JOB_NAME" | tail -n 1 | cut -f 1 -d " "

Replace `<job-id>` in the next cell with the job ID printed above, then run the cell.

In [None]:
!echo y|gcloud dataflow jobs cancel <job-id> --region=$LOCATION
!gcloud storage rm --recursive $GCS_PATH
!echo y|gcloud beta managed-kafka clusters delete $KAFKA_ID \
    --location=$LOCATION \
    --async
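
Copying the job ID by hand can be avoided when the lookup and the cancel run in the same shell session (for example a single `%%bash` cell, which is an assumption about how the notebook is run). A minimal sketch: `last_job_id` and `cancel_benchmark_job` are hypothetical helpers, and the first applies the same `tail`/`cut` filtering as the lookup cell above:

```bash
# Hypothetical helper: keep the first space-separated field of the last input line.
last_job_id() {
  tail -n 1 | cut -f 1 -d " "
}

# Hypothetical helper: look up the active job named $JOB_NAME and cancel it.
cancel_benchmark_job() {
  job_id=$(gcloud dataflow jobs list --region="$LOCATION" --status=active \
    --format="value(JOB_ID)" --filter="name=$JOB_NAME" | last_job_id)
  if [ -n "$job_id" ]; then
    # --quiet suppresses the interactive confirmation prompt
    gcloud dataflow jobs cancel "$job_id" --region="$LOCATION" --quiet
  else
    echo "no active job named $JOB_NAME" >&2
  fi
}
```

Calling `cancel_benchmark_job` then replaces the manual edit of `<job-id>`; the storage and cluster deletion commands stay as they are.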

## THE END.