azure-kafka-spark-adls

Deploy to Azure

This ARM template deploys two HDInsight clusters (Spark and Kafka) in the same Virtual Network. Spark uses Azure Data Lake Store (ADLS) as its primary storage, while Kafka uses Blob Storage.

Since ADLS on HDInsight requires a Service Principal with a certificate, we've created a Bash script to automate the entire deployment. The script creates a self-signed certificate and converts it to PKCS12 format.
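The certificate step can be sketched with `openssl` roughly as follows. This is a minimal illustration, not the contents of `deploy.sh`: the variable names, the default values, the one-year validity and the 2048-bit RSA key are all assumptions made for the example.

```shell
#!/bin/sh
# Sketch only: names, defaults and key parameters below are assumptions,
# not values taken from deploy.sh.
set -eu

CLUSTER_NAME="${CLUSTER_NAME:-mycluster}"   # hypothetical default
CERT_PASSWORD="${CERT_PASSWORD:-changeit}"  # hypothetical default
WORK_DIR="$(mktemp -d)"
KEY_FILE="${WORK_DIR}/key.pem"
CERT_FILE="${WORK_DIR}/cert.pem"
PFX_FILE="${WORK_DIR}/cert.pfx"

# Create an unencrypted private key and a self-signed certificate
# valid for 365 days, with the cluster name as the subject CN.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -subj "/CN=${CLUSTER_NAME}" \
  -keyout "${KEY_FILE}" -out "${CERT_FILE}" 2>/dev/null

# Bundle the certificate and key into a password-protected PKCS12 file.
openssl pkcs12 -export \
  -in "${CERT_FILE}" -inkey "${KEY_FILE}" \
  -out "${PFX_FILE}" -passout "pass:${CERT_PASSWORD}"

echo "PKCS12 bundle written to ${PFX_FILE}"
```

The resulting `.pfx` file is what a Service Principal credential would be built from; how the real script passes it on to the template is not shown here.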

Caveats

  • For simplicity, as many resources as possible are named $CLUSTER_NAME.
  • VNet address space, VM sizes and the number of Head/Worker/Zookeeper nodes are hardcoded inside the template.
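Because one name is reused across resources, it has to satisfy the strictest naming rule among them. The sketch below assumes a storage account is one of the resources named $CLUSTER_NAME (not confirmed by this README) and therefore checks against the Azure storage account rule of 3 to 24 lowercase letters and digits. The helper `is_valid_shared_name` is hypothetical and not part of `deploy.sh`.

```shell
#!/bin/sh
# Hypothetical helper, not part of deploy.sh.
# Assumes the shared name must also be a valid storage account name:
# 3-24 characters, lowercase letters and digits only.
is_valid_shared_name() {
  printf '%s' "$1" | grep -Eq '^[a-z0-9]{3,24}$'
}

# Try a couple of candidate names.
for candidate in "sparkkafka01" "Spark-Kafka"; do
  if is_valid_shared_name "$candidate"; then
    echo "$candidate: ok"
  else
    echo "$candidate: rejected"
  fi
done
```

Running a check like this before deployment fails fast, rather than partway through a ~20 minute run.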

Prerequisites

Deploy

./deploy.sh <CLUSTER_NAME>

Provide a password when prompted. It is used for all dashboards and for SSH access. Deploying all resources takes about 20 minutes.

Limitations

  • It's not possible to create a Service Principal inside an ARM template, since it resides outside resource groups.
  • As of now, ADLS is only available in these regions.
  • Kafka doesn't support ADLS as primary storage.
  • HDInsight doesn't allow direct connections to Kafka over the public internet.
  • Once an HDInsight cluster is provisioned, only the number of worker nodes can be scaled, not the size of the VMs.
  • An existing HDInsight cluster cannot join a new VNet.

Resources