Terraform module to deploy Apache Spark on OpenStack. By deploying this module you will get:
- A standalone Apache Spark cluster up and running
- A co-located HDFS file system
- Apache Zeppelin for interactive analysis
- TensorFlow dependencies on each node
- NVIDIA GPU drivers on each node
On your workstation you need to:
- Install Terraform
- Set up the environment by sourcing the OpenStack RC file for your project (a minimal sketch follows this list)
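As a sketch, assuming the RC file downloaded from your OpenStack dashboard is named myproject-openrc.sh (the actual file name varies per project), the workstation setup looks roughly like this:

# Source the RC file so Terraform can authenticate against your project
source myproject-openrc.sh  # prompts for your OpenStack password

# Verify that Terraform is installed and on the PATH
terraform version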
On your OpenStack project you need to:
- Have a CoreOS Container-Linux image available (referenced by coreos_image_name below; see the sketch after this list)
- Have an external network and a floating IP pool that you can use
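If no CoreOS image is available in your project, one way to upload one is via the OpenStack CLI; the file name below is a placeholder for the image you downloaded, and qcow2 is assumed as the disk format:

# Upload a CoreOS Container-Linux image to the project
openstack image create \
  --file coreos_production_openstack_image.img \
  --disk-format qcow2 \
  --container-format bare \
  "Container-Linux"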
Start by creating a directory, changing into it, and creating the main Terraform configuration file:
mkdir deployment
cd deployment
touch main.tf
In main.tf, paste and fill in the following configuration (a filled-in example is shown after the block):
module "spark" {
source = "mcapuccini/spark/openstack"
# Required variables
public_key="" # Path to a public SSH key
external_net_uuid="" # External network UUID
floating_ip_pool="" # Floating IP pool name
coreos_image_name="" # Name of a CoreOS Container-Linux image in your project
master_flavor_name="" # Flavor name to be used for the master node
worker_flavor_name="" # Flavor name to be user for the worker nodes
worker_volume_size="" # Worker block storage volume size in GB (used as HDFS data directory)
workers_count=3 # Number of worker nodes to deploy
}
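As a reference, a filled-in configuration might look like the following; every value here is a placeholder, so substitute the UUIDs, names and flavors from your own project:

module "spark" {
  source = "mcapuccini/spark/openstack"
  public_key = "~/.ssh/id_rsa.pub"                            # hypothetical key path
  external_net_uuid = "6f0a5bd5-40c2-41ac-9e10-c90d0a1f1b8c"  # placeholder UUID
  floating_ip_pool = "public"                                 # placeholder pool name
  coreos_image_name = "Container-Linux"                       # placeholder image name
  master_flavor_name = "m1.medium"                            # placeholder flavor
  worker_flavor_name = "m1.large"                             # placeholder flavor
  worker_volume_size = "100"                                  # 100 GB per worker
  workers_count = 3
}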
Initialize the Terraform working directory by running:
terraform init
To deploy, run:
terraform apply
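If you prefer to review what Terraform is going to create before confirming, you can first inspect the execution plan:

terraform plan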
Once the deployment is done, you can get the SSH tunnelling commands to the web interfaces by running:
terraform output -module=spark
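The printed commands open SSH tunnels to the Spark and Zeppelin web interfaces. As a rough sketch of their general form (the floating IP is a placeholder and 8080 is just an example port; use the exact commands from the output above):

# Forward a web UI port through the master's floating IP
# ('core' is the default user on CoreOS Container-Linux)
ssh -N -L 8080:localhost:8080 core@<master-floating-ip>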
To scale the cluster, increase or decrease the number of workers (workers_count) in main.tf and rerun terraform apply.
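For example, assuming you started with 3 workers, scaling up to 5 is a one-line change in main.tf; on the next apply, Terraform works out that only the two missing nodes need to be created:

workers_count = 5  # was 3; terraform apply will add two worker nodes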
You can delete the cluster by running:
terraform destroy