
intel/terraform-intel-databricks-cluster


Intel Optimized Cloud Modules for Terraform


Intel Optimized Databricks Cluster

This module deploys an Intel Optimized Databricks cluster. Instance selection and Intel optimizations are defaulted in the code.

Learn more about Intel optimizations:

Performance Data


Usage

All of the examples in the examples folder show how to create an Intel Optimized Databricks cluster using this module along with the Intel Cloud Optimization Module for Databricks Workspace on AWS and Azure.
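As a minimal sketch only (the module label, registry source, and input values below are assumptions, not taken from the examples folder), a call to this module could look like:

module "optimized_databricks_cluster" {
  # Assumed registry source for this module; verify against the Terraform Registry page.
  source = "intel/databricks-cluster/databricks"

  # Required inputs (placeholder values).
  dbx_cloud = "aws"
  dbx_host  = "https://<your-workspace>.cloud.databricks.com"
}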


Run Terraform

terraform init  
terraform plan
terraform apply 

Considerations

More information regarding deploying a Databricks Workspace is available in the Databricks documentation.

Requirements

Name Version
aws ~> 5.31
azurerm ~> 3.48
databricks ~> 1.14.2
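These constraints would typically be pinned in the calling configuration's terraform block. The following is a sketch only, assuming the standard hashicorp/aws, hashicorp/azurerm, and databricks/databricks provider sources:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.31"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.48"
    }
    databricks = {
      source  = "databricks/databricks"
      version = "~> 1.14.2"
    }
  }
}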

Providers

Name Version
databricks ~> 1.14.2

Modules

No modules.

Resources

Name Type
databricks_cluster.dbx_cluster resource
databricks_token.pat resource
databricks_spark_version.latest_lts data source

Inputs

Name (type, default, required) and description

aws_dbx_node_type_id (string, default "i4i.2xlarge", optional)
The AWS compute machine type, which must be supported by Databricks.

azure_dbx_node_type_id (string, default "Standard_E8ds_v5", optional)
The Azure compute machine type, which must be supported by Databricks.

dbx_auto_terminate_min (number, default 120, optional)
Automatically terminate the cluster after it has been inactive for this many minutes. If specified, the threshold must be between 10 and 10000 minutes. Set this value to 0 to explicitly disable automatic termination. The underlying Databricks default is 60. We highly recommend setting this for interactive/BI clusters.

dbx_cloud (string, required)
Flag that decides which cloud to use for the instance type in the Databricks cluster.

dbx_cluster_name (string, default "dbx_optimized_cluster", optional)
Cluster name, which does not have to be unique. If not specified at creation, the cluster name will be an empty string.

dbx_host (string, required)
URL of the Databricks workspace.

dbx_num_workers (number, default 8, optional)
Number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors, for a total of num_workers + 1 Spark nodes.

dbx_runtime_engine (string, default "PHOTON", optional)
The type of runtime engine to use. If not specified, the runtime engine type is inferred from the spark_version value. Allowed values: PHOTON, STANDARD.

dbx_spark_config (map(string), optional)
Key-value pairs of Intel optimizations for the Spark configuration. Default:
{
  "spark.databricks.adaptive.autoOptimizeShuffle.enabled": "true",
  "spark.databricks.delta.preview.enabled": "true",
  "spark.databricks.io.cache.enabled": "true",
  "spark.databricks.io.cache.maxDiskUsage": "100g",
  "spark.databricks.io.cache.maxMetaDataCache": "10g",
  "spark.databricks.passthrough.enabled": "true"
}

enable_intel_tags (bool, default true, optional)
If true, adds additional Intel tags to resources.

intel_tags (map(string), optional)
Intel tags. Default:
{
  "intel-module": "terraform-intel-databricks-cluster",
  "intel-registry": "https://registry.terraform.io/namespaces/intel"
}

tags (map(string), optional)
Tags. Default:
{
  "key": "value"
}
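As an illustration only (the module label, source, and values below are assumptions), optional inputs can be overridden in the module block alongside the required ones:

module "optimized_databricks_cluster" {
  source = "intel/databricks-cluster/databricks"   # assumed registry source

  dbx_cloud = "azure"
  dbx_host  = "https://adb-1234567890123456.7.azuredatabricks.net"   # placeholder workspace URL

  # Optional overrides; any input left out keeps the default listed above.
  dbx_cluster_name       = "intel-optimized-analytics"
  dbx_num_workers        = 4
  dbx_auto_terminate_min = 60
  tags = {
    "owner" = "data-platform-team"   # placeholder tag
  }
}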

Outputs

Name Description
dbx_cluster_autoterminate_min Auto-termination time, in minutes, of the Databricks cluster
dbx_cluster_custom_tags Custom tags of the Databricks cluster
dbx_cluster_name Name of the Databricks cluster
dbx_cluster_node_type_id Instance type of the Databricks cluster
dbx_cluster_num_workers Number of worker nodes of the Databricks cluster
dbx_cluster_runtime_engine Runtime engine of the Databricks cluster
dbx_cluster_spark_conf Spark configuration of the Databricks cluster
dbx_cluster_spark_version Spark version of the Databricks cluster
dbx_pat Databricks personal access token
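A sketch of surfacing selected module outputs from the calling configuration, assuming the module block is labeled optimized_databricks_cluster as in the earlier examples:

output "dbx_cluster_name" {
  value = module.optimized_databricks_cluster.dbx_cluster_name
}

output "dbx_pat" {
  value     = module.optimized_databricks_cluster.dbx_pat
  sensitive = true   # avoid printing the personal access token in CLI output
}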