Welcome to the terraform-aws-emr repo!
The examples listed below show how to use this module:
- Complete - Complete Example
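
As a rough orientation, a minimal call to this module might look like the sketch below. It is not taken from the Complete example: the `source` path, every ID and ARN, the instance types, and the release label are placeholder assumptions, and only input names documented in this README are used.

```hcl
module "emr" {
  # Hypothetical source path; point this at wherever the module is checked out or published.
  source = "../.."

  # ID elements used to build resource names and tags.
  namespace = "eg"
  stage     = "dev"
  name      = "emr"

  # The only two inputs marked as required in the Inputs table.
  release_label = "emr-6.9.0"             # illustrative release label
  vpc_id        = "vpc-0123456789abcdef0" # placeholder VPC ID

  # Cluster settings; all values below are placeholders.
  applications             = ["Spark", "Hadoop"]
  ec2_attributes_subnet_id = "subnet-0123456789abcdef0"
  subnet_type              = "private"
  log_uri                  = "s3://example-emr-logs/"

  service_role                    = "arn:aws:iam::111111111111:role/example-emr-service"
  ec2_attributes_instance_profile = "arn:aws:iam::111111111111:instance-profile/example-emr-ec2"

  master_instance_group_instance_type = "m5.xlarge"
  core_instance_group_instance_type   = "m5.xlarge"
  core_instance_group_instance_count  = 1
}
```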

Requirements

Name | Version |
---|---|
terraform | >= 1.0.0 |
aws | >= 4.0 |
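
The version constraints above can be mirrored in the calling configuration. The following is a sketch only; it assumes the provider is installed from the usual `hashicorp/aws` registry address, which this README does not state.

```hcl
terraform {
  # Matches the "terraform >= 1.0.0" requirement listed above.
  required_version = ">= 1.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws" # assumed registry source
      version = ">= 4.0"        # matches the "aws >= 4.0" requirement
    }
  }
}
```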

Providers

Name | Version |
---|---|
aws | >= 4.0 |

Modules

Name | Source | Version |
---|---|---|
emr_cluster | ./modules/emr-cluster | n/a |
emr_security_configuration | ./modules/emr-security-configuration | n/a |
label_cloudwatch_rule | cloudposse/label/null | 0.25.0 |
label_cloudwatch_target | cloudposse/label/null | 0.25.0 |
label_core | cloudposse/label/null | 0.25.0 |
label_master | cloudposse/label/null | 0.25.0 |
label_master_managed | cloudposse/label/null | 0.25.0 |
label_notebook_instance | cloudposse/label/null | 0.25.0 |
label_notebook_master | cloudposse/label/null | 0.25.0 |
label_service_managed | cloudposse/label/null | 0.25.0 |
label_slave | cloudposse/label/null | 0.25.0 |
label_slave_managed | cloudposse/label/null | 0.25.0 |
this | cloudposse/label/null | 0.25.0 |

Inputs

Name | Description | Type | Default | Required |
---|---|---|---|---|
additional_info | (Optional) A JSON string for selecting additional features such as adding proxy information. Note: the provider currently has no API to retrieve the value of this argument after the EMR cluster is created, so Terraform cannot detect drift if the value is changed outside Terraform. | string | null | no |
additional_tag_map | Additional key-value pairs to add to each map in `tags_as_list_of_maps`. Not added to `tags` or `id`. This is for some rare cases where resources want additional configuration of tags and therefore take a list of maps with tag key, value, and additional configuration. | map(string) | {} | no |
applications | (Optional) A list of applications for the cluster. | set(string) | null | no |
attributes | ID element. Additional attributes (e.g. `workers` or `cluster`) to add to `id`, in the order they appear in the list. New attributes are appended to the end of the list. The elements of the list are joined by the `delimiter` and treated as a single ID element. | list(string) | [] | no |
autoscaling_role | (Optional) An IAM role for automatic scaling policies. The IAM role provides permissions that the automatic scaling feature requires to launch and terminate EC2 instances in an instance group. | string | null | no |
bootstrap_action | (Optional) Ordered list of bootstrap actions that will be run before Hadoop is started on the cluster nodes. | list(object({ | [] | no |
bootstrap_s3_bucket | (Required) The name of the bucket to put the file in. Alternatively, an S3 access point ARN can be specified. | string | null | no |
bootstrap_s3_key | (Required) The name of the object once it is in the bucket. | string | null | no |
bootstrap_s3_kms_key_id | (Optional) Specifies the AWS KMS Key ARN to use for object encryption. This value is a fully qualified ARN of the KMS Key. | string | null | no |
bootstrap_s3_server_side_encryption | (Optional) Specifies server-side encryption of the object in S3. Valid values are 'AES256' and 'aws:kms'. | string | "AES256" | no |
cluster_name | The name of the job flow. | string | null | no |
configurations | (Optional) List of configurations supplied for the EMR cluster you are creating. | string | null | no |
configurations_json | (Optional) A JSON string for supplying a list of configurations for the EMR cluster. | string | null | no |
context | Single object for setting entire context at once. See description of individual variables for details. Leave string and numeric variables as `null` to use default value. Individual variable settings (non-null) override settings in context object, except for attributes, tags, and additional_tag_map, which are merged. | any | { | no |
core_instance_autoscaling_max_capacity | (Required) The max capacity of the scalable target. | number | 1 | no |
core_instance_autoscaling_min_capacity | (Required) The min capacity of the scalable target. | number | 1 | no |
core_instance_group_autoscaling_policy | (Optional) String containing the EMR Auto Scaling Policy JSON. | string | null | no |
core_instance_group_bid_price | (Optional) Bid price for each EC2 instance in the instance group, expressed in USD. Setting this attribute declares the instance group as a Spot Instance group and implicitly creates a Spot request. Leave this blank to use On-Demand Instances. | string | null | no |
core_instance_group_ebs_config_iops | (Optional) The number of I/O operations per second (IOPS) that the volume supports. | number | null | no |
core_instance_group_ebs_config_size | (Required) The volume size, in gibibytes (GiB). | number | null | no |
core_instance_group_ebs_config_type | (Required) The volume type. Valid options are gp2, io1, standard and st1. | string | null | no |
core_instance_group_ebs_config_volumes_per_instance | (Optional) The number of EBS volumes with this configuration to attach to each EC2 instance in the instance group (default is 1). | number | 1 | no |
core_instance_group_instance_count | (Optional) Target number of instances for the instance group. Must be at least 1. Defaults to 1. | number | 1 | no |
core_instance_group_instance_type | (Required) EC2 instance type for all instances in the instance group. | string | null | no |
custom_ami_id | (Optional) A custom Amazon Linux AMI for the cluster (instead of an EMR-owned AMI). | string | null | no |
delimiter | Delimiter to be used between ID elements. Defaults to `-` (hyphen). Set to `""` to use no delimiter at all. | string | null | no |
descriptor_formats | Describe additional descriptors to be output in the `descriptors` output map. Map of maps. Keys are names of descriptors. Values are maps of the form `{ format = string, labels = list(string) }` (type is `any` so the map values can later be enhanced to provide additional options). `format` is a Terraform format string to be passed to the `format()` function. `labels` is a list of labels, in order, to pass to the `format()` function. Label values will be normalized before being passed to `format()` so they will be identical to how they appear in `id`. Default is `{}` (`descriptors` output will be empty). | any | {} | no |
ebs_root_volume_size | (Optional) Size in GiB of the EBS root device volume of the Linux AMI that is used for each EC2 instance. Available in Amazon EMR version 4.x and later. | number | null | no |
ec2_attributes_additional_master_security_groups | (Optional) String containing a comma-separated list of additional Amazon EC2 security group IDs for the master node. | string | null | no |
ec2_attributes_additional_slave_security_groups | (Optional) String containing a comma-separated list of additional Amazon EC2 security group IDs for the slave nodes. | string | null | no |
ec2_attributes_emr_managed_master_security_group | (Optional) Identifier of the Amazon EC2 EMR-Managed security group for the master node. | string | null | no |
ec2_attributes_emr_managed_slave_security_group | (Optional) Identifier of the Amazon EC2 EMR-Managed security group for the slave nodes. | string | null | no |
ec2_attributes_instance_profile | (Required) Instance profile whose role the cluster's EC2 instances assume. | string | null | no |
ec2_attributes_key_name | (Optional) Amazon EC2 key pair that can be used to SSH to the master node as the user called `hadoop`. | string | null | no |
ec2_attributes_service_access_security_group | (Optional) Identifier of the Amazon EC2 service-access security group. Required when the cluster runs on a private subnet. | string | null | no |
ec2_attributes_subnet_id | (Optional) VPC subnet ID where you want the job flow to launch. Cannot specify the cc1.4xlarge instance type for nodes of a job flow launched in an Amazon VPC. | string | null | no |
enabled | Set to false to prevent the module from creating any resources. | bool | null | no |
environment | ID element. Usually used for region e.g. 'uw2', 'us-west-2', OR role 'prod', 'staging', 'dev', 'UAT'. | string | null | no |
id_length_limit | Limit `id` to this many characters (minimum 6). Set to `0` for unlimited length. Set to `null` to keep the existing setting, which defaults to `0`. Does not affect `id_full`. | number | null | no |
keep_job_flow_alive_when_no_steps | (Optional) Whether the cluster keeps running when it has no steps or when all steps are complete (default is on). | bool | true | no |
kerberos_attributes_ad_domain_join_password | (Optional) The Active Directory password for ad_domain_join_user. Terraform cannot perform drift detection of this configuration. | string | null | no |
kerberos_attributes_ad_domain_join_user | (Optional) Required only when establishing a cross-realm trust with an Active Directory domain. A user with sufficient privileges to join resources to the domain. Terraform cannot perform drift detection of this configuration. | string | null | no |
kerberos_attributes_cross_realm_trust_principal_password | (Optional) Required only when establishing a cross-realm trust with a KDC in a different realm. The cross-realm principal password, which must be identical across realms. Terraform cannot perform drift detection of this configuration. | string | null | no |
kerberos_attributes_kdc_admin_password | (Required) The password used within the cluster for the kadmin service on the cluster-dedicated KDC, which maintains Kerberos principals, password policies, and keytabs for the cluster. Terraform cannot perform drift detection of this configuration. | string | null | no |
kerberos_attributes_realm | (Required) The name of the Kerberos realm to which all nodes in a cluster belong. For example, EC2.INTERNAL. | string | null | no |
label_key_case | Controls the letter case of the `tags` keys (label names) for tags generated by this module. Does not affect keys of tags passed in via the `tags` input. Possible values: `lower`, `title`, `upper`. Default value: `title`. | string | null | no |
label_order | The order in which the labels (ID elements) appear in the `id`. Defaults to ["namespace", "environment", "stage", "name", "attributes"]. You can omit any of the 6 labels ("tenant" is the 6th), but at least one must be present. | list(string) | null | no |
label_value_case | Controls the letter case of ID elements (labels) as included in `id`, set as tag values, and output by this module individually. Does not affect values of tags passed in via the `tags` input. Possible values: `lower`, `title`, `upper` and `none` (no transformation). Set this to `title` and set `delimiter` to `""` to yield Pascal Case IDs. Default value: `lower`. | string | null | no |
labels_as_tags | Set of labels (ID elements) to include as tags in the `tags` output. Default is to include all labels. Tags with empty values will not be included in the `tags` output. Set to `[]` to suppress all generated tags. Notes: The value of the `name` tag, if included, will be the `id`, not the `name`. Unlike other `null-label` inputs, the initial setting of `labels_as_tags` cannot be changed in later chained modules. Attempts to change it will be silently ignored. | set(string) | [ | no |
log_uri | (Optional) S3 bucket to write the log files of the job flow. If a value is not provided, logs are not created. | string | null | no |
master_allowed_cidr_blocks | List of CIDR blocks to be allowed to access the master instances. | list(string) | [] | no |
master_allowed_security_groups | List of security groups to be allowed to connect to the master instances. | list(string) | [] | no |
master_instance_group_bid_price | (Optional) Bid price for each EC2 instance in the instance group, expressed in USD. Setting this attribute declares the instance group as a Spot Instance group and implicitly creates a Spot request. Leave this blank to use On-Demand Instances. | string | null | no |
master_instance_group_ebs_config_iops | (Optional) The number of I/O operations per second (IOPS) that the volume supports. | number | null | no |
master_instance_group_ebs_config_size | (Required) The volume size, in gibibytes (GiB). | number | null | no |
master_instance_group_ebs_config_type | (Required) The volume type. Valid options are gp2, io1, standard and st1. | string | null | no |
master_instance_group_ebs_config_volumes_per_instance | (Optional) The number of EBS volumes with this configuration to attach to each EC2 instance in the instance group (default is 1). | number | 1 | no |
master_instance_group_instance_count | (Optional) Target number of instances for the instance group. Must be 1 or 3. Defaults to 1. Launching with multiple master nodes is only supported in EMR version 5.23.0+, and requires this resource's core_instance_group to be configured. Public (Internet accessible) instances must be created in VPC subnets that have map public IP on launch enabled. Termination protection is automatically enabled when launched with multiple master nodes, and Terraform must have the termination_protection = false configuration applied before destroying this resource. | number | null | no |
master_instance_group_instance_type | (Required) EC2 instance type for all instances in the instance group. | string | null | no |
name | ID element. Usually the component or solution name, e.g. 'app' or 'jenkins'. This is the only ID element not also included as a `tag`. The "name" tag is set to the full `id` string. There is no tag with the value of the `name` input. | string | null | no |
namespace | ID element. Usually an abbreviation of your organization name, e.g. 'eg' or 'cp', to help ensure generated IDs are globally unique. | string | null | no |
regex_replace_chars | Terraform regular expression (regex) string. Characters matching the regex will be removed from the ID elements. If not set, `"/[^a-zA-Z0-9-]/"` is used to remove all characters other than hyphens, letters and digits. | string | null | no |
release_label | (Required) The release label for the Amazon EMR release. | string | n/a | yes |
scale_down_behavior | (Optional) The way that individual Amazon EC2 instances terminate when an automatic scale-in activity occurs or an instance group is resized. | string | null | no |
security_configuration | (Optional) The security configuration name to attach to the EMR cluster. Only valid for EMR clusters with release_label 4.8.0 or greater. | string | null | no |
service_role | (Required) IAM role that will be assumed by the Amazon EMR service to access AWS resources. | string | null | no |
slave_allowed_cidr_blocks | List of CIDR blocks to be allowed to access the slave instances. | list(string) | [] | no |
slave_allowed_security_groups | List of security groups to be allowed to connect to the slave instances. | list(string) | [] | no |
sns_topic_arn | (Required) The Amazon Resource Name (ARN) of the SNS target. | string | null | no |
stage | ID element. Usually used to indicate role, e.g. 'prod', 'staging', 'source', 'build', 'test', 'deploy', 'release'. | string | null | no |
step | (Optional) List of steps to run when creating the cluster. | list(any) | [] | no |
step_concurrency_level | (Optional) The number of steps that can be executed concurrently. You can specify a maximum of 256 steps. | number | 1 | no |
subnet_type | The type of subnet the EMR cluster is provisioned in. Used to determine if service related security groups are required. Defaults to 'private'. | string | "private" | no |
tags | Additional tags (e.g. `{'BusinessUnit': 'XYZ'}`). Neither the tag keys nor the tag values will be modified by this module. | map(string) | {} | no |
tenant | ID element _(Rarely used, not included by default)_. A customer identifier, indicating who this instance of a resource is for. | string | null | no |
termination_protection | (Optional) Switch on/off termination protection (default is false, except when using multiple master nodes). Before attempting to destroy the resource when termination protection is enabled, this configuration must be applied with its value set to false. | bool | false | no |
visible_to_all_users | (Optional) Whether the job flow is visible to all IAM users of the AWS account associated with the job flow. | bool | true | no |
vpc_id | The VPC ID to create the security groups in. | string | n/a | yes |

Outputs

Name | Description |
---|---|
applications | The applications installed on this cluster. |
arn | The ARN of the cluster. |
bootstrap_action | A list of bootstrap actions that will be run before Hadoop is started on the cluster nodes. |
configurations | The list of Configurations supplied to the EMR cluster. |
core_instance_group_0_id | Core node type Instance Group ID, if using Instance Group for this node type. |
ec2_attributes | Provides information about the EC2 instances in a cluster grouped by category: key name, subnet ID, IAM instance profile, and so on. |
id | The ID of the EMR Cluster |
log_uri | The path to the Amazon S3 location where logs for this cluster are stored. |
managed_master_security_group_id | EMR managed_master security group ID |
managed_service_access_security_group_id | EMR managed_service_access security group ID |
managed_slave_security_group_id | EMR managed_slave security group ID |
master_instance_group_0_id | Master node type Instance Group ID, if using Instance Group for this node type. |
master_public_dns | The public DNS name of the master EC2 instance. |
master_security_group_id | EMR master security group ID |
name | The name of the cluster. |
notebook_instance_security_group_id | Notebook instance security group ID |
notebook_master_security_group_id | Notebook master security group ID |
release_label | The release label for the Amazon EMR release. |
service_role | The IAM role that will be assumed by the Amazon EMR service to access AWS resources on your behalf. |
slave_security_group_id | EMR slave security group ID |
visible_to_all_users | Indicates whether the job flow is visible to all IAM users of the AWS account associated with the job flow. |
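
The outputs above can be referenced from the calling configuration. The sketch below assumes the module was instantiated as `module "emr"` (a hypothetical name) and that a bastion CIDR range of `10.0.0.0/24` exists; both are placeholders for illustration.

```hcl
# Re-export a few module outputs from the root module.
output "emr_cluster_id" {
  description = "ID of the EMR cluster created by the module"
  value       = module.emr.id
}

output "emr_master_public_dns" {
  description = "Public DNS name of the master EC2 instance"
  value       = module.emr.master_public_dns
}

# Allow SSH to the master nodes from a placeholder bastion range,
# attaching the rule to the master security group exposed by the module.
resource "aws_security_group_rule" "bastion_ssh_to_master" {
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["10.0.0.0/24"] # placeholder CIDR
  security_group_id = module.emr.master_security_group_id
}
```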