
Amazon EMR (ElasticMapReduce)

The Amazon EMR integration allows you to monitor Amazon EMR — a fully managed big data processing and analytics service.

Use the Amazon EMR integration to collect metrics related to your EMR instances. Then visualize that data in Kibana, create alerts to notify you if something goes wrong, and reference the metrics when troubleshooting an issue.

For example, you could use this data to track Amazon EMR cluster progress and cluster storage, and then alert when utilization for an instance crosses a predefined threshold.

IMPORTANT: This integration generates extra AWS charges for the AWS API requests it makes. Refer to the AWS integration documentation for more details.

Data streams

The Amazon EMR integration collects two types of data: metrics and logs.

Metrics give you insight into the state of Amazon EMR. The metrics collected by the Amazon EMR integration include cluster progress, cluster state, cluster or node storage, and more. See the Metrics reference for details.

Logs help you keep a record of events happening in Amazon EMR. Logs collected by the Amazon EMR integration include cluster status, node status details, and more.

Requirements

You need Elasticsearch for storing and searching your data and Kibana for visualizing and managing it. You can use our hosted Elasticsearch Service on Elastic Cloud, which is recommended, or self-manage the Elastic Stack on your own hardware.

Before using any AWS integration you will need:

  • AWS Credentials to connect with your AWS account.
  • AWS Permissions to make sure the user you're using to connect has permission to share the relevant data.

For more details about these requirements, see the AWS integration documentation.
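
The authoritative permission list is maintained in the AWS integration documentation. As a minimal illustrative sketch only, an IAM policy granting the CloudWatch read access this kind of metric collection relies on could look like the following (these are standard IAM actions, but confirm the exact set against the AWS integration documentation before use):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricData",
        "cloudwatch:ListMetrics",
        "tag:GetResources",
        "sts:GetCallerIdentity"
      ],
      "Resource": "*"
    }
  ]
}
```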

Setup

Use this integration if you only need to collect data from the Amazon EMR service.

If you want to collect data from two or more AWS services, consider using the AWS integration. When you configure the AWS integration, you can collect data from as many AWS services as you'd like.

For step-by-step instructions on how to set up an integration, see the Getting started guide.

Metrics reference

An example event for emr looks as follows:

```json
{
    "@timestamp": "2022-07-26T21:43:00.000Z",
    "agent": {
        "name": "docker-fleet-agent",
        "id": "2d4b09d0-cdb6-445e-ac3f-6415f87b9864",
        "type": "metricbeat",
        "ephemeral_id": "cdaaaabb-be7e-432f-816b-bda019fd7c15",
        "version": "8.3.2"
    },
    "elastic_agent": {
        "id": "2d4b09d0-cdb6-445e-ac3f-6415f87b9864",
        "version": "8.3.2",
        "snapshot": false
    },
    "cloud": {
        "provider": "aws",
        "region": "eu-central-1",
        "account": {
            "name": "elastic-beats",
            "id": "428152502467"
        }
    },
    "ecs": {
        "version": "8.0.0"
    },
    "service": {
        "type": "aws"
    },
    "data_stream": {
        "namespace": "default",
        "type": "metrics",
        "dataset": "aws.emr_metrics"
    },
    "metricset": {
        "period": 300000,
        "name": "cloudwatch"
    },
    "aws": {
        "elasticmapreduce": {
            "metrics": {
                "IsIdle": {
                    "avg": 1
                }
            }
        },
        "cloudwatch": {
            "namespace": "AWS/ElasticMapReduce"
        },
        "dimensions": {
            "JobFlowId": "j-3LRBO17JBA7H9"
        }
    },
    "event": {
        "duration": 11576777300,
        "agent_id_status": "verified",
        "ingested": "2022-07-26T21:47:48Z",
        "module": "aws",
        "dataset": "aws.emr_metrics"
    }
}
```
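
The aws.dimensions.JobFlowId field in the event above is what ties each metric to a specific cluster. As a minimal sketch (assuming the aws.emr_metrics dataset and the field names shown in the example event), an Elasticsearch aggregation like the following breaks a metric down per cluster:

```json
{
  "size": 0,
  "query": {
    "term": { "data_stream.dataset": "aws.emr_metrics" }
  },
  "aggs": {
    "per_cluster": {
      "terms": { "field": "aws.dimensions.JobFlowId" },
      "aggs": {
        "avg_idle": {
          "avg": { "field": "aws.elasticmapreduce.metrics.IsIdle.avg" }
        }
      }
    }
  }
}
```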

Exported fields

| Field | Description | Type | Unit | Metric Type |
| --- | --- | --- | --- | --- |
| @timestamp | Event timestamp. | date |  |  |
| agent.id | Unique identifier of this agent (if one exists). Example: For Beats this would be beat.id. | keyword |  |  |
| aws.cloudwatch.namespace | The namespace specified when querying the CloudWatch API. | keyword |  |  |
| aws.dimensions.JobFlowId | Filters metrics by cluster ID. | keyword |  |  |
| aws.elasticmapreduce.metrics.AppsCompleted.sum | The number of applications submitted to YARN that have completed. | long |  | gauge |
| aws.elasticmapreduce.metrics.AppsFailed.sum | The number of applications submitted to YARN that have failed to complete. | long |  | gauge |
| aws.elasticmapreduce.metrics.AppsKilled.sum | The number of applications submitted to YARN that have been killed. | long |  | gauge |
| aws.elasticmapreduce.metrics.AppsPending.sum | The number of applications submitted to YARN that are in a pending state. | long |  | gauge |
| aws.elasticmapreduce.metrics.AppsRunning.sum | The number of applications submitted to YARN that are running. | long |  | gauge |
| aws.elasticmapreduce.metrics.AppsSubmitted.sum | The number of applications submitted to YARN. | long |  | gauge |
| aws.elasticmapreduce.metrics.AutoTerminationIsClusterIdle.avg | Indicates whether the cluster is in use. | long | percent | gauge |
| aws.elasticmapreduce.metrics.CapacityRemainingGB.sum | The amount of remaining HDFS disk capacity. | long | byte | gauge |
| aws.elasticmapreduce.metrics.ContainerAllocated.sum | The number of resource containers allocated by the ResourceManager. | long |  | gauge |
| aws.elasticmapreduce.metrics.ContainerPending.sum | The number of containers in the queue that have not yet been allocated. | long |  | gauge |
| aws.elasticmapreduce.metrics.ContainerPendingRatio.avg | The ratio of pending containers to containers allocated. | long | percent | gauge |
| aws.elasticmapreduce.metrics.ContainerReserved.sum | The number of containers reserved. | long |  | gauge |
| aws.elasticmapreduce.metrics.CoreNodesPending.sum | The number of core nodes waiting to be assigned. | long |  | gauge |
| aws.elasticmapreduce.metrics.CoreNodesRequested.max | The target number of CORE nodes in a cluster as determined by managed scaling. | long |  | gauge |
| aws.elasticmapreduce.metrics.CoreNodesRunning.avg | The current number of CORE nodes running in a cluster. | long |  | gauge |
| aws.elasticmapreduce.metrics.CoreUnitsRequested.max | The target number of CORE units in a cluster as determined by managed scaling. | long |  | gauge |
| aws.elasticmapreduce.metrics.CoreUnitsRunning.avg | The current number of CORE units running in a cluster. | long |  | gauge |
| aws.elasticmapreduce.metrics.CoreVCPURequested.max | The target number of CORE vCPUs in a cluster as determined by managed scaling. | long |  | gauge |
| aws.elasticmapreduce.metrics.CoreVCPURunning.avg | The current number of CORE vCPUs running in a cluster. | long |  | gauge |
| aws.elasticmapreduce.metrics.CorruptBlocks.max | The number of blocks that HDFS reports as corrupted. | long |  | gauge |
| aws.elasticmapreduce.metrics.DfsPendingReplicationBlocks.sum | The status of block replication: blocks being replicated, age of replication requests, and unsuccessful replication requests. | long |  | gauge |
| aws.elasticmapreduce.metrics.HDFSBytesRead.sum | The number of bytes read from HDFS. | long | byte | gauge |
| aws.elasticmapreduce.metrics.HDFSBytesWritten.sum | The number of bytes written to HDFS. | long | byte | gauge |
| aws.elasticmapreduce.metrics.HDFSUtilization.avg | The percentage of HDFS storage currently used. | double | percent | gauge |
| aws.elasticmapreduce.metrics.IsIdle.avg | Indicates that a cluster is no longer performing work, but is still alive and accruing charges. | long | percent | gauge |
| aws.elasticmapreduce.metrics.LiveDataNodes.avg | The percentage of data nodes that are receiving work from Hadoop. | double | percent | gauge |
| aws.elasticmapreduce.metrics.MRActiveNodes.sum | The number of nodes presently running MapReduce tasks or jobs. | long |  | gauge |
| aws.elasticmapreduce.metrics.MRDecommissionedNodes.sum | The number of nodes allocated to MapReduce applications that have been marked in a DECOMMISSIONED state. | long |  | gauge |
| aws.elasticmapreduce.metrics.MRLostNodes.sum | The number of nodes allocated to MapReduce that have been marked in a LOST state. | long |  | gauge |
| aws.elasticmapreduce.metrics.MRRebootedNodes.sum | The number of nodes available to MapReduce that have been rebooted and marked in a REBOOTED state. | long |  | gauge |
| aws.elasticmapreduce.metrics.MRTotalNodes.sum | The number of nodes presently available to MapReduce jobs. | long |  | gauge |
| aws.elasticmapreduce.metrics.MRUnhealthyNodes.sum | The number of nodes available to MapReduce jobs marked in an UNHEALTHY state. | long |  | gauge |
| aws.elasticmapreduce.metrics.MemoryAllocatedMB.sum | The amount of memory allocated to the cluster. | long | byte | gauge |
| aws.elasticmapreduce.metrics.MemoryAvailableMB.sum | The amount of memory available to be allocated. | long | byte | gauge |
| aws.elasticmapreduce.metrics.MemoryReservedMB.sum | The amount of memory reserved. | long | byte | gauge |
| aws.elasticmapreduce.metrics.MemoryTotalMB.sum | The total amount of memory in the cluster. | long | byte | gauge |
| aws.elasticmapreduce.metrics.MissingBlocks.max | The number of blocks in which HDFS has no replicas. | long |  | gauge |
| aws.elasticmapreduce.metrics.MultiMasterInstanceGroupNodesRequested.sum | The number of requested master nodes. | long |  | gauge |
| aws.elasticmapreduce.metrics.MultiMasterInstanceGroupNodesRunning.sum | The number of running master nodes. | long |  | gauge |
| aws.elasticmapreduce.metrics.MultiMasterInstanceGroupNodesRunningPercentage.avg | The percentage of master nodes that are running over the requested master node instance count. | double | percent | gauge |
| aws.elasticmapreduce.metrics.PendingDeletionBlocks.sum | The number of blocks marked for deletion. | long |  | gauge |
| aws.elasticmapreduce.metrics.S3BytesRead.sum | The number of bytes read from Amazon S3. | long | byte | gauge |
| aws.elasticmapreduce.metrics.S3BytesWritten.sum | The number of bytes written to Amazon S3. | long | byte | gauge |
| aws.elasticmapreduce.metrics.TaskNodesRequested.max | The target number of TASK nodes in a cluster as determined by managed scaling. | long |  | gauge |
| aws.elasticmapreduce.metrics.TaskNodesRunning.avg | The current number of TASK nodes running in a cluster. | long |  | gauge |
| aws.elasticmapreduce.metrics.TaskUnitsRequested.max | The target number of TASK units in a cluster as determined by managed scaling. | long |  | gauge |
| aws.elasticmapreduce.metrics.TaskUnitsRunning.avg | The current number of TASK units running in a cluster. | long |  | gauge |
| aws.elasticmapreduce.metrics.TaskVCPURequested.max | The target number of TASK vCPUs in a cluster as determined by managed scaling. | long |  | gauge |
| aws.elasticmapreduce.metrics.TaskVCPURunning.avg | The current number of TASK vCPUs running in a cluster. | long |  | gauge |
| aws.elasticmapreduce.metrics.TotalLoad.sum | The total number of concurrent data transfers. | long |  | gauge |
| aws.elasticmapreduce.metrics.TotalNodesRequested.max | The target total number of nodes in a cluster as determined by managed scaling. | long |  | gauge |
| aws.elasticmapreduce.metrics.TotalNodesRunning.avg | The current total number of nodes available in a running cluster. | long |  | gauge |
| aws.elasticmapreduce.metrics.TotalNotebookKernels.sum | The total number of running and idle notebook kernels on the cluster. | long |  | gauge |
| aws.elasticmapreduce.metrics.TotalUnitsRequested.max | The target total number of units in a cluster as determined by managed scaling. | long |  | gauge |
| aws.elasticmapreduce.metrics.TotalUnitsRunning.avg | The current total number of units available in a running cluster. | long |  | gauge |
| aws.elasticmapreduce.metrics.TotalVCPURequested.max | The target total number of vCPUs in a cluster as determined by managed scaling. | long |  | gauge |
| aws.elasticmapreduce.metrics.TotalVCPURunning.avg | The current total number of vCPUs available in a running cluster. | long |  | gauge |
| aws.elasticmapreduce.metrics.UnderReplicatedBlocks.sum | The number of blocks that need to be replicated one or more times. | long |  | gauge |
| aws.elasticmapreduce.metrics.YARNMemoryAvailablePercentage.avg | The percentage of remaining memory available to YARN. | double | percent | gauge |
| aws.tags.* | Tag key value pairs from aws resources. | object |  |  |
| cloud | Fields related to the cloud or infrastructure the events are coming from. | group |  |  |
| cloud.account.id | The cloud account or organization id used to identify different entities in a multi-tenant environment. Examples: AWS account id, Google Cloud ORG Id, or other unique identifier. | keyword |  |  |
| cloud.account.name | The cloud account name or alias used to identify different entities in a multi-tenant environment. Examples: AWS account name, Google Cloud ORG display name. | keyword |  |  |
| cloud.availability_zone | Availability zone in which this host, resource, or service is located. | keyword |  |  |
| cloud.image.id | Image ID for the cloud instance. | keyword |  |  |
| cloud.instance.id | Instance ID of the host machine. | keyword |  |  |
| cloud.instance.name | Instance name of the host machine. | keyword |  |  |
| cloud.machine.type | Machine type of the host machine. | keyword |  |  |
| cloud.project.id | The cloud project identifier. Examples: Google Cloud Project id, Azure Project id. | keyword |  |  |
| cloud.provider | Name of the cloud provider. Example values are aws, azure, gcp, or digitalocean. | keyword |  |  |
| cloud.region | Region in which this host, resource, or service is located. | keyword |  |  |
| container.id | Unique container id. | keyword |  |  |
| container.image.name | Name of the image the container was built on. | keyword |  |  |
| container.labels | Image labels. | object |  |  |
| container.name | Container name. | keyword |  |  |
| data_stream.dataset | Data stream dataset. | constant_keyword |  |  |
| data_stream.namespace | Data stream namespace. | constant_keyword |  |  |
| data_stream.type | Data stream type. | constant_keyword |  |  |
| ecs.version | ECS version this event conforms to. ecs.version is a required field and must exist in all events. When querying across multiple indices -- which may conform to slightly different ECS versions -- this field lets integrations adjust to the schema version of the events. | keyword |  |  |
| error | These fields can represent errors of any kind. Use them for errors that happen while fetching events or in cases where the event itself contains an error. | group |  |  |
| error.message | Error message. | match_only_text |  |  |
| event.dataset | Event dataset. | constant_keyword |  |  |
| event.module | Event module. | constant_keyword |  |  |
| host.architecture | Operating system architecture. | keyword |  |  |
| host.containerized | If the host is a container. | boolean |  |  |
| host.domain | Name of the domain of which the host is a member. For example, on Windows this could be the host's Active Directory domain or NetBIOS domain name. For Linux this could be the domain of the host's LDAP provider. | keyword |  |  |
| host.hostname | Hostname of the host. It normally contains what the hostname command returns on the host machine. | keyword |  |  |
| host.id | Unique host id. As hostname is not always unique, use values that are meaningful in your environment. Example: The current usage of beat.name. | keyword |  |  |
| host.ip | Host ip addresses. | ip |  |  |
| host.mac | Host MAC addresses. The notation format from RFC 7042 is suggested: Each octet (that is, 8-bit byte) is represented by two [uppercase] hexadecimal digits giving the value of the octet as an unsigned integer. Successive octets are separated by a hyphen. | keyword |  |  |
| host.name | Name of the host. It can contain what hostname returns on Unix systems, the fully qualified domain name (FQDN), or a name specified by the user. The recommended value is the lowercase FQDN of the host. | keyword |  |  |
| host.os.build | OS build information. | keyword |  |  |
| host.os.codename | OS codename, if any. | keyword |  |  |
| host.os.family | OS family (such as redhat, debian, freebsd, windows). | keyword |  |  |
| host.os.kernel | Operating system kernel version as a raw string. | keyword |  |  |
| host.os.name | Operating system name, without the version. | keyword |  |  |
| host.os.name.text | Multi-field of host.os.name. | match_only_text |  |  |
| host.os.platform | Operating system platform (such as centos, ubuntu, windows). | keyword |  |  |
| host.os.version | Operating system version as a raw string. | keyword |  |  |
| host.type | Type of host. For Cloud providers this can be the machine type like t2.medium. If vm, this could be the container, for example, or other information meaningful in your environment. | keyword |  |  |
| service.type | The type of the service data is collected from. The type can be used to group and correlate logs and metrics from one service type. Example: If logs or metrics are collected from Elasticsearch, service.type would be elasticsearch. | keyword |  |  |
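
The IsIdle metric above maps directly to the idle-cluster alerting use case mentioned in the introduction, since an idle cluster is still accruing charges. As a minimal sketch (assuming the dataset and field names from the table above), a query like the following surfaces clusters that have reported themselves idle within the last 30 minutes:

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "data_stream.dataset": "aws.emr_metrics" } },
        { "term": { "aws.elasticmapreduce.metrics.IsIdle.avg": 1 } },
        { "range": { "@timestamp": { "gte": "now-30m" } } }
      ]
    }
  }
}
```

A Kibana alerting rule built on an equivalent query can then notify you when a cluster sits idle past your threshold.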

Logs reference

An example event for emr looks as follows:

```json
{
    "data_stream": {
        "namespace": "default",
        "type": "logs",
        "dataset": "aws.emr_logs"
    },
    "@timestamp": "2020-02-20T07:01:01.000Z",
    "ecs": {
        "version": "8.0.0"
    },
    "log": {
        "level": "INFO"
    },
    "event": {
        "original": "2023-06-26 13:45:50,566 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling"
    },
    "process": {
        "name": "blockmanagement.BlockManager"
    },
    "message": "dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling",
    "tags": [
        "preserve_original_event"
    ]
}
```
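
Once log events like the one above are flowing, they can be filtered by data stream and severity. As an illustrative sketch (assuming the aws.emr_logs dataset shown above and the log.level field from the reference below), the following query returns error-level EMR log entries from the last hour:

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "data_stream.dataset": "aws.emr_logs" } },
        { "term": { "log.level": "ERROR" } },
        { "range": { "@timestamp": { "gte": "now-1h" } } }
      ]
    }
  }
}
```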

Exported fields

| Field | Description | Type |
| --- | --- | --- |
| @timestamp | Event timestamp. | date |
| aws.s3.bucket.arn | ARN of the S3 bucket that this log was retrieved from. | keyword |
| aws.s3.bucket.name | Name of an S3 bucket. | keyword |
| aws.s3.metadata | AWS S3 object metadata values. | flattened |
| aws.s3.object.key | Name of the S3 object that this log was retrieved from. | keyword |
| cloud.account.id | The cloud account or organization id used to identify different entities in a multi-tenant environment. Examples: AWS account id, Google Cloud ORG Id, or other unique identifier. | keyword |
| cloud.availability_zone | Availability zone in which this host, resource, or service is located. | keyword |
| cloud.image.id | Image ID for the cloud instance. | keyword |
| cloud.instance.id | Instance ID of the host machine. | keyword |
| cloud.instance.name | Instance name of the host machine. | keyword |
| cloud.machine.type | Machine type of the host machine. | keyword |
| cloud.project.id | The cloud project identifier. Examples: Google Cloud Project id, Azure Project id. | keyword |
| cloud.provider | Name of the cloud provider. Example values are aws, azure, gcp, or digitalocean. | keyword |
| cloud.region | Region in which this host, resource, or service is located. | keyword |
| container.id | Unique container id. | keyword |
| container.image.name | Name of the image the container was built on. | keyword |
| container.labels | Image labels. | object |
| container.name | Container name. | keyword |
| data_stream.dataset | Data stream dataset. | constant_keyword |
| data_stream.namespace | Data stream namespace. | constant_keyword |
| data_stream.type | Data stream type. | constant_keyword |
| ecs.version | ECS version this event conforms to. ecs.version is a required field and must exist in all events. When querying across multiple indices -- which may conform to slightly different ECS versions -- this field lets integrations adjust to the schema version of the events. | keyword |
| error.message | Error message. | match_only_text |
| event.dataset | Event dataset. | constant_keyword |
| event.module | Event module. | constant_keyword |
| host.architecture | Operating system architecture. | keyword |
| host.containerized | If the host is a container. | boolean |
| host.domain | Name of the domain of which the host is a member. For example, on Windows this could be the host's Active Directory domain or NetBIOS domain name. For Linux this could be the domain of the host's LDAP provider. | keyword |
| host.hostname | Hostname of the host. It normally contains what the hostname command returns on the host machine. | keyword |
| host.id | Unique host id. As hostname is not always unique, use values that are meaningful in your environment. Example: The current usage of beat.name. | keyword |
| host.ip | Host ip addresses. | ip |
| host.mac | Host MAC addresses. The notation format from RFC 7042 is suggested: Each octet (that is, 8-bit byte) is represented by two [uppercase] hexadecimal digits giving the value of the octet as an unsigned integer. Successive octets are separated by a hyphen. | keyword |
| host.name | Name of the host. It can contain what hostname returns on Unix systems, the fully qualified domain name (FQDN), or a name specified by the user. The recommended value is the lowercase FQDN of the host. | keyword |
| host.os.build | OS build information. | keyword |
| host.os.codename | OS codename, if any. | keyword |
| host.os.family | OS family (such as redhat, debian, freebsd, windows). | keyword |
| host.os.kernel | Operating system kernel version as a raw string. | keyword |
| host.os.name | Operating system name, without the version. | keyword |
| host.os.name.text | Multi-field of host.os.name. | match_only_text |
| host.os.platform | Operating system platform (such as centos, ubuntu, windows). | keyword |
| host.os.version | Operating system version as a raw string. | keyword |
| host.type | Type of host. For Cloud providers this can be the machine type like t2.medium. If vm, this could be the container, for example, or other information meaningful in your environment. | keyword |
| log.level | Original log level of the log event. If the source of the event provides a log level or textual severity, this is the one that goes in log.level. If your source doesn't specify one, you may put your event transport's severity here (e.g. Syslog severity). Some examples are warn, err, i, informational. | keyword |
| message | For log events the message field contains the log message, optimized for viewing in a log viewer. For structured logs without an original message field, other fields can be concatenated to form a human-readable summary of the event. If multiple messages exist, they can be combined into one message. | match_only_text |
| process.entrypoint | Process entrypoint. | keyword |
| process.message | Process message. | keyword |
| process.name | Process name. | keyword |
| tags | List of keywords used to tag each event. | keyword |