Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[O11y][Apache Spark] Resolve the conflicts in host.ip field #7468

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
4 changes: 4 additions & 0 deletions packages/apache_spark/_dev/build/docs/README.md
Expand Up @@ -63,6 +63,10 @@ Restart Spark master.

Follow the same set of steps for Spark Worker, Driver and Executor.

### Troubleshooting

If host.ip is shown conflicted under ``metrics-*`` data view, then this issue can be solved by [reindexing](https://www.elastic.co/guide/en/elasticsearch/reference/current/use-a-data-stream.html#reindex-with-a-data-stream) the ``Application``, ``Driver``, ``Executor`` and ``Node`` data stream's indices.

## Metrics

### Application
Expand Down
5 changes: 5 additions & 0 deletions packages/apache_spark/changelog.yml
@@ -1,4 +1,9 @@
# newer versions go on top
- version: "0.6.2"
changes:
- description: Resolve the conflicts in host.ip field
type: bugfix
link: https://github.com/elastic/integrations/pull/7468
- version: "0.6.1"
changes:
- description: Remove incorrect filter from the visualizations
Expand Down
14 changes: 11 additions & 3 deletions packages/apache_spark/data_stream/application/fields/ecs.yml
@@ -1,12 +1,20 @@
- external: ecs
name: ecs.version
- external: ecs
name: error.message
- external: ecs
name: event.dataset
- external: ecs
name: event.kind
- external: ecs
name: event.type
name: event.module
- external: ecs
name: ecs.version
name: event.type
- external: ecs
name: tags
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any advantage of changing the ordering of this mapping ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the best practices to have field names in the alphabetical order for readability.

name: host.ip
- external: ecs
name: service.address
- external: ecs
name: service.type
- external: ecs
name: tags
14 changes: 11 additions & 3 deletions packages/apache_spark/data_stream/driver/fields/ecs.yml
@@ -1,12 +1,20 @@
- external: ecs
name: ecs.version
- external: ecs
name: error.message
- external: ecs
name: event.dataset
- external: ecs
name: event.kind
- external: ecs
name: event.type
name: event.module
- external: ecs
name: ecs.version
name: event.type
- external: ecs
name: tags
name: host.ip
- external: ecs
name: service.address
- external: ecs
name: service.type
- external: ecs
name: tags
14 changes: 11 additions & 3 deletions packages/apache_spark/data_stream/executor/fields/ecs.yml
@@ -1,12 +1,20 @@
- external: ecs
name: ecs.version
- external: ecs
name: error.message
- external: ecs
name: event.dataset
- external: ecs
name: event.kind
- external: ecs
name: event.type
name: event.module
- external: ecs
name: ecs.version
name: event.type
- external: ecs
name: tags
name: host.ip
- external: ecs
name: service.address
- external: ecs
name: service.type
- external: ecs
name: tags
14 changes: 11 additions & 3 deletions packages/apache_spark/data_stream/node/fields/ecs.yml
@@ -1,12 +1,20 @@
- external: ecs
name: ecs.version
- external: ecs
name: error.message
- external: ecs
name: event.dataset
- external: ecs
name: event.kind
- external: ecs
name: event.type
name: event.module
- external: ecs
name: ecs.version
name: event.type
- external: ecs
name: tags
name: host.ip
- external: ecs
name: service.address
- external: ecs
name: service.type
- external: ecs
name: tags
20 changes: 20 additions & 0 deletions packages/apache_spark/docs/README.md
Expand Up @@ -63,6 +63,10 @@ Restart Spark master.

Follow the same set of steps for Spark Worker, Driver and Executor.

### Troubleshooting

If host.ip is shown conflicted under ``metrics-*`` data view, then this issue can be solved by [reindexing](https://www.elastic.co/guide/en/elasticsearch/reference/current/use-a-data-stream.html#reindex-with-a-data-stream) the ``Application``, ``Driver``, ``Executor`` and ``Node`` data stream's indices.

## Metrics

### Application
Expand Down Expand Up @@ -156,8 +160,12 @@ An example event for `application` looks as following:
| data_stream.namespace | Data stream namespace. | constant_keyword |
| data_stream.type | Data stream type. | constant_keyword |
| ecs.version | ECS version this event conforms to. `ecs.version` is a required field and must exist in all events. When querying across multiple indices -- which may conform to slightly different ECS versions -- this field lets integrations adjust to the schema version of the events. | keyword |
| error.message | Error message. | match_only_text |
| event.dataset | Name of the dataset. If an event source publishes more than one type of log or events (e.g. access log, error log), the dataset is used to specify which one the event comes from. It's recommended but not required to start the dataset name with the module name, followed by a dot, then the dataset name. | keyword |
| event.kind | This is one of four ECS Categorization Fields, and indicates the highest level in the ECS category hierarchy. `event.kind` gives high-level information about what type of information the event contains, without being specific to the contents of the event. For example, values of this field distinguish alert events from metric events. The value of this field can be used to inform how these kinds of events should be handled. They may warrant different retention, different access control, it may also help understand whether the data coming in at a regular interval or not. | keyword |
| event.module | Name of the module this data is coming from. If your monitoring agent supports the concept of modules or plugins to process events of a given source (e.g. Apache logs), `event.module` should contain the name of this module. | keyword |
| event.type | This is one of four ECS Categorization Fields, and indicates the third level in the ECS category hierarchy. `event.type` represents a categorization "sub-bucket" that, when used along with the `event.category` field values, enables filtering events down to a level appropriate for single visualization. This field is an array. This will allow proper categorization of some events that fall in multiple event types. | keyword |
| host.ip | Host ip addresses. | ip |
| service.address | Address where data about this service was collected from. This should be a URI, network address (ipv4:port or [ipv6]:port) or a resource path (sockets). | keyword |
| service.type | The type of the service data is collected from. The type can be used to group and correlate logs and metrics from one service type. Example: If logs or metrics are collected from Elasticsearch, `service.type` would be `elasticsearch`. | keyword |
| tags | List of keywords used to tag each event. | keyword |
Expand Down Expand Up @@ -325,8 +333,12 @@ An example event for `driver` looks as following:
| data_stream.namespace | Data stream namespace. | constant_keyword |
| data_stream.type | Data stream type. | constant_keyword |
| ecs.version | ECS version this event conforms to. `ecs.version` is a required field and must exist in all events. When querying across multiple indices -- which may conform to slightly different ECS versions -- this field lets integrations adjust to the schema version of the events. | keyword |
| error.message | Error message. | match_only_text |
| event.dataset | Name of the dataset. If an event source publishes more than one type of log or events (e.g. access log, error log), the dataset is used to specify which one the event comes from. It's recommended but not required to start the dataset name with the module name, followed by a dot, then the dataset name. | keyword |
| event.kind | This is one of four ECS Categorization Fields, and indicates the highest level in the ECS category hierarchy. `event.kind` gives high-level information about what type of information the event contains, without being specific to the contents of the event. For example, values of this field distinguish alert events from metric events. The value of this field can be used to inform how these kinds of events should be handled. They may warrant different retention, different access control, it may also help understand whether the data coming in at a regular interval or not. | keyword |
| event.module | Name of the module this data is coming from. If your monitoring agent supports the concept of modules or plugins to process events of a given source (e.g. Apache logs), `event.module` should contain the name of this module. | keyword |
| event.type | This is one of four ECS Categorization Fields, and indicates the third level in the ECS category hierarchy. `event.type` represents a categorization "sub-bucket" that, when used along with the `event.category` field values, enables filtering events down to a level appropriate for single visualization. This field is an array. This will allow proper categorization of some events that fall in multiple event types. | keyword |
| host.ip | Host ip addresses. | ip |
| service.address | Address where data about this service was collected from. This should be a URI, network address (ipv4:port or [ipv6]:port) or a resource path (sockets). | keyword |
| service.type | The type of the service data is collected from. The type can be used to group and correlate logs and metrics from one service type. Example: If logs or metrics are collected from Elasticsearch, `service.type` would be `elasticsearch`. | keyword |
| tags | List of keywords used to tag each event. | keyword |
Expand Down Expand Up @@ -491,8 +503,12 @@ An example event for `executor` looks as following:
| data_stream.namespace | Data stream namespace. | constant_keyword |
| data_stream.type | Data stream type. | constant_keyword |
| ecs.version | ECS version this event conforms to. `ecs.version` is a required field and must exist in all events. When querying across multiple indices -- which may conform to slightly different ECS versions -- this field lets integrations adjust to the schema version of the events. | keyword |
| error.message | Error message. | match_only_text |
| event.dataset | Name of the dataset. If an event source publishes more than one type of log or events (e.g. access log, error log), the dataset is used to specify which one the event comes from. It's recommended but not required to start the dataset name with the module name, followed by a dot, then the dataset name. | keyword |
| event.kind | This is one of four ECS Categorization Fields, and indicates the highest level in the ECS category hierarchy. `event.kind` gives high-level information about what type of information the event contains, without being specific to the contents of the event. For example, values of this field distinguish alert events from metric events. The value of this field can be used to inform how these kinds of events should be handled. They may warrant different retention, different access control, it may also help understand whether the data coming in at a regular interval or not. | keyword |
| event.module | Name of the module this data is coming from. If your monitoring agent supports the concept of modules or plugins to process events of a given source (e.g. Apache logs), `event.module` should contain the name of this module. | keyword |
| event.type | This is one of four ECS Categorization Fields, and indicates the third level in the ECS category hierarchy. `event.type` represents a categorization "sub-bucket" that, when used along with the `event.category` field values, enables filtering events down to a level appropriate for single visualization. This field is an array. This will allow proper categorization of some events that fall in multiple event types. | keyword |
| host.ip | Host ip addresses. | ip |
| service.address | Address where data about this service was collected from. This should be a URI, network address (ipv4:port or [ipv6]:port) or a resource path (sockets). | keyword |
| service.type | The type of the service data is collected from. The type can be used to group and correlate logs and metrics from one service type. Example: If logs or metrics are collected from Elasticsearch, `service.type` would be `elasticsearch`. | keyword |
| tags | List of keywords used to tag each event. | keyword |
Expand Down Expand Up @@ -600,8 +616,12 @@ An example event for `node` looks as following:
| data_stream.namespace | Data stream namespace. | constant_keyword |
| data_stream.type | Data stream type. | constant_keyword |
| ecs.version | ECS version this event conforms to. `ecs.version` is a required field and must exist in all events. When querying across multiple indices -- which may conform to slightly different ECS versions -- this field lets integrations adjust to the schema version of the events. | keyword |
| error.message | Error message. | match_only_text |
| event.dataset | Name of the dataset. If an event source publishes more than one type of log or events (e.g. access log, error log), the dataset is used to specify which one the event comes from. It's recommended but not required to start the dataset name with the module name, followed by a dot, then the dataset name. | keyword |
| event.kind | This is one of four ECS Categorization Fields, and indicates the highest level in the ECS category hierarchy. `event.kind` gives high-level information about what type of information the event contains, without being specific to the contents of the event. For example, values of this field distinguish alert events from metric events. The value of this field can be used to inform how these kinds of events should be handled. They may warrant different retention, different access control, it may also help understand whether the data coming in at a regular interval or not. | keyword |
| event.module | Name of the module this data is coming from. If your monitoring agent supports the concept of modules or plugins to process events of a given source (e.g. Apache logs), `event.module` should contain the name of this module. | keyword |
| event.type | This is one of four ECS Categorization Fields, and indicates the third level in the ECS category hierarchy. `event.type` represents a categorization "sub-bucket" that, when used along with the `event.category` field values, enables filtering events down to a level appropriate for single visualization. This field is an array. This will allow proper categorization of some events that fall in multiple event types. | keyword |
| host.ip | Host ip addresses. | ip |
| service.address | Address where data about this service was collected from. This should be a URI, network address (ipv4:port or [ipv6]:port) or a resource path (sockets). | keyword |
| service.type | The type of the service data is collected from. The type can be used to group and correlate logs and metrics from one service type. Example: If logs or metrics are collected from Elasticsearch, `service.type` would be `elasticsearch`. | keyword |
| tags | List of keywords used to tag each event. | keyword |
Expand Down
2 changes: 1 addition & 1 deletion packages/apache_spark/manifest.yml
@@ -1,7 +1,7 @@
format_version: 1.0.0
name: apache_spark
title: Apache Spark
version: "0.6.1"
version: "0.6.2"
license: basic
description: Collect metrics from Apache Spark with Elastic Agent.
type: integration
Expand Down