Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[aws] Remove hardcoded event.dataset field and use ECS field instead #8811

Merged
merged 9 commits into from
Jan 11, 2024

Conversation

kaiyan-sheng
Copy link
Contributor

@kaiyan-sheng kaiyan-sheng commented Dec 28, 2023

Proposed commit message

  1. This PR is to remove hardcoded event.dataset field and reference to the ECS field instead.
    The value for event.dataset is already applied from the elastic-agent side, please see this PR for more detail.

With this change, when user tries to use a customized dataset value, event.dataset should change as well.

2. This PR also changes the default dataset name under Advanced option from generic to aws.cloudwatch_logs.
This is removed from the PR because I misunderstood why default is generic. In theory logs coming from cloudwatch should be rerouted to different data streams based on which service the log comes from. Whatever left in the default datastream are the ones not being routed.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

Related issues

@elasticmachine
Copy link

elasticmachine commented Dec 28, 2023

🚀 Benchmarks report

Package aws 👍(6) 💚(7) 💔(4)

Expand to view
Data stream Previous EPS New EPS Diff (%) Result
waf 5524.86 4032.26 -1492.6 (-27.02%) 💔
cloudfront_logs 2881.84 2314.81 -567.03 (-19.68%) 💔
emr_logs 29411.76 22727.27 -6684.49 (-22.73%) 💔
firewall_logs 2544.53 1953.13 -591.4 (-23.24%) 💔

To see the full report comment with /test benchmark fullreport

@kaiyan-sheng kaiyan-sheng self-assigned this Dec 28, 2023
@kaiyan-sheng kaiyan-sheng marked this pull request as ready for review January 3, 2024 15:17
@kaiyan-sheng kaiyan-sheng requested review from a team as code owners January 3, 2024 15:17
Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: If the only goal is to remove the hard-coded value, could the field data type still be retained as constant_keyword for performance reasons? The field value in the first event determines its value within a backing index. data_stream.dataset works this way.

You can still use the ECS definition with a type override:

- external: ecs
  name: event.dataset
  type: constant_keyword

@agithomas
Copy link
Contributor

@tommyers-elastic ,

Should we not make similar changes in all Integration packages?

@kaiyan-sheng
Copy link
Contributor Author

@andrewkroh The issue that led to this bug requires event.dataset to have more than one value so we can't use constant_keyword here. The original log message already has an event.dataset field and we are adding one more on top.

Do you know in ECS is that why we are keeping event.dataset to be keyword instead of constant_keyword or multiple values for the same event.dataset actually make sense?

@kaiyan-sheng
Copy link
Contributor Author

+1 @agithomas That's also my next question.

@andrewkroh
Copy link
Member

In #8810 it says

This field should be added by the agent so we should remove hardcoded value for event.dataset in all integrations.

And it implies that event.dataset is being set the same way as data_stream.dataset (via processors added by Elastic Agent). So then why is it OK for data_stream.dataset to be a constant_keyword (without a static value in the mapping), but not for event.dataset? Is there some other follow on processing that I'm unaware of that changes event.dataset but leaves data_stream.dataset untouched?

If a data stream handles multiplexed data like perhaps cloudwatch logs does and the event.dataset is customized based on the log type then this would make sense to relax the constant_keyword type for that data stream. But for something like cloudtrail that only handles cloudtrail data I don't see any reason to relax the constant_keyword type for event.dataset.

@kaiyan-sheng
Copy link
Contributor Author

kaiyan-sheng commented Jan 8, 2024

If a data stream handles multiplexed data like perhaps cloudwatch logs does and the event.dataset is customized based on the log type then this would make sense to relax the constant_keyword type for that data stream. But for something like cloudtrail that only handles cloudtrail data I don't see any reason to relax the constant_keyword type for event.dataset.

@andrewkroh Yes! Sorry I was only thinking about cloudwatch logs in the previous comment. Yes I will keep event.dataset to be keyword type in cloudwatch_logs data stream. The rest of the data streams I'm adding type: constant_keyword.

@gizas
Copy link
Contributor

gizas commented Jan 10, 2024

@kaiyan-sheng following list can be consider a candidate not to have type: constant_keyword?
cloudfront_logs
route53_public_logs
route53_resoolver_logs
firewall_logs

Also for my undertasting the type: keyword is not needed in cloudwatch_logs, is it assumed?

@kaiyan-sheng
Copy link
Contributor Author

@kaiyan-sheng following list can be consider a candidate not to have type: constant_keyword? cloudfront_logs route53_public_logs route53_resoolver_logs firewall_logs

Also for my undertasting the type: keyword is not needed in cloudwatch_logs, is it assumed?

@gizas Yes we don't need to add type: keyword for cloudwatch_logs because it's already the type defined in ECS for event.dataset. And we add type: constant_keyword to the rest of the log type data streams because cloudwatch log data stream is the only one user can use to ingest different kinds of logs, like an input.

Copy link
Contributor

@agithomas agithomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@kaiyan-sheng kaiyan-sheng merged commit 6d528cd into elastic:main Jan 11, 2024
3 checks passed
@kaiyan-sheng kaiyan-sheng deleted the remove_event_dataset branch January 11, 2024 17:49
@elasticmachine
Copy link

Package aws - 2.11.2 containing this change is available at https://epr.elastic.co/search?package=aws

zmoog added a commit to zmoog/integrations that referenced this pull request Jan 15, 2024
zmoog added a commit that referenced this pull request Jan 17, 2024
…ld instead (#8888)

* [aws] Remove hardcoded event.dataset field and use ECS field instead

Backport #8811 to 8.8.

* Add missing tags to AWS resources

Tags are required to properly set up cloud resource
on the CI infrastructure.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[AWS] event.dataset is not correct if data_stream.dataset is set to a customized value
7 participants