Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Undefined log.file.* fields breaking tests for filestream inputs #7716

Merged
merged 7 commits into from
Oct 1, 2023

Conversation

bhapas
Copy link
Contributor

@bhapas bhapas commented Sep 8, 2023

What does this PR do?

Issue described in #7687

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

Related issues

@elasticmachine
Copy link

elasticmachine commented Sep 8, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-09-28T14:52:07.846+0000

  • Duration: 26 min 32 sec

Test stats 🧪

Test Results
Failed 0
Passed 62
Skipped 0
Total 62

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@elasticmachine
Copy link

elasticmachine commented Sep 8, 2023

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 100.0% (10/10) 💚
Files 100.0% (11/11) 💚
Classes 100.0% (11/11) 💚
Methods 95.146% (98/103)
Lines 80.69% (1099/1362)
Conditionals 100.0% (0/0) 💚

@bhapas bhapas marked this pull request as ready for review September 27, 2023 19:22
@bhapas bhapas requested a review from a team as a code owner September 27, 2023 19:22
Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I look at this problem I think we need to go back to the ingest team, and talk about changing the field types to be strings for the reason given in #7716 (comment). Otherwise, this will be an uphill battle to keep a consistent mapping type between integrations. As soon as an integration forgets to map these numeric fields then there will be a conflict with the integrations that do map it as keyword. Today there are 22 separate data streams associated with the filestream input that need mappings for these fields.

Secondly, the fields are uint6412 in Beats. The default mapping will be long3, and this is not the correct type because a uint64 does not fit into long (i.e. 2^64 > 2^63).

Third, this would be an opportunity to align the definitions between operating systems and propose these to ECS. Like perhaps

Type Unix Windows
log.file.device_id string Device ID NTFS volume
log.file.inode string Inode number in decimal File index number in decimal (high and low order 32-bit values combined to form a uint64)

@rdner @ycombinator Would you be open to some changes to these fields in the input?

Footnotes

  1. https://github.com/elastic/beats/blob/8ff6894cab0591cc3a898d7abd6a9103e535efdd/libbeat/common/file/file_windows.go#L32-L34

  2. https://github.com/elastic/beats/blob/f0215ff5cfdbcb0db8ebc1f07af1b4885081389b/libbeat/common/file/file_other.go#L29-L30

  3. https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html

@bhapas
Copy link
Contributor Author

bhapas commented Sep 28, 2023

The more I look at this problem I think we need to go back to the ingest team, and talk about changing the field types to be strings for the reason given in #7716 (comment). Otherwise, this will be an uphill battle to keep a consistent mapping type between integrations. As soon as an integration forgets to map these numeric fields then there will be a conflict with the integrations that do map it as keyword. Today there are 22 separate data streams associated with the filestream input that need mappings for these fields.

Secondly, the fields are uint6412 in Beats. The default mapping will be long3, and this is not the correct type because a uint64 does not fit into long (i.e. 2^64 > 2^63).

Third, this would be an opportunity to align the definitions between operating systems and propose these to ECS. Like perhaps

Type Unix Windows
log.file.device_id string Device ID NTFS volume
log.file.inode string Inode number in decimal File index number in decimal (high and low order 32-bit values combined to form a uint64)
@rdner @ycombinator Would you be open to some changes to these fields in the input?

Footnotes

  1. https://github.com/elastic/beats/blob/8ff6894cab0591cc3a898d7abd6a9103e535efdd/libbeat/common/file/file_windows.go#L32-L34
  2. https://github.com/elastic/beats/blob/f0215ff5cfdbcb0db8ebc1f07af1b4885081389b/libbeat/common/file/file_other.go#L29-L30
  3. https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html

Created an issue for this in beats elastic/beats#36695

@rdner
Copy link
Member

rdner commented Sep 28, 2023

@andrewkroh I'd like to point out 2 things:

  1. initially log.file.device, log.file.inode, log.file.idxhi, log.file.idxlo and log.file.vol were strings and during the review were switched to numbers based on the argument that someone would like to use a range search on these values.

  2. Regarding the file.device field in ECS it's not the same as log.file.device_id as I explained during the review here Add file system information to each event beats#36065 (comment)

I'm fine with reverting everything back to strings if @ycombinator is fine with it.

I'm surprised adding these fields caused so many issues, apologies. Next time, I will involve integration developers into a review when we add more fields.

@ycombinator
Copy link
Contributor

Yep, I'm good with using strings / keywords for these fields given how they're intended to be used.

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need better technical descriptions of these fields. I have made suggestions. PTAL

@rdner Can you please also review these descriptions to ensure their accuracy? Where are these field descriptions in Filebeat (example)?

@rdner
Copy link
Member

rdner commented Sep 28, 2023

@andrewkroh the descriptions you suggested looks perfect to me.

Regarding the example file, should every event field be there? I'm asking because it's missing a few fields, for example, log.file.path is not there either.

What's the purpose of this file exactly?

@andrewkroh
Copy link
Member

Regarding the example file, should every event field be there? I'm asking because it's missing a few fields, for example, log.file.path is not there either.

These files drive our documentation and the Elasticsearch index templates. log.file.path is not in that file because it is defined in another file in the repo that provides the ECS fields. On the website you will see log.file.path in the ECS section and log.file.offset in another section because that predates the existence of ECS.

@elasticmachine
Copy link

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@bhapas bhapas added the enhancement New feature or request label Sep 28, 2023
- name: fingerprint
type: keyword
description: The sha256 fingerprint identity of the file when fingerprinting is enabled.
- name: inode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering that these fields are currently numeric and would be changed to string as per this comment, is it fine to use keyword here as of now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any JSON numeric value can be mapped as keyword in ES without conversion to a JSON string.

Copy link
Contributor

@ishleenk17 ishleenk17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@bhapas bhapas merged commit 5ecc4dc into elastic:main Oct 1, 2023
4 checks passed
@elasticmachine
Copy link

Package apache_tomcat - 0.13.0 containing this change is available at https://epr.elastic.co/search?package=apache_tomcat

@elasticmachine
Copy link

Package coredns - 0.5.0 containing this change is available at https://epr.elastic.co/search?package=coredns

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants