Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add .caseless subfield to process.name & process.executable #2341

Merged
merged 5 commits into from
Jul 10, 2024

Conversation

w0rk3r
Copy link
Contributor

@w0rk3r w0rk3r commented May 22, 2024

Summary

Related PR: elastic/integrations#9850

This PR aims to add a subfield to the process.name and process.executable fields to improve the compatibility of data sources like System, Sysmon, etc., with our Elastic Defend data. This enables us to handle language limitations in KQL more effectively.

Elastic Defend Mapping:
image

@w0rk3r w0rk3r requested a review from andrewkroh May 22, 2024 19:19
@w0rk3r w0rk3r self-assigned this May 22, 2024
This reverts commit fd33ddc.
@w0rk3r w0rk3r marked this pull request as ready for review July 1, 2024 14:42
@w0rk3r w0rk3r requested a review from a team as a code owner July 1, 2024 14:42
@andrewkroh andrewkroh requested review from a team July 3, 2024 22:44
Copy link
Contributor

@efd6 efd6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a pity the can doesn't describe the contents; in a perfect world calling this "lowercase" would have been better, but we are where we are.

Please ensure that the final commit message is not just a GitHub fusion of the commit messages here.

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was originally reluctant about the naming because I felt that .lowercase is more meaningful for what is stored. But the other aspect of the normalizer is the search-time normalization, and this actively lowercases the search term making the query case-insensitive (or "caseless"). So I think I have rationalized the naming for myself, and I am OK with .caseless.

But I would like to hear opinions from others on where they stand with the naming. I have tagged a few teams, but anyone is welcome to chime in.

@w0rk3r w0rk3r requested a review from andrewkroh July 8, 2024 11:06
@felixbarny
Copy link
Member

felixbarny commented Jul 12, 2024

Looks like this has mainly been added for security use cases and because of language limitations in KQL. Have we considered enhancing KQL, moving to ES|QL, or only adding the new sub-fields to specific integrations or in security project types?
My concern is the storage impact for observability use cases.

@eyalkoren
Copy link
Contributor

@w0rk3r @andrewkroh @efd6 we need your input on @felixbarny's question above and on @ruflin's question, so we can move on with elastic/elasticsearch#110740. In the mean time, ECS is not fully covered by our own mappings, which are widely used, and we have these tests failing daily.

A bit of context - ecs@mappings are dynamic mappings that fully cover ECS. In order to verify that they are kept up-to-date with ECS, we employ daily tests that fail unless all fiedls are properly covered. Since you merged this PR, ecs@mappings is out of sync with ECS.
Please respond here and in the related fix.
Thanks!

@andrewkroh
Copy link
Member

andrewkroh commented Jul 17, 2024

I pushed for standardizing this multi-field because I think it is a negative user-experience to have varying mapping definitions for an ECS field across integrations. I think users expect consistency when dealing with ECS. i.e. No matter were field X is used, it has the same semantics and characteristics.

I do share the storage concerns even for security use cases. So if there something that can be done on the query-side to obviate the need for this multi-field that would be a great.

@elastic/threat-research-and-detection-engineering team, is it feasible to transition to ES|QL for the queries that depend on the "caseless" multi-fields?

@felixbarny do you have any insight into whether KQL could support this case-insensitive query?

If either ES|QL or KQL are probable solutions then I would be OK with temporary adding an inconsistent definition to the integrations that need the .caseless multi-field to bridge the gap and we can revert this PR (which has not be released).

@ebeahan
Copy link
Member

ebeahan commented Jul 17, 2024

do you have any insight into whether KQL could support this case-insensitive query?

Linking a couple of past discussions: elastic/kibana#80591, elastic/kibana#55378

@w0rk3r
Copy link
Contributor Author

w0rk3r commented Jul 17, 2024

@elastic/threat-research-and-detection-engineering team, is it feasible to transition to ES|QL for the queries that depend on the "caseless" multi-fields?

Not at the moment, mainly because new_terms rules don't support any other language than KQL. We avoid KQL whenever possible, but we still have limitations and performance scenarios that force us to use it. Also, we support kibana versions where ESQL isn't available or is running previous versions (which can lead to bugs).

But this isn't just for our prebuilt content. Some customers using Sysmon or Windows security logs have custom processors to add a lowercase version of these fields (we saw this in the last Locked Shields we participated), and others don't know they are required to do this and end up having false negatives (in our prebuilt content, custom use cases, and rules imported from Sigma).

I'm ok with going the integrations route again if needed.

@felixbarny
Copy link
Member

I think users expect consistency when dealing with ECS. i.e. No matter were field X is used, it has the same semantics and characteristics

That makes sense. However, in this case, it seems to me that we're adding a subfield only to work around limitations in KQL and the fact that some rules don't support any other language than KQL. To me, this indicates that these sub-fields shouldn't be part of the default set of fields for ECS.

Am I right to assume that our preferred long-term solution is the rely on ES|QL case-insensitive behavior? Or are there other strong reasons, like a more efficient search, that would warrant the increased storage impact for all use cases?

If the long-term solution is ES|QL, it seems like we shouldn't add .caseless fields to ECS. To unblock us in the short-term, integration-specific sub-fields seem more appropriate.

@w0rk3r
Copy link
Contributor Author

w0rk3r commented Jul 18, 2024

I just opened this PR in integrations to add them to the specific integrations if that is the way we choose to proceed: elastic/integrations#10533

@ruflin
Copy link
Member

ruflin commented Jul 18, 2024

++ on keeping the change to a minimum at first. Should we revert this ECS change?

andrewkroh added a commit that referenced this pull request Jul 19, 2024
andrewkroh added a commit that referenced this pull request Jul 23, 2024
…2350)

This reverts commit 7815b3f from #2341.

This is being reverted due to storage concerns. The goal will be to advance the native querying capabilities (ES|QL, KQL) of the Elastic stack such that this extra normalized multi-field is not necessary. In the meantime, localized overrides of the ECS field definition will be used to add the additional multi-field where needed. The downside of localized overrides are that it creates inconsistency across usages of the this field.
@andrewkroh
Copy link
Member

Revert is complete. See #2350.

lksnyder0 added a commit to huntresslabs/ecs that referenced this pull request Oct 9, 2024
* Add .caseless subfield to process.name & process.executable (elastic#2341)

Adds a subfield to the process.name and process.executable fields to improve the compatibility of data sources like System, Sysmon, etc., with our Elastic Defend data, which enables us to handle language limitations in KQL more effectively.

* Revert "Add .caseless subfield to process.name & process.executable" (elastic#2350)

This reverts commit 7815b3f from elastic#2341.

This is being reverted due to storage concerns. The goal will be to advance the native querying capabilities (ES|QL, KQL) of the Elastic stack such that this extra normalized multi-field is not necessary. In the meantime, localized overrides of the ECS field definition will be used to add the additional multi-field where needed. The downside of localized overrides are that it creates inconsistency across usages of the this field.

* [RFC] Apple Platform specific fields (elastic#2338)

Adds RFS stage 0

---------

Co-authored-by: Alexandra Konrad <alexandra.konrad@elastic.co>
Co-authored-by: Michael Wolf <michael.wolf@elastic.co>

* Add renovate.json (elastic#2352)

Co-authored-by: elastic-renovate-prod[bot] <174716857+elastic-renovate-prod[bot]@users.noreply.github.com>
Co-authored-by: Michael Wolf <michael.wolf@elastic.co>

* Update template fields (elastic#2354)

Update some templated fields that were missed before merging the RFC

* Pin dependencies (elastic#2355)

Co-authored-by: elastic-renovate-prod[bot] <174716857+elastic-renovate-prod[bot]@users.noreply.github.com>
Co-authored-by: Michael Wolf <michael.wolf@elastic.co>

* Update dependency PyYAML to v6.0.2 (elastic#2356)

Co-authored-by: elastic-renovate-prod[bot] <174716857+elastic-renovate-prod[bot]@users.noreply.github.com>
Co-authored-by: Michael Wolf <michael.wolf@elastic.co>

* Update dependency gitpython to v3.1.43 (elastic#2358)

Co-authored-by: elastic-renovate-prod[bot] <174716857+elastic-renovate-prod[bot]@users.noreply.github.com>
Co-authored-by: Michael Wolf <michael.wolf@elastic.co>

* Update dependency yamllint to v1.35.1 (elastic#2361)

Co-authored-by: elastic-renovate-prod[bot] <174716857+elastic-renovate-prod[bot]@users.noreply.github.com>
Co-authored-by: Michael Wolf <michael.wolf@elastic.co>

* Update stale PR message (elastic#2369)

Add a friendlier stale PR message, based from the
[Beats stale message](https://github.com/elastic/beats/blob/main/.github/stale.yml#L63-L74).

This will hopefully also prompt contributors to respond, so we'll be better able to track PRs
people are still interested in contributing.

* Update actions/checkout action to v4 (elastic#2362)

Co-authored-by: elastic-renovate-prod[bot] <174716857+elastic-renovate-prod[bot]@users.noreply.github.com>
Co-authored-by: Michael Wolf <michael.wolf@elastic.co>

* Update actions/github-script action to v7 (elastic#2363)

Co-authored-by: elastic-renovate-prod[bot] <174716857+elastic-renovate-prod[bot]@users.noreply.github.com>
Co-authored-by: Michael Wolf <michael.wolf@elastic.co>

* Update actions/setup-python action to v5 (elastic#2364)

Co-authored-by: elastic-renovate-prod[bot] <174716857+elastic-renovate-prod[bot]@users.noreply.github.com>
Co-authored-by: Michael Wolf <michael.wolf@elastic.co>

* Update actions/stale action to v9 (elastic#2365)

Co-authored-by: elastic-renovate-prod[bot] <174716857+elastic-renovate-prod[bot]@users.noreply.github.com>
Co-authored-by: Michael Wolf <michael.wolf@elastic.co>

* Update dependency mock to v5 (elastic#2367)

Co-authored-by: elastic-renovate-prod[bot] <174716857+elastic-renovate-prod[bot]@users.noreply.github.com>
Co-authored-by: Michael Wolf <michael.wolf@elastic.co>

* Update dependency ubuntu to v22 (elastic#2368)

Co-authored-by: elastic-renovate-prod[bot] <174716857+elastic-renovate-prod[bot]@users.noreply.github.com>
Co-authored-by: Michael Wolf <michael.wolf@elastic.co>

* Update dependency autopep8 to v1.7.0 (elastic#2359)

Update dependency autopep8 to v1.7.0

---------

Co-authored-by: elastic-renovate-prod[bot] <174716857+elastic-renovate-prod[bot]@users.noreply.github.com>
Co-authored-by: Michael Wolf <michael.wolf@elastic.co>

* Update dependency autopep8 to v2 (elastic#2366)

* Update dependency autopep8 to v2

---------

Co-authored-by: elastic-renovate-prod[bot] <174716857+elastic-renovate-prod[bot]@users.noreply.github.com>
Co-authored-by: Michael Wolf <michael.wolf@elastic.co>

* add license header (elastic#2377)

* Update actions/setup-python digest to f677139 (elastic#2374)

Co-authored-by: elastic-renovate-prod[bot] <174716857+elastic-renovate-prod[bot]@users.noreply.github.com>
Co-authored-by: Michael Wolf <michael.wolf@elastic.co>

* [RFC] Stage 0: Introducing new field in rule namespace (elastic#2330)

* Update 0000-rfc-template.md

Updating the temaplate for RFC Stage 0 for adding 2 new rule fields: rule.tags and rule.remediation

* Update 0000-rfc-template.md

Incorporating review comments.

* Renaming the template file with recommended name

* Resolving conflicts

* Removing Tag Field

* Resolving comments from @trisch-me

* Moving file to rfcs/text folder as per @trisch-me comment. using next number in series.

* I saw number 44 was used in a recent RFC, using next number in series

---------

Co-authored-by: Eric Beahan <eric.beahan@elastic.co>
Co-authored-by: Alexandra Konrad <alexandra.konrad@elastic.co>

* [RFC] Stage 2: Adding Apple Platform specific fields (elastic#2370)

Updating the RFC and moving it to stage two.

* code blocks specified language yaml (elastic#2380)

Co-authored-by: Michael Wolf <michael.wolf@elastic.co>

* trim trailing whitespace in schema (elastic#2379)

Co-authored-by: Michael Wolf <michael.wolf@elastic.co>

* [RFC] Stage 0: Introducing new fields in ECS vulnerability field set (elastic#2331)

* RFC to add new fields in ECS vulnerability field set

RFC to add new fields in ECS vulnerability field set

* Moving to separate file

* set title and add stage 0 PR #

* clean up fields table markdown

* Moving to (rfcs/text) and renaming file to next number in series.

* Resolving the comments from @trisch-me

* Update rfcs/text/0045-additional-vulnerability-fields.md

Co-authored-by: Alexandra Konrad <alexandra.konrad@elastic.co>

* Update rfcs/text/0045-additional-vulnerability-fields.md

Co-authored-by: Alexandra Konrad <alexandra.konrad@elastic.co>

* Making changed to the date format as per comments from @trisch-me

* Resolving @trisch-me comments

* Resolving latest comments

* Update rfcs/text/0045-additional-vulnerability-fields.md

Co-authored-by: Alexandra Konrad <alexandra.konrad@elastic.co>

---------

Co-authored-by: Eric Beahan <eric.beahan@elastic.co>
Co-authored-by: Alexandra Konrad <alexandra.konrad@elastic.co>

* Fix type in code signature (elastic#2382)

Change the type of code_signature.flags to keyword, which is what it should be. Also add a unit test that will verify all types are valid.

* Enforce yamllint in CI (elastic#2381)

Start running and enforcing yamllint checks in CI.

* Add Stage0 RFC for new fields for fileless execution on Linux (elastic#2322)

* Add support for settings

* Fix settings merging

* Restrict test workflow

* Fix merge conflicts

* Less restrictive

* Add docker files and pipeline

* Make building more restrictive

* Simplify build workflow

* Update tagging strategy

* Removing unused variable

* Kick?

* Anchors aren't supported 😭

* Fix role name

* Test branch name

* Remove extra default update (#3)

* Remove extra default update

* Fix role name

* Add support for a top-level type (#4)

* Add support for a top-level type

* Actually, don't need to be all the complicated

* Type needs to be nested within the field name (#5)

* Add documention for parameters field (#6)

* Add undocumented field argument

* Remove the PR template

---------

Co-authored-by: Jonhnathan <26856693+w0rk3r@users.noreply.github.com>
Co-authored-by: Andrew Kroh <andrew.kroh@elastic.co>
Co-authored-by: Thijs Xhaflaire <thijsxhaflaire31@hotmail.com>
Co-authored-by: Alexandra Konrad <alexandra.konrad@elastic.co>
Co-authored-by: Michael Wolf <michael.wolf@elastic.co>
Co-authored-by: elastic-renovate-prod[bot] <174716857+elastic-renovate-prod[bot]@users.noreply.github.com>
Co-authored-by: Stefan Bischof <bipolis@bipolis.org>
Co-authored-by: Smriti <152067238+smriti0321@users.noreply.github.com>
Co-authored-by: Eric Beahan <eric.beahan@elastic.co>
Co-authored-by: Michal Stanek <75310947+stanek-michal@users.noreply.github.com>
@nicpenning
Copy link
Contributor

nicpenning commented Oct 24, 2024

One of the biggest compliants I hear about Kibana/Elastic is this specific issue. Knowing when to use the right case sensitivity. Users have to take extra steps continually to attempt upper/lower or wild card searches to find what they ate looking for. That is why we force many fields to lowercase so they don't have to think about this. KQL is still 99% of the searches because it is so simple and it gets the job done. ES|QL is not a long term solution IMO. Giving KQL the ability to be caseless in other ways would be great if the mappings won't do it. Host names is a other area where this occurs and unfortunately system indices / areas of the stack this problematic because things like OSQuery require exact matches but Fleet seems to work okay. Perhaps optional multifield types in ECS could allow the best of both worlds where there is no issue/conflict with having all or just 1 extra multifield. Something that probably doesn't exist as a capability today but just a thought. It's a little uneasy that the MDE integration just got this treatment which drifts further away from ECS if I am tracking this thread right. If you ask me, the fields should be simply lowercased today and retain the original process.name in integration_name.process.name.original. It is important to see the case sensitivity because that could add valuable insight into threat activity.

Just my 2 cents.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants