Add more Endpoint fields #265

ferullo · 2022-06-21T12:57:12Z

Change Summary

This updates mappings to account for a number of fields Endpoint already generates.

I'm not sure if all the added fields need to be indexed or if the types are right. Some back-and-forth and a thorough review would be appreciated.

Sample values

There are too many changes to easily share a sample document, though I can get some samples if really needed. The changes in this PR pass Endpoint testing.

Release Target

Whenever.

Q/A

Endpoint automated testing passes with these changes, with the exceptions that previously allowed testing to pass removed.

For mapping changes:

I ran make after making the schema changes, and committed any generated files (in schema/, generated/)
If these field(s) are "exception"-able, I made a companion PR to Kibana adding it (see Readme)
If this is a metadata change, I also updated both transform destination schemas to match

For Transform changes:

The new transform successfully starts in Kibana
The corresponding transform destination schema was updated if necessary


elasticmachine · 2022-06-21T13:02:40Z

💚 Build Succeeded

Build stats

Start Time: 2022-06-22T20:22:45.358+0000
Duration: 7 min 48 sec

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.

ferullo

Is there anything I can/should do to update all additions so they aren't indexed? My thinking is that these fields aren't indexed right now anyway, so adding them without indexing seems like the least risky change.

ferullo · 2022-06-21T12:59:28Z

custom_schemas/custom_call_stack.yml


    - name: instruction_pointer
      level: custom
      type: keyword
      description: >
        The return address of this stack frame.

+    - name: memory_section.memory_address


Unfortunately, it seems Endpoint generates both memory_section.memory_address and memory_section.address, as well as memory_section.memory_size and memory_section.size. I'm not sure there's any non-breaking way to undo that. Thoughts @gabriellandau ?

Hrm. memory_section.memory_address appears to have come first, a carry-over from the Endgame schema. It's used for memory protection call stacks alongside memory_size.

memory_section.address appears to be new to Elastic Credential Access events (8.3.0). Can we remove it from the schema, and change CredAccess in 8.3.1 over to memory_address?

I'm not finding any references to memory_section.size - we may be okay to remove that.

I see call_stack.memory_section.[address|size] in custom_subsets/elastic_endpoint/alerts/malware_event.yaml, custom_subsets/elastic_endpoint/alerts/memory_protection_event.yaml, and custom_subsets/elastic_endpoint/alerts/ransomware_event.yaml. Is there any reason I shouldn't remove address and size from all three and add memory_address and memory_size?

Is there any reason I shouldn't remove address and size from all three and add memory_address and memory_size?

From what I can tell, that would be a reasonable course of action. I can do that in a follow-up PR if you don't want to do it here. The sanity test would be to take the resulting endpoint-package hash and run EAF with it, ensuring no schema violations.
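A minimal sketch of the end state proposed above, assuming the same nesting as the malware_event.yaml excerpt later in this PR: in each of the three alert subsets, the memory_section block would list memory_address and memory_size instead of address and size. Parent keys above memory_section are elided, so this fragment is illustrative rather than a drop-in diff.

```yaml
# Sketch only: keys above memory_section are elided, and the exact nesting
# in each alert subset (malware, memory protection, ransomware) is assumed.
memory_section:
  fields:
    memory_address: {}  # replaces address: {}
    memory_size: {}     # replaces size: {}
```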

Oh, based on git blame that field is really old. I wonder why it was added so long ago?

Malware git blame and PR that added it on Jul 1, 2020

Memory protection git blame and PR that added it on Apr 15, 2021

Ransomware git blame and PR that added it on Oct 26, 2020

It's unclear to me why it is in Malware and Ransomware docs in the first place.

ferullo · 2022-06-21T13:00:04Z

custom_schemas/custom_dll.yml

+    - name: Ext.device.bus_type
+      level: custom
+      type: keyword
+      short: FILL ME IN


@Trinity2019 can you share some documentation for these fields to replace FILL ME IN?

custom_schemas/custom_file.yml

ferullo · 2022-06-21T13:01:17Z

custom_schemas/custom_process.yml

@@ -361,6 +361,62 @@
        Indicates the protection level of this process.  Uses the same syntax as Process Explorer.
        Examples include PsProtectedSignerWinTcb, PsProtectedSignerWinTcb-Light, and PsProtectedSignerWindows-Light.

+    - name: Ext.device.bus_type


Will copy-paste here too.

pzl · 2022-06-21T14:00:25Z

@ferullo if these fields do not need any indexing in ES, do we need to add these mappings? It is OK to ship data into ES without it being mapped. The use case for data in this fashion is:

Documents are searched for, discovered, or filtered primarily via other fields. Once a document is found, the additional unmapped data is returned as part of its payload. That data may be useful as additional context after retrieval, but it is not used to identify the document itself.

Not mapping fields we don't need also drastically reduces storage. Beyond that, the endpoint package in general, and the alerts data stream specifically, has too many mapped fields: it is over a newly imposed threshold for the number of mapped fields. That threshold was created because indexing fields only when necessary benefits the whole stack.

Do we have a need to map these fields?

If we don't have that need, but we want some way to account for and document these fields, then we may be able to add them as you are, but set index: false. I'm not sure if adding the types and definitions with index: false uses any more storage than if the fields were anonymously ignored as they are presently. I'll also need to double-check the package specification to see if index: false is even respected from a package point of view.
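For illustration only, a definition in the custom_schemas/*.yml style that is mapped but not indexed might look like the fragment below. Ext.example_field and its description are hypothetical placeholders, and whether the package specification honors index: false is the open question raised above.

```yaml
# Hypothetical field in the same layout as the custom_schemas/*.yml excerpts
# in this PR. index: false asks for the field to be mapped (typed and
# documented) but not indexed.
- name: Ext.example_field
  level: custom
  type: keyword
  index: false
  description: >
    Placeholder description of the hypothetical field.
```

Leaving out the index key keeps the default behavior, in which the field is indexed.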


ferullo · 2022-06-21T14:52:39Z

I hadn't realized there is a difference between mapping and indexing regarding storage size.

I'd like these fields to be mapped/defined somehow because we've gotten into a risky workflow where Endpoint keeps adding fields without going through the PR process in this repo, instead adding exceptions in Endpoint's repo for Endpoint tests that would otherwise fail. The longer that list of exceptions gets, the more it feels acceptable to add more exceptions, putting us in a downward spiral.

There's also the fact that these mappings define the Endpoint data schema documentation so it feels arbitrary to not have these things documented. At least a few ought to be documented (data in events/alerts). The info in Endpoint metrics documents could arguably remain undocumented but since a lot of that document is already documented it feels wrong to fail to document all of it.

Still, the ultimate size of document storage in Elasticsearch is of primary concern. My presumption was not indexing this data would address that size-bloat concern while allowing the documentation/testing/pr-process concerns to be met. I'll be curious what you find about mapping-but-not-indexing's effect on storage size. If size still winds up a concern we can figure something else out.

ferullo · 2022-06-21T15:14:22Z

/test

custom_schemas/custom_dll.yml

Co-authored-by: Yamin Tian <56367679+Trinity2019@users.noreply.github.com>

ferullo

Thanks @Trinity2019

ferullo · 2022-06-21T18:42:51Z

/test

ferullo · 2022-06-21T19:52:38Z

I updated the types on a few fields and removed indexing for all but the Responses.* fields (since Responses.* is in alerts, I figured indexing it is a good thing).

@joe-desimone or @gabriellandau is it ok that the device.* fields won't be indexed or exceptionable?

gabriellandau · 2022-06-21T20:43:03Z

@joe-desimone or @gabriellandau is it ok that the device.* fields won't be indexed or exceptionable?

Even without indexing, @Samirbous will still be able to write endpoint rules based on them.

I could imagine users wanting to hunt for processes run from USB drives (event.category: process and process.Ext.device.bus_type: Usb), for example, but I defer to @Samirbous for how likely that is.

joe-desimone · 2022-06-21T21:01:39Z

I think indexing them makes sense for users to write their own rules and for us to build insider threat type detections on this data in the stack. For example, in a UEBA/Entity analytics type use case, we have an interest in looking for users writing large volumes of data to USB drives.

ferullo · 2022-06-22T20:23:09Z

Sounds good @joe-desimone @gabriellandau . I added indexing to the device.* fields.

gabriellandau · 2022-06-24T18:10:16Z

custom_subsets/elastic_endpoint/alerts/memory_protection_event.yaml

+                      memory_address: {}
+                      memory_size: {}


gabriellandau · 2022-06-24T18:14:29Z

custom_subsets/elastic_endpoint/alerts/malware_event.yaml

@@ -594,8 +595,8 @@ fields:
                  instruction_pointer: {}
                  memory_section:
                    fields:
-                      address: {}
-                      size: {}
+                      memory_address: {}


We don't need to do this now, but I'm pretty sure we can get rid of thread entirely from malware events.

ferullo · 2022-06-27T12:57:46Z

@kevinlog @pzl I think this is ready for approval and merge if it looks good (I'm not saying this doesn't need any more legit PR scrutiny, just that I think my major uncertainties have been discussed and decided upon).

I'm unsure about the applicability of the unchecked items in this PR's description. I don't think any of those steps apply to this PR, but I defer to you to tell me if I'm wrong on that.

pzl

let's do it

ferullo · 2022-06-29T17:31:58Z

Awesome. Am I good to merge? Notably, I want to make sure it's clear I haven't had a Kibana instance use these PR changes because I don't really have a dev setup to do that.

pzl · 2022-06-29T17:55:43Z

@ferullo you are okay to merge. Despite the size of the PR, it's mostly just adding field definitions to the data streams, which is a generally harmless operation, and CI verifies that everything is syntactically valid and won't blow up Kibana in that sense.

Merging also won't blow up any environment anywhere, and there is testing done before any release. Fixes are easy all the way up to release testing.

ferullo added 11 commits June 15, 2022 18:56

add host.os.type

b4b0608

Merge branch 'master' of github.com:elastic/endpoint-package into add…

3ffd3d5

…-host-os-type

add more fields

36b9aa5

device.*

fd5466e

more stuff

b91a531

memory_address and memory_size

9630df4

system impact code signature

a64c650

module_name

78ed6e9

response

3eb7ca7

trusted and trusted_descendant

c47c2b0

team_id and signing_id

876e8cc

ferullo changed the title ~~Add host os type~~ Add more Endpoint fields Jun 21, 2022

ferullo commented Jun 21, 2022

Merge branch 'master' of github.com:elastic/endpoint-package into add…

cd7f737

…-host-os-type

add/remove missed files

d6ba520

Trinity2019 reviewed Jun 21, 2022

Apply suggestions from code review

67843f9

Co-authored-by: Yamin Tian <56367679+Trinity2019@users.noreply.github.com>

ferullo commented Jun 21, 2022

ferullo added 3 commits June 21, 2022 18:36

copy/paste docs

c03c44e

add schemas

83a2530

more auto files

8c98efd

don't index, change type

d731c1f

ferullo added 2 commits June 22, 2022 16:19

index device.*

63f80a4

add auto generated files

51b33cf

gabriellandau approved these changes Jun 24, 2022

pzl approved these changes Jun 29, 2022

ferullo merged commit 168872d into master Jun 29, 2022

ferullo deleted the add-host-os-type branch June 29, 2022 18:35
