Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more Endpoint fields #265

Merged
merged 20 commits into from
Jun 29, 2022
Merged

Add more Endpoint fields #265

merged 20 commits into from
Jun 29, 2022

Conversation

ferullo
Copy link
Contributor

@ferullo ferullo commented Jun 21, 2022

Change Summary

This updates mappings to account for a number of fields Endpoint already generates.

I'm not sure if all the fields added need to be indexed or if the types are right. Some back and forth/thorough review would be appreciated.

Sample values

There are too many changes to easily share a sample document, though I can get some samples if really needed. The changes in this PR pass Endpoint testing.

Release Target

Whenever.

Q/A

Endpoint automated testing passes with these changes and exceptions that allowed testing to otherwise pass removed.

For mapping changes:

  • I ran make after making the schema changes, and committed any generated files (in schema/, generated/)
  • If these field(s) are "exception"-able, I made a companion PR to Kibana adding it (see Readme)
  • If this is a metadata change, I also updated both transform destination schemas to match

For Transform changes:

  • The new transform successfully starts in Kibana
  • The corresponding transform destination schema was updated if necessary

@ferullo ferullo changed the title Add host os type Add more Endpoint fields Jun 21, 2022
@elasticmachine
Copy link
Contributor

elasticmachine commented Jun 21, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-06-22T20:22:45.358+0000

  • Duration: 7 min 48 sec

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

Copy link
Contributor Author

@ferullo ferullo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything I can/should do to update all additions so they aren't indexed? My thought is that right now these fields aren't indexed anyway so just adding them without indexing seems like the least risky change.


- name: instruction_pointer
level: custom
type: keyword
description: >
The return address of this stack frame.

- name: memory_section.memory_address
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortuntely, it seems Endpoint generates both memory_section.memory_address and memory_section.address as well as memory_section.memory_size and memory_section.size. I'm not sure there is any non-breaking change way to undo that. Thoughts @gabriellandau ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm. memory_section.memory_address appears to have come first, a carry-over from Endgame schema. It's used for memory protection call stacks alongside memory_size.

memory_section.address appears to be new to Elastic Credential Access events (8.3.0). Can we remove it from the schema, and change CredAccess in 8.3.1 over to memory_address?

I'm not finding any references to memory_section.size - we may be okay to remove that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see call_stack.memory_region.[address|size] in custom_subsets/elastic_endpoint/alerts/malware_event.yaml, custom_subsets/elastic_endpoint/alerts/memory_protection_event.yaml, and custom_subsets/elastic_endpoint/alerts/ransomware_event.yaml. Is there any reason I shouldn't remove address and size from all three and add memory_address and memory_size?

Copy link
Contributor

@gabriellandau gabriellandau Jun 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason I shouldn't remove address and size from all three and add memory_address and memory_size?

From what I can tell, that would be a reasonable course of action. I can do that in a follow-up PR if you don't want to do it here. The sanity test would be to take the resulting endpoint-package hash and run EAF with it, ensuring no schema violations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, based on git blame that field is really old. I wonder why it was added so long ago?

Malware git blame and PR that added it on Jul 1, 2020

Memory protection git blame and PR that added it on Apr 15, 2021

Ransomware git blame and PR that added it on Oct 26, 2020

It's unclear to me why it is Malware and Ransomware docs in the first place.

- name: Ext.device.bus_type
level: custom
type: keyword
short: FILL ME IN
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Trinity2019 can you share some documentation for these fields to replace FILL ME IN?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on it

custom_schemas/custom_file.yml Outdated Show resolved Hide resolved
@@ -361,6 +361,62 @@
Indicates the protection level of this process. Uses the same syntax as Process Explorer.
Examples include PsProtectedSignerWinTcb, PsProtectedSignerWinTcb-Light, and PsProtectedSignerWindows-Light.

- name: Ext.device.bus_type
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will copy paste here too.

@pzl
Copy link
Member

pzl commented Jun 21, 2022

@ferullo if these fields do not need any indexing in ES, do we need to add these mappings? It is OK to ship data into ES without it being mapped. The use case for data in this fashion is:

documents are searched for/discovered/filtered via other fields as primary means. Then once the document is found, the additional unmapped data is returned as a payload. So that data may be useful for viewing as additional context after retrieval, but is not used for identifying the document itself. The other benefit is that storage is drastically reduced when not mapping fields that we don't need to. Yet another benefit is that we have, in general, too many fields mapped in the endpoint package, specifically in the alerts data stream. It is over a newly imposed threshold for number of mapped fields. It is whole-stack beneficial to only index fields when necessary, leading to the creation of this threshold.

Do we have a need to map these fields?

If we don't have that need, but we want some way to account for and document these fields, then we may be able to add them as you are, but set index: false. I'm not sure if adding the types and definitions but index: false uses any more storage than if the fields were anonymously ignored as they are presently. I'll also need to double check the package specification to see if index:false is even respected from a package point-of-view.

@ferullo
Copy link
Contributor Author

ferullo commented Jun 21, 2022

I hadn't realized there is a difference between mapping and indexing regarding storage size.

I'd like these fields to be mapped/defined somehow because we've gotten into a risky workflow where Endpoint keeps adding fields without going through a PR process in this repo by adding exceptions in Endpoint's repo to Endpoint testing in that would otherwise fail. The longer that list of exceptions gets the more it feels appropriate to add more exceptions, putting us in a downward spiral.

There's also the fact that these mappings define the Endpoint data schema documentation so it feels arbitrary to not have these things documented. At least a few ought to be documented (data in events/alerts). The info in Endpoint metrics documents could arguably remain undocumented but since a lot of that document is already documented it feels wrong to fail to document all of it.

Still, the ultimate size of document storage in Elasticsearch is of primary concern. My presumption was not indexing this data would address that size-bloat concern while allowing the documentation/testing/pr-process concerns to be met. I'll be curious what you find about mapping-but-not-indexing's effect on storage size. If size still winds up a concern we can figure something else out.

@ferullo
Copy link
Contributor Author

ferullo commented Jun 21, 2022

/test

custom_schemas/custom_dll.yml Outdated Show resolved Hide resolved
custom_schemas/custom_dll.yml Outdated Show resolved Hide resolved
custom_schemas/custom_dll.yml Outdated Show resolved Hide resolved
custom_schemas/custom_dll.yml Outdated Show resolved Hide resolved
custom_schemas/custom_dll.yml Outdated Show resolved Hide resolved
custom_schemas/custom_dll.yml Outdated Show resolved Hide resolved
custom_schemas/custom_dll.yml Outdated Show resolved Hide resolved
custom_schemas/custom_dll.yml Outdated Show resolved Hide resolved
custom_schemas/custom_dll.yml Outdated Show resolved Hide resolved
custom_schemas/custom_dll.yml Outdated Show resolved Hide resolved
Co-authored-by: Yamin Tian <56367679+Trinity2019@users.noreply.github.com>
Copy link
Contributor Author

@ferullo ferullo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Trinity2019

@ferullo
Copy link
Contributor Author

ferullo commented Jun 21, 2022

/test

@ferullo
Copy link
Contributor Author

ferullo commented Jun 21, 2022

I updated the types on a few fields and removed indexing for all but the Responses.* fields (since Responses.* is in alerts I figured indexing is a good thing).

@joe-desimone or @gabriellandau is it ok that the device.* fields won't be indexed or exceptionable?

@gabriellandau
Copy link
Contributor

gabriellandau commented Jun 21, 2022

@joe-desimone or @gabriellandau is it ok that the device.* fields won't be indexed or exceptionable?

Even without indexing, @Samirbous will still be able to write endpoint rules based on them.

I could imagine users wanting to hunt for processes run from USB drives (event.category: process and process.Ext.device.bus_type: Usb), for example, but I defer to @Samirbous for how likely that is.

@joe-desimone
Copy link

I think indexing them makes sense for users to write their own rules and for us to build insider threat type detections on this data in the stack. For example, in a UEBA/Entity analytics type use case, we have an interest in looking for users writing large volumes of data to USB drives.

@ferullo
Copy link
Contributor Author

ferullo commented Jun 22, 2022

Sounds good @joe-desimone @gabriellandau . I added indexing to the device.* fields.

Comment on lines +450 to +451
memory_address: {}
memory_size: {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@@ -594,8 +595,8 @@ fields:
instruction_pointer: {}
memory_section:
fields:
address: {}
size: {}
memory_address: {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to do this now, but I'm pretty sure we can get rid of thread entirely from malware events.

@ferullo
Copy link
Contributor Author

ferullo commented Jun 27, 2022

@kevinlog @pzl I think this is ready for approval and merge if it looks good (I'm not saying this doesn't need any more legit PR scrutiny, just that I think my major uncertainties have need discussed and decided upon).

I'm unsure of applicability of the unchecked items in this PR's description. I don't think any of those steps apply to this PR but defer to you to tell me if I'm wrong on that.

Copy link
Member

@pzl pzl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets do it

@ferullo
Copy link
Contributor Author

ferullo commented Jun 29, 2022

Awesome. Am I good to merge? Notably, I want to make sure it's clear I haven't had a Kibana instance use these PR changes because I don't really have a dev set up to do that.

@pzl
Copy link
Member

pzl commented Jun 29, 2022

@ferullo you are okay to merge. despite the size of the PR, it's mostly just adding field definitions to the data streams which is a generally harmless operation, and CI verifies that everything is syntactically valid, and won't blow up kibana in that sense.

Merging also won't blow up any environment anywhere, and there is testing done before any release. Fixes are easy all the way up to release testing

@ferullo ferullo merged commit 168872d into master Jun 29, 2022
@ferullo ferullo deleted the add-host-os-type branch June 29, 2022 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants