-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more Endpoint fields #265
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything I can/should do to update all additions so they aren't indexed? My thought is that right now these fields aren't indexed anyway so just adding them without indexing seems like the least risky change.
|
||
- name: instruction_pointer | ||
level: custom | ||
type: keyword | ||
description: > | ||
The return address of this stack frame. | ||
|
||
- name: memory_section.memory_address |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortuntely, it seems Endpoint generates both memory_section.memory_address
and memory_section.address
as well as memory_section.memory_size
and memory_section.size
. I'm not sure there is any non-breaking change way to undo that. Thoughts @gabriellandau ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm. memory_section.memory_address
appears to have come first, a carry-over from Endgame schema. It's used for memory protection call stacks alongside memory_size
.
memory_section.address
appears to be new to Elastic Credential Access events (8.3.0). Can we remove it from the schema, and change CredAccess in 8.3.1 over to memory_address
?
I'm not finding any references to memory_section.size
- we may be okay to remove that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see call_stack.memory_region.[address|size]
in custom_subsets/elastic_endpoint/alerts/malware_event.yaml
, custom_subsets/elastic_endpoint/alerts/memory_protection_event.yaml
, and custom_subsets/elastic_endpoint/alerts/ransomware_event.yaml
. Is there any reason I shouldn't remove address
and size
from all three and add memory_address
and memory_size
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason I shouldn't remove address and size from all three and add memory_address and memory_size?
From what I can tell, that would be a reasonable course of action. I can do that in a follow-up PR if you don't want to do it here. The sanity test would be to take the resulting endpoint-package
hash and run EAF with it, ensuring no schema violations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, based on git blame that field is really old. I wonder why it was added so long ago?
Malware git blame and PR that added it on Jul 1, 2020
Memory protection git blame and PR that added it on Apr 15, 2021
Ransomware git blame and PR that added it on Oct 26, 2020
It's unclear to me why it is Malware and Ransomware docs in the first place.
custom_schemas/custom_dll.yml
Outdated
- name: Ext.device.bus_type | ||
level: custom | ||
type: keyword | ||
short: FILL ME IN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Trinity2019 can you share some documentation for these fields to replace FILL ME IN?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on it
@@ -361,6 +361,62 @@ | |||
Indicates the protection level of this process. Uses the same syntax as Process Explorer. | |||
Examples include PsProtectedSignerWinTcb, PsProtectedSignerWinTcb-Light, and PsProtectedSignerWindows-Light. | |||
|
|||
- name: Ext.device.bus_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will copy paste here too.
@ferullo if these fields do not need any indexing in ES, do we need to add these mappings? It is OK to ship data into ES without it being mapped. The use case for data in this fashion is: documents are searched for/discovered/filtered via other fields as primary means. Then once the document is found, the additional unmapped data is returned as a payload. So that data may be useful for viewing as additional context after retrieval, but is not used for identifying the document itself. The other benefit is that storage is drastically reduced when not mapping fields that we don't need to. Yet another benefit is that we have, in general, too many fields mapped in the endpoint package, specifically in the Do we have a need to map these fields? If we don't have that need, but we want some way to account for and document these fields, then we may be able to add them as you are, but set |
I hadn't realized there is a difference between mapping and indexing regarding storage size. I'd like these fields to be mapped/defined somehow because we've gotten into a risky workflow where Endpoint keeps adding fields without going through a PR process in this repo by adding exceptions in Endpoint's repo to Endpoint testing in that would otherwise fail. The longer that list of exceptions gets the more it feels appropriate to add more exceptions, putting us in a downward spiral. There's also the fact that these mappings define the Endpoint data schema documentation so it feels arbitrary to not have these things documented. At least a few ought to be documented (data in events/alerts). The info in Endpoint metrics documents could arguably remain undocumented but since a lot of that document is already documented it feels wrong to fail to document all of it. Still, the ultimate size of document storage in Elasticsearch is of primary concern. My presumption was not indexing this data would address that size-bloat concern while allowing the documentation/testing/pr-process concerns to be met. I'll be curious what you find about mapping-but-not-indexing's effect on storage size. If size still winds up a concern we can figure something else out. |
/test |
Co-authored-by: Yamin Tian <56367679+Trinity2019@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Trinity2019
/test |
I updated the types on a few fields and removed indexing for all but the @joe-desimone or @gabriellandau is it ok that the |
Even without indexing, @Samirbous will still be able to write endpoint rules based on them. I could imagine users wanting to hunt for processes run from USB drives ( |
I think indexing them makes sense for users to write their own rules and for us to build insider threat type detections on this data in the stack. For example, in a UEBA/Entity analytics type use case, we have an interest in looking for users writing large volumes of data to USB drives. |
Sounds good @joe-desimone @gabriellandau . I added indexing to the |
memory_address: {} | ||
memory_size: {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
@@ -594,8 +595,8 @@ fields: | |||
instruction_pointer: {} | |||
memory_section: | |||
fields: | |||
address: {} | |||
size: {} | |||
memory_address: {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to do this now, but I'm pretty sure we can get rid of thread
entirely from malware events.
@kevinlog @pzl I think this is ready for approval and merge if it looks good (I'm not saying this doesn't need any more legit PR scrutiny, just that I think my major uncertainties have need discussed and decided upon). I'm unsure of applicability of the unchecked items in this PR's description. I don't think any of those steps apply to this PR but defer to you to tell me if I'm wrong on that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets do it
Awesome. Am I good to merge? Notably, I want to make sure it's clear I haven't had a Kibana instance use these PR changes because I don't really have a dev set up to do that. |
@ferullo you are okay to merge. despite the size of the PR, it's mostly just adding field definitions to the data streams which is a generally harmless operation, and CI verifies that everything is syntactically valid, and won't blow up kibana in that sense. Merging also won't blow up any environment anywhere, and there is testing done before any release. Fixes are easy all the way up to release testing |
Change Summary
This updates mappings to account for a number of fields Endpoint already generates.
I'm not sure if all the fields added need to be indexed or if the types are right. Some back and forth/thorough review would be appreciated.
Sample values
There are too many changes to easily share a sample document, though I can get some samples if really needed. The changes in this PR pass Endpoint testing.
Release Target
Whenever.
Q/A
Endpoint automated testing passes with these changes and exceptions that allowed testing to otherwise pass removed.
For mapping changes:
make
after making the schema changes, and committed any generated files (inschema/
,generated/
)metadata
change, I also updated both transform destination schemas to matchFor Transform changes: