Crowdstrike FDR hostname/username enrichment #2816

Closed

jamiehynds opened this issue Mar 10, 2022 · 9 comments · Fixed by #8474

Comments

@jamiehynds

jamiehynds commented Mar 10, 2022

CrowdStrike Falcon Data Replicator (FDR) replicates log data from your CrowdStrike environment to an S3 bucket so that it can be ingested by SIEMs and other security tools. While our FDR integration ingests this data, CrowdStrike unfortunately does not include important information such as hostname or username in these events, rendering them unusable without that context.

As an example, a ProcessRollup2 event combines data from several sources into one event that describes a process which is running, or has previously run, on the host. The UserSid and AuthenticationId fields included with the ProcessRollup2 event define the security context the process was created with. To determine details about this context, find a UserIdentity event with the same agent ID (aid), UserSid, and AuthenticationId. Looking at the UserSid alone tells you the user a process is running as, but without also matching the AuthenticationId you cannot determine the full security context.

For hostname/computer name, you can correlate the aid (agent ID) with the aid_master file.
In addition to the events listed in the Events Data Dictionary, Falcon Insight customers can optionally request these lookup files with FDR:
• aid_master (hosts)
• managedassets
• notmanaged
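
For illustration only, the correlation described above amounts to a pair of keyed lookups: one against UserIdentity events on (aid, UserSid, AuthenticationId) for the user, and one against aid_master on aid for the hostname. The field values, key names, and Go types below are made up for the example and are not integration fields.

```go
// Sketch of the FDR correlation as two map lookups; all values are invented.
package main

import "fmt"

type userKey struct{ AID, UserSid, AuthenticationID string }

func main() {
	// Lookup tables built from UserIdentity events and the aid_master file.
	users := map[userKey]string{
		{AID: "a1", UserSid: "S-1-5-21-1", AuthenticationID: "0x3e7"}: "CORP\\alice",
	}
	hosts := map[string]string{"a1": "WIN-DESKTOP-01"} // aid -> ComputerName

	// A ProcessRollup2 event carries only the keys, not the names.
	event := map[string]string{
		"aid": "a1", "UserSid": "S-1-5-21-1", "AuthenticationId": "0x3e7",
	}

	// Enrichment is then two keyed lookups.
	if name, ok := users[userKey{event["aid"], event["UserSid"], event["AuthenticationId"]}]; ok {
		event["user.name"] = name
	}
	if host, ok := hosts[event["aid"]]; ok {
		event["host.name"] = host
	}
	fmt.Println(event)
}
```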

While we do not have an elegant solution today to enrich these events with hostname/username, this issue is intended to track our progress on researching possible solutions and workarounds.

@jamiehynds jamiehynds changed the title Crowdstrike FDR hostname/username enrichement Crowdstrike FDR hostname/username enrichment Mar 10, 2022
@elasticmachine

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@andrewkroh
Member

andrewkroh commented Mar 10, 2022

One approach for doing the AID Master (host details) enrichment that I want to test is:

  1. Create a separate input in the CrowdStrike FDR integration to read the AID Master data from S3. Allow usage of both SQS and S3 polling. Page 19 of this PDF indicates that there are separate SQS queues for AID Master and FDR data. https://www.crowdstrike.com/wp-content/uploads/2021/10/crowdstrike-falcon-data-replicator-fdr-sqs-technical-add-on-guide.pdf

  2. With the CrowdStrike AID Master data being written to a Fleet-managed data stream, apply a "latest" transform to create an entity-centric index with one doc per unique AID. Transforms are already supported by Fleet packages.

    Or if Fleet allows, route Crowdstrike AID Master data to its own (regular) index. Use the AID value as the _id and update records. This avoids the need for a transform.

  3. Periodically execute an enrich policy to update the enrich index based on data from the transform output. Fleet does not support enrich policies today, but if this approach works we can propose an enhancement. Fleet would need to periodically update the enrich index unless we had a feature like "Introduce instant enrich policies" (elasticsearch#73407).

  4. Add an enrich processor to the CrowdStrike pipeline to add AID Master data to FDR events at ingest time.
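
As a very rough sketch of what steps 2–4 could look like against the Elasticsearch APIs, driven here over plain HTTP for illustration. All index, policy, pipeline, and field names below are placeholders, not the integration's actual assets.

```go
// Illustrative setup of the transform, enrich policy, and ingest pipeline.
package main

import (
	"fmt"
	"net/http"
	"strings"
)

func put(url, body string) {
	req, err := http.NewRequest(http.MethodPut, url, strings.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println(err)
		return
	}
	resp.Body.Close()
	fmt.Println(req.URL.Path, resp.Status)
}

func main() {
	es := "http://localhost:9200"

	// 2. "Latest" transform: one document per unique aid from the AID Master
	//    data stream (a continuous transform also needs
	//    POST /_transform/<id>/_start, omitted here).
	put(es+"/_transform/crowdstrike-aidmaster-latest", `{
	  "source": { "index": "logs-crowdstrike.aidmaster-*" },
	  "dest":   { "index": "crowdstrike-aidmaster-latest" },
	  "latest": { "unique_key": ["aid"], "sort": "@timestamp" },
	  "sync":   { "time": { "field": "@timestamp" } },
	  "frequency": "5m"
	}`)

	// 3. Enrich policy over the transform output, then execute it to build the
	//    enrich index. Re-running the execute step on a schedule is the part
	//    Fleet cannot do today.
	put(es+"/_enrich/policy/crowdstrike-aidmaster", `{
	  "match": {
	    "indices": "crowdstrike-aidmaster-latest",
	    "match_field": "aid",
	    "enrich_fields": ["ComputerName", "AgentVersion"]
	  }
	}`)
	put(es+"/_enrich/policy/crowdstrike-aidmaster/_execute", "")

	// 4. Enrich processor in the FDR ingest pipeline, keyed on the event's aid.
	put(es+"/_ingest/pipeline/crowdstrike-fdr-aid-enrich", `{
	  "processors": [
	    { "enrich": {
	        "policy_name": "crowdstrike-aidmaster",
	        "field": "crowdstrike.aid",
	        "target_field": "host_metadata",
	        "ignore_missing": true } }
	  ]
	}`)
}
```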

@jamiehynds
Author

@r00tu53r Cribl recently released a CrowdStrike pack, which includes support for hostname enrichment. Going by their docs, Redis is required for enrichment. https://cribl.io/blog/cribl-pack-for-crowdstrike/#name

@r00tu53r
Contributor

r00tu53r commented Aug 9, 2022

@jamiehynds thanks for sharing the link. Cribl indicates that it has a pack that enriches events with AID data from Redis.

As Andrew suggested above, I could add a new input source to index AID data into a Fleet-managed data stream.

However, as mentioned in the linked issues we've hit a blocker when adding this capability to the integration due to limitations in Kibana.

@andrewkroh
Member

I think there are still several issues that will make it difficult or non-optimal to build a solution using only Elasticsearch features exposed through Fleet integrations. The biggest is that there is no way to coordinate the timing or ordering of data ingestion (AID Master and FDR events), latest transform execution, and enrich policy updates.

My recommendation is to explore a Beat-based solution that can store the AID metadata and apply the enrichment.

The questions I have are about how often AID Master data is written and whether it is delivered as a complete or partial update. Is there a single file that contains the AID Master data which is continuously re-written in S3?

@efd6
Contributor

efd6 commented Jul 24, 2023

I'm inclined to agree with this. The Elasticsearch approach depends on tightly integrating multiple components without a system well suited to coordinating them. In addition to complicating the set-up, this is likely to make debugging difficult if users have problems with the host enrichment process.

It's entirely unclear from this whether the AIDMaster write is a complete dump or a periodic set of update events.

Depending on what approach they take for these updates, I'm thinking of either a time-to-live heap cache of the metadata or a swap-the-world table of metadata. Both approaches would work in either case, but the swap-the-world approach is simpler to implement and is all that is needed when the writes are complete dumps.
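
A minimal sketch of the time-to-live cache option, which suits a store that delivers incremental per-host updates; the type names and the flat map of metadata fields are assumptions for the example. The swap-the-world alternative is sketched further below.

```go
// Illustrative TTL cache of per-aid metadata; entries age out after a TTL.
package fdrcache

import (
	"sync"
	"time"
)

type entry struct {
	meta    map[string]string // e.g. ComputerName, AgentVersion
	expires time.Time
}

type ttlCache struct {
	mu  sync.RWMutex
	ttl time.Duration
	m   map[string]entry
}

func newTTLCache(ttl time.Duration) *ttlCache {
	return &ttlCache{ttl: ttl, m: make(map[string]entry)}
}

func (c *ttlCache) get(aid string) (map[string]string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.m[aid]
	if !ok || time.Now().After(e.expires) {
		return nil, false
	}
	return e.meta, true
}

func (c *ttlCache) put(aid string, meta map[string]string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[aid] = entry{meta: meta, expires: time.Now().Add(c.ttl)}
}
```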

Given the potential delays in reading data from the buckets, and how this would impact the ability of the Beat processor to enrich events with metadata, I think we should recommend that the agent doing this work be co-located on AWS to reduce the time cost of collecting the data.

@efd6
Contributor

efd6 commented Jul 28, 2023

I have looked over potential approaches to this and I think that a processor akin to the add_process_metadata processor is probably the kind of thing we want here.

This processor has a backing cached table of metadata that it collects as needed and uses to decorate events as they pass through. The difference between that processor and what we want here is the cost of obtaining the data to fill the look-up table. This has flow-on effects on how the collector needs to work and on the advice that we give to users.

The design of the collector that I favour is one where, periodically (triggered by the initial configuration of the processor and by calls to the processor to enrich an event, with a configurable cool-down period), the collector pulls the most recent metadata from the remote store (S3 or SQS) in a separate goroutine to avoid delaying beat pipeline throughput. On completing the collection, the collector atomically replaces the world with the fresh data. Again to prevent throughput stalling, if an enrichment is not available for an event, either a marker of missing data will be added or the event will be left unaltered.

We cannot safely use a long-running goroutine to perform the collection, since processors do not get handed any context that could be used to cancel them. This means that we need to go the route of triggering short-lived goroutines via calls to the enrichment action. I don't believe this will have a significant impact on the freshness of data except in cases where the pipeline is only sporadically active, but this is still an open question.
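
A sketch of the swap-the-world variant with the cool-down-triggered, short-lived collection goroutine described above. All names are assumptions, as is the fetch function, which stands in for a complete aid_master snapshot pulled from S3/SQS.

```go
// Illustrative "swap-the-world" metadata table refreshed by short-lived goroutines.
package fdrenrich

import (
	"sync"
	"sync/atomic"
	"time"
)

// Host holds the subset of aid_master fields used for enrichment.
type Host struct {
	ComputerName string
	AgentVersion string
}

type cache struct {
	table    atomic.Pointer[map[string]Host] // the current world, replaced atomically
	mu       sync.Mutex                      // guards refresh bookkeeping
	last     time.Time                       // when the last collection was started
	cooldown time.Duration
	fetch    func() (map[string]Host, error) // pulls a complete snapshot from the remote store
}

func newCache(cooldown time.Duration, fetch func() (map[string]Host, error)) *cache {
	c := &cache{cooldown: cooldown, fetch: fetch}
	empty := map[string]Host{}
	c.table.Store(&empty)
	c.maybeRefresh() // initial collection, triggered by processor configuration
	return c
}

// lookup never blocks on collection: if the aid is unknown the event is left
// unaltered (or marked), and a refresh may be kicked off for later events.
func (c *cache) lookup(aid string) (Host, bool) {
	h, ok := (*c.table.Load())[aid]
	if !ok {
		c.maybeRefresh()
	}
	return h, ok
}

// maybeRefresh starts a short-lived goroutine to pull fresh metadata, at most
// once per cool-down period, and swaps the new table in atomically when done.
func (c *cache) maybeRefresh() {
	c.mu.Lock()
	if time.Since(c.last) < c.cooldown {
		c.mu.Unlock()
		return
	}
	c.last = time.Now()
	c.mu.Unlock()

	go func() {
		fresh, err := c.fetch()
		if err != nil {
			return // keep serving the previous world on failure
		}
		c.table.Store(&fresh) // swap the world
	}()
}
```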

Another issue is the latency between invoking/starting a collection and the availability of the data for enrichment. I think we should advise users that the agent running this input/processor be co-located on AWS hardware in the same zone as their S3/SQS store to minimise network time costs.

The documentation available from CrowdStrike does not make clear what the behaviour is with regard to host data dumps into the bucket, though it does feel like the approach is to dump the complete known state at intervals (we should try to find this out). If that is the case, I don't see any real merit in the collector treating S3 and SQS differently when providing data to the processor; if all the details get dumped as a batch, it does not make sense to trickle the events through to the processor rather than just making a swap-the-world change-over.

The collector will need to be a new S3/SQS package, as the current AWS input is significantly more complex than is needed for this and assumes it is running as a managed input. With this, we can make the processor take an interface type that has methods to start collection and to swap the world, making the processor more general than just AWS stores.
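
That store-agnostic boundary might look something like the following; the interface and method names are hypothetical and not an existing beats API.

```go
// Illustrative boundary between the processor and the backing store.
package fdrenrich

// Host holds the aid_master fields used for enrichment (illustrative).
type Host struct {
	ComputerName string
	AgentVersion string
}

// Provider abstracts the backing store (S3 or SQS today, anything later).
type Provider interface {
	// StartCollection kicks off a pull of the most recent aid_master
	// snapshot without blocking the pipeline.
	StartCollection()
	// Swap atomically replaces the processor's current table with the most
	// recently completed snapshot and returns it for lookups.
	Swap() map[string]Host
}
```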

At this stage it looks like the configuration of the processor would include:

  • endpoint for backing data
  • timeout duration for enrichment (default to zero)
  • value to place into event in absence of available metadata (default to zero)
  • key field in the event document to base enrichment on
  • destination field in event document
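
Mapped onto a Beats-style configuration struct, those options might look like this; the field and tag names are illustrative only.

```go
// Hypothetical processor configuration matching the option list above.
package fdrenrich

import "time"

type config struct {
	// Endpoint for the backing aid_master data (e.g. an S3 bucket or SQS queue URL).
	Endpoint string `config:"endpoint"`
	// How long to wait for enrichment data; zero (the default) means do not wait.
	Timeout time.Duration `config:"timeout"`
	// Value written to the destination field when no metadata is available;
	// empty (the default) leaves the event unaltered.
	MissingValue string `config:"missing_value"`
	// Field in the event document used as the look-up key (e.g. the agent ID).
	KeyField string `config:"key_field"`
	// Field in the event document where the matched metadata is written.
	TargetField string `config:"target_field"`
}
```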

@efd6
Contributor

efd6 commented Nov 28, 2023

Completed by #8474
