Crowdstrike FDR hostname/username enrichment #2816
Pinging @elastic/security-external-integrations (Team:Security-External Integrations)
One approach for doing the AID Master (host details) enrichment that I want to test is:
@r00tu53r Cribl recently released a Crowdstrike pack, which includes support for hostname enrichment. Going by their docs, Redis is required for enrichment. https://cribl.io/blog/cribl-pack-for-crowdstrike/#name
@jamiehynds thanks for sharing the link. Cribl indicates that it has a pack that enriches events with AID data stored in Redis. As Andrew suggested above, I could add a new input source to index AID data into a fleet managed data stream. However, as mentioned in the linked issues, we've hit a blocker when adding this capability to the integration due to limitations in Kibana.
I think there are still several issues that will make it difficult or non-optimal to build a solution using only Elasticsearch features exposed through Fleet integrations. The biggest is that there is no way to coordinate the timing or ordering of data ingestion (AID Master and FDR events), latest transform execution, and enrich policy updates. My recommendation is to explore a Beat-based solution that can store the AID metadata and apply the enrichment. The questions I have are how often AID Master data is written, and whether it is delivered as a complete or partial update. Is there a single file that contains the AID Master data that is continuously re-written in S3?
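For illustration, the Elasticsearch-only approach comes down to roughly the following sequence of REST calls, sketched here in Go. All index, policy, pipeline, and field names below are assumptions rather than anything the integration currently defines. The coordination problem is that step 2 has to be re-executed after every aid_master update, and a Fleet integration has no way to order that against FDR event ingestion or transform runs.

```go
// Hypothetical sketch of the Elasticsearch-only enrichment approach.
// All names are placeholders; only the REST endpoints are real.
package main

import (
	"log"
	"net/http"
	"strings"
)

const es = "http://localhost:9200"

func do(method, path, body string) {
	req, err := http.NewRequest(method, es+path, strings.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	resp.Body.Close()
	log.Println(method, path, resp.Status)
}

func main() {
	// 1. An enrich policy matching FDR events to aid_master documents on the agent ID.
	do(http.MethodPut, "/_enrich/policy/fdr-aid-master", `{
	  "match": {
	    "indices": "logs-crowdstrike.aid_master",
	    "match_field": "aid",
	    "enrich_fields": ["ComputerName", "MachineDomain", "OU", "SiteName"]
	  }
	}`)

	// 2. Execute the policy to build the enrich index. This must run after the
	//    aid_master data is ingested and before the FDR events that need it.
	do(http.MethodPost, "/_enrich/policy/fdr-aid-master/_execute", "")

	// 3. An ingest pipeline that applies the enrichment to FDR events.
	do(http.MethodPut, "/_ingest/pipeline/fdr-host-enrich", `{
	  "processors": [
	    {
	      "enrich": {
	        "policy_name": "fdr-aid-master",
	        "field": "crowdstrike.aid",
	        "target_field": "host_enrichment",
	        "ignore_missing": true
	      }
	    }
	  ]
	}`)
}
```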
I'm inclined to agree with this. The Elasticsearch approach depends on multiple components being tightly integrated, without a system well suited to coordinating them. Beyond the complication in set-up, this is likely to bring difficulties in debugging if users have problems with the host enrichment process. It's entirely unclear from this whether the AIDMaster write is complete or a periodic set of update events. Depending on which approach they take for these updates, I'm thinking either of a time-to-live heap cache of the metadata or, in the case of complete writes, a swap-the-world table of metadata. Both approaches would work in either case, but the swap-the-world approach is simpler to implement and is all that is needed in the case of complete writes. Given the potential delays in reading data from the buckets, and how this would affect the ability of the Beat processor to enrich events with metadata, I think we should recommend that the agent doing this work be colocated on AWS to reduce the time cost of collecting the data.
I have looked over potential approaches to this and I think that a processor with a backing cached table of metadata, collected as needed and used to decorate events as they pass through, would work (akin to existing metadata-enriching processors). The difference between those processors and what we want here is the cost of obtaining the data to fill the look-up table. This has flow-on effects on how the collector needs to work and the advice that we give to users.

The design of the collector that I favour is one where, periodically (triggered by the initial configuration of the processor and by calls to the processor to enrich an event, with a configurable cool-down period), the collector pulls the most recent metadata from the remote store (S3 or SQS), running in a separate goroutine to prevent delaying beat pipeline throughput. On completing the collection, the collector atomically replaces the world with the fresh data. Again to prevent throughput stalling, if an enrichment is not available for an event, either a marker of missing data will be added or the event will be left unaltered.

We cannot safely use a long-running goroutine to perform the collection, since processors do not get handed any context that could be used to cancel them. This means that we need to go the route of triggering short-lived goroutines via the calls to the enrichment action. I don't believe that this will have a significant impact on the freshness of the data except in cases where the pipeline is only sporadically active; this is still an open question though. Another issue is the latency between starting a collection and the availability of the data for enrichment. I think that we should advise users that the agent running this input/processor be colocated on AWS hardware in the same zone as their S3/SQS store to minimise network time costs.

The documentation available from Crowdstrike does not make clear what the behaviour is with regard to host data event dumps into the bucket, though it does feel like the approach is to dump the complete known state at intervals (we should try to confirm this). If this is the case, it does not seem to me that there is any real merit in the collector treating S3 and SQS differently when providing data to the processor; if all the details get dumped as a batch, it does not make sense to trickle the events through to the processor rather than just making a swap-the-world change-over. The collector will need to be a new S3/SQS package, as the current AWS input is significantly more complex than is needed for this and has assumptions about being a managed input. With this, we can make the processor take an interface type that has methods to start a collection and to swap the world, and so make the processor more general than just for AWS stores. At this stage it looks like the configuration of the processor would include:
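Setting the exact configuration options aside, the following is a minimal sketch of the collector/processor interaction described above, assuming a hypothetical Collector interface and field names. It is only an illustration of the cool-down-triggered, short-lived collection goroutine and the atomic swap-the-world table, not an implementation.

```go
// Hypothetical sketch of the swap-the-world enrichment processor.
// Type, field, and method names are illustrative only.
package enrich

import (
	"sync"
	"sync/atomic"
	"time"
)

// HostMeta is the subset of AID Master fields used for enrichment.
type HostMeta struct {
	Hostname string
	Domain   string
}

// Collector fetches a complete AID Master snapshot from the remote store
// (e.g. S3/SQS). The processor only sees this interface, so it is not tied
// to AWS specifically.
type Collector interface {
	Fetch() (map[string]HostMeta, error)
}

type processor struct {
	collector Collector
	cooldown  time.Duration

	table    atomic.Value // always holds a map[string]HostMeta
	mu       sync.Mutex
	lastPull time.Time
}

func newProcessor(c Collector, cooldown time.Duration) *processor {
	p := &processor{collector: c, cooldown: cooldown}
	p.table.Store(map[string]HostMeta{})
	p.maybeRefresh() // initial collection triggered by configuration
	return p
}

// maybeRefresh starts a short-lived goroutine to pull fresh metadata if the
// cool-down has elapsed. Processors get no cancellable context, so there is
// no long-running collection loop.
func (p *processor) maybeRefresh() {
	p.mu.Lock()
	defer p.mu.Unlock()
	if time.Since(p.lastPull) < p.cooldown {
		return
	}
	p.lastPull = time.Now()
	go func() {
		fresh, err := p.collector.Fetch()
		if err != nil {
			return // keep serving the previous table
		}
		p.table.Store(fresh) // atomically swap the world
	}()
}

// Enrich decorates the event with host metadata if it is available; if not,
// the event is left unaltered so pipeline throughput never stalls.
func (p *processor) Enrich(aid string, fields map[string]interface{}) {
	p.maybeRefresh()
	table := p.table.Load().(map[string]HostMeta)
	meta, ok := table[aid]
	if !ok {
		return
	}
	fields["host.name"] = meta.Hostname
	fields["host.domain"] = meta.Domain
}
```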
Completed by #8474
Crowdstrike Falcon Data Replicator (FDR) replicates log data from your CrowdStrike environment to an S3 bucket, to enable ingestion of log data for SIEMs and other security tools. While our FDR integration ingests this data, unfortunately, Crowdstrike does not include important information such as hostname or username as part of these events, rendering the events unusable without that context.
As an example, a ProcessRollup event combines data from several sources into one event which describes a process which is running or has previously run on the host. A UserSID field is included with the ProcessRollup2 event. The UserSid and AuthenticationId fields define the security context the process was created with. To determine details about this context, find a UserIdentity event with the same Agent ID, UserSid and AuthenticationId. Looking at a UserSid can tell you the user a process is running as, but without also looking at the AuthenticationId you will not be able to determine the full security context information.
For hostname/computername you can correlate the aid (agent id) with the aid_master file.
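To make the correlation concrete, here is a minimal sketch of the look-ups involved, with assumed event shapes and placeholder values; the real field names and formats may differ.

```go
// Hypothetical illustration of the correlations described above: aid -> hostname
// via aid_master, and (aid, UserSid, AuthenticationId) -> user via UserIdentity.
package main

import "fmt"

type identityKey struct{ aid, userSid, authenticationID string }

func main() {
	// aid -> hostname, as provided by the aid_master file.
	aidMaster := map[string]string{"0123456789abcdef": "WORKSTATION-042"}

	// (aid, UserSid, AuthenticationId) -> username, from UserIdentity events.
	identities := map[identityKey]string{
		{"0123456789abcdef", "S-1-5-21-1004", "999"}: "CORP\\alice",
	}

	// A ProcessRollup2 event carrying only the IDs, not the names.
	evt := map[string]string{
		"aid":              "0123456789abcdef",
		"UserSid":          "S-1-5-21-1004",
		"AuthenticationId": "999",
	}

	hostname := aidMaster[evt["aid"]]
	username := identities[identityKey{evt["aid"], evt["UserSid"], evt["AuthenticationId"]}]
	fmt.Println(hostname, username) // WORKSTATION-042 CORP\alice
}
```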
In addition to the events listed in the Events Data Dictionary, Falcon Insight customers can optionally request these events with FDR:
• aid_master (hosts)
• managedassets
• notmanaged
While we do not have an elegant solution today to enrich these events with hostname/username, this issue is intended to track our progress on researching possible solutions/workarounds.