crowdstrike: add support for host info enrichment #8474

Merged
efd6 merged 15 commits into elastic:main from the 2816-crowdstrike branch
Nov 28, 2023

Conversation

efd6
Contributor

@efd6 efd6 commented Nov 13, 2023

Proposed commit message

See title.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

Author's Checklist

  • The fdr datastream offers both s3 and logfile inputs. Limitations in elastic-package prevent us from having two system test infrastructure deployments, so only the s3 input is tested.

How to test this PR locally

Make the following change:

diff --git a/packages/crowdstrike/data_stream/fdr/_dev/deploy/tf/env.yml b/packages/crowdstrike/data_stream/fdr/_dev/deploy/tf/env.yml
index b795fcdeb..6e1f17f7a 100644
--- a/packages/crowdstrike/data_stream/fdr/_dev/deploy/tf/env.yml
+++ b/packages/crowdstrike/data_stream/fdr/_dev/deploy/tf/env.yml
@@ -7,3 +7,4 @@ services:
       - AWS_SESSION_TOKEN=${AWS_SESSION_TOKEN}
       - AWS_PROFILE=${AWS_PROFILE}
       - AWS_REGION=${AWS_REGION:-us-east-1}
+      - TF_VAR_eventbridge_role_arn=arn:aws:iam::144492464627:role/eb-scheduler-role-20231101165501426500000001

Then:

$ export AWS_DEFAULT_PROFILE=elastic-siem
$ aws-mfa --profile=elastic-siem
$ eval $(grep ^aws ~/.aws/credentials | gsed -r 's/^(aws[^ ]+) = (.*)$/export \U\1\E=\2/g')

(gsed is GNU sed, so if on macOS you will need to install that; if on Linux, s/gsed/sed/ in the command above.)
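
For readers without GNU sed to hand, here is a rough Python sketch of what the grep/sed pipeline above does: it picks the lines of the credentials file that start with `aws` and rewrites `name = value` as `export NAME=value`. The function name and the sample values are made up for illustration. Unlike sed, this sketch skips lines that start with `aws` but do not match the pattern instead of passing them through unchanged.

```python
import re

# Mirrors: grep ^aws | sed -r 's/^(aws[^ ]+) = (.*)$/export \U\1\E=\2/g'
_LINE = re.compile(r"^(aws[^ ]+) = (.*)$")


def credentials_to_exports(text: str) -> list[str]:
    """Turn `aws_key = value` lines into `export AWS_KEY=value` statements."""
    exports = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            # \U...\E in the sed expression upper-cases only the key.
            exports.append(f"export {match.group(1).upper()}={match.group(2)}")
    return exports


# Hypothetical credentials file content; the values are placeholders.
sample = """[elastic-siem]
aws_access_key_id = AKIAEXAMPLE
aws_secret_access_key = example/secret
aws_session_token = exampletoken
"""

for statement in credentials_to_exports(sample):
    print(statement)
```

Note that, like the original command, this matches `aws` lines from every profile in the file, not just the selected one.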

Then test with elastic-package as normal.

For convenience (because the AWS setup was not trivial), the TF graph is here.

graph

Related issues

Screenshots

@elasticmachine

elasticmachine commented Nov 13, 2023

💚 Build Succeeded

Build stats

  • Start Time: 2023-11-28T19:34:05.461+0000

  • Duration: 20 min 16 sec

Test stats 🧪

Test Results
Failed 0
Passed 34
Skipped 0
Total 34

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@elasticmachine

elasticmachine commented Nov 13, 2023

🌐 Coverage report

| Name | Metrics % (covered/total) | Diff |
| --- | --- | --- |
| Packages | 100.0% (2/2) | 💚 |
| Files | 100.0% (15/15) | 💚 |
| Classes | 100.0% (15/15) | 💚 |
| Methods | 95.918% (94/98) | 👎 -4.082 |
| Lines | 88.261% (3609/4089) | 👎 -11.739 |
| Conditionals | 100.0% (0/0) | 💚 |

@efd6
Contributor Author

efd6 commented Nov 13, 2023

/test

@efd6 efd6 force-pushed the 2816-crowdstrike branch 5 times, most recently from 2751001 to db9ce4a Compare November 15, 2023 01:00
@efd6 efd6 marked this pull request as ready for review November 15, 2023 01:33
@efd6 efd6 requested a review from a team as a code owner November 15, 2023 01:33
@elasticmachine

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

required: true
show_user: true
title: Enrich Host Metadata
description: Uses data in aidmaster to add host information to events. The aidmaster blob must string "aidmaster" in its path and the FDR Notification Parsing Script must sort events so that aidmaster events appear first in the stream.
Contributor

Suggested change
- description: Uses data in aidmaster to add host information to events. The aidmaster blob must string "aidmaster" in its path and the FDR Notification Parsing Script must sort events so that aidmaster events appear first in the stream.
+ description: Uses data in aidmaster to add host information to events. The aidmaster blob must contain string "aidmaster" in its path and the FDR Notification Parsing Script must sort events so that aidmaster events appear first in the stream.
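
To make the ordering requirement in that description concrete, here is a minimal Python sketch of "aidmaster first" sorting. It is not the integration's notification parsing script, only an illustration; the function name and the object paths are hypothetical.

```python
def aidmaster_first(paths: list[str]) -> list[str]:
    """Return paths with any containing "aidmaster" moved to the front."""
    # False sorts before True, so aidmaster paths come first. sorted() is
    # stable, so the relative order of everything else is preserved.
    return sorted(paths, key=lambda p: "aidmaster" not in p)


# Hypothetical object keys from a single notification.
paths = [
    "data/part-00000.gz",
    "fdrv2/aidmaster/part-00000.gz",
    "data/part-00001.gz",
]

print(aidmaster_first(paths))
```

Sorting this way means host metadata is seen before the events that need it, which is what the enrichment relies on.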

required: true
show_user: true
title: Host Metadata TTL
description: The period of time that host metadata is considered valid for.
Contributor

Can you also add valid time units?
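
For anyone reading along, a small Python sketch of what a TTL on host metadata implies: an entry is served until it is older than the TTL and is treated as missing afterwards. This is only an illustration under assumptions, not the integration's implementation. The class name and the `ComputerName` field are illustrative, and the TTL is given in plain seconds here because the accepted duration units are not stated in this thread.

```python
import time


class HostMetadataCache:
    """Hypothetical in-memory cache keyed by agent ID, with a fixed TTL."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # aid -> (expires_at, metadata)

    def put(self, aid: str, metadata: dict) -> None:
        self._entries[aid] = (self._clock() + self._ttl, metadata)

    def get(self, aid: str):
        entry = self._entries.get(aid)
        if entry is None:
            return None
        expires_at, metadata = entry
        if self._clock() >= expires_at:
            # Stale metadata is dropped rather than used for enrichment.
            del self._entries[aid]
            return None
        return metadata


# Usage with a fake clock so the example does not need to sleep.
now = [0.0]
cache = HostMetadataCache(ttl_seconds=60, clock=lambda: now[0])
cache.put("aid-1", {"ComputerName": "host-1"})
print(cache.get("aid-1"))   # still within the TTL
now[0] = 61.0
print(cache.get("aid-1"))   # past the TTL, so treated as missing
```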

    target: crowdstrike
- if:
    contains:
      log.file.path: aidmaster
Contributor

Is the field log.file.path going to exist for S3 input in this case?

Contributor Author

Yes.
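
As a rough illustration of the condition discussed in this thread, the following Python sketch routes events on whether `log.file.path` contains "aidmaster": matching events populate a host lookup, and everything else is enriched from it. The function name, the `aid` and `ComputerName` fields, and the paths are assumptions made for the example, and a plain dict stands in for whatever store the integration actually uses.

```python
def process(event: dict, hosts: dict) -> dict:
    """Store aidmaster records in `hosts`; enrich other events from it."""
    path = event.get("log", {}).get("file", {}).get("path", "")
    aid = event.get("aid")
    if "aidmaster" in path:
        # Substring match, like the `contains` condition in the snippet.
        hosts[aid] = {"name": event.get("ComputerName")}
    elif aid in hosts:
        event = {**event, "host": hosts[aid]}
    return event


hosts = {}
early = process({"log": {"file": {"path": "s3/data/a.gz"}}, "aid": "1"}, hosts)
process(
    {"log": {"file": {"path": "s3/aidmaster/a.gz"}}, "aid": "1",
     "ComputerName": "host-1"},
    hosts,
)
late = process({"log": {"file": {"path": "s3/data/b.gz"}}, "aid": "1"}, hosts)

print("host" in early)  # the event arrived before its aidmaster record
print(late["host"])
```

The first event is left unenriched because it was processed before the aidmaster record, which is why the ordering of the stream matters.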

required: true
show_user: true
title: Host Metadata TTL
description: The period of time that host metadata is considered valid for.
Contributor

time units here as well

@efd6 efd6 requested a review from kcreddy November 16, 2023 19:48
Contributor

@kcreddy kcreddy left a comment

LGTM 👍🏼

Member

@andrewkroh andrewkroh left a comment

Have you manually tested this using the aws-s3 input?

The fdr datastream offers both s3 and logfile inputs. Limitations in elastic-package prevent us from being able to have two system test infrastructure deployments, so only the logfile input is tested.

If we have to sacrifice the logfile testing to get aws-s3 system tests then let's do it. The aws-s3 input is what most users are running with.

- {{path}}
{{/each}}
scan:
  sort: filename
  order: asc
Member

This will be a weak ordering guarantee because the scan can start concurrent readers. You could set harvester_limit: 1 to control concurrency. I don't think the log file input is used in production by anyone so this matters very little.

Contributor Author

Thanks, I was worried about that. We can't limit though; making the limit 1 results in a deadlock.

@efd6 efd6 force-pushed the 2816-crowdstrike branch 3 times, most recently from e5a1d0d to affaee3 Compare November 23, 2023 00:30
andrewkroh and others added 5 commits November 27, 2023 19:22
Error: error running package system tests: could not complete test run: could not add data stream config to policy: could not add package to policy; API status code = 400; response body = {"statusCode":400,"error":"Bad Request","message":"Package policy is invalid: inputs.aws-s3.streams.crowdstrike.fdr.vars.fdr_parsing_script: Invalid YAML format"}
@efd6 efd6 requested a review from andrewkroh November 28, 2023 03:43
Member

@andrewkroh andrewkroh left a comment

LGTM. Thanks for getting the system testing setup for the aws-s3 input.

I left some minor suggestions related to the test config.

secret_access_key: "{{AWS_SECRET_ACCESS_KEY}}"
session_token: "{{AWS_SESSION_TOKEN}}"
queue_url: "{{TF_OUTPUT_queue_url}}"
preserve_original_event: true
Member

I think we should be able to know the expected number of events so can we add the

assert:
  hit_count: N
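
One possible way to arrive at N, sketched in Python under the assumption that the test fixtures are newline-delimited JSON and that every record ends up as one indexed document. That second assumption may not hold if the pipeline drops some records (for example the aidmaster ones), so treat the result as a starting point. The function name and sample data are hypothetical.

```python
import json


def count_events(ndjson: str) -> int:
    """Count non-blank lines, failing loudly on malformed JSON."""
    count = 0
    for line in ndjson.splitlines():
        if line.strip():
            json.loads(line)  # raises ValueError on a malformed record
            count += 1
    return count


# Hypothetical fixture content with a blank line in the middle.
fixture = '{"aid": "1", "event_simpleName": "A"}\n\n{"aid": "2", "event_simpleName": "B"}\n'
print(count_events(fixture))
```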

@@ -0,0 +1,9 @@
input: aws-s3
wait_for_data_timeout: 20m
Member

@andrewkroh andrewkroh Nov 28, 2023

This wait period seems quite high. Once Terraform completes its provisioning an SQS notification should be sent after about 1 minute. I think the default of 10m should be fine.

Contributor Author

The 20m value appears to be commonly used, but agreed. Will change.

@efd6 efd6 merged commit 6f5b9b7 into elastic:main Nov 28, 2023
4 checks passed
@elasticmachine

Package crowdstrike - 1.26.0 containing this change is available at https://epr.elastic.co/search?package=crowdstrike

Labels
enhancement New feature or request Integration:crowdstrike CrowdStrike
Development

Successfully merging this pull request may close these issues.

Crowdstrike FDR hostname/username enrichment
4 participants