[Data Exfiltration Detection] Update and add anomaly detection jobs and security rules #6577

Merged
merged 25 commits into from Jun 26, 2023

Conversation

@sodhikirti07 sodhikirti07 commented Jun 14, 2023

What does this PR do?

  • Adds a filter so that source IPs are restricted to internal addresses and destination IPs to external addresses.
  • Changes the configurations of the current anomaly detection jobs.
  • Changes the detector descriptions in the ML module.
  • Adds two anomaly detection jobs and rules to identify exfiltration to external devices (USB, AirDrop).
  • Updates the lookback time of the detection rules to 2 * bucket_span of the corresponding ML job.
  • Updates README, changelog, manifest and dashboard.
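As an illustration of the lookback bullet above, here is a minimal sketch of deriving a rule lookback from a job's bucket_span. The helper name `rule_lookback` and the `now-<N><unit>` date-math output are assumptions made for this example; the actual rule files in the package are not shown in this thread.

```python
import re


def rule_lookback(bucket_span: str, factor: int = 2) -> str:
    """Return a date-math lookback such as 'now-6h' for a bucket span such as '3h'.

    Hypothetical helper: only simple spans with one unit (s, m, h, d) are handled.
    """
    match = re.fullmatch(r"(\d+)([smhd])", bucket_span)
    if match is None:
        raise ValueError(f"unsupported bucket_span: {bucket_span!r}")
    value, unit = int(match.group(1)), match.group(2)
    # The PR description states the lookback is 2 * bucket_span.
    return f"now-{value * factor}{unit}"


print(rule_lookback("3h"))
```

With the 3h bucket span that appears in the reviewed job config, this gives a 6h lookback.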

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Screenshots

Find them in the comment section

@sodhikirti07 sodhikirti07 requested review from a team as code owners June 14, 2023 14:11

sodhikirti07 commented Jun 14, 2023

Screenshots

  • Overview page

image

  • ML jobs installed and running
image
  • Security rules installed and enabled
image
  • Updated Dashboard
image

elasticmachine commented Jun 14, 2023

💚 Build Succeeded

Build stats

  • Start Time: 2023-06-26T18:26:40.827+0000

  • Duration: 16 min 36 sec

Test stats 🧪

Test Results
Failed 0
Passed 1
Skipped 0
Total 1

@sodhikirti07
Contributor Author

I am a little skeptical about adding host.name as the partition field for the jobs that look for exfiltration to external devices. Will it generate a lot of false positives for hosts that are seen for the first time? Should I keep it more generic, i.e. model the whole population?

Cc: @ajosh0504 @susan-shu-c

@susan-shu-c
Member

I feel that I'd avoid creating a lot of false positives for new hosts 🤔 re: host.name

2nd question, in the description's second point, Changes the configurations of current anomaly detection jobs., what are the main changes?

Lastly, is it possible to explain how the Airdrop was tested?

@sodhikirti07
Contributor Author

2nd question, in the description's second point, Changes the configurations of current anomaly detection jobs., what are the main changes?

@susan-shu-c
I updated the filter query to include only internal source IPs and external destination IPs. The anomaly detection jobs are now partitioned by host.name; earlier we were doing population analysis. Lastly, I deleted some redundant jobs, such as the ones looking at city names and continent names, and added those fields as influencers instead (for information gain). See the current jobs here
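To make the internal-source / external-destination condition concrete, here is a small sketch using Python's standard `ipaddress` module. It is not the package's filter query (which is not shown in this thread); the function names are hypothetical, and treating "internal" as "non-global address space" is an assumption.

```python
import ipaddress


def is_internal(ip: str) -> bool:
    # Assumption: "internal" means non-global address space. is_private covers
    # the RFC 1918 ranges and other special-purpose ranges such as loopback.
    return ipaddress.ip_address(ip).is_private


def is_candidate_exfil_flow(source_ip: str, destination_ip: str) -> bool:
    """Keep only flows that leave the network: internal source, external destination."""
    return is_internal(source_ip) and not is_internal(destination_ip)


print(is_candidate_exfil_flow("10.0.0.5", "8.8.8.8"))
print(is_candidate_exfil_flow("10.0.0.5", "192.168.1.10"))
```

Internal-to-internal and external-to-internal flows are dropped, so only outbound traffic would reach the jobs.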

Lastly, is it possible to explain how the Airdrop was tested?

Tested by setting up a macOS VM using Parallels desktop and also confirmed with Ricardo Ungureanu.

@peteharverson peteharverson left a comment

Overall looks like some great updates here. Left a couple of comments on the detector descriptions and the influencers for one of the ML jobs.

"analysis_config": {
"bucket_span": "3h",
"detectors": [
{
"detector_description": "high_sum(\"source.bytes\") over \"destination.geo.city_name\"",
"detector_description": "high_sum(\"source.bytes\") over \"destination.geo.country_iso_code\"",
Contributor

All the detectors in the jobs are using the default descriptions which are not always the most user-friendly. These are used in various places in the UI - for example in the ML anomalies table. Any thoughts on going with something which is closer to the form of text used in the job descriptions? For example High bytes sent to an unusual country_iso_code? I know we often go with the defaults, but just wondering if there is something we could do here?
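A rough sketch of the difference being discussed: the first function approximates the default description seen in the snippet above, and the second attaches a friendlier one. `default_description` and `with_description` are hypothetical helpers, and the default format is only approximated for detectors with an over field.

```python
def default_description(detector: dict) -> str:
    """Approximate the default text, e.g. high_sum("source.bytes") over "x"."""
    text = f'{detector["function"]}("{detector["field_name"]}")'
    if "over_field_name" in detector:
        text += f' over "{detector["over_field_name"]}"'
    return text


def with_description(detector: dict, description: str) -> dict:
    """Return a copy of the detector with an explicit detector_description."""
    updated = dict(detector)
    updated["detector_description"] = description
    return updated


# Detector fields taken from the description shown in the reviewed snippet.
detector = {
    "function": "high_sum",
    "field_name": "source.bytes",
    "over_field_name": "destination.geo.country_iso_code",
}

print(default_description(detector))
friendly = with_description(detector, "High bytes sent to an unusual country_iso_code")
print(friendly["detector_description"])
```

The friendly text is the example wording from the comment above, not necessarily what was finally committed.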

Contributor Author

TBH, I didn't touch them because we have always followed the default for these jobs. I'm down to try simple text as a description, but IDK if it's okay for one package's structure to deviate from the rest.

Contributor

I'd agree with Pete. We've switched to what he's suggesting for our pre-built ML rules. I'd say let's change this package to begin with, and also update others in later versions.

Contributor Author

Changed the description of detectors

image

Contributor

The new detector descriptions look good!

@ajosh0504 ajosh0504 left a comment

Love the new detections! Made some minor comments.

sodhikirti07 and others added 5 commits June 21, 2023 17:19
Co-authored-by: Apoorva Joshi <30438249+ajosh0504@users.noreply.github.com>
…681e3138.json

Co-authored-by: Apoorva Joshi <30438249+ajosh0504@users.noreply.github.com>
…270e8817.json

Co-authored-by: Apoorva Joshi <30438249+ajosh0504@users.noreply.github.com>

@peteharverson peteharverson left a comment

Changes for the ML jobs LGTM

@sodhikirti07
Contributor Author

Made the following changes based on my discussion with Dian:

  • Changed the ml-module query to work with either file or network events.
  • Reverted the anomaly detection jobs to their original state, i.e., removed the partitioning by host.name. The host.name field is not available in all data sources for network logs. Identifiers such as source.ip are the most applicable here, but their high cardinality creates a hard limit on the model.

I haven't changed the partition field in the exfiltration to external device jobs. Let me know what the consensus is on using user.name or host.name as the partitioning field.
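One way to sanity-check a candidate partition field before committing to it is to look at how many documents lack it and how many distinct values it takes. The sketch below does this over a hypothetical in-memory sample; `get_field`, `partition_stats`, and the sample events are all illustrative and not part of the package.

```python
from typing import Any, Iterable, Optional


def get_field(doc: dict, dotted: str) -> Optional[Any]:
    """Look up a dotted field name such as 'host.name' in a nested document."""
    current: Any = doc
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def partition_stats(docs: Iterable[dict], field: str) -> dict:
    """Count distinct values of `field` and how many documents lack it."""
    values = set()
    missing = 0
    for doc in docs:
        value = get_field(doc, field)
        if value is None:
            missing += 1
        else:
            values.add(value)
    return {"field": field, "distinct": len(values), "missing": missing}


# Hypothetical sample: one event that carries host.name and two
# flow-style events that only carry source.ip.
sample = [
    {"host": {"name": "laptop-1"}, "source": {"ip": "10.0.0.5"}},
    {"source": {"ip": "10.0.0.6"}},
    {"source": {"ip": "10.0.0.7"}},
]

print(partition_stats(sample, "host.name"))
print(partition_stats(sample, "source.ip"))
```

In this toy sample host.name is missing from most events while source.ip is present everywhere but has one distinct value per event, which mirrors the trade-off described above.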

cc: @ajosh0504 @susan-shu-c @dainperkins

susan-shu-c commented Jun 26, 2023

To document our discussion:

It seems that, especially for the device anomalies (e.g. USB, AirDrop), host-based partitioning is more useful, and user is added as an influencer.

[Edit] I originally deferred to Dain for now on using user, though agree that there are places it's not so clear-cut. For example, if a user doesn't usually send out files to another location, but then does - that's an anomaly

}
},
{
"bool":{
Contributor

Do you need this bool? Also do you have to check for existence of source bytes and destination fields? Are they not present in all network events?

Contributor Author

Not all network events have the source.bytes and destination fields.
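For context, an existence check of the kind being discussed can be sketched as a Query DSL fragment built in Python. The helper `require_fields` is hypothetical, and `destination.ip` is an assumed field name, since the comment only says "destination"; the package's actual query is not reproduced here.

```python
def require_fields(*fields: str) -> dict:
    """Build a bool filter that keeps only documents where every field exists."""
    return {"bool": {"filter": [{"exists": {"field": name}} for name in fields]}}


# Assumption: "destination" is represented here by destination.ip.
query_fragment = require_fields("source.bytes", "destination.ip")
print(query_fragment)
```

Events that lack either field would be excluded before they reach the job.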

@sodhikirti07
Contributor Author

Based on my discussion with @susan-shu-c and @ajosh0504, I am keeping the current partition_by configurations in place i.e., partitioning the jobs by host.name.

@sodhikirti07 sodhikirti07 merged commit ed42936 into main Jun 26, 2023
4 checks passed
@elasticmachine

Package ded - 1.0.3 containing this change is available at https://epr.elastic.co/search?package=ded

dainperkins commented Jun 26, 2023

2 things -

  1. USB exfil - from a security and behavior modeling standpoint, in any environment with shared workstations this will work much better if partitioned by user, as hosts will invariably be less aligned to a specific behavior profile, and it seemed like most file events will have a user involved

  2. if the partitioning by host.name for network exfil is used, the query needs to only pull data with host.name (which I think is just Packetbeat), otherwise the model is going to be somewhat useless for any network data that isn't provided by Packetbeat (e.g. netflow, vpc flow, firewall logs, sensors, etc.)

Either the partitioning needs to be removed or set to source.ip, or the detection needs to be specific to data that includes host.name. Otherwise the partitioning will effectively be Packetbeat data by host.name, plus everything else (including non-Packetbeat data for the same hosts, if any other network data is being ingested).

From an enterprise standpoint, the idea of deploying Packetbeat / network capture widely, instead of sensors or netflow, is resource-prohibitive, and not something I would ever want to implement in the real world.

cc @ajosh0504 @susan-shu-c @sodhikirti07

@dainperkins

Also if Packetbeat is used as a sensor (e.g. spanning a network tap) every doc will have the same host.name (the host name of the device receiving the network span)

@sodhikirti07
Contributor Author

Either the partitioning needs to be removed or set to source.ip, or the detection needs to be specific to data that includes host.name.

@dainperkins the partitioning on the network exfil jobs has already been removed.

USB exfil - from a security and behavior modeling standpoint, in any environment with shared workstations this will work much better if partitioned by user, as hosts will invariably be less aligned to a specific behavior profile, and it seemed like most file events will have a user involved

From an ML standpoint, we think it's better to use host.name as a partitioning field for USB based jobs. Also, the influencers section adds the user related info to the alerts and anomalies.
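As a rough picture of that setup, the sketch below builds an analysis config that partitions by host.name and lists user.name as an influencer. The property names follow the Elastic ML job format, but the detector function, description text, and bucket span are placeholders chosen for illustration, and `device_exfil_analysis_config` is a hypothetical helper rather than code from the package.

```python
def device_exfil_analysis_config(bucket_span: str = "3h") -> dict:
    """Sketch of an analysis_config partitioned by host with user as an influencer."""
    return {
        # Placeholder: 3h is the span shown for the network job in this thread.
        "bucket_span": bucket_span,
        "detectors": [
            {
                # Placeholder description and function; the real job may differ.
                "detector_description": "Unusual volume of file writes to an external device",
                "function": "high_count",
                # Each host gets its own baseline.
                "partition_field_name": "host.name",
            }
        ],
        # Influencers surface user context on anomalies without
        # splitting the model by user.
        "influencers": ["host.name", "user.name"],
    }


config = device_exfil_analysis_config()
print(config["detectors"][0]["partition_field_name"], config["influencers"])
```

Swapping `partition_field_name` to user.name would give the per-user modeling argued for earlier in the thread.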

@dainperkins

sorry - thought the following was in regards to all the jobs:

Based on my discussion with @susan-shu-c and @ajosh0504, I am keeping the current partition_by configurations in place i.e., partitioning the jobs by host.name.
