## Threat Hunting is More Than Data Retrieval

Until now, we have mostly been doing *data retrieval*. Of course, we as threat hunters provide more value than blind data retrieval:
- patterns to match
- directions to investigate (child process, network traffic of process, etc.)
- suspicious entity identification (grouped by Kestrel variables)

However, reading the retrieved data and deciding what to retrieve next is not enough. We also need:
- data enrichment from other sources like threat intelligence
- visualizations to help us digest data
- pre-programmed detection logic (white-box or black-box)

Let's `APPLY` external logic (code not written in Kestrel) as a hunt step to perform arbitrary, Turing-complete analysis on a Kestrel variable.

It is easy to wrap white-box or black-box logic into a Kestrel analytics. Some examples from the [kestrel-analytics repo](https://github.com/opencybersecurityalliance/kestrel-analytics):
- [attribute-plot](https://github.com/opencybersecurityalliance/kestrel-analytics/tree/release/analytics/attributeplot): plot/visualize selected attributes of entities.
- [exfiltration-modeling](https://github.com/opencybersecurityalliance/kestrel-analytics/tree/release/analytics/dataexfiltration): infer the likelihood of data exfiltration on input network-traffic.
- [domain-enrichment](https://github.com/opencybersecurityalliance/kestrel-analytics/tree/release/analytics/domainnamelookup): whois lookup and domain information enrichment for network-traffic.
- [log4shell-deobfuscation-and-detection](https://github.com/opencybersecurityalliance/kestrel-analytics/tree/release/analytics/log4shell): log4shell URL de-obfuscation and detection.
- [pin-IP-on-map](https://github.com/opencybersecurityalliance/kestrel-analytics/tree/release/analytics/piniponmap): find geo-location of IP addresses and pin them on a map.
- [SANS-enrich](https://github.com/opencybersecurityalliance/kestrel-analytics/tree/release/analytics/sansipenrich): threat intelligence enrichment with SANS API.
- [scikit-learn-clustering](https://github.com/opencybersecurityalliance/kestrel-analytics/tree/release/analytics/sklearn-cluster): cluster entities using scikit-learn.
- [suspicious-process-scoring](https://github.com/opencybersecurityalliance/kestrel-analytics/tree/release/analytics/suspiciousscoring): compute how suspicious a process is based on domain knowledge like SIGMA.
- [xfe-enrich](https://github.com/opencybersecurityalliance/kestrel-analytics/tree/release/analytics/xfeipenrich): threat intelligence enrichment using X-Force Exchange.
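
To give a feel for what "wrapping" can look like, here is a minimal sketch in the spirit of the Python analytics interface: a function that receives the Kestrel variable as a pandas DataFrame and returns it with a new column. The function name `enrich_with_org`, the column `x_org_label`, the lookup table, and the assumption that the dumped variable carries a `dst_ref.value` column are all hypothetical; a real analytics would call whois, a threat intelligence API, or a model instead, and the interface documentation remains the authority on the exact contract.

```python
import pandas as pd

# Hypothetical lookup table standing in for real logic
# (whois query, threat intelligence feed, ML model, ...).
KNOWN_ORGS = {
    "140.82.113.4": "GitHub",
    "23.216.74.12": "Akamai",
}


def enrich_with_org(dataframe: pd.DataFrame) -> pd.DataFrame:
    """Add a custom attribute to each record of a network-traffic variable.

    Assumption: the input has a `dst_ref.value` column.
    """
    enriched = dataframe.copy()
    # Custom attributes carry an `x_` prefix, like `x_domain_name` in this huntbook.
    # Addresses missing from the table get NaN, i.e. no enrichment.
    enriched["x_org_label"] = enriched["dst_ref.value"].map(KNOWN_ORGS)
    return enriched


# Stand-alone usage with a hand-made DataFrame (no Kestrel runtime needed).
traffic = pd.DataFrame(
    {
        "dst_ref.value": ["192.168.56.150", "140.82.113.4"],
        "dst_port": [8888, 443],
    }
)
result = enrich_with_org(traffic)
print(result)
```

Returning a copy rather than mutating the input keeps the function easy to test outside of a hunt.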

In [1]:
# let's get network traffic from the suspicious (possibly C2) processes before we apply analytics

splunkd = GET process FROM stixshifter://bh22-linux-192.168.56.91
          WHERE [process:name = 'splunkd']
          START t'2022-07-01T00:00:00Z' STOP t'2022-08-01T00:00:00Z'
          
splunkd_activities = FIND process CREATED BY splunkd

traffic = FIND network-traffic CREATED BY splunkd_activities

DISP traffic ATTR src_ref.value, src_port, dst_ref.value, dst_port



src_ref.value,src_port,dst_ref.value,dst_port
172.17.0.2,38550,192.168.56.2,3128
172.17.0.2,49726,192.168.56.150,8888
172.17.0.2,38492,23.216.74.12,443
172.17.0.2,38508,104.86.237.27,443
172.17.0.2,38522,104.17.123.99,443
172.17.0.2,38524,140.82.113.4,443
172.17.0.2,38538,65.8.66.113,443

VARIABLE,TYPE,#(ENTITIES),#(RECORDS),directory*,file*,ipv4-addr*,network-traffic*,process*,user-account*,x-ecs-destination*,x-ecs-file*,x-ecs-network*,x-ecs-process*,x-ecs-source*,x-ecs-user*,x-oca-asset*,x-oca-event*
splunkd,process,1796,1796,2507,2625,2134,379,5108,1400,762,46,762,3452,762,3452,3452,3452
splunkd_activities,process,1624,1624,3366,3602,2700,388,8136,1948,782,70,782,4880,782,4880,4880,4880
traffic,network-traffic,7,9,3366,3602,2700,381,9760,1948,782,70,782,4880,782,4880,4880,4880


In [2]:
# what are the attributes of `traffic` before we apply the analytics
INFO traffic

# https://github.com/opencybersecurityalliance/kestrel-analytics/tree/release/analytics/domainnamelookup
APPLY python://domain-enrichment ON traffic

# what are the attributes of `traffic` after we apply the analytics
INFO traffic



0,1
Entity Type,network-traffic
Number of Entities,7
Number of Records,9
Entity Attributes,"dst_port, dst_byte_count, dst_packets, src_port, src_byte_count, src_packets, protocols, id"
Indirect Attributes,[]
Customized Attributes,
Birth Command,find
Associated Datasource,stixshifter://bh22-linux-192.168.56.91
Dependent Variables,splunkd_activities

0,1
Entity Type,network-traffic
Number of Entities,7
Number of Records,9
Entity Attributes,"dst_port, dst_byte_count, dst_packets, src_port, src_byte_count, src_packets, protocols, id"
Indirect Attributes,[]
Customized Attributes,"x_domain_name, x_domain_organization"
Birth Command,find
Associated Datasource,stixshifter://bh22-linux-192.168.56.91
Dependent Variables,splunkd_activities


In [3]:
DISP traffic ATTR src_ref.value, src_port, dst_ref.value, dst_port, x_domain_name, x_domain_organization



src_ref.value,src_port,dst_ref.value,dst_port,x_domain_name,x_domain_organization
172.17.0.2,38550,192.168.56.2,3128,,
172.17.0.2,49726,192.168.56.150,8888,,
172.17.0.2,38492,23.216.74.12,443,a23-216-74-12.deploy.static.akamaitechnologies.com,"Akamai Technologies, Inc. (AKAMAI)"
172.17.0.2,38508,104.86.237.27,443,a104-86-237-27.deploy.static.akamaitechnologies.com,"Akamai Technologies, Inc. (AKAMAI)"
172.17.0.2,38522,104.17.123.99,443,,"Cloudflare, Inc. (CLOUD14)"
172.17.0.2,38524,140.82.113.4,443,lb-140-82-113-4-iad.github.com,"GitHub, Inc. (GITHU)"
172.17.0.2,38538,65.8.66.113,443,server-65-8-66-113.yvr50.r.cloudfront.net,"Amazon.com, Inc. (AMAZO-4)"


In [5]:
# https://github.com/opencybersecurityalliance/kestrel-analytics/tree/release/analytics/piniponmap
APPLY python://pin-IP-on-map ON traffic



## Beaconing Behavior of C2 Traffic

At the end of the second huntbook, we said we would check the statistics of the C2 traffic, where we may find some clues of beaconing behavior. Let's try:

In [6]:
c2_proc = splunkd WHERE command_line LIKE '%192.168.56.150%'

c2_traffic = FIND network-traffic CREATED BY c2_proc

c2_traffic = TIMESTAMPED(c2_traffic)

DISP c2_traffic ATTR first_observed, src_ref.value, src_port, dst_ref.value, dst_port



first_observed,src_ref.value,src_port,dst_ref.value,dst_port
2022-07-27T14:38:30.442905703Z,224.0.0.251,5353,192.168.56.91,5353
2022-07-27T14:38:30.443064336Z,172.17.0.2,5353,172.17.0.1,5353
2022-07-27T14:38:31.443896894Z,172.17.0.2,5353,172.17.0.1,5353
2022-07-27T14:38:31.445104677Z,172.17.0.2,49738,192.168.56.150,8888
2022-07-27T14:38:31.446651846Z,172.17.0.2,49738,192.168.56.150,8888
2022-07-27T14:39:31.517374587Z,172.17.0.2,49738,192.168.56.150,8888
2022-07-27T14:39:31.521548431Z,172.17.0.2,49738,192.168.56.150,8888
2022-07-27T14:40:01.028527Z,172.17.0.2,49738,192.168.56.150,8888
2022-07-27T14:40:07.573279244Z,172.17.0.2,49738,192.168.56.150,8888
2022-07-27T14:40:31.004563Z,172.17.0.2,49738,192.168.56.150,8888

VARIABLE,TYPE,#(ENTITIES),#(RECORDS),directory*,file*,ipv4-addr*,network-traffic*,process*,user-account*,x-ecs-destination*,x-ecs-file*,x-ecs-network*,x-ecs-process*,x-ecs-source*,x-ecs-user*,x-oca-asset*,x-oca-event*
c2_proc,process,1330,1330,0,0,0,0,0,0,0,0,0,0,0,0,0,0
c2_traffic,network-traffic,184,24736,0,0,0,0,0,0,0,0,0,0,0,0,0,0
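
One simple way to quantify "beaconing" in `first_observed` timestamps like those above is to look at how regular the gaps between consecutive observations are. The sketch below is only an illustrative heuristic (coefficient of variation of inter-arrival times), not the method of any existing Kestrel analytics; the helper names `parse_ts` and `interval_stats` are hypothetical, and the sample timestamps are synthetic rather than taken from the hunt data.

```python
from datetime import datetime, timezone
from statistics import mean, pstdev


def parse_ts(ts: str) -> datetime:
    """Parse a `first_observed`-style UTC timestamp, truncating to microseconds."""
    base, _, frac = ts.rstrip("Z").partition(".")
    # datetime stores at most 6 fractional digits, so pad or cut to exactly 6.
    frac = (frac + "000000")[:6]
    parsed = datetime.strptime(f"{base}.{frac}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def interval_stats(timestamps: list[str]) -> tuple[float, float]:
    """Return (mean gap in seconds, coefficient of variation of the gaps)."""
    times = sorted(parse_ts(ts) for ts in timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    avg = mean(gaps)
    # A value near 0 means very regular gaps, which is typical of beacons.
    cv = pstdev(gaps) / avg if avg > 0 else 0.0
    return avg, cv


# Synthetic timestamps for illustration only.
synthetic = [
    "2022-01-01T00:00:00.000000000Z",
    "2022-01-01T00:01:00.100000Z",
    "2022-01-01T00:02:00.050000Z",
    "2022-01-01T00:03:00.000000Z",
]
avg_gap, variation = interval_stats(synthetic)
print(f"mean gap: {avg_gap:.1f}s, coefficient of variation: {variation:.4f}")
```

On real traffic, several records often share nearly the same timestamp (one connection produces multiple events), so a practical version would likely deduplicate or bucket observations before computing the gaps.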


In [9]:
# let's visualize the data to make the traffic pattern explicit to the human eye
# of course we can plug in a beaconing analytics here to let a machine detect it as well

# https://github.com/opencybersecurityalliance/kestrel-analytics/tree/release/analytics/attributeplot
APPLY python://attribute-plot ON c2_traffic WITH XPARAM=first_observed, YPARAM=id
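
As far as the Python analytics interface documentation describes it, parameters given after `WITH` (here `XPARAM` and `YPARAM`) reach the analytics as environment variables with the same names. The following sketch shows how an analytics might read them; the helper `read_plot_params` and its fallback defaults are hypothetical, and the environment-variable mechanism should be treated as an assumption to verify against the interface documentation.

```python
import os
from typing import Mapping


def read_plot_params(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Read the axis parameters an APPLY command is assumed to pass along.

    The fallback values are illustrative defaults, not Kestrel behavior.
    """
    x_attr = env.get("XPARAM", "first_observed")
    y_attr = env.get("YPARAM", "id")
    return x_attr, y_attr


# Simulate `WITH XPARAM=first_observed, YPARAM=id` using a plain dict.
params = read_plot_params({"XPARAM": "first_observed", "YPARAM": "id"})
print(params)
```

Passing the mapping as an argument keeps the function testable without touching the real process environment.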



## Analytics != End of Hunt-Flow

A Kestrel analytics is just another hunt step, which can be followed by any other hunt step to compose a larger huntflow.

In [12]:
# e.g.: new attributes gained by an analytics could help construct filters to derive new Kestrel variables

github_traffic = traffic WHERE x_domain_organization LIKE '%GitHub%'

DISP traffic ATTR src_ref.value, src_port, dst_ref.value, dst_port, x_domain_name, x_domain_organization
DISP github_traffic ATTR src_ref.value, src_port, dst_ref.value, dst_port, x_domain_name, x_domain_organization



src_ref.value,src_port,dst_ref.value,dst_port,x_domain_name,x_domain_organization
172.17.0.2,38550,192.168.56.2,3128,,
172.17.0.2,49726,192.168.56.150,8888,,
172.17.0.2,38492,23.216.74.12,443,a23-216-74-12.deploy.static.akamaitechnologies.com,"Akamai Technologies, Inc. (AKAMAI)"
172.17.0.2,38508,104.86.237.27,443,a104-86-237-27.deploy.static.akamaitechnologies.com,"Akamai Technologies, Inc. (AKAMAI)"
172.17.0.2,38522,104.17.123.99,443,,"Cloudflare, Inc. (CLOUD14)"
172.17.0.2,38524,140.82.113.4,443,lb-140-82-113-4-iad.github.com,"GitHub, Inc. (GITHU)"
172.17.0.2,38538,65.8.66.113,443,server-65-8-66-113.yvr50.r.cloudfront.net,"Amazon.com, Inc. (AMAZO-4)"

src_ref.value,src_port,dst_ref.value,dst_port,x_domain_name,x_domain_organization
172.17.0.2,38524,140.82.113.4,443,lb-140-82-113-4-iad.github.com,"GitHub, Inc. (GITHU)"

VARIABLE,TYPE,#(ENTITIES),#(RECORDS),directory*,file*,ipv4-addr*,network-traffic*,process*,user-account*,x-ecs-destination*,x-ecs-file*,x-ecs-network*,x-ecs-process*,x-ecs-source*,x-ecs-user*,x-oca-asset*,x-oca-event*
github_traffic,network-traffic,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0


## Apply Analytics Executed Remotely

Invoke a Kestrel analytics via:
- [Kestrel Python analytics interface](https://kestrel.readthedocs.io/en/stable/source/kestrel_analytics_python.interface.html) (used in this demo)
- [Kestrel Docker analytics interface](https://kestrel.readthedocs.io/en/stable/source/kestrel_analytics_docker.interface.html)
- Kestrel AWS Lambda analytics interface (planned)
- Kestrel msticpy analytics interface (planned)

<img src="https://kestrel.readthedocs.io/en/stable/_images/interfaces.png" alt="Kestrel interfaces" width="800" align="left"/>