# Apply a Kestrel Analytics

Kestrel analytics are one type of hunt step ([APPLY](https://kestrel.readthedocs.io/en/latest/language.html#apply)) that provide foreign-language interfaces to non-Kestrel hunting modules. You can apply any external logic as a Kestrel analytics to:
- compute new attributes for one or more Kestrel variables, and/or
- perform visualizations.

Note that Kestrel treats analytics as black boxes and only cares about their input and output formats, so even proprietary software can be wrapped in a Kestrel analytics and used as a hunt step.

## What you will learn

0. How to set up Kestrel analytics?
1. How to `APPLY` an enrichment analytics?
2. How to `APPLY` a visualization analytics?
3. How to pass in parameters to an analytics?
4. How to write your own analytics?
5. Exercise: group and plot

### 0. How to set up Kestrel analytics?

Kestrel can execute analytics as a hunt step via the [Python analytics interface](https://kestrel.readthedocs.io/en/latest/source/kestrel_analytics_python.interface.html) or the [Docker analytics interface](https://kestrel.readthedocs.io/en/latest/source/kestrel_analytics_docker.interface.html). If you deploy your own Kestrel instance, follow the instructions at [Setup Kestrel Analytics](https://kestrel.readthedocs.io/en/latest/installation/analytics.html) to set up analytics.

Learn more about analytics in [APPLY](https://kestrel.readthedocs.io/en/latest/language.html#apply).

In this huntbook, the Python analytics interface is set up to load all analytics from the [kestrel-analytics repo](https://github.com/opencybersecurityalliance/kestrel-analytics). Add a new cell, type `APPLY python://`, then press `TAB` to list all available analytics:
- [attribute-plot](https://github.com/opencybersecurityalliance/kestrel-analytics/tree/release/analytics/attributeplot): plot/visualize select attributes of entities.
- [exfiltration-modeling](https://github.com/opencybersecurityalliance/kestrel-analytics/tree/release/analytics/dataexfiltration): infer the likelihood of data exfiltration on input network-traffic.
- [domain-enrichment](https://github.com/opencybersecurityalliance/kestrel-analytics/tree/release/analytics/domainnamelookup): whois lookup and domain information enrichment to network-traffic.
- [log4shell-deobfuscation-and-detection](https://github.com/opencybersecurityalliance/kestrel-analytics/tree/release/analytics/log4shell): log4shell URL de-obfuscation and detection.
- [pin-IP-on-map](https://github.com/opencybersecurityalliance/kestrel-analytics/tree/release/analytics/piniponmap): find geo-location of IP addresses and pin them on a map.
- [SANS-enrich](https://github.com/opencybersecurityalliance/kestrel-analytics/tree/release/analytics/sansipenrich): threat intelligence enrichment with SANS API.
- [scikit-learn-clustering](https://github.com/opencybersecurityalliance/kestrel-analytics/tree/release/analytics/sklearn-cluster): cluster entities using scikit-learn.
- [suspicious-process-scoring](https://github.com/opencybersecurityalliance/kestrel-analytics/tree/release/analytics/suspiciousscoring): compute how suspicious a process is based on domain knowledge like SIGMA.

### 1. How to `APPLY` an enrichment analytics?

The cell below gets some network-traffic entities, enriches them with domain information using the [domain-enrichment](https://github.com/opencybersecurityalliance/kestrel-analytics/tree/release/analytics/domainnamelookup) analytics, and displays the new attributes.

In [1]:
# get all network-traffic with non-private IP addresses
conns = GET network-traffic
        FROM file:///tmp/lab101.json
        WHERE dst_ref.value NOT LIKE '10.%'
        START 2021-04-03T00:00:00Z STOP 2021-04-04T00:00:00Z 

APPLY python://domain-enrichment ON conns

# x_domain_name and x_domain_organization are new attributes added by the analytics
DISP conns ATTR src_ref.value, src_port, dst_ref.value, dst_port, x_domain_name, x_domain_organization



| src_ref.value | src_port | dst_ref.value | dst_port | x_domain_name | x_domain_organization |
| --- | --- | --- | --- | --- | --- |
| 10.184.147.141 | 63003 | 104.97.85.29 | 80 | a104-97-85-29.deploy.static.akamaitechnologies.com | Akamai Technologies, Inc. (AKAMAI) |
| 10.184.147.141 | 62968 | 23.199.63.11 | 80 | a23-199-63-11.deploy.static.akamaitechnologies.com | Akamai Technologies, Inc. (AKAMAI) |
| 10.184.147.141 | 123 | 13.86.101.172 | 123 |  | Microsoft Corporation (MSFT) |
| 10.184.147.141 | 63080 | 104.97.85.50 | 80 | a104-97-85-50.deploy.static.akamaitechnologies.com | Akamai Technologies, Inc. (AKAMAI) |

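| VARIABLE | TYPE | #(ENTITIES) | #(RECORDS) | directory* | file* | ipv4-addr* | ipv6-addr* | mac-addr* | network-traffic* | process* | user-account* | x-ecs-destination* | x-ecs-network* | x-ecs-process* | x-ecs-source* | x-ecs-user* | x-oca-asset* | x-oca-event* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| conns | network-traffic | 4 | 10 | 221 | 221 | 452 | 643 | 221 | 217 | 221 | 221 | 221 | 221 | 221 | 221 | 221 | 221 | 221 |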


### 2. How to `APPLY` a visualization analytics?

Use [pin-IP-on-map](https://github.com/opencybersecurityalliance/kestrel-analytics/tree/release/analytics/piniponmap) to find the geolocation of remote IP addresses in `network-traffic` and pin them on a map.

In [2]:
APPLY python://pin-IP-on-map ON conns



### 3. How to pass in parameters to an analytics?

According to its README, the [attribute-plot](https://github.com/opencybersecurityalliance/kestrel-analytics/tree/release/analytics/attributeplot) analytics takes two parameters, `XPARAM` and `YPARAM`. You can supply parameters using `WITH`:

In [3]:
conns_all = GET network-traffic
            FROM file:///tmp/lab101.json
            WHERE dst_port > 0
            START 2021-04-03T00:00:00Z STOP 2021-04-04T00:00:00Z

# conns_all are network-traffic entities without timestamps. Get records of them with timestamps.
# More info: https://kestrel.readthedocs.io/en/latest/language.html#timestamped
conns_ts = TIMESTAMPED(conns_all)
        
APPLY python://attribute-plot ON conns_ts WITH XPARAM=first_observed, YPARAM=id



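| VARIABLE | TYPE | #(ENTITIES) | #(RECORDS) | directory* | file* | ipv4-addr* | ipv6-addr* | mac-addr* | network-traffic* | process* | user-account* | x-ecs-destination* | x-ecs-network* | x-ecs-process* | x-ecs-source* | x-ecs-user* | x-oca-asset* | x-oca-event* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| conns_all | network-traffic | 425 | 504 | 725 | 725 | 1986 | 1569 | 725 | 300 | 725 | 725 | 1167 | 1167 | 1167 | 1167 | 1167 | 1167 | 1167 |
| conns_ts | network-traffic | 504 | 844 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |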
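
On the analytics side, the Kestrel documentation for the Python analytics interface describes `WITH` parameters as being passed to the analytics through environment variables. The following is a minimal sketch of how an analytics could read `XPARAM` and `YPARAM` under that assumption. The helper name `read_plot_params`, its `env` argument, and the choice to reject missing values are hypothetical and are not taken from the `attribute-plot` source.

```python
import os


def read_plot_params(env=None):
    """Return (xparam, yparam) read from environment-style variables.

    Assumption: Kestrel exposes `WITH XPARAM=..., YPARAM=...` to a Python
    analytics as environment variables with the same names.
    """
    # Allow a plain dict to be injected so the helper can run without Kestrel.
    env = os.environ if env is None else env
    xparam = env.get("XPARAM")
    yparam = env.get("YPARAM")
    if not xparam or not yparam:
        # This strictness is a choice of the sketch, not of attribute-plot.
        raise ValueError("both XPARAM and YPARAM must be supplied via WITH")
    return xparam, yparam


# Simulate the parameters given in the APPLY command of the cell above.
params = read_plot_params({"XPARAM": "first_observed", "YPARAM": "id"})
print(params)
```

Reading the values inside a small helper keeps the parameter handling separate from the plotting logic, which makes the analytics easier to test outside a Kestrel session.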


### 4. How to write your own analytics?

A Python analytics is essentially a Python function that takes in one or more [Pandas DataFrames](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) and returns updated DataFrame(s) and/or a visualization object.

Learn more at [Develop a Python Analytics](https://kestrel.readthedocs.io/en/latest/source/kestrel_analytics_python.interface.html#develop-a-python-analytics) in [Python Analytics Interface](https://kestrel.readthedocs.io/en/latest/source/kestrel_analytics_python.interface.html). Example analytics can be found at the [kestrel-analytics repo](https://github.com/opencybersecurityalliance/kestrel-analytics).
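
As a rough illustration of that shape, here is a small enrichment-style sketch that takes one DataFrame and returns it with a new column. The function name `analytics`, the new attribute `x_dst_is_private`, and the toy input rows are assumptions made for this example; consult the interface documentation for the exact naming and return conventions Kestrel expects.

```python
import ipaddress

import pandas as pd


def analytics(dataframe):
    """Add an `x_dst_is_private` column to a network-traffic DataFrame.

    Assumption: the input has a `dst_ref.value` column, like the
    network-traffic variables displayed earlier in this huntbook.
    """

    def is_private(value):
        try:
            return ipaddress.ip_address(value).is_private
        except ValueError:
            # Missing or unparseable values are treated as not private.
            return False

    dataframe["x_dst_is_private"] = dataframe["dst_ref.value"].map(is_private)
    return dataframe


# Toy input standing in for a Kestrel variable dump (values are made up).
conns = pd.DataFrame(
    {"dst_ref.value": ["10.0.0.5", "8.8.8.8"], "dst_port": [445, 53]}
)
print(analytics(conns))
```

Prefixing the new column with `x_` mirrors the custom attributes shown earlier, such as `x_domain_name`.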

Docker analytics have an isolated runtime and support arbitrary logic inside. Follow the blog [Building Your Own Kestrel Analytics and Sharing With the Community](https://opencybersecurityalliance.org/posts/kestrel-custom-analytics/) to create your own Docker analytics, or even to wrap a proprietary detection module/system without its source code.

### 5. Exercise: group and plot

0. Use the canned data source `file:///tmp/lab101.json`.
1. Get all processes whose parent process is `svchost.exe` using the pattern `[process:parent_ref.name = 'svchost.exe']`.
2. Group the processes by their `binary_ref.name` and count their `pid`.
3. Use analytics `attribute-plot` to plot a histogram of number of processes (`pid`) against the name of each process.

Tip: the `attribute-plot` analytics will intelligently recognize the input and draw a histogram.
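
For intuition only, the grouping in step 2 corresponds to a group-and-count over a table of process records. The sketch below shows the equivalent operation in pandas on made-up rows. The data and the output column name `count_pid` are hypothetical and not taken from `lab101.json`, and the exercise itself is meant to be solved in Kestrel.

```python
import pandas as pd

# Hypothetical process records; real ones come from the Kestrel variable.
procs = pd.DataFrame(
    {
        "binary_ref.name": ["cmd.exe", "cmd.exe", "wmiprvse.exe"],
        "pid": [1001, 1002, 1003],
    }
)

# Count how many pids fall under each binary name.
counts = (
    procs.groupby("binary_ref.name")["pid"]
    .count()
    .reset_index(name="count_pid")
)
print(counts)
```

A table with one row per binary name and one count per row is the kind of input a histogram plot needs.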