![image](images/MSTIC.png)

# MSTICPy - Threat hunting toolkit for Jupyter Notebooks
### Ian Hellen, Principal Dev, in Microsoft Threat Intelligence Center (MSTIC)
### @ianhellen (twitter), ianhelle@microsoft.com


---

<h1 style="border: 1px solid;color: White; background-color: DarkSlateGray; padding: 10px">Part 1 - Overview of MSTICPy</h1>

- **Data & Queries** - Getting data into the notebook
- **A tale from a SOC (Security Operations Center)**<br>
  How MSTICPy might be used to help you analyze and visualize information<br>
  that you need to determine whether you're looking at benign or malicious activity.
  
  - Network anomalies
  - Enriching data - find out more about an IP Address
  - Investigate potentially compromised host
    - Summary/Logons/Processes and unusual logon sessions
    
---

<h2 style="color: White; background-color: DarkSlateBlue; padding: 10px">MSTICPy Recommended notebooks</h2>

<a href="https://aka.ms/msticpy-pycon2021" style="font-size: 20px">https://aka.ms/msticpy-pycon2021</a>

<a href="https://aka.ms/msticpy" style="font-size: 20px">https://aka.ms/msticpy</a>

<a href="https://github.com/ianhelle/pycon2021/blob/main/Msticpy-General.ipynb" style="font-size: 20px">This notebook https://github.com/ianhelle/pycon2021/blob/main/Msticpy-General.ipynb</a>

<a href="https://github.com/ianhelle/pycon2021/blob/main/Extending-MSTICPy.ipynb" style="font-size: 20px">Notebook - Extending MSTICPy</a>

<a href="https://nbviewer.jupyter.org/github/Azure/Azure-Sentinel-Notebooks/blob/1b15c7ab98b5aaa5659b431b4f0506927eb1b630/A%20Tour%20of%20Cybersec%20notebook%20features.ipynb" style="font-size: 20px">Notebook - Quick tour of MSTICPy</a>

---

<h3 style="color: White; background-color: DarkSlateGray; padding: 5px">Getting started - We need to initialize a few things</h3>


In [None]:
# pip install msticpy
# pip install msticnb

# Core MSTICPy initialization for Notebooks
from msticpy.nbtools import nbinit
nbinit.init_notebook(namespace=globals())


---

<h1 style="border: 1px solid;color: White; background-color: DarkSlateGray; padding: 10px">Queries and Data</h1>

---

## Getting to data may be a bit dull but it's the foundation of security hunting:
> “Without big data, you are blind and deaf and in the middle of a freeway.”
*Geoffrey Moore*

> “In God we trust, all others bring data.”
*W Edwards Deming*


<h2 style="color: White; background-color: DarkSlateBlue; padding: 10px">MSTICPy Data Providers</h2>

![image](images/DataLayer.png)

https://msticpy.readthedocs.io/en/latest/data_acquisition/DataProviders.html

- Usually come with pre-defined queries
- Azure Sentinel queries are most developed

### Importance of pre-built queries to help analysts!

In [None]:
from msticpy.data import QueryProvider
import pandas as pd

# Load query providers (typically you'll be using just one)
qry_prov_az = QueryProvider("AzureSentinel")
qry_prov_sp = QueryProvider("Splunk")
qry_prov_mde = QueryProvider("MDE")
# Special provider that uses local data files
qry_prov_loc = QueryProvider("LocalData", data_paths=["./data"], query_paths=["./data"])

In [None]:
qry_prov_az.browse_queries()

In [None]:
qry_prov_mde.browse_queries()

In [None]:
qry_prov_loc.Network.get_network_summary()

<h3 style="color: White; background-color: DarkOrange; padding: 5px">Most data providers need authentication</h3>


```python
qry_prov.connect(connect_params....)
```

In [None]:
qry_prov_az.connect(WorkspaceConfig(workspace="CyberSecuritySoc"))

<h3 style="color: White; background-color: DarkOrange; padding: 5px">Most queries have mandatory parameters</h3>

In [None]:
qry_prov_loc.WindowsSecurity.list_host_processes()

In [None]:
qry_prov_loc.WindowsSecurity.list_host_processes(host_name="MSTICAlertsWin1").head(3)

<h3 style="color: White; background-color: DarkOrange; padding: 5px">Most queries require time parameters!</h3>

### ....datetime strings are a **pain** to type in and keep track of

### Fortunately there's an easier way to specify time parameters 

- use the built-in `query_time` property of the query provider, or
- create an instance of the `nbwidgets.QueryTime` class and pass it as a parameter
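
Whichever option you pick, a time parameter boils down to a start/end pair. Here is a minimal standard-library sketch of building one by hand. `make_time_range` is a hypothetical helper for illustration only; it is not part of MSTICPy.

```python
from datetime import datetime, timedelta, timezone


def make_time_range(end=None, days=1):
    """Return (start, end) UTC datetimes covering `days` days (hypothetical helper)."""
    end = end or datetime.now(timezone.utc)
    return end - timedelta(days=days), end


# A fixed end time keeps the example reproducible
start, end = make_time_range(end=datetime(2021, 5, 14, tzinfo=timezone.utc), days=2)
print(start.isoformat(), end.isoformat())
```

Values like these are what the `start` and `end` query parameters carry; the widget simply saves you from typing them.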

In [None]:
qry_prov_az.query_time


#### This is a live query against Azure Sentinel

In [None]:
qry_prov_az.WindowsSecurity.list_host_processes(host_name="VictimPC").head(3)

<br>
<h3 style="color: White; background-color: DarkSlateGray; padding: 5px">Extending built-in queries using the <i>add_query_items</i> parameter</h3>

In [None]:
qry_prov_az.WindowsSecurity.list_host_processes(
    host_name="VictimPC2",
    add_query_items="| where NewProcessName contains 'powershell'"
).head()
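
Conceptually, the extra clause is piped onto the end of the built-in query text. The sketch below shows that idea with plain strings. The base query and the `extend_query` helper are made up for illustration; this is not how MSTICPy builds its queries internally.

```python
def extend_query(base_query, add_query_items=""):
    """Append extra KQL clauses to a query string (hypothetical helper)."""
    return f"{base_query.rstrip()}\n{add_query_items.strip()}".rstrip()


# Made-up base query standing in for a built-in one
base = "SecurityEvent\n| where Computer has 'VictimPC2'"
extended = extend_query(base, "| where NewProcessName contains 'powershell'")
print(extended)
```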

<br>
<h3 style="color: White; background-color: DarkSlateGray; padding: 5px"><i>exec_query</i> is a pass-through for native queries</h3>

In [None]:
result = qry_prov_az.exec_query("SecurityEvent | take 1000 | summarize count() by Computer")
result.head(5)

<h3 style="color: White; background-color: DarkOrange; padding: 5px"><i>pandas DataFrames</i> are the lingua-franca of MSTICPy</h3>

In [None]:
type(result)

<h2 style="color: White; background-color: DarkSlateBlue; padding: 10px">What if you don't have a suitable query provider? or even a data source?</h2>

## You can still use most of MSTICPy's functionality if you can get your data into a DataFrame

- You can import from CSV, JSON and others to a DF
- Many SDKs (like pyspark) work directly with DataFrames or can convert to one
- Mordor ([https://github.com/OTRF/mordor](https://github.com/OTRF/mordor)) is a great place to get sample attack data

> <b>PS - if you have a data source that you want to use and you can code in Python,
> it's easy to write a data provider.<br>
> Contribute the code as an MSTICPy data provider! We're happy to help.</b>

In [None]:
df = pd.read_csv("data/ian_procs.csv", parse_dates=["TimeGenerated"])
df.head(3)

In [None]:
df.mp_timeline.plot(group_by="NewProcessName")

---

<h1 style="border: 1px solid;color: White; background-color: DarkSlateGray; padding: 10px">A tale from the SOC (Security Operations Center)</h1>

This section shows how some of the MSTICPy functionality might be used in
a Security Operations Center context.

We'll look at:
- Finding a signal - an alert or anomaly
- Getting some contextual information on the actors/entities in that signal
- Investigating the target(s) of the attack

<h2 style="color: White; background-color: DarkSlateBlue; padding: 10px">Evidence of an attack? Anomalous network traffic</h2>

![image](images/Analysis.png)

<h3 style="color: White; background-color: DarkSlateGray; padding: 5px">Time Series Decomposition</h3>

- STL - Seasonal Trend Decomposition using Loess
- Loess (aka LOWESS) - Locally Weighted Scatterplot Smoothing
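
To get a feel for the idea without any dependencies, here is a much simpler stand-in. It is not STL: there is no trend component and no Loess smoothing. It subtracts a per-hour-of-day median (the "seasonal" baseline) and flags points whose residual is unusually large. The function name, threshold and synthetic data are all assumptions for illustration.

```python
from statistics import median, pstdev


def flag_anomalies(values, period=24, threshold=3.0):
    """Return indexes whose residual from a per-slot median baseline is large."""
    # Seasonal baseline: median of all values that share a slot in the period
    baseline = [median(values[slot::period]) for slot in range(period)]
    residuals = [v - baseline[i % period] for i, v in enumerate(values)]
    spread = pstdev(residuals) or 1.0  # avoid a zero threshold on flat data
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * spread]


# Three days of synthetic hourly byte counts with a daily pattern and one spike
hourly = [100 + 10 * (h % 24) for h in range(72)]
hourly[50] += 5000
anomalies = flag_anomalies(hourly)
print(anomalies)
```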

In [None]:
from msticpy.nbtools.timeseries import display_timeseries_anomolies
from msticpy.analysis.timeseries import ts_anomalies_stl

For unsummarized data use pandas to summarize by hour
```python
data = qry_prov_loc.Network.get_network_flows()
data = (
    data[["TimeGenerated", "TotalBytesSent"]]
    .set_index("TimeGenerated")
    .groupby(pd.Grouper(freq="1H"))
    .sum()
)
```

or summarize in your query
```python
qry_prov.exec_query("""
    FirewallLogs 
    | where TimeGenerated > ago(28d) 
    | summarize sum(TotalBytesSent) by bin(TimeGenerated, 1h)
"""
)
```

In [None]:
# Get the data
net_df = qry_prov_loc.Network.get_network_summary()
net_df = net_df.set_index("TimeGenerated")
net_df.head(3)


In [None]:
# Conduct our timeseries analysis
ts_analysis = ts_anomalies_stl(net_df)

# Visualize the timeseries and any anomalies
display_timeseries_anomolies(data=ts_analysis, y='TotalBytesSent');

<h3 style="color: White; background-color: DarkSlateGray; padding: 5px">Zero in on the anomaly period</h3>

#### Use the anomaly period to filter our underlying data
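
One way to picture "anomaly periods": take the timestamps flagged as anomalous and merge neighbours into (start, end) ranges. The sketch below is a guess at the general idea using only the standard library. `group_periods` is hypothetical and is not the implementation of `find_anomaly_periods`.

```python
from datetime import datetime, timedelta


def group_periods(timestamps, step=timedelta(hours=1)):
    """Merge timestamps that are at most `step` apart into (start, end) pairs."""
    periods = []
    for ts in sorted(timestamps):
        if periods and ts - periods[-1][1] <= step:
            periods[-1] = (periods[-1][0], ts)  # extend the current period
        else:
            periods.append((ts, ts))  # start a new period
    return periods


# Made-up anomalous hours: a three-hour burst and one isolated point
base = datetime(2021, 5, 1)
anomalous_hours = [base + timedelta(hours=h) for h in (3, 4, 5, 20)]
periods = group_periods(anomalous_hours)
print(periods)
```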

In [None]:
from msticpy.analysis.timeseries import find_anomaly_periods
anom_times = find_anomaly_periods(ts_analysis)
anom_times

In [None]:
qry_prov_loc.Network.get_network_flows().head()

In [None]:
# Use the anomaly period to provide the "start" and "end" params of our query
ts_df = qry_prov_loc.Network.get_network_flows(anom_times[0])

# Summarize the traffic counts grouped by Source/Dest IP Address
noisy_ips = ts_df.groupby(["SourceIP", "DestinationIP"]).agg({"TotalBytesSent": "sum"})

# Plot this data and pull out the top offender
noisy_ips.plot.barh(figsize=(8, 6))
display(noisy_ips.sort_values("TotalBytesSent", ascending=False).head(1))

---

<h2 style="color: White; background-color: DarkSlateBlue; padding: 10px">Enrichment Functions - getting to know your subject</h2>


![image](images/enrichment.png)

### Enrich with what? how?

We want to find more information about this IP Address

<h3 style="color: White; background-color: DarkOrange; padding: 5px">Introducing Entities</h3>

#### Entities are simple classes that represent "real-world" Cyber objects like accounts and IP addresses
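
As a rough mental model, an entity is little more than a typed bag of attributes that can reference other entities. The dataclasses below are hypothetical stand-ins, not the MSTICPy classes, and the location values are placeholders rather than real lookup results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeoLocationSketch:
    CountryName: str
    City: str


@dataclass
class IpAddressSketch:
    Address: str
    Location: Optional[GeoLocationSketch] = None  # entities can nest other entities


ip = IpAddressSketch(Address="38.75.137.9")
# Placeholder values only - not real geolocation data for this address
ip.Location = GeoLocationSketch(CountryName="Exampleland", City="Sample City")
print(ip)
```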


In [None]:
# Import and initialize dynamic pivot functions - more later
from msticpy.datamodel.pivot import Pivot
from msticpy.datamodel import entities


md(f"Some Entities:", "bold, large")
md(f"{', '.join(dir(entities)[:25])} ...", "large")

entities.IpAddress(Address="38.75.137.9")

<h3 style="color: White; background-color: DarkOrange; padding: 5px">Also introducing Pivot functions</h3>

![image](images/Interface.png)

Pivot functions are methods of entities that provide:
- data queries related to an entity
- enrichment functions relevant to that entity

Pivot functions are dynamically attached to entities. We created this
framework to make it easier to find which functions you can use for which entity type.
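
"Dynamically attached" can be sketched in plain Python: define a function separately, then bolt it onto an entity class at runtime so it can be called from the class or from an instance. Everything below (`PivotFunction`, `IpAddressSketch`, `is_private`) is a hypothetical illustration of the pattern, not MSTICPy's pivot implementation.

```python
import functools
import ipaddress


class PivotFunction:
    """Descriptor so a function works from the class or from an instance."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner):
        if instance is None:
            return self.func  # called on the class: caller supplies the value
        return functools.partial(self.func, instance)  # instance is the value


class IpAddressSketch:
    """Stand-in entity (hypothetical, not the MSTICPy class)."""

    def __init__(self, Address):
        self.Address = Address


def is_private(value):
    """Toy enrichment that accepts an entity or a plain string."""
    address = getattr(value, "Address", value)
    return ipaddress.ip_address(address).is_private


# Attach the function at runtime, after the class has been defined
setattr(IpAddressSketch, "is_private", PivotFunction(is_private))

print(IpAddressSketch.is_private("10.0.0.1"))
print(IpAddressSketch("38.75.137.9").is_private())
```

The payoff of this pattern is discoverability: once attached, the function shows up in tab completion on the entity itself.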

### Motivation
- We had built a lot of functionality in MSTICPy for querying and enrichment
- A lot of the functions had inconsistent type/parameter signatures
- There was no easy discovery mechanism for these functions - you had to already know they existed
- Using entities as pivot points is a "natural" investigation pattern

In [None]:
pivot = Pivot(namespace=globals())

pivot.browse()

### Pivot functions are flexible - can take input as strings, lists or dataframes

In [None]:
IpAddress = entities.IpAddress

IpAddress.whois("38.75.137.9")

#### Find whois info for all of the IPs in the data set

In [None]:
invest_ips = noisy_ips.reset_index()
IpAddress.whois(invest_ips, column="DestinationIP")

In [None]:
IpAddress.whois(invest_ips, column="SourceIP", join="left")

#### also geolocation data

In [None]:
from msticpy.datamodel.entities import IpAddress

destip = IpAddress(Address="38.75.137.9")

destip
destip.geoloc()
destip.Location = entities.GeoLocation(destip.geoloc().iloc[0])
display(destip)

In [None]:
# Create a map
folium = FoliumMap(zoom_start=10)

# Get all of the IPs from our data set and make them into GeoLocation objects
dest_ip_locs = IpAddress.geoloc(data=invest_ips, column="DestinationIP").apply(entities.GeoLocation, axis=1)

# Add our suspect IP and center around that
folium.add_ip_cluster([destip], color="red")
folium.center_map()
# Add the rest of the IPs
folium.add_geoloc_cluster(dest_ip_locs.values, color="blue")
folium.add_ip_cluster([destip], color="red")

folium

<h3 style="color: White; background-color: DarkSlateGray; padding: 5px">Applying some (threat) intelligence to the problem</h3>

### What are Threat Intelligence providers?

#### Threat Intel definition (Courtesy Crowdstrike)
> Threat intelligence is data that is collected, processed, and analyzed to understand a threat<br>
> actor’s motives, targets, and attack behaviors. Threat intelligence enables us to make faster,<br>
> more informed, data-backed security decisions and change their behavior from reactive to<br>
> proactive in the fight against threat actors.

A number of companies make Threat Intel (TI) data available via public or paid APIs:
- VirusTotal
- AlienVault OTX
- IBM XForce
- and many others

In [None]:
ti_results = destip.tilookup_ipv4()
TILookup.browse_results(ti_results)

<h3 style="color: White; background-color: DarkSlateGray; padding: 5px">Side note: we have an app for this last section - see Notebooklets below</h3>

```python
ip_result = IpAddress.nblt.ip_address_summary(destip, timespan=timerange)
```

---

<h2 style="color: White; background-color: DarkSlateBlue; padding: 10px">Let's look at what happened on the host...the other end of the communication</h2>

<h3 style="color: White; background-color: DarkSlateGray; padding: 5px">Introducing "notebooklets" - macros for CyberSec</h3>

![image](images/Interface.png)

We built notebooklets because life is too short to keep writing (copy/pasting) the same code over and over again.

Notebooklets package multiple notebook cells for common investigation routines into simple functions
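
The pattern is easy to sketch: wrap several investigation steps behind a single `run()` call that returns one result object. The classes and sample events below are made up for illustration and are far simpler than the real notebooklets.

```python
from dataclasses import dataclass, field


@dataclass
class HostSummaryResultSketch:
    host_name: str
    logon_count: int = 0
    alerts: list = field(default_factory=list)


class HostSummarySketch:
    """Hypothetical mini-notebooklet: several steps behind one run() call."""

    def __init__(self, events):
        self.events = events

    def run(self, value):
        # Step 1: filter to the host; steps 2 and 3: summarize logons and alerts
        host_events = [e for e in self.events if e["host"] == value]
        return HostSummaryResultSketch(
            host_name=value,
            logon_count=sum(e["type"] == "logon" for e in host_events),
            alerts=[e["detail"] for e in host_events if e["type"] == "alert"],
        )


# Made-up sample events
events = [
    {"host": "victimpc", "type": "logon", "detail": "interactive"},
    {"host": "victimpc", "type": "alert", "detail": "suspicious powershell"},
    {"host": "otherpc", "type": "logon", "detail": "network"},
]
result = HostSummarySketch(events).run(value="victimpc")
print(result)
```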

<h3 style="color: White; background-color: DarkOrange; padding: 5px">Compare lines of code in the following section to output!</h3>

In [None]:
# Import and initialize MSTIC Notebooklets - companion package
# more later
import msticnb as nb
nb.init(query_provider=qry_prov_az)
# qry_prov_az.connect(WorkspaceConfig(workspace="CyberSecuritySoc"))

nb.browse()

In [None]:
host_time = nbwidgets.QueryTime(timespan=anom_times[0])
host_time

In [None]:
host_summary = nb.nblts.azsent.host.HostSummary()

host_summary_rslt = host_summary.run(value="victimpc", timespan=host_time, options=["-bookmarks", "-azure_api"])

In [None]:
host_summary_rslt.browse_alerts()

<h3 style="color: White; background-color: DarkSlateGray; padding: 5px">Who logged on to this host?</h3> 

In [None]:
host_logons = nb.nblts.azsent.host.HostLogonsSummary()
host_logons_rslt = host_logons.run(value="victimpc", timespan=host_time)

In [None]:
%kql SigninLogs | summarize count() by UserPrincipalName | order by count_ desc | limit 20

In [None]:
account_summary = nb.nblts.azsent.account.AccountSummary()
acct_result = account_summary.run(value="RonHD", timespan=host_time, options=["-get_bookmarks"])

In [None]:
account_summary = nb.nblts.azsent.account.AccountSummary()
acct_result = account_summary.run(value="seb", timespan=host_time, options=["-get_bookmarks"])

In [None]:
acct_result.get_additional_data()

In [None]:
acct_result.az_activity_timeline_by_operation()

<br>
<h3 style="color: White; background-color: DarkSlateGray; padding: 5px">It's also a good idea to look at processes...</h3>

#### ...but which ones?

> Disclaimer - this data is captive (local) data from another host, used here just to highlight some things.

In [None]:
procs_df = entities.Host.LocalData.host_processes(host_name="MSTICAlertsWin1")
display(procs_df.head())
md(f"Total number of events: {len(procs_df)}", "bold")


In [None]:
logon_session = nb.nblts.azsent.host.LogonSessionsRarity()
logon_rslt = logon_session.run(data=procs_df)

In [None]:
logon_rslt.process_tree(account=r"MSTICAlertsWin1\ian")

In [None]:
enc_cmds = logon_rslt.processes_with_cluster.query("Account.str.contains('ian') and NewProcessName.str.contains('powershell')")

display(enc_cmds[["NewProcessName", "CommandLine"]])


In [None]:
for row in entities.Process.util.b64decode(enc_cmds, column="CommandLine").itertuples():
    print(row.Index, row.decoded_string)
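
For reference, PowerShell's `-EncodedCommand` argument is base64 over UTF-16LE text, so a single string can be decoded with the standard library alone. This sketch is independent of the MSTICPy utility used above; `decode_powershell` is a hypothetical helper and the sample command is made up.

```python
import base64


def decode_powershell(encoded):
    """Decode a PowerShell -EncodedCommand argument (base64 of UTF-16LE text)."""
    return base64.b64decode(encoded).decode("utf-16-le")


# Build a made-up sample the same way PowerShell expects it, then decode it
sample = base64.b64encode("Write-Output 'hello'".encode("utf-16-le")).decode("ascii")
decoded = decode_powershell(sample)
print(decoded)
```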


---

<h2 style="color: White; background-color: DarkSlateBlue; padding: 10px">End of part 1 - The SOC story.</h2>

<h3 style="color: White; background-color: DarkSlateGray;padding: 10px">We identified a network traffic anomaly</h3>

<ul style="font-size: large">
<li>We found the IP addresses responsible</li>
<li>We found contextual info such as geolocation and subnet ownership</li>
<li>We used Threat Intelligence to confirm that this IP was malicious</li>
</ul>


<h3 style="color: White; background-color: DarkSlateGray; padding: 10px">Looking at the host</h3>
<ul style="font-size: large">
<li>We retrieved some overview data about our host</li>
<li>We retrieved a lot of info about the logon patterns on that host</li>
<li>We were able to identify the suspicious session and view the process tree</li>
<li>We found some encoded powershell commands linking back to the attacker</li>
</ul>

---

<h3 style="color: White; background-color: DarkSlateGray; padding: 10px">Note - this won't be the end of the story for the Analyst...</h3>


<h3 style="color: White; background-color: DarkSlateGray; padding: 10px">...nor for us. On to part #2</h3>