## MSTICPy and Notebooks in InfoSec

---

<h1 style="border: solid; padding:5pt; color:black; background-color:#909090">Session 7 - Visualization</h1>

---

## What this session covers:

- Event timelines
  - Basics
  - Grouping
  - Hover columns
  - Variants - timeline_values, timeline_duration
- Process Tree
- Graphs


## Prerequisites
- Python >= 3.8 Environment
- Jupyter installed
- MSTICPy installed
- Run `az login`

## Recommended
- VS Code


---

# <a style="border: solid; padding:5pt; color:black; background-color:#909090">Notebook Setup</a>

---

In [None]:
%env MSTICPYCONFIG=./msticpyconfig.yaml
import msticpy as mp
mp.init_notebook()

---

# <a style="border: solid; padding:5pt; color:black; background-color:#909090">MSTICPy Timeline</a>

---

### - Uses Bokeh plots (Python/JavaScript visualization library)
### - Works with any data that has a timestamp
### - Grouping by property
### - Hover/tooltips
### - Invoked from `pandas` accessor
#### [Reference: Timeline Documentation](https://msticpy.readthedocs.io/en/latest/visualization/EventTimeline.html)

## Note: Reading data from an Excel sheet

- You need to `pip install openpyxl`
- `df = pd.read_excel('/path/to/file.xlsx')`

#### You may need to adjust things like date formats, since Excel
- Sometimes gives you dates as strings
- Doesn't support timezone-aware dates
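
Both date issues can usually be handled with plain pandas. The following is a minimal sketch on made-up rows (no Excel file is read here): it parses string dates into timezone-aware timestamps, then strips the timezone again for the case where you want to write the data back to Excel.

```python
import pandas as pd

# Illustrative rows only: dates arrive as strings, as they sometimes do from Excel
raw = pd.DataFrame(
    {
        "CreatedProcessCreationTime": [
            "2021-06-22 00:42:37.789900",
            "2021-06-22 01:15:02.000000",
        ],
        "CreatedProcessName": ["outlook.exe", "powershell.exe"],
    }
)

# Parse the strings into timezone-aware (UTC) timestamps
raw["CreatedProcessCreationTime"] = pd.to_datetime(
    raw["CreatedProcessCreationTime"], utc=True
)

# Excel cannot store timezone-aware dates, so drop the timezone before exporting
excel_ready = raw.copy()
excel_ready["CreatedProcessCreationTime"] = excel_ready[
    "CreatedProcessCreationTime"
].dt.tz_localize(None)
```

This assumes the timestamps are already in UTC; if they are in local time, convert them before dropping the timezone.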

> We're going to use `procs_df = qry_local.M365D.list_host_processes_mde()`

In [None]:
procs_df = pd.read_excel("./data/host_procs.xlsx")
procs_df.head()

In [None]:
qry_local = mp.QueryProvider("LocalData")
procs_df = qry_local.M365D.list_host_processes_mde()

In [None]:
procs_df.mp_plot.timeline(time_column="CreatedProcessCreationTime")

## Navigating Bokeh plots

### Toolbar

![Bokeh toolbar](./media/bokeh_toolbar.png)

![Bokeh RangeTool](./media/Bokeh_rangecontrol.png)

In [None]:
help(procs_df.mp_plot.timeline)

## Grouping timeline events



In [None]:
procs_df.mp_plot.timeline(
    time_column="CreatedProcessCreationTime",
    group_by="CreatedProcessAccountName",
    legend="none"
)

## Adding hover (tooltip) columns

## <a style="border: solid; padding:5pt; color:black; background-color:#309030">Task 1 - Add tooltip columns</a>

Add some informative columns to the hover/tooltip box

1. Choose the columns from the list of available columns
2. Use Python `help(procs_df.mp_plot.timeline)` to find the correct parameter name to specify the list of columns
3. Extend the previous plot to add the columns


<details>
<summary>Hints...</summary>
<ul>
<li>Use the cell below to identify columns in the source dataframe.</li>
<li>Use the <b>source_columns</b> parameter to specify a list of columns.</li>
<li>Final command should look something like this
<pre>
procs_df.mp_plot.timeline(
    time_column="CreatedProcessCreationTime",
    group_by="CreatedProcessAccountName",
    legend="none",
    source_columns=["CreatedProcessName", "CreatedProcessCommandLine"]
)
</pre>
</li>
</ul>
</details>


In [None]:
procs_df.filter(regex="CreatedProcess.*").columns

In [None]:
procs_df.mp_plot.timeline(
    time_column="CreatedProcessCreationTime",
    group_by="CreatedProcessAccountName",
    legend="none",
    #...
)

---

# <a style="border: solid; padding:5pt; color:black; background-color:#909090">Timeline variants</a>

---

## Timeline duration

Highlights the start and end of activity for each group
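
To get a feel for what "start and end of activity" means, the same information can be computed as a table with a pandas `groupby`. This is only a sketch on made-up events, not MSTICPy's implementation; the account names and times are invented.

```python
import pandas as pd

# Invented events: two accounts creating processes at different times
events = pd.DataFrame(
    {
        "CreatedProcessAccountName": ["alice", "alice", "bob", "alice", "bob"],
        "CreatedProcessCreationTime": pd.to_datetime(
            [
                "2021-06-22 10:00",
                "2021-06-22 10:05",
                "2021-06-22 10:10",
                "2021-06-22 11:00",
                "2021-06-22 10:40",
            ]
        ),
    }
)

# First event, last event and event count per account
spans = events.groupby("CreatedProcessAccountName")[
    "CreatedProcessCreationTime"
].agg(start="min", end="max", events="count")

# Length of the activity window for each account
spans["duration"] = spans["end"] - spans["start"]
```

Each row of `spans` corresponds to one group in the plot.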

In [None]:
procs_df.mp_plot.timeline_duration(
    time_column="CreatedProcessCreationTime",
    group_by="CreatedProcessAccountName",
    source_columns=["CreatedProcessName", "CreatedProcessCommandLine"]
)

## Timeline values - plot scalar values


In [None]:
help(procs_df.mp_plot.timeline_values)

### Breakdown of the following command


| `qry_local` | `.Network`     | `.list_azure_network_flows_by_ip()` | `.mp_plot`  | `.timeline_values()` |
|-------------|----------------|-------------------------------------|-------------|----------------------|
| provider    | query_category | query (returns DF)                  | pd accessor | plot function        |
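
The "pd accessor" step is a standard pandas extension mechanism: a library registers a namespace that then appears as an attribute on every DataFrame. The sketch below registers a toy accessor to show the idea. The names `demo_plot` and `timeline_summary` are hypothetical and are not part of MSTICPy.

```python
import pandas as pd


# Register a hypothetical accessor; it becomes available as df.demo_plot
@pd.api.extensions.register_dataframe_accessor("demo_plot")
class DemoPlotAccessor:
    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor was called on
        self._df = pandas_obj

    def timeline_summary(self, time_column):
        """Return basic facts about the time column instead of drawing a plot."""
        times = self._df[time_column]
        return {"events": len(times), "first": times.min(), "last": times.max()}


df = pd.DataFrame(
    {"FlowStartTime": pd.to_datetime(["2021-06-22 00:00", "2021-06-22 06:00"])}
)

# Same call shape as the chained command: DataFrame -> accessor -> function
summary = df.demo_plot.timeline_summary(time_column="FlowStartTime")
```

Because the accessor hangs off the DataFrame itself, it can be chained directly onto anything that returns a DataFrame, such as a query.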

In [None]:
qry_local.Network.list_azure_network_flows_by_ip().mp_plot.timeline_values(
    time_column="FlowStartTime",
    value_column="TotalAllowedFlows",
    group_by="L7Protocol",
    kind=["circle", "vbar"],
    source_columns=["AllExtIPs"]
)


---

# <a style="border: solid; padding:5pt; color:black; background-color:#909090">Process Tree</a>

---

### [Reference: Process Tree](https://msticpy.readthedocs.io/en/latest/visualization/ProcessTree.html)

In [None]:
procs_df.mp_plot.process_tree()

## <a style="border: solid; padding:5pt; color:black; background-color:#309030">Task 2 - Process Tree</a>

Plot a process tree with a legend that highlights the process name.

1. Extend the previous plot command to color by process name
2. Optionally, hide the legend box 

Use `help(procs_df.mp_plot.process_tree)` to see the function help.
<details>
<summary>Hints...</summary>
<ul>

<li>Use the <b>legend_col={col_name}</b> parameter</li>
<li>Use the <b>hide_legend=True</b> parameter</li>
<li>Command should look like this
<pre>
procs_df.mp_plot.process_tree(
    legend_col="CreatedProcessName",
    hide_legend=True
)
</pre>
</li>
</ul>
</details>


In [None]:
procs_df.mp_plot.process_tree(
    # ...
)

## Mini appendix
### Process tree utilities to investigate parts of the tree
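
The `source` values used in the cells below are index values of the process tree DataFrame. Judging from the examples in this notebook, they follow a `name|pid|creation time` pattern. The helper below is a hypothetical convenience for taking such a key apart; the format is inferred from these examples and may differ for other data sources.

```python
from datetime import datetime
from typing import NamedTuple


class ProcKey(NamedTuple):
    name: str
    pid: int
    created: datetime


def split_proc_key(key: str) -> ProcKey:
    """Split a 'name|pid|timestamp' key (format assumed from this notebook)."""
    # Split from the right so a '|' in the process name would not break parsing
    name, pid, created = key.rsplit("|", 2)
    return ProcKey(
        name=name,
        pid=int(pid),
        created=datetime.strptime(created, "%Y-%m-%d %H:%M:%S.%f"),
    )


parsed = split_proc_key("outlook.exe|10576|2021-06-22 00:42:37.789900")
```

To find real keys, inspect `proc_tree.index` rather than building them by hand.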

In [None]:
# build a process tree DF
proc_tree = procs_df.mp.build_process_tree()

In [None]:
from msticpy.transform.process_tree_utils import get_children, get_ancestors, get_siblings, get_roots, get_descendents

# return root processes
get_roots(proc_tree)

In [None]:
get_children(proc_tree, source="outlook.exe|10576|2021-06-22 00:42:37.789900")

In [None]:
get_descendents(proc_tree, source="outlook.exe|10576|2021-06-22 00:42:37.789900")

In [None]:
proc_tree[proc_tree.index.str.startswith("powershell.exe")].head()

In [None]:
get_descendents(proc_tree, source="outlook.exe|10576|2021-06-22 00:42:37.789900").mp_plot.process_tree()

---

# <a style="border: solid; padding:5pt; color:black; background-color:#909090">Creating and plotting graphs</a>

---

### [Reference: Graphs/Networks](https://msticpy.readthedocs.io/en/latest/visualization/NetworkGraph.html)

In [None]:
help(procs_df.mp.to_graph)

In [None]:
procs_df.mp_plot.network(
    source_col="CreatedProcessAccountName",
    target_col="CreatedProcessName"
)


## Adding node attributes

In [None]:
procs_df.mp_plot.network(
    source_col="CreatedProcessParentName",
    target_col="CreatedProcessName",
    target_attrs=["CreatedProcessAccountName"],
    # source_attrs=[...],
    # edge_attrs=[...],
)

## <a style="border: solid; padding:5pt; color:black; background-color:#309030">Task 3 - Plot a graph of processes spawned by cmd.exe</a>

Filter the input DataFrame and replot.

1. Filter the input DataFrame to only child processes of `cmd.exe`
2. Plot a graph
3. Add "CreatedProcessCommandLine", "CreatedProcessCreationTime" as target node attributes

<details>
<summary>Hints...</summary>
<ul>
<li>Use pandas filtering to get only processes whose parent is cmd.exe
<pre>
procs_df[procs_df["CreatedProcessParentName"].str.contains("cmd.exe", regex=False, na=False)]
</pre>
</li>
<li>Plot the filtered result using the mp_plot.network accessor</li>
<li>Add "CreatedProcessCommandLine", "CreatedProcessCreationTime" to the "target_attrs" parameter list</li>
<li>Solution should look something like this
<pre>
procs_df[procs_df["CreatedProcessParentName"].str.contains("cmd.exe", regex=False, na=False)].mp_plot.network(
    source_col="CreatedProcessParentName",
    target_col="CreatedProcessName",
    target_attrs=["CreatedProcessAccountName", "CreatedProcessCommandLine"]
)
</pre>
</li>
</ul>
</details>
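
One thing to watch when filtering with `str.contains`: by default the pattern is treated as a regular expression, so the `.` in `cmd.exe` matches any character, and missing values produce `NaN` rather than `False`. A small sketch on made-up parent names shows the difference.

```python
import pandas as pd

# Invented parent names, including a near miss and a missing value
parents = pd.Series(
    ["cmd.exe", "C:\\Windows\\System32\\cmd.exe", "xcmd-exe-helper", None]
)

# Default: regex match, so "." matches the "-" in "xcmd-exe-helper"
regex_mask = parents.str.contains("cmd.exe", na=False)

# Literal match: only strings that really contain "cmd.exe"
literal_mask = parents.str.contains("cmd.exe", regex=False, na=False)

print(regex_mask.tolist())
print(literal_mask.tolist())
```

Passing `na=False` keeps the mask purely boolean, so it can be used directly to index the DataFrame.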
 


In [None]:
procs_cmd_df = procs_df  # TODO: replace with the filtered procs_df

procs_cmd_df.mp_plot.network(
    # params
)

### You can also output a NetworkX graph for graph analysis, export to other display tools, etc.
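
Once you have a NetworkX graph, the standard NetworkX functions apply. The sketch below uses a small hand-built graph with invented node names (not the notebook data) to show two typical follow-ups: a simple degree analysis and an export to GraphML text that other tools can read.

```python
import networkx as nx

# Hand-built stand-in for a process graph; node names are invented
g = nx.Graph()
g.add_edge("explorer.exe", "cmd.exe")
g.add_edge("cmd.exe", "powershell.exe")
g.add_edge("cmd.exe", "whoami.exe")
g.nodes["powershell.exe"]["account"] = "alice"

# Analysis: which node has the most connections?
degrees = dict(g.degree())
busiest = max(degrees, key=degrees.get)

# Which nodes are directly connected to cmd.exe?
linked_to_cmd = sorted(g.neighbors("cmd.exe"))

# Export: GraphML text that can be saved and opened in other graph tools
graphml_text = "\n".join(nx.generate_graphml(g))
```

The same calls should work on the graph returned by `to_graph`, although the node and attribute names will come from your DataFrame columns.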

In [None]:
nxgraph = procs_df.mp.to_graph(
    source_col="CreatedProcessParentName",
    target_col="CreatedProcessName",
    target_attrs=["CreatedProcessAccountName", "CreatedProcessCommandLine"]
)
nxgraph

In [None]:
import networkx as nx
nx.draw(nxgraph)

---

# <a style="border: solid; padding:5pt; color:black; background-color:#909090">Appendix - Other visualizations</a>

---

## Matrix plots

### `df.mp_plot.matrix(...)`

In [None]:
procs_df.mp_plot.matrix(
    x="CreatedProcessParentName",
    y="CreatedProcessName",
    height=1600,
)

## Plot inverse - fewer interactions == larger circle
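
The idea behind the inverted plot can be sketched in plain pandas: count how often each pair occurs, then take the inverse so that rare pairs get the largest value. This is an illustration on invented rows and is not necessarily the exact sizing formula MSTICPy uses.

```python
import pandas as pd

# Invented parent/child process pairs
events = pd.DataFrame(
    {
        "CreatedProcessParentName": [
            "explorer.exe", "explorer.exe", "explorer.exe",
            "cmd.exe",
            "explorer.exe", "explorer.exe",
        ],
        "CreatedProcessName": [
            "cmd.exe", "cmd.exe", "cmd.exe",
            "whoami.exe",
            "chrome.exe", "chrome.exe",
        ],
    }
)

# Number of interactions for each (parent, child) pair
counts = (
    events.groupby(["CreatedProcessParentName", "CreatedProcessName"])
    .size()
    .reset_index(name="count")
)

# Inverse value: the fewer the interactions, the larger the number
counts["inverse"] = 1 / counts["count"]

# The rarest pair is the one that would stand out in an inverted plot
rarest = counts.sort_values("inverse", ascending=False).iloc[0]
```

Rare parent/child combinations are often the interesting ones in an investigation, which is why the inverted view is useful.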

In [None]:
procs_df.mp_plot.matrix(
    x="CreatedProcessParentName",
    y="CreatedProcessName",
    height=1600,
    invert=True
)

In [None]:
procs_df[~procs_df["CreatedProcessAccountName"].isin(["LOCAL SERVICE", "SYSTEM", "NETWORK SERVICE"])].mp_plot.matrix(
    x="CreatedProcessAccountName",
    y="CreatedProcessName",
    height=800,
    invert=True,
    title="Processes executed by user (rarity)"
)

## Folium Map

### `df.mp_plot.folium_map(...)`

In [None]:

ioc_df = pd.read_csv("./data/cobalt_strike_c2_otx.csv")
ioc_ip_df = ioc_df[ioc_df["Indicator type"] == "IPv4"]

ioc_ip_df.mp_plot.folium_map(ip_column="Indicator")