## MSTICPy and Notebooks in InfoSec
---

# <a style="border: solid; padding:5pt; color:black; background-color:#909090">3. Acquiring Data Using MSTICPy</a>

---

## What this session covers:
 - Setting up query providers
 - Connecting to providers
 - Querying for data
 - Offline data options

## Prerequisites
- Python >= 3.8 Environment
- Jupyter installed
- MSTICPy
- The msticpyconfig.yaml file you recently populated


### MSTICPy has a number of supported data providers
- Microsoft Sentinel
- Microsoft Defender/Defender for Endpoint
- Splunk
- Sumologic
- Microsoft Graph
- Local data
- Mordor/Security Datasets
- Kusto/Azure Data Explorer
- Azure Resource Graph

These provide a way to connect to and query data from these sources in a structured and standardized way.<br>
The providers also provide a way to create, store and call templated queries simply and easily.

Ref: https://msticpy.readthedocs.io/en/latest/DataAcquisition.html

In [None]:
#Set up MSTICPy
%env MSTICPYCONFIG=./msticpyconfig.yaml
import msticpy as mp 
mp.init_notebook()

The QueryProvider handles this functionality and can be configured to work with the supported data sources.

`list_data_environments` shows us the names of the providers available to us.

In [None]:
mp.QueryProvider.list_data_environments()

You can then pass the name of the required provider to `QueryProvider`.

In [None]:
qry_prov = mp.QueryProvider("MSSentinel")

---

# <a style="border: solid; padding:5pt; color:black; background-color:#909090">Authenticating to Providers</a>

---

Once we have created our QueryProvider for the data source we want, the next step is to connect the provider to the source and authenticate.<br>
In order to connect we need to tell the provider which instance to connect to, i.e. what workspace, cluster, or database.<br>

To do that we need to provide a set of connection parameters or a connection string. We can do this manually or we can store these details in our msticpyconfig file and pull them directly from there.<br>
Here we are going to connect first using a manually created connection string, and later using our config file, which is a much more manageable way of handling it.

The authentication method for the provider will depend on the type of provider and what it supports. We don't have time to cover all of the options here today, but most providers have an authentication method that requires the user to log in each time, either via an interactive login or a device code login.<br>
However, we can also configure most providers to use tokens already on a host, such as MSI and Azure CLI tokens. This removes the need to authenticate each time.<br>

Generally for Microsoft services the following options are supported:
 - Interactive/Device Code 
 - Azure CLI
 - MSI
 - Creds stored as Environment Variables
 - VSCode or PowerShell Credentials

Some other providers (such as Defender) use app level authentication instead. The documentation will detail what authentication options are possible for each provider.

Below we will connect with a specific connection string, and the default auth method for this provider - Device Code.

Ref: https://msticpy.readthedocs.io/en/latest/data_acquisition/DataProviders.html

In [None]:
la_connection_string = 'loganalytics://code().tenant("72f988bf-86f1-41af-91ab-2d7cd011db47").workspace("8ecf8077-cf51-4820-aadd-14040956f35d")'
qry_prov.connect(connection_str=la_connection_string)
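
Typing the connection string by hand is error-prone. As a minimal sketch, the string used above can be assembled from its two IDs with a small helper. `build_la_connection_string` is a hypothetical name used for illustration and is not part of MSTICPy; the format simply mirrors the hand-written string in the previous cell.

```python
def build_la_connection_string(tenant_id: str, workspace_id: str) -> str:
    """Assemble a Log Analytics connection string that uses device code auth.

    Hypothetical helper: the format mirrors the hand-written string above.
    """
    return (
        f'loganalytics://code().tenant("{tenant_id}")'
        f'.workspace("{workspace_id}")'
    )


la_connection_string = build_la_connection_string(
    tenant_id="72f988bf-86f1-41af-91ab-2d7cd011db47",
    workspace_id="8ecf8077-cf51-4820-aadd-14040956f35d",
)
print(la_connection_string)
```

The result can be passed to `qry_prov.connect(connection_str=la_connection_string)` exactly as before.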

As we can see, the above method is a bit cumbersome for everyday use. A more seamless authentication method, with the workspace details stored in config, is much smoother.

To use the settings from our config instead of the connection string, we can use WorkspaceConfig to read them from the file and pass them to the connection method.<br>

We are also going to use Azure CLI authentication this time. This uses any Azure CLI tokens already on the host, so our first step is to authenticate to the CLI.<br>

---
**Note -**
You only need to perform the CLI authentication once per token lifetime rather than every time you connect.

In [None]:
!az login

Now when we connect to our QueryProvider we just need to tell the provider to use CLI authentication. 

---
**Note -**
The authentication methods are passed as a list because you can often provide multiple options, which are tried in order until one authenticates successfully.

In [None]:
qry_prov = mp.QueryProvider("MSSentinel")
qry_prov.connect(mp.WorkspaceConfig(), mp_az_auth=['cli'])
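
The "try each method in order" behaviour described in the note above can be pictured with a small, self-contained sketch. This is not MSTICPy's implementation: `authenticate_in_order` and the two stand-in login functions are hypothetical and exist only to illustrate the fallback pattern.

```python
from typing import Callable, Dict, List


def authenticate_in_order(
    methods: List[str], authenticators: Dict[str, Callable[[], str]]
) -> str:
    """Return the name of the first method whose authenticator succeeds."""
    errors = {}
    for method in methods:
        try:
            authenticators[method]()  # raises if this method cannot authenticate
            return method
        except Exception as err:  # any failure means "move on to the next method"
            errors[method] = err
    raise RuntimeError(f"All authentication methods failed: {errors}")


def cli_login() -> str:
    # Stand-in for a host that has no cached Azure CLI token.
    raise PermissionError("no Azure CLI token found")


def interactive_login() -> str:
    # Stand-in for an interactive login that succeeds.
    return "token"


used_method = authenticate_in_order(
    ["cli", "interactive"],
    {"cli": cli_login, "interactive": interactive_login},
)
print(used_method)
```

Here the CLI stand-in fails, so the sketch falls through to the second entry in the list, which is why the order of the list matters.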

Once connected we can start running queries to get data.
We can do this with the built in queries or with our own queries.

We will start with the built in queries. We can list the available queries with `list_queries`.

Ref: https://msticpy.readthedocs.io/en/latest/DataAcquisition.html#built-in-data-queries

In [None]:
qry_prov.list_queries()

We can also use `browse` to get a clearer view of what's available.

In [None]:
qry_prov.browse()

In [None]:
qry_prov.Azure.list_all_signins_geo()

In [None]:
df = qry_prov.Azure.list_all_signins_geo()
df.head()

Some queries require parameters, such as an account or host name to search for.

In [None]:
office_activity = qry_prov.Office365.list_activity_for_account(account_name="KDickens@seccxp.ninja")
office_activity.head()

You can see what a built-in query actually does by adding the "print" keyword when calling it.<br>
The query text is then printed rather than run, and it includes any parameters you passed.

In [None]:
qry_prov.Office365.list_activity_for_account("print", account_name="KDickens@seccxp.ninja")
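
Conceptually, a built-in query is a text template whose placeholders are filled in from the parameters you pass. The sketch below imitates that idea with plain string formatting. The template text, the `render_query` helper and the seven-day default window are assumptions made for illustration; this is not the query text that MSTICPy ships.

```python
from datetime import datetime, timedelta

# Hypothetical template, loosely modelled on an OfficeActivity lookup.
QUERY_TEMPLATE = (
    "OfficeActivity\n"
    "| where TimeGenerated >= datetime({start})\n"
    "| where TimeGenerated <= datetime({end})\n"
    "| where UserId =~ '{account_name}'"
)

# Timestamps are treated as UTC and rendered in an ISO 8601 form.
KQL_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def render_query(account_name: str, end: datetime, days: int = 7) -> str:
    """Fill the template placeholders and return the finished KQL text."""
    start = end - timedelta(days=days)
    return QUERY_TEMPLATE.format(
        start=start.strftime(KQL_TIME_FORMAT),
        end=end.strftime(KQL_TIME_FORMAT),
        account_name=account_name,
    )


rendered = render_query("KDickens@seccxp.ninja", end=datetime(2022, 10, 14))
print(rendered)
```

Printing the rendered text before running it is a cheap way to check that the parameters landed where you expected.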

We can also customize built-in queries by adding query items to them.

In [None]:
office_activity_filtered = qry_prov.Office365.list_activity_for_account(
    account_name="KDickens@seccxp.ninja",
    add_query_items="| where Operation != 'MailItemsAccessed'"
)
office_activity_filtered.head()
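
Because `add_query_items` is just KQL text appended to the built-in query, it can be generated rather than typed. A minimal sketch, assuming you want to exclude several operations at once: `exclude_operations` is a hypothetical helper, the operation names are only examples, and the values are assumed not to contain quote characters.

```python
from typing import Iterable


def exclude_operations(operations: Iterable[str]) -> str:
    """Build a KQL clause that drops rows whose Operation is in the given list."""
    quoted = ", ".join(f"'{op}'" for op in operations)
    return f"| where Operation !in ({quoted})"


extra_filter = exclude_operations(["MailItemsAccessed", "FileAccessed"])
print(extra_filter)
```

The resulting string can then be supplied as `add_query_items=extra_filter` in the call shown above.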

You can also add your own built-in queries by specifying them in a YAML file and adding the path to that file to your msticpyconfig.yaml.

We can also use `exec_query` to run our own queries.

In [None]:
query = "OfficeActivity | where TimeGenerated > ago(7d) | where UserId =~ 'KDickens@seccxp.ninja' | summarize count() by Operation"
custom_query_df = qry_prov.exec_query(query)
custom_query_df

When writing our own queries for a Log Analytics (or Kusto) based data source, we can check the schema of any table in our connected workspace with `.schema`.<br>
This returns a dictionary of all the tables, with the column names and data type of each field. Index it by table name to see a single table.

In [None]:
qry_prov.schema['W3CIISLog']
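
The schema is convenient to search programmatically. The sketch below assumes it is shaped as a mapping of table name to a `{column: type}` mapping, and uses a small hand-written `sample_schema` in place of `qry_prov.schema` so that it runs without a connection. The tables and columns listed are illustrative and are not a real workspace schema; `tables_with_column` is a hypothetical helper.

```python
from typing import Dict, List

# Hand-written stand-in for qry_prov.schema (illustrative content only).
sample_schema: Dict[str, Dict[str, str]] = {
    "W3CIISLog": {"TimeGenerated": "datetime", "cIP": "string", "csUriStem": "string"},
    "Syslog": {"TimeGenerated": "datetime", "HostName": "string", "SyslogMessage": "string"},
    "OfficeActivity": {"TimeGenerated": "datetime", "UserId": "string", "Operation": "string"},
}


def tables_with_column(schema: Dict[str, Dict[str, str]], column: str) -> List[str]:
    """Return the sorted names of the tables that contain the given column."""
    return sorted(table for table, columns in schema.items() if column in columns)


user_tables = tables_with_column(sample_schema, "UserId")
time_tables = tables_with_column(sample_schema, "TimeGenerated")
print(user_tables)
print(time_tables)
```

With a live connection you would pass `qry_prov.schema` instead of `sample_schema`, provided it has the shape assumed here.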

---
**Extra**

It is also possible to add your own queries to the built in queries in MSTICPy.<br>
For more details on how to create these queries see this notebook: https://github.com/ianhelle/pycon2021/blob/main/Extending-MSTICPy.ipynb<br>
In addition our documentation shows how to structure the required files and reference them in your configuration.<br>
Ref: https://msticpy.readthedocs.io/en/latest/data_acquisition/DataProviders.html#adding-a-new-set-of-queries-and-running-them

## <a style="border: solid; padding:5pt; color:black; background-color:#309030">1st Exercise - Run a query</a>

Execute a query against the created `qry_prov`. This can be a built-in query or a custom query; it's up to you.

<details>
<summary>Hints...</summary>
<ul>
<li>If you add "print" as a parameter when calling a query it will print out the query rather than executing it.</li>
<li>qry_prov.browse() will show you the code needed to run each query in there</li>
</ul>
</details>


---

# <a style="border: solid; padding:5pt; color:black; background-color:#909090">Kusto</a>

---

Sentinel isn't the only data provider available; there are plenty more that we can connect to.<br>
Kusto (Azure Data Explorer) is a popular data source for many use cases.

Before we can use Kusto, we need to add some more config items to our msticpyconfig.yaml file. In this case we are going to add our Kusto cluster at `https://msticpytraining.eastus.kusto.windows.net`.

In [None]:
from msticpy.config import MpConfigEdit

MpConfigEdit()



## <a style="border: solid; padding:5pt; color:black; background-color:#309030">2nd Exercise - Kusto</a>

1. Connect to the Kusto cluster https://msticpytraining.eastus.kusto.windows.net/ and the msticpydata database. <br>
2. Run a query to understand the schema of the Syslog table and get some data


<details>
<summary>Hints...</summary>
<ul>
<li>The Kusto data provider is called simply "Kusto". </li>
<li>The `cluster` and `database` parameters are key here if you are not using `instance`.</li>
<li>https://msticpy.readthedocs.io/en/latest/data_acquisition/DataProv-Kusto.html has the details you need</li>
<li>`Syslog | getschema` returns the schema of the Syslog table.</li>
</ul>
</details>


---

# <a style="border: solid; padding:5pt; color:black; background-color:#909090">Microsoft Defender</a>

---
Some data providers have different connection options. For example, the Microsoft Defender for Endpoint and Microsoft 365 Defender APIs require a client application to handle authentication.<br>
You can pass in these application details when connecting, but if we are using an application secret it's better to keep it in Key Vault and reference it in our config file.

In [None]:
import msticpy as mp 

mp.init_notebook()

You can store multiple instances in your config file. To select which instance to connect to, use the `instance` keyword.<br>
In this example we will connect to our pre-configured Training instance.

Ref: https://msticpy.readthedocs.io/en/latest/data_acquisition/DataProv-MSDefender.html#connecting-to-m365-defender

In [None]:
defender_prov = mp.QueryProvider("M365D")
defender_prov.connect(instance="Training")

In [None]:
defender_prov.browse()

We can also execute our own queries in the same format as with the other providers.

In [None]:
defender_prov.exec_query("DeviceInfo | take 10")

## <a style="border: solid; padding:5pt; color:black; background-color:#309030">3rd Exercise - Defender Investigation</a>

1. Find the remote IP address associated with MDE connections to the URL 'davlenwindows.com' on 10/14/2022
2. Find all the hosts that have connected to that URL since 10/01/2022
3. Get the file hash of the initiating process for these connections on 10/14/2022 and get all the file names associated with this hash on that day


<details>
<summary>Hints...</summary>
<ul>
<li>You can do this with built in queries or your own queries</li>
<li>The Query Browser is your friend: `defender_prov.browse()`</li>
<li>Don't forget you can use add_query_items to add to the built in queries to customize the returned data.</li>
</ul>
</details>
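
Exercises like this one often need a filter for a single calendar day. The following is a minimal sketch of a helper that turns a date into a half-open KQL time filter. `day_filter` is a hypothetical name, the dates in the exercise are read as month/day/year, and the default column is `Timestamp`, the time column used by Defender advanced hunting tables (Sentinel tables typically use `TimeGenerated`).

```python
from datetime import date, timedelta


def day_filter(day: date, time_column: str = "Timestamp") -> str:
    """Return a KQL clause matching events from the start of `day`
    up to, but not including, the start of the next day."""
    next_day = day + timedelta(days=1)
    return (
        f"| where {time_column} >= datetime({day.isoformat()}) "
        f"and {time_column} < datetime({next_day.isoformat()})"
    )


clause = day_filter(date(2022, 10, 14))
print(clause)
```

The clause can be appended to your own query text for `exec_query`, or passed to a built-in query through `add_query_items`.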

---

# <a style="border: solid; padding:5pt; color:black; background-color:#909090">Azure Resource Graph</a>

---


The Azure Resource Graph provides a way to get details about Azure resources using KQL, which is really useful for adding context during an investigation.<br>
Below we are going to load our Resource Graph provider and connect using the Azure CLI tokens that we generated earlier.

In [None]:
res_qry_prov = mp.QueryProvider("ResourceGraph")
res_qry_prov.connect(auth_methods=["cli"])

As with the other providers, we can use built-in queries or write our own custom queries. Hopefully by now you are familiar with this model.


## <a style="border: solid; padding:5pt; color:black; background-color:#309030">4th Exercise - Azure Resource Graph</a>

 1. Find out how many Key Vaults you have access to. <br>
 2. What resources exist in the msticpy resource group?<br>
 3. Find the Key Vault that is detailed in your msticpyconfig.yaml file<br>


<details>
<summary>Hints...</summary>
<ul>
<li>All data in the Resource Graph is in the Resources table</li>
<li>https://learn.microsoft.com/en-us/azure/governance/resource-graph/samples/starter?tabs=azure-cli gives you some query examples</li>
<li>`Resources | where type =~ 'microsoft.keyvault/vaults'` will show you all Key Vaults</li>
<li>You will need to use .exec_query here</li>
</ul>
</details>
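
Building on the query in the hints, a small helper can generate Resource Graph queries for any resource type. This is a sketch only: `resources_of_type` is a hypothetical helper, and the projected columns (`name`, `resourceGroup`, `location`) are standard columns of the `Resources` table.

```python
def resources_of_type(resource_type: str, count_only: bool = False) -> str:
    """Build a Resource Graph query for one resource type.

    With count_only=True the query returns a single count instead of rows.
    """
    query = f"Resources | where type =~ '{resource_type.lower()}'"
    if count_only:
        return query + " | summarize count()"
    return query + " | project name, resourceGroup, location"


count_query = resources_of_type("Microsoft.KeyVault/vaults", count_only=True)
list_query = resources_of_type("Microsoft.KeyVault/vaults")
print(count_query)
print(list_query)
```

Either string can be run with `res_qry_prov.exec_query(...)` once the provider is connected.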



## <a style="border: solid; padding:5pt; color:black; background-color:#309030">Bonus Exercise - Azure Resource Graph</a>

CDOC received a report that the VM MSTIC-DSVM has been compromised. You need to answer the following questions:
1. Is this a real host?
2. Is it currently in use?
3. What IPs is it associated with?
4. Is it a production host?
5. What other resources might have been compromised?
6. Are there any users we can contact about this host?
