# Tutorial 2: Exploratory Data Analysis

## Objectives

- Use [Taegis Magic](https://github.com/secureworks/taegis-magic) to query relevant security data for a specific threat
- Leverage [`pandas`](https://pandas.pydata.org) DataFrames to analyze query results and find evidence of a threat
- Document key findings in markdown text
- Create a Taegis investigation from the notebook

![hunting-single-tenant](images/hunting-single-tenant.png)

## Description

This tutorial shows how to use Taegis Magic for exploratory data analysis (EDA) and to create Taegis investigations from Jupyter notebooks.
Taegis investigations are used to organize key findings and relevant evidence (such as Taegis events, alerts, assets, and search queries) during the discovery and resolution of a security incident.

## Step 1: Import Dependencies

> Before we begin, please ensure that the notebook kernel is set to `taegis-hunting-tutorials`.

This tutorial relies heavily on Taegis Magic to interact with Taegis from Jupyter notebooks.
[Taegis Magic](https://github.com/secureworks/taegis-magic) is a Jupyter notebook and command-line interface to interact with the Secureworks Taegis™ security platform. Taegis Magic aims to improve security operations and threat hunting workflows through deep integration between Taegis, Jupyter notebooks, and [`pandas` DataFrames](https://pandas.pydata.org/docs/getting_started/index.html).
For a general overview of Taegis Magic and the Taegis SDK for Python, please see the [Taegis SDK for Python](https://github.com/secureworks/taegis-sdk-python/tree/main/docs) and [Taegis Magic](https://github.com/secureworks/taegis-magic/tree/main/docs) documentation.

Taegis Magic is implemented as an [IPython Magic](https://ipython.readthedocs.io/en/stable/interactive/magics.html) extension, which requires a special syntax to import into the running notebook:

In [1]:
%load_ext taegis_magic
#%taegis configure logging defaults sdk_warning --status false

You can run Taegis Magic commands using the `%taegis` and `%%taegis` syntax for [line and cell magics](https://github.com/secureworks/taegis-magic/tree/main/docs/jupyter#ipython-magics) respectively.

You can use the `--help`/`-h` flag to explore the subcommands and options available in Taegis Magic like other CLI tools. Please see the [Taegis Magic docs](https://github.com/secureworks/taegis-magic/tree/main/docs) for more details on magic commands.

## Step 2: Query Taegis for Evidence

We need to query Taegis to hunt for evidence of a given threat.
In these tutorials, we will hunt for threats that intentionally disable some security software on Windows endpoints.
We will query Taegis for medium-severity (or lower) alerts related to Windows service tampering which may have gone unactioned.
Then we will also query Taegis events for suspicious endpoint process telemetry that is potential evidence of these activities.

First, let's define the scope of our exploratory data analysis in the form of a Taegis tenant ID and environment.
It is convenient to define these values as variables and reference them in subsequent commands using the `$VARIABLE` string expansion syntax:

In [2]:
# You should change these values to a Taegis tenant ID 
# and environment that you are authorized to access.

TAEGIS_TENANT_ID = '145483'
TAEGIS_ENVIRONMENT = 'foxtrot'
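
As a rough illustration of what `$VARIABLE` expansion does to a magic command line, the sketch below uses Python's standard `string.Template`, which follows a similar `$name` convention. This is only an analogy: IPython performs the real expansion inside the notebook, and the `command` string here is just sample text.

```python
from string import Template

# Same illustrative values as the cell above; replace with your own.
TAEGIS_TENANT_ID = "145483"
TAEGIS_ENVIRONMENT = "foxtrot"

# A magic command line as it would be typed, before any expansion.
command = Template(
    "alerts search --tenant $TAEGIS_TENANT_ID --region $TAEGIS_ENVIRONMENT"
)

# Substitute each $NAME with the value of the matching variable.
expanded = command.substitute(
    TAEGIS_TENANT_ID=TAEGIS_TENANT_ID,
    TAEGIS_ENVIRONMENT=TAEGIS_ENVIRONMENT,
)
print(expanded)
```

Defining the scope once and expanding it everywhere keeps later queries consistent if you switch to a different tenant or environment.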

### Alerts Query

We will use the `%%taegis alerts search` command to query for medium and lower severity alerts, which potentially went unactioned.
Here is a short explanation of this magic command:

- The `%%taegis` cell magic syntax indicates that the command will read from the entire cell contents. The first line of the cell is parsed as the arguments, while the remainder of the cell is parsed as the query string.
- We pass in the `--tenant` and `--region` arguments, using the variables defined above, to set the scope of the query.
- We include the `--track` flag, which tells Taegis Magic to remember this query for later inclusion in our investigation.
- And lastly, we pass in the `--assign` argument to assign the query results to a variable named `alerts_df`.
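
To make the split between the first line and the rest of the cell concrete, here is a minimal sketch of how a cell's text could be divided into arguments and a query string. The `split_cell` helper and the `argparse` parser are hypothetical stand-ins written for this illustration. They are not Taegis Magic's implementation, and only the flag names are taken from the command described above.

```python
import argparse
import shlex


def split_cell(cell_text: str) -> tuple[str, str]:
    """Split cell text into (first line, remaining lines)."""
    first_line, _, rest = cell_text.partition("\n")
    return first_line, rest.strip()


# Hypothetical parser that mirrors the flags used in this tutorial.
parser = argparse.ArgumentParser(prog="alerts-search-sketch")
parser.add_argument("--tenant")
parser.add_argument("--region")
parser.add_argument("--track", action="store_true")
parser.add_argument("--assign")

cell_text = """--tenant 145483 --region foxtrot --track --assign alerts_df

FROM alert
WHERE metadata.severity <= 0.6
"""

# The first line carries the arguments; everything after it is the query.
line, query = split_cell(cell_text)
args = parser.parse_args(shlex.split(line))

print(args.tenant, args.region, args.track, args.assign)
print(query)
```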

Query results from Taegis Magic commands are returned as `pandas` DataFrames.

> If the Taegis SDK for Python is not already authenticated, it will automatically prompt the user to complete an authentication flow.

In [3]:
%%taegis alerts search --tenant $TAEGIS_TENANT_ID --region $TAEGIS_ENVIRONMENT --track --assign alerts_df

FROM alert
WHERE metadata.severity <= 0.6
EARLIEST='2023-09-12T12:47:00'
LATEST='2023-09-12T13:05:00'

**Taegis Search Results**

ID: *5b28d368-694e-41db-a9c6-82885c4f6890*



|Region          |Tenant             |Service          |Status          |Num. Total                          |Num. Returned                          |Link                   |
|----------------|-------------------|-----------------|----------------|------------------------------------|---------------------------------------|:----------------------|
|foxtrot|145483|alerts|OK|75|75|https://foxtrot.taegis.secureworks.com/share/5f4c5f85-94e7-4d11-9d9d-97be4641c38d|


Taegis Magic will display markdown tables showing the results from a command.
In this case, the markdown table shows that the query completed successfully and had 75 results.
It also includes a "share link", which is a smart hyperlink that opens this query inside the Taegis web interface.

Since we passed in the argument `--assign alerts_df`, the variable `alerts_df` references the query results as a `pd.DataFrame`.
Taegis Magic attempts to flatten nested JSON fields to make it easier to reference the columns.
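
As a sketch of what flattening looks like, `pandas.json_normalize` turns nested dictionaries into dotted column names such as `metadata.severity`. The record below is invented for illustration, and Taegis Magic's own flattening logic may differ in detail.

```python
import pandas as pd

# Invented, alert-like record with a nested "metadata" object.
records = [
    {
        "id": "alert-1",
        "status": "OPEN",
        "metadata": {"severity": 0.5, "title": "Example alert title"},
    }
]

# Nested keys become dotted column names.
flat_df = pd.json_normalize(records)
print(sorted(flat_df.columns))
```

Because the flattened names contain dots, they are referenced with bracket syntax, for example `flat_df["metadata.severity"]`, rather than attribute access.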

Here is an overview of the DataFrame returned by the previous Taegis Magic command:

In [4]:
alerts_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 75 entries, 0 to 74
Data columns (total 60 columns):
 #   Column                                   Non-Null Count  Dtype  
---  ------                                   --------------  -----  
 0   id                                       75 non-null     object 
 1   group_key                                73 non-null     object 
 2   attack_technique_ids                     75 non-null     object 
 3   tenant_id                                75 non-null     object 
 4   parent_tenant_id                         75 non-null     object 
 5   suppressed                               75 non-null     bool   
 6   resolution_reason                        75 non-null     object 
 7   tags                                     75 non-null     object 
 8   sensor_types                             75 non-null     object 
 9   visibility                               75 non-null     object 
 10  suppression_rules                        23 non-null

Taegis alerts and events are laden with security-relevant information across their schematized fields.

We can slice and dice the results using the capabilities afforded by `pandas` to find rows of interest.

In [5]:
alerts_df.groupby(["status", "metadata.severity", "metadata.title"])["id"].nunique()

status                 metadata.severity  metadata.title                                                                                                   
ResolutionStatus.OPEN  0.00               RESEARCH: Active Directory Enumeration with Powershell ADSI Searcher (script block)                                   3
                                          RESEARCH: Cleartext Password Storage Enabled (32 bit registry)                                                        1
                                          RESEARCH: Discovering Network Information of Localhost                                                                2
                                          RESEARCH: Malicious Interaction with Volume Shadow Copy Backups                                                       1
                                          RESEARCH: PowerShell Activity Involves Scheduled Task                                                                34
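
The grouping above counts distinct alert IDs per status, severity, and title. The following minimal sketch, using an invented DataFrame with the same column names, shows how `nunique()` differs from a plain row count when an ID appears more than once.

```python
import pandas as pd

# Invented alerts; note that alert "a2" appears twice.
toy_alerts = pd.DataFrame(
    {
        "id": ["a1", "a2", "a2", "a3"],
        "status": ["OPEN", "OPEN", "OPEN", "OPEN"],
        "metadata.severity": [0.0, 0.5, 0.5, 0.5],
        "metadata.title": ["Title A", "Title B", "Title B", "Title B"],
    }
)

group_cols = ["status", "metadata.severity", "metadata.title"]

# nunique() counts distinct alert IDs, so the duplicated "a2" is counted once.
unique_counts = toy_alerts.groupby(group_cols)["id"].nunique()

# size() counts rows, so the duplicated "a2" is counted twice.
row_counts = toy_alerts.groupby(group_cols)["id"].size()

print(unique_counts)
print(row_counts)
```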
                                  

Since we are interested in threats that tamper with security software running as Windows services, let's filter the DataFrame to alerts that reference Windows Defender AV or the Windows native `sc.exe` service control executable.

In [6]:
# Escape the dot so that it matches a literal "." rather than any character.
possible_service_manipulation = alerts_df[alerts_df["metadata.title"].str.contains(r"Windows Defender|sc\.exe")]
possible_service_manipulation[["id", "tenant_id", "metadata.severity", "metadata.title"]]

|    | id | tenant_id | metadata.severity | metadata.title |
| -- | -- | --------- | ----------------- | -------------- |
| 58 | alert://priv:event-filter:145483:1694523157283... | 145483 | 0.5 | Windows Defender Service Deleted |
| 60 | alert://priv:event-filter:145483:1694523109278... | 145483 | 0.0 | RESEARCH: Service Deleted Manually using sc.exe |
| 62 | alert://priv:event-filter:145483:1694523109287... | 145483 | 0.5 | Windows Defender Bypass - Disable Security Not... |
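
One detail worth knowing when filtering titles this way: `str.contains` interprets its pattern as a regular expression by default, so an unescaped `.` matches any character, and missing values need explicit handling. The sketch below uses invented titles to show the difference that escaping makes.

```python
import re

import pandas as pd

# Invented titles; the last two should NOT match a search for "sc.exe".
titles = pd.Series(
    [
        "Service Deleted Manually using sc.exe",
        "Windows Defender Service Deleted",
        "Suspicious use of scaexe helper",
        None,
    ]
)

# Unescaped: "." matches any character, so "scaexe" also matches.
loose = titles.str.contains("Windows Defender|sc.exe", na=False)

# Escaped: only a literal "sc.exe" matches.
pattern = "|".join(re.escape(term) for term in ["Windows Defender", "sc.exe"])
strict = titles.str.contains(pattern, na=False)

print(loose.tolist())
print(strict.tolist())
```

Passing `na=False` treats missing titles as non-matches, which keeps the result usable as a boolean mask.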


#### Stage Alerts Evidence

If these alerts are suitably suspicious and we want to add them to a (future) Taegis investigation, we can use the `%taegis investigations evidence stage alerts` subcommand. It reads the `possible_service_manipulation` DataFrame and "stages" these alerts for inclusion when we create the investigation at the end of the notebook:

In [7]:
%taegis investigations evidence stage alerts possible_service_manipulation


**Investigation ID**: NEW

| Action | Evidence Type | Staged Before Change | Staged After Change | Difference |
| ------ | ------------- | -------------------- | ------------------- | ---------- |
| stage | InvestigationEvidenceType.Alert | 0 | 3 | 3 |


### Events Queries

Now let's query Taegis for process events related to the execution of `sc.exe`.
Notice that the magic command is `%%taegis events search` rather than `%%taegis alerts search`.

In [8]:
%%taegis events search --tenant $TAEGIS_TENANT_ID --region $TAEGIS_ENVIRONMENT --track --assign sc_processes

FROM process
WHERE image_path contains 'sc.exe'
EARLIEST='2023-09-12T12:47:00'
LATEST='2023-09-12T13:05:00'

**Taegis Search Results**

ID: *52a4a04c-2397-48ae-ba82-f2f7e88c059b*



|Region          |Tenant             |Service          |Status          |Num. Total                          |Num. Returned                          |Link                   |
|----------------|-------------------|-----------------|----------------|------------------------------------|---------------------------------------|:----------------------|
|foxtrot|145483|events|SUCCEEDED|N/A|172|https://foxtrot.taegis.secureworks.com/share/4782e5f2-f716-466f-bb37-35bb1ea5d956|


Unsurprisingly, there are many more events than alerts.
We can group by the parent image path, image path, and command-line fields to look for evidence of `sc stop` and `sc delete` activity.

In [9]:
sc_stop_or_delete = sc_processes[sc_processes.commandline.str.contains("stop|delete")]
sc_stop_or_delete.groupby(["parent_image_path", "image_path", "commandline"])["resource_id"].size()

parent_image_path                                 image_path                                       commandline        
\Device\HarddiskVolume2\Windows\System32\cmd.exe  \Device\HarddiskVolume2\Windows\System32\sc.exe  sc delete SDRSV        12
                                                                                                   sc delete Sense        12
                                                                                                   sc delete WerSvc       12
                                                                                                   sc delete WinDefend    12
                                                                                                   sc delete mpssvc       12
                                                                                                   sc delete wscsvc       12
                                                                                                   sc delete wuauserv     12
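
When many services are involved, it can help to split each command line into an action and a service name and then tabulate them. The sketch below is one possible approach on invented command lines. The regular expression assumes the simple `sc <action> <service>` shape shown above, and would need adjusting for full paths, quoting, or different letter casing.

```python
import pandas as pd

# Invented command lines in the same "sc <action> <service>" shape.
toy_processes = pd.DataFrame(
    {
        "commandline": [
            "sc stop WinDefend",
            "sc delete WinDefend",
            "sc delete WinDefend",
            "sc stop wscsvc",
            "sc query WinDefend",
        ]
    }
)

# Capture the action and service name; rows that do not match become NaN.
parts = toy_processes["commandline"].str.extract(
    r"^sc\s+(?P<action>stop|delete)\s+(?P<service>\S+)"
)
parts = parts.dropna()

# Rows are services, columns are actions, values are process counts.
summary = pd.crosstab(parts["service"], parts["action"])
print(summary)
```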
      

#### Stage Events Evidence

Once again, we can filter down to the rows of interest and stage them for inclusion in the investigation that we will create at the end of the tutorial.
Let's stage processes related to tampering with the `WinDefend` service:

In [10]:
manipulating_windefend = sc_stop_or_delete[sc_stop_or_delete.commandline.str.contains("WinDefend")]
manipulating_windefend[["resource_id",  "hostname", "image_path", "commandline"]].head()

|     | resource_id | hostname | image_path | commandline |
| --- | ----------- | -------- | ---------- | ----------- |
| 144 | event://priv:scwx.process:145483:1694527486000... | Win10 | `\Device\HarddiskVolume2\Windows\System32\sc.exe` | sc stop WinDefend |
| 145 | event://priv:scwx.process:145483:1694523544000... | Win10 | `\Device\HarddiskVolume2\Windows\System32\sc.exe` | sc stop WinDefend |
| 146 | event://priv:scwx.process:145483:1694523155000... | Win10 | `\Device\HarddiskVolume2\Windows\System32\sc.exe` | sc stop WinDefend |
| 147 | event://priv:scwx.process:145483:1694525162000... | Win10 | `\Device\HarddiskVolume2\Windows\System32\sc.exe` | sc stop WinDefend |
| 148 | event://priv:scwx.process:145483:1694526292000... | Win10 | `\Device\HarddiskVolume2\Windows\System32\sc.exe` | sc stop WinDefend |


In [11]:
%taegis investigations evidence stage events manipulating_windefend


**Investigation ID**: NEW

| Action | Evidence Type | Staged Before Change | Staged After Change | Difference |
| ------ | ------------- | -------------------- | ------------------- | ---------- |
| stage | InvestigationEvidenceType.Event | 0 | 24 | 24 |


## Step 3: Document Key Findings

The _key findings_ section of a Taegis investigation contains a human-readable summary of the investigation in markdown text.

We can use the built-in `%%writefile` cell magic to write cell content to a file named `key-findings.md` in the current working directory.

> In this tutorial, we will write a short summary of our key findings for demonstration purposes.
> In the next tutorial, we will show how to use the markdown content of the notebook as the key findings of an investigation.

In [12]:
%%writefile key-findings.md

# Executive Summary

We hunted for evidence of unactioned and/or undetected manipulation of the Windows Defender service.

# Findings

We found evidence of unauthorized stop and delete commands for the `WinDefend` service.
This may be evidence of a malicious actor or software.

Writing key-findings.md
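
Outside of a notebook cell, or when the findings text is built from analysis results, the same kind of file can be written with plain Python. The sketch below is one possible equivalent using `pathlib`. The `staged_events` count is a placeholder value rather than a result from this hunt, and the file is written into a temporary directory to avoid overwriting the real `key-findings.md`.

```python
import tempfile
from pathlib import Path

# Placeholder value for illustration only.
staged_events = 3

findings = "\n".join(
    [
        "# Executive Summary",
        "",
        "We hunted for evidence of unactioned and/or undetected manipulation "
        "of the Windows Defender service.",
        "",
        "# Findings",
        "",
        f"We staged {staged_events} process events that reference the `WinDefend` service.",
        "",
    ]
)

# Write to a temporary directory so the tutorial's own file is left untouched.
output_path = Path(tempfile.mkdtemp()) / "key-findings.md"
output_path.write_text(findings, encoding="utf-8")

print(output_path.read_text(encoding="utf-8"))
```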


## Step 4: Create Investigation

Once analysis has concluded, we can create a new Taegis investigation that contains our key findings and is linked to the alert and event evidence that we staged in the previous steps.

Once again, we will use Taegis Magic to interact with the Taegis APIs.
The `%taegis investigations create` command allows us to create the investigation.
Since this is a "threat hunt", we want to make sure that we pass in `--type THREAT_HUNT` so that it is categorized correctly.

> Note the options available using the `--help`/`-h` flags.

In [13]:
%taegis investigations create -h

usage: taegis_magic_parser [-h] [--assign NAME | --append NAME]
                           [--display NAME] [--cache]

options:
  -h, --help      show this help message and exit
  --assign NAME   Assign results as pandas DataFrame to NAME
  --append NAME   Append results as pandas DataFrame to NAME
  --display NAME  Display NAME as markdown table
  --cache         Save output to cache / Load output from cache (if present)





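> Since the `taegis investigations create` command takes many arguments, it is easier to read if we break it across multiple lines.
> Note, however, that `$VARIABLE` string expansion only works on the "first line", so we end each line with a continuation character (`\`) to have the whole command interpreted as a single line.

The cell below first runs `%taegis investigations search-queries stage` so that the queries tracked earlier with `--track` are staged alongside the alert and event evidence, and then creates the investigation.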

In [14]:
%taegis investigations search-queries stage
%taegis investigations create \
--title "Tutorial 02: Exploratory Data Analysis" \
--key-findings key-findings.md \
--priority MEDIUM \
--type THREAT_HUNT \
--status OPEN \
--assignee-id @customer \
--region $TAEGIS_ENVIRONMENT \
--tenant $TAEGIS_TENANT_ID

| id | tenant_id | query | results_returned | total_results | inserted_time |
| -- | --------- | ----- | ---------------- | ------------- | ------------- |
| 52a4a04c-2397-48ae-ba82-f2f7e88c059b | 145483 | `\nFROM process\nWHERE image_path contains 'sc.exe'\nEARLIEST='2023-09-12T12:47:00'\nLATEST='2023-09-12T13:05:00'\n` | 172 | -1 | 2023-09-13T16:00:22Z |
| 5b28d368-694e-41db-a9c6-82885c4f6890 | 145483 | `\nFROM alert\nWHERE metadata.severity <= 0.6\nEARLIEST='2023-09-12T12:47:00'\nLATEST='2023-09-12T13:05:00'\n` | 75 | 75 | 2023-09-13T16:00:06Z |



| Investigation ID  | Short ID                | Title                | Type                | Share Link           |
| ----------------- | ----------------------- | -------------------- | ------------------- | -------------------- |
| f41b68da-cc7e-48cf-9a70-31fe640f7944 | INV00004 | Tutorial 02: Exploratory Data Analysis | InvestigationType.THREAT_HUNT | https://foxtrot.taegis.secureworks.com/share/cfde1680-2773-4791-90ee-610d926a929c |


Now navigate to the Taegis tenant and environment that you specified earlier in the tutorial.
You should now see a new investigation titled `Tutorial 02: Exploratory Data Analysis` that contains the relevant queries, evidence, and key findings!

## Wrap-Up

In this tutorial, we did the following:

- Loaded the Taegis Magic extension into the notebook
- Used [Taegis Magic](https://github.com/secureworks/taegis-magic) to query relevant security data for a specific threat
- Leveraged [`pandas`](https://pandas.pydata.org) to analyze query results and find evidence of a threat
- Populated the key findings section of a Taegis investigation using markdown text
- Created a threat hunting Taegis investigation that contains the key findings and links to the relevant alert and event evidence

In the next tutorial, we will use these concepts and techniques to formalize a threat hunting procedure that can be repeated consistently and at scale.