# Example Notebook for RAG (Retrieval-Augmented Generation) Agent Usage

### Query the RAG agent using the `%%ask` cell magic

In [16]:
# To load the extension for the first time in a session:
# %load_ext msticpy.aiagents.mp_docs_rag_magic
# To reload it if it is already loaded:
%reload_ext msticpy.aiagents.mp_docs_rag_magic

In [2]:
%%ask 
What are the three things that I need to connect to Microsoft Sentinel Query Provider?

2024-07-30 15:48:19,414 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - Use the existing collection `MSTICpy_Docs_2.12.0`.
2024-07-30 15:48:27,518 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - Found 384 chunks.



**Question**: What are the three things that I need to connect to Microsoft Sentinel Query Provider?


**Answer**: To connect to the Microsoft Sentinel Query Provider, you need the following three things:

1. A `QueryProvider` instance.
2. The data environment string ("MSSentinel" for Microsoft Sentinel).
3. A connection string or authentication parameters.

Sources: C:\Users\t-egarcia\Documents\Forked MSTICpy Repo\msticpy\docs\source\data_acquisition\DataProviders.rst

In [3]:
%%ask
How do I connect to the M365 Defender query provider?


**Question**: How do I connect to the M365 Defender query provider?


**Answer**: To connect to the M365 Defender query provider, you need to follow these steps:

1. Ensure your connection details are specified in the `msticpyconfig.yaml` file.

2. Create a `QueryProvider` instance for M365 Defender.

3. Call the `connect()` method on the instance.

Here's an example:

```python
from msticpy.data import QueryProvider

# Create a QueryProvider instance
mdatp_prov = QueryProvider("M365D")

# Connect to the M365 Defender instance using the configured details
mdatp_prov.connect()
```

If you have multiple instances configured, specify the instance name when calling `connect()`:

```python
mdatp_prov.connect(instance="Tenant2")
```

If you prefer to pass connection parameters directly, use keyword arguments:

```python
# Collect credentials
ten_id = input('Tenant ID')
client_id = input('Client ID')
client_secret = input('Client Secret')

# Create a QueryProvider instance
mdatp_prov = QueryProvider('M365D')

# Connect using collected credentials
mdatp_prov.connect(tenant_id=ten_id, client_id=client_id, client_secret=client_secret)
```

Alternatively, you can use a connection string:

```python
# Define a connection string
conn_str = (
    "tenant_id='243bb6be-4136-4b64-9055-fb661594199a'; "
    "client_id='a5b24e23-a96a-4472-b729-9e5310c83e20'; "
    "client_secret='[PLACEHOLDER]'"
)

# Create a QueryProvider instance
mdatp_prov = QueryProvider('M365D')

# Connect using the connection string
mdatp_prov.connect(conn_str)
```

Sources: C:\Users\t-egarcia\Documents\Forked MSTICpy Repo\msticpy\docs\source\data_acquisition\DataProv-MSDefender.rst

In [4]:
%%ask
What do I need to add to my msticpyconfig.yaml config for the Azure Resource Graph query provider?


**Question**: What do I need to add to my msticpyconfig.yaml config for the Azure Resource Graph query provider?


**Answer**: To add Azure Resource Graph to your `msticpyconfig.yaml` configuration, include the following under the `Azure` section:

```yaml
Azure:
  auth_methods:
  - cli
  - interactive
  cloud: global
```

For more information on configuring `msticpyconfig.yaml`, refer to the MSTICPy documentation.

Sources: C:\Users\t-egarcia\Documents\Forked MSTICpy Repo\msticpy\docs\source\data_acquisition\ResourceGraphDriver.rst

#### A response of `UPDATE CONTEXT` indicates that the agents cannot answer the query with the information retrieved by the RAG agent.

In [5]:
%%ask
Does the Splunk query provider support device code authentication?


**Question**: Does the Splunk query provider support device code authentication?


**Answer**: UPDATE CONTEXT
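
If you post-process answers, the condition described in the heading above can be detected with a simple string check. This is a minimal sketch, assuming you already hold the answer text as a plain Python string; how you capture it from the magic is not shown in this notebook. The helper name `needs_more_context` is hypothetical and not part of MSTICpy.

```python
def needs_more_context(answer: str) -> bool:
    """Return True if the answer is the 'UPDATE CONTEXT' sentinel.

    Hypothetical helper: accepts either the spaced or the underscored
    spelling, in any letter case, with surrounding whitespace.
    """
    normalized = answer.strip().upper().replace("_", " ")
    return normalized.startswith("UPDATE CONTEXT")


# Example: decide whether to rephrase the question and ask again
if needs_more_context("UPDATE CONTEXT"):
    print("No answer found in the retrieved documents; try rephrasing.")
```

A match means the retrieved documentation chunks did not contain enough information, so rewording the question or making it more specific is a reasonable next step.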

In [6]:
%%ask 
How can I plot IP addresses in this dataframe on a map?


**Question**: How can I plot IP addresses in this dataframe on a map?


**Answer**: To plot IP addresses in a DataFrame on a map using MSTICpy's FoliumMap, you can use the `mp_plot.folium_map` pandas accessor. Here's an example:

```python
# Plotting IP addresses using the mp_plot.folium_map accessor
geo_loc_df.mp_plot.folium_map(ip_column="IPAddress")
```

This will display an interactive map with markers based on the IP addresses in the "IPAddress" column of your DataFrame.

Sources: C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\visualization\\FoliumMap.rst

In [7]:
%%ask 
How do I create a new custom data provider with msticpy?


**Question**: How do I create a new custom data provider with msticpy?


**Answer**: To create a new custom data provider with MSTICpy, follow these main steps:

1. **Write the driver class:** Derive it from `DriverBase` and implement the methods `__init__`, `connect`, `query`, and optionally `query_with_results`.
2. **Customize the driver (optional):** Expose attributes via `QueryProvider`, and implement custom parameter formatting and query parameter substitution if needed.
3. **Register the driver:** Update the `DataEnvironment` enum and add an entry to the driver dynamic load table.
4. **Add queries:** Create a folder named after your `DataEnvironment` and add your query files there.
5. **Add settings definition:** Define settings in a YAML configuration file.
6. **Create documentation:** Document the configuration and use of the data provider.
7. **Create unit tests:** Add unit tests using mocks to simulate service responses.

For detailed guidance on these steps, refer to the provided MSTICpy documentation related to data providers.

Sources: WritingDataProviders.rst, PluginFramework.rst, ExtendingMsticpy.rst

In [9]:
%%ask 
How do I list which TI providers are currently enabled?


**Question**: How do I list which TI providers are currently enabled?


**Answer**: ### Step 1: Intent
The user's intent is to get help with **question answering**.

### Step 2: Answer
To list which Threat Intelligence (TI) providers are currently enabled in MSTICpy, you can inspect the configuration typically found in the `msticpyconfig.yaml` file under the `TIProviders` section. This configuration file determines which providers are set up and whether they are marked as primary/secondary.

Sources: `C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\extending\\WritingTIAndContextProviders.rst`

In [10]:
%%ask 
How do I lookup threat intelligence for multiple IP addresses at once?


**Question**: How do I lookup threat intelligence for multiple IP addresses at once?


**Answer**: Step 1: User's intent is to generate code for performing threat intelligence lookups for multiple IP addresses at once.

Step 2:
```python
from msticpy.context.ip_utils import ip_whois

# List of IP addresses to lookup
ip_list = ["123.1.2.3", "124.5.6.7"]

# Performing Whois lookup for multiple IP addresses
whois_data = ip_whois(ip_list)
print(whois_data)
```

Sources: C:\Users\t-egarcia\Documents\Forked MSTICpy Repo\msticpy\docs\source\data_acquisition\IPWhois.rst

In [11]:
%%ask 
How do I use pivot functions?


**Question**: How do I use pivot functions?


**Answer**: To use pivot functions in MSTICpy, you have two primary options: creating persistent pivot function definitions in YAML files or adding ad hoc pivot functions directly in code. Here's a brief overview of both methods:

**1. Persistent Pivot Function Definitions**

- Define your pivot function properties in a YAML file with a top-level element `pivot_providers`.
- Example YAML definition:

```yaml
pivot_providers:
  who_is:
    src_module: msticpy.context.ip_utils
    src_func_name: get_whois_df
    func_new_name: whois
    input_type: dataframe
    entity_map:
      IpAddress: Address
    func_df_param_name: data
    func_df_col_param_name: ip_column
    func_out_column_name: query
    func_static_params:
      all_columns: True
      show_progress: False
    func_input_value_arg: ip_address
```

- Load and register the definition using:

```python
from msticpy.init.pivot_core.pivot import Pivot
Pivot.register_pivot_providers(pivot_reg_path=path_to_your_yaml, namespace=globals(), def_container="my_container", force_container=True)
```

**2. Ad Hoc Pivot Functions in Code**

- Add a function as a pivot using the `add_pivot_function` method:

```python
def my_func(input: str):
    return input.upper()

Pivot.add_pivot_function(
    func=my_func,
    container="change_case",
    input_type="value",
    entity_map={"Host": "HostName"},
    func_input_value_arg="input",
    func_new_name="upper_name",
)
```

- Alternatively, use the `PivotRegistration` class:

```python
from msticpy.init.pivot_core.pivot_register import PivotRegistration

def my_func(input: str):
    return input.upper()

piv_reg = PivotRegistration(
    input_type="value",
    entity_map={"Host": "HostName"},
    func_input_value_arg="input",
    func_new_name="upper_name"
)
Pivot.add_pivot_function(my_func, piv_reg, container="change_case")
```

**Running Pivots in DataFrame Pipelines:**

```python
(
    my_df
    .query("UserCount > 1")
    .mp_pivot.run(IpAddress.util.whois, column="Ioc")
    .drop_duplicates()
)
```

- Use `mp_pivot.run` to integrate pivot functions into DataFrame processing pipelines.
- Join input and output DataFrames with the `join` parameter in `mp_pivot.run`.

**Debugging Tools:**

- `mp_pivot.display` for intermediate results.
- `mp_pivot.tee` for creating snapshots.
- `mp_pivot.tee_exec` for executing intermediate operations (e.g., plotting).

Sources: C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\extending\\PivotFunctions.rst, C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\data_analysis\\PivotFunctions.rst, C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\api\\msticpy.init.pivot.rst

In [12]:
%%ask 
Which columns do I need in a dataframe to plot process trees?


**Question**: Which columns do I need in a dataframe to plot process trees?


**Answer**: To plot process trees, the required columns in a DataFrame are typically:

1. `ParentProcessName`
2. `Process`

Additional attributes such as `SubjectUserName`, `SubjectDomainName`, `SubjectLogonId`, `NewProcessName`, `CommandLine`, and `TimeGenerated` can be used for more detailed visualization and analysis.

Sources: C:\Users\t-egarcia\Documents\Forked MSTICpy Repo\msticpy\docs\source\visualization\NetworkGraph.rst

In [13]:
%%ask 
What kind of visualizations does msticpy support?


**Question**: What kind of visualizations does msticpy support?


**Answer**: MSTICPy supports various visualizations including interactive timelines, process trees, multi-dimensional Morph Charts, data viewers, matrix plots, network plots, and several others listed under the `msticpy.vis` package.

Sources: C:\Users\t-egarcia\Documents\Forked MSTICpy Repo\msticpy\docs\source\index.rst, C:\Users\t-egarcia\Documents\Forked MSTICpy Repo\msticpy\docs\source\visualization\MorphCharts.rst

In [14]:
%%ask 
How do I add a new query for Microsoft 365 Defender to msticpy?


**Question**: How do I add a new query for Microsoft 365 Defender to msticpy?


**Answer**: To add a new query for Microsoft 365 Defender (M365D) to MSTICPy, you should use the `QueryProvider` class. Here's a step-by-step guide on how to achieve it:

1. **Initialize the `QueryProvider` for M365D**:
   ```python
   from msticpy.data import QueryProvider

   mdatp_prov = QueryProvider("M365D")
   ```

2. **Connect to the M365 Defender API**:
   ```python
   mdatp_prov.connect()
   ```

3. **Add your new query**:
   You can add new queries to the query store of `QueryProvider`. Here’s an example of how to define and add a new query:
   ```python
   new_query = """
   DeviceEvents
   | where ActionType == "FileCreated"
   | limit 10
   """
   mdatp_prov.add_query("GetRecentFileCreatedEvents", new_query)
   ```

4. **Run the newly added query**:
   ```python
   results = mdatp_prov.exec_query("GetRecentFileCreatedEvents")
   print(results)
   ```

In summary, you need to instantiate a `QueryProvider` object for M365D, connect to the API, add the new query, and then execute the query.

Sources: C:\Users\t-egarcia\Documents\Forked MSTICpy Repo\msticpy\docs\source\data_acquisition\DataProv-MSDefender.rst

In [15]:
%%ask
Which msticpy module contains the code related to visualizing network graphs?


**Question**: Which msticpy module contains the code related to visualizing network graphs?


**Answer**: The MSTICpy module that contains the code related to visualizing network graphs is `msticpy.vis.network_plot`.

Sources: C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\api\\msticpy.vis.network_plot.rst
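
The `Sources:` lines in the answers above mix absolute Windows paths (some with doubled backslashes), backtick-wrapped paths, and bare file names. The following is a minimal sketch, using only the standard library, for reducing such a line to the names of the documentation files. `source_names` is a hypothetical helper written for this notebook, not part of MSTICpy, and it assumes that individual paths are separated by commas.

```python
from pathlib import PureWindowsPath


def source_names(sources_line: str) -> list[str]:
    """Extract bare file names from a 'Sources: ...' line.

    Returns an empty list if the line has no 'Sources:' label.
    """
    # Keep only the text after the label
    _, _, paths = sources_line.partition("Sources:")
    names = []
    for item in paths.split(","):
        # Remove surrounding whitespace and Markdown backticks
        item = item.strip().strip("`")
        if item:
            # PureWindowsPath parses backslash-separated paths on any OS
            names.append(PureWindowsPath(item).name)
    return names


line = r"Sources: C:\Users\t-egarcia\Documents\Forked MSTICpy Repo\msticpy\docs\source\index.rst"
print(source_names(line))
```

Reducing each path to its file name makes it easier to compare the cited documents across answers, regardless of where the documentation was indexed.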