In [2]:
import sys
import msticpy as mp

sys.path.append(r'../src')

%load_ext autoreload
%autoreload 2

In [2]:
import logging
logging.basicConfig(level=logging.INFO)

## Doc search examples

In [23]:
from search_docs import RTDocSearch


In [None]:
# Build vectorstore from HTML (skip if already built)
RTDocSearch.create_vectorstore("e:/src/msticpy/docs/build/html", "./mp-rtd-vs.faiss_index")
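
The "skip if already built" step can be automated with a path check. The following is a minimal sketch, assuming the index is written to the path passed as the second argument; `build_if_missing` is a hypothetical helper (not part of `search_docs`), and the build step is passed in as a callable so the sketch runs without the vector store.

```python
from pathlib import Path
from typing import Callable


def build_if_missing(index_path: str, builder: Callable[[], None]) -> bool:
    """Run `builder` only when nothing exists at `index_path`.

    Returns True if the builder was called, False if the build was skipped.
    Hypothetical helper for illustration; not part of search_docs.
    """
    if Path(index_path).exists():
        return False
    builder()
    return True
```

In this notebook the builder would be something like `lambda: RTDocSearch.create_vectorstore("e:/src/msticpy/docs/build/html", "./mp-rtd-vs.faiss_index")`.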

In [24]:
# Load pre-created vectorstore
doc_search = RTDocSearch("./mp-rtd-vs.faiss_index", model_name="gpt-4")

In [25]:

doc_search.ask("How do I plot a timeline from a dataframe?")

To plot a timeline from a DataFrame, you can use the `mp_plot.timeline()` function which is implemented as a pandas accessor. This allows you to plot directly from the DataFrame. Here is an example:

```python
df.mp_plot.timeline(
   group_by="Account",
   source_columns=["NewProcessName", "ParentProcessName"],
   yaxis=True
);
```

In this example, `df` is your DataFrame. The `group_by` parameter is used to define the way that the data is grouped. The `source_columns` parameter is used to specify the columns from the DataFrame that you want to include in the plot. The `yaxis` parameter is set to `True` to enable the y-axis.

Please note that the trailing semicolon is not mandatory. It is used here to prevent Jupyter from showing the return value from the function.

In [26]:
doc_search.ask("How do I find threat intelligence reports for an IP address?")

To find threat intelligence reports for an IP address, you can use the `lookup_ip` function from the `IpAddress` class in MSTICPy. This function queries all loaded providers that support the observable type (in this case, an IP address). 

Here is an example of how to use this function:

```python
from msticpy.datamodel.entities import IpAddress

# Define the IP addresses you want to look up
iocs = ['162.244.80.235', '185.141.63.120', '82.118.21.1', '85.93.88.165']

# Use the lookup_ip function to find threat intelligence reports
results = IpAddress.ti.lookup_ip(iocs)
```

In this example, `iocs` is a list of IP addresses for which you want to find threat intelligence reports. The `lookup_ip` function returns a DataFrame with the results.

If you want to specify which providers to query, you can use the `providers` parameter to specify a list of provider names. For example:

```python
results = IpAddress.ti.lookup_ip(iocs, providers=['RiskIQ', 'VirusTotal'])
```

In this case, the function will only query the 'RiskIQ' and 'VirusTotal' providers.
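
Before passing a list of IoCs to any lookup, it can help to drop entries that are not valid public IP addresses. The sketch below uses only the standard-library `ipaddress` module; `valid_public_ips` is a hypothetical helper and the extra test values (`10.0.0.1`, `not-an-ip`) are illustrative additions, not part of the answer above.

```python
import ipaddress


def valid_public_ips(candidates):
    """Return the candidates that parse as globally routable IP addresses."""
    valid = []
    for item in candidates:
        try:
            addr = ipaddress.ip_address(item.strip())
        except ValueError:
            continue  # not an IP address at all
        if addr.is_global:
            valid.append(str(addr))
    return valid


iocs = ["162.244.80.235", "185.141.63.120", "82.118.21.1", "85.93.88.165"]
# The private address and the malformed entry are filtered out
clean_iocs = valid_public_ips(iocs + ["10.0.0.1", "not-an-ip"])
```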

In [28]:
doc_search.ask("How do I query log data from Microsoft Sentinel?")

To query log data from Microsoft Sentinel, you can use the MSTICPy package. Here are the steps:

1. **Create a Query Provider**: You need to create a Query Provider for Microsoft Sentinel. This is done using the `QueryProvider` class in MSTICPy.

```python
from msticpy.data import QueryProvider
qry_prov = QueryProvider("LogAnalytics")
```

2. **Connect to a Data Environment**: You need to connect to your Microsoft Sentinel workspace. You can do this by using the `connect` method of the `QueryProvider` object you created. You will need to provide your tenant ID and workspace ID.

```python
qry_prov.connect("<tenant_id>", "<workspace_id>")
```

3. **Run a Query**: Once connected, you can run a query using the `query` method of the `QueryProvider` object. You can either run a pre-defined query or an ad hoc query.

```python
# Running a pre-defined query
data = qry_prov.SecurityAlert.list_alerts(start="2022-01-01", end="2022-01-31")

# Running an ad hoc query
data = qry_prov.exec_query('''
    SecurityAlert
    | where TimeGenerated > ago(30d)
    | where AlertName has "Malware"
''')
```

Please note that the actual query string will depend on the specific log data you want to retrieve from Microsoft Sentinel. The above examples are just illustrative.

Remember to replace `<tenant_id>` and `<workspace_id>` with your actual Azure tenant ID and Sentinel workspace ID.
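
The ad hoc query string in step 3 can also be built from parameters. This is a minimal sketch, assuming the same `SecurityAlert` query shape as above; `build_alert_query` is a hypothetical helper, not a MSTICPy function.

```python
def build_alert_query(days: int, keyword: str) -> str:
    """Build the ad hoc SecurityAlert query for a look-back window and keyword.

    Hypothetical helper for illustration only.
    """
    if days <= 0:
        raise ValueError("days must be a positive integer")
    # Escape backslashes and double quotes so the keyword stays inside the KQL string literal
    safe_keyword = keyword.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SecurityAlert\n"
        f"| where TimeGenerated > ago({days}d)\n"
        f'| where AlertName has "{safe_keyword}"'
    )


query = build_alert_query(30, "Malware")
```

The resulting string could then be passed to `qry_prov.exec_query(query)` as in the answer above.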

In [29]:
doc_search.ask("Show me how to use msticpy pivot functions?")

To use MSTICPy pivot functions, you first need to import the necessary modules and initialize the pivot library. Here's a basic example of how to use pivot functions with the `IpAddress` entity:

```python
from msticpy.datamodel.entities import IpAddress
from msticpy.init_notebook import init_notebook

# Initialize the pivot library
init_notebook(namespace=globals())

# Create an IpAddress entity
ip_entity = IpAddress(Address="157.53.1.1")

# Use pivot functions on the IpAddress entity
ip_type = ip_entity.util.ip_type()
whois_info = ip_entity.util.whois()
geolocation = ip_entity.util.geoloc()

# Print the results
print(ip_type)
print(whois_info)
print(geolocation)
```

In this example, we're using the `ip_type`, `whois`, and `geoloc` pivot functions on an `IpAddress` entity. These functions return information about the IP address, such as its type (public or private), whois information, and geolocation data.

Remember, the pivot functions are attached to the entities most relevant to that operation. So, if you want to do things with an IP address, just load the `IpAddress` entity and browse its methods.

For more detailed examples and explanations, you can check out the following notebooks:
- [PivotFunctions-Introduction](https://github.com/microsoft/msticpy/blob/main/docs/notebooks/PivotFunctions-Introduction.ipynb)
- [PivotFunctions](https://github.com/microsoft/msticpy/blob/main/docs/notebooks/PivotFunctions.ipynb)

These notebooks illustrate the use of pivot functions and cover most of the use cases.

## Code search - not as successful!

In [30]:
from search_code import CodeSearch

# Build the code vectorstore (uncomment on first run)
# CodeSearch.create_vectorstore("e:/src/msticpy/", "./mp-code-vs.faiss_index")

In [31]:
%xmode verbose
code_search = CodeSearch("./mp-code-vs.faiss_index")


Exception reporting mode: Verbose


In [32]:
code_search.ask("What are the parameters for folium?")

The `FoliumMap` class in the provided context has several parameters that can be used to customize the map. Here are the parameters:

- `icon_map` (optional): This is a mapping dictionary or function. It is `None` by default.
- `popup_columns` (optional): This is a list of columns to use for the popup text. It is `None` by default.
- `tooltip_columns` (optional): This is a list of columns to use for the tooltip text. It is `None` by default.
- `marker_cluster` (optional): This is a boolean value that determines whether to use marker clustering. The default is `True`.
- `default_color` (optional): This is the default color for marker icons. The default is `"blue"`.
- `title` (optional): This is the name of the layer. The default is `'layer1'`.
- `zoom_start` (optional): This is the zoom level of the map. The default is `7`.
- `tiles` (optional): This is a custom set of tiles or tile URL. The default is `None`.

Here is an example of how to use these parameters:

```python
folium_map = FoliumMap(
    title="My Map",
    zoom_start=10,
    tiles="OpenStreetMap",
    marker_cluster=True,
    default_color="red",
    icon_map={"icon": "info-sign"},
    popup_columns=["column1", "column2"],
    tooltip_columns=["column3", "column4"]
)
```

In this example, a `FoliumMap` object is created with a title of "My Map", a zoom level of 10, using the "OpenStreetMap" tiles, with marker clustering enabled, a default marker color of red, an icon map with an "info-sign" icon, and specific columns for the popup and tooltip text.

In [33]:
### Compare the same question with doc search

In [34]:
doc_search.ask("What are the parameters for folium?")

The parameters for the `FoliumMap` class in MSTICPy are:

- `title` (str, optional): Name of the layer (the default is 'layer1')
- `zoom_start` (int, optional): The zoom level of the map (the default is 7)
- `tiles` ([type], optional): Custom set of tiles or tile URL (the default is None)
- `width` (str, optional): Map display width (the default is '100%')
- `height` (str, optional): Map display height (the default is '100%')
- `location` (list, optional): Location to center map on

Here is an example of how to use these parameters with the `FoliumMap` class:

```python
from msticpy.vis.foliummap import FoliumMap

# Create a new FoliumMap instance
folium_map = FoliumMap(location=(47.5982328,-122.331), zoom_start=14, width='50%', height='50%')

# Access the underlying folium map object
type(folium_map.folium_map)
```

You can also use the `FoliumMap` class via a pandas extension method. Here is an example of how to do this:

```python
import pandas as pd

# Assume geo_loc_df is a DataFrame with IP location data
geo_loc_df.mp_plot.folium_map(ip_column="IPAddress")
```

In this example, `mp_plot.folium_map` is a pandas extension method that uses the `FoliumMap` class to plot IP location data from a DataFrame. The `ip_column` parameter specifies the column in the DataFrame that contains the IP addresses to plot.
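
The two answers to the same question list different parameter sets. A quick way to see the overlap is a set comparison; the names below are copied from the two answers in this notebook and are not checked against the MSTICPy source.

```python
# Parameter names reported by each search for the same question
code_search_params = {
    "icon_map", "popup_columns", "tooltip_columns", "marker_cluster",
    "default_color", "title", "zoom_start", "tiles",
}
doc_search_params = {"title", "zoom_start", "tiles", "width", "height", "location"}

shared = code_search_params & doc_search_params
only_code = code_search_params - doc_search_params
only_docs = doc_search_params - code_search_params

print("Both answers:", sorted(shared))
print("Code search only:", sorted(only_code))
print("Doc search only:", sorted(only_docs))
```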

In [35]:

code_search.ask("How do I create a QueryTime widget?")

To create a `QueryTime` widget, you can use the `QueryTime` class constructor. The constructor accepts several parameters, including `origin_time`, `before`, `after`, `label`, `units`, and others. 

Here is an example of how to create a `QueryTime` widget:

```python
from datetime import datetime, timezone

query_time_widget = QueryTime(
    origin_time=datetime.now(timezone.utc),
    before=1,
    after=0,
    label="Set time range for pivot functions.",
    units="day",
)
```

In this example, `origin_time` is set to the current date and time, `before` is set to 1, `after` is set to 0, `label` is set to "Set time range for pivot functions.", and `units` is set to "day".

Please note that this code example is based on the provided context and may require additional imports or setup code to run correctly.
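
The time range implied by `origin_time`, `before`, `after`, and `units` can be sketched with the standard library. This assumes the window runs from `origin_time - before` to `origin_time + after`, both measured in `units`; `time_window` is a hypothetical helper, not the widget's implementation.

```python
from datetime import datetime, timedelta, timezone


def time_window(origin_time, before, after, units="day"):
    """Return (start, end) around origin_time.

    Assumes start = origin - before and end = origin + after, both in `units`.
    """
    unit_args = {"minute": "minutes", "hour": "hours", "day": "days", "week": "weeks"}
    if units not in unit_args:
        raise ValueError(f"unsupported units: {units!r}")
    start = origin_time - timedelta(**{unit_args[units]: before})
    end = origin_time + timedelta(**{unit_args[units]: after})
    return start, end


# Fixed origin so the result is reproducible
origin = datetime(2023, 6, 1, 12, 0, tzinfo=timezone.utc)
start, end = time_window(origin, before=1, after=0, units="day")
```

With `before=1`, `after=0`, and `units="day"` as in the answer above, the window covers the 24 hours up to the origin time.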

In [36]:
code_search.ask("What is the syntax for using the GeoIP Lite lookup?")

The GeoIP Lite lookup is used by creating an instance of the `GeoLiteLookup` class and then calling the `lookup_ip` method on it. The `lookup_ip` method takes an IP address as a parameter and returns the geographic location of the IP address.

Here is an example of how to use the GeoIP Lite lookup:

```python
from msticpy.sectools.geoip import GeoLiteLookup

# Create an instance of the GeoLiteLookup class
ip_location = GeoLiteLookup()

# Use the lookup_ip method to get the location of an IP address
location = ip_location.lookup_ip(ip_address="104.97.41.163")

print(location)
```

In this example, the `lookup_ip` method is called with the IP address "104.97.41.163". The method returns the geographic location of the IP address.

Please note that you need to have a valid API key from Maxmind GeoLite2 to use this service. The API key should be added to your `msticpyconfig.yaml` file. If no API key is found, a message will be displayed with instructions on how to obtain and add the API key.

```python
_NO_API_KEY_MSSG = """
No API Key was found to download the Maxmind GeoIPLite database.
If you do not have an account, go here to create one and obtain and API key.
https://www.maxmind.com/en/geolite2/signup

Add this API key to your msticpyconfig.yaml
https://msticpy.readthedocs.io/en/latest/data_acquisition/GeoIPLookups.html#maxmind-geo-ip-lite-lookup-class.
"""
```

Please note that this module does not appear to expose any functionality via a pandas extension method or a pivot function.

In [None]:
code_search.retriever.retriever.vectorstore.max_marginal_relevance_search

In [64]:

docs = code_search.retriever.retriever.vectorstore.similarity_search("geolite", k=10)

# Keep only chunks from the msticpy package itself (at most five)
content = [doc for doc in docs if doc.metadata["source"].startswith("msticpy")][:5]
for doc in content:
    source = doc.metadata["source"]
    # File name, an underline of matching length, then the chunk text
    print(f"File: {source}\n{'-' * len(source)}\n\n{doc.page_content}\n")


File: msticpy\context\geoip.py
------------------------

+ "edition_id=GeoLite2-City&license_key={license_key}&suffix=tar.gz"
    )

    _DB_HOME = str(Path.joinpath(Path("~").expanduser(), ".msticpy", "GeoLite2"))
    _DB_ARCHIVE = "GeoLite2-City.mmdb.{rand}.tar.gz"
    _DB_FILE = "GeoLite2-City.mmdb"

    _LICENSE_HTML = """
This product includes GeoLite2 data created by MaxMind, available from
<a href="https://www.maxmind.com">https://www.maxmind.com</a>.
"""

    _LICENSE_TXT = """
This product includes GeoLite2 data created by MaxMind, available from
https://www.maxmind.com.
"""

    _NO_API_KEY_MSSG = """
No API Key was found to download the Maxmind GeoIPLite database.
If you do not have an account, go here to create one and obtain and API key.
https://www.maxmind.com/en/geolite2/signup

Add this API key to your msticpyconfig.yaml
https://msticpy.readthedocs.io/en/latest/data_acquisition/GeoIPLookups.html#maxmind-geo-ip-lite-lookup-class.

File: msticpy\context\geoip.py
---------