![image](images/MSTIC.png)

# Customizing and extending MSTICPy
### Ian Hellen, Principal Dev, Microsoft Threat Intelligence Center (MSTIC)
### @ianhellen (twitter), ianhelle@microsoft.com


---

<h2 style="color: White; background-color: DarkSlateBlue; padding: 10px">Topics</h2>


#### - MSTICPy Settings
#### - Creating queries
#### - Creating pivot functions
#### - Creating a data provider
<br>

---

<h2 style="color: White; background-color: DarkSlateBlue; padding: 10px">Get this notebook <i>Extending MSTICPy.ipynb</i></h2>

## @ https://aka.ms/msticpy-launchspace

---

In [1]:
from msticpy.nbtools import nbinit
nbinit.init_notebook(namespace=globals());

---

<h1 style="border: 1px solid;color: White; background-color: DarkSlateGray; padding: 10px">Creating & editing MSTICPy settings</h1>


### Sample config file
### https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/msticpyconfig.yaml

In [2]:
from msticpy.config import MpConfigEdit

MpConfigEdit("./msticpyconfig.yaml")

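When you are unsure which settings file a notebook is picking up, it helps to know where MSTICPy looks. MSTICPy can locate `msticpyconfig.yaml` through the `MSTICPYCONFIG` environment variable as well as the current folder. The helper below is a hypothetical sketch of that kind of search (the name `find_config_file` is ours). The order it uses (explicit path, then environment variable, then current folder) is an assumption and may not match MSTICPy's own precedence.

```python
import os
from pathlib import Path
from typing import Optional

CONFIG_NAME = "msticpyconfig.yaml"
CONFIG_ENV_VAR = "MSTICPYCONFIG"  # environment variable MSTICPy uses for the config path


def find_config_file(explicit_path: Optional[str] = None) -> Optional[Path]:
    """Return the first existing config file found, or None.

    Hypothetical search order (an assumption, not MSTICPy's code):
    an explicit path, then MSTICPYCONFIG, then the current folder.
    """
    candidates = []
    if explicit_path:
        candidates.append(Path(explicit_path))
    env_path = os.environ.get(CONFIG_ENV_VAR)
    if env_path:
        candidates.append(Path(env_path))
    candidates.append(Path.cwd() / CONFIG_NAME)

    for candidate in candidates:
        # expanduser() lets "~/msticpyconfig.yaml" style paths work
        candidate = candidate.expanduser()
        if candidate.is_file():
            return candidate
    return None


print(find_config_file())
```

If this prints `None`, neither the environment variable nor the current folder points at a settings file, and the editor above will start with an empty configuration.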

---

<h1 style="border: 1px solid;color: White; background-color: DarkSlateGray; padding: 10px">Creating a query</h1>


In [3]:
# from msticpy.data import QueryProvider   ## Not needed if you ran nbinit.init_notebook
my_qry_prov = QueryProvider("AzureSentinel")
my_qry_prov.connect(WorkspaceConfig(workspace="CyberSecuritySoc"))

Please wait. Loading Kqlmagic extension...


In [4]:
my_qry_prov.SecurityAlert.list_alerts(start=-1, end=0).head(3)

Unnamed: 0,TenantId,TimeGenerated,AlertDisplayName,AlertName,Severity,Description,ProviderName,VendorName,VendorOriginalId,SystemAlertId,ResourceId,SourceComputerId,AlertType,ConfidenceLevel,ConfidenceScore,IsIncident,StartTimeUtc,EndTimeUtc,ProcessingEndTime,RemediationSteps,ExtendedProperties,Entities,SourceSystem,WorkspaceSubscriptionId,WorkspaceResourceGroup,ExtendedLinks,ProductName,ProductComponentName,AlertLink,Status,CompromisedEntity,Tactics,Type
0,8ecf8077-cf51-4820-aadd-14040956f35d,2021-05-20 19:05:25.688000+00:00,Malicious credential theft tool execution detected,Malicious credential theft tool execution detected,High,A known credential theft tool execution command line was detected.\nEither the process itself or...,MDATP,Microsoft,da637569252690092730_-820524321,4d334d29-c109-857c-e775-af5bb2807704,,,WindowsDefenderAtp,,,False,2021-05-18 08:59:58.188000+00:00,2021-05-18 08:59:58.188000+00:00,2021-05-20 19:05:25.565000+00:00,"[\r\n ""1. Make sure the machine is completely updated and all your software has the latest patc...","{\r\n ""MicrosoftDefenderAtp.Category"": ""CredentialAccess"",\r\n ""MicrosoftDefenderAtp.Investiga...","[\r\n {\r\n ""$id"": ""4"",\r\n ""DnsDomain"": ""na.contosohotels.com"",\r\n ""HostName"": ""vict...",Detection,,,,Microsoft Defender Advanced Threat Protection,,https://securitycenter.microsoft.com/alert/da637569252690092730_-820524321?tid=4b2462a4-bbee-495...,New,,CredentialAccess,SecurityAlert
1,8ecf8077-cf51-4820-aadd-14040956f35d,2021-05-20 19:10:30.159000+00:00,Malicious credential theft tool execution detected,Malicious credential theft tool execution detected,High,A known credential theft tool execution command line was detected.\nEither the process itself or...,MDATP,Microsoft,da637569217459020928_1215630741,a910a8dc-8ec5-33be-2d6d-0dcb25ffdf75,,,WindowsDefenderAtp,,,False,2021-05-18 07:59:08.774000+00:00,2021-05-18 07:59:08.774000+00:00,2021-05-20 19:10:30.019000+00:00,"[\r\n ""1. Make sure the machine is completely updated and all your software has the latest patc...","{\r\n ""MicrosoftDefenderAtp.Category"": ""CredentialAccess"",\r\n ""MicrosoftDefenderAtp.Investiga...","[\r\n {\r\n ""$id"": ""4"",\r\n ""DnsDomain"": ""na.contosohotels.com"",\r\n ""HostName"": ""vict...",Detection,,,,Microsoft Defender Advanced Threat Protection,,https://securitycenter.microsoft.com/alert/da637569217459020928_1215630741?tid=4b2462a4-bbee-495...,New,,CredentialAccess,SecurityAlert
2,8ecf8077-cf51-4820-aadd-14040956f35d,2021-05-20 19:15:07.153000+00:00,TI map IP entity to AzureActivity (enriched),TI map IP entity to AzureActivity (enriched),Medium,Identifies a match in AzureActivity from any IP IOC from TI. Automation will enrich with RiskIQ ...,ASI Scheduled Alerts,Microsoft,e6fbf8ac-1338-4454-a7db-d50d1dee22c3,ed38d784-7357-93ec-4a86-0649c6bcb2f1,,,8ecf8077-cf51-4820-aadd-14040956f35d_9119cf0a-7bc3-4288-b8d2-1e4e293b433e,,,False,2021-05-06 19:10:02.788000+00:00,2021-05-20 19:10:02.788000+00:00,2021-05-20 19:15:07.153000+00:00,,"{\r\n ""Query"": ""// The query_now parameter represents the time (in UTC) at which the scheduled ...","[\r\n {\r\n ""$id"": ""3"",\r\n ""Address"": ""144.91.119.160"",\r\n ""Type"": ""ip""\r\n },\r\n ...",Detection,d1d8779d-38d7-4f06-91db-9cbc8de0176f,SOC,,Azure Sentinel,Scheduled Alerts,,New,,Impact,SecurityAlert


In [5]:
my_qry_prov.browse_queries()


<h2 style="color: White; background-color: DarkSlateBlue; padding: 10px">Query definition YAML file</h2>

```yaml
metadata:
  version: 1
  description: Query template           # Description of the file - just for reference
  data_environments: [DataProvider]     # Name of the data provider
  data_families: [DataCategory]         # Used to group the queries
  tags: ['event', 'security']           # Not currently used (may be used in future)
defaults:                               # Global defaults for queries (unless overridden)
  metadata:
  parameters:
sources:
  query_name:                           # Query definition
    description: Retrieves list of events on a host
    args:
      # Query text (with optional params)
      query: '                          
        select something from table where host = "{param1}"
        '
    parameters:
      # query parameter definitions (inherits from 'defaults' section)
      param1:
        description: First parameter
        type: type        # str, int, datetime, list
        # default:        # if there is a default value

```
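The `defaults` section lets several queries share parameter definitions, with a query's own `parameters` taking precedence. As a rough, hypothetical sketch (not MSTICPy's implementation), that inheritance behaves like a shallow dictionary merge in which query-level entries replace defaults of the same name. Whether MSTICPy also merges individual keys inside a parameter is not covered here.

```python
from typing import Any, Dict, Optional

ParamDefs = Dict[str, Dict[str, Any]]


def effective_params(
    defaults: Optional[ParamDefs], query_params: Optional[ParamDefs]
) -> ParamDefs:
    """Combine default and query-specific parameter definitions (hypothetical helper).

    Query-level definitions replace a default definition of the same name.
    None is treated as empty, matching an empty YAML section.
    """
    merged: ParamDefs = dict(defaults or {})
    merged.update(query_params or {})
    return merged


defaults = {
    "table": {"description": "Source table", "type": "str", "default": "SecurityEvent"},
    "start": {"description": "Query start", "type": "datetime", "default": -1},
}
query_params = {
    "event_list": {"description": "Event IDs to match", "type": "list"},
    # "CustomTable" is a made-up table name used to show an override
    "table": {"description": "Source table", "type": "str", "default": "CustomTable"},
}
print(sorted(effective_params(defaults, query_params)))
# ['event_list', 'start', 'table']
```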

<h2 style="color: White; background-color: DarkSlateBlue; padding: 10px">First query</h2>

In [6]:
query_def = """
metadata:
  version: 1
  description: Security event queries
  data_environments: [AzureSentinel]
  data_families: [CustomQueries]
  tags: ['windows', 'event', 'security']
defaults:
  metadata:
  parameters:
sources:
  list_events_by_id:
    description: Retrieves list of events on a host
    args:
      query: '
        {table}
        | where EventID in ({event_list})
        | where TimeGenerated >= datetime({start})
        | where TimeGenerated <= datetime({end})
        {add_query_items}'
    parameters:
      table:
        description: The name of the source table
        type: str
        default: SecurityEvent
      start:
        description: start of the query
        type: datetime
        default: -1
      end:
        description: end of the query
        type: datetime
        default: 0
      event_list:
        description: List of event IDs to match
        type: list
      add_query_items:
        description: Additional query clauses (precede with "|")
        type: str
        default: ""
"""

<h3 style="color: White; background-color: DarkSlateGray; padding: 5px">Query creation process</h3>

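- Take a working query from your query interface
- Parameterize the query (replace literal values with `{param_name}` placeholders and define each parameter)
- Move any reusable parameter definitions to the `defaults` section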
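Parameterizing means swapping the literal values in a working query for `{name}` placeholders. MSTICPy fills those placeholders for you when you call the query function; the sketch below only illustrates the idea with Python's `str.format` on the `list_events_by_id` query text, using hypothetical parameter values. Note that the list value has to be rendered as comma-separated text to fit the KQL `in (...)` clause; MSTICPy applies its own value formatting, which may differ.

```python
# Query text from the list_events_by_id definition above
query_template = (
    "{table}"
    " | where EventID in ({event_list})"
    " | where TimeGenerated >= datetime({start})"
    " | where TimeGenerated <= datetime({end})"
    " {add_query_items}"
)

# Hypothetical parameter values (MSTICPy performs its own formatting)
params = {
    "table": "SecurityEvent",
    "event_list": ", ".join(str(event_id) for event_id in [4672, 5058]),
    "start": "2021-05-03T23:00:00Z",
    "end": "2021-05-04T01:00:00Z",
    "add_query_items": "| take 10",
}

print(query_template.format(**params))
```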

In [7]:
from pathlib import Path
Path("./queries").mkdir(exist_ok=True)

# Write out the yaml file
qry_file = Path("./queries").joinpath("az_sent_eventqueries.yaml")
qry_file.write_text(query_def)

# Check that the definition is readable and looks OK
from msticpy.data.data_query_reader import read_query_def_file
read_query_def_file(qry_file)

({'list_events_by_id': {'description': 'Retrieves list of events on a host',
   'args': {'query': ' {table} | where EventID in ({event_list}) | where TimeGenerated >= datetime({start}) | where TimeGenerated <= datetime({end}) {add_query_items}'},
   'parameters': {'table': {'description': 'The name of the source table',
     'type': 'str',
     'default': 'SecurityEvent'},
    'start': {'description': 'start of the query',
     'type': 'datetime',
     'default': -1},
    'end': {'description': 'end of the query',
     'type': 'datetime',
     'default': 0},
    'event_list': {'description': 'List of event IDs to match',
     'type': 'list'},
    'add_query_items': {'description': 'Additional query clauses (precede with "|")',
     'type': 'str',
     'default': ''}}}},
 {'metadata': None, 'parameters': None},
 {'version': 1,
  'description': 'Security event queries',
  'data_environments': ['AzureSentinel'],
  'data_families': ['CustomQueries'],
  'tags': ['windows', 'event', 'securit
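A common slip is a `{placeholder}` in the query text with no matching parameter definition (or a definition that is never used). A small, hypothetical helper built on the standard library's `string.Formatter` can catch this before you load the file. It takes the dictionary of query definitions (the first item of the tuple returned above) plus any `defaults` parameters.

```python
import string
from typing import Any, Dict, Optional, Set


def query_placeholders(query: str) -> Set[str]:
    """Return the names of all {placeholders} in a query string."""
    return {
        field_name
        for _, field_name, _, _ in string.Formatter().parse(query)
        if field_name  # literal text segments have field_name None
    }


def check_query_params(
    sources: Dict[str, Any], default_params: Optional[Dict[str, Any]] = None
) -> Dict[str, Dict[str, Set[str]]]:
    """Report undefined and unused parameters per query (hypothetical helper)."""
    report = {}
    for name, source in sources.items():
        placeholders = query_placeholders(source["args"]["query"])
        own_params = set(source.get("parameters") or {})
        defined = set(default_params or {}) | own_params
        report[name] = {
            "undefined": placeholders - defined,
            # only query-specific params: shared defaults may legitimately go unused
            "unused": own_params - placeholders,
        }
    return report


# Example definition with a deliberate typo ("event_lst")
sources = {
    "list_events_by_id": {
        "args": {"query": "{table} | where EventID in ({event_list}) | where TimeGenerated >= datetime({start})"},
        "parameters": {"table": {}, "start": {}, "event_lst": {}},
    }
}
print(check_query_params(sources))
```

With the file written above, you could try `sources, defaults, _ = read_query_def_file(qry_file)` followed by `check_query_params(sources, defaults.get("parameters"))`.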

<h3 style="color: White; background-color: DarkSlateGray; padding: 5px">Check the query</h3>

In [8]:
from msticpy.data import QueryProvider
my_qry_prov = QueryProvider("AzureSentinel", query_paths=["./queries"])

In [9]:
my_qry_prov.list_queries()

['Azure.get_vmcomputer_for_host',
 'Azure.get_vmcomputer_for_ip',
 'Azure.list_aad_signins_for_account',
 'Azure.list_aad_signins_for_ip',
 'Azure.list_all_signins_geo',
 'Azure.list_azure_activity_for_account',
 'Azure.list_azure_activity_for_ip',
 'Azure.list_azure_activity_for_resource',
 'Azure.list_storage_ops_for_hash',
 'Azure.list_storage_ops_for_ip',
 'AzureNetwork.az_net_analytics',
 'AzureNetwork.dns_lookups_for_domain',
 'AzureNetwork.dns_lookups_for_ip',
 'AzureNetwork.dns_lookups_from_ip',
 'AzureNetwork.get_heartbeat_for_host',
 'AzureNetwork.get_heartbeat_for_ip',
 'AzureNetwork.get_host_for_ip',
 'AzureNetwork.get_ips_for_host',
 'AzureNetwork.list_azure_network_flows_by_host',
 'AzureNetwork.list_azure_network_flows_by_ip',
 'AzureSentinel.get_bookmark_by_id',
 'AzureSentinel.get_bookmark_by_name',
 'AzureSentinel.list_bookmarks',
 'AzureSentinel.list_bookmarks_for_entity',
 'AzureSentinel.list_bookmarks_for_tags',
 'CustomQueries.list_events_by_id',
 'Heartbeat.get_h

<h3 style="color: White; background-color: DarkSlateGray; padding: 5px">Test the query</h3>

In [10]:
my_qry_prov.connect(WorkspaceConfig(workspace="CyberSecuritySoc"))

In [11]:
my_qry_prov.CustomQueries.list_events_by_id()

Query:  list_events_by_id
Data source:  AzureSentinel
Retrieves list of events on a host

Parameters
----------
add_query_items: str (optional)
    Additional query clauses (precede with "|")
end: datetime (optional)
    end of the query
event_list: list
    List of event IDs to match
start: datetime (optional)
    start of the query
    (default value is: -1)
table: str (optional)
    The name of the source table
    (default value is: SecurityEvent)
Query:
 {table} | where EventID in ({event_list}) | where TimeGenerated >= datetime({start}) | where TimeGenerated <= datetime({end}) {add_query_items}


ValueError: No values found for these parameters: ['event_list']
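The `ValueError` above is raised because `event_list` has no default and no value was supplied. The check can be approximated as follows; this is a hypothetical sketch (the function name is ours), not MSTICPy's code. It tests for the presence of a `default` key rather than its truthiness, so defaults such as `0` or `""` still count as values.

```python
from typing import Any, Dict


def resolve_param_values(
    param_defs: Dict[str, Dict[str, Any]], supplied: Dict[str, Any]
) -> Dict[str, Any]:
    """Combine supplied values with defaults, failing on missing required params."""
    values = {}
    missing = []
    for name, definition in param_defs.items():
        if name in supplied:
            values[name] = supplied[name]
        elif "default" in definition:
            values[name] = definition["default"]
        else:
            missing.append(name)
    if missing:
        raise ValueError(f"No values found for these parameters: {missing}")
    return values


param_defs = {
    "table": {"type": "str", "default": "SecurityEvent"},
    "end": {"type": "datetime", "default": 0},
    "event_list": {"type": "list"},
}
print(resolve_param_values(param_defs, {"event_list": [4672, 5058]}))
# {'table': 'SecurityEvent', 'end': 0, 'event_list': [4672, 5058]}
```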

In [12]:
my_qry_prov.CustomQueries.list_events_by_id(event_list=[4672, 5058]).head(3)

Unnamed: 0,TenantId,TimeGenerated,SourceSystem,Account,AccountType,Computer,EventSourceName,Channel,Task,Level,EventData,EventID,Activity,SourceComputerId,EventOriginId,MG,TimeCollected,ManagementGroupName,AccessList,AccessMask,AccessReason,AccountDomain,AccountExpires,AccountName,AccountSessionIdentifier,...,TargetUserName,TargetUserSid,TemplateContent,TemplateDSObjectFQDN,TemplateInternalName,TemplateOID,TemplateSchemaVersion,TemplateVersion,TokenElevationType,TransmittedServices,UserAccountControl,UserParameters,UserPrincipalName,UserWorkstations,VirtualAccount,VendorIds,Workstation,WorkstationName,PartitionKey,RowKey,StorageAccount,AzureDeploymentID,AzureTableName,Type,_ResourceId
0,8ecf8077-cf51-4820-aadd-14040956f35d,2021-05-20 20:18:36.057000+00:00,OpsManager,,,VictimPC2,Microsoft-Windows-Security-Auditing,Security,12292,8,"<EventData xmlns=""http://schemas.microsoft.com/win/2004/08/events/event"">\r\n <Data Name=""Subje...",5058,5058 - Key file operation.,0b31dee3-5401-43d7-802a-7c8aab820390,3d9dce53-342e-4297-9651-884e8c3c23d8,00000000-0000-0000-0000-000000000001,2021-05-20 20:19:07.092000+00:00,AOI-8ecf8077-cf51-4820-aadd-14040956f35d,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,SecurityEvent,/subscriptions/d1d8779d-38d7-4f06-91db-9cbc8de0176f/resourcegroups/defendtheflag/providers/micro...
1,8ecf8077-cf51-4820-aadd-14040956f35d,2021-05-20 20:18:47.053000+00:00,OpsManager,,,VictimPC2,Microsoft-Windows-Security-Auditing,Security,12292,8,"<EventData xmlns=""http://schemas.microsoft.com/win/2004/08/events/event"">\r\n <Data Name=""Subje...",5058,5058 - Key file operation.,0b31dee3-5401-43d7-802a-7c8aab820390,25803ca6-f66a-46be-9899-ecd97d9f73a6,00000000-0000-0000-0000-000000000001,2021-05-20 20:19:07.092000+00:00,AOI-8ecf8077-cf51-4820-aadd-14040956f35d,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,SecurityEvent,/subscriptions/d1d8779d-38d7-4f06-91db-9cbc8de0176f/resourcegroups/defendtheflag/providers/micro...
2,8ecf8077-cf51-4820-aadd-14040956f35d,2021-05-20 20:18:59.057000+00:00,OpsManager,,,VictimPC2,Microsoft-Windows-Security-Auditing,Security,12292,8,"<EventData xmlns=""http://schemas.microsoft.com/win/2004/08/events/event"">\r\n <Data Name=""Subje...",5058,5058 - Key file operation.,0b31dee3-5401-43d7-802a-7c8aab820390,fce6531f-9119-4d3c-8013-bc2fb44086fe,00000000-0000-0000-0000-000000000001,2021-05-20 20:19:07.092000+00:00,AOI-8ecf8077-cf51-4820-aadd-14040956f35d,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,SecurityEvent,/subscriptions/d1d8779d-38d7-4f06-91db-9cbc8de0176f/resourcegroups/defendtheflag/providers/micro...


In [13]:
q_times = nbwidgets.QueryTime(units="min", origin_time=pd.Timestamp("2021-05-04"))
q_times


In [14]:
print("start", q_times.start, "end", q_times.end)
my_qry_prov.CustomQueries.list_events_by_id(q_times, event_list=[4672, 5058]).head(3)

start 2021-05-03 23:00:00 end 2021-05-04 01:00:00


Unnamed: 0,TenantId,TimeGenerated,SourceSystem,Account,AccountType,Computer,EventSourceName,Channel,Task,Level,EventData,EventID,Activity,SourceComputerId,EventOriginId,MG,TimeCollected,ManagementGroupName,AccessList,AccessMask,AccessReason,AccountDomain,AccountExpires,AccountName,AccountSessionIdentifier,...,TargetUserName,TargetUserSid,TemplateContent,TemplateDSObjectFQDN,TemplateInternalName,TemplateOID,TemplateSchemaVersion,TemplateVersion,TokenElevationType,TransmittedServices,UserAccountControl,UserParameters,UserPrincipalName,UserWorkstations,VirtualAccount,VendorIds,Workstation,WorkstationName,PartitionKey,RowKey,StorageAccount,AzureDeploymentID,AzureTableName,Type,_ResourceId
0,8ecf8077-cf51-4820-aadd-14040956f35d,2021-05-04 00:57:35.793000+00:00,OpsManager,NT AUTHORITY\SYSTEM,Machine,VictimPC2,Microsoft-Windows-Security-Auditing,Security,12548,8,,4672,4672 - Special privileges assigned to new logon.,0b31dee3-5401-43d7-802a-7c8aab820390,ef28ed78-74c6-4187-acd3-e39718db52c7,00000000-0000-0000-0000-000000000001,2021-05-04 00:58:07.048000+00:00,AOI-8ecf8077-cf51-4820-aadd-14040956f35d,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,SecurityEvent,/subscriptions/d1d8779d-38d7-4f06-91db-9cbc8de0176f/resourcegroups/defendtheflag/providers/micro...
1,8ecf8077-cf51-4820-aadd-14040956f35d,2021-05-04 00:57:36.057000+00:00,OpsManager,,,VictimPC2,Microsoft-Windows-Security-Auditing,Security,12292,8,"<EventData xmlns=""http://schemas.microsoft.com/win/2004/08/events/event"">\r\n <Data Name=""Subje...",5058,5058 - Key file operation.,0b31dee3-5401-43d7-802a-7c8aab820390,c87a50c6-b60f-4202-8a67-62a1b4ed6ffe,00000000-0000-0000-0000-000000000001,2021-05-04 00:58:07.048000+00:00,AOI-8ecf8077-cf51-4820-aadd-14040956f35d,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,SecurityEvent,/subscriptions/d1d8779d-38d7-4f06-91db-9cbc8de0176f/resourcegroups/defendtheflag/providers/micro...
2,8ecf8077-cf51-4820-aadd-14040956f35d,2021-05-04 00:57:40.020000+00:00,OpsManager,NT AUTHORITY\SYSTEM,Machine,VictimPC2,Microsoft-Windows-Security-Auditing,Security,12548,8,,4672,4672 - Special privileges assigned to new logon.,0b31dee3-5401-43d7-802a-7c8aab820390,70835f0d-d4de-4567-8065-13698d0fc154,00000000-0000-0000-0000-000000000001,2021-05-04 00:58:07.048000+00:00,AOI-8ecf8077-cf51-4820-aadd-14040956f35d,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,SecurityEvent,/subscriptions/d1d8779d-38d7-4f06-91db-9cbc8de0176f/resourcegroups/defendtheflag/providers/micro...


<h2 style="color: White; background-color: DarkSlateBlue; padding: 10px">Adding the query so it is always loaded</h2>

- Put the query definition file(s) in a folder
- Add a `Custom` item listing that folder to the `QueryDefinitions` section in `msticpyconfig.yaml`

> **Note**: the entry must be a list (prefix each item with "-", even if there is only one path).

```yaml
...
  IPStack:
    Args:
      AuthKey:
        KeyVault: null
    Provider: IPStackLookup
QueryDefinitions:
  Custom:
    - /etc/msticpy/queries
    - ~/my_queries
TIProviders:
  OTX:
    Args:
      AuthKey:
      ...
```
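Because `Custom` must be a list, a quick sanity check of the parsed settings can save some head-scratching. The sketch below assumes you have already loaded `msticpyconfig.yaml` into a Python dictionary (for example with a YAML parser); the helper name `custom_query_paths` is hypothetical. It expands `~` in each path and separates folders that exist from those that don't.

```python
from pathlib import Path
from typing import Any, Dict, List, Tuple


def custom_query_paths(settings: Dict[str, Any]) -> Tuple[List[Path], List[Path]]:
    """Return (existing, missing) custom query folders from parsed settings.

    Raises TypeError if QueryDefinitions.Custom is not a list, which is
    the mistake the note above warns about.
    """
    custom = (settings.get("QueryDefinitions") or {}).get("Custom") or []
    if not isinstance(custom, list):
        raise TypeError(
            f"QueryDefinitions.Custom must be a list, got {type(custom).__name__}"
        )
    paths = [Path(entry).expanduser() for entry in custom]
    existing = [path for path in paths if path.is_dir()]
    missing = [path for path in paths if not path.is_dir()]
    return existing, missing


# Parsed form of the YAML fragment above
settings = {"QueryDefinitions": {"Custom": ["/etc/msticpy/queries", "~/my_queries"]}}
existing, missing = custom_query_paths(settings)
print("found:", existing, "missing:", missing)
```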

<h2 style="color: White; background-color: DarkSlateBlue; padding: 10px">Multiple queries</h2>

In [15]:
query_def2 = """
metadata:
  version: 1
  description: Security event queries
  data_environments: [AzureSentinel]
  data_families: [CustomQueries]
  tags: ['windows', 'event', 'security']
defaults:
  metadata:
  parameters:
    table:
      description: The name of the source table
      type: str
      default: SecurityEvent
    start:
      description: start of the query
      type: datetime
      default: -1
    end:
      description: end of the query
      type: datetime
      default: 0
    add_query_items:
      description: Additional query clauses (precede with "|")
      type: str
      default: ""
sources:
  list_events_by_id:
    description: Retrieves list of events on a host
    args:
      query: '
        {table}
        | where EventID in ({event_list})
        | where TimeGenerated >= datetime({start})
        | where TimeGenerated <= datetime({end})
        {add_query_items}'
    parameters:
      event_list:
        description: List of event IDs to match
        type: list
  list_events_by_host:
    metadata:
      data_families: [CustomQueries_Host]
    description: Retrieves list of events on a host
    args:
      query: '
        {table}
        | where Computer has ({host_name})
        | where TimeGenerated >= datetime({start})
        | where TimeGenerated <= datetime({end})
        {add_query_items}'
    parameters:
      host_name:
        description: Name (or partial name) of host to search for.
        type: str
      
"""

In [16]:

qry_file2 = Path("./queries").joinpath("az_sent_eventqueries.yaml")
qry_file2.write_text(query_def2)


1468

In [17]:
my_qry_prov = QueryProvider("AzureSentinel", query_paths=["./queries"])
my_qry_prov.connect(WorkspaceConfig(workspace="CyberSecuritySoc"))
my_qry_prov.list_queries()[18:28]

['AzureNetwork.list_azure_network_flows_by_host',
 'AzureNetwork.list_azure_network_flows_by_ip',
 'AzureSentinel.get_bookmark_by_id',
 'AzureSentinel.get_bookmark_by_name',
 'AzureSentinel.list_bookmarks',
 'AzureSentinel.list_bookmarks_for_entity',
 'AzureSentinel.list_bookmarks_for_tags',
 'CustomQueries.list_events_by_id',
 'CustomQueries_Host.list_events_by_host',
 'Heartbeat.get_heartbeat_for_host']

<h3 style="color: White; background-color: DarkSlateGray; padding: 5px">Debugging a query</h3>

In [18]:
my_qry_prov.CustomQueries_Host.list_events_by_host("print", q_times, host_name="victimpc")

' SecurityEvent | where Computer has (victimpc) | where TimeGenerated >= datetime(2021-05-03T23:00:00Z) | where TimeGenerated <= datetime(2021-05-04T01:00:00Z) '

<h3 style="color: White; background-color: DarkSlateGray; padding: 5px">Read more</h3>

### https://msticpy.readthedocs.io/en/latest/data_acquisition/DataProviders.html#creating-new-queries

---

<h1 style="border: 1px solid;color: White; background-color: DarkSlateGray; padding: 10px">Pivot functions</h1>

- Find a function
- Look at its type signature
- Create a registration
- Add the function as a Pivot
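For the second step, Python's `inspect` module shows which parameters a function expects; the one that takes the entity value is what goes in `func_input_value_arg` when you create the registration. A minimal illustration using a hypothetical function `url_domain` (not part of MSTICPy):

```python
import inspect
from urllib.parse import urlparse


def url_domain(url: str, lower: bool = True) -> str:
    """Hypothetical example function: return the host part of a URL."""
    netloc = urlparse(url).netloc
    return netloc.lower() if lower else netloc


sig = inspect.signature(url_domain)
print(sig)  # (url: str, lower: bool = True) -> str

# Parameters without defaults must be supplied by the caller;
# here "url" is the natural candidate for func_input_value_arg.
required = [
    name
    for name, param in sig.parameters.items()
    if param.default is inspect.Parameter.empty
]
print(required)  # ['url']
```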

In [19]:
pivot = Pivot(namespace=globals())
pivot.browse()

Using Open PageRank. See https://www.domcop.com/openpagerank/what-is-openpagerank



<h2 style="color: White; background-color: DarkSlateBlue; padding: 10px">Add a simple function</h2>

In [20]:
import urllib.parse


def split_url(url):
    """Split url into component parts."""
    return urllib.parse.urlparse(url)._asdict()


split_url("https://www.youtube.com/watch?v=CmyFcChTc4M")

OrderedDict([('scheme', 'https'),
             ('netloc', 'www.youtube.com'),
             ('path', '/watch'),
             ('params', ''),
             ('query', 'v=CmyFcChTc4M'),
             ('fragment', '')])

In [21]:
from msticpy.datamodel.pivot import Pivot, PivotRegistration

url_func_reg = PivotRegistration(
    entity_container_name="cust_util",
    input_type="value",              # Input type: a value, a list/iterable or a DataFrame
    func_input_value_arg="url",      # Name of the function parameter that receives the input
    func_new_name="components",      # Name the pivot function is exposed as
    entity_map={"Url": "Url"},       # Entity (and its attribute) to attach the function to
)

Pivot.add_pivot_function(func=split_url, pivot_reg=url_func_reg)

In [22]:
from msticpy.datamodel import entities
entities.Url.pivots()

['AzureSentinel.azsent_bookmarks',
 'AzureSentinel.azti_list_indicators_by_url',
 'cust_util.components',
 'dns_is_resolvable',
 'dns_resolve',
 'qry_azsent_bookmarks',
 'ti.lookup_url',
 'ti.lookup_url_OTX',
 'ti.lookup_url_VirusTotal',
 'ti.lookup_url_XForce',
 'tilookup_url',
 'util.b64decode',
 'util.dns_components',
 'util.dns_in_abuse_list',
 'util.dns_is_resolvable',
 'util.dns_resolve',
 'util.dns_validate_tld',
 'util.extract_iocs',
 'util.url_components',
 'util.url_screenshot']

In [23]:
entities.Url.cust_util.components(url="https://github.com/microsoft/msticpy?something=special")

Unnamed: 0,scheme,netloc,path,params,query,fragment,url,src_row_index
0,https,github.com,/microsoft/msticpy,,something=special,,https://github.com/microsoft/msticpy?something=special,0


In [24]:
urls = [
    "https://www.youtube.com/watch?v=CmyFcChTc4M",
    "https://techcommunity.microsoft.com/t5/azure-sentinel/democratize-machine-learning-with-customizable-ml-anomalies/ba-p/2346338",
    "https://devblogs.microsoft.com/azure-sdk/python-conda-sdk-preview/#top-row",
    "https://vscode-auth.github.com/?browser_session_id=401396e1c5809e2d9eb4fd1f52d96193e24e35e125b0187252156fcfde2f914d&code=7ee536e476696a4cc5c9&state=LqQuTkU4NNKAoefBxkX_83S7l8U%2FeyJhdXRoU2VydmVyIjoiaHR0cHM6Ly9naXRodWIuY29tIiwiY2FsbGJhY2tVcmkiOiJ2c2NvZGU6Ly92c2NvZGUuZ2l0aHViLWF1dGhlbnRpY2F0aW9uL2RpZC1hdXRoZW50aWNhdGU_d2luZG93aWQ9MSIsInJlc3BvbnNlVHlwZSI6ImNvZGUiLCJzdGF0ZSI6IjNmODgzNTUwLWRiYzQtNGM3Zi1iMjE1LTRlYWRlNTdhNDE4YSIsImlkIjoiOjpmZmZmOjEwLjQzLjE4Mi4yMDcifQ",
    "http://localhost:8888/lab/tree/pycon2021/PyCon-Msticpy.ipynb"
]

entities.Url.cust_util.components(urls)

Unnamed: 0,scheme,netloc,path,params,query,fragment,url,src_row_index
0,https,www.youtube.com,/watch,,v=CmyFcChTc4M,,https://www.youtube.com/watch?v=CmyFcChTc4M,0
1,https,techcommunity.microsoft.com,/t5/azure-sentinel/democratize-machine-learning-with-customizable-ml-anomalies/ba-p/2346338,,,,https://techcommunity.microsoft.com/t5/azure-sentinel/democratize-machine-learning-with-customiz...,1
2,https,devblogs.microsoft.com,/azure-sdk/python-conda-sdk-preview/,,,top-row,https://devblogs.microsoft.com/azure-sdk/python-conda-sdk-preview/#top-row,2
3,https,vscode-auth.github.com,/,,browser_session_id=401396e1c5809e2d9eb4fd1f52d96193e24e35e125b0187252156fcfde2f914d&code=7ee536e...,,https://vscode-auth.github.com/?browser_session_id=401396e1c5809e2d9eb4fd1f52d96193e24e35e125b01...,3
4,http,localhost:8888,/lab/tree/pycon2021/PyCon-Msticpy.ipynb,,,,http://localhost:8888/lab/tree/pycon2021/PyCon-Msticpy.ipynb,4
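The output above shows that, given a list, the pivot calls the single-value function once per item and returns one row per input, adding the original value (`url`) and a `src_row_index` column. Below is a rough, standard-library-only sketch of that idea that returns a list of dicts instead of a pandas DataFrame. The helper name `apply_per_value` is hypothetical, and MSTICPy's real wrapper handles more cases (DataFrame input, for example).

```python
from typing import Any, Callable, Dict, Iterable, List, Union
from urllib.parse import urlparse


def split_url(url: str) -> Dict[str, str]:
    """Split url into component parts (same logic as the example above)."""
    return urlparse(url)._asdict()


def apply_per_value(
    func: Callable[..., Any],
    values: Union[str, Iterable[str]],
    input_arg: str,
) -> List[Dict[str, Any]]:
    """Call func once per value and return one row (dict) per input value."""
    if isinstance(values, str):
        values = [values]  # treat a single string as a one-item list
    rows = []
    for index, value in enumerate(values):
        result = func(**{input_arg: value})
        # dict results become columns; anything else goes in a "result" column
        row = dict(result) if isinstance(result, dict) else {"result": result}
        row[input_arg] = value
        row["src_row_index"] = index
        rows.append(row)
    return rows


sample_urls = ["https://github.com/microsoft/msticpy", "http://localhost:8888/lab"]
for row in apply_per_value(split_url, sample_urls, "url"):
    print(row["src_row_index"], row["scheme"], row["netloc"])
# 0 https github.com
# 1 http localhost:8888
```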


<h2 style="color: White; background-color: DarkSlateBlue; padding: 10px">Using pivot functions from external modules</h2>

<h3 style="color: White; background-color: DarkSlateGray; padding: 5px">Define functions in a module</h3>

In [25]:
%%writefile pivot_funcs.py
"""Custom pivot functions."""

import urllib.parse

def split_url(url: str) -> dict:
    """Split url into component parts."""
    return urllib.parse.urlparse(url)._asdict()


def is_https(url: str) -> bool:
    """Return True if the URL scheme is https."""
    url_comps = split_url(url)
    return url_comps.get("scheme", "").casefold() == "https"


Overwriting pivot_funcs.py


<h3 style="color: White; background-color: DarkSlateGray; padding: 5px">Create a pivot definition file</h3>

In [26]:
%%writefile pivot_funcs.yaml
pivot_providers:
  url_comps:
    src_module: pivot_funcs
    src_func_name: split_url
    func_new_name: url_comps
    input_type: value
    entity_map:
      Url: Url
    func_input_value_arg: url
    entity_container_name: cust_util
    create_shortcut: True
  is_https:
    src_module: pivot_funcs
    src_func_name: is_https
    input_type: value
    entity_map:
      Url: Url
    func_input_value_arg: url
    entity_container_name: cust_util
    create_shortcut: True

Overwriting pivot_funcs.yaml
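Each entry names a module (`src_module`) and a function in it (`src_func_name`), so the module must be importable from the notebook (here, `pivot_funcs.py` in the current folder). Conceptually, resolving an entry is an import followed by an attribute lookup, as in this hypothetical sketch; it is not MSTICPy's loader.

```python
import importlib
from typing import Any, Callable, Dict


def resolve_pivot_func(entry: Dict[str, Any]) -> Callable[..., Any]:
    """Import entry["src_module"] and return its entry["src_func_name"] callable."""
    module = importlib.import_module(entry["src_module"])
    func = getattr(module, entry["src_func_name"], None)
    if not callable(func):
        raise AttributeError(
            f"{entry['src_module']} has no callable named {entry['src_func_name']!r}"
        )
    return func


# A standard-library function is used so the example runs anywhere
sqrt = resolve_pivot_func({"src_module": "math", "src_func_name": "sqrt"})
print(sqrt(16))  # 4.0
```

With the files above, `resolve_pivot_func({"src_module": "pivot_funcs", "src_func_name": "is_https"})` should return `is_https`, provided the notebook's working folder is on `sys.path`.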


In [27]:
pivot.register_pivot_providers(pivot_reg_path="pivot_funcs.yaml")


In [28]:
entities.Url.pivots()

['AzureSentinel.azsent_bookmarks',
 'AzureSentinel.azti_list_indicators_by_url',
 'cust_util.components',
 'cust_util.is_https',
 'cust_util.url_comps',
 'dns_is_resolvable',
 'dns_resolve',
 'is_https',
 'qry_azsent_bookmarks',
 'ti.lookup_url',
 'ti.lookup_url_OTX',
 'ti.lookup_url_VirusTotal',
 'ti.lookup_url_XForce',
 'tilookup_url',
 'url_comps',
 'util.b64decode',
 'util.dns_components',
 'util.dns_in_abuse_list',
 'util.dns_is_resolvable',
 'util.dns_resolve',
 'util.dns_validate_tld',
 'util.extract_iocs',
 'util.url_components',
 'util.url_screenshot']

In [29]:
entities.Url.is_https(urls)

Unnamed: 0,url,result,src_row_index
0,https://www.youtube.com/watch?v=CmyFcChTc4M,True,0
1,https://techcommunity.microsoft.com/t5/azure-sentinel/democratize-machine-learning-with-customiz...,True,1
2,https://devblogs.microsoft.com/azure-sdk/python-conda-sdk-preview/#top-row,True,2
3,https://vscode-auth.github.com/?browser_session_id=401396e1c5809e2d9eb4fd1f52d96193e24e35e125b01...,True,3
4,http://localhost:8888/lab/tree/pycon2021/PyCon-Msticpy.ipynb,False,4


In [30]:
youtube = entities.Url(Url="https://www.youtube.com/watch?v=CmyFcChTc4M")
display(youtube)

In [31]:

youtube.is_https()

Unnamed: 0,url,result,src_row_index
0,https://www.youtube.com/watch?v=CmyFcChTc4M,True,0


In [None]:
youtube.url_comps()

<h3 style="color: White; background-color: DarkSlateGray; padding: 5px">Read more</h3>

### https://msticpy.readthedocs.io/en/latest/data_analysis/PivotFunctions.html#customizing-and-managing-pivots

---

<h1 style="border: 1px solid;color: White; background-color: DarkSlateGray; padding: 10px">Creating a DataProvider</h1>

![image](images/DataLayer.png)

---

#### Our base class

In [None]:
# Base class - most docstrings removed for brevity
import abc
from abc import ABC

import pandas as pd

class DriverBase(ABC):
    """Base class for data providers."""

    def __init__(self, **kwargs):
        """Initialize new instance."""
        self._kwargs = kwargs
        self._loaded = False
        self._connected = False
        self.current_connection = None
        self.public_attribs: dict = {}
        self.formatters: dict = {}
        self.use_query_paths = True
        self.has_driver_queries = False

    @property
    def loaded(self) -> bool:
        return self._loaded

    @property
    def connected(self) -> bool:
        return self._connected

    @property
    def schema(self) -> dict:
        return {}

    @abc.abstractmethod
    def connect(self, connection_str: str = None, **kwargs):
        """Connect to the data source (implement this in a subclass)."""
        return None

    @abc.abstractmethod
    def query(
        self, query: str, query_source=None, **kwargs
    ) -> pd.DataFrame:
        """Run the query and return a DataFrame (implement this in a subclass)."""

#### Example data provider driver

In [None]:
# Simple query provider that reads CSVs to a DataFrame
from pathlib import Path

import pandas as pd

class CSVDataDriver(DriverBase):
    """Simple query provider that reads CSVs."""

    def __init__(self, connection_str: str = None, **kwargs):
        """Instantiate CSVDataDriver, optionally setting the data folder."""
        super().__init__(**kwargs)

        self._loaded = True
        # Folder holding the CSV files (defaults to the current folder)
        self._data_path = connection_str or "."


    def connect(self, connection_str: str = None, **kwargs):
        """Connect to data source."""
        self._data_path = connection_str or "."
        self._connected = True

    def query(
        self, query: str, query_source=None, **kwargs
    ) -> pd.DataFrame:
        """Execute query string and return DataFrame of results."""
        file = Path(self._data_path).joinpath(query_source.source.get("file", ""))

        if not file.is_file():
            raise FileNotFoundError(
                f"Data file ({file}) for query {query} not found."
            )
        
        return pd.read_csv(file).query(query)


# YAML definition
queries_examples = """
    get_logins_by_type:
        args:
            query: '
                LogonType == {logon_type}
                '
        description: Return logon events of a given type
        file: host_logons.csv
        parameters:
            logon_type:
                type: "int"
                description: "Logon Type code"
                default: 3
"""

# Note: this is an illustration only. You need to register the driver
# with the MSTICPy QueryProvider module before this will work.
# How to use:
qry_prov = QueryProvider("CSVDataDriver")
qry_prov.connect("./data")
qry_prov.get_logins_by_type(logon_type=3)
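To experiment with the driver pattern without registering anything with MSTICPy, here is a stand-alone sketch that mirrors the shape of `DriverBase` and `CSVDataDriver` using only the standard library: an abstract base class, a concrete CSV driver, and a query template filled from a parameter value. All class and method names are hypothetical, the "connection string" is simply the CSV text, and a single `Column == value` filter stands in for pandas' `DataFrame.query`.

```python
import abc
import csv
import io
from typing import Dict, List, Optional


class MiniDriverBase(abc.ABC):
    """Hypothetical, cut-down data provider driver interface."""

    def __init__(self) -> None:
        self._connected = False

    @property
    def connected(self) -> bool:
        return self._connected

    @abc.abstractmethod
    def connect(self, connection_str: Optional[str] = None) -> None:
        """Connect to the data source."""

    @abc.abstractmethod
    def query(self, query: str) -> List[Dict[str, str]]:
        """Run a query and return the matching rows."""


class MiniCSVDriver(MiniDriverBase):
    """Reads CSV text and filters rows with a 'Column == value' query."""

    def __init__(self) -> None:
        super().__init__()
        self._csv_text = ""

    def connect(self, connection_str: Optional[str] = None) -> None:
        # In this sketch the "connection string" is the CSV content itself
        self._csv_text = connection_str or ""
        self._connected = True

    def query(self, query: str) -> List[Dict[str, str]]:
        if not self._connected:
            raise RuntimeError("Call connect() before query().")
        column, _, value = (part.strip() for part in query.partition("=="))
        rows = csv.DictReader(io.StringIO(self._csv_text))
        # CSV values are strings, so compare as strings
        return [row for row in rows if row.get(column) == value]


SAMPLE_CSV = "Account,LogonType\nalice,3\nbob,2\ncarol,3\n"
query_template = "LogonType == {logon_type}"  # same shape as the YAML query above

driver = MiniCSVDriver()
driver.connect(SAMPLE_CSV)
rows = driver.query(query_template.format(logon_type=3))
print([row["Account"] for row in rows])  # ['alice', 'carol']
```

Keeping the query text as a template and the data access inside `query()` is what lets the same YAML query definitions drive different back ends.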

---

<h1 style="border: 1px solid;color: White; background-color: DarkSlateGray; padding: 10px">Conclusion</h1>

<ul style="font=20px">
    <li>MSTICPy has a broad collection of tools useful for Cyber investigators</li>
    <li>Still growing...and still has some rough edges</li>
    <li>It is open source, free and data platform-independent</li>
</ul>
 

---

<h1 style="border: 1px solid;color: White; background-color: DarkSlateGray; padding: 10px">Take-aways</h1>

<ul style="font=20px">
    <li>Install MSTICPy (and MSTICnb)</li>
    <li>Try the sample notebooks in MyBinder</li>
    <li>If you like it, leave us a star on GitHub - https://github.com/microsoft/msticpy</li>
    <li>Try the MSTICPy lab in our Python labs (swag available) https://github.com/Azure-Samples/azure-python-labs/blob/main/9-MSTICPy/README.md</li>
    <li>Contribute some code or queries
        <ul>
            <li>Much of MSTICPy is extensible (data providers, TI providers, pivots funcs)</li>
            <li>We're especially keen on better support for data providers, new providers, queries</li>
        </ul>
    </li>
</ul>

---

<h1 style="border: 1px solid;color: White; background-color: DarkSlateGray; padding: 10px">Resources</h1>

## Get this notebook and resources @ https://aka.ms/msticpy-launchspace

[LaunchSpace landing page](https://channel9.msdn.com/Shows/The-Launch-Space/Updates-to-MSTICPy-and-Jupyter-Notebooks-in-Azure-Sentinel)

Documentation and code:

MSTICPy Documentation - https://msticpy.readthedocs.io<br>
GitHub repo - https://github.com/microsoft/msticpy<br>
Blog - https://msticpy.medium.com<br>
Introductory articles
- <a href="https://msticpy.medium.com/msticpy-v1-0-0-and-jupyter-notebooks-in-azure-sentinel-an-update-ac2f6df61f9e?source=friends_link&sk=721420baba0796878bf6c1147a28512d">MSTICPy overview</a>
- <a href="https://techcommunity.microsoft.com/t5/azure-sentinel/msticpy-and-jupyter-notebooks-in-azure-sentinel-an-update/ba-p/2279661">MSTICPy overview for Azure Sentinel users</a>

Sample notebooks:
- https://github.com/microsoft/msticpy/tree/master/docs/notebooks
- https://github.com/Azure/Azure-Sentinel-Notebooks
- MSTICPy sample notebooks [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/microsoft/msticpy/HEAD?filepath=%2Fdocs%2Fnotebooks)
  Try the EventTimeLine and ProcessTree notebooks
- Simple machine learning [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/Azure/Azure-Sentinel-Notebooks/HEAD?filepath=Machine%20Learning%20in%20Notebooks%20Examples.ipynb)


<h1 style="border: 1px solid;color: White; background-color: DarkSlateGray; padding: 10px">Contacts</h1>

Email - msticpy@microsoft.com<br>
Twitter - @ianhellen, @MSSPete (Pete Bryan), @AshwinPatil (Ashwin Patil)<br>
GitHub - @ianhelle (note the missing last "n"), @PeteBryan, @AshwinPatil<br>
LinkedIn - @ianhellen, @PeteBryan, @AshwinPatil