# Purpose

This notebook analyzes Sysmon data to find all applications that connect out to the network.

In [1]:
!python3 -m pip install pandas
!python3 -m pip install lxml
!python3 -m pip install seaborn
!python3 -m pip install scikit-learn



## Gather Data

This notebook uses Sysmon data exported to XML from a Windows 10 machine.  I installed Sysmon with the sysmonconfig-export configuration available from SwiftOnSecurity and then exported the event data of a system as XML.

https://github.com/SwiftOnSecurity/sysmon-config


Only certain applications are recorded, based on a variety of conditions.  For more information, look at sysmon-config and this section for the rules:

<NetworkConnect onmatch="include">
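
To get a quick view of which include rules a config defines, the rules inside that section can be listed with a few lines of Python. This is a minimal sketch: the `SAMPLE_CONFIG` string is a hypothetical, heavily trimmed config written in the style of sysmon-config, not a copy of the real file, and `network_connect_rules` is a helper name made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down config in the style of sysmon-config (not the real file)
SAMPLE_CONFIG = """
<Sysmon schemaversion="4.50">
  <EventFiltering>
    <RuleGroup name="" groupRelation="or">
      <NetworkConnect onmatch="include">
        <Image condition="image">powershell.exe</Image>
        <DestinationPort condition="is">3389</DestinationPort>
      </NetworkConnect>
    </RuleGroup>
  </EventFiltering>
</Sysmon>
"""

def network_connect_rules(config_xml):
    """Return (field, condition, value) for every NetworkConnect include rule."""
    root = ET.fromstring(config_xml)
    rules = []
    for block in root.iter("NetworkConnect"):
        if block.get("onmatch") != "include":
            continue
        for rule in block:
            # Sysmon treats a rule without a condition attribute as "is"
            condition = rule.get("condition", "is")
            rules.append((rule.tag, condition, (rule.text or "").strip()))
    return rules

rules = network_connect_rules(SAMPLE_CONFIG)
for field, condition, value in rules:
    print(f"{field:<16} {condition:<8} {value}")
```

Pointing the same function at the text of a real config file should list every field and value that causes a network connection to be logged.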

## Turn data into a Dataframe

The data is in XML and needs to be moved into a dataframe.  I found some code from dtrizna on GitHub that I am using, and it worked out well.  Parsing the XML was a little slow, and I wonder how much better it would be to pull from something like Elasticsearch.

https://gist.github.com/dtrizna/b0b9ccc488da59fcc7090a21eba93317

In [2]:
import pandas as pd
import sys
from lxml import etree
import warnings
warnings.filterwarnings("ignore")

pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
pd.set_option('display.max_rows', 500)

logdf = pd.DataFrame()

# Shamelessly stolen from: https://gist.github.com/dtrizna/b0b9ccc488da59fcc7090a21eba93317

def read_xml(FILENAME):
    # recover=True lets lxml continue past malformed fragments in the export
    parser = etree.XMLParser(recover=True)
    raw = etree.parse(FILENAME, parser)
    print(raw)
    return raw


def events_to_df(eventlist):
    tag = '{http://schemas.microsoft.com/win/2004/08/events/event}'
    rows = []
    for event in eventlist:
        edict = {}
        for element in event.iterdescendants():
            if any(x in element.tag for x in ['TimeCreated', 'Execution', 'Security']):
                # these elements carry their values as attributes
                for item in element.items():
                    edict[item[0]] = item[1]
            # skip Provider, System, and Correlation elements
            elif any(x in element.tag for x in ['Provider', 'System', 'Correlation']):
                pass
            elif 'Data' in element.tag:
                # <Data Name="Field">value</Data> becomes edict['Field'] = value
                for item in element.items():
                    edict[item[1]] = element.text
            else:
                edict[element.tag.replace(tag, '')] = element.text

        # keep the raw event text so the full event log entry is always available
        edict['raw'] = etree.tostring(event, pretty_print=True).decode()
        rows.append(edict)

    # build the frame once (DataFrame.append was removed in pandas 2.0)
    # and sort the columns alphabetically, as append(sort=True) did
    return pd.DataFrame(rows).sort_index(axis=1)



#xml = read_xml("/Users/daniel.lohin/Documents/sysmon_logs.xml")
xml = read_xml("/Users/daniel.lohin/Downloads/ransomwarewindows.xml")
print('[!] Found XML file! Preprocessing...')

# get all events in list
events = []
# we see prefix on every tag, define that
tag = '{http://schemas.microsoft.com/win/2004/08/events/event}'
for element in xml.iter(tag+'Event'):
    events.append(element)

# transform to dataframe
logdf = events_to_df(events)
print('[+] File parsed!')
print(logdf.head())
logdf

<lxml.etree._ElementTree object at 0x11463b840>
[!] Found XML file! Preprocessing...
[+] File parsed!
Unnamed: 0,Archived,CallTrace,Channel,CommandLine,Company,Computer,Configuration,ConfigurationFileHash,CreationUtcTime,CurrentDirectory,Description,DestinationHostname,DestinationIp,DestinationIsIpv6,DestinationPort,DestinationPortName,Details,EventID,EventRecordID,EventType,FileVersion,GrantedAccess,Hashes,Image,ImageLoaded,Initiated,IntegrityLevel,IsExecutable,Keywords,Level,LogonGuid,LogonId,Opcode,OriginalFileName,ParentCommandLine,ParentImage,ParentProcessGuid,ParentProcessId,PipeName,ProcessGuid,ProcessID,ProcessId,Product,Protocol,QueryName,QueryResults,QueryStatus,RuleName,SchemaVersion,Signature,SignatureStatus,Signed,SourceHostname,SourceImage,SourceIp,SourceIsIpv6,SourcePort,SourcePortName,SourceProcessGUID,SourceProcessId,SourceThreadId,State,SystemTime,TargetFilename,TargetImage,TargetObject,TargetProcessGUID,TargetProcessId,Task,TerminalSessionId,ThreadID,User,UserID,UtcTime,Version,raw
0,,,Microsoft-Windows-Sysmon/Operational,,,win-dc-365.attackrange.local,c:\Program Files\ansible\AttackRangeSysmon.xml,SHA256=0ABB62ECDB67B3213E4229F59E1901BD5CC01F4...,,,,,,,,,,16,1,,,,,,,,,,0x8000000000000000,4,,,0,,,,,,,,3688,,,,,,,,,,,,,,,,,,,,,,2021-06-10T10:05:25.470561000Z,,,,,,16,,2740,,S-1-5-21-986166657-4127868789-2511509191-500,2021-06-10 10:05:25.457,3,"<Event xmlns=""http://schemas.microsoft.com/win..."
1,true,,Microsoft-Windows-Sysmon/Operational,,,win-dc-365.attackrange.local,,,,,,,,,,,,23,38,,,,"MD5=446DD1CF97EABA21CF14D03AEBC79F27,SHA256=A7...",C:\Windows\System32\WindowsPowerShell\v1.0\pow...,,,,false,0x8000000000000000,4,,,0,,,,,,,{928AB1BB-E3E4-60C1-9B02-00000000C301},1012,3860,,,,,,-,,,,,,,,,,,,,,,2021-06-10T10:05:27.974799300Z,C:\Users\Administrator\AppData\Local\Microsoft...,,,,,23,,2156,ATTACKRANGE\Administrator,S-1-5-18,2021-06-10 10:05:27.971,5,"<Event xmlns=""http://schemas.microsoft.com/win..."
2,true,,Microsoft-Windows-Sysmon/Operational,,,win-dc-365.attackrange.local,,,,,,,,,,,,23,37,,,,"MD5=446DD1CF97EABA21CF14D03AEBC79F27,SHA256=A7...",C:\Windows\System32\WindowsPowerShell\v1.0\pow...,,,,false,0x8000000000000000,4,,,0,,,,,,,{928AB1BB-E3E4-60C1-9C02-00000000C301},1012,3580,,,,,,-,,,,,,,,,,,,,,,2021-06-10T10:05:27.949716500Z,C:\Users\Administrator\AppData\Local\Microsoft...,,,,,23,,2156,ATTACKRANGE\Administrator,S-1-5-18,2021-06-10 10:05:27.939,5,"<Event xmlns=""http://schemas.microsoft.com/win..."
3,,C:\Windows\SYSTEM32\ntdll.dll+a6134|C:\Windows...,Microsoft-Windows-Sysmon/Operational,,,win-dc-365.attackrange.local,,,,,,,,,,,,10,36,,,0x100000,,,,,,,0x8000000000000000,4,,,0,,,,,,,,1012,,,,,,,-,,,,,,C:\Windows\system32\svchost.exe,,,,,{928AB1BB-E35F-60C1-1300-00000000C301},964,1256,,2021-06-10T10:05:27.048285300Z,,C:\Windows\sysmon64.exe,,{928AB1BB-E3E5-60C1-A602-00000000C301},1012,10,,2156,,S-1-5-18,2021-06-10 10:05:27.044,3,"<Event xmlns=""http://schemas.microsoft.com/win..."
4,,,Microsoft-Windows-Sysmon/Operational,,,win-dc-365.attackrange.local,,,,,,,,,,,,17,35,CreatePipe,,,,C:\Windows\system32\svchost.exe,,,,,0x8000000000000000,4,,,0,,,,,,\PIPE_EVENTROOT\CIMV2SCM EVENT PROVIDER,{928AB1BB-E35F-60C1-1600-00000000C301},1012,1264,,,,,,-,,,,,,,,,,,,,,,2021-06-10T10:05:27.033076500Z,,,,,,17,,2156,,S-1-5-18,2021-06-10 10:05:27.028,1,"<Event xmlns=""http://schemas.microsoft.com/win..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
36622,,C:\Windows\SYSTEM32\ntdll.dll+a6134|C:\Windows...,Microsoft-Windows-Sysmon/Operational,,,win-host-977.attackrange.local,,,,,,,,,,,,10,10505,,,0x1fffff,,,,,,,0x8000000000000000,4,,,0,,,,,,,,1964,,,,,,,-,,,,,,C:\Windows\system32\csrss.exe,,,,,{61981517-E6CE-60C1-0500-00000000C501},416,532,,2021-06-10T10:33:31.615810100Z,,C:\Program Files\SplunkUniversalForwarder\bin\...,,{61981517-EA7B-60C1-2D04-00000000C501},216,10,,2764,,S-1-5-18,2021-06-10 10:33:31.612,3,"<Event xmlns=""http://schemas.microsoft.com/win..."
36623,,C:\Windows\SYSTEM32\ntdll.dll+a7404|C:\Windows...,Microsoft-Windows-Sysmon/Operational,,,win-host-977.attackrange.local,,,,,,,,,,,,10,10504,,,0x1fffff,,,,,,,0x8000000000000000,4,,,0,,,,,,,,1964,,,,,,,-,,,,,,C:\Program Files\SplunkUniversalForwarder\bin\...,,,,,{61981517-E765-60C1-A200-00000000C501},2592,3496,,2021-06-10T10:33:31.615672000Z,,C:\Program Files\SplunkUniversalForwarder\bin\...,,{61981517-EA7B-60C1-2D04-00000000C501},216,10,,2764,,S-1-5-18,2021-06-10 10:33:31.612,3,"<Event xmlns=""http://schemas.microsoft.com/win..."
36624,,,Microsoft-Windows-Sysmon/Operational,"""C:\Program Files\SplunkUniversalForwarder\bin...",Splunk Inc.,win-host-977.attackrange.local,,,,C:\Windows\system32\,Active Directory monitor,,,,,,,1,10503,,8.0.2,,"MD5=947139F3BB2AB70CAF692A60C7A3A735,SHA256=94...",C:\Program Files\SplunkUniversalForwarder\bin\...,,,System,,0x8000000000000000,4,{61981517-E6CE-60C1-E703-000000000000},0x3e7,0,splunk-admon.exe,"""C:\Program Files\SplunkUniversalForwarder\bin...",C:\Program Files\SplunkUniversalForwarder\bin\...,{61981517-E765-60C1-A200-00000000C501},2592,,{61981517-EA7B-60C1-2D04-00000000C501},1964,216,splunk Application,,,,,-,,,,,,,,,,,,,,,2021-06-10T10:33:31.615452900Z,,,,,,1,0,2764,NT AUTHORITY\SYSTEM,S-1-5-18,2021-06-10 10:33:31.613,5,"<Event xmlns=""http://schemas.microsoft.com/win..."
36625,true,,Microsoft-Windows-Sysmon/Operational,,,win-host-977.attackrange.local,,,,,,,,,,,,23,10502,,,,"MD5=46BC361A1887CACE4474A09FC6497317,SHA256=69...",C:\Program Files\SplunkUniversalForwarder\bin\...,,,,false,0x8000000000000000,4,,,0,,,,,,,{61981517-E773-60C1-D900-00000000C501},1964,300,,,,,,-,,,,,,,,,,,,,,,2021-06-10T10:33:31.290829100Z,C:\Program Files\SplunkUniversalForwarder\var\...,,,,,23,,2764,NT AUTHORITY\SYSTEM,S-1-5-18,2021-06-10 10:33:31.284,5,"<Event xmlns=""http://schemas.microsoft.com/win..."
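
On the parsing speed concern: one option that might help with large exports is to stream the file instead of loading the whole tree and then walking it. The sketch below uses the standard library's `xml.etree.ElementTree.iterparse` rather than lxml, and only keeps a few fields. `SAMPLE` is a made-up two-event export and `iter_event_rows` is a hypothetical helper. Whether this is actually faster on the real data has not been measured here.

```python
import io
import xml.etree.ElementTree as ET

NS = "{http://schemas.microsoft.com/win/2004/08/events/event}"

# Made-up two-event export, shaped like a Windows event log XML export
SAMPLE = b"""<Events>
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System><EventID>3</EventID><Computer>host-a</Computer></System>
  <EventData>
    <Data Name="Image">C:\\Windows\\System32\\svchost.exe</Data>
    <Data Name="DestinationIp">10.0.0.5</Data>
  </EventData>
</Event>
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System><EventID>1</EventID><Computer>host-b</Computer></System>
  <EventData><Data Name="Image">C:\\Windows\\System32\\cmd.exe</Data></EventData>
</Event>
</Events>"""

def iter_event_rows(source):
    """Yield one flat dict per <Event>, clearing each element after use."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != NS + "Event":
            continue
        row = {}
        for child in elem.iter():
            if child.tag == NS + "Data":
                row[child.get("Name")] = child.text
            elif child.tag in (NS + "EventID", NS + "Computer"):
                row[child.tag.replace(NS, "")] = child.text
        yield row
        elem.clear()  # release the children of the event that was just processed

rows = list(iter_event_rows(io.BytesIO(SAMPLE)))
print(rows)
```

The resulting list of dicts can be handed to `pd.DataFrame(rows)` in one call, which avoids growing a dataframe row by row.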


In [3]:
print(logdf.keys())

Index(['Archived', 'CallTrace', 'Channel', 'CommandLine', 'Company', 'Computer', 'Configuration', 'ConfigurationFileHash', 'CreationUtcTime', 'CurrentDirectory', 'Description', 'DestinationHostname', 'DestinationIp', 'DestinationIsIpv6', 'DestinationPort', 'DestinationPortName', 'Details', 'EventID', 'EventRecordID', 'EventType', 'FileVersion', 'GrantedAccess', 'Hashes', 'Image', 'ImageLoaded', 'Initiated', 'IntegrityLevel', 'IsExecutable', 'Keywords', 'Level', 'LogonGuid', 'LogonId', 'Opcode', 'OriginalFileName', 'ParentCommandLine', 'ParentImage', 'ParentProcessGuid', 'ParentProcessId', 'PipeName', 'ProcessGuid', 'ProcessID', 'ProcessId', 'Product', 'Protocol', 'QueryName', 'QueryResults', 'QueryStatus', 'RuleName', 'SchemaVersion', 'Signature', 'SignatureStatus', 'Signed', 'SourceHostname', 'SourceImage', 'SourceIp', 'SourceIsIpv6', 'SourcePort', 'SourcePortName', 'SourceProcessGUID', 'SourceProcessId', 'SourceThreadId', 'State', 'SystemTime', 'TargetFilename', 'TargetImage',
      


## Enrich Data

Pull out all the events that establish a network connection.  For each image, gather the connection count as well as the standard deviation of the event timestamps (`UtcTime`).

In [4]:
filtered_df = logdf[logdf['Image'].notnull()]
unique_df = pd.DataFrame()
# Get the applications (images) that appear in the events
unique_df['Image'] = filtered_df['Image'].unique()
# Note: .notnull().count() counts every event for the image, with or without a DestinationIp;
# .notnull().sum() would count only the events that have one
unique_df['conn_count'] = unique_df['Image'].apply(lambda x: logdf[(logdf['Image'] == x )]['DestinationIp'].notnull().count())
# Standard deviation of the event timestamps for each image
unique_df['std_dev'] = unique_df['Image'].apply(lambda x: pd.to_datetime(logdf[(logdf['Image'] == x )].UtcTime).std())
unique_df.sort_values('std_dev', ascending=False)

Unnamed: 0,Image,conn_count,std_dev
18,C:\Windows\ADWS\Microsoft.ActiveDirectory.WebS...,36,0 days 00:08:34.923011137
16,C:\Windows\System32\shutdown.exe,2,0 days 00:08:27.407805409
26,C:\Program Files\Amazon\SSM\ssm-agent-worker.exe,44,0 days 00:07:34.482688181
20,C:\Program Files\SplunkUniversalForwarder\bin\...,43,0 days 00:07:10.838878031
19,C:\Program Files\SplunkUniversalForwarder\bin\...,43,0 days 00:07:10.831703448
21,C:\Program Files\SplunkUniversalForwarder\bin\...,42,0 days 00:07:03.325961118
24,C:\Program Files\SplunkUniversalForwarder\bin\...,42,0 days 00:07:03.214183559
23,C:\Program Files\SplunkUniversalForwarder\bin\...,42,0 days 00:07:03.187321525
58,C:\Windows\System32\reg.exe,4,0 days 00:07:02.774191741
22,C:\Program Files\SplunkUniversalForwarder\bin\...,84,0 days 00:07:00.594955038
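
The per-image lookups above rescan `logdf` once for every image. The same kind of summary can be sketched as a single `groupby`. The example below runs on a small made-up frame (`sample`) with only the columns it needs, and the column names `event_count` and `std_dev_seconds` are introduced here for illustration. It reports the number of events and the number of events that carry a `DestinationIp` separately, since `count` skips missing values while `size` does not.

```python
import pandas as pd

# Made-up miniature of logdf with only the columns used here
sample = pd.DataFrame({
    "Image": ["a.exe", "a.exe", "a.exe", "b.exe", "b.exe"],
    "DestinationIp": ["10.0.0.1", None, "10.0.0.2", None, None],
    "UtcTime": ["2021-06-10 10:00:00.000", "2021-06-10 10:00:10.000",
                "2021-06-10 10:00:20.000", "2021-06-10 10:05:00.000",
                "2021-06-10 10:06:00.000"],
})

# Express each timestamp as seconds since the first event so that std() works on plain floats
times = pd.to_datetime(sample["UtcTime"])
sample["seconds"] = (times - times.min()).dt.total_seconds()

summary = sample.groupby("Image").agg(
    event_count=("Image", "size"),            # every event for the image
    conn_count=("DestinationIp", "count"),    # only events with a DestinationIp
    std_dev_seconds=("seconds", "std"),       # spread of the event times
)
print(summary)
```

Because the standard deviation does not change when every value is shifted by the same amount, measuring from the first event gives the same spread as working on the raw timestamps.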


Let's look at how many unique systems have run each application (this would probably have been more exciting with a different dataset).

In [111]:
# Unique hosts, users, and target files seen for each image
unique_df['executing_systems'] = unique_df['Image'].apply(lambda x: logdf[(logdf['Image'] == x )]['Computer'].unique())
unique_df['users'] = unique_df['Image'].apply(lambda x: logdf[(logdf['Image'] == x )]['User'].unique())
unique_df['TargetFilename'] = unique_df['Image'].apply(lambda x: logdf[(logdf['Image'] == x )]['TargetFilename'].unique())
# Number of distinct values in each list (a missing value counts as one entry)
unique_df['executing_number'] = [len(i) for i in unique_df['executing_systems']]
unique_df['executing_users'] = [len(i) for i in unique_df['users']]
unique_df['unique_target_file_count'] = [len(i) for i in unique_df['TargetFilename']]
unique_df

Unnamed: 0,Image,conn_count,std_dev,executing_systems,executing_number,TargetFilename,executing_users,users,file_count,unique_target_file_count
0,C:\Windows\System32\WindowsPowerShell\v1.0\pow...,404,0 days 00:05:49.337953394,"[win-dc-365.attackrange.local, win-host-977.at...",2,[C:\Users\Administrator\AppData\Local\Microsof...,4,"[ATTACKRANGE\Administrator, nan, NT AUTHORITY\...",152,152
1,C:\Windows\system32\svchost.exe,494,0 days 00:04:25.049203005,"[win-dc-365.attackrange.local, win-host-977.at...",2,"[nan, C:\ProgramData\USOPrivate\UpdateStore\up...",3,"[nan, NT AUTHORITY\SYSTEM, NT AUTHORITY\NETWOR...",183,183
2,C:\Windows\System32\wbem\unsecapp.exe,4,0 days 00:06:09.144824930,"[win-dc-365.attackrange.local, win-host-977.at...",2,[nan],1,[NT AUTHORITY\SYSTEM],1,1
3,C:\Windows\sysmon64.exe,27,0 days 00:05:00.161317642,"[win-dc-365.attackrange.local, win-host-977.at...",2,[nan],2,"[NT AUTHORITY\SYSTEM, nan]",1,1
4,C:\Windows\System32\cmd.exe,127,0 days 00:05:32.998167310,"[win-dc-365.attackrange.local, win-host-977.at...",2,[nan],3,"[ATTACKRANGE\Administrator, NT AUTHORITY\SYSTE...",1,1
5,C:\Windows\System32\winrshost.exe,33,0 days 00:04:46.025937652,"[win-dc-365.attackrange.local, win-host-977.at...",2,[nan],2,"[ATTACKRANGE\Administrator, WIN-HOST-977\Admin...",1,1
6,C:\Program Files\SplunkUniversalForwarder\bin\...,3324,0 days 00:06:36.928211748,"[win-dc-365.attackrange.local, win-host-977.at...",2,[C:\Program Files\SplunkUniversalForwarder\var...,2,"[NT AUTHORITY\SYSTEM, nan]",9,9
7,C:\Windows\System32\dns.exe,483,0 days 00:05:53.854788148,[win-dc-365.attackrange.local],1,[nan],2,"[NT AUTHORITY\SYSTEM, nan]",1,1
8,C:\Windows\System32\svchost.exe,279,0 days 00:06:34.217010295,"[win-dc-365.attackrange.local, win-host-977.at...",2,"[nan, C:\Windows\System32\LogFiles\WMI\SUM.etl...",4,"[NT AUTHORITY\NETWORK SERVICE, NT AUTHORITY\SY...",22,22
9,C:\Windows\System32\msdtc.exe,4,0 days 00:06:28.638952300,"[win-dc-365.attackrange.local, win-host-977.at...",2,[nan],1,[NT AUTHORITY\NETWORK SERVICE],1,1
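
The distinct counts can also be computed directly with `nunique`, which may be easier to read than building the lists first and taking their lengths. This is a sketch on a made-up frame, and `users_incl_missing` is a column name invented for the example. One difference to keep in mind: `nunique` ignores missing values by default, while `len(unique())` counts `nan` as one entry.

```python
import pandas as pd

# Made-up miniature of logdf with only the columns used here
sample = pd.DataFrame({
    "Image": ["a.exe", "a.exe", "a.exe", "b.exe"],
    "Computer": ["host-1", "host-2", "host-1", "host-1"],
    "User": ["alice", None, "alice", "SYSTEM"],
})

per_image = sample.groupby("Image").agg(
    executing_number=("Computer", "nunique"),   # distinct hosts
    executing_users=("User", "nunique"),        # distinct users, missing values ignored
    users_incl_missing=("User", lambda s: s.nunique(dropna=False)),  # matches len(unique())
)
print(per_image)
```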


## Process Data
Work In Progress

In [None]:
from sklearn.ensemble import IsolationForest
from sklearn.cluster import KMeans

# Build a numeric feature matrix; std_dev is a Timedelta, so convert it to seconds
# (std_dev is missing for images with a single event, so fill those with 0)
feature_df = pd.DataFrame({
    'conn_count': unique_df['conn_count'],
    'std_dev_seconds': pd.to_timedelta(unique_df['std_dev']).dt.total_seconds(),
}).fillna(0)
conn_matrix = feature_df.to_numpy()
print(conn_matrix.shape)

# Train/fit and predict anomalous instances using the Isolation Forest model
odd_clf = IsolationForest(contamination=0.2)  # Marking 20% as odd
odd_clf.fit(conn_matrix)

# Now we create a new dataframe using the prediction from our classifier
predictions = odd_clf.predict(conn_matrix)
odd_df = feature_df[predictions == -1]
display_df = unique_df[predictions == -1].copy()

# Now we're going to explore our odd observations with help from KMeans
odd_matrix = odd_df.to_numpy()
num_clusters = min(len(odd_df), 4)  # 4 clusters unless we have less than 4 observations
display_df['cluster'] = KMeans(n_clusters=num_clusters).fit_predict(odd_matrix)
print(odd_matrix.shape)

cluster_groups = display_df[['Image', 'conn_count', 'std_dev', 'cluster']].groupby('cluster')

## Print Data

Let's see whether anything stands out when the connections are plotted over time.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(200,20))
sns.countplot(x='UtcTime', hue='Image', \
             data=logdf[logdf['DestinationIp'].notnull()])
plt.show()
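
Since `UtcTime` has millisecond resolution, nearly every event gets its own position on the x axis in the plot above. One possible refinement is to bucket the connections per minute before plotting. The sketch below shows the bucketing on a made-up frame; the `minute` column and the `per_minute` table are names introduced for this example.

```python
import pandas as pd

# Made-up miniature of logdf with only the columns used here
sample = pd.DataFrame({
    "Image": ["a.exe", "a.exe", "b.exe", "a.exe", "b.exe"],
    "DestinationIp": ["10.0.0.1", "10.0.0.1", "10.0.0.2", None, "10.0.0.2"],
    "UtcTime": ["2021-06-10 10:00:05.100", "2021-06-10 10:00:45.200",
                "2021-06-10 10:00:50.000", "2021-06-10 10:01:10.000",
                "2021-06-10 10:02:30.000"],
})

# Keep only the events that have a destination, then round each timestamp down to the minute
conns = sample[sample["DestinationIp"].notnull()].copy()
conns["minute"] = pd.to_datetime(conns["UtcTime"]).dt.floor("min")

# One row per minute, one column per image, cell = number of connections
per_minute = conns.groupby(["minute", "Image"]).size().unstack(fill_value=0)
print(per_minute)
```

A table in this shape can be drawn with `per_minute.plot(kind="bar", stacked=True)`, which gives one bar per minute instead of one per timestamp.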