Explore the data that `mitreattack.attackToExcel.stixToDf` extracts from `enterprise-attack.json`.

In [1]:
from stix2 import MemoryStore
import mitreattack.attackToExcel.stixToDf as stixToDf
import pandas as pd
import os
file_path = '../data/raw/enterprise-attack.json'
attackdata = MemoryStore()
attackdata.load_from_file(file_path)

Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
NumExpr defaulting to 8 threads.


In [2]:
techniques = stixToDf.techniquesToDf(attackdata, 'enterprise-attack')
groups = stixToDf.groupsToDf(attackdata)
tactics = stixToDf.tacticsToDf(attackdata)
software = stixToDf.softwareToDf(attackdata)
mitigations = stixToDf.mitigationsToDf(attackdata)
relationships = stixToDf.relationshipsToDf(attackdata)

parsing techniques: 100%|██████████| 607/607 [00:00<00:00, 1436.01it/s]
parsing relationships for type=technique: 100%|██████████| 16530/16530 [00:00<00:00, 73261.52it/s]
parsing groups: 100%|██████████| 136/136 [00:00<00:00, 29218.12it/s]
parsing relationships for type=group: 100%|██████████| 16530/16530 [00:00<00:00, 91165.42it/s]
parsing tactics: 100%|██████████| 14/14 [00:00<00:00, 27737.49it/s]
parsing software: 100%|██████████| 635/635 [00:00<00:00, 31079.43it/s]
parsing relationships for type=software: 100%|██████████| 16530/16530 [00:00<00:00, 76154.11it/s]
parsing mitigations: 100%|██████████| 43/43 [00:00<00:00, 42860.05it/s]
parsing relationships for type=mitigation: 100%|██████████| 16530/16530 [00:00<00:00, 108463.88it/s]
parsing all relationships: 100%|██████████| 16530/16530 [00:00<00:00, 74451.10it/s]


---
# 1- Techniques data

In [3]:
techniques.keys()

dict_keys(['techniques', 'procedure examples', 'associated mitigations', 'citations'])
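
Each `stixToDf` helper above returns a dict of DataFrames, so a quick size summary per key can help decide which frames to inspect first. The following is a minimal sketch on a hypothetical stand-in dict; the `frames` variable, its contents, and the `reference` column name are assumptions for illustration, not the real output.

```python
import pandas as pd

# Hypothetical stand-in for the dict returned by stixToDf.techniquesToDf();
# the real frames are much larger and have different columns.
frames = {
    "techniques": pd.DataFrame({"ID": ["T1548", "T1548.002"], "name": ["A", "B"]}),
    "citations": pd.DataFrame({"reference": ["r1", "r2", "r3"]}),
}

# One row per DataFrame: its key, row count, and column count.
summary = pd.DataFrame(
    [(key, df.shape[0], df.shape[1]) for key, df in frames.items()],
    columns=["key", "rows", "columns"],
)
print(summary)
```

The same comprehension should work unchanged on `techniques`, `groups`, `software`, `mitigations`, or `relationships`, assuming every value in those dicts is a DataFrame.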

## 1-1 Techniques-Techniques

In [4]:
techniques_df = techniques['techniques']

In [5]:
techniques_df.head()

Unnamed: 0,ID,name,description,url,created,last modified,version,tactics,detection,platforms,...,is sub-technique,sub-technique of,defenses bypassed,contributors,permissions required,supports remote,system requirements,impact type,effective permissions,relationship citations
256,T1548,Abuse Elevation Control Mechanism,Adversaries may circumvent mechanisms designed...,https://attack.mitre.org/techniques/T1548,30 January 2020,21 April 2023,1.1,"Defense Evasion, Privilege Escalation",Monitor the file system for files that have th...,"Linux, Windows, macOS",...,False,,,,"Administrator, User",,,,,",(Citation: Github UACMe)"
50,T1548.002,Abuse Elevation Control Mechanism: Bypass User...,Adversaries may bypass UAC mechanisms to eleva...,https://attack.mitre.org/techniques/T1548/002,30 January 2020,21 April 2023,2.1,"Defense Evasion, Privilege Escalation",There are many ways to perform UAC bypasses wh...,Windows,...,True,T1548,Windows User Account Control,Casey Smith; Stefan Kanthak,"Administrator, User",,,,Administrator,"(Citation: ESET EvilNum July 2020),(Citation: ..."
433,T1548.004,Abuse Elevation Control Mechanism: Elevated Ex...,Adversaries may leverage the <code>Authorizati...,https://attack.mitre.org/techniques/T1548/004,30 January 2020,19 October 2022,1.0,"Defense Evasion, Privilege Escalation",Consider monitoring for <code>/usr/libexec/sec...,macOS,...,True,T1548,,"Erika Noerenberg, @gutterchurl, Carbon Black; ...","Administrator, User",,,,root,"(Citation: Carbon Black Shlayer Feb 2019),"
258,T1548.001,Abuse Elevation Control Mechanism: Setuid and ...,An adversary may abuse configurations where an...,https://attack.mitre.org/techniques/T1548/001,30 January 2020,15 March 2023,1.1,"Defense Evasion, Privilege Escalation",Monitor the file system for files that have th...,"Linux, macOS",...,True,T1548,,,User,,,,,"(Citation: OSX Keydnap malware),(Citation: ANS..."
52,T1548.003,Abuse Elevation Control Mechanism: Sudo and Su...,Adversaries may perform sudo caching and/or us...,https://attack.mitre.org/techniques/T1548/003,30 January 2020,14 March 2022,1.0,"Defense Evasion, Privilege Escalation","On Linux, auditd can alert every time a user's...","Linux, macOS",...,True,T1548,,,User,,,,root,"(Citation: objsee mac malware 2017),(Citation:..."


In [6]:
techniques_df.columns

Index(['ID', 'name', 'description', 'url', 'created', 'last modified',
       'version', 'tactics', 'detection', 'platforms', 'data sources',
       'is sub-technique', 'sub-technique of', 'defenses bypassed',
       'contributors', 'permissions required', 'supports remote',
       'system requirements', 'impact type', 'effective permissions',
       'relationship citations'],
      dtype='object')

In [7]:
techniques_df['detection'].isnull().sum()

0

In [8]:
import numpy as np
techniques_df_tmp = techniques_df['detection'].replace('', np.nan)

In [9]:
techniques_df_tmp.isnull().sum()

28
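
The two counts differ because missing detection text is stored as an empty string, which `isnull()` treats as a valid value. A minimal sketch of the effect on a made-up Series (the `detection` values here are illustrative, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy stand-in for techniques_df['detection']; the values are made up.
detection = pd.Series(
    ["Monitor process creation.", "", "Inspect network traffic.", ""],
    name="detection",
)

# Empty strings are non-null strings, so isnull() does not flag them.
missing_before = int(detection.isnull().sum())

# Mapping '' to NaN makes the gaps visible to isnull().
missing_after = int(detection.replace("", np.nan).isnull().sum())

# Equivalent check that skips the replace step entirely.
empty_count = int((detection == "").sum())

print(missing_before, missing_after, empty_count)
```

Comparing against `""` directly avoids creating a temporary Series when only the count is needed.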

In [10]:
techniques_df['detection']

256    Monitor the file system for files that have th...
50     There are many ways to perform UAC bypasses wh...
433    Consider monitoring for <code>/usr/libexec/sec...
258    Monitor the file system for files that have th...
52     On Linux, auditd can alert every time a user's...
                             ...                        
447    Host data that can relate unknown or suspiciou...
589    Host data that can relate unknown or suspiciou...
374    Host data that can relate unknown or suspiciou...
5      Monitor network traffic for WMI connections; t...
557    Use process monitoring to monitor the executio...
Name: detection, Length: 607, dtype: object

In [11]:
type(techniques_df['ID'])

pandas.core.series.Series

## 1-2 Techniques-Mitigations


In [12]:
techniques_mitigations_df = techniques['associated mitigations']

In [13]:
techniques_mitigations_df['source ID'].value_counts()

source ID
M1026    105
M1018     89
M1056     82
M1047     78
M1038     66
M1042     59
M1022     56
M1031     55
M1040     46
M1027     44
M1017     41
M1032     41
M1037     40
M1028     39
M1030     35
M1051     32
M1041     29
M1054     24
M1021     23
M1045     20
M1024     20
M1035     16
M1049     15
M1015     14
M1048     12
M1029     11
M1050     11
M1046     11
M1057      9
M1053      9
M1013      9
M1043      8
M1052      7
M1025      7
M1036      7
M1033      6
M1016      5
M1020      4
M1019      4
M1034      4
M1044      3
M1039      2
M1055      2
Name: count, dtype: int64
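
Bare mitigation IDs are hard to read. One option is to group on the ID and the name together. This sketch uses a small made-up table and assumes the `associated mitigations` frame carries a `source name` column, as the other relationship tables in this notebook do; the technique pairings are illustrative only.

```python
import pandas as pd

# Toy rows shaped like techniques['associated mitigations'].
# Assumption: a 'source name' column exists next to 'source ID'.
mitigations_toy = pd.DataFrame({
    "source ID": ["M1026", "M1026", "M1018", "M1026", "M1018", "M1056"],
    "source name": [
        "Privileged Account Management",
        "Privileged Account Management",
        "User Account Management",
        "Privileged Account Management",
        "User Account Management",
        "Pre-compromise",
    ],
    "target ID": ["T1548", "T1134", "T1548", "T1078", "T1078", "T1583"],
})

# Count rows per (ID, name) pair and list the most frequent first.
counts = (
    mitigations_toy.groupby(["source ID", "source name"])
    .size()
    .sort_values(ascending=False)
    .reset_index(name="techniques")
)
print(counts)
```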

## 1-3 Techniques-Procedure examples

In [14]:
techniques_procedure_examples_df = techniques['procedure examples']

In [70]:
techniques_procedure_examples_df['source type'].value_counts()

source type
software    8406
group       3052
campaign     526
Name: count, dtype: int64

---
# 2- Groups data

In [15]:
groups.keys()

dict_keys(['groups', 'associated software', 'techniques used', 'attributed campaigns', 'citations'])

## 2-1 Groups-Groups

In [16]:
groups_df = groups['groups']

In [17]:
groups_df['ID'].count()

136

In [18]:
groups_df.head()

Unnamed: 0,ID,name,description,url,created,last modified,version,contributors,associated groups,associated groups citations,relationship citations
47,G0099,APT-C-36,[APT-C-36](https://attack.mitre.org/groups/G00...,https://attack.mitre.org/groups/G0099,05 May 2020,26 May 2021,1.1,Jose Luis Sánchez Martinez,Blind Eagle,(Citation: QiAnXin APT-C-36 Feb2019),"(Citation: QiAnXin APT-C-36 Feb2019),(Citation..."
13,G0006,APT1,[APT1](https://attack.mitre.org/groups/G0006) ...,https://attack.mitre.org/groups/G0006,31 May 2017,26 May 2021,1.4,,"Comment Crew, Comment Group, Comment Panda","(Citation: Mandiant APT1), (Citation: Mandiant...","(Citation: Mandiant APT1),(Citation: McAfee Oc..."
96,G0005,APT12,[APT12](https://attack.mitre.org/groups/G0005)...,https://attack.mitre.org/groups/G0005,31 May 2017,30 March 2020,2.1,,"DNSCALC, DynCalc, IXESHE, Numbered Panda","(Citation: Moran 2014), (Citation: Meyers Numb...","(Citation: Moran 2013),(Citation: Moran 2014),..."
129,G0023,APT16,[APT16](https://attack.mitre.org/groups/G0023)...,https://attack.mitre.org/groups/G0023,31 May 2017,26 July 2022,1.1,,,,"(Citation: FireEye EPS Awakens Part 2),(Citati..."
130,G0025,APT17,[APT17](https://attack.mitre.org/groups/G0025)...,https://attack.mitre.org/groups/G0025,31 May 2017,13 October 2020,1.1,,Deputy Dog,(Citation: FireEye APT17),"(Citation: FireEye APT17),(Citation: FireEye A..."


## 2-2 Groups-Software

In [19]:
groups_software_df = groups['associated software']

In [20]:
groups_software_df['source ID'].nunique()

124

In [21]:
groups_software_df.count()

source ID              830
source name            830
source type            830
mapping type           830
target ID              830
target name            830
target type            830
mapping description    828
dtype: int64

In [22]:
for col in groups_df.columns:
    print(col)

ID
name
description
url
created
last modified
version
contributors
associated groups
associated groups citations
relationship citations


## 2-3 ❗Groups-Techniques

In [23]:
groups_techniques_df = groups['techniques used']

In [24]:
groups_techniques_df['mapping description'].head()

3167    [APT-C-36](https://attack.mitre.org/groups/G00...
2096    [APT-C-36](https://attack.mitre.org/groups/G00...
2093    [APT-C-36](https://attack.mitre.org/groups/G00...
65      [APT-C-36](https://attack.mitre.org/groups/G00...
2230    [APT-C-36](https://attack.mitre.org/groups/G00...
Name: mapping description, dtype: object

In [25]:
groups_techniques_df.count()

source ID              3052
source name            3052
source type            3052
mapping type           3052
target ID              3052
target name            3052
target type            3052
mapping description    3052
dtype: int64

### ❗Number of unique techniques used by at least one group, and techniques used by none

In [26]:
all_technique_IDs = techniques_df['ID']
all_technique_IDs.count()

607

In [27]:
groups_techniques_df['target ID'].nunique()

388

In [28]:
type(groups_techniques_df['target ID'].unique())

numpy.ndarray

In [29]:
used_techniques = groups_techniques_df['target ID'].unique()
not_used_techniques = all_technique_IDs[~all_technique_IDs.isin(used_techniques)]

In [30]:
not_used_techniques

256        T1548
433    T1548.004
258    T1548.001
52     T1548.003
349    T1134.003
         ...    
465    T1550.004
197    T1497.003
86         T1600
314    T1600.002
162    T1600.001
Name: ID, Length: 219, dtype: object
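
The cells above split techniques into those used by at least one group and those used by none. To count techniques shared by more than one group, one possible approach is to count distinct groups per technique. A minimal sketch on made-up rows (the group/technique pairings below are illustrative, not taken from the dataset):

```python
import pandas as pd

# Toy rows shaped like groups['techniques used']; the pairings are made up.
uses = pd.DataFrame({
    "source ID": ["G0099", "G0006", "G0005", "G0006", "G0099"],
    "target ID": ["T1566", "T1566", "T1566", "T1059", "T1105"],
})

# Number of distinct groups that use each technique.
groups_per_technique = uses.groupby("target ID")["source ID"].nunique()

# Techniques used by at least one group.
used_by_any = len(groups_per_technique)

# Techniques used by more than one group.
shared = groups_per_technique[groups_per_technique > 1]

print(used_by_any)
print(shared.to_dict())
```

Using `nunique()` rather than a plain row count guards against a group being listed twice for the same technique.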

## 2-4 Groups-Campaigns

In [31]:
groups_campaigns_df = groups['attributed campaigns']

In [32]:
groups_campaigns_df.count()

source ID              6
source name            6
source type            6
mapping type           6
target ID              6
target name            6
target type            6
mapping description    6
dtype: int64

---
# 3- Tactics data


In [33]:
tactics.keys()

dict_keys(['tactics'])

In [34]:
tactics_df = tactics['tactics']

In [35]:
for col in tactics_df.columns:
    print(col)

ID
name
description
url
created
last modified
version


In [75]:
tactics_df.sort_index()

Unnamed: 0,ID,name,description,url,created,last modified,version
0,TA0006,Credential Access,The adversary is trying to steal account names...,https://attack.mitre.org/tactics/TA0006,17 October 2018,19 July 2019,1.0
1,TA0002,Execution,The adversary is trying to run malicious code....,https://attack.mitre.org/tactics/TA0002,17 October 2018,19 July 2019,1.0
2,TA0040,Impact,"The adversary is trying to manipulate, interru...",https://attack.mitre.org/tactics/TA0040,14 March 2019,25 July 2019,1.0
3,TA0003,Persistence,The adversary is trying to maintain their foot...,https://attack.mitre.org/tactics/TA0003,17 October 2018,19 July 2019,1.0
4,TA0004,Privilege Escalation,The adversary is trying to gain higher-level p...,https://attack.mitre.org/tactics/TA0004,17 October 2018,06 January 2021,1.0
5,TA0008,Lateral Movement,The adversary is trying to move through your e...,https://attack.mitre.org/tactics/TA0008,17 October 2018,19 July 2019,1.0
6,TA0005,Defense Evasion,The adversary is trying to avoid being detecte...,https://attack.mitre.org/tactics/TA0005,17 October 2018,19 July 2019,1.0
7,TA0010,Exfiltration,The adversary is trying to steal data.\n\nExfi...,https://attack.mitre.org/tactics/TA0010,17 October 2018,19 July 2019,1.0
8,TA0007,Discovery,The adversary is trying to figure out your env...,https://attack.mitre.org/tactics/TA0007,17 October 2018,19 July 2019,1.0
9,TA0009,Collection,The adversary is trying to gather data of inte...,https://attack.mitre.org/tactics/TA0009,17 October 2018,19 July 2019,1.0


---
# 4- Software data

In [36]:
software.keys()

dict_keys(['software', 'associated groups', 'associated campaigns', 'techniques used', 'citations'])

## 4-1 Software-Software

In [37]:
software_df = software['software']

In [38]:
software_df.head()

Unnamed: 0,ID,name,description,url,created,last modified,version,contributors,platforms,aliases,type,relationship citations
350,S0066,3PARA RAT,[3PARA RAT](https://attack.mitre.org/software/...,https://attack.mitre.org/software/S0066,31 May 2017,30 March 2020,1.1,,Windows,,malware,"(Citation: CrowdStrike Putter Panda),,(Citatio..."
393,S0065,4H RAT,[4H RAT](https://attack.mitre.org/software/S00...,https://attack.mitre.org/software/S0065,31 May 2017,30 March 2020,1.1,,Windows,,malware,"(Citation: CrowdStrike Putter Panda),,(Citatio..."
13,S0677,AADInternals,[AADInternals](https://attack.mitre.org/softwa...,https://attack.mitre.org/software/S0677,01 February 2022,15 April 2023,1.2,,"Azure AD, Office 365, Windows",,tool,"(Citation: MSTIC Nobelium Oct 2021),,(Citation..."
437,S0469,ABK,[ABK](https://attack.mitre.org/software/S0469)...,https://attack.mitre.org/software/S0469,10 June 2020,24 June 2020,1.0,,Windows,,malware,"(Citation: Trend Micro Tick November 2019),,(C..."
624,S0045,ADVSTORESHELL,[ADVSTORESHELL](https://attack.mitre.org/softw...,https://attack.mitre.org/software/S0045,31 May 2017,30 March 2020,1.1,,Windows,"AZZY, EVILTOSS, NETUI, Sedreco",malware,"(Citation: Securelist Sofacy Feb 2018),(Citati..."


## 4-2 Software-Techniques

In [39]:
software_techniques_df = software['techniques used']

In [40]:
software_techniques_df.head()

Unnamed: 0,source ID,source name,source type,mapping type,target ID,target name,target type,mapping description
4672,S0066,3PARA RAT,software,uses,T1083,File and Directory Discovery,technique,[3PARA RAT](https://attack.mitre.org/software/...
8960,S0066,3PARA RAT,software,uses,T1573.001,Symmetric Cryptography,technique,[3PARA RAT](https://attack.mitre.org/software/...
7203,S0066,3PARA RAT,software,uses,T1070.006,Timestomp,technique,[3PARA RAT](https://attack.mitre.org/software/...
5111,S0066,3PARA RAT,software,uses,T1071.001,Web Protocols,technique,[3PARA RAT](https://attack.mitre.org/software/...
6918,S0065,4H RAT,software,uses,T1083,File and Directory Discovery,technique,[4H RAT](https://attack.mitre.org/software/S00...


In [41]:
software_techniques_df[software_techniques_df['target ID'] == 'T1071.001']

Unnamed: 0,source ID,source name,source type,mapping type,target ID,target name,target type,mapping description
5111,S0066,3PARA RAT,software,uses,T1071.001,Web Protocols,technique,[3PARA RAT](https://attack.mitre.org/software/...
2847,S0065,4H RAT,software,uses,T1071.001,Web Protocols,technique,[4H RAT](https://attack.mitre.org/software/S00...
8032,S0469,ABK,software,uses,T1071.001,Web Protocols,technique,[ABK](https://attack.mitre.org/software/S0469)...
8782,S0045,ADVSTORESHELL,software,uses,T1071.001,Web Protocols,technique,[ADVSTORESHELL](https://attack.mitre.org/softw...
7821,S1028,Action RAT,software,uses,T1071.001,Web Protocols,technique,[Action RAT](https://attack.mitre.org/software...
...,...,...,...,...,...,...,...,...
2993,S0068,httpclient,software,uses,T1071.001,Web Protocols,technique,[httpclient](https://attack.mitre.org/software...
1959,S1059,metaMain,software,uses,T1071.001,Web Protocols,technique,[metaMain](https://attack.mitre.org/software/S...
3651,S0385,njRAT,software,uses,T1071.001,Web Protocols,technique,[njRAT](https://attack.mitre.org/software/S038...
4165,S0067,pngdowner,software,uses,T1071.001,Web Protocols,technique,[pngdowner](https://attack.mitre.org/software/...


---
# 5- Mitigations data

In [42]:
mitigations.keys()

dict_keys(['mitigations', 'techniques addressed', 'citations'])

## 5-1 Mitigations-Mitigations

In [43]:
mitigations_df = mitigations['mitigations']

In [44]:
mitigations_df.head()

Unnamed: 0,ID,name,description,url,created,last modified,version,relationship citations
41,M1036,Account Use Policies,Configure features related to account use like...,https://attack.mitre.org/mitigations/M1036,11 June 2019,21 October 2022,1.0,(Citation: Microsoft Common Conditional Access...
37,M1015,Active Directory Configuration,Configure Active Directory to prevent use of c...,https://attack.mitre.org/mitigations/M1015,06 June 2019,29 May 2020,1.1,"(Citation: ADSecurity Mimikatz DCSync),(Citati..."
31,M1049,Antivirus/Antimalware,Use signatures or heuristics to detect malicio...,https://attack.mitre.org/mitigations/M1049,11 June 2019,31 March 2020,1.1,"(Citation: Microsoft AMSI June 2015),(Citation..."
7,M1013,Application Developer Guidance,This mitigation describes any guidance or trai...,https://attack.mitre.org/mitigations/M1013,25 October 2017,17 October 2018,1.0,"(Citation: Apple App Security Overview),(Citat..."
34,M1048,Application Isolation and Sandboxing,Restrict execution of code to a virtual enviro...,https://attack.mitre.org/mitigations/M1048,11 June 2019,31 March 2020,1.1,"(Citation: Kubernetes Hardening Guide),(Citati..."


## 5-2 Mitigations-Techniques

In [45]:
mitigations_techniques_df = mitigations['techniques addressed']

In [46]:
mitigations_techniques_df.count()

source ID              1200
source name            1200
source type            1200
mapping type           1200
target ID              1200
target name            1200
target type            1200
mapping description    1200
dtype: int64

---
# 6- Relationships data

In [47]:
relationships.keys()

dict_keys(['relationships', 'citations'])

In [48]:
relationships_df = relationships['relationships']

In [49]:
relationships_df['mapping type'].value_counts()

mapping type
uses             12900
detects           1876
mitigates         1200
attributed-to        6
Name: count, dtype: int64

## 6-1 Relationships - `uses` mapping

In [50]:
rela_uses_df = relationships_df[relationships_df['mapping type'] == 'uses']

In [71]:
rela_uses_df['source type'].value_counts()

source type
software    8406
group       3882
campaign     612
Name: count, dtype: int64

In [51]:
rela_software_techniques_df = relationships_df[(relationships_df['mapping type'] == 'uses') & (relationships_df['source type'] == 'software')]

In [52]:
rela_software_techniques_df.head()

Unnamed: 0,source ID,source name,source type,mapping type,target ID,target name,target type,mapping description
8084,S0066,3PARA RAT,software,uses,T1083,File and Directory Discovery,technique,[3PARA RAT](https://attack.mitre.org/software/...
15350,S0066,3PARA RAT,software,uses,T1573.001,Symmetric Cryptography,technique,[3PARA RAT](https://attack.mitre.org/software/...
12385,S0066,3PARA RAT,software,uses,T1070.006,Timestomp,technique,[3PARA RAT](https://attack.mitre.org/software/...
8800,S0066,3PARA RAT,software,uses,T1071.001,Web Protocols,technique,[3PARA RAT](https://attack.mitre.org/software/...
11915,S0065,4H RAT,software,uses,T1083,File and Directory Discovery,technique,[4H RAT](https://attack.mitre.org/software/S00...


In [53]:
rela_software_techniques_df['target ID'].nunique()

406

In [54]:
rela_uses_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 12900 entries, 2208 to 4683
Data columns (total 8 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   source ID            12900 non-null  object
 1   source name          12900 non-null  object
 2   source type          12900 non-null  object
 3   mapping type         12900 non-null  object
 4   target ID            12900 non-null  object
 5   target name          12900 non-null  object
 6   target type          12900 non-null  object
 7   mapping description  12898 non-null  object
dtypes: object(8)
memory usage: 907.0+ KB


In [56]:
rela_uses_df[rela_uses_df['source type'] == 'campaign']['target type'].value_counts()

target type
technique    526
software      86
Name: count, dtype: int64
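
The `uses` totals line up with the per-object tables: the 3882 group rows equal the 3052 group-to-technique rows plus the 830 group-to-software rows, and the 612 campaign rows split into 526 technique targets and 86 software targets. A cross-tabulation shows every source/target combination at once. The sketch below runs on a tiny made-up frame with the same column names; the row values are illustrative.

```python
import pandas as pd

# Toy rows shaped like relationships['relationships']; values are made up.
rels = pd.DataFrame({
    "source type": ["software", "group", "group", "campaign", "campaign", "software"],
    "mapping type": ["uses"] * 6,
    "target type": ["technique", "technique", "software", "technique", "software", "technique"],
})

# Keep only 'uses' relationships, then count each source/target type pair.
uses_only = rels[rels["mapping type"] == "uses"]
table = pd.crosstab(uses_only["source type"], uses_only["target type"])
print(table)
```

Applied to `rela_uses_df`, the same `pd.crosstab` call would be expected to reproduce the campaign split shown above in a single table.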

## 6-2 Relationships - `detects` mapping

In [57]:
rela_detects_df = relationships_df[relationships_df['mapping type'] == 'detects']

In [58]:
rela_detects_df.head()


Unnamed: 0,source ID,source name,source type,mapping type,target ID,target name,target type,mapping description
8038,,Active DNS,datacomponent,detects,T1583,Acquire Infrastructure,technique,Monitor for queried domain name system (DNS) r...
1158,,Active DNS,datacomponent,detects,T1584,Compromise Infrastructure,technique,Monitor for queried domain name system (DNS) r...
1286,,Active DNS,datacomponent,detects,T1584.002,DNS Server,technique,Monitor for queried domain name system (DNS) r...
5513,,Active DNS,datacomponent,detects,T1584.001,Domains,technique,Monitor for queried domain name system (DNS) r...
6724,,Active DNS,datacomponent,detects,T1583.001,Domains,technique,Monitor queried domain name system (DNS) regis...


In [59]:
rela_detects_df['target ID'].nunique()

568

## 6-3 Relationships - `mitigates` mapping

In [60]:
rela_mitigates_df = relationships_df[relationships_df['mapping type'] == 'mitigates']

In [61]:
rela_mitigates_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1200 entries, 8321 to 10405
Data columns (total 8 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   source ID            1200 non-null   object
 1   source name          1200 non-null   object
 2   source type          1200 non-null   object
 3   mapping type         1200 non-null   object
 4   target ID            1200 non-null   object
 5   target name          1200 non-null   object
 6   target type          1200 non-null   object
 7   mapping description  1200 non-null   object
dtypes: object(8)
memory usage: 84.4+ KB


## 6-4 Relationships - `attributed-to` mapping

In [62]:
rela_attributed_df = relationships_df[relationships_df['mapping type'] == 'attributed-to']

In [63]:
rela_attributed_df['target type'].value_counts()

target type
group    6
Name: count, dtype: int64