# Building a Neo4j Knowledge Graph for MITRE ATT&CK

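Loosely adapted from https://graphacademy.neo4j.com/courses/llm-knowledge-graph-construction/

Most of the work here is importing and cleaning data: we read the ATT&CK v16.1 Excel workbooks from ./xlsx, trim them with pandas, and write the results as CSV files to ./clean.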


# Tactics

We use pandas for checking and cleaning the data, and openpyxl for reading .xlsx files.

In [1]:
# !pip3 install pandas
# !pip3 install openpyxl

In [2]:
!ls

[0m[01;34mclean[0m  [01;34mcsv[0m  neo4j.pw  notebook.ipynb  [01;34mxlsx[0m


In [3]:
!ls ./xlsx

enterprise-attack-v16.1-campaigns.xlsx
enterprise-attack-v16.1-datasources.xlsx
enterprise-attack-v16.1-groups.xlsx
enterprise-attack-v16.1-matrices.xlsx
enterprise-attack-v16.1-mitigations.xlsx
enterprise-attack-v16.1-relationships.xlsx
enterprise-attack-v16.1-software.xlsx
enterprise-attack-v16.1-tactics.xlsx
enterprise-attack-v16.1-techniques.xlsx


In [4]:
import pandas as pd

In [5]:
with pd.ExcelFile("./xlsx/enterprise-attack-v16.1-tactics.xlsx") as xls:  
    tactics = pd.read_excel(xls)
    print(xls.sheet_names)
    # Only one sheet.

['tactics']


In [6]:
tactics.head()

Unnamed: 0,ID,STIX ID,name,description,url,created,last modified,domain,version
0,TA0009,x-mitre-tactic--d108ce10-2419-4cf9-a774-46161d...,Collection,The adversary is trying to gather data of inte...,https://attack.mitre.org/tactics/TA0009,17 October 2018,05 September 2024,enterprise-attack,1.1
1,TA0011,x-mitre-tactic--f72804c5-f15a-449e-a5da-2eecd1...,Command and Control,The adversary is trying to communicate with co...,https://attack.mitre.org/tactics/TA0011,17 October 2018,19 July 2019,enterprise-attack,1.0
2,TA0006,x-mitre-tactic--2558fd61-8c75-4730-94c4-11926d...,Credential Access,The adversary is trying to steal account names...,https://attack.mitre.org/tactics/TA0006,17 October 2018,19 July 2019,enterprise-attack,1.0
3,TA0005,x-mitre-tactic--78b23412-0651-46d7-a540-170a1c...,Defense Evasion,The adversary is trying to avoid being detecte...,https://attack.mitre.org/tactics/TA0005,17 October 2018,19 July 2019,enterprise-attack,1.0
4,TA0007,x-mitre-tactic--c17c5845-175e-4421-9713-829d05...,Discovery,The adversary is trying to figure out your env...,https://attack.mitre.org/tactics/TA0007,17 October 2018,19 July 2019,enterprise-attack,1.0


We only care about: ID, name, description

In [7]:
tactics = tactics.drop(columns=["STIX ID", "url", "domain", "version", "last modified", "created"])

Write to ./clean/

In [8]:
# index=False (the boolean, not the string 'False') keeps the row index out of the file.
tactics.to_csv("./clean/tactics-tactics.csv", index=False)

Check
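
A minimal sketch of such a check, assuming the CSV was written without the index column. It uses a toy frame and a temporary file (both hypothetical stand-ins, not the real `./clean` output) and reads the file back to confirm that only the three kept columns come through.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy stand-in for the cleaned tactics frame; descriptions are placeholders.
tactics_demo = pd.DataFrame(
    {
        "ID": ["TA0009", "TA0011"],
        "name": ["Collection", "Command and Control"],
        "description": ["placeholder text A", "placeholder text B"],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "tactics-demo.csv"  # hypothetical file name
    tactics_demo.to_csv(csv_path, index=False)
    reloaded = pd.read_csv(csv_path)

# No stray "Unnamed: 0" column should appear, and the rows should survive intact.
columns_match = list(reloaded.columns) == ["ID", "name", "description"]
rows_match = reloaded.values.tolist() == tactics_demo.values.tolist()
print(columns_match, rows_match)
```

If an extra `Unnamed: 0` column shows up after reloading, the index was written to the file.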

# Techniques

In [9]:
xls = pd.ExcelFile("./xlsx/enterprise-attack-v16.1-techniques.xlsx")
print(xls.sheet_names)

['techniques', 'procedure examples', 'associated mitigations', 'citations']


Four sheets: techniques, procedure examples, associated mitigations, citations

### Techniques Inspection and Cleaning

In [10]:
techniques_techniques = pd.read_excel(xls, sheet_name='techniques')

In [11]:
techniques_techniques.head()

Unnamed: 0,ID,STIX ID,name,description,url,created,last modified,domain,version,tactics,...,is sub-technique,sub-technique of,defenses bypassed,contributors,permissions required,supports remote,system requirements,impact type,effective permissions,relationship citations
0,T1548,attack-pattern--67720091-eee3-4d2d-ae16-826456...,Abuse Elevation Control Mechanism,Adversaries may circumvent mechanisms designed...,https://attack.mitre.org/techniques/T1548,30 January 2020,15 October 2024,enterprise-attack,1.4,"Defense Evasion, Privilege Escalation",...,False,,,,"Administrator, User",,,,,"(Citation: TrendMicro RaspberryRobin 2022),(Ci..."
1,T1548.002,attack-pattern--120d5519-3098-4e1c-9191-2aa612...,Abuse Elevation Control Mechanism: Bypass User...,Adversaries may bypass UAC mechanisms to eleva...,https://attack.mitre.org/techniques/T1548/002,30 January 2020,21 April 2023,enterprise-attack,2.1,"Defense Evasion, Privilege Escalation",...,True,T1548,Windows User Account Control,Casey Smith; Stefan Kanthak,"Administrator, User",,,,Administrator,"(Citation: McAfee Honeybee),(Citation: BitDefe..."
2,T1548.004,attack-pattern--b84903f0-c7d5-435d-a69e-de47cc...,Abuse Elevation Control Mechanism: Elevated Ex...,Adversaries may leverage the <code>Authorizati...,https://attack.mitre.org/techniques/T1548/004,30 January 2020,19 October 2022,enterprise-attack,1.0,"Defense Evasion, Privilege Escalation",...,True,T1548,,"Erika Noerenberg, @gutterchurl, Carbon Black; ...","Administrator, User",,,,root,"(Citation: Carbon Black Shlayer Feb 2019),"
3,T1548.001,attack-pattern--6831414d-bb70-42b7-8030-d4e06b...,Abuse Elevation Control Mechanism: Setuid and ...,An adversary may abuse configurations where an...,https://attack.mitre.org/techniques/T1548/001,30 January 2020,15 March 2023,enterprise-attack,1.1,"Defense Evasion, Privilege Escalation",...,True,T1548,,,User,,,,,"(Citation: OSX Keydnap malware),(Citation: ANS..."
4,T1548.003,attack-pattern--1365fe3b-0f50-455d-b4da-266ce3...,Abuse Elevation Control Mechanism: Sudo and Su...,Adversaries may perform sudo caching and/or us...,https://attack.mitre.org/techniques/T1548/003,30 January 2020,14 March 2022,enterprise-attack,1.0,"Defense Evasion, Privilege Escalation",...,True,T1548,,,User,,,,root,"(Citation: hexed osx.dok analysis 2019),(Citat..."


In [12]:
techniques_techniques = techniques_techniques.drop(columns=['STIX ID', 'url', 'created', 'last modified', 'domain', 'version', 'contributors', 'relationship citations'])

In [13]:
techniques_techniques.columns

Index(['ID', 'name', 'description', 'tactics', 'detection', 'platforms',
       'data sources', 'is sub-technique', 'sub-technique of',
       'defenses bypassed', 'permissions required', 'supports remote',
       'system requirements', 'impact type', 'effective permissions'],
      dtype='object')

### The annoying thing is that this sheet mixes techniques and sub-techniques (flagged by "is sub-technique", linked by "sub-technique of"), so we split it into one dataframe for sub-techniques and one for techniques proper

In [14]:
subtechniques_df = techniques_techniques[techniques_techniques["is sub-technique"] == True] 

In [15]:
techniques_df = techniques_techniques[techniques_techniques["is sub-technique"] != True]

In [16]:
subtechniques_df.count()

ID                       453
name                     453
description              453
tactics                  453
detection                390
platforms                453
data sources             419
is sub-technique         453
sub-technique of         453
defenses bypassed         72
permissions required     104
supports remote           23
system requirements       23
impact type               18
effective permissions     15
dtype: int64

In [17]:
techniques_df.count()

ID                       203
name                     203
description              203
tactics                  203
detection                193
platforms                203
data sources             197
is sub-technique         203
sub-technique of           0
defenses bypassed         32
permissions required      18
supports remote           14
system requirements       13
impact type               14
effective permissions      4
dtype: int64

In [18]:
techniques_techniques.count()

ID                       656
name                     656
description              656
tactics                  656
detection                583
platforms                656
data sources             616
is sub-technique         656
sub-technique of         453
defenses bypassed        104
permissions required     122
supports remote           37
system requirements       36
impact type               32
effective permissions     19
dtype: int64

In [19]:
453 + 203

656
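
The same sanity check can be written as assertions rather than by adding the counts by hand. This is a sketch on a toy frame (the flags are made up for illustration): building both subsets from one boolean mask and its negation means every row lands in exactly one of them.

```python
import pandas as pd

# Toy frame: IDs are illustrative, flags mimic the "is sub-technique" column.
demo = pd.DataFrame(
    {
        "ID": ["T1548", "T1548.002", "T1548.004", "T1134"],
        "is sub-technique": [False, True, True, False],
    }
)

# One mask, used once directly and once negated.
is_sub = demo["is sub-technique"].eq(True)
sub_demo = demo[is_sub]
main_demo = demo[~is_sub]

# The two subsets must cover the frame and must not share rows.
assert len(sub_demo) + len(main_demo) == len(demo)
assert sub_demo.index.intersection(main_demo.index).empty
print(len(sub_demo), len(main_demo), len(demo))
```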

### Check columns again

In [20]:
techniques_df.head()

Unnamed: 0,ID,name,description,tactics,detection,platforms,data sources,is sub-technique,sub-technique of,defenses bypassed,permissions required,supports remote,system requirements,impact type,effective permissions
0,T1548,Abuse Elevation Control Mechanism,Adversaries may circumvent mechanisms designed...,"Defense Evasion, Privilege Escalation",Monitor the file system for files that have th...,"IaaS, Identity Provider, Linux, Office Suite, ...","Command: Command Execution, File: File Metadat...",False,,,"Administrator, User",,,,
7,T1134,Access Token Manipulation,Adversaries may modify access tokens to operat...,"Defense Evasion, Privilege Escalation",If an adversary is using a standard command-li...,Windows,Active Directory: Active Directory Object Modi...,False,,"Heuristic Detection, Host Forensic Analysis, S...","Administrator, User",,,,SYSTEM
13,T1531,Account Access Removal,Adversaries may interrupt availability of syst...,Impact,Use process monitoring to monitor the executio...,"IaaS, Linux, Office Suite, SaaS, Windows, macOS",Active Directory: Active Directory Object Modi...,False,,,,,,Availability,
14,T1087,Account Discovery,Adversaries may attempt to get a listing of va...,Discovery,System and network discovery techniques normal...,"IaaS, Identity Provider, Linux, Office Suite, ...","Command: Command Execution, File: File Access,...",False,,,,,,,
19,T1098,Account Manipulation,Adversaries may manipulate accounts to maintai...,"Persistence, Privilege Escalation",Collect events that correlate with changes to ...,"Containers, IaaS, Identity Provider, Linux, Ne...",Active Directory: Active Directory Object Modi...,False,,,,,,,


### The columns `supports remote`, `system requirements`, `impact type` and `effective permissions` are mostly NaN (at most 37 of 656 rows are filled), so we drop them :]

In [21]:
techniques_df = techniques_df.drop(columns=['supports remote','system requirements', 'impact type', 'effective permissions'])

In [22]:
subtechniques_df = subtechniques_df.drop(columns=['supports remote','system requirements', 'impact type', 'effective permissions'])
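
Instead of reading the sparse columns off the `count()` output, the share of missing values per column can be computed directly. The sketch below runs on a toy frame, and the `MAX_MISSING` cut-off is an assumption chosen for illustration, not a value used in this notebook.

```python
import numpy as np
import pandas as pd

# Toy frame with one mostly-empty column; IDs are placeholders.
demo = pd.DataFrame(
    {
        "ID": ["T1", "T2", "T3", "T4"],
        "platforms": ["Windows", "Linux", "macOS", "Windows"],
        "impact type": [np.nan, np.nan, np.nan, "Availability"],
    }
)

MAX_MISSING = 0.5  # hypothetical threshold: drop columns that are more than half empty

# isna().mean() gives the fraction of NaN values per column.
missing_share = demo.isna().mean()
sparse_columns = missing_share[missing_share > MAX_MISSING].index.tolist()
trimmed = demo.drop(columns=sparse_columns)

print(sparse_columns)
print(list(trimmed.columns))
```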

In [23]:
techniques_df.head()

Unnamed: 0,ID,name,description,tactics,detection,platforms,data sources,is sub-technique,sub-technique of,defenses bypassed,permissions required
0,T1548,Abuse Elevation Control Mechanism,Adversaries may circumvent mechanisms designed...,"Defense Evasion, Privilege Escalation",Monitor the file system for files that have th...,"IaaS, Identity Provider, Linux, Office Suite, ...","Command: Command Execution, File: File Metadat...",False,,,"Administrator, User"
7,T1134,Access Token Manipulation,Adversaries may modify access tokens to operat...,"Defense Evasion, Privilege Escalation",If an adversary is using a standard command-li...,Windows,Active Directory: Active Directory Object Modi...,False,,"Heuristic Detection, Host Forensic Analysis, S...","Administrator, User"
13,T1531,Account Access Removal,Adversaries may interrupt availability of syst...,Impact,Use process monitoring to monitor the executio...,"IaaS, Linux, Office Suite, SaaS, Windows, macOS",Active Directory: Active Directory Object Modi...,False,,,
14,T1087,Account Discovery,Adversaries may attempt to get a listing of va...,Discovery,System and network discovery techniques normal...,"IaaS, Identity Provider, Linux, Office Suite, ...","Command: Command Execution, File: File Access,...",False,,,
19,T1098,Account Manipulation,Adversaries may manipulate accounts to maintai...,"Persistence, Privilege Escalation",Collect events that correlate with changes to ...,"Containers, IaaS, Identity Provider, Linux, Ne...",Active Directory: Active Directory Object Modi...,False,,,


In [24]:
# Preview only: the result is not assigned, so techniques_df still has both columns afterwards.
techniques_df.drop(columns=['is sub-technique', 'sub-technique of'])

Unnamed: 0,ID,name,description,tactics,detection,platforms,data sources,defenses bypassed,permissions required
0,T1548,Abuse Elevation Control Mechanism,Adversaries may circumvent mechanisms designed...,"Defense Evasion, Privilege Escalation",Monitor the file system for files that have th...,"IaaS, Identity Provider, Linux, Office Suite, ...","Command: Command Execution, File: File Metadat...",,"Administrator, User"
7,T1134,Access Token Manipulation,Adversaries may modify access tokens to operat...,"Defense Evasion, Privilege Escalation",If an adversary is using a standard command-li...,Windows,Active Directory: Active Directory Object Modi...,"Heuristic Detection, Host Forensic Analysis, S...","Administrator, User"
13,T1531,Account Access Removal,Adversaries may interrupt availability of syst...,Impact,Use process monitoring to monitor the executio...,"IaaS, Linux, Office Suite, SaaS, Windows, macOS",Active Directory: Active Directory Object Modi...,,
14,T1087,Account Discovery,Adversaries may attempt to get a listing of va...,Discovery,System and network discovery techniques normal...,"IaaS, Identity Provider, Linux, Office Suite, ...","Command: Command Execution, File: File Access,...",,
19,T1098,Account Manipulation,Adversaries may manipulate accounts to maintai...,"Persistence, Privilege Escalation",Collect events that correlate with changes to ...,"Containers, IaaS, Identity Provider, Linux, Ne...",Active Directory: Active Directory Object Modi...,,
...,...,...,...,...,...,...,...,...,...
643,T1497,Virtualization/Sandbox Evasion,Adversaries may employ various means to detect...,"Defense Evasion, Discovery","Virtualization, sandbox, user activity, and re...","Linux, Windows, macOS","Command: Command Execution, Process: OS API Ex...","Anti-virus, Host forensic analysis, Signature-...",
647,T1600,Weaken Encryption,Adversaries may compromise a network device’s ...,Defense Evasion,There is no documented method for defenders to...,Network,File: File Modification,Encryption,Administrator
650,T1102,Web Service,"Adversaries may use an existing, legitimate ex...",Command and Control,Host data that can relate unknown or suspiciou...,"Linux, Windows, macOS","Network Traffic: Network Connection Creation, ...",,
654,T1047,Windows Management Instrumentation,Adversaries may abuse Windows Management Instr...,Execution,Monitor network traffic for WMI connections; t...,Windows,"Command: Command Execution, Network Traffic: N...",,


## Creating a subtechniques relationship table

In [25]:
# The source column is named "sub-technique of", so that is the key to rename.
subtechniques_relationship = subtechniques_df[["ID", "sub-technique of"]].rename(
    columns={"ID": "subtechnique_id", "sub-technique of": "parent_technique_id"}
)

subtechniques_relationship["relationship"] = "sub-technique"

In [26]:
subtechniques_relationship.head()

Unnamed: 0,subtechnique_id,parent_technique_id,relationship
1,T1548.002,T1548,sub-technique
2,T1548.004,T1548,sub-technique
3,T1548.001,T1548,sub-technique
4,T1548.003,T1548,sub-technique
5,T1548.006,T1548,sub-technique


In [27]:
subtechniques_relationship.count()

subtechnique_id        453
parent_technique_id    453
relationship           453
dtype: int64
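
Since a sub-technique ID is its parent's ID plus a numeric suffix, the relationship table can be cross-checked against the IDs themselves. A small sketch on toy rows, using the original sheet's column names `ID` and `sub-technique of`:

```python
import pandas as pd

# Toy rows in the shape of the source sheet.
pairs = pd.DataFrame(
    {
        "ID": ["T1548.002", "T1548.004", "T1548.001"],
        "sub-technique of": ["T1548", "T1548", "T1548"],
    }
)

# Everything before the first dot should equal the declared parent.
prefix = pairs["ID"].str.split(".", regex=False).str[0]
mismatches = pairs[prefix != pairs["sub-technique of"]]

print(len(mismatches))
```

Any row that ends up in `mismatches` would point at a parent ID that disagrees with the sub-technique's own prefix.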

## Final changes to subtechniques_df. We could have kept the "tactics" column, but we drop it so that sub-techniques reach tactics only through their parent technique. "data sources" is covered by relationships.xlsx, so we drop it too for now

In [28]:
subtechniques_df = subtechniques_df.drop(columns=["tactics", "is sub-technique", "sub-technique of", "data sources"])

In [29]:
subtechniques_df.head()

Unnamed: 0,ID,name,description,detection,platforms,defenses bypassed,permissions required
1,T1548.002,Abuse Elevation Control Mechanism: Bypass User...,Adversaries may bypass UAC mechanisms to eleva...,There are many ways to perform UAC bypasses wh...,Windows,Windows User Account Control,"Administrator, User"
2,T1548.004,Abuse Elevation Control Mechanism: Elevated Ex...,Adversaries may leverage the <code>Authorizati...,Consider monitoring for <code>/usr/libexec/sec...,macOS,,"Administrator, User"
3,T1548.001,Abuse Elevation Control Mechanism: Setuid and ...,An adversary may abuse configurations where an...,Monitor the file system for files that have th...,"Linux, macOS",,User
4,T1548.003,Abuse Elevation Control Mechanism: Sudo and Su...,Adversaries may perform sudo caching and/or us...,"On Linux, auditd can alert every time a user's...","Linux, macOS",,User
5,T1548.006,Abuse Elevation Control Mechanism: TCC Manipul...,Adversaries can manipulate or abuse the Transp...,,macOS,,


## Save :)

In [30]:
# index=False (the boolean, not the string 'False') keeps the row index out of the file.
subtechniques_df.to_csv("./clean/subtechniques.csv", index=False)

In [31]:
# index=False (the boolean, not the string 'False') keeps the row index out of the file.
subtechniques_relationship.to_csv("./clean/subtechniques_relationship.csv", index=False)

### TODO: platforms, data sources, defenses bypassed, permissions required - make relationship tables
### For now we keep in csv but ignore when importing to neo4j

## Creating a tactics-techniques relationship table. Notice how "tactics" is a comma-separated string, so we split it into a list and explode it into one row per tactic :D

In [32]:
techniques_df.head()

Unnamed: 0,ID,name,description,tactics,detection,platforms,data sources,is sub-technique,sub-technique of,defenses bypassed,permissions required
0,T1548,Abuse Elevation Control Mechanism,Adversaries may circumvent mechanisms designed...,"Defense Evasion, Privilege Escalation",Monitor the file system for files that have th...,"IaaS, Identity Provider, Linux, Office Suite, ...","Command: Command Execution, File: File Metadat...",False,,,"Administrator, User"
7,T1134,Access Token Manipulation,Adversaries may modify access tokens to operat...,"Defense Evasion, Privilege Escalation",If an adversary is using a standard command-li...,Windows,Active Directory: Active Directory Object Modi...,False,,"Heuristic Detection, Host Forensic Analysis, S...","Administrator, User"
13,T1531,Account Access Removal,Adversaries may interrupt availability of syst...,Impact,Use process monitoring to monitor the executio...,"IaaS, Linux, Office Suite, SaaS, Windows, macOS",Active Directory: Active Directory Object Modi...,False,,,
14,T1087,Account Discovery,Adversaries may attempt to get a listing of va...,Discovery,System and network discovery techniques normal...,"IaaS, Identity Provider, Linux, Office Suite, ...","Command: Command Execution, File: File Access,...",False,,,
19,T1098,Account Manipulation,Adversaries may manipulate accounts to maintai...,"Persistence, Privilege Escalation",Collect events that correlate with changes to ...,"Containers, IaaS, Identity Provider, Linux, Ne...",Active Directory: Active Directory Object Modi...,False,,,


In [33]:
tactics_relationship = techniques_df.assign(tactics=techniques_df["tactics"].str.split(",")).explode("tactics")

tactics_relationship = tactics_relationship[["ID", "tactics"]].rename(
    columns={"ID": "technique_id", "tactics": "tactic_id"}
)

tactics_relationship["relationship"] = "is tactic of"

In [34]:
tactics_relationship.head()

Unnamed: 0,technique_id,tactic_id,relationship
0,T1548,Defense Evasion,is tactic of
0,T1548,Privilege Escalation,is tactic of
7,T1134,Defense Evasion,is tactic of
7,T1134,Privilege Escalation,is tactic of
13,T1531,Impact,is tactic of
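
The split-and-explode step in isolation, as a sketch on two toy rows. Unlike the cell above, this variant strips the whitespace left by the `", "` separator right away and renumbers the rows with `ignore_index=True`. Both are optional choices for the illustration.

```python
import pandas as pd

# Two toy rows with the same shape as techniques_df[["ID", "tactics"]].
demo = pd.DataFrame(
    {
        "ID": ["T1548", "T1531"],
        "tactics": ["Defense Evasion, Privilege Escalation", "Impact"],
    }
)

exploded = (
    demo.assign(tactics=demo["tactics"].str.split(","))  # string -> list of names
    .explode("tactics", ignore_index=True)                # one row per list element
    .assign(tactics=lambda df: df["tactics"].str.strip()) # remove the leading space
)

print(exploded.values.tolist())
```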


In [35]:
tactic_name_to_id = tactics.set_index("name")["ID"].to_dict()

In [36]:
tactics_relationship["tactic_id"] = tactics_relationship["tactic_id"].apply(
    lambda name: tactic_name_to_id[name.strip()]
)

In [37]:
tactics_relationship.head()

Unnamed: 0,technique_id,tactic_id,relationship
0,T1548,TA0005,is tactic of
0,T1548,TA0004,is tactic of
7,T1134,TA0005,is tactic of
7,T1134,TA0004,is tactic of
13,T1531,TA0040,is tactic of
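
The dictionary lookup above raises a `KeyError` on the first tactic name it does not know. An alternative sketch uses `Series.map`, which leaves unknown names as NaN so they can be listed all at once. The series below is toy data, and "Not A Tactic" is a deliberately bogus name.

```python
import pandas as pd

# Subset of the name -> ID mapping, taken from the output above.
name_to_id = {
    "Defense Evasion": "TA0005",
    "Privilege Escalation": "TA0004",
    "Impact": "TA0040",
}

# Toy names, some with stray whitespace and one that cannot be mapped.
names = pd.Series([" Defense Evasion", "Privilege Escalation ", "Impact", "Not A Tactic"])

ids = names.str.strip().map(name_to_id)
unmapped = names[ids.isna()].tolist()

print(ids.tolist())
print(unmapped)
```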


In [38]:
tactics_relationship.count()

technique_id    236
tactic_id       236
relationship    236
dtype: int64

In [39]:
techniques_df.count()

ID                      203
name                    203
description             203
tactics                 203
detection               193
platforms               203
data sources            197
is sub-technique        203
sub-technique of          0
defenses bypassed        32
permissions required     18
dtype: int64

## Clean and save

In [40]:
techniques_df = techniques_df.drop(columns=["tactics", "data sources", "is sub-technique", "sub-technique of"])

In [41]:
techniques_df.columns

Index(['ID', 'name', 'description', 'detection', 'platforms',
       'defenses bypassed', 'permissions required'],
      dtype='object')

In [42]:
# index=False (the boolean, not the string 'False') keeps the row index out of the file.
tactics_relationship.to_csv("./clean/tactics_relationship.csv", index=False)

In [43]:
# index=False (the boolean, not the string 'False') keeps the row index out of the file.
techniques_df.to_csv("./clean/techniques.csv", index=False)

# Mitigations

In [44]:
!ls ./xlsx

enterprise-attack-v16.1-campaigns.xlsx
enterprise-attack-v16.1-datasources.xlsx
enterprise-attack-v16.1-groups.xlsx
enterprise-attack-v16.1-matrices.xlsx
enterprise-attack-v16.1-mitigations.xlsx
enterprise-attack-v16.1-relationships.xlsx
enterprise-attack-v16.1-software.xlsx
enterprise-attack-v16.1-tactics.xlsx
enterprise-attack-v16.1-techniques.xlsx


In [45]:
xls = pd.ExcelFile("./xlsx/enterprise-attack-v16.1-mitigations.xlsx") 
mitigations = pd.read_excel(xls)
print(xls.sheet_names)

['mitigations', 'techniques addressed', 'citations']


In [46]:
mitigations_df = pd.read_excel(xls, sheet_name='mitigations')
mitigations_to_techniques_df = pd.read_excel(xls, sheet_name ='techniques addressed')

## Mitigations

In [47]:
mitigations_df.head()

Unnamed: 0,ID,STIX ID,name,description,url,created,last modified,domain,version,relationship citations
0,M1036,course-of-action--f9f9e6ef-bc0a-41ad-ba11-0924...,Account Use Policies,Configure features related to account use like...,https://attack.mitre.org/mitigations/M1036,11 June 2019,21 October 2022,enterprise-attack,1.0,"(Citation: Okta Block Anonymizing Services),(C..."
1,M1015,course-of-action--e3388c78-2a8d-47c2-8422-c139...,Active Directory Configuration,Implement robust Active Directory configuratio...,https://attack.mitre.org/mitigations/M1015,06 June 2019,08 October 2024,enterprise-attack,1.2,"(Citation: AdSecurity DCSync Sept 2015),(Citat..."
2,M1049,course-of-action--a6a47a06-08fc-4ec4-bdc3-2037...,Antivirus/Antimalware,Use signatures or heuristics to detect malicio...,https://attack.mitre.org/mitigations/M1049,11 June 2019,31 March 2020,enterprise-attack,1.1,"(Citation: SourceForge rkhunter),(Citation: Ch..."
3,M1013,course-of-action--25dc1ce8-eb55-4333-ae30-a7cb...,Application Developer Guidance,This mitigation describes any guidance or trai...,https://attack.mitre.org/mitigations/M1013,25 October 2017,27 September 2023,"enterprise-attack,mobile-attack",1.1,"(Citation: Comparitech Replay Attack),(Citatio..."
4,M1048,course-of-action--b9f0c069-abbe-4a07-a245-2481...,Application Isolation and Sandboxing,Restrict execution of code to a virtual enviro...,https://attack.mitre.org/mitigations/M1048,11 June 2019,31 March 2020,enterprise-attack,1.1,(Citation: Windows Blogs Microsoft Edge Sandbo...


In [48]:
mitigations_df = mitigations_df.drop(columns=['STIX ID', 'url', 'created', 'last modified', 'domain', 'version', 'relationship citations'])

In [49]:
mitigations_df.head()

Unnamed: 0,ID,name,description
0,M1036,Account Use Policies,Configure features related to account use like...
1,M1015,Active Directory Configuration,Implement robust Active Directory configuratio...
2,M1049,Antivirus/Antimalware,Use signatures or heuristics to detect malicio...
3,M1013,Application Developer Guidance,This mitigation describes any guidance or trai...
4,M1048,Application Isolation and Sandboxing,Restrict execution of code to a virtual enviro...


In [50]:
# index=False (the boolean, not the string 'False') keeps the row index out of the file.
mitigations_df.to_csv("./clean/mitigations.csv", index=False)

## Mitigations to techniques relationship

In [51]:
mitigations_to_techniques_df.head()

Unnamed: 0,source ID,source name,source ref,source type,mapping type,target ID,target name,target ref,target type,mapping description,STIX ID,created,last modified
0,M1036,Account Use Policies,course-of-action--f9f9e6ef-bc0a-41ad-ba11-0924...,mitigation,mitigates,T1550.001,Application Access Token,attack-pattern--f005e783-57d4-4837-88ad-dbe7fa...,technique,"Where possible, consider restricting the use o...",relationship--41f072bf-ec21-4a83-bb25-6c84fce8...,16 October 2024,16 October 2024
1,M1036,Account Use Policies,course-of-action--f9f9e6ef-bc0a-41ad-ba11-0924...,mitigation,mitigates,T1110,Brute Force,attack-pattern--a93494bb-4b80-4ea1-8695-3236a4...,technique,Set account lockout policies after a certain n...,relationship--c1a6c86e-5d5d-4cf1-845e-1660d9c1...,13 June 2019,28 May 2024
2,M1036,Account Use Policies,course-of-action--f9f9e6ef-bc0a-41ad-ba11-0924...,mitigation,mitigates,T1078.004,Cloud Accounts,attack-pattern--f232fa7a-025c-4d43-abc7-318e81...,technique,Use conditional access policies to block login...,relationship--b93fd13f-b11b-4f4b-9d77-977e7af4...,21 February 2023,16 March 2023
3,M1036,Account Use Policies,course-of-action--f9f9e6ef-bc0a-41ad-ba11-0924...,mitigation,mitigates,T1110.004,Credential Stuffing,attack-pattern--b2d03cea-aec1-45ca-9744-9ee583...,technique,Set account lockout policies after a certain n...,relationship--4b7e0525-1ba7-4d55-89d1-07fc9419...,20 February 2020,28 May 2024
4,M1036,Account Use Policies,course-of-action--f9f9e6ef-bc0a-41ad-ba11-0924...,mitigation,mitigates,T1621,Multi-Factor Authentication Request Generation,attack-pattern--954a1639-f2d6-407d-aef3-491762...,technique,Enable account restrictions to prevent login a...,relationship--a09cb233-bdff-4d89-bcc3-7fb21089...,01 April 2022,21 February 2023


In [52]:
mitigations_to_techniques_df = mitigations_to_techniques_df.drop(columns=['source ref', 'source type', 'target ref', 'STIX ID', 'created', 'last modified'])

In [53]:
mitigations_to_techniques_df.head()

Unnamed: 0,source ID,source name,mapping type,target ID,target name,target type,mapping description
0,M1036,Account Use Policies,mitigates,T1550.001,Application Access Token,technique,"Where possible, consider restricting the use o..."
1,M1036,Account Use Policies,mitigates,T1110,Brute Force,technique,Set account lockout policies after a certain n...
2,M1036,Account Use Policies,mitigates,T1078.004,Cloud Accounts,technique,Use conditional access policies to block login...
3,M1036,Account Use Policies,mitigates,T1110.004,Credential Stuffing,technique,Set account lockout policies after a certain n...
4,M1036,Account Use Policies,mitigates,T1621,Multi-Factor Authentication Request Generation,technique,Enable account restrictions to prevent login a...


In [54]:
print(mitigations_to_techniques_df['mapping type'].unique())

['mitigates']


In [55]:
print(mitigations_to_techniques_df['target type'].unique())

['technique']
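
The two `unique()` checks above generalise to a scan over all columns for ones that hold a single value. A sketch on a toy frame; the IDs are synthetic placeholders, not real ATT&CK IDs.

```python
import pandas as pd

# Toy frame in the shape of the "techniques addressed" sheet.
demo = pd.DataFrame(
    {
        "source ID": ["M_a", "M_a", "M_b"],
        "mapping type": ["mitigates", "mitigates", "mitigates"],
        "target ID": ["T_x", "T_y", "T_z"],
        "target type": ["technique", "technique", "technique"],
    }
)

# A column with exactly one distinct value (NaN included) carries no information.
constant_columns = [
    column for column in demo.columns if demo[column].nunique(dropna=False) == 1
]

print(constant_columns)
```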


### "mapping type" and "target type" each hold a single value, and "source name" and "target name" are redundant with the IDs, so we drop all four!

In [56]:
mitigations_to_techniques_df = mitigations_to_techniques_df.drop(columns=['source name', 'target name', 'target type', 'mapping type'])

In [57]:
mitigations_to_techniques_df.head()

Unnamed: 0,source ID,target ID,mapping description
0,M1036,T1550.001,"Where possible, consider restricting the use o..."
1,M1036,T1110,Set account lockout policies after a certain n...
2,M1036,T1078.004,Use conditional access policies to block login...
3,M1036,T1110.004,Set account lockout policies after a certain n...
4,M1036,T1621,Enable account restrictions to prevent login a...


### Separate the mitigations-subtechniques relationship from the mitigations-techniques relationship

In [58]:
mitigations_to_subtechniques_df = mitigations_to_techniques_df[mitigations_to_techniques_df["target ID"].str.contains(r"\.", na=False)]
mitigations_to_techniques_df = mitigations_to_techniques_df[~mitigations_to_techniques_df["target ID"].str.contains(r"\.", na=False)]

In [59]:
mitigations_to_subtechniques_df.head()

Unnamed: 0,source ID,target ID,mapping description
0,M1036,T1550.001,"Where possible, consider restricting the use o..."
2,M1036,T1078.004,Use conditional access policies to block login...
3,M1036,T1110.004,Set account lockout policies after a certain n...
5,M1036,T1110.001,Set account lockout policies after a certain n...
6,M1036,T1110.003,Set account lockout policies after a certain n...


In [60]:
mitigations_to_techniques_df.head()

Unnamed: 0,source ID,target ID,mapping description
1,M1036,T1110,Set account lockout policies after a certain n...
4,M1036,T1621,Enable account restrictions to prevent login a...
7,M1036,T1648,"Where possible, consider restricting access to..."
8,M1036,T1550,"Where possible, consider restricting the use o..."
9,M1036,T1078,Use conditional access policies to block login...
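
The split above evaluates the same `str.contains` expression twice. A sketch of the same idea with the mask computed once, on four rows taken from the preview further up. The variable names are hypothetical, and `regex=False` treats the dot as a literal character so no escaping is needed.

```python
import pandas as pd

# Four mitigation -> target pairs from the preview of the sheet.
links = pd.DataFrame(
    {
        "source ID": ["M1036", "M1036", "M1036", "M1036"],
        "target ID": ["T1550.001", "T1110", "T1078.004", "T1621"],
    }
)

# Sub-technique IDs contain a dot; missing IDs count as "not a sub-technique".
targets_subtechnique = links["target ID"].str.contains(".", regex=False, na=False)

to_sub = links[targets_subtechnique]
to_main = links[~targets_subtechnique]

print(to_sub["target ID"].tolist())
print(to_main["target ID"].tolist())
```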


In [61]:
# index=False (the boolean, not the string 'False') keeps the row index out of the file.
mitigations_to_techniques_df.to_csv("./clean/mitigations_to_techniques.csv", index=False)

In [62]:
# index=False (the boolean, not the string 'False') keeps the row index out of the file.
mitigations_to_subtechniques_df.to_csv("./clean/mitigations_to_subtechniques.csv", index=False)

# Campaigns, Groups, Software

## Campaigns

In [63]:
!ls ./xlsx

enterprise-attack-v16.1-campaigns.xlsx
enterprise-attack-v16.1-datasources.xlsx
enterprise-attack-v16.1-groups.xlsx
enterprise-attack-v16.1-matrices.xlsx
enterprise-attack-v16.1-mitigations.xlsx
enterprise-attack-v16.1-relationships.xlsx
enterprise-attack-v16.1-software.xlsx
enterprise-attack-v16.1-tactics.xlsx
enterprise-attack-v16.1-techniques.xlsx


In [64]:
xls = pd.ExcelFile("./xlsx/enterprise-attack-v16.1-campaigns.xlsx") 
campaigns = pd.read_excel(xls)
print(xls.sheet_names)

['campaigns', 'associated software', 'techniques used', 'attributed groups', 'citations']


In [65]:
campaigns_df = pd.read_excel(xls, sheet_name="campaigns")

In [66]:
campaigns_software_df = pd.read_excel(xls, sheet_name="associated software")
campaigns_groups_df = pd.read_excel(xls, sheet_name="attributed groups")
campaigns_techniques_df = pd.read_excel(xls, sheet_name="techniques used")

In [67]:
campaigns_df.head()

Unnamed: 0,ID,STIX ID,name,description,url,created,last modified,domain,version,associated campaigns,associated campaigns citations,first seen,first seen citation,last seen,last seen citation,contributors,relationship citations
0,C0028,campaign--46421788-b6e1-4256-b351-f8beffd1afba,2015 Ukraine Electric Power Attack,[2015 Ukraine Electric Power Attack](https://a...,https://attack.mitre.org/campaigns/C0028,27 September 2023,06 October 2023,"ics-attack,enterprise-attack",1.0,,,01 December 2015,(Citation: Booz Allen Hamilton),01 January 2016,(Citation: Booz Allen Hamilton),,"(Citation: Booz Allen Hamilton),(Citation: Cha..."
1,C0025,campaign--aa73efef-1418-4dbe-b43c-87a498e97234,2016 Ukraine Electric Power Attack,[2016 Ukraine Electric Power Attack](https://a...,https://attack.mitre.org/campaigns/C0025,31 March 2023,10 April 2023,"enterprise-attack,ics-attack",1.0,,,01 December 2016,(Citation: ESET Industroyer)(Citation: Dragos ...,01 December 2016,(Citation: ESET Industroyer)(Citation: Dragos ...,,"(Citation: Dragos Crashoverride 2018),(Citatio..."
2,C0034,campaign--df8eb785-70f8-4300-b444-277ba849083d,2022 Ukraine Electric Power Attack,The [2022 Ukraine Electric Power Attack](https...,https://attack.mitre.org/campaigns/C0034,27 March 2024,10 April 2024,"enterprise-attack,ics-attack",1.0,,,01 June 2022,(Citation: Mandiant-Sandworm-Ukraine-2022),01 October 2022,(Citation: Mandiant-Sandworm-Ukraine-2022),,"(Citation: Mandiant-Sandworm-Ukraine-2022),(Ci..."
3,C0040,campaign--add4d9de-1256-4166-83b8-57087288dced,APT41 DUST,[APT41 DUST](https://attack.mitre.org/campaign...,https://attack.mitre.org/campaigns/C0040,16 September 2024,21 September 2024,enterprise-attack,1.0,,,31 January 2023,(Citation: Google Cloud APT41 2024),30 June 2024,(Citation: Google Cloud APT41 2024),,"(Citation: Google Cloud APT41 2024),(Citation:..."
4,C0010,campaign--ab747e62-1bcb-479f-a26b-1cd39d413d81,C0010,[C0010](https://attack.mitre.org/campaigns/C00...,https://attack.mitre.org/campaigns/C0010,21 September 2022,04 October 2022,enterprise-attack,1.0,,,01 December 2020,(Citation: Mandiant UNC3890 Aug 2022),01 August 2022,(Citation: Mandiant UNC3890 Aug 2022),,"(Citation: Mandiant UNC3890 Aug 2022),(Citatio..."


In [68]:
cols = ['ID', 'name', 'description', 'associated campaigns', 'first seen', 'last seen']
campaigns_df = campaigns_df.loc[:, cols]

In [69]:
campaigns_df.head()

Unnamed: 0,ID,name,description,associated campaigns,first seen,last seen
0,C0028,2015 Ukraine Electric Power Attack,[2015 Ukraine Electric Power Attack](https://a...,,01 December 2015,01 January 2016
1,C0025,2016 Ukraine Electric Power Attack,[2016 Ukraine Electric Power Attack](https://a...,,01 December 2016,01 December 2016
2,C0034,2022 Ukraine Electric Power Attack,The [2022 Ukraine Electric Power Attack](https...,,01 June 2022,01 October 2022
3,C0040,APT41 DUST,[APT41 DUST](https://attack.mitre.org/campaign...,,31 January 2023,30 June 2024
4,C0010,C0010,[C0010](https://attack.mitre.org/campaigns/C00...,,01 December 2020,01 August 2022


In [70]:
campaigns_df['associated campaigns'].unique()

array([nan, 'Operation Interception, Operation North Star'], dtype=object)

#### `associated campaigns` holds a single distinct non-null value, so we drop it

In [71]:
campaigns_df = campaigns_df.drop(columns=['associated campaigns'])
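
Before dropping a sparse column it can help to put a number on how sparse it is. The sketch below uses a toy frame (the ID `C9999` and the row layout are made up for illustration), not the real `campaigns_df`; the same two lines can be pointed at the real column.

```python
import pandas as pd

# Toy stand-in for campaigns_df; "C9999" is a made-up ID for illustration only.
toy_campaigns = pd.DataFrame({
    "ID": ["C0028", "C0025", "C9999"],
    "associated campaigns": [None, None, "Operation Interception, Operation North Star"],
})

col = toy_campaigns["associated campaigns"]
filled = int(col.notna().sum())          # rows that actually carry a value
share = filled / len(toy_campaigns)      # fraction of populated rows
print(f"{filled} of {len(toy_campaigns)} rows populated ({share:.0%})")
```

If the populated share is tiny, dropping the column loses little; if the values matter later, they could instead be modelled as separate nodes.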

### Campaigns relationships

#### groups

In [72]:
campaigns_groups_df.head()

Unnamed: 0,source ID,source name,source ref,source type,mapping type,target ID,target name,target ref,target type,mapping description,STIX ID,created,last modified
0,C0028,2015 Ukraine Electric Power Attack,campaign--46421788-b6e1-4256-b351-f8beffd1afba,campaign,attributed-to,G0034,Sandworm Team,intrusion-set--381fcf73-60f6-4ab2-9991-6af3cbc...,group,(Citation: Andy Greenberg June 2017) (Citation...,relationship--4d407dda-944a-4974-b1c2-0a04d2c9...,27 September 2023,27 September 2023
1,C0025,2016 Ukraine Electric Power Attack,campaign--aa73efef-1418-4dbe-b43c-87a498e97234,campaign,attributed-to,G0034,Sandworm Team,intrusion-set--381fcf73-60f6-4ab2-9991-6af3cbc...,group,(Citation: US District Court Indictment GRU Un...,relationship--90647f03-38a4-4364-a3af-53640a81...,31 March 2023,31 March 2023
2,C0034,2022 Ukraine Electric Power Attack,campaign--df8eb785-70f8-4300-b444-277ba849083d,campaign,attributed-to,G0034,Sandworm Team,intrusion-set--381fcf73-60f6-4ab2-9991-6af3cbc...,group,(Citation: Mandiant-Sandworm-Ukraine-2022)(Cit...,relationship--d3717846-eaab-4fde-99f6-a972dec9...,27 March 2024,10 April 2024
3,C0040,APT41 DUST,campaign--add4d9de-1256-4166-83b8-57087288dced,campaign,attributed-to,G0096,APT41,intrusion-set--18854f55-ac7c-4634-bd9a-352dd07...,group,[APT41 DUST](https://attack.mitre.org/campaign...,relationship--31aebdb8-ce4d-4d31-a144-7a5c354b...,16 September 2024,16 September 2024
4,C0011,C0011,campaign--b4e5a4a9-f3be-4631-ba8f-da6ebb067fac,campaign,attributed-to,G0134,Transparent Tribe,intrusion-set--e44e0985-bc65-4a8f-b578-211c858...,group,(Citation: Cisco Talos Transparent Tribe Educa...,relationship--751e795e-7c1a-4ba1-bb20-636aed02...,22 September 2022,22 September 2022


In [73]:
campaigns_groups_df['target type'].unique()

array(['group'], dtype=object)

In [74]:
campaigns_groups_df['mapping type'].unique()

array(['attributed-to'], dtype=object)
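
The two `unique()` calls above check one column at a time. A small helper can list every single-valued column in one pass. This is only a sketch: `constant_columns` is a hypothetical helper name, and the toy frame mimics the shape of the relationship sheets.

```python
import pandas as pd

def constant_columns(df: pd.DataFrame) -> list[str]:
    """Return the columns that hold exactly one distinct value (NaN counts as a value)."""
    return [c for c in df.columns if df[c].nunique(dropna=False) == 1]

# Toy frame shaped like a relationship sheet.
toy = pd.DataFrame({
    "source ID": ["C0028", "C0040"],
    "mapping type": ["attributed-to", "attributed-to"],
    "target type": ["group", "group"],
    "target ID": ["G0034", "G0096"],
})

print(constant_columns(toy))
```

Columns reported here carry no per-row information, so they can be encoded once as the relationship type or node label instead of being stored on every row.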

#### Both columns are constant, so retain only source ID -> target ID and the mapping description

In [75]:
cols = ['source ID', 'target ID', 'mapping description']
campaigns_groups_df = campaigns_groups_df.loc[:,cols]

In [76]:
campaigns_groups_df.head()

Unnamed: 0,source ID,target ID,mapping description
0,C0028,G0034,(Citation: Andy Greenberg June 2017) (Citation...
1,C0025,G0034,(Citation: US District Court Indictment GRU Un...
2,C0034,G0034,(Citation: Mandiant-Sandworm-Ukraine-2022)(Cit...
3,C0040,G0096,[APT41 DUST](https://attack.mitre.org/campaign...
4,C0011,G0134,(Citation: Cisco Talos Transparent Tribe Educa...
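
Once a relationship table is reduced to ID pairs, it may be worth checking that every ID actually has a node to attach to before importing. A minimal sketch, assuming a hypothetical helper `orphan_ids` and deliberately truncated toy data (in the real data the missing ID may well exist):

```python
import pandas as pd

def orphan_ids(edges: pd.DataFrame, column: str, known_ids: pd.Series) -> list[str]:
    """Return the IDs in edges[column] that have no matching entry in known_ids."""
    missing = ~edges[column].isin(known_ids)
    return sorted(edges.loc[missing, column].unique())

# Toy node and edge tables; the node table is truncated on purpose.
toy_nodes = pd.DataFrame({"ID": ["C0028", "C0025"]})
toy_edges = pd.DataFrame({
    "source ID": ["C0028", "C0025", "C0011"],
    "target ID": ["G0034", "G0034", "G0134"],
})

print(orphan_ids(toy_edges, "source ID", toy_nodes["ID"]))
```

An empty list means every edge endpoint in that column has a node; anything else would point to rows that a graph import would silently skip or fail on, depending on how the load query is written.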


#### techniques

In [77]:
campaigns_techniques_df.head()

Unnamed: 0,source ID,source name,source ref,source type,mapping type,target ID,target name,target ref,target type,mapping description,STIX ID,created,last modified
0,C0028,2015 Ukraine Electric Power Attack,campaign--46421788-b6e1-4256-b351-f8beffd1afba,campaign,uses,T1562.001,Disable or Modify Tools,attack-pattern--ac08589e-ee59-4935-8667-d845e3...,technique,During the [2015 Ukraine Electric Power Attack...,relationship--f53b248d-da29-4805-b714-29db3ce9...,04 October 2023,04 October 2023
1,C0028,2015 Ukraine Electric Power Attack,campaign--46421788-b6e1-4256-b351-f8beffd1afba,campaign,uses,T1136.002,Domain Account,attack-pattern--7610cada-1499-41a4-b3dd-46467b...,technique,During the [2015 Ukraine Electric Power Attack...,relationship--8a14f269-467e-49f8-a53f-ba1ad7fe...,27 September 2023,27 September 2023
2,C0028,2015 Ukraine Electric Power Attack,campaign--46421788-b6e1-4256-b351-f8beffd1afba,campaign,uses,T1133,External Remote Services,attack-pattern--10d51417-ee35-4589-b1ff-b6df1c...,technique,During the [2015 Ukraine Electric Power Attack...,relationship--62ed20e9-07e5-4e22-a8dd-b93ab897...,27 September 2023,27 September 2023
3,C0028,2015 Ukraine Electric Power Attack,campaign--46421788-b6e1-4256-b351-f8beffd1afba,campaign,uses,T1070.004,File Deletion,attack-pattern--d63a3fb8-9452-4e9d-a60a-54be68...,technique,During the [2015 Ukraine Electric Power Attack...,relationship--4d9acf03-bd3e-458b-bcea-6d1b5767...,27 September 2023,02 October 2023
4,C0028,2015 Ukraine Electric Power Attack,campaign--46421788-b6e1-4256-b351-f8beffd1afba,campaign,uses,T1105,Ingress Tool Transfer,attack-pattern--e6919abc-99f9-4c6c-95a5-14761e...,technique,During the [2015 Ukraine Electric Power Attack...,relationship--7d12ca01-c979-4fdf-8548-9da12c10...,27 September 2023,27 September 2023


In [78]:
cols = ['source ID', 'target ID', 'mapping description']
campaigns_techniques_df = campaigns_techniques_df.loc[:,cols]

In [79]:
campaigns_subtechniques_df = campaigns_techniques_df[campaigns_techniques_df["target ID"].str.contains(r"\.", na=False)]
campaigns_subtechniques_df.head()

Unnamed: 0,source ID,target ID,mapping description
0,C0028,T1562.001,During the [2015 Ukraine Electric Power Attack...
1,C0028,T1136.002,During the [2015 Ukraine Electric Power Attack...
3,C0028,T1070.004,During the [2015 Ukraine Electric Power Attack...
5,C0028,T1056.001,During the [2015 Ukraine Electric Power Attack...
7,C0028,T1204.002,During the [2015 Ukraine Electric Power Attack...


In [80]:
campaigns_techniques_df = campaigns_techniques_df[~campaigns_techniques_df["target ID"].str.contains(r"\.", na=False)]
campaigns_techniques_df.head()

Unnamed: 0,source ID,target ID,mapping description
2,C0028,T1133,During the [2015 Ukraine Electric Power Attack...
4,C0028,T1105,During the [2015 Ukraine Electric Power Attack...
6,C0028,T1570,During the [2015 Ukraine Electric Power Attack...
8,C0028,T1112,During the [2015 Ukraine Electric Power Attack...
9,C0028,T1040,During the [2015 Ukraine Electric Power Attack...
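
The split above relies on sub-technique IDs having the form `Txxxx.yyy`. The sketch below repeats the split on a toy frame with a single boolean mask, confirms the two parts partition the original, and derives the parent technique ID from the part before the dot. The `parent ID` column is an addition for illustration and is not part of the notebook's output.

```python
import pandas as pd

# Toy frame shaped like campaigns_techniques_df after column selection.
toy = pd.DataFrame({
    "source ID": ["C0028", "C0028", "C0028"],
    "target ID": ["T1562.001", "T1133", "T1070.004"],
})

# One mask, used twice, so the two frames cannot overlap or miss rows.
is_sub = toy["target ID"].str.contains(".", regex=False, na=False)
sub = toy[is_sub].copy()
top = toy[~is_sub].copy()
assert len(sub) + len(top) == len(toy)

# Parent technique ID: everything before the first dot.
sub["parent ID"] = sub["target ID"].str.partition(".")[0]
print(sub)
```

Keeping the parent ID next to each sub-technique row would make it possible to link sub-techniques to their techniques later without re-parsing the IDs.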


In [81]:
# index=False (a boolean) omits the row index; the string 'False' is truthy and would write it.
with open("./clean/campaigns_techniques.csv", "w") as f:
    f.write(campaigns_techniques_df.to_csv(index=False))

In [82]:
with open("./clean/campaigns_df.csv", "w") as f:
    f.write(campaigns_df.to_csv(index=False))

In [83]:
with open("./clean/campaigns_subtechniques.csv", "w") as f:
    f.write(campaigns_subtechniques_df.to_csv(index=False))

#### software

In [84]:
campaigns_software_df.head()

Unnamed: 0,source ID,source name,source ref,source type,mapping type,target ID,target name,target ref,target type,mapping description,STIX ID,created,last modified
0,C0028,2015 Ukraine Electric Power Attack,campaign--46421788-b6e1-4256-b351-f8beffd1afba,campaign,uses,S0089,BlackEnergy,malware--54cc1d4f-5c53-4f0e-9ef5-11b4998e82e4,software,(Citation: Booz Allen Hamilton),relationship--c8e78d6f-ac9d-4ad3-ae13-238f1eb4...,27 September 2023,27 September 2023
1,C0028,2015 Ukraine Electric Power Attack,campaign--46421788-b6e1-4256-b351-f8beffd1afba,campaign,uses,S0607,KillDisk,malware--e221eb77-1502-4129-af1d-fe1ad55e7ec6,software,(Citation: Booz Allen Hamilton),relationship--1dad5efc-395f-4b92-8f4f-3e987a4d...,27 September 2023,27 September 2023
2,C0025,2016 Ukraine Electric Power Attack,campaign--aa73efef-1418-4dbe-b43c-87a498e97234,campaign,uses,S0604,Industroyer,malware--e401d4fe-f0c9-44f0-98e6-f93487678808,software,Within the [2016 Ukraine Electric Power Attack...,relationship--57e8711a-9aae-4a22-94d4-f4c8a3a8...,31 March 2023,07 April 2023
3,C0034,2022 Ukraine Electric Power Attack,campaign--df8eb785-70f8-4300-b444-277ba849083d,campaign,uses,S0693,CaddyWiper,malware--b30d999d-64e0-4e35-9856-884e4b83d611,software,(Citation: Mandiant-Sandworm-Ukraine-2022),relationship--f0128f0b-be94-4d1e-a61e-d66cb722...,27 March 2024,28 March 2024
4,C0040,APT41 DUST,campaign--add4d9de-1256-4166-83b8-57087288dced,campaign,uses,S0154,Cobalt Strike,malware--a7881f21-e978-4fe4-af56-92c9416a2616,software,[Cobalt Strike](https://attack.mitre.org/softw...,relationship--55617b29-c1da-4519-87ea-97902bf3...,16 September 2024,16 September 2024


In [85]:
cols = ['source ID', 'target ID', 'mapping description']
campaigns_software_df = campaigns_software_df.loc[:,cols]

In [86]:
campaigns_software_df.head()

Unnamed: 0,source ID,target ID,mapping description
0,C0028,S0089,(Citation: Booz Allen Hamilton)
1,C0028,S0607,(Citation: Booz Allen Hamilton)
2,C0025,S0604,Within the [2016 Ukraine Electric Power Attack...
3,C0034,S0693,(Citation: Mandiant-Sandworm-Ukraine-2022)
4,C0040,S0154,[Cobalt Strike](https://attack.mitre.org/softw...


In [87]:
# index=False (a boolean) omits the row index from the CSV.
with open("./clean/campaigns_software.csv", "w") as f:
    f.write(campaigns_software_df.to_csv(index=False))
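
A note on the `index` argument of `to_csv`: it expects a boolean. Any non-empty string, including `'False'`, is truthy in Python, so passing it still writes the row index as an unnamed first column. A small sketch on a toy frame shows the difference in the header line:

```python
import pandas as pd

# Toy frame shaped like the cleaned relationship tables.
toy = pd.DataFrame({"source ID": ["C0028"], "target ID": ["S0089"]})

with_index = toy.to_csv(index="False")    # truthy string: index column is written
without_index = toy.to_csv(index=False)   # boolean False: index column is omitted

print(with_index.splitlines()[0])
print(without_index.splitlines()[0])
```

The first header starts with an empty field for the index; the second starts directly with `source ID`. For the filtered frames the index is also non-contiguous, so leaving it out keeps the CSVs free of a meaningless column.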

## Groups

In [90]:
xls = pd.ExcelFile("./xlsx/enterprise-attack-v16.1-groups.xlsx")
# Reads the first sheet ('groups'); named `groups` so it does not shadow the campaigns data.
groups = pd.read_excel(xls)
print(xls.sheet_names)

['groups', 'associated software', 'techniques used', 'attributed campaigns', 'citations']


In [93]:
groups_software_df = pd.read_excel(xls, sheet_name="associated software")
groups_campaigns_df = pd.read_excel(xls, sheet_name="attributed campaigns")
groups_techniques_df = pd.read_excel(xls, sheet_name="techniques used")



### Is this redundant? Let's check