# MITRE ATT&CK data fetch

References:
- https://github.com/mitre-attack/mitreattack-python/tree/master/mitreattack/attackToExcel
- https://stix2.readthedocs.io/en/latest/api/datastore/stix2.datastore.memory.html
- https://mitreattack-python.readthedocs.io/en/latest/
---
There are two ways to fetch the data:
1. Download `enterprise-attack.json`, then load it from disk (as in this notebook).
2. Fetch it directly from the server with `mitreattack.attackToExcel.attackToExcel.get_stix_data("enterprise-attack")` (see n0.ipynb).
---
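For orientation, the downloaded file is a STIX 2 bundle: a JSON object whose `objects` array holds techniques (`attack-pattern`), groups (`intrusion-set`), relationships, and so on. The sketch below uses only the standard library and a tiny hand-written bundle (not the real file) to show where ATT&CK IDs sit inside `external_references`. The helper name `attack_id` is made up for illustration and is not part of `mitreattack` or `stix2`.

```python
import json

# A tiny hand-written stand-in for data/enterprise-attack.json (heavily trimmed).
bundle_text = json.dumps({
    "type": "bundle",
    "id": "bundle--00000000-0000-4000-8000-000000000000",
    "objects": [
        {
            "type": "attack-pattern",
            "name": "Brute Force",
            "x_mitre_is_subtechnique": False,
            "external_references": [
                {"source_name": "mitre-attack", "external_id": "T1110"}
            ],
        },
        {
            "type": "attack-pattern",
            "name": "Password Guessing",
            "x_mitre_is_subtechnique": True,
            "external_references": [
                {"source_name": "mitre-attack", "external_id": "T1110.001"}
            ],
        },
        {
            "type": "intrusion-set",
            "name": "APT1",
            "external_references": [
                {"source_name": "mitre-attack", "external_id": "G0006"}
            ],
        },
    ],
})


def attack_id(stix_obj):
    """Return the ATT&CK ID (e.g. 'T1110') of a STIX object, or None."""
    for ref in stix_obj.get("external_references", []):
        if ref.get("source_name") == "mitre-attack":
            return ref.get("external_id")
    return None


bundle = json.loads(bundle_text)

# Techniques and sub-techniques share the STIX type "attack-pattern".
techniques = [o for o in bundle["objects"] if o["type"] == "attack-pattern"]
technique_ids = sorted(attack_id(o) for o in techniques)
sub_ids = [attack_id(o) for o in techniques if o.get("x_mitre_is_subtechnique")]

print(technique_ids)
print(sub_ids)
```

`MemoryStore` and `stixToDf` do this kind of lookup for you; the sketch is only meant to make the raw structure less opaque.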

In [1]:
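# Requires the mitreattack-python package (`pip install mitreattack-python`);
# without it the import below fails with ModuleNotFoundError.
# import mitreattack.attackToExcel.attackToExcel as attackToExcel
import mitreattack.attackToExcel.stixToDf as stixToDf
from stix2 import MemoryStore

# Load the downloaded STIX bundle into an in-memory store.
attackdata = MemoryStore()
attackdata.load_from_file("data/enterprise-attack.json")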


ModuleNotFoundError: No module named 'mitreattack'

## 1 Technique lists

`techniques_data` is a dict whose values are pandas DataFrames, one per Technique-related table (the keys are printed below).


In [2]:
techniques_data = stixToDf.techniquesToDf(attackdata, "enterprise-attack")
print(techniques_data.keys())


parsing techniques: 100%|██████████| 607/607 [00:00<00:00, 922.91it/s] 
parsing relationships for type=technique: 100%|██████████| 16530/16530 [00:00<00:00, 48079.94it/s]


dict_keys(['techniques', 'procedure examples', 'associated mitigations', 'citations'])


### 1-1 Technique list

In [3]:
techniques_df = techniques_data["techniques"]
techniques_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 607 entries, 256 to 557
Data columns (total 21 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   ID                      607 non-null    object
 1   name                    607 non-null    object
 2   description             607 non-null    object
 3   url                     607 non-null    object
 4   created                 607 non-null    object
 5   last modified           607 non-null    object
 6   version                 607 non-null    object
 7   tactics                 607 non-null    object
 8   detection               607 non-null    object
 9   platforms               607 non-null    object
 10  data sources            568 non-null    object
 11  is sub-technique        607 non-null    bool  
 12  sub-technique of        411 non-null    object
 13  defenses bypassed       103 non-null    object
 14  contributors            349 non-null    object
 15  permissio

In [4]:
def print_cols(df):
    """Print each column name of df as a bullet."""
    for col in df.columns:
        print("-", col)


print_cols(techniques_df)

- ID
- name
- description
- url
- created
- last modified
- version
- tactics
- detection
- platforms
- data sources
- is sub-technique
- sub-technique of
- defenses bypassed
- contributors
- permissions required
- supports remote
- system requirements
- impact type
- effective permissions
- relationship citations


In [5]:
# techniques_df[techniques_df['supports remote']==True].head(7)
# techniques_df[['data sources']].head(7)
# techniques_df[['system requirements']].value_counts()
# techniques_df.head(7)
techniques_df[['impact type']].value_counts()

impact type 
Availability    19
Integrity        7
Name: count, dtype: int64

### 1-2 Technique - Associated mitigation list

In [6]:
techniques_mitigations_df = techniques_data['associated mitigations']
techniques_mitigations_df.head()

| | source ID | source name | source type | mapping type | target ID | target name | target type | mapping description |
|---|---|---|---|---|---|---|---|---|
| 7827 | M1036 | Account Use Policies | mitigation | mitigates | T1110 | Brute Force | technique | Set account lockout policies after a certain n... |
| 1037 | M1036 | Account Use Policies | mitigation | mitigates | T1078.004 | Cloud Accounts | technique | Use conditional access policies to block login... |
| 8955 | M1036 | Account Use Policies | mitigation | mitigates | T1110.004 | Credential Stuffing | technique | Set account lockout policies after a certain n... |
| 3874 | M1036 | Account Use Policies | mitigation | mitigates | T1621 | Multi-Factor Authentication Request Generation | technique | Enable account restrictions to prevent login a... |
| 14566 | M1036 | Account Use Policies | mitigation | mitigates | T1110.001 | Password Guessing | technique | Set account lockout policies after a certain n... |


In [7]:
techniques_mitigations_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1200 entries, 7827 to 9804
Data columns (total 8 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   source ID            1200 non-null   object
 1   source name          1200 non-null   object
 2   source type          1200 non-null   object
 3   mapping type         1200 non-null   object
 4   target ID            1200 non-null   object
 5   target name          1200 non-null   object
 6   target type          1200 non-null   object
 7   mapping description  1200 non-null   object
dtypes: object(8)
memory usage: 84.4+ KB


In [8]:
print_cols (techniques_mitigations_df)

- source ID
- source name
- source type
- mapping type
- target ID
- target name
- target type
- mapping description
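
Every relationship table shares these eight columns, so one filter works on all of them. The sketch below rebuilds a few rows from the `head()` output above (the truncated `mapping description` column is left out) and selects the mitigations that target a technique or any of its sub-techniques. `mitigations_for` is a hypothetical helper, not part of `mitreattack`.

```python
import pandas as pd

# Rows copied from the head() output above; only the columns needed here.
mitigations_sample = pd.DataFrame(
    [
        ("M1036", "Account Use Policies", "T1110", "Brute Force"),
        ("M1036", "Account Use Policies", "T1078.004", "Cloud Accounts"),
        ("M1036", "Account Use Policies", "T1110.004", "Credential Stuffing"),
        ("M1036", "Account Use Policies", "T1621",
         "Multi-Factor Authentication Request Generation"),
        ("M1036", "Account Use Policies", "T1110.001", "Password Guessing"),
    ],
    columns=["source ID", "source name", "target ID", "target name"],
)


def mitigations_for(df, technique_id, include_subtechniques=True):
    """Return rows whose target is technique_id, optionally with its sub-techniques."""
    target = df["target ID"]
    mask = target == technique_id
    if include_subtechniques:
        # Sub-technique IDs have the form "<parent ID>.<3 digits>", e.g. T1110.001.
        mask |= target.str.startswith(technique_id + ".")
    return df[mask]


brute_force = mitigations_for(mitigations_sample, "T1110")
print(brute_force[["source ID", "target ID", "target name"]])
```

On the real frame the call would presumably be `mitigations_for(techniques_mitigations_df, "T1110")`.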


---
## 2 Group list

In [9]:
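# groupsToDf takes only the MemoryStore; passing a domain string as a second
# argument raises "TypeError: groupsToDf() takes 1 positional argument but 2 were given".
groups_data = stixToDf.groupsToDf(attackdata)
print(groups_data.keys())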


TypeError: groupsToDf() takes 1 positional argument but 2 were given

### 2-1 Group list

In [10]:
groups_df = groups_data['groups']


In [11]:
# groups_df.info()
# print_cols (groups_df)
groups_df.head()

| | ID | name | description | url | created | last modified | version | contributors | associated groups | associated groups citations | relationship citations |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 47 | G0099 | APT-C-36 | [APT-C-36](https://attack.mitre.org/groups/G00... | https://attack.mitre.org/groups/G0099 | 05 May 2020 | 26 May 2021 | 1.1 | Jose Luis Sánchez Martinez | Blind Eagle | (Citation: QiAnXin APT-C-36 Feb2019) | (Citation: QiAnXin APT-C-36 Feb2019),(Citation... |
| 13 | G0006 | APT1 | [APT1](https://attack.mitre.org/groups/G0006) ... | https://attack.mitre.org/groups/G0006 | 31 May 2017 | 26 May 2021 | 1.4 | | Comment Crew, Comment Group, Comment Panda | (Citation: Mandiant APT1), (Citation: Mandiant... | (Citation: Mandiant APT1 Appendix),(Citation: ... |
| 96 | G0005 | APT12 | [APT12](https://attack.mitre.org/groups/G0005)... | https://attack.mitre.org/groups/G0005 | 31 May 2017 | 30 March 2020 | 2.1 | | DNSCALC, DynCalc, IXESHE, Numbered Panda | (Citation: Moran 2014), (Citation: Meyers Numb... | (Citation: Moran 2014),(Citation: Moran 2013),... |
| 129 | G0023 | APT16 | [APT16](https://attack.mitre.org/groups/G0023)... | https://attack.mitre.org/groups/G0023 | 31 May 2017 | 26 July 2022 | 1.1 | | | | (Citation: FireEye EPS Awakens Part 2),(Citati... |
| 130 | G0025 | APT17 | [APT17](https://attack.mitre.org/groups/G0025)... | https://attack.mitre.org/groups/G0025 | 31 May 2017 | 13 October 2020 | 1.1 | | Deputy Dog | (Citation: FireEye APT17) | (Citation: FireEye APT17),(Citation: FireEye A... |
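
In the rows above, `associated groups` packs several aliases into one comma-separated string. A minimal sketch of turning that column into one row per alias, using four rows copied from the output above; it assumes the separator is always `", "`, which matches these rows but has not been checked against the whole frame.

```python
import pandas as pd

# Rows copied from the head() output above; APT16 has no associated groups.
groups_sample = pd.DataFrame({
    "ID": ["G0006", "G0005", "G0023", "G0025"],
    "name": ["APT1", "APT12", "APT16", "APT17"],
    "associated groups": [
        "Comment Crew, Comment Group, Comment Panda",
        "DNSCALC, DynCalc, IXESHE, Numbered Panda",
        None,
        "Deputy Dog",
    ],
})

aliases_long = (
    groups_sample
    .dropna(subset=["associated groups"])                       # skip groups without aliases
    .assign(alias=lambda d: d["associated groups"].str.split(", "))
    .explode("alias")[["ID", "name", "alias"]]                  # one row per alias
    .reset_index(drop=True)
)
print(aliases_long)
```

The long form makes alias lookups a plain equality filter, for example `aliases_long[aliases_long["alias"] == "Deputy Dog"]`.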


### 2-2 Group - Technique list

In [12]:
groups_techniques_df = groups_data['techniques used']


In [13]:
groups_techniques_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3052 entries, 3167 to 1447
Data columns (total 8 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   source ID            3052 non-null   object
 1   source name          3052 non-null   object
 2   source type          3052 non-null   object
 3   mapping type         3052 non-null   object
 4   target ID            3052 non-null   object
 5   target name          3052 non-null   object
 6   target type          3052 non-null   object
 7   mapping description  3052 non-null   object
dtypes: object(8)
memory usage: 214.6+ KB


In [14]:
groups_techniques_df.head()

| | source ID | source name | source type | mapping type | target ID | target name | target type | mapping description |
|---|---|---|---|---|---|---|---|---|
| 3167 | G0099 | APT-C-36 | group | uses | T1105 | Ingress Tool Transfer | technique | [APT-C-36](https://attack.mitre.org/groups/G00... |
| 2096 | G0099 | APT-C-36 | group | uses | T1204.002 | Malicious File | technique | [APT-C-36](https://attack.mitre.org/groups/G00... |
| 2093 | G0099 | APT-C-36 | group | uses | T1036.004 | Masquerade Task or Service | technique | [APT-C-36](https://attack.mitre.org/groups/G00... |
| 65 | G0099 | APT-C-36 | group | uses | T1571 | Non-Standard Port | technique | [APT-C-36](https://attack.mitre.org/groups/G00... |
| 2230 | G0099 | APT-C-36 | group | uses | T1027 | Obfuscated Files or Information | technique | [APT-C-36](https://attack.mitre.org/groups/G00... |
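
Because `target ID` here holds the same values as the `ID` column of the technique list, the two tables can be joined. A rough sketch, using the five rows above and a hand-typed stand-in for `techniques_df`: the tactic strings are typed in for illustration, and their exact spelling in the real `tactics` column is an assumption.

```python
import pandas as pd

# The five group-to-technique rows shown above (columns trimmed).
groups_techniques_sample = pd.DataFrame({
    "source ID": ["G0099"] * 5,
    "source name": ["APT-C-36"] * 5,
    "target ID": ["T1105", "T1204.002", "T1036.004", "T1571", "T1027"],
})

# Hand-typed stand-in for techniques_df[["ID", "tactics"]].
techniques_sample = pd.DataFrame({
    "ID": ["T1105", "T1204.002", "T1036.004", "T1571", "T1027"],
    "tactics": [
        "Command and Control",
        "Execution",
        "Defense Evasion",
        "Command and Control",
        "Defense Evasion",
    ],
})

# Attach each technique's tactic to the relationship rows.
merged = groups_techniques_sample.merge(
    techniques_sample, left_on="target ID", right_on="ID", how="left"
)

# Count techniques per group and tactic.
tactic_counts = (
    merged.groupby(["source ID", "tactics"])
    .size()
    .rename("n_techniques")
    .reset_index()
)
print(tactic_counts)
```

A technique that belongs to several tactics would need its `tactics` cell split into separate rows before the `groupby`; the sample avoids that case.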


### 2-3 Associated Software

In [15]:
groups_software_df = groups_data['associated software']
groups_software_df.head()

| | source ID | source name | source type | mapping type | target ID | target name | target type | mapping description |
|---|---|---|---|---|---|---|---|---|
| 473 | G0099 | APT-C-36 | group | uses | S0434 | Imminent Monitor | software | (Citation: QiAnXin APT-C-36 Feb2019) |
| 1746 | G0006 | APT1 | group | uses | S0017 | BISCUIT | software | (Citation: Mandiant APT1) |
| 1270 | G0006 | APT1 | group | uses | S0025 | CALENDAR | software | (Citation: Mandiant APT1) |
| 55 | G0006 | APT1 | group | uses | S0119 | Cachedump | software | (Citation: Mandiant APT1) |
| 1485 | G0006 | APT1 | group | uses | S0026 | GLOOXMAIL | software | (Citation: Mandiant APT1) |


In [16]:
groups_software_df.info()
print_cols (groups_software_df)

<class 'pandas.core.frame.DataFrame'>
Index: 830 entries, 473 to 3001
Data columns (total 8 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   source ID            830 non-null    object
 1   source name          830 non-null    object
 2   source type          830 non-null    object
 3   mapping type         830 non-null    object
 4   target ID            830 non-null    object
 5   target name          830 non-null    object
 6   target type          830 non-null    object
 7   mapping description  828 non-null    object
dtypes: object(8)
memory usage: 58.4+ KB
- source ID
- source name
- source type
- mapping type
- target ID
- target name
- target type
- mapping description


---
## 3 Tactic list

In [17]:
tactics_data = stixToDf.tacticsToDf(attackdata)
tactics_data.keys()

parsing tactics: 100%|██████████| 14/14 [00:00<00:00, 6993.84it/s]


dict_keys(['tactics'])

In [18]:
tactics_df = tactics_data['tactics']

In [19]:
tactics_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 14 entries, 9 to 10
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   ID             14 non-null     object
 1   name           14 non-null     object
 2   description    14 non-null     object
 3   url            14 non-null     object
 4   created        14 non-null     object
 5   last modified  14 non-null     object
 6   version        14 non-null     object
dtypes: object(7)
memory usage: 896.0+ bytes


In [20]:
print_cols (tactics_df)

- ID
- name
- description
- url
- created
- last modified
- version


## 4 Software list

In [21]:
software_data = stixToDf.softwareToDf(attackdata)
software_data.keys()

parsing software: 100%|██████████| 635/635 [00:00<00:00, 16300.47it/s]
parsing relationships for type=software: 100%|██████████| 16530/16530 [00:00<00:00, 68686.47it/s]


dict_keys(['software', 'associated groups', 'associated campaigns', 'techniques used', 'citations'])

In [22]:
software_df = software_data['software']
software_groups_df = software_data['associated groups']
software_techniques_df = software_data['techniques used']

### 4-1 Software list

In [23]:
software_df.info()
print_cols (software_df)

<class 'pandas.core.frame.DataFrame'>
Index: 635 entries, 350 to 258
Data columns (total 12 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   ID                      635 non-null    object
 1   name                    635 non-null    object
 2   description             635 non-null    object
 3   url                     635 non-null    object
 4   created                 635 non-null    object
 5   last modified           635 non-null    object
 6   version                 635 non-null    object
 7   contributors            138 non-null    object
 8   platforms               598 non-null    object
 9   aliases                 608 non-null    object
 10  type                    635 non-null    object
 11  relationship citations  635 non-null    object
dtypes: object(12)
memory usage: 64.5+ KB
- ID
- name
- description
- url
- created
- last modified
- version
- contributors
- platforms
- aliases
- type
- relationship citations

### 4-2 Software-Group list

In [24]:
software_groups_df.info()
print_cols(software_groups_df)

<class 'pandas.core.frame.DataFrame'>
Index: 830 entries, 1150 to 7089
Data columns (total 8 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   source ID            830 non-null    object
 1   source name          830 non-null    object
 2   source type          830 non-null    object
 3   mapping type         830 non-null    object
 4   target ID            830 non-null    object
 5   target name          830 non-null    object
 6   target type          830 non-null    object
 7   mapping description  828 non-null    object
dtypes: object(8)
memory usage: 58.4+ KB
- source ID
- source name
- source type
- mapping type
- target ID
- target name
- target type
- mapping description


### 4-3 Software-Techniques list

In [25]:
print_cols(software_techniques_df)

- source ID
- source name
- source type
- mapping type
- target ID
- target name
- target type
- mapping description


---
# Export data

In [26]:
dfs = {
    "techniques_df" : techniques_df,
    "techniques_mitigations_df" : techniques_mitigations_df,
    "groups_df": groups_df,
    "groups_techniques_df" : groups_techniques_df,
    "groups_software_df" : groups_software_df,
    "tactics_df" : tactics_df,
    "software_df" : software_df,
    "software_groups_df" : software_groups_df
}


In [29]:
import attck_utils  # helper module; not defined in this notebook

for key, df in dfs.items():
    # df.to_csv(f"fetched_data/{key}.csv", index=False)
    attck_utils.save_df_to_csv(
        path="data/fetched_data",
        filename=key,
        df=df,
    )
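
`attck_utils` is not shown in this notebook, so how `save_df_to_csv` really works is unknown. Below is a minimal sketch of what a function with the same keyword arguments might look like, assuming it writes `<path>/<filename>.csv` without the index (as the commented-out `to_csv` call suggests). The demo writes to a temporary directory rather than `data/fetched_data`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_df_to_csv(path, filename, df):
    """Write df to <path>/<filename>.csv, creating the directory if needed.

    Hypothetical stand-in for attck_utils.save_df_to_csv.
    """
    out_dir = Path(path)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / f"{filename}.csv"
    df.to_csv(out_file, index=False)
    return out_file


# Small demo frame with two ATT&CK tactics.
demo_df = pd.DataFrame({
    "ID": ["TA0001", "TA0002"],
    "name": ["Initial Access", "Execution"],
})

with tempfile.TemporaryDirectory() as tmp:
    written = save_df_to_csv(
        path=Path(tmp) / "fetched_data",
        filename="tactics_df",
        df=demo_df,
    )
    # Read the file back before the temporary directory is removed.
    round_trip = pd.read_csv(written)

print(round_trip)
```

Returning the written path is a choice made for this sketch; it makes the helper easier to check and to chain.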