# Analyse de document Microsoft Office

Plusieurs types de fichiers sont disponibles au sein de la suite Microsoft Office :
- les **documents OLE**, que nous connaissons tous : 
  - **Excel** : ".xls"
  - **Powerpoint** : ".ppt"
  - **Word** : ".doc"
- les **documents Office Open XML "OOXML"** correspondent à des archives de documents XML (documents compressés). Ce type de document, standardisé via l'ISO/IEC 29500, a été introduit dans la suite Microsoft Office pour la première fois dans la version 2010. 
  - **Excel** : ".xlsx"
  - **Powerpoint** : ".pptx"
  - **Word** : ".docx"
- les documents RTF, ayant pour extension ".rtf"

In [18]:
from colorama import init, Fore, Back, Style
from defang import defang
import msticpy as mp
import pandas as pd
import msticpy.sectools as sectools

mp.init_notebook(globals(), verbosity=0)
ti = mp.TILookup()
ioc_extractor = sectools.IoCExtract()

# Chemin du fichier à analyser
officeFile = {}
officeFile['path'] = "/home/secubian/Desktop/Cases/Microsoft_Office/onenot_2623024aba1ee994dcb82e937a8beb59abbebf51b6aa4cde8434bb56458b47da.one"
#officeFile['path'] = "/home/secubian/Desktop/Cases/Microsoft_Office/8ed7befccff98a6acb255f63071a6e6ac1410c1d3b08ce560cac3cfe24572c8e.xlsx"
#officeFile['path'] = "/home/secubian/Desktop/Cases/Microsoft_Office/eeb7b78972ba051833135c6ba4215c0faf93d5dfe1c5603f74c777b38867646b.xlsx"
#officeFile['path'] = "/home/secubian/Desktop/Cases/Microsoft_Office/5b0f61b42e9a6c238c7028751bf75c484778219cf88a7c5007c2a49e14351e70.xls"
#officeFile['path'] = "/home/secubian/Desktop/Cases/Microsoft_Office/04b08125f2348443663ac6b44ed2388af399e5506b4a75dd4a0d02a40734848e.doc"



## Analyse des méta données

Il est important d'extraire des informations permettant d'obtenir du contexte sur le document, telles que l'auteur, le logiciel utilisé, les signatures numériques (hash). 

In [19]:
import oletools.oleid
oid = oletools.oleid.OleID(officeFile['path'])
officeFile['oleid'] = {}
indicators = oid.check()
for i in indicators:
    officeFile['oleid'][i.name] = i.value

for indicator in officeFile['oleid']:
    try:
        displayValue = False
        if (isinstance(officeFile['oleid'][indicator],bool) and officeFile['oleid'][indicator]) or (isinstance(officeFile['oleid'][indicator],int) and officeFile['oleid'][indicator]>0) or (officeFile['oleid'][indicator][0:3] == "Yes"):
            print(Fore.RED + f"[!] {indicator}: {officeFile['oleid'][indicator]}")
            displayValue = True
    except Exception as err:
        pass
    finally:
        if not displayValue: print(Fore.GREEN + f"[✓] {indicator}: {officeFile['oleid'][indicator]}")

[32m[✓] File format: MS Excel 2007+ Workbook (.xlsx)
[32m[✓] Container format: OpenXML
[32m[✓] Encrypted: False
[32m[✓] VBA Macros: No
[32m[✓] XLM Macros: No
[32m[✓] External Relationships: 0
[32m[✓] ObjectPool: False
[32m[✓] Flash objects: 0


Extraction des signatures numériques (HASH), permettant la recherche dans les bases de Threat Intelligence.

In [11]:
import hashlib
officeFile['md5'] = hashlib.md5(officeFile['path'].encode('UTF-8')).hexdigest()
officeFile['sha256'] = hashlib.sha256(officeFile['path'].encode('UTF-8')).hexdigest()

df_ti = ti.lookup_iocs(data=[officeFile['md5'], officeFile['sha256']], providers=["VirusTotal", "OTX"])
df_ti = df_ti[df_ti['Result']==True]
df_ti = pd.json_normalize(data=df_ti[['Severity','Provider','Ioc','Details']].to_dict(orient='records'))

print(Fore.GREEN + "Microsoft Office file HASH.")
print(Fore.GREEN + f"[✓] MD5: \t{officeFile['md5'] } ")
print(Fore.GREEN + f"[✓] SHA256: \t{officeFile['sha256'] }")

if (df_ti.empty):
    print(Fore.GREEN + "[✓] No Potentially identified as malicious")
else:
    print(Fore.RED + "[!] Potentially identified as malicious")
    display(df_ti)


Observables processed:   0%|          | 0/4 [00:00<?, ?obs/s]

[32mMicrosoft Office file HASH.
[32m[✓] MD5: 	b4cb893de96f04b987441bb9cb163b96 
[32m[✓] SHA256: 	5abf6f88376ebd42b87bbf8c821e336f45c46fb5928b6980a53c2d1d33e0fcdc
[31m[!] Potentially identified as malicious


Unnamed: 0,Severity,Provider,Ioc,Details.pulse_count,Details.sections_available,Details.verbose_msg,Details.response_code,Details.positives,Details.resource,Details.permalink
0,information,OTX,b4cb893de96f04b987441bb9cb163b96,0.0,"[general, analysis]",,,,,
1,information,OTX,5abf6f88376ebd42b87bbf8c821e336f45c46fb5928b6980a53c2d1d33e0fcdc,0.0,"[general, analysis]",,,,,
2,information,VirusTotal,b4cb893de96f04b987441bb9cb163b96,,,"The requested resource is not among the finished, queued or pending scans",0.0,0.0,b4cb893de96f04b987441bb9cb163b96,
3,information,VirusTotal,5abf6f88376ebd42b87bbf8c821e336f45c46fb5928b6980a53c2d1d33e0fcdc,,,"The requested resource is not among the finished, queued or pending scans",0.0,0.0,5abf6f88376ebd42b87bbf8c821e336f45c46fb5928b6980a53c2d1d33e0fcdc,


## Extraction et Analyse des relations externes au document analysé

Si des références externes ont été identifiées dans les métadonnées, l'usage de l'outil **oleobj** devrait permettre d'extraire les url,ip, domaines du document analysé.

In [16]:
from oletools import oleobj

with open(officeFile['path'], 'rb') as file_handle:
    data = file_handle.read()

relationships = []
if officeFile['oleid']['External Relationships'] > 0:
    xml_parser = oleobj.XmlParser(officeFile['path'])
    for relationship, target in oleobj.find_external_relationships(xml_parser):
        did_dump = True
        print(Fore.RED + f"[!] Found relationship {relationship} with external link {defang(target)}")
        relationships.append(target)
        if target.startswith('mhtml:'):
            print("Potential exploit for CVE-2021-40444")
    for target in oleobj.find_customUI(xml_parser):
        did_dump = True
        print(Fore.RED + f"[!]Found customUI tag with external link or VBA macro {defang(target)} (possibly exploiting CVE-2021-42292)")
        relationships.append(target)

if not relationships:
    print(Fore.GREEN + "[✓] No relationships found")

[31m[!] Found relationship frame with external link hXXp://104[.]129.4.31/..........W-----W.....W-w---------------W...W...w-----wW-------.---/...-......-W......Ww.......-----wW...w----W.-------------Ww-----.----.w-.wW..wbk


\
En fonction du résultat obtenu ci-dessus, une recherche dans les bases de connaissance de menaces peut être pertinente.

In [17]:
if relationship:
    df_relationships = pd.DataFrame(relationships, columns = ['relation'])
    df_relationships = ioc_extractor.extract(data=df_relationships,columns=['relation'])
    df_ti = ti.lookup_iocs(data=df_relationships['Observable'], providers=["VirusTotal", "OTX"])
    df_ti = df_ti[df_ti['Result']==True]
    df_ti = pd.json_normalize(data=df_ti[['Severity','Provider','Ioc','Details']].to_dict(orient='records'))

    if (df_ti.empty):
        print(Fore.GREEN + "[✓] No Potentially identified as malicious")
    else:
        print(Fore.RED + "[!] Potentially identified as malicious")
        display(df_ti)
else:
    print(Fore.GREEN + "[✓] No relationships found")

Observables processed:   0%|          | 0/4 [00:00<?, ?obs/s]

[31m[!] Potentially identified as malicious


Unnamed: 0,Severity,Provider,Ioc,Details.pulse_count,Details.sections_available,Details.verbose_msg,Details.response_code,Details.positives,Details.detected_urls,Details.detected_downloaded_samples,Details.detected_communicating_samples,Details.resource,Details.permalink
0,information,OTX,104.129.4.31,0.0,"[general, geo, reputation, url_list, passive_dns, malware, nids_list, http_scans]",,,,,,,,
1,information,OTX,http://104.129.4.31/..........W-----W.....W-w---------------W...W...w-----wW-------.---/...-.......,0.0,"[general, url_list, http_scans, screenshot]",,,,,,,,
2,high,VirusTotal,104.129.4.31,,,IP address in dataset,1.0,8.0,[https://104.129.4.31/],[],[],,
3,high,VirusTotal,http://104.129.4.31/..........W-----W.....W-w---------------W...W...w-----wW-------.---/...-.......,,,"Scan finished, scan information embedded in this object",1.0,11.0,,,,http://104.129.4.31/..........W-----W.....W-w---------------W...W...w-----wW-------.---/...-.......,https://www.virustotal.com/gui/url/93839f6bd29a19ebcbba84dce8d1a9229df51e837e58fa9f7b9270003f91b...


## Extraction et Analyse des macros

Les macros sont des éléments permettant d'exécuter des morceaux de code à l'ouverture du document ou à lors d'événements bien précis.

In [7]:
# https://github.com/decalage2/oletools/wiki/olevba
from oletools.olevba import VBA_Parser, TYPE_OLE, TYPE_OpenXML, TYPE_Word2003_XML, TYPE_MHTML
vbaparser = VBA_Parser(officeFile['path'])

if vbaparser.detect_vba_macros():
    print(Fore.RED + "[!] VBA Macros found")
    results = vbaparser.analyze_macros()
    print(Fore.RED + f"[!] AutoExec keywords: {vbaparser.nb_autoexec}")
    print(Fore.RED + f"[!] IOCs: {vbaparser.nb_iocs}")
    print(Fore.RED + f"[!] Hex obfuscated strings: {vbaparser.nb_hexstrings}")
    print(Fore.RED + f"[!] Base64 obfuscated strings: {vbaparser.nb_base64strings}")
    print(Fore.RED + f"[!] Dridex obfuscated strings: {vbaparser.nb_dridexstrings}")
    print(Fore.RED + f"[!] VBA obfuscated strings: {vbaparser.nb_vbastrings}")

    print("\n")
    print(Fore.RED + f"[!] Suspicious patterns : {vbaparser.nb_suspicious}")
    for kw_type, keyword, description in results:
        #print('type=%s - keyword=%s - description=%s' % (kw_type, keyword, description))
        print(f"[{kw_type}] - {keyword} : {description}")
else:
    print(Fore.GREEN + "[✓] No VBA Macros found")


[32m[✓] No VBA Macros found


Cette tentative se base sur les résultats obtenus précédemments. Si aucune chaine obfusquée n'a été détectée, aucun résultat ne sera obtenu.

In [8]:
# https://github.com/decalage2/oletools/wiki/olevba
if vbaparser.nb_hexstrings or vbaparser.nb_base64strings or vbaparser.nb_dridexstrings or vbaparser.nb_vbastrings:
    print()
    print(Fore.RED + vbaparser.reveal())
else:
    print(Fore.GREEN + "[✓] No VBA obfuscated strings found")

[32m[✓] No VBA obfuscated strings found
