# PCAP Analysis

Sources:
- https://adriangcoder.medium.com/pandas-tricks-and-tips-a7b87c3748ea
- https://www.stamus-networks.com/blog/jupyter-playbooks-for-suricata-part-3



Some tools are built specifically to analyze network traffic and to detect malicious or suspicious behavior using predefined rules.


In [1]:
import json
import pandas as pd
from pandas import json_normalize
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
%matplotlib inline
import re
from IPython.display import display, HTML

import ipaddress as ip
import msticpy as mp
from msticpy.transform.iocextract import IoCExtract

# Initialize msticpy quietly, then create the IoC extractor and the threat-intel lookup object
mp.init_notebook(globals(), verbosity=0)
ioc_extractor = IoCExtract()
ti = mp.TILookup()

#Expand the width of the cells
display(HTML("<style>.container { width:90% !important; }</style>"))

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

def ip_type(string):
    """Return a coarse category for an IP address string."""
    addr = ip.ip_address(string)
    # Test the most specific categories first: the ipaddress module also flags
    # loopback, link-local and (IPv4) reserved addresses as private.
    if addr.is_loopback:
        return 'Loopback'
    if addr.is_link_local:
        return 'Link local'
    if addr.is_multicast:
        return 'Multicast'
    if addr.is_reserved:
        return 'Reserved'
    if addr.is_private:
        return 'Private'
    if addr.is_global:
        return 'Public'
    return None  # e.g. shared address space 100.64.0.0/10



# Path of the file to analyze
pcapFile = {}
pcapFile_path = "/home/kidrek/Downloads/challenge-files/challenge-files/infection-traffic.pcap"



## Quick PCAP analysis with Tshark

The contents of a PCAP file can be browsed easily with **tshark**, the command-line version of **Wireshark**.

```
# Read the whole PCAP file
$ tshark -r {pcap_filepath}

# Read the whole PCAP file without name resolution
$ tshark -nr {pcap_filepath}
```



Display the **TCP** conversations in the PCAP

In [2]:
!tshark -nr $pcapFile_path -q -z conv,tcp

TCP Conversations
Filter:<No Filter>
                                                           |       <-      | |       ->      | |     Total     |    Relative    |   Duration   |
                                                           | Frames  Bytes | | Frames  Bytes | | Frames  Bytes |      Start     |              |
10.6.2.103:57600           <-> 194.5.249.46:443              1385 1,982kB       884 64kB         2269 2,046kB      76.535252000        68.9587
10.6.2.103:57594           <-> 172.67.169.59:80               548 778kB         317 22kB          865 800kB         8.578007000        63.0912
10.6.2.103:57614           <-> 38.135.122.194:8080            410 41kB          436 188kB         846 229kB       571.503808000        33.4358
10.6.2.103:57592           <-> 45.142.213.105:80              402 573kB         284 20kB          686 593kB         0.193640000         2.8731
10.6.2.103:57593           <-> 65.8.218.70:443                183 252kB          90 5,755bytes     27

Display the **UDP** conversations in the PCAP

In [3]:
!tshark -nr $pcapFile_path -q -z conv,udp

UDP Conversations
Filter:<No Filter>
                                                           |       <-      | |       ->      | |     Total     |    Relative    |   Duration   |
                                                           | Frames  Bytes | | Frames  Bytes | | Frames  Bytes |      Start     |              |
10.6.2.103:57692           <-> 10.6.2.1:53                      1 94bytes         1 78bytes         2 172bytes      0.000000000         0.1778
10.6.2.103:62832           <-> 10.6.2.1:53                      1 168bytes        1 74bytes         2 242bytes      7.268986000         0.0485
10.6.2.103:64801           <-> 10.6.2.1:53                      1 108bytes        1 76bytes         2 184bytes      8.509279000         0.0666
10.6.2.103:51737           <-> 10.6.2.1:53                      1 90bytes         1 74bytes         2 164bytes     13.295954000         0.0864
10.6.2.103:60527           <-> 10.6.2.1:53                      1 92bytes         1 76bytes         2

Display filters (`-Y`) allow more targeted queries.

* Filter on the HTTP method


In [4]:
!tshark -nr $pcapFile_path -Y "http.request.method==GET"

    6   0.377723   10.6.2.103 → 45.142.213.105 HTTP 616 GET /adda/T/5xBOnOkAQixWY7/JQNizzLtuT6BVV0xRecCKVVHAAR6PkgGrIPN/sose5?user=anRsIkfbv&time=0qobcg4DyUX11ZLF5yHrIevFn&page=1K2n8iJ&i9y9SwJu=yVaCtZ9s0gUfn&q=hj9xWh4I6PDdXOPDey&id=Vr4pf&user=mHMoD292T&search=uZVgg21LyVRFdD2FABGZvQlnkM90&q=Dwc1s67MbWC24TGoOjMXC HTTP/1.1 
  966   8.610776   10.6.2.103 → 172.67.169.59 HTTP 334 GET / HTTP/1.1 


* Show every HTTP User-Agent, with the IP addresses involved

In [5]:
!tshark -nr $pcapFile_path -Y "http.user_agent" -Tfields -e ip.addr -e http.user_agent 

10.6.2.103,45.142.213.105	Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 10.0; WOW64; Trident/7.0; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729)



Source: https://jsur.in/post/2020-02-19-tshark-cheatsheet
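When the same tshark queries need to be reused from Python (for example to load the User-Agent list into pandas), a small wrapper can build the command line and parse the `-T fields` output. This is a sketch under assumptions, not part of the original workflow: the helper names `build_tshark_cmd`, `parse_fields` and `run_tshark` are hypothetical, and the example uses `ip.src`/`ip.dst` rather than `ip.addr` so that each field holds a single value (as the output above shows, tshark joins repeated occurrences of a field with a comma by default).

```python
import subprocess
from typing import Dict, List


def build_tshark_cmd(pcap_path: str, display_filter: str, fields: List[str]) -> List[str]:
    """Build a tshark command that prints selected fields as tab-separated text."""
    cmd = ["tshark", "-n", "-r", pcap_path, "-Y", display_filter, "-T", "fields"]
    for field in fields:
        cmd += ["-e", field]
    return cmd


def parse_fields(output: str, fields: List[str]) -> List[Dict[str, str]]:
    """Parse '-T fields' output (one packet per line, tab-separated) into dicts."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        values = line.split("\t")
        # A missing trailing value is stored as an empty string
        rows.append({field: values[i] if i < len(values) else ""
                     for i, field in enumerate(fields)})
    return rows


def run_tshark(pcap_path: str, display_filter: str, fields: List[str]) -> List[Dict[str, str]]:
    """Run tshark (must be on the PATH) and return the parsed rows."""
    result = subprocess.run(
        build_tshark_cmd(pcap_path, display_filter, fields),
        capture_output=True, text=True, check=True,
    )
    return parse_fields(result.stdout, fields)
```

For instance, `pd.DataFrame(run_tshark(pcapFile_path, "http.user_agent", ["ip.src", "ip.dst", "http.user_agent"]))` would give one row per HTTP request carrying a User-Agent.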


## PCAP analysis with Suricata

Once the quick analysis is done, the next step is to run the capture through a network intrusion detection system (NIDS) engine such as **Suricata**.
Before analyzing the PCAP, update the detection rules with `suricata-update`.

In [6]:
LOGDIR = "/tmp/suricata/logs"
!rm -rf $LOGDIR 2>/dev/null; mkdir -p $LOGDIR

# Update rules
!sudo suricata-update 1>/dev/null

# Run Suricata offline on the PCAP (-S: use only this rule file, -r: read the PCAP, -l: log directory)
!suricata -S /var/lib/suricata/rules/suricata.rules -r $pcapFile_path -l $LOGDIR -v 

12/6/2023 -- 17:40:48 - <Notice> - This is Suricata version 6.0.1 RELEASE running in USER mode
12/6/2023 -- 17:40:48 - <Info> - CPUs/cores online: 8
12/6/2023 -- 17:40:48 - <Info> - fast output device (regular) initialized: fast.log
12/6/2023 -- 17:40:48 - <Info> - eve-log output device (regular) initialized: eve.json
12/6/2023 -- 17:40:48 - <Info> - stats output device (regular) initialized: stats.log
12/6/2023 -- 17:41:00 - <Info> - 1 rule files processed. 34088 rules successfully loaded, 0 rules failed
12/6/2023 -- 17:41:00 - <Info> - Threshold config parsed: 0 rule(s) found
12/6/2023 -- 17:41:01 - <Info> - 34091 signatures processed. 1270 are IP-only rules, 5214 are inspecting packet payload, 27407 inspect application layer, 104 are decoder event only
12/6/2023 -- 17:41:50 - <Info> - 

If Suricata's rules match indicators of compromise, the number of alerts will be greater than 0.

In that case, the alerts can be read from `eve.json` or `fast.log`. \
The following steps assume that alerts were generated. The first step is therefore to read **eve.json** and load it into a DataFrame, which makes its data easier to analyze.
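A quick way to check whether any alert was raised, before loading everything into pandas, is to count the EVE records by `event_type`. This is a minimal standard-library sketch; the function name `count_event_types` is hypothetical, and it simply skips blank or malformed lines (for example a truncated last line).

```python
import json
from collections import Counter
from typing import Iterable


def count_event_types(lines: Iterable[str]) -> Counter:
    """Count EVE JSON records by event_type, skipping blank or malformed lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not valid JSON: ignore it
        counts[record.get("event_type", "unknown")] += 1
    return counts


# Usage, reading the eve.json written to LOGDIR above:
# with open(f"{LOGDIR}/eve.json") as eve:
#     print(count_event_types(eve)["alert"])  # > 0 means at least one rule matched
```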



In [7]:
# Load eve.json (one JSON record per line) into a DataFrame, flattening one level of nesting
with open(f"{LOGDIR}/eve.json", "r") as eveFile:
    df_suricata = pd.json_normalize([
        json.loads(line) for line in eveFile if line.strip()
    ], max_level=1)

# These columns become floats when some records lack them; restore integers (0 = missing)
df_suricata['flow_id'] = df_suricata['flow_id'].fillna(0).astype(int)
df_suricata['alert.signature_id'] = df_suricata['alert.signature_id'].fillna(0).astype(int)
df_suricata['dest_port'] = df_suricata['dest_port'].fillna(0).astype(int)
df_suricata['src_port'] = df_suricata['src_port'].fillna(0).astype(int)

# Suricata writes ISO 8601 timestamps (e.g. 2021-06-02T20:49:58.445041+0000); let pandas parse them
df_suricata['flow.start'] = pd.to_datetime(df_suricata['flow.start'], utc=True)
df_suricata['timestamp'] = pd.to_datetime(df_suricata['timestamp'], utc=True)

### Triage of the alerts raised by Suricata



In [8]:
# Alerts only, minus "Not Suspicious"; Suricata severity 1 is the most severe, so this descending sort shows severity 3 first
alert_cols = ['alert.severity','alert.signature_id','alert.signature','src_ip','src_port','dest_ip','dest_port','proto','app_proto','flow.start','flow_id','flow.bytes_toserver','flow.bytes_toclient']
df_suricata_alert = df_suricata[(df_suricata.event_type == "alert") & ~df_suricata['alert.category'].str.contains('Not Suspicious', na=False)][alert_cols].sort_values(by=['alert.severity'], ascending=False)
display(df_suricata_alert)

| | alert.severity | alert.signature_id | alert.signature | src_ip | src_port | dest_ip | dest_port | proto | app_proto | flow.start | flow_id | flow.bytes_toserver | flow.bytes_toclient |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 3.0 | 2014520 | ET INFO EXE - Served Attached HTTP | 45.142.213.105 | 80 | 10.6.2.103 | 57592 | TCP | http | 2021-06-02 20:49:58.445041+00:00 | 900886284388977 | 2002.0 | 44462.0 |
| 5 | 2.0 | 2023883 | ET DNS Query to a *.top domain - Likely Hostile | 10.6.2.103 | 51737 | 10.6.2.1 | 53 | UDP | dns | 2021-06-02 20:50:11.547355+00:00 | 612749814225435 | 74.0 | 0.0 |
| 6 | 2.0 | 2023883 | ET DNS Query to a *.top domain - Likely Hostile | 10.6.2.103 | 60527 | 10.6.2.1 | 53 | UDP | dns | 2021-06-02 20:51:11.037312+00:00 | 2000084384321984 | 76.0 | 0.0 |
| 11 | 2.0 | 2027871 | ET INFO Observed DNS Query to .fit TLD | 10.6.2.103 | 65245 | 10.6.2.1 | 53 | UDP | dns | 2021-06-02 20:51:15.495366+00:00 | 2243162353667846 | 74.0 | 0.0 |
| 26 | 2.0 | 2023883 | ET DNS Query to a *.top domain - Likely Hostile | 10.6.2.103 | 64801 | 10.6.2.1 | 53 | UDP | dns | 2021-06-02 20:50:06.760680+00:00 | 2249694994406248 | 76.0 | 0.0 |
| 30 | 2.0 | 2023882 | ET INFO HTTP Request to a *.top domain | 10.6.2.103 | 57594 | 172.67.169.59 | 80 | TCP | http | 2021-06-02 20:50:06.829408+00:00 | 1414169236514784 | 520.0 | 3765.0 |
| 36 | 2.0 | 2023883 | ET DNS Query to a *.top domain - Likely Hostile | 10.6.2.103 | 54815 | 10.6.2.1 | 53 | UDP | dns | 2021-06-02 21:01:13.945771+00:00 | 758009946533483 | 74.0 | 0.0 |
| 45 | 2.0 | 2023883 | ET DNS Query to a *.top domain - Likely Hostile | 10.6.2.103 | 54222 | 10.6.2.1 | 53 | UDP | dns | 2021-06-02 20:51:15.938512+00:00 | 403632155742736 | 74.0 | 0.0 |
| 3 | 1.0 | 2018959 | ET POLICY PE EXE or DLL Windows file download HTTP | 45.142.213.105 | 80 | 10.6.2.103 | 57592 | TCP | http | 2021-06-02 20:49:58.445041+00:00 | 900886284388977 | 2002.0 | 44462.0 |
| 29 | 1.0 | 2032086 | ET MALWARE Win32/IcedID Request Cookie | 10.6.2.103 | 57594 | 172.67.169.59 | 80 | TCP | http | 2021-06-02 20:50:06.829408+00:00 | 1414169236514784 | 520.0 | 3765.0 |
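Several alerts above repeat the same signature (the `*.top` DNS alert appears five times). A sketch for collapsing alerts into one row per signature, with hit counts and first/last timestamps, could look like this. `summarize_alerts` is a hypothetical helper; it assumes the column names produced by `json_normalize` on `eve.json` as loaded earlier, and it sorts ascending because Suricata severity 1 is the highest.

```python
import pandas as pd


def summarize_alerts(alerts: pd.DataFrame) -> pd.DataFrame:
    """Collapse repeated alerts into one row per signature.

    Expects the flattened eve.json columns 'alert.signature_id',
    'alert.signature', 'alert.severity' and 'timestamp'.
    """
    return (
        alerts.groupby(["alert.signature_id", "alert.signature"], as_index=False)
        .agg(
            severity=("alert.severity", "min"),   # most severe value seen
            hits=("timestamp", "count"),          # number of alert records
            first_seen=("timestamp", "min"),
            last_seen=("timestamp", "max"),
        )
        # Severity 1 first, then the noisiest signatures
        .sort_values(by=["severity", "hits"], ascending=[True, False])
        .reset_index(drop=True)
    )
```

Usage: `summarize_alerts(df_suricata[df_suricata.event_type == "alert"])`.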


Once the alerts have been triaged, we extract the IOCs that may be present in the PCAP and look them up in threat intelligence sources.

In [9]:
# Extract IP addresses and domain names from the relevant columns only
ioc_cols = ['src_ip', 'dest_ip', 'dns.rrname']
iocs_found = ioc_extractor.extract(data=df_suricata[ioc_cols].fillna(''), columns=ioc_cols)
iocs_found = iocs_found['Observable'].drop_duplicates()

# Look the observables up in VirusTotal and OTX (provider API keys are configured in msticpyconfig.yaml)
df_ti = ti.lookup_iocs(data=iocs_found, providers=["VirusTotal", "OTX"])

# Keep positive results and flatten the provider details; pulse_count comes from OTX
df_suspnetworkconnections = df_ti[df_ti['Result'] == True]
df_suspnetworkconnections = pd.json_normalize(data=df_suspnetworkconnections[['Ioc','Provider','Details']].to_dict(orient='records')).sort_values(by=['Details.pulse_count'], ascending=False)
df_suspnetworkconnections[['Ioc','Provider','Details.pulse_count','Details.names','Details.references']]


| | Ioc | Provider | Details.pulse_count | Details.names | Details.references |
|---|---|---|---|---|---|
| 6 | 194.5.249.46 | OTX | 11 | [Tracking IcedID Servers, test, test, test, test, test, TOR Nodes, Log4j Scanning Hosts, TA551 IcedID C2 Infrastructure (Suspected), ThreatFox2020604, Melting Ice – Tracking IcedID Servers with a few simple steps] | [[https://research.checkpoint.com/2021/melting-ice-tracking-icedid-servers-with-a-few-simple-steps/], [], [], [], [], [], [], [Log4j Scanning Hosts.pdf], [], [https://threatfox.abuse.ch/export/json/recent/], [https://research.checkpoint.com/2021/melting-ice-tracking-icedid-servers-with-a-few-simple-steps/]] |
| 7 | 185.33.85.35 | OTX | 8 | [Tracking IcedID Servers, Log4J Exploit, Log4j Scanning Hosts, TA551 IcedID C2 Infrastructure (Suspected), ThreatFox2020604, Melting Ice – Tracking IcedID Servers with a few simple steps, Team Cymru tacks BokBot infrastructure, TA551 (Shathak) pushes IcedID (Bokbot) - List of indicators - May 2021] | [[https://research.checkpoint.com/2021/melting-ice-tracking-icedid-servers-with-a-few-simple-steps/], [], [Log4j Scanning Hosts.pdf], [], [https://threatfox.abuse.ch/export/json/recent/], [https://research.checkpoint.com/2021/melting-ice-tracking-icedid-servers-with-a-few-simple-steps/], [https://team-cymru.com/blog/2021/05/19/tracking-bokbot-infrastructure/], [https://github.com/pan-unit42/tweets/blob/master/2021-05-17-TA551-IOCs-for-IcedID.txt, https://twitter.com/Unit42_Intel/status/1394659804329693186?s=20]] |
| 10 | 38.135.122.194 | OTX | 6 | [asdAS, Conti Ransomware (12 May 2021), Conti Ransomware IOCs, Conti IOC CyberLab, Additional Conti IOCs - May 2021, Conti Ransomware IOCs - May 2021] | [[], [https://thedfirreport.com/2021/05/12/conti-ransomware/], [], [], [https://twitter.com/TheDFIRReport/status/1392443471357779970], [https://thedfirreport.com/2021/05/12/conti-ransomware/]] |
| 0 | coursemcclurez.com | OTX | 5 | [Malware - Malware Domain Feed V2 - November 03 2020, Malware - Malware Domain Feed V2 - November 03 2020, TA551 domains, NewDom-1-20210530, TA551 - IcedID - IOCs - April, May & June 2021 Campaign] | [[], [], [https://raw.githubusercontent.com/hpthreatresearch/iocs/main/TA551/domains.txt, https://threatresearch.ext.hp.com/detecting-ta551-domains/], [], [https://docs.google.com/spreadsheets/d/1N7YSoS5nXYHLnGNrJdCDP40wMmTLHG5X5lbM2EW8UOY/edit#gid=0]] |
| 3 | fimlubindu.top | OTX | 5 | [Tracking BokBot (IcedID) Infrastructure, MacOS EvilQuest - Unpacked, Threatfox 20210527, Team Cymru tacks BokBot infrastructure, TA551 (Shathak) pushes IcedID (Bokbot) - List of indicators - May 2021] | [[https://team-cymru.com/blog/2021/05/19/tracking-bokbot-infrastructure/, https://github.com/team-cymru/iocs/tree/master/bokbot], [], [https://threatfox.abuse.ch/export/json/recent/], [https://team-cymru.com/blog/2021/05/19/tracking-bokbot-infrastructure/], [https://github.com/pan-unit42/tweets/blob/master/2021-05-17-TA551-IOCs-for-IcedID.txt, https://twitter.com/Unit42_Intel/status/1394659804329693186?s=20]] |
| 5 | kilodaser4.fit | OTX | 2 | [ThreatFox2020604, TA551 (Shathak) pushes IcedID (Bokbot) - List of indicators - May 2021] | [[https://threatfox.abuse.ch/export/json/recent/], [https://github.com/pan-unit42/tweets/blob/master/2021-05-17-TA551-IOCs-for-IcedID.txt, https://twitter.com/Unit42_Intel/status/1394659804329693186?s=20]] |
| 1 | 65.8.218.70 | OTX | 1 | [MacOS EvilQuest - Unpacked] | [[]] |
| 4 | extrimefigim.top | OTX | 1 | [Threatfox 20210527] | [[https://threatfox.abuse.ch/export/json/recent/]] |
| 11 | arhannexa5.top | OTX | 1 | [ThreatFox2020604] | [[https://threatfox.abuse.ch/export/json/recent/]] |
| 2 | 45.142.213.105 | OTX | 0 | | |
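The `src_ip` and `dest_ip` columns also contain internal addresses such as 10.6.2.103 and 10.6.2.1, for which public threat-intelligence services are unlikely to return anything useful and which may consume API quota. One possible pre-filter, sketched below under that assumption, keeps domain names and globally routable IP addresses before calling `ti.lookup_iocs`. `routable_observables` is a hypothetical helper built on Python's `ipaddress` module; multicast addresses are excluded explicitly because some Python versions report them as global.

```python
import ipaddress
from typing import Iterable, List


def routable_observables(observables: Iterable[str]) -> List[str]:
    """Keep domains and globally routable IPs; drop private, reserved and multicast IPs.

    Values that do not parse as an IP address (e.g. domain names) are kept.
    Duplicates are removed while preserving the original order.
    """
    kept: List[str] = []
    seen = set()
    for value in observables:
        value = value.strip()
        if not value or value in seen:
            continue
        seen.add(value)
        try:
            addr = ipaddress.ip_address(value)
        except ValueError:
            kept.append(value)  # not an IP: treat it as a domain or other observable
            continue
        if addr.is_global and not addr.is_multicast:
            kept.append(value)
    return kept
```

Usage: `df_ti = ti.lookup_iocs(data=routable_observables(iocs_found), providers=["VirusTotal", "OTX"])`.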


### Statistics

#### Count the recorded event types

In [12]:
df_suricata.groupby(by='event_type').size().reset_index(name='count').sort_values(by=['count'])

| | event_type | count |
|---|---|---|
| 5 | stats | 1 |
| 2 | fileinfo | 2 |
| 4 | http | 2 |
| 6 | tls | 9 |
| 1 | dns | 16 |
| 0 | alert | 19 |
| 3 | flow | 21 |


#### Identify the queried domain names

This lists uncommon domain names that may indicate an ongoing attack. `dropna()` keeps only the DNS records that carry an answer (`dns.rcode` and `dns.grouped`).

In [13]:
df_suricata[df_suricata.event_type=='dns'][['timestamp','dns.rrname','dns.rrtype','dns.rcode','dns.grouped']].dropna()

| | timestamp | dns.rrname | dns.rrtype | dns.rcode | dns.grouped |
|---|---|---|---|---|---|
| 1 | 2021-06-02 20:49:58.429250+00:00 | coursemcclurez.com | A | NOERROR | {'A': ['45.142.213.105']} |
| 9 | 2021-06-02 20:51:11.127796+00:00 | extrimefigim.top | A | NOERROR | {'A': ['194.5.249.46']} |
| 10 | 2021-06-02 20:50:11.633734+00:00 | fimlubindu.top | A | NOERROR | {'A': ['185.33.85.35']} |
| 13 | 2021-06-02 20:51:15.560390+00:00 | kilodaser4.fit | A | NOERROR | {'A': ['185.33.85.35']} |
| 28 | 2021-06-02 20:50:06.827231+00:00 | supplementik.top | A | NOERROR | {'A': ['172.67.169.59', '104.21.79.67']} |
| 39 | 2021-06-02 21:01:13.997203+00:00 | arhannexa5.top | A | NOERROR | {'A': ['185.33.85.35']} |
| 43 | 2021-06-02 20:50:05.568889+00:00 | aws.amazon.com | A | NOERROR | {'CNAME': ['tp.8e49140c2-frontier.amazon.com', 'dr49lng3n1n2s.cloudfront.net'], 'A': ['65.8.218.70']} |
| 47 | 2021-06-02 20:51:16.026333+00:00 | arhannexa5.top | A | NOERROR | {'A': ['185.33.85.35']} |
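To correlate these answers with the conversations seen earlier, it helps to flatten `dns.grouped` into one row per resolved address; in the table above, for instance, fimlubindu.top, kilodaser4.fit and arhannexa5.top all resolve to 185.33.85.35. A minimal pandas sketch follows. `dns_answers_to_rows` is a hypothetical helper, and it assumes the Suricata 6 `dns.grouped` layout shown above (a dict mapping a record type to a list of values).

```python
import pandas as pd


def dns_answers_to_rows(dns: pd.DataFrame, rrtype: str = "A") -> pd.DataFrame:
    """Turn the nested 'dns.grouped' dicts into one (timestamp, rrname, answer) row each."""
    rows = dns[["timestamp", "dns.rrname", "dns.grouped"]].dropna().copy()
    # Pick the list for the requested record type; anything else becomes an empty list
    rows["answer"] = rows["dns.grouped"].apply(
        lambda grouped: grouped.get(rrtype, []) if isinstance(grouped, dict) else []
    )
    # One row per resolved value; empty lists explode to NaN and are dropped
    rows = rows.explode("answer").dropna(subset=["answer"])
    return rows[["timestamp", "dns.rrname", "answer"]].reset_index(drop=True)
```

Usage: `dns_answers_to_rows(df_suricata[df_suricata.event_type == 'dns'])`, which can then be grouped by `answer` to list the domains sharing an IP.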


#### Connection frequency

It is also possible to spot repeated connections to the same destination IP address.

In [14]:
df_suricata.groupby('dest_ip').size().reset_index(name='count').sort_values(by=['count'], ascending=False)

| | dest_ip | count |
|---|---|---|
| 0 | 10.6.2.1 | 30 |
| 1 | 10.6.2.103 | 12 |
| 3 | 185.33.85.35 | 10 |
| 4 | 194.5.249.46 | 6 |
| 2 | 172.67.169.59 | 4 |
| 5 | 38.135.122.194 | 3 |
| 6 | 45.142.213.105 | 2 |
| 7 | 65.8.218.70 | 2 |
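These counts are per EVE record, not per connection: one flow can produce `flow`, `dns`, `http` and `alert` records, and some alerts (such as "ET INFO EXE - Served Attached HTTP") have the internal host as `dest_ip`. Counting distinct `flow_id` values gives a closer approximation of the number of connections. The helper below is a hypothetical sketch; it relies on the `fillna(0)` applied to `flow_id` when loading `eve.json`, so 0 means "no flow".

```python
import pandas as pd


def flows_per_destination(events: pd.DataFrame) -> pd.DataFrame:
    """Count distinct flows (not EVE records) per destination IP."""
    return (
        events[events["flow_id"] != 0]          # ignore records without a flow (e.g. stats)
        .groupby("dest_ip")["flow_id"]
        .nunique()
        .reset_index(name="flows")
        .sort_values(by="flows", ascending=False)
        .reset_index(drop=True)
    )
```

Usage: `flows_per_destination(df_suricata)`.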


#### [TODO] Identify connections made at regular intervals across all flows

This analysis can help spot beacons on the network.

In [15]:
# source : https://towardsdatascience.com/6-visualization-tricks-to-handle-ultra-long-time-series-data-57dad97e0fc2

df_flow = df_suricata[['timestamp','flow_id','src_ip','dest_ip','flow.bytes_toserver','flow.bytes_toclient']].sort_values(by=['timestamp'])

# Generate a graph filtered on specific remote ip
#df_flow = df_suricata[df_suricata.dest_ip == '10.6.2.1' ][['timestamp','flow_id','src_ip','dest_ip','flow.bytes_toserver','flow.bytes_toclient']].sort_values(by=['timestamp'])

display(px.box(df_flow, y='flow.bytes_toclient', x ='timestamp'))
display(px.box(df_flow, y='flow.bytes_toserver', x ='timestamp'))
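The box plots above show transferred volumes over time but do not quantify how regular the connections are. One common heuristic for beacon detection, not used in the original notebook, is to look at the gaps between successive connections for each source/destination pair: a low coefficient of variation (standard deviation divided by mean) means the connections occur at near-regular intervals. The sketch below assumes the `flow.start`, `src_ip` and `dest_ip` columns parsed earlier; `beacon_candidates` and its `min_connections` and `max_cv` thresholds are hypothetical and would need tuning, since real beacons often add jitter.

```python
import pandas as pd


def beacon_candidates(flows: pd.DataFrame, min_connections: int = 4,
                      max_cv: float = 0.2) -> pd.DataFrame:
    """Flag (src_ip, dest_ip) pairs whose flows start at near-regular intervals."""
    columns = ["src_ip", "dest_ip", "connections", "mean_gap_s", "cv"]
    results = []
    for (src, dst), group in flows.groupby(["src_ip", "dest_ip"]):
        # Several EVE records can share one flow start; keep each start once
        starts = group["flow.start"].dropna().drop_duplicates().sort_values()
        if len(starts) < min_connections:
            continue
        gaps = starts.diff().dropna().dt.total_seconds()
        mean_gap = gaps.mean()
        if mean_gap <= 0:
            continue
        cv = gaps.std(ddof=0) / mean_gap  # population std / mean
        results.append({"src_ip": src, "dest_ip": dst, "connections": len(starts),
                        "mean_gap_s": mean_gap, "cv": cv})
    out = pd.DataFrame(results, columns=columns)
    return out[out["cv"] <= max_cv].sort_values(by="cv").reset_index(drop=True)
```

Usage: `beacon_candidates(df_suricata[df_suricata.event_type == 'flow'])`. A short capture like this one may simply not contain enough connections per pair to pass `min_connections`.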
