# PCAP Analysis

Sources:
- https://adriangcoder.medium.com/pandas-tricks-and-tips-a7b87c3748ea
- https://www.stamus-networks.com/blog/jupyter-playbooks-for-suricata-part-3



Some tools were built specifically to analyze network traffic and to detect malicious or suspicious behavior through predefined rules.


In [None]:
import json
import pandas as pd
from pandas import json_normalize
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
%matplotlib inline
import re
from IPython.display import display, HTML

import ipaddress as ip
from msticpy.transform.iocextract import IoCExtract

import msticpy as mp
mp.init_notebook(globals(), verbosity=0)

# Threat intelligence lookup client and IoC extractor used later in the notebook
ti = mp.TILookup()
ioc_extractor = IoCExtract()

#Expand the width of the cells
display(HTML("<style>.container { width:90% !important; }</style>"))

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

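def ip_type(string):
    """Classify an IP address; the most specific checks come first."""
    addr = ip.ip_address(string)
    # Loopback, link-local and reserved ranges also report is_private,
    # so they must be tested before the generic 'Private' label
    if addr.is_loopback:
        return 'Loopback'
    elif addr.is_link_local:
        return 'Link local'
    elif addr.is_multicast:
        return 'Multicast'
    elif addr.is_reserved:
        return 'Reserved'
    elif addr.is_private:
        return 'Private'
    elif addr.is_global:
        return 'Public'
    # Neither private nor global (e.g. shared address space 100.64.0.0/10)
    return None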



# Set pcap file path
pcapFile = {}
pcapFile_path = "XXXXXXxxxxx.pcap"
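
Several `ipaddress` flags overlap, so the order of the checks in a helper such as `ip_type` decides which label an address receives. The sketch below is only an illustration: `FLAGS` and `address_flags` are hypothetical names, and the sample addresses are arbitrary. It lists every flag that is true for an address so the overlaps can be inspected directly.

```python
import ipaddress

# Flags of interest on ipaddress objects; several can be true at once.
FLAGS = ("is_loopback", "is_link_local", "is_multicast",
         "is_reserved", "is_private", "is_global")


def address_flags(value: str) -> list:
    """Return the name of every flag that is true for the given address."""
    addr = ipaddress.ip_address(value)
    return [flag for flag in FLAGS if getattr(addr, flag)]


# Arbitrary sample addresses, one per range of interest
for sample in ("127.0.0.1", "169.254.10.20", "192.168.1.10", "8.8.8.8"):
    print(sample, address_flags(sample))
```

Loopback and link-local addresses also report `is_private`, which is why a classifier that tests `is_private` first never reaches the more specific labels.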



## Quick PCAP Analysis with tshark

The contents of a PCAP file can be browsed easily with **tshark**, the command-line version of **wireshark**.

```
# Read the whole pcap file
$ tshark -r {pcap_filepath}

# Read the whole pcap file without name resolution
$ tshark -nr {pcap_filepath}
```



List the **TCP** conversations found in the PCAP

In [None]:
!tshark -nr $pcapFile_path -q -z conv,tcp

List the **UDP** conversations found in the PCAP

In [None]:
!tshark -nr $pcapFile_path -q -z conv,udp

Display filters make it possible to go further.

* Filter on HTTP method


In [None]:
!tshark -r $pcapFile_path -Y "http.request.method==GET" | grep -i 'swellheaded.php'

Show all HTTP requests and URLs: \
This helps spot downloads of executables or archives.

In [None]:
#!tshark -r $pcapFile_path -Y 'ip.src == 1.2.3.4 and http.request.method == "GET"' -T fields -e http.request.method -e http.request.version -e http.request.full_uri | head -n 1
!tshark -nr $pcapFile_path -Y 'http.request.method == "POST" or http.request.method == "GET"' -T fields -e tcp.stream  -e http.request.method -e http.request.version -e http.request.full_uri | grep -Ei '\.(zip|exe)'
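
The same extension filter can be applied in Python once the URIs are collected, which avoids the pitfalls of shell regular expressions (an unescaped `.` matches any character). This is a minimal sketch: `DOWNLOAD_EXTENSIONS` and `is_download` are hypothetical names, and the sample URIs are invented.

```python
import re
from urllib.parse import urlparse

# Hypothetical extension list; extend it to match the investigation.
DOWNLOAD_EXTENSIONS = ("zip", "exe")
_EXT_RE = re.compile(
    r"\.(?:%s)$" % "|".join(re.escape(ext) for ext in DOWNLOAD_EXTENSIONS),
    re.IGNORECASE,
)


def is_download(uri: str) -> bool:
    """True when the path of the URI ends with one of the listed extensions."""
    return bool(_EXT_RE.search(urlparse(uri).path))


# Invented URIs standing in for the http.request.full_uri column
uris = [
    "http://example.test/files/setup.EXE?id=1",
    "http://example.test/unzip/index.html",
    "http://example.test/archive/data.zip",
]
print([uri for uri in uris if is_download(uri)])
```

Only the path is tested, so a query string after the file name does not hide the extension, and a path that merely contains `zip` is not reported.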

Get all data for a specific TCP stream ID

In [None]:
!tshark -nr $pcapFile_path -Y 'tcp.stream == 119' -T fields -e data.data

In [None]:
!tshark -nr $pcapFile_path -Y 'http.request.method == "POST" or http.request.method == "GET"' -T fields -e tcp.stream  -e http.request.method -e http.request.version -e http.request.full_uri -e data.data

* Show all User-Agent values

In [None]:
!tshark -nr $pcapFile_path -Y "http.user_agent" -Tfields -e ip.addr -e http.user_agent 


Source: https://jsur.in/post/2020-02-19-tshark-cheatsheet
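
A long User-Agent listing is easier to read once it is aggregated. The sketch below assumes the two-field, tab-separated layout produced by `-T fields -e ip.addr -e http.user_agent`; `count_user_agents` is a hypothetical helper and the sample output is invented.

```python
from collections import Counter


def count_user_agents(tshark_output: str) -> Counter:
    """Count User-Agent values in tab-separated `-T fields` output."""
    counts = Counter()
    for line in tshark_output.splitlines():
        # tshark separates the requested fields with a tab by default
        parts = line.split("\t")
        if len(parts) < 2 or not parts[1].strip():
            continue  # skip lines without a User-Agent value
        counts[parts[1].strip()] += 1
    return counts


# Invented sample output: "<addresses>\t<user agent>"
sample_output = (
    "10.0.0.5,203.0.113.7\tMozilla/5.0 (Windows NT 10.0; Win64; x64)\n"
    "10.0.0.5,203.0.113.7\tMozilla/5.0 (Windows NT 10.0; Win64; x64)\n"
    "10.0.0.9,198.51.100.20\tcurl/7.88.1\n"
    "10.0.0.9,198.51.100.20\t\n"
)
for agent, count in count_user_agents(sample_output).most_common():
    print(count, agent)
```

In a notebook, the output of a shell command can be captured as a list of lines with `lines = !tshark ...` and passed to the helper as `"\n".join(lines)`. Rare User-Agent values are often the ones worth a closer look.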


## PCAP Analysis with Suricata

Once the quick analysis is done, the next step is to run the capture through a network intrusion detection system (NIDS) such as **Suricata**.
Before analyzing the pcap, update the detection rules with ```suricata-update```.

In [None]:
LOGDIR = "/tmp/suricata/logs"
!rm -rf $LOGDIR 2>/dev/null; mkdir -p $LOGDIR

# Update rules
!sudo suricata-update 1>/dev/null

# Run the analysis
!suricata -S /var/lib/suricata/rules/suricata.rules -r $pcapFile_path -l $LOGDIR -v 

If Suricata's rules match indicators of compromise, the number of alerts should be greater than 0.

In that case, the alerts can be found in eve.json or fast.log. \
The following steps assume that alerts were generated. The first step is to read **eve.json** and load it into a DataFrame so that its contents are easier to analyze.
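
Before loading everything into a DataFrame, a quick count of event types can confirm that eve.json actually contains alerts. The sketch below relies only on the fact that eve.json holds one JSON object per line with an `event_type` field; `count_event_types` is a hypothetical helper and the sample lines are invented.

```python
import json
from collections import Counter


def count_event_types(lines) -> Counter:
    """Count the event_type of each JSON line (blank lines are ignored)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        counts[json.loads(line).get("event_type", "unknown")] += 1
    return counts


# Invented sample lines standing in for the content of eve.json
sample_lines = [
    '{"event_type": "dns", "src_ip": "10.0.0.5"}',
    '{"event_type": "alert", "src_ip": "10.0.0.5", "alert": {"severity": 1}}',
    '{"event_type": "flow", "src_ip": "10.0.0.5"}',
    '{"event_type": "dns", "src_ip": "10.0.0.9"}',
    "",
]
counts = count_event_types(sample_lines)
print(counts)
print("alerts found:", counts["alert"] > 0)
```

Because a file object iterates line by line, the helper can be called as `count_event_types(open(f"{LOGDIR}/eve.json"))` on the real log.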



In [None]:
# Load nested eve.json in dataFrame
with open(f"{ LOGDIR }/eve.json", "r") as eveFile:
    df_suricata = pd.json_normalize([
        json.loads(line) for line in eveFile
    ], max_level=1)

df_suricata['flow_id'] = df_suricata['flow_id'].fillna(0).astype(int)
df_suricata['alert.signature_id'] = df_suricata['alert.signature_id'].fillna(0).astype(int)
df_suricata['dest_port'] = df_suricata['dest_port'].fillna(0).astype(int)
df_suricata['src_port'] = df_suricata['src_port'].fillna(0).astype(int)

# eve.json timestamps are ISO 8601 strings with a UTC offset, so let pandas parse them without a fixed format
df_suricata['flow.start'] = pd.to_datetime(df_suricata['flow.start'], utc=True)
df_suricata['timestamp'] = pd.to_datetime(df_suricata['timestamp'], utc=True)
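
The timestamp handling can be checked on two values with the standard library alone. This sketch assumes the layout Suricata commonly writes to eve.json (date, `T`, time with microseconds, numeric UTC offset); `EVE_TIME_FORMAT` and `parse_eve_timestamp` are hypothetical names and the sample values are invented. It also shows why a timedelta's `seconds` attribute is not the elapsed time once the gap exceeds a day.

```python
from datetime import datetime

# Assumed layout of eve.json timestamps, e.g. 2023-05-04T10:15:30.000000+0000
EVE_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def parse_eve_timestamp(value: str) -> datetime:
    """Parse one eve.json timestamp into a timezone-aware datetime."""
    return datetime.strptime(value, EVE_TIME_FORMAT)


# Invented sample values, a little more than one day apart
first = parse_eve_timestamp("2023-05-04T10:15:30.000000+0000")
second = parse_eve_timestamp("2023-05-05T10:16:30.500000+0000")
delta = second - first

# `seconds` is only the seconds component; `total_seconds()` is the full gap
print(delta.seconds, delta.total_seconds())
```

The same distinction applies to the pandas accessors `.dt.seconds` and `.dt.total_seconds()` when computing deltas between events.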

In [None]:
# Reference : https://github.com/Cyb3r-Monk/RITA-J/blob/main/C2%20Detection%20-%20HTTP.ipynb

columns_to_display = ['Score','tsScore','dsScore','connections_count','src_ip','dest_ip','dest_port','proto']
columns_to_filter = ['timestamp','src_ip','dest_ip','dest_port','proto','flow.bytes_toserver']
columns_to_groupby = ['src_ip','dest_ip','dest_port','proto']
df_beacon = df_suricata[columns_to_filter]
df_beacon = df_beacon.drop_duplicates(subset=columns_to_filter)

df_beacon = df_beacon[df_beacon['flow.bytes_toserver'].notnull()]
df_beacon = df_beacon.groupby(columns_to_groupby).agg(list)
df_beacon.reset_index(inplace=True)

# Count the connections sharing the same source IP, destination IP, destination port and protocol
df_beacon['connections_count'] = df_beacon['timestamp'].apply(lambda x: len(x))

# Keep only the groups above a minimum connection threshold
df_beacon = df_beacon.loc[df_beacon['connections_count'] > 20]

# Sort the groups by connections_count
df_beacon = df_beacon.sort_values(['connections_count'], ascending=False)

# Sort the timestamps within each group
df_beacon['timestamp'] = df_beacon['timestamp'].apply(lambda x: sorted(x))

# Compute the delta in seconds between consecutive events; the leading NaN produced by diff() is dropped
df_beacon['deltas'] = df_beacon['timestamp'].apply(lambda x: pd.Series(x).diff().dt.total_seconds().dropna().tolist())

# Variables for time delta dispersion
df_beacon['tsLow'] = df_beacon['deltas'].apply(lambda x: np.percentile(np.array(x), 20))
df_beacon['tsMid'] = df_beacon['deltas'].apply(lambda x: np.percentile(np.array(x), 50))
df_beacon['tsHigh'] = df_beacon['deltas'].apply(lambda x: np.percentile(np.array(x), 80))
df_beacon['tsBowleyNum'] = df_beacon['tsLow'] + df_beacon['tsHigh'] - 2*df_beacon['tsMid']
df_beacon['tsBowleyDen'] = df_beacon['tsHigh'] - df_beacon['tsLow']
df_beacon['tsSkew'] = df_beacon[['tsLow','tsMid','tsHigh','tsBowleyNum','tsBowleyDen']].apply(
    lambda x: x['tsBowleyNum'] / x['tsBowleyDen'] if x['tsBowleyDen'] != 0 and x['tsMid'] != x['tsLow'] and x['tsMid'] != x['tsHigh'] else 0.0, axis=1
    )
df_beacon['tsMadm'] = df_beacon['deltas'].apply(lambda x: np.median(np.absolute(np.array(x) - np.median(np.array(x)))))
df_beacon['tsConnDiv'] = df_beacon['timestamp'].apply(lambda x: (x[-1].to_pydatetime() - x[0].to_pydatetime()).total_seconds() / 90)

# Variables for data size dispersion
df_beacon['dsLow'] = df_beacon['flow.bytes_toserver'].apply(lambda x: np.percentile(np.array(x), 20))
df_beacon['dsMid'] = df_beacon['flow.bytes_toserver'].apply(lambda x: np.percentile(np.array(x), 50))
df_beacon['dsHigh'] = df_beacon['flow.bytes_toserver'].apply(lambda x: np.percentile(np.array(x), 80))
df_beacon['dsBowleyNum'] = df_beacon['dsLow'] + df_beacon['dsHigh'] - 2*df_beacon['dsMid']
df_beacon['dsBowleyDen'] = df_beacon['dsHigh'] - df_beacon['dsLow']
df_beacon['dsSkew'] = df_beacon[['dsLow','dsMid','dsHigh','dsBowleyNum','dsBowleyDen']].apply(
    lambda x: x['dsBowleyNum'] / x['dsBowleyDen'] if x['dsBowleyDen'] != 0 and x['dsMid'] != x['dsLow'] and x['dsMid'] != x['dsHigh'] else 0.0, axis=1
    )
df_beacon['dsMadm'] = df_beacon['flow.bytes_toserver'].apply(lambda x: np.median(np.absolute(np.array(x) - np.median(np.array(x)))))

# Time delta score calculation
df_beacon['tsSkewScore'] = 1.0 - abs(df_beacon['tsSkew'])
# If jitter is greater than 30 seconds, say 90 seconds, MadmScore might be zero
# It depends on how the jitter is implemented.
df_beacon['tsMadmScore'] = 1.0 - (df_beacon['tsMadm'] / 30.0)
df_beacon['tsMadmScore'] = df_beacon['tsMadmScore'].apply(lambda x: 0 if x < 0 else x)
df_beacon['tsConnCountScore'] = (df_beacon['connections_count']) / df_beacon['tsConnDiv']
df_beacon['tsConnCountScore'] = df_beacon['tsConnCountScore'].apply(lambda x: 1.0 if x > 1.0 else x)
df_beacon['tsScore'] = (((df_beacon['tsSkewScore'] + df_beacon['tsMadmScore'] + df_beacon['tsConnCountScore']) / 3.0) * 1000) / 1000

# Data size score calculation of sent bytes
df_beacon['dsSkewScore'] = 1.0 - abs(df_beacon['dsSkew'])
# If data jitter is greater than 128 bytes, say 300 bytes, MadmScore might be zero
# Depends on how the jitter is implemented. 
df_beacon['dsMadmScore'] = 1.0 - (df_beacon['dsMadm'] / 128.0)
df_beacon['dsMadmScore'] = df_beacon['dsMadmScore'].apply(lambda x: 0 if x < 0 else x)
# Perfect beacons don't send too much data since they are idle and just checking in;
# dividing by a high number would make the score insensitive,
# so a smaller divisor is used to keep the smallness score sensitive.
df_beacon['dsSmallnessScore'] = 1.0 - (df_beacon['dsMid'] / 8192.0)
df_beacon['dsSmallnessScore'] = df_beacon['dsSmallnessScore'].apply(lambda x: 0 if x < 0 else x)
df_beacon['dsScore'] = (((df_beacon['dsSkewScore'] + df_beacon['dsMadmScore'] + df_beacon['dsSmallnessScore']) / 3.0) * 1000) / 1000

# Overall score calculation
df_beacon['Score'] = (df_beacon['dsScore'] + df_beacon['tsScore']) / 2

df_beacon.sort_values(by= 'Score', ascending=False, inplace=True, ignore_index=True)


The connections below are frequent and resemble exchanges between beacons and an attacker group's C2 infrastructure.

In [None]:
df_beacon[columns_to_display]
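
The dispersion measures behind `tsScore` can be checked by hand on a short list of deltas. The sketch below reimplements the 20/50/80 percentile Bowley skewness and the median absolute deviation with the standard library only, using the same linear interpolation as numpy's default percentile. The helper names and both sample lists are invented for illustration; it is not a replacement for the scoring cell above.

```python
from statistics import median


def percentile(values, q):
    """Percentile with linear interpolation between closest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def bowley_skew(values, low_q=20, high_q=80):
    """Quantile-based skewness; 0.0 when the distribution is degenerate."""
    low, mid, high = (percentile(values, q) for q in (low_q, 50, high_q))
    den = high - low
    if den == 0 or mid == low or mid == high:
        return 0.0
    return (low + high - 2 * mid) / den


def madm(values):
    """Median absolute deviation around the median."""
    center = median(values)
    return median(abs(v - center) for v in values)


def time_score_parts(deltas):
    """Skew and dispersion parts of the time score (30 s tolerance)."""
    skew_score = 1.0 - abs(bowley_skew(deltas))
    madm_score = max(0.0, 1.0 - madm(deltas) / 30.0)
    return skew_score, madm_score


# Invented deltas in seconds: a perfectly regular beacon and bursty traffic
regular = [60] * 10
irregular = [2, 300, 15, 1200, 45, 7, 900, 30, 600, 3]
print(time_score_parts(regular))
print(time_score_parts(irregular))
```

A perfectly regular check-in scores 1.0 on both parts, while bursty traffic is penalized through its large median absolute deviation.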

### Triage of the alerts raised by Suricata



In [None]:
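# Columns kept for alert triage
alert_columns = ['alert.severity','alert.signature_id','alert.signature','src_ip','src_port','dest_ip','dest_port','proto','app_proto','flow.start','flow_id','flow.bytes_toserver','flow.bytes_toclient']
# Keep alert events whose category is not flagged as "Not Suspicious"
is_alert = df_suricata.event_type == "alert"
is_suspicious = ~df_suricata['alert.category'].str.contains('Not Suspicious', na=False)
# Suricata severity 1 is the most severe, so sort ascending to list the most severe alerts first
df_suricata_alert = df_suricata[is_alert & is_suspicious][alert_columns].sort_values(by=['alert.severity'], ascending=True)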
display(df_suricata_alert.head(20))

Once the alerts are triaged, we extract the IoCs potentially present in the PCAP and look them up in threat intelligence sources.

In [None]:
# Extract IoCs from the source IP, destination IP and DNS query name columns
iocs_found = ioc_extractor.extract(data=df_suricata.fillna(''), columns=['src_ip','dest_ip','dns.rrname'])
iocs_found = iocs_found['Observable'].drop_duplicates()
df_ti = ti.lookup_iocs(data=iocs_found, providers=["VirusTotal", "OTX"])
df_suspnetworkconnections = df_ti[df_ti['Result']==True]
df_suspnetworkconnections = pd.json_normalize(data=df_suspnetworkconnections[['Ioc','Provider','Details']].to_dict(orient='records')).sort_values(by=['Details.pulse_count'], ascending=False)
df_suspnetworkconnections[['Ioc','Provider','Details.pulse_count','Details.names','Details.references']]

### Statistics

#### Identify the types of events recorded

In [None]:
df_suricata.groupby(by='event_type').size().reset_index(name='count').sort_values(by=['count'])

#### Identify the queried domain names

This analysis lists uncommon domain names that may be a sign of an ongoing attack.

In [None]:
df_suricata[df_suricata.event_type=='dns'][['timestamp','dns.rrname','dns.rrtype','dns.rcode','dns.grouped']].dropna()
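
One simple way to surface uncommon domains is to count how often each name is queried and keep the rarest ones. The sketch below is a hedged illustration: `rare_domains` and its `max_count` threshold are hypothetical, and the sample names are invented.

```python
from collections import Counter


def rare_domains(rrnames, max_count=1):
    """Return the names queried at most `max_count` times, sorted by name."""
    # Normalize case and the optional trailing dot before counting
    counts = Counter(name.lower().rstrip(".") for name in rrnames if name)
    return sorted(name for name, seen in counts.items() if seen <= max_count)


# Invented query names standing in for the dns.rrname column
sample_rrnames = [
    "www.example.com",
    "WWW.example.com.",
    "cdn.example.net",
    "cdn.example.net",
    "x7f3k2.example.org",
    "",
]
print(rare_domains(sample_rrnames))
```

On the DataFrame above, the input could be `df_suricata['dns.rrname'].dropna()`. Rarity alone is not proof of malice, so the result is a starting list for manual review.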