<h1 align='center'> DDoS Attack (Filtering and) Labelling </h1>

<h2 align='center'> 
<div style="width:400px;padding:10px;border:1px dotted black;">
Goal: to label DDoS attacks.
</div>
</h2>

<img align='center' src="figs/summary.png" width="400px"/>


- The **input** is a packet-based network trace file that contains an attack, usually with the extension .pcap or .pcapng.
- The **output** is a string with a label of the attack found in the input file.

## Research Question (RQ) definitions:
- **RQ1: How to efficiently read and convert packet-based network traces containing DDoS attacks to facilitate further analysis?** A DDoS attack trace contains a very large number of records, which makes it computationally expensive to read and analyse. These records are nested, i.e., packets can carry different sets of information (network fields), which makes it even more challenging to analyse them in a uniform way. Several tools and libraries can read packet-based network traces; this notebook compares their performance.

- **RQ2: How to efficiently identify the main characteristics of a DDoS attack?** From a practical point of view, a DDoS attack is a set of network records that share the same characteristics and that account for the majority of the traffic in the trace.

- **RQ3: How can the characteristics of DDoS attacks be used for labelling them?** There are many taxonomies for classifying and labelling DDoS attacks. Our approach to this question is based on the practical experience of network operators and network security specialists, who label attacks based on what they observe in them.
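The practical definition in RQ2 can be illustrated with a minimal sketch: for each field, find the value shared by most records and the fraction of traffic it covers. The records below are made-up sample data and `dominant_value` is a hypothetical helper, not the method used later in this work.

```python
from collections import Counter

def dominant_value(records, field):
    """Return the most frequent value of `field` and the share of records that carry it."""
    counts = Counter(record[field] for record in records)
    value, count = counts.most_common(1)[0]
    return value, count / len(records)

# Made-up sample: four DNS responses towards one target plus one unrelated TCP packet.
records = [
    {"proto": "UDP", "src_port": "53", "dst_ip": "227.213.154.241"},
    {"proto": "UDP", "src_port": "53", "dst_ip": "227.213.154.241"},
    {"proto": "UDP", "src_port": "53", "dst_ip": "227.213.154.241"},
    {"proto": "UDP", "src_port": "53", "dst_ip": "227.213.154.241"},
    {"proto": "TCP", "src_port": "443", "dst_ip": "192.0.2.10"},
]

for field in ("proto", "src_port", "dst_ip"):
    value, share = dominant_value(records, field)
    print(field, value, share)
```

A field whose dominant value covers most of the trace is a candidate characteristic of the attack; choosing the threshold for "most" is left open here.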

<h2 align='center'>===============================================================<br> RQ1: How to efficiently read and convert packet-based network traces containing DDoS attacks to facilitate further analysis?</h2>

- Approach 1: Read everything in memory at once
- Approach 2: Split the file in smaller files, read each one in memory, and merge the results
- Approach 3: Read and process packet by packet
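Approach 3 can be sketched with the standard library alone. The generator below reads a classic pcap stream one record at a time, so memory use stays bounded by the largest packet rather than the file size. It is a minimal sketch that assumes the little-endian, microsecond-resolution pcap format (magic number `0xA1B2C3D4`); the helper names and the in-memory demo trace are illustrative, and pcapng is not handled.

```python
import io
import struct

# Classic pcap layout: 24-byte global header, then a 16-byte header before each packet.
PCAP_GLOBAL_HEADER = struct.Struct("<IHHiIII")  # magic, major, minor, thiszone, sigfigs, snaplen, linktype
PCAP_RECORD_HEADER = struct.Struct("<IIII")     # ts_sec, ts_usec, incl_len, orig_len

def iter_pcap_records(stream):
    """Yield (timestamp, raw_bytes) for each record, reading one packet at a time."""
    magic = PCAP_GLOBAL_HEADER.unpack(stream.read(PCAP_GLOBAL_HEADER.size))[0]
    if magic != 0xA1B2C3D4:
        raise ValueError("only little-endian, microsecond-resolution pcap is handled here")
    while True:
        record_header = stream.read(PCAP_RECORD_HEADER.size)
        if len(record_header) < PCAP_RECORD_HEADER.size:
            return  # end of file
        ts_sec, ts_usec, incl_len, _orig_len = PCAP_RECORD_HEADER.unpack(record_header)
        yield ts_sec + ts_usec / 1e6, stream.read(incl_len)

def build_demo_pcap(payloads):
    """Build a tiny in-memory pcap so the sketch runs without a capture file."""
    out = io.BytesIO()
    out.write(PCAP_GLOBAL_HEADER.pack(0xA1B2C3D4, 2, 4, 0, 0, 65535, 1))
    for i, payload in enumerate(payloads):
        out.write(PCAP_RECORD_HEADER.pack(1376521000 + i, 0, len(payload), len(payload)))
        out.write(payload)
    out.seek(0)
    return out

demo = build_demo_pcap([b"\x00" * 60, b"\x01" * 42, b"\x02" * 103])
packet_count = 0
byte_count = 0
for _timestamp, raw in iter_pcap_records(demo):
    packet_count += 1
    byte_count += len(raw)
print(packet_count, byte_count)
```

With a real trace, `demo` would be replaced by `open(path, "rb")`; decoding the raw bytes into network fields is a separate step.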

**Almost all of these tools are built on libpcap (WinPcap on Windows):**
- CLI: tcpdump and tshark (Ethereal/Wireshark)
- Python libraries: scapy, dpkt, pure-pcapfile, pypcap, pypcapfile, python-libpcap, pcapy, WinPcapy

- For generating packets, there is a benchmark at http://libtins.github.io/benchmark/

**Performance depends on:**
- the output destination, e.g., stdout or a file
- the command-line options used

### Library for measuring time 

In [4]:
import time
import resource

import pandas as pd
import numpy as np
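One caveat about the measurements in this notebook: `resource.getrusage(resource.RUSAGE_SELF)` describes the notebook process itself, not a command started with `!`, and `ru_maxrss` is a peak value, so the difference between two readings is not the memory used in between. A hedged alternative is sketched below. `measure` is a hypothetical helper that runs the command through `subprocess` and reads `RUSAGE_CHILDREN`; it discards stdout instead of writing a file, so its timings are not directly comparable with the cells in this notebook.

```python
import resource
import subprocess
import sys
import time

def measure(command):
    """Run `command` (a list of arguments) and return (seconds, peak RSS of child processes)."""
    start = time.perf_counter()
    subprocess.run(command, stdout=subprocess.DEVNULL, check=True)
    elapsed = time.perf_counter() - start
    # Peak resident set size over all children waited for so far
    # (kilobytes on Linux, bytes on macOS).
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak_rss

# Demo with a trivial child process; a real call could look like
# measure(["tcpdump", "-r", "data/anon-Booter5.pcap", "-n"]).
elapsed, peak_rss = measure([sys.executable, "-c", "print('x' * 1000)"])
print(round(elapsed, 3), peak_rss)
```

Because the peak covers every child started so far, comparing tools this way works best with one command per fresh process.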

<h2>===================================================<br>
Analysing tcpdump</h2>

In [2]:
# !time tcpdump -r data/anon-Booter5.pcap

> Takes dozens of minutes or more

In [3]:
# duration_default = time.time()

# !tcpdump -r data/anon-Booter5.pcap > anon-Booter5.txt

# duration_default = time.time() - duration_default
# print(duration_default)

> Depends on the network (name resolution). Faster than the previous run, but still takes dozens of minutes

- **'-n': Don't convert addresses (i.e., host addresses, port numbers, etc.) to names.**

In [4]:
time_wo_resolvingnames = time.time()
mem_wo_resolvingnames = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss 

!tcpdump -r data/anon-Booter5.pcap -n> anon-Booter5.txt

mem_wo_resolvingnames = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss -mem_wo_resolvingnames
time_wo_resolvingnames = time.time() - time_wo_resolvingnames

print(time_wo_resolvingnames, mem_wo_resolvingnames)

reading from file data/anon-Booter5.pcap, link-type EN10MB (Ethernet)
7.8505859375 16384


>example:

In [5]:
!tcpdump -r data/anon-Booter5.pcap -n|head -1

reading from file data/anon-Booter5.pcap, link-type EN10MB (Ethernet)
23:56:40.211654 IP 247.193.164.155.53 > 227.213.154.241.9231: 26565 1/0/1 A 62.116.143.18 (61)


- **'-t': Don't print a timestamp on each dump line**

In [6]:
time_wo_resolvingnames_wo_time = time.time()
mem_wo_resolvingnames_wo_time = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss 

!tcpdump -r data/anon-Booter5.pcap -nt > anon-Booter5.txt

mem_wo_resolvingnames_wo_time = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss -mem_wo_resolvingnames_wo_time
time_wo_resolvingnames_wo_time = time.time() - time_wo_resolvingnames_wo_time

print(time_wo_resolvingnames_wo_time, mem_wo_resolvingnames_wo_time)

reading from file data/anon-Booter5.pcap, link-type EN10MB (Ethernet)
6.884278059005737 8192


> Example:

In [7]:
!tcpdump -r data/anon-Booter5.pcap -nt|head -1

reading from file data/anon-Booter5.pcap, link-type EN10MB (Ethernet)
IP 247.193.164.155.53 > 227.213.154.241.9231: 26565 1/0/1 A 62.116.143.18 (61)


- **'-tt': Print the timestamp, as seconds since the Unix epoch, on each dump line.**

In [8]:
time_wo_resolvingnames_w_timestamp = time.time()
mem_wo_resolvingnames_w_timestamp = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss 

!tcpdump -r data/anon-Booter5.pcap -ntt > anon-Booter5.txt

mem_wo_resolvingnames_w_timestamp = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss -mem_wo_resolvingnames_w_timestamp
time_wo_resolvingnames_w_timestamp = time.time() - time_wo_resolvingnames_w_timestamp

print(time_wo_resolvingnames_w_timestamp, mem_wo_resolvingnames_w_timestamp)

reading from file data/anon-Booter5.pcap, link-type EN10MB (Ethernet)
7.259585857391357 4096


>example:

In [9]:
!tcpdump -r data/anon-Booter5.pcap -ntt|head -1

reading from file data/anon-Booter5.pcap, link-type EN10MB (Ethernet)
1376521000.211654 IP 247.193.164.155.53 > 227.213.154.241.9231: 26565 1/0/1 A 62.116.143.18 (61)


- **'-v': Verbose output. For example, it prints the time to live, identification, total length and options of an IP packet, and enables packet integrity checks (such as verifying the IP header checksum).**

In [10]:
time_wo_resolvingnames_w_timestamp_verbose = time.time()
mem_wo_resolvingnames_w_timestamp_verbose = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss 

!tcpdump -r data/anon-Booter5.pcap -nttv > anon-Booter5.txt

mem_wo_resolvingnames_w_timestamp_verbose = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss -mem_wo_resolvingnames_w_timestamp_verbose
time_wo_resolvingnames_w_timestamp_verbose = time.time() - time_wo_resolvingnames_w_timestamp_verbose

print(time_wo_resolvingnames_w_timestamp_verbose, mem_wo_resolvingnames_w_timestamp_verbose)

reading from file data/anon-Booter5.pcap, link-type EN10MB (Ethernet)
15.678691864013672 0


> example:

In [11]:
!tcpdump -r data/anon-Booter5.pcap -nttv|head -2

reading from file data/anon-Booter5.pcap, link-type EN10MB (Ethernet)
1376521000.211654 IP (tos 0x0, ttl 124, id 10443, offset 0, flags [none], proto UDP (17), length 89)
    247.193.164.155.53 > 227.213.154.241.9231: 26565 1/0/1 ddostheinter.net. A 62.116.143.18 (61)


- **'-vvv': (max) verbose**

In [12]:
time_wo_resolvingnames_w_timestamp_verboseplus = time.time()
mem_wo_resolvingnames_w_timestamp_verboseplus = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss 

!tcpdump -r data/anon-Booter5.pcap -nttvvv > anon-Booter5.txt

mem_wo_resolvingnames_w_timestamp_verboseplus = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss -mem_wo_resolvingnames_w_timestamp_verboseplus
time_wo_resolvingnames_w_timestamp_verboseplus = time.time() - time_wo_resolvingnames_w_timestamp_verboseplus

print(time_wo_resolvingnames_w_timestamp_verboseplus, mem_wo_resolvingnames_w_timestamp_verboseplus)

reading from file data/anon-Booter5.pcap, link-type EN10MB (Ethernet)
32.09510087966919 8192


In [13]:
!tcpdump -r data/anon-Booter5.pcap -nttvvv|head -2

reading from file data/anon-Booter5.pcap, link-type EN10MB (Ethernet)
1376521000.211654 IP (tos 0x0, ttl 124, id 10443, offset 0, flags [none], proto UDP (17), length 89)
    247.193.164.155.53 > 227.213.154.241.9231: [udp sum ok] 26565 q: A? ddostheinter.net. 1/0/1 ddostheinter.net. [6m30s] A 62.116.143.18 ar: . OPT UDPsize=4000 (61)


**Attention: tcpdump DOES NOT support customized output. We must use other tools, such as grep, sed, or awk: http://stackoverflow.com/questions/13492611/tcpdump-output-only-source-and-destination-addresses**

In [10]:
time_custom = time.time()
mem_custom = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss 

!tcpdump -r data/anon-Booter5.pcap -nttv|\
awk 'NR % 2 {\
printf $1";";\
for(i=1;i<=NF;i++) if ($i == "ttl") printf $(i+1)";";\
for(i=1;i<=NF;i++) if ($i == "proto") printf $(i+1)";";\
for(i=1;i<=NF;i++) if ($i == "length") printf $(i+1)";";\
for(i=1;i<=NF;i++) if ($i == "offset") printf $(i+1)";";\
}\
!(NR % 2) {\
printf $1";"$3"\n"}' > anon-Booter5.txt

mem_custom = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss -mem_custom
time_custom = time.time() - time_custom

print(time_custom, mem_custom)

reading from file data/anon-Booter5.pcap, link-type EN10MB (Ethernet)
47.359403133392334 0


>example:

In [7]:
!tcpdump -r data/anon-Booter5.pcap -nttv|\
awk 'NR % 2 {\
printf $1";"\
$2";";\
for(i=1;i<=NF;i++) if ($i == "ttl") printf $(i+1)";";\
for(i=1;i<=NF;i++) if ($i == "proto") printf $(i+1)";";\
for(i=1;i<=NF;i++) if ($i == "length") printf $(i+1)";";\
for(i=1;i<=NF;i++) if ($i == "offset") printf $(i+1)";";\
}\
!(NR % 2) {\
printf $1";"$3"\n"}'|head -1

reading from file data/anon-Booter5.pcap, link-type EN10MB (Ethernet)
1376521000.211654;IP;124,;UDP;89);0,;247.193.164.155.53;227.213.154.241.9231:
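The awk pipeline above leaves punctuation attached to some fields (`124,`, `89)`, `9231:`) and keeps the port glued to the address. As a possible alternative, the same two-line `tcpdump -nttv` records can be parsed in Python. This is a minimal sketch that assumes IPv4 packets carrying UDP or TCP, as in the sample lines shown earlier; the regular expressions and helper names are illustrative and would need extending for other protocols.

```python
import re

# First line of a `tcpdump -nttv` record: timestamp and IP header fields.
HEADER_RE = re.compile(
    r"^(?P<ts>\d+\.\d+) IP \(.*?ttl (?P<ttl>\d+),.*?offset (?P<offset>\d+),"
    r".*?proto (?P<proto>\w+) \(\d+\), length (?P<length>\d+)\)"
)
# Second (indented) line: "src.port > dst.port: ..."
ADDR_RE = re.compile(r"^\s+(?P<src>\S+) > (?P<dst>\S+?):")

def split_endpoint(endpoint):
    """Split 'a.b.c.d.port' into ('a.b.c.d', 'port'); assumes a port is present."""
    host, _, port = endpoint.rpartition(".")
    return host, port

def parse_pair(header_line, addr_line):
    """Parse one two-line record into a dict, or return None if it does not match."""
    header = HEADER_RE.match(header_line)
    addr = ADDR_RE.match(addr_line)
    if not header or not addr:
        return None
    src_ip, src_port = split_endpoint(addr.group("src"))
    dst_ip, dst_port = split_endpoint(addr.group("dst"))
    return {
        "timestamp": header.group("ts"),
        "ttl": int(header.group("ttl")),
        "proto": header.group("proto"),
        "length": int(header.group("length")),
        "offset": int(header.group("offset")),
        "src_ip": src_ip, "src_port": src_port,
        "dst_ip": dst_ip, "dst_port": dst_port,
    }

line1 = ("1376521000.211654 IP (tos 0x0, ttl 124, id 10443, offset 0, "
         "flags [none], proto UDP (17), length 89)")
line2 = ("    247.193.164.155.53 > 227.213.154.241.9231: 26565 1/0/1 "
         "ddostheinter.net. A 62.116.143.18 (61)")
record = parse_pair(line1, line2)
print(record)
```

Parsing in Python adds its own cost on top of tcpdump, so this sketch is about cleaner fields, not about speed.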


### Summary

In [72]:
analysis_tcpdump = pd.DataFrame([['time_wo_resolvingnames',time_wo_resolvingnames],\
              ['time_wo_resolvingnames_wo_time',time_wo_resolvingnames_wo_time],\
              ['time_wo_resolvingnames_w_timestamp',time_wo_resolvingnames_w_timestamp],\
              ['time_wo_resolvingnames_w_timestamp_verbose',time_wo_resolvingnames_w_timestamp_verbose],\
              ['time_wo_resolvingnames_w_timestamp_verboseplus',time_wo_resolvingnames_w_timestamp_verboseplus],\
              ['time_custom',time_custom]],\
             columns=['description','time'])

analysis_tcpdump

| | description | time |
|---|---|---|
| 0 | time_wo_resolvingnames | 7.850586 |
| 1 | time_wo_resolvingnames_wo_time | 6.884278 |
| 2 | time_wo_resolvingnames_w_timestamp | 7.259586 |
| 3 | time_wo_resolvingnames_w_timestamp_verbose | 15.678692 |
| 4 | time_wo_resolvingnames_w_timestamp_verboseplus | 32.095101 |
| 5 | time_custom | 46.229947 |


<h2> =================================================<br>
Analysing tshark </h2>

- **'-n': Disable network object name resolution (such as hostname, TCP and UDP port names)**

In [21]:
time_wo_resolvingnames2 = time.time()
mem_wo_resolvingnames2 = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss 

!tshark -r data/anon-Booter5.pcap -n > anon-Booter5.txt

mem_wo_resolvingnames2 = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss -mem_wo_resolvingnames2
time_wo_resolvingnames2 = time.time() - time_wo_resolvingnames2

print(time_wo_resolvingnames2, mem_wo_resolvingnames2)

36.34719109535217 0


>example:

In [22]:
!tshark -r data/anon-Booter5.pcap -n|head -1

  1   0.000000 247.193.164.155 -> 227.213.154.241 DNS 103 Standard query response 0x67c5  A 62.116.143.18
tshark: An error occurred while printing packets: Broken pipe.


- **'-t e': Print the timestamp of each packet as seconds since the Unix epoch.**

In [23]:
time_wo_resolvingnames_w_timestamp2 = time.time()
mem_wo_resolvingnames_w_timestamp2 = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss 

!tshark -r data/anon-Booter5.pcap -n -t e > anon-Booter5.txt

mem_wo_resolvingnames_w_timestamp2 = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss -mem_wo_resolvingnames_w_timestamp2
time_wo_resolvingnames_w_timestamp2 = time.time() - time_wo_resolvingnames_w_timestamp2

print(time_wo_resolvingnames_w_timestamp2, mem_wo_resolvingnames_w_timestamp2)

38.84751915931702 0


>example:

In [40]:
!tshark -r data/anon-Booter5.pcap -n -t e|head -1

  1 1376521000.211654 247.193.164.155 -> 227.213.154.241 DNS 103 Standard query response 0x67c5  A 62.116.143.18
tshark: An error occurred while printing packets: Broken pipe.


- **With a customized output**

In [62]:
time_custom2 = time.time()
mem_custom2 = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

!tshark -r data/anon-Booter5.pcap -n -E separator=\; -E header=y -T fields \
-e frame.time_epoch \
-e ip.ttl \
-e ip.proto \
-e frame.len \
-e ip.src \
-e ip.dst \
-e udp.srcport \
-e udp.dstport \
-e tcp.srcport \
-e tcp.dstport > anon-Booter5.txt

mem_custom2 = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss -mem_custom2
time_custom2 = time.time() - time_custom2

print(time_custom2, mem_custom2)

138.0743579864502 12288


> example:

In [64]:
!tshark -r data/anon-Booter5.pcap -n -E separator=\; -E header=y -T fields \
-e frame.time_epoch \
-e ip.ttl \
-e ip.proto \
-e frame.len \
-e ip.src \
-e ip.dst \
-e udp.srcport \
-e udp.dstport \
-e tcp.srcport \
-e tcp.dstport |head -2

frame.time_epoch;ip.ttl;ip.proto;frame.len;ip.src;ip.dst;udp.srcport;udp.dstport;tcp.srcport;tcp.dstport
1376521000.211654000;124;17;103;247.193.164.155;227.213.154.241;53;9231;;
tshark: An error occurred while printing packets: Broken pipe.
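The semicolon-separated output with a header row can be read back with the standard `csv` module. The sketch below parses the two sample lines shown above and merges the UDP and TCP port columns into one pair. `read_tshark_fields` is a hypothetical helper, and it assumes every row is a plain IPv4 packet with a single value per field; rows with empty or multi-valued fields would need extra handling.

```python
import csv
import io

# The header and first record printed by the tshark command above.
sample = (
    "frame.time_epoch;ip.ttl;ip.proto;frame.len;ip.src;ip.dst;"
    "udp.srcport;udp.dstport;tcp.srcport;tcp.dstport\n"
    "1376521000.211654000;124;17;103;247.193.164.155;227.213.154.241;53;9231;;\n"
)

def read_tshark_fields(stream):
    """Yield one dict per packet from `tshark -T fields` output with header=y and separator=;."""
    for row in csv.DictReader(stream, delimiter=";"):
        yield {
            "timestamp": float(row["frame.time_epoch"]),
            "ttl": int(row["ip.ttl"]),
            "proto": int(row["ip.proto"]),
            "length": int(row["frame.len"]),
            "src_ip": row["ip.src"],
            "dst_ip": row["ip.dst"],
            # Only one of the UDP/TCP column pairs is filled for a given packet.
            "src_port": row["udp.srcport"] or row["tcp.srcport"],
            "dst_port": row["udp.dstport"] or row["tcp.dstport"],
        }

rows = list(read_tshark_fields(io.StringIO(sample)))
print(rows[0])
```

For the full trace, `io.StringIO(sample)` would be replaced by `open("anon-Booter5.txt", newline="")`, which keeps the packet-by-packet style of Approach 3.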


### Summary

In [73]:
analysis_tshark = pd.DataFrame([['time_wo_resolvingnames',time_wo_resolvingnames2],\
              ['time_wo_resolvingnames_wo_time',np.nan],\
              ['time_wo_resolvingnames_w_timestamp',time_wo_resolvingnames_w_timestamp2],\
              ['time_wo_resolvingnames_w_timestamp_verbose',np.nan],\
              ['time_wo_resolvingnames_w_timestamp_verboseplus',np.nan],\
              ['time_custom',time_custom2]],\
             columns=['description','time'])

analysis_tshark

| | description | time |
|---|---|---|
| 0 | time_wo_resolvingnames | 36.347191 |
| 1 | time_wo_resolvingnames_wo_time | NaN |
| 2 | time_wo_resolvingnames_w_timestamp | 38.847519 |
| 3 | time_wo_resolvingnames_w_timestamp_verbose | NaN |
| 4 | time_wo_resolvingnames_w_timestamp_verboseplus | NaN |
| 5 | time_custom | 138.074358 |


<h2> ==============================================<br>
Analysing Scapy </h2>

In [80]:
import logging
logging.getLogger("scapy.runtime").setLevel(logging.ERROR)

from scapy.all import IP, TCP, UDP, PcapReader, rdpcap

> NOTE: It is COMPLETELY impractical to read the entire file into memory at once, as in the following (commented-out) cell:

In [81]:
# entire_file = rdpcap("data/anon-Booter5.pcap")

In [91]:
time_custom3 = time.time()
mem_custom3 = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

# Stream the capture packet by packet and write one ';'-separated record per IP packet.
with PcapReader('data/anon-Booter5.pcap') as pcap_reader, open('anon-Booter5.txt', 'w') as f:
    for pkt in pcap_reader:
        if IP not in pkt:
            continue

        timestamp = pkt.time
        ttl = pkt[IP].ttl
        ip_proto = pkt[IP].proto
        ip_length = pkt[IP].len  # total length from the IP header
        ip_src = pkt[IP].src
        ip_dst = pkt[IP].dst

        # Reset per packet so packets without UDP/TCP do not reuse stale values.
        sport = dport = tcp_flags = ""

        if UDP in pkt:
            sport = pkt[UDP].sport
            dport = pkt[UDP].dport
        elif TCP in pkt:
            sport = pkt[TCP].sport
            dport = pkt[TCP].dport
            # Flags: F=FIN, S=SYN, R=RST, P=PSH, A=ACK, U=URG, E=ECE, C=CWR
            tcp_flags = pkt[TCP].flags

        print(timestamp, ttl, ip_proto, ip_length, ip_src, ip_dst,
              sport, dport, tcp_flags, sep=";", file=f)

mem_custom3 = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss - mem_custom3
time_custom3 = time.time() - time_custom3

print(time_custom3, mem_custom3)

2628.9540469646454 681889792


In [92]:
analysis_scapy = pd.DataFrame([['time_wo_resolvingnames',np.nan],\
              ['time_wo_resolvingnames_wo_time',np.nan],\
              ['time_wo_resolvingnames_w_timestamp',np.nan],\
              ['time_wo_resolvingnames_w_timestamp_verbose',np.nan],\
              ['time_wo_resolvingnames_w_timestamp_verboseplus',np.nan],\
              ['time_custom',time_custom3]],\
             columns=['description','time'])

analysis_scapy

| | description | time |
|---|---|---|
| 0 | time_wo_resolvingnames | NaN |
| 1 | time_wo_resolvingnames_wo_time | NaN |
| 2 | time_wo_resolvingnames_w_timestamp | NaN |
| 3 | time_wo_resolvingnames_w_timestamp_verbose | NaN |
| 4 | time_wo_resolvingnames_w_timestamp_verboseplus | NaN |
| 5 | time_custom | 2628.954047 |


<h2>================================================================<br>
Analysing dpkt</h2>

At the time of this analysis, dpkt did not yet support Python 3, so we ran the analysis in a separate notebook [(dpkt_analysis_py27.ipynb)](dpkt_analysis_py27.ipynb). The result was the following:

In [3]:
analysis_dpkt = pd.DataFrame([['time_wo_resolvingnames',np.nan],\
              ['time_wo_resolvingnames_wo_time',np.nan],\
              ['time_wo_resolvingnames_w_timestamp',np.nan],\
              ['time_wo_resolvingnames_w_timestamp_verbose',np.nan],\
              ['time_wo_resolvingnames_w_timestamp_verboseplus',np.nan],\
              ['time_custom',float(82.45)]],\
             columns=['description','time'])

analysis_dpkt


## Summary: 

In [93]:
comparison = pd.merge(analysis_tcpdump, analysis_tshark, on='description')
comparison = pd.merge(comparison,analysis_scapy, on='description')
comparison.columns = ['description','tcpdump','tshark','scapy']
comparison

| | description | tcpdump | tshark | scapy |
|---|---|---|---|---|
| 0 | time_wo_resolvingnames | 7.850586 | 36.347191 | NaN |
| 1 | time_wo_resolvingnames_wo_time | 6.884278 | NaN | NaN |
| 2 | time_wo_resolvingnames_w_timestamp | 7.259586 | 38.847519 | NaN |
| 3 | time_wo_resolvingnames_w_timestamp_verbose | 15.678692 | NaN | NaN |
| 4 | time_wo_resolvingnames_w_timestamp_verboseplus | 32.095101 | NaN | NaN |
| 5 | time_custom | 46.229947 | 138.074358 | 2628.954047 |
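
To make the `time_custom` row easier to read, the timings can be expressed relative to tcpdump. The sketch below uses the values reported in this notebook plus the 82.45 s measured for dpkt. Note that the dpkt figure comes from a separate Python 2.7 notebook, so including it here is only a rough comparison.

```python
# Customized-output timings in seconds, as reported in this notebook.
timings = {
    "tcpdump": 46.229947,
    "tshark": 138.074358,
    "scapy": 2628.954047,
    "dpkt": 82.45,  # measured separately under Python 2.7
}

baseline = timings["tcpdump"]
slowdown = {tool: seconds / baseline for tool, seconds in timings.items()}

# Print the tools from fastest to slowest, as a factor of the tcpdump time.
for tool, factor in sorted(slowdown.items(), key=lambda item: item[1]):
    print(f"{tool:8s} {factor:6.1f}x")
```

On this trace, tshark takes roughly 3 times and Scapy roughly 57 times as long as the tcpdump plus awk pipeline for the customized output.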
