#### Imports

In [6]:
from scapy.all import *
import pandas as pd
import warnings
warnings.filterwarnings(action="ignore")

#### Constants

I will use the USTC-TFC2016 dataset to implement a real-time IDS for network attacks. `WORK_DIR` points to the local copy of the dataset.

In [9]:
WORK_DIR = "datasets/USTC-TFC2016-master"

In [100]:
def extract_conversations(pcap_file):
    conversations = {}  # Dictionary to store conversations

    for packet in rdpcap(pcap_file):
        
        if packet.haslayer(IP):
            src_ip = packet[IP].src
            dst_ip = packet[IP].dst

            if packet.haslayer(TCP):
                src_port = packet[TCP].sport
                dst_port = packet[TCP].dport
                key = (src_ip, dst_ip, src_port, dst_port)
            elif packet.haslayer(UDP):
                src_port = packet[UDP].sport
                dst_port = packet[UDP].dport
                key = (src_ip, dst_ip, src_port, dst_port)
            else:
                continue

            if key not in conversations:
                conversations[key] = []
            conversations[key].append(packet)
    return conversations

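Conversations are extracted using a 4-dimensional key: source IP, destination IP, source port, destination port. The key is directional, so the two directions of a connection are stored as separate conversations. Packets that are not TCP or UDP over IP are skipped.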
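If both directions of a connection should land in the same conversation, the key can be normalized by sorting the two endpoints. The sketch below is an assumption about one possible variant, not what the notebook does; `conversation_key` and its `bidirectional` flag are hypothetical names.

```python
def conversation_key(src_ip, dst_ip, src_port, dst_port, bidirectional=False):
    """Return the dictionary key for one packet.

    With bidirectional=False this is the directional 4-tuple used by
    extract_conversations. With bidirectional=True (an assumption, not the
    notebook's behavior) the endpoints are sorted so that both directions
    of a connection map to the same key.
    """
    if not bidirectional:
        return (src_ip, dst_ip, src_port, dst_port)
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return (a[0], b[0], a[1], b[1])


# A request and its reply: two keys when directional, one when bidirectional
request = ("1.1.153.239", "1.2.53.138", 17225, 51825)
reply = ("1.2.53.138", "1.1.153.239", 51825, 17225)
print(conversation_key(*request) == conversation_key(*reply))  # False
print(conversation_key(*request, bidirectional=True)
      == conversation_key(*reply, bidirectional=True))  # True
```

Whether to merge directions depends on what the classifier should see: directional keys keep client and server payloads apart, while a bidirectional key groups them in arrival order.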

In [25]:
pcap_file = f"{WORK_DIR}/Benign/FTP.pcap"
conversations = extract_conversations(pcap_file)

In [26]:
len(conversations)

202034

From this single FTP pcap we extracted 202034 conversations (all benign).
For each conversation we want to take the first two frames, but first let's look at a conversation that contains more than a single frame.
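One way to find such a conversation is to count frames per key. This is a minimal sketch with hypothetical helper names (`frames_per_conversation`, `first_key_with_at_least`); the toy dictionary uses strings in place of scapy packets so that it runs on its own.

```python
from collections import Counter


def frames_per_conversation(conversations):
    """Count how many conversations hold 1, 2, 3, ... frames."""
    return Counter(len(frames) for frames in conversations.values())


def first_key_with_at_least(conversations, min_frames=2):
    """Return the first key with at least min_frames frames, or None."""
    for key, frames in conversations.items():
        if len(frames) >= min_frames:
            return key
    return None


# Toy stand-in for the real dictionary: strings replace scapy packets
toy = {
    ("10.0.0.1", "10.0.0.2", 1000, 21): ["p1"],
    ("10.0.0.2", "10.0.0.1", 21, 1000): ["p2", "p3", "p4"],
}
print(frames_per_conversation(toy))
print(first_key_with_at_least(toy))
```

The same two functions can be called on the `conversations` dictionary returned by `extract_conversations`.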

In [126]:
conversations[('1.1.153.239', '1.2.53.138', 17225, 51825)][0][IP][Raw].load.hex()

'd26de740aef6245e6e440ef6c2dfe42fabc0dd7b8611fbfad3ec5baa1d5d5a95548841051a36cee600a814dc539494ff347720f1296c4a3bc1d84c84f8bed199270a17577a1860ba52ea418a3fc6d0542781fe4a5c695db59f9fc68aaf1e7c5a4da94821abd8958a23d561330423f1d14f934682b5bede88f11d5357e4c6d28ea6e2dc4536b17374911849cdf30fc7fbde3e7f2523a506a1c4cf6c16dd76f04ae41acf38ede6694ae52bdf28f61479465f89edb63caa69de5852fa0408a93f75180db812efed55ccb53dea213235a6a522ca027fb728ddc0062ebd04f43e45ad3c3e9bf2583d4ee53995ca6327a77cbdcee08c7e68d6a9de792414d6845413faa99cedb8b2bfa5c12c819ab21ef92432c64382b6b4c614b33656f624804d0d32a70c2c0cf5c0bc3733b1e629db91628d9f276110a005cbc8bbe4fdb45d0fdeb7c603cee91bd76b9acd1ddfb03b82b5faf24bb139b81f5a8010a5ba8cd1d0d4d3e87e959d8ddd649d783106fb0e32cdef0ba128576d984d02129595b152460b08faf35dba56670aa7a5b6a0e6626480e724f94a0dc644692748ec01aae36aa92524273f81cb5fed4c462ed8c29e6ace4d14f57067664aebf244bbcfe0c257fcf449fd0ea9edb97d45315c9a73af01357bfa777a9bf4d822718c97509d8fb424dd78e25a329fa7431c61e6d9b8034c4f80b7522a3

This is the raw payload of the packet, which I will convert to an image and store in the DataFrame.
Let's populate a DataFrame with the data that will be analyzed in the next notebook.
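To make the payload-to-image mapping concrete, the sketch below pads a hex payload to a fixed size and splits it into rows of 24-bit pixels. The 24-pixel width and 21 rows per frame are the values used in this notebook; the helper name `hex_to_pixel_rows` is hypothetical, and truncating oversized payloads is an assumption.

```python
def hex_to_pixel_rows(hex_payload, width=24, height=21):
    """Zero-pad (or truncate) a hex string to width*height*3 bytes and split it into rows.

    Each row is a list of `width` 3-byte tuples, one tuple per 24-bit pixel.
    """
    size = width * height * 3
    data = bytes.fromhex(hex_payload)[:size].ljust(size, b"\x00")
    row_bytes = width * 3
    rows = []
    for r in range(height):
        row = data[r * row_bytes:(r + 1) * row_bytes]
        rows.append([tuple(row[i:i + 3]) for i in range(0, row_bytes, 3)])
    return rows


# First six bytes of the payload shown above fill the first two pixels
rows = hex_to_pixel_rows("d26de740aef6")
print(len(rows), len(rows[0]))  # 21 24
print(rows[0][:3])  # [(210, 109, 231), (64, 174, 246), (0, 0, 0)]
```

With these dimensions one frame holds 24 * 21 * 3 = 1512 bytes, so every payload is zero-padded up to that size.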

In [236]:
import struct

def convert_data_to_df(conversations, n=2, save_images=False):
    width = 24       # pixels per row; 24 * 3 bytes = 72, already 4-byte aligned
    height = 21 * n  # 21 rows (1512 bytes) per frame
    image_size = width * height * 3  # bytes of 24-bit pixel data

    def save_data_as_bmp(index, row):
        # BITMAPFILEHEADER (14 bytes) followed by BITMAPINFOHEADER (40 bytes)
        header = struct.pack("<2sIHHI", b"BM", 54 + image_size, 0, 0, 54)
        header += struct.pack("<IiiHHIIiiII", 40, width, height, 1, 24, 0,
                              image_size, 0, 0, 0, 0)

        with open(f"{index}.bmp", "wb") as bmp_file:
            # Write the header and the pixel data in binary mode
            bmp_file.write(header + bytes.fromhex(row.padded_image))

    rows = []
    for index, (key, conv) in enumerate(conversations.items()):
        textual_data = ""

        # Concatenate the payload of the first n frames, skipping frames without one
        for packet in conv[:n]:
            if packet.haslayer(Raw):
                textual_data += packet[Raw].load.hex()

        # Truncate or zero-pad to exactly image_size bytes (two hex digits per byte)
        textual_data = textual_data[:image_size * 2].ljust(image_size * 2, "0")

        rows.append({"dataset": "USTC16", "pcap": "FTP", "padded_image": textual_data})
        print(f"{index + 1}/{len(conversations)}", end="\r")

    # Build the DataFrame once; this also gives every row a unique index
    df = pd.DataFrame(rows, columns=["dataset", "pcap", "padded_image"])

    if save_images:
        for index, row in df.iterrows():
            save_data_as_bmp(index, row)
    return df
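A 24-bit BMP such as the ones written by `save_data_as_bmp` starts with a 54-byte header: a 14-byte file header followed by a 40-byte info header. The sketch below builds such a file in memory and parses the header back as a sanity check. `build_bmp` and `parse_bmp_header` are hypothetical helpers, and the sketch assumes the row size is a multiple of 4 bytes so no row padding is needed (true for a width of 24).

```python
import struct


def build_bmp(pixel_bytes, width, height):
    """Wrap raw 24-bit pixel data in a BMP file header and info header."""
    if width * 3 % 4 != 0:
        raise ValueError("this sketch assumes rows need no padding")
    if len(pixel_bytes) != width * height * 3:
        raise ValueError("pixel data must be exactly width * height * 3 bytes")
    # File header: magic, file size, two reserved fields, offset to pixel data
    file_header = struct.pack("<2sIHHI", b"BM", 54 + len(pixel_bytes), 0, 0, 54)
    # Info header: size, width, height, planes, bits per pixel, compression,
    # image size, x/y resolution, palette colors, important colors
    info_header = struct.pack("<IiiHHIIiiII", 40, width, height, 1, 24, 0,
                              len(pixel_bytes), 0, 0, 0, 0)
    return file_header + info_header + pixel_bytes


def parse_bmp_header(data):
    """Read back the fields that matter for checking a saved image."""
    magic, file_size, _, _, offset = struct.unpack("<2sIHHI", data[:14])
    _, width, height, _, bpp, _, image_size = struct.unpack("<IiiHHII", data[14:38])
    return {"magic": magic, "file_size": file_size, "offset": offset,
            "width": width, "height": height, "bpp": bpp, "image_size": image_size}


# One frame (n=1): 24 x 21 pixels, all zero
bmp = build_bmp(bytes(24 * 21 * 3), 24, 21)
print(parse_bmp_header(bmp))
```

For one frame this gives an image size of 1512 bytes and a file size of 1566 bytes; both grow with `n`, so they are computed rather than hard-coded.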

In [None]:
df_converted = convert_data_to_df(conversations, n=1)

Now that the pcap conversations have been transformed into images, we can implement the process proposed in the paper "Detection of Malicious Network Flows with Low Preprocessing Overhead".