# Traffic Analysis Lab: OpenVPN and DNS with pypcap

This lab provides a hands-on introduction to network traffic analysis in Python using `pcap-ct` (or `pypcap`) and `dpkt`. We will analyze a PCAP file containing OpenVPN and DNS traffic and extract the information needed to answer specific questions about the network communication. `pcap-ct` also works well on Windows.

## What are pypcap and dpkt?

*   **pypcap:** A Python interface to the libpcap packet capture library. It lets you capture traffic from live interfaces or read it from PCAP files. `pcap-ct` is a ctypes-based reimplementation with a pypcap-compatible API, so the same `import pcap` code works with either package.
*   **dpkt:** A fast, simple packet parsing library for Python. It can parse various network protocols, including Ethernet, IP, TCP, UDP, and DNS.

## What is OpenVPN?

OpenVPN is a popular open-source VPN (Virtual Private Network) solution that creates secure point-to-point or site-to-site connections. It can run over either UDP (the default, on port 1194) or TCP, and it supports various encryption algorithms to secure network traffic.
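
The choice of transport changes how packets look on the wire. Over UDP, each datagram carries exactly one OpenVPN packet. Over TCP, OpenVPN prefixes every packet with a 2-byte, big-endian length field so the receiver can find packet boundaries in the byte stream. The sketch below splits an already reassembled TCP payload into OpenVPN packets under that assumption. `split_tcp_stream` is a hypothetical helper, not part of pypcap or dpkt, and it does not perform TCP reassembly itself.

```python
import struct


def split_tcp_stream(stream: bytes):
    """Split a reassembled OpenVPN-over-TCP byte stream into packets.

    Assumes each packet is preceded by a 2-byte big-endian length field.
    Returns (packets, leftover), where leftover is an incomplete trailing chunk.
    """
    packets = []
    offset = 0
    while offset + 2 <= len(stream):
        (length,) = struct.unpack_from("!H", stream, offset)
        end = offset + 2 + length
        if end > len(stream):
            break  # the rest of this packet has not arrived yet
        packets.append(stream[offset + 2:end])
        offset = end
    return packets, stream[offset:]


# Two complete packets (3 and 1 bytes long) followed by a truncated third one.
example = b"\x00\x03abc" + b"\x00\x01z" + b"\x00\x05xy"
pkts, rest = split_tcp_stream(example)
print(pkts, rest)  # [b'abc', b'z'] b'\x00\x05xy'
```

In this lab the capture is expected to use UDP, so no such splitting is needed there.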

## Lab Objective

In this lab, you will learn how to:

*   Read and parse PCAP files using `pcap-ct` (or `pypcap`) and `dpkt`.
*   Extract information from OpenVPN and DNS packets.
*   Understand the challenges of dissecting encrypted traffic.
*   Use Python to analyze network traffic and answer specific questions.

## Setup

1.  **Install required libraries:**

    ```bash
    sudo apt install libpcap-dev    # libpcap development headers (Debian/Ubuntu)
    pip3 install pcap-ct dpkt       # or: pip3 install pypcap dpkt
    ```

2.  **Obtain a PCAP file:** You will need a PCAP file containing OpenVPN and DNS traffic. You can create one yourself with `tcpdump` or Wireshark, or use a sample file if one is provided for the lab. A simple way to generate one is to start a capture, connect to an OpenVPN server, perform a few DNS lookups, and then stop the capture.
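
Captures can be saved either as classic pcap or as pcapng. The script below opens `OpenVPN.pcapng`. Recent libpcap versions can read both formats (with some limitations for pcapng), but it is worth confirming which format you actually have. The helper below checks the first four bytes of the file and uses only the standard library. `capture_format` and `capture_format_of` are hypothetical names for this lab.

- Classic pcap files start with the magic number `0xa1b2c3d4`, or `0xa1b23c4d` for nanosecond timestamps, in either byte order.
- pcapng files start with a Section Header Block whose block type is `0x0a0d0d0a`.

```python
def capture_format(header: bytes) -> str:
    """Guess the capture file format from its first 4 bytes."""
    magic = header[:4]
    if magic in (b"\xa1\xb2\xc3\xd4", b"\xd4\xc3\xb2\xa1"):
        return "pcap (microsecond timestamps)"
    if magic in (b"\xa1\xb2\x3c\x4d", b"\x4d\x3c\xb2\xa1"):
        return "pcap (nanosecond timestamps)"
    if magic == b"\x0a\x0d\x0d\x0a":  # pcapng Section Header Block type
        return "pcapng"
    return "unknown"


def capture_format_of(path: str) -> str:
    """Read the first 4 bytes of a capture file and classify them."""
    with open(path, "rb") as f:
        return capture_format(f.read(4))
```

For example, `capture_format_of("OpenVPN.pcapng")` should return `"pcapng"` for a file saved in Wireshark's default format.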

## Lab Procedure

1.  **Create the Python script:**
If this structure does not work for you, feel free to change the code as needed.

First, import the required modules and define the helper constants:

```python
# Make sure the required packages are installed, e.g. from a notebook cell:
# %pip install pcap-ct dpkt    (or: %pip install pypcap dpkt)

import pcap
import dpkt
import socket
import struct
import binascii
from collections import Counter

def ip_to_str(address):
    """Convert a packed IPv4 address (4 bytes) to a dotted-quad string."""
    return socket.inet_ntoa(address)

# Your code here
# Hypothetical control packet structure (REPLACE WITH ACTUAL STRUCTURE).
# "!" = network byte order with no padding; B = unsigned char (1 byte);
# 8s / 20s = 8- and 20-byte strings; I = unsigned int (4 bytes).
OVPN_CONTROL_PACKET_FORMAT = "!B8s20sIIBI"
OVPN_CONTROL_PACKET_SIZE = struct.calcsize(OVPN_CONTROL_PACKET_FORMAT)  # 42 bytes
```
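
The first byte of every OpenVPN packet packs two fields, whatever layout you settle on for the rest. The opcode sits in the high 5 bits and the key ID in the low 3 bits. The opcode is what distinguishes control packets from data packets, so it is a more reliable test than packet length alone. The sketch below decodes that byte. The opcode values follow the `P_*` constants in the OpenVPN source code. `OPCODES`, `split_opcode` and `is_control_packet` are hypothetical names for this lab.

```python
# Opcode values as defined by the P_* constants in the OpenVPN source code.
OPCODES = {
    1: "P_CONTROL_HARD_RESET_CLIENT_V1",
    2: "P_CONTROL_HARD_RESET_SERVER_V1",
    3: "P_CONTROL_SOFT_RESET_V1",
    4: "P_CONTROL_V1",
    5: "P_ACK_V1",
    6: "P_DATA_V1",
    7: "P_CONTROL_HARD_RESET_CLIENT_V2",
    8: "P_CONTROL_HARD_RESET_SERVER_V2",
    9: "P_DATA_V2",
    10: "P_CONTROL_HARD_RESET_CLIENT_V3",
}
DATA_OPCODES = {6, 9}


def split_opcode(first_byte: int):
    """Return (opcode, key_id, name) for the first byte of an OpenVPN packet."""
    opcode = first_byte >> 3      # high 5 bits
    key_id = first_byte & 0x07    # low 3 bits
    return opcode, key_id, OPCODES.get(opcode, "UNKNOWN")


def is_control_packet(payload: bytes) -> bool:
    """True if the payload starts with a known control-channel opcode (acks included)."""
    if not payload:
        return False
    opcode, _, name = split_opcode(payload[0])
    return name != "UNKNOWN" and opcode not in DATA_OPCODES


print(split_opcode(0x38))  # (7, 0, 'P_CONTROL_HARD_RESET_CLIENT_V2')
```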

```python
def analyze_dns(udp_data):
    """
    Analyze the payload of a UDP packet and extract information about DNS
    queries and responses.

    Args:
        udp_data (bytes): UDP payload to be analyzed.

    Returns:
        dict: DNS query names as keys, each mapped to a dict of details
        (hypothetical structure; change it if you need to):
            - count (int): Number of times the name was seen.
            - types (dict): Per-type counts. Queries are counted under the
              key "Request"; responses under their record type
              (e.g. "A", "AAAA", "CNAME").
            - query_data (str): Data for the name, e.g. a resolved address.

        Example:
            {
                'example.com': {
                    'count': 2,
                    'types': {'A': 1, 'AAAA': 1},
                    'query_data': '1.2.3.4',
                }
            }
    """
    # Map dpkt record-type constants to readable names. Only record types
    # belong here: DNS_IN is a class (value 1, which would overwrite DNS_A),
    # and DNS_AA / DNS_RA are header flags, not record types.
    dns_types = {
        dpkt.dns.DNS_A: 'A',
        dpkt.dns.DNS_NS: 'NS',
        dpkt.dns.DNS_CNAME: 'CNAME',
        dpkt.dns.DNS_SOA: 'SOA',
        dpkt.dns.DNS_PTR: 'PTR',
        dpkt.dns.DNS_MX: 'MX',
        dpkt.dns.DNS_AAAA: 'AAAA',
        dpkt.dns.DNS_SRV: 'SRV',
        dpkt.dns.DNS_TXT: 'TXT',
        dpkt.dns.DNS_OPT: 'OPT',
        dpkt.dns.DNS_ANY: 'ANY',
    }
    try:
        dns = dpkt.dns.DNS(udp_data)
    except dpkt.UnpackError:
        return {}  # not a well-formed DNS message
    dns_queries = {}
    ### Your code here
    ### Questions are in dns.qd; each entry has .name (str) and .type (int).
    ### If dns.qr == dpkt.dns.DNS_R, the message is a response and its answer
    ### records are in dns.an (each with .name, .type and type-specific data,
    ### e.g. .ip for A records).

    # Solution

    return dns_queries
```
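
`dpkt.dns.DNS` decodes the header and names for you. To see what it is reading, the sketch below parses the 12-byte DNS header and the first question by hand with `struct`:

- The QR bit (the top bit of the flags field) separates queries from responses.
- A name is a sequence of length-prefixed labels that ends in a zero byte.

It deliberately does not follow name-compression pointers, which real responses use heavily. Treat it as an illustration rather than a replacement for dpkt. `parse_first_question` is a hypothetical name.

```python
import struct


def parse_first_question(payload: bytes):
    """Return (is_response, qname, qtype) for a DNS message.

    Handles only uncompressed names (no 0xC0 pointers) in the question.
    """
    # Header: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT (2 bytes each).
    flags, qdcount = struct.unpack_from("!HH", payload, 2)
    is_response = bool(flags & 0x8000)  # QR bit
    if qdcount == 0:
        return is_response, None, None
    labels = []
    offset = 12  # the question section starts right after the header
    while True:
        length = payload[offset]
        if length == 0:  # root label terminates the name
            offset += 1
            break
        if length & 0xC0:
            raise ValueError("compressed names are not handled in this sketch")
        labels.append(payload[offset + 1:offset + 1 + length].decode("ascii"))
        offset += 1 + length
    qtype, _qclass = struct.unpack_from("!HH", payload, offset)
    return is_response, ".".join(labels), qtype


# A standard query for example.com, type A (1), class IN (1).
query = (
    struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
    + b"\x07example\x03com\x00"
    + struct.pack("!HH", 1, 1)
)
print(parse_first_question(query))  # (False, 'example.com', 1)
```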

"""
Analyzes the Payload of a UDP OpenVPN packet and extracts information about Packet control items.
Args:
    pcap_file (packet): UDP Packet to be analyzed.
Returns: (This is a hypothetical structure, replace with actual structure if you need to)
    tuple: A tuple containing:
        - vpn_packets (list): A list of dictionaries, each containing details about an OpenVPN packet. Each dictionary contains:
            - type (str): The type of the packet, either "Control Packet" or "Data Packet"
            - packet_type (int, optional): The type of the control packet (only present if the packet is a control packet).
            - message_id (int, optional): The message ID of the control packet (only present if the packet is a control packet).
            - message_length (int, optional): The length of the message in the control packet (only present if the packet is a control packet).
            - message_content (str, optional): The hexadecimal representation of the message content (only present if the packet is a control packet and the message length is valid).
"""
def analyze_openvpn(udp_data):
    udp = udp_data
    vpn_packet_info = {}
    if len(udp) < OVPN_CONTROL_PACKET_SIZE:
        vpn_packet_info["type"] = "Data Packet"
        return vpn_packet_info
    ### Your code here
    ### Hypothetical control packet structure was given (REPLACE WITH ACTUAL STRUCTURE)
    ### You can use this function: packet_type, message_id, message_length = struct.unpack(CONTROL_PACKET_FORMAT, udp.data[:CONTROL_PACKET_SIZE])
    ### Note: You may need to handle exceptions for struct.unpack errors (Meaning this is a possible Data Packet, Malformed Control Packet or just another message, so type will be "Data Packet")
    ### Use the following code to extract the message content: binascii.hexlify(message_content).decode()
    # Solution
    
    return vpn_packet_info
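
Below is one possible way to fill in `analyze_openvpn`. It is only a sketch and rests on three assumptions:

- The hypothetical `!B8s20sIIBI` layout from the constants above is correct for your capture. It appears to match a control packet protected by `tls-auth` with a 20-byte SHA-1 HMAC and an empty ACK list, but verify this against your own traffic.
- Everything after the 42-byte header is the message.
- The tuple field names are illustrative; `analyze_openvpn_sketch` is a hypothetical name.

```python
import binascii
import struct

# Same hypothetical layout as OVPN_CONTROL_PACKET_FORMAT above.
FORMAT = "!B8s20sIIBI"
SIZE = struct.calcsize(FORMAT)  # 42 bytes


def analyze_openvpn_sketch(udp_data: bytes) -> dict:
    """Decode a UDP payload using the hypothetical control-packet layout."""
    if len(udp_data) < SIZE:
        return {"type": "Data Packet"}
    (first_byte, session_id, hmac, packet_id,
     net_time, ack_len, message_id) = struct.unpack(FORMAT, udp_data[:SIZE])
    opcode = first_byte >> 3
    if opcode in (6, 9):  # P_DATA_V1 / P_DATA_V2
        return {"type": "Data Packet"}
    message = udp_data[SIZE:]  # assumption: the rest of the payload is the message
    return {
        "type": "Control Packet",
        "packet_type": opcode,
        "message_id": message_id,
        "message_length": len(message),
        "message_content": binascii.hexlify(message).decode(),
    }
```

Classifying by opcode rather than by length alone keeps large data packets from being mislabeled as control packets.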

"""
Analyzes a PCAP file to extract DNS queries and OpenVPN packet information.
    pcap_file (str): Path to the PCAP file to be analyzed.
    - dns_queries (dict): A dictionary with DNS query names as keys and their counts as values. (This is a hypothetical structure, replace with actual structure if you need to)
    - vpn_packets (list): A list of dictionaries, each containing details about an OpenVPN packet. (This is a hypothetical structure, replace with actual structure if you need to)
        After analyzing the OpenVPN packets, to each dictionary should be added the following keys:
            - timestamp (float): The timestamp of the packet.
            - src_ip (str): The source IP address of the packet.
            - dst_ip (str): The destination IP address of the packet.
            - src_port (int): The source port of the packet.
            - dst_port (int): The destination port of the packet.
            - length (int): The total length of the raw packet bytes.
            - udp_length (int): The length of the UDP data.
            - data (str): The hexadecimal representation of the UDP data.
    - vpn_src_ips (set): A set of source IP addresses for OpenVPN packets.
    - vpn_dst_ips (set): A set of destination IP addresses for OpenVPN packets.
Returns: (This is a hypothetical structure, replace with actual structure if you need to)
    tuple: A tuple containing:
        - dns_queries (dict): A dictionary with DNS query names as keys and their counts as values.
        - vpn_packets (list): A list of dictionaries, each containing details about an OpenVPN packet.
        - vpn_src_ips (set): A set of source IP addresses for OpenVPN packets.
        - vpn_dst_ips (set): A set of destination IP addresses for OpenVPN packets.
"""
def analyze_pcap(pcap_file):
    vpn_packets = []
    dns_queries = {}
    vpn_src_ips = set()
    vpn_dst_ips = set()
    
    for timestamp, raw_bytes in pcap_file:
        ### Your code here
        ### First you need to parse the Ethernet frame, then the IP packet, and finally the UDP packet.
        ### You can use the dpkt library to parse the packets. packet = dpkt.ethernet.Ethernet(raw_bytes) will parse the Ethernet frame. Remember, to parse the IP packet, you need to access the packet.data attribute, same for UDP.
        ### You can use the port numbers to identify DNS and OpenVPN packets.

        # Solution
       
    return dns_queries, vpn_packets, vpn_src_ips, vpn_dst_ips
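
dpkt handles the Ethernet, IP and UDP layers through the nested `.data` attributes. As a cross-check, the sketch below walks the same layers by hand with `struct` and labels each datagram by port. It makes these assumptions:

- Frames are untagged Ethernet II carrying IPv4, with no VLAN tags, no IPv6 and no IP fragments.
- DNS uses port 53 and OpenVPN uses its default port, 1194.

`classify_frame` is a hypothetical helper, not a dpkt function.

```python
import socket
import struct

DNS_PORT = 53
OPENVPN_PORT = 1194  # OpenVPN's default; your server may use a different port


def classify_frame(frame: bytes):
    """Walk Ethernet II -> IPv4 -> UDP by hand.

    Returns (label, src_ip, dst_ip, src_port, dst_port, payload), or None
    for anything that is not an untagged IPv4/UDP frame.
    """
    if len(frame) < 14:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != 0x0800:  # IPv4 only
        return None
    ip = frame[14:]
    if len(ip) < 20 or ip[9] != 17:  # byte 9 is the protocol; 17 = UDP
        return None
    ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
    if len(ip) < ihl + 8:
        return None
    src_ip = socket.inet_ntoa(ip[12:16])
    dst_ip = socket.inet_ntoa(ip[16:20])
    src_port, dst_port, udp_len = struct.unpack_from("!HHH", ip, ihl)
    # Slice by the UDP length field so any Ethernet padding is excluded.
    payload = ip[ihl + 8:ihl + udp_len]
    ports = (src_port, dst_port)
    if DNS_PORT in ports:
        label = "DNS"
    elif OPENVPN_PORT in ports:
        label = "OpenVPN"
    else:
        label = "other UDP"
    return label, src_ip, dst_ip, src_port, dst_port, payload
```

Port-based labeling is only a heuristic. If the server listens on a non-default port, adjust `OPENVPN_PORT` or identify the flow by its handshake instead.
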
"""
Prints the DNS queries and their counts.
Args:
    dns_queries (dict): A dictionary with DNS query names as keys and their counts as values.
"""
def print_dns_queries(dns_queries):
    if(len(dns_queries) == 0):
        print("No DNS queries found")
        return
    print("DNS Queries:")
    for query in dns_queries:
        print(f"- {query}: {len(query)}")

"""
Prints a summary of the OpenVPN packets, including the total number of packets, source and destination IPs, and other details. This could be a good place to answer the questions.
Args:
    vpn_packets (list): A list of dictionaries, each containing details about an OpenVPN packet.
    vpn_src_ips (set): A set of source IP addresses for OpenVPN packets.
    vpn_dst_ips (set): A set of destination IP addresses for OpenVPN packets.
"""
def print_vpn_summary(vpn_packets, vpn_src_ips, vpn_dst_ips):
    print(f"\nTotal OpenVPN packets: ")
    print(f"Possible OpenVPN Source IPs: ")
    print(f"Possible OpenVPN Destination IPs: ")

    print(f"1. How many DNS queries were made?")

    if dns_queries:
        
        most_frequent_query = None
        print(f"2. What was the most frequent DNS query? {most_frequent_query}")
    else:
        print("2. What was the most frequent DNS query? No DNS queries found")
    print(f"3. How many OpenVPN Packets were captured?")
    print(f"4. What was the source IP of the OpenVPN Client? ")
    print(f"5. What was the Destination IP of the OpenVPN Server?")
    print(f"6. Did the OpenVPN use UDP? ")
    print(f"7. What port was used for OpenVPN?")
"""
Prints detailed information about each OpenVPN packet.
Args:
    vpn_packets (list): A list of dictionaries, each containing details about an OpenVPN packet.
"""
def print_vpn_packet_details(vpn_packets):
    if vpn_packets:
        print("\nOpenVPN Packet Details:")
        for i, packet in enumerate(vpn_packets):
            print(f"\nPacket #{i+1}:")
            for key, value in packet.items():
                print(f"  {key}: {value}")
"""
Performs advanced analysis on the OpenVPN packets, including control packet types, message lengths, message IDs, and timing analysis.
Args:
    vpn_packets (list): A list of dictionaries, each containing details about an OpenVPN packet.

Note: This function is a hypothetical example of advanced analysis. You can modify it to perform other types of analysis based on the OpenVPN packet data.
"""
def advanced_vpn_analysis(vpn_packets):
    control_packet_types = set()
    message_lengths = []

    for packet in vpn_packets:
        if packet.get("type") == "Control Packet":
            control_packet_types.add(packet["packet_type"])
            message_lengths.append(packet["message_length"])

    print("\nAdvanced OpenVPN Analysis:")
    print(f"1. Observed Control Packet Types: {control_packet_types}")

    if message_lengths:
        print(f"3. Message Length Distribution:")


    message_ids = []


    if message_ids:
        print("\n2. Message ID Analysis:")
        print(f"   - Message IDs: {message_ids}")
    else:
        print("No control packets found to analyze their message IDs")

    print("\n4. Correlating Control Packets (Basic Example):")


    print("\n5. Timing Analysis of Control Packets:")


if __name__ == "__main__":
    pcap_file_path = f"OpenVPN.pcapng"  # Replace with your PCAP file
    pcap_file = pcap.pcap(pcap_file_path) 
    dns_queries, vpn_packets, vpn_src_ips, vpn_dst_ips = analyze_pcap(pcap_file)

    print_dns_queries(dns_queries)
    print_vpn_summary(vpn_packets, vpn_src_ips, vpn_dst_ips)
    print_vpn_packet_details(vpn_packets)
    advanced_vpn_analysis(vpn_packets)


## Lab Tasks and Questions

*    DNS Queries: How many unique DNS queries were made in the PCAP file?
*    Most Frequent Query: What was the most frequently queried domain?
*    Total VPN Packets: How many OpenVPN packets were captured?
*    VPN Client IP: What was the source IP address of the OpenVPN client?
*    VPN Server IP: What was the destination IP address of the OpenVPN server?
*    VPN Protocol: What transport protocol was used for OpenVPN (UDP or TCP)?
*    VPN Port: What port was used for OpenVPN traffic?
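
One way to answer the client, server and port questions is to use the handshake instead of guessing from IP addresses. The side that sends a client hard-reset opcode is the client, and the side that sends a server hard reset is the server. Port numbers alone can mislead, because an OpenVPN client that does not use `nobind` may also send from port 1194. The sketch below makes these assumptions:

- The per-packet dicts carry the keys listed in the `analyze_pcap` docstring.
- `packet_type` holds the opcode (the first byte shifted right by 3).
- The capture includes the start of the connection.

`identify_endpoints` is a hypothetical name, and the sample packets are made up.

```python
CLIENT_RESETS = {1, 7, 10}  # P_CONTROL_HARD_RESET_CLIENT_V1 / V2 / V3
SERVER_RESETS = {2, 8}      # P_CONTROL_HARD_RESET_SERVER_V1 / V2


def identify_endpoints(vpn_packets):
    """Return (client_ip, server_ip, server_port) from the handshake, or Nones."""
    for pkt in vpn_packets:
        if pkt.get("packet_type") in CLIENT_RESETS:
            return pkt["src_ip"], pkt["dst_ip"], pkt["dst_port"]
        if pkt.get("packet_type") in SERVER_RESETS:
            return pkt["dst_ip"], pkt["src_ip"], pkt["src_port"]
    return None, None, None


sample_packets = [
    {"type": "Data Packet", "src_ip": "10.0.0.2", "dst_ip": "203.0.113.5",
     "src_port": 50000, "dst_port": 1194},
    {"type": "Control Packet", "packet_type": 8, "src_ip": "203.0.113.5",
     "dst_ip": "10.0.0.2", "src_port": 1194, "dst_port": 50000},
]
print(identify_endpoints(sample_packets))  # ('10.0.0.2', '203.0.113.5', 1194)
```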

## Advanced Lab Tasks and Questions (Focus on OpenVPN Control Packets)

These questions assume you have correctly set `OVPN_CONTROL_PACKET_FORMAT` and can extract `packet_type`, `message_id`, and `message_length` from control packets.

*    Control Packet Types: What different control packet types were observed in the capture? (List the unique packet_type values). This can give you insights into the different stages of the OpenVPN connection (e.g., initial handshake, keep-alives, data channel establishment).

*    Message ID Analysis: Are there any patterns in the message_id values? Are they sequential, random, or do they follow any other discernible pattern? This might reveal information about the communication flow between the client and server.

*    Message Length Distribution: What is the distribution of message_length values for control packets? Are there any common sizes or outliers? This could indicate different types of control messages being exchanged.

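*    Correlating Control Packets: Can you identify pairs or sequences of control packets that seem to be related (e.g., a request followed by a response)? This requires analyzing the packet_type and potentially the message_id and timestamps. For example, a client hard reset (`P_CONTROL_HARD_RESET_CLIENT_V2`) should be answered by a server hard reset (`P_CONTROL_HARD_RESET_SERVER_V2`). Control packets are acknowledged either by `P_ACK_V1` packets or by ACKs piggybacked on other control packets. Higher-level messages such as PUSH_REQUEST and PUSH_REPLY travel inside the encrypted TLS control channel, so they do not appear as separate packet types.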

*    Timing Analysis of Control Packets: What is the time interval between consecutive control packets? Are there any periods of inactivity followed by bursts of control traffic? This could indicate connection maintenance or re-establishment attempts.

*    (Even More Advanced - Requires more knowledge about OpenVPN internals - Huge bonus): If you can identify specific control packet types (by matching packet_type to known values from the OpenVPN source code or documentation), try to interpret the message_content (even in its hex form) based on the expected structure for that packet type. This is the closest you can get to actual payload analysis without decryption.
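
Two simple measurements go a long way for the message ID and timing questions:

- whether consecutive message IDs from the same sender increase by one (repeats can indicate retransmissions);
- the gaps between consecutive control packets.

The sketch below computes both. It assumes you pass only control packets, each with the `timestamp`, `src_ip` and `message_id` keys described in the lab code. The `burst_gap` threshold of 1 second is an arbitrary choice for illustration. The function names and sample data are hypothetical.

```python
from collections import defaultdict


def message_id_gaps(packets):
    """Per sender, list (previous_id, next_id) pairs that are not consecutive."""
    by_sender = defaultdict(list)
    for pkt in packets:
        by_sender[pkt["src_ip"]].append(pkt["message_id"])
    anomalies = {}
    for sender, ids in by_sender.items():
        pairs = [(a, b) for a, b in zip(ids, ids[1:]) if b != a + 1]
        if pairs:
            anomalies[sender] = pairs
    return anomalies


def inter_arrival(packets, burst_gap=1.0):
    """Return the gaps between consecutive packets (in time order) and the
    indices, in that order, of packets that start a new burst."""
    times = sorted(pkt["timestamp"] for pkt in packets)
    gaps = [b - a for a, b in zip(times, times[1:])]
    new_bursts = [i + 1 for i, gap in enumerate(gaps) if gap > burst_gap]
    return gaps, new_bursts


control = [
    {"timestamp": 0.00, "src_ip": "10.0.0.2", "message_id": 0},
    {"timestamp": 0.05, "src_ip": "203.0.113.5", "message_id": 0},
    {"timestamp": 0.10, "src_ip": "10.0.0.2", "message_id": 1},
    {"timestamp": 0.15, "src_ip": "10.0.0.2", "message_id": 1},  # repeated ID
    {"timestamp": 5.15, "src_ip": "203.0.113.5", "message_id": 1},
]
print(message_id_gaps(control))  # {'10.0.0.2': [(1, 1)]}
gaps, bursts = inter_arrival(control)
print(bursts)  # [4]
```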