<a href="https://colab.research.google.com/github/shubhamsaloni/LLM_WiFi/blob/main/SSALONI_LLM_For_Business_NetworkAnalyzer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This tool analyzes packet captures and provides insights that help a network admin or support engineer debug and locate issues in a network.
It solves the problem in two ways: a direct OpenAI LLM call with prompt engineering, and an agentic system built with CrewAI.

Flow
- Convert the pcap to text so that LLMs can work on it
- Strip sensitive network data to avoid leaking it to the LLM
- Call an OpenAI LLM to analyze the capture and provide insights and a final verdict
- If the final verdict is NOT OK, suggest causes and next debugging steps
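
The last step depends on reading the verdict back out of the model's free-text answer. A minimal sketch, assuming the reply contains the literal uppercase strings `NOT OK` or `OK` as the prompt requests (`parse_verdict` and `next_action` are hypothetical helpers, not part of this notebook), checks for `NOT OK` first because `OK` is a substring of it:

```python
import re
from typing import Optional


def parse_verdict(llm_text: str) -> Optional[str]:
    """Return "NOT OK", "OK", or None if no verdict is found (hypothetical helper)."""
    # Check "NOT OK" first: a plain search for "OK" would also match inside "NOT OK"
    if re.search(r"\bNOT\s+OK\b", llm_text):
        return "NOT OK"
    if re.search(r"\bOK\b", llm_text):
        return "OK"
    return None


def next_action(llm_text: str) -> str:
    """Decide whether a follow-up debugging step is needed (hypothetical helper)."""
    verdict = parse_verdict(llm_text)
    if verdict == "NOT OK":
        return "suggest causes and debugging steps"
    if verdict == "OK":
        return "no action needed"
    return "verdict unclear: review manually"


print(next_action("Final verdict: NOT OK. The client was deauthenticated."))
print(next_action("Final verdict: OK"))
```

This is a heuristic: a reply that merely echoes the phrase "OK or NOT OK" would be read as NOT OK, so asking the model to put the verdict on a fixed final line makes the check more reliable.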


In [81]:
!pip install scapy
!pip install pyshark



In [92]:
#Method 1 : pre-processing data and then using LLM call with prompt engineering
from scapy.all import rdpcap
from openai import OpenAI
import os
from google.colab import userdata

import textwrap
import re

# Initialize OpenAI client
client = OpenAI(api_key=userdata.get('OPENAI_API_KEY'))

# Step 1: Read and extract key info from pcap
def extract_pcap_summary(file_path):
    packets = rdpcap(file_path)
    summary_lines = []

    for i, pkt in enumerate(packets[:100]):  # Limit to first 100 packets to avoid overload
        line = f"{i+1}. Time: {pkt.time}, Summary: {pkt.summary()}"
        summary_lines.append(line)

    return "\n".join(summary_lines)

# Step 2 : remove sensitive information from packet captures

def sanitize_text(text):
    # Redact private (RFC 1918) IPv4 addresses
    text = re.sub(r"\b(?:192\.168|10\.\d{1,3}|172\.(?:1[6-9]|2\d|3[0-1]))(?:\.\d{1,3}){2}\b", "[REDACTED_IP]", text)
    # Redact all other IPv4 addresses (optional - uncomment if needed)
    # text = re.sub(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", "[REDACTED_IP]", text)

    # Redact MAC addresses
    text = re.sub(r"\b(?:[0-9A-Fa-f]{2}[:-]){5}(?:[0-9A-Fa-f]{2})\b", "[REDACTED_MAC]", text)

    # Redact email addresses
    text = re.sub(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b", "[REDACTED_EMAIL]", text)

    # Redact domain names (simple heuristic)
    text = re.sub(r"\b(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,6}\b", "[REDACTED_DOMAIN]", text)

    # Redact URLs
    text = re.sub(r"https?://[^\s]+", "[REDACTED_URL]", text)

    # Redact SSID patterns
    text = re.sub(r"(SSID=|Beacon\s\(|Probe Request for\s+|\"|')([^\s\"')]+)", r"\1[REDACTED_SSID]", text, flags=re.IGNORECASE)


    return text

# Step 3: Send to OpenAI for analysis
def analyze_pcap_with_llm(summary_text, input_model, max_chars=12000):
    # Truncate only the packet summary (not the whole prompt) so the instructions
    # at the end are never cut off and line breaks between packets are preserved
    if len(summary_text) > max_chars:
        summary_text = summary_text[:max_chars] + "\n..."

    prompt = f"""
You are a network packet capture analyst. Here's a summary of a pcap file:

{summary_text}

Please analyze the traffic. Explain the protocols seen in one line. Summarize what is happening in one line and give a final verdict of OK or NOT OK. For NOT OK, suggest likely causes and follow-up debugging steps. Do not invent information; say "don't know" when information is lacking.
    """

    response = client.chat.completions.create(
        model=input_model,
        messages=[
            {"role": "system", "content": "You are a network packet capture expert."},
            {"role": "user", "content": prompt}
        ]
    )

    return response.choices[0].message.content

# Step 4 : get results
def get_results(pcap_path, input_model):
    summary = extract_pcap_summary(pcap_path)
    print("=== PCAP Summary ===")
    print(summary[:500] + "\n...")  # Preview
    print("\n=== Sanitized Summary ===")
    sanitized_summary = sanitize_text(summary)
    print(sanitized_summary[:500] + "\n...")

    result = analyze_pcap_with_llm(sanitized_summary, input_model)
    print("\n=== LLM Analysis ===")
    print(result)

# Step 5: Get results on two sample pcap files, one OK and the other NOT OK

# Run the flow
if __name__ == "__main__":
    print("**************************************")
    pcap_path = "/content/wifi1.pcapng"
    get_results(pcap_path, "gpt-4")
    print("**************************************")
    pcap_path = "/content/wifi2.pcapng"
    get_results(pcap_path, "gpt-4")



**************************************
=== PCAP Summary ===
1. Time: 1462386841.034384, Summary: RadioTap / Dot11FCS / Dot11Beacon / SSID='DENVEROFFICE' / Dot11EltRates / Dot11EltDSSSet / Dot11Elt / Dot11EltCountry / Dot11EltERP / Dot11EltMicrosoftWPA / Dot11EltRates / Dot11EltVendorSpecific / Dot11EltVendorSpecific / Dot11EltHTCapabilities / Dot11EltVendorSpecific / Dot11Elt / Dot11EltVendorSpecific / Dot11EltVendorSpecific / Dot11EltVendorSpecific
2. Time: 1462386841.325902, Summary: RadioTap / Dot11FCS / Dot11ProbeReq / SSID='DENVEROFFICE' / Dot11EltR
...

=== Sanitized Summary ===
1. Time: 1462386841.034384, Summary: RadioTap / Dot11FCS / Dot11Beacon / SSID='[REDACTED_SSID]' / Dot11EltRates / Dot11EltDSSSet / Dot11Elt / Dot11EltCountry / Dot11EltERP / Dot11EltMicrosoftWPA / Dot11EltRates / Dot11EltVendorSpecific / Dot11EltVendorSpecific / Dot11EltHTCapabilities / Dot11EltVendorSpecific / Dot11Elt / Dot11EltVendorSpecific / Dot11EltVendorSpecific / Dot11EltVendorSpecific
2. Time: 14
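
`sanitize_text` replaces every MAC with the same `[REDACTED_MAC]` token, so the model can no longer tell which station or access point sent a given frame, which matters when reasoning about a deauthentication or a failed association. The Scapy summaries above happen not to include addresses, but if a more detailed text export is used, a minimal sketch of an alternative (`pseudonymize_macs` is a hypothetical helper) maps each distinct MAC to a stable token within one capture:

```python
import re
from typing import Dict

# Same MAC pattern as sanitize_text: six hex octets separated by ":" or "-"
MAC_RE = re.compile(r"\b(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}\b")


def pseudonymize_macs(text: str) -> str:
    """Replace each distinct MAC with a stable token (MAC_1, MAC_2, ...)."""
    mapping: Dict[str, str] = {}

    def _token(match) -> str:
        # Normalize case and separator so aa:bb:... and AA-BB-... share one token
        mac = match.group(0).lower().replace("-", ":")
        if mac not in mapping:
            mapping[mac] = f"MAC_{len(mapping) + 1}"
        return mapping[mac]

    return MAC_RE.sub(_token, text)


sample = (
    "1. Deauth 00:11:22:33:44:55 > AA:BB:CC:DD:EE:FF\n"
    "2. Probe Request from aa:bb:cc:dd:ee:ff"
)
print(pseudonymize_macs(sample))
```

The mapping lives only for one call, so tokens are consistent within a capture but carry no information across captures.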

The following solves the same problem with an agentic approach using CrewAI.

In [94]:
!pip install "crewai[tools]"


In [105]:
# Method 2 using agents
# Sequential execution:
# 1. Convert the pcap to text
# 2. Remove sensitive information
# 3. An analyzer agent performs the packet capture analysis
# 4. If the verdict is NOT OK, the same analyzer suggests causes and next steps

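import os

from crewai import Agent, Task, Crew, Process
from crewai.tools import BaseTool
# from crewai_tools import SerperDevTool, WebsiteSearchTool
from google.colab import userdata
from IPython.display import Markdown, display

# CrewAI's default OpenAI model reads the API key from the environment
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")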


# Steps 1 and 2 reuse extract_pcap_summary(file_path) and sanitize_text(text)
# from the Method 1 cell, so run that cell first.

class PcapTool(BaseTool):
    name: str = "Pcap to text converter"
    description: str = "Converts a pcap file to text and sanitizes it"

    def _run(self, file_path: str) -> str:
        # Honor the path the agent passes in instead of a hard-coded file
        summary = extract_pcap_summary(file_path)
        return sanitize_text(summary)

# Define the action needed
topic = "Analyze a network packet capture, identify whether it is OK or not, and suggest debugging steps if it is not OK"


# Create the pcap pre-processing agent
pcap_processor = Agent(
    role="Converts pcap to text",
    goal="Convert different kinds of packet capture files to text and then sanitize them of sensitive data",
    backstory="An expert file-to-text converter",
    tools=[PcapTool()],
    verbose=True
)

# Create the packet capture analyzer agent
analyzer = Agent(
    name="Pcap_analyzer",
    role="Packet capture analyzer",
    goal="Clearly summarize findings after analyzing packet captures",
    backstory="A skilled network engineer who is an expert in Wi-Fi packet capture analysis.",
    verbose=True
)

# Define tasks

pcap_task = Task(
    description="Convert the pcap file to text and sanitize the text of sensitive network data for file /content/wifi2.pcapng",
    expected_output="Text of the pcap with sanitized output",
    agent=pcap_processor,
)

analysis_task = Task(
    description=f"Analyze '{topic}' and provide final verdict as OK, NOT OK",
    expected_output=f"Explain the protocols seen in 1 line. Summarize insights on what is happening in 1 line and Give final verdict on OK or NOT OK. For NOT OK suggest follow up causes and debugging steps",
    context=[pcap_task],
    agent=analyzer,
)

# Assemble the crew and run tasks
crew = Crew(
    agents=[pcap_processor, analyzer],
    tasks=[pcap_task, analysis_task],
    process=Process.sequential
)

result = crew.kickoff()

# Display the final report
display(Markdown(result.raw))

The protocols observed in this packet capture include 802.11 management frames such as Beacon, Probe Requests, Authentication, Association Requests and Responses, EAPOL, and Deauthentication. This sequence indicates that a device is attempting to connect to the Wi-Fi network with the SSID '[REDACTED_SSID]', going through the initial phases of network discovery, authentication, and association, followed by EAPOL exchanges for establishing security (WPA). Based on the deauthentication packet noted as the last entry, the final verdict is NOT OK, which likely indicates that the device was deauthenticated from the network. Suggested debugging steps include: checking the Wi-Fi access point's logs for potential reasons for the deauth event, verifying the configuration of WPA/WPA2 settings, ensuring the device's credentials are correct, and examining signal strength or interference issues that might have prompted disconnection.
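
The verdict above rests on the presence of a deauthentication frame, but the Scapy summary does not include that frame's reason code, which usually narrows down the cause. A minimal sketch, assuming the reason code has been read separately (Scapy exposes it as the `reason` field of its `Dot11Deauth` layer), maps a handful of common IEEE 802.11 reason codes to descriptions that could be appended to the text sent to the model. The table is deliberately partial, and `describe_deauth_reason` is a hypothetical helper:

```python
# Partial table of IEEE 802.11 reason codes (common values only)
REASON_CODES = {
    1: "Unspecified reason",
    2: "Previous authentication no longer valid",
    3: "Deauthenticated because sending STA is leaving (or has left) IBSS or ESS",
    4: "Disassociated due to inactivity",
    5: "AP unable to handle all currently associated STAs",
    6: "Class 2 frame received from nonauthenticated STA",
    7: "Class 3 frame received from nonassociated STA",
    8: "Disassociated because sending STA is leaving (or has left) BSS",
    14: "Message integrity code (MIC) failure",
    15: "4-way handshake timeout",
    23: "IEEE 802.1X authentication failed",
}


def describe_deauth_reason(code: int) -> str:
    """Return a readable description for an 802.11 deauth/disassoc reason code."""
    return REASON_CODES.get(code, f"Reason code {code} (not in this partial table)")


# Example: reason 15 points at the WPA/WPA2 4-way handshake (the EAPOL exchange)
print(f"Deauth reason 15: {describe_deauth_reason(15)}")
```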