# USB Protocol Analysis

This notebook analyzes the USB master dataset produced by the Rust pcap converter.
The dataset holds captured USB traffic from multiple devices and capture sessions.


In [None]:
# Import required libraries
import sys
sys.path.append('../scripts')

import polars as pl
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from helpers import (
    load_master_dataset, get_session_stats, get_device_summary,
    print_session_summary, print_device_summary, analyze_control_packets,
    analyze_urb_transactions, get_payload_patterns, filter_by_device,
    hex_to_ascii
)
from protocol_parser import apply_parser_to_df


# Set up plotting
plt.style.use('default')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 8)

print("📊 USB Protocol Analysis Environment Ready!")


In [None]:
# Load the master USB dataset
df = load_master_dataset('../../usb_master_dataset.parquet')

print(f"\n📈 Dataset Overview:")
print(f"Total packets: {len(df):,}")
print(f"Total fields: {len(df.columns)}")
print(f"Devices: {sorted(df['device_address'].unique().to_list())}")
print(f"Sessions: {len(df['session_id'].unique())}")
print(f"Time span: {df['timestamp'].min():.1f}s to {df['timestamp'].max():.1f}s")


## Packet Classification on `orig_adc_1000hz.6`

We now apply the Rust-inspired parser from `protocol_parser` to classify every packet in the `orig_adc_1000hz.6` session, which has been confirmed to contain valid ADC data. The parser keeps packet identification separate from data parsing.
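The separation of packet identification from data parsing can be sketched as a two-stage decoder: one function only decides what kind of packet it is looking at, and a second function decodes fields for a type that has already been identified. This is a minimal illustration, not the logic inside `apply_parser_to_df`. The type codes, the payload layout, and the names `classify_packet`, `parse_adc_payload`, and `decode` are all hypothetical placeholders.

```python
import struct
from typing import Optional

# Hypothetical type codes: placeholders, not the real protocol values.
PACKET_TYPES = {0x01: "CONTROL", 0x41: "ADC_DATA"}


def classify_packet(payload_hex: str) -> str:
    """Stage 1: identify the packet type from the first byte only."""
    if not payload_hex:
        return "EMPTY"
    try:
        payload = bytes.fromhex(payload_hex)
    except ValueError:
        return "INVALID_HEX"
    return PACKET_TYPES.get(payload[0], "UNKNOWN")


def parse_adc_payload(payload_hex: str) -> Optional[dict]:
    """Stage 2: decode fields, only for payloads already classified as ADC_DATA.

    Assumed layout (hypothetical): 1 type byte, then two little-endian
    signed 32-bit integers holding microvolts and microamps.
    """
    payload = bytes.fromhex(payload_hex)
    if len(payload) < 9:
        return None  # too short for the assumed layout
    vbus_uv, ibus_ua = struct.unpack_from("<ii", payload, 1)
    return {"vbus_v": vbus_uv / 1e6, "ibus_a": ibus_ua / 1e6}


def decode(payload_hex: str) -> dict:
    """Classify first, then parse only when a parser exists for the type."""
    packet_type = classify_packet(payload_hex)
    fields = parse_adc_payload(payload_hex) if packet_type == "ADC_DATA" else None
    return {"packet_type": packet_type, "fields": fields}
```

Because classification never touches the field layout, a malformed or truncated payload still receives a packet type, and the parsing stage can reject it on its own terms.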

In [None]:
# Filter for the correct session and apply the new parser
session_id = 'orig_adc_1000hz.6'
df_session = df.filter(pl.col('session_id') == session_id)
df_parsed = apply_parser_to_df(df_session)

# Show the distribution of packet types found
print("Found the following packet types:")
df_parsed['packet_type'].value_counts()

### Filtering for ADC Data

Now that the packets are classified, we can confidently filter for `ADC_DATA` packets to analyze the measurements.


In [None]:
# Filter for ADC_DATA packets and show the head
df_adc_data = df_parsed.filter(pl.col('packet_type') == 'ADC_DATA')

print(f"Found {len(df_adc_data)} valid ADC data packets.")

df_adc_data.select([
    "timestamp",
    "vbus_v",
    "ibus_a",
    "power_w",
    "temp_c"
]).head()


### Plotting ADC Data

With the data correctly parsed and filtered, we can now create a meaningful visualization.
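One possible shape for such a visualization is sketched here: stacked voltage, current, and power traces that share a time axis. The helper names `build_plot_series` and `plot_adc` are hypothetical, and the sketch assumes that `timestamp` is a numeric value in seconds and that the rows carry the `vbus_v`, `ibus_a`, and `power_w` columns shown earlier.

```python
from typing import Dict, List, Sequence


def build_plot_series(rows: Sequence[dict]) -> Dict[str, List[float]]:
    """Turn row dicts into parallel lists, with time relative to the first sample."""
    if not rows:
        return {"t_s": [], "vbus_v": [], "ibus_a": [], "power_w": []}
    t0 = rows[0]["timestamp"]
    return {
        "t_s": [r["timestamp"] - t0 for r in rows],
        "vbus_v": [r["vbus_v"] for r in rows],
        "ibus_a": [r["ibus_a"] for r in rows],
        "power_w": [r["power_w"] for r in rows],
    }


def plot_adc(series: Dict[str, List[float]]):
    """Draw voltage, current, and power on stacked axes sharing the time axis."""
    # Imported here so build_plot_series has no plotting dependency.
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(3, 1, sharex=True, figsize=(12, 8))
    for ax, key, label in zip(
        axes,
        ("vbus_v", "ibus_a", "power_w"),
        ("VBUS (V)", "IBUS (A)", "Power (W)"),
    ):
        ax.plot(series["t_s"], series[key])
        ax.set_ylabel(label)
        ax.grid(True, alpha=0.3)
    axes[-1].set_xlabel("Time since first sample (s)")
    fig.tight_layout()
    return fig
```

In this notebook the rows could come from `df_adc_data.select(["timestamp", "vbus_v", "ibus_a", "power_w"]).to_dicts()`, followed by `plot_adc(build_plot_series(rows))`.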

In [None]:
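# Keep only packets from the session that carry a payload, then parse them.
# NOTE: parse_payloads_to_adc is not imported in the setup cell; it must be
# defined or imported before this cell will run.
df_payload = df_session.filter(pl.col("payload_hex") != "")
parsed_adc_data = parse_payloads_to_adc(df_payload)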

# Show the parsed data for the first few packets
parsed_adc_data.select([
    "timestamp",
    "vbus_uv",
    "ibus_ua",
    "temp_raw"
]).head()


In [None]:
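# Keep only packets from the session that carry a payload
df_payload = df_session.filter(pl.col("payload_hex") != "")

# Parse ADC data from payloads. unnest() requires a Struct column, so the
# return dtype is declared as a Struct rather than pl.Object.
# Assumption: parse_adc_data (not imported in the setup cell) returns a dict
# with these integer fields.
adc_struct = pl.Struct({"vbus_uv": pl.Int64, "ibus_ua": pl.Int64, "temp_raw": pl.Int64})
parsed_adc_data = df_payload.with_columns(
    pl.col("payload_hex").map_elements(parse_adc_data, return_dtype=adc_struct).alias("parsed_adc")
).unnest("parsed_adc")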

# Show the parsed data for the first few packets
parsed_adc_data.select([
    "timestamp",
    "vbus_uv",
    "ibus_ua",
    "temp_raw"
]).head()


### URB Transaction Analysis

We can analyze the URB transactions to find request/response pairs.
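For intuition, request/response pairing can be sketched as follows. In Linux usbmon captures, each URB typically appears as a submit event (`S`) and a later completion event (`C`) that share the same URB id. The sketch matches them up and reports the latency of each pair. It is an illustration under assumptions, not the implementation of `analyze_urb_transactions`. The field names `urb_id`, `urb_type`, and `timestamp`, and the function `pair_urb_events`, are hypothetical.

```python
from typing import Dict, List, Tuple


def pair_urb_events(events: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Match each submit ('S') event with the next completion ('C') for the same URB id.

    Returns (pairs, unmatched). unmatched holds submits that never completed
    and completions that had no pending submit.
    """
    pending: Dict[str, dict] = {}
    pairs: List[dict] = []
    unmatched: List[dict] = []
    for event in sorted(events, key=lambda e: e["timestamp"]):
        urb_id = event["urb_id"]
        if event["urb_type"] == "S":
            if urb_id in pending:
                # The same id was submitted again before it completed.
                unmatched.append(pending[urb_id])
            pending[urb_id] = event
        elif event["urb_type"] == "C":
            submit = pending.pop(urb_id, None)
            if submit is None:
                unmatched.append(event)
            else:
                pairs.append({
                    "urb_id": urb_id,
                    "submit_ts": submit["timestamp"],
                    "complete_ts": event["timestamp"],
                    "latency_s": event["timestamp"] - submit["timestamp"],
                })
    # Anything still pending at the end of the capture never completed.
    unmatched.extend(pending.values())
    return pairs, unmatched
```

Keeping unmatched events in a separate list makes truncated captures visible instead of silently dropping them.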


In [None]:
# Analyze URB transactions for the selected session
analyze_urb_transactions(df_session)
