# **UNITY XR Streaming - Dataset Playground**

# **HOW TO**



***Parameters:***

    This section contains the parameters that need to be set before running the code.

1.   **Variables:** The following parameters need to be set:


*   Server and client IP addresses
*   Analysis Interval
*   Zoom Interval
*   Grouping Time: if desired to group packets by time
*   Google Drive folder path



***Build:***

    This section contains the code that needs to be executed to set up the environment.

1.   **Configuration**
2.   **File build**

Required for Wireshark traces results:
3.  **Wireshark Traces Load**
4.  **Wireshark Figures build**
5.  **Wireshark Computations build**

Required for WebRTC statistics results:
6.  **WebRTC Statistics Load**
7.  **WebRTC Figures build**
8.  **WebRTC Computations build**

Others:
9.   **Color palette**

***Support:***

    This section contains helper functions.

1.   **File Demo Helper:** Find existing files in Google Drive and get file paths.

***Results:***

    This section contains the code for running the analysis and generating the results.

Wireshark results:
1.   **Run Wireshark Demo**:

    Set the list of file paths containing Wireshark traces. Each trace should have been exported with TShark to a tab-separated file with a header row, saved with a .csv extension, and containing the following fields:
    * ip.src
    * ip.dst
    * frame.time_relative
    * frame.time_delta
    * frame.len
    * udp.length
    * _ws.col.Protocol
    * _ws.col.Info
2.   **Wireshark Demo Figures**:

    Generate figures from Wireshark traces.
3.   **Wireshark Demo Computations**:

    Compute metrics from Wireshark traces.
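
The notebook does not show the export command itself, so the following is only a sketch of how such a file could be produced. It builds a TShark argument list with `-T fields`, a header row and a tab separator, which matches what the CSV loader and the column validation in the Build section expect. The capture and output file names are placeholders, and `build_tshark_command` is a hypothetical helper, not part of the notebook.

```python
import shlex
from typing import List

# Fields checked by the notebook's column validation, in export order
TSHARK_FIELDS = [
    "ip.src",
    "ip.dst",
    "frame.time_relative",
    "frame.time_delta",
    "frame.len",
    "udp.length",
    "_ws.col.Protocol",
    "_ws.col.Info",
]


def build_tshark_command(capture_path: str, fields: List[str]) -> List[str]:
    """Return a tshark argument list that prints the given fields as tab-separated text."""
    command = [
        "tshark", "-r", capture_path,
        "-T", "fields",
        "-E", "header=y",      # first row holds the field names
        "-E", "separator=/t",  # tab-separated columns
    ]
    for field in fields:
        command += ["-e", field]
    return command


# "capture.pcapng" and "trace.csv" are placeholder file names
command = build_tshark_command("capture.pcapng", TSHARK_FIELDS)
print(shlex.join(command) + " > trace.csv")
```

The printed line can be pasted into a shell on a machine where TShark is installed; the sketch itself only assembles the command and never runs it.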

WebRTC statistics results:
1.   **Run WebRTC statistics Demo**:

    Set the list of file paths containing WebRTC statistics.

2.   **WebRTC statistics Demo Figures**:

    Generate figures from WebRTC statistics.
3.   **WebRTC statistics Demo Computations**:

    Compute metrics from WebRTC statistics.

# **PARAMETERS**

## **Variables**

In [1]:
# Set server and client IPs
server_ip = "192.168.50.185"
client_ip = "192.168.50.128"

# Start and end time (in seconds) of the analysis interval, measured from the first packet exchanged between client and server (filter)
complete_trace_start_time = 7.5
complete_trace_end_time = 37.5

# Start and end time (in seconds) of the zoom interval, measured from complete_trace_start_time (zoom)
specific_portion_start_time = 5
specific_portion_end_time = 5.6

# Grouping time (in seconds): a gap between consecutive packets longer than this starts a new group
grouping_time = 0.000

# Define the Google Drive folder path
directory = '/content/drive/Shareddrives/XR-Wi-Fi-WN/Unity XR streaming/'
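
The zoom times are offsets from `complete_trace_start_time`, not absolute trace times: `run_wireshark_traces` adds the analysis start to both zoom bounds before filtering. The sketch below repeats that arithmetic with the values set above; `zoom_window` is a hypothetical helper used only for illustration.

```python
def zoom_window(complete_start: float, zoom_start: float, zoom_end: float) -> tuple:
    """Convert zoom offsets (relative to the analysis start) into trace times."""
    return complete_start + zoom_start, complete_start + zoom_end


# Values taken from the Variables cell
start, end = zoom_window(7.5, 5, 5.6)
print(f"Zoom covers {start:.1f} s to {end:.1f} s of the re-zeroed trace")
```

With these settings the zoomed DataFrames hold the packets seen roughly between 12.5 s and 13.1 s after the first client/server packet.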

# **BUILD**

## **Configuration**

In [None]:
# Import necessary libraries
from google.colab import drive
import os
import ipywidgets as widgets
from IPython.display import display
from typing import List
import sys
import numpy as np
import pandas as pd
import plotly.io as pio
import plotly.express as px
import plotly.graph_objects as go
import chardet
from collections import Counter
import seaborn as sns
import matplotlib.pyplot as plt
import json
import ast
import warnings
from datetime import datetime

## **File build**

#### **Mount Google Drive Folder**

In [None]:
# Mount the Google Drive folder
drive.mount('/content/drive')

#### **Dropdown**

In [None]:
def create_dropdown(directory:str):
  """
  Creates a dropdown menu with all the files and subdirectories in the specified directory.

  Parameters:
    directory (str): The directory path to create the dropdown for.

  Returns:
    A Dropdown widget containing all the files and subdirectories in the specified directory.
  """
  # Get a list of all files and subdirectories in the specified directory
  files = os.listdir(directory)

  # Create an empty dictionary to store the options for the dropdown
  options = {}

  # Loop through all files and subdirectories and add them to the options dictionary
  for file in files:
    # Get the full path of the file or subdirectory
    full_path = os.path.join(directory, file)

    # If the item is a file, add it to the options dictionary
    if os.path.isfile(full_path):
      options[file] = full_path
    # If the item is a subdirectory, create a sub-dropdown and add it to the options dictionary
    elif os.path.isdir(full_path):
      sub_dropdown = create_dropdown(full_path)
      options[file] = sub_dropdown

  # Create the dropdown widget for the current directory
  dropdown = widgets.Dropdown(options=options, value=None, layout=widgets.Layout(width='800px', height='24px'))
  return dropdown
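
To make the recursive structure easier to picture, here is a minimal sketch of the same traversal without widgets: files map to their full path and subdirectories map to a nested dictionary, where `create_dropdown` would instead store a nested Dropdown. The helper name `list_tree` and the directory and file names are hypothetical.

```python
import os
import tempfile


def list_tree(directory: str) -> dict:
    """Map each file name to its full path and each subdirectory name to a nested dict."""
    tree = {}
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        if os.path.isdir(full_path):
            tree[name] = list_tree(full_path)
        elif os.path.isfile(full_path):
            tree[name] = full_path
    return tree


# Build a throwaway directory to show the shape of the result
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "session_1"))
    open(os.path.join(root, "session_1", "trace.csv"), "w").close()
    open(os.path.join(root, "readme.txt"), "w").close()
    tree = list_tree(root)

print(sorted(tree))
print(sorted(tree["session_1"]))
```

Note that the whole tree is walked up front, so on a large shared drive the dropdown construction may take a while.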


In [None]:
def update_dropdowns(change: object, dropdown: object, box: object, filePath: dict) -> None:
  '''
  This function updates the dropdowns based on the user's selection.

  Parameters:
    change (object): An object representing the change event.
    dropdown (object): A Dropdown object.
    box (object): A container for the dropdowns.
    filePath (dict): A dictionary with a path key representing the selected file path.

  Returns:
      None
  '''
  # Get the selected option
  selected = change.new

  # If the selected option is a dropdown, display it
  if isinstance(selected, widgets.Dropdown):
    selected_index = list(box.children).index(dropdown)

    # Remove the child dropdowns
    for child in box.children[selected_index+1:]:
      child.value = None
    box.children = box.children[:selected_index+1]

    # Add the selected dropdown to the container
    box.children += (selected,)

    # Listen for changes to the new dropdown
    selected.observe(lambda change: update_dropdowns(change, selected, box, filePath), names='value')

    with output_widget:
      output_widget.clear_output()
      print("\033[38;2;255;165;0mSelect a file from the directory\033[0m")

  # If the selected option is a file, print a message
  else:
    with output_widget:
      output_widget.clear_output()
      print("\033[38;2;0;255;0mFile selected successfully.\033[0m")
      filePath['path'] = change.new

#### **File locator**

In [None]:
def select_file(directory:str, file_path: dict, output_widget: widgets.Output):
  """
  Displays a dropdown menu with file paths from the given directory and listens for changes to it.
  When a file is selected, it updates the file path and clears the output widget.

  Args:
    directory (str): The directory path to create the dropdown for.
    file_path (dict): A dictionary containing the file path to display and update.
    output_widget (widgets.Output): The output widget to clear after a file is selected.
  """
  # Create the dropdown menu for the path
  top_dropdown = create_dropdown(directory)

  # Add the dropdown menu to a Box container
  box = widgets.VBox(children=[top_dropdown])

  # Listen for changes to the dropdown menu
  top_dropdown.observe(lambda change: update_dropdowns(change, top_dropdown, box, file_path), names='value')

  # Display the dropdown menus
  display(box, output_widget)

  # Clear the output widget and print a message
  with output_widget:
    output_widget.clear_output()
    print("\033[38;2;255;165;0mSelect a file from the directory\033[0m")

#### **Extension validation**

In [None]:
def check_extension(path: str, ext: str) -> bool:
  """
  Checks if the file extension of the given file path matches the specified extension.

  Parameters:
    path (str): The file path.
    ext (str): The desired file extension.

  Returns:
    bool: True if the file extension matches the desired extension, False otherwise.
  """
  # Check if the file path ends with the specified extension
  if not path.endswith(ext):
    return False
  else:
    # Print message indicating that the correct file has been selected
    print("\033[38;2;0;255;0m", ext, " file selected.\033[0m")
    return True


In [None]:
def validate_extension(file: dict) -> None:
  """
  Validates if the file extension is valid for visualization purposes.

  Parameters:
    file: A dictionary that contains the file path and other information about the file.

  Returns:
    None
  """
  # Extract the file path from the dictionary
  file_path = file['path']

  # Check if the file extension is .csv
  if check_extension(file_path, '.csv'):
    print("The selected file can be used for Wireshark traces visualization")

  # Check if the file extension is .txt or .json
  elif check_extension(file_path, '.txt') or check_extension(file_path, '.json'):
    print("The selected file can be used for WebRTC statistics visualization")

  # If the file extension is not .csv, .txt or .json, display an error message
  else:
    print("\033[38;2;255;0;0mError: This file can not be used. Please select a .csv, .json or .txt file.\033[0m")
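
The extension rules above can also be expressed as a lookup table, which keeps the decision separate from the printing. This is only a sketch: `analysis_for` and `ANALYSIS_BY_EXTENSION` are hypothetical names, and unlike the notebook's `endswith` check this version lowercases the extension, so `.CSV` is accepted too.

```python
import os

# Extension-to-analysis mapping taken from the messages printed by the notebook
ANALYSIS_BY_EXTENSION = {
    ".csv": "Wireshark traces",
    ".txt": "WebRTC statistics",
    ".json": "WebRTC statistics",
}


def analysis_for(path: str):
    """Return the analysis a file can feed, or None for unsupported extensions."""
    extension = os.path.splitext(path)[1].lower()
    return ANALYSIS_BY_EXTENSION.get(extension)


# Hypothetical file names
for name in ["trace.csv", "stats.json", "notes.pdf"]:
    print(name, "->", analysis_for(name))
```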


#### **File path getter**

In [None]:
def get_file_path(file: dict, directory: str) -> str:
  """
  Prints the name of the selected file and returns its path.

  Parameters:
    file (dict): A dictionary object that contains the path of the file.
    directory (str): A string that represents the root directory path (currently unused).

  Returns:
    A string that represents the full path of the selected file.
  """
  file_path = file['path']  # Get the path of the file from the dictionary object.
  selected_file = os.path.basename(file_path)  # Extract the selected file name.
  print("\033[38;2;255;165;0mSelected file: \033[0m", selected_file)
  return file_path

In [None]:
def validate_and_retrieve_file_path(file: dict, directory: str) -> None:
  """
  Validate the file path and extension and retrieve the file path.

  Parameters:
    file (dict): A dictionary containing the path of the file.
    directory (str): The directory where the file is located.

  Returns:
    None
  """
  if file is not None:
    if file.get('path') is not None:
      validate_extension(file)
      file_path = get_file_path(file, directory)
      print("\033[38;2;255;165;0mFile Path: \033[0m", file_path[len(directory):])
    else:
      print("\033[38;2;255;165;0mPlease locate a file. \033[0m")
  else:
    print("\033[38;2;255;165;0mPlease run the file locator code. \033[0m")


## **Wireshark**

### **Wireshark Traces Load**

#### **Load CSV file**

In [None]:
def load_csv_file(file_path: str, directory: str) -> pd.DataFrame:
  """
  Load a CSV file into a pandas DataFrame.

  Parameters:
    file_path (str): The relative or absolute path to the CSV file.
    directory (str): The base path to the directory containing the file.

  Returns:
    A pandas DataFrame with the contents of the CSV file, or None if the file could not be loaded.
  """
  # Combine the base path (if any) with the file path to get the complete path
  complete_path = os.path.join(directory, file_path)

  df = None

  # Detect the encoding of the file
  try:
      with open(complete_path, 'rb') as f:
        encoding_detection = chardet.detect(f.read())

      # Load the CSV file into a pandas DataFrame - set delimiter and that first row is the header
      df = pd.read_csv(complete_path, encoding=encoding_detection['encoding'], delimiter='\t', header=0)
  except FileNotFoundError:
      print(f"\033[38;2;255;0;0mError: The selected file \033[0m '{file_path}'\033[38;2;255;0;0m does not exist.\033[0m")
  except UnicodeDecodeError:
      print(f"\033[38;2;255;0;0mError: The encoding of the selected file \033[0m'{file_path}'\033[38;2;255;0;0m could not be detected.\033[0m")

  return df

#### **Dataframe format validation**

In [None]:
def validate_wireshark_data(df: pd.DataFrame, file_path: str) -> bool:
  """
  Validates if the input DataFrame has all the required columns.

  Parameters:
    df (pandas.DataFrame): Input DataFrame to be validated.
    file_path (str): File path of the input DataFrame.

  Returns:
    bool: True if all the required columns are present, else False.
  """

  # List of required columns
  required_columns = [
    "ip.src",
    "ip.dst",
    "frame.time_relative",
    "frame.time_delta",
    "frame.len",
    "udp.length",
    "_ws.col.Protocol",
    "_ws.col.Info"
  ]

  # Check if all the required columns are present in the DataFrame
  if not all(col in df.columns for col in required_columns):
    print(
      "\033[38;2;255;0;0mError: The selected file \033[0m",
      file_path,
      " \033[38;2;255;0;0mdoes not have all the required columns.\033[0m"
    )
    print("\033[38;2;255;0;0mRequired columns:", required_columns, "\033[0m")
    return False
  else:
    print(
      "\033[38;2;0;255;0mThe selected file\033[0m",
      file_path,
      " \033[38;2;0;255;0mhas all the required columns and will be considered.\033[0m"
    )
    return True
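
When a file is rejected, it can help to know which columns are missing rather than only seeing the full required list. A minimal sketch of that check follows; `missing_columns` is a hypothetical helper, and the example header is an assumed export that lacks the two column-based fields.

```python
# Same list as in validate_wireshark_data
REQUIRED_COLUMNS = [
    "ip.src",
    "ip.dst",
    "frame.time_relative",
    "frame.time_delta",
    "frame.len",
    "udp.length",
    "_ws.col.Protocol",
    "_ws.col.Info",
]


def missing_columns(columns) -> list:
    """Return the required columns that are absent, in the order they are required."""
    present = set(columns)
    return [column for column in REQUIRED_COLUMNS if column not in present]


# Assumed header of an export made without the Protocol and Info columns
header = ["ip.src", "ip.dst", "frame.time_relative", "frame.time_delta", "frame.len", "udp.length"]
print(missing_columns(header))
```

With a pandas DataFrame, the same function can be called as `missing_columns(df.columns)`.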


#### **Filter Downlink and Uplink traffic**


In [None]:
def get_DL_UL_df(df: pd.DataFrame, server_ip: str, client_ip: str) -> tuple:
  """
  Filters the input DataFrame for downlink and uplink traffic.

  Parameters:
    df (pandas.DataFrame): Input DataFrame containing network traffic data.
    server_ip (str): IP address of the server.
    client_ip (str): IP address of the client.

  Returns:
    tuple: A tuple of two DataFrames, where the first DataFrame is the downlink traffic
            (from server to client) and the second DataFrame is the uplink traffic
            (from client to server).
  """
  df_copy = df.copy() # create a copy of the original DataFrame to avoid modifying it
  # Identify the first packet in the communication between server and client
  first_packet = df_copy[((df_copy['ip.src'] == server_ip) & (df_copy['ip.dst'] == client_ip)) | ((df_copy['ip.src'] == client_ip) & (df_copy['ip.dst'] == server_ip))].iloc[0]


  # Calculate the time offset
  time_offset = first_packet['frame.time_relative']


  # Adjust the time relative of all packets in the DataFrame
  df_copy['frame.time_relative'] -= time_offset


  # Filter the DataFrame for downlink traffic
  df_dl = df_copy[(df_copy['ip.src'] == server_ip) & (df_copy['ip.dst'] == client_ip)]

  # Filter the DataFrame for uplink traffic
  df_ul = df_copy[(df_copy['ip.src'] == client_ip) & (df_copy['ip.dst'] == server_ip)]

  # Return the filtered DataFrames as a tuple
  return df_dl, df_ul
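
The following toy example sketches the two steps performed above: re-zeroing the time axis at the first packet exchanged between server and client, then splitting by direction. The IP addresses are the ones from the Variables cell; the third address and all timestamps are made up for illustration.

```python
import pandas as pd

server_ip, client_ip = "192.168.50.185", "192.168.50.128"

# Toy capture: the first row is unrelated traffic from a hypothetical third host
packets = pd.DataFrame({
    "ip.src": ["192.168.50.1", server_ip, client_ip, server_ip],
    "ip.dst": [client_ip, client_ip, server_ip, client_ip],
    "frame.time_relative": [0.5, 2.0, 2.1, 2.4],
})

is_dl = (packets["ip.src"] == server_ip) & (packets["ip.dst"] == client_ip)
is_ul = (packets["ip.src"] == client_ip) & (packets["ip.dst"] == server_ip)

# Re-zero time at the first server/client packet, then split by direction
offset = packets.loc[is_dl | is_ul, "frame.time_relative"].iloc[0]
packets["frame.time_relative"] -= offset
dl_times = packets.loc[is_dl, "frame.time_relative"].round(3).tolist()
ul_times = packets.loc[is_ul, "frame.time_relative"].round(3).tolist()
print("downlink:", dl_times)
print("uplink:", ul_times)
```

The unrelated packet at 0.5 s ends up with a negative time and belongs to neither direction. If a capture contains no server/client packets at all, `.iloc[0]` raises an `IndexError`, so the IP addresses should be checked first.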


#### **Filter by time**

In [None]:
def get_time_filtered_dataset(df: pd.DataFrame, start_time: float, end_time: float) -> pd.DataFrame:
  """
  Returns a filtered dataframe with rows between the specified start and end times.

  Parameters:
    df (pd.DataFrame): Input dataframe containing timestamp information.
    start_time (float): The start time in seconds.
    end_time (float): The end time in seconds.

  Returns:
    pd.DataFrame: A filtered dataframe containing rows within the specified time range.
  """
  df_copy = df.copy() # create a copy of the original DataFrame to avoid modifying it
  filtered_df = df_copy[(df_copy["frame.time_relative"] >= start_time) & (df_copy["frame.time_relative"] <= end_time)] # filter the rows that fall within the frame.time_relative range
  return filtered_df


#### **Filter SRTP type**

In [None]:
def get_srtp_type(df: pd.DataFrame) -> pd.DataFrame:
  """
  Checks whether each SRTP packet carries audio or video, based on the RTP payload
  type found in the '_ws.col.Info' column, and replaces the value in the
  '_ws.col.Protocol' column with "SRTP Audio" or "SRTP Video" accordingly.

  Parameters:
    df (Pandas DataFrame): Input DataFrame to be modified

  Returns:
    df (Pandas DataFrame): Copy of the DataFrame with '_ws.col.Protocol' values replaced.
    The '_ws.col.Info' column is kept.
  """
  df_copy = df.copy() # create a copy of the original DataFrame to avoid modifying it
  # Replace SRTP values in '_ws.col.Protocol' column based on type
  df_copy.loc[(df_copy["_ws.col.Protocol"]=="SRTP") &
          (df_copy["_ws.col.Info"].str.contains("PT=DynamicRTP-Type-96")),
          "_ws.col.Protocol"] = "SRTP Audio"
  df_copy.loc[(df_copy["_ws.col.Protocol"] == "SRTP") &
        (df_copy["_ws.col.Info"].str.contains("PT=DynamicRTP-Type-104") |
          df_copy["_ws.col.Info"].str.contains("PT=DynamicRTP-Type-102")|
          df_copy["_ws.col.Info"].str.contains("PT=DynamicRTP-Type-98")),
        "_ws.col.Protocol"] = "SRTP Video"

  # The '_ws.col.Info' column is intentionally kept: the video frame grouping functions read it

  # Return modified DataFrame
  return df_copy
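
The relabeling rule can be sketched per packet in plain Python. The payload type numbers (96 for audio; 98, 102 and 104 for video) come from the function above; the helper name `label_srtp`, the regular expression and the sample Info strings are assumptions made for this example.

```python
import re

# Dynamic RTP payload types as used in get_srtp_type
AUDIO_PAYLOAD_TYPES = {96}
VIDEO_PAYLOAD_TYPES = {98, 102, 104}

PT_PATTERN = re.compile(r"PT=DynamicRTP-Type-(\d+)")


def label_srtp(protocol: str, info: str) -> str:
    """Return 'SRTP Audio' or 'SRTP Video' for SRTP packets, else the protocol unchanged."""
    if protocol != "SRTP":
        return protocol
    match = PT_PATTERN.search(info)
    if match is None:
        return protocol
    payload_type = int(match.group(1))
    if payload_type in AUDIO_PAYLOAD_TYPES:
        return "SRTP Audio"
    if payload_type in VIDEO_PAYLOAD_TYPES:
        return "SRTP Video"
    return protocol


# Hypothetical Info strings in the style of Wireshark's RTP summary
audio_info = "PT=DynamicRTP-Type-96, SSRC=0x1A2B3C4D, Seq=100, Time=48000"
video_info = "PT=DynamicRTP-Type-102, SSRC=0x5E6F7A8B, Seq=2001, Time=90000, Mark"
print(label_srtp("SRTP", audio_info))
print(label_srtp("SRTP", video_info))
print(label_srtp("STUN", "Binding Request"))
```

Dynamic payload types are negotiated per session, so these numbers are specific to the captures analysed here and may need adjusting for other setups.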


#### **Get traffic dataframe**

In [None]:
def get_traffic_df(df: pd.DataFrame, seconds: float) -> pd.DataFrame:
  """
  Resamples a DataFrame of packet capture data to a specified time interval and calculates the sum of packet lengths.

  Parameters:
    df (pd.DataFrame): DataFrame containing packet capture data.
    seconds (float): Time interval in seconds to resample the packets.

  Returns:
    traffic_df (pd.DataFrame): DataFrame with resampled packet data and calculated traffic.
  """

  df_copy = df.copy() # create a copy of the original DataFrame to avoid modifying it
  # Re-zero the time column and convert it to datetime format so that it can be resampled
  min_time = df_copy['frame.time_relative'].min()
  df_copy['frame.time_relative'] = df_copy['frame.time_relative'] - min_time
  df_copy['frame.time_relative'] = pd.to_datetime(df_copy['frame.time_relative'], unit='s')
  # Resample the packets to the specified time interval and calculate the sum of the packet lengths
  resample_rule = pd.Timedelta(seconds=seconds)
  traffic_df = df_copy.set_index('frame.time_relative').resample(resample_rule).agg({'frame.len': 'sum'}).reset_index()
  # Calculate traffic in Mbps (bits per interval divided by the interval length) and elapsed time in seconds
  traffic_df['traffic'] = traffic_df['frame.len'] * 8 / (10**6 * seconds)
  traffic_df['seconds'] = (traffic_df['frame.time_relative'] - traffic_df['frame.time_relative'].iloc[0]).dt.total_seconds()

  return traffic_df
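
As a worked example of the unit conversion, the sketch below bins packets by time, sums the frame lengths in bytes, and converts each bin to Mbit/s by multiplying by 8 and dividing by 10^6 and by the bin length. It is a plain-Python stand-in for the resampling, not the notebook's code: `throughput_mbps` is a hypothetical helper, the packet values are invented, and times are assumed to start at zero.

```python
from collections import defaultdict


def throughput_mbps(packets, interval_s: float = 1.0) -> dict:
    """Sum frame lengths (bytes) per time bin and convert each bin to Mbit/s."""
    bytes_per_bin = defaultdict(int)
    for time_s, frame_len in packets:
        bytes_per_bin[int(time_s // interval_s)] += frame_len
    return {
        bin_index: total_bytes * 8 / (1e6 * interval_s)
        for bin_index, total_bytes in sorted(bytes_per_bin.items())
    }


# (time in seconds, frame length in bytes); invented values
packets = [(0.10, 125_000), (0.90, 125_000), (1.20, 500_000)]
rates = throughput_mbps(packets)
print(rates)
```

Here the first second carries 250,000 bytes, which is 2 Mbit/s, and the second carries 500,000 bytes, which is 4 Mbit/s. Unlike pandas resampling, this sketch does not emit empty bins for intervals without packets.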


#### **Group dataframes by time and protocol**

In [None]:
def group_dataframes(df_list: List[pd.DataFrame], grouping_time: float) -> List[pd.DataFrame]:
  """
  Groups DataFrames based on the time relative and protocol column.

  Parameters:
    df_list: A list of DataFrames to group.
    grouping_time (float): Time interval for grouping the dataframes.

  Returns:
    A list of grouped DataFrames.
  """
  grouped_dfs = []

  # Loop over each DataFrame in the list
  for df in df_list:

    df_copy = df.copy() # create a copy of the original DataFrame to avoid modifying it
    # Sort DataFrame by protocol and frame time
    df_copy = df_copy.sort_values(by=['_ws.col.Protocol', 'frame.time_relative'])

    # Determine the groups based on the protocol and frame time
    groups = (df_copy['_ws.col.Protocol'] != df_copy['_ws.col.Protocol'].shift()) | \
              (df_copy['frame.time_relative'].diff() > grouping_time)

    # Group the DataFrame by the calculated groups
    grouped_df = df_copy.groupby(groups.cumsum())
    # Compute the number of packets for each group
    num_pack = grouped_df.count()['ip.src'].rename('num.packets')

    # Compute the mean of the frame length and udp length for each group.
    mean_frame_lengths = grouped_df['frame.len'].mean().rename('group.frame.len.mean')
    mean_udp_lengths = grouped_df['udp.length'].mean().rename('group.udp.length.mean')

    # Compute the sum of the frame length and udp length for each group.
    tot_frame_lengths = grouped_df['frame.len'].sum().rename('group.frame.len.tot')
    tot_udp_lengths = grouped_df['udp.length'].sum().rename('group.udp.length.tot')

    # Compute the start and end times for each group.
    start_times = grouped_df['frame.time_relative'].min().rename('start.time')
    end_times = grouped_df['frame.time_relative'].max().rename('end.time')

    # Compute the difference between start and end times for each group.
    group_time = (end_times - start_times)*1000

    # Compute the average time between packets in the group
    avg_time_diff = (group_time / (num_pack - 1))
    avg_time_diff = avg_time_diff.rename('avg.time.between.packets')


    # Combine the columns and reset the index.
    grouped_df = pd.concat([grouped_df.first()[['ip.src', 'ip.dst', '_ws.col.Protocol', 'frame.time_relative', '_ws.col.Info']],
                            num_pack, mean_frame_lengths, mean_udp_lengths, tot_frame_lengths, tot_udp_lengths, start_times, end_times ], axis=1)
    grouped_df['group.time'] =  group_time
    grouped_df['avg.time.between.packets'] = avg_time_diff

    # Append the grouped DataFrame to the list of grouped DataFrames
    grouped_dfs.append(grouped_df.reset_index(drop=True))

  return grouped_dfs
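
The grouping relies on a common pandas idiom: mark every row that starts a new group with `True`, then take the cumulative sum of that boolean column to obtain a group id. The sketch below applies it to a toy trace; the threshold and packet values are invented, and only two of the aggregates computed above are reproduced.

```python
import pandas as pd

grouping_time = 0.001  # seconds; hypothetical threshold for this example

packets = pd.DataFrame({
    "_ws.col.Protocol": ["SRTP Video"] * 4 + ["STUN"],
    "frame.time_relative": [0.0000, 0.0004, 0.0100, 0.0103, 0.0104],
    "frame.len": [1200, 1200, 1200, 600, 100],
}).sort_values(["_ws.col.Protocol", "frame.time_relative"])

# A row starts a new group when the protocol changes or the gap exceeds the threshold
starts_group = (
    (packets["_ws.col.Protocol"] != packets["_ws.col.Protocol"].shift())
    | (packets["frame.time_relative"].diff() > grouping_time)
)
group_id = starts_group.cumsum()

summary = packets.groupby(group_id).agg(
    num_packets=("frame.len", "size"),
    total_len=("frame.len", "sum"),
)
print(summary)
```

The four video packets split into two bursts of two packets because of the 9.6 ms gap, and the STUN packet forms its own group. With the default `grouping_time = 0.000` set in the Variables cell, any positive gap starts a new group. A group with a single packet has `num.packets - 1 == 0`, so its `avg.time.between.packets` is NaN.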


#### **Get wireshark dataframes**

In [None]:
def get_dataframes(files_path: List[str], directory: str, server_ip: str, client_ip: str,
                   complete_trace_start_time: float, complete_trace_end_time: float, specific_portion_start_time: float,
                   specific_portion_end_time: float) -> tuple:
  """
  Given a list of file paths, generates a set of pandas dataframes for each file.

  Parameters:
    files_path (List[str]): A list of file paths.
    directory (str): The directory containing the files.
    server_ip (str): The server's IP address.
    client_ip (str): The client's IP address.
    complete_trace_start_time (float): The start time, in seconds, of the analysis interval.
    complete_trace_end_time (float): The end time, in seconds, of the analysis interval.
    specific_portion_start_time (float): The start of the zoom interval, in seconds after complete_trace_start_time.
    specific_portion_end_time (float): The end of the zoom interval, in seconds after complete_trace_start_time.

  Returns:
    A tuple containing the following lists:
    - df_list: A list of all dataframes.
    - df_dl_list: A list of all downlink dataframes.
    - df_ul_list: A list of all uplink dataframes.
    - df_dl_z_list: A list of all downlink dataframes filtered by time.
    - df_ul_z_list: A list of all uplink dataframes filtered by time.
    - files_path_list: A list of file paths that were used to generate the dataframes.
  """
  df_list = []
  df_dl_list = []
  df_ul_list = []
  df_dl_z_list = []
  df_ul_z_list = []
  files_path_list = []

  for file_path in files_path:
      # Call the 'run_wireshark_traces' function to generate the dataframes.
      df, df_dl, df_ul, df_dl_z, df_ul_z = run_wireshark_traces(file_path, directory, server_ip,
                                                                client_ip, complete_trace_start_time , complete_trace_end_time ,
                                                                specific_portion_start_time , specific_portion_end_time )

      # Append the dataframes to their respective lists.
      if df is not None:
          df_list.append(df)
          df_dl_list.append(df_dl)
          df_ul_list.append(df_ul)
          df_dl_z_list.append(df_dl_z)
          df_ul_z_list.append(df_ul_z)

          # Append the file path to the list.
          files_path_list.append(file_path)
  # Return the lists of dataframes and file paths as a tuple.
  return df_list, df_dl_list, df_ul_list, df_dl_z_list, df_ul_z_list, files_path_list

In [None]:
def run_wireshark_traces(file_path: str, directory: str, server_ip: str, client_ip: str,
                        complete_trace_start_time: float, complete_trace_end_time: float, specific_portion_start_time: float,
                        specific_portion_end_time: float) -> tuple:
  """
  Reads a CSV file and extracts various DataFrames.

  Parameters:
    file_path (str): The name of the CSV file.
    directory (str): The path to the directory containing the CSV file.
    server_ip (str): The IP address of the server.
    client_ip (str): The IP address of the client.
    complete_trace_start_time  (float): The start time of the data range to extract.
    complete_trace_end_time  (float): The end time of the data range to extract.
    specific_portion_start_time (float): The start of the zoom interval, in seconds after complete_trace_start_time.
    specific_portion_end_time (float): The end of the zoom interval, in seconds after complete_trace_start_time.

  Returns:
    A tuple of five DataFrames:
      1. The original DataFrame loaded from the CSV file.
      2. The DataFrame containing the downlink data.
      3. The DataFrame containing the uplink data.
      4. The DataFrame containing only the downlink data filtered by time range (zoom).
      5. The DataFrame containing only the uplink data filtered by time range (zoom).
  """

  # Load the CSV file into a DataFrame
  df = load_csv_file(file_path, directory)

  # If the file could be loaded, proceed
  if df is not None:
    # Validate the Wireshark data in the DataFrame
    if validate_wireshark_data(df, file_path):
      # Label SRTP packets as audio or video
      df = get_srtp_type(df)
      # Split the DataFrame into separate downlink and uplink DataFrames
      df_dl, df_ul = get_DL_UL_df(df, server_ip, client_ip)
      df_dl = get_time_filtered_dataset(df_dl, complete_trace_start_time , complete_trace_end_time )
      df_ul = get_time_filtered_dataset(df_ul, complete_trace_start_time , complete_trace_end_time )

      # Filter the downlink and uplink DataFrames by time range
      df_dl_z = get_time_filtered_dataset(df_dl, complete_trace_start_time+specific_portion_start_time , complete_trace_start_time+specific_portion_end_time )
      df_ul_z = get_time_filtered_dataset(df_ul, complete_trace_start_time+specific_portion_start_time , complete_trace_start_time+specific_portion_end_time )

      # Return all five DataFrames
      return df, df_dl, df_ul, df_dl_z, df_ul_z

    else:
      # If the Wireshark data is invalid, print an error message and return None for all DataFrames
      print("\033[38;2;255;0;0mError: The selected file \033[0m", file_path, " \033[38;2;255;0;0mwill not be considered.\033[0m")
      return None, None, None, None, None

  else:
    # If the file could not be loaded, print an error message and return None for all DataFrames
    print("\033[38;2;255;0;0mError: The selected file \033[0m", file_path, " \033[38;2;255;0;0mwill not be considered.\033[0m")
    return None, None, None, None, None

In [None]:
def get_grouped_dataframes(df_dl_list: List[pd.DataFrame], df_ul_list: List[pd.DataFrame],
                           df_dl_z_list: List[pd.DataFrame], df_ul_z_list: List[pd.DataFrame],
                           grouping_time: float) -> tuple:
  """
  Groups dataframes by the protocol and frame.time_relative columns.

  Parameters:
    df_dl_list (List[pd.DataFrame]): List of pandas dataframes containing downlink data
    df_ul_list (List[pd.DataFrame]): List of pandas dataframes containing uplink data
    df_dl_z_list (List[pd.DataFrame]): List of pandas dataframes containing downlink data filtered by time (zoom).
    df_ul_z_list (List[pd.DataFrame]): List of pandas dataframes containing uplink data filtered by time (zoom).
    grouping_time (float): Time interval for grouping the dataframes.

  Returns:
    A tuple of four lists of grouped dataframes: downlink, uplink, downlink zoom and uplink zoom.
  """
  # Group the dataframes
  df_grouped_dl_list = group_dataframes(df_dl_list, grouping_time)
  df_grouped_ul_list = group_dataframes(df_ul_list, grouping_time)
  df_grouped_dl_z_list = group_dataframes(df_dl_z_list, grouping_time)
  df_grouped_ul_z_list = group_dataframes(df_ul_z_list, grouping_time)

  # Return the grouped dataframes as a tuple
  return df_grouped_dl_list, df_grouped_ul_list, df_grouped_dl_z_list, df_grouped_ul_z_list

In [None]:
def get_traffic_dataframes(df_dl_list: List[pd.DataFrame] , df_ul_list: List[pd.DataFrame]) -> tuple:
  """
  Get the traffic dataframes for the given downlink and uplink dataframes.

  Parameters:
    df_dl_list (List[pd.DataFrame]): A list of downlink dataframes.
    df_ul_list (List[pd.DataFrame]): A list of uplink dataframes.

  Returns:
    A tuple of traffic dataframes for the given downlink and uplink dataframes.
  """
  # Get the traffic dataframes for downlink dataframes
  df_traffic_dl_list = [get_traffic_df(df_dl, 1) for df_dl in df_dl_list]

  # Get the traffic dataframes for uplink dataframes
  df_traffic_ul_list = [get_traffic_df(df_ul, 1) for df_ul in df_ul_list]

  # Return the traffic dataframes tuple
  return df_traffic_dl_list, df_traffic_ul_list


#### **SRTP Video Frame Grouping**

In [None]:
def extract_time(info: str) -> str:
  """
  Extracts the time value from the given info string.

  Parameters:
    info (str): Info string containing time value.

  Returns:
    str: Extracted time value.
  """
  info_values = info.split(",")
  for value in info_values:
    if "Time" in value:
      return value
  return ""

def group_vid_dataframes_by_time_info(df_list: List[pd.DataFrame]) -> List[pd.DataFrame]:
  """
  Groups the SRTP Video packets of each DataFrame by the time value found in the Info column.

  Parameters:
    df_list: A list of DataFrames to group.

  Returns:
    A list of grouped DataFrames.
  """
  grouped_dfs = []
  # Loop over each DataFrame in the list
  for df in df_list:
      df_copy = df.copy()  # create a copy of the original DataFrame to avoid modifying it

      # Keep only SRTP Video packets (copied so that adding a column below does not act on a view)
      df_copy = df_copy[df_copy['_ws.col.Protocol']=='SRTP Video'].copy()
      # Extract time from info column
      df_copy['_ws.col.Time'] = df_copy['_ws.col.Info'].apply(extract_time)

      # Group the DataFrame based on the extracted time
      grouped_df = df_copy.groupby('_ws.col.Time')
      # Compute the number of packets for each group
      num_pack = grouped_df.count()['ip.src'].rename('num.packets')

      # Compute the mean of the frame length and udp length for each group.
      mean_frame_lengths = grouped_df['frame.len'].mean().rename('group.frame.len.mean')
      mean_udp_lengths = grouped_df['udp.length'].mean().rename('group.udp.length.mean')

      # Compute the sum of the frame length and udp length for each group.
      tot_frame_lengths = grouped_df['frame.len'].sum().rename('group.frame.len.tot')
      tot_udp_lengths = grouped_df['udp.length'].sum().rename('group.udp.length.tot')

      # Compute the start and end times for each group.
      start_times = grouped_df['frame.time_relative'].min().rename('start.time')
      end_times = grouped_df['frame.time_relative'].max().rename('end.time')

      # Compute the difference between start and end times for each group.
      group_time = (end_times - start_times)*1000

      # Compute the average time between packets in the group
      avg_time_diff = (group_time / (num_pack - 1))
      avg_time_diff = avg_time_diff.rename('avg.time.between.packets')


      # Combine the columns and reset the index.
      grouped_df = pd.concat([grouped_df.first()[['ip.src', 'ip.dst', '_ws.col.Protocol', 'frame.time_relative']],
                              num_pack, mean_frame_lengths, mean_udp_lengths, tot_frame_lengths, tot_udp_lengths, start_times, end_times], axis=1)
      grouped_df['group.time'] =  group_time
      grouped_df['avg.time.between.packets'] = avg_time_diff

      # Append the grouped DataFrame to the list of grouped DataFrames
      grouped_dfs.append(grouped_df.reset_index(drop=True))
  return grouped_dfs
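
Packets that carry the same video frame share one RTP timestamp, which is why grouping on the `Time` token of the Info column yields one group per frame. The sketch below repeats `extract_time` so it runs on its own and applies it to a sample Info string; the string is an assumed example in the style of Wireshark's RTP summary, not taken from the dataset.

```python
def extract_time(info: str) -> str:
    """Same logic as the notebook's helper, repeated so the example runs alone."""
    for value in info.split(","):
        if "Time" in value:
            return value
    return ""


# Hypothetical Info string
info = "PT=DynamicRTP-Type-102, SSRC=0x1A2B3C4D, Seq=2001, Time=90000, Mark"
raw = extract_time(info)
rtp_timestamp = int(raw.split("=")[1])
print(repr(raw))
print(rtp_timestamp)
```

The returned token keeps its leading space because the Info string is split on commas only. That is harmless for grouping, since every token has the same prefix, but the value needs parsing, as in the last lines, before it can be used as a number.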

In [None]:
def group_dataframes_temp(df_list: List[pd.DataFrame], grouping_time: float) -> List[pd.DataFrame]:
  """
  Groups SRTP Video packets by protocol, Info time value and inter-packet gap, so that each group only holds packets from the same video frame.

  Parameters:
    df_list: A list of DataFrames to group.
    grouping_time (float): Time interval for grouping the dataframes.

  Returns:
    A list of grouped DataFrames.
  """
  grouped_dfs = []

  # Loop over each DataFrame in the list
  for df in df_list:

    df_copy = df.copy() # create a copy of the original DataFrame to avoid modifying it
    # Keep only SRTP Video packets (copied so that adding a column below does not act on a view)
    df_copy = df_copy[df_copy['_ws.col.Protocol']=='SRTP Video'].copy()
    # Extract time from info column
    df_copy['_ws.col.Time'] = df_copy['_ws.col.Info'].apply(extract_time)

    df_copy = df_copy.sort_values(by=['_ws.col.Protocol', '_ws.col.Time', 'frame.time_relative'])

    # Determine the groups based on the protocol and frame time
    groups = (df_copy['_ws.col.Protocol'] != df_copy['_ws.col.Protocol'].shift()) | \
              (df_copy['frame.time_relative'].diff() > grouping_time) | \
              (df_copy['_ws.col.Time'] != df_copy['_ws.col.Time'].shift())

    # Group the DataFrame by the calculated groups
    grouped_df = df_copy.groupby(groups.cumsum())
    # Compute the number of packets for each group
    num_pack = grouped_df.count()['ip.src'].rename('num.packets')

    # Compute the mean of the frame length and udp length for each group.
    mean_frame_lengths = grouped_df['frame.len'].mean().rename('group.frame.len.mean')
    mean_udp_lengths = grouped_df['udp.length'].mean().rename('group.udp.length.mean')

    # Compute the sum of the frame length and udp length for each group.
    tot_frame_lengths = grouped_df['frame.len'].sum().rename('group.frame.len.tot')
    tot_udp_lengths = grouped_df['udp.length'].sum().rename('group.udp.length.tot')

    # Compute the start and end times for each group.
    start_times = grouped_df['frame.time_relative'].min().rename('start.time')
    end_times = grouped_df['frame.time_relative'].max().rename('end.time')

    # Compute the difference between start and end times for each group (converted to ms).
    group_time = (end_times - start_times)*1000

    # Compute the average time between packets in the group (NaN for single-packet groups, 0/0)
    avg_time_diff = (group_time / (num_pack - 1))
    avg_time_diff = avg_time_diff.rename('avg.time.between.packets')


    # Combine the columns and reset the index.
    grouped_df = pd.concat([grouped_df.first()[['ip.src', 'ip.dst', '_ws.col.Protocol', 'frame.time_relative', '_ws.col.Info']],
                            num_pack, mean_frame_lengths, mean_udp_lengths, tot_frame_lengths, tot_udp_lengths, start_times, end_times ], axis=1)
    grouped_df['group.time'] =  group_time
    grouped_df['avg.time.between.packets'] = avg_time_diff

    # Append the grouped DataFrame to the list of grouped DataFrames
    grouped_dfs.append(grouped_df.reset_index(drop=True))

  return grouped_dfs


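def group_grouped_vid_dataframes_by_time_info(df_list: List[pd.DataFrame]) -> tuple:
  """
  Groups already grouped SRTP Video DataFrames by the time value extracted from the info column.

  Parameters:
    df_list (List[pd.DataFrame]): A list of grouped DataFrames to group again.

  Returns:
    tuple: (grouped_dfs, grouped_dfs_not_na). Both are lists with one DataFrame per input DataFrame;
    the second list omits the rows whose 'avg.time.between.groups' is NaN.
  """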
  grouped_dfs = []
  grouped_dfs_not_na = []
  # Loop over each DataFrame in the list
  for df in df_list:
      # Keep only the SRTP Video rows; copy so that the original DataFrame is not modified
      df_copy = df[df['_ws.col.Protocol']=='SRTP Video'].copy()
      # Extract time from info column
      df_copy['_ws.col.Time'] = df_copy['_ws.col.Info'].apply(extract_time)

      # Group the DataFrame based on the extracted time
      grouped_df = df_copy.groupby('_ws.col.Time')

      df_without_last_row = grouped_df.apply(lambda x: x.drop(x.index[-1]))
      #print(df_without_last_row['num.packets'].mean())
      #print(df_without_last_row['num.packets'].std())

      # Compute the number of groups for each extracted time
      num_groups = grouped_df.count()['ip.src'].rename('num.groups')

      start_times = grouped_df['frame.time_relative'].min().rename('start.time')
      end_times = grouped_df['frame.time_relative'].max().rename('end.time')

      # Compute the difference between start and end times for each extracted time (converted to ms).
      group_time = (end_times - start_times)*1000

      # Compute the average time between groups that share the same extracted time
      avg_time_diff = (group_time / (num_groups - 1))
      avg_time_diff = avg_time_diff.rename('avg.time.between.groups')


      # Combine the columns and reset the index.
      grouped_df = pd.concat([grouped_df.first()[['ip.src', 'ip.dst', '_ws.col.Protocol', 'frame.time_relative']],
                              num_groups, start_times, end_times], axis=1)
      grouped_df['group.time'] =  group_time
      grouped_df['avg.time.between.groups'] = avg_time_diff
      # Keep a second version without the rows where the average could not be computed
      grouped_df_not_na = grouped_df[grouped_df['avg.time.between.groups'].notna()]
      # Append both versions to their result lists
      grouped_dfs.append(grouped_df.reset_index(drop=True))
      grouped_dfs_not_na.append(grouped_df_not_na.reset_index(drop=True))
  return grouped_dfs, grouped_dfs_not_na
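
The grouping functions above share one idea: flag every row that opens a new group, then turn the flags into group ids with a cumulative sum. The sketch below illustrates this on a toy trace. The column `rtp_time` and the 10 ms threshold are assumptions made for the example; in the notebook the time value comes from `extract_time`.

```python
import pandas as pd

# Toy trace: relative arrival times (s) and a hypothetical timestamp per packet.
packets = pd.DataFrame({
    "frame.time_relative": [0.000, 0.001, 0.002, 0.017, 0.018, 0.050],
    "rtp_time":            [100,   100,   100,   200,   200,   200],
    "frame.len":           [1200,  1200,  400,   1200,  300,   900],
})
grouping_time = 0.010  # assumed threshold: a gap above 10 ms starts a new group

# A packet opens a new group when the timestamp changes or the gap is too long.
new_group = (packets["rtp_time"] != packets["rtp_time"].shift()) | \
            (packets["frame.time_relative"].diff() > grouping_time)

# The cumulative sum of the boundary flags yields one integer id per group.
group_id = new_group.cumsum()

summary = packets.groupby(group_id).agg(
    num_packets=("frame.len", "size"),
    total_len=("frame.len", "sum"),
    start=("frame.time_relative", "min"),
    end=("frame.time_relative", "max"),
)
summary["group_time_ms"] = (summary["end"] - summary["start"]) * 1000
print(summary)
```

The last packet shares its timestamp with the previous two but arrives 32 ms later, so the gap rule places it in a group of its own.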


### **Wireshark Figures build**

#### **Scatter plot**

In [None]:
def create_scatter_plot(df_dl: pd.DataFrame, df_ul: pd.DataFrame, x_col: str, y_col: str, title: str,
                        x_title: str, y_title: str, protocols: bool = False, text_show: bool = False, number_column: str = None,
                        from0: bool = False, stem: bool = False, spec_prot: List[str] = None) -> None:
  """
  Function to create a scatter plot with optional grouping by protocol and optional display of text labels.

  Parameters:
    df_dl (pd.DataFrame): The downlink data as a Pandas DataFrame.
    df_ul (pd.DataFrame): The uplink data as a Pandas DataFrame.
    x_col (str): The column name to use for the x-axis.
    y_col (str): The column name to use for the y-axis.
    title (str): The title of the plot.
    x_title (str): The title of the x-axis.
    y_title (str): The title of the y-axis.
    protocols (bool, optional): Whether to group data by protocol. Default is False.
    text_show (bool, optional): Whether to display text labels on the plot. Default is False.
    number_column (str, optional): The column name to use for the text labels. Default is None.
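    from0 (bool, optional): Whether to shift the x-axis so that it starts at 0. Default is False.
    stem (bool, optional): Whether to add vertical lines to simulate a stem plot (only applied when protocols is True). Default is False.
    spec_prot (List[str], optional): Protocols to plot (only used when protocols is True). If None, all protocols are plotted.

  Returns:
    None
  """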
  if protocols:

    dl_protocols = ['DTLSv1.2', 'DTLS', 'SRTP Audio', 'SRTP Video','STUN' ,'UDP']  # Define manually all the DL protocols so that they always have the same assigned color
    # dl_protocols = sorted(df_dl['_ws.col.Protocol'].unique()) # Get unique protocols in DL and assign a color to each - might not produce same colors in different executions
    dl_colors = px.colors.qualitative.Plotly[:len(dl_protocols)] # Plotly has 10 colors
    protocol_color_map_dl = dict(zip(dl_protocols, dl_colors))

    ul_protocols = ['DTLSv1.2', 'SRTCP', 'STUN', 'UDP'] # Define manually all the UL protocols so that they always have the same assigned color
    # ul_protocols = sorted(df_ul['_ws.col.Protocol'].unique())    # Get unique protocols in UL and assign a color to each - might not produce same colors in different executions
    ul_colors = px.colors.qualitative.Plotly[len(dl_protocols):len(dl_protocols) + len(ul_protocols)] # Plotly has 10 colors
    protocol_color_map_ul = dict(zip(ul_protocols, ul_colors))

  else:
    dataframes = ['DL', 'UL']
    dataframes_col = px.colors.qualitative.Plotly[:len(dataframes)]
    color_map_dl = dict(zip(dataframes, dataframes_col))

  # Set the mode of the plot
  if text_show:
    mode_plt = 'markers+text'
  else:
    mode_plt = 'markers' #default: markers. lines for some

  fig = go.Figure()

  if protocols:
    # Add traces for each protocol in dl and ul
    combined_df = pd.concat([df_dl, df_ul])  # Combine df_dl and df_ul
    combined_min = combined_df[x_col].min()  # Minimum value from combined data frames
    for i, df in enumerate([df_dl, df_ul]):
      #list_prot = [[ 'SRTP Video','SRTP Audio','UDP','DTLSv1.2'],[ 'SRTCP', 'UDP', 'DTLSv1.2']]
      for j, prot in enumerate(sorted(df['_ws.col.Protocol'].unique())):
        if spec_prot is not None and prot not in spec_prot:
          continue
        color = protocol_color_map_dl[prot] if i == 0 else protocol_color_map_ul[prot]
        filtered_df = df[df['_ws.col.Protocol'] == prot]

        if from0: #plot x_axis starting from 0
          # Subtract the minimum of the combined DL and UL data so that both share the same origin
          x_val = filtered_df[x_col] - combined_min
        else:
          x_val = filtered_df[x_col]

        if "ms" in x_title:
          x_val = x_val * 1000 # convert to ms without modifying the DataFrame in place

        fig.add_trace(
          go.Scatter(
            mode = mode_plt,
            x = x_val,
            y = filtered_df[y_col],
            name = f"{'DL' if i == 0 else 'UL'} {prot}",
            marker = dict(color = color),
            text = filtered_df[number_column].tolist() if number_column else None,
            textfont = dict(size=10),
            textposition = 'top center',
                    )
        )

        if stem:
          for x_v,y_v in zip(x_val,filtered_df[y_col]):
            fig.add_shape(
              type='line',
              x0=x_v,
              y0=0,
              x1=x_v,
              y1=y_v,
              line=dict(color = color),
              opacity=1
            )



  else:
    for i, df in enumerate([df_dl, df_ul]):
      name_df = 'DL' if i == 0 else 'UL'
      color = color_map_dl[name_df]
      filtered_df = df
      if from0: #plot x_axis starting from 0
        # Use the earliest value of both directions as the common origin
        x_val = filtered_df[x_col] - min(df_dl[x_col].min(), df_ul[x_col].min())
      else:
        x_val = filtered_df[x_col]

      if "ms" in x_title:
        x_val = x_val * 1000 # convert to ms without modifying the caller's DataFrame in place

      fig.add_trace(
        go.Scatter(
          mode = mode_plt,
          x = x_val,
          y = filtered_df[y_col],
          name = name_df,
          marker = dict(color = color),
          text = filtered_df[number_column].tolist() if number_column else None,
          textfont = dict(size=10),
          textposition = 'top center'
                  )
        )

  font_family = "Times New Roman"
  font_size = 20
  fig.update_layout(
      title=title,
      font=dict(family=font_family, size=font_size),
      template='plotly_white',
      xaxis=dict(title=x_title, showgrid=False), #,dtick=2
      yaxis=dict(title=y_title, showgrid=False),
      showlegend=True,
      legend=dict(
      orientation="h",
      yanchor="bottom",
      y=1.02,
      xanchor="right",
      x=1),
      autosize=False,
      width=1200, #change width
      height=450) #change height

  fig.show()
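
The `from0` shift and the millisecond conversion can be isolated from the plotting code. The following is a minimal sketch, assuming both directions should share the earliest timestamp as origin; `to_plot_axis` is a hypothetical helper that is not part of the notebook. It returns new Series, so the input DataFrames stay untouched and repeated calls give the same result.

```python
import pandas as pd

def to_plot_axis(df_dl, df_ul, x_col, from0=False, to_ms=False):
    """Hypothetical helper: return x values for DL and UL without modifying the inputs."""
    # Common origin: the earliest value found in either direction.
    origin = min(df_dl[x_col].min(), df_ul[x_col].min()) if from0 else 0.0
    scale = 1000 if to_ms else 1
    # Arithmetic on a Series creates a new object, so the DataFrames are not changed.
    return (df_dl[x_col] - origin) * scale, (df_ul[x_col] - origin) * scale

dl = pd.DataFrame({"frame.time_relative": [2.0, 2.5, 3.0]})
ul = pd.DataFrame({"frame.time_relative": [1.5, 2.25]})

x_dl, x_ul = to_plot_axis(dl, ul, "frame.time_relative", from0=True, to_ms=True)
print(x_dl.tolist(), x_ul.tolist())
```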

##### **Matplotlib stem plot**

In [None]:
def create_stem_plot(df_dl: pd.DataFrame, df_ul: pd.DataFrame, x_col: str, y_col: str, title: str,
                      x_title: str, y_title: str, protocols: bool = False, text_show: bool = False,
                     number_column: str = None, from0: bool = False, spec_prot: List[str] = None) -> None:
  """
  Function to create a Matplotlib stem plot with optional grouping by protocol.

  Parameters:
    df_dl (pd.DataFrame): The downlink data as a Pandas DataFrame.
    df_ul (pd.DataFrame): The uplink data as a Pandas DataFrame.
    x_col (str): The column name to use for the x-axis.
    y_col (str): The column name to use for the y-axis.
    title (str): The title of the plot.
    x_title (str): The title of the x-axis.
    y_title (str): The title of the y-axis.
    protocols (bool, optional): Whether to group data by protocol. Default is False.
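    text_show (bool, optional): Whether to display text labels on the plot (currently unused). Default is False.
    number_column (str, optional): The column name to use for the text labels (currently unused). Default is None.
    from0 (bool, optional): Whether to shift the x-axis so that it starts at 0 (only applied when protocols is True). Default is False.
    spec_prot (List[str], optional): Protocols to plot (only used when protocols is True). If None, all protocols are plotted.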

  Returns:
    None
  """
  if protocols:

    dl_protocols = ['DTLSv1.2','DTLS', 'SRTP Audio', 'SRTP Video','STUN' ,'UDP']  # Define manually all the DL protocols so that they always have the same assigned color
    # dl_protocols = sorted(df_dl['_ws.col.Protocol'].unique()) # Get unique protocols in DL and assign a color to each - might not produce same colors in different executions
    dl_colors = px.colors.qualitative.Plotly[:len(dl_protocols)] # Plotly has 10 colors
    protocol_color_map_dl = dict(zip(dl_protocols, dl_colors))

    ul_protocols = ['DTLSv1.2', 'SRTCP', 'STUN', 'UDP'] # Define manually all the UL protocols so that they always have the same assigned color
    # ul_protocols = sorted(df_ul['_ws.col.Protocol'].unique())    # Get unique protocols in UL and assign a color to each - might not produce same colors in different executions
    ul_colors = px.colors.qualitative.Plotly[len(dl_protocols):len(dl_protocols) + len(ul_protocols)] # Plotly has 10 colors
    protocol_color_map_ul = dict(zip(ul_protocols, ul_colors))

  else:
    dataframes = ['DL', 'UL']
    dataframes_col = px.colors.qualitative.Plotly[:len(dataframes)]
    color_map_dl = dict(zip(dataframes, dataframes_col))

  plt.figure(figsize=(20, 5))

  if protocols:
    for i, df in enumerate([df_dl, df_ul]):
      for j, prot in enumerate(sorted(df['_ws.col.Protocol'].unique())):
        if spec_prot is not None and prot not in spec_prot:
          continue
        color = protocol_color_map_dl[prot] if i == 0 else protocol_color_map_ul[prot]

        filtered_df = df[df['_ws.col.Protocol'] == prot]

        if from0: #plot x_axis starting from 0
          # Use the earliest value of both directions as the common origin
          x_val = filtered_df[x_col] - min(df_dl[x_col].min(), df_ul[x_col].min())
        else:
          x_val = filtered_df[x_col]

        if "ms" in x_title:
          x_val = x_val * 1000 # convert to ms without modifying the DataFrame in place

        plt.stem(x_val, filtered_df[y_col], linefmt=color, markerfmt=color, basefmt='k',
                label=f"{'DL' if i == 0 else 'UL'} {prot}")
  else:
    for i, df in enumerate([df_dl, df_ul]):
      name_df = 'DL' if i == 0 else 'UL'
      color = color_map_dl[name_df]
      filtered_df = df
      plt.stem(filtered_df[x_col], filtered_df[y_col], linefmt=color, markerfmt=color, basefmt='k',
                label=name_df)

  plt.title(title)
  plt.xlabel(x_title)
  plt.ylabel(y_title)
  plt.legend(loc='upper center', bbox_to_anchor=(0.5, 1.45), ncol=2)  # Move the legend to the top and outside of the plot
  plt.show()

#### **Traffic load**

In [None]:
def plot_traffic_load_over_time(traffic_df_list: List[pd.DataFrame], title: str, x_title: str,  y_title: str,
                                label_list: list, color_list: list) -> None:
  """
  Plots traffic load over time for multiple dataframes in a single plot.

  Parameters:
    traffic_df_list (list): The list of dataframes containing the traffic data to be plotted.
    title (str): The title of the plot.
    x_title (str): The title of the x-axis.
    y_title (str): The title of the y-axis.
    label_list (list): The list of labels for the traces in the plot.
    color_list (list): The list of colors for the traces in the plot.

  Returns:
    None
  """
  fig = go.Figure()

  # Add traces for each dataframe in traffic_df_list
  for i, traffic_df in enumerate(traffic_df_list):

    # Add trace to the plot
    fig.add_trace(go.Scatter(x=traffic_df['seconds'],
                              y=traffic_df['traffic'],
                              name=label_list[i] if label_list else None,
                              marker_color = color_list[i] if color_list else None,
                              yaxis='y'))

  # Update layout with title and axis labels
  fig.update_layout(title=title,
                    xaxis_title=x_title,
                    yaxis_title=y_title)

  # Display plot
  fig.show()
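
The traffic plots only require a `seconds` and a `traffic` column. How the notebook builds these DataFrames is defined elsewhere; the sketch below shows one possible way, assuming fixed-width time bins and a throughput expressed in Mbps. The toy `packets` data and the `interval` value are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy packet trace: arrival time in seconds and frame length in bytes.
packets = pd.DataFrame({
    "frame.time_relative": [0.1, 0.4, 0.9, 1.2, 1.3, 2.7],
    "frame.len":           [1250, 1250, 2500, 1250, 1250, 5000],
})
interval = 1.0  # assumed bin width in seconds

# Assign each packet to a bin and add up the bytes seen in that bin.
bins = np.floor(packets["frame.time_relative"] / interval).astype(int)
bytes_per_bin = packets.groupby(bins)["frame.len"].sum()

# bytes -> bits, divided by the bin width, scaled to Mbps.
traffic = pd.DataFrame({
    "seconds": bytes_per_bin.index.to_numpy() * interval,
    "traffic": bytes_per_bin.to_numpy() * 8 / interval / 1e6,
})
print(traffic)
```

Bins without any packet do not appear in the result; whether they should be filled with zeros depends on the analysis.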

#### **Correlation**


In [None]:
def plot_correlation(df_traffic_dl_list: List[pd.DataFrame], df_traffic_ul_list: List[pd.DataFrame], interval: float) -> None:
  """
  Plots the correlation between uplink and downlink traffic for a list of DataFrames

  Parameters:
    df_traffic_dl_list (List[pd.DataFrame]): List of DataFrames containing downlink traffic data
    df_traffic_ul_list (List[pd.DataFrame]): List of DataFrames containing uplink traffic data
    interval (float): The sampling interval in seconds

  Returns:
    None
  """

  for df_downlink, df_uplink in zip(df_traffic_dl_list, df_traffic_ul_list):
    print()
    if df_downlink.shape[0] != df_uplink.shape[0]:
      # Find the maximum value of seconds
      max_seconds_df_dl = df_downlink['seconds'].max()
      max_seconds_df_ul = df_uplink['seconds'].max()

      if max_seconds_df_ul < max_seconds_df_dl:
        # Truncate dataset to match the size based on the seconds column
        df_downlink = df_downlink.loc[df_downlink['seconds'] <= max_seconds_df_ul].reset_index(drop=True)
      else:
        # Truncate dataset to match the size based on the seconds column
        df_uplink = df_uplink.loc[df_uplink['seconds'] <= max_seconds_df_dl].reset_index(drop=True)

    # Compute the cross-correlation between the two traffic columns
    cross_corr = np.correlate(df_uplink['traffic'], df_downlink['traffic'], mode='full')

    # Compute the lag values. For np.correlate(a, v, mode='full') the lags run from -(len(v) - 1) to len(a) - 1
    lags = np.arange(-len(df_downlink)+1, len(df_uplink))

    # Find the index of the maximum cross-correlation value
    max_index = np.argmax(cross_corr)

    # Find the lag that maximizes the cross-correlation
    max_lag = lags[max_index]

    # Shift the uplink traffic column by the lag value
    df_uplink_shifted = df_uplink.shift(-max_lag)

    # Compute the correlation coefficient between the shifted uplink traffic and the downlink traffic
    correlation_coefficient_shifted = df_uplink_shifted['traffic'].corr(df_downlink['traffic'])

    # Create a scatter plot with a regression line for the shifted data
    fig = px.scatter(df_downlink, x='traffic', y=df_uplink_shifted['traffic'], trendline='ols',
                    labels={'traffic': 'Downlink Traffic', 'y': 'Uplink Traffic (Shifted)'},
                    title=f'Uplink and Downlink Traffic Correlation (Shifted by {max_lag} samples, resampled by {interval*1000} ms)')
    fig.update_traces(line=dict(color='#e5523e'))
    fig.add_annotation(x=0.5, y=0.95, text=f'Correlation coefficient: {correlation_coefficient_shifted:.4f}',
                      showarrow=False, xref='paper', yref='paper', font=dict(size=12))

    # Display the plot
    fig.show()
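
The lag estimation can be checked on synthetic data where the answer is known. In this sketch the toy uplink series repeats the downlink pattern three samples later, and the two arrays deliberately have different lengths to show how the lag axis of `np.correlate` is laid out. The signal values are made up for the example.

```python
import numpy as np

# Toy traffic series: the uplink repeats the downlink pattern 3 samples later.
downlink = np.zeros(50)
downlink[[10, 20, 35]] = [1.0, 3.0, 2.0]
uplink = np.zeros(60)                      # deliberately a different length
uplink[[13, 23, 38]] = [1.0, 3.0, 2.0]

# Full cross-correlation: c[k] = sum_n uplink[n + k] * downlink[n]
cross_corr = np.correlate(uplink, downlink, mode="full")

# For np.correlate(a, v, mode="full") the lags run from -(len(v) - 1) to len(a) - 1.
lags = np.arange(-len(downlink) + 1, len(uplink))

# The lag with the highest correlation is the estimated delay in samples.
max_lag = int(lags[np.argmax(cross_corr)])

# Shifting the uplink back by max_lag aligns it with the downlink (valid here because max_lag >= 0).
aligned_uplink = uplink[max_lag:max_lag + len(downlink)]
print(max_lag)
```

A positive lag means that the uplink pattern trails the downlink pattern by that many samples.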

#### **ECDF**

In [None]:
def create_ecdf_plot(df_list: List[pd.DataFrame], column: str, title: str, x_title: str, y_title: str, label_list: list,
                     color_list: list, factor: float = 1, diff: bool = False) -> None:
  """
  Create a plot of the empirical cumulative distribution function (ECDF) for the given column in each DataFrame
  in df_list.

  Parameters:
    df_list (List[pd.DataFrame]): The list of dataframes containing the traffic data to be plotted.
    column (str): The column name for which to create the ECDF plot.
    title (str): The title of the plot.
    x_title (str): The title of the x-axis.
    y_title (str): The title of the y-axis.
    label_list (list): The list of labels for the traces in the plot.
    color_list (list): The list of colors for the traces in the plot.
    factor (float, optional): A float representing the factor by which to scale the data in the column.
    diff (bool, optional): A boolean representing whether to plot the differences between consecutive values of the column.

  Returns:
    None
  """
  fig = go.Figure()

  for i, df in enumerate(df_list):
    values = df[column].diff() if diff else df[column]
    # Drop NaN values (diff() always produces one) so that the ECDF reaches 1
    df_sorted = np.sort(values.dropna()) * factor
    y = np.arange(1, len(df_sorted) + 1) / len(df_sorted)
    fig.add_trace(
      go.Scatter(
        x=df_sorted,
        y=y,
        name=label_list[i] if label_list else None,
        marker_color=color_list[i] if color_list else None,
        yaxis='y'
      )
    )

  fig.update_layout(
    title=title,
    xaxis_title=x_title,
    yaxis_title=y_title
  )
  font_family = "Times New Roman"
  font_size = 20
  fig.update_layout(
      font=dict(family=font_family, size=font_size),
      template='plotly_white',
      xaxis=dict(showgrid=False), #,dtick=10, ,range=[75, 95]
      yaxis=dict(showgrid=False),
      legend=dict(
      orientation="h",
      yanchor="bottom",
      y=1.02,
      xanchor="right",
      x=1)
  )

  fig.update_layout(
      autosize=False,
      width=600,
      height=400)

  fig.show()
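
The ECDF itself is a short computation: sort the samples and assign the cumulative probability i/n to the i-th smallest one. The sketch below isolates that step with NumPy; `ecdf` is a hypothetical helper and the arrival times are toy values.

```python
import numpy as np

def ecdf(values, factor=1.0):
    """Hypothetical helper: sorted x values and cumulative probabilities, ignoring NaN."""
    x = np.asarray(values, dtype=float)
    x = np.sort(x[~np.isnan(x)]) * factor   # new array, so the input is not modified
    y = np.arange(1, len(x) + 1) / len(x)   # i / n for the i-th smallest sample
    return x, y

arrival_times = np.array([0.00, 0.01, 0.03, 0.04, 0.08])  # toy timestamps in seconds
gaps = np.diff(arrival_times)                              # inter-arrival times
x, y = ecdf(gaps, factor=1000)                             # scale to ms
print(x, y)
```

Dropping NaN before computing `y` matters when the data comes from `Series.diff()`, whose first element is always NaN; otherwise the last plotted point would stay below 1.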

#### **Histogram**

In [None]:
def create_histogram_plot(df_list: List[pd.DataFrame], column: str, title: str, x_title: str, y_title: str,
                          label_list: List[str], color_list: List[str], factor: float = 1, diff: bool = False) -> None:
  """
  Create a plot of histograms for the given column in each DataFrame in df_list.

  Parameters:
    df_list (List[pd.DataFrame]): The list of dataframes containing the traffic data to be plotted.
    column (str): The column name for which to create the histogram plot.
    title (str): The title of the plot.
    x_title (str): The title of the x-axis.
    y_title (str): The title of the y-axis.
    label_list (List[str]): The list of labels for the histograms in the plot.
    color_list (List[str]): The list of colors for the histograms in the plot.
    factor (float, optional): A float representing the factor by which to scale the data in the column.
    diff (bool, optional): A boolean representing whether to plot the differences between consecutive values of the column.

  Returns:
    None
  """
  fig = go.Figure()
  for i, df in enumerate(df_list):
      values = df[column].diff() if diff else df[column]
      # Drop NaN values (diff() always produces one) and scale without modifying the data in place
      df_sorted = np.sort(values.dropna()) * factor
      fig.add_trace(
          go.Histogram(
              x=df_sorted,
              name=label_list[i] if label_list else None,
              marker_color=color_list[i] if color_list else None,
              histnorm='probability',
              histfunc='count',
              #autobinx=False,
              #xbins=dict(start=min(df_sorted), end=max(df_sorted), size=1)
          )
      )

  fig.update_layout(
      title=title,
      xaxis_title=x_title,
      yaxis_title=y_title
  )

  fig.show()
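
With `histnorm='probability'`, each bar shows the share of samples that fall into its bin, so the bar heights add up to 1. The same normalization can be reproduced with NumPy, as sketched below. The packet sizes and bin edges are toy values chosen for the example; Plotly picks its own bins unless `xbins` is set.

```python
import numpy as np

sizes = np.array([60, 60, 100, 1200, 1200, 1200, 1200, 1400])  # toy packet lengths (bytes)

# Count the samples per bin, then divide by the total number of samples.
counts, edges = np.histogram(sizes, bins=[0, 500, 1000, 1500])
probability = counts / counts.sum()
print(probability)
```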

#### **Box plot**

In [None]:
def box_plot(df_list: List[pd.DataFrame], title: str, parameter_list: List[str], column: str,
            unit: str, protocol: str = None, grouped: bool = False, factor: float = 1, diff: bool = False) -> None:
  """
  Creates a box plot for the given dataframe(s) and column(s).

  Parameters:
    df_list (List[pd.DataFrame]): The list of dataframes containing the traffic data to be plotted.
    title (str): The title of the plot.
    parameter_list (List[str]): A list of strings describing the dataframes in df_list.
    column (str): The column name to plot.
    unit (str): The unit of measurement for the column.
    protocol (str, optional): The protocol to filter by. If None, one box is drawn per protocol found in each dataframe.
    grouped (bool, optional): Whether the dataframes are grouped by protocol.
    factor (float, optional): A float representing the factor by which to scale the data in the column.
    diff (bool, optional): A boolean representing whether to plot the differences between consecutive values of the column.

  Returns:
    None
  """
  data = []
  labels = []

  if protocol:
    for i, df in enumerate(df_list):
      df_filtered = df[df['_ws.col.Protocol'] == protocol]
      if diff:
        df_sorted = np.sort(df_filtered[column].diff())
      else:
        df_sorted = np.sort(df_filtered[column])
      df_sorted = df_sorted * factor  # not in place: avoids a casting error for integer columns and a float factor
      data.append(df_sorted)
      labels.append(f"{parameter_list[i]}{protocol}")
  else:
    for i, df in enumerate(df_list):
      # Use a separate loop variable so that the protocol argument is not overwritten
      for prot in df['_ws.col.Protocol'].unique():
        df_filtered = df[df['_ws.col.Protocol'] == prot]
        if diff:
          df_sorted = np.sort(df_filtered[column].diff())
        else:
          df_sorted = np.sort(df_filtered[column])
        df_sorted = df_sorted * factor  # not in place: avoids a casting error for integer columns and a float factor
        data.append(df_sorted)
        labels.append(f"{parameter_list[i]}{prot}")

  # Create box plot using plotly
  fig = go.Figure()
  for i in range(len(data)):
    fig.add_trace(go.Box(y=data[i], name=labels[i]))
  if grouped:
    title = 'Grouped ' + title

  if column == 'frame.time_relative':
      column = 'time.btw.groups'
  fig.update_layout(
    title=title + " Box Plot",
    xaxis_title="Protocol",
    yaxis_title=column + unit
  )
  font_family = "Times New Roman"
  font_size = 20
  fig.update_layout(
      font=dict(family=font_family, size=font_size),
      template='plotly_white',
      xaxis=dict(showgrid=False),
      legend=dict(
      orientation="h",
      yanchor="bottom",
      y=1.02,
      xanchor="right",
      x=1)
  )

  fig.update_layout(
      autosize=False,
      width=1200,
      height=600)
  fig.show()

In [None]:
def comparative_box_plot(df_list: List[List[pd.DataFrame]], df_labels: List[str], title: str, category_list: List[str], column: str, y_title: str,
            unit: str, protocol: List[str] = None, grouped: bool = False, factor: float = 1, diff: bool = False) -> None:
  """
  Creates a grouped box plot that compares the given dataframe(s) per protocol. The x-axis shows the protocols
  and each category is drawn as a separate trace.

  Parameters:
    df_list (List[List[pd.DataFrame]]): One or two lists of dataframes containing the traffic data to be plotted. Two lists are merged pairwise.
    df_labels (List[str]): The label of each list in df_list, appended to the STUN, DTLSv1.2 and UDP protocol names when two lists are merged.
    title (str): The title of the plot.
    category_list (List[str]): A list of strings describing the dataframes in df_list.
    column (str): The column name to plot.
    y_title (str): The title of the y-axis.
    unit (str): The unit of measurement for the column.
    protocol (List[str], optional): List of protocols to filter by.
    grouped (bool, optional): Whether the dataframes are grouped by protocol (currently unused).
    factor (float, optional): A float representing the factor by which to scale the data in the column.
    diff (bool, optional): A boolean representing whether to plot the differences between consecutive values of the column.

  Returns:
    None
  """
  merged_list = []
  if len(df_list)==1:
    merged_list = df_list[0]
  elif len(df_list) == 2:
    df_list1 = []
    df_list2 = []
    for df in df_list[0]:
      df_copy = df.copy()  # Create a copy of the dataframe
      df_copy['_ws.col.Protocol'] = df_copy['_ws.col.Protocol'].replace('STUN', 'STUN ' + df_labels[0])
      df_copy['_ws.col.Protocol'] = df_copy['_ws.col.Protocol'].replace('DTLSv1.2','DTLSv1.2 ' + df_labels[0])
      df_copy['_ws.col.Protocol'] = df_copy['_ws.col.Protocol'].replace('UDP', 'UDP ' + df_labels[0])
      df_list1.append(df_copy)
    for df in df_list[1]:
      df_copy = df.copy()  # Create a copy of the dataframe
      df_copy['_ws.col.Protocol'] = df_copy['_ws.col.Protocol'].replace('STUN', 'STUN ' + df_labels[1])
      df_copy['_ws.col.Protocol'] = df_copy['_ws.col.Protocol'].replace('DTLSv1.2', 'DTLSv1.2 ' + df_labels[1])
      df_copy['_ws.col.Protocol'] = df_copy['_ws.col.Protocol'].replace('UDP', 'UDP ' + df_labels[1])
      df_list2.append(df_copy)
    for i in range(len(df_list1)):
      merged_df = pd.concat([df_list1[i], df_list2[i]], axis=0)
      merged_list.append(merged_df)
  else:
    return None

  protocols = sorted(merged_list[0]['_ws.col.Protocol'].unique())

  fig = go.Figure()

  data_list = []
  labels_list = []

  if protocol:
    protocols = protocol

  for i, df in enumerate(merged_list):
    data = []
    labels = []
    for p in protocols:
      df_filtered = df[df['_ws.col.Protocol'] == p]
      if diff:
        df_sorted = np.sort(df_filtered[column].diff())
      else:
        df_sorted = np.sort(df_filtered[column])
      df_sorted = df_sorted * factor  # not in place: avoids a casting error for integer columns and a float factor
      data.append(df_sorted)
      labels.extend([p]*len(df_sorted))
    data_list.append(data)
    labels_list.append(labels)

  # Create grouped box plot using plotly

  for i, data in enumerate(data_list):
    #must be 1D
    flat_data = np.concatenate(data)  # Flatten the nested arrays
    fig.add_trace(go.Box(y=flat_data, x=labels_list[i], name=category_list[i], boxmean=True
                         ))
  font_family = "Times New Roman"
  font_size = 20
  fig.update_layout(
      title=title,
      yaxis_title=y_title + unit,
      font=dict(family=font_family, size=font_size),
      boxmode='group',
      template='plotly_white',
      xaxis=dict(showgrid=False),
      legend=dict(
      orientation="h",
      yanchor="bottom",
      y=1.02,
      xanchor="right",
      x=1)
  )

  fig.update_layout(boxgap=0.2)
  fig.update_layout(
      autosize=False,
      width=1200,
      height=600)

  fig.update_layout(yaxis_type="log")

  fig.show()


In [None]:
def comparative_box_plot_category(df_list: List[List[pd.DataFrame]], df_labels: List[str], title: str, category_list: List[str], column: str, y_title: str,
            unit: str, protocol: List[str] = None, grouped: bool = False, factor: float = 1, diff: bool = False) -> None:
  """
  Creates a grouped box plot that compares the given dataframe(s) per category. The x-axis shows the categories
  and each protocol is drawn as a separate trace.

  Parameters:
    df_list (List[List[pd.DataFrame]]): One or two lists of dataframes containing the traffic data to be plotted. Two lists are merged pairwise.
    df_labels (List[str]): The label of each list in df_list, appended to the STUN, DTLSv1.2 and UDP protocol names when two lists are merged.
    title (str): The title of the plot.
    category_list (List[str]): A list of strings describing the dataframes in df_list.
    column (str): The column name to plot.
    y_title (str): The title of the y-axis.
    unit (str): The unit of measurement for the column.
    protocol (List[str], optional): List of protocols to filter by.
    grouped (bool, optional): Whether the dataframes are grouped by protocol (currently unused).
    factor (float, optional): A float representing the factor by which to scale the data in the column.
    diff (bool, optional): A boolean representing whether to plot the differences between consecutive values of the column.

  Returns:
    None
  """
  merged_list = []
  if len(df_list)==1:
    merged_list = df_list[0]
  elif len(df_list) == 2:
    df_list1 = []
    df_list2 = []
    for df in df_list[0]:
      df_copy = df.copy()  # Create a copy of the dataframe
      df_copy['_ws.col.Protocol'] = df_copy['_ws.col.Protocol'].replace('STUN', 'STUN ' + df_labels[0])
      df_copy['_ws.col.Protocol'] = df_copy['_ws.col.Protocol'].replace('DTLSv1.2','DTLSv1.2 ' + df_labels[0])
      df_copy['_ws.col.Protocol'] = df_copy['_ws.col.Protocol'].replace('UDP', 'UDP ' + df_labels[0])
      df_list1.append(df_copy)
    for df in df_list[1]:
      df_copy = df.copy()  # Create a copy of the dataframe
      df_copy['_ws.col.Protocol'] = df_copy['_ws.col.Protocol'].replace('STUN', 'STUN ' + df_labels[1])
      df_copy['_ws.col.Protocol'] = df_copy['_ws.col.Protocol'].replace('DTLSv1.2', 'DTLSv1.2 ' + df_labels[1])
      df_copy['_ws.col.Protocol'] = df_copy['_ws.col.Protocol'].replace('UDP', 'UDP ' + df_labels[1])
      df_list2.append(df_copy)
    for i in range(len(df_list1)):
      merged_df = pd.concat([df_list1[i], df_list2[i]], axis=0)
      merged_list.append(merged_df)
  else:
    return None

  protocols = sorted(merged_list[0]['_ws.col.Protocol'].unique())
  fig = go.Figure()

  data_list = []
  labels_list = []

  if protocol:
    protocols = protocol

  for p in protocols:
    data = []
    labels = []
    for i, df in enumerate(merged_list):
      df_filtered = df[df['_ws.col.Protocol'] == p]
      if diff:
        df_sorted = np.sort(df_filtered[column].diff())
      else:
        df_sorted = np.sort(df_filtered[column])
      df_sorted = df_sorted * factor  # not in place: avoids a casting error for integer columns and a float factor
      data.append(df_sorted)
      labels.extend([category_list[i]]*len(df_sorted))
    data_list.append(data)
    labels_list.append(labels)

  # Create grouped box plot using plotly

  for i, data in enumerate(data_list):
    #must be 1D
    flat_data = np.concatenate(data)  # Flatten the nested arrays
    fig.add_trace(go.Box(y=flat_data, x=labels_list[i], name=protocols[i], boxmean=True
                         ))

  fig.update_layout(
      title=title,
      yaxis_title=y_title + unit,
      boxmode='group'
  )
  fig.update_layout(boxgap=0.2)

  font_family = "Times New Roman"
  font_size = 20
  fig.update_layout(
      font=dict(family=font_family, size=font_size),
      template='plotly_white',
      xaxis=dict(showgrid=False,dtick=2),
      legend=dict(
      orientation="h",
      yanchor="bottom",
      y=1.02,
      xanchor="right",
      x=1)
  )

  fig.update_layout(
      autosize=False,
      width=1200,
      height=600)

  fig.update_layout(yaxis_type="log")

  fig.show()
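
Both comparative box plot functions start with the same preparation step: protocols that occur in both lists get a label suffix so that they remain distinguishable after the DataFrames are concatenated. That step could be factored out as sketched below. `tag_shared_protocols` and `SHARED_PROTOCOLS` are hypothetical names, and the DL/UL frames are toy data.

```python
import pandas as pd

SHARED_PROTOCOLS = ["STUN", "DTLSv1.2", "UDP"]  # names that can appear in both lists

def tag_shared_protocols(df, label, col="_ws.col.Protocol"):
    """Hypothetical helper: return a copy with the shared protocol names suffixed by label."""
    mapping = {p: f"{p} {label}" for p in SHARED_PROTOCOLS}
    out = df.copy()                       # leave the caller's DataFrame untouched
    out[col] = out[col].replace(mapping)  # exact-value replacement, other names are kept
    return out

dl = pd.DataFrame({"_ws.col.Protocol": ["SRTP Video", "STUN", "UDP"], "frame.len": [1200, 100, 60]})
ul = pd.DataFrame({"_ws.col.Protocol": ["SRTCP", "STUN"], "frame.len": [90, 100]})

merged = pd.concat([tag_shared_protocols(dl, "DL"), tag_shared_protocols(ul, "UL")], ignore_index=True)
print(sorted(merged["_ws.col.Protocol"].unique()))
```

A single mapping replaces the three consecutive `replace` calls and keeps the list of shared protocols in one place.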


In [None]:
def traffic_load_box_plot(traffic_df_list: List[pd.DataFrame], title: str, parameter: str,
                                label_list: list) -> None:
  """
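  Creates a box plot of the traffic load, with one box per dataframe.

  Parameters:
    traffic_df_list (List[pd.DataFrame]): The dataframes containing the traffic data to be plotted. Each one must have a 'traffic' column.
    title (str): The title of the plot.
    parameter (str): The name of the transmission link (Uplink or Downlink), used as the prefix of the y-axis title.
    label_list (list): The label of each box, in the same order as traffic_df_list.

  Returns:
    None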
  """
  data = []

  for df in traffic_df_list:
    df_sorted = np.sort(df['traffic'])
    data.append(df_sorted)


  # Create box plot using plotly
  fig = go.Figure()
  for i in range(len(data)):
    fig.add_trace(go.Box(y=data[i],
                         name=label_list[i],
                         boxmean=True,  # Show mean line
                         #boxpoints='all'
                         ))

  fig.update_layout(
    title=title,
    xaxis_title='',
    yaxis_title=f"{parameter.strip()} Throughput (Mbps)"  # keep a single space between the link name and the unit label
  )

  font_family = "Times New Roman"
  font_size = 20
  fig.update_layout(
      font=dict(family=font_family, size=font_size),
      template='plotly_white',
      xaxis=dict(showgrid=False),
      legend=dict(
      orientation="h",
      yanchor="bottom",
      y=1.02,
      xanchor="right",
      x=1)
  )

  fig.update_layout(
      autosize=False,
      width=1200,
      height=600)

  fig.show()

#### **Bar plot**

In [None]:
def create_grouped_bar_plot(data: List[List[float]], title: str, x_labels: List[str], bar_labels: List[str] , y_label: str, type_y:str = None):
  """
  Creates a grouped bar plot for the given data.

  Parameters:
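  data (List[List[float]]): One inner list per series; data[i][j] is the bar height of series x_labels[i] at x-axis category bar_labels[j].
  title (str): The title of the plot.
  x_labels (List[str]): The name of each series (legend entry), one per inner list of data.
  bar_labels (List[str]): The categories shown along the x-axis, one per value in each inner list.
  y_label (str): The label for the y-axis.
  type_y (str): The type of the y-axis (for example 'linear' or 'log'). Default is None, which lets Plotly choose.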

  Returns:
  None
  """
  colors = px.colors.qualitative.Plotly[:len(data)]

  fig = go.Figure()

  for i, values in enumerate(data):
    x = bar_labels
    y = values  # Values for the bar

    fig.add_trace(go.Bar(
      x=x,
      y=y,
      name=x_labels[i],
      marker_color=colors[i % len(colors)]  # cycle through the palette when there are more series than colors
    ))

  fig.update_layout(
    title=title,
    yaxis=dict(title=y_label, type=type_y),
    barmode='group'
  )

  fig.show()

### **Wireshark Computations build**

#### **Streams characteristics**

In [None]:
def streams_characteristics(df_list: List[pd.DataFrame], title: str, parameter_list: List[str], grouped: bool = False):
  """
  Computes and displays various characteristics of network streams from a list of pandas DataFrames.

  Parameters:
    df_list (List[pd.DataFrame]): A list of pandas DataFrames, where each DataFrame represents a network stream.
    title (str): The title for the resulting DataFrame.
    parameter_list (List[str]): One label per DataFrame in df_list; it is prepended to the protocol name to form the 'Parameter' column of each row.
    grouped (bool): A boolean indicating whether the streams are grouped or not. Default is False.

  Returns:
    None.
  """
  percentage_th = 0.90
  num_intervals = 3
  num_most_common = 3
  round_val = 4

  rows = []
  for i, df in enumerate(df_list):

    for protocol in df['_ws.col.Protocol'].unique():
      df_filtered = df[df['_ws.col.Protocol'] == protocol].copy()  # copy so that the column added below does not write into a slice of df

      row = [parameter_list[i] + protocol]
      col = ['Parameter']

      if grouped:
        group_num = df_filtered.shape[0]
        row.extend([group_num])
        col.extend(['Number of groups'])

      else:
        pkt_num = df_filtered.shape[0]
        load = round(df_filtered['frame.len'].sum() *8 / ((df['frame.time_relative'].max() - df['frame.time_relative'].min()) * 1e6), 5) # Mbps
        traffic_percentage = round(df_filtered['frame.len'].sum() / df['frame.len'].sum() * 100, 5)
        row.extend([pkt_num, load, traffic_percentage])
        col.extend(['Number of packets', 'Load (Mbps)', 'Traffic (%)'])

      if grouped:
        column = ['num.packets','group.frame.len.tot', 'frame.time_relative', 'group.time', 'avg.time.between.packets' ]
        column_name = ['Packet number','Group size (bytes)', 'Time btw groups (ms)', 'Group time (ms)', 'Time between group packets (ms)']
      else:
        column = ['frame.len']
        column_name = ['Packet size (bytes)']

      for j, col_p in  enumerate(column):
        if col_p == 'frame.time_relative':
          warnings.filterwarnings('ignore')
          df_filtered.loc[:, 'time.btw.group'] = df_filtered[col_p].diff() * 1000
          #df_filtered = df_filtered[df_filtered.index % 2 != 0] #just for some specific tests
          #df_filtered = df_filtered[df_filtered['num.packets'] == 9]#just for some specific tests

          col_p = 'time.btw.group'

        avg = round(df_filtered[col_p].mean(), round_val)
        stdev = round(df_filtered[col_p].std(), round_val)
        min_value = round(df_filtered[col_p].min(), round_val)
        max_value = round(df_filtered[col_p].max(), round_val)
        median_value = round(df_filtered[col_p].median(), round_val)

        if col_p != 'time.btw.group' and col_p != 'group.time' and col_p != 'avg.time.between.packets':
          most_common = df_filtered[col_p].value_counts(normalize=True).nlargest(num_most_common)

          most_common_str = '\n '.join(['{} ({:.2%})'.format(x, y) for x, y in zip(most_common.index.tolist(), most_common.tolist())])
          if (protocol=='SRTP Video' and col_p == 'group.frame.len.tot'):
            most_common_str = most_common_by_interval(percentage_th, num_intervals, df_filtered, max_value, min_value, most_common, most_common_str, col_p, 200)
          else:
            most_common_str = most_common_by_interval(percentage_th, num_intervals, df_filtered, max_value, min_value, most_common, most_common_str, col_p)

        row.extend([avg, stdev, min_value, max_value, median_value])
        col.extend(['Avg. ' + column_name[j], column_name[j] + ' Stdev', 'Min. ' + column_name[j], 'Max.' + ' ' + column_name[j], 'Median ' + column_name[j]])

        if col_p != 'time.btw.group' and col_p != 'group.time' and col_p != 'avg.time.between.packets':
          row.extend([most_common_str] )
          col.extend([ 'Most common ' + str(num_most_common) + ' ' + column_name[j]])

      rows.append(row)

  # create a DataFrame with the rows
  df = pd.DataFrame(rows, columns=col)
  # format the DataFrame
  styled_df = df.style \
    .set_caption(f'<b>{title}</b>') \
    .set_table_styles([{'selector': 'th', 'props': [('border', '1px solid black')]}]) \
    .set_properties(**{'border': '1px solid black', 'text-align': 'center', 'white-space': 'pre-wrap', 'font-size': '10pt'}) \
    .format(format_numeric_values) \
    .hide(axis="index")
  # display the styled DataFrame
  display(styled_df)
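
The per-protocol load and traffic share computed in the non-grouped branch can be illustrated on a handful of packets. The sketch below is standalone and uses plain tuples instead of the Wireshark dataframe; `protocol_shares` and the sample packets are hypothetical, chosen only for illustration. As in `streams_characteristics`, the load is divided by the duration of the whole trace, not by the time span of each protocol's own packets. The zero-duration check is an added assumption and is not part of the original function.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical helper: each packet is (protocol, frame length in bytes, relative time in s).
def protocol_shares(packets: List[Tuple[str, int, float]]) -> Dict[str, Dict[str, float]]:
    times = [t for _, _, t in packets]
    duration_s = max(times) - min(times)  # duration of the whole trace
    if duration_s <= 0:
        # Assumption: a zero-duration trace is rejected instead of reporting an infinite load
        raise ValueError("the trace must span a non-zero duration")
    total_bytes = sum(length for _, length, _ in packets)

    # Accumulate the bytes carried by each protocol
    bytes_per_protocol: Dict[str, int] = defaultdict(int)
    for protocol, length, _ in packets:
        bytes_per_protocol[protocol] += length

    return {
        protocol: {
            "load_mbps": n_bytes * 8 / (duration_s * 1e6),  # bits over the trace duration, in Mbps
            "traffic_pct": n_bytes / total_bytes * 100,     # share of all bytes in the trace
        }
        for protocol, n_bytes in bytes_per_protocol.items()
    }

sample_packets = [("SRTP", 1000, 0.0), ("SRTP", 1000, 0.5), ("STUN", 500, 1.0)]
shares = protocol_shares(sample_packets)
print(shares)
```

Because every protocol shares the same denominator, the per-protocol loads add up to the total load of the trace.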

In [None]:
def format_numeric_values(val):
  """
  Formats a numeric value to four decimal places and removes trailing zeros.
  If the formatted value ends with '.0000', it returns the integer part of the input as a string.

  Parameters:
    val (str, int or float): The value to format.

  Returns:
    str or int: The formatted value.
  """
  if isinstance(val, (int, float)):
      formatted_val = f"{val:.4f}"
      if formatted_val.endswith(".0000"):
          return f"{int(val)}"
      return formatted_val.rstrip('0').rstrip('.')
  return val
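
A few sample values make the formatting rule concrete. The sketch below restates the same rule in a standalone function so that it runs on its own; `format_value_sketch` and the sample values are hypothetical names used only for illustration.

```python
def format_value_sketch(val):
    """Standalone restatement of the rule: four decimals, trailing zeros removed."""
    if isinstance(val, (int, float)):
        formatted = f"{val:.4f}"
        if formatted.endswith(".0000"):
            return f"{int(val)}"  # whole numbers are shown without a decimal part
        return formatted.rstrip("0").rstrip(".")
    return val  # non-numeric values (for example strings) pass through unchanged

samples = [3.0, 2.5, 1.23456, 0.00001, "SRTP"]
formatted_samples = [format_value_sketch(v) for v in samples]
print(formatted_samples)
```

Values smaller than 0.00005 in magnitude are displayed as `0`, which is worth keeping in mind for metrics expressed in seconds. NumPy integer scalars are not instances of `int`, so they are returned unchanged by the `isinstance` check.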

In [None]:
def most_common_by_interval(percentage_th: float, num_intervals: int, df_filtered: pd.DataFrame,
                            max_value: float, min_value: float, most_common: pd.Series,
                            most_common_str: str, column: str, interval_inc: float = 1) -> str:
  """
  Computes the most common intervals of a column in a given dataframe.

  Parameters:
    percentage_th (float): The minimum percentage threshold for intervals to be considered most common.
    num_intervals (int): The number of most common intervals to display.
    df_filtered (pandas.DataFrame): The filtered dataframe to analyze.
    max_value (float): The maximum value for the column to be analyzed.
    min_value (float): The minimum value for the column to be analyzed.
    most_common (pandas.Series): A Pandas Series object containing the three most common values
                                  in the column, with their respective frequencies normalized.
    most_common_str (str): A string containing information about the most common intervals.
    column (str): The name of the column to be analyzed.
    interval_inc (float): The interval increment for computing intervals.

  Returns:
    most_common_str (str): The updated string containing information about the most common intervals.
  """
  top_intervals = None


  # Check if the sum of the most common values is less than the percentage threshold
  if most_common.sum() < percentage_th:
    interval_t = 0
    top_intervals_sum = 0

    # Loop until the sum of the top intervals exceeds the percentage threshold
    while top_intervals_sum < percentage_th:
      bottom_limit_included = False
      interval_t += interval_inc
      intervals = []
      top_limit = max_value

      # Compute intervals by iterating over top limit values
      while top_limit > min_value:
        bottom_limit = top_limit - interval_t

        # Set bottom limit to min_value if it is less than min_value
        if bottom_limit < min_value:
            bottom_limit = min_value

        # Compute the proportion of values in the current interval
        if bottom_limit == min_value:
          limit = ((df_filtered[column] >= bottom_limit) & (df_filtered[column] <= top_limit)).mean()
          bottom_limit_included = True
        else:
          limit = ((df_filtered[column] > bottom_limit) & (df_filtered[column] <= top_limit)).mean()

        # Add the interval and its proportion to the list of intervals
        intervals.append((bottom_limit, top_limit, limit))

        # Decrement top limit by interval increment
        top_limit -= interval_t

      # Select the top num_intervals intervals based on their proportion
      top_intervals = sorted(intervals, key=lambda x: x[2], reverse=True)[:num_intervals]
      top_intervals_sum = sum([x[2] for x in top_intervals])

      if interval_t >= max_value:
        break

  # Add information about top intervals to the most_common_str
  if top_intervals is not None:
    most_common_str += f'\n-INTERVALS-'
    for j, (bottom, top, limit) in enumerate(top_intervals):
      if bottom == min_value and bottom_limit_included:
        most_common_str += f'\n[{top}, {bottom}] ({limit:.2%})'
      else:
        most_common_str += f'\n[{top}, {bottom}) ({limit:.2%})'

  return most_common_str
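
The interval search can be hard to follow inside the nested loops, so here is a simplified sketch of the same idea on a plain list: bins of a given width are counted down from the maximum, the `num_intervals` most populated bins are kept, and the width grows until those bins cover the threshold. The names `top_bins` and `widen_until_covered` and the sample data are hypothetical; the sketch omits the string formatting and the initial check on the most common values.

```python
from typing import List, Tuple

Bin = Tuple[float, float, float]  # (bottom, top, share of values)

def top_bins(values: List[float], width: float, num_intervals: int) -> List[Bin]:
    """Bins of `width` counted down from the maximum; returns the most populated ones."""
    lo, hi = min(values), max(values)
    bins: List[Bin] = []
    top = hi
    while top > lo:
        bottom = max(top - width, lo)
        if bottom == lo:
            # The lowest bin is closed on both sides so that the minimum is counted
            share = sum(bottom <= v <= top for v in values) / len(values)
        else:
            share = sum(bottom < v <= top for v in values) / len(values)
        bins.append((bottom, top, share))
        top -= width
    return sorted(bins, key=lambda b: b[2], reverse=True)[:num_intervals]

def widen_until_covered(values: List[float], threshold: float = 0.9,
                        num_intervals: int = 3, step: float = 1.0) -> Tuple[float, List[Bin]]:
    """Grow the bin width by `step` until the top bins hold at least `threshold` of the values."""
    width = 0.0
    hi = max(values)
    while True:
        width += step
        best = top_bins(values, width, num_intervals)
        if sum(b[2] for b in best) >= threshold or width >= hi:
            return width, best

sizes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
width, best = widen_until_covered(sizes)
print(width, best)
```

With ten evenly spread values, widths of 1 and 2 cover only 40% and 60% of the data with three bins, so the search stops at a width of 3.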


#### **Traffic characteristics**

In [None]:
def traffic_stats(df_list: List[pd.DataFrame], title: str, parameter_list: List[str]) -> None:
  """
  Computes and displays traffic statistics in a styled Pandas DataFrame.
  To be used for DL and UL traffic characterization.

  Args:
    df_list (list): A list of Pandas DataFrames containing traffic data.
    title (str): The title of the table to be displayed.
    parameter_list (list): A list of strings containing the parameters associated with each DataFrame.

  Returns:
    None.
  """
  round_val = 4

  rows = []
  for i, df in enumerate(df_list):
    # Calculate traffic statistics
    avg_packet_size = round(df['frame.len'].mean(), round_val)
    stdev = round(df['frame.len'].std(), round_val)
    min_value = round(df['frame.len'].min(), round_val)
    max_value = round(df['frame.len'].max(), round_val)
    most_common = df['frame.len'].value_counts(normalize=True).nlargest(3)
    most_common_str = ', '.join([f"{x} ({y:.2%})" for x, y in zip(most_common.index.tolist(), most_common.tolist())])
    header_len = round((df['frame.len'] - df['udp.length']).mean(), round_val)
    time_diff = df['frame.time_relative'].diff() * 1000 # ms
    avg_time_diff = round(time_diff.mean(), round_val)
    load = round(df['frame.len'].sum() * 8 / ((df['frame.time_relative'].max() - df['frame.time_relative'].min()) * 1e6), round_val) # Mbps

    num_pkt = df.shape[0]

    # Add statistics to the row
    rows.append([parameter_list[i], avg_packet_size, stdev, min_value, max_value, most_common_str, header_len, avg_time_diff, load, num_pkt])

  # Create a DataFrame with the rows
  df = pd.DataFrame(rows, columns=['Parameter', 'Avg. packet size (bytes)', 'Stdev', 'Min (bytes)', 'Max (bytes)', '3 Most Common (bytes)', 'Frame Header Length (bytes)', 'Avg. time btw packets (ms)', 'Load (Mbps)', 'Total packet number'])

  # Format the DataFrame
  styled_df = (
    df.style
    .set_caption(f'<b>{title}</b>') # Set the table caption
    .set_table_styles([{'selector': 'th', 'props': [('border', '1px solid black')]}]) # Set the border style for the table header
    .set_properties(**{'border': '1px solid black', 'text-align': 'center', 'white-space': 'pre-wrap', 'font-size': '10pt'}) # Set the border style and text alignment for the table cells
    .set_table_attributes('style="margin-left:auto;margin-right:auto"') # Center the table on the page
    .format({'Avg. packet size (bytes)': '{:.3f}', 'Stdev': '{:.3f}', 'Frame Header Length (bytes)': '{:.0f}', 'Avg. time btw packets (ms)': '{:.3f}', 'Load (Mbps)': '{:.4f}'}) # Set the format for the specified columns (keys must match the column names exactly)
    .hide(axis="index") # Hide the index column
  )

  # Display the styled DataFrame
  display(styled_df)
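
Two of the statistics above are worth a small worked example: the frame header length and the average time between packets. Since the `udp.length` field already counts the 8-byte UDP header, `frame.len - udp.length` leaves the link-layer and IP headers; for an untagged Ethernet II frame carrying IPv4 without options this is 14 + 20 = 34 bytes. The sketch below is standalone, and `header_and_gap` and the sample values are hypothetical.

```python
from typing import List, Tuple

def header_and_gap(frame_lens: List[int], udp_lens: List[int],
                   times_s: List[float]) -> Tuple[float, float]:
    """Returns (mean header length in bytes, mean time between packets in ms)."""
    # Mean of frame.len - udp.length over all packets
    header_len = sum(f - u for f, u in zip(frame_lens, udp_lens)) / len(frame_lens)
    # Differences between consecutive timestamps, converted from s to ms
    gaps_ms = [(b - a) * 1000 for a, b in zip(times_s, times_s[1:])]
    avg_gap_ms = sum(gaps_ms) / len(gaps_ms) if gaps_ms else float("nan")
    return header_len, avg_gap_ms

frame_lens = [1242, 142, 1242]   # bytes on the wire
udp_lens = [1208, 108, 1208]     # UDP header + payload
times_s = [0.0, 0.004, 0.012]    # frame.time_relative

header_len, avg_gap_ms = header_and_gap(frame_lens, udp_lens, times_s)
print(header_len, avg_gap_ms)
```

A header length different from 34 in a real trace usually points to extra encapsulation (for example a VLAN tag) or to IPv6.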


## **WebRTC**

### **WebRTC Statistics Load**

Dataframe load


In [None]:
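def extract_webrtc_data(file_path,directory):
  """
  Loads a WebRTC statistics dump and extracts its metrics.
  A .json file is treated as server-side stats and a .txt file as client-side stats.

  Parameters:
    file_path (str): The name of the file, relative to directory.
    directory (str): The folder that contains the file; it is concatenated with file_path as is.

  Returns:
    (df, info, side, timestamp_list), or four None values if the file cannot be processed.
  """
  path = directory
  completePath = path + file_path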
  try:
    # Load the JSON file into a pandas DataFrame
    if file_path.endswith('.json') or file_path.endswith('.txt'):
        with open(completePath) as f:
          data = json.load(f)
          peer_connections = data.get("PeerConnections", {})
          if not peer_connections:
            print("\033[38;2;255;0;0mError: No PeerConnections data found in the file.\033[0m")
            return None, None, None, None
          # get the first PeerConnection
          first_peer_connection = list(peer_connections.values())[0]
          if len(list(peer_connections.values())) >1:
            print("\033[38;2;255;0;0mError: More than one PeerConnection in the file.\033[0m")
            return None, None, None, None
          stats = first_peer_connection.get("stats", {})
          if not stats:
            print("\033[38;2;255;0;0mError: No stats data found for the first PeerConnection in the file.\033[0m")
            return None, None, None, None
          if file_path.endswith('.json'):
            df, info, timestamp_list =  extract_webrtc_data_json(stats)
            side = 'server'
          else:
            df, info, timestamp_list =  extract_webrtc_data_text(stats)
            side = 'client'
        if df is None:
          print("\033[38;2;255;0;0mError: Something went wrong with \033[0m" , file_path, "\033[38;2;255;0;0m. Make sure the file exists and has the proper format. \033[0m")
          return None, None, None, None
        print("The" , file_path, "corresponds to the \033[38;2;255;165;0m", side, "\033[0mside WebRTC stats")
        return df , info, side, timestamp_list
    else:
      print("\033[38;2;255;0;0mError: The selected file \033[0m" , file_path, " \033[38;2;255;0;0mis not a .json or .txt file.\033[0m")
      return None, None, None, None

  except Exception as e:
    print(file_path)
    print("\033[38;2;255;0;0m", e ,":\033[0m")
    return None, None, None, None
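
The loader expects a specific nesting in the dump: a top-level `PeerConnections` object, one entry per peer connection, a `stats` object inside it, and for each stat a `values` field that holds a list encoded as a string. The sketch below reproduces that shape on a tiny hand-written dump. The identifiers `pc-1` and `IT01V-framesPerSecond`, the timestamps, and the helper `single_peer_stats` are hypothetical; only the nesting is inferred from the code above.

```python
import ast
import json

# Hypothetical minimal dump with a single peer connection and a single stat
sample_dump = json.dumps({
    "PeerConnections": {
        "pc-1": {
            "stats": {
                "IT01V-framesPerSecond": {
                    "statsType": "inbound-rtp",
                    "startTime": "2023-01-01T10:00:00.000Z",
                    "endTime": "2023-01-01T10:00:02.000Z",
                    "values": "[60,59,60]",  # a list stored as a string
                }
            }
        }
    }
})

def single_peer_stats(raw: str) -> dict:
    """Returns the stats of the only peer connection, or raises if there is not exactly one."""
    peer_connections = json.loads(raw).get("PeerConnections", {})
    if len(peer_connections) != 1:
        raise ValueError(f"expected exactly one PeerConnection, found {len(peer_connections)}")
    return next(iter(peer_connections.values())).get("stats", {})

stats = single_peer_stats(sample_dump)
# The string has to be parsed before it can be used as a Python list
fps = ast.literal_eval(stats["IT01V-framesPerSecond"]["values"])
print(fps)
```

`ast.literal_eval` only accepts Python literals, so it is a safer way to decode the `values` strings than `eval`.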


In [None]:
# For data extracted from chrome://webrtc-internals
## Note: This function retrieves data for a single video_id. It assumes that the provided stats contain only one candidate pair
def extract_webrtc_data_text(stats):
  try:
    info = {
      'startTime': None,
      'endTime': None,
      'vid_codec': None
      }

    metrics = {
      # Inbound-rtp stats - Downlink - RTP stream received by the Client
      ## Focusing on video
      'vid_frames_received_per_sec': [],
      'vid_frames_per_sec': [],
      'vid_frames_decoded_per_sec': [],
      'vid_frames_dropped_per_sec': [],
      'vid_packets_rec_per_sec': [],
      'vid_bits_rec_per_sec': [],
      'vid_tot_packets_lost': [],
      'vid_avg_jitter_buffer_delay': [],
      'vid_jitter': [],
      'inter_frame_delay': [],
      'inter_frame_delay_std': [],
      'frames_rec_minus_decode_and_dropped_tot': [],
      'frames_rec_minus_decode_and_dropped': [],

      'dec_time_per_frame': [],
      'process_del_per_frame': [],
      'assembly_time_per_frame': [],
      'discarded_pkt': [],

      # NO - Remote-outbound-rtp stats - reported by the Server (here the remote peer); they are examined in the server-side WebRTC stats

      # NO - Data-channel stats - specific to the WebRTC data channel; they include no RTP or RTCP packets, only the packets that establish and maintain the data channel

      # NO - Transport stats - with a single candidate pair in use, the candidate-pair and transport statistics report the same values

      # CandidatePair stats - reflects the traffic sent and received by the client
      ## Note: Stats only for the specific candidate pair
      'total_rtt': [], # time it took for a packet to travel from the client to the server and back again
      'average_rtt': [],
      'current_rtt': [],
      'packets_sent_per_sec': [],       # Uplink - RTCP stream sent by the Client
      'bits_sent_per_sec': [],          # Uplink - RTCP stream sent by the Client
      'packets_rec_per_sec': [],        # Downlink - RTP stream received by the Client
      'bits_rec_per_sec': []            # Downlink - RTP stream received by the Client
    }

    timestamp_list = []

    # To get the video inbound-rtp ID
    video_id=''
    for key, value in stats.items():
      if key.endswith("-kind") and value.get("statsType") == "inbound-rtp" and "video" in value.get("values"):
        video_id = key.split("-kind")[0]

    print('\033[38;2;255;165;0mVideo SDP identifier used:\033[0m', video_id)
    for key, value in stats.items():
      if key.startswith("AP-totalSamplesDuration"):
        info["startTime"] = value.get("startTime")
        info["endTime"] = value.get("endTime")
      # Video Inbound-rtp stats
      if key.startswith(video_id):
        if key.endswith("-[framesReceived/s]"): # the ones with [] are computed in chrome-internals
          metrics["vid_frames_received_per_sec"] = ast.literal_eval(value.get("values"))
        elif key.endswith("-framesPerSecond"):
          metrics["vid_frames_per_sec"] = ast.literal_eval(value.get("values"))
        elif key.endswith("-[framesDecoded/s]"):
          metrics["vid_frames_decoded_per_sec"] = ast.literal_eval(value.get("values"))
        elif key.endswith("-framesDropped"):
          framesDropped = ast.literal_eval(value.get("values")) #total frames dropped
          metrics["vid_frames_dropped_per_sec"] = [framesDropped[i] - framesDropped[i-1] for i in range(1, len(framesDropped))]
        elif key.endswith("-[packetsReceived/s]"):
          metrics["vid_packets_rec_per_sec"] = ast.literal_eval(value.get("values"))
        elif key.endswith("-[bytesReceived_in_bits/s]"):
          metrics["vid_bits_rec_per_sec"] = ast.literal_eval(value.get("values"))
        elif key.endswith("-packetsLost"):
          metrics["vid_tot_packets_lost"] = ast.literal_eval(value.get("values"))
        elif key.endswith("-[jitterBufferDelay/jitterBufferEmittedCount_in_ms]"):
          metrics["vid_avg_jitter_buffer_delay"] = ast.literal_eval(value.get("values"))
        elif key.endswith("-jitter"):
          metrics["vid_jitter"] = ast.literal_eval(value.get("values"))
        elif key.endswith("-[codec]"):
          info["vid_codec"] = json.loads(value.get("values"))[0]
        elif key.endswith("[totalInterFrameDelay/framesDecoded_in_ms]"):
          metrics["inter_frame_delay"] = ast.literal_eval(value.get("values"))
        elif key.endswith("-[interFrameDelayStDev_in_ms]"):
          metrics["inter_frame_delay_std"] = ast.literal_eval(value.get("values"))
        elif key.endswith("-[framesReceived-framesDecoded-framesDropped]"):
          framesLost = ast.literal_eval(value.get("values")) #total frames lost
          metrics["frames_rec_minus_decode_and_dropped_tot"] = framesLost
          metrics["frames_rec_minus_decode_and_dropped"] = [framesLost[i] - framesLost[i-1] for i in range(1, len(framesLost))]
        elif key.endswith("-[totalDecodeTime/framesDecoded_in_ms]"):
          metrics["dec_time_per_frame"] = ast.literal_eval(value.get("values"))
        elif key.endswith("-[totalProcessingDelay/framesDecoded_in_ms]"):
          metrics["process_del_per_frame"] = ast.literal_eval(value.get("values"))
        elif key.endswith("-[totalAssemblyTime/framesAssembledFromMultiplePackets_in_ms]"):
          metrics["assembly_time_per_frame"] = ast.literal_eval(value.get("values"))
        elif key.endswith("-packetsDiscarded"):
          metrics["discarded_pkt"] = ast.literal_eval(value.get("values"))
      # CandidatePair stats
      elif value.get('statsType') == "candidate-pair":
        try:
          if info["startTime"]== value.get("startTime") and info["endTime"] == value.get("endTime"):
            if key.endswith("-totalRoundTripTime"):
              metrics["total_rtt"] = ast.literal_eval(value.get("values"))
            elif key.endswith("-[totalRoundTripTime/responsesReceived]"):
              metrics["average_rtt"] = ast.literal_eval(value.get("values"))
            elif key.endswith("-currentRoundTripTime"):
              metrics["current_rtt"] = ast.literal_eval(value.get("values"))
            elif key.endswith("-[packetsSent/s]"):
              metrics["packets_sent_per_sec"] = ast.literal_eval(value.get("values"))
            elif key.endswith("-[packetsReceived/s]"):
              metrics["packets_rec_per_sec"] = ast.literal_eval(value.get("values"))
            elif key.endswith("-[bytesSent_in_bits/s]"):
              metrics["bits_sent_per_sec"] = ast.literal_eval(value.get("values"))
            elif key.endswith("-[bytesReceived_in_bits/s]"):
              metrics["bits_rec_per_sec"] = ast.literal_eval(value.get("values"))
            elif key.endswith("-[lastPacketSentTimestamp]"):
              # Parse the timestamp strings once, then store the gap (in seconds) between consecutive samples
              raw_times = ast.literal_eval(value.get("values"))
              parsed_times = [datetime.strptime(t, "%d/%m/%Y, %H:%M:%S") for t in raw_times]
              for i in range(1, len(parsed_times)):
                timestamp_list.append((parsed_times[i] - parsed_times[i-1]).total_seconds())
        except Exception as e:
            print(f"An error occurred while processing {key}: {e}")
            continue
    length_metrics = 0
    max_key = ''
    for key, val in metrics.items():
        length = len(val)
        if length > length_metrics:
            length_metrics = length
            max_key = key

    # One-line alternative, kept for reference; it sometimes raises an error whose cause has not been identified:
    #max_key = max(metrics, key=lambda x: len(metrics[x]))
    #length_metrics = len(metrics[max_key])
    for key, val in metrics.items():
      if len(val) < length_metrics:
          # The message no longer references filePath, which is not defined inside this function
          print("\033[38;2;255;165;0mWarning: The \033[0m", key, " \033[38;2;255;165;0m was empty or not complete. Its length was:\033[0m", len(val), " \033[38;2;255;165;0mand expected length was\033[0m",length_metrics )
          average = sum(val) / len(val) if len(val) > 0 else 0
          metrics[key] = [average] * (length_metrics - len(val)) + val
    #for m,v in metrics.items():
    # print(len(v))
    cumulative_timestamps = [0] + [sum(timestamp_list[:i+1]) for i in range(len(timestamp_list))]
    print("Statistics duration:", cumulative_timestamps[-1], "s")
    df = pd.DataFrame(metrics)
    # The message no longer references filePath, which is not defined inside this function
    print("\033[38;2;0;255;0mThe client-side statistics were parsed and will be considered.\033[0m")
    return df , info, cumulative_timestamps
  except Exception as e:
    print("\033[38;2;255;0;0m", e ,":\033[0m")
    return None, None, None
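
The time axis of the client-side stats is built from the `lastPacketSentTimestamp` strings: consecutive timestamps are converted to gaps in seconds, and the gaps are accumulated starting from zero. A compact sketch of that step, assuming the same `%d/%m/%Y, %H:%M:%S` format used above, is shown below. `cumulative_seconds` and the sample timestamps are hypothetical, and `itertools.accumulate` replaces the repeated `sum` calls of the original.

```python
from datetime import datetime
from itertools import accumulate
from typing import List

def cumulative_seconds(stamps: List[str], fmt: str = "%d/%m/%Y, %H:%M:%S") -> List[float]:
    """Turns timestamp strings into elapsed seconds since the first sample."""
    parsed = [datetime.strptime(s, fmt) for s in stamps]
    # Gap between each pair of consecutive samples, in seconds
    gaps = [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]
    # Running total of the gaps, with the first sample at t = 0
    return [0] + list(accumulate(gaps))

stamps = ["01/02/2023, 10:00:00", "01/02/2023, 10:00:01", "01/02/2023, 10:00:03"]
elapsed = cumulative_seconds(stamps)
print(elapsed)
```

The result has one entry per timestamp, which is one more than the number of gaps, and its last entry is the total duration printed by the function above. Because the format has a resolution of one second, the gaps are whole numbers of seconds.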


In [None]:
# For data extracted from Unity webrtc stats
## Note: This function retrieves data for a single video_id. It is assumed that there is only one candidate pair in the provided stats
def extract_webrtc_data_json(stats):
  try:
    info = {
      'startTime': None,
      'endTime': None,
      'vid_codec': None
      }

    metrics = {
      # RTCOutboundRTPVideoStream - Downlink - RTP stream sent by the server
      ## Video
      'vid_frames_sent_per_sec': [],
      'vid_frames_per_sec': [],
      'vid_frames_encoded_per_sec': [],
      'vid_packets_sent_per_sec': [],
      'vid_bits_sent_per_sec': [],
      'vid_packets_retransmitted_per_sec': [],
      'vid_bits_retransmitted_per_sec': [],
      'vid_encode_time_per_sec': [], #  number of seconds that have been spent encoding the framesEncoded frames of this stream
      'vid_avg_encode_time': [],
      'vid_pkt_send_delay_per_sec': [], # number of seconds that packets have spent buffered locally before being transmitted onto the network
      'quality_res_changes': [],
      'quality_res_reason': [],

      # RTCRemoteInboundRtpVideoStream - stats reported by the Client (here the remote peer); these are mostly examined in the client-side stats
      'vid_packets_lost_per_sec': [],
      'vid_fraction_lost': [],

      # NO - RTCMediaStreamTrack - stats moved to RTCOutboundRTPVideoStream so already considered

      # NO - RTCDataChannel stats - specific to the WebRTC data channel; they include no RTP or RTCP packets, only the packets that establish and maintain the data channel

      # RTCTransport stats - with a single candidate pair in use, the candidate-pair and transport statistics report the same values
      'packets_sent_per_sec': [], # Downlink - RTP stream sent by the Server
      'packets_rec_per_sec': [], # Uplink - RTCP stream received by the Server

      # CandidatePair stats - reflects the traffic sent and received by the server
      ## Note: Stats only for the specific candidate pair
      'total_rtt': [], # time it took for a packet to travel from the server to the client and back again
      'average_rtt': [],
      'current_rtt': [],
      'bits_sent_per_sec': [], # Downlink - RTP stream sent by the Server
      'bits_rec_per_sec': [] # Uplink - RTCP stream received by the Server
    }

    responses_rec = []
    codecId = None
    framesEnc = []
    timestamp_list = []

    #to get the codecId (used to get the codec) and responses received (used to compute the avg. rtt)
    for key, value in stats.items():
      if key.startswith('RTCOutboundRTPVideoStream'):
        if key.endswith("-codecId"):
          codecId = json.loads(value.get("values"))[0]
        elif key.endswith("-framesEncoded"):
          framesEnc = ast.literal_eval(value.get("values"))
      elif key.startswith('RTCIceCandidatePair'):
        if key.endswith("-responsesReceived"):
          responses_rec = ast.literal_eval(value.get("values"))

    for key, value in stats.items():
      # Video Outbound-rtp stats
      if key.startswith('RTCOutboundRTPVideoStream'):
        if key.endswith("-framesSent"):
          framesSent = ast.literal_eval(value.get("values")) #total frames sent
          metrics["vid_frames_sent_per_sec"] = [framesSent[i] - framesSent[i-1] for i in range(1, len(framesSent))]
          info["startTime"] = value.get("startTime")
          info["endTime"] = value.get("endTime")
        elif key.endswith("-framesPerSecond"):
          metrics["vid_frames_per_sec"] = ast.literal_eval(value.get("values"))
        elif key.endswith("-framesEncoded"):
          metrics["vid_frames_encoded_per_sec"] = [framesEnc[i] - framesEnc[i-1] for i in range(1, len(framesEnc))]
        elif key.endswith("-packetsSent"):
          pktSent = ast.literal_eval(value.get("values"))
          metrics["vid_packets_sent_per_sec"] = [pktSent[i] - pktSent[i-1] for i in range(1, len(pktSent))]
        elif key.endswith("-bytesSent"):
          bytesSent = ast.literal_eval(value.get("values"))
          metrics["vid_bits_sent_per_sec"] = [(bytesSent[i] - bytesSent[i-1])*8 for i in range(1, len(bytesSent))]
        elif key.endswith("-retransmittedPacketsSent"):
          retPktSent = ast.literal_eval(value.get("values"))
          metrics["vid_packets_retransmitted_per_sec"] = [retPktSent[i] - retPktSent[i-1] for i in range(1, len(retPktSent))]
        elif key.endswith("-retransmittedBytesSent"):
          retBytesSent = ast.literal_eval(value.get("values"))
          metrics["vid_bits_retransmitted_per_sec"] = [(retBytesSent[i] - retBytesSent[i-1])*8 for i in range(1, len(retBytesSent))]
        elif key.endswith("-totalEncodeTime"):
          totEncTime = ast.literal_eval(value.get("values"))
          metrics["vid_encode_time_per_sec"] =  [(totEncTime[i] - totEncTime[i-1]) for i in range(1, len(totEncTime))]
          metrics["vid_avg_encode_time"] = [x/y if y != 0 else 0 for x, y in zip(ast.literal_eval(value.get("values")), framesEnc)]
        elif key.endswith("-totalPacketSendDelay"):
          pcktDel = ast.literal_eval(value.get("values"))
          metrics["vid_pkt_send_delay_per_sec"] = [(pcktDel[i] - pcktDel[i-1]) for i in range(1, len(pcktDel))]
        elif key.endswith("-qualityLimitationResolutionChanges"):
          metrics["quality_res_changes"] = ast.literal_eval(value.get("values"))
        elif key.endswith("-qualityLimitationReason"):
          metrics["quality_res_reason"] = ast.literal_eval(value.get("values"))

      # RTCRemoteInboundRtpVideoStream stats
      elif key.startswith('RTCRemoteInboundRtpVideoStream'):
        if key.endswith("-packetsLost"):
          totPktLost = ast.literal_eval(value.get("values")) # total packets lost
          metrics["vid_packets_lost_per_sec"] = [(totPktLost[i] - totPktLost[i-1]) for i in range(1, len(totPktLost))]
        elif key.endswith("-fractionLost"):
          metrics["vid_fraction_lost"] = ast.literal_eval(value.get("values"))

      # CandidatePair stats
      elif key.startswith('RTCIceCandidatePair'):
        if value.get("values") is not None:
          try:
            if abs(len(ast.literal_eval(value.get("values"))) - len(framesEnc)) <=2: # Keep only the candidate pair whose series length matches framesEnc to within 2 samples. This selects the active pair while tolerating lists that are one or two items shorter.
              if key.endswith("-totalRoundTripTime"):
                metrics["total_rtt"] = ast.literal_eval(value.get("values"))
                metrics["average_rtt"] = [x/y if y != 0 else 0 for x, y in zip(ast.literal_eval(value.get("values")), responses_rec)]
              elif key.endswith("-currentRoundTripTime"):
                metrics["current_rtt"] = ast.literal_eval(value.get("values"))
              elif key.endswith("-bytesSent"):
                bytesSent = ast.literal_eval(value.get("values"))
                metrics["bits_sent_per_sec"] = [(bytesSent[i] - bytesSent[i-1])*8 for i in range(1, len(bytesSent))]

              elif key.endswith("-bytesReceived"):
                bytesRec = ast.literal_eval(value.get("values"))
                metrics["bits_rec_per_sec"] =[0,0] + [(bytesRec[i] - bytesRec[i-1])*8 for i in range(1, len(bytesRec))]

          except Exception as e:
            print(f"An error occurred while processing {key}: {e}")
            continue

      # Transport stats
      elif key.startswith('RTCTransport'):
        if value.get("values") is not None and abs(len(ast.literal_eval(value.get("values"))) - len(framesEnc)) <=2: # skip entries without values, as done for the candidate pair
          if key.endswith("-packetsSent"):
            pktSent = ast.literal_eval(value.get("values"))
            metrics["packets_sent_per_sec"] = [pktSent[i] - pktSent[i-1] for i in range(1, len(pktSent))]
          elif key.endswith("-packetsReceived"):
            pktRec = ast.literal_eval(value.get("values"))
            metrics["packets_rec_per_sec"] = [pktRec[i] - pktRec[i-1] for i in range(1, len(pktRec))]

      # Codec
      elif key.startswith(codecId):
        if key.endswith("-mimeType"):
          info["vid_codec"] = json.loads(value.get("values"))[0]
    length_metrics = 0
    max_key = ''
    for key, val in metrics.items():
        length = len(val)
        if length > length_metrics:
            length_metrics = length
            max_key = key
    # The explicit loop above is kept instead of max(metrics, key=lambda k: len(metrics[k])), which raises ValueError when metrics is empty.
    # Front-pad every shorter series with its own mean so that all columns share the same length
    for key, val in metrics.items():
      if len(val) < length_metrics:
          print("\033[38;2;255;165;0mWarning:\033[0m", key, "\033[38;2;255;165;0mis empty or incomplete in file\033[0m", filePath, "\033[38;2;255;165;0mIts length is\033[0m", len(val), "\033[38;2;255;165;0mand the expected length is\033[0m", length_metrics)
          average = sum(val) / len(val) if len(val) > 0 else 0
          metrics[key] = [average] * (length_metrics - len(val)) + val

    #for m,v in metrics.items():
     # print(len(v))
    df = pd.DataFrame(metrics)
    # Samples are spaced by one time unit, so the cumulative timestamps are 0, 1, ..., length_metrics
    cumulative_timestamps = list(range(length_metrics + 1))

    print("\033[38;2;0;255;0mThe selected file\033[0m" , filePath, " \033[38;2;0;255;0m will be considered.\033[0m")
    return df , info, cumulative_timestamps
  except Exception as e:
    print("\033[38;2;255;0;0m", e ,":\033[0m")
    return None, None, None
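
The parsing above relies on two small operations: turning cumulative counters into per-sample increments, and front-padding shorter series with their mean. The following is a minimal, pandas-free sketch of both. The helper names `per_interval_deltas` and `pad_front_with_mean` and the sample counter values are hypothetical and are not part of the notebook.

```python
from typing import List


def per_interval_deltas(cumulative: List[float]) -> List[float]:
    """Turn a cumulative counter into per-sample increments (one element fewer)."""
    return [cumulative[i] - cumulative[i - 1] for i in range(1, len(cumulative))]


def pad_front_with_mean(values: List[float], target_len: int) -> List[float]:
    """Left-pad a short series with its own mean until it reaches target_len."""
    if len(values) >= target_len:
        return list(values)
    mean = sum(values) / len(values) if values else 0
    return [mean] * (target_len - len(values)) + list(values)


# Hypothetical cumulative byte counter, assumed to be sampled once per second
bytes_sent = [0, 1000, 2500, 4500]
bits_per_sec = [delta * 8 for delta in per_interval_deltas(bytes_sent)]
print(bits_per_sec)                           # [8000, 12000, 16000]
print(pad_front_with_mean(bits_per_sec, 5))   # [12000.0, 12000.0, 8000, 12000, 16000]
```

Padding at the front keeps the most recent samples aligned across metrics, at the cost of inventing values for the earliest samples.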




### **WebRTC Figures build**

In [None]:
# Define function to create time plot
def create_time_plot(df_list, column, title, x_title, y_title, timestamp_list, label_list=None, color_list=None, factor=1):
  fig = go.Figure()

  for i, df in enumerate(df_list):
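    # Fall back to the Plotly defaults when no labels or colors are supplied
    name = label_list[i] if label_list else None
    color = color_list[i] if color_list else None
    fig.add_trace(go.Scatter(x=timestamp_list[i], y=df[column]*factor, name=name, marker_color=color, yaxis='y', mode='lines'))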

  fig.update_layout(
    title=title,
    font_family="Nimbus Roman",
    xaxis=dict(title=x_title),
    yaxis=dict(title=y_title),
    showlegend=True,
    legend=dict(x=1, y=1),
  )
  font_family = "Times New Roman"
  font_size = 20
  fig.update_layout(
      font=dict(family=font_family, size=font_size),
      template='plotly_white',
      xaxis=dict(showgrid=False), #dtick=10,range=[75, 95]
      yaxis=dict(showgrid=False),
      legend=dict(
      orientation="h",
      yanchor="bottom",
      y=1.02, #1.02 default, #change to 3 for some plots
      xanchor="right",
      x=1)
  )

  fig.update_layout(
      autosize=False,
      width=1200, #600 for some
      height=450) #450 for some or 300

  fig.show()



In [None]:
# Define function to create ECDF plot (same as for wireshark)
def create_ecdf_plot(df_list, column, title, x_title, y_title, label_list = None, color_list = None, factor = 1, diff = False):
  fig = go.Figure()
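  for i, df in enumerate(df_list):
      # With diff=True the first difference is NaN, so drop missing values before sorting
      values = df[column].diff() if diff else df[column]
      df_sorted = np.sort(values.dropna()) * factor
      y = np.arange(1, len(df_sorted) + 1) / len(df_sorted)
      # Fall back to the Plotly defaults when no labels or colors are supplied
      name = label_list[i] if label_list else None
      color = color_list[i] if color_list else None
      fig.add_trace(go.Scatter(x=df_sorted, y=y, name=name, marker_color=color, yaxis='y', mode='lines'))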
  fig.update_layout(title=title, xaxis_title=x_title, yaxis_title= y_title)
  fig.show()
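
As an illustration of what the ECDF plot draws, the sketch below computes the empirical CDF of a small sample with the standard library only. The function name `ecdf` and the RTT values are hypothetical.

```python
from typing import List, Tuple


def ecdf(samples: List[float]) -> Tuple[List[float], List[float]]:
    """Return the sorted samples and the fraction of samples <= each of them."""
    xs = sorted(samples)
    n = len(xs)
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys


rtt_ms = [12.0, 8.0, 15.0, 8.0]  # hypothetical round-trip times
xs, ys = ecdf(rtt_ms)
print(xs)  # [8.0, 8.0, 12.0, 15.0]
print(ys)  # [0.25, 0.5, 0.75, 1.0]
```

Each point (x, y) reads as "a fraction y of the samples is less than or equal to x", which is the curve the plot traces.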

In [None]:
def webrtc_grouped_bar_plot(df_list, column, title, y_title, label_list=None, color_list=None, factor=1, remove_name=True, diff=False, percentile = False):
  fig = go.Figure()
  colors = px.colors.qualitative.Plotly[:len(df_list)]
  for i, df in enumerate(df_list):
    values = df[column]*factor
    if diff:
      values = df[column].diff()*factor
    mean_value = np.mean(values)
    if not percentile:
      error = np.std(values)
      x = [label_list[i]]
      if remove_name:
        name_ = ''
      else:
        name_ = label_list[i]
      fig.add_trace(go.Bar(x=x, y=[mean_value], name=name_,marker_color=colors[i], yaxis='y', error_y=dict(
                  type='data',
                  array=[error],
                  visible=True,symmetric=True,

              )))
    else:
      p_95 = np.nanpercentile(values, 95)  # 95th percentile, ignoring NaN values
      x = [label_list[i]]
      if remove_name:
        name_ = ''
      else:
        name_ = label_list[i]
      fig.add_trace(go.Bar(
          x=x,
          y=[p_95],
          name=name_,
          marker_color=colors[i],
          opacity=0.3,
          yaxis='y',
          offset=-0.4  # negative offset to stack the bar on top of the mean bar
      ))

      fig.add_trace(go.Bar(
          x=x,
          y=[mean_value],
          name=name_,
          marker_color=colors[i],
          yaxis='y',
      ))



  fig.update_layout(
    title=title,
    font_family="Nimbus Roman",
    yaxis=dict(title=y_title),
    showlegend=True,
    legend=dict(x=1, y=1),

  )
  font_family = "Times New Roman"
  font_size = 20
  fig.update_layout(
      font=dict(family=font_family, size=font_size),
      template='plotly_white',
      xaxis=dict(showgrid=False), #dtick=10,range=[75, 95]
      yaxis=dict(showgrid=False),
      legend=dict(
      orientation="h",
      yanchor="bottom",
      y=1.02,
      xanchor="right",
      x=1)
  )

  fig.update_layout(
      autosize=False,
      width=600, #600
      height=600) #450

  fig.show()
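
When `percentile` is set, the bar plot draws a high percentile behind the mean. The sketch below shows one way to compute both values while skipping NaN entries, using linear interpolation between ranks. The function `percentile` and the sample values are hypothetical, and the standard library is used here only to keep the example self-contained.

```python
import math
import statistics
from typing import List


def percentile(values: List[float], q: float) -> float:
    """q-th percentile (0-100) with linear interpolation, ignoring NaN entries."""
    clean = sorted(v for v in values if not math.isnan(v))
    if not clean:
        raise ValueError("no finite samples")
    pos = (len(clean) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return clean[lo] + (clean[hi] - clean[lo]) * (pos - lo)


samples = [10.0, 20.0, 30.0, 40.0, float("nan")]  # hypothetical metric values
finite = [v for v in samples if not math.isnan(v)]
print(statistics.mean(finite))   # 25.0
print(percentile(samples, 50))   # 25.0
print(percentile(samples, 100))  # 40.0
```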

In [None]:
def webrtc_grouped_box_plot(df_list, column, title, y_title, label_list=None, color_list=None, factor=1, remove_name=True):
    fig = go.Figure()
    colors = px.colors.qualitative.Plotly[:len(df_list)]

    for i, df in enumerate(df_list):
        values = df[column] * factor
        x = [label_list[i]] * len(values)
        if remove_name:
            name_ = ''
        else:
            name_ = label_list[i]
        fig.add_trace(go.Box(x=x, y=values, name=name_, marker_color=colors[i]))

    fig.update_layout(
        title=title,
        font_family="Nimbus Roman",
        yaxis=dict(title=y_title),
        showlegend=True,
        legend=dict(x=1, y=1),
    )
    font_family = "Times New Roman"
    font_size = 20
    fig.update_layout(
        font=dict(family=font_family, size=font_size),
        template='plotly_white',
        xaxis=dict(showgrid=False),
        yaxis=dict(showgrid=False),
        legend=dict(
            orientation="h",
            yanchor="bottom",
            y=1.02,
            xanchor="right",
            x=1
        )
    )

    fig.update_layout(
        autosize=False,
        width=600,
        height=450
    )

    fig.show()


### **WebRTC Computations build**

In [None]:
def webRtc_traffic_characteristics(df, title, rtc_values, info):
    # create an empty DataFrame with columns for each parameter
    df_t = pd.DataFrame(columns=['startTime', 'endTime', 'vid_codec'] + list(rtc_values.values()))

    # create a list with the statistics for the row
    row = [info['startTime'], info['endTime'], info['vid_codec']]
    # calculate the mean for each parameter and add it to the row
    for key, value in rtc_values.items():
      factor = 1
      if '(ms)' in value:
        factor = 1000
      elif  'Mbps' in value:
        factor = 1/1e6
      if 'Total' in value:
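        calc_val = round(df[key].max()*factor, 4) # total: the maximum, which equals the last value of a non-decreasing counter
      else:
        calc_val = round(df[key].mean()*factor, 4) # compute the mean
        # Also print the standard deviation and the maximum for reference
        print(key, "std:", round(df[key].std()*factor, 4))
        print(key, "max:", round(df[key].max()*factor, 4))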
      row.append(calc_val)
    # add the row to the DataFrame
    df_t.loc[0] = row

    # format the DataFrame
    styled_df = df_t.style \
        .set_caption(f'<b>{title}</b>') \
        .set_table_styles([{'selector': 'th', 'props': [('border', '1px solid black')]}]) \
        .set_properties(**{'border': '1px solid black', 'text-align': 'center'}) \
        .set_table_attributes('style="margin-left:auto;margin-right:auto"') \
        .format(precision=4)

    # display the styled DataFrame
    display(styled_df)
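
The unit handling in `webRtc_traffic_characteristics` depends only on the display label of each metric. A minimal sketch of that rule, without pandas, could look as follows. The helper names `unit_factor` and `summarise`, the label strings, and the sample values are hypothetical.

```python
from typing import List


def unit_factor(label: str) -> float:
    """Scale factor implied by the unit in the label: s -> ms, bit/s -> Mbit/s."""
    if "(ms)" in label:
        return 1000.0
    if "Mbps" in label:
        return 1 / 1e6
    return 1.0


def summarise(values: List[float], label: str) -> float:
    """Maximum for 'Total' metrics, mean otherwise, scaled and rounded to 4 decimals."""
    raw = max(values) if "Total" in label else sum(values) / len(values)
    return round(raw * unit_factor(label), 4)


print(summarise([0.004, 0.006], "Average RTT (ms)"))             # 5.0
print(summarise([40e6, 60e6], "Average Sent Bitrate (Mbps)"))    # 50.0
print(summarise([0.1, 0.25], "Total RTT (ms)"))                  # 250.0
```

Encoding the unit in the label keeps the call site short, but a misspelled label silently falls back to a factor of 1.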

## **Color palette**

In [None]:
def generate_color_palette(df_list: List[pd.DataFrame]) -> List[tuple]:
  """
  Generates a list of RGB tuples (integers in 0-255), one per dataframe in df_list, using Seaborn's color palette.

  Parameters:
    df_list (List[pd.DataFrame]): A list of pandas dataframes.

  Returns:
    color_list (List[tuple]): A list of RGB tuples.
  """
  # Number of colors to generate
  n_colors = len(df_list)

  # Generate a list of n_colors using the color palette
  color_palette = sns.color_palette(n_colors=n_colors)

  # Convert the color_palette to a list of RGB tuples
  color_list = [tuple(map(lambda x: int(x*255), color)) for color in color_palette]

  return color_list
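
Plotly accepts CSS-style color strings such as `rgb(255,127,0)`. In case the RGB tuples returned above need to be passed as a single trace color, the following sketch converts them to that form. Both helper names are hypothetical, and whether this conversion is needed depends on how the plotting functions use `color_list`.

```python
from typing import Tuple


def unit_to_rgb255(color: Tuple[float, float, float]) -> Tuple[int, int, int]:
    """Convert an RGB tuple with components in [0, 1] to integers in 0-255."""
    r, g, b = (int(c * 255) for c in color)
    return r, g, b


def to_plotly_rgb(color: Tuple[int, int, int]) -> str:
    """Format a 0-255 RGB tuple as a CSS-style color string."""
    r, g, b = color
    return f"rgb({r},{g},{b})"


orange = unit_to_rgb255((1.0, 0.5, 0.0))
print(orange)                 # (255, 127, 0)
print(to_plotly_rgb(orange))  # rgb(255,127,0)
```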

# **SUPPORT**

## **File Demo Helper**

### **File locator**

In [None]:
file = {'path': None}

# Create an output widget to display messages
output_widget = widgets.Output()

select_file(directory, file, output_widget)

### **File format validator and path retrieval**

In [None]:
validate_and_retrieve_file_path(file, directory)

# **RESULTS**

## **Wireshark**

### **Run Wireshark Demo**

Insert the trace paths in the *files_path* list.

In [None]:
# Insert the trace paths
files_path = ['Datasets/90fps - 50Mbps - 3664x1920 - H264 - 80 Mhz/Server/Tshark Server - 90fps - 50Mbps - 3664x1920 - H264 - 80 Mhz.csv'
              #,'Datasets/90fps - 50Mbps - 3664x1920 - VP9 - 80 Mhz/Client/Tshark Client - 90fps - 50Mbps - 3664x1920 - VP9 - 80 Mhz.csv'
              ]

df_list, df_dl_list, df_ul_list, df_dl_z_list, df_ul_z_list, files_path_list = get_dataframes(files_path, directory, server_ip,
                                                                                              client_ip, complete_trace_start_time , complete_trace_end_time ,
                                                                                              specific_portion_start_time , specific_portion_end_time )

df_grouped_dl_list, df_grouped_ul_list, df_grouped_dl_z_list, df_grouped_ul_z_list = get_grouped_dataframes(df_dl_list, df_ul_list,
                                                                                                            df_dl_z_list, df_ul_z_list, grouping_time)

df_traffic_dl_list, df_traffic_ul_list = get_traffic_dataframes(df_dl_list, df_ul_list)


df_dl_list_grouped_mark = group_vid_dataframes_by_time_info(df_dl_list)
df_dl_list_grouped_z_mark = group_vid_dataframes_by_time_info(df_dl_z_list)

label_list = [os.path.splitext(os.path.basename(file_path))[0] for file_path in files_path_list]

color_list = generate_color_palette(df_list)
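
The downlink and uplink dataframes are separated by source and destination address. The internals of `get_dataframes` are not shown here, so the following is only a sketch of the idea under the assumption that downlink means server to client. The function `split_by_direction` and the sample rows are hypothetical; the column names follow the Tshark fields listed in the HOW TO section.

```python
import csv
import io
from typing import Dict, List, Tuple

Row = Dict[str, str]


def split_by_direction(rows: List[Row], server_ip: str, client_ip: str) -> Tuple[List[Row], List[Row]]:
    """Split packets into downlink (server -> client) and uplink (client -> server)."""
    downlink = [r for r in rows if r["ip.src"] == server_ip and r["ip.dst"] == client_ip]
    uplink = [r for r in rows if r["ip.src"] == client_ip and r["ip.dst"] == server_ip]
    return downlink, uplink


# Hypothetical excerpt of a Tshark CSV export
sample_csv = """ip.src,ip.dst,frame.time_relative,frame.len
192.168.50.185,192.168.50.128,0.000,1242
192.168.50.128,192.168.50.185,0.004,94
192.168.50.185,192.168.50.128,0.011,1242
10.0.0.7,192.168.50.128,0.020,60
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
dl, ul = split_by_direction(rows, "192.168.50.185", "192.168.50.128")
print(len(dl), len(ul))  # 2 1
```

Packets from any other host, such as the last row, fall into neither list.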


#### **Add phone to demo**

In [None]:
# Set client_traces to True to add the smartphone traces to the comparison.
# To analyse only the phone as the client, set client_ip to the address below in the Parameters section instead.
client_traces = False
client_ip = "192.168.50.105"  # phone

files_path = ['Datasets/PhoneClientMoves - 90fps - 50Mbps - 3664x1920 - H264 - 80 Mhz/Server/Tshark Server - PhoneClientMoves - 90fps - 50Mbps - 3664x1920 - H264 - 80 Mhz.csv'
              ]
if client_traces:
  df_list_2, df_dl_list_2, df_ul_list_2, df_dl_z_list_2, df_ul_z_list_2, files_path_list_2 = get_dataframes(files_path, directory, server_ip,
                                                                                                client_ip, complete_trace_start_time , complete_trace_end_time ,
                                                                                                specific_portion_start_time , specific_portion_end_time )

  df_grouped_dl_list_2, df_grouped_ul_list_2, df_grouped_dl_z_list_2, df_grouped_ul_z_list_2 = get_grouped_dataframes(df_dl_list_2, df_ul_list_2,
                                                                                                              df_dl_z_list_2, df_ul_z_list_2, grouping_time)

  df_traffic_dl_list_2, df_traffic_ul_list_2 = get_traffic_dataframes(df_dl_list_2, df_ul_list_2)

  df_list = df_list + df_list_2
  df_dl_list = df_dl_list + df_dl_list_2
  df_ul_list = df_ul_list + df_ul_list_2
  df_dl_z_list = df_dl_z_list + df_dl_z_list_2
  df_ul_z_list = df_ul_z_list + df_ul_z_list_2
  files_path_list = files_path_list + files_path_list_2
  df_grouped_dl_list = df_grouped_dl_list + df_grouped_dl_list_2
  df_grouped_ul_list = df_grouped_ul_list + df_grouped_ul_list_2
  df_grouped_dl_z_list = df_grouped_dl_z_list + df_grouped_dl_z_list_2
  df_grouped_ul_z_list = df_grouped_ul_z_list + df_grouped_ul_z_list_2
  df_traffic_dl_list = df_traffic_dl_list + df_traffic_dl_list_2
  df_traffic_ul_list = df_traffic_ul_list + df_traffic_ul_list_2


  label_list = [os.path.splitext(os.path.basename(file_path))[0] for file_path in files_path_list]

  color_list = generate_color_palette(df_list)

#### **Run for SRTP Video**

In [None]:
# Video stream packets grouped per frame
df_dl_list_grouped_mark = group_vid_dataframes_by_time_info(df_dl_list)
df_dl_list_grouped_z_mark = group_vid_dataframes_by_time_info(df_dl_z_list)

df_dl_list_srtp_correct = []
for df in df_dl_list:
  df_copy = df.copy()
  df_copy['_ws.col.Time'] = df_copy['_ws.col.Info'].apply(extract_time)
  df_dl_list_srtp_correct.append(df_copy.reset_index(drop=True))
df_dl_list_z_srtp_correct = []
for df in df_dl_z_list:
  df_copy = df.copy()
  df_copy['_ws.col.Time'] = df_copy['_ws.col.Info'].apply(extract_time)
  df_dl_list_z_srtp_correct.append(df_copy.reset_index(drop=True))

# Video stream packets grouped into batches within a frame.
# group_dataframes_temp is used because the other grouping did not take into account whether packets belong to the same frame.
df_grouped_dl_list_srtp_correct = group_dataframes_temp(df_dl_list, grouping_time)
df_grouped_dl_list_z_srtp_correct = group_dataframes_temp(df_dl_z_list, grouping_time)

df_grouped_dl_list_grouped_mark, df_grouped_dl_list_grouped_mark_not_nan = group_grouped_vid_dataframes_by_time_info(df_grouped_dl_list_srtp_correct)
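
Grouping video packets per frame relies on the time value extracted from the Wireshark Info column. The notebook's `extract_time` is defined elsewhere and may differ; the sketch below assumes that the Info string of an RTP packet contains a `Time=<number>` field and uses hypothetical names (`extract_rtp_time`, `group_by_frame`) and hypothetical packets.

```python
import re
from typing import Dict, List, Optional

# Assumed Info layout: "PT=DynamicRTP-Type-96, SSRC=0x1A2B, Seq=17, Time=90000, Mark"
_TIME_RE = re.compile(r"Time=(\d+)")


def extract_rtp_time(info: str) -> Optional[int]:
    """Return the RTP timestamp found in the Info string, or None if absent."""
    match = _TIME_RE.search(info)
    return int(match.group(1)) if match else None


def group_by_frame(packets: List[dict]) -> Dict[int, List[dict]]:
    """Group packets that share an RTP timestamp, keeping first-appearance order."""
    frames: Dict[int, List[dict]] = {}
    for pkt in packets:
        ts = extract_rtp_time(pkt["info"])
        if ts is not None:
            frames.setdefault(ts, []).append(pkt)
    return frames


packets = [
    {"info": "PT=DynamicRTP-Type-96, SSRC=0x1A2B, Seq=1, Time=90000", "len": 1200},
    {"info": "PT=DynamicRTP-Type-96, SSRC=0x1A2B, Seq=2, Time=90000, Mark", "len": 800},
    {"info": "PT=DynamicRTP-Type-96, SSRC=0x1A2B, Seq=3, Time=91000", "len": 1100},
    {"info": "Binding Request", "len": 100},
]
frame_sizes = {ts: sum(p["len"] for p in pkts) for ts, pkts in group_by_frame(packets).items()}
print(frame_sizes)  # {90000: 2000, 91000: 1100}
```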

### **Wireshark Demo Figures**

#### **Packet size vs Time**

##### **Time-filtered - Non-Grouped**

In [None]:
# Set to True to make the plot. Warning: if True, the plot can be computationally intensive.

plot_time_filtered = True

if plot_time_filtered:
  # Loop over all dataframes in df_list and plot packet size vs time for each one
  for i, df in enumerate(df_list):
    # Get the file name from the path
    file_name = os.path.splitext(os.path.basename(files_path_list[i]))[0]
    title = file_name
    """
    Call the create_scatter_plot function to create the plot

    Parameters:
      df_dl_list[i]: DataFrame containing downlink traffic data filtered by time.
      df_ul_list[i]: DataFrame containing uplink traffic data filtered by time.
      'frame.time_relative': Column name for the x-axis data.
      'frame.len': Column name for the y-axis data.
      title: The title of the plot.
      'Time (s)': Label for the x-axis.
      'Packet size (bytes)': Label for the y-axis.
      True if the data should be grouped by protocol, False otherwise.
      False
      None
      True if x_axis should start from 0
      True if vertical lines should be plotted (stem plot)
      None if no specific protocol should be plotted; otherwise a list of str naming the protocols to plot.

    """
    create_scatter_plot(df_dl_list[i], df_ul_list[i], 'frame.time_relative',
                        'frame.len', '', 'Time (s)', 'Packet size (bytes)', True, False, None, True, False, ['DTLSv1.2'])

##### **Time-filtered - Grouped**

In [None]:
# Set to True to make the plot. Warning: if True, the plot can be computationally intensive.
plot_time_filtered = True

if plot_time_filtered:
  # Loop over all dataframes and plot packet size vs time for each one
  for i, df in enumerate(df_list):
    # Get the file name from the path
    file_name = os.path.splitext(os.path.basename(files_path_list[i]))[0]
    title = 'Grouped ' + file_name
    """
    Call the create_scatter_plot function to create the plot

    Parameters:
      df_grouped_dl_list[i]: DataFrame containing downlink traffic data grouped and filtered by time.
      df_grouped_ul_list[i]: DataFrame containing uplink traffic data grouped and filtered by time.
      'frame.time_relative': Column name for the x-axis data.
      'group.frame.len.tot': Column name for the y-axis data.
      title: The title of the plot.
      'Time (s)': Label for the x-axis.
      'Total Group Packet size (bytes)': Label for the y-axis.
      True if the data should be grouped by protocol, False otherwise.
      True if the text should be shown on markers, False otherwise.
      'num.packets': The column name to use for the text labels.
      True if x_axis should start from 0
      True if vertical lines should be plotted (stem plot)
      None if no specific protocol should be plotted; otherwise a list of str naming the protocols to plot.
    """
    create_scatter_plot(df_grouped_dl_list[i], df_grouped_ul_list[i], 'frame.time_relative',
                       'group.frame.len.tot', '', 'Time (s)', 'Total Group Packet size (bytes)', True, False, 'num.packets', True, False, ['SRTP Video'])
    #for video by frames
    #create_scatter_plot(df_grouped_dl_list_srtp_correct[i], df_grouped_ul_list[i], 'frame.time_relative',
     #                     'group.frame.len.tot', '', 'Time (s)', 'Total Group Packet size (bytes)', True, False, 'num.packets', True, False, ['SRTP Video'])

##### **Zoomed - Non-Grouped**

In [None]:
# Loop over all dataframes in df_list and plot packet size vs time for each one
for i, df in enumerate(df_list):
  # Get the file name from the path
  file_name = os.path.splitext(os.path.basename(files_path_list[i]))[0]
  title = file_name
  """
  Call the create_scatter_plot function to create the plot

  Parameters:
    df_dl_z_list[i]: DataFrame containing downlink traffic data filtered by time.
    df_ul_z_list[i]: DataFrame containing uplink traffic data filtered by time.
    'frame.time_relative': Column name for the x-axis data.
    'frame.len': Column name for the y-axis data.
    title: The title of the plot.
    x_label: Label for the x-axis.
    'Packet size (bytes)': Label for the y-axis.
    True if the data should be grouped by protocol, False otherwise.
    False
    None
    True if x_axis should start from 0
    True if vertical lines should be plotted (stem plot)
    None if no specific protocol should be plotted; otherwise a list of str naming the protocols to plot.

  """
  #create_scatter_plot(df_dl_z_list[i], df_ul_z_list[i], 'frame.time_relative',
        #              'frame.len', '', 'Time (ms)', 'Packet size (bytes)', True, False, None, True, False)
  #create_scatter_plot_temp(df_dl_z_list[i], df_ul_z_list[i], 'frame.time_relative',
                    #  'frame.len', '', 'Time (ms)', 'Packet size (bytes)', True, False, None, True, True, ['SRTP Audio'])
  #create_scatter_plot_temp(df_dl_z_list[i], df_ul_z_list[i], 'frame.time_relative',
        #              'frame.len', '', 'Time (ms)', 'Packet size (bytes)', True, False, None, True, False, ['SRTP Video'])
  #create_scatter_plot(df_dl_z_list[i], df_ul_z_list[i], 'frame.time_relative',
              #      'frame.len', '', 'Time (ms)', 'Packet size (bytes)', True, False, None, True, False, ['SRTP Video'])
  create_scatter_plot_temp_client_vid(df_dl_list_z_srtp_correct[i], df_ul_z_list[i], 'frame.time_relative',
                      'frame.len', '', 'Time (ms)', 'Packet size (bytes)', True, False, None, True, True, ['SRTP Video'])

##### **Zoomed - Grouped**

In [None]:

# Loop over all dataframes and plot packet size vs time for each one
for i, df in enumerate(df_list):
  # Get the file name from the path
  file_name = os.path.splitext(os.path.basename(files_path_list[i]))[0]
  title = 'Grouped ' + file_name
  """
  Call the create_scatter_plot function to create the plot

  Parameters:
    df_grouped_dl_z_list[i]: DataFrame containing downlink traffic data grouped and filtered by time.
    df_grouped_ul_z_list[i]: DataFrame containing uplink traffic data grouped and filtered by time.
    'frame.time_relative': Column name for the x-axis data.
    'group.frame.len.tot': Column name for the y-axis data.
    title: The title of the plot.
    'Time (ms)': Label for the x-axis.
    'Total Group Packet size (bytes)': Label for the y-axis.
    True if the data should be grouped by protocol, False otherwise.
    True if the text should be shown on markers, False otherwise.
    'num.packets': The column name to use for the text labels.
    True if x_axis should start from 0
    True if vertical lines should be plotted (stem plot)
    None if no specific protocol should be plotted; otherwise a list of str naming the protocols to plot.
  """
  #create_scatter_plot(df_grouped_dl_z_list[i], df_grouped_ul_z_list[i], 'frame.time_relative',
            #          'group.frame.len.tot', title, 'Time (ms)', 'Total Group Packet size (bytes)', True, True, 'num.packets', True, False, ['SRTP Audio'])
 # create_scatter_plot(df_grouped_dl_z_list[i], df_grouped_ul_z_list[i], 'frame.time_relative',
  #                    'group.frame.len.mean', title, 'Time (ms)', 'Packet size (bytes)', True, True, 'num.packets', True, True, ['SRTP Video'])

  create_scatter_plot_temp_client_vid(df_grouped_dl_list_z_srtp_correct[i], df_grouped_ul_z_list[i], 'frame.time_relative',
                      'group.frame.len.tot', '', 'Time (ms)', 'Group Size (bytes)', True, True, 'num.packets', True, False, ['SRTP Video'])


In [None]:

# Loop over all dataframes and plot packet size vs time for each one
for i, df in enumerate(df_list):
  # Get the file name from the path
  file_name = os.path.splitext(os.path.basename(files_path_list[i]))[0]
  title = 'Grouped ' + file_name
  """
  Call the create_scatter_plot function to create the plot

  Parameters:
    df_dl_list_grouped_z_mark[i]: DataFrame containing downlink video traffic data grouped per frame and filtered by time.
    df_grouped_ul_z_list[i]: DataFrame containing uplink traffic data grouped and filtered by time.
    'frame.time_relative': Column name for the x-axis data.
    'group.frame.len.tot': Column name for the y-axis data.
    title: The title of the plot.
    'Time (ms)': Label for the x-axis.
    'Total Group Packet size (bytes)': Label for the y-axis.
    True if the data should be grouped by protocol, False otherwise.
    True if the text should be shown on markers, False otherwise.
    'num.packets': The column name to use for the text labels.
    True if x_axis should start from 0
    True if vertical lines should be plotted (stem plot)
    None if no specific protocol should be plotted; otherwise a list of str naming the protocols to plot.
  """
  create_scatter_plot(df_dl_list_grouped_z_mark[i], df_grouped_ul_z_list[i], 'frame.time_relative',
                      'group.frame.len.tot', title, 'Time (ms)', 'Total Group Packet size (bytes)', True, True, 'num.packets', True, False, ['SRTP Video'])

In [None]:
#Colored
def create_scatter_plot_temp(df_dl: pd.DataFrame, df_ul: pd.DataFrame, x_col: str, y_col: str, title: str,
                        x_title: str, y_title: str, protocols: bool = False, text_show: bool = False, number_column: str = None,
                        from0: bool = False, stem: bool = False, spec_prot: List[str] = None) -> None:
  """
  Function to create a scatter plot with optional grouping by protocol and optional display of text labels.


  Parameters:
    df_dl (pd.DataFrame): The downlink data as a Pandas DataFrame.
    df_ul (pd.DataFrame): The uplink data as a Pandas DataFrame.
    x_col (str): The column name to use for the x-axis.
    y_col (str): The column name to use for the y-axis.
    title (str): The title of the plot.
    x_title (str): The title of the x-axis.
    y_title (str): The title of the y-axis.
    protocols (bool, optional): Whether to group data by protocol. Default is False.
    text_show (bool, optional): Whether to display text labels on the plot. Default is False.
    number_column (str, optional): The column name to use for the text labels. Default is None.
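    from0 (bool, optional): Whether to shift the x-axis so that the downlink data starts at 0. Default is False.
    stem (bool, optional): Whether to draw vertical lines to simulate a stem plot. Default is False.
    spec_prot (List[str], optional): Protocols to consider; None applies no protocol filter. Default is None.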
  Returns:
    None
  """
  if protocols:


    dl_protocols = ['DTLSv1.2', 'DTLS', 'SRTP Audio', 'SRTP Video','STUN' ,'UDP']  # Define manually all the DL protocols so that they always have the same assigned color
    # dl_protocols = sorted(df_dl['_ws.col.Protocol'].unique()) # Get unique protocols in DL and assign a color to each - might not produce same colors in different executions
    dl_colors = px.colors.qualitative.Plotly[:len(dl_protocols)] # Plotly has 10 colors
    protocol_color_map_dl = dict(zip(dl_protocols, dl_colors))


    ul_protocols = ['DTLSv1.2', 'SRTCP', 'STUN', 'UDP'] # Define manually all the UL protocols so that they always have the same assigned color
    # ul_protocols = sorted(df_ul['_ws.col.Protocol'].unique())    # Get unique protocols in UL and assign a color to each - might not produce same colors in different executions
    ul_colors = px.colors.qualitative.Plotly[len(dl_protocols):len(dl_protocols) + len(ul_protocols)] # Plotly has 10 colors
    protocol_color_map_ul = dict(zip(ul_protocols, ul_colors))


  else:
    dataframes = ['DL', 'UL']
    dataframes_col = px.colors.qualitative.Plotly[:len(dataframes)]
    color_map_dl = dict(zip(dataframes, dataframes_col))


  # Set the mode of the plot
  if text_show:
    mode_plt = 'markers+text'
  else:
    mode_plt = 'markers'


  fig = go.Figure()

  if protocols:
    # Add traces for each protocol in dl and ul
    for i, df in enumerate([df_dl, df_ul]):
      for j, prot in enumerate(sorted(df['_ws.col.Protocol'].unique())):
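        if spec_prot is not None and prot not in spec_prot:
          continue
        color = protocol_color_map_dl[prot] if i == 0 else protocol_color_map_ul[prot]
        # Copy the slice so that the in-place edits below do not act on a view of the original dataframe
        filtered_df = df[df['_ws.col.Protocol'] == prot].copy()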
        groups = []
        if prot=='SRTP Audio':
          # Define the number of rows per group

          group_size = 8
          filtered_df.reset_index(drop=True, inplace=True)


          # Create an additional column to indicate the group for each row
          #filtered_df['Group'] = (filtered_df.index // group_size) % 2  # Alternate between 0 and 1 for every group_size rows
          #filtered_df['Group'] = filtered_df.apply(lambda row: 0 if row['num.packets'] == 9 else 1, axis=1)

          # Split the DataFrame into two groups based on the 'Group' column
          #group_1 = filtered_df[filtered_df['Group'] == 0]
          #group_2 = filtered_df[filtered_df['Group'] == 1]
          #groups = [group_1,group_2]

          #Zoomed:
          df_group_1 = pd.concat([filtered_df[:9].copy(), filtered_df[16:25].copy()], ignore_index=True) #the escape room
          #df_group_1 = pd.concat([filtered_df[:8].copy(), filtered_df[16:24].copy()], ignore_index=True)
          df_group_1['Group'] = 0

          df_group_2 = pd.concat([filtered_df[9:16].copy(), filtered_df[25:32].copy()], ignore_index=True) #the escape room
          #df_group_2 = pd.concat([filtered_df[8:16].copy(), filtered_df[24:32].copy()], ignore_index=True)
          df_group_2['Group'] = 1
          groups = [df_group_1,df_group_2]

          color_groups = [color,'#02aeba']

        if prot=='SRTP Video':
          filtered_df.reset_index(drop=True, inplace=True)
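          # Number of packets in each consecutive group; enable the line that matches the codec of the trace
          #num_pkt = [110,164,162,174] #VP9
          num_pkt = [41,49,42, 108,35,88,45,61,94,105,57] #H264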

          filtered_df['Group'] = ''
          group_idx = 0
          color_groups = ['#6f42f5','#a442f5','#d442f5' , '#f774c7', '#f5426f', '#f57842', '#f5b942', '#d6cc3e', '#7ef542', '#42f5c5','#3ec2d6']
          row_idx = 0
          groups = []
          for num in num_pkt:
            filtered_df.loc[row_idx:row_idx +num-1, 'Group'] = group_idx
            groups.append(filtered_df[filtered_df['Group'] == group_idx])
            group_idx+=1
            row_idx+=num
        for z in range(len(groups)):
          filtered_df = groups[z]
          if from0: #plot x_axis starting from 0
            #x_val = (filtered_df[x_col]- min(df_dl[x_col].min(), df_ul[x_col].min()))
            x_val = filtered_df[x_col]- df_dl[x_col].min()

          else:
            x_val = filtered_df[x_col]

          if "ms" in x_title:
            x_val = x_val * 1000 # convert to ms without modifying the source column in place


          fig.add_trace(
            go.Scatter(
              mode = mode_plt,
              x = x_val,
              y = filtered_df[y_col],
              name = f"{prot} Batch Type {z+1}", # Change to group or frame
              marker = dict(color = color_groups[z]),
              text = filtered_df[number_column].tolist() if number_column else None,
              textfont = dict(size=10),
              textposition = 'top center'
                      )
          )
          if stem:
            for x_v,y_v in zip(x_val,filtered_df[y_col]):
              fig.add_shape(
                type='line',
                x0=x_v,
                y0=0,
                x1=x_v,
                y1=y_v,
                line=dict(color = color_groups[z]),
                opacity=1
              )



  fig.update_layout(
    title=title,
    xaxis=dict(title=x_title),
    yaxis=dict(title=y_title),
    showlegend=True,
    legend=dict(x=1, y=1),
  )

  font_family = "Times New Roman"
  font_size = 20
  fig.update_layout(
      font=dict(family=font_family, size=font_size),
      template='plotly_white',
      xaxis=dict(showgrid=False), #,dtick=2
      yaxis=dict(showgrid=False),
      legend=dict(
      orientation="h",
      yanchor="bottom",
      y=1.02,
      xanchor="right",
      x=1)
  )

  fig.update_layout(
      autosize=False,
      width=1200,
      height=300)



  fig.show()
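
The video branch of `create_scatter_plot_temp` assigns consecutive rows to groups according to a hard-coded list of packet counts. The following sketch shows the same slicing on a plain list. The function name `split_by_counts` and the sample data are hypothetical.

```python
from typing import List, Sequence


def split_by_counts(rows: Sequence, counts: List[int]) -> List[list]:
    """Cut rows into consecutive groups whose sizes are given by counts."""
    groups, start = [], 0
    for n in counts:
        groups.append(list(rows[start:start + n]))
        start += n
    return groups


packets = list(range(10))  # stand-in for 10 packet rows
print(split_by_counts(packets, [3, 4, 2]))  # [[0, 1, 2], [3, 4, 5, 6], [7, 8]]
```

Rows beyond the sum of the counts, such as the last element here, are left out of every group, so the counts have to match the trace being plotted.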

In [None]:
#Colored
def create_scatter_plot_temp_client_vid(df_dl: pd.DataFrame, df_ul: pd.DataFrame, x_col: str, y_col: str, title: str,
                        x_title: str, y_title: str, protocols: bool = False, text_show: bool = False, number_column: str = None,
                        from0: bool = False, stem: bool = False, spec_prot: List[str] = None) -> None:
  """
  Function to create a scatter plot with optional grouping by protocol and optional display of text labels.


  Parameters:
    df_dl (pd.DataFrame): The downlink data as a Pandas DataFrame.
    df_ul (pd.DataFrame): The uplink data as a Pandas DataFrame.
    x_col (str): The column name to use for the x-axis.
    y_col (str): The column name to use for the y-axis.
    title (str): The title of the plot.
    x_title (str): The title of the x-axis.
    y_title (str): The title of the y-axis.
    protocols (bool, optional): Whether to group data by protocol. Default is False.
    text_show (bool, optional): Whether to display text labels on the plot. Default is False.
    number_column (str, optional): The column name to use for the text labels. Default is None.
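    from0 (bool, optional): Whether to shift the x-axis so that the downlink data starts at 0. Default is False.
    stem (bool, optional): Whether to draw vertical lines to simulate a stem plot. Default is False.
    spec_prot (List[str], optional): Protocols to consider; None applies no protocol filter. Default is None.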
  Returns:
    None
  """
  if protocols:


    dl_protocols = ['DTLSv1.2', 'DTLS', 'SRTP Audio', 'SRTP Video','STUN' ,'UDP']  # Define manually all the DL protocols so that they always have the same assigned color
    # dl_protocols = sorted(df_dl['_ws.col.Protocol'].unique()) # Get unique protocols in DL and assign a color to each - might not produce same colors in different executions
    dl_colors = px.colors.qualitative.Plotly[:len(dl_protocols)] # Plotly has 10 colors
    protocol_color_map_dl = dict(zip(dl_protocols, dl_colors))


    ul_protocols = ['DTLSv1.2', 'SRTCP', 'STUN', 'UDP'] # Define manually all the UL protocols so that they always have the same assigned color
    # ul_protocols = sorted(df_ul['_ws.col.Protocol'].unique())    # Get unique protocols in UL and assign a color to each - might not produce same colors in different executions
    ul_colors = px.colors.qualitative.Plotly[len(dl_protocols):len(dl_protocols) + len(ul_protocols)] # Plotly has 10 colors
    protocol_color_map_ul = dict(zip(ul_protocols, ul_colors))


  else:
    dataframes = ['DL', 'UL']
    dataframes_col = px.colors.qualitative.Plotly[:len(dataframes)]
    color_map_dl = dict(zip(dataframes, dataframes_col))


  # Set the mode of the plot
  if text_show:
    mode_plt = 'markers+text'
  else:
    mode_plt = 'markers'


  fig = go.Figure()

  if protocols:
    # Add traces for each protocol in dl and ul
    for i, df in enumerate([df_dl, df_ul]):
      for j, prot in enumerate(sorted(df['_ws.col.Protocol'].unique())):
        if spec_prot is not None and prot not in spec_prot:
          continue
        # Protocol color; .get() avoids a KeyError for protocols missing from the manual lists above
        color = (protocol_color_map_dl if i == 0 else protocol_color_map_ul).get(prot)
        # Copy so that the columns added below are not written to a slice of the caller's DataFrame
        filtered_df = df[df['_ws.col.Protocol'] == prot].copy()
        # Only 'SRTP Video' packets are grouped and plotted; for any other protocol `groups` stays empty
        groups = []

        if prot=='SRTP Video':
          filtered_df['_ws.col.Time'] = filtered_df['_ws.col.Info'].apply(extract_time)
          filtered_df.reset_index(drop=True, inplace=True)

          filtered_df['Group'] = ''
          color_groups = ['#6f42f5','#a442f5','#d442f5' , '#f774c7', '#f5426f', '#f57842', '#f5b942', '#d6cc3e', '#7ef542', '#42f5c5','#3ec2d6', '#153359']

          # One group per distinct time value returned by extract_time
          for val in filtered_df['_ws.col.Time'].unique():
            groups.append(filtered_df[filtered_df['_ws.col.Time']==val].copy())

        for z in range(len(groups)):
          filtered_df = groups[z]
          if from0: #plot x_axis starting from 0
            #x_val = (filtered_df[x_col]- min(df_dl[x_col].min(), df_ul[x_col].min()))
            x_val = filtered_df[x_col]- df_dl[x_col].min()

          else:
            x_val = filtered_df[x_col]

          if "ms" in x_title:
            x_val = x_val * 1000 # convert s to ms without modifying the DataFrame column in place


          fig.add_trace(
            go.Scatter(
              mode = mode_plt,
              x = x_val,
              y = filtered_df[y_col],
              name = f"{prot} Frame {z+1}", # Change to group or frame
              marker = dict(color = color_groups[z % len(color_groups)]), # cycle the palette if there are more groups than colors
              text = filtered_df[number_column].tolist() if number_column else None,
              textfont = dict(size=10),
              textposition = 'top center'
                      )
          )
          if stem:
            for x_v,y_v in zip(x_val,filtered_df[y_col]):
              fig.add_shape(
                type='line',
                x0=x_v,
                y0=0,
                x1=x_v,
                y1=y_v,
                line=dict(color = color_groups[z % len(color_groups)]), # cycle the palette if there are more groups than colors
                opacity=1
              )



  fig.update_layout(
    title=title,
    xaxis=dict(title=x_title),
    yaxis=dict(title=y_title),
    showlegend=True,
    legend=dict(x=1, y=1),
  )

  font_family = "Times New Roman"
  font_size = 20
  fig.update_layout(
      font=dict(family=font_family, size=font_size),
      template='plotly_white',
      xaxis=dict(showgrid=False), #,dtick=2
      yaxis=dict(showgrid=False),
      legend=dict(
      orientation="h",
      yanchor="bottom",
      y=1.02,
      xanchor="right",
      x=1)
  )

  fig.update_layout(
      autosize=False,
      width=1200,
      height=400)



  fig.show()
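
The `stem` branch of the function above adds one layout shape per packet with `fig.add_shape`. A lighter alternative, sketched here only as an illustration, is to encode every stem in a single line trace whose segments are separated by `None` entries. The helper name `stem_segments` is hypothetical and not part of the notebook.

```python
def stem_segments(x_values, y_values, baseline=0.0):
    """Return x/y lists describing one vertical segment per point.

    Each point contributes (x, baseline) -> (x, y) followed by a None entry,
    which a line trace renders as a gap between consecutive stems.
    """
    xs, ys = [], []
    for x, y in zip(x_values, y_values):
        xs.extend([x, x, None])
        ys.extend([baseline, y, None])
    return xs, ys


# Two packets at 0.0 ms and 1.5 ms with sizes 100 and 250 bytes
xs, ys = stem_segments([0.0, 1.5], [100, 250])
print(xs)
print(ys)
```

The two lists could then be passed to one `go.Scatter(x=xs, y=ys, mode='lines')` trace per group, next to the existing marker trace.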

##### **Stem plot, MATLAB style (draws the stems much faster than the previous approach)**

###### **Stem Time-filtered - Non-Grouped**

In [None]:
plot_stem_matlab = True
if plot_stem_matlab:
  # Loop over all dataframes in df_list and plot packet size vs. time for each one
  for i, df in enumerate(df_list):
    # Get the file name from the path
    file_name = os.path.splitext(os.path.basename(files_path_list[i]))[0]
    title = file_name
    # Call the create_stem_plot function to create the plot.
    #   df_dl_list[i]: DataFrame containing downlink traffic data filtered by time.
    #   df_ul_list[i]: DataFrame containing uplink traffic data filtered by time.
    #   'frame.time_relative': Column name for the x-axis data.
    #   'frame.len': Column name for the y-axis data.
    #   title: The title of the plot.
    #   'Time (ms)': Label for the x-axis; a label containing "ms" makes the function scale x by 1000.
    #   'Packet size (bytes)': Label for the y-axis.
    #   protocols: True if the data should be grouped by protocol, False otherwise.
    #   text_show, number_column: False and None, so no text labels are drawn.
    #   from0: True if the x-axis should start from 0.
    #   stem: True to draw the vertical stem lines.
    #   spec_prot: None to apply no protocol filter, or a list of str naming the specific protocols to plot.
    create_stem_plot(df_dl_list[i], df_ul_list[i], 'frame.time_relative',
                        'frame.len', title, 'Time (ms)', 'Packet size (bytes)',
                        protocols=True, text_show=False, number_column=None,
                        from0=True, stem=True, spec_prot=None)

###### **Stem Zoomed - Non-Grouped**

In [None]:
plot_stem_matlab = True
if plot_stem_matlab:
  # Loop over all dataframes in df_list and plot packet size vs. time for each one
  for i, df in enumerate(df_list):
    # Get the file name from the path
    file_name = os.path.splitext(os.path.basename(files_path_list[i]))[0]
    title = file_name
    # Call the create_stem_plot function to create the plot.
    #   df_dl_z_list[i]: DataFrame containing downlink traffic data restricted to the zoom interval.
    #   df_ul_z_list[i]: DataFrame containing uplink traffic data restricted to the zoom interval.
    #   'frame.time_relative': Column name for the x-axis data.
    #   'frame.len': Column name for the y-axis data.
    #   title: The title of the plot.
    #   'Time (ms)': Label for the x-axis; a label containing "ms" makes the function scale x by 1000.
    #   'Packet size (bytes)': Label for the y-axis.
    #   protocols: True if the data should be grouped by protocol, False otherwise.
    #   text_show, number_column: False and None, so no text labels are drawn.
    #   from0: True if the x-axis should start from 0.
    #   stem: False, so only the markers are drawn.
    #   spec_prot: None to apply no protocol filter, or a list of str naming the specific protocols to plot.
    create_stem_plot(df_dl_z_list[i], df_ul_z_list[i], 'frame.time_relative',
                        'frame.len', title, 'Time (ms)', 'Packet size (bytes)',
                        protocols=True, text_show=False, number_column=None,
                        from0=True, stem=False, spec_prot=None)
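
`create_stem_plot` takes six optional parameters after `y_title`, so a long positional call can bind a value to a different parameter than the one intended. The sketch below checks the binding with `inspect.signature`. `stem_plot_stub` is a hypothetical stand-in that does no plotting; its first seven parameter names are assumed from the docstring, and only the order of the optional parameters is taken from the signature shown above.

```python
import inspect


def stem_plot_stub(df_dl, df_ul, x_col, y_col, title, x_title, y_title,
                   protocols=False, text_show=False, number_column=None,
                   from0=False, stem=False, spec_prot=None):
    """Hypothetical stand-in: same optional-parameter order, no plotting."""
    return None


signature = inspect.signature(stem_plot_stub)

# Twelve positional values: the last one lands in `stem`, not in `spec_prot`.
positional = signature.bind('dl', 'ul', 'frame.time_relative', 'frame.len', 'title',
                            'Time (ms)', 'Packet size (bytes)',
                            True, False, None, True, ['UDP'])
positional.apply_defaults()

# Keyword arguments make the intended target explicit.
keyword = signature.bind('dl', 'ul', 'frame.time_relative', 'frame.len', 'title',
                         'Time (ms)', 'Packet size (bytes)',
                         protocols=True, from0=True, spec_prot=['UDP'])
keyword.apply_defaults()

print('positional:', positional.arguments['stem'], positional.arguments['spec_prot'])
print('keyword:   ', keyword.arguments['stem'], keyword.arguments['spec_prot'])
```

Passing the optional arguments by name keeps the call correct even if the parameter order changes later.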

#### **Traffic load (1s intervals)**

In [None]:
# Create a list with the dataframes to plot
df_data = [df_traffic_dl_list, df_traffic_ul_list]

# Create a list with the names of the dataframes
df_data_name = ['DL', 'UL']

# Loop through the list of dataframes and their names
for dataframe, df_name in zip(df_data, df_data_name):

  # Call the plot_traffic_load_over_time function with the dataframe, title, x label, y label, label list, and color list as arguments
  plot_traffic_load_over_time(dataframe, df_name + ' traffic over Time', 'Time (s)', 'Traffic Load (Mbps)', label_list , color_list )
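
The traffic DataFrames plotted here come from `get_traffic_df`, which is defined earlier in the notebook. As an illustration of the arithmetic only, and not of that helper's actual implementation, the load in Mbps for a fixed interval can be obtained by summing the packet sizes in each time bin and converting bytes to megabits per second. The function name `traffic_load_mbps` is hypothetical.

```python
from collections import defaultdict


def traffic_load_mbps(times_s, sizes_bytes, interval_s=1.0):
    """Map each bin start time (s) to the traffic load (Mbps) within that bin."""
    bytes_per_bin = defaultdict(int)
    for t, size in zip(times_s, sizes_bytes):
        bytes_per_bin[int(t // interval_s)] += size
    # bytes -> bits (x8), per second (/interval), mega (/1e6)
    return {bin_idx * interval_s: total * 8 / interval_s / 1e6
            for bin_idx, total in sorted(bytes_per_bin.items())}


# Three packets: two in the first second, one in the second
load = traffic_load_mbps([0.1, 0.4, 1.2], [125000, 125000, 250000], interval_s=1.0)
print(load)
```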

#### **Traffic load (specific interval)**

In [None]:
# Set to True to make the plot.
plot_t_load_specific = False
# Specify the interval in seconds
interval = 500/1000 # (s)

if plot_t_load_specific:

  df_traffic_dl_list_r = [get_traffic_df(df_dl, interval) for df_dl in df_dl_list]

  df_traffic_ul_list_r = [get_traffic_df(df_ul, interval) for df_ul in df_ul_list]

  # Create a list with the dataframes to plot
  df_data = [df_traffic_dl_list_r, df_traffic_ul_list_r]

  # Create a list with the names of the dataframes
  df_data_name = ['DL', 'UL']

  # Loop through the list of dataframes and their names
  for dataframe, df_name in zip(df_data, df_data_name):

    # Call the plot_traffic_load_over_time function with the dataframe, title, x label, y label, label list, and color list as arguments
    plot_traffic_load_over_time(dataframe, df_name + ' traffic over Time ' + 'resampled by ' + str(interval*1000) +   'ms', 'Time (s)', 'Traffic Load (Mbps)', label_list , color_list )

#### **Correlation (specific interval)**

In [None]:
# Specify the interval in seconds
interval = 1000/1000 # (s)

df_traffic_dl_list_r = [get_traffic_df(df_dl, interval) for df_dl in df_dl_list]

df_traffic_ul_list_r = [get_traffic_df(df_ul, interval) for df_ul in df_ul_list]

plot_correlation(df_traffic_dl_list_r, df_traffic_ul_list_r, interval)
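
`plot_correlation` is defined earlier in the notebook and its internals are not shown here. Assuming the quantity of interest is the Pearson correlation between the DL and UL load series sampled on the same intervals, a dependency-free sketch looks as follows. The function name `pearson` and the sample values are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Made-up per-interval loads (Mbps) for DL and UL
dl_load = [40.0, 42.0, 45.0, 41.0]
ul_load = [1.00, 1.10, 1.30, 1.05]
r = pearson(dl_load, ul_load)
print(round(r, 3))
```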

#### **ECDF**

##### **Non-grouped - Time-Filtered - Not by protocol**

In [None]:
# Create a list with the dataframes to plot
df_data = [df_traffic_dl_list, df_traffic_ul_list]

# Create a list with the names of the dataframes
df_data_name = ['DL', 'UL']

plot_data = [('traffic', 'Traffic (Mbps)')]

# Loop through the list of dataframes and their names using enumerate
for i, dataframe in enumerate(df_data):

  # Get the name of the dataframe
  df_name = df_data_name[i]

  # Loop through plot data and create ECDF plots
  for data in plot_data:

    # Create the plot title using the dataframe name and ecdf column label
    plot_title = df_name + ' ' + data[1]

    # Call the create_ecdf_plot function with the dataframe, title, x label, y label, label list, and color list as arguments
    create_ecdf_plot(dataframe, data[0] , plot_title, data[1], 'ECDF', label_list , color_list)
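
For reference, the empirical CDF of a sample assigns to each sorted value the fraction of observations less than or equal to it. The sketch below shows that standard definition. It is not necessarily how `create_ecdf_plot` computes it, and the name `ecdf` is hypothetical.

```python
def ecdf(values):
    """Return the sorted values and their cumulative fractions i/n."""
    xs = sorted(values)
    n = len(xs)
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys


xs, ys = ecdf([3, 1, 2, 2])
print(xs)
print(ys)
```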

##### **Non-grouped - Time-Filtered - By protocol**

In [None]:
# Create a list with the dataframes to plot
df_data = [df_dl_list, df_ul_list]

# Create a list with the names of the dataframes
df_data_name = ['Downlink', 'Uplink']

# List of tuples containing the column for the ecdf and its label
plot_data = [('frame.len', 'Packet size (bytes)')
             ,('frame.time_relative', 'Inter-Packet time (ms)')
             ]

# Loop through the list of dataframes and their names using enumerate
# The loop variable is named direction_df_list so that the global df_list used by other cells is not overwritten
for i, direction_df_list in enumerate(df_data): # e.g. df_dl_list

  # Get the name of the dataframe
  df_name = df_data_name[i]

  protocol_list = direction_df_list[0]['_ws.col.Protocol'].unique()

  filtered_df_list_by_protocol = []

  # Loop through protocols
  for protocol in protocol_list:

    filtered_df_list= []

    # Loop through dataframes in list, to have them filtered by protocol
    for dataframe in direction_df_list:

        # Filter the dataframe by protocol
        filtered_df = dataframe[dataframe['_ws.col.Protocol'] == protocol]
        filtered_df_list.append(filtered_df)

    filtered_df_list_by_protocol.append(filtered_df_list)

  # Loop through plot data and create ECDF plots
  for data in plot_data:
    # Loop through filtered dataframe list for each protocol in the protocol list
    for j, protocol in enumerate(protocol_list):
      factor = 1
      diff = False

      if data[0] == 'frame.time_relative':
        factor = 1000
        diff = True

      # Create the plot title using the dataframe name, protocol, and ecdf column label
      plot_title = df_name + ' ' + protocol

      # Call the create_ecdf_plot function with the protocol_df, title, x label, y label, label list, and color list as arguments
      #create_ecdf_plot(filtered_df_list_by_protocol[j], data[0], plot_title, data[1], 'ECDF', ['Server', 'Client'], color_list, factor, diff)
      #create_ecdf_plot(filtered_df_list_by_protocol[j], data[0], plot_title, data[1], 'ECDF', label_list, color_list, factor, diff)
      create_ecdf_plot(filtered_df_list_by_protocol[j], data[0], plot_title, data[1], 'ECDF', ['90 FPS - H264','60 FPS - H264', '30 FPS - H264', '30 FPS - VP9'], color_list, factor, diff)
      #create_ecdf_plot(filtered_df_list_by_protocol[j], data[0], plot_title, data[1], 'ECDF', ['Ideal', 'Iperf 50 Mbps','Iperf 100 Mbps', 'Iperf 200 Mbps'], color_list, factor, diff)
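
When the column is `frame.time_relative`, the cell above passes `factor = 1000` and `diff = True`. Presumably this tells the plotting helper to take differences of consecutive timestamps and convert them from seconds to milliseconds. A minimal sketch of that transformation, assuming exactly this interpretation (the function name `inter_packet_times_ms` is hypothetical):

```python
def inter_packet_times_ms(times_s, factor=1000):
    """Differences between consecutive timestamps, scaled by `factor` (s -> ms)."""
    return [(later - earlier) * factor
            for earlier, later in zip(times_s, times_s[1:])]


# Three packets captured at 0 s, 0.25 s and 0.75 s
gaps = inter_packet_times_ms([0.0, 0.25, 0.75])
print(gaps)
```

A series of n timestamps yields n - 1 inter-packet times, which is why a `.diff()` on a pandas Series leaves a NaN in the first position.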


##### **Grouped - Time-Filtered - By protocol**

In [None]:
# Create a list with the dataframes to plot; keep exactly one option uncommented
#df_data = [df_grouped_dl_list, df_grouped_ul_list]
#df_data = [[df_grouped_dl_list[0]], [df_grouped_ul_list[0]]]
#df_data = [df_grouped_ul_list]
#df_data = [df_dl_list_grouped_mark, df_grouped_ul_list] # For video frames inter-frame time
#df_data = [df_grouped_dl_list_srtp_correct, df_grouped_ul_list] # For batch time
# Others - For video batches
#df_data = [df_grouped_dl_list_grouped_mark]
df_data = [df_grouped_dl_list_grouped_mark_not_nan]

# Create a list with the names of the dataframes
df_data_name = ['Downlink', 'Uplink']
#df_data_name = ['Downlink']

# List of tuples containing the column for the ECDF and its label; keep exactly one option uncommented
#plot_data = [('group.frame.len.tot', 'Group size (Bytes)')
#             ,('frame.time_relative', 'Inter-Frame time (ms)')
#             ,('num.packets', 'Group Packet Count')
#             ,('avg.time.between.packets', 'Group Inter-Packet time (ms)')
#             ,('group.time', 'Group time (ms)')
#             ]
# Others - For video batches
plot_data = [('avg.time.between.groups', 'Inter-Batch time (ms)'), # Inter-batch time
            ('num.groups', 'Frame Batch number')]


# Loop through the list of dataframes and their names using enumerate
# The loop variable is named direction_df_list so that the global df_list used by other cells is not overwritten
for i, direction_df_list in enumerate(df_data): # e.g. df_dl_list

  # Get the name of the dataframe
  df_name = df_data_name[i]

  protocol_list = direction_df_list[0]['_ws.col.Protocol'].unique()

  filtered_df_list_by_protocol = []

  # Loop through protocols
  for protocol in protocol_list:

    filtered_df_list= []

    # Loop through dataframes in list, to have them filtered by protocol
    for dataframe in direction_df_list:

        # Filter the dataframe by protocol
        filtered_df = dataframe[dataframe['_ws.col.Protocol'] == protocol]
        filtered_df_list.append(filtered_df)

    filtered_df_list_by_protocol.append(filtered_df_list)

  # Loop through plot data and create ECDF plots
  for data in plot_data:
    # Loop through filtered dataframe list for each protocol in the protocol list
    for j, protocol in enumerate(protocol_list):

      # Create the plot title using the dataframe name and protocol
      plot_title = df_name + ' ' + protocol

      factor = 1
      diff = False

      if data[0] == 'frame.time_relative':
        factor = 1000
        diff = True
      # Call the create_ecdf_plot function with the protocol_df, column, title, x label, y label, label list, and color list as arguments
      #create_ecdf_plot(filtered_df_list_by_protocol[j], data[0], plot_title, data[1], 'ECDF', label_list, color_list, factor, diff)
      #create_ecdf_plot(filtered_df_list_by_protocol[j], data[0], plot_title, data[1], 'ECDF', ['Server','Client'], color_list, factor, diff)
      #create_ecdf_plot(filtered_df_list_by_protocol[j], data[0], '', data[1], 'ECDF', ['90 FPS - H264','60 FPS - H264', '30 FPS - H264', '30 FPS - VP9'], color_list, factor, diff)
      #create_ecdf_plot(filtered_df_list_by_protocol[j], data[0], '', data[1], 'ECDF', ['Server','Client'], color_list, factor, diff)
      # Change the colors when studying background traffic:
      background_colors = px.colors.qualitative.Plotly[0:1] + px.colors.qualitative.Plotly[7:7 + len(direction_df_list)]
      colors_bk_study = background_colors
      create_ecdf_plot(filtered_df_list_by_protocol[j], data[0], '', data[1], 'ECDF', ['0 Mbps', 'Iperf 50Mbps','Iperf 100Mbps','Iperf 200Mbps'], colors_bk_study, factor, diff)
      #create_ecdf_plot(filtered_df_list_by_protocol[j], data[0], '', data[1], 'ECDF', ['Laptop','Phone'], color_list, factor, diff)


#### **Histogram**

##### **Non-grouped - Time-Filtered - By protocol**

In [None]:
# Create a list with the dataframes to plot
df_data = [df_dl_list, df_ul_list]

# Create a list with the names of the dataframes
df_data_name = ['DL', 'UL']

# List of tuples containing the column for the histogram and its label
plot_data = [('frame.len', 'Packet size (Bytes)'),
             ('frame.time_relative', 'Inter packet time (ms)')]

# Loop through the list of dataframes and their names using enumerate
# The loop variable is named direction_df_list so that the global df_list used by other cells is not overwritten
for i, direction_df_list in enumerate(df_data): # e.g. df_dl_list

  # Get the name of the dataframe
  df_name = df_data_name[i]

  protocol_list = direction_df_list[0]['_ws.col.Protocol'].unique()

  filtered_df_list_by_protocol = []

  # Loop through protocols
  for protocol in protocol_list:

    filtered_df_list= []

    # Loop through dataframes in list, to have them filtered by protocol
    for dataframe in direction_df_list:

        # Filter the dataframe by protocol
        filtered_df = dataframe[dataframe['_ws.col.Protocol'] == protocol]
        filtered_df_list.append(filtered_df)

    filtered_df_list_by_protocol.append(filtered_df_list)

  # Loop through plot data and create histogram plots
  for data in plot_data:
    # Loop through filtered dataframe list for each protocol in the protocol list
    for j, protocol in enumerate(protocol_list):

      # Create the plot title using the dataframe name, protocol, and histogram column label
      plot_title = df_name + ' ' + protocol + ' ' + data[1]

      factor = 1
      diff = False

      if data[0] == 'frame.time_relative':
        factor = 1000
        diff = True
      # Call the create_histogram_plot function with the protocol_df, column, title, x label, y label, label list, and color list as arguments
      create_histogram_plot(filtered_df_list_by_protocol[j], data[0], plot_title, data[1], 'Probability Density', label_list, color_list, factor, diff)
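
The y-axis label `'Probability Density'` implies that bar heights are normalized so that the total bar area equals 1, which is the usual meaning of a density histogram. A dependency-free sketch of that normalization with fixed-width bins, given only to make the units explicit (the name `density_histogram` is hypothetical and this is not the code of `create_histogram_plot`):

```python
from collections import Counter


def density_histogram(values, bin_width):
    """Map each bin's left edge to count / (n * bin_width)."""
    counts = Counter(int(v // bin_width) for v in values)
    n = len(values)
    return {bin_idx * bin_width: count / (n * bin_width)
            for bin_idx, count in sorted(counts.items())}


# Four packet sizes (bytes) with 10-byte bins
dens = density_histogram([10, 12, 25, 31], bin_width=10)
print(dens)
# The bar areas (density x bin width) sum to 1
print(sum(d * 10 for d in dens.values()))
```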


##### **Grouped - Time-Filtered - By protocol**

In [None]:
# Create a list with the dataframes to plot
df_data = [df_grouped_dl_list, df_grouped_ul_list]

# Create a list with the names of the dataframes
df_data_name = ['DL', 'UL']

# List of tuples containing the column for the histogram and its label
plot_data = [
             ('frame.time_relative', 'Inter Group time (ms)')]

# Loop through the list of dataframes and their names using enumerate
# The loop variable is named direction_df_list so that the global df_list used by other cells is not overwritten
for i, direction_df_list in enumerate(df_data): # e.g. df_grouped_dl_list

  # Get the name of the dataframe
  df_name = df_data_name[i]

  protocol_list = direction_df_list[0]['_ws.col.Protocol'].unique()

  filtered_df_list_by_protocol = []

  # Loop through protocols
  for protocol in protocol_list:

    filtered_df_list= []

    # Loop through dataframes in list, to have them filtered by protocol
    for dataframe in direction_df_list:

        # Filter the dataframe by protocol
        filtered_df = dataframe[dataframe['_ws.col.Protocol'] == protocol]
        filtered_df_list.append(filtered_df)

    filtered_df_list_by_protocol.append(filtered_df_list)

  # Loop through plot data and create histogram plots
  for data in plot_data:
    # Loop through filtered dataframe list for each protocol in the protocol list
    for j, protocol in enumerate(protocol_list):

      # Create the plot title using the dataframe name, protocol, and ecdf column label
      plot_title = df_name + ' ' + protocol + ' ' + data[1]

      factor = 1
      diff = False

      if data[0] == 'frame.time_relative':
        factor = 1000
        diff = True
      # Call the create_histogram_plot function with the protocol_df, column, title, x label, y label, label list, and color list as arguments
      create_histogram_plot(filtered_df_list_by_protocol[j], data[0], plot_title, data[1], 'Probability Density', label_list, color_list, factor, diff)


##### **Probability density functions fit**

In [None]:
from scipy import stats

data = df_ul_list[0][df_ul_list[0]['_ws.col.Protocol'] == 'SRTCP']['frame.len']

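# Alternative data set: inter-packet time (ms) of the DL UDP packets
#data = df_dl_list[0][df_dl_list[0]['_ws.col.Protocol'] == 'UDP']['frame.time_relative'].diff()*1000
# Remove NaN values (e.g. the first element produced by .diff()) before fitting
data = data.dropna()
# Check if data is empty after removing NaN values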
if data.empty:
    print("Data is empty after removing NaN values.")
else:
    # Fit normal distribution
    loc, scale = stats.norm.fit(data)

    # Generate x-values for plotting
    x = np.linspace(data.min(), data.max(), 100)

    # Calculate the corresponding y-values for the fitted normal distribution
    y = stats.norm.pdf(x, loc=loc, scale=scale)

    # Create histogram trace for data
    hist_trace = go.Histogram(x=data, histnorm='probability density', opacity=0.7, name='Data')

    # Create line trace for fitted normal distribution
    line_trace = go.Scatter(x=x, y=y, mode='lines', name='Fitted Normal Distribution')

    # Create layout
    layout = go.Layout(
        xaxis=dict(title='Value'),
        yaxis=dict(title='Density'),
        legend=dict(x=0.7, y=0.9)
    )

    # Create figure
    fig = go.Figure(data=[hist_trace, line_trace], layout=layout)
    font_family = "Times New Roman"
    font_size = 20
    fig.update_layout(
        font=dict(family=font_family, size=font_size),
        template='plotly_white',
        xaxis=dict(showgrid=False), #,dtick=10, ,range=[75, 95]
        yaxis=dict(showgrid=False),
        legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1)
    )
    fig.update_layout(
        autosize=False,
        width=600, #600 for some
        height=500)
    # Show the plot
    fig.show()
    print(loc, scale)

# Check if data is empty after removing NaN values
if data.empty:
    print("Data is empty after removing NaN values.")
else:
    # Fit Student's t-distribution
    df, loc, scale = stats.t.fit(data)

    # Generate x-values for plotting
    x = np.linspace(data.min(), data.max(), 100)

    # Calculate the corresponding y-values for the fitted Student's t-distribution
    y = stats.t.pdf(x, df=df, loc=loc, scale=scale)

    # Create histogram trace for data
    hist_trace = go.Histogram(x=data,histnorm='probability density', opacity=0.7, name='Data')

    # Create line trace for fitted Student's t-distribution
    line_trace = go.Scatter(x=x, y=y, mode='lines', name="Fitted Student's t-Distribution")

    # Create layout
    layout = go.Layout( xaxis=dict(title='Value'), yaxis=dict(title='Probability Density'),
                       legend=dict(x=0.7, y=0.9))

    # Create figure
    fig = go.Figure(data=[hist_trace, line_trace], layout=layout)
    font_family = "Times New Roman"
    font_size = 20
    fig.update_layout(
        font=dict(family=font_family, size=font_size),
        template='plotly_white',
        xaxis=dict(showgrid=False), #,dtick=10, ,range=[75, 95]
        yaxis=dict(showgrid=False),
        legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1)
    )
    fig.update_layout(
        autosize=False,
        width=600, #600 for some
        height=500)
    # Show the plot
    fig.show()
    print(df, loc, scale)  # degrees of freedom, location, scale

# Check if data is empty after removing NaN values
if data.empty:
    print("Data is empty after removing NaN values.")
else:
    # Fit Laplace distribution
    loc, scale = stats.laplace.fit(data)

    # Generate x-values for plotting
    x = np.linspace(data.min(), data.max(), 100)

    # Calculate the corresponding y-values for the fitted Laplace distribution
    y = stats.laplace.pdf(x, loc=loc, scale=scale)

    # Create histogram trace for data
    hist_trace = go.Histogram(x=data, histnorm='probability density', opacity=0.7, name='SRTCP Data')

    # Create line trace for fitted Laplace distribution
    line_trace = go.Scatter(x=x, y=y, mode='lines', name='Fitted Laplace Distribution')

    # Create layout
    layout = go.Layout( xaxis=dict(title='Size (bytes)'), yaxis=dict(title='Probability Density'),
                       legend=dict(x=0.7, y=0.9))

    # Create figure
    fig = go.Figure(data=[hist_trace, line_trace], layout=layout)
    font_family = "Times New Roman"
    font_size = 20
    fig.update_layout(
        font=dict(family=font_family, size=font_size),
        template='plotly_white',
        xaxis=dict(showgrid=False), #,dtick=10, ,range=[75, 95]
        yaxis=dict(showgrid=False),
        legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1)
    )
    fig.update_layout(
        autosize=False,
        width=600, #600 for some
        height=500)
    # Show the plot
    fig.show()
    print(loc, scale)

# Check if data is empty after removing NaN values
if data.empty:
    print("Data is empty after removing NaN values.")
else:
    # Fit exponential distribution
    loc, scale = stats.expon.fit(data)

    # Generate x-values for plotting
    x = np.linspace(data.min(), data.max(), 100)

    # Calculate the corresponding y-values for the fitted exponential distribution
    y = stats.expon.pdf(x, loc=loc, scale=scale)

    # Create histogram trace for data
    hist_trace = go.Histogram(x=data, histnorm='probability density', opacity=0.7, name='Data')

    # Create line trace for fitted exponential distribution
    line_trace = go.Scatter(x=x, y=y, mode='lines', name='Fitted Exponential Distribution')

    # Create layout
    layout = go.Layout(xaxis=dict(title='Value'), yaxis=dict(title='Probability Density'),
                       legend=dict(x=0.7, y=0.9))

    # Create figure
    fig = go.Figure(data=[hist_trace, line_trace], layout=layout)
    font_family = "Times New Roman"
    font_size = 20
    fig.update_layout(
        font=dict(family=font_family, size=font_size),
        template='plotly_white',
        xaxis=dict(showgrid=False),
        yaxis=dict(showgrid=False),
        legend=dict(
            orientation="h",
            yanchor="bottom",
            y=1.02,
            xanchor="right",
            x=1)
    )
    fig.update_layout(
        autosize=False,
        width=600,
        height=500)
    # Show the plot
    fig.show()
    print(loc, scale)

# Check if data is empty after removing NaN values
if data.empty:
    print("Data is empty after removing NaN values.")
else:
    # Fit gamma distribution
    shape, loc, scale = stats.gamma.fit(data, floc=0)

    # Generate x-values for plotting
    x = np.linspace(data.min(), data.max(), 100)

    # Calculate the corresponding y-values for the fitted gamma distribution
    y = stats.gamma.pdf(x, shape, loc, scale)

    # Create histogram trace for data
    hist_trace = go.Histogram(x=data, histnorm='probability density', opacity=0.7, name='Data')

    # Create line trace for fitted gamma distribution
    line_trace = go.Scatter(x=x, y=y, mode='lines', name='Fitted Gamma Distribution')

    # Create layout
    layout = go.Layout(xaxis=dict(title='Value'), yaxis=dict(title='Probability Density'),
                       legend=dict(x=0.7, y=0.9))

    # Create figure
    fig = go.Figure(data=[hist_trace, line_trace], layout=layout)
    font_family = "Times New Roman"
    font_size = 20
    fig.update_layout(
        font=dict(family=font_family, size=font_size),
        template='plotly_white',
        xaxis=dict(showgrid=False),
        yaxis=dict(showgrid=False),
        legend=dict(
            orientation="h",
            yanchor="bottom",
            y=1.02,
            xanchor="right",
            x=1)
    )
    fig.update_layout(
        autosize=False,
        width=600,
        height=500)
    # Show the plot
    fig.show()
    print(shape, loc, scale)  # shape, location (fixed to 0 by floc), scale


from sklearn.mixture import GaussianMixture

# Convert pandas Series to NumPy array and reshape
data_array = data.values.reshape(-1, 1)

# Fit Gaussian Mixture Model
gm = GaussianMixture(n_components=2)
gm.fit(data_array)

# Generate x-values for plotting
x = np.linspace(data.min(), data.max(), 100).reshape(-1, 1)

# Evaluate the mixture density: score_samples returns the log-likelihood of each x-value, so exponentiate it
y = np.exp(gm.score_samples(x))
#y_normalized = y / np.sum(y)
# Create histogram trace for data
hist_trace = go.Histogram(x=data, histnorm='probability density', opacity=0.7, name='DL Generic UDP Data')

# Create a line trace for the GMM
gmm_trace = go.Scatter(x=x.flatten(), y=y, mode='lines', name='Gaussian Mixture Model')

# Create layout
layout = go.Layout(
    xaxis=dict(title='Packet arrival (ms)'),
    yaxis=dict(title='Probability Density'),
    legend=dict(x=0.7, y=0.9)
)

# Create figure
fig = go.Figure(data=[hist_trace, gmm_trace], layout=layout)
# Apply font, template, and legend styling
font_family = "Times New Roman"
font_size = 20
fig.update_layout(
    font=dict(family=font_family, size=font_size),
    template='plotly_white',
    xaxis=dict(showgrid=False),
    yaxis=dict(showgrid=False),
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1)
)
fig.update_layout(
    autosize=False,
    width=600,
    height=500)
# Show the plot
fig.show()

# Get the parameters of the GMM components
means = gm.means_  # Mean values of each component
covariances = gm.covariances_  # Covariance matrices of each component

# Print the parameters
for i in range(gm.n_components):
    print(f"Component {i+1}:")
    print(f"Mean: {means[i]}")
    print(f"Covariance Matrix:\n{covariances[i]}")
    print()

# Example representation for the first component:
component_1 = f"Gaussian({means[0]}, {covariances[0]})"

# Example representation for the second component:
component_2 = f"Gaussian({means[1]}, {covariances[1]})"

# Full representation of the GMM:
gmm_representation = f"GMM({component_1}, {component_2})"

print("GMM Representation:")
print(gmm_representation)
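
For two of the distributions fitted in this cell, the maximum-likelihood estimates have closed forms, which gives a quick sanity check on the printed `loc` and `scale` values. For the normal distribution they are the sample mean and the population standard deviation. For the Laplace distribution they are the median and the mean absolute deviation from the median. The sketch below uses made-up packet sizes and hypothetical helper names. The values returned by `stats.norm.fit` and `stats.laplace.fit` should agree closely with it.

```python
import math
import statistics


def fit_normal_mle(sample):
    """Closed-form MLE for a normal distribution: (mean, population std)."""
    n = len(sample)
    loc = sum(sample) / n
    scale = math.sqrt(sum((v - loc) ** 2 for v in sample) / n)
    return loc, scale


def fit_laplace_mle(sample):
    """Closed-form MLE for a Laplace distribution: (median, mean |v - median|)."""
    loc = statistics.median(sample)
    scale = sum(abs(v - loc) for v in sample) / len(sample)
    return loc, scale


# Made-up packet sizes in bytes
sample = [80, 84, 84, 88, 94]
normal_loc, normal_scale = fit_normal_mle(sample)
laplace_loc, laplace_scale = fit_laplace_mle(sample)
print(normal_loc, round(normal_scale, 4))
print(laplace_loc, laplace_scale)
```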


#### **Box Plots**

##### **Non-grouped - Time-filtered**

In [None]:
for i, dataframe in enumerate(df_list):
  box_plot([df_dl_list[i], df_ul_list[i]], os.path.splitext(os.path.basename(files_path_list[i]))[0], ['DL ', 'UL '], 'frame.len', '(bytes)')

##### **Grouped - Time-filtered**

In [None]:
for i, dataframe in enumerate(df_list):
  box_plot([df_grouped_dl_list[i], df_grouped_ul_list[i]], os.path.splitext(os.path.basename(files_path_list[i]))[0], ['DL ', 'UL '], 'group.frame.len.tot', '(bytes)', None, True)

In [None]:
for i, dataframe in enumerate(df_list):
  box_plot([df_grouped_dl_z_list[i], df_grouped_ul_z_list[i]], os.path.splitext(os.path.basename(files_path_list[i]))[0], ['DL ', 'UL '], 'frame.time_relative', '(ms)', None, True, 1000, True)
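
Each box in these plots summarizes a column by its quartiles. As a reference for reading them, the sketch below computes a five-number summary with the standard library. The quartile interpolation used by the plotting library may differ slightly from the `'inclusive'` method chosen here, and the function name `five_number_summary` is hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using inclusive quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4, method='inclusive')
    return min(values), q1, median, q3, max(values)


# Made-up packet sizes in bytes
summary = five_number_summary([60, 70, 80, 90, 100])
print(summary)
```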

##### **Other boxplots**

In [None]:
#comparative_box_plot([df_dl_list , df_ul_list],['DL', 'UL'], '', ['AH','ITK ','ER'], 'frame.len', 'Packet size ', '(bytes)', None, True)
#comparative_box_plot([df_dl_list],['DL'], '', ['Alteration Hunting ','Interaction Toolkit ','The Escape Room '], 'frame.len', 'Packet size ', '(bytes)', None, True)
#comparative_box_plot([df_dl_list],['DL'], '', ['Alteration Hunting ','Interaction Toolkit ','The Escape Room '], 'frame.len', 'Packet size ', '(bytes)', None, True)
comparative_box_plot([df_dl_list , df_ul_list],['DL', 'UL'], '', ['Laptop','Phone'], 'frame.len', 'Packet size ', '(bytes)', None, True)


In [None]:
#comparative_box_plot([df_dl_list , df_ul_list],['DL', 'UL'], '', ['AH','ITK ','ER'], 'frame.time_relative', 'Inter-Packet Time ' ,'(ms)', None, True, 1000, True)
#comparative_box_plot([df_grouped_dl_list],['DL'], '', ['Alteration Hunting ','Interaction Toolkit ','The Escape Room '], 'frame.time_relative', 'Inter Packet Time ' ,'(ms)', None, True, 1000, True)
#comparative_box_plot([df_grouped_dl_list],['DL'], '', ['Alteration Hunting ','Interaction Toolkit ','The Escape Room '], 'frame.time_relative', 'Inter Packet Time ' ,'(ms)', ['STUN'], True, 1000, True)
comparative_box_plot([df_dl_list , df_ul_list],['DL', 'UL'], '', ['Laptop','Phone'], 'frame.time_relative', 'Inter-Packet Time ' ,'(ms)', None, True, 1000, True)


In [None]:
comparative_box_plot_category([df_grouped_dl_list , df_grouped_ul_list],['DL', 'UL'], '', ['Alteration Hunting ','Interaction Toolkit ','The Escape room '], 'frame.time_relative', 'Inter Packet Time ' ,'(ms)', None, True, 1000, True)
#comparative_box_plot_category([df_grouped_dl_list],['DL'], '', ['Alteration Hunting ','Interaction Toolkit ','The Escape Room '], 'frame.time_relative', 'Inter Packet Time ' ,'(ms)', None, True, 1000, True)
#comparative_box_plot_category([df_grouped_dl_list],['DL'], '', ['Alteration Hunting ','Interaction Toolkit ','The Escape Room '], 'frame.time_relative', 'Inter Packet Time ' ,'(ms)', ['STUN'], True, 1000, True)

In [None]:
comparative_box_plot([df_dl_list , df_ul_list],['DL', 'UL'], '', ['90FPS H264 ','60FPS H264 ','30FPS H264 ','90FPS VP9 ','60FPS VP9 ','30FPS VP9 '], 'frame.len', 'Packet size ' ,'(bytes)', ['SRTP Video'], True, 1, False)

##### **Traffic load comparison**

In [None]:
# Create a list with the dataframes to plot
df_data = [df_traffic_dl_list, df_traffic_ul_list]

# Create a list with the names of the dataframes
df_data_name = ['Downlink ', 'Uplink ']
game_label_list = ['Alteration Hunting', 'Interaction Toolkit Sample', 'The Escape Room']
# Loop through the list of dataframes and their names
for dataframe, df_name in zip(df_data, df_data_name):
  # Call the traffic_load_box_plot function with the arguments
  if game_label_list is None:
    game_label_list = label_list
  traffic_load_box_plot(dataframe, df_name + 'traffic', df_name , game_label_list)


### **Wireshark Demo Computations**

#### **Streams characteristics**

##### **Non-grouped - Time-filtered**

In [None]:
for i, dataframe in enumerate(df_list):
  streams_characteristics([df_dl_list[i], df_ul_list[i]], os.path.splitext(os.path.basename(files_path_list[i]))[0], ['DL ', 'UL '])

##### **Grouped - Time-filtered**

In [None]:
for i, dataframe in enumerate(df_list):
  # For any protocol
  streams_characteristics([df_grouped_dl_list[i], df_grouped_ul_list[i]], 'Grouped '+ os.path.splitext(os.path.basename(files_path_list[i]))[0], ['DL ', 'UL '], True)
  # For SRTP Video grouped by frame
  #streams_characteristics([df_dl_list_grouped_mark[i], df_grouped_ul_list[i]], 'Grouped '+ os.path.splitext(os.path.basename(files_path_list[i]))[0], ['DL ', 'UL '], True)

  #SRTP Audio
  srtp_audio = False
  if srtp_audio:
    df_func = df_grouped_dl_list[i]
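    # Copy the filtered rows so that adding columns does not trigger pandas' SettingWithCopyWarning
    df_filtered = df_func[df_func['_ws.col.Protocol'] == 'SRTP Audio'].copy()
    # Time from each group to the next one (next minus current)
    df_filtered['time_difference'] = df_filtered['frame.time_relative'].diff(periods=-1)*(-1)
    # Gap between the end of each group and the start of the next one
    df_filtered['inter_batch_time'] = (df_filtered['end.time'] - df_filtered['start.time'].shift(-1))*(-1)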
    avg_inter_batch_time = round(df_filtered['inter_batch_time'].mean()*1000, 4)
    std_inter_batch_time = round(df_filtered['inter_batch_time'].std()*1000, 4)
    num_pkt = [9,7]
    print('GLOBAL Avg. Inter-batch time',avg_inter_batch_time)
    print('GLOBAL Std. Inter-batch time',std_inter_batch_time)
    for j in num_pkt:
      df_filtered_copy = df_filtered.copy()
      df_filtered_copy = df_filtered_copy[df_filtered_copy['num.packets'] == j]
      avg_inter_group_time = round(df_filtered_copy['time_difference'].mean()*1000, 4)
      std_inter_group_time = round(df_filtered_copy['time_difference'].std()*1000, 4)
      avg_inter_batch_time = round(df_filtered_copy['inter_batch_time'].mean()*1000, 4)
      std_inter_batch_time = round(df_filtered_copy['inter_batch_time'].std()*1000, 4)
      print('------',j, '------')
      print('Avg. Inter-group time',avg_inter_group_time)
      print('Std. Inter-group time',std_inter_group_time)
      print(' Avg. Inter-batch time',avg_inter_batch_time)
      print(' Std. Inter-batch time',std_inter_batch_time)

      # The part of streams_characteristics that computes intervals must be commented out before running this call. TODO: add a parameter
      streams_characteristics([df_filtered_copy, df_grouped_ul_list[i]], 'Grouped '+ os.path.splitext(os.path.basename(files_path_list[i]))[0], ['DL ', 'UL '], True)

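The inter-batch time used in the SRTP Audio block above is the gap between the end of one packet group and the start of the next. The sketch below reproduces that arithmetic on toy values with the standard library only; the `batches` list and its `start`/`end` keys are illustrative stand-ins for the `start.time` and `end.time` columns, not data from the traces.

```python
from statistics import mean, stdev

# Toy packet groups (times in seconds); hypothetical values for illustration only
batches = [
    {'start': 0.000, 'end': 0.002},
    {'start': 0.020, 'end': 0.023},
    {'start': 0.040, 'end': 0.041},
]

# Gap between the end of each group and the start of the next one.
# The last group has no successor, so it contributes no value
# (the pandas version yields NaN there, which mean() and std() skip).
inter_batch = [nxt['start'] - cur['end'] for cur, nxt in zip(batches, batches[1:])]

# Report in milliseconds, rounded like the notebook does
avg_inter_batch_ms = round(mean(inter_batch) * 1000, 4)
std_inter_batch_ms = round(stdev(inter_batch) * 1000, 4)  # sample std, like pandas' default

print('Avg. Inter-batch time', avg_inter_batch_ms)
print('Std. Inter-batch time', std_inter_batch_ms)
```

`statistics.stdev` uses the sample standard deviation (n - 1 in the denominator), which matches the default of `Series.std()` in pandas.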
#### **Traffic characteristics**

In [None]:
for i, dataframe in enumerate(df_list):
  traffic_stats([df_dl_list[i], df_ul_list[i]], os.path.splitext(os.path.basename(files_path_list[i]))[0], ['Downlink', 'Uplink'])

## **WebRTC**

### **Run WebRTC statistics Demo:**

**Insert files in list to plot results**

In [None]:
webRTCfilesPath = [#'Datasets/90fps - 50Mbps - 3664x1920 - H264 - 80 Mhz/Client/WebRTC Client - 90fps - 50Mbps - 3664x1920 - H264 - 80 Mhz.txt'
             # ,'Datasets/60fps - 50Mbps - 3664x1920 - H264 - 80 Mhz/Client/WebRTC Client - 60fps - 50Mbps - 3664x1920 - H264 - 80 Mhz.txt'
            'Datasets/90fps - 50Mbps - 3664x1920 - H264 - 80 Mhz/Client/WebRTC Client - 90fps - 50Mbps - 3664x1920 - H264 - 80 Mhz.txt'
              ,'Datasets/60fps - 50Mbps - 3664x1920 - H264 - 80 Mhz/Client/WebRTC Client - 60fps - 50Mbps - 3664x1920 - H264 - 80 Mhz.txt'
              ,'Datasets/30fps - 50Mbps - 3664x1920 - H264 - 80 Mhz/Client/WebRTC Client - 30fps - 50Mbps - 3664x1920 - H264 - 80 Mhz.txt'
             ,'Datasets/90fps - 50Mbps - 3664x1920 - VP9 - 80 Mhz/Client/WebRTC Client - 90fps - 50Mbps - 3664x1920 - VP9- 80 Mhz.txt'
              ,'Datasets/60fps - 50Mbps - 3664x1920 - VP9 - 80 Mhz/Client/WebRTC Client - 60fps - 50Mbps - 3664x1920 - VP9 - 80 Mhz.txt'
              ,'Datasets/30fps - 50Mbps - 3664x1920 - VP9 - 80 Mhz/Client/WebRTC Client - 30fps - 50Mbps - 3664x1920 - VP9 - 80 Mhz.txt'
              , 'Datasets/90fps - 50Mbps - 2880x1600- H264 - 80 Mhz/Client/WebRTC Client - 90fps - 50Mbps - 2800x1600 - H264 - 80 Mhz.txt'
              ,'Datasets/90fps - 100Mbps - 3664x1920 - H264 - 80 Mhz/Client/WebRTC Client - 90fps - 100Mbps - 3664x1920 - H264 - 80 Mhz.txt'
            #  ,'Datasets/Iperf 50Mbps - 90fps - 50Mbps - 3664x1920 - H264 - 80 Mhz/Client/WebRTC Client - Iperf 50Mbps - 90fps - 50Mbps - 3664x1920 - H264 - 80 Mhz.txt'
             # ,'Datasets/Iperf 100Mbps - 90fps - 50Mbps - 3664x1920 - H264 - 80 Mhz/Client/WebRTC Client - Iperf 100Mbps - 90fps - 50Mbps - 3664x1920 - H264 - 80 Mhz.txt'
              #,'Datasets/Iperf 200Mbps - 90fps - 50Mbps - 3664x1920 - H264 - 80 Mhz/Client/WebRTC Client - Iperf 200Mbps - 90fps - 50Mbps - 3664x1920 - H264 - 80 Mhz.txt'

                ]

start_time_webrtc = 10
end_time_webrtc = 40

from0 = True #x axis from 0

###### DO NOT MODIFY THE CODE BELOW #################################################################################################################################################
df_client_rtc_list = []
info_client__list = []
timestamp_client_list = []

df_server_rtc_list = []
info_server_list = []
timestamp_server_list = []

webRTCfilesPath_client_list = []
webRTCfilesPath_server_list = []

for filePath in webRTCfilesPath:
    df_rtc, info, side, timestamps_list = extract_webrtc_data(filePath, directory)
    if df_rtc is not None:
      if side == 'client':
        # Find the index of start_time_webrtc
        print(timestamps_list)
        start_index = timestamps_list.index(start_time_webrtc)

        # Find the index of end_time_webrtc
        end_index = timestamps_list.index(end_time_webrtc)
        df_client_rtc_list.append(df_rtc[start_index:end_index+1])
        info_client__list.append(info)
        webRTCfilesPath_client_list.append(filePath)
        if from0:
          timestamp_list = timestamps_list[start_index:end_index+1]
          timestamp_client_list.append([timestamp - np.min(timestamp_list) for timestamp in timestamp_list])

        else:
          timestamp_client_list.append(timestamps_list[start_index:end_index+1])

      else:
        start_index = timestamps_list.index(start_time_webrtc)

        # Find the index of end_time_webrtc
        end_index = timestamps_list.index(end_time_webrtc)
        df_server_rtc_list.append(df_rtc[start_index:end_index+1])
        info_server_list.append(info)
        webRTCfilesPath_server_list.append(filePath)
        if from0:
          timestamp_list = timestamps_list[start_index:end_index+1]
          timestamp_server_list.append([timestamp - min(timestamp_list) for timestamp in timestamp_list])
        else:
          timestamp_server_list.append(timestamps_list[start_index:end_index+1])



#CLIENT
rtc_label_client_list = [os.path.splitext(os.path.basename(file_path))[0] for file_path in webRTCfilesPath_client_list]
# Number of colors to generate
rtc_n_colors_client = len(webRTCfilesPath_client_list)
# Generate a list of n_colors using the color palette
rtc_client_color_palette = sns.color_palette(n_colors=rtc_n_colors_client)
# Convert the color_palette to a list of RGB tuples
rtc_client_color_list = [tuple(map(lambda x: int(x*255), color)) for color in rtc_client_color_palette]

#SERVER
rtc_label_server_list = [os.path.splitext(os.path.basename(file_path))[0] for file_path in webRTCfilesPath_server_list]
# Number of colors to generate
rtc_n_colors_server = len(webRTCfilesPath_server_list)
# Generate a list of n_colors using the color palette
rtc_server_color_palette = sns.color_palette(n_colors=rtc_n_colors_server)
# Convert the color_palette to a list of RGB tuples
rtc_server_color_list = [tuple(map(lambda x: int(x*255), color)) for color in rtc_server_color_palette]

# WebRTC stats web: https://w3c.github.io/webrtc-stats/

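The cell above selects the analysis window with `timestamps_list.index(...)`, which raises a `ValueError` when the exact start or end value is not present in the list. A tolerant alternative, assuming the timestamps are sorted in ascending order, is a binary search with `bisect`. The helper name `window_indices` is hypothetical and is not part of the notebook.

```python
from bisect import bisect_left, bisect_right

def window_indices(timestamps, start, end):
    """Return (start_index, end_index) covering the samples in [start, end].

    Assumes `timestamps` is sorted in ascending order.
    """
    start_index = bisect_left(timestamps, start)    # first sample >= start
    end_index = bisect_right(timestamps, end) - 1   # last sample <= end
    if start_index > end_index:
        raise ValueError('no samples inside the requested window')
    return start_index, end_index

# Toy timestamps in which the exact value 10 is missing
timestamps = [8, 9, 11, 12, 39, 40, 41]
start_index, end_index = window_indices(timestamps, 10, 40)

# Same slicing and zero-based shift as in the notebook
selected = timestamps[start_index:end_index + 1]
zero_based = [t - selected[0] for t in selected]
print(selected, zero_based)
```

With this approach the window snaps to the nearest samples inside the interval instead of failing on a missing timestamp.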
### **Client**

#### **WebRTC statistics Demo Figures**

##### **Time plot**

In [None]:
# Select which ones to plot
rtc_values = {
            'vid_frames_received_per_sec': 'Frames received per second',
            'vid_frames_per_sec': 'Decoded FPS',
            'vid_frames_decoded_per_sec': 'Frames decoded per second',
            'frames_rec_minus_decode_and_dropped': 'Frames lost per second',
            'frames_rec_minus_decode_and_dropped_tot': 'Total Frames lost',
            'vid_frames_dropped_per_sec': 'Frames dropped per second',
            'vid_packets_rec_per_sec': 'Video packets received per second',
            'vid_bits_rec_per_sec': 'Video Throughput (Mbps)',
            'vid_tot_packets_lost': 'Total Packets lost',

            'vid_avg_jitter_buffer_delay': 'Jitter Buffer Delay ( ms)',
            'vid_jitter': 'Jitter (ms)',
            'total_rtt': 'Total RTT (ms)',
            'average_rtt': 'Avg. RTT (ms)',
            'current_rtt': 'RTT (ms)',
            'packets_sent_per_sec': 'Traffic sent (packets/s)',
            'bits_sent_per_sec': 'Traffic sent (Mbps)',
            'packets_rec_per_sec': 'Traffic received (packets/s)',
            'bits_rec_per_sec': 'Traffic received (Mbps)',
            'inter_frame_delay': 'Inter frame delay ( ms)',
            'inter_frame_delay_std': 'Inter frame delay std ( ms)',
            'dec_time_per_frame': 'Decoding time per frame ( ms)',
            'process_del_per_frame': 'Processing time per frame ( ms)',
            'assembly_time_per_frame': 'Assembly time per frame ( ms)',
            'discarded_pkt': 'Total discarded packets'
          }
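# Plot metric vs time
# Note: labels written as '( ms)' do not match the '(ms)' check below, so those metrics keep factor = 1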
for key, value in rtc_values.items():
  factor = 1
  if '(ms)' in value:
    factor = 1000
  elif  'Mbps' in value:
    factor = 1/1e6
  #create_time_plot(df_client_rtc_list, key, '', 'Time (s)', value, timestamp_client_list, rtc_label_client_list, rtc_client_color_list, factor)

  # create_time_plot(df_client_rtc_list, key, '', 'Time (s)', value, timestamp_client_list, ['', '','','','','','','',''], rtc_client_color_list, factor)
  #create_time_plot(df_client_rtc_list, key, '', 'Time (s)', value, timestamp_client_list, ['H.264 - 90FPS', 'H.264 - 60FPS','H.264 - 30FPS','VP9 - 90FPS','VP9 - 60FPS','VP9 - 30FPS','2880x1600p','100Mbps'], rtc_client_color_list, factor)
  # Change the colors when studying background traffic:
  background_colors = px.colors.qualitative.Plotly[7:7 + len(df_client_rtc_list)]
  colors_bk_study = px.colors.qualitative.Plotly[0:1]+ background_colors
  create_time_plot(df_client_rtc_list, key, '', 'Time (s)', value, timestamp_client_list, ['Ideal', 'Iperf 50Mbps','Iperf 100Mbps','Iperf 200Mbps','','','','',''], colors_bk_study, factor)

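Every plotting loop in this section derives a scale factor from the axis label. A minimal sketch of that rule as a standalone helper follows; the name `unit_factor` is hypothetical, and the unit conversions in the comments are inferred from the factors rather than stated in the notebook.

```python
def unit_factor(label):
    """Scale factor applied before plotting, inferred from the axis label."""
    if '(ms)' in label:
        return 1000       # presumably seconds -> milliseconds
    if 'Mbps' in label:
        return 1 / 1e6    # presumably bits per second -> megabits per second
    return 1

labels = ['RTT (ms)', 'Traffic sent (Mbps)', 'Decoded FPS', 'Jitter Buffer Delay ( ms)']
factors = {label: unit_factor(label) for label in labels}
for label, factor in factors.items():
    print(f'{label!r}: {factor}')
```

The substring test is literal, so a label such as `'Jitter Buffer Delay ( ms)'` (with a space after the parenthesis) is left unscaled.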
##### **ECDF**

In [None]:
# Select which ones to plot
rtc_values = {
            #'vid_frames_received_per_sec': 'Frames received per second',
            'vid_frames_per_sec': 'FPS',
            #'vid_frames_decoded_per_sec': 'Frames decoded per second',
            #'vid_tot_frames_dropped': 'Total Frames dropped',
            #'vid_packets_rec_per_sec': 'Video packets received per second',
            #'vid_bits_rec_per_sec': 'Video traffic received (Mbps)',
            #'vid_tot_packets_lost': 'Total Packets lost',
            'vid_avg_jitter_buffer_delay': 'Jitter Buffer Delay ( ms)',
            'vid_jitter': 'Jitter (ms)',

            #'total_rtt': ' Total RTT (ms)',
            'average_rtt': 'Avg. RTT (ms)',
            'current_rtt': 'RTT (ms)',

            #'packets_sent_per_sec': 'Traffic sent (packets/s)',
            'bits_sent_per_sec': 'Traffic sent (Mbps)',
            #'packets_rec_per_sec': 'Traffic received (packets/s)',
            'bits_rec_per_sec': 'Traffic received (Mbps)'
          }

# Plot ECDF
for key, value in rtc_values.items():
  factor = 1
  if '(ms)' in value:
    factor = 1000
  elif  'Mbps' in value:
    factor = 1/1e6
  create_ecdf_plot(df_client_rtc_list, key, key, value, 'ECDF', rtc_label_client_list , rtc_client_color_list, factor)

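For reference, an empirical CDF assigns to each sorted sample the fraction of samples that are less than or equal to it. The sketch below shows that computation on toy values; it illustrates the concept only and is not the implementation of `create_ecdf_plot`, which is defined elsewhere in the notebook.

```python
def ecdf(values):
    """Return the sorted values and the cumulative fraction of samples <= each one."""
    xs = sorted(values)
    n = len(xs)
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys

rtt_ms = [4.0, 2.0, 8.0, 6.0]  # toy samples, not measured data
xs, ys = ecdf(rtt_ms)
for x, y in zip(xs, ys):
    print(f'P(RTT <= {x} ms) = {y}')
```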
##### **Bar plot**

In [None]:
# Select which ones to plot
rtc_values = {
            'vid_frames_received_per_sec': 'Frames received per second',
            'vid_frames_per_sec': 'Decoded FPS',
            'vid_frames_decoded_per_sec': 'Frames decoded per second',
            'vid_frames_dropped_per_sec': 'Frames dropped per second',
            'frames_rec_minus_decode_and_dropped': 'Frames lost',

            'vid_packets_rec_per_sec': 'Video packets received per second',
            'vid_bits_rec_per_sec': 'Video Throughput (Mbps)',
            'vid_tot_packets_lost': 'Total Packets lost',
            'vid_avg_jitter_buffer_delay': 'Jitter Buffer Delay ( ms)',
            'vid_jitter': 'Jitter (ms)',
            'total_rtt': 'Total RTT (ms)',
            'average_rtt': 'Avg. RTT (ms)',
            'current_rtt': 'RTT (ms)',

            'packets_sent_per_sec': 'Traffic sent (packets/s)',
            'bits_sent_per_sec': 'Traffic sent (Mbps)',
            'packets_rec_per_sec': 'Traffic received (packets/s)',
            'bits_rec_per_sec': 'Traffic received (Mbps)',
            'inter_frame_delay': 'Inter frame delay ( ms)',
            'inter_frame_delay_std': 'Inter frame delay std ( ms)',
            'dec_time_per_frame': 'Avg. decoding delay ( ms)',
            'process_del_per_frame': 'Avg. processing delay ( ms)',
            'assembly_time_per_frame': 'Avg. assembly delay ( ms)'
          }
# Plot a grouped bar chart for each metric
for key, value in rtc_values.items():
  factor = 1
  if '(ms)' in value:
    factor = 1000
  elif  'Mbps' in value:
    factor = 1/1e6
  #webrtc_grouped_bar_plot(df_client_rtc_list, key, '', value, ['', '','','','','','',''], rtc_client_color_list, factor)
  webrtc_grouped_bar_plot(df_client_rtc_list, key, '', value, ['H.264 - 90FPS', 'H.264 - 60FPS','H.264 - 30FPS','VP9 - 90FPS','VP9 - 60FPS','VP9 - 30FPS','2880x1600p','100Mbps'], rtc_client_color_list, factor)  # pass True as the last argument to show the percentile

In [None]:
# RTT with the 99.999th percentile
rtc_values = {
            'current_rtt': 'RTT (ms)',
          }
# Plot a grouped bar chart for each metric
for key, value in rtc_values.items():
  factor = 1
  if '(ms)' in value:
    factor = 1000
  elif  'Mbps' in value:
    factor = 1/1e6

  webrtc_grouped_bar_plot(df_client_rtc_list, key, '', value, ['H.264 - 90FPS', 'H.264 - 60FPS','H.264 - 30FPS','VP9 - 90FPS','VP9 - 60FPS','VP9 - 30FPS','2880x1600p','100Mbps'], rtc_client_color_list, factor, True, False, True)  # last argument set to True to show the percentile

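The bar plots above can report a high percentile next to the mean. As a rough illustration of what such a percentile means, the sketch below uses the nearest-rank definition on toy values. The helper `nearest_rank_percentile` is hypothetical, and `webrtc_grouped_bar_plot` may use a different method (for example linear interpolation), so results can differ slightly.

```python
import math
from statistics import mean

def nearest_rank_percentile(values, p):
    """Smallest sample such that at least p percent of the samples are <= it."""
    xs = sorted(values)
    rank = max(1, math.ceil(p * len(xs) / 100))
    return xs[rank - 1]

rtt_ms = list(range(1, 101))  # toy samples 1..100, not measured data
avg = mean(rtt_ms)
p99 = nearest_rank_percentile(rtt_ms, 99)
p99999 = nearest_rank_percentile(rtt_ms, 99.999)
print(avg, p99, p99999)
```

With only a few dozen samples per window, a 99.999th percentile effectively collapses to the maximum, which is worth keeping in mind when reading those bars.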
#### **WebRTC Statistics Demo Computations**

In [None]:
# Select which ones to get the statistics
rtc_values = {
            'vid_frames_received_per_sec': 'Avg. Frames received per second',
            'vid_frames_per_sec': 'Avg. FPS',
            'vid_frames_decoded_per_sec': 'Avg. Frames decoded per second',
            'vid_frames_dropped_per_sec': 'Avg. Frames dropped',
            'frames_rec_minus_decode_and_dropped': 'Avg. Frames lost per sec',
            'frames_rec_minus_decode_and_dropped_tot': 'Total Frames lost',

            'vid_packets_rec_per_sec': 'Avg. Video packets received per second',
            'vid_bits_rec_per_sec': 'Avg. Video traffic received (Mbps)',
            'vid_tot_packets_lost': 'Total Packets lost',
            'vid_avg_jitter_buffer_delay': 'Avg. Jitter Buffer Delay ( ms)',
            'vid_jitter': 'Avg. Jitter (ms)',

            'total_rtt': 'Total RTT (ms)',
            'average_rtt': 'Avg. RTT (ms)',
            'current_rtt': 'RTT (ms)',
            'packets_sent_per_sec': 'Avg. Traffic sent (packets/s)',
            'bits_sent_per_sec': 'Avg. Traffic sent (Mbps)',
            'packets_rec_per_sec': 'Avg. Traffic received (packets/s)',
            'bits_rec_per_sec': 'Avg. Traffic received (Mbps)',
             'inter_frame_delay': 'Inter frame delay ( ms)',
            'inter_frame_delay_std': 'Inter frame delay std ( ms)',
            'dec_time_per_frame': 'Decoding time per frame ( ms)',
            'process_del_per_frame': 'Processing time per frame ( ms)',
            'assembly_time_per_frame': 'Assembly time per frame ( ms)'
          }
for i in range(len(df_client_rtc_list)):
  webRtc_traffic_characteristics(df_client_rtc_list[i], os.path.splitext(os.path.basename(webRTCfilesPath_client_list[i]))[0] ,rtc_values, info_client__list[i])


### **Server**

In [None]:
webRTCfilesPath = [
            'Datasets/90fps - 50Mbps - 3664x1920 - H264 - 80 Mhz/Server/WebRTC Server - 90fps - 50Mbps - 3664x1920 - H264 - 80 Mhz.json'
            # ,'Datasets/60fps - 50Mbps - 3664x1920 - H264 - 80 Mhz/Server/WebRTC Server - 60fps - 50Mbps - 3664x1920 - H264 - 80 Mhz.json'
             # ,'Datasets/30fps - 50Mbps - 3664x1920 - H264 - 80 Mhz/Server/WebRTC Server - 30fps - 50Mbps - 3664x1920 - H264 - 80 Mhz.json'
             # ,'Datasets/90fps - 50Mbps - 3664x1920 - VP9 - 80 Mhz/Server/WebRTC Server - 90fps - 50Mbps - 3664x1920 - VP9 - 80 Mhz.json'
            #  ,'Datasets/60fps - 50Mbps - 3664x1920 - VP9 - 80 Mhz/Server/WebRTC Server - 60fps - 50Mbps - 3664x1920 - VP9 - 80 Mhz.json'
            #  ,'Datasets/30fps - 50Mbps - 3664x1920 - VP9 - 80 Mhz/Server/WebRTC Server - 30fps - 50Mbps - 3664x1920 - VP9 - 80 Mhz.json'
            #  , 'Datasets/90fps - 50Mbps - 2880x1600- H264 - 80 Mhz/Server/WebRTC Server - 90fps - 50Mbps - 2800x1600 - H264 - 80 Mhz.json'
             # , 'Datasets/90fps - 100Mbps - 3664x1920 - H264 - 80 Mhz/Server/WebRTC Server - 90fps - 100Mbps - 3664x1920 - H264 - 80 Mhz.json'
              ,'Datasets/Iperf 50Mbps - 90fps - 50Mbps - 3664x1920 - H264 - 80 Mhz/Server/WebRTC Server - Iperf 50Mbps - 90fps - 50Mbps - 3664x1920 - H264 - 80 Mhz.json'
              ,'Datasets/Iperf 100Mbps - 90fps - 50Mbps - 3664x1920 - H264 - 80 Mhz/Server/WebRTC Server - Iperf 100Mbps - 90fps - 50Mbps - 3664x1920 - H264 - 80 Mhz.json'
              ,'Datasets/Iperf 200Mbps - 90fps - 50Mbps - 3664x1920 - H264 - 80 Mhz/Server/WebRTC Server- Iperf 200Mbps - 90fps - 50Mbps - 3664x1920 - H264 - 80 Mhz.json'
             # ,'Datasets/OculusLink - 90fps - 50Mbps - 3664x1920 - H264 - 80 Mhz/Server/WebRTC Server - OculusLink - 90fps - 50Mbps - 3664x1920 - H264 - 80 Mhz.json'
                ]

start_time_webrtc_server = 11  # has 1 second more
end_time_webrtc_server = 41

from0 = True #x axis from 0

###### DO NOT MODIFY THE CODE BELOW #################################################################################################################################################
df_server_rtc_list = []
info_server_list = []
timestamp_server_list = []

webRTCfilesPath_server_list = []

for filePath in webRTCfilesPath:
    df_rtc, info, side, timestamps_list = extract_webrtc_data(filePath, directory)
    if df_rtc is not None:
      start_index = timestamps_list.index(start_time_webrtc_server)

      # Find the index of end_time_webrtc_server
      end_index = timestamps_list.index(end_time_webrtc_server)
      df_server_rtc_list.append(df_rtc[start_index:end_index+1])
      info_server_list.append(info)
      webRTCfilesPath_server_list.append(filePath)
      if from0:
        timestamp_list = timestamps_list[start_index:end_index+1]
        timestamp_server_list.append([timestamp - np.min(timestamp_list) for timestamp in timestamp_list])
      else:
        timestamp_server_list.append(timestamps_list[start_index:end_index+1])


#SERVER
rtc_label_server_list = [os.path.splitext(os.path.basename(file_path))[0] for file_path in webRTCfilesPath_server_list]
# Number of colors to generate
rtc_n_colors_server = len(webRTCfilesPath_server_list)
# Generate a list of n_colors using the color palette
rtc_server_color_palette = sns.color_palette(n_colors=rtc_n_colors_server)
# Convert the color_palette to a list of RGB tuples
rtc_server_color_list = [tuple(map(lambda x: int(x*255), color)) for color in rtc_server_color_palette]
# WebRTC stats web: https://w3c.github.io/webrtc-stats/

#### **WebRTC statistics Demo Figures**

##### **Time plot**

In [None]:
# Select which ones to plot
rtc_values = {
            'vid_frames_sent_per_sec': 'Frames sent per second',  # frames sent on the RTP stream
            'vid_frames_per_sec': 'Encoded FPS',  # frames encoded
            'vid_frames_encoded_per_sec': 'Frames encoded per second',  # frames successfully encoded
            'vid_packets_sent_per_sec': 'Video packets sent per second',
            'vid_bits_sent_per_sec': 'Video traffic sent (Mbps)',
            'vid_packets_retransmitted_per_sec': 'Video packets retransmitted per second',
            'vid_bits_retransmitted_per_sec': 'Video traffic retransmitted (Mbps)',
            'vid_packets_lost_per_sec': 'Video packets lost per second',
            'vid_fraction_lost': 'Video Fraction lost',
            'vid_encode_time_per_sec': 'Video encode time per second (ms)',
            'vid_avg_encode_time': 'Video Avg. encode time (ms)',
            'vid_pkt_send_delay_per_sec': 'Video packet send delay per second (ms)',
            'total_rtt': 'Total RTT (ms)',
            'average_rtt': 'Avg. RTT (ms)',
            'current_rtt': 'RTT (ms)',
            'packets_sent_per_sec': 'Traffic sent (packets/s)',
            'bits_sent_per_sec': 'Traffic sent (Mbps)',
            'packets_rec_per_sec': 'Traffic received (packets/s)',
            'bits_rec_per_sec': 'Traffic received (Mbps)' ,
            'quality_res_changes': 'Avg. Number of quality res changes'

          }
# Plot metric vs time
for key, value in rtc_values.items():
  factor = 1
  if '(ms)' in value:
    factor = 1000
  elif  'Mbps' in value:
    factor = 1/1e6
  #create_time_plot(df_server_rtc_list, key, '', 'Time (s)', value, timestamp_server_list, rtc_label_server_list, rtc_server_color_list, factor)
  #create_time_plot(df_server_rtc_list, key, '', 'Time (s)', value, timestamp_server_list, ['', '','','','','','','','',''], rtc_server_color_list, factor)
 # background_colors = px.colors.qualitative.Plotly[7:7 + len(df_server_rtc_list)]
 # colors_bk_study = px.colors.qualitative.Plotly[0:1]+ background_colors
  #create_time_plot(df_server_rtc_list, key, '', 'Time (s)', value, timestamp_server_list, ['Ideal', 'Iperf 50Mbps','Iperf 100Mbps','Iperf 200Mbps','','','','',''], colors_bk_study, factor)
  oculus_colors = px.colors.qualitative.Plotly[4:4 + len(df_server_rtc_list)]
  colors_oculus_study = px.colors.qualitative.Plotly[0:1]+ oculus_colors
  create_time_plot(df_server_rtc_list, key, '', 'Time (s)', value, timestamp_server_list, ['Laptop Remote', 'Oculus Link','','','',''], colors_oculus_study, factor)


##### **ECDF**

In [None]:
# Select which ones to plot
rtc_values = {
            'vid_frames_sent_per_sec': 'Frames sent per second',
            'vid_frames_per_sec': 'FPS',
            'vid_frames_encoded_per_sec': 'Frames encoded per second',
            'vid_packets_sent_per_sec': 'Video packets sent per second',
            'vid_bits_sent_per_sec': 'Video traffic sent (Mbps)',
            'vid_packets_retransmitted_per_sec': 'Video packets retransmitted per second',
            'vid_bits_retransmitted_per_sec': 'Video traffic retransmitted (Mbps)',
            'vid_encode_time_per_sec': 'Video encode time per second (ms)',
            'vid_avg_encode_time': 'Video Avg. encode time (ms)',
            'vid_pkt_send_delay_per_sec': 'Video packet send delay per second (ms)',
            'vid_packets_lost_per_sec': 'Video packets lost per second',
            'total_rtt': 'Total RTT (ms)',
            'average_rtt': 'Avg. RTT (ms)',
            'packets_sent_per_sec': 'Traffic sent (packets/s)',
            'bits_sent_per_sec': 'Traffic sent (Mbps)',
            'packets_rec_per_sec': 'Traffic received (packets/s)',
            'bits_rec_per_sec': 'Traffic received (Mbps)'
          }

# Plot ECDF
for key, value in rtc_values.items():
  factor = 1
  if '(ms)' in value:
    factor = 1000
  elif  'Mbps' in value:
    factor = 1/1e6
  create_ecdf_plot(df_server_rtc_list, key, key, value, 'ECDF', rtc_label_server_list , rtc_server_color_list, factor)


#### **WebRTC Statistics Demo Computations**

In [None]:
# Select which metrics to compute statistics for
rtc_values = {
            'vid_frames_sent_per_sec': 'Avg. Frames sent per second',
            'vid_frames_per_sec': 'Avg. FPS',
            'vid_frames_encoded_per_sec': 'Avg. Frames encoded per second',
            'vid_packets_sent_per_sec': 'Avg. Video packets sent per second',
            'vid_bits_sent_per_sec': 'Avg. Video traffic sent (Mbps)',
            'vid_packets_retransmitted_per_sec': 'Avg. Video packets retransmitted per second',
            'vid_bits_retransmitted_per_sec': 'Avg. Video traffic retransmitted (Mbps)',
            'vid_encode_time_per_sec': 'Avg. Video encode time per second (ms)',
            'vid_avg_encode_time': 'Avg. Video encode time (ms)',
            'vid_pkt_send_delay_per_sec': 'Avg. Video packet send delay per second (ms)',
            'vid_packets_lost_per_sec': 'Avg. Video packets lost per second',
            'total_rtt': 'Total RTT (ms)',
            'average_rtt': 'Avg. RTT (ms)',
            'current_rtt': 'RTT (ms)',
            'packets_sent_per_sec': 'Avg. Traffic sent (packets/s)',
            'bits_sent_per_sec': 'Avg. Traffic sent (Mbps)',
            'packets_rec_per_sec': 'Avg. Traffic received (packets/s)',
            'bits_rec_per_sec': 'Avg. Traffic received (Mbps)' ,
            'quality_res_changes': 'Avg. Number of quality res changes'
          }

for i in range(len(df_server_rtc_list)):
  webRtc_traffic_characteristics(df_server_rtc_list[i], os.path.splitext(os.path.basename(webRTCfilesPath_server_list[i]))[0] ,rtc_values, info_server_list[i])
