## Prologue
This evaluation presents the results from my WiFi mesh test setup. The goal was to find out whether a WiFi mesh consisting of two access points placed 100 meters apart would allow a prototype autonomous vehicle to stream its video data via WiFi to a host (see my [previous post](https://protogia.github.io/blog/wifi-bandwith-to-distance-relation/) about the test setup).

After recording multiple test traces and syncing the logging outputs of iperf3 and gpspipe, this article visualises the limitations of the setup.
To make sure that the setup meets the requirements, we have to evaluate the relation between bandwidth and distance as well as between latency and distance. To discover WiFi blind spots in the test area, we will also plot the results on a map.

## Install dependencies

In [11]:
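!pip install ipykernel
!pip install matplotlib
!pip install seaborn
!pip install folium
!pip install plotly
# scipy provides the Shapiro-Wilk test used below
!pip install scipy
# statsmodels is required by plotly express for trendline='lowess'
!pip install statsmodels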


In [36]:
import os

import folium
import geopandas as gpd
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import seaborn as sns
from plotly.subplots import make_subplots
from scipy import stats  # needed for the Shapiro-Wilk test below

## Load test traces

We recorded multiple traces in two different setups. The synced test traces within _results/bandwithtest_09072024_mesh_wlan-b_ contain additional information about the latency. For the bandwidth evaluation we will use all of them.

In [3]:
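folder = 'results'

testtraces = {}
for test_dir in os.listdir(folder):
    dir_path = os.path.join(folder, test_dir)
    # skip stray files that sit next to the result folders
    if not os.path.isdir(dir_path):
        continue
    testtraces[test_dir] = []
    for file in os.listdir(dir_path):
        # packet captures are not CSV logs, so they are skipped
        if "pcap" in file:
            continue
        file_path = os.path.join(dir_path, file)
        try:
            testtraces[test_dir].append(pd.read_csv(file_path))
        except Exception as e:
            print(f"Failed to read {file_path}: {e}")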


In [4]:
testtraces['bandwithtest_09072924_mesh_wlan-b'][0].head()

Unnamed: 0.1,Unnamed: 0,Time,Latitude,Longitude,Bitrate,Latency,DISTANCE_CENTER,DISTANCE_AP_RUESTHALLE,DISTANCE_AP_GARAGE
0,0,11:41:55.248000,52.315723,10.564559,1.31,3.74,189.419552,228.346361,146.736744
1,1,11:41:56.248000,52.315727,10.564529,0.0,154.0,187.333062,226.257243,144.663165
2,2,11:41:57.248000,52.315732,10.564495,1.57,383.0,184.94154,223.862744,142.286387
3,3,11:41:58.248000,52.315735,10.564457,0.0,300.0,182.369709,221.288577,139.727164
4,4,11:42:00.248000,52.315744,10.564365,1.56,13.6,176.021568,214.936401,133.404155


The data contains the following fields:

| Entity                 | Unit        |
|------------------------|-------------|
| Time                   | [localtime] |
| Latitude               | [wgs84]     |
| Longitude              | [wgs84]     |
| Bitrate                | [MB/s]      |
| Latency                | [ms]        |
| DISTANCE_CENTER        | [m]         |
| DISTANCE_AP_RUESTHALLE | [m]         |
| DISTANCE_AP_GARAGE     | [m]         |

The values for _DISTANCE_CENTER_, _DISTANCE_AP_RUESTHALLE_ and _DISTANCE_AP_GARAGE_ are calculated. _DISTANCE_CENTER_ describes the distance between the vehicle position and the midpoint between both access points. _DISTANCE_AP_RUESTHALLE_ and _DISTANCE_AP_GARAGE_ describe the distance between the vehicle and access point 1 and access point 2, respectively. The last two distances are calculated for verification purposes.
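The notebook does not show how these distance columns were derived. A minimal sketch, assuming a haversine great-circle distance and placeholder access point coordinates (the surveyed positions are not given in this post), could look like the following. The names `haversine_m`, `AP_1`, `AP_2` and `CENTER` are hypothetical and only serve the illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Placeholder access point positions (assumption, NOT the real coordinates of the test site)
AP_1 = (52.3160, 10.5605)
AP_2 = (52.3155, 10.5620)
# Over roughly 100 m the arithmetic mean of the coordinates is a good midpoint approximation
CENTER = ((AP_1[0] + AP_2[0]) / 2, (AP_1[1] + AP_2[1]) / 2)

# Vehicle position taken from the first row of the trace shown above
vehicle = (52.315723, 10.564559)
distance_center = haversine_m(*vehicle, *CENTER)
distance_ap_1 = haversine_m(*vehicle, *AP_1)
distance_ap_2 = haversine_m(*vehicle, *AP_2)
```

Applied row by row to the _Latitude_ and _Longitude_ columns, such a function would yield one distance column per reference point.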

## Bandwidth results: Distance to center

For the evaluation we'll concatenate the individual recordings into one dataframe. In the next step we'll remove the outliers. To pick the most suitable method, we first need to visualise the distribution of the values.

### Outlier Removal through IQR

The IQR method flags every value below _Q1 - 1.5 * IQR_ or above _Q3 + 1.5 * IQR_ as an outlier. These thresholds are calibrated for roughly normally distributed data. As you can see in the next plots, the values for _Bitrate_ and _Latency_ do not follow a normal distribution. The Shapiro-Wilk test confirms this: both p-values are far below 0.05, so the condition _p>0.05_ is not fulfilled. IQR is therefore not a suitable method for this dataset.
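Written out, with $Q_1$ and $Q_3$ as the first and third quartile, the rule reads as follows. The remark on the normal distribution is a standard property of the 1.5 factor and not something measured in this test.

$$
\mathrm{IQR} = Q_3 - Q_1, \qquad x \text{ is an outlier if } x < Q_1 - 1.5\,\mathrm{IQR} \ \text{ or } \ x > Q_3 + 1.5\,\mathrm{IQR}
$$

For normally distributed data these fences sit at about $\pm 2.7\sigma$ around the mean, so only about 0.7 % of regular values would be flagged. For skewed data that calibration no longer holds.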

In [28]:
# Concatenate all dataframes to plot on a single figure
all_dfs = []
for testname, list_of_dfs in testtraces.items():
    all_dfs.extend(list_of_dfs)

combined_df = pd.concat(all_dfs, ignore_index=True)

In [37]:
# plot distributions
fig = make_subplots(rows=1, cols=2,
                    subplot_titles=('Distribution of Bitrate-data', 'Distribution of Latency-data'))

# bitrate
fig.add_trace(
    go.Histogram(
        x=combined_df['Bitrate'],
        nbinsx=100,
        name='Bitrate',
        marker=dict(line=dict(width=1, color='black'))  # black outline around each bar
    ),
    row=1, col=1
)

# latency
fig.add_trace(
    go.Histogram(
        x=combined_df['Latency'],
        nbinsx=100,  
        name='Latency',
        marker=dict(line=dict(width=1, color='black'))
    ),
    row=1, col=2
)

fig.update_layout(
    title_text="Distribution of Bitrate and Latency Data",
    showlegend=False
)

fig.update_xaxes(title_text="Bitrate", row=1, col=1)
fig.update_xaxes(title_text="Latency", row=1, col=2)

fig.update_yaxes(title_text="Count", row=1, col=1)
fig.update_yaxes(title_text="Count", row=1, col=2)

fig.show()

# calculate IQR
Q1 = combined_df['Bitrate'].quantile(0.25)
Q3 = combined_df['Bitrate'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Shapiro-Wilk-Test
stat, p_value = stats.shapiro(combined_df['Bitrate'].dropna())
print(f'Shapiro-Wilk: statistical value = {stat:.4f}, p-value = {p_value:.4f} for Bitrate')

stat, p_value = stats.shapiro(combined_df['Latency'].dropna())
print(f'Shapiro-Wilk: statistical value = {stat:.4f}, p-value = {p_value:.4f} for Latency')

Shapiro-Wilk: statistical value = 0.7010, p-value = 0.0000 for Bitrate
Shapiro-Wilk: statistical value = 0.5886, p-value = 0.0000 for Latency


### Outlier Removal through z-score method

The z-score measures how many standard deviations a value lies from the mean of a dataset. A common rule for outlier detection is a threshold of _|z-score|>3_. We'll calculate this for bitrate and latency. As you can see in the next plot, all values beyond the threshold are identified as outliers.
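As a formula, with $\bar{x}$ as the mean and $s$ as the sample standard deviation of the column (pandas' `.std()` uses `ddof=1` by default):

$$
z_i = \frac{x_i - \bar{x}}{s}, \qquad x_i \text{ is flagged as an outlier if } |z_i| > 3
$$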

In [44]:
combined_df['z_score_bitrate'] = (combined_df['Bitrate'] - combined_df['Bitrate'].mean()) / combined_df['Bitrate'].std()
combined_df['z_score_latency'] = (combined_df['Latency'] - combined_df['Latency'].mean()) / combined_df['Latency'].std()

outliers_z_b = combined_df[np.abs(combined_df['z_score_bitrate']) > 3]
outliers_z_l = combined_df[np.abs(combined_df['z_score_latency']) > 3]

In [41]:
outliers_z_b

Unnamed: 0.1,Unnamed: 0,Time,Latitude,Longitude,Bitrate,Latency,DISTANCE_CENTER,DISTANCE_AP_RUESTHALLE,DISTANCE_AP_GARAGE,z_score,z_score_bitrate,z_score_latency
60,2,11:35:55,52.31623,10.560745,510.0,182.0,81.589517,46.878905,123.901565,9.883251,9.883251,0.54971
62,4,11:35:57,52.316238,10.560775,521.0,253.0,80.285808,46.167503,122.430535,10.118899,10.118899,1.05948


In [35]:
outliers_z_l

Unnamed: 0.1,Unnamed: 0,Time,Latitude,Longitude,Bitrate,Latency,DISTANCE_CENTER,DISTANCE_AP_RUESTHALLE,DISTANCE_AP_GARAGE,z_score,z_score_bitrate,z_score_latency
59,1,11:35:54,52.316229,10.560738,0.0,754.0,81.947953,47.140851,124.284874,-1.042272,-1.042272,4.656589
164,28,11:40:33.500000,52.315911,10.560682,0.0,833.0,75.890957,37.326354,118.707602,-1.042272,-1.042272,5.223798
245,50,11:35:30,52.316008,10.560837,0.0,1570.0,66.827893,28.036789,110.126673,-1.042272,-1.042272,10.515353
246,51,11:35:31,52.316017,10.560805,10.5,547.0,69.193007,30.439653,112.494562,-0.817335,-0.817335,3.170358
343,37,11:37:54.248000,52.315712,10.561602,0.0,544.0,22.399245,35.217764,55.648914,-1.042272,-1.042272,3.148819


In [47]:
# outlier flags
combined_df['Outlier_Bitrate'] = np.where(np.abs(combined_df['z_score_bitrate']) > 3, 'Outlier (>|3σ|)', 'Normal')
combined_df['Outlier_Latency'] = np.where(np.abs(combined_df['z_score_latency']) > 3, 'Outlier (>|3σ|)', 'Normal')


# plotting
def plot_distribution_with_outliers(df, data_col, z_col, title):
    mean_val = df[data_col].mean()
    std_val = df[data_col].std()
    
    outlier_flag_col = f'Outlier_{data_col}' 

    fig = make_subplots(
        rows=2, cols=1, 
        shared_xaxes=True, 
        vertical_spacing=0.05,
        row_heights=[0.6, 0.4]
    )

    hist_fig = px.histogram(
        df, 
        x=data_col, 
        color=outlier_flag_col,
        color_discrete_map={
            'Normal': '#4C72B0',        # Normal/blue
            'Outlier (>|3σ|)': '#DC3912' # Outlier/red
        },
        marginal="box", # Adds a box plot on top
        nbins=50
    )

    # Add histogram traces to the main figure
    for trace in hist_fig.data:
        if trace.type == 'histogram':
            fig.add_trace(trace, row=1, col=1)

    # Add vertical lines for the Z-score thresholds (Mean ± 3*StdDev)
    z_thresholds = [
        {'x': mean_val - 3 * std_val, 'text': 'Z=-3σ', 'pos': 'top left'},
        {'x': mean_val + 3 * std_val, 'text': 'Z=+3σ', 'pos': 'top right'}
    ]
    
    for threshold in z_thresholds:
        fig.add_vline(
            x=threshold['x'], 
            line_width=2, 
            line_dash="dash", 
            line_color="gray",
            annotation_text=threshold['text'],
            annotation_position=threshold['pos'],
            row=1, col=1
        )
        
    # Strip plot below the histogram: every point sits at y=0. The constant is
    # stored in a helper column, because passing a bare list as `y` and naming
    # 'y' in hover_data triggers a name conflict in plotly express.
    scatter_fig = px.scatter(
        df.assign(_strip=0),
        x=data_col,
        y='_strip',
        color=outlier_flag_col,
        color_discrete_map={
            'Normal': '#4C72B0',
            'Outlier (>|3σ|)': '#DC3912'
        },
        hover_data={data_col: ':.4f', z_col: ':.4f', '_strip': False}
    )

    for trace in scatter_fig.data:
        if trace.type == 'scatter':
            fig.add_trace(trace, row=2, col=1)

    fig.update_layout(
        height=700,
        # Plotly titles accept HTML-like tags such as <b>, not Markdown bold;
        # the former '\mathbf{Z}' in the f-string was an invalid escape sequence
        title_text=f'<b>{title} distribution with z-score outliers (|Z| > 3)</b>',
        bargap=0.05, # Space between histogram bars
        showlegend=True
    )

    # Update axes titles and properties
    fig.update_xaxes(title_text=data_col, row=2, col=1)
    fig.update_yaxes(title_text='Count', row=1, col=1)
    fig.update_yaxes(showticklabels=False, row=2, col=1, title_text='Data Points')
    fig.show()


plot_distribution_with_outliers(
    combined_df,
    data_col='Bitrate',
    z_col='z_score_bitrate',
    title='Bitrate'
)

plot_distribution_with_outliers(
    combined_df,
    data_col='Latency',
    z_col='z_score_latency',
    title='Latency'
)

In [None]:
# Outlier removal
def remove_outliers_iqr(df, column):
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    return df[(df[column] >= lower_bound) & (df[column] <= upper_bound)]

combined_df_filtered = combined_df.copy()
combined_df_filtered = remove_outliers_iqr(combined_df_filtered, 'Bitrate')
combined_df_filtered = remove_outliers_iqr(combined_df_filtered, 'DISTANCE_CENTER')

In [6]:
# plotting
scatter_fig = px.scatter(
    combined_df_filtered,
    x='DISTANCE_CENTER',
    y='Bitrate',
    title='Bitrate vs. Distance to Center with Trendline (Outliers Removed)',
    labels={'DISTANCE_CENTER': 'Distance to Center [m]', 'Bitrate': 'Bitrate [MBit/s]'},
    trendline='lowess',
)

histogram_fig = go.Figure(go.Histogram2dContour(
    x=combined_df_filtered['DISTANCE_CENTER'],
    y=combined_df_filtered['Bitrate'],
    colorscale='Viridis',
    colorbar=dict(title='Density'),
    contours=dict(coloring='heatmap'),
))

# Create subplots
fig = make_subplots(rows=1, cols=2, subplot_titles=('Bitrate vs. Distance to Center with Trendline', 'Spread of Bitrate vs. Distance to Center (Heatmap)'))

for trace in scatter_fig['data']:
    fig.add_trace(trace, row=1, col=1)

for trace in histogram_fig['data']:
    fig.add_trace(trace, row=1, col=2)


fig.update_layout(
    title_text='Bitrate Visualizations',
    template='none'
)

fig.update_xaxes(title_text='Distance to Center [m]', row=1, col=1)
fig.update_yaxes(title_text='Bitrate [MBit/s]', row=1, col=1)
fig.update_xaxes(title_text='Distance to Center [m]', row=1, col=2)
fig.update_yaxes(title_text='Bitrate [MBit/s]', row=1, col=2)

fig.show()

## Latency results: Distance to center

In [7]:
# Concatenate dataframes with 'Latency' column
all_dfs_latency = []
for testname, list_of_dfs in testtraces.items():
    for df in list_of_dfs:
        if 'Latency' in df.columns:
            all_dfs_latency.append(df)

combined_df_latency = pd.concat(all_dfs_latency, ignore_index=True)

# Outlier removal using IQR for Latency and DISTANCE_CENTER
# (reuses remove_outliers_iqr as defined in the bandwidth section above)

combined_df_latency_filtered = combined_df_latency.copy()
combined_df_latency_filtered = remove_outliers_iqr(combined_df_latency_filtered, 'Latency')
combined_df_latency_filtered = remove_outliers_iqr(combined_df_latency_filtered, 'DISTANCE_CENTER')

In [8]:
# plotting
scatter_fig_latency = px.scatter(
    combined_df_latency_filtered,
    x='DISTANCE_CENTER',
    y='Latency',
    title='Latency vs. Distance to Center with Trendline (Outliers Removed)',
    labels={'DISTANCE_CENTER': 'Distance to Center [m]', 'Latency': 'Latency [ms]'},
    trendline='lowess'
)

histogram_fig_latency = go.Figure(go.Histogram2dContour(
    x=combined_df_latency_filtered['DISTANCE_CENTER'],
    y=combined_df_latency_filtered['Latency'],
    colorscale='Viridis',
    colorbar=dict(title='Density'),
    contours=dict(coloring='heatmap'),
))

# Create subplots
fig_latency = make_subplots(rows=1, cols=2, subplot_titles=('Latency vs. Distance to Center with Trendline (Outliers Removed)', 'Spread of Latency vs. Distance to Center (2D Histogram, Outliers Removed)'))

for trace in scatter_fig_latency['data']:
    fig_latency.add_trace(trace, row=1, col=1)

for trace in histogram_fig_latency['data']:
    fig_latency.add_trace(trace, row=1, col=2)


fig_latency.update_layout(
    title_text='Latency Visualizations (Outliers Removed)',
    showlegend=False,
    template='none'
)

fig_latency.update_xaxes(title_text='Distance to Center [m]', row=1, col=1)
fig_latency.update_yaxes(title_text='Latency [ms]', row=1, col=1)
fig_latency.update_xaxes(title_text='Distance to Center [m]', row=1, col=2)
fig_latency.update_yaxes(title_text='Latency [ms]', row=1, col=2)

fig_latency.show()

## Bandwidth and Latency on geomap

In [9]:
# Concatenate all dataframes for plotting
all_dfs_map = []
for testname, list_of_dfs in testtraces.items():
    for df in list_of_dfs:
        all_dfs_map.append(df)

combined_df_map = pd.concat(all_dfs_map, ignore_index=True)

# Outlier removal using IQR
# (reuses remove_outliers_iqr as defined in the bandwidth section above)

combined_df_map_filtered = combined_df_map.copy()
combined_df_map_filtered = remove_outliers_iqr(combined_df_map_filtered, 'Bitrate')
combined_df_map_filtered = remove_outliers_iqr(combined_df_map_filtered, 'DISTANCE_CENTER')

# Apply the IQR filter to Latency when the column holds data. Rows with NaN
# latency are dropped here as well, because NaN fails both bound comparisons.
if 'Latency' in combined_df_map_filtered.columns and not combined_df_map_filtered['Latency'].isnull().all():
    combined_df_map_filtered = remove_outliers_iqr(combined_df_map_filtered, 'Latency')

# Explicitly drop rows without Latency for the latency map
combined_df_map_latency_filtered = combined_df_map_filtered.dropna(subset=['Latency']).copy()

In [10]:
# plotting
fig = make_subplots(rows=1, cols=2, specs=[[{'type': 'mapbox'}, {'type': 'mapbox'}]],
                    subplot_titles=('Bitrate Map (Outliers Removed)', 'Latency Map (Outliers Removed)'))

fig.add_trace(go.Scattermapbox(
    lat=combined_df_map_filtered["Latitude"],
    lon=combined_df_map_filtered["Longitude"],
    mode='markers',
    marker=go.scattermapbox.Marker(
        size=16, # Increased marker size
        color=combined_df_map_filtered["Bitrate"],
        colorscale='viridis',
        colorbar=dict(title='Bitrate [MBit/s]', x=0.45) # Adjust x position
    ),
    text=combined_df_map_filtered["Bitrate"].apply(lambda x: f'Bitrate: {x:.2f}'),
    name='Bitrate'
), row=1, col=1)

green_to_red = [(0, 'green'), (0.5, 'grey'), (1, 'red')]
if not combined_df_map_latency_filtered.empty: # Check if there's data after dropping NaNs
    fig.add_trace(go.Scattermapbox(
        lat=combined_df_map_latency_filtered["Latitude"],
        lon=combined_df_map_latency_filtered["Longitude"],
        mode='markers',
        marker=go.scattermapbox.Marker(
            size=16, # Increased marker size
            color=combined_df_map_latency_filtered["Latency"],
            colorscale=green_to_red,
            colorbar=dict(title='Latency [ms]', x=1.0) # Adjust x position
        ),
        text=combined_df_map_latency_filtered["Latency"].apply(lambda x: f'Latency: {x:.2f}'),
        name='Latency'
    ), row=1, col=2)
else:
    print("No Latency data available after outlier removal for mapping.")


# mapbox subplots
fig.update_layout(
    mapbox1=dict(
        style="carto-positron",
        center=dict(lat=combined_df_map_filtered["Latitude"].mean(), lon=combined_df_map_filtered["Longitude"].mean()),
        zoom=16
    ),
    mapbox2=dict(
        style="carto-positron",
        center=dict(lat=combined_df_map_latency_filtered["Latitude"].mean(), lon=combined_df_map_latency_filtered["Longitude"].mean()),
        zoom=16
    ),
    showlegend=False,
)

fig.show()


*scattermapbox* is deprecated! Use *scattermap* instead. Learn more at: https://plotly.com/python/mapbox-to-maplibre/



