## Create new emulation profiles for the ERRANT tool

To create new profiles, you need an input `csv` file that describes the speedtest measurements you performed, together with the contextual information needed to build the profiles.

The input `csv` file must have the following columns:
* `operator`: The name of the ISP where the measurement was taken.
* `country`: The country where the measurement was taken.
* `access_technology`: The access technology (2G, 3G or 4G).
* `rssi`: The received signal strength indicator (RSSI) of the radio interface.
* `download_mbps`: The download rate measured in Mbit/s.
* `upload_mbps`: The upload rate measured in Mbit/s.
* `rtt_ms`: The latency measured in ms.
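
Before building profiles, it can help to confirm that a file actually carries the columns listed above. The following is a minimal sketch, not part of ERRANT itself: the helper `missing_columns`, the constant `REQUIRED_COLUMNS`, and the two sample rows are made up for illustration.

```python
import io

import pandas as pd

# Column names taken from the list above.
REQUIRED_COLUMNS = [
    "operator", "country", "access_technology", "rssi",
    "download_mbps", "upload_mbps", "rtt_ms",
]


def missing_columns(df):
    """Return the required columns that are absent from df (hypothetical helper)."""
    return [c for c in REQUIRED_COLUMNS if c not in df.columns]


# Two made-up rows standing in for a real measurement file.
example_csv = io.StringIO(
    "operator,country,access_technology,rssi,download_mbps,upload_mbps,rtt_ms\n"
    "op_a,norway,4G,-60.0,50.0,20.0,45.0\n"
    "op_b,sweden,3G,-90.0,5.0,1.5,120.0\n"
)
example = pd.read_csv(example_csv)

# An empty list means the file has every required column.
print(missing_columns(example))
```

An empty result means the file is ready for the steps that follow; any names in the list must be added to the `csv` first.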

### Read the data

In [1]:
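import pickle
import warnings

import pandas as pd
from scipy.stats import gaussian_kde

# Silence library warnings to keep the notebook output readable.
warnings.simplefilter('ignore')

INFILE = "speed_test_sample.csv"
OUTFILE = "errant_profiles_sample.pickle"
# Columns that identify one emulation profile.
GROUP_KEYS = ["operator", "country", "access_technology", "signal_quality"]
# Minimum number of measurements required to keep a profile.
MIN_OBSERVATIONS = 10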

In [2]:
speedtests = pd.read_csv(INFILE)
speedtests.head()

| | operator | country | access_technology | rssi | download_mbps | upload_mbps | rtt_ms |
|---|---|---|---|---|---|---|---|
| 0 | telenor | norway | 4G | -55.133333 | 7.63758 | 6.116484 | 83.133333 |
| 1 | telia | sweden | 4G | -59.5 | 129.614944 | 45.679225 | 168.0 |
| 2 | telia | sweden | 4G | -80.571429 | 39.39705 | 4.923578 | 87.571429 |
| 3 | telenor | norway | 4G | -64.0 | 53.197882 | 26.300203 | 69.7 |
| 4 | telenor | sweden | 4G | -67.333333 | 41.335665 | 14.611385 | 63.166667 |


### Compute Signal Quality

In [3]:
def get_signal_quality(row):
    """Classify a measurement as "good", "medium" or "bad" from its RSSI."""
    rssi = row.rssi
    devicemode = row.access_technology
    if devicemode == '4G':
        # 4G: good above -75, medium in (-85, -75], bad at or below -85.
        if rssi > -75:
            return "good"
        elif rssi > -85:
            return "medium"
        elif rssi <= -85:
            return "bad"
    elif devicemode in ('3G', '2G'):
        # 3G/2G: good above -85, medium in (-100, -85], bad at or below -100.
        if rssi > -85:
            return "good"
        elif rssi > -100:
            return "medium"
        elif rssi <= -100:
            return "bad"
    else:
        # Unknown access technology.
        return "unk"

speedtests["signal_quality"] = speedtests[["rssi", "access_technology"]].apply(get_signal_quality, axis=1)
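
The thresholds used above can also be written as a small lookup table, which makes the boundary cases easy to check. This is only an illustrative sketch: `THRESHOLDS`, `classify`, and the sample values are assumptions introduced here, not part of the original notebook.

```python
import pandas as pd

# (good_above, medium_above) per access technology, copied from the
# thresholds in the function above.
THRESHOLDS = {"4G": (-75, -85), "3G": (-85, -100), "2G": (-85, -100)}


def classify(rssi, access_technology):
    """Table-driven variant of the signal quality rule (hypothetical helper)."""
    if access_technology not in THRESHOLDS:
        return "unk"
    if pd.isna(rssi):
        # A missing RSSI cannot be classified.
        return None
    good_above, medium_above = THRESHOLDS[access_technology]
    if rssi > good_above:
        return "good"
    if rssi > medium_above:
        return "medium"
    return "bad"


# Made-up measurements that exercise each branch, including the boundaries.
example = pd.DataFrame({
    "rssi": [-60.0, -75.0, -85.0, -90.0, -100.0, -70.0],
    "access_technology": ["4G", "4G", "4G", "3G", "2G", "5G"],
})
example["signal_quality"] = [
    classify(r, t) for r, t in zip(example["rssi"], example["access_technology"])
]
print(example)
```

Note that a value exactly on a threshold falls into the lower class: an RSSI of -75 on 4G is "medium", not "good".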


### Create the models

In [7]:
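def create_dists(df):
    """Fit a joint KDE over download rate, upload rate and RTT for one group."""
    count = len(df.index)
    samples = df[["download_mbps", "upload_mbps", "rtt_ms"]].dropna().values
    try:
        # gaussian_kde expects one row per dimension, hence the transpose.
        kde = gaussian_kde(samples.transpose(), "silverman")
    except Exception:
        # Fitting fails for groups with too few or degenerate samples.
        kde = None
    return pd.Series({"count": count, "kde": kde})

models = speedtests.groupby(["operator", "country", "access_technology", "signal_quality"]).apply(create_dists)
# Keep only the groups with enough measurements.
models = models[models["count"] >= MIN_OBSERVATIONS]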

models

| operator | country | access_technology | signal_quality | count | kde |
|---|---|---|---|---|---|
| 3 | sweden | 3G | good | 23.0 | `<scipy.stats.kde.gaussian_kde object at 0x7f77...` |
| 3 | sweden | 3G | medium | 16.0 | `<scipy.stats.kde.gaussian_kde object at 0x7f77...` |
| 3 | sweden | 4G | good | 101.0 | `<scipy.stats.kde.gaussian_kde object at 0x7f77...` |
| 3 | sweden | 4G | medium | 27.0 | `<scipy.stats.kde.gaussian_kde object at 0x7f77...` |
| net1 | norway | 4G | good | 60.0 | `<scipy.stats.kde.gaussian_kde object at 0x7f77...` |
| net1 | norway | 4G | medium | 12.0 | `<scipy.stats.kde.gaussian_kde object at 0x7f77...` |
| telenor | norway | 4G | good | 92.0 | `<scipy.stats.kde.gaussian_kde object at 0x7f77...` |
| telenor | norway | 4G | medium | 17.0 | `<scipy.stats.kde.gaussian_kde object at 0x7f77...` |
| telenor | sweden | 3G | good | 11.0 | `<scipy.stats.kde.gaussian_kde object at 0x7f77...` |
| telenor | sweden | 3G | medium | 15.0 | `<scipy.stats.kde.gaussian_kde object at 0x7f77...` |
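
Each `kde` object models the joint distribution of download rate, upload rate and RTT for one group, so new values can be drawn from it. The sketch below shows one plausible way to do that with `gaussian_kde.resample`; it uses synthetic data made up for illustration and is not necessarily how ERRANT consumes the profiles.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Synthetic (download_mbps, upload_mbps, rtt_ms) observations, one row each.
samples = np.column_stack([
    rng.normal(40.0, 10.0, 50),
    rng.normal(15.0, 4.0, 50),
    rng.normal(70.0, 15.0, 50),
])

# gaussian_kde expects shape (n_dimensions, n_points), hence the transpose.
kde = gaussian_kde(samples.T, "silverman")

# resample returns an array of shape (n_dimensions, size).
draws = kde.resample(5, seed=1)

for download, upload, rtt in draws.T:
    print(f"download={download:.1f} Mbit/s  upload={upload:.1f} Mbit/s  rtt={rtt:.1f} ms")
```

Because a Gaussian KDE has unbounded support, a draw can occasionally be negative; a consumer of these samples may want to clip or redraw such values.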


In [5]:
# Use a context manager so the output file is flushed and closed.
with open(OUTFILE, "wb") as f:
    pickle.dump(models, f)
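
To check a saved file, it can be loaded back with `pickle.load` and queried by its group key. The sketch below is a self-contained round trip on a stand-in DataFrame; the operator name `op_a`, the file name, and the `None` placeholders for the KDE objects are made up for illustration.

```python
import os
import pickle
import tempfile

import pandas as pd

# Stand-in for the `models` DataFrame; the real one holds gaussian_kde objects.
models_example = pd.DataFrame(
    {"count": [23, 101], "kde": [None, None]},
    index=pd.MultiIndex.from_tuples(
        [("op_a", "sweden", "3G", "good"), ("op_a", "sweden", "4G", "good")],
        names=["operator", "country", "access_technology", "signal_quality"],
    ),
)

# Write to a temporary location so the example leaves the working directory alone.
path = os.path.join(tempfile.mkdtemp(), "profiles_example.pickle")
with open(path, "wb") as f:
    pickle.dump(models_example, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

# Look up one profile by (operator, country, access_technology, signal_quality).
profile = restored.loc[("op_a", "sweden", "4G", "good")]
print(profile["count"])
```

Only unpickle files you created yourself or otherwise trust, since loading a pickle can execute arbitrary code.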