# LoRa variability data cleanup

In [212]:
import numpy as np
import pandas as pds
import matplotlib.pyplot as plt
import numexpr
%matplotlib inline
import seaborn as sb
sb.set()

The packet data are read from `loravar.csv` and joined on `dev_id` with `devices.csv`,
which holds the timestamp of the first valid packet of each device.

In [213]:
loravar = pds.read_csv('../data/loravar.csv')
devices = pds.read_csv('../data/devices.csv')
loravar = loravar.merge(devices, on='dev_id')
loravar.shape

(111092, 15)
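`merge` performs an inner join by default, so any packet whose `dev_id` has no entry in `devices.csv` is silently discarded. The sketch below runs on hypothetical toy frames (device names and timestamps are made up). It shows how an outer join with `indicator=True` can reveal such unmatched rows before committing to the inner join:

```python
import pandas as pds

# Hypothetical toy data standing in for loravar.csv and devices.csv
packets = pds.DataFrame({
    'dev_id': ['dev_a', 'dev_a', 'dev_b', 'dev_c'],
    'rssi': [-80, -82, -95, -101],
})
devices = pds.DataFrame({
    'dev_id': ['dev_a', 'dev_b'],
    'start_timestamp': ['2019-01-01 10:00:00', '2019-01-02 09:30:00'],
})

# An outer join with indicator=True labels each row by where its key was found
check = packets.merge(devices, on='dev_id', how='outer', indicator=True)
unmatched = check.loc[check['_merge'] != 'both', 'dev_id'].tolist()
print(unmatched)  # ['dev_c']

# The default inner join keeps only packets from devices listed in `devices`
merged = packets.merge(devices, on='dev_id')
print(merged.shape)  # (3, 3)
```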

In [214]:
loravar.dtypes

received            object
dev_id              object
dev_eui             object
gtw_id              object
counter              int64
frequency          float64
data_rate           object
coding_rate         object
rssi                 int64
snr                float64
battery            float64
humidity           float64
pressure           float64
temperature        float64
start_timestamp     object
dtype: object
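Both `received` and `start_timestamp` are loaded as `object`, i.e. as plain strings. Instead of calling `to_datetime` explicitly afterwards, `read_csv` can also parse such columns on load through its `parse_dates` argument. The sketch below uses an in-memory CSV with made-up rows, not the real files:

```python
import io
import pandas as pds

# Made-up CSV excerpt; the real data live in ../data/loravar.csv
csv_text = """dev_id,received,rssi
dev_a,2019-01-01 10:00:05,-80
dev_a,2019-01-01 10:05:05,-82
"""

# parse_dates converts the listed columns to datetime64 while reading
frame = pds.read_csv(io.StringIO(csv_text), parse_dates=['received'])
print(frame.dtypes)
```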

Packets received before the first valid timestamp of their device must be dropped.

In [215]:
loravar['received'] = pds.to_datetime(loravar['received'])
loravar['start_timestamp'] = pds.to_datetime(loravar['start_timestamp'])
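# .copy() makes the filtered frame independent, so later inplace operations raise no SettingWithCopyWarning
loravar = loravar.loc[loravar['received'] > loravar['start_timestamp']].copy()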
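The comparison is row-wise: after the merge, every packet carries the `start_timestamp` of its own device, so a single vectorised mask handles all devices at once. Note that the strict `>` also discards a packet received exactly at `start_timestamp`. If that timestamp is the one of the first valid packet itself, `>=` would keep it. A minimal sketch on hypothetical timestamps for a single device:

```python
import pandas as pds

# Hypothetical packets from one device; start_timestamp marks its first valid packet
toy = pds.DataFrame({
    'received': pds.to_datetime(['2019-01-01 09:55', '2019-01-01 10:00', '2019-01-01 10:05']),
    'start_timestamp': pds.to_datetime(['2019-01-01 10:00'] * 3),
})

strict = toy.loc[toy['received'] > toy['start_timestamp']]      # drops the 10:00 packet
inclusive = toy.loc[toy['received'] >= toy['start_timestamp']]  # keeps the 10:00 packet
print(len(strict), len(inclusive))  # 1 2
```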

Some columns are redundant or no longer needed (`start_timestamp`, for instance, has served its purpose) and can be dropped.

In [216]:
loravar.drop(columns=['dev_eui', 'counter', 'coding_rate', 'start_timestamp'], inplace=True)
loravar.shape

(100210, 11)
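The notebook does not show why these columns count as redundant. One possible check, sketched here on hypothetical rows, is that `dev_eui` maps one-to-one onto `dev_id` and that a column such as `coding_rate` holds a single value. The EUIs and the `'4/5'` coding rate below are assumptions for illustration only:

```python
import pandas as pds

# Hypothetical rows: each device name should map to exactly one EUI
toy = pds.DataFrame({
    'dev_id': ['dev_a', 'dev_a', 'dev_b'],
    'dev_eui': ['00A1', '00A1', '00B2'],
    'coding_rate': ['4/5', '4/5', '4/5'],
})

# dev_eui adds nothing if every dev_id has a single EUI and vice versa
one_to_one = (toy.groupby('dev_id')['dev_eui'].nunique().eq(1).all()
              and toy.groupby('dev_eui')['dev_id'].nunique().eq(1).all())
# A column holding a single value carries no information
constant = toy['coding_rate'].nunique() == 1
print(one_to_one, constant)  # True True
```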

The dataset contains a few incomplete rows that can be dropped.
These rows lack the sensor readings decoded from the payload (battery, humidity, pressure and temperature).

In [217]:
pds.isna(loravar).sum()

received        0
dev_id          0
gtw_id          0
frequency       0
data_rate       0
rssi            0
snr             0
battery        18
humidity       18
pressure       18
temperature    18
dtype: int64

In [218]:
loravar.dropna(inplace=True)
loravar.shape

(100192, 11)
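The four payload columns each have 18 missing values, which suggests they are missing together. The sketch below, on hypothetical rows, shows how to check that with `isna().all(axis=1)`. It also restricts `dropna` to those columns through `subset`, so that a NaN appearing later in an unrelated column would not silently remove rows:

```python
import numpy as np
import pandas as pds

payload_cols = ['battery', 'humidity', 'pressure', 'temperature']

# Hypothetical rows: the second one lacks all its payload readings
toy = pds.DataFrame({
    'rssi': [-80, -85, -90],
    'battery': [3.1, np.nan, 3.0],
    'humidity': [45.0, np.nan, 47.5],
    'pressure': [1012.0, np.nan, 1011.5],
    'temperature': [21.5, np.nan, 21.0],
})

# Rows where every payload field is missing at once
payload_missing = toy[payload_cols].isna().all(axis=1)
print(payload_missing.sum())  # 1

# Only look for NaN in the payload columns
cleaned = toy.dropna(subset=payload_cols)
print(cleaned.shape)  # (2, 5)
```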

Unfortunately, the dataset contains many duplicate rows, which must be removed.

In [219]:
loravar.duplicated().sum()

81768

In [220]:
loravar.drop_duplicates(inplace=True)
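`duplicated()` compares all remaining columns and, with the default `keep='first'`, flags every repeat of a row already seen. A packet heard by two gateways therefore stays as two rows, since `gtw_id` differs, and only exact repeats are removed. Applied to the counts above, 100,192 − 81,768 = 18,424 rows remain, which matches the sum of the per-device counts in the table at the end. A minimal sketch on hypothetical rows:

```python
import pandas as pds

# Hypothetical rows: one packet logged twice by gw_1, and the same packet heard by gw_2
toy = pds.DataFrame({
    'dev_id': ['dev_a', 'dev_a', 'dev_a'],
    'gtw_id': ['gw_1', 'gw_1', 'gw_2'],
    'received': ['10:00:00', '10:00:00', '10:00:00'],
    'rssi': [-80, -80, -97],
})

# Only the exact repeat (second row) is flagged
print(toy.duplicated().sum())  # 1
deduped = toy.drop_duplicates()
print(len(deduped))  # 2
```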

After cleanup, the dataset covers the time range shown below,
although the situation may differ considerably per device and per gateway.
As could be expected, the gateway on the roof of Cité Houzeau is the one that received the most packets.

In [221]:
loravar['received'].max() - loravar['received'].min()

Timedelta('23 days 07:56:06.429852')
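The 23-day figure is global. To see how coverage differs per device and per gateway, the first and last reception can be computed for each pair. A minimal sketch on hypothetical rows:

```python
import pandas as pds

# Hypothetical cleaned rows
toy = pds.DataFrame({
    'dev_id': ['dev_a', 'dev_a', 'dev_b', 'dev_b'],
    'gtw_id': ['gw_1', 'gw_1', 'gw_1', 'gw_2'],
    'received': pds.to_datetime(['2019-01-01 10:00', '2019-01-03 10:00',
                                 '2019-01-02 08:00', '2019-01-02 20:00']),
})

# First and last reception per (device, gateway) pair, and the span between them
spans = toy.groupby(['dev_id', 'gtw_id'])['received'].agg(['min', 'max'])
spans['duration'] = spans['max'] - spans['min']
print(spans)
```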

In [222]:
loravar.groupby(['dev_id', 'gtw_id'])['received'].count().unstack()


| dev_id | eui-0000024b08030186 | iotlab-rpi-03 |
|---|---:|---:|
| static_6_01 | 222.0 | NaN |
| static_6_02 | 1493.0 | NaN |
| static_6_03 | 3200.0 | 2526.0 |
| static_7_01 | 3079.0 | 2365.0 |
| static_7_02 | 3083.0 | 66.0 |
| static_8_01 | 2390.0 | NaN |
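The NaN cells are device/gateway pairs with no packet at all, and their presence is why the counts are shown as floats. Passing `fill_value=0` to `unstack` keeps the table integer-valued. `pds.crosstab(df['dev_id'], df['gtw_id'])` is another way to build the same kind of contingency table. A minimal sketch on hypothetical rows:

```python
import pandas as pds

# Hypothetical rows: dev_b is never heard by gw_2
toy = pds.DataFrame({
    'dev_id': ['dev_a', 'dev_a', 'dev_b'],
    'gtw_id': ['gw_1', 'gw_2', 'gw_1'],
    'received': pds.to_datetime(['2019-01-01 10:00', '2019-01-01 10:00', '2019-01-01 10:05']),
})

# Missing (device, gateway) pairs become 0 instead of NaN, so counts stay integers
counts = toy.groupby(['dev_id', 'gtw_id'])['received'].count().unstack(fill_value=0)
print(counts)
```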
