# LoRa variability data cleanup

In [103]:
import numpy as np
import pandas as pds
import matplotlib.pyplot as plt
import seaborn as sb
import numexpr  # optional backend that speeds up pandas eval/query

%matplotlib inline
sb.set()  # apply seaborn's default plot style

The packet data are read from `loravar.csv` and joined on `dev_id` with `devices.csv`, which holds the timestamp of the first valid packet of each device (`start_timestamp`).
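
A toy sketch of the join, assuming `devices.csv` holds exactly one row per `dev_id`; the device names and timestamps below are invented. `merge` performs an inner join by default, so packets from a device that is missing from the device table are silently dropped. The `validate="many_to_one"` argument makes pandas raise an error if a device appears more than once in the device table.

```python
import pandas as pds

# Hypothetical packet and device tables (values invented for illustration)
packets = pds.DataFrame({
    "dev_id": ["node-a", "node-a", "node-b", "node-c"],
    "rssi": [-80, -82, -95, -101],
})
devices = pds.DataFrame({
    "dev_id": ["node-a", "node-b"],
    "start_timestamp": ["2018-05-01 10:00:00", "2018-05-02 09:30:00"],
})

# Inner join (the default): node-c has no entry in `devices` and is dropped.
# validate="many_to_one" raises MergeError if a dev_id is duplicated in `devices`.
merged = packets.merge(devices, on="dev_id", how="inner", validate="many_to_one")
print(merged.shape)  # (3, 3)
```

Comparing the row count of `loravar.csv` with the merged shape, or merging with `how="left", indicator=True`, shows whether any packets were lost this way.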

In [169]:
loravar = pds.read_csv('../data/loravar.csv')
devices = pds.read_csv('../data/devices.csv')
# Inner join: packets from devices absent from devices.csv are dropped
loravar = loravar.merge(devices, on='dev_id')
loravar.shape

(111092, 15)

In [170]:
loravar.dtypes

received            object
dev_id              object
dev_eui             object
gtw_id              object
counter              int64
frequency          float64
data_rate           object
coding_rate         object
rssi                 int64
snr                float64
battery            float64
humidity           float64
pressure           float64
temperature        float64
start_timestamp     object
dtype: object

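Packets received before the first valid timestamp of their device must be dropped. Since the comparison below is strict, a packet whose `received` time equals `start_timestamp` is discarded as well; `>=` would keep it.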

In [171]:
loravar['received'] = pds.to_datetime(loravar['received'])
loravar['start_timestamp'] = pds.to_datetime(loravar['start_timestamp'])
loravar = loravar.loc[loravar['received'] > loravar['start_timestamp']]
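
The filter compares the two columns row by row, so each packet is checked against the start timestamp of its own device. A small sketch with invented timestamps shows the effect, including the difference between `>` and `>=` for a packet received exactly at `start_timestamp`:

```python
import pandas as pds

# Hypothetical packets for two devices, each with its own start timestamp
df = pds.DataFrame({
    "dev_id": ["a", "a", "a", "b", "b"],
    "received": pds.to_datetime([
        "2018-05-01 09:00", "2018-05-01 10:00", "2018-05-01 11:00",
        "2018-05-02 08:00", "2018-05-02 12:00",
    ]),
    "start_timestamp": pds.to_datetime([
        "2018-05-01 10:00", "2018-05-01 10:00", "2018-05-01 10:00",
        "2018-05-02 09:00", "2018-05-02 09:00",
    ]),
})

# Row-wise comparison: each packet is tested against its own device's start
strict = df.loc[df["received"] > df["start_timestamp"]]
inclusive = df.loc[df["received"] >= df["start_timestamp"]]
print(len(strict), len(inclusive))  # 2 3
```

The strict version drops device `a`'s packet at 10:00, which is the packet that defines its start timestamp.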

Some columns are redundant or no longer needed, such as `start_timestamp`, which only served for the filter above, so they are dropped.

In [172]:
loravar = loravar.drop(columns=['dev_eui', 'counter', 'coding_rate', 'start_timestamp'])
loravar.shape

(100210, 11)
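
Before dropping a column as redundant, it can be worth checking that it carries no extra information. A sketch for `dev_eui`, assuming it should map one-to-one to `dev_id`; the identifiers below are invented and `is_redundant` is a hypothetical name:

```python
import pandas as pds

# Hypothetical identifiers; in the notebook these columns come from loravar.csv
df = pds.DataFrame({
    "dev_id": ["node-a", "node-a", "node-b"],
    "dev_eui": ["0004A30B001A2B3C", "0004A30B001A2B3C", "0004A30B001A2B3D"],
})

# dev_eui is redundant with dev_id if each dev_id maps to exactly one dev_eui
# and each dev_eui maps to exactly one dev_id.
eui_per_id = df.groupby("dev_id")["dev_eui"].nunique()
id_per_eui = df.groupby("dev_eui")["dev_id"].nunique()
is_redundant = bool((eui_per_id == 1).all() and (id_per_eui == 1).all())
print(is_redundant)  # True
```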

A few rows are incomplete: 18 of them lack the values decoded from the payload (`battery`, `humidity`, `pressure` and `temperature`), so they can be dropped.

In [173]:
pds.isna(loravar).sum()

received        0
dev_id          0
gtw_id          0
frequency       0
data_rate       0
rssi            0
snr             0
battery        18
humidity       18
pressure       18
temperature    18
dtype: int64

In [174]:
loravar = loravar.dropna()
loravar.shape


(100192, 11)
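
A plain `dropna()` removes a row if *any* column is missing; here that coincides with the 18 rows lacking the payload, since no other column has missing values. A variant that targets the payload columns explicitly keeps that intent visible if other columns ever acquire missing values. This is a sketch with invented values; `how="all"` is an assumption, and `how="any"` would also drop rows with a partially decoded payload.

```python
import numpy as np
import pandas as pds

# Payload-derived columns (the ones with missing values in the isna() summary)
payload_cols = ["battery", "humidity", "pressure", "temperature"]

# Hypothetical rows: the second packet has no decoded payload,
# the third has a missing snr but a complete payload
df = pds.DataFrame({
    "rssi": [-80, -85, -90],
    "snr": [7.5, 6.0, np.nan],
    "battery": [3.1, np.nan, 3.0],
    "humidity": [40.0, np.nan, 42.0],
    "pressure": [1012.0, np.nan, 1011.5],
    "temperature": [21.5, np.nan, 22.0],
})

# Drop only the rows whose payload is entirely missing
cleaned = df.dropna(subset=payload_cols, how="all")
print(len(df.dropna()), len(cleaned))  # 1 2
```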