## Compressing Data to the IoT Gateway using Autoencoders

Dataset: http://db.csail.mit.edu/labdata/labdata.html

The goal of this project is to reduce the amount of data sent from edge devices to the gateway layer. We use a sensor dataset collected at the Intel Berkeley Research Lab and attempt to encode each reading into a smaller representation from which the original data can be approximately reconstructed. Sending less data from each sensor to the gateway uses less bandwidth per reading, which can increase effective throughput and decrease network latency.
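A back-of-the-envelope sketch of the potential saving. It assumes that each value is sent as a 4-byte little-endian float32 and that the encoder turns one 4-value reading into a single value, which matches the 4-to-1 layout of the autoencoder at the end of this notebook. The helper `payload` and the code value `0.73` are hypothetical.

```python
import struct

def payload(values):
    """Pack a sequence of floats as consecutive little-endian float32 values (assumed wire format)."""
    return struct.pack('<%df' % len(values), *values)

# One raw reading (TEMPERATURE, HUMIDITY, LIGHT, VOLTAGE), taken from the first row of data_samp.head().
raw = payload([18.4498, 43.1191, 43.24, 2.67532])
# A hypothetical 1-value code that an encoder might produce for that reading.
code = payload([0.73])

print(len(raw), len(code), len(raw) / len(code))  # 16 4 4.0
```

With nine sensors per 5-minute interval, that would be 144 versus 36 bytes of sensor payload, before any protocol overhead.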

### Data Loading

In [1]:
import gzip
import pandas as pd

In [2]:
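with gzip.open('data.txt.gz', 'rb') as data_bytes:
    # Raw columns: date, time, epoch, mote id, temperature, humidity, light, voltage.
    # parse_dates=[[0, 1]] combines the date and time columns into a single datetime column.
    data = pd.read_csv(data_bytes, header=None, sep=' ', parse_dates=[[0, 1]])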
data.columns = ['DATETIME','EPOCH','MOTE_ID','TEMPERATURE','HUMIDITY','LIGHT','VOLTAGE']
data = data.set_index('DATETIME')
data.shape

(2313682, 6)
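Combining columns through a nested `parse_dates=[[0, 1]]` is deprecated in recent pandas releases. A minimal sketch of an alternative that names the columns up front and builds the timestamp with `pd.to_datetime`; the two sample lines are hypothetical (sensor values copied from the preview further down, epoch numbers made up) and only mirror the raw file's space-separated layout.

```python
import io

import pandas as pd

# Hypothetical sample in the raw layout: date time epoch moteid temperature humidity light voltage.
raw = io.StringIO(
    "2004-03-01 00:01:57.130850 100 1 18.4498 43.1191 43.24 2.67532\n"
    "2004-03-01 00:02:50.458234 101 1 18.44 43.0858 43.24 2.66332\n"
)
names = ['DATE', 'TIME', 'EPOCH', 'MOTE_ID', 'TEMPERATURE', 'HUMIDITY', 'LIGHT', 'VOLTAGE']
df = pd.read_csv(raw, header=None, sep=' ', names=names)

# Join the date and time strings and parse them into one timestamp column.
df['DATETIME'] = pd.to_datetime(df['DATE'] + ' ' + df['TIME'])
df = df.drop(columns=['DATE', 'TIME']).set_index('DATETIME')
print(df.shape)  # (2, 6)
```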

### Data Pre-processing

We restrict the data to readings from March 1st to March 10th, 2004 (inclusive), and will later resample them into 5-minute intervals. We drop the EPOCH column, since it is a per-mote sequence number rather than a sensor measurement.

In [3]:
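data_samp = data.drop(columns='EPOCH')
# A boolean mask works whether or not the index is sorted; the exclusive
# upper bound of March 11 keeps every reading taken on March 10.
data_samp = data_samp[(data_samp.index >= '2004-03-01') & (data_samp.index < '2004-03-11')]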
data_samp.head()

| DATETIME | MOTE_ID | TEMPERATURE | HUMIDITY | LIGHT | VOLTAGE |
|---|---|---|---|---|---|
| 2004-03-01 00:01:57.130850 | 1.0 | 18.4498 | 43.1191 | 43.24 | 2.67532 |
| 2004-03-01 00:02:50.458234 | 1.0 | 18.44 | 43.0858 | 43.24 | 2.66332 |
| 2004-03-01 00:04:26.606602 | 1.0 | 18.44 | 43.1191 | 43.24 | 2.65143 |
| 2004-03-01 00:05:28.379208 | 1.0 | 18.4498 | 43.0524 | 43.24 | 2.65143 |
| 2004-03-01 00:05:50.456126 | 1.0 | 18.4302 | 43.1525 | 43.24 | 2.66332 |
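A small sketch of how the date window behaves, on a hypothetical 30-minute index. On a sorted `DatetimeIndex`, partial-string slicing with `.loc['2004-03-01':'2004-03-10']` includes the whole of March 10. A boolean mask with an exclusive upper bound of March 11 selects the same rows and does not require the index to be sorted, which matters if the raw file is not in time order.

```python
import numpy as np
import pandas as pd

# Hypothetical 30-minute readings spanning slightly more than the study window.
idx = pd.date_range('2004-02-29 23:00', '2004-03-11 01:00', freq='30min')
s = pd.Series(np.arange(len(idx)), index=idx)

# Partial-string slicing (sorted index): the end label '2004-03-10' covers the whole day.
by_slice = s.loc['2004-03-01':'2004-03-10']

# Boolean mask on a shuffled copy: no sorting needed to select the window.
shuffled = s.sample(frac=1, random_state=0)
mask = (shuffled.index >= '2004-03-01') & (shuffled.index < '2004-03-11')
by_mask = shuffled[mask].sort_index()

print(by_slice.index.min(), by_slice.index.max(), len(by_slice))
# 2004-03-01 00:00:00 2004-03-10 23:30:00 480
```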


For the sake of our experiment, we only consider sensors 1-10. We first drop rows where the sensor ID (MOTE_ID) is missing, then cast MOTE_ID to an integer.

In [4]:
# astype(int) fails on NaN, so drop rows without a mote ID before casting.
data_samp = data_samp.dropna(subset=['MOTE_ID']).astype({'MOTE_ID': int})

# Keep only motes 1-10 (between() is inclusive on both ends).
data_samp = data_samp[data_samp.MOTE_ID.between(1, 10)].copy()
print('Sensor_ID - Min: {}, Max: {}'.format(data_samp.MOTE_ID.min(), data_samp.MOTE_ID.max()))
data_samp.shape

Sensor_ID - Min: 1, Max: 10


(154618, 5)
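The order of the two steps above matters: casting a column that still contains NaN to `int` raises an error. A toy sketch of the same filtering on hypothetical readings, one with no mote ID and one from mote 12:

```python
import numpy as np
import pandas as pd

# Hypothetical readings: row 2 has no mote ID, row 4 comes from a mote outside 1-10.
toy = pd.DataFrame({'MOTE_ID': [1.0, np.nan, 10.0, 12.0],
                    'TEMPERATURE': [18.4, 19.0, 20.1, 21.3]})

# Drop missing IDs first, then cast to integer.
toy = toy.dropna(subset=['MOTE_ID']).astype({'MOTE_ID': int})
# Keep motes 1-10 inclusive.
toy = toy[toy['MOTE_ID'].between(1, 10)]
print(toy['MOTE_ID'].tolist())  # [1, 10]
```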

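Next, we move MOTE_ID into the index and unstack it, so that each row is a timestamp and each column is a (measurement, MOTE_ID) pair. This layout is closer to what the gateway receives: one set of readings from every sensor per time step.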

In [5]:
sensor_df = data_samp.set_index('MOTE_ID', append=True).unstack()
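A toy sketch of what the unstack does, on hypothetical readings. Each raw timestamp belongs to a single mote, so after unstacking most cells are NaN; the resampling step below is what aligns the sensors onto a common time grid.

```python
import pandas as pd

# Hypothetical long-format readings: one row per (timestamp, mote).
ts = pd.to_datetime(['2004-03-01 00:01', '2004-03-01 00:02', '2004-03-01 00:03'])
long = pd.DataFrame({'MOTE_ID': [1, 2, 1],
                     'TEMPERATURE': [18.4, 19.1, 18.5]},
                    index=pd.Index(ts, name='DATETIME'))

# Move MOTE_ID into the index, then pivot it into the column index.
wide = long.set_index('MOTE_ID', append=True).unstack()
print(wide.columns.tolist())       # [('TEMPERATURE', 1), ('TEMPERATURE', 2)]
print(wide.isna().sum().tolist())  # [1, 2]
```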

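We then resample into 5-minute bins and take the mean of each column. Because MOTE_ID is now a column level rather than part of the row index, each sensor is averaged separately.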

In [6]:
sensor_df = sensor_df.resample('5min').mean()
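A toy sketch of the resampling on a hypothetical unstacked frame where two motes report at different instants. Within each 5-minute bin, `mean()` skips NaN, so each sensor's bin value comes only from its own readings.

```python
import numpy as np
import pandas as pd

# Hypothetical unstacked frame: motes 1 and 2 report at different instants.
idx = pd.to_datetime(['2004-03-01 00:01', '2004-03-01 00:03',
                      '2004-03-01 00:06', '2004-03-01 00:08'])
wide = pd.DataFrame({('TEMPERATURE', 1): [18.0, np.nan, 19.0, np.nan],
                     ('TEMPERATURE', 2): [np.nan, 20.0, np.nan, 21.0]},
                    index=idx)
wide.columns.names = [None, 'MOTE_ID']

# Mean of each column within each 5-minute bin; NaNs are ignored.
binned = wide.resample('5min').mean()
print(binned)
```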

Next, we count the missing values in each column:

In [7]:
sensor_df.isna().sum()

             MOTE_ID
TEMPERATURE  1             0
             2            24
             3             0
             4             5
             5          2702
             6            16
             7             0
             8            20
             9             1
             10            2
HUMIDITY     1             0
             2            24
             3             0
             4             5
             5          2702
             6            16
             7             0
             8            20
             9             1
             10            2
LIGHT        1             0
             2            24
             3             0
             4             5
             5          2702
             6            16
             7             0
             8           174
             9           201
             10            2
VOLTAGE      1             0
             2            24
             3             0
             4        

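We drop sensor 5, since all 2702 of its 5-minute bins are missing. The frame is transposed so that the (measurement, MOTE_ID) pairs form the row index, and every row belonging to mote 5 is dropped.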

In [8]:
sensor_df = sensor_df.T.drop(5, level='MOTE_ID')
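Instead of reading the missing-value counts by eye, fully empty sensors can be found programmatically and dropped from the column index directly. A sketch on a hypothetical frame in which mote 5 never reports; on the real data the same lines would apply to the un-transposed `sensor_df`.

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame with (measurement, MOTE_ID) columns; mote 5 is always missing.
wide = pd.DataFrame({
    ('TEMPERATURE', 4): [20.1, np.nan, 20.3],
    ('TEMPERATURE', 5): [np.nan, np.nan, np.nan],
    ('TEMPERATURE', 6): [21.0, 21.2, np.nan],
    ('HUMIDITY', 4): [40.0, 40.5, 41.0],
    ('HUMIDITY', 5): [np.nan, np.nan, np.nan],
    ('HUMIDITY', 6): [np.nan, 38.0, 38.2],
})
wide.columns.names = [None, 'MOTE_ID']

# A mote is dead if every one of its columns is entirely NaN.
all_missing = wide.isna().all()                    # one flag per column
dead = all_missing.groupby(level='MOTE_ID').all()  # one flag per mote
dead_motes = dead[dead].index.tolist()
print(dead_motes)  # [5]

# Drop those motes from the column index, no transpose needed.
cleaned = wide.drop(columns=dead_motes, level='MOTE_ID')
```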

The remaining sensors have far fewer missing bins, so we fill the gaps with linear interpolation along the time axis to complete our data set.

In [9]:
# sensor_df is still transposed: rows are (measurement, MOTE_ID) pairs and columns
# are timestamps, so interpolate linearly along axis=1 (time) for every measurement.
# Assigning to sensor_df.T.<column> is not guaranteed to modify sensor_df,
# because .T may return a copy.
sensor_df = sensor_df.interpolate(method='linear', axis=1)
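A toy sketch of what linear interpolation does to the gaps, on hypothetical values. Interior gaps are filled on a straight line between neighbours, while a gap before the first valid value is left as NaN with the default settings. The second part shows that, in a transposed layout (one row per sensor, one column per time step), `axis=1` interpolates along time.

```python
import numpy as np
import pandas as pd

# Interior gaps are filled linearly; the leading NaN stays NaN.
s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0])
filled = s.interpolate()  # method='linear' by default
print(filled.tolist())  # [nan, 1.0, 2.0, 3.0, 4.0]

# Hypothetical transposed layout: rows are sensors, columns are time steps.
wide_T = pd.DataFrame([[10.0, np.nan, 30.0], [5.0, np.nan, 7.0]])
filled_T = wide_T.interpolate(axis=1)
print(filled_T.to_numpy().tolist())  # [[10.0, 20.0, 30.0], [5.0, 6.0, 7.0]]
```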

Finally, we transpose the frame back so that rows are 5-minute timestamps and columns are (measurement, MOTE_ID) pairs, and summarize each column.

In [10]:
sensor_df = sensor_df.T
sensor_df.describe()

| | TEMPERATURE 1 | TEMPERATURE 2 | TEMPERATURE 3 | TEMPERATURE 4 | TEMPERATURE 6 | TEMPERATURE 7 | TEMPERATURE 8 | TEMPERATURE 9 | TEMPERATURE 10 | HUMIDITY 1 | ... | LIGHT 10 | VOLTAGE 1 | VOLTAGE 2 | VOLTAGE 3 | VOLTAGE 4 | VOLTAGE 6 | VOLTAGE 7 | VOLTAGE 8 | VOLTAGE 9 | VOLTAGE 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 2702.0 | 2702.0 | 2702.0 | 2702.0 | 2702.0 | 2702.0 | 2702.0 | 2702.0 | 2702.0 | 2702.0 | ... | 2702.0 | 2702.0 | 2702.0 | 2702.0 | 2702.0 | 2702.0 | 2702.0 | 2702.0 | 2702.0 | 2702.0 |
| mean | 22.191947 | 22.126232 | 22.241036 | 22.250472 | 21.786246 | 21.844609 | 21.623283 | 21.802205 | 21.549315 | 36.87967 | ... | 222.379571 | 2.647379 | 2.621846 | 2.6464 | 2.612014 | 2.594415 | 2.63649 | 2.639891 | 2.703093 | 2.624886 |
| std | 2.394438 | 1.944199 | 2.197299 | 2.048391 | 1.874742 | 1.954887 | 2.172489 | 2.258432 | 1.976303 | 5.150745 | ... | 229.620618 | 0.037045 | 0.032103 | 0.036021 | 0.040019 | 0.028881 | 0.034912 | 0.038534 | 0.038082 | 0.034211 |
| min | 17.203975 | 17.6756 | 17.60504 | 18.043644 | 17.6168 | 17.801775 | 15.70825 | 17.522067 | 17.561267 | 22.550817 | ... | 0.92 | 2.579066 | 2.558626 | 2.579066 | 2.51967 | 2.53812 | 2.57108 | 1.970964 | 2.6338 | 2.56 |
| 25% | 20.586813 | 20.893308 | 20.771338 | 20.992287 | 20.543675 | 20.577994 | 20.089667 | 20.1403 | 20.170843 | 32.8374 | ... | 53.36 | 2.61639 | 2.59354 | 2.61639 | 2.58226 | 2.57108 | 2.606345 | 2.609215 | 2.67532 | 2.596383 |
| 50% | 22.0324 | 22.2515 | 22.213408 | 22.051562 | 21.824267 | 21.744914 | 21.717011 | 21.813825 | 21.618467 | 37.702394 | ... | 75.44 | 2.647009 | 2.622588 | 2.645535 | 2.60491 | 2.59354 | 2.63526 | 2.63964 | 2.69964 | 2.624103 |
| 75% | 23.877775 | 23.353475 | 23.705108 | 23.42785 | 23.16899 | 23.178028 | 22.9109 | 23.154719 | 22.81325 | 40.781445 | ... | 426.88 | 2.67532 | 2.649072 | 2.67382 | 2.641539 | 2.61639 | 2.652916 | 2.659375 | 2.7244 | 2.649676 |
| max | 28.6418 | 27.3874 | 28.2008 | 27.63044 | 26.507578 | 26.4221 | 26.382356 | 27.1424 | 25.8194 | 47.562725 | ... | 835.36 | 2.74239 | 2.699657 | 2.73696 | 2.7244 | 2.67532 | 2.73539 | 2.741183 | 2.809478 | 2.71353 |


In [11]:
from keras.layers import Input, Dense
from keras.models import Model

In [12]:
# Input: 4 values, presumably one reading of TEMPERATURE, HUMIDITY, LIGHT and VOLTAGE.
input_layer = Input(shape=(4,))
# "encoded" is the compressed 1-value representation of the input.
encoded = Dense(1, activation='relu')(input_layer)
# "decoded" is the lossy reconstruction of the input. The sigmoid output lies in (0, 1),
# so the inputs should be scaled to [0, 1] before training.
decoded = Dense(4, activation='sigmoid')(encoded)

# This model maps an input to its reconstruction.
autoencoder = Model(input_layer, decoded)

In [13]:
autoencoder.summary()
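Before training, the wide frame has to be turned into 4-value samples and scaled to the range of the sigmoid decoder. A minimal sketch, assuming each training sample is one (timestamp, mote) reading of the four measurements and that min-max scaling is an acceptable choice. The helpers `to_samples`, `minmax_fit`, `minmax_apply` and `minmax_invert` are hypothetical, and the toy frame stands in for `sensor_df`.

```python
import numpy as np
import pandas as pd

FEATURES = ['TEMPERATURE', 'HUMIDITY', 'LIGHT', 'VOLTAGE']

def to_samples(wide):
    """Turn a time x (measurement, MOTE_ID) frame into one 4-feature row per (time, mote)."""
    long = wide.stack(level='MOTE_ID')
    return long[FEATURES]  # fix the feature order explicitly

def minmax_fit(x):
    # Per-feature minimum and maximum (assumes max > min for every feature).
    return x.min(axis=0), x.max(axis=0)

def minmax_apply(x, lo, hi):
    return (x - lo) / (hi - lo)

def minmax_invert(z, lo, hi):
    return z * (hi - lo) + lo

# Hypothetical toy frame: 2 time steps, motes 1 and 2.
idx = pd.date_range('2004-03-01', periods=2, freq='5min', name='DATETIME')
cols = pd.MultiIndex.from_product([FEATURES, [1, 2]], names=[None, 'MOTE_ID'])
values = np.array([[18.0, 19.0, 40.0, 42.0, 40.0, 50.0, 2.60, 2.62],
                   [20.0, 21.0, 44.0, 46.0, 60.0, 70.0, 2.64, 2.66]])
wide = pd.DataFrame(values, index=idx, columns=cols)

X = to_samples(wide).to_numpy()   # shape (4, 4)
lo, hi = minmax_fit(X)
Z = minmax_apply(X, lo, hi)       # every feature now spans [0, 1]
```

With the real data, the scaled array could then be passed as both input and target, for example `autoencoder.fit(Z, Z, ...)`, since an autoencoder is trained to reproduce its input; `minmax_invert` maps the decoder's output back to sensor units.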