## Compressing Data to the IoT Gateway using Autoencoders

Dataset: http://db.csail.mit.edu/labdata/labdata.html

The goal of this project is to reduce the amount of data that edge devices send to the gateway layer. Using a sensor dataset collected from Intel Labs, we use an autoencoder to compress each batch of readings into a smaller encoding from which the gateway can approximately reconstruct the original values. Sending the encoding instead of the raw readings reduces the bytes each sensor transmits, which can increase effective data throughput and decrease network latency.
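As a rough illustration of the edge/gateway split, the sketch below swaps in PCA, which is the optimal *linear* autoencoder under squared error, for a trained neural network. The edge device would run only the encoder (a projection from `d` values to `k` values) and the gateway only the decoder. The data is synthetic, and the function names `fit_linear_autoencoder`, `encode` and `decode` are hypothetical.

```python
import numpy as np

def fit_linear_autoencoder(X, k):
    """Fit a rank-k linear autoencoder via PCA (SVD of the centred data)."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]  # shapes (d,) and (k, d)

def encode(x, mean, components):
    # Would run on the edge device: d values -> k values
    return (x - mean) @ components.T

def decode(z, mean, components):
    # Would run on the gateway: k values -> approximate d values
    return z @ components + mean

rng = np.random.default_rng(0)
# Synthetic "readings": 4 correlated features driven by 1 latent signal plus noise
latent = rng.normal(size=(1000, 1))
X = latent @ np.array([[1.0, -0.5, 2.0, 0.1]]) + 0.01 * rng.normal(size=(1000, 4))

mean, comps = fit_linear_autoencoder(X, k=1)
Z = encode(X, mean, comps)
X_hat = decode(Z, mean, comps)

print("compression ratio:", X.shape[1] / Z.shape[1])  # 4.0
print("reconstruction RMSE:", np.sqrt(np.mean((X - X_hat) ** 2)))
```

Real sensor channels are far less cleanly correlated than this toy data, so the achievable `k` for a given reconstruction error has to be measured on the actual readings.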

### Data Loading

In [1]:
import gzip
import pandas as pd

In [2]:
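# Column layout of data.txt: date, time, epoch, moteid, temperature, humidity, light, voltage
cols = ['DATE', 'TIME', 'EPOCH', 'MOTE_ID', 'TEMPERATURE', 'HUMIDITY', 'LIGHT', 'VOLTAGE']
with gzip.open('data.txt.gz', 'rt') as data_file:
    data = pd.read_csv(data_file, header=None, sep=' ', names=cols)
# Merge date and time into one DatetimeIndex (squeeze was removed in pandas 2.0, nested parse_dates is deprecated)
data['DATETIME'] = pd.to_datetime(data.pop('DATE') + ' ' + data.pop('TIME'),
                                  format='ISO8601', errors='coerce')
data = data.set_index('DATETIME')
data.shape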

(2313682, 6)

We will consider sensor data from March 1st through March 10th (inclusive), resampled to 5-minute intervals. We drop the EPOCH column, since it is only a per-mote sequence number and carries no measurement.
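The two pandas behaviours this step relies on can be checked on a small synthetic frame (the data below is made up): a partial-string date slice includes the entire end day, and `resample('5min').mean()` averages irregular readings into fixed bins labelled by their left edge.

```python
import numpy as np
import pandas as pd

# Synthetic readings, one every 90 s across three days (hypothetical data)
idx = pd.date_range('2004-03-09', '2004-03-12', freq='90s')
df = pd.DataFrame({'TEMPERATURE': np.linspace(18.0, 24.0, len(idx))}, index=idx)

# A partial-string slice on a sorted DatetimeIndex includes the whole end day,
# so '2004-03-10' keeps readings up to 2004-03-10 23:59:59
window = df.loc['2004-03-09':'2004-03-10']
print(window.index.min(), window.index.max())

# Average the readings into 5-minute bins (bin labels are the left edges)
binned = window.resample('5min').mean()
print(binned.head(3))
```

Two full days give 2 × 288 = 576 five-minute bins, each holding three or four of the 90-second readings.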

In [3]:
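data_samp = data.drop('EPOCH', axis=1)
# Boolean mask instead of .loc slicing: the raw file appears to be ordered by mote, not time,
# and recent pandas rejects partial-string slices on non-monotonic DatetimeIndexes
in_window = (data_samp.index >= '2004-03-01') & (data_samp.index < '2004-03-11')
data_samp = data_samp[in_window]
data_samp.head()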

| DATETIME | MOTE_ID | TEMPERATURE | HUMIDITY | LIGHT | VOLTAGE |
|---|---|---|---|---|---|
| 2004-03-01 00:01:57.130850 | 1.0 | 18.4498 | 43.1191 | 43.24 | 2.67532 |
| 2004-03-01 00:02:50.458234 | 1.0 | 18.4400 | 43.0858 | 43.24 | 2.66332 |
| 2004-03-01 00:04:26.606602 | 1.0 | 18.4400 | 43.1191 | 43.24 | 2.65143 |
| 2004-03-01 00:05:28.379208 | 1.0 | 18.4498 | 43.0524 | 43.24 | 2.65143 |
| 2004-03-01 00:05:50.456126 | 1.0 | 18.4302 | 43.1525 | 43.24 | 2.66332 |


For the sake of our experiment, we only consider motes (sensors) 1-10. We drop readings whose MOTE_ID is missing and convert MOTE_ID to an integer.
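The order of these steps matters: `read_csv` stores MOTE_ID as float because of the missing values, and a column containing NaN cannot be cast to `int`. A small sketch on a made-up Series:

```python
import numpy as np
import pandas as pd

# Hypothetical MOTE_ID column as it arrives from read_csv: float because of the NaN
ids = pd.Series([1.0, np.nan, 7.0, 12.0, 10.0])

# Casting straight to int fails: NaN has no integer representation
try:
    ids.astype(int)
except ValueError as err:
    print('cast failed:', err)

# Dropping missing IDs first makes the cast safe
clean = ids.dropna().astype(int)

# between() is inclusive on both ends by default, matching 1 <= id <= 10
kept = clean[clean.between(1, 10)]
print(kept.tolist())  # [1, 7, 10]
```

If the rows with a missing ID had to be kept, pandas' nullable `Int64` dtype would allow the cast without dropping them.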

In [4]:
# Drop readings with no mote ID and store the ID as an integer
# (reassigning avoids in-place edits on a slice of `data`)
data_samp = data_samp.dropna(subset=['MOTE_ID']).astype({'MOTE_ID': int})

# Keep only motes 1-10 (between() is inclusive on both ends)
data_samp = data_samp[data_samp['MOTE_ID'].between(1, 10)].copy()
print('Sensor_ID - Min: {}, Max: {}'.format(data_samp.MOTE_ID.min(), data_samp.MOTE_ID.max()))
data_samp.shape

Sensor_ID - Min: 1, Max: 10


(154618, 5)

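Next, we pivot the data so that each row is a timestamp and each column is a (measurement, MOTE_ID) pair, then average it into 5-minute bins so that the motes, which report at irregular times, line up. Each row then resembles the batch of readings the gateway would receive from all sensors in one interval.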
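The pivot can be seen on a toy example with two motes (the readings are made up). After `unstack()`, the columns form a (measurement, MOTE_ID) MultiIndex, and because each mote reports at its own timestamps, most cells are NaN until the frame is resampled.

```python
import pandas as pd

# Hypothetical long-format readings: one row per (timestamp, mote)
long_df = pd.DataFrame({
    'DATETIME': pd.to_datetime(['2004-03-01 00:01:10', '2004-03-01 00:01:40',
                                '2004-03-01 00:03:05', '2004-03-01 00:06:20']),
    'MOTE_ID': [1, 2, 1, 2],
    'TEMPERATURE': [18.4, 19.1, 18.6, 19.3],
    'HUMIDITY': [43.1, 40.2, 43.0, 40.0],
}).set_index('DATETIME')

# Move MOTE_ID into the index, then unstack it into the columns:
# the result has one column per (measurement, MOTE_ID) pair
wide = long_df.set_index('MOTE_ID', append=True).unstack()
print(wide.columns.tolist())

# Averaging into 5-minute bins aligns the motes on a shared time grid
wide_5min = wide.resample('5min').mean()
print(wide_5min)
```

Here mote 1 has two readings in the 00:00 bin (averaged to 18.5 °C) and none in the 00:05 bin, which is exactly the kind of gap that shows up in the `isna()` counts.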

In [81]:
# Pivot so each row is a timestamp and each column is a (measurement, MOTE_ID) pair
sensor_df = data_samp.set_index('MOTE_ID', append=True).unstack()

# Average readings into 5-minute bins to align the motes' irregular timestamps
sensor_df = sensor_df.resample('5min').mean()

# Drop mote 5 from the MOTE_ID level of the column MultiIndex
sensor_df = sensor_df.drop(columns=5, level='MOTE_ID')

sensor_df.isna().sum()

             MOTE_ID
TEMPERATURE  1            0
             2           24
             3            0
             4            5
             6           16
             7            0
             8           20
             9            1
             10           2
HUMIDITY     1            0
             2           24
             3            0
             4            5
             6           16
             7            0
             8           20
             9            1
             10           2
LIGHT        1            0
             2           24
             3            0
             4            5
             6           16
             7            0
             8          174
             9          201
             10           2
VOLTAGE      1            0
             2           24
             3            0
             4            5
             6           16
             7            0
             8           20
             9            1

Sensor 5 has essentially no readings in this window, so we drop it.
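The remaining motes still have gaps (up to 201 missing LIGHT bins for mote 9). How these are handled is not shown here; one option, sketched under that assumption, is to generalize the decision about sensor 5 into a threshold on the missing fraction and then interpolate the short gaps in time. The helper `drop_sparse_motes` and its `max_missing_frac` threshold are hypothetical, and the data is synthetic.

```python
import numpy as np
import pandas as pd

def drop_sparse_motes(df, max_missing_frac=0.5):
    """Drop every mote whose worst column is missing more than max_missing_frac of the bins.

    Hypothetical helper: expects columns as a (measurement, MOTE_ID) MultiIndex.
    """
    missing = df.isna().mean()                      # fraction missing per column
    worst = missing.groupby(level='MOTE_ID').max()  # worst measurement per mote
    sparse = worst[worst > max_missing_frac].index
    return df.drop(columns=sparse, level='MOTE_ID')

# Synthetic 5-minute frame with motes 4 and 5; mote 5 has a single reading
idx = pd.date_range('2004-03-01', periods=6, freq='5min')
cols = pd.MultiIndex.from_product([['TEMPERATURE'], [4, 5]], names=[None, 'MOTE_ID'])
df = pd.DataFrame([[18.0, 20.0], [np.nan, np.nan], [19.0, np.nan],
                   [np.nan, np.nan], [20.0, np.nan], [21.0, np.nan]],
                  index=idx, columns=cols)

dense = drop_sparse_motes(df)
# Fill the remaining gaps by linear interpolation in time
filled = dense.interpolate(method='time', limit_direction='both')
print(filled)
```

Interpolation is only reasonable for short gaps; long outages would feed the autoencoder invented values, so a stricter threshold or dropping the affected time steps may be preferable.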


### Autoencoder Prototype

In [None]:
from keras.layers import Input, Dense
from keras.models import Model

In [None]:
# Input: 4 features per sample (presumably TEMPERATURE, HUMIDITY, LIGHT, VOLTAGE)
input_layer = Input(shape=(4,))
# "encoded" is the 1-value compressed representation of the input
encoded = Dense(1, activation='relu')(input_layer)
# "decoded" is the lossy reconstruction; sigmoid outputs lie in [0, 1],
# so the inputs must be scaled to that range before training
decoded = Dense(4, activation='sigmoid')(encoded)

# This model maps an input to its reconstruction
autoencoder = Model(input_layer, decoded)

In [None]:
autoencoder.summary()
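`Input(shape=(4,))` implies that each training sample is a 4-value vector, presumably one mote's four measurements at one time step, and the `sigmoid` decoder can only produce values in [0, 1]. A minimal preprocessing sketch under those assumptions reshapes a `sensor_df`-style frame into per-mote samples and min-max scales each feature. The frame below is a random stand-in, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for sensor_df: columns are (measurement, MOTE_ID) pairs
idx = pd.date_range('2004-03-01', periods=3, freq='5min')
measures = ['TEMPERATURE', 'HUMIDITY', 'LIGHT', 'VOLTAGE']
cols = pd.MultiIndex.from_product([measures, [1, 2]], names=[None, 'MOTE_ID'])
rng = np.random.default_rng(0)
wide = pd.DataFrame(rng.uniform(0, 100, size=(3, 8)), index=idx, columns=cols)

# Reorder columns to (MOTE_ID, measurement) so each mote's 4 values are adjacent;
# sorting puts the measurements in alphabetical order: HUMIDITY, LIGHT, TEMPERATURE, VOLTAGE
by_mote = wide.swaplevel(axis=1).sort_index(axis=1)
# One row per (timestamp, mote), 4 features each
samples = by_mote.to_numpy().reshape(-1, len(measures))

# Min-max scale each feature to [0, 1] so targets match the sigmoid output range
lo, hi = samples.min(axis=0), samples.max(axis=0)
scaled = (samples - lo) / (hi - lo)

# The gateway would invert the scaling after decoding
restored = scaled * (hi - lo) + lo
print(samples.shape)  # (6, 4)
```

The per-feature `lo` and `hi` would have to be shared with the gateway so it can undo the scaling, and a constant feature (`hi == lo`) would need special handling to avoid dividing by zero. Any remaining NaN bins must also be removed or filled before training.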